Could somebody please help me understand how CMOV improves branching?
Well, it does NOT improve branching, it removes it. A CMOV can be seen as two instructions in one, a MOV and a NOP. Which one is executed depends on the flags. So conceptually it behaves like this:
if (cond) {
mov dst, src
} else {
nop
}
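To make the "MOV or NOP" picture concrete, here is a minimal C sketch of the value a CMOV produces. The name cmov_model is made up for illustration, and the mask trick only models the result; it is not a claim about how the hardware implements the instruction.

```c
#include <stdint.h>

/* Hypothetical model of "cmovcc dst, src": the result is src when the
   condition holds and the old dst otherwise. No jump is involved; both
   inputs are always read and one of them is selected by a mask. */
static uint32_t cmov_model(uint32_t dst, uint32_t src, int cond)
{
    /* mask is all ones if cond is nonzero, otherwise zero */
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (src & mask) | (dst & ~mask);
}
```

Calling cmov_model(old, new_value, flag) always runs the same straight-line code; only the selected value differs.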
…
Surely the problem domain is still the same: we do not know the address of the next instruction to execute?
Well, no. The next instruction is always the one following the CMOV, so the instruction pipeline is not invalidated and reloaded (branch prediction and other optimizations left aside). It is one continuous flow of macro-opcodes. A simple example follows:
if (ecx==5)
eax = TRUE
else
eax = FALSE
in basic asm:
cmp ecx,5 ; is ecx==5
jne unequal ; what is the address of the next instruction? conditional branch
mov eax,TRUE ; possibility one
jmp fin
unequal: ; possibility two
mov eax,FALSE
fin:
nop
with CMOV:
cmp ecx,5
mov eax, FALSE ; mov doesn't affect flags
mov ebx, TRUE ; because CMOV doesn't take immediate src operands, use EBX for alternative
cmove eax, ebx ; executes as MOV if zero-flag is set, otherwise as NOP
nop ; always the next instruction, no branch to mispredict
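For readers who work in C rather than asm, the two variants above can be sketched roughly as follows. The function names is_five_branch and is_five_select are invented for this sketch, and 1/0 stand in for TRUE/FALSE. Note the assumption: a compiler is free to emit a jump, a CMOV, or something else entirely for either function, so the generated assembly has to be checked if it matters.

```c
#include <stdint.h>

/* Branchy form: mirrors the cmp / jne / mov / jmp sequence. */
static uint32_t is_five_branch(uint32_t ecx)
{
    uint32_t eax;
    if (ecx == 5)
        eax = 1;   /* TRUE */
    else
        eax = 0;   /* FALSE */
    return eax;
}

/* Select form: mirrors the cmp / mov / mov / cmove sequence. */
static uint32_t is_five_select(uint32_t ecx)
{
    uint32_t eax = 0;               /* mov eax, FALSE */
    uint32_t ebx = 1;               /* mov ebx, TRUE  */
    eax = (ecx == 5) ? ebx : eax;   /* cmove eax, ebx */
    return eax;
}
```

Both functions return the same value for every input; the difference is only in the shape of the code the compiler is nudged towards.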
Is it worth it on current CPUs? A clear YES. In my experience, and (of course) depending on the algorithm, the speed gain is significant and worth the effort.