What is it about CMOV which improves CPU pipeline performance?

Could somebody please help me understand how CMOV improves branching?

Well, it does NOT improve branching; it removes it. A CMOV can be seen as two instructions in one, a MOV and a NOP, and the flags decide which one takes effect. So conceptually it may look like

if (cond) {
    mov dst, src
} else {
    nop
}
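To make the "MOV or NOP" picture concrete, here is a minimal C sketch of that behaviour. The function name `cmov_model` is hypothetical, and the mask arithmetic is only an illustration of a select without a jump, not a description of how the CPU implements CMOV.

```c
#include <stdint.h>

/* Hypothetical model of CMOVcc: returns the value dst holds afterwards.
 * cond stands in for the flag test (e.g. ZF set for CMOVE). */
static uint32_t cmov_model(uint32_t dst, uint32_t src, int cond)
{
    /* mask is all ones when cond is nonzero, all zeros otherwise */
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);

    /* pick src when the condition holds, keep dst otherwise; no jump */
    return (src & mask) | (dst & ~mask);
}
```

For example, `cmov_model(1, 2, 1)` yields 2 (the MOV case) and `cmov_model(1, 2, 0)` yields 1 (the NOP case).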

Surely the problem domain is still the same- we do not know the address of the next instruction to execute?

Well, no. The next instruction is always the one following the CMOV, so the instruction pipeline is not invalidated and reloaded (branch prediction and other optimizations left aside). The condition becomes a data dependency instead of a control dependency. It is one continuous flow of macro-opcodes. A simple example follows:

if (ecx==5)
    eax = TRUE
else
    eax = FALSE

in basic asm:

cmp ecx,5      ; is ecx==5
jne unequal    ; what is the address of the next instruction? conditional branch
mov eax,TRUE   ; possibility one
jmp fin
unequal:       ; possibility two
mov eax,FALSE
fin:
nop

with CMOV

cmp ecx,5
mov eax, FALSE   ; mov doesn't affect flags
mov ebx, TRUE    ; because CMOV doesn't take immediate src operands, use EBX for alternative
cmove eax, ebx   ; executes as MOV if zero-flag is set, otherwise as NOP
nop              ; always the next instruction, no pipeline flush
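For comparison, the same two variants can be sketched in C. The function names `is_five_branch` and `is_five_select` are hypothetical, and the comments only map each line to the assembly above. Whether a compiler actually emits CMOV for the second form is an assumption: depending on the compiler, target, and optimization level it may just as well emit SETcc or a branch.

```c
#include <stdbool.h>

/* Branching form: mirrors the cmp/jne/jmp sequence. */
static int is_five_branch(int x)
{
    if (x == 5)
        return true;
    else
        return false;
}

/* Select form: mirrors the cmp/mov/mov/cmove sequence. */
static int is_five_select(int x)
{
    int result = false;                 /* mov eax, FALSE */
    int alt = true;                     /* mov ebx, TRUE  */
    result = (x == 5) ? alt : result;   /* candidate for cmove eax, ebx */
    return result;
}
```

Both functions return the same value for every input; only the shape of the control flow differs.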

Is it worth it on current CPUs? A clear YES. In my experience, and of course depending on the algorithm, the speed gain is significant and worth the effort, most of all where the branch being replaced is hard to predict.
