Which is faster: while(1) or while(2)?

Question

Both loops are infinite, but we can see which one takes more instructions/resources per iteration.

Using gcc, I compiled the two following programs to assembly at varying levels of optimization:

int main(void) {
    while(1) {}
    return 0;
}

int main(void) {
    while(2) {}
    return 0;
}

Even with no optimizations (-O0), the generated assembly was identical for both programs. Therefore, there is no speed difference between the two loops.

For reference, here is the generated assembly (using gcc main.c -S -masm=intel with an optimization flag):

With -O0:

    .file   "main.c"
    .intel_syntax noprefix
    .def    __main; .scl    2;  .type   32; .endef
    .text
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    push    rbp
    .seh_pushreg    rbp
    mov rbp, rsp
    .seh_setframe   rbp, 0
    sub rsp, 32
    .seh_stackalloc 32
    .seh_endprologue
    call    __main
.L2:
    jmp .L2
    .seh_endproc
    .ident  "GCC: (tdm64-2) 4.8.1"

With -O1:

    .file   "main.c"
    .intel_syntax noprefix
    .def    __main; .scl    2;  .type   32; .endef
    .text
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    sub rsp, 40
    .seh_stackalloc 40
    .seh_endprologue
    call    __main
.L2:
    jmp .L2
    .seh_endproc
    .ident  "GCC: (tdm64-2) 4.8.1"

With -O2 and -O3 (same output):

    .file   "main.c"
    .intel_syntax noprefix
    .def    __main; .scl    2;  .type   32; .endef
    .section    .text.startup,"x"
    .p2align 4,,15
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    sub rsp, 40
    .seh_stackalloc 40
    .seh_endprologue
    call    __main
.L2:
    jmp .L2
    .seh_endproc
    .ident  "GCC: (tdm64-2) 4.8.1"

In fact, the assembly generated for the loop is identical for every level of optimization:

 .L2:
    jmp .L2
    .seh_endproc
    .ident  "GCC: (tdm64-2) 4.8.1"

The important bits being:

.L2:
    jmp .L2

I can’t read assembly very well, but this is obviously an unconditional loop. The jmp instruction unconditionally resets the program back to the .L2 label without even comparing a value against true, and of course immediately does so again until the program is somehow ended. This directly corresponds to the C/C++ code:

L2:
    goto L2;

Edit:

Interestingly enough, even with no optimizations, the following loops all produced the exact same output (unconditional jmp) in assembly:

while(42) {}

while(1==1) {}

while(2==2) {}

while(4<7) {}

while(3==3 && 4==4) {}

while(8-9 < 0) {}

while(4.3 * 3e4 >= 2 << 6) {}

while(-0.1 + 02) {}

And even to my amazement:

#include<math.h>

while(sqrt(7)) {}

while(hypot(3,4)) {}

Things get a little more interesting with user-defined functions:

int x(void) {
    return 1;
}

while(x()) {}

#include<math.h>

double x(void) {
    return sqrt(7);
}

while(x()) {}

At -O0, these two examples actually call x and perform a comparison for each iteration.

First example (returning 1):

.L4:
    call    x
    testl   %eax, %eax
    jne .L4
    movl    $0, %eax
    addq    $32, %rsp
    popq    %rbp
    ret
    .seh_endproc
    .ident  "GCC: (tdm64-2) 4.8.1"

Second example (returning sqrt(7)):

.L4:
    call    x
    xorpd   %xmm1, %xmm1
    ucomisd %xmm1, %xmm0
    jp  .L4
    xorpd   %xmm1, %xmm1
    ucomisd %xmm1, %xmm0
    jne .L4
    movl    $0, %eax
    addq    $32, %rsp
    popq    %rbp
    ret
    .seh_endproc
    .ident  "GCC: (tdm64-2) 4.8.1"

However, at -O1 and above, they both produce the same assembly as the previous examples (an unconditional jmp back to the preceding label).

TL;DR

Under GCC, the different loops are compiled to identical assembly. The compiler evaluates the constant values and doesn’t bother performing any actual comparison.

The moral of the story is:

There exists a layer of translation between C++ source code and CPU instructions, and this layer has important implications for performance.
Therefore, performance cannot be evaluated by only looking at source code.
The compiler should be smart enough to optimize such trivial cases. Programmers should not waste their time thinking about them in the vast majority of cases.

TL;DR

Leave a Comment Cancel reply