Efficient integer compare function

Question

This one has no branches, and doesn’t suffer from overflow or underflow:

return (a > b) - (a < b);

With gcc -O2 -S, this compiles down to the following six instructions:

xorl    %eax, %eax
cmpl    %esi, %edi
setl    %dl
setg    %al
movzbl  %dl, %edx
subl    %edx, %eax

Here’s some code to benchmark various compare implementations:

#include <stdio.h>
#include <stdlib.h>

#define COUNT 1024
#define LOOPS 500
#define COMPARE compare2
#define USE_RAND 1

int arr[COUNT];

int compare1 (int a, int b)
{
    if (a < b) return -1;
    if (a > b) return 1;
    return 0;
}

int compare2 (int a, int b)
{
    return (a > b) - (a < b);
}

int compare3 (int a, int b)
{
    return (a < b) ? -1 : (a > b);
}

int compare4 (int a, int b)
{
    __asm__ __volatile__ (
        "sub %1, %0 \n\t"
        "jno 1f \n\t"
        "cmc \n\t"
        "rcr %0 \n\t"
        "1: "
    : "+r"(a)
    : "r"(b)
    : "cc");
    return a;
}

int main ()
{
    for (int i = 0; i < COUNT; i++) {
#if USE_RAND
        arr[i] = rand();
#else
        for (int b = 0; b < sizeof(arr[i]); b++) {
            *((unsigned char *)&arr[i] + b) = rand();
        }
#endif
    }

    int sum = 0;

    for (int l = 0; l < LOOPS; l++) {
        for (int i = 0; i < COUNT; i++) {
            for (int j = 0; j < COUNT; j++) {
                sum += COMPARE(arr[i], arr[j]);
            }
        }
    }

    printf("%d=0\n", sum);

    return 0;
}

The results on my 64-bit system, compiled with gcc -std=c99 -O2, for positive integers (USE_RAND=1):

compare1: 0m1.118s
compare2: 0m0.756s
compare3: 0m1.101s
compare4: 0m0.561s

Out of C-only solutions, the one I suggested was the fastest. user315052’s solution was slower despite compiling to only 5 instructions. The slowdown is likely because, despite having one less instruction, there is a conditional instruction (cmovge).

Overall, FredOverflow’s 4-instruction assembly implementation was the fastest when used with positive integers. However, this code only benchmarked the integer range RAND_MAX, so the 4-instuction test is biased, because it handles overflows separately, and these don’t occur in the test; the speed may be due to successful branch prediction.

With a full range of integers (USE_RAND=0), the 4-instruction solution is in fact very slow (others are the same):

compare4: 0m1.897s

Leave a Comment Cancel reply