How to find the position of the only-set-bit in a 64-bit value using bit manipulation efficiently?

Question

Multiply the value by a carefully designed 64-bit constant, then keep only the upper 4 bits of the product. On any CPU with fast 64-bit multiplication, this is probably about as fast as you can get.

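#include <stdint.h>

/* Returns 0 for a zero input, or 1..8 for a single set bit at position 7, 15, ..., 63. */
int field_set(uint64_t input) {
    uint64_t field = input * 0x20406080a0c0e1ULL;
    return (int)((field >> 60) & 15);  /* keep the top 4 bits of the product */
}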

// field_set(0x0000000000000000ULL) = 0
// field_set(0x0000000000000080ULL) = 1
// field_set(0x0000000000008000ULL) = 2
// field_set(0x0000000000800000ULL) = 3
// field_set(0x0000000080000000ULL) = 4
// field_set(0x0000008000000000ULL) = 5
// field_set(0x0000800000000000ULL) = 6
// field_set(0x0080000000000000ULL) = 7
// field_set(0x8000000000000000ULL) = 8

clang implements this in three x86_64 instructions, not counting the frame setup and cleanup:

_field_set:
    push   %rbp
    mov    %rsp,%rbp
    movabs $0x20406080a0c0e1,%rax
    imul   %rdi,%rax
    shr    $0x3c,%rax
    pop    %rbp
    retq

Note that the result for any other input (anything that is not zero or one of the eight single-bit values listed above) is essentially meaningless. (So don’t do that.)
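The nine documented inputs can also be checked mechanically. The following is a minimal sketch rather than part of the original answer: `check_field_set` is a hypothetical helper name, and `field_set` is repeated only so that the block compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Same function as above, repeated so this block is self-contained. */
static int field_set(uint64_t input) {
    uint64_t field = input * 0x20406080a0c0e1ULL;
    return (int)((field >> 60) & 15);
}

/* Hypothetical helper: asserts the documented result for the zero input
   and for each of the eight single-bit inputs (bits 7, 15, ..., 63). */
static void check_field_set(void) {
    assert(field_set(0) == 0);
    for (int k = 1; k <= 8; k++) {
        uint64_t input = 1ULL << (8 * k - 1);
        assert(field_set(input) == k);
    }
}
```

Calling `check_field_set()` from a test or from `main` should pass silently if the constant is correct; it says nothing about inputs outside this set.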

I don’t think there’s any feasible way to extend this method to return values in the 7..63 range directly (the structure of the constant doesn’t permit it), but you can convert a nonzero result r to that range by computing 8 * r - 1, so that 1 maps to 7, 2 to 15, and so on up to 8 mapping to 63.
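Since the results 1 through 8 correspond to bit positions 7, 15, ..., 63, the conversion amounts to `8 * r - 1`. Here is a small sketch of that step; `field_pos` is a hypothetical name, and returning -1 for a zero input is an assumption about what a caller might want, not something the original specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Same function as above, repeated so this block is self-contained. */
static int field_set(uint64_t input) {
    uint64_t field = input * 0x20406080a0c0e1ULL;
    return (int)((field >> 60) & 15);
}

/* Hypothetical helper: maps the encoded result (1..8) back to the bit
   position (7, 15, ..., 63). A zero input yields -1 (assumed convention). */
static int field_pos(uint64_t input) {
    int r = field_set(input);
    assert(r <= 8);  /* a larger value can only come from an invalid input */
    return 8 * r - 1;
}
```

For example, `field_pos(0x8000ULL)` gives 15. The conversion costs one extra shift-and-subtract (or a single `lea` on x86_64, depending on the compiler).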

With regard to how this constant was designed: I started with the following observations:

Unsigned multiplication is a fast operation on most CPUs, and can have useful effects. We should use it. 🙂
Multiplying anything by zero results in zero. Since this matches with the desired result for a no-bits-set input, we’re doing well so far.
Multiplying anything by 1ULL<<63 (i.e., your “pos=63” value) can only result in that same value or zero, depending on whether the other factor is odd or even. (The product cannot have any lower bits set, and there are no higher bits to change.) Therefore, we must find some way for this value to be treated as the correct result.
A convenient way of making this value be its own correct result is by right-shifting it by 60 bits. This shifts it down to “8”, which is a convenient enough representation. We can proceed to encode the other outputs as 1 through 7.
Multiplying our constant by any of the other single-bit inputs is equivalent to left-shifting the constant by a number of bits equal to that input’s “position”. After the right-shift by 60 bits, only the 4 bits of the constant at bit offsets (60 - position) through (63 - position) appear in the result. Thus, we can create all of the cases except for one as follows:
```
 uint64_t constant = (
      1ULL << (60 - 7)
    | 2ULL << (60 - 15)
    | 3ULL << (60 - 23)
    | 4ULL << (60 - 31)
    | 5ULL << (60 - 39)
    | 6ULL << (60 - 47)
    | 7ULL << (60 - 55)
 );
```

So far, the constant is 0x20406080a0c0e0ULL. However, this doesn’t give the right result for pos=63; this constant is even, so multiplying it by that input gives zero. We must set the lowest bit (i.e., constant |= 1ULL) to get that case to work, giving us the final value of 0x20406080a0c0e1ULL.
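The construction can be replayed in code to confirm both intermediate values. This is only an illustrative sketch: `build_constant` and `check_constant` are hypothetical names, and the loop is simply a compact restatement of the seven shifted terms.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rebuilds the multiplier from the construction above. */
static uint64_t build_constant(void) {
    uint64_t constant = 0;
    for (unsigned k = 1; k <= 7; k++) {
        unsigned pos = 8 * k - 1;              /* 7, 15, ..., 55 */
        constant |= (uint64_t)k << (60 - pos); /* place k in its 4-bit field */
    }
    return constant | 1ULL;                    /* low bit handles pos=63 */
}

/* Hypothetical helper: checks the even and the final (odd) constant. */
static void check_constant(void) {
    uint64_t odd = build_constant();
    uint64_t even = odd & ~1ULL;
    assert(even == 0x20406080a0c0e0ULL);
    assert(((even * (1ULL << 63)) >> 60) == 0);  /* even: pos=63 is lost */
    assert(odd == 0x20406080a0c0e1ULL);
    assert(((odd * (1ULL << 63)) >> 60) == 8);   /* odd: pos=63 gives 8 */
}
```

The 4-bit fields sit 8 bits apart, so they never overlap and the low bit never reaches the top 4 bits for any position other than 63.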

Note that the construction above can be modified to encode the results differently. However, the output of 8 is fixed as described above, and all other outputs must fit into 4 bits (i.e., 0 to 15).
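As a rough illustration of a different encoding, the sketch below builds the multiplier from an arbitrary table of 4-bit codes. The names `make_constant` and `lookup`, and the example codes 9 through 15, are assumptions made for this example and do not come from the original answer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: builds a multiplier that maps the single-bit inputs
   at positions 7, 15, ..., 55 to codes[0..6]. Each code must fit in 4 bits. */
static uint64_t make_constant(const unsigned codes[7]) {
    uint64_t constant = 1ULL;  /* low bit: pos=63 always yields 8 */
    for (unsigned k = 0; k < 7; k++) {
        unsigned pos = 8 * (k + 1) - 1;
        assert(codes[k] <= 15);
        constant |= (uint64_t)codes[k] << (60 - pos);
    }
    return constant;
}

/* Same multiply-and-shift as field_set, with the constant passed in. */
static unsigned lookup(uint64_t input, uint64_t constant) {
    return (unsigned)((input * constant) >> 60);
}
```

With the codes {9, 10, 11, 12, 13, 14, 15}, for instance, `lookup(1ULL << 7, c)` gives 9 and `lookup(1ULL << 55, c)` gives 15, while the zero input still gives 0 and pos=63 still gives 8. Passing the codes {1, 2, 3, 4, 5, 6, 7} reproduces 0x20406080a0c0e1ULL.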
