swar – Tarik Billa

Subtracting packed 8-bit integers in a 64-bit integer by 1 in parallel, SWAR without hardware SIMD

February 7, 2023 by Tarik

If you have a CPU with efficient SIMD instructions, SSE/MMX paddb (_mm_add_epi8) is also viable. Peter Cordes’ answer also describes GNU C (gcc/clang) vector syntax and how to stay safe from strict-aliasing UB; I strongly encourage reviewing that answer as well. Doing it yourself with uint64_t is fully portable, but still requires care to avoid alignment problems and …
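As a rough illustration of the portable uint64_t approach mentioned above, here is a minimal sketch of one common SWAR formulation. It may differ from the code in the full post, and the names `sub1_bytes` and `sub1_bytes_at` are hypothetical. The idea is to force the top bit of every byte to 1 so that subtracting 1 can never borrow from a neighbouring byte, then repair the top bit with an XOR. The buffer helper uses `memcpy`, which is one way to sidestep the alignment and strict-aliasing concerns the excerpt alludes to.

```c
#include <stdint.h>
#include <string.h>

/* Subtract 1 from each of the eight bytes packed in x, wrapping per byte
 * (0x00 becomes 0xFF) with no borrow leaking into the neighbouring byte.
 * Hypothetical helper name, chosen for this sketch. */
static uint64_t sub1_bytes(uint64_t x)
{
    const uint64_t H = 0x8080808080808080ULL; /* top bit of every byte */
    const uint64_t L = 0x0101010101010101ULL; /* the value 1 in every byte */

    /* With the top bit forced on, each byte is >= 0x80, so subtracting 1
     * cannot borrow across a byte boundary. */
    uint64_t t = (x | H) - L;

    /* The low 7 bits of each byte are now correct. The true top bit is
     * t's top bit XOR the inverse of x's original top bit. */
    return t ^ (~x & H);
}

/* Apply the same operation to 8 bytes at an arbitrary address.
 * memcpy avoids unaligned access and strict-aliasing undefined behaviour;
 * compilers typically turn it into a single load and store. */
static void sub1_bytes_at(unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    v = sub1_bytes(v);
    memcpy(p, &v, sizeof v);
}
```

Because every byte is processed independently, `sub1_bytes_at` gives the same per-byte result on little-endian and big-endian machines. On targets with hardware SIMD, paddb or GNU C vector syntax would usually be the better choice, as the excerpt notes.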