SIMD instructions lowering CPU frequency

The frequency impact depends on the width of the operation and the specific instruction used. There are three frequency levels, so-called licenses, from fastest to slowest: L0, L1 and L2. L0 is the “nominal” speed you’ll see written on the box: when the chip is advertised as “3.5 GHz turbo”, that figure refers to the single-core L0 …

How to detect SSE/SSE2/AVX/AVX2/AVX-512/AVX-128-FMA/KCVI availability at compile-time?

Most compilers automatically define __SSE__, __SSE2__, __SSE3__, __AVX__, __AVX2__, etc., according to the command-line switches you pass. You can easily check this with gcc (or gcc-compatible compilers such as clang), like this:

    $ gcc -msse3 -dM -E - < /dev/null | egrep "SSE|AVX" | sort
    #define __SSE__ 1
    #define __SSE2__ 1
    #define …
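As a rough illustration of how these predefined macros can be used from source code, here is a minimal sketch in C. It assumes a gcc-compatible compiler; the function name compiled_simd_level is hypothetical and chosen only for this example, and the __AVX512F__ check is an addition beyond the macros listed above. It reports what the compiler was told to target, not what the CPU running the program supports.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the widest x86 SIMD extension that was
 * enabled on the compiler command line (e.g. via -msse3 or -mavx2).
 * The checks run from widest to narrowest, so the first match wins. */
static const char *compiled_simd_level(void)
{
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE3__)
    return "SSE3";
#elif defined(__SSE2__)
    return "SSE2";
#elif defined(__SSE__)
    return "SSE";
#else
    return "none";
#endif
}

/* Prints the compile-time SIMD level; call this from your own main(). */
static void print_compiled_simd_level(void)
{
    printf("Compiled with SIMD level: %s\n", compiled_simd_level());
}
```

Building the same file with different switches (for example -msse3 versus -mavx2) should change the string returned, which is a quick way to confirm that the flags reach the translation unit you expect.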

memory bandwidth for many channels x86 systems

The hardware prefetcher is tuned differently on server vs workstation CPUs. Servers are expected to handle many threads, so the prefetcher will request smaller chunks from RAM. Here is a paper that goes into detail about the issue you’re experiencing, but from the other side of the coin: Hardware Prefetcher Aggressiveness Controllers: Do We Need …
