Performance optimisations of x86-64 assembly – Alignment and branch prediction

Alignment optimisations
1. Use .p2align <abs-expr>, <abs-expr>, <abs-expr> instead of .align. Its three parameters give fine-grained control:
   param1 – The boundary to align to, given as a power of two (4 means 16 bytes).
   param2 – The value used to fill the padding (zeroes or NOPs).
   param3 – Do NOT align if the padding would exceed this number of bytes.
2. Align the start of a frequently used code …
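The effect of the first and third parameters can be modelled in a few lines of C. This is only a sketch of the rule described above, not anything the assembler exposes: the function name p2align_padding and its arguments are hypothetical, and the max-skip limit is always passed explicitly here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of any assembler API): returns how many
   padding bytes `.p2align power, fill, max_skip` would emit at `addr`.
   Returns 0 when the address is already aligned, or when the padding
   needed would exceed max_skip, in which case no alignment is done. */
size_t p2align_padding(uint64_t addr, unsigned power, size_t max_skip)
{
    uint64_t boundary = (uint64_t)1 << power;     /* power 4 -> 16 bytes */
    uint64_t misalign = addr & (boundary - 1);    /* offset past the last boundary */
    size_t pad = misalign ? (size_t)(boundary - misalign) : 0;
    return pad <= max_skip ? pad : 0;             /* over the limit: skip alignment */
}
```

For an address 3 bytes past a 16-byte boundary, the model asks for 13 bytes of padding, so a limit of 10 suppresses the alignment while a limit of 15 allows it.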

SSE2 option in Visual C++ (x64)

It seems that all 64-bit x86 processors have SSE2. Since the compiler option is always on by default, there is no need to switch it on manually. From Wikipedia: SSE instructions: The original AMD64 architecture adopted Intel’s SSE and SSE2 as core instructions. SSE3 instructions were added in April 2005. SSE2 replaces the x87 instruction set’s IEEE 80-bit precision …

Why is strcmp not SIMD optimized?

In an SSE2 implementation, how should the compiler make sure that no memory accesses happen past the end of the string? It has to know the length first, and this requires scanning the string for the terminating zero byte. If you scan for the length of the string you have already accomplished most of the …
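To make the over-read concern concrete, here is a small C sketch that counts how many bytes a scan reading fixed-size chunks from the start of a string would touch. The helper names bytes_touched and bytes_overread are hypothetical, and the sketch itself has to call strlen first, which is exactly the circularity the excerpt points out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes touched by a scan that reads
   `chunk` bytes at a time from the start of s until it has covered
   the terminating zero byte. */
size_t bytes_touched(const char *s, size_t chunk)
{
    size_t needed = strlen(s) + 1;                 /* string plus terminator */
    return (needed + chunk - 1) / chunk * chunk;   /* rounded up to whole chunks */
}

/* Bytes read beyond the terminating zero, i.e. outside the string itself. */
size_t bytes_overread(const char *s, size_t chunk)
{
    return bytes_touched(s, chunk) - (strlen(s) + 1);
}
```

With 16-byte chunks, the string "hello" occupies 6 bytes including its terminator, so a single chunk read touches 10 bytes that do not belong to the string.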

SSE SSE2 and SSE3 for GNU C++ [closed]

Sorry, I don’t know of a tutorial. Your best bet (IMHO) is to use SSE via the “intrinsic” functions Intel provides to wrap (generally) single SSE instructions. These are made available via a set of include files named *mmintrin.h, e.g. xmmintrin.h is the original SSE instruction set. Being familiar with the contents of Intel’s Optimization Reference …
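As a minimal sketch of what using the intrinsics looks like, the following C function adds two groups of four floats using only calls from xmmintrin.h. The function name add4 and the HAVE_SSE_SKETCH macro are hypothetical, and the scalar fallback is included only so the example still compiles on targets without SSE.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_AMD64)
#include <xmmintrin.h>          /* original SSE intrinsics */
#define HAVE_SSE_SKETCH 1
#else
#define HAVE_SSE_SKETCH 0       /* assumption: no SSE, use the scalar path */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i = 0..3. */
void add4(const float *a, const float *b, float *out)
{
#if HAVE_SSE_SKETCH
    __m128 va = _mm_loadu_ps(a);              /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));   /* packed add, unaligned store */
#else
    for (size_t i = 0; i < 4; ++i)            /* scalar fallback */
        out[i] = a[i] + b[i];
#endif
}
```

The unaligned load and store variants are used so that callers do not need 16-byte aligned arrays; the aligned forms (_mm_load_ps, _mm_store_ps) would require that.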