Why is strcmp not SIMD optimized?

In an SSE2 implementation, how should the compiler make sure that no memory accesses happen past the end of the string? It has to know the length first, and this requires scanning the string for the terminating zero byte. If you scan for the length of the string you have already accomplished most of the … Read more

Intel SSE and AVX Examples and Tutorials [closed]

For the visually inclined SIMD programmer, Stefano Tommesani’s site is the best introduction to x86 SIMD programming. http://www.tommesani.com/index.php/simd/46-sse-arithmetic.html The diagrams are only provided for MMX and SSE2, but once a learner gets proficient with SSE2, it is relatively easy to move on and read the formal specifications. Intel IA-32 Instructions beginning with A to M … Read more

Difference between MOVDQA and MOVAPS x86 instructions?

In functionality, they are identical. On some (but not all) micro-architectures, there are timing differences due to “domain crossing penalties”. For this reason, one should generally use movdqa when the data is being used with integer SSE instructions, and movaps when the data is being used with floating-point instructions. For more information on this subject, … Read more
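To make the "identical in functionality" point concrete, here is a minimal sketch using the SSE2 intrinsics `_mm_load_si128` and `_mm_load_ps`, which conventionally correspond to `movdqa` and `movaps`. The helper name `aligned_loads_match` is hypothetical, the code assumes an x86 target with SSE2, and a compiler is free to emit either instruction for either intrinsic, so this only demonstrates that the loaded bits are the same, not which opcode is chosen.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86 target */
#include <string.h>

/* Hypothetical helper: loads the same 16 bytes through the integer-domain
   and the float-domain aligned load and reports whether the bits match.
   The pointer must be 16-byte aligned, as both loads require. */
static int aligned_loads_match(const void *aligned16) {
    __m128i as_int = _mm_load_si128((const __m128i *)aligned16);
    __m128  as_flt = _mm_load_ps((const float *)aligned16);

    unsigned char a[16], b[16];
    _mm_storeu_si128((__m128i *)a, as_int);
    /* Reinterpret the float vector as integers; no bits are changed. */
    _mm_storeu_si128((__m128i *)b, _mm_castps_si128(as_flt));

    return memcmp(a, b, sizeof a) == 0;
}
```

Any timing difference between the two forms would only show up in a benchmark on a specific micro-architecture; the function above cannot observe it.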

What does ordered / unordered comparison mean?

An ordered comparison checks if neither operand is NaN. Conversely, an unordered comparison checks if either operand is NaN. This page gives some more information on this: http://csapp.cs.cmu.edu/public/waside/waside-sse.pdf (section 5) The idea here is that comparisons with NaN are indeterminate (the result cannot be decided). So an ordered/unordered comparison checks if this is (or isn’t) … Read more
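The definition above can be sketched in portable C with the standard `isunordered` macro from `<math.h>`. The function names `cmp_ordered` and `cmp_unordered` are made up for illustration; the SSE compare predicates apply the same test per vector element, whereas this sketch works on a single pair of scalars.

```c
#include <math.h>
#include <stdbool.h>

/* Unordered: true if at least one operand is NaN. */
static bool cmp_unordered(double a, double b) {
    return isunordered(a, b);
}

/* Ordered: true only if neither operand is NaN. */
static bool cmp_ordered(double a, double b) {
    return !isunordered(a, b);
}
```

For example, `cmp_ordered(1.0, 2.0)` is true, while `cmp_unordered(NAN, 2.0)` is true because one operand is NaN.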

AVX2 what is the most efficient way to pack left based on a mask?

AVX2 + BMI2. See my other answer for AVX512. (Update: saved a pdep in 64-bit builds.) We can use AVX2 vpermps (_mm256_permutevar8x32_ps) (or the integer equivalent, vpermd) to do a lane-crossing variable-shuffle. We can generate masks on the fly, since BMI2 pext (Parallel Bits Extract) provides us with a bitwise version of the operation we … Read more
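As a rough illustration of the idea, the following is a scalar emulation, not the AVX2 code from the answer. It assumes the approach is to use pext to compress a constant of packed byte indices and then use those indices as a shuffle control. `pext_scalar` and `left_pack8` are hypothetical names: the first stands in for the BMI2 `pext` instruction, the second for the `vpermps` shuffle.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for BMI2 pext: collects the bits of src selected by
   mask into the low bits of the result, preserving their order. */
static uint64_t pext_scalar(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    unsigned out_bit = 0;
    for (unsigned bit = 0; bit < 64; ++bit) {
        if ((mask >> bit) & 1u) {
            result |= ((src >> bit) & 1u) << out_bit;
            ++out_bit;
        }
    }
    return result;
}

/* Left-packs the elements of src whose mask bit is set into dst and
   returns how many were kept. Slots past that count receive src[0],
   because their index byte is zero. */
static size_t left_pack8(const float src[8], uint8_t mask, float dst[8]) {
    uint64_t expanded = 0;  /* one 0xFF byte per set mask bit */
    size_t kept = 0;
    for (unsigned i = 0; i < 8; ++i) {
        if ((mask >> i) & 1u) {
            expanded |= (uint64_t)0xFF << (8 * i);
            ++kept;
        }
    }

    /* Byte i of this constant holds the value i (identity shuffle). */
    const uint64_t identity = 0x0706050403020100ULL;
    uint64_t indices = pext_scalar(identity, expanded);

    /* Scalar equivalent of a variable shuffle driven by the index bytes. */
    for (unsigned i = 0; i < 8; ++i) {
        dst[i] = src[(indices >> (8 * i)) & 0xFF];
    }
    return kept;
}
```

With the mask `0xA5` (bits 0, 2, 5 and 7 set), the packed index bytes are 0, 2, 5, 7, so the first four output slots hold `src[0]`, `src[2]`, `src[5]` and `src[7]`.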

How to determine if memory is aligned?

#define is_aligned(POINTER, BYTE_COUNT) \ (((uintptr_t)(const void *)(POINTER)) % (BYTE_COUNT) == 0) The cast to void * (or, equivalently, char *) is necessary because the standard only guarantees an invertible conversion to uintptr_t for void *. If you want type safety, consider using an inline function: static inline _Bool is_aligned(const void *restrict pointer, size_t byte_count) { … Read more
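Since the excerpt is cut off, here is a self-contained sketch of the same modulo check. It is not the answer's own function body: the name `ptr_is_aligned` is hypothetical, and the code assumes `uintptr_t` is available and that `byte_count` is nonzero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if pointer is a multiple of byte_count.
   Taking const void * lets any object pointer convert implicitly, and
   the conversion to uintptr_t is then done from void *.
   byte_count must be nonzero. */
static inline bool ptr_is_aligned(const void *pointer, size_t byte_count) {
    return (uintptr_t)pointer % byte_count == 0;
}
```

A typical use would be checking a buffer before an aligned SIMD load, for example `ptr_is_aligned(buf, 16)` ahead of a 16-byte aligned access.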

Where can I find an official reference listing the operation of SSE intrinsic functions?

As well as Intel’s vol.2 PDF manual, there is also an online intrinsics guide. The Intel® Intrinsics Guide contains reference information for Intel intrinsics, which provide access to Intel instructions such as Intel® Streaming SIMD Extensions (Intel® SSE), Intel® Advanced Vector Extensions (Intel® AVX), and Intel® Advanced Vector Extensions 2 (Intel® AVX2). It has a … Read more
