Using SSE instructions

SSE instructions are processor specific. You can look up which processor supports which SSE version on Wikipedia. Whether SSE code will be faster depends on many factors. The first is, of course, whether the problem is memory-bound or CPU-bound: if the memory bus is the bottleneck, SSE will not help much. Try simplifying … Read more

best cross-platform method to get aligned memory

As long as you’re OK with having to call a special function to do the freeing, your approach is fine. I would do your #ifdefs the other way around, though: start with the standards-specified options and fall back to platform-specific ones. For example: if __STDC_VERSION__ >= 201112L, use aligned_alloc; if _POSIX_VERSION >= 200112L, use posix_memalign. … Read more
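A minimal sketch of that ordering is shown here. The helper names `portable_aligned_alloc` and `portable_aligned_free` are hypothetical, and the `_WIN32` branch using `_aligned_malloc` is an assumption about what the platform-specific fallback might look like; the excerpt itself only names the two standards-based options.

```c
#include <stddef.h>
#include <stdlib.h>
#if defined(_WIN32)
#include <malloc.h>     /* _aligned_malloc, _aligned_free */
#else
#include <unistd.h>     /* defines _POSIX_VERSION on POSIX systems */
#endif

/* Hypothetical helper: `alignment` is assumed to be a power of two
 * and a multiple of sizeof(void *), which posix_memalign requires. */
void *portable_aligned_alloc(size_t alignment, size_t size)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && !defined(_WIN32)
    /* C11: round the size up to a multiple of the alignment,
     * as the original C11 wording of aligned_alloc expects. */
    size_t rounded = (size + alignment - 1) / alignment * alignment;
    return aligned_alloc(alignment, rounded);
#elif defined(_POSIX_VERSION) && _POSIX_VERSION >= 200112L
    void *p = NULL;
    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;
    return p;
#elif defined(_WIN32)
    /* Assumed platform-specific fallback; note the swapped argument order. */
    return _aligned_malloc(size, alignment);
#else
#error "no aligned allocation method known for this platform"
#endif
}

/* The "special function to do the freeing": only the Windows
 * fallback needs something other than plain free(). */
void portable_aligned_free(void *p)
{
#if defined(_WIN32)
    _aligned_free(p);
#else
    free(p);
#endif
}
```

Callers pair every `portable_aligned_alloc` with `portable_aligned_free`, so the code stays correct whichever branch the preprocessor selects.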

What are the best instruction sequences to generate vector constants on the fly?

All-zero: pxor xmm0,xmm0 (or xorps xmm0,xmm0, one instruction-byte shorter.) There isn’t much difference on modern CPUs, but on Nehalem (before xor-zero elimination), the xorps uop could only run on port 5. I think that’s why compilers favour pxor-zeroing even for registers that will be used with FP instructions. All-ones: pcmpeqw xmm0,xmm0. This is the usual … Read more
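The two idioms above can also be written with SSE2 intrinsics, as in this rough sketch. The function names `vec_all_zero` and `vec_all_ones` are hypothetical, and the exact instructions emitted are up to the compiler (it may choose `pcmpeqd` rather than `pcmpeqw`, for instance), so treat the comments as typical output rather than a guarantee.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; x86 / x86-64 only */

/* All-zero: compilers typically lower this to pxor or xorps
 * of a register with itself. */
static inline __m128i vec_all_zero(void)
{
    return _mm_setzero_si128();
}

/* All-ones: comparing a register with itself for equality sets
 * every bit, which is the pcmpeqw xmm0,xmm0 idiom. */
static inline __m128i vec_all_ones(void)
{
    __m128i x = _mm_setzero_si128();
    return _mm_cmpeq_epi16(x, x);
}
```

Neither function loads anything from memory, which is the point of generating the constants on the fly.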

print a __m128i variable

Use this function to print them: #include <stdint.h> #include <stdio.h> #include <string.h> #include <emmintrin.h> void print128_num(__m128i var) { uint16_t val[8]; memcpy(val, &var, sizeof(val)); printf("Numerical: %i %i %i %i %i %i %i %i \n", val[0], val[1], val[2], val[3], val[4], val[5], val[6], val[7]); } You split the 128 bits into 16-bit (or 32-bit) pieces before printing them. This is a way of 64-bit splitting and … Read more

SSE instructions: which CPUs can do atomic 16B memory operations?

In the Intel® 64 and IA-32 Architectures Developer’s Manual: Vol. 3A, which nowadays contains the specifications of the memory ordering white paper you mention, it is said in section 8.1.1 that: The Intel486 processor (and newer processors since) guarantees that the following basic memory operations will always be carried out atomically: Reading or writing a … Read more

Are different mmx, sse and avx versions complementary or supersets of each other?

They are complementary. Each new instruction set extension adds new instructions and possibly a new programming model (new registers, for example). None are deprecated; deprecating instructions is almost impossible for compatibility reasons. However, some optional extensions may be absent or removed from newer models (like the FMA4 of AMD) if not very wide … Read more
