Is there a version of TensorFlow not compiled for AVX instructions?

A best-practice approach suggested by peter-cordes is to see what gcc makes of your CPU's capabilities by running:
gcc -O3 -fverbose-asm -march=native -xc /dev/null -S -o- | less
This command lists all of your CPU's capabilities from gcc's point of view, which is going … Read more
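As a complement to inspecting the gcc output by hand, the same information can be checked from inside a program. The following is a minimal sketch, assuming gcc or clang, which predefine macros such as `__AVX__` and `__SSE2__` when the corresponding extension is enabled for the build (for example through -march=native). The helper names `built_with_avx`, `built_with_sse2` and `print_build_features` are hypothetical and chosen only for illustration.

```c
#include <stdio.h>

/* Returns 1 if this file was compiled with AVX code generation enabled
   (e.g. -mavx, or -march=native on an AVX-capable machine), else 0.
   __AVX__ is predefined by gcc and clang in that case. */
static int built_with_avx(void) {
#ifdef __AVX__
    return 1;
#else
    return 0;
#endif
}

/* Same check for SSE2, which is part of the x86-64 baseline. */
static int built_with_sse2(void) {
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: print what the compiler was allowed to use. */
static void print_build_features(void) {
    printf("AVX: %d, SSE2: %d\n", built_with_avx(), built_with_sse2());
}
```

Note that these macros describe what the compiler was told to target, not what the machine running the binary supports, so a binary built with -march=native on one machine may still fail on an older CPU.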

What are the best instruction sequences to generate vector constants on the fly?

All-zero: pxor xmm0,xmm0 (or xorps xmm0,xmm0, which is one byte shorter). There isn’t much difference on modern CPUs, but on Nehalem (before xor-zero elimination), the xorps uop could only run on port 5. I think that’s why compilers favour pxor-zeroing even for registers that will be used with FP instructions.
All-ones: pcmpeqw xmm0,xmm0. This is the usual … Read more
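The two idioms above can also be written with compiler intrinsics. The following is a minimal sketch, assuming an x86 target with SSE2 and the standard `<emmintrin.h>` header. The function names `vec_zero`, `vec_all_ones` and `vec_bytes_equal` are hypothetical. Compilers typically turn these into the xor-zeroing and compare-with-self instructions, but the exact instruction chosen is up to the compiler.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* All-zero vector; usually compiled to pxor or xorps of a register with itself. */
static __m128i vec_zero(void) {
    return _mm_setzero_si128();
}

/* All-ones vector: comparing a register with itself for equality sets
   every bit. _mm_cmpeq_epi16 corresponds to the pcmpeqw instruction. */
static __m128i vec_all_ones(void) {
    __m128i z = _mm_setzero_si128();
    return _mm_cmpeq_epi16(z, z);
}

/* Test helper: returns 1 if all 16 bytes of v equal `byte`, else 0. */
static int vec_bytes_equal(__m128i v, uint8_t byte) {
    uint8_t buf[16];
    _mm_storeu_si128((__m128i *)buf, v);  /* unaligned store to memory */
    for (int i = 0; i < 16; i++) {
        if (buf[i] != byte) {
            return 0;
        }
    }
    return 1;
}
```

Neither constant needs a load from memory, which is the point of generating them on the fly.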

Are different mmx, sse and avx versions complementary or supersets of each other?

They are complementary. Each new instruction set extension adds new instructions and sometimes a new programming model (new registers, for example). None are deprecated; deprecating instructions is almost impossible for compatibility reasons. However, some optional extensions may be absent or removed from newer models (like AMD's FMA4) if not very wide … Read more
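Because optional extensions such as FMA4 can be missing on a given model, code usually checks each extension separately instead of inferring one from another. The following is a minimal sketch, assuming gcc or clang on x86, where `__builtin_cpu_supports` queries the running CPU. The wrapper names are hypothetical, and on other compilers or architectures the wrappers simply report 0.

```c
/* Query one extension at run time; each check is independent.
   The feature name passed to __builtin_cpu_supports must be a string literal. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define CPU_HAS(feature) (__builtin_cpu_supports(feature) != 0)
#else
#define CPU_HAS(feature) 0  /* assumption: report "absent" when we cannot ask */
#endif

static int cpu_has_sse2(void) { return CPU_HAS("sse2"); }
static int cpu_has_avx(void)  { return CPU_HAS("avx"); }
static int cpu_has_fma(void)  { return CPU_HAS("fma"); }   /* FMA3 */
static int cpu_has_fma4(void) { return CPU_HAS("fma4"); }  /* AMD-only, optional */
```

A dispatcher could use such checks to pick an AVX, SSE2 or scalar code path at start-up.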

Intel SSE and AVX Examples and Tutorials [closed]

For the visually inclined SIMD programmer, Stefano Tommesani’s site is the best introduction to x86 SIMD programming. http://www.tommesani.com/index.php/simd/46-sse-arithmetic.html The diagrams are only provided for MMX and SSE2, but once a learner gets proficient with SSE2, it is relatively easy to move on and read the formal specifications. Intel IA-32 Instructions beginning with A to M … Read more
