Are the different MMX, SSE and AVX versions complementary or supersets of each other?

They are complementary.

Each new instruction set extension adds new instructions and possibly a new programming model (new registers, for example).

None are deprecated; deprecating instructions is almost impossible for compatibility reasons. However, some optional extensions that never became widespread may be absent from, or removed in, newer models (like AMD's FMA4).
Some are vestigial though: almost everything that can be done with the x87 FPU and MMX, for example, can be done more efficiently with SSE and later extensions.
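
Because optional extensions can be absent, software usually checks for them before using them. As a rough sketch, assuming a Linux x86 system where the kernel exposes feature flags in /proc/cpuinfo, the extensions discussed here could be looked up as follows. The helper names, the mapping table and the sample text are hypothetical and only for illustration.

```python
# Sketch: map Linux /proc/cpuinfo flag names to the extensions discussed here.
# FLAG_FOR_EXTENSION and both helpers are illustrative, not an official API.
FLAG_FOR_EXTENSION = {
    "MMX": "mmx",
    "SSE": "sse",
    "SSE2": "sse2",
    "SSE3": "pni",        # Linux reports SSE3 as "pni"
    "SSSE3": "ssse3",
    "SSE4.1": "sse4_1",
    "SSE4.2": "sse4_2",
    "AESNI": "aes",
    "AVX": "avx",
    "FMA": "fma",
    "FMA4": "fma4",
    "AVX2": "avx2",
    "AVX512F": "avx512f",
}


def parse_flags(cpuinfo_text):
    """Return the set of flag names from the first 'flags' line of the text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supported_extensions(cpuinfo_text):
    """Return the extensions in FLAG_FOR_EXTENSION that the flags advertise."""
    flags = parse_flags(cpuinfo_text)
    return [ext for ext, flag in FLAG_FOR_EXTENSION.items() if flag in flags]


# Hypothetical excerpt; on a real machine use open("/proc/cpuinfo").read().
SAMPLE = (
    "model name\t: Example CPU\n"
    "flags\t\t: fpu mmx sse sse2 pni ssse3 sse4_1 sse4_2 avx\n"
)
print(supported_extensions(SAMPLE))
```

Native code would normally query the CPUID instruction directly instead of parsing text; the flag file is used here only because it is easy to read.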

They are not mutually exclusive: you are not forced to pick one or another. After all, they are instructions, not modes of operation (like real vs. protected mode, for example).
The only possible “conflict” is between MMX and the x87 FPU, as the MMX registers share the lower part of the FPU registers while the two have different programming models.
The vector registers have grown from 128 bits to 256 bits and then to 512 bits; each time, the previous registers have become the low part of the newer ones.
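
The "low part" relationship can be pictured with a small model. This is only a sketch that treats one 512-bit register as a Python integer; the class and property names are made up for illustration and say nothing about how writes to the narrower views behave.

```python
# Sketch: one vector register modelled as a 512-bit integer.
# VectorRegister is a hypothetical name used only for this illustration.
class VectorRegister:
    def __init__(self, value=0):
        self.zmm = value & ((1 << 512) - 1)   # full 512-bit contents

    @property
    def ymm(self):
        """The 256-bit view is the low half of the 512-bit register."""
        return self.zmm & ((1 << 256) - 1)

    @property
    def xmm(self):
        """The 128-bit view is the low half of the 256-bit view."""
        return self.zmm & ((1 << 128) - 1)


# Put a different marker byte at the bottom of each 128-bit-aligned region.
reg = VectorRegister((0xCC << 256) | (0xBB << 128) | 0xAA)
print(hex(reg.xmm))
print(hex(reg.ymm >> 128))
print(hex(reg.zmm >> 256))
```

Reading `reg.xmm` sees only the `0xAA` marker, `reg.ymm` additionally sees `0xBB`, and only the full `reg.zmm` view reaches `0xCC`.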

You can use them all together; each offers specific hardware support for simple operations.

They are like Lego bricks: you are limited only by your imagination (or the imagination of the designers).

Here is a simple list of these instruction set extensions.
Only some features are listed; for the complete reference, see the Intel manual, Volume 1, chapters 9 to 14.

See also https://hjlebbink.github.io/x86doc/ for a table of contents of Intel’s volume 2 (instruction set reference) manual, with a list of extensions that added instructions to that manual entry.

MMX
Introduces eight 64-bit registers (MM0-MM7) and instructions to work with eight signed/unsigned bytes, four signed/unsigned words, or two signed/unsigned dwords.
3DNow!
Adds support for single-precision floating-point operands in MMX registers. Few operations are supported, for example addition, subtraction, and multiplication.
SSE
Introduces eight 128-bit registers, or sixteen in 64-bit mode (XMM0-XMM7/15), and instructions to work with four single-precision floating-point operands. Also adds integer operations on MMX registers. (The MMX-integer part of SSE is sometimes called MMXEXT, and was implemented on a few non-Intel CPUs without XMM registers and the floating-point part of SSE.)
SSE2
Introduces instructions to work with two double-precision floating-point operands, and with packed byte/word/dword/qword integers in 128-bit XMM registers.
SSE3
Adds a few varied instructions (mostly floating point), including a special kind of unaligned load (lddqu) that was better on Pentium 4, synchronization instructions, and horizontal add/sub.
SSSE3
Again a varied set of instructions, mostly integer. Includes the first shuffle that takes its control operand from a register instead of a hard-coded immediate (pshufb). Also more horizontal processing, shuffling, packing/unpacking, mul+add on bytes, and some specialized integer add/mul operations.
SSE4 (SSE4.1, SSE4.2)
Adds a lot of instructions, filling in many gaps by providing min, max and other operations for all integer data types (32-bit integers especially had been lacking); previously, integer min was only available for unsigned bytes and signed 16-bit words. Also adds scaling, FP rounding, blending, linear algebra operations, text processing, and comparisons, plus a non-temporal load for reading video memory or copying it back to main memory. (Previously only NT stores were available.)
AESNI
Adds support for accelerating AES symmetric encryption/decryption.
AVX
Adds eight 256-bit registers, or sixteen in 64-bit mode (YMM0-YMM7/15).
Supports all previous floating-point data types. Introduces three-operand instruction forms.
FMA
Adds fused multiply-add and related instructions.
AVX2
Adds support for integer data types in the 256-bit registers.
AVX512F
Adds eight 512-bit registers, or thirty-two in 64-bit mode (ZMM0-ZMM7/31), and eight mask registers up to 64 bits wide (k0-k7). Promotes most previous instructions to 512 bits wide. Optional parts of AVX512 add instructions for exponentials & reciprocals (AVX512ER), scatter/gather prefetching (AVX512PF), scatter conflict detection (AVX512CD), compress, and expand.
IMCI (Intel Xeon Phi)
An early precursor of AVX512 for the first-generation Intel Xeon Phi (Knights Corner) coprocessor.
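
Two entries in the list are easier to picture with a scalar model: the register-controlled byte shuffle (pshufb) from SSSE3 and the per-lane mask registers from AVX512. The following is a minimal sketch of their semantics in plain Python, assuming 128-bit pshufb and 32-bit lanes for the masked add. The function names are hypothetical, and real code would use the hardware instructions rather than loops.

```python
def pshufb(src, control):
    """Scalar model of the 128-bit pshufb byte shuffle."""
    assert len(src) == 16 and len(control) == 16
    # A control byte with its top bit set zeroes the output byte; otherwise
    # its low four bits select which source byte to copy.
    return [0 if c & 0x80 else src[c & 0x0F] for c in control]


def masked_add_u32(dst, a, b, mask, zeroing=False):
    """Scalar model of a masked packed 32-bit add, one mask bit per lane."""
    out = []
    for i, (old, x, y) in enumerate(zip(dst, a, b)):
        if (mask >> i) & 1:
            out.append((x + y) & 0xFFFFFFFF)  # lane enabled: wrapping add
        elif zeroing:
            out.append(0)                     # zero-masking clears the lane
        else:
            out.append(old)                   # merge-masking keeps the old value
    return out


data = list(range(16))
reverse = list(range(15, -1, -1))
print(pshufb(data, reverse))  # byte order reversed by the control vector

old = [9, 9, 9, 9]
a = [1, 2, 3, 0xFFFFFFFF]
b = [10, 20, 30, 1]
print(masked_add_u32(old, a, b, 0b1010))                # lanes 1 and 3 updated
print(masked_add_u32(old, a, b, 0b1010, zeroing=True))  # other lanes zeroed
```

The point of the control vector living in a register is that the shuffle pattern can be computed at run time, and the point of the mask is that a single instruction can update some lanes while leaving or clearing the others.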
