Strangely, I have a simple answer: because ICC isn’t optimal.
When you write your own compiler, you start with a very basic set of operation codes: NOP, MOV, ADD … up to 10 opcodes. You don’t use SUB for a while because it can easily be replaced by ADD with a negated operand. NEG isn’t basic either, as it can be replaced by: XOR FFFF...; ADD 1.
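To make the substitution concrete, here is a minimal sketch in C. It assumes 16-bit operands (to match the FFFF mask) and two's-complement wraparound; the helper names neg16 and sub16 are hypothetical and only serve the illustration.

```c
#include <stdint.h>

/* NEG built from XOR and ADD: flip every bit, then add 1.
   Assumption: 16-bit operands, two's-complement arithmetic. */
static uint16_t neg16(uint16_t x) {
    return (uint16_t)((x ^ 0xFFFFu) + 1u);
}

/* SUB built from ADD and NEG: a - b equals a + (-b) modulo 2^16. */
static uint16_t sub16(uint16_t a, uint16_t b) {
    return (uint16_t)(a + neg16(b));
}
```

For example, sub16(10, 3) gives 7, and sub16(3, 10) wraps around to 65529, which is -7 modulo 2^16.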
So you implement rather complex bit-based addressing of operand types and sizes. You do it for a single machine code instruction (e.g. ADD) and plan to reuse it for most other instructions. But by this time your co-worker has finished implementing an optimal remainder calculation without using SUB! Imagine: it’s already called “Optimal_Mod”, so you miss something suboptimal inside it, not because you’re a bad guy and hate AMD, but simply because you see it’s already called optimal, optimized.
The Intel compiler is pretty good in general, but it has a long version history, so it can behave strangely in some rare cases. I suggest you report this issue to Intel and see what happens.