Performance difference between Windows and Linux using Intel compiler: looking at the assembly
In both cases the arguments and results are passed only in registers, as per the respective calling conventions on Windows and GNU/Linux. In the GNU/Linux variant, xmm1 is used for accumulating the sum. Since it's a call-clobbered (a.k.a. caller-saved) register, it's stored in the caller's stack frame before each call and restored afterwards. …
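To make the register-saving point concrete, here is a minimal sketch of the kind of loop where this matters. It is not the code from the original question: `term` and `accumulate` are hypothetical names, and the `noinline` annotations are only there to force a real call on each iteration. Under the System V x86-64 ABI used on GNU/Linux, every XMM register is caller-saved, so an accumulator kept in one has to be spilled to the stack around each call. Under the Windows x64 convention, xmm6 through xmm15 are callee-saved, so a compiler is free to keep the accumulator in one of those across the call.

```c
/* Hypothetical stand-in for the function called in the hot loop.
 * noinline forces a genuine call so the calling convention matters. */
#if defined(_MSC_VER)
__declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
__attribute__((noinline))
#endif
double term(double x)
{
    return x * 0.5;
}

/* The running sum lives across every call to term().
 * System V x86-64: all XMM registers are caller-saved, so the sum
 *   must be saved before the call and reloaded after it.
 * Windows x64: xmm6-xmm15 are callee-saved, so the sum can stay
 *   in a register for the whole loop. */
double accumulate(int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += term((double)i);
    return sum;
}
```

Compiling this to assembly on both platforms (for example with `-S` on GNU/Linux) and comparing the loop bodies should show whether the accumulator is spilled on each iteration; the exact registers chosen depend on the compiler and its version.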