Performance difference between Windows and Linux using Intel compiler: looking at the assembly

In both cases the arguments and results are passed only in registers, as per the respective calling conventions on Windows and GNU/Linux.

In the GNU/Linux variant, xmm1 accumulates the sum. Since it is a call-clobbered (a.k.a. caller-saved) register, the caller stores it in its own stack frame before each call and restores it afterwards.

In the Windows variant, xmm6 accumulates the sum. This register is callee-saved in the Windows calling convention (but not in the GNU/Linux one), so the caller can leave the sum in it across the call and any saving is left to the callee.
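To make the register discussion concrete, here is a minimal sketch of the kind of loop that produces this pattern. The original source is not shown here, so the function name `sum_erf`, its parameters, and the use of `std::erf` as the callee are assumptions made purely for illustration.

```cpp
#include <cmath>

// Hypothetical reconstruction: accumulate a sum across repeated calls
// to a math-library function.
double sum_erf(int n, double step) {
    double sum = 0.0;  // the accumulator has to survive every call below
    for (int i = 0; i < n; ++i) {
        // System V x86-64 (GNU/Linux): every xmm register is caller-saved,
        // so a compiler that keeps `sum` in a register must spill it
        // around this call.
        // Windows x64: xmm6-xmm15 are callee-saved, so `sum` can stay in
        // one of them and the callee preserves it if it uses that register.
        sum += std::erf(i * step);
    }
    return sum;
}
```

Compiling such a function to assembly on each platform (for example with `-S` for GCC-style drivers) is one way to see which register holds the accumulator and where the spills happen.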

So, in summary, the GNU/Linux version saves/restores both xmm0 (in the callee[1]) and xmm1 (in the caller), whereas the Windows version saves/restores only xmm6 (in the callee).

[1] I need to look at std::errf to figure out why.
