Is fastcall really faster?
It depends on the platform. For a Xenon PowerPC, for example, it can be an order of magnitude difference due to a load-hit-store issue with passing data on the stack. I empirically timed the overhead of a cdecl function at about 45 cycles compared to ~4 for a fastcall. For an out-of-order x86 (Intel and … Read more