Cycles/cost for L1 Cache hit vs. Register on x86?

Here’s a great article on the subject: http://arstechnica.com/gadgets/reviews/2002/07/caching.ars/1
To answer your question – yes, a cache hit has approximately the same cost as a register access. And of course, a cache miss is quite costly 😉
PS: The specifics will vary, but this link has some good ballpark figures: Approximate cost to access various caches …
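If you want numbers for your own CPU, a pointer-chasing microbenchmark is one common way to measure L1 hit latency. This is a general technique and is not taken from the linked article. Each load's address comes from the previous load, so the loop runs at roughly one load-to-use latency per step as long as the array fits in L1d. The helper names (`build_cycle`, `chase`, `ns_per_load`), the array size and the seed are illustrative assumptions, so treat this as a sketch rather than a calibrated benchmark.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: fill next[] with one random cycle through all n
   slots (Sattolo's algorithm), so chasing from slot 0 visits every slot
   before returning to 0. */
static void build_cycle(size_t *next, size_t n, unsigned seed) {
    assert(n > 0);
    srand(seed);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* j < i is what makes it a single cycle */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Each load's address depends on the previous load, so iterations cannot
   overlap: the loop runs at about one load-to-use latency per step. */
static size_t chase(const size_t *next, size_t steps) {
    size_t p = 0;
    for (size_t s = 0; s < steps; s++)
        p = next[p];
    return p;
}

/* Average nanoseconds per dependent load (includes a little loop overhead). */
static double ns_per_load(const size_t *next, size_t steps) {
    assert(steps > 0);
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    volatile size_t sink = chase(next, steps);   /* keep the result live */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

To turn the result into cycles, multiply by the core clock frequency. Expect noise from frequency scaling and compiler flags. Growing the array past the L1d size shows the step up to L2 latency.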

x86_64: best way to reduce a 64-bit register to 32 bits, retaining zero or non-zero status

Fewest uops (front-end bandwidth): 1 uop, latency 3c (Intel) or 1c (Zen). It is also the smallest in code size, at 5 bytes:

    popcnt %rax, %rax   # 5 bytes, 1 uop for one port
                        # if using a different dst, note the output dependency on Intel before ICL

On most CPUs that have it at all, it’s 3c latency, 1c throughput …
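The popcnt trick works because popcount(x) is zero exactly when x is zero, and it is never larger than 64, so it always fits in 32 bits. Below is a C sketch that assumes GCC or Clang. `__builtin_popcountll` is their builtin, and with `-mpopcnt` (or an `-march` that includes it) they typically emit a single popcnt. For comparison, the sketch also includes a portable OR-the-halves version. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* popcnt approach: result is 0 iff x is 0, and at most 64.
   GCC/Clang builtin; typically one popcnt instruction with -mpopcnt. */
static uint32_t squash_popcnt(uint64_t x) {
    return (uint32_t)__builtin_popcountll(x);
}

/* Portable alternative: OR the high half into the low half.
   Any set bit in either half survives the truncation to 32 bits,
   but it takes more instructions than a single popcnt. */
static uint32_t squash_or(uint64_t x) {
    return (uint32_t)(x | (x >> 32));
}
```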

Why are DateTime.Now and DateTime.UtcNow so slow/expensive?

TickCount just reads a constantly increasing counter; it’s just about the simplest thing you can do. DateTime.UtcNow needs to query the system time, and don’t forget that while TickCount is blissfully ignorant of things like the user changing the clock or NTP adjustments, UtcNow has to take them into account. Now you’ve expressed a performance …
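POSIX draws the same line between a counter and a wall clock. This is a rough analogy only and is not how .NET implements TickCount or DateTime. CLOCK_MONOTONIC only moves forward and ignores clock changes, while CLOCK_REALTIME is the wall clock that users and NTP can adjust. The sketch below times one call of each. The helpers `to_ns` and `ns_per_call` are hypothetical, and the relative costs depend heavily on the OS and clock source, so the two may well come out about equal on your system.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <time.h>

/* Convert a timespec to integer nanoseconds. */
static long long to_ns(struct timespec ts) {
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Average cost, in ns, of one clock_gettime() call on clock `clk`,
   measured against the monotonic clock. */
static double ns_per_call(clockid_t clk, int calls) {
    assert(calls > 0);
    struct timespec t0, t1, scratch;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < calls; i++)
        clock_gettime(clk, &scratch);   /* the call being measured */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(to_ns(t1) - to_ns(t0)) / calls;
}
```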

x > -1 vs x >= 0: is there a performance difference?

It is very much dependent on the underlying architecture, but any difference will be minuscule. If anything, I’d expect (x >= 0) to be slightly faster, as comparison with 0 comes for free on some instruction sets (such as ARM). Of course, any sensible compiler will choose the best implementation regardless of which variant is …
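For a signed int, the two spellings are the same predicate. An optimizing compiler is therefore free to emit identical code for both, usually a sign test. The sketch below spot-checks that equivalence at the boundaries. The function names are illustrative.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Same predicate for signed int; compilers may generate identical code. */
static bool gt_minus_one(int x) { return x > -1; }
static bool ge_zero(int x) { return x >= 0; }

/* Spot-check the equivalence at the extremes and around zero. */
static bool variants_agree(void) {
    const int samples[] = { INT_MIN, INT_MIN + 1, -2, -1, 0, 1, 2, INT_MAX };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        if (gt_minus_one(samples[i]) != ge_zero(samples[i]))
            return false;
    return true;
}
```

The equivalence is specific to signed types. If x is an unsigned int, the -1 converts to UINT_MAX, so x > -1 is always false while x >= 0 is always true.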

Why does breaking the “output dependency” of LZCNT matter?

This is simply a limitation in the microarchitecture of your Intel Haswell CPU and several previous¹ CPUs. It has been fixed for tzcnt and lzcnt as of Skylake-S (client), but the issue remained for popcnt until it was fixed in Cannon Lake. On those microarchitectures, the destination operand for tzcnt, lzcnt and popcnt is treated …
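The usual way to break the false dependency is to zero the destination with a dependency-breaking xor right before lzcnt. Recent GCC and Clang often insert this xor themselves when tuning for affected Intel cores, so hand-written code is rarely needed. This sketch assumes GCC/Clang extended asm on x86-64 built with `-mlzcnt`. On any other target it falls back to `__builtin_clzll`. The name `clz64` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Leading-zero count of a 64-bit value; returns 64 for x == 0, like LZCNT. */
static inline unsigned clz64(uint64_t x) {
#if defined(__x86_64__) && defined(__LZCNT__) && defined(__GNUC__)
    uint64_t r;
    /* xor-zeroing r first breaks the false dependency on r's old value.
       The early-clobber "=&r" keeps the compiler from putting x in r's register. */
    __asm__("xorl %k0, %k0\n\t"
            "lzcntq %1, %0"
            : "=&r"(r)
            : "rm"(x)
            : "cc");
    return (unsigned)r;
#else
    /* Portable fallback; __builtin_clzll(0) is undefined, so special-case it. */
    return x ? (unsigned)__builtin_clzll(x) : 64u;
#endif
}
```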

Why are loops always compiled into “do…while” style (tail jump)?

Related: asm loop basics: While, Do While, For loops in Assembly Language (emu8086).
Terminology: Wikipedia says “loop inversion” is the name for turning a while(x) into if(x) do{}while(x), putting the condition at the bottom of the loop where it belongs. Fewer instructions/uops inside the loop is better. Structuring the code outside the loop …
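Here is the loop-inversion rewrite shown at the source level. This is a sketch only, since compilers typically apply the transformation themselves at normal optimization levels. One guard outside the loop lets every iteration end with a single conditional branch. A naive top-tested loop needs a branch at the top and a jump back at the bottom. Both functions are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* while form: condition tested at the top of every iteration. */
static long sum_while(const int *a, size_t n) {
    long s = 0;
    size_t i = 0;
    while (i < n) {
        s += a[i];
        i++;
    }
    return s;
}

/* Inverted form, if (x) do {} while (x): one guard before the loop and
   the condition at the bottom, so each iteration ends in one branch. */
static long sum_inverted(const int *a, size_t n) {
    long s = 0;
    size_t i = 0;
    if (i < n) {
        do {
            s += a[i];
            i++;
        } while (i < n);
    }
    return s;
}
```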

Why do none of the major compilers optimize this conditional store that checks if the value is already set?

The object might be const
It wouldn’t be safe for an object like static const int val = 1; living in read-only memory: the unconditional-store version would segfault trying to write to read-only memory. The version that checks first is safe to call on that object in the C++ abstract machine (via const_cast), so the optimizer has to …
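C follows the same rule as the C++ case above. Casting away const and only reading the object is fine, but storing to an object defined const is undefined behavior. The sketch below shows why the compiler cannot turn the checked store into an unconditional one: the checked version never writes when the value is already 1. The names are hypothetical.

```c
#include <assert.h>

/* Checked store: writes only when needed, so it is safe on an object
   that already holds 1, even one placed in read-only memory. */
static void set_if_needed(int *p) {
    if (*p != 1)
        *p = 1;
}

/* Unconditional store: always writes. Calling this on a const object
   (via a cast) is undefined behavior and may segfault. */
static void set_always(int *p) {
    *p = 1;
}

static const int val = 1;   /* may be placed in read-only memory */
```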

Does calculating Sqrt(x) as x * InvSqrt(x) make any sense in the Doom 3 BFG code?

I can see two reasons for doing it this way. Firstly, the “fast invSqrt” method (really Newton-Raphson) is now the method used in a lot of hardware, so this approach leaves open the possibility of taking advantage of such hardware (and potentially doing four or more such operations at once). This article discusses it …
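Below is a minimal sketch of computing sqrt(x) as x * InvSqrt(x). It uses the widely known bit-level initial guess (the Quake III constant 0x5f3759df) followed by one Newton-Raphson step. Doom 3 BFG's own InvSqrt may be implemented differently, so this only illustrates the technique. SSE's rsqrtps, which approximates 1/sqrt for four floats at once, is an example of the hardware support mentioned above. The sketch assumes IEEE-754 binary32 floats and x >= 0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x): bit-trick estimate plus one Newton-Raphson step. */
static float inv_sqrt(float x) {
    uint32_t i;
    float y;
    memcpy(&i, &x, sizeof i);           /* well-defined type pun */
    i = 0x5f3759dfu - (i >> 1);         /* initial estimate */
    memcpy(&y, &i, sizeof y);
    y = y * (1.5f - 0.5f * x * y * y);  /* one Newton-Raphson refinement */
    return y;
}

/* sqrt(x) = x * (1/sqrt(x)). For x == 0 the estimate is large but finite,
   so the product is 0 rather than 0 * infinity. */
static float sqrt_via_inv(float x) {
    return x * inv_sqrt(x);
}
```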
