Is performance reduced when executing loops whose uop count is not a multiple of processor width?

I did some investigation with Linux perf to help answer this on my Skylake i7-6700HQ box, and Haswell results have been kindly provided by another user. The analysis below applies to Skylake, but it is followed by a comparison versus Haswell. Other architectures may vary0, and to help sort it all out I welcome additional … Read more

Lost Cycles on Intel? An inconsistency between rdtsc and CPU_CLK_UNHALTED.REF_TSC

TL;DR The discrepancy you are observing between RDTSC and REFTSC and is due to TurboBoost P-state transitions. During these transitions, most of the core, including the fixed-function performance counter REF_TSC, is halted for approximately 20000-21000 cycles (8.5us), but rdtsc continues at its invariant frequency. rdtsc is probably in an isolated power and clock domain because … Read more

Cycles/cost for L1 Cache hit vs. Register on x86?

Here’s a great article on the subject: http://arstechnica.com/gadgets/reviews/2002/07/caching.ars/1 To answer your question – yes, a cache hit has approximately the same cost as a register access. And of course a cache miss is quite costly 😉 PS: The specifics will vary, but this link has some good ballpark figures: Approximate cost to access various caches … Read more

What is locality of reference?

This would not matter if your computer was filled with super-fast memory. But unfortunately that’s not the case and computer-memory looks something like this1: +———-+ | CPU | <<– Our beloved CPU, superfast and always hungry for more data. +———-+ |L1 – Cache| <<– ~4 CPU-cycles access latency (very fast), 2 loads/clock throughput +———-+ |L2 … Read more

Atomicity of loads and stores on x86

It sounds like the atomic operations on memory will be executed directly on memory (RAM). Nope, as long as every possible observer in the system sees the operation as atomic, the operation can involve cache only. Satisfying this requirement is much more difficult for atomic read-modify-write operations (like lock add [mem], eax, especially with an … Read more

Hata!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)