Efficient integer floor function in C++

“Casting to int is notoriously slow.” Maybe you’ve been living under a rock since x86-64, or otherwise missed that this hasn’t been true on x86 for a while. 🙂 SSE/SSE2 have an instruction that converts with truncation (instead of using the current rounding mode, which defaults to round-to-nearest). The ISA supports this operation efficiently precisely because conversion with C semantics …
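
A minimal sketch of a floor built on that truncating conversion, assuming the input fits in an int (the name fast_floor is hypothetical, not from the original post): cast to int, then correct the result for negative non-integers, which truncation rounds toward zero.

```cpp
// Hypothetical helper: floor(x) as an int.
// Assumes x is within int range; out-of-range or NaN inputs make the cast
// undefined behaviour.
inline int fast_floor(double x) {
    // Truncating conversion; on x86-64 this is typically a single cvttsd2si.
    int t = static_cast<int>(x);
    // Truncation rounds toward zero, so a negative non-integer was rounded
    // up. In that case x < t, and we step down by one.
    return t - (x < t);
}
```

For example, fast_floor(-2.5) truncates to -2 and then steps down to -3, while fast_floor(-3.0) stays at -3. With SSE4.1 enabled, std::floor may instead compile to roundsd, which is worth benchmarking against this.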

Lost Cycles on Intel? An inconsistency between rdtsc and CPU_CLK_UNHALTED.REF_TSC

TL;DR: The discrepancy you are observing between RDTSC and REF_TSC is due to Turbo Boost P-state transitions. During these transitions, most of the core, including the fixed-function performance counter REF_TSC, is halted for approximately 20000-21000 cycles (8.5 µs), but rdtsc continues counting at its invariant frequency. rdtsc is probably in an isolated power and clock domain because …
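
One way to look for such halts from user space, assuming an x86 CPU with an invariant TSC and a GCC- or Clang-compatible compiler: read the TSC back to back and record the largest gap between consecutive reads. Because rdtsc keeps counting while the core is halted, a halt shows up as a gap. This is a sketch with hypothetical helper names, not the method used in the original answer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtsc() on GCC/Clang (MSVC uses <intrin.h>)
#endif

// Largest difference between consecutive timestamps (0 if fewer than two).
std::uint64_t max_consecutive_gap(const std::vector<std::uint64_t>& ts) {
    std::uint64_t largest = 0;
    for (std::size_t i = 1; i < ts.size(); ++i) {
        std::uint64_t delta = ts[i] - ts[i - 1];
        if (delta > largest) largest = delta;
    }
    return largest;
}

#if defined(__x86_64__) || defined(__i386__)
// Read the TSC back to back n times. The vector is zero-filled up front, so
// first-touch page faults happen before sampling rather than during it.
std::vector<std::uint64_t> sample_tsc(std::size_t n) {
    std::vector<std::uint64_t> ts(n);
    for (auto& t : ts) t = __rdtsc();
    return ts;
}
#endif
```

On an x86 machine, max_consecutive_gap(sample_tsc(1000000)) reports the largest stall seen. A gap in the tens of thousands of ticks is consistent with a frequency transition but does not prove one, since interrupts, context switches, and SMIs also produce gaps.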

Why did GCC generate mov %eax,%eax and what does it mean?

In x86-64, any instruction that writes a 32-bit register implicitly zero-extends the result into the full 64-bit register: bits 32-63 are cleared (to avoid false dependencies on the old upper bits). That’s why you’ll sometimes see odd-looking instructions like this one. (Is mov %esi, %esi a no-op or not on x86-64?) However, in this case the previous mov-load is also 32-bit, so the high half of %rax is already cleared. The mov %eax, %eax …
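
A small C++ sketch of a typical source of such 32-bit register-to-register moves (the function names are hypothetical, and the exact instructions depend on compiler and flags): widening an unsigned 32-bit value to 64 bits needs only a 32-bit register write, whereas sign-extension needs a dedicated instruction.

```cpp
#include <cstdint>

// Zero-extension: GCC at -O2 on x86-64 typically emits `movl %edi, %eax`,
// relying on the 32-bit write to clear bits 32-63 of %rax.
std::uint64_t widen_unsigned(std::uint32_t x) { return x; }

// Sign-extension needs an explicit instruction, typically `movslq %edi, %rax`.
std::int64_t widen_signed(std::int32_t x) { return x; }
```

The same bit pattern 0xFFFFFFFF widens to 4294967295 through the first function and to -1 (as a signed 32-bit input) through the second.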

Big differences in GCC code generation when compiling as C++ vs C

The extra code handles misalignment, because the instruction used, vmovdqa64, requires 64-byte alignment. My testing shows that, even though the standard doesn’t allow it, gcc does let a definition in another module override the one here when compiling in C mode. That definition might only comply with the basic alignment requirement (4 bytes), thus …
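
Roughly what that kind of misalignment handling amounts to, written out by hand as a sketch (the helper names are hypothetical, and GCC’s actual generated code differs): process elements one at a time until the pointer reaches a 64-byte boundary, handle the bulk in 64-byte chunks where an aligned 512-bit access such as vmovdqa64 would be legal, then finish the leftover tail.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kVectorBytes = 64;                    // one 512-bit (zmm) vector
constexpr std::size_t kLanes = kVectorBytes / sizeof(int);  // 16 ints per vector

// How many leading elements to process one at a time before p reaches a
// 64-byte boundary. Assumes p is int-aligned (4 bytes), as any int* must be.
std::size_t peel_count(const int* p, std::size_t n) {
    std::uintptr_t misalign = reinterpret_cast<std::uintptr_t>(p) % kVectorBytes;
    std::size_t peel = misalign == 0 ? 0 : (kVectorBytes - misalign) / sizeof(int);
    return peel < n ? peel : n;
}

// Add 1 to every element: scalar prologue, aligned middle, scalar epilogue.
void add_one(int* a, std::size_t n) {
    std::size_t i = 0;
    const std::size_t peel = peel_count(a, n);
    for (; i < peel; ++i) a[i] += 1;              // prologue: until aligned
    for (; i + kLanes <= n; i += kLanes)          // a + i is 64-byte aligned here
        for (std::size_t j = 0; j < kLanes; ++j)  // stands in for one vector op
            a[i + j] += 1;
    for (; i < n; ++i) a[i] += 1;                 // epilogue: leftover elements
}
```

If the compiler could prove the array is 64-byte aligned, the prologue (and the runtime check feeding it) would be unnecessary, which is the code the C build has to keep.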

Intel x86 0x2E/0x3E Prefix Branch Prediction actually used?

These instruction prefixes have no effect on modern processors (anything newer than the Pentium 4). They just cost one byte of code space each, so not generating them is the right thing to do. For details, see Agner Fog’s optimization manuals, in particular manual 3, Microarchitecture: http://www.agner.org/optimize/ The “Intel® 64 and IA-32 Architectures Optimization Reference Manual” no longer mentions …
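
If you want to tell the compiler which way a branch usually goes, the usual tool is a source-level hint such as GCC/Clang’s __builtin_expect (or C++20’s [[likely]]/[[unlikely]]). As far as we know, mainstream compilers use such hints for code layout rather than emitting 0x2E/0x3E prefixes, but check the generated assembly for your compiler. A minimal sketch; the UNLIKELY macro and sum_non_negative are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical portability macro: GCC and Clang provide __builtin_expect;
// on other compilers the hint is dropped and only the condition remains.
#if defined(__GNUC__)
#define UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#define UNLIKELY(cond) (cond)
#endif

// Sum the non-negative values, treating negative ones as the rare case.
long sum_non_negative(const int* v, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (UNLIKELY(v[i] < 0))
            continue;  // rare path; the compiler may move it off the hot path
        total += v[i];
    }
    return total;
}
```

The hint only changes how the compiler arranges the code; the instructions it emits carry no prefix that the processor would act on.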
