micro-optimization

INC instruction vs ADD 1: Does it matter?

April 20, 2023 by Tarik

Update: The efficiency cores (E-cores) on Alder Lake are Gracemont, which run inc reg as a single uop but with a throughput of only 1 per clock, versus 4 per clock for add reg, 1 (https://uops.info/). This may be a false dependency on FLAGS, like the one P4 had; the uops.info tests didn’t try adding a dependency-breaking instruction. Other than the TL;DR, I haven’t updated …

“enter” vs “push ebp; mov ebp, esp; sub esp, imm” and “leave” vs “mov esp, ebp; pop ebp”

April 2, 2023 by Tarik

There is a performance difference, especially for enter. On modern processors, enter decodes to some 10 to 20 µops, while the three-instruction sequence is about 4 to 6, depending on the architecture. For details, consult Agner Fog’s instruction tables. Additionally, the enter instruction usually has quite a high latency, for example 8 clocks on …

Divide by 10 using bit shifts?

March 31, 2023 by Tarik

Editor’s note: this is not actually what compilers do, and it gives the wrong answer for large positive integers ending in 9, starting with div10(1073741829) = 107374183 instead of 107374182 (Godbolt). It is exact for inputs smaller than 0x40000005, though, which may be sufficient for some uses. Compilers (including MSVC) do use fixed-point multiplicative inverses for constant …

Weird use of `?:` in `typeid` code

March 30, 2023 by Tarik

I think it is an optimisation! A little-known and rarely (you could say “never”) used feature of typeid is that a null dereference of the argument of typeid throws an exception instead of the usual UB. What? Are you serious? Are you drunk? Indeed. Yes. No. int *p = 0; *p; // UB typeid …

What is the fastest way to find if a number is even or odd?

March 26, 2023 by Tarik

It is pretty well known that static inline int is_odd_A(int x) { return x & 1; } is more efficient than static inline int is_odd_B(int x) { return x % 2; } But with the optimizer on, will is_odd_B be no different from is_odd_A? No: with gcc-4.2 -O2 we get (in ARM assembly): _is_odd_A: …

How to force GCC to assume that a floating-point expression is non-negative?

March 24, 2023 by Tarik

You can write assert(x*x >= 0.f) as a compile-time promise instead of a runtime check, as follows in GNU C++:

    #include <cmath>
    float test1 (float x) {
        float tmp = x*x;
        if (!(tmp >= 0.0f)) __builtin_unreachable();
        return std::sqrt(tmp);
    }

(related: What optimizations does __builtin_unreachable facilitate?) You could also wrap if(!x)__builtin_unreachable() in a macro and call …

Does using xor reg, reg give an advantage over mov reg, 0?

March 21, 2023 by Tarik

An actual answer: Section 3.5.1.7 of the Intel 64 and IA-32 Architectures Optimization Reference Manual is where you want to look. In short, there are situations where an xor or a mov may be preferred. The issues center around dependency chains and preservation of condition codes. In processors based on Intel Core microarchitecture, a number …

Fast method to copy memory with translation – ARGB to BGR

March 7, 2023 by Tarik

I wrote 4 different versions that work by swapping bytes. I compiled them using gcc 4.2.1 with -O3 -mssse3, ran each of them 10 times over 32 MB of random data, and averaged the results. Editor’s note: the original inline asm used unsafe constraints, e.g. modifying input-only operands, and not telling the compiler about the side effect on …

Is reading the `length` property of an array really that expensive an operation in JavaScript?

February 16, 2023 by Tarik

Well, I would have said it was expensive, but then I wrote a little test on jsperf.com and, to my surprise, using i<array.length was actually faster in Chrome, and in Firefox 4 it didn’t matter. My suspicion is that length is stored as an integer (Uint32). From the ECMA-262 specification (5th edition, page 121): Every Array …

Is it possible to tell the branch predictor how likely it is to follow the branch?

February 8, 2023 by Tarik

Yes, but it will have no effect. The exceptions are older (obsolete) architectures pre-NetBurst, and even then it doesn’t do anything measurable. There is a “branch hint” opcode Intel introduced with the NetBurst architecture, and a default static branch prediction for cold jumps (backward predicted taken, forward predicted not taken) on some older architectures. GCC …