micro-optimization – Tarik Billa

What is faster: many ifs, or else if?

April 12, 2024 by Tarik

Java: if-return-if-return vs if-return-elseif-return

January 6, 2024 by Tarik

The generated byte code is identical for those two cases, so it’s purely a matter of style. I produced two methods e1 and e2 and both produced this byte code (read using javap -v): public boolean e1(java.lang.Object); Code: Stack=2, Locals=2, Args_size=2 0: aload_0 1: aload_1 2: if_acmpne 7 5: iconst_1 6: ireturn 7: aload_1 8: … Read more

array_push() vs. $array[] = …. Which is fastest? [duplicate]

January 5, 2024 by Tarik

Why date() works twice as fast if we set time zone from code?

December 26, 2023 by Tarik

Modern x86 cost model

December 25, 2023 by Tarik

The best reference is the Intel Optimization Manual, which provides fairly detailed information on architectural hazards and instruction latencies for all recent Intel cores, as well as a good number of optimization examples. Another excellent reference is Agner Fog’s optimization resources, which have the virtue of also covering AMD cores. Note that specific cost models … Read more

How expensive is it to convert between int and double?

December 25, 2023 by Tarik

Here’s what I could dig up myself, for x86-64 doing FP math with SSE2 (not legacy x87 where changing the rounding mode for C++’s truncation semantics was expensive): When I take a look at the generated assembly from clang and gcc, it looks like the cast int to double, it boils down to one instruction: … Read more

‘Correct’ unsigned integer comparison

December 16, 2023 by Tarik

Well, you’ve correctly typified the situation: C/C++ have no way of doing a full signed int/unsigned int comparison with a single compare. I would be surprised if promotion to int64 was faster than doing two comparisons. In my experience, compilers are quite good at realizing that a subexpression like that is pure (has no side … Read more

Is there a faster algorithm for max(ctz(x), ctz(y))?

December 13, 2023 by Tarik

I don’t think there’s anything better than the naive approach for the maximum. One attempt is using the identity x + y = min(x, y) + max(x, y) and thus max(ctz(x), ctz(y)) = ctz(x) + ctz(y) – min(ctz(x), ctz(y)) This way, we can reduce the max function to the min function we already optimized, albeit … Read more

Why does n++ execute faster than n=n+1?

December 7, 2023 by Tarik

That would be true if you are working on a “stone-age” compiler… In case of “stone-age”: ++n is faster than n++ is faster than n=n+1 Machine usually have increment x as well as add const to x In case of n++, you will have 2 memory access only (read n, inc n, write n ) … Read more

Is performance reduced when executing loops whose uop count is not a multiple of processor width?

December 2, 2023 by Tarik

I did some investigation with Linux perf to help answer this on my Skylake i7-6700HQ box, and Haswell results have been kindly provided by another user. The analysis below applies to Skylake, but it is followed by a comparison versus Haswell. Other architectures may vary0, and to help sort it all out I welcome additional … Read more