When should streams be preferred over traditional loops for best performance? Do streams take advantage of branch-prediction?

I agree to the point that programming with streams is nice and easier for some scenarios but when we’re losing out on performance, why do we need to use them? Performance is rarely an issue. It would be usual for 10% of your streams to need rewriting as loops to get the performance … Read more

What does `rep ret` mean?

There’s a whole blog named after this instruction, and the first post describes the reason behind it: http://repzret.org/p/repzret/ Basically, there was an issue in AMD’s branch predictor when a single-byte ret immediately followed a conditional jump, as in the code you quoted (and in a few other situations), and the workaround was to add the … Read more

Why is a conditional move not vulnerable to Branch Prediction Failure?

Mis-predicted branches are expensive. A modern processor generally executes between one and three instructions each cycle if things go well (that is, if it does not stall waiting for the data these instructions depend on to arrive from previous instructions or from memory). The statement above holds surprisingly well for tight loops, but this shouldn’t blind you to … Read more
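
As a rough illustration of the distinction this answer draws, the sketch below writes the same selection three ways in C. The function names are hypothetical, and whether a compiler actually emits a conditional move (cmov on x86) for the ternary form is an assumption that depends on the compiler, flags, and target.

```c
#include <stdint.h>

/* Branchy form: the compiler may emit a conditional jump here,
   which the CPU then has to predict. */
int32_t max_branchy(int32_t a, int32_t b) {
    if (a > b) {
        return a;
    }
    return b;
}

/* Select form: both operands are already available, so the compiler
   is free to emit a conditional move instead of a jump.
   This is permitted, not guaranteed. */
int32_t max_select(int32_t a, int32_t b) {
    return (a > b) ? a : b;
}

/* Explicitly branchless form: turn the comparison into an all-ones or
   all-zeros mask and blend the two values with it. */
int32_t max_mask(int32_t a, int32_t b) {
    uint32_t mask = -(uint32_t)(a > b); /* 0xFFFFFFFF if a > b, else 0 */
    /* The final cast assumes the usual two's-complement conversion. */
    return (int32_t)(((uint32_t)a & mask) | ((uint32_t)b & ~mask));
}
```

All three return the same value for any pair of inputs; they differ only in how much control flow the compiler is invited to generate. Inspecting the generated assembly is the only reliable way to see which form was chosen.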

Is “IF” expensive?

At the very lowest level (in the hardware), yes, ifs are expensive. In order to understand why, you have to understand how pipelines work. The current instruction to be executed is stored in something typically called the instruction pointer (IP) or program counter (PC); these terms are synonymous, but different terms are used with different … Read more

Is there a compiler hint for GCC to force branch prediction to always go a certain way?

GCC supports the function __builtin_expect(long exp, long c) to provide this kind of feature. You can check the documentation here; exp is the condition used and c is the expected value. For example, in your case you would want if (__builtin_expect(normal, 1)). Because of the awkward syntax, this is usually used by defining two … Read more
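
A minimal sketch of how __builtin_expect is often wrapped to hide the awkward syntax. The macro names likely and unlikely are an assumption borrowed from a common convention (the Linux kernel uses similar ones), not necessarily what the full answer defines, and count_valid is a hypothetical function made up for this example.

```c
#include <stddef.h>

/* Hypothetical wrapper macros. The !! normalizes any non-zero value to 1
   so that it compares cleanly against the expected value. */
#if defined(__GNUC__) || defined(__clang__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
/* Fallback for compilers without the builtin: no hint, same behavior. */
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* Counts entries until a negative one is found. A negative entry is
   treated as a rare error: the scan stops and -1 is returned. */
long count_valid(const int *values, size_t n) {
    long count = 0;
    for (size_t i = 0; i < n; i++) {
        if (unlikely(values[i] < 0)) {
            return -1; /* rare path, hinted as not expected */
        }
        count++;
    }
    return count;
}
```

The hint changes only how the compiler lays out the code, never the result, so the function behaves identically with or without it.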

Why is processing an unsorted array the same speed as processing a sorted array with modern x86-64 clang?

Several of the answers in the question you link talk about rewriting the code to be branchless and thus avoiding any branch prediction issues. That’s what your updated compiler is doing. Specifically, clang++ 10 with -O3 vectorizes the inner loop. See the code on godbolt, lines 36-67 of the assembly. The code is a little … Read more
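
To make "rewriting the code to be branchless" concrete, here is a hedged sketch of a conditional sum written with and without a data-dependent branch. The function names and the threshold parameter are illustrative assumptions, not code from the linked question or from the compiler output mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branchy version: the if depends on the data, so on unsorted input
   the branch outcome is hard to predict. */
int64_t sum_above_branchy(const int32_t *data, size_t n, int32_t threshold) {
    assert(data != NULL || n == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] >= threshold) {
            sum += data[i];
        }
    }
    return sum;
}

/* Branchless version: the comparison yields 0 or 1, and multiplying by
   it either keeps or discards the element without a jump in the source.
   An optimizing compiler may produce similar code from the branchy
   version on its own, for example by vectorizing the loop. */
int64_t sum_above_branchless(const int32_t *data, size_t n, int32_t threshold) {
    assert(data != NULL || n == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += (int64_t)data[i] * (data[i] >= threshold);
    }
    return sum;
}
```

Both functions return the same sum for any input; only the second makes the absence of a data-dependent branch explicit in the source.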

What is the effect of ordering if…else if statements by probability?

As a general rule, most if not all Intel CPUs assume forward branches are not taken the first time they see them. See Godbolt’s work. After that, the branch goes into a branch prediction cache, and past behavior is used to inform future branch prediction. So in a tight loop, the effect of misordering is … Read more
