Why would introducing useless MOV store instructions speed up a tight loop in x86_64 assembly?
The most likely cause of the speed improvement is that:
- inserting a MOV shifts the subsequent instructions to different memory addresses
- one of those moved instructions was an important conditional branch
- that branch was being incorrectly predicted due to aliasing in the branch prediction table
- moving the branch eliminated the alias and allowed the branch …
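
The aliasing effect described above can be illustrated with a toy simulation. This is only a sketch under loud assumptions: real x86_64 predictors use undocumented and far more complex indexing, and the table size, the low-bits index function, the addresses, and the 3-byte MOV length below are all hypothetical values chosen for illustration. It models two branches that share one slot in a small table of 2-bit saturating counters, then shifts one of them by the length of an inserted MOV.

```python
# Toy model only: real branch predictors are far more complex than this.
TABLE_BITS = 4  # hypothetical prediction table with 16 entries


def table_index(addr: int) -> int:
    """Index a branch by the low bits of its address (an assumed scheme)."""
    return addr & ((1 << TABLE_BITS) - 1)


def mispredictions(addr_a: int, addr_b: int, iterations: int = 1000) -> int:
    """Count mispredictions for two branches sharing one table of 2-bit counters.

    Branch A is always taken; branch B is never taken.
    """
    # Counter values 0-1 predict "not taken", 2-3 predict "taken".
    counters = [1] * (1 << TABLE_BITS)
    misses = 0
    for _ in range(iterations):
        for addr, taken in ((addr_a, True), (addr_b, False)):
            i = table_index(addr)
            predicted_taken = counters[i] >= 2
            if predicted_taken != taken:
                misses += 1
            # Saturating update toward the actual outcome.
            if taken:
                counters[i] = min(3, counters[i] + 1)
            else:
                counters[i] = max(0, counters[i] - 1)
    return misses


branch_a = 0x400010   # hypothetical address of the first branch
branch_b = 0x400120   # same low 4 bits as branch_a, so the two alias
mov_length = 3        # assumed encoded length of the inserted MOV

aliased = mispredictions(branch_a, branch_b)
shifted = mispredictions(branch_a, branch_b + mov_length)
print(f"aliased: {aliased} misses, shifted: {shifted} misses")
```

In this model the two aliased branches keep overwriting each other's counter, so both mispredict on every iteration, while moving the second branch by a few bytes gives each its own slot and the mispredictions almost disappear. The inserted MOV is assumed to sit between the two branches, so only the second one moves.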