Does a memory barrier ensure that the cache coherence has been completed?

The memory barriers on the x86 architecture (and this is true in general) guarantee not only that all previous loads or stores complete before any subsequent load or store executes, but also that the stores have become globally visible. By globally visible it is meant that other …
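As a rough illustration of the ordering a barrier provides, here is a minimal C11 sketch of a publish/consume handshake. The names `payload`, `ready`, `publish` and `try_consume` are hypothetical and not part of the original answer. On x86, compilers typically lower a sequentially consistent `atomic_thread_fence` to `MFENCE` or a locked instruction, but that lowering is an assumption about the toolchain, not something the C standard guarantees.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state: plain data guarded by an atomic flag. */
int payload;
atomic_bool ready;   /* zero-initialized, i.e. false */

/* Writer: store the data, fence, then raise the flag.
 * The fence keeps the payload store ordered before the flag store. */
void publish(int value) {
    payload = value;
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader: check the flag, fence, then read the data.
 * Returns false if nothing has been published yet. */
bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_relaxed)) {
        return false;
    }
    atomic_thread_fence(memory_order_seq_cst);
    *out = payload;
    return true;
}
```

In a real program `publish` and `try_consume` would run on different threads; a reader that observes `ready == true` is then guaranteed to read the published `payload`.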

What’s the relative speed of floating point add vs. floating point multiply?

It also depends on instruction mix. Your processor will have several computation units standing by at any time, and you’ll get maximum throughput if all of them are filled all the time. So, executing a loop of muls is just as fast as executing a loop of adds, but the same doesn’t hold if …
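To make the point about keeping several units busy concrete, here is a small C sketch with two independent dependency chains in one loop, one of adds and one of multiplies. The function and type names are hypothetical. Whether this actually runs faster than two separate loops depends on the core in question (for example, whether it has separate add and multiply units), so treat it as an illustration of instruction mix rather than a benchmark.

```c
#include <stddef.h>

/* Hypothetical result type: running sum and running product. */
typedef struct {
    float sum;
    float product;
} add_mul_result;

/* The add chain and the mul chain do not depend on each other,
 * so an out-of-order core can issue one of each in the same cycle
 * if it has free units for both. */
add_mul_result add_and_mul(const float *x, size_t n) {
    float sum = 0.0f;
    float product = 1.0f;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];       /* add chain */
        product *= x[i];   /* independent mul chain */
    }
    return (add_mul_result){ sum, product };
}
```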

What’s the purpose of the rotate instructions (ROL, RCL on x86)?

Rotates are required for bit shifts across multiple words. When you SHL the lower word, the high-order bit spills out into the carry. To complete the operation, you need to shift the higher word(s) while bringing in the carry to the low-order bit. RCL is the instruction that accomplishes this. High word Low word CF …
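The same multi-word shift can be sketched in C, where the carry has to be modelled by hand. The `u128` type and `shl1_u128` function below are hypothetical names for illustration; the comments mark which step corresponds to SHL and which to RCL.

```c
#include <stdint.h>

/* Hypothetical 128-bit value split into two 64-bit words. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128;

/* Shift a 128-bit value left by one bit. */
u128 shl1_u128(u128 x) {
    uint64_t carry = x.lo >> 63;   /* the bit SHL would leave in CF */
    u128 r;
    r.lo = x.lo << 1;              /* SHL on the low word */
    r.hi = (x.hi << 1) | carry;    /* RCL on the high word: CF enters bit 0 */
    return r;
}
```

In assembly the carry flag does this bookkeeping for free, which is why the SHL/RCL pair needs no extra instructions.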

Why isn’t the instruction pointer a normal register usable with MOV or ADD?

You can’t access it directly because there’s no legitimate use case. Having any arbitrary instruction change eip would make branch prediction very difficult, and would probably open up a whole host of security issues. You can edit eip using jmp, call or ret. You just can’t directly read from or write to eip using normal …
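Because call saves the return address, a callee can still observe where it was called from even though eip itself is not readable with MOV. A minimal sketch, assuming GCC or Clang (both `__builtin_return_address` and the `noinline` attribute are compiler extensions, and `caller_address` is a hypothetical name):

```c
/* Returns the address the current call will return to, i.e. the
 * instruction pointer value that call saved on entry.
 * noinline keeps this a real call so there is a return address. */
__attribute__((noinline))
void *caller_address(void) {
    return __builtin_return_address(0);
}
```

The exact value is only meaningful for debugging or diagnostics; it varies with the compiler, optimization level and address-space layout.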

What are the best instruction sequences to generate vector constants on the fly?

All-zero: pxor xmm0,xmm0 (or xorps xmm0,xmm0, one instruction-byte shorter.) There isn’t much difference on modern CPUs, but on Nehalem (before xor-zero elimination), the xorps uop could only run on port 5. I think that’s why compilers favour pxor-zeroing even for registers that will be used with FP instructions. All-ones: pcmpeqw xmm0,xmm0. This is the usual …
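From C, the same two constants can be requested through SSE2 intrinsics. This is a sketch assuming an x86 target with `<emmintrin.h>` available; the function names are hypothetical. Note that the compiler is free to choose a different instruction than the one the intrinsic is named after (or to constant-fold the result), so check the generated assembly if the exact encoding matters.

```c
#include <emmintrin.h>  /* SSE2 intrinsics, x86 only */
#include <stdint.h>

/* All-zero vector: usually compiled to pxor/xorps reg,reg. */
__m128i vec_zero(void) {
    return _mm_setzero_si128();
}

/* All-ones vector: comparing a register with itself for equality
 * sets every bit, which is the pcmpeqw xmm0,xmm0 idiom. */
__m128i vec_all_ones(void) {
    __m128i z = _mm_setzero_si128();
    return _mm_cmpeq_epi16(z, z);
}

/* Helper for checking results: true if all 16 bytes equal b. */
int vec_all_bytes_are(__m128i v, uint8_t b) {
    uint8_t out[16];
    _mm_storeu_si128((__m128i *)out, v);
    for (int i = 0; i < 16; i++) {
        if (out[i] != b) {
            return 0;
        }
    }
    return 1;
}
```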

Modern x86 cost model

The best reference is the Intel Optimization Manual, which provides fairly detailed information on architectural hazards and instruction latencies for all recent Intel cores, as well as a good number of optimization examples. Another excellent reference is Agner Fog’s optimization resources, which have the virtue of also covering AMD cores. Note that specific cost models …
