Why do compilers duplicate some instructions?

Question

Compilers can perform optimisations that are not obvious to people and removing instructions does not always make things faster.

A small amount of searching shows that various AMD processors have branch prediction problems when a RET is immediately after a conditional branch. By filling that slot with what is essentially a no-op, the performance problem is avoided.

Update:

Example reference, section 6.2 of the “Software Optimization Guide for AMD64 Processors” (see http://support.amd.com/TechDocs/25112.PDF) says:

Specifically, avoid the following two situations:

Any kind of branch (either conditional or unconditional) that has the single-byte near-return RET instruction as its target. See “Examples.”
A conditional branch that occurs in the code directly before the single-byte near-return RET instruction.

It also goes into detail on why jump targets should have alignment which is also likely to explain the duplicate RETs at the end of the function.

Leave a Comment Cancel reply