Why does gcc not emit the specified instruction?
A compiler produces code that must have the observable behavior specified by the Standard. Anything that is not observable can be changed (and optimized) at will, as it does not change the behavior of the program (as specified).
How can you beat it into submission?
The trick is to make the compiler believe that the behavior of the particular piece of code is actually observable.
Since this is a problem frequently encountered in micro-benchmarks, I advise you to look at how (for example) Google Benchmark addresses it. From benchmark_api.h we get:
template <class Tp>
inline void DoNotOptimize(Tp const& value) {
    asm volatile("" : : "g"(value) : "memory");
}
The details of this syntax are boring; for our purpose we only need to know:
- "g"(value) tells the compiler that value is used as input to the statement
- "memory" is a compile-time read/write barrier
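To make the effect concrete, here is a minimal sketch of how such a helper might be used, assuming GCC or Clang (the inline asm syntax is a compiler extension). DoNotOptimize is copied from the snippet above; sum_to is a hypothetical workload invented for illustration.

```cpp
// Copied from the Google Benchmark snippet above (GCC/Clang extended asm).
template <class Tp>
inline void DoNotOptimize(Tp const& value) {
    asm volatile("" : : "g"(value) : "memory");
}

// Hypothetical workload: sums 1..n. Because the running total is passed to
// DoNotOptimize on every iteration, the compiler has to treat each
// intermediate value as used instead of skipping straight to the final one.
inline unsigned sum_to(unsigned n) {
    unsigned total = 0;
    for (unsigned i = 1; i <= n; ++i) {
        total += i;
        DoNotOptimize(total);
    }
    return total;
}
```

The asm statement is empty, so it adds no instructions of its own; it only constrains what the optimizer is allowed to assume.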
So, we can change the code to:
asm volatile("" : : : "memory");
__m128 result = _mm_div_ss(s1, s2);
asm volatile("" : : "g"(result) : );
Which:
- forces the compiler to consider that s1 and s2 may have been modified between their initialization and use
- forces the compiler to consider that the result of the operation is used
There is no need for any flag, and it should work at any level of optimization (I tested it on https://gcc.godbolt.org/ at -O3).
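For anyone who wants to try this in isolation, here is a self-contained sketch of the same idea, assuming x86 with SSE and GCC or Clang. The function name divide_opaque is hypothetical. Note that this is a variation on the code above: instead of the "memory" clobber it lists s1 and s2 as "+x" (read/write, SSE register) operands, which should also cover the case where the operands are plain locals that never live in memory.

```cpp
#include <xmmintrin.h>  // __m128, _mm_set_ss, _mm_div_ss, _mm_cvtss_f32

// Hypothetical helper: scalar SSE division whose operands and result are
// made opaque to the optimizer with empty asm statements.
inline float divide_opaque(float a, float b) {
    __m128 s1 = _mm_set_ss(a);
    __m128 s2 = _mm_set_ss(b);

    // "+x": each value is both read and written by the (empty) statement
    // while held in an SSE register, so the compiler can no longer assume
    // it knows their contents.
    asm volatile("" : "+x"(s1), "+x"(s2));

    __m128 result = _mm_div_ss(s1, s2);

    // "x": the result is an input to the statement, so it counts as used.
    asm volatile("" : : "x"(result));

    return _mm_cvtss_f32(result);
}
```

Calling divide_opaque(6.0f, 3.0f) with constant arguments and inspecting the assembly is a quick way to check whether the division survives at a given optimization level.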