What does `std::kill_dependency` do, and why would I want to use it?

Question

The purpose of memory_order_consume is to prevent the compiler from performing certain optimizations that can break lock-free algorithms. For example, consider this code:

int t;
volatile int a, b;

t = *x;
a = t;
b = t;

A conforming compiler may transform this into:

a = *x;
b = *x;

Thus, a may not equal b if another thread modifies *x between the two reads. The compiler may also do this:

t2 = *x;
// use t2 somewhere
// later
t = *x;
a = t2;
b = t;

By using load(memory_order_consume), we require that uses of the loaded value not be moved before the load itself. In other words,

t = x.load(memory_order_consume);
a = t;
b = t;
assert(a == b); // always true
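
For reference, the fragment above can be written out as a small compilable sketch. It assumes x is a std::atomic<int> (the fragment does not declare it), and the function name copies_match is hypothetical. Note that mainstream compilers currently implement memory_order_consume by treating it as memory_order_acquire, so this illustrates the intent rather than any consume-specific code generation.

```cpp
#include <atomic>
#include <cassert>

// Assumption: x is an atomic int shared with some other thread.
std::atomic<int> x{0};
volatile int a, b;

// Hypothetical helper: performs exactly one atomic load of x and
// copies the loaded value into both a and b.
bool copies_match() {
    int t = x.load(std::memory_order_consume); // single load, never repeated
    a = t;
    b = t;
    return a == b; // both copies come from the same load
}
```

Because t is the result of a single atomic load, the compiler may not re-read x to fill in b, which is the transformation the plain-pointer version permitted.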

The standard document considers a case where you may only be interested in ordering certain fields of a structure. The example is:

r1 = x.load(memory_order_consume);
r2 = r1->index;
do_something_with(a[std::kill_dependency(r2)]);

This instructs the compiler that it is allowed to, effectively, do this:

predicted_r2 = x->index; // unordered load
r1 = x; // ordered load
r2 = r1->index;
do_something_with(a[predicted_r2]); // may be faster than waiting for r2's value to be available

Or even this:

predicted_r2 = x->index; // unordered load
predicted_a  = a[predicted_r2]; // get the CPU loading it early on
r1 = x; // ordered load
r2 = r1->index; // ordered load
do_something_with(predicted_a);

If the compiler knows that do_something_with won’t change the result of the loads for r1 or r2, then it can even hoist it all the way up:

do_something_with(a[x->index]); // completely unordered
r1 = x; // ordered
r2 = r1->index; // ordered

This allows the compiler a little more freedom in its optimization.
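
The standard's three-line example can be fleshed out into a compilable sketch along these lines. The names Record, table, publish_record and lookup are hypothetical, as are the array contents; only the load, the dependent read of index, and the std::kill_dependency call come from the example itself.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical structure published through the atomic pointer x.
struct Record { int index; };

int table[4] = {10, 20, 30, 40};   // plays the role of `a` in the example
std::atomic<Record*> x{nullptr};

// Writer side: fill in the record, then publish the pointer.
void publish_record(Record* r, int index) {
    r->index = index;
    x.store(r, std::memory_order_release);
}

// Reader side: the read of r1->index is ordered after the load of x,
// but the array access is explicitly released from that ordering.
int lookup() {
    Record* r1 = x.load(std::memory_order_consume);
    if (r1 == nullptr) return -1;           // nothing published yet
    int r2 = r1->index;                     // carries a dependency from the load
    return table[std::kill_dependency(r2)]; // dependency chain ends here
}
```

std::kill_dependency returns its argument unchanged, so the program's result is the same with or without it; the only difference is that the returned value no longer carries a dependency, so the access to table need not be ordered after the load of x.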
