What is the difference between using explicit fences and std::atomic?

Question

Although various answers cover bits and pieces of what the potential problem is and/or provide useful information, no answer correctly describes the potential issues for all three cases.

In order to synchronize memory operations between threads, release and acquire barriers are used to specify ordering.
In the diagram, memory operations A in thread 1 cannot move down across the (one-way) release barrier (regardless whether that is a release operation on an atomic store,
or a standalone release fence followed by a relaxed atomic store). Hence memory operations A are guaranteed to happen before the atomic store.
Same goes for memory operations B in thread 2 which cannot move up across the acquire barrier; hence the atomic load happens before memory operations B.

enter image description here

The atomic ptr itself provides inter-thread ordering based on the guarantee that it has a single modification order. As soon as thread 2 sees a value for ptr,
it is guaranteed that the store (and thus memory operations A) happened before the load. Because the load is guaranteed to happen before memory operations B,
the rules for transitivity say that memory operations A happen before B and synchronization is complete.

With that, let’s look at your 3 cases.

Case 1 is broken because ptr, a non-atomic type, is modified in different threads. That is a classical example of a data race and it causes undefined behavior.

Case 2 is correct. As an argument, the integer allocation with new is sequenced before the release operation. This is equivalent to:

// Case 2: atomic var, automatic fence
std::atomic<int*> ptr;
// ...
int *tmp = new int(-4);
ptr.store(tmp, std::memory_order_release);

Case 3 is broken, albeit in a subtle way. The problem is that even though the ptr assignment is correctly sequenced after the standalone fence,
the integer allocation (new) is also sequenced after the fence, causing a data race on the integer memory location.

the code is equivalent to:

// Case 3: atomic var, manual fence
std::atomic<int*> ptr;
// ...
std::atomic_thread_fence(std::memory_order_release);

int *tmp = new int(-4);
ptr.store(tmp, std::memory_order_relaxed);

If you map that to the diagram above, the new operator is supposed to be part of memory operations A. Being sequenced below the release fence,
ordering guarantees no longer hold and the integer allocation may actually be reordered with memory operations B in thread 2.
Therefore, a load() in thread 2 may return garbage or cause other undefined behavior.

Leave a Comment Cancel reply