Acquire/release semantics with 4 threads

You are thinking in terms of sequential consistency, the strongest (and default) memory order. If this memory order is used, all accesses to atomic variables constitute a total order, and the assertion indeed cannot be triggered. However, in this program, a weaker memory order is used (release stores and acquire loads). This means, by definition … Read more
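The program from the question is not reproduced in this excerpt, so the following is only a sketch of the usual four-thread test, with two writers and two readers. The helper name `readers_disagree` and the variables `x` and `y` are hypothetical. With `memory_order_seq_cst` the function must return false; with release stores and acquire loads the standard permits true, although many machines (x86 in particular) will not show it in practice.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs the four-thread test once and reports whether
// the two readers observed the two stores in opposite orders.
bool readers_disagree(std::memory_order store_order,
                      std::memory_order load_order) {
    std::atomic<bool> x{false}, y{false};
    bool r1_saw_x_only = false;
    bool r2_saw_y_only = false;

    std::thread w1([&] { x.store(true, store_order); });
    std::thread w2([&] { y.store(true, store_order); });
    std::thread r1([&] {
        while (!x.load(load_order)) {}          // wait until x is visible
        r1_saw_x_only = !y.load(load_order);    // ... and y is not yet
    });
    std::thread r2([&] {
        while (!y.load(load_order)) {}          // wait until y is visible
        r2_saw_y_only = !x.load(load_order);    // ... and x is not yet
    });

    w1.join(); w2.join(); r1.join(); r2.join();
    // Under sequential consistency both cannot be true, because all
    // atomic accesses form one total order that every thread agrees on.
    return r1_saw_x_only && r2_saw_y_only;
}
```

Calling `readers_disagree(std::memory_order_release, std::memory_order_acquire)` is the weaker variant the answer describes; a false result there says nothing about what the memory model allows.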

Does a memory barrier ensure that the cache coherence has been completed?

The memory barriers present on the x86 architecture (and this is true in general) not only guarantee that all previous loads or stores are completed before any subsequent load or store is executed; they also guarantee that the stores have become globally visible. By globally visible it is meant that other … Read more
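A portable way to see the effect of a full barrier is the store-buffering test, sketched here in C++ rather than x86 assembly. The function name `both_loads_saw_zero` is hypothetical. `std::atomic_thread_fence(std::memory_order_seq_cst)` is the standard-level full fence; on x86 compilers typically implement it with `MFENCE` or a locked instruction, but that mapping is an assumption about the toolchain, not something the excerpt states.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: each thread stores to one variable, issues a full
// fence, then loads the other variable. Returns true if both loads saw 0.
bool both_loads_saw_zero() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // full barrier
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // full barrier
        r2 = x.load(std::memory_order_relaxed);
    });

    t1.join();
    t2.join();
    // With both fences in place, at least one thread must observe the
    // other's store, so this is never true.
    return r1 == 0 && r2 == 0;
}
```

Without the two fences, both loads returning 0 is a permitted outcome, and it is the one reordering that x86 hardware exhibits for ordinary loads and stores.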

Does std::mutex create a fence?

As I understand it, this is covered in 1.10 Multi-threaded executions and data races, para 5: The library defines a number of atomic operations (Clause 29) and operations on mutexes (Clause 30) that are specially identified as synchronization operations. These operations play a special role in making assignments in one thread visible to another. A synchronization … Read more
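A small sketch of what that guarantee buys in practice: a plain, non-atomic counter that is only touched while a `std::mutex` is held. The function name `count_under_mutex` and its parameters are hypothetical. Because each unlock synchronizes with the next lock of the same mutex, every increment is visible to the thread that locks next, and no explicit fence is needed.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: several threads increment a non-atomic counter,
// always under the same mutex, and the final value is returned.
long count_under_mutex(int threads, int iters) {
    long counter = 0;   // plain variable, protected only by the mutex
    std::mutex m;
    std::vector<std::thread> pool;

    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                std::lock_guard<std::mutex> lock(m);  // lock: acquire
                ++counter;
            }                                         // unlock: release
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```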

Can atomics suffer spurious stores?

Your code makes use of fetch_add() on the atomic, which gives the following guarantee: Atomically replaces the current value with the result of arithmetic addition of the value and arg. The operation is read-modify-write operation. Memory is affected according to the value of order. The semantics are crystal clear: before the operation it’s m, after … Read more
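The code from the question is not shown in this excerpt, so here is a minimal sketch of the guarantee on its own. The names `fetch_add_returns_old` and `count_with_fetch_add` are hypothetical. The first function shows that `fetch_add` returns the previous value and leaves the sum behind; the second shows that no increment is lost when several threads add concurrently, even with relaxed ordering.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: a single fetch_add returns the old value m and
// leaves m + arg in the atomic, with no intermediate value stored.
bool fetch_add_returns_old() {
    std::atomic<int> a{5};
    int old = a.fetch_add(3);
    return old == 5 && a.load() == 8;
}

// Hypothetical helper: each increment is one indivisible
// read-modify-write, so the total is exact regardless of interleaving.
int count_with_fetch_add(int threads, int iters) {
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}
```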

Behavior of memory barrier in Java

Doug Lea is right. You can find the relevant part in section §17.4.4 of the Java Language Specification: §17.4.4 Synchronization Order [..] A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where “subsequent” is defined according to the synchronization order). [..] The memory model of the concrete … Read more

When is a compiler-only memory barrier (such as std::atomic_signal_fence) useful?

To answer all 5 questions: 1) A compiler fence (by itself, without a CPU fence) is only useful in two situations: To enforce memory order constraints between a single thread and an asynchronous interrupt handler bound to that same thread (such as a signal handler). To enforce memory order constraints between multiple threads when it is … Read more
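The first situation can be sketched as follows. All names here (`payload`, `ready`, `observed`, `on_signal`, `publish_and_raise`) are hypothetical, and `SIGINT` with `std::raise` is used only so the example is self-contained; a real handler would be triggered asynchronously. The signal fences stop the compiler from reordering the payload access across the flag access, and they emit no CPU fence instruction.

```cpp
#include <atomic>
#include <csignal>

int payload = 0;                            // plain data published to the handler
std::atomic<bool> ready{false};             // assumed lock-free on the target
volatile std::sig_atomic_t observed = -1;   // what the handler managed to read

extern "C" void on_signal(int) {
    if (ready.load(std::memory_order_relaxed)) {
        // Compiler-only barrier: keep the payload read after the flag read.
        std::atomic_signal_fence(std::memory_order_acquire);
        observed = payload;
    }
}

// Hypothetical helper: publishes a value, then delivers the signal to the
// same thread and returns what the handler observed.
int publish_and_raise(int value) {
    std::signal(SIGINT, on_signal);
    payload = value;
    // Compiler-only barrier: keep the payload write before the flag write.
    std::atomic_signal_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
    std::raise(SIGINT);                     // handler runs on this thread
    std::signal(SIGINT, SIG_DFL);           // restore the default action
    return observed;
}
```

If the handler could run on a different thread, `std::atomic_thread_fence` or acquire/release operations on `ready` would be needed instead.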

Atomicity of loads and stores on x86

It sounds like the atomic operations on memory will be executed directly on memory (RAM). Nope, as long as every possible observer in the system sees the operation as atomic, the operation can involve cache only. Satisfying this requirement is much more difficult for atomic read-modify-write operations (like lock add [mem], eax, especially with an … Read more

Does it make any sense to use the LFENCE instruction on x86/x86_64 processors?

Bottom line (TL;DR): LFENCE alone indeed seems useless for memory ordering; however, that does not make SFENCE a substitute for MFENCE. The “arithmetic” logic in the question is not applicable. Here is an excerpt from Intel’s Software Developer’s Manual, volume 3, section 8.2.2 (edition 325384-052US of September 2014), the same that I used in … Read more
