memory-barriers – Tarik Billa

Acquire/release semantics with 4 threads

April 11, 2024 by Tarik

You are thinking in terms of sequential consistency, the strongest (and default) memory order. If this memory order is used, all accesses to atomic variables constitute a total order, and the assertion indeed cannot be triggered. However, in this program, a weaker memory order is used (release stores and acquire loads). This means, by definition … Read more
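As a rough illustration of the distinction drawn above, here is a minimal sketch of a four-thread test of this general shape. The program the excerpt refers to is not shown here, so the function name `run_four_threads`, the variables `x`, `y`, `z`, and the overall layout are assumptions made for illustration only. The sketch uses sequential consistency, the case in which the assertion cannot fire.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical four-thread test (names are illustrative, not from the post).
// Two writers set independent flags; two readers wait for one flag and then
// check the other. Returns how many readers saw the other flag already set.
int run_four_threads() {
    std::atomic<bool> x{false}, y{false};
    std::atomic<int> z{0};
    // Sequential consistency: all atomic accesses form one total order.
    constexpr auto store_order = std::memory_order_seq_cst;
    constexpr auto load_order = std::memory_order_seq_cst;

    std::thread write_x([&] { x.store(true, store_order); });
    std::thread write_y([&] { y.store(true, store_order); });
    std::thread read_x_then_y([&] {
        while (!x.load(load_order)) { std::this_thread::yield(); }
        if (y.load(load_order)) { z.fetch_add(1); }
    });
    std::thread read_y_then_x([&] {
        while (!y.load(load_order)) { std::this_thread::yield(); }
        if (x.load(load_order)) { z.fetch_add(1); }
    });

    write_x.join();
    write_y.join();
    read_x_then_y.join();
    read_y_then_x.join();

    int result = z.load();
    // Under seq_cst one of the two stores is first in the total order, so the
    // reader waiting on the later store must also see the earlier one.
    assert(result != 0);
    return result;
}
```

If `store_order` were weakened to `std::memory_order_release` and `load_order` to `std::memory_order_acquire`, the two readers would no longer be required to agree on which store happened first, so a result of 0 would become a permitted outcome and the assertion could fire. Whether that is ever observed depends on the hardware.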

Memory fences: acquire/load and release/store

January 8, 2024 by Tarik

Say I write some data, and then I write an indication that the data is now ready. It’s imperative that any other thread that sees the indication that the data is ready also sees the write of the data itself. So prior writes cannot move past that write. Say I read that some data is … Read more
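The "write the data, then write the indication" pattern described above can be sketched with a release store and an acquire load. This is a minimal sketch: the function name `publish_and_consume` and the variables `data`, `ready`, and `observed` are hypothetical and chosen only for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: one thread publishes a plain value, another consumes it.
int publish_and_consume() {
    int data = 0;                    // ordinary, non-atomic payload
    std::atomic<bool> ready{false};  // the "data is ready" indication
    int observed = -1;

    std::thread consumer([&] {
        // Acquire: reads after this load cannot move before it.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        observed = data;  // guaranteed to see the producer's write
    });

    std::thread producer([&] {
        data = 42;
        // Release: the write to data cannot move past this store.
        ready.store(true, std::memory_order_release);
    });

    producer.join();
    consumer.join();
    assert(observed == 42);
    return observed;
}
```

The release store keeps the earlier write to `data` from moving after the flag, and the acquire load keeps the later read of `data` from moving before the flag. Together they let a non-atomic payload be handed over safely.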

Does a memory barrier ensure that the cache coherence has been completed?

January 5, 2024 by Tarik

The memory barriers present on the x86 architecture – but this is true in general – not only guarantee that all the previous loads, or stores, are completed before any subsequent load or store is executed – they also guarantee that the stores have become globally visible. By globally visible it is meant that other … Read more
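One way to see the "store is visible before later loads run" property in portable code is the store-buffering test below. It is a sketch under assumptions: it uses `std::atomic_thread_fence(std::memory_order_seq_cst)` as the C++ analogue of a full hardware barrier, not the x86 instructions themselves, and the function name `at_least_one_store_seen` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical store-buffering test: each thread stores to its own flag,
// issues a full fence, then loads the other thread's flag.
bool at_least_one_store_seen() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // full barrier
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // full barrier
        r2 = x.load(std::memory_order_relaxed);
    });

    t1.join();
    t2.join();

    // The two fences are totally ordered, so the thread whose fence comes
    // second must observe the other thread's store.
    bool ok = (r1 == 1 || r2 == 1);
    assert(ok);
    return ok;
}
```

Without the two fences, both loads returning 0 is a permitted outcome, because each store may still be sitting in its core's store buffer when the other thread's load executes.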

Does std::mutex create a fence?

December 28, 2023 by Tarik

As I understand it, this is covered in 1.10 Multi-threaded executions and data races, Para 5: The library defines a number of atomic operations (Clause 29) and operations on mutexes (Clause 30) that are specially identified as synchronization operations. These operations play a special role in making assignments in one thread visible to another. A synchronization … Read more
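To make the quoted paragraph concrete, the sketch below relies only on mutex lock and unlock to make plain assignments in one thread visible to another. The function name `count_with_mutex` and its parameters are hypothetical, introduced just for this illustration.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: several threads increment a plain (non-atomic) counter,
// with every access protected by the same mutex.
long count_with_mutex(int threads, int per_thread) {
    long counter = 0;  // not atomic; safe only because of the mutex
    std::mutex m;
    std::vector<std::thread> workers;

    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                // Locking synchronizes with the previous unlock, so this
                // thread sees the previous holder's write to counter.
                std::lock_guard<std::mutex> lock(m);
                ++counter;
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }

    assert(counter == static_cast<long>(threads) * per_thread);
    return counter;
}
```

No explicit fence appears anywhere in the code. The ordering comes entirely from the unlock and lock operations being synchronization operations.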

Can atomics suffer spurious stores?

December 25, 2023 by Tarik

Your code makes use of fetch_add() on the atomic, which gives the following guarantee: Atomically replaces the current value with the result of arithmetic addition of the value and arg. The operation is a read-modify-write operation. Memory is affected according to the value of order. The semantics are crystal clear: before the operation it’s m, after … Read more
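The guarantee quoted above can be checked with a small sketch: if every `fetch_add(1)` is a single indivisible read-modify-write, then the values it returns across all threads must be exactly 0, 1, 2, ... with no duplicates and no gaps. The function name `tickets_are_unique` and its parameters are hypothetical and not taken from the code in the question.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: each thread records the previous values returned by
// fetch_add; afterwards we check that every value appears exactly once.
bool tickets_are_unique(std::size_t threads, std::size_t per_thread) {
    std::atomic<std::size_t> next{0};
    std::vector<std::vector<std::size_t>> taken(threads);  // one list per thread
    std::vector<std::thread> workers;

    for (std::size_t t = 0; t < threads; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t i = 0; i < per_thread; ++i) {
                // fetch_add returns the value held immediately before the add.
                taken[t].push_back(next.fetch_add(1, std::memory_order_relaxed));
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    assert(next.load() == threads * per_thread);

    std::vector<std::size_t> all;
    for (const auto& v : taken) {
        all.insert(all.end(), v.begin(), v.end());
    }
    std::sort(all.begin(), all.end());
    if (all.size() != threads * per_thread) {
        return false;
    }
    for (std::size_t i = 0; i < all.size(); ++i) {
        if (all[i] != i) {
            return false;  // a duplicate or a gap would show up here
        }
    }
    return true;
}
```

Even with `std::memory_order_relaxed` the check holds, because atomicity of the read-modify-write does not depend on the ordering argument. The ordering only affects how other memory accesses are arranged around it.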

Behavior of memory barrier in Java

September 14, 2023 by Tarik

Doug Lea is right. You can find the relevant part in section §17.4.4 of the Java Language Specification: §17.4.4 Synchronization Order [..] A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where “subsequent” is defined according to the synchronization order). [..] The memory model of the concrete … Read more

When is a compiler-only memory barrier (such as std::atomic_signal_fence) useful?

September 13, 2023 by Tarik

To answer all 5 questions: 1) A compiler fence (by itself, without a CPU fence) is only useful in two situations: To enforce memory order constraints between a single thread and an asynchronous interrupt handler bound to that same thread (such as a signal handler). To enforce memory order constraints between multiple threads when it is … Read more
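The first situation (a thread and a signal handler running on that same thread) might look roughly like the sketch below. Everything named here is an assumption for illustration: `payload`, `ready`, `seen`, `on_signal`, and `publish_to_handler` are hypothetical, and `SIGINT` is used only because it is one of the signals the standard library guarantees to exist. Because `std::raise` runs the handler synchronously, the sketch shows the shape of the pattern and does not reproduce a truly asynchronous interruption.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>

// Signal handlers may only touch lock-free atomics safely.
static_assert(ATOMIC_INT_LOCK_FREE == 2, "sketch assumes lock-free atomic<int>");

static int payload = 0;              // plain data written by the main flow
static std::atomic<int> ready{0};    // flag checked by the handler
static std::atomic<int> seen{-1};    // what the handler observed

extern "C" void on_signal(int) {
    if (ready.load(std::memory_order_relaxed) == 1) {
        // Compiler-only fence: keeps the read of payload after the flag check.
        std::atomic_signal_fence(std::memory_order_acquire);
        seen.store(payload, std::memory_order_relaxed);
    }
}

// Hypothetical driver: publishes payload, then delivers a signal to itself.
int publish_to_handler() {
    auto previous = std::signal(SIGINT, on_signal);
    assert(previous != SIG_ERR);

    payload = 42;
    // Compiler-only fence: keeps the write to payload before the flag store.
    // No CPU fence instruction is needed, since the handler runs on this thread.
    std::atomic_signal_fence(std::memory_order_release);
    ready.store(1, std::memory_order_relaxed);

    std::raise(SIGINT);             // handler returns before raise returns
    std::signal(SIGINT, previous);  // restore the earlier disposition
    return seen.load(std::memory_order_relaxed);
}
```

Calling `publish_to_handler()` returns 42. The two `std::atomic_signal_fence` calls only stop the compiler from reordering across them. That is enough here because a core always observes its own accesses in program order.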

Atomicity of loads and stores on x86

September 11, 2023 by Tarik

It sounds like the atomic operations on memory will be executed directly on memory (RAM). Nope, as long as every possible observer in the system sees the operation as atomic, the operation can involve cache only. Satisfying this requirement is much more difficult for atomic read-modify-write operations (like lock add [mem], eax, especially with an … Read more

Why do I need a memory barrier?

August 10, 2023 by Tarik

Barrier #2 guarantees that the write to _complete gets committed immediately. Otherwise it could remain in a queued state meaning that the read of _complete in B would not see the change caused by A even though B effectively used a volatile read. Of course, this example does not quite do justice to the problem … Read more

Does it make any sense to use the LFENCE instruction on x86/x86_64 processors?

July 13, 2023 by Tarik

Bottom line (TL;DR): LFENCE alone indeed seems useless for memory ordering; however, it does not make SFENCE a substitute for MFENCE. The “arithmetic” logic in the question is not applicable. Here is an excerpt from Intel’s Software Developer’s Manual, volume 3, section 8.2.2 (edition 325384-052US of September 2014), the same that I used in … Read more