How do laziness and parallelism coexist in Haskell?

Yes: GHC’s RTS uses thunks to implement non-strict evaluation, and because a thunk is mutated in place when it is updated with its result, thunks require some synchronisation. However, the problem is simpler than it may sound, because most heap objects are immutable and functions are referentially transparent.

In a multithreaded program, evaluation of a thunk proceeds as follows (a runnable sketch of the observable behaviour appears after this list):

  • When a thread begins evaluating a thunk, it atomically replaces the thunk with a BLACKHOLE object, claiming the thunk for that thread

  • If the same thread attempts to force the thunk again after it’s been updated to a BLACKHOLE, this represents an infinite loop, and the RTS throws a NonTermination exception (printed as <<loop>>)

  • If a different thread attempts to force the thunk while it’s a BLACKHOLE, it blocks until the original thread has finished evaluating the thunk and updated it with a value

  • When evaluation is complete, the original thread atomically replaces the thunk with its result, e.g., using a compare-and-swap (CAS) instruction
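This machinery is internal to the RTS, but its effect is observable from ordinary Haskell. Here is a minimal sketch (library-level code, not RTS internals; the thread count and workload are arbitrary): several threads force one shared thunk, and the Debug.Trace message shows that the expensive computation typically runs only once, while the remaining threads block on the BLACKHOLE and then read the finished value. Compile with ghc -threaded and run with +RTS -N to get real parallelism.

    import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar)
    import Control.Exception (evaluate)
    import Control.Monad (forM_, replicateM)
    import Debug.Trace (trace)

    main :: IO ()
    main = do
      -- One shared thunk; the trace fires each time the thunk body actually runs.
      let shared = trace "evaluating shared thunk" (sum [1 .. 10000000 :: Int])
      dones <- replicateM 4 newEmptyMVar
      forM_ dones $ \done -> forkIO $ do
        v <- evaluate shared  -- the first thread blackholes the thunk; the rest block
        putMVar done v
      results <- mapM takeMVar dones
      print results           -- every thread observes the same value

Under an unlucky schedule the trace can fire more than once, which is exactly the benign race described next.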

So there is a potential race here: if two threads attempt to force the same thunk at roughly the same time, they may both begin evaluating it. In that case they will do some redundant work; however, only one thread will succeed in overwriting the BLACKHOLE with the result, and the other will simply discard the value it calculated, because its CAS will fail. Since the computation is pure, both threads produce the same value, so throwing one copy away is harmless.
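That claim-and-discard step can be modelled at the library level. The following is a toy sketch, not the actual RTS mechanism: atomicModifyIORef' stands in for the CAS, and the Cell type and tryUpdate helper are invented for the illustration. Both workers deliberately duplicate the same pure computation.

    import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar)
    import Control.Exception (evaluate)
    import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

    -- A cell is either still unevaluated or holds a finished result.
    data Cell a = Unevaluated | Done a

    -- Try to install a result; returns True iff our update won the race.
    tryUpdate :: IORef (Cell a) -> a -> IO Bool
    tryUpdate ref x = atomicModifyIORef' ref $ \cell -> case cell of
      Unevaluated -> (Done x, True)   -- we claimed the cell
      done        -> (done,   False)  -- someone beat us; keep their value

    main :: IO ()
    main = do
      cell <- newIORef Unevaluated
      finished <- newEmptyMVar
      let worker name = do
            -- Both threads redundantly do the same pure work.
            result <- evaluate (sum [1 .. 1000000 :: Int])
            won <- tryUpdate cell result
            putMVar finished (name ++ if won then ": update won"
                                             else ": update lost, result discarded")
      _ <- forkIO (worker "thread A")
      _ <- forkIO (worker "thread B")
      takeMVar finished >>= putStrLn
      takeMVar finished >>= putStrLn
      Done v <- readIORef cell  -- the surviving value is the same either way
      print v

Whichever thread loses, the cell ends up holding the same number, which is why the RTS can safely let the loser throw its work away.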

Safe code cannot detect this, because it can’t obtain the address of an object or determine the state of a thunk. And in practice, this type of collision is rare for a couple of reasons:

  • Concurrent code generally partitions workloads across threads in a manner suited to the particular problem, so there is low risk of overlap

  • Evaluation of thunks is generally fairly “shallow” before you reach weak head normal form, so the probability of a “collision” is low (see the sketch after this list)
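To make the second point concrete: forcing a value with seq only evaluates it to its outermost constructor (weak head normal form), leaving everything inside as unevaluated thunks. A minimal demonstration using Debug.Trace:

    import Debug.Trace (trace)

    main :: IO ()
    main = do
      let inner = trace "inner evaluated" (42 :: Int)
          pair  = trace "outer thunk forced" (inner, "hello")
      pair `seq` putStrLn "pair is now in WHNF"   -- prints "outer thunk forced" only
      fst pair `seq` putStrLn "inner forced too"  -- only now does "inner evaluated" fire

Because a thread only ever has to reach the outermost constructor, the window in which two threads can be evaluating the same thunk tends to be short.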

So thunks ultimately provide a good performance tradeoff when implementing non-strict evaluation, even in a concurrent context.
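As an illustration of the first reason above, idiomatic parallel Haskell already partitions work into disjoint chunks, so threads rarely compete for the same thunk in the first place. Here is a small sketch using parListChunk from the parallel package (the chunk size of 1000 is arbitrary); compile with ghc -threaded -O2 and run with +RTS -N:

    import Control.Parallel.Strategies (parListChunk, rdeepseq, using)

    -- A deliberately expensive pure function.
    expensive :: Int -> Int
    expensive n = sum [1 .. n]

    main :: IO ()
    main = do
      -- Each chunk of 1000 list elements is evaluated by its own spark,
      -- so no two threads race to force the same element’s thunk.
      let results = map expensive [1 .. 10000] `using` parListChunk 1000 rdeepseq
      print (sum results)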
