How to tell whether Haskell will cache a result or recompute it?

Question

primes, in your code, is not a function, but a constant, in haskellspeak known as a CAF. If it took a parameter (say, ()), you would get two different versions of the same list back if calling it twice, but as it is a CAF, you get the exact same list back both times;

As a ghci top-level definition, primes never becomes unreachable, thus the head of the list it points to (and thus its tail/the rest of the computation) is never garbage collected. Adding a parameter prevents retaining that reference, the list would then be garbage collected as (!!) iterates down it to find the right element, and your second call to (!!) would force repetition of the whole computation instead of just traversing the already-computed list.

Note that in compiled programs, there is no top-level scope like in ghci and things get garbage collected when the last reference to them is gone, quite likely before the whole program exits, CAF or not, meaning that your first call would take long, the second one not, and after that, “the future of your program” not referencing the CAF anymore, the memory the CAF takes up is recycled.

The primes package provides a function that takes an argument for (primarily, I’d claim) this very reason, as carrying around half a terabyte of prime numbers might not be what one wants to do.

If you want to really get down to the bottom of this, I recommend reading the STG paper. It doesn’t include newer developments in GHC, but does a great job of explaining how Haskell maps onto assembly, and thus how thunks get eaten by strictness, in general.

Leave a Comment Cancel reply