The performance penalties for types/constraints in Raku?

Question

Raku mandates that type constraints written into the program are enforced at runtime at latest. How that promise is kept is up to the compiler and runtime implementer. I’ll discuss how the Rakudo (compiler) and MoarVM (runtime) pairing does it, because that’s what I’ve worked on.

The initial compilation itself does rather little in terms of analysis to eliminate type checks, and thus the bytecode we produce has a lot of type checks in it. The bet being made here is that analysis takes time, only some of the code will actually find itself on a hot path (or for very short scripts, there is no hot path), so we might as well leave it to the VM to figure out what’s hot, and then focus on those bits.

The VM does the typical profiling a modern runtime does, not only recording what code is hot, but also recording statistics on parameter types, return types, lexical types, and so forth. Despite the amount of potential dynamism that could occur, in a given application the reality is that a huge amount of code is monomorphic (only ever sees one type, or for a routine, one argument type tuple). Another bunch is polymorphic (sees a few different types), and a comparatively tiny amount is megamorphic (loads of types).

Based on the data it gets, the runtime produces specializations: versions of the code compiled based on assumptions about what exact types will show up. Guarding against exact types is cheaper than having to care for subtyping relations and so forth. So at this point we’ve got a version of the code where we have some cheap preconditions up front, and we’ve used them to eliminate the more costly type checks (as well as some extra guards scattered through the code replacing other type checks). However, this isn’t really free…yet.

When calls are made, one of two things can happen:

For small callees, inlining takes place. We inline a specialization of the callee. If the knowledge of types in the caller is already sufficient to prove the type assumptions – which it often is – then there’s no need for any guarding. Essentially, type checks in the callee just became free. We can inline multiple levels deep. Furthermore, inlining lets us trace data flows through the callee, which may let us eliminate further guards, for example about return value types in the callee.
For larger callees, we can perform specialization linking – that is, calling a specialization directly and bypassing its guards, because we can use the type knowledge in the caller to prove we’re meeting the guard assumptions. Again, the callee parameter type checks thus become free.

But what about type-y things that aren’t calls, such as return value type checks and assignments? We compile those as calls too, so we can re-use the same machinery. For example, a return type check, in the case it’s monomorphic (often), turns into a guard + a call to the identity function, and whenever we can prove the guard, that just turns into the identity function, which is a trivial inline.

There’s more to come yet. Of note:

The mechanisms I’ve described above are built around various kinds of caches and guard trees and it’s not all quite so beautiful as I’ve made it sound. Sometimes one needs to build ugly to learn enough to know how to build nice. Thankfully, a current bunch of work is folding all of these learnings into a new, unified, guard and dispatch mechanism, which will also take on various aspects of the language that are very poorly optimized today. That’s due to land in releases within a couple of months.
The current runtime already does some very limited escape analysis and scalar replacement. This means that it can trace data flows into short-lived objects, and thus find even more type checks to eliminate (on top of having eliminated the memory allocations). There’s work underway to make it more powerful, providing partial escape analysis, transitive analysis in order to scalar replace entire object graphs and thus be able to trace data flows, and so types, through them.

Last year, a paper titled Transient typechecks are (almost) free was published. It’s not about Raku/Rakudo/MoarVM at all, but it’s the closest description I’ve seen in academic literature to what we’re doing. That was the first time I realized that maybe we are doing something kinda innovative in this area. 🙂

Leave a Comment Cancel reply