What are the problems with an ADT encoding that associates types with data constructors? (Such as Scala.)

Question

To the best of my knowledge, there are two reasons why Scala’s idiomatic encoding of case classes can be bad: type inference, and type specificity. The former is a matter of syntactic convenience, while the latter is a matter of increased scope of reasoning.

The type inference issue, which arises because every case is its own subtype, is relatively easy to illustrate:

val x = Some(42)

The type of x turns out to be Some[Int], which is probably not what you wanted: you almost certainly meant Option[Int]. You can generate similar issues in other, more problematic areas:

sealed trait ADT
case class Case1(x: Int) extends ADT
case class Case2(x: String) extends ADT

val xs = List(Case1(42), Case1(12))

The type of xs is List[Case1]. This is almost guaranteed not to be what you want; you wanted List[ADT]. In order to get around this issue, containers like List need to be covariant in their type parameter. Unfortunately, covariance introduces a whole bucket of issues, and in fact degrades the soundness of certain constructs (e.g. Scalaz compromises on its Monad type and several monad transformers by allowing covariant containers, despite the fact that it is unsound to do so).
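The inference behaviour described above can be reproduced with an invariant container. The following is only a minimal sketch: Box and InferenceDemo are hypothetical names introduced for illustration, and the lines that fail to compile are left as comments so that the rest runs.

```scala
sealed trait ADT
final case class Case1(x: Int) extends ADT
final case class Case2(x: String) extends ADT

// Hypothetical invariant container, used only for illustration.
final case class Box[A](value: A)

object InferenceDemo {
  // Inferred as Box[Case1], because Case1.apply returns Case1 rather than ADT.
  val narrow = Box(Case1(42))

  // val bad: Box[ADT] = narrow   // does not compile: Box is invariant in A

  // Workaround 1: ascribe the element to the ADT type at the construction site.
  val wide1: Box[ADT] = Box(Case1(42): ADT)

  // Workaround 2: supply the type argument explicitly.
  val wide2 = Box[ADT](Case1(42))

  // The same effect with Option: seeding the fold with Some(0) would fix the
  // accumulator type at Some[Int] (in Scala 2 at least), and acc.map returns
  // Option[Int], so the seed is widened here with Option(0).
  val total: Option[Int] =
    List(1, 2, 3).foldLeft(Option(0)) { (acc, n) => acc.map(_ + n) }
}
```

Both workarounds are the “annotate your types” approach: the cases stay public, and the caller pays for it at each construction site.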

So, encoding ADTs in this fashion has a somewhat viral effect on your code. Not only do you need to deal with subtyping in the ADT itself, but every container you ever write needs to take into account the fact that you’re landing on subtypes of your ADT at inopportune moments.

The second reason not to encode your ADTs using public case classes is to avoid cluttering up your type space with “non-types”. From a certain perspective, ADT cases are not really types: they are data. If you reason about ADTs in this fashion (which is not wrong!), then having first-class types for each of your ADT cases increases the set of things you need to carry in your mind to reason about your code.

For example, consider the ADT algebra from above. If you want to reason about code which uses this ADT, you need to be constantly thinking about “well, what if this type is Case1?” That’s just not a question anyone really needs to ask, since Case1 is data. It’s a tag for a particular coproduct case. That’s all.

Personally, I don’t care much about any of the above. I mean, the unsoundness issues with covariance are real, but I generally just prefer to make my containers invariant and instruct my users to “suck it up and annotate your types”. It’s inconvenient and it’s dumb, but I find it preferable to the alternative, which is a lot of boilerplate folds and “lower-case” data constructors.
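For comparison, here is a rough sketch of what the alternative with folds and “lower-case” data constructors might look like. The names fold, onCase1, onCase2, case1, case2 and FoldDemo are assumptions made for this example, not a standard API.

```scala
sealed trait ADT {
  // The fold is the only way for outside code to inspect a value.
  def fold[A](onCase1: Int => A, onCase2: String => A): A
}

object ADT {
  // The cases are private, so Case1 and Case2 never show up in types
  // inferred outside this object.
  private final case class Case1(x: Int) extends ADT {
    def fold[A](onCase1: Int => A, onCase2: String => A): A = onCase1(x)
  }
  private final case class Case2(x: String) extends ADT {
    def fold[A](onCase1: Int => A, onCase2: String => A): A = onCase2(x)
  }

  // "Lower-case" constructors whose result type is the ADT itself.
  def case1(x: Int): ADT = Case1(x)
  def case2(x: String): ADT = Case2(x)
}

object FoldDemo {
  // Inferred as List[ADT], with no annotation and no reliance on covariance.
  val xs = List(ADT.case1(42), ADT.case1(12))

  val rendered: List[String] =
    xs.map(_.fold(i => s"int $i", s => s"string $s"))
}
```

The cost is visible even at this size: every case needs a constructor method and an implementation of fold, which is the boilerplate being weighed against the annotations.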

As a wildcard, a third potential disadvantage to this sort of type specificity is that it encourages (or rather, allows) a more “object-oriented” style where you put case-specific functions on the individual ADT types. I think there is very little question that mixing your metaphors (case classes vs subtype polymorphism) in this way is a recipe for bad. However, whether or not this outcome is the fault of typed cases is sort of an open question.
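A small sketch of the two styles side by side may make the concern concrete. The method doubled, the function describe and the object StyleDemo are hypothetical and exist only for this illustration.

```scala
sealed trait ADT
final case class Case1(x: Int) extends ADT {
  // Hypothetical case-specific method: callable only when the static type is Case1.
  def doubled: Int = x * 2
}
final case class Case2(x: String) extends ADT

object StyleDemo {
  val c = Case1(21)               // inferred as Case1, so the call below compiles
  val viaMethod: Int = c.doubled

  val a: ADT = Case1(21)
  // a.doubled                    // does not compile: doubled is not a member of ADT

  // The same logic written as a function over the whole ADT by pattern matching.
  def describe(adt: ADT): String = adt match {
    case Case1(x) => s"int ${x * 2}"
    case Case2(x) => s"string $x"
  }
}
```

Whether c.doubled is available depends on which static type inference happened to pick, which is exactly where the typed cases and the object-oriented style interact.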
