Stefan's Site

Types Representing Meaning Instead of Storage


I have a broad background in engineering not being raised in computer science. I am mostly autodidact in programming and software engineering which makes me sometimes think in other ways compared to genuine computer science people. During the past years, I have been jumping on the functional programming train and started learning the beauties of the constructions there in a similar way as I did with object oriented programming during the 1990s.

Recently I worked with a web based application depending on a relatively complicated database. The unique keys of the ables were represented as integers, and their names were in some cases a bit too similar to each other.

For this particular project, I was using ScalaJS, which is a pragmatic approach to learning functional programming while having the opportunity to write imperative-style object oriented code at the same time. Scala also has a powerful type system. Based on my Scala knowledge back then I ended up representing the unique keys as type aliases:

type ServerId = Long

The ServerId alias served its purpose well as a code annotation but did not make the compiler assist when I started sending wrong identifiers to functions resulting in strange runtime errors and corrupted database relations.

I focused too much on the actual representation of the identifier (Long) when I should have tried to express the meaning instead. At the time I had not come across Value Classes yet, and I looked jealously at the newtype construction of Haskell. One solution would have been to box the Long into a trait with the advantage of being able to express which identifiers are similar and which are not through a class hierarchy. But for identifiers, that type of complexity is usually not needed since an identifier either represents something or not. Also, boxing a value in a class results in some (possibly negligible) overhead.

Value Classes

Then I found out about value classes. Suddenly I was able to give contextual meaning to a zero overhead representation of a storage type. The compiler would just not allow me to mix these types at all. Now I could utilize my unique keys as:

case class UserId(id: Long) extends AnyVal
case class GroupId(id: Long) extends AnyVal

case class SomeDbRow(user: UserId, group: GroupId, value: String)

The compiler does not let me put a UserId where a GroupId is expected, and vice versa since they are handled as different incompatible types. If I would be using type aliases the compiler would not stop me. For the Java people, it seems as if you will get value types in version 10.

Namespaced Keywords

Next, I moved on to Clojure. I found parts of the language simple and beautiful (there is something special about LISP) but as the program grew I started missing the type system from Scala more and more. My experience from working with Python taught me that I started losing grip of the program around 3000 lines of code. Scala in combination with IntelliJ did free a lot of my cognitive resources by not having to keep track of names and types of fields.

I know Rich Hickey (the creator of Clojure) has been talking about simplicity and limited cognitive capability of humans as a motivation for not introducing a type system in Clojure. Sure, working with dynamically typed data structures is productive to start with and designing similar data structures in Scala requires you to define your data structures explicitly (consuming cognitive functions), but my experience is that being explicit about definitions pays off in the long run as software grows. In that sense getting assistance from IntelliJ frees some of my cognitive resources.

So, the dynamic type system of Clojure did not seem to be able to help me much to avoid mixing up identifiers. But the liberal naming conventions of keywords makes it possible to use namespaced keywords like :user/id and :group/id instead of just :id. Using this in combination with schema or spec can at least give you sensible runtime errors without trying to run as hard into a wall as possible as JavaScript does.

Units of Measure

In engineering, we usually do not talk about storage space for values, but rather its unit of measure. A value can take several different numerical values which, if converted to the same unit of measure, are identical, i.e. they describe the same thing. However, units of measure do not give the value a particular context, like "this is the diameter of my bicycle rim" versus "this is the depth of the Mariana Trench." Both of these measures can be given in units representing length, but those units alone does not tell us "the length of what?"

There are surprisingly few programming languages which incorporate units of measure natively. F# is one language which has included the possibility to define values with types for units of measure. F# also has the ability to convert values between compatible units automatically and infer the resulting unit of some calculation.

Dependent Types

Dependent type systems (like in Coq, Agda and Idris can, for example, express matrix dimensions as types in linear algebra like:

matrixDot : Matrix n m x -> Matrix m k y -> Matrix n k z

which in math notation would be \(x_{n,m} \cdot y_{m,k} = z_{n,k}\) where \(x\), \(y\) and \(z\) are matrices with their dimensionality in their respective suffix notation. This gives the compiler the opportunity to stop us from applying functions to matrices of incompatible dimensionality.


To conclude I think dependent type systems is a step in the right direction, but more programming languages need to include units of measure into their type systems. I believe a good type system should be able to express:

  1. Storage of types, like int32, double, uint8, etc.
  2. The context of types, like identifiers, domains for values.
  3. Units of measure of a type, like a length, volume, and mass.
  4. Properties of the type, like matrix dimensions and general dependent types.

If all of these four traits are included and used in a sensible way many potential problems can be identified at compile time rather than during runtime. And when combined they give values a deeper meaning than just telling us how much memory it will consume.