Variants/case classes/algebraic data types/sums/oh my!

Tue Jun 7 14:05:47 UTC 2016

Hello!

As a Java developer with a background in functional programming and
type systems, I often (which is to say, more or less constantly) find
myself writing emulations of algebraic data types to solve day-to-day
problems in all of my Java projects.

For those that don't know what an algebraic data type is, you may
possibly know them by one of the other names often used:

  * Variant types
  * Case classes (from Scala, Kotlin, Ceylon, etc)
  * Sum types (a term more common in literature than languages)

For those that still don't know them, I have a somewhat old article
here aimed at Java programmers that will probably serve as a reasonable
introduction:

  http://io7m.com/documents/ccat/

I won't spend a great deal of time advocating for their use here.
Suffices to say that they are a very fundamental part of the type
systems such as Haskell, O'Caml, etc, and are used to great effect in
writing very concise code that enjoys many safety and correctness
properties. The types themselves have also been adopted by the majority
of statically typed JVM languages with varying degrees of completeness:

  http://io7m.com/documents/adt-jvm/

An excellent video on how Jane Street use algebraic data types to write
software that absolutely must work correctly is Yaron Minsky's
Effective ML:

  https://www.youtube.com/watch?v=DM2hEBwEWPc

The section at 18 minutes ("Make illegal states unrepresentable") is
where the discussion on these types begins.

Currently, almost all of the statically typed alternative JVM languages
implement some form of algebraic data types. Languages such as Scala
and Frege support them as completely as languages such as Haskell, with
full structural pattern matching. Languages such as Kotlin and Ceylon
implement a more limited form known as case analysis.

There are also numerous library implementations of the types in Java,
but they all have various shortcomings due to lacking the required
support from both the virtual machine and the Java language itself.

I believe that all of the existing languages could benefit from both
JVM and Java language support for the types, and I believe that support
could be provided with minimal, non-intrusive changes to the virtual
machine (an extra piece of class metadata, no new bytecode
instructions, a couple of extra syntax rules in the Java language that
likely build upon the existing switch statement, and no new keywords).

The reason for the additional virtual machine support is that I would
hope that sooner or later the various language implementations would
standardize on an internal representation for the types so that, for
example, a type declared in Kotlin could be used from Java in the same
manner that it could be in Kotlin, with the same static guarantees.
Currently, each language implements the types using proprietary
metadata stored as annotations, the types cannot interoperate properly
between languages.

The reason for the Java language support is because currently, library
implementations of the types have to rely on rather clumsy Visitor-based
approaches, and incur a cost in both performance and ease of use.

I'm writing to this list because I believe there is significant overlap
with the work being done on value types, as algebraic data types are
almost always used in a value-typeish context. Naturally, additions to
the language and VM will overlap other projects, but I had to start
somewhere!

I recently wrote a very rough and low-detail proposal in a GitHub
comment for a project that is looking to implement yet another
generator for pseudo algebraic data types in Java:

  https://github.com/immutables/immutables/issues/47#issuecomment-221533004

I'm hoping that this can be the starting point for discussion. As with
all language proposals: none of the syntax is final and is obviously
subject to changes/improvements/outright destruction.

Regards,
Mark