Project Valhalla: Goals

Fri Oct 14 19:23:21 UTC 2016

I've heard a number of people describe Valhalla recently as being 
"primarily about performance".  While it is understandable why people 
might come to that conclusion -- many of the motivations for Valhalla 
are, in fact, rooted in performance considerations -- this 
characterization misses something very important.  Yes, performance is 
an important part of the story -- but so are safety, abstraction, 
encapsulation, expressiveness, maintainability, and compatible library 
evolution.

The major goals of Valhalla are:

  - Align JVM memory layout behavior with the cost model of modern hardware;
  - Extend generics to allow abstraction over all types, including 
primitives, values, and even void;
  - Enable existing libraries -- especially the JDK -- to compatibly 
evolve to fully take advantage of these features.

Let's take these in turn.

*1. **Align JVM memory layout behavior with the cost model of modern 
hardware. *

Java's approach of "(almost) everything is an object" was a reasonable 
match for the hardware and compilation technology of the mid-nineties, 
when the cost of a memory fetch and an arithmetic operation were of 
comparable magnitude.  But since then, these costs have diverged by a 
factor of several hundred. At the same time, memory costs have come to 
dominate application provisioning costs.  These two considerations 
combine to make the graph-of-small-objects data representation, which 
typical Java programs result in, suboptimal in both program performance 
and provisioning cost.

The root cause of this is a partly accidental one: object identity.  
Every object has an identity, but not all objects /need/ an identity -- 
many objects represent values, such as decimal numbers, currency 
amounts, cursors, or dates and times, and do not need this identity.  
This need to preserve identity foils many otherwise powerful optimizations.

Our solution for this is /value types/; value types are aggregates, like 
traditional Java classes, that renounce their identity.  In return, this 
enables us to create data structures that are /flatter/ (because values 
can be inlined into objects, arrays, and other values, just as 
primitives are today) and /denser/ (because we don't waste space on 
object headers and pointers, which can increase memory usage by up to 
4x), with a programming model more like objects -- supporting nominal 
substructure, behavior, subtyping, and encapsulation. /Codes like a 
class, works like an int./

If you view values as being "faster objects", you could indeed view this 
fundamental aspect of Valhalla as being primarily about efficiency.  But 
equally, you could view them as "programmable primitives", in which case 
it also becomes about better abstraction, encapsulation, readability, 
maintainability, and type safety.

Which is the real point -- that we need not force users to choose 
between abstraction/encapsulation/safety and performance.  We can have 
both.  Whether you call that "cheaper objects" or "richer primitives", 
the end result is the same.

*2. **E**xtend generics to allow abstraction over all types, including 
primitives, values, and even void.**
*
Generics are currently limited to abstracting only over reference 
types.  Sometimes this is merely a performance cost (one can always 
appeal to boxing), but in reality this not only increases the cost, but 
decreases the expressiveness, of libraries.

Methods like Arrays.fill() have to be written nine times; this is nine 
methods to write, nine methods to test, and nine methods for the user to 
wade through in the docs.

Real-world libraries like Streams often resort to hand-coded 
specializations like IntStream; not only is this an inconvenience for 
the writer, but was also a significant constraint in the design of the 
stream library, forcing some undesirable tradeoffs. And this approach 
rarely provides total coverage; users complain that we have IntStream 
and LongStream but not CharStream.  (For generic types with multiple 
type variables, like Map, the number of specializations starts to 
explode out of control.)  The functional interfaces introduced in Java 8 
are another consequence of the limitations of generics; because generics 
are boxed and erased, we had to provide a large number of 
hand-specialized versions (and still, not all the ones people want.)  
You don't need Predicate if you have can simply say Function<T, boolean> 
and not suffer boxing or erasure.

Everyone would be better off if we could write a generic class or method 
once -- and abstract over all the possible data types, not just 
reference types.  (This includes not only primitives and values, but 
also void.  Treating void uniformly is no mere "abstract all the 
things"; HashSet is based on HashMap, for implementation convenience, 
but suffers a lot of wasted space as a result; we could use a HashMap<T, 
void>.  And the XxxConsumer functional interfaces are really just 
Function<T, void> -- we don't need separate abstractions here.)

Being able to write things once -- rather than having an ad-hoc 
explosion of types and implementations (which often propagates into 
further explosion at the client site) -- means simpler, more expressive, 
more regular, more testable, more composible libraries.  Without giving 
up performance when dealing with primitives and values, as boxing does 
today.

Which is to say, again, just at the data-abstraction layer instead of 
the data-modeling layer: we need not force users to choose between 
abstraction/encapsulation/safety and performance.

*3. **Enable existing libraries -- especially the JDK -- to compatibly 
evolve to fully take advantage of these features.**
*
The breadth and quality of Java libraries is one of the core assets of 
the Java ecosystem.  We don't want people to have to replace their 
libraries to migrate to a value-ful world; nor do we want existing 
libraries to be "frozen in time." (Imagine if we did lambdas and 
streams, but didn't do default methods, that allowed the Collection 
classes to evolve to take advantage of them -- Collections would have 
instantly looked ten years older.) There should be a straightforward 
path to extending existing libraries -- especially core JDK libraries -- 
to supporting values and enhanced generics, in a way that makes them 
"look built in".  This may require additional linguistic tools to allow 
libraries to evolve while providing compatibility with older clients and 
subclasses that have not yet migrated.

Just providing these features for new libraries is not enough; if widely 
used libraries can't compatibly evolve to take advantage of this new 
world, they are effectively consigned to a slow death. This might be OK 
for some classes -- deprecating OldX and replacing it with NewX -- but 
only when uses of the OldX types aren't strewn through lots of other 
libraries.  That rules out starting fresh with Collections, Streams, 
JSR-310, and many other abstractions, without rewriting the whole JDK 
(and many third party libraries).  So instead, we have to provide a 
compatible path for such libraries (and their clients) to modernize.

So, to summarize: Valhalla may be motivated by performance 
considerations, but a better way to view it as enhancing abstraction, 
encapsulation, safety, expressiveness, and maintainability -- /without/ 
giving up performance.