From brian.goetz at oracle.com Fri Oct 14 19:23:21 2016 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 14 Oct 2016 15:23:21 -0400 Subject: Project Valhalla: Goals Message-ID: <9dbb7b64-ac60-9357-2f02-5f5a5d657e24@oracle.com> I've heard a number of people describe Valhalla recently as being "primarily about performance". While it is understandable why people might come to that conclusion -- many of the motivations for Valhalla are, in fact, rooted in performance considerations -- this characterization misses something very important. Yes, performance is an important part of the story -- but so are safety, abstraction, encapsulation, expressiveness, maintainability, and compatible library evolution. The major goals of Valhalla are: - Align JVM memory layout behavior with the cost model of modern hardware; - Extend generics to allow abstraction over all types, including primitives, values, and even void; - Enable existing libraries -- especially the JDK -- to compatibly evolve to fully take advantage of these features. Let's take these in turn. *1. **Align JVM memory layout behavior with the cost model of modern hardware. * Java's approach of "(almost) everything is an object" was a reasonable match for the hardware and compilation technology of the mid-nineties, when the cost of a memory fetch and an arithmetic operation were of comparable magnitude. But since then, these costs have diverged by a factor of several hundred. At the same time, memory costs have come to dominate application provisioning costs. These two considerations combine to make the graph-of-small-objects data representation, which typical Java programs result in, suboptimal in both program performance and provisioning cost. The root cause of this is a partly accidental one: object identity. Every object has an identity, but not all objects /need/ an identity -- many objects represent values, such as decimal numbers, currency amounts, cursors, or dates and times, and do not need this identity. This need to preserve identity foils many otherwise powerful optimizations. Our solution for this is /value types/; value types are aggregates, like traditional Java classes, that renounce their identity. In return, this enables us to create data structures that are /flatter/ (because values can be inlined into objects, arrays, and other values, just as primitives are today) and /denser/ (because we don't waste space on object headers and pointers, which can increase memory usage by up to 4x), with a programming model more like objects -- supporting nominal substructure, behavior, subtyping, and encapsulation. /Codes like a class, works like an int./ If you view values as being "faster objects", you could indeed view this fundamental aspect of Valhalla as being primarily about efficiency. But equally, you could view them as "programmable primitives", in which case it also becomes about better abstraction, encapsulation, readability, maintainability, and type safety. Which is the real point -- that we need not force users to choose between abstraction/encapsulation/safety and performance. We can have both. Whether you call that "cheaper objects" or "richer primitives", the end result is the same. *2. **E**xtend generics to allow abstraction over all types, including primitives, values, and even void.** * Generics are currently limited to abstracting only over reference types. Sometimes this is merely a performance cost (one can always appeal to boxing), but in reality this not only increases the cost, but decreases the expressiveness, of libraries. Methods like Arrays.fill() have to be written nine times; this is nine methods to write, nine methods to test, and nine methods for the user to wade through in the docs. Real-world libraries like Streams often resort to hand-coded specializations like IntStream; not only is this an inconvenience for the writer, but was also a significant constraint in the design of the stream library, forcing some undesirable tradeoffs. And this approach rarely provides total coverage; users complain that we have IntStream and LongStream but not CharStream. (For generic types with multiple type variables, like Map, the number of specializations starts to explode out of control.) The functional interfaces introduced in Java 8 are another consequence of the limitations of generics; because generics are boxed and erased, we had to provide a large number of hand-specialized versions (and still, not all the ones people want.) You don't need Predicate if you have can simply say Function and not suffer boxing or erasure. Everyone would be better off if we could write a generic class or method once -- and abstract over all the possible data types, not just reference types. (This includes not only primitives and values, but also void. Treating void uniformly is no mere "abstract all the things"; HashSet is based on HashMap, for implementation convenience, but suffers a lot of wasted space as a result; we could use a HashMap. And the XxxConsumer functional interfaces are really just Function -- we don't need separate abstractions here.) Being able to write things once -- rather than having an ad-hoc explosion of types and implementations (which often propagates into further explosion at the client site) -- means simpler, more expressive, more regular, more testable, more composible libraries. Without giving up performance when dealing with primitives and values, as boxing does today. Which is to say, again, just at the data-abstraction layer instead of the data-modeling layer: we need not force users to choose between abstraction/encapsulation/safety and performance. *3. **Enable existing libraries -- especially the JDK -- to compatibly evolve to fully take advantage of these features.** * The breadth and quality of Java libraries is one of the core assets of the Java ecosystem. We don't want people to have to replace their libraries to migrate to a value-ful world; nor do we want existing libraries to be "frozen in time." (Imagine if we did lambdas and streams, but didn't do default methods, that allowed the Collection classes to evolve to take advantage of them -- Collections would have instantly looked ten years older.) There should be a straightforward path to extending existing libraries -- especially core JDK libraries -- to supporting values and enhanced generics, in a way that makes them "look built in". This may require additional linguistic tools to allow libraries to evolve while providing compatibility with older clients and subclasses that have not yet migrated. Just providing these features for new libraries is not enough; if widely used libraries can't compatibly evolve to take advantage of this new world, they are effectively consigned to a slow death. This might be OK for some classes -- deprecating OldX and replacing it with NewX -- but only when uses of the OldX types aren't strewn through lots of other libraries. That rules out starting fresh with Collections, Streams, JSR-310, and many other abstractions, without rewriting the whole JDK (and many third party libraries). So instead, we have to provide a compatible path for such libraries (and their clients) to modernize. So, to summarize: Valhalla may be motivated by performance considerations, but a better way to view it as enhancing abstraction, encapsulation, safety, expressiveness, and maintainability -- /without/ giving up performance.