Towards a plan for L10 and L20

Sat Mar 9 16:56:07 UTC 2019

I’ve been pulling together a rough draft of high-level requirements for the L10 and L20 milestones.  These are mostly through the lens of the programming model, but they indirectly affect VM requirements (usually in obvious ways).

(Note to observers: this is mostly capturing the active plan, rather than the rationale and justification for these decisions, which will follow in a separate document.)

# Towards requirements for L10 and L20 milestones

The last year has been a tremendous one for Project Valhalla; the
L-World prototype has proven more successful than we could have hoped,
and building on the success of L-World, the end-to-end plan for
specialized generics with gradual migration compatibility has started
to come into full view.  It's time to put a stake in the ground and
talk about some deliverables.

As L-World started to look like it was going to succeed, we identified
three buckets, into which features could be sorted, as a structuring
mechanism:

  - L1 -- First usable prototype of L-World; now delivered
  - L10 -- First preview-able milestone; this would include the
    ability to declare value classes which can be flattened into
    arrays and objects,  and instantiate _erased_ generics over values
  - L100 -- Specializable generics over values, and migrating key generic
    classes (e.g., Collections and Streams)

These constituted not so much a plan, as a recognition that there were
several significant phases ahead of us.  Let's put some meat on these
phases, and carve things a little finer.  

## L10 -- First previewable delivery

Nontrivial language and JVM features typically go through a round of
[_preview_][jep12], in which they are part of the specification and
implementation but behind a flag and with the possibility of
refinement before they become a permanent part of the platform.  For
a feature as significant as value types, we will surely want to
preview it before finalization.

A preview feature should be complete; in the context of Valhalla, this
means that a complete and self-consistent programming model is needed;
features should not look like they are "bolted on".  

The main feature of L10 is the ability to declare and use value
classes; we can think of the remaining features as the "closure" of
adding this feature to Java, which is to say, all the other things we
have to add in to arrive at a sensible, self-consistent,
understandable programming model.

First and foremost, a value class is a class, and is declared like
one:

```{.java}
value class Point {
    int x;
    int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}
```

(A modifier is not the only option; this could be indicated by a
choice of supertype as well; syntax is all TBD.)   

Value types are non-nullable; each declared value class `V` gives rise
to two types: `V`, and its nullable counterpart, `V?`.  The value set
of `V?` is the union of the value set of `V`, with the singleton set
`{ null }`.  For a value class `V`, we get the following subtype
relationships:

    V <: V? <: ValObject <: Object

In addition, if value class `V` implements interface `I`, then `V <:
I`.

Value classes have some restrictions, which are enforced by both the
compiler and runtime:

  - They cannot extend any other class
  - They are implicitly final
  - Their fields are implicitly final
  - Their instances are non-nullable
  - They cannot be synchronized on, nor used with other
    identity-sensitive operations, such as `Object.wait()`

The type `V` is translated as `QV;`; the type `V?` is translated as
`LV;`.

#### Object model

We add two new "top" types to the object model; `java.lang.RefObject`
and `java.lang.ValObject` (names to be bikeshod.)  All "regular"
classes ("identity classes") are subtypes of `RefObject`; all value
types are subtypes of `ValObject`.  (Interfaces can not be subtypes of
either `RefObject` or `ValObject`, but interfaces can be implemented
by both value classes and identity classes.  It is often helpful to
think of `Object` as being an "honorary" interface.)  Whether
`RefObject` is a class or interface is also currently an open issue.

While it is disruptive to retrofit new top types into the hierarchy,
there are several good reasons for doing so.  The first is that the
object hierarchy is a powerful pedagogical tool -- not only is
"everything an object", but "everything is an `Object`."  Users
learning the language, upon learning that there are identity objects
and value objects, will see this division prominently reflected in the
root types of the object hierarchy.

Another reason is that there will likely be behavior which is common
to all identity objects, or to all value objects, and these types
represent a sensible place to declare that behavior, using tools
(static methods, final methods, etc) that users already understand.

The final (and perhaps most important) reason is that it lets us talk
about this important distinction in the language.  Having these as
types means users can dynamically test whether something is a value
object or identity object when they need to:

```
if (x instanceof RefObject) { ... }
if (y instanceof ValObject) { ... }
```

Methods that rely on identity can declare this in their signature:

```
void m(RefObject o) { ... }
```

And generic classes that only make sense to be instantiated with
reference types can do so in the standard way:

```
class Foo<T extends RefObject> { ... }
```

For all the same reasons, we will want to reflect nullability in the
type system (such as with an interface type `Nullable`, which would be
implemented by all identity types, plus nullable value types.)

Unlike primitives, and unlike earlier Valhalla designs, there _are no
box types_.  `V?` is not a box for `V`; boxes serve to connect
non-Object values to `Object`, but values _already_ are `Object`s.

#### Intrinsic operations

Being objects, value types inherit all the members of `Object`, and we
must provide sensible default behaviors for them.  For identity
objects, the default behaviors of `equals()`, `hashCode()`, and
`toString()` are identity-based (identity equality, identity hash
code, and the name of the class appended with the identity hash
code); for value objects, they should be state-based.

We define a relation over all values (identity classes, value classes,
and primitives), called _substitutability_, as follows:

  - Two identity instances are substitutable if they refer to the same
    object.
  - Two primitives are substitutable if they are `==` (modulo special
    pleading for `NaN`, as per `Float::equals` and `Double::equals`).
  - Two value instances `a` and `b` are substitutable if they are of
    the same type, and for each of the fields `f` of that type, `a.f`
    and `b.f` are substitutable.

We then say that for any two values, `a == b` iff `a` and `b` are
substitutable.  The default implementation of `Object::equals` for
value classes implements `a == b`, as it does for identity classes.
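
The recursion in the substitutability relation can be approximated in today's Java. The following is a minimal sketch under stated assumptions, not the planned implementation: ordinary final classes emulate value classes, and the `ValueLike` marker interface, the `Point` and `Sample` classes, and the `substitutable` method are all hypothetical names introduced for illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

final class SubstitutabilitySketch {
    /** Hypothetical marker standing in for "is an instance of a value class". */
    interface ValueLike {}

    /** Emulated value class: final class, final fields. */
    static final class Point implements ValueLike {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    /** Emulated value class mixing a primitive, a nested value, and an identity object. */
    static final class Sample implements ValueLike {
        final double reading;
        final Point where;
        final Object tag;
        Sample(double reading, Point where, Object tag) {
            this.reading = reading;
            this.where = where;
            this.tag = tag;
        }
    }

    static boolean substitutable(Object a, Object b) {
        if (a == b) return true;                 // same reference, or both null
        if (a == null || b == null) return false;
        // Distinct identity objects are never substitutable; values must share a type.
        if (!(a instanceof ValueLike) || a.getClass() != b.getClass()) return false;
        try {
            for (Field f : a.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers())) continue;
                f.setAccessible(true);
                Object fa = f.get(a);
                Object fb = f.get(b);
                // Primitive fields arrive boxed; the wrapper's equals() gives the
                // Float::equals / Double::equals treatment of NaN.
                boolean same = f.getType().isPrimitive()
                        ? fa.equals(fb)
                        : substitutable(fa, fb);
                if (!same) return false;
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch two `Point(1, 2)` instances are substitutable even though they are distinct heap objects in today's Java, while two `Sample` instances that differ only in which identity object their `tag` field refers to are not.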

Similarly, we define a total _substitutability hash code_ function, as
follows:

   - For an identity instance, it is the value of
     `System::identityHashCode`;
   - For a primitive, it is the value of the `hashCode` method of the
     corresponding wrapper type;
   - For a value, it is constructed deterministically from the
     substitutability hash codes of the value's fields.

The method `System::identityHashCode` should return the
substitutability hash code for value arguments; as for reference
classes, the default `Object::hashCode` for value classes also returns
the substitutability hash code.  (If we were starting clean, we might
prefer separate API points for identity and substitutability, and
then a merged API point.)
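
The three cases of the substitutability hash code can be sketched in the same emulated style. This is only an illustration: the `ValueLike` marker, the `Point` class, the name-sorted field order, and the `31 * h + ...` mixing step are assumptions made here, since the text only requires that the result be constructed deterministically from the fields' hash codes.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;

final class SubstitutabilityHashSketch {
    /** Hypothetical marker standing in for "is an instance of a value class". */
    interface ValueLike {}

    /** Emulated value class. */
    static final class Point implements ValueLike {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static int substitutabilityHash(Object o) {
        // Identity instances (and null) fall back to the identity hash code.
        if (!(o instanceof ValueLike)) return System.identityHashCode(o);
        Field[] fields = o.getClass().getDeclaredFields();
        // Reflection does not promise a field order, so sort by name (an assumption
        // of this sketch) to keep the combination deterministic.
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        int h = o.getClass().getName().hashCode();
        try {
            for (Field f : fields) {
                if (Modifier.isStatic(f.getModifiers())) continue;
                f.setAccessible(true);
                Object v = f.get(o);
                // Primitive fields arrive boxed, so v.hashCode() is the wrapper's hashCode.
                int fh = f.getType().isPrimitive() ? v.hashCode() : substitutabilityHash(v);
                h = 31 * h + fh;
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return h;
    }
}
```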

Certain operations that are nominally allowable on all objects are
forbidden for values (and result in runtime exceptions):
synchronization, and `Object::wait` and friends.

It is an open question what we should do for weak references to values
that contain references to identity objects; perhaps weak references
are restricted to `RefObject`.

Values are instantiated with instance creation expressions: `new
V(...)` (though such expressions are not necessarily translated in the
same way as for identity classes.)  Value classes have constructors,
and these constructors are written like constructors for reference
types.  Because all fields are final, they must initialize all the
fields of the class.

#### Mirrors and reflection

For each value class `V`, there are two reflection mirrors: a standard
mirror (for `V`), and a nullable mirror (for `V?`).  The latter is
used for reflection over members that use `V?` in their signature; the
method `Object::getClass` returns the standard mirror for all
instances of `V`.  Similarly, mirrors for `V[].class` and `V?[].class`
are needed.  A value class name `V` can appear on the RHS of
`instanceof`; both `V` and `V?` can be used as cast targets.  

We will likely want a reflective method `Class::isValueClass`.
Fields, methods, and constructors for value classes can be reflected
using existing abstractions.

#### Arrays

Arrays are covariant; if `T <: U`, then `T[] <: U[]`.  The subtyping
relations above therefore give rise to their array counterparts:

    V[] <: V?[] <: ValObject[] <: Object[]    

The array type `V[]` is translated as `[QV;`; the array type `V?[]`
is translated as `[LV;`.
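
The translation rules for `V`, `V?`, and their array types can be summarized in a few lines of Java. This is a sketch only; the `DescriptorSketch` class and its `descriptor` method are hypothetical helpers, not part of any compiler or JVM API.

```java
final class DescriptorSketch {
    /**
     * Builds the descriptor for a value class type.
     *
     * @param internalName class name as it appears in descriptors, e.g. "V"
     * @param nullable     true for V?, false for V
     * @param arrayDims    number of array dimensions (0 for a non-array type)
     */
    static String descriptor(String internalName, boolean nullable, int arrayDims) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < arrayDims; i++) {
            sb.append('[');
        }
        // V is translated with Q, V? with L.
        sb.append(nullable ? 'L' : 'Q').append(internalName).append(';');
        return sb.toString();
    }
}
```

For example, `descriptor("V", false, 1)` yields `[QV;` and `descriptor("V", true, 1)` yields `[LV;`, matching the array translations stated above.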

#### Nullable types

Nullable value types (`V?`) are nullable in the same sense that all
reference types are -- `null` is a member of their value set, but the
dereference operators can throw NPE when applied to a null operand.
(We realize there is likely to be a highly vocal constituency who are
really hoping that null-freedom would be enforced by the compiler, and
therefore that we'd introduce null-safe operators such as `?.`.  Our
decision here is not out of ignorance that this is potentially
desirable, nor out of ignorance that doing it this way makes it even
harder to achieve the null-safe nirvana that such users long for.)

#### Serialization

Value classes are classes; it would be a significant irregularity if
they could not opt into serialization like other classes.  However, many
of the serialization mechanisms (like `readObject`) depend on
mutation; additional mechanisms for safely serializing value classes
may be needed.  This is an open issue.

#### Values and generics in L10

With respect to generics, there are two undesirable fates that L10
must steer clear of.  We know that specialized generics are coming,
and L10 embodies a deliberate choice to ship values before we ship
specializable generics.  One wrong move would be to simply ban the use
of generics over values; this would be a huge loss for reuse, as there
are so many useful, well-tested, well-understood generic libraries out
there.  The other wrong move would be to interpret `Foo<V>` as an
erased instantiation of `Foo`.  For erased type parameters, `null` is
always considered to be a member of the value set, which means that we
might get unexpected NPEs when generic code puts a `null` where it is
within its rights to do so.  (In the worst case, methods that use
`null` as a sentinel, like `Map::get`, become unusable.)

What we will do is allow the instantiation of erased generics with
_nullable_ values; we can say `Foo<V?>`, but not `Foo<V>` -- just as
we can say `Foo<Integer>` but not `Foo<int>` today.  (This leaves us
free to assign a meaning to `Foo<V>` later for specializable
generics.)  So users can declare value classes, and generify over
their nullable counterparts, with erasure, and later, they'll be able
to generify over the value itself, with specialization.  One can think
of all type variables of erased generic classes (which is all
generic classes, today) as having an implicit bound `T extends
Nullable`.
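
The `Foo<Integer>` versus `Foo<int>` analogy already shows, in today's Java, why a `null` sentinel clashes with a non-nullable type. The sketch below uses `int` as a stand-in for a non-nullable value type `V`; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class ErasedGenericsAnalogy {
    /** Erased generic code is within its rights to answer with null. */
    static Integer lookup(Map<String, Integer> m, String key) {
        return m.get(key);
    }

    /** Forcing that null sentinel into a non-nullable type fails at run time. */
    static boolean unboxingMissingKeyThrows() {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 1);
        try {
            int n = counts.get("missing"); // null is unboxed into a non-nullable int
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

Requiring `Foo<V?>` for erased instantiation keeps the `null` visible in the type, in the same way that `Map<String, Integer>` does here.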

For any type `T`, the expression `T.default` evaluates to the default
value for type `T` -- the value initially held by fields or array
elements.  For identity classes, this is `null`; for value classes,
this is one where all fields have their default (zero) value.  The
locution `T.default` can be used both for concrete types `T` and for
type variables (for type variables in erased generic classes, this is
equivalent to `null`.)
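
Since `T.default` is defined as the value initially held by fields or array elements, today's Java can already observe it for existing types by reading a freshly allocated array element. This is only an illustration of the definition; the `DefaultValueSketch.defaultOf` helper is hypothetical and says nothing about how `T.default` would be compiled.

```java
import java.lang.reflect.Array;

final class DefaultValueSketch {
    /**
     * Returns the initial value of an array element of the given type:
     * null for reference types, the (boxed) zero value for primitives.
     */
    static Object defaultOf(Class<?> type) {
        Object oneElementArray = Array.newInstance(type, 1);
        return Array.get(oneElementArray, 0);
    }
}
```

Here `defaultOf(String.class)` is `null`, while `defaultOf(int.class)` is a boxed `0`; a zero-default value class would analogously yield an instance whose fields all hold their defaults.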

## L20 -- Migration support for value-based classes

The next sensible milestone after L10 adds one new feature: the
ability to migrate existing [value-based classes][vbc] to value types.

While theoretically we could merge this into L10, the reason to
separate them is that L10 is useful for a variety of use cases
(numeric-intensive code, machine learning, optimized data structures)
that have no immediate need for migration, and we don't want to delay
the delivery of L10 (and the critical feedback that will come with
broad distribution) for the sake of optimizing some JDK classes.

Value instances, like all other values, are initialized with an
all-zero value (null, zero, false, etc.)  However, for some value
classes,  the all-zero value is not a natural member of the domain,
and asking class implementations to deal with it is likely to be a
sharp edge.  

For these types -- and also for value types migrated from value-based
classes -- we introduce a new mechanism: _null-default value classes_.
This is a value class whose default all-zero bit value is interpreted
as `null`, rather than one whose fields all hold their default value.
(The opposite of null-default is _zero-default_.)  A null-default
value class is declared with the `null-default` modifier, and is a
nullable type (implicitly implements `Nullable`):

```{.java}
null-default value class Person {
    String first;
    String last;
}
```

A `null-default` value class _is implicitly zero-hostile_; if the
state on exit from the constructor has zeros for all fields, an
exception is thrown.  Classes that are intended to be compatibly
migrated from value-based classes to value classes must be declared
`null-default` (and therefore their implementations must conform to
the zero-hostility requirements).  Inner value classes are implicitly
`null-default`.  For a null-default type, `T.default` evaluates to
`null`.  For a null-default value type `T`, `T?` denotes the same type
as `T`.
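
The zero-hostility rule can be emulated with an explicit check at the end of a constructor. This is a sketch in today's Java, assuming an ordinary final class in place of a `null-default` value class; the choice of `IllegalStateException` is an assumption, since the text only says that an exception is thrown.

```java
final class NullDefaultSketch {
    /** Emulates a null-default value class with two reference fields. */
    static final class Person {
        final String first;
        final String last;

        Person(String first, String last) {
            this.first = first;
            this.last = last;
            // Zero-hostility: reject the state in which every field is zero (here, null),
            // because that bit pattern is reserved to mean null.
            if (this.first == null && this.last == null) {
                throw new IllegalStateException("all-zero state is reserved for null");
            }
        }
    }

    /** Returns true if constructing the all-zero state is rejected. */
    static boolean rejectsAllZeroState() {
        try {
            new Person(null, null);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```

Note that only the all-zero state is rejected; `new Person("Ada", null)` is accepted in this sketch because not every field is zero.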

#### Translation

To ease migration compatibility, we adopt a hybrid translation
strategy for null-default value classes.  When a null-default value
class appears in a _method descriptor_, we translate it with an `L`
descriptor; only when it appears in a _field descriptor_ do we use the
more precise `Q` descriptor.  This is a trade-off; using slightly
looser types in method descriptors may give up some calling-convention
optimizations, but allows us to compatibly migrate classes like
`LocalDateTime`.  (This strategy of "loose types on the stack, sharp
types on the heap" will show up again later when we get to migration
of erased generics to specialized.)
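
The "loose types on the stack, sharp types on the heap" rule can be written down as a small function. This is a sketch, not compiler code: `HybridTranslationSketch`, `Position`, and `descriptor` are hypothetical names, and the sketch covers only the plain type `V` of a value class (not `V?`).

```java
final class HybridTranslationSketch {
    /** Where the type appears in the class file. */
    enum Position { FIELD, METHOD }

    /**
     * Descriptor for a value class type V.
     *
     * @param internalName class name as used in descriptors, e.g. "java/time/LocalDateTime"
     * @param nullDefault  true if the value class is declared null-default
     * @param position     field descriptor or method descriptor
     */
    static String descriptor(String internalName, boolean nullDefault, Position position) {
        // Only null-default classes get the looser L form, and only in method descriptors.
        boolean loose = nullDefault && position == Position.METHOD;
        return (loose ? "L" : "Q") + internalName + ";";
    }
}
```

Under these assumptions a migrated `LocalDateTime` keeps `Ljava/time/LocalDateTime;` in method descriptors, so existing callers still link, while its fields are described as `Qjava/time/LocalDateTime;`.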

#### Field linkage

Because we use the sharper types in field descriptors, it is possible
that existing code will have a `CONSTANT_Fieldref_info` for a migrated
field that refers to the field by `L` descriptor, but the descriptor
in the target class has been migrated to `Q`.  Link resolution of
field bytecodes is adjusted to paper over this potential mismatch.
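
One way to picture the adjusted linkage is as a descriptor comparison that tolerates an `L` reference to a `Q` field. The exact resolution rule is not specified here, so the following is purely an assumed illustration; `FieldLinkageSketch.descriptorsLink` is a hypothetical helper and not a description of what the JVM does.

```java
final class FieldLinkageSketch {
    /**
     * Returns true if a field reference with descriptor {@code referenced} may
     * link against a field declared with descriptor {@code declared}.
     */
    static boolean descriptorsLink(String referenced, String declared) {
        if (referenced.equals(declared)) {
            return true; // exact match, as before migration
        }
        // Assumed tolerance: old code says L, the migrated class says Q,
        // and both name the same class.
        return referenced.startsWith("L")
                && declared.startsWith("Q")
                && referenced.substring(1).equals(declared.substring(1));
    }
}
```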

#### Null-default value types and erased generics

Null-default value types are nullable, and so can be used to
instantiate erased generic classes (unlike zero-default value classes,
which require an explicit indication of nullability at the use site).
For a migrated type `T`, there will likely be existing code that uses
generics such as `List<T>`; when these types are migrated to
null-default value classes, these locutions continue to be valid (and
continue to mean the same thing -- erased instantiation of `List` with
`T`).

#### Library support

As part of this milestone, we should expect to migrate classes such as
`Optional`, `LocalDateTime`, and other suitable value-based classes.