Collapsing the requirements

Sun Aug 4 20:06:26 UTC 2019

----- Original Message -----
> From: "Remi Forax" <forax at univ-mlv.fr>
> To: "Brian Goetz" <brian.goetz at oracle.com>
> Cc: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Saturday, August 3, 2019 19:48:01
> Subject: Re: Collapsing the requirements

> ----- Original Message -----
>> From: "Brian Goetz" <brian.goetz at oracle.com>
>> To: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
>> Sent: Saturday, August 3, 2019 18:37:56
>> Subject: Collapsing the requirements
> 
>> As Remi noted, we had some good discussions at JVMLS this week.  Combining that
>> with some discussions John and I have been having over the past few weeks, I
>> think the stars are aligning to enable us to dramatically slim down the
>> requirements.  The following threads have been in play for a while:
>> 
>> - John: I hate the LPoint/QPoint distinction
>> - Brian: I hate null-default types
>> - Remi: I hate the V? type
>> 
>> But the argument for each of these depended, in some way, on the others.  I
>> believe, with a few compromises, we can now prune them as a group, which would
>> bring us to a much lower energy state.
>> 
>> ## L^Q World — Goodbye `LV;`
>> 
>> We’ve taken it as a requirement that for a value type V, we have to support both
>> LV and QV, where LV is the null-adjunction of QV.  This has led to a lot of
>> complexity in the runtime, where we have to manage dual mirrors.
>> 
>> The main reason why we wanted LV was to support in-place migration.  (In
>> Q-world, LV was the box for QV, so it was natural for migration.)  But, as
>> we’ve worked our migration story, we’ve discovered we may not need LV for
>> migration. And if we don’t, we surely don’t need it for anything else;
>> worst-case, we can erase LV to `LValObject` or even `LObject` (or, if we’re
>> worried about erasure and overloading, to something like `LObject//V` using
>> John’s type-operator notation.)
>> 
>> Assuming we can restructure the migration story to not require LV to represent a
>> VM-generated “box” (which I believe we can; see below), we can drop the
>> requirement for LV.  An inline class V gives rise to a single type descriptor,
>> QV (or whatever we decide to call it; John may have plans here).
>> 
>> ## Goodbye `V?`
>> 
>> The other reason we wanted LV was that it was the obvious representation for the
>> language type `V?` (V adjoined with null.)  Uses for `V?` include:
>> 
>> - Denoting non-flattened value fields;
>> - Denoting non-flattened value arrays;
>> - Denoting erased generics over values (`Foo<V?>`);
>> - Denoting the type that is the adjunction of null to V (V | Null), when we
>> really want to talk about nullability.
>> 
>> But, we can do all this without a `V?` type; for every V, there is already at
>> least one super type of V that includes `V|Null` — Object, and any interface
>> implemented by V.  If we arrange that every value type V has a super type V’,
>> not implemented by any other type — then the value set of this V’ is exactly
>> that of `V?`.  And we can use V’ to do all the things `V?` did with respect to
>> V — including sub typing.  The language doesn’t need the `?` type operator, it
>> just needs to ensure that V’ always exists.  Which turns out to be easy, and
>> also turns out to be essential to the migration story.
>> 
>> #### Eclairs
>> 
>> We can formalize this by requiring that every value type have a companion
>> interface (or abstract class) supertype.  Define an envelope-class pair
>> (“eclair”) as a pair (V, I) such that:
>> 
>> - V is an inline class
>> - I is a sealed type
>> - I permits V (and only V)
>> - V <: I
>> 
>> (We can define eclairs for indirect classes, but they are less interesting —
>> because indirect classes already contain null.)
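
The four eclair conditions above can be approximated in today's Java. This is only a sketch under loud assumptions: mainline Java has no inline classes, so a record stands in for V, and the names `Point`, `PointBox`, and `isPointOrNull` are hypothetical and do not come from the proposal.

```java
// Sketch only: a record stands in for the inline class V, because
// mainline Java has no inline classes. All names here are hypothetical.
final class EclairSketch {
    // I: the sealed companion type; it permits V and only V.
    sealed interface PointBox permits Point {
        int x();
        int y();
    }

    // V: stand-in for the inline class; V <: I.
    record Point(int x, int y) implements PointBox {}

    // Because I permits only V, every non-null I must be a V.
    static boolean isPointOrNull(PointBox box) {
        return box == null || box instanceof Point;
    }

    public static void main(String[] args) {
        PointBox boxed = new Point(1, 2); // plain widening from V to I
        PointBox empty = null;            // I admits null
        System.out.println(isPointOrNull(boxed));
        System.out.println(isPointOrNull(empty));
    }
}
```

What the sketch cannot show is the half that matters for the VM: a real inline class would be flattenable and non-nullable, whereas a variable of a record type can still hold null.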
>> 
>> If every value type is a member of an eclair, we can use V when we want the
>> flattenable, non-nullable, specializable type; and we use I when we want the
>> non-flattenable, nullable, erased “box”.  We don’t need to denote `V?`; we can
>> just use I, which is an ordinary, nominal type.
>> 
>> Note that the VM can optimize eclairs about as well as it could for LV; it knows
>> that I is the adjunction of null to V, so that all non-null values of I are
>> identity free and must be of type V.
>> 
>> What we lose relative to `V?` is access to fields; it was possible to do
>> `getfield` on an LV, but not on I.  If this is important (and maybe it’s not),
>> we can handle this in other ways.
>> 
>> #### With sugar on top, please
>> 
>> We can provide syntax sugar (please, let’s not bikeshed it now) so that an
>> inline class _automatically_ acquires a corresponding interface (if one is not
>> explicitly provided), onto which the public members (and type variables, and
>> other supertypes) of C are lifted.  For the sake of exposition, let’s say this is
>> called `C.Box`, and is a legitimate inner class of C (which can be generated
>> by the compiler as an ordinary classfile).  We’ve been here before, and
>> abandoned it because “Box” seemed misleading, but let’s call it that for now.
>> And now it is a real nominal type, not a fake type.  In the simplest case,
>> merely declaring an inline class could give rise to V.Box.
>> 
>> Now, the type formerly known as `V?` is an ordinary, nominal interface (or
>> abstract class) type.  The user can say what they mean, and no magic is needed
>> by either the language or the VM.  Goodbye `V?`.
>> 
>> #### Boxing conversion
>> 
>> Given the constraints of the eclair relationship, it would be reasonable for the
>> compiler to derive from this that there is a boxing conversion between C and I
>> (I is just the value set of C, plus null — which is the relationship boxes have
>> with their corresponding primitives.)  The boxing operation is a no-op (since C
>> <: I) and the unboxing operation is a null checking cast.
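
The two directions of this conversion can be written out by hand. This is a minimal sketch, assuming a sealed interface and a record as stand-ins for the box and the inline class; `Complex`, `ComplexBox`, `box`, and `unbox` are hypothetical names, and in the proposal the compiler would insert these conversions itself.

```java
import java.util.Objects;

// Sketch only: hand-written versions of the conversions the compiler
// could derive from the eclair relationship. All names are hypothetical.
final class BoxingSketch {
    sealed interface ComplexBox permits Complex {}

    record Complex(double re, double im) implements ComplexBox {}

    // Boxing is a no-op: Complex <: ComplexBox, so this is a plain widening.
    static ComplexBox box(Complex c) {
        return c;
    }

    // Unboxing is a null-checking cast: null is the only box value
    // that has no counterpart in the value type.
    static Complex unbox(ComplexBox b) {
        return (Complex) Objects.requireNonNull(b, "cannot unbox null");
    }

    public static void main(String[] args) {
        Complex c = new Complex(1.0, 2.0);
        ComplexBox b = box(c);
        System.out.println(unbox(b).equals(c));
        try {
            unbox(null);
        } catch (NullPointerException e) {
            System.out.println("unboxing null fails: " + e.getMessage());
        }
    }
}
```

The cast in `unbox` can never fail with a `ClassCastException`, because the sealed box type has exactly one permitted implementation.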
>> 
>> #### Erased generics
>> 
>> Using the eclair wrapper also kicks the problem of erased generics down the
>> road; if we use `Foo<I>` for erased generics, and temporarily ban `Foo<V>`,
>> when we get to specialized generics, it will be obvious what `Foo<V>` means
>> (their common super type will be `Foo<? extends I>`).  This is a less confusing
>> world, as then “List of erased V” and “List of specialized V” don’t coexist;
>> there’s only “List of V” and “List of V’s Box”.
>> 
>> ## Migration
>> 
>> The ability to migrate Optional and friends to values has been an important
>> goal, but it has been the source of significant complexity.  Our previous story
>> leaned hard on “When we migrate X to a value, LX will describe the box, so old
>> callsites will continue to link.”  But it turned out that brought a lot of
>> baggage (forwarding bridges, null-default values) and compromises (null-default
>> values lose their calling-convention optimizations), and over the past few
>> weeks John and I have been cooking up a simpler eclair-based recipe for this.
>> 
>> The world is indeed full of existing utterances of `LOptional`, and they will
>> still want to work.  Fortunately, Optional follows the rules for being a
>> value-based class.  We start with migrating Optional from a reference class to
>> an eclair with a public abstract class and a private value implementation.
>> Now, existing code just works (source and binary) — and optionals are values.
>> But, this isn’t good enough; existing variables of type Optional are not
>> flattened.
>> 
>> One of the objections raised to in-place migration was nullity; in order to
>> migrate Optional to a true value, it would have to be a null-default value, and
>> this already entailed compromises.  If we’re willing to compromise further, we
>> can get what we want without the baggage.  And that compromise is: give up the
>> name.
>> 
>> So we define a new public value class `Opt<T>` which is the value half of the
>> eclair, and the existing Optional is the interface/abstract class half.  Now,
>> existing fields / arrays can migrate gradually to Opt, as they want the benefit
>> of flattening; existing APIs can continue to truck in Optional (which has
>> about the same optimizations as a null-default value would have on the stack).
>> 
>> This works because of the boxing conversion.  Suppose we have old code that
>> does:
>> 
>>    Optional o = makeAnOptional()
>> 
>> when the user changes this to
>> 
>>    Opt o = …
>> 
>> the compiler sees that the RHS is an Optional and the LHS is an Opt, and that
>> there is a boxing conversion between them, so we insert an unbox conversion
>> (null check) and we’re done.  Users can migrate their fields gradually.  The
>> cost: the good name gets burned.  But there is a compatible migration path
>> from ref to value.
>> 
>> Later, when we have bridges (we don’t need them yet!), we can migrate the
>> library uses from Optional to Opt.
>> 
>> ## Null-default values
>> 
>> About 75% of the motivation for null-default values — another huge source of
>> complexity — was to support the migration of value-based classes.  And it
>> wasn’t even a great solution — because we still lost some key optimizations
>> (e.g., calling conventions.)   With the Optional -> Opt path, we don’t need
>> null-default values; we get ordinary values.  So while we pay the cost of
>> changing the name, we gain the benefit that the new values, once the full
>> migration is effected, don’t carry the legacy performance baggage.
>> 
>> Another 20% of the motivation was for security-sensitive classes whose default
>> value did not represent a useful value, for which we wanted not
>> null-default-ness but really initialization safety.  Let’s look at another way
>> to get there.
>> 
>> There are a few ways to get there.  One is to treat this problem as protecting
>> such classes from uninitialized fields or array elements; another is to ensure
>> that such classes (a) have no public fields and (b) perform the correct check
>> at the top of each method (which can be injected by the compiler.)  I don’t
>> want to solve that problem right here, but I think there are enough ways to get
>> there that we can assume this isn’t a hard requirement.
>> 
>> The other 5% was just the user-based “I want null in my value set.”  For those,
>> we can tell users: use the interface box when you need null.
>> 
>> ## Summary
>> 
>> In one swoop, we can banish LV from the VM,  V? from the language, and
>> null-default values, by making a simple requirement: every value type is paired
>> with an interface or abstract class “box”.  For most values, this can be
>> automatically generated by the compiler and denoted via a well-known name
>> (e.g., V.Box); for some values, such as those that are migrated from reference
>> types, we can explicitly declare the box type and pick explicit names for both
>> types.
>> 
>> There’s a lot to work out, but I think it should be clear enough that this is a
>> much, much lower energy state than what we were aiming at for L10, and also a
>> simpler user model.
>> 
>> Let’s focus discussions on validating the model first before we dive into
>> mechanism or surface syntax.
> 
> Trying to implement the eclair interface by hand,
> it seems the methods of the interface and those of the
> implementation need to use covariant return types,
> with the box version returning a box and the inline class version returning the
> inline class (which is fine because it's a subtype).
> Otherwise, when you call a method of the inline class, the result is the box, so
> you lose the non-null property when chaining calls.
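
A minimal sketch of that hand-written pairing, assuming a sealed interface and a record as stand-ins for the box and the inline class (mainline Java has no inline classes). The names `Complex`, `ComplexBox`, `negate`, and `conjugate` are hypothetical.

```java
// Sketch only: covariant return types on a hand-written eclair.
// All names are hypothetical; a record stands in for the inline class.
final class CovariantSketch {
    // Box half: methods are declared to return the nullable box type.
    sealed interface ComplexBox permits Complex {
        ComplexBox negate();
        ComplexBox conjugate();
    }

    // Value half: each override narrows the return type to the value type.
    record Complex(double re, double im) implements ComplexBox {
        @Override
        public Complex negate() {
            return new Complex(-re, -im);
        }

        @Override
        public Complex conjugate() {
            return new Complex(re, -im);
        }
    }

    public static void main(String[] args) {
        Complex c = new Complex(1.0, 2.0);
        // Chaining on the value type stays in the value type: no cast needed.
        Complex viaValue = c.negate().conjugate();
        // Chaining through the box type is statically typed as the box.
        ComplexBox viaBox = ((ComplexBox) c).negate().conjugate();
        System.out.println(viaValue);
        System.out.println(viaBox.equals(viaValue));
    }
}
```

Without the narrowed overrides, `c.negate()` would be typed as `ComplexBox`, and every step of a chain would need an unboxing cast to get back to `Complex`.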

So it depends on:
- whether you want to 'emulate' a value-based class, in which case the eclair is, for example, Optional, and the inline class can have a specific name;
- whether you want an inline class plus an eclair only for interacting with erased generics, like Complex and Complex.box;
- whether the inline class uses covariant return types.

So there is no single good solution that fits all of these cases, which suggests that we should not provide any special compiler support.

Rémi