Collapsing the requirements

Remi Forax forax at univ-mlv.fr
Sat Aug 3 17:48:01 UTC 2019


----- Original Message -----
> From: "Brian Goetz" <brian.goetz at oracle.com>
> To: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Saturday, August 3, 2019 18:37:56
> Subject: Collapsing the requirements

> As Remi noted, we had some good discussions at JVMLS this week.  Combining that
> with some discussions John and I have been having over the past few weeks, I
> think the stars are aligning to enable us to dramatically slim down the
> requirements.  The following threads have been in play for a while:
> 
> - John: I hate the LPoint/QPoint distinction
> - Brian: I hate null-default types
> - Remi: I hate the V? type
> 
> But the argument for each of these depended, in some way, on the others.  I
> believe, with a few compromises, we can now prune them as a group, which would
> bring us to a much lower energy state.
> 
> ## L^Q World — Goodbye `LV;`
> 
> We’ve taken it as a requirement that for a value type V, we have to support both
> LV and QV, where LV is the null-adjunction of QV.  This has led to a lot of
> complexity in the runtime, where we have to manage dual mirrors.
> 
> The main reason why we wanted LV was to support in-place migration.  (In
> Q-world, LV was the box for QV, so it was natural for migration.)  But, as
> we’ve worked our migration story, we’ve discovered we may not need LV for
> migration. And if we don’t, we surely don’t need it for anything else;
> worst-case, we can erase LV to `LValObject` or even `LObject` (or, if we’re
> worried about erasure and overloading, to something like `LObject//V` using
> John’s type-operator notation.)
> 
> Assuming we can restructure the migration story to not require LV to represent a
> VM-generated “box" — which I believe we can, see below — we can drop the
> requirement for LV.  An inline class V gives rise to a single type descriptor,
> QV (or whatever we decide to call it; John may have plans here.)
> 
> ## Goodbye `V?`
> 
> The other reason we wanted LV was that it was the obvious representation for the
> language type `V?` (V adjoined with null.)  Uses for `V?` include:
> 
> - Denoting non-flattened value fields;
> - Denoting non-flattened value arrays;
> - Denoting erased generics over values (`Foo<V?>`);
> - Denoting the type that is the adjunction of null to V (V | Null), when we
> really want to talk about nullability.
> 
> But, we can do all this without a `V?` type; for every V, there is already at
> least one super type of V that includes `V|Null` — Object, and any interface
> implemented by V.  If we arrange that every value type V has a super type V’,
> not implemented by any other type — then the value set of this V’ is exactly
> that of `V?`.  And we can use V’ to do all the things `V?` did with respect to
> V — including sub typing.  The language doesn’t need the `?` type operator, it
> just needs to ensure that V’ always exists.  Which turns out to be easy, and
> also turns out to be essential to the migration story.
> 
> #### Eclairs
> 
> We can formalize this by requiring that every value type have a companion
> interface (or abstract class) supertype.  Define an envelope-class pair
> (“eclair”) as a pair (V, I) such that:
> 
> - V is an inline class
> - I is a sealed type
> - I permits V (and only V)
> - V <: I
> 
> (We can define eclairs for indirect classes, but they are less interesting —
> because indirect classes already contain null.)
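
A minimal sketch of the eclair conditions in today's Java (17+), assuming a sealed interface and a record are acceptable stand-ins: `PointBox` plays I and `Point` plays V. Both names are hypothetical, and a record is still an identity class, so this shows only the typing relationships (I is sealed, I permits only V, V <: I), not flattening or the absence of identity.

```java
// Hypothetical eclair (V, I) = (Point, PointBox), written in plain Java 17.
// PointBox plays I: it is sealed and permits exactly one class.
sealed interface PointBox permits Point {
    int x();
    int y();
}

// Point plays V: a record is implicitly final, so V <: I and no other class
// can implement PointBox. It is NOT an inline class; it still has identity.
record Point(int x, int y) implements PointBox {}

final class Eclairs {
    // Since PointBox permits only Point, every non-null PointBox is a Point:
    // this cast can never throw ClassCastException (null simply casts to null).
    static Point narrow(PointBox b) {
        return (Point) b;
    }
}
```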
> 
> If every value type is a member of an eclair, we can use V when we want the
> flattenable, non-nullable, specializable type; and we use I when we want the
> non-flattenable, nullable, erased “box”.  We don’t need to denote `V?`; we can
> just use I, which is an ordinary, nominal type.
> 
> Note that the VM can optimize eclairs about as well as it could for LV; it knows
> that I is the adjunction of null to V, so that all non-null values of I are
> identity free and must be of type V.
> 
> What we lose relative to V? is access to fields; it was possible to do
> `getfield` on an LV, but not on I.  If this is important (and maybe it’s not),
> we can handle this in other ways.
> 
> #### With sugar on top, please
> 
> We can provide syntax sugar (please, let’s not bikeshed it now) so that an
> inline class V _automatically_ acquires a corresponding interface (if one is not
> explicitly provided), onto which the public members (and type variables, and
> other super types) of V are lifted.  For the sake of exposition, let’s say this is
> called `V.Box`, and that it is a legitimate inner class of V (which can be generated
> by the compiler as an ordinary classfile.)  We’ve been here before, and
> abandoned it because “Box” seemed misleading, but let’s call it that for now.
> And now it is a real nominal type, not a fake type.  In the simplest case,
> merely declaring an inline class could give rise to V.Box.
> 
> Now, the type formerly known as `V?` is an ordinary, nominal interface (or
> abstract class) type.  The user can say what they mean, and no magic is needed
> by either the language or the VM.  Goodbye `V?`.
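
A hand-written approximation of what such sugar might produce for a generic inline class, assuming public members, type variables and super types are lifted as described. The `V.Box` spelling cannot be reproduced literally in today's Java (a class that implements its own member interface is rejected as cyclic inheritance), so this sketch uses a hypothetical top-level `PairBox`; `Pair` and the choice of `Serializable` as the lifted super type are likewise illustrative.

```java
import java.io.Serializable;

// Hypothetical generated box: the type variables A and B, the super type
// Serializable, and the public accessors of Pair are all lifted onto it.
sealed interface PairBox<A, B> extends Serializable permits Pair {
    A first();
    B second();
}

// Stand-in for the inline class (a record, so it still has identity here).
record Pair<A, B>(A first, B second) implements PairBox<A, B> {}
```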
> 
> #### Boxing conversion
> 
> Given the constraints of the eclair relationship, it would be reasonable for the
> compiler to derive from this that there is a boxing conversion between V and I
> (I is just the value set of V plus null, which is the relationship boxes have
> with their corresponding primitives.)  The boxing operation is a no-op (since
> V <: I) and the unboxing operation is a null-checking cast.
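
A minimal sketch of the two conversions, assuming a hypothetical eclair (`Point`, `PointBox`) modeled with a record and a sealed interface; `box` and `unbox` are hypothetical helpers that spell out what a compiler-inserted conversion would amount to.

```java
import java.util.Objects;

sealed interface PointBox permits Point {}
record Point(int x, int y) implements PointBox {}

final class BoxingConversion {
    // Boxing: Point <: PointBox, so this is a plain widening reference
    // conversion with no check and no allocation.
    static PointBox box(Point p) {
        return p;
    }

    // Unboxing: a null check followed by a cast. The cast cannot fail for a
    // non-null value, because PointBox permits only Point.
    static Point unbox(PointBox b) {
        return (Point) Objects.requireNonNull(b, "cannot unbox null to Point");
    }
}
```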
> 
> #### Erased generics
> 
> Using the eclair wrapper also kicks the problem of erased generics down the
> road; if we use `Foo<I>` for erased generics, and temporarily ban `Foo<V>`,
> when we get to specialized generics, it will be obvious what `Foo<V>` means
> (their common super type will be `Foo<? extends I>`).  This is a less confusing
> world, as then “List of erased V” and “List of specialized V” don’t coexist;
> there’s only “List of V” and “List of V’s Box”.
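
A small illustration with the same kind of hypothetical `Point`/`PointBox` stand-ins: a `List<PointBox>` is the erased, nullable "List of V's Box", and a method taking `List<? extends PointBox>` accepts both it and a `List<Point>`. In current Java a `List<Point>` is just another erased list of references, so it only hints at the wildcard relationship a specialized `Foo<V>` would have.

```java
import java.util.ArrayList;
import java.util.List;

sealed interface PointBox permits Point {}
record Point(int x, int y) implements PointBox {}

final class ErasedGenerics {
    // "List of V's Box": erased and nullable, since PointBox's value set includes null.
    static List<PointBox> boxes() {
        List<PointBox> list = new ArrayList<>();
        list.add(new Point(1, 2));
        list.add(null);
        return list;
    }

    // Accepts List<PointBox> and List<Point> alike via the common wildcard type.
    static int countPoints(List<? extends PointBox> list) {
        int n = 0;
        for (PointBox b : list) {
            if (b != null) n++;
        }
        return n;
    }
}
```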
> 
> ## Migration
> 
> The ability to migrate Optional and friends to values has been an important
> goal, but it has been the source of significant complexity.  Our previous story
> leaned hard on “When we migrate X to a value, LX will describe the box, so old
> callsites will continue to link.”  But it turned out that brought a lot of
> baggage (forwarding bridges, null-default values) and compromises (null-default
> values lose their calling-convention optimizations), and over the past few
> weeks John and I have been cooking up a simpler eclair-based recipe for this.
> 
> The world is indeed full of existing utterances of `LOptional`, and they will
> still want to work.  Fortunately, Optional follows the rules for being a
> value-based class.  We start with migrating Optional from a reference class to
> an eclair with a public abstract class and a private value implementation.
> Now, existing code just works (source and binary) — and optionals are values.
> But, this isn’t good enough; existing variables of type Optional are not
> flattened.
> 
> One of the objections raised to in-place migration was nullity; in order to
> migrate Optional to a true value, it would have to be a null-default value, and
> this already entailed compromises.  If we’re willing to compromise further, we
> can get what we want without the baggage.  And that compromise is: give up the
> name.
> 
> So we define a new public value class `Opt<T>` which is the value half of the
> eclair, and the existing Optional is the interface/abstract class half.  Now,
> existing fields / arrays can migrate gradually to Opt, as they want the benefit
> of flattening; existing APIs can continue to truck in Optional (which have
> about the same optimizations as a null-default value would have on the stack.)
> 
> This works because of the boxing conversion.  Suppose we have old code that
> does:
> 
>    Optional o = makeAnOptional();
> 
> when the user changes this to
> 
>    Opt o = …
> 
> the compiler sees the RHS is an Optional and the LHS is an Opt, and there is a
> boxing conversion between them, so we insert an unbox conversion (null check)
> and we’re done.  Users can migrate their fields gradually.  The cost: the good
> name gets burned.  But there is a compatible migration path from ref to value.
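
A sketch of the migrated shape, assuming hypothetical names since the JDK's `java.util.Optional` cannot be modified here: `LegacyOptional` stands for the abstract class that keeps the old name and API, and a final class `Opt` stands for the public value half. `Migration.migrated()` writes out, as an explicit null check plus cast, the unbox conversion the compiler would insert when a variable's declared type changes from the old name to the new one.

```java
import java.util.NoSuchElementException;
import java.util.Objects;

// Hypothetical stand-in for the migrated Optional: keeps the old name and API.
abstract sealed class LegacyOptional<T> permits Opt {
    public abstract boolean isPresent();
    public abstract T get();

    // Existing factory, still typed with the old name, so old call sites keep working.
    public static <T> LegacyOptional<T> of(T value) {
        return Opt.of(value);
    }
}

// Hypothetical stand-in for the public value half (a final class here).
final class Opt<T> extends LegacyOptional<T> {
    private final T value;  // null encodes "empty" in this sketch

    private Opt(T value) { this.value = value; }

    static <T> Opt<T> of(T value) { return new Opt<>(Objects.requireNonNull(value)); }
    static <T> Opt<T> empty() { return new Opt<>(null); }

    @Override public boolean isPresent() { return value != null; }

    @Override public T get() {
        if (value == null) throw new NoSuchElementException("No value present");
        return value;
    }
}

final class Migration {
    // Old code, unchanged: still produces the old type.
    static LegacyOptional<String> makeAnOptional() {
        return LegacyOptional.of("hello");
    }

    // Migrated variable: declared type changed from LegacyOptional to Opt.
    // The null check plus cast is what the unbox conversion amounts to.
    static Opt<String> migrated() {
        LegacyOptional<String> o = makeAnOptional();
        return (Opt<String>) Objects.requireNonNull(o);
    }
}
```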
> 
> Later, when we have bridges (we don’t need them yet!), we can migrate the
> library uses from Optional to Opt.
> 
> ## Null-default values
> 
> About 75% of the motivation for null-default values — another huge source of
> complexity — was to support the migration of value-based classes.  And it
> wasn’t even a great solution — because we still lost some key optimizations
> (e.g., calling conventions.)   With the Optional -> Opt path, we don’t need
> null-default values, we get ordinary values.  So while we pay the cost of
> changing the name, we gain the benefit that, once the full migration is
> effected, the new values don’t carry the legacy performance baggage.
> 
> Another 20% of the motivation was for security-sensitive classes whose default
> value did not represent a useful value, for which we wanted not
> null-default-ness but really initialization safety.  Let’s look at another way
> to get there.
> 
> There are a few ways to get there.  One is to treat this problem as protecting
> such classes from uninitialized fields or array elements; another is to ensure
> that such classes (a) have no public fields and (b) perform the correct check
> at the top of each method (which can be injected by the compiler.)  I don’t
> want to solve that problem right here, but I think there are enough ways to get
> there that we can assume this isn’t a hard requirement.
> 
> The other 5% was just the user-based “I want null in my value set.”  For those,
> we can tell users: use the interface box when you need null.
> 
> ## Summary
> 
> In one swoop, we can banish LV from the VM, V? from the language, and
> null-default values, by making a simple requirement: every value type is paired
> with an interface or abstract class “box”.  For most values, this can be
> automatically generated by the compiler and denoted via a well-known name
> (e.g., V.Box); for some values, such as those that are migrated from reference
> types, we can explicitly declare the box type and pick explicit names for both
> types.
> 
> There’s a lot to work out, but I think it should be clear enough that this is a
> much, much lower energy state than what we were aiming at for L10, and also a
> simpler user model.
> 
> Let’s focus discussions on validating the model first before we dive into
> mechanism or surface syntax.

Trying to implement the eclair interface by hand,
it seems that the methods of the interface and those of the implementation need covariant return types,
the box version returning a box while the inline class version returns the inline class (which is fine because it's a subtype);
otherwise, when you call a method of the inline class, the result is the box, so you lose the non-null property when chaining calls.

With Option as the inline class and Option.Box as the eclair interface,
  Option.Box<String> box = Option.Box.of("foo");
  box.filter() should return an Option.Box<String>
while
  Option<String> option =
  option.filter() should return an Option<String>

If I'm not wrong about the need for covariant return types, it suggests that we need a desugaring mechanism, at least to take care of the covariant return types automatically.
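
A minimal sketch of the covariant-return pattern described above, in plain Java 17 with hypothetical names: `OptionBox` plays the eclair interface and a final class `Option` plays the inline class. This is not the code from the linked repository (which nests the types the other way round); it only shows that overriding `filter` with the narrower return type keeps chained calls on an `Option` typed as `Option`, while calls through `OptionBox` still return the box.

```java
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.function.Predicate;

// Hypothetical eclair interface.
sealed interface OptionBox<T> permits Option {
    boolean isPresent();
    T get();
    OptionBox<T> filter(Predicate<? super T> predicate);
}

// Hypothetical stand-in for the inline class (a final class with identity here).
final class Option<T> implements OptionBox<T> {
    private final T value;  // null encodes "empty" in this sketch

    private Option(T value) { this.value = value; }

    static <T> Option<T> of(T value) { return new Option<>(Objects.requireNonNull(value)); }
    static <T> Option<T> empty() { return new Option<>(null); }

    @Override public boolean isPresent() { return value != null; }

    @Override public T get() {
        if (value == null) throw new NoSuchElementException("No value present");
        return value;
    }

    // Covariant override: the return type narrows from OptionBox<T> to Option<T>,
    // so a chain of calls on an Option never widens to the nullable box type.
    @Override public Option<T> filter(Predicate<? super T> predicate) {
        if (value != null && predicate.test(value)) return this;
        return empty();
    }
}
```

The chained call `option.filter(p1).filter(p2)` compiles to `Option<T>` without any cast; without the covariant override, the first `filter` would already return `OptionBox<T>`.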

Here is an example, but in reverse order, with OptionEclair as the eclair and OptionEclair.val as the inline type:
  https://github.com/forax/valuetype-lworld/blob/master/src/main/java/fr.umlv.valuetype/fr/umlv/valuetype/OptionEclair.java

A quick run of a benchmark seems to indicate that if the code uses the eclair on the stack, it's as fast as using an inline class when the code is fully inlined:
  https://github.com/forax/valuetype-lworld/blob/master/src/test/java/fr.umlv.valuetype/fr/umlv/valuetype/perf/OptionBenchMark.java#L67

Rémi








More information about the valhalla-spec-observers mailing list