Collapsing the requirements
Remi Forax
forax at univ-mlv.fr
Sun Aug 4 20:06:26 UTC 2019
----- Original Message -----
> From: "Remi Forax" <forax at univ-mlv.fr>
> To: "Brian Goetz" <brian.goetz at oracle.com>
> Cc: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Saturday, August 3, 2019 19:48:01
> Subject: Re: Collapsing the requirements
> ----- Original Message -----
>> From: "Brian Goetz" <brian.goetz at oracle.com>
>> To: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
>> Sent: Saturday, August 3, 2019 18:37:56
>> Subject: Collapsing the requirements
>
>> As Remi noted, we had some good discussions at JVMLS this week. Combining that
>> with some discussions John and I have been having over the past few weeks, I
>> think the stars are aligning to enable us to dramatically slim down the
>> requirements. The following threads have been in play for a while:
>>
>> - John: I hate the LPoint/QPoint distinction
>> - Brian: I hate null-default types
>> - Remi: I hate the V? type
>>
>> But the argument for each of these depended, in some way, on the others. I
>> believe, with a few compromises, we can now prune them as a group, which would
>> bring us to a much lower energy state.
>>
>> ## L^Q World — Goodbye `LV;`
>>
>> We’ve taken it as a requirement that for a value type V, we have to support both
>> LV and QV, where LV is the null-adjunction of QV. This has led to a lot of
>> complexity in the runtime, where we have to manage dual mirrors.
>>
>> The main reason why we wanted LV was to support in-place migration. (In
>> Q-world, LV was the box for QV, so it was natural for migration.) But, as
>> we’ve worked our migration story, we’ve discovered we may not need LV for
>> migration. And if we don’t, we surely don’t need it for anything else;
>> worst-case, we can erase LV to `LValObject` or even `LObject` (or, if we’re
>> worried about erasure and overloading, to something like `LObject//V` using
>> John’s type-operator notation.)
>>
>> Assuming we can restructure the migration story to not require LV to represent a
>> VM-generated “box” (which I believe we can, see below), we can drop the
>> requirement for LV. An inline class V gives rise to a single type descriptor,
>> QV (or whatever we decide to call it; John may have plans here.)
>>
>> ## Goodbye `V?`
>>
>> The other reason we wanted LV was that it was the obvious representation for the
>> language type `V?` (V adjoined with null.) Uses for `V?` include:
>>
>> - Denoting non-flattened value fields;
>> - Denoting non-flattened value arrays;
>> - Denoting erased generics over values (`Foo<V?>`);
>> - Denoting the type that is the adjunction of null to V (V | Null), when we
>> really want to talk about nullability.
>>
>> But, we can do all this without a `V?` type; for every V, there is already at
>> least one super type of V that includes `V|Null` — Object, and any interface
>> implemented by V. If we arrange that every value type V has a super type V’,
>> not implemented by any other type — then the value set of this V’ is exactly
>> that of `V?`. And we can use V’ to do all the things `V?` did with respect to
>> V — including sub typing. The language doesn’t need the `?` type operator, it
>> just needs to ensure that V’ always exists. Which turns out to be easy, and
>> also turns out to be essential to the migration story.
>>
>> #### Eclairs
>>
>> We can formalize this by requiring that every value type have a companion
>> interface (or abstract class) supertype. Define an envelope-class pair
>> (“eclair”) as a pair (V, I) such that:
>>
>> - V is an inline class
>> - I is a sealed type
>> - I permits V (and only V)
>> - V <: I
>>
>> (We can define eclairs for indirect classes, but they are less interesting —
>> because indirect classes already contain null.)
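
The four eclair conditions above can be approximated in current Java (17 or later) with a sealed interface. This is only a sketch under stated assumptions: `PointBox`, `Point` and `EclairDemo` are hypothetical names, and a record stands in for an inline class, so nothing here is flattened by the VM.

```java
// Hypothetical names: PointBox plays the role of I, Point the role of V.
// A record is only a stand-in for an inline class; it still has identity.
sealed interface PointBox permits Point {   // I is sealed and permits only V
    int x();
    int y();
}

record Point(int x, int y) implements PointBox {}   // V <: I

class EclairDemo {
    // Because PointBox permits only Point, every non-null PointBox is a Point.
    static String describe(PointBox box) {
        if (box == null) {
            return "null";                  // the one value I adds to V
        }
        Point p = (Point) box;              // cannot fail for a non-null box
        return "Point(" + p.x() + ", " + p.y() + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe(new Point(1, 2)));
        System.out.println(describe(null));
    }
}
```

The value set of `PointBox` is exactly that of `Point` plus null, which is the property the eclair is meant to guarantee.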
>>
>> If every value type is a member of an eclair, we can use V when we want the
>> flattenable, non-nullable, specializable type; and we use I when we want the
>> non-flattenable, nullable, erased “box”. We don’t need to denote `V?`; we can
>> just use I, which is an ordinary, nominal type.
>>
>> Note that the VM can optimize eclairs about as well as it could for LV; it knows
>> that I is the adjunction of null to V, so that all non-null values of I are
>> identity free and must be of type V.
>>
>> What we lose relative to V? is access to fields; it was possible to do
>> `getfield` on a LV, but not on I. If this is important (and maybe it’s not),
>> we can handle this in other ways.
>>
>> #### With sugar on top, please
>>
>> We can provide syntax sugar (please, let’s not bike shed it now) so that an
>> inline class _automatically_ acquires a corresponding interface (if one is not
>> explicitly provided), onto which the public members (and type variables, and
>> other super types) of C are lifted. For sake of exposition, let’s say this is
>> called `C.Box` — and is a legitimate inner class of C (which can be generated
>> by the compiler as an ordinary classfile.) We’ve been here before, and
>> abandoned it because “Box” seemed misleading, but let’s call it that for now.
>> And now it is a real nominal type, not a fake type. In the simplest case,
>> merely declaring an inline class could give rise to V.Box.
>>
>> Now, the type formerly known as `V?` is an ordinary, nominal interface (or
>> abstract class) type. The user can say what they mean, and no magic is needed
>> by either the language or the VM. Goodbye `V?`.
>>
>> #### Boxing conversion
>>
>> Given the constraints of the eclair relationship, it would be reasonable for the
>> compiler to derive from this that there is a boxing conversion between C and I
>> (I is just the value set of C, plus null — which is the relationship boxes have
>> with their corresponding primitives.) The boxing operation is a no-op (since C
>> <: I) and the unboxing operation is a null checking cast.
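
A minimal sketch of what the two conversions would amount to, written by hand since no compiler inserts them today. `ComplexBox`, `Complex` and `BoxingDemo` are hypothetical names, and a final class stands in for an inline class.

```java
import java.util.Objects;

// Hypothetical eclair: ComplexBox is the nullable box, Complex the value half.
interface ComplexBox {
    double re();
    double im();
}

final class Complex implements ComplexBox {
    private final double re;
    private final double im;

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    @Override public double re() { return re; }
    @Override public double im() { return im; }
}

class BoxingDemo {
    // "Boxing": a plain widening reference conversion, nothing is allocated.
    static ComplexBox box(Complex c) {
        return c;
    }

    // "Unboxing": a null check followed by a cast.
    static Complex unbox(ComplexBox b) {
        return (Complex) Objects.requireNonNull(b, "cannot unbox null");
    }

    public static void main(String[] args) {
        Complex c = new Complex(1.0, 2.0);
        ComplexBox boxed = box(c);
        System.out.println(unbox(boxed) == c);   // same object, no copy
    }
}
```

In this sketch unboxing null throws `NullPointerException`, which matches how unboxing a null `Integer` behaves today.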
>>
>> #### Erased generics
>>
>> Using the eclair wrapper also kicks the problem of erased generics down the
>> road; if we use `Foo<I>` for erased generics, and temporarily ban `Foo<V>`,
>> when we get to specialized generics, it will be obvious what `Foo<V>` means
>> (their common super type will be `Foo<? extends I>`). This is a less confusing
>> world, as then “List of erased V” and “List of specialized V” don’t coexist;
>> there’s only “List of V” and “List of V’s Box”.
>>
>> ## Migration
>>
>> The ability to migrate Optional and friends to values has been an important
>> goal, but it has been the source of significant complexity. Our previous story
>> leaned hard on “When we migrate X to a value, LX will describe the box, so old
>> callsites will continue to link.” But it turned out that brought a lot of
>> baggage (forwarding bridges, null-default values) and compromises (null-default
>> values lose their calling-convention optimizations), and over the past few
>> weeks John and I have been cooking up a simpler eclair-based recipe for this.
>>
>> The world is indeed full of existing utterances of `LOptional`, and they will
>> still want to work. Fortunately, Optional follows the rules for being a
>> value-based class. We start with migrating Optional from a reference class to
>> an eclair with a public abstract class and a private value implementation.
>> Now, existing code just works (source and binary) — and optionals are values.
>> But, this isn’t good enough; existing variables of type Optional are not
>> flattened.
>>
>> One of the objections raised to in-place migration was nullity; in order to
>> migrate Optional to a true value, it would have to be a null-default value, and
>> this already entailed compromises. If we’re willing to compromise further, we
>> can get what we want without the baggage. And that compromise is: give up the
>> name.
>>
>> So we define a new public value class `Opt<T>` which is the value half of the
>> eclair, and the existing Optional is the interface/abstract class half. Now,
>> existing fields / arrays can migrate gradually to Opt, as they want the benefit
>> of flattening; existing APIs can continue to truck in Optional (which have
>> about the same optimizations as a null-default value would have on the stack.)
>>
>> This works because of the boxing conversion. Suppose we have old code that
>> does:
>>
>> Optional o = makeAnOptional()
>>
>> when the user changes this to
>>
>> Opt o = …
>>
>> the compiler sees the RHS is an Optional and the LHS is an Opt, and there is a
>> boxing conversion between them, so we insert an unbox conversion (null check)
>> and we’re done. Users can migrate their fields gradually. The cost: the good
>> name gets burned. But there is a compatible migration path from ref to value.
>>
>> Later, when we have bridges (we don’t need them yet!), we can migrate the
>> library uses from Optional to Opt.
>>
>> ## Null-default values
>>
>> About 75% of the motivation for null-default values — another huge source of
>> complexity — was to support the migration of value-based classes. And it
>> wasn’t even a great solution — because we still lost some key optimizations
>> (e.g., calling conventions.) With the Optional -> Opt path, we don’t need
>> null-default values, we get ordinary values. So while we pay the cost of
>> changing the name, we gain the benefit that the new values, once the full
>> migration is effected, don’t carry the legacy performance baggage.
>>
>> Another 20% of the motivation was for security-sensitive classes whose default
>> value did not represent a useful value, for which we wanted not
>> null-default-ness but really initialization safety. Let’s look at another way
>> to get there.
>>
>> There are a few ways to get there. One is to treat this problem as protecting
>> such classes from uninitialized fields or array elements; another is to ensure
>> that such classes (a) have no public fields and (b) perform the correct check
>> at the top of each method (which can be injected by the compiler.) I don’t
>> want to solve that problem right here, but I think there are enough ways to get
>> there that we can assume this isn’t a hard requirement.
>>
>> The other 5% was just the user-based “I want null in my value set.” For those,
>> we can tell users: use the interface box when you need null.
>>
>> ## Summary
>>
>> In one swoop, we can banish LV from the VM, V? from the language, and
>> null-default values, by making a simple requirement: every value type is paired
>> with an interface or abstract class “box”. For most values, this can be
>> automatically generated by the compiler and denoted via a well-known name
>> (e.g., V.Box); for some values, such as those that are migrated from reference
>> types, we can explicitly declare the box type and pick explicit names for both
>> types.
>>
>> There’s a lot to work out, but I think it should be clear enough that this is a
>> much, much lower energy state than what we were aiming at for L10, and also a
>> simpler user model.
>>
>> Let’s focus discussions on validating the model first before we dive into
>> mechanism or surface syntax.
>
> Trying to implement the eclair interface by hand,
> it seems the methods of the interface and those of the
> implementation need to use covariant return types,
> the box version returning a box while the inline class version returns the
> inline class (which is fine because it's a subtype),
> otherwise when you call a method of the inline class the result is the box, so
> you lose the non-null property when chaining calls.
So it depends on:
- whether you want to 'emulate' a value-based class, in which case the eclair is, for example, Optional and the inline class can have a specific name;
- whether you want an inline class and an eclair only for interacting with erased generics, like Complex and Complex.box;
- whether the inline class uses covariant return types.
So there is no single solution that fits them all, which suggests that we should not provide any special compiler support.
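
To make the covariant return point concrete, here is a hand-written sketch. `VecBox`, `Vec` and `CovariantDemo` are hypothetical names, and a final class stands in for the inline class.

```java
// Hypothetical eclair written by hand: VecBox is the interface half,
// Vec is the stand-in for the inline class.
interface VecBox {
    int x();
    VecBox plus(VecBox other);          // the box version returns the box
}

final class Vec implements VecBox {
    private final int x;

    Vec(int x) {
        this.x = x;
    }

    @Override public int x() { return x; }

    // Covariant override: returns Vec rather than VecBox, so a chain of
    // calls on a Vec stays typed as the non-null inline class.
    @Override public Vec plus(VecBox other) {
        return new Vec(x + other.x());
    }
}

class CovariantDemo {
    static int sum() {
        Vec a = new Vec(1);
        // If plus() returned VecBox here, this assignment would need a cast
        // (an unboxing null check) after every call in the chain.
        Vec b = a.plus(new Vec(2)).plus(new Vec(3));
        return b.x();
    }

    public static void main(String[] args) {
        System.out.println(sum());
    }
}
```

A compiler that lifts the public members of the inline class onto the generated interface would have to produce this pair of signatures for every method returning the class itself.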
Rémi