Collapsing the requirements
Brian Goetz
brian.goetz at oracle.com
Mon Aug 5 02:15:16 UTC 2019
I’m not really following the concern you’re having.
Let’s divide into two cases:
- Inline classes declared explicitly with their box (e.g., Optional)
- Inline classes declared without a box, which get an implicit box.
So let’s say you have:
inline class Foo<T> implements Bar<T> {
Foo<T> id(Foo<T> f) { return f; }
}
This would acquire a nested interface Foo.Box, as a super type of Foo:
interface Box<T> extends Bar<T> {
Foo<T> id(Foo<T> f);
}
Can you outline what you see as going wrong here?
> On Aug 4, 2019, at 1:06 PM, Remi Forax <forax at univ-mlv.fr> wrote:
>
> ----- Mail original -----
>> De: "Remi Forax" <forax at univ-mlv.fr>
>> À: "Brian Goetz" <brian.goetz at oracle.com>
>> Cc: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
>> Envoyé: Samedi 3 Août 2019 19:48:01
>> Objet: Re: Collapsing the requirements
>
>> ----- Mail original -----
>>> De: "Brian Goetz" <brian.goetz at oracle.com>
>>> À: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
>>> Envoyé: Samedi 3 Août 2019 18:37:56
>>> Objet: Collapsing the requirements
>>
>>> As Remi noted, we had some good discussions at JVMLS this week. Combining that
>>> with some discussions John and I have been having over the past few weeks, I
>>> think the stars are aligning to enable us to dramatically slim down the
>>> requirements. The following threads have been in play for a while:
>>>
>>> - John: I hate the LPoint/QPoint distinction
>>> - Brian: I hate null-default types
>>> - Remi: I hate the V? type
>>>
>>> But the argument for each of these depended, in some way, on the others. I
>>> believe, with a few compromises, we can now prune them as a group, which would
>>> bring us to a much lower energy state.
>>>
>>> ## L^Q World — Goodbye `LV;`
>>>
>>> We’ve taken it as a requirement that for a value type V, we have to support both
>>> LV and QV, where LV is the null-adjunction of QV. This has led to a lot of
>>> complexity in the runtime, where we have to manage dual mirrors.
>>>
>>> The main reason why we wanted LV was to support in-place migration. (In
>>> Q-world, LV was the box for QV, so it was natural for migration.) But, as
>>> we’ve worked our migration story, we’ve discovered we may not need LV for
>>> migration. And if we don’t, we surely don’t need it for anything else;
>>> worst-case, we can erase LV to `LValObject` or even `LObject` (or, if we’re
>>> worried about erasure and overloading, to something like `LObject//V` using
>>> John’s type-operator notation.)
>>>
>>> Assuming we can restructure the migration story to not require LV to represent a
>>> VM-generated “box" — which I believe we can, see below — we can drop the
>>> requirement for LV. An inline class V gives rise to a single type descriptor,
>>> QV (or whatever we decide to call it; John may have plans here.)
>>>
>>> ## Goodbye `V?`
>>>
>>> The other reason we wanted LV was that it was the obvious representation for the
>>> language type `V?` (V adjoined with null.) Uses for `V?` include:
>>>
>>> - Denoting non-flattened value fields;
>>> - Denoting non-flattened value arrays;
>>> - Denoting erased generics over values (`Foo<V?>`);
>>> - Denoting the type that is the adjunction of null to V (V | Null), when we
>>> really want to talk about nullability.
>>>
>>> But, we can do all this without a `V?` type; for every V, there is already at
>>> least one super type of V that includes `V|Null` — Object, and any interface
>>> implemented by V. If we arrange that every value type V has a super type V’,
>>> not implemented by any other type — then the value set of this V’ is exactly
>>> that of `V?`. And we can use V’ to do all the things `V?` did with respect to
>>> V — including sub typing. The language doesn’t need the `?` type operator, it
>>> just needs to ensure that V’ always exists. Which turns out to be easy, and
>>> also turns out to be essential to the migration story.
>>>
>>> #### Eclairs
>>>
>>> We can formalize this by requiring that every value type have a companion
>>> interface (or abstract class) supertype. Define an envelope-class pair
>>> (“eclair”) as a pair (V, I) such that:
>>>
>>> - V is an inline class
>>> - I is a sealed type
>>> - I permits V (and only V)
>>> - V <: I
>>>
>>> (We can define eclairs for indirect classes, but they are less interesting —
>>> because indirect classes already contain null.)
>>>
>>> If every value type be a member of an eclair, we can use V when we want the
>>> flattenable, non-nullable, specializable type; and we use I when we want the
>>> non-flattenable, nullable, erased “box”. We don’t need to denote `V?`; we can
>>> just use I, which is an ordinary, nominal type.
>>>
>>> Note that the VM can optimize eclairs about as well as it could for LV; it knows
>>> that I is the adjunction of null to V, so that all non-null values of I are
>>> identity free and must be of type V.
>>>
>>> What we lose relative to V? is access to fields; it was possible to do
>>> `getfield` on a LV, but not on I. If this is important (and maybe it’s not),
>>> we can handle this in other ways.
>>>
>>> #### With sugar on top, please
>>>
>>> We can provide syntax sugar (please, let’s not bike shed it now) so that an
>>> inline clause _automatically_ acquires a corresponding interface (if one is not
>>> explicitly provided), onto which the public members (and type variables, and
>>> other super types) of C are lifted. For sake of exposition, let’s say this is
>>> called `C.Box` — and is a legitimate inner class of C (which can be generated
>>> by the compiler as an ordinary classfile.) We’ve been here before, and
>>> abandoned it because “Box” seemed misleading, but let’s call it that for now.
>>> And now it is a real nominal type, not a fake type. In the simplest case,
>>> merely declaring an inline class could give rise to V.Box.
>>>
>>> Now, the type formerly known as `V?` is an ordinary, nominal interface (or
>>> abstract class) type. The user can say what they mean, and no magic is needed
>>> by either the language or the VM. Goodbye `V?`.
>>>
>>> #### Boxing conversion
>>>
>>> Given the constraints of the eclair relationship, it would be reasonable for the
>>> compiler to derive from this that there is a boxing conversion between C and I
>>> (I is just the value set of C, plus null — which is the relationship boxes have
>>> with their corresponding primitives.) The boxing operation is a no-op (since C
>>> <: I) and the unboxing operation is a null checking cast.
>>>
>>> #### Erased generics
>>>
>>> Using the eclair wrapper also kicks the problem of erased generics down the
>>> road; if we use `Foo<I>` for erased generics, and temporarily ban `Foo<V>`,
>>> when we get to specialized generics, it will be obvious what `Foo<V>` means
>>> (their common super type will be `Foo<? extends I>`). This is a less confusing
>>> world, as then “List of erased V” and “List of specialized V” don’t coexist;
>>> there’s only “List of V” and “List of V’s Box”.
>>>
>>> ## Migration
>>>
>>> The ability to migrate Optional and friends to values has been an important
>>> goal, but it has been the source of significant complexity. Our previous story
>>> leaned hard on “When we migrate X to a value, LX will describe the box, so old
>>> callsites will continue to link.” But it turned out that brought a lot of
>>> baggage (forwarding bridges, null-default values) and compromises (null-default
>>> values lose their calling-convention optimizations), and over the past few
>>> weeks John and I have been cooking up a simpler eclair-based recipe for this.
>>>
>>> The world is indeed full of existing utterances of `LOptional`, and they will
>>> still want to work. Fortunately, Optional follows the rules for being a
>>> value-based class. We start with migrating Optional from a reference class to
>>> an eclair with a public abstract class and a private value implementation.
>>> Now, existing code just works (source and binary) — and optionals are values.
>>> But, this isn’t good enough; existing variables of type Optional are not
>>> flattened.
>>>
>>> One of the objections raised to in-place migration was nullity; in order to
>>> migrate Optional to a true value, it would have to be a null-default value, and
>>> this already entailed compromises. If we’re willing to compromise further, we
>>> can get what we want without the baggage. And that compromises is: give up the
>>> name.
>>>
>>> So we define a new public value class `Opt<T>` which is the value half of the
>>> eclair, and the existing Optional is the interface/abstract class half. Now,
>>> existing fields / arrays can migrate gradually to Opt, as they want the benefit
>>> of flattening; existing APIs can continue to truck in Optional (which have
>>> about the same optimizations as a null-default value would have on the stack.)
>>>
>>> This works because of the boxing conversion. Suppose we have old code that
>>> does:
>>>
>>> Optional o = makeAnOptional()
>>>
>>> when the user changes this to
>>>
>>> Opt o = …
>>>
>>> the compiler seems the RHS is an Optional and the LHS is a Opt, and there is a
>>> boxing conversion between them, so we insert an unbox conversion (null check)
>>> and we’re done. Users can migrate their fields gradually. The cost: the good
>>> name gets burned. But there is a compatible migration path from ref to value.
>>>
>>> Later, when we have bridges (we don’t need them yet!), we can migrate the
>>> library uses from Optional to Opt.
>>>
>>> ## Null-default values
>>>
>>> About 75% of the motivation for null-default values — another huge source of
>>> complexity — was to support the migration of value-based classes. And it
>>> wasn’t even a great solution — because we still lost some key optimizations
>>> (e.g., calling conventions.) With the Optional -> Opt path, we don’t need
>>> null-default values, we get ordinary values. So while we pay the cost of
>>> changing the name, we gain the benefit that the new values, once the full
>>> migration is effected, we don’t carry the legacy performance baggage.
>>>
>>> Another 20% of the motivation was for security-sensitive classes whose default
>>> value did not represent a useful value, for which we wanted not
>>> null-default-ness but really initialization safety. Let’s look at another way
>>> to get there.
>>>
>>> There are a few ways to get there. One is to treat this problem as protecting
>>> such classes from uninitialized fields or array elements; another is to ensure
>>> that such classes (a) have no public fields and (b) perform the correct check
>>> at the top of each method (which can be injected by the compiler.) I don’t
>>> want to solve that problem right here, but I think there enough ways to get
>>> there that we can assume this isn’t a hard requirement.
>>>
>>> The other 5% was just the user-based “I want null in my value set.” For those,
>>> we can tell users: use the interface box when you need null.
>>>
>>> ## Summary
>>>
>>> In one swoop, we can banish LV from the VM, V? from the language, and
>>> null-default values, by making a simple requirement: every value type is paired
>>> with an interface or abstract class “box”. For most values, this can be
>>> automatically generated by the compiler and denoted via a well-known name
>>> (e.g., V.Box); for some values, such as those that are migrated from reference
>>> types, we can explicitly declare the box type and pick explicit names for both
>>> types.
>>>
>>> There’s a lot to work out, but I think it should be clear enough that this is a
>>> much, much lower energy state than what we were aiming at for L10, and also a
>>> simpler user model.
>>>
>>> Let’s focus discussions on validating the model first before we dive into
>>> mechanism or surface syntax.
>>
>> Trying to implement the Eclair interface by hand,
>> it seems we need to have the method of the interface and the one of the
>> implementation to use covariant return types,
>> the box version retuning a box while the inline class version returning the
>> inline class (which is fine because it's a subtype),
>> otherwise when you call a method of the inline class the result is the box so
>> you are loosing the non-null property when chaining calls.
>
> so depending on
> - if you want to 'emulate' a value based class, in that case the eclair is by example Optional and the inline class can have a specific name
> - if you want an inline class and an eclair only for interacting with erased generics, like Complex and Complex.box.
> - if the inline class use co-variant return types.
>
> so no good solution that will fit them all, which suggests that we should not provide any special compiler support.
>
> Rémi
More information about the valhalla-spec-observers
mailing list