Collapsing the requirements
John Rose
john.r.rose at oracle.com
Mon Aug 5 21:41:48 UTC 2019
Yay! Excellent summary of a major break in the logjam!
On Aug 3, 2019, at 9:37 AM, Brian Goetz <brian.goetz at oracle.com> wrote:
>
> As Remi noted, we had some good discussions at JVMLS this week. Combining that with some discussions John and I have been having over the past few weeks, I think the stars are aligning to enable us to dramatically slim down the requirements. The following threads have been in play for a while:
>
> - John: I hate the LPoint/QPoint distinction
(Expansion: I have come to dislike the costs, in the JVMS, of disambiguating Point into indirect-Point and inline-Point. The root cause is two meanings for the same name Point.)
> - Brian: I hate null-default types
> - Remi: I hate the V? type
>
> But the argument for each of these depended, in some way, on the others. I believe, with a few compromises, we can now prune them as a group, which would bring us to a much lower energy state.
>
> ## L^Q World — Goodbye `LV;`
>
> We’ve taken it as a requirement that for a value type V, we have to support both LV and QV, where LV is the null-adjunction of QV. This has led to a lot of complexity in the runtime, where we have to manage dual mirrors.
>
> The main reason why we wanted LV was to support in-place migration. (In Q-world, LV was the box for QV, so it was natural for migration.) But, as we’ve worked our migration story, we’ve discovered we may not need LV for migration. And if we don’t, we surely don’t need it for anything else; worst-case, we can erase LV to `LValObject` or even `LObject` (or, if we’re worried about erasure and overloading, to something like `LObject//V` using John’s type-operator notation.)
(The basic move Brian is alluding to is to distinguish the verifier type T0 from additional “type decorations” T1, and encode the descriptor in the form T0//T1. The verifier probably ignores “//T1”, or at least pays attention only to a small set of “hardwired” instances of “//x”, perhaps “//n” for not-null. The JVM is free to use only the T0 prefix to build calling sequences. The “//T1” part is not necessarily enforced, but it gives translation strategies a hook to attach unchecked “intentionality” to the bare type T0. JITs might use the “//T1” part in speculative predication, which would win as long as the translation strategy wasn’t “polluted”. Overloads of m(T0//T1), m(T0//T2), and m(T0) are all distinct. The effect is similar to that of interface types, which are also unchecked, at least as arguments and returns of simple method calls. The reflective properties of T0//T1 are TBD.)
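To make the overload point concrete, here is a tiny illustration. The “//” decoration below is only the provisional notation from this thread, not anything a real JVM accepts today; the method name m and the Object erasure are placeholders.

```java
// Illustrative only: the "//" decoration is provisional notation from this
// thread, not real descriptor syntax; the strings just show how the pieces
// would line up.
class DescriptorDecorationSketch {
    static final String PLAIN    = "(Ljava/lang/Object;)V";     // bare verifier type T0 = Object
    static final String INLINE_V = "(Ljava/lang/Object//V;)V";  // same T0, decorated with inline class V
    static final String NOT_NULL = "(Ljava/lang/Object//n;)V";  // same T0, decorated "not-null"
    // Under the scheme sketched above, a method m declared with each of these
    // three descriptors would be a distinct overload for resolution, while the
    // verifier and the calling convention look only at the T0 prefix.
}
```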
>
> Assuming we can restructure the migration story to not require LV to represent a VM-generated “box” — which I believe we can, see below — we can drop the requirement for LV. An inline class V gives rise to a single type descriptor, QV (or whatever we decide to call it; John may have plans here.)
More on John’s plans: The best way to trigger pre-loading of V robustly and consistently seems to be to have a new descriptor letter x != ‘L’; right now x == ‘Q’. We’ve talked through a lot of other ways to keep “LV;” and drive preloading (or other oracular schema queries) from another signal channel, but nothing works as simply and reliably as a new descriptor letter. My best prediction right now is that we keep ‘Q’ for now, and if we want to further decouple null-hostility (which is an aspect of today’s ‘Q’) from preloading (which is logically independent), then we shift to another letter (‘G’) to drive the preloading and signal non-nullity using a type operator like “//n”, which might be partially enforced by the verifier. The factor which might drive such decoupling in the future is the support for templates. A template instance might be created in response to a preload-mode descriptor like “QOpt[QPoint;];”, even if the expansion of the template for some reason works out to a pointer which permits null. In such a hypothetical case, the presence of ‘Q’ means “expand when you see it”, not “preload and by the way not null”. Make sense?
>
> ## Goodbye `V?`
>
> The other reason we wanted LV was that it was the obvious representation for the language type `V?` (V adjoined with null.) Uses for `V?` include:
>
> - Denoting non-flattened value fields;
> - Denoting non-flattened value arrays;
> - Denoting erased generics over values (`Foo<V?>`);
> - Denoting the type that is the adjunction of null to V (V | Null), when we really want to talk about nullability.
>
> But we can do all this without a `V?` type; for every V, there is already at least one supertype of V that includes `V|Null` — Object, and any interface implemented by V. If we arrange that every value type V has a supertype V’, implemented by no other type, then the value set of V’ is exactly that of `V?`. And we can use V’ to do all the things `V?` did with respect to V — including subtyping. The language doesn’t need the `?` type operator; it just needs to ensure that V’ always exists. That turns out to be easy, and also turns out to be essential to the migration story.
>
> #### Eclairs
>
> We can formalize this by requiring that every value type have a companion interface (or abstract class) supertype. Define an envelope-class pair (“eclair”) as a pair (V, I) such that:
>
> - V is an inline class
> - I is a sealed type
> - I permits V (and only V)
> - V <: I
>
> (We can define eclairs for indirect classes, but they are less interesting — because indirect classes already contain null.)
>
> If every value type is a member of an eclair, we can use V when we want the flattenable, non-nullable, specializable type, and I when we want the non-flattenable, nullable, erased “box”. We don’t need to denote `V?`; we can just use I, which is an ordinary, nominal type.
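To make the shape concrete, here is a minimal sketch of an eclair declared by hand; the `inline class` modifier and the sealed-interface spelling are the provisional syntax under discussion, and Point/PointBox are example names only.

```java
// A minimal sketch, assuming the provisional `inline class` and sealed-interface
// syntax under discussion; Point and PointBox are example names only.
sealed interface PointBox permits Point {
    int x();
    int y();
}

inline class Point implements PointBox {
    private final int x, y;
    public Point(int x, int y) { this.x = x; this.y = y; }
    public int x() { return x; }
    public int y() { return y; }
}
```

A field or array declared as PointBox is then the nullable, non-flattened view, and one declared as Point is the flattenable, non-nullable view; both are ordinary nominal types.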
>
> Note that the VM can optimize eclairs about as well as it could for LV; it knows that I is the adjunction of null to V, so that all non-null values of I are identity-free and must be of type V.
>
> What we lose relative to `V?` is access to fields; it was possible to do `getfield` on an LV, but not on I. If this is important (and maybe it’s not), we can handle this in other ways.
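One possible “other way”, sketched only: the lifting could expose accessor methods on I, so a client holding the interface view calls a method where it would have used getfield on LV. Using the hypothetical PointBox from the sketch above:

```java
// Hypothetical workaround, not a decided design: field access through the box
// goes via a lifted accessor rather than getfield.
static int xOf(PointBox p) {   // p may be null; NPE here behaves like any interface call
    return p.x();              // invokeinterface, where getfield would have served on LPoint;
}
```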
>
> #### With sugar on top, please
>
> We can provide syntax sugar (please, let’s not bikeshed it now) so that an inline class C _automatically_ acquires a corresponding interface (if one is not explicitly provided), onto which the public members (and type variables, and other supertypes) of C are lifted. For the sake of exposition, let’s say this is called `C.Box` — and is a legitimate inner class of C (which can be generated by the compiler as an ordinary classfile.) We’ve been here before, and abandoned it because “Box” seemed misleading, but let’s call it that for now. And now it is a real nominal type, not a fake type.
> In the simplest case, merely declaring an inline class could give rise to V.Box.
I sincerely hope we can do this little trick, so that you can write one-liner inline types (such as records) without mandated interface boilerplate. This interacts with the rules for lifting methods from V into V.Box (see below).
But IDEs would be able to refactor the code to make the sugary default visible (for further editing) or invisible again. Point of comparison: this is roughly how Java treats default constructors (in classes which don’t declare their own constructors). It is as if the class’s author had written the “obvious trivial” constructor with no arguments and an empty body. And IDEs can in principle reveal or suppress such trivial members as routine refactorings.
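To illustrate the intended ergonomics, a hedged sketch with the provisional syntax and the provisional name Box; the expansion shown in the comment is only one plausible shape, not a decided design.

```java
// What the user would write under the sugar: a one-liner inline class with no
// mandated interface boilerplate (Meter is just an example name).
inline class Meter { public final double value; }

// Roughly the sort of thing the compiler would be treated as having generated
// (nesting, naming, and the exact lifting rules are all TBD above):
//
//   inline class Meter implements Meter.Box {
//       public final double value;
//       public sealed interface Box permits Meter {
//           double value();   // public member of Meter, lifted by some means TBD
//       }
//   }
```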
> Now, the type formerly known as `V?` is an ordinary, nominal interface (or abstract class) type. The user can say what they mean, and no magic is needed by either the language or the VM. Goodbye `V?`.
(Thunderous applause as the crowd goes wild! At least, I’m getting a little excited here.)
Part of the model, I think, is that the I aspect of an eclair is “just as functional” as the V aspect, excepting only the presence of null and NPE. This means we need a mechanism (which I won’t speculate on here; there are multiple possibilities) for lifting the public methods of V into I so they can be invoked via I. But here are some related user model questions:
Is there ever a case where a non-sealed super of V is “good enough”, or is there always a 1-1 relation between every V and its I?
(A type V without a sealed I would still have supers like Object and Comparable<? extends Comparable & ValObject>. But its box operation would go to a type like Object, which isn’t unambiguously unboxed back to V. One might call such a half-baked type “weakly boxable”, a raw biscuit rather than a tasty eclair. Is it possible? Is it worth it?)
Given a proper eclair (V and I in 1-1 relation), we can also ask, how do the following sets compare: The instance methods of V and those of I? Also the static methods of V and those of I? Also the instance fields of V and the methods of I? Also the static fields of V and the fields and/or methods of I? And the nested types of V and I?
(Quick take: Public instance fields of V are all present in I by some means TBD. Users can, but shouldn’t, define members in I such as default methods, constants, static factories, etc. Such things always work better on V, unless there is a compatibility play going on, in which case perhaps the members should be defined in *both* places as a best practice.)
Do we always nest I=V.Box inside of V, or do we sometimes allow other couplings, such as I, V as sibling members of another type or package, or V=I.Val inside of I?
(Quick take: Flexibility is good, plus clear best practices. Assuming all the 1-1 relations are always derivable at compile time and run time.)
How does this interact with nested classes? If one inline nests inside another, is there any tricky way to go from inner to outer via the box? (Probably not.)
Given a nested inner inline type V as C.V, where every V has an outer C instance, what is C.V.default? (That was a solve from NDVs. Maybe we can somehow rewrite vulnerable expressions of type C.V as C.V.Box and let them go to null? Maybe the array type C.V[] is hard to obtain and the language directs you instead to C.V.Box[]?)
Since V=I.Val and I=V.Box are statically and dynamically related, it appears that an erased generic Foo<T> can potentially instantiate as Foo<I> and then inside its type signature make use of the static type V=I.Val, spelled as “T.Val” or “unbox<T>” or some such. The unboxing casts to V (which reject null) would be planted around the edges of the generic at all uses of its instance, not inside it. This seems possible to me; is it useful enough to justify the weird intrinsic type operator in the JLS? Probably not, but I’m putting it out there anyway.
(What’s T.Val / unbox<T> when T is not a ValObject? Maybe it’s just T. Or maybe T.Val is only allowed to instantiate when T is of the form V.Box, but there’s no bound that expresses that, currently, a hint that this “feature” is DOA. But, a possible use of T.Val / unbox<T> would be in argument and return positions of erased generics, in all places where null should be excluded. There’s no need for T.Box / box<T> since T cannot ever be an inline type V. Today’s generics could be upgraded to be smarter about nulls if they could “guard” their arguments and return values with box<T> instead of T.)
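Purely to pin down what the previous two paragraphs describe, and echoing the caveat that this is probably DOA: a speculative sketch of an erased generic using the hypothetical unbox<T> / T.Val operator in argument and return positions. Since the generic is erased, the null-rejecting unboxing casts would be planted by the compiler at the use sites of each instance, not inside the class.

```java
// Speculative sketch only: unbox<T> / T.Val is not real syntax, and Meter is
// the hypothetical inline class from the earlier sketch.
class Holder<T> {
    private T boxed;                       // erased storage, as in today's generics
    void put(unbox<T> v) { boxed = v; }    // null excluded here when T is a box type
    unbox<T> get() { return boxed; }       // likewise for the return edge
}

// At a use site such as Holder<Meter.Box>, the edges would surface Meter itself:
//   Holder<Meter.Box> h = new Holder<>();
//   h.put(new Meter(1.0));     // ok
//   Meter m = h.get();         // compiler-planted cast rejects null here
```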
— John