value type hygiene

Thu May 10 09:11:37 UTC 2018

----- Original Message -----
> From: "daniel smith" <daniel.smith at oracle.com>
> To: "John Rose" <john.r.rose at oracle.com>
> Cc: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Thursday, May 10, 2018 01:46:07
> Subject: Re: value type hygiene

>> On May 6, 2018, at 3:17 AM, John Rose <john.r.rose at oracle.com> wrote:
>> 
>> Like many of us, I have been thinking about the problems of keeping values,
>> nulls,
>> and objects separate in L-world.  I wrote up some long-ish notes on the subject.
>> I hope it will help us wrap our arms around the problem, and get it solved.
>> 
>> TL;DR:  Remi was right in January.  We need a ValueTypes attribute.
>> 
>> http://cr.openjdk.java.net/~jrose/values/value-type-hygiene.html
> 
> So I've been digesting this for a few days. I don't like it much. Subtle
> contextual dependencies are a good recipe for exploits and general confusion.
> If it were the only way forward, okay, but I find myself frequently thinking,
> "yeah, but... Q types!"
> 
> The way you've framed the problem has evolved from the original idea. Which is
> fine, but it's helpful to review: the idea was to make a choice between two
> type hierarchies, U-world and L-world:
> 
>  U
> / \
> L   Q
> 
> or
> 
>  L
> / \
> R   Q
> 
> The crux of the choice was: in what way do value types interact with legacy
> bytecode? Does the old code reject values, or does it get automatically
> enhanced to work with them?
> 
> We acknowledged that, in the latter hierarchy, we must push many operations into
> the top, which minimizes the need for 'R' and 'Q', perhaps so much that they
> can be elided entirely. You said in a November write-up:
> 
> "The Q-type syntax is *maybe* needed, but in any case does not appear in a
> parallel position of importance with the dominant L-type syntax."
> 
> In other words, working exclusively with L types wasn't a requirement, it was a
> might-be-nice.
> 
> So we set out on an experiment to see how far we could get without 'R' and 'Q'.
> My read of the current situation is that we've probably stretched that to the
> breaking point, so: good experiment, we've learned some things, and we
> understand what value 'Q' types give us.
> 
> Another read is that we're not ready to end the experiment yet, we have a few
> tricks up our sleeves, and we can force this to work. That's fair, but I'm not
> convinced we need to force it. Not changing descriptors is not a hard
> requirement.

Q-types (if the roots are j.l.Object + interfaces) and a ValueTypes attribute are two different encodings of the same semantics: either the descriptor is a Q-type, or the descriptor is an L-type and a side table says it is a Q-type.
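
As an illustration of the "two encodings, same semantics" point, here is a minimal sketch. It is not from the thread or from any Valhalla prototype: the "QFoo;" descriptor syntax is the hypothetical one discussed in this thread, the ValueTypes attribute is modeled as a plain set of class names (an assumption), and `EncodingDemo` and its methods are invented for the example.

```java
import java.util.Set;

class EncodingDemo {
    // Encoding 1 (hypothetical "QFoo;" syntax): the descriptor itself
    // says whether the type is a value type.
    static boolean isValueByDescriptor(String descriptor) {
        return descriptor.startsWith("Q");
    }

    // Encoding 2: the descriptor stays "LFoo;" and a side table, modeled
    // here as a set of class names (an assumption), marks the value types.
    static boolean isValueBySideTable(String descriptor, Set<String> valueTypes) {
        if (!descriptor.startsWith("L") || !descriptor.endsWith(";")) {
            return false;
        }
        String className = descriptor.substring(1, descriptor.length() - 1);
        return valueTypes.contains(className);
    }

    public static void main(String[] args) {
        Set<String> valueTypes = Set.of("Foo");
        System.out.println(isValueByDescriptor("QFoo;"));             // true
        System.out.println(isValueBySideTable("LFoo;", valueTypes));  // true
        System.out.println(isValueBySideTable("LBar;", valueTypes));  // false
    }
}
```

Both functions answer the same question for Foo; what differs is where the information lives.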

> 
> (To be clear about my preferred alternative: we introduce Q types as first-class
> types (applicable to value classes only), update the descriptor syntax, assert
> QFoo <: LFoo, and ask compilers to use Qs when they want to guarantee
> non-nullability and allow flattenability. Compilers generate bridge methods
> (and bridge fields?) where needed/if desired.)

The main difference between the two encodings is that you have to generate bridges in the Q-type case.

Generating bridges in general is far from obvious (that's why invokedynamic does the adaptation at the call site, btw). You need a subtype relation, like String <: T for generics; if you do not have a subtyping relationship, you cannot generate bridges.
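
For readers less familiar with bridges, the generics case can be observed with plain reflection. The sketch below uses a hypothetical class `Name` (not from the thread): javac emits a synthetic `compareTo(Object)` bridge that casts its argument and forwards to `compareTo(Name)`. That cast is only possible because Name is a subtype of the erased parameter type Object.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class, used only to make javac generate a bridge method.
class Name implements Comparable<Name> {
    final String text;

    Name(String text) {
        this.text = text;
    }

    // Source-level method; its erased descriptor is (LName;)I.
    @Override
    public int compareTo(Name other) {
        return text.compareTo(other.text);
    }
}

class BridgeDemo {
    // Lists the parameter type of every compareTo method declared in Name,
    // marking the ones the compiler flagged as bridges.
    static List<String> compareToMethods() {
        List<String> out = new ArrayList<>();
        for (Method m : Name.class.getDeclaredMethods()) {
            if (m.getName().equals("compareTo")) {
                String param = m.getParameterTypes()[0].getSimpleName();
                out.add(m.isBridge() ? param + " (bridge)" : param);
            }
        }
        Collections.sort(out); // getDeclaredMethods() order is unspecified
        return out;
    }

    public static void main(String[] args) {
        System.out.println(compareToMethods()); // [Name, Object (bridge)]
    }
}
```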

For value types, QFoo <: LFoo is not what we need. For example, we want the following to work.
Let's say I have:
  class A {
    void m(LFoo)
  }
  class B extends A {
    void m(LFoo)
  }
Foo is now declared as a value type, and I recompile B:
  class B extends A {
    void m(QFoo)
  }
If I call A::m, I want B::m to be valid (that is, to still override A::m) at runtime, so QFoo also has to be a supertype of LFoo.

So the relation between QFoo and LFoo is more like auto-boxing: you have QFoo <: LFoo, but you also need LFoo <: QFoo because of the separate compilation issue, and if you do not have a subtyping relationship between the types, you cannot generate bridges.
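
The separate compilation issue can be sketched with a toy model of method selection. The JVM selects an overriding method by matching name and descriptor exactly, so a recompiled B that declares m with a different descriptor no longer overrides A::m. The model below is an assumption-laden simplification (no access checks, no interfaces, no bridges); `ToyClass` and `SeparateCompilationDemo` are invented names, and "QFoo;" is the hypothetical descriptor syntax from this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: a class is a table from "name + descriptor" to an
// implementation label, plus a link to its superclass.
class ToyClass {
    final String name;
    final ToyClass superclass;
    final Map<String, String> methods = new HashMap<>();

    ToyClass(String name, ToyClass superclass) {
        this.name = name;
        this.superclass = superclass;
    }

    ToyClass declare(String nameAndDescriptor) {
        methods.put(nameAndDescriptor, name + "." + nameAndDescriptor);
        return this;
    }

    // Walk up from the receiver's class, matching name and descriptor exactly.
    String select(String nameAndDescriptor) {
        for (ToyClass c = this; c != null; c = c.superclass) {
            String impl = c.methods.get(nameAndDescriptor);
            if (impl != null) {
                return impl;
            }
        }
        throw new IllegalStateException("no such method: " + nameAndDescriptor);
    }
}

class SeparateCompilationDemo {
    // B compiled before Foo became a value type: same descriptor as A::m.
    static String selectOnOriginalB() {
        ToyClass a = new ToyClass("A", null).declare("m(LFoo;)V");
        ToyClass b = new ToyClass("B", a).declare("m(LFoo;)V");
        return b.select("m(LFoo;)V");
    }

    // B recompiled with the hypothetical Q descriptor; A and the caller are not.
    static String selectOnRecompiledB() {
        ToyClass a = new ToyClass("A", null).declare("m(LFoo;)V");
        ToyClass b = new ToyClass("B", a).declare("m(QFoo;)V");
        return b.select("m(LFoo;)V");
    }

    public static void main(String[] args) {
        System.out.println(selectOnOriginalB());   // B.m(LFoo;)V
        System.out.println(selectOnRecompiledB()); // A.m(LFoo;)V
    }
}
```

In the recompiled case the old call site silently reaches A's implementation. Fixing that with a bridge would mean converting an LFoo argument to QFoo, which is the direction the QFoo <: LFoo relation alone does not give you.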

> 
> You talk a little about why it's nice to avoid changing descriptors:
> 
> "L-world is backward compatible with tools that must parse classfile
> descriptors, since it leaves descriptor syntax unchanged. There have been no
> changes to this syntax in almost thirty years, and there is a huge volume of
> code that depends on its stability. The HotSpot JVM itself makes hundreds of
> distinct decisions based on descriptor syntax which would need careful review
> and testing if they were to be adapted to take account of a new descriptor type
> ("QFoo;", etc.)."
> 
> Okay, put that in the "pro" column for "Should we leave descriptors untouched?"
> In the "con" column is all the weird new complexity in this proposal. Notably:
> 
> - The mess of overloading and implicit adaptations. Huge complexity cost here,
> from spec to implementation to debugging. We've been there before, and have
> always thrown up our hands and retreated (not always for the same reasons, but
> still).

I believe you have the same mess of adaptations whatever the encoding; it is due to the fact that you want to allow people to upgrade a reference type to a value type.

> 
> - The JVM "knows" internally about the two kinds of types, but we won't give
> users the ability to directly express them, or inspect them with reflection.
> That mismatch seems bound to bite us repeatedly.

Whether Java the language surfaces that a type is a value type or not is a language issue, and that is true for both encodings.
For reflection, at runtime you know whether a class is a value type or not; the same is true for both encodings.
If you mean that at runtime you cannot see whether a method was compiled with the knowledge that a type is a value type, again,
it depends on whether you surface the Q-type or the ValueTypes attribute at runtime, so this choice is independent of the encoding.

> 
> - We talk a lot about nullability being a migration problem, but it is sometimes
> just a really nice feature! All things being equal, not being able to freely
> talk about nullable value types is limiting.

Again, it's a language thing; it's the same issue for both encodings.

> 
> I'd rather spend the feature budget on getting dusty code to work with shiny new
> descriptors than on dealing with these problems/compromises.

All the problems are the same for both encodings; the only difference is that you avoid the bridging problem by using a side attribute.

So the question is more: should we allow retrofitting a reference type as a value type seamlessly?
If the answer is yes, then QFoo <: LFoo is not enough, so we cannot use Q-types, but we can use a side table.
If the answer is no, then QFoo <: LFoo is OK: we permit retrofitting an L-type to a Q-type, but user code has to wait until all its dependencies have been updated to use the Q-type before being able to use it.

> 
> I guess that, before going all in on this approach, it would be helpful for me
> to see a more complete exploration of the relative costs.

regards,
Rémi