value type hygiene

Thu May 10 16:54:48 UTC 2018

> On May 10, 2018, at 3:11 AM, Remi Forax <forax at univ-mlv.fr> wrote:
> 
> Q-types (if the root is j.l.Object + interfaces) and a ValueTypes attribute are two different encodings of the same semantics: either the descriptor is a Q-type, or the descriptor is an L-type and you have a side table that says it's a Q-type.

Yes, with some huge caveats attached to the attribute strategy:
- You have to pick one mode for all types of a given class in your class file
- The semantics are indirect; people will get used to reading them as a property of the class name, when in reality they're a property of a side attribute ("Debugging: I know Foo is a value class, so why is this null slipping through?...")
- Descriptor equality is redefined so that non-equal descriptors match (that is, where one descriptor uses a Q type and one uses an L type); adaptations are necessary to make mismatched descriptors cooperate
- We'll probably try very hard to present users with the fiction that there is only one type (e.g., in reflection)
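
To make the "indirect semantics" caveat concrete, here is a minimal sketch of the attribute strategy. Everything in it is an assumption for illustration: ClassFileModel and treatsAsValueType are hypothetical names, and the ValueTypes attribute is modeled as nothing more than a set of class names. The point is only that the same descriptor text means different things depending on which class file you are reading.

```java
import java.util.Set;

final class SideTableSketch {
    // Hypothetical model of one class file: its name plus the class names
    // listed in its ValueTypes attribute (the "side table").
    static final class ClassFileModel {
        final String name;
        final Set<String> valueTypes;

        ClassFileModel(String name, Set<String> valueTypes) {
            this.name = name;
            this.valueTypes = valueTypes;
        }

        // Under the attribute strategy, an L descriptor denotes a value type
        // (and so implies null checks) only if this class file's side table
        // lists the class. The descriptor alone does not tell you.
        boolean treatsAsValueType(String descriptor) {
            if (!descriptor.startsWith("L") || !descriptor.endsWith(";")) {
                return false;
            }
            String className = descriptor.substring(1, descriptor.length() - 1);
            return valueTypes.contains(className);
        }
    }

    public static void main(String[] args) {
        // B was recompiled after Foo became a value class; A was not.
        ClassFileModel recompiled = new ClassFileModel("B", Set.of("Foo"));
        ClassFileModel stale = new ClassFileModel("A", Set.of());

        // Identical descriptor text, different interpretation per class file.
        System.out.println(recompiled.treatsAsValueType("LFoo;"));
        System.out.println(stale.treatsAsValueType("LFoo;"));
    }
}
```

A reader looking only at "LFoo;" in either class cannot tell which interpretation applies without also consulting that class's side table.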

> The main difference between the two encodings is that you have to generate bridges in the case of Q-types.
> 
> Generating bridges in general is far from obvious (that's why invokedynamic does the adaptation at the call site, btw). You need a subtype relation, like String <: T for generics; if you do not have a subtype relationship, you cannot generate bridges.
> 
> For value types, QFoo <: LFoo is not all we need. For example, we want the following to work.
> Let's say I have:
>  class A {
>    void m(LFoo)
>  }
>  class B extends A {
>    void m(LFoo)
>  }
> Foo is now declared as a value type, and I recompile B:
>  class B extends A {
>    void m(QFoo)
>  }
> If I call A::m, I want B::m to be valid at runtime, so QFoo also has to be a supertype of LFoo.
> 
> So the relation between QFoo and LFoo is more like auto-boxing: you have QFoo <: LFoo, but you also have LFoo <: QFoo because of the separate compilation issue. And if you do not have a subtyping relationship between the types, you cannot generate bridges.

Tentatively, the bridge generation strategy I envision looks like this:
- When I convert a class to a value class, I annotate it ("@WasAReferenceClass")
- When a descriptor mentions a Q type, the compiler also generates an L bridge

There are problems with this: for example, a descriptor that mentions n distinct Q types needs on the order of 2^n bridges. And maybe there are things the JVM can do to help; we've explored lots of general-purpose "this class has moved" features. My preference is to tackle those problems as needed, on their own terms.
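
To illustrate the blow-up, here is a rough sketch that enumerates the bridge descriptors for one method. The names (BridgeSketch, bridgeDescriptors) are hypothetical, the "QFoo;" descriptor spelling is an assumption, and only parameter types are considered (not return types or arrays). For n distinct Q types there are 2^n spellings of the descriptor, one of which is the method itself, so the sketch produces 2^n - 1 bridges.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

final class BridgeSketch {
    // Returns every descriptor obtained by respelling a non-empty subset of
    // the distinct Q-typed parameter types as L types.
    static List<String> bridgeDescriptors(List<String> params, String returnType) {
        // Collect the distinct Q types, in order of first appearance.
        LinkedHashSet<String> seen = new LinkedHashSet<>();
        for (String p : params) {
            if (p.startsWith("Q")) {
                seen.add(p);
            }
        }
        List<String> qTypes = new ArrayList<>(seen);

        List<String> bridges = new ArrayList<>();
        // Each bit of mask selects one distinct Q type to respell as L.
        // mask == 0 is skipped because it is the original method.
        for (int mask = 1; mask < (1 << qTypes.size()); mask++) {
            StringBuilder sb = new StringBuilder("(");
            for (String p : params) {
                int k = qTypes.indexOf(p);
                boolean flip = k >= 0 && (mask & (1 << k)) != 0;
                sb.append(flip ? "L" + p.substring(1) : p);
            }
            bridges.add(sb.append(')').append(returnType).toString());
        }
        return bridges;
    }

    public static void main(String[] args) {
        // Two distinct Q types, so three bridges are listed.
        for (String d : bridgeDescriptors(List.of("QFoo;", "I", "QBar;"), "V")) {
            System.out.println(d);
        }
    }
}
```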

But, yes, I'll grant that having the JVM totally ignore the problem probably won't work in the end.

>> - The JVM "knows" internally about the two kinds of types, but we won't give
>> users the ability to directly express them, or inspect them with reflection.
>> That mismatch seems bound to bite us repeatedly.
> 
> Whether Java the language surfaces if a type is a value type or not is a language issue, and that's true for both encodings.
> For reflection, at runtime you know whether a class is a value type or not; the same is true for both encodings.
> If you mean that at runtime you cannot see whether a method was compiled with the knowledge that a type is a value type, again,
> it depends on whether you surface Q-types or the ValueTypes attribute at runtime, so this choice is independent of the encoding.

The reflection question boils down to: are there two java.lang.Class objects per value class, or one? My read of the goals here is that we'd very much like for there to be only one, for the same reason that we'd like to not change the spelling of descriptors. In that world, I think it will be hard to reason about where null checks happen. (Sure, maybe you can figure it out by consulting the ValueTypes attributes, but that's a huge pain.)

>> - We talk a lot about nullability being a migration problem, but it is sometimes
>> just a really nice feature! All things being equal, not being able to freely
>> talk about nullable value types is limiting.
> 
> Again, it's a language thing; it's the same issue for both encodings.

I don't buy this. If the JVM doesn't give me (a compiler writer) the direct ability to talk about nullable value types, I can maybe work around that. But there will be seams. It will be confusing. Debugging will be messy.

> So the question is more: should we allow a reference type to be retrofitted as a value type seamlessly?
> If the answer is yes, then QFoo <: LFoo is not enough, so we cannot use Q-types, but we can use a side table.
> If the answer is no, then QFoo <: LFoo is OK: we permit retrofitting an L-type to a Q-type, but user code has to wait until all its dependencies have been updated to use the Q-type before being able to use it.

For the "answer is no" case: in the scenario where I've started using QFoo, but a library still uses LFoo, what can I do?
- Subtyping still works, so I can pass QFoos in. Great!
- When I get LFoos out, I will want to null check them and convert to QFoo. Fine.
- A QFoo[] that I pass in will reject nulls. Which is to be expected. If the semantics demand nullability, I should use an LFoo[] instead.
- Similarly for objects I pass in that have LFoo->QFoo bridge methods: there will be null checks. If that's a problem, the objects shouldn't operate on QFoos.
- No new identities or boxes get created. It's the same values passing between the two APIs.
- The library doesn't get the flattening benefits. It needs to make a choice to opt in to them first.
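
The boundary behavior in the list above can be roughly modeled in today's Java, which has no Q types. In this sketch an ordinary final class stands in for the value class, and the names (BoundarySketch, legacyLookup, asQ, storeQ) are hypothetical. It is only an analogy for where the null checks land, not a description of what the JVM would actually do.

```java
import java.util.Objects;

final class BoundarySketch {
    // Stand-in for a value class; here it is just an ordinary final class.
    static final class Foo {
        final int x;
        Foo(int x) { this.x = x; }
    }

    // Stand-in for a library method still compiled against LFoo: may return null.
    static Foo legacyLookup(boolean present) {
        return present ? new Foo(42) : null;
    }

    // Models the LFoo-to-QFoo conversion: a null check and nothing else.
    // The same object comes back, so no new identity or box is created.
    static Foo asQ(Foo maybeNull) {
        return Objects.requireNonNull(maybeNull, "null cannot be converted to QFoo");
    }

    // Models a store into a QFoo[]: nulls are rejected at the store.
    static void storeQ(Foo[] qArray, int index, Foo value) {
        qArray[index] = asQ(value);
    }

    public static void main(String[] args) {
        Foo fromLibrary = legacyLookup(true);
        Foo q = asQ(fromLibrary);
        System.out.println(q == fromLibrary); // prints "true"

        Foo[] qArray = new Foo[1];
        try {
            storeQ(qArray, 0, legacyLookup(false));
        } catch (NullPointerException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If the semantics really do need null, the caller in this model would simply skip asQ and keep working with the nullable reference, which corresponds to sticking with LFoo and LFoo[].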

This seems like a fine picture. Ideally, it envisions a language that gives some fine-grained control over whether "Foo" means QFoo or LFoo. Maybe we'll provide that ability in Java—I don't know. It's nice if the JVM gives languages the ability to make that choice.