User model stacking: current status

Mon Jun 6 13:05:20 UTC 2022

> From: "Brian Goetz" <brian.goetz at oracle.com>
> To: "daniel smith" <daniel.smith at oracle.com>
> Cc: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Friday, June 3, 2022 9:14:39 PM
> Subject: Re: User model stacking: current status

> Continuing to shake this tree.

> I'm glad we went through the exploration of "flattenable B3.ref"; while I think
> we probably could address the challenges of tearing across the null channel /
> data channels boundary, I'm pretty willing to let this one go. Similarly I'm
> glad we went through the "atomicity orthogonal to buckets" exploration, and am
> ready to let that one go too.

> What I'm not willing to let go of is making atomicity explicit in the model. Not
> only is piggybacking non-atomicity on something like val-ness too subtle and
> surprising, but non-atomicity seems like it is a property that the class author
> needs to ask for. Flatness is an important benefit, but only when it doesn't
> get in the way of safety.

> Recall that we have three different representation techniques:

> - no-flat -- use a pointer
> - low-flat -- for sufficiently small (depending on size of atomic instructions
> provided by the hardware) values, pack multiple fields into a single,
> atomically accessed unit.
> - full-flat -- flatten the layout, access individual fields directly,
> may allow tearing.

> The "low-flat" bucket got some attention recently when we discovered that there
> are usable 128-bit atomics on Intel (based on a recent revision of the chip
> spec), but this is not a slam-dunk; it requires some serious compiler heroics
> to pack multiple values into single accesses. But there may be targets of
> opportunity here for single-field values (like Optional) or final fields. And
> we can always fall back to no-flat whenever the VM feels like it.

> One of the questions that has been raised is how similar B3.ref is to B2,
> specifically with respect to atomicity. We've gone back and forth on this.

> Having shaken the tree quite a bit, what feels like the low energy state to me
> right now is:

> - The ref type of all non-identity classes are treated uniformly; B3.ref and
> B2.ref are translated the same, treated the same, have the same atomicity, the
> same nullity, etc.
> - The only difference across the spectrum of non-identity classes is the
> treatment of the val type. For B2, this means the val type is *illegal*; for
> B3, this means it is atomic; for B3n, it is non-atomic (which in practice will
> mean more flatness.)
> - (controversial) For all types, the ref type is the default. This means that
> some current value-based classes can migrate not only to B2, but to B3 or B3n.
> (And that we could migrate to B2 today and further to B3 tomorrow.)

> While this is technically four flavors, I don't think it needs to feel that
> complex. I'll pick some obviously silly modifiers for exposition:

> - class B1 { }
> - zero-hostile value class B2 { }
> - value class B3 { }
> - tearing-happy value class B3n { }

> In other words: one new concept ("value class"), with two sub-modifiers
> (zero-hostile, and tearing-happy) which affect the behavior of the val type
> (forbidden for B2, loosened for B3n.)

> For heap flattening, what this gets us is:

> - B1 -- no-flat
> - B2, B3.ref, B3n.ref -- low-flat atomic (with null channel)
> - B3 -- low-flat (atomic, no null channel)
> - B3n -- full-flat (non-atomic, no null channel)

> This is a slight departure from earlier tree-shakings with respect to tearing.
> In particular, refs do not tear at all, so programs that use all refs will
> never see tearing (but it is still possible to get a torn value using .val and
> then box that into a ref.)

> If you turn this around, the declaration-site decision tree becomes:

> - Do I need identity (mutability, subclassing, aliasing)? Then B1.
> - Are uninitialized values unacceptable? Then B2.
> - Am I willing to tolerate tearing to enable more flattening? Then B3n.
> - Otherwise, B3.

> And the use-site decision tree becomes:

> - For B1, B2 -- no choices to make.
> - Do I need nullity? Then .ref
> - Do I need atomicity, and the class doesn't already provide it? Then .ref
> - Otherwise, can use .val

> The main downside of making ref the default is that people will grumble about
> having to say .val at the use site all the time. And they will! And it does
> feel a little odd that you have to opt into val-ness at both the declaration
> and use sites. But it unlocks a lot of things (see Kevin's list for more):

> - The default name is the safest version.
> - Every unadorned name works the same way; it's always a reference type. You
> don't need to maintain a mental database around "which kind of name is this".
> - Migration from B1 -> B2 -> B3 is possible. This is huge (and more than we had
> hoped for when we started this game.)

> (The one thing to still worry about is that while refs can't tear, you can still
> observe a torn value through a ref, if someone tore it and then boxed it. I
> don't see how we defend against this, but the non-atomic label should be enough
> of a warning.)
I think B3 being ref by default is a mistake, but it is a mistake that stems from a more important one: the notion of reference type.

I don't think it's a good idea to introduce the notion of reference type in the Java spec.
We have spent a lot of time thinking that identity and value are two different types; they are not. Being a value is a runtime capability, not a capability inherited from a type. We removed the interfaces ValueObject / IdentityObject for this exact reason.

I think the motto "codes like a class, works like an int" fails us, because what we want is closer to "codes like a class, is optimized like an int".
The VM representation, flattened or not, is not something the Java spec should be aware of.
This change makes the spec easier to write and the semantics easier to explain. 

Let me try to explain what I think is a better model:
The addition of value classes does not change the existing Java model: apart from primitive types, everything is an object, an instance of a class. That's why we declare a class, use new to create an instance, and can call methods on a value class exactly as on an identity class.

What changes, as Brian said several times, is that the default value of a value class is not null but an instance with all fields zeroed. It's not the only difference: == (acmp) tests the fields, and synchronized and weak refs do not work. But having a different default value is the most important difference.
Thus the fact that the VM can directly use the value of a value class (the immediate value) is not a property in itself; it's something VM implementations are free to do, so it should not be part of the Java spec.
We do not need to introduce the concept of reference type vs value type, only the concept of reference projection (as C# does with what it calls nullable value types).

So from the user's POV, everything is an object, an instance of a class. A value class is a special kind of class whose default value is not null but a bunch of zeroes, and which has no observable identity (hence the semantics of ==).
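
As a rough illustration in today's Java (not Valhalla syntax), a record can stand in for such a class. The names PointSketch, DEFAULT and sameValue are hypothetical, and the field-wise comparison is only an approximation of what == (acmp) would do under this model:

```java
// Hypothetical stand-in written in today's Java; PointSketch is NOT a real value class.
record PointSketch(int x, int y) {
    // Stand-in for the all-zeroes default value a value class would have.
    static final PointSketch DEFAULT = new PointSketch(0, 0);
}

final class ValueSemanticsDemo {
    // Field-wise comparison: an approximation of == (acmp) on a value class in this model.
    static boolean sameValue(PointSketch a, PointSketch b) {
        return a.x() == b.x() && a.y() == b.y();
    }

    public static void main(String[] args) {
        PointSketch p = new PointSketch(1, 2);
        PointSketch q = new PointSketch(1, 2);
        // Two separately created instances with the same fields are "the same value".
        System.out.println(sameValue(p, q));
        // The zero default is indistinguishable from an explicitly built (0, 0).
        System.out.println(sameValue(PointSketch.DEFAULT, new PointSketch(0, 0)));
    }
}
```

With a real value class the user would simply write p == q; the helper exists only because records still have identity today.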

From that, we offer three different trade-offs:
- you may want to keep null as the default value, using a zero-hostile value class. In that case the VM may not be able to do all optimizations,
but in exchange you get a class that is mostly binary backward compatible with an identity class.
- you may want to use existing code that assumes that null is the default value (generic collections, for example);
for that you can use the .ref projection, which allows a value class to be nullable.
In the future, you will need the .ref projection less, because generics will be overhauled to work with classes with a non-null default.
Note that .ref is a type projection, not a class. You cannot write new Point.ref() but you can write Point.ref point = new Point();
- you may want reads/writes of the value class to be non-atomic, so the VM can do more optimization when storing/reading
an instance of the value class to/from fields/arrays.
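
To make the atomic vs non-atomic trade-off concrete, here is a minimal sketch in today's Java of the "pack the fields into one atomically accessed unit" idea from Brian's low-flat bucket. The class PackedPointCell and its methods are hypothetical; a real VM would do this packing itself, not the user:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a two-int "point" stored as a single 64-bit word,
// so that a reader always sees both fields from the same write (no tearing).
final class PackedPointCell {
    private final AtomicLong bits = new AtomicLong(); // starts as all zeroes

    // x goes in the high 32 bits, y in the low 32 bits.
    static long pack(int x, int y) {
        return ((long) x << 32) | (y & 0xFFFF_FFFFL);
    }

    static int unpackX(long packed) { return (int) (packed >> 32); }

    static int unpackY(long packed) { return (int) packed; }

    // One atomic store publishes both fields together.
    void write(int x, int y) { bits.set(pack(x, y)); }

    // One atomic load returns a consistent snapshot of both fields.
    long read() { return bits.get(); }

    public static void main(String[] args) {
        PackedPointCell cell = new PackedPointCell();
        cell.write(3, -7);
        long snapshot = cell.read();
        System.out.println(unpackX(snapshot) + ", " + unpackY(snapshot)); // prints "3, -7"
    }
}
```

A non-atomic (full-flat) layout would instead store x and y as two independent fields, which is cheaper for larger values but lets a racing reader observe x from one write and y from another.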

Why is this better than making everything nullable by default?
First, making everything nullable by default goes against the idea that a value class is just a classical class with a different default value. If a value class is nullable by default, then it does not have a different default, right? Being nullable by default inherently makes the model hard to understand, because of the discrepancy between how we explain the model (all zeroes by default) and the semantics (nullable by default).
Then, making everything nullable makes the performance model murky. Nullable by default is equivalent to saying "let's use Integer instead of int by default", so we will have the very same kind of performance potholes. Having the right defaults in terms of the performance model is very important here, because we started Valhalla because of these performance issues.
And making everything nullable does not work with the future generics code, which uses T.ref as a type projection.
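
The Integer vs int comparison can be observed in today's Java, where the two already differ in their default value. This is only an analogy for Point vs Point.ref, and the class DefaultValueDemo is a hypothetical name:

```java
// Analogy only: int plays the role of a value class, Integer the role of its .ref projection.
final class DefaultValueDemo {
    // Default element of a fresh int[]: the zero value.
    static int defaultInt() {
        int[] flat = new int[1];
        return flat[0];
    }

    // Default element of a fresh Integer[]: null.
    static Integer defaultInteger() {
        Integer[] boxed = new Integer[1];
        return boxed[0];
    }

    public static void main(String[] args) {
        System.out.println(defaultInt());      // prints 0
        System.out.println(defaultInteger());  // prints null
    }
}
```

An Integer[] also stores pointers to separately allocated objects, which is the kind of performance pothole that a nullable default would carry over to value classes.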

To summarize, making value classes nullable by default is an unproven design (remember that what C# calls value types are not nullable by default), based on the idea of describing the model in terms of reference type vs value type, which is IMO not the right way to describe the model. It does not work like an int; it is optimizable (we used "flattenable" in the past) like an int.

Obviously there is at least one drawback to not making value classes nullable by default: you cannot refactor an identity class or a null-hostile value class into a value class, because .ref is a type projection and not a real class.
I can live with that. 

Rémi 

> On 5/6/2022 10:04 AM, Brian Goetz wrote:

>> In this model, (non-atomic B3).ref takes the place of (non-atomic B2) in the
>> stacking I've been discussing. Is that what you're saying?

>> class B1 { } // ref, identity, atomic
>> value-based class B2 { } // ref, non-identity, atomic
>> [ non-atomic ] value class B3 { } // ref or val, zero is ok, both projections
>> share atomicity

>> If we go with ref-default, then this is a small leap from yesterday's stacking,
>> because "B3" and "B2" are both reference types, so if you want a tearable,
>> non-atomic reference type, saying `non-atomic value class B3` and then just
>> using B3 gets you that. Then:

>> - B2 is like B1, minus identity
>> - B3 means "uninitialized values are OK, you get two types, a zero-default and a
>> non-default"
>> - Non-atomicity is an extra property we can add to B3, to get more flattening in
>> exchange for less integrity
>> - The use cases for non-atomic B2 are served by non-atomic B3 (when .ref is the
>> default)

>> I think this still has the properties I want; I can freely choose the reasonable
>> subsets of { identity, has-zero, nullable, atomicity } that I want; the
>> orthogonality of non-atomic across buckets becomes orthogonality of non-atomic
>> with nullity, and the "B3.ref is just like B2" is shown to be the "false
>> friend."