User model stacking: current status

Tue Jun 14 07:13:05 UTC 2022

> From: "Brian Goetz" <brian.goetz at oracle.com>
> To: "daniel smith" <daniel.smith at oracle.com>
> Cc: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Tuesday, June 14, 2022 1:04:39 AM
> Subject: Re: User model stacking: current status

> I've done a little more shaking of this tree. It involves keeping the notion
> that the non-identity buckets differ only in the treatment of their val
> projection, but makes a further normalization that enables the buckets to
> mostly collapse away.

> "value class X" means:

> - Instances are identity-free
> - There are two types, X.ref (reference, nullable) and X.val (direct,
> non-nullable)
> - Reference types are atomic, as always
> - X is an alias for X.ref

> Now, what is the essence of B2? B2 means not "I hate zeros", but "I don't like
> that uninitialized variables are initialized to zero." It doesn't mean the .val
> projection is meaningless, it means that we don't trust arbitrary clients with
> it. So, we can make a slight adjustment:

> - The .val type is always there, but for "B2" classes, it is *inaccessible
> outside the nest*, as per ordinary accessibility.

> This means that within the nest, code that understands the restrictions can,
> say, create `new X.val[7]` and expose it as an `X[]`, as long as it doesn't let
> the zero escape. This gives B2 classes a lot more latitude to use the .val type
> in safe ways. Basically: if you don't trust people with the .val type, don't
> let the val type escape.
I don't trust myself with a B2.val. 
The val type for B2 should not exist at all; otherwise any library using reflection can call getClass() on an X.val[] (even one typed as an X[]). 
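The reflection argument has a close analogue in today's Java. The sketch below makes some assumptions: `Integer[]` stands in for a hypothetical `B2.val[]`, `Number[]` stands in for the `B2[]` static type it is exposed as, and `ArrayEscape`/`exposed` are invented names. Reflection reports the runtime component type whatever the static type says. `java.util.Arrays.copyOf` is documented to return an array of exactly the same class as the original, with new slots padded with the default value.

```java
import java.util.Arrays;

// Analogy only: Java today has no X.val, so Integer[] plays the role of a
// hypothetical B2.val[] and Number[] plays the role of the B2[] static type.
final class ArrayEscape {
    // Nest-internal code hands out its array through a wider static type.
    static Number[] exposed() {
        return new Integer[] {1, 2, 3};
    }

    public static void main(String[] args) {
        Number[] numbers = exposed();
        // The static type is Number[], but reflection still sees Integer.
        Class<?> component = numbers.getClass().getComponentType();
        System.out.println(component.getName()); // java.lang.Integer

        // A generic library can grow the array without knowing its element type.
        // The copy has the same runtime class, and the new slots hold the
        // component type's default value (null here).
        Number[] grown = Arrays.copyOf(numbers, 5);
        System.out.println(grown.getClass() == numbers.getClass()); // true
        System.out.println(grown[4]); // null
    }
}
```

For a real `B2.val[]`, those padded slots would hold the all-zero instance that B2 is meant to keep hidden.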

> There's a bikeshed to paint, but it might look something like:

> value class B2 {
> private class val { }
> }

> or, flipping the default:

> value class B3a {
> public class val { }
> }

> So B2 is really a B3a whose value projection is encapsulated.
And here you lost me: .ref and .val are supposed to be projection types, not classes; at runtime there is only one class. 

> The other bucket, B3n, I think can live with a modifier:

> non-atomic value class B3n { }

> While these are all the same buckets as before, this feels much more like "one
> new bucket" (the `non-atomic` modifier is like `volatile` on a field; we don't
> think of this as creating a different bucket of fields.)
Yes! 

> Summary:

> class B1 { }
> value class B2 { private class val { } }
> value class B3a { }
> non-atomic value class B3n { }

> Value class here is clearly the star of the show; all value classes are treated
> uniformly (ref-default, have a val); some value classes encapsulate the val
> type; some value classes further relax the integrity requirements of instances
> on the heap, to get better flattening and performance, when their semantics
> don't require it.

> It's an orthogonal choice whether the default is "val is private" or "val is
> public".
It makes B2.val a reality, but B2 has no sane default value (otherwise it would be a B3), so B2.val should not exist. 

regards, 
Rémi 

> On 6/3/2022 3:14 PM, Brian Goetz wrote:

>> Continuing to shake this tree.

>> I'm glad we went through the exploration of "flattenable B3.ref"; while I think
>> we probably could address the challenges of tearing across the null channel /
>> data channels boundary, I'm pretty willing to let this one go. Similarly I'm
>> glad we went through the "atomicity orthogonal to buckets" exploration, and am
>> ready to let that one go too.

>> What I'm not willing to let go of is making atomicity explicit in the model. Not
>> only is piggybacking non-atomicity on something like val-ness too subtle and
>> surprising, but non-atomicity seems like it is a property that the class author
>> needs to ask for. Flatness is an important benefit, but only when it doesn't
>> get in the way of safety.

>> Recall that we have three different representation techniques:

>> - no-flat -- use a pointer
>> - low-flat -- for sufficiently small (depending on size of atomic instructions
>> provided by the hardware) values, pack multiple fields into a single,
>> atomically accessed unit.
>> - full-flat -- flatten the layout, access individual fields directly,
>> may allow tearing.
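The difference between low-flat and full-flat can be simulated deterministically in current Java. This is a minimal sketch with several assumptions. It uses a two-`int` value `Point(x, y)`, and `FlatLayouts`, `LowFlat` and `FullFlat` are hypothetical names. The concurrent reader is modelled as a callback that runs between the two field stores, not as a real thread. The packed slot uses `AtomicLongArray` because the JLS (§17.7) does not guarantee that a plain, non-volatile `long` store is atomic.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical model of an array of two-int values, in two of the layouts.
final class FlatLayouts {
    record Point(int x, int y) {}

    // low-flat: both fields packed into one 64-bit unit, read and written
    // with a single atomic access, so a reader sees the old or the new value.
    static final class LowFlat {
        final AtomicLongArray slots;
        LowFlat(int n) { slots = new AtomicLongArray(n); }
        static long pack(Point p) { return ((long) p.x() << 32) | (p.y() & 0xFFFFFFFFL); }
        static Point unpack(long bits) { return new Point((int) (bits >>> 32), (int) bits); }
        void write(int i, Point p) { slots.set(i, pack(p)); }
        Point read(int i) { return unpack(slots.get(i)); }
    }

    // full-flat: one plain array per field, so a write is two separate stores.
    static final class FullFlat {
        final int[] xs;
        final int[] ys;
        FullFlat(int n) { xs = new int[n]; ys = new int[n]; }
        Point read(int i) { return new Point(xs[i], ys[i]); }
        // Stores x, runs 'between' (standing in for a racing reader), then stores y.
        void writeWithReader(int i, Point p, Runnable between) {
            xs[i] = p.x();
            between.run();
            ys[i] = p.y();
        }
    }

    public static void main(String[] args) {
        FullFlat full = new FullFlat(1);
        Point[] seen = new Point[1];
        full.writeWithReader(0, new Point(1, 1), () -> {});
        full.writeWithReader(0, new Point(2, 2), () -> seen[0] = full.read(0));
        System.out.println(seen[0]); // Point[x=2, y=1], a torn value

        LowFlat low = new LowFlat(1);
        low.write(0, new Point(1, 1));
        low.write(0, new Point(2, 2));
        System.out.println(low.read(0)); // Point[x=2, y=2]
    }
}
```

The no-flat case needs no simulation: a reference store is already atomic.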

>> The "low-flat" bucket got some attention recently when we discovered that there
>> are usable 128-bit atomics on Intel (based on a recent revision of the chip
>> spec), but this is not a slam-dunk; it requires some serious compiler heroics
>> to pack multiple values into single accesses. But there may be targets of
>> opportunity here for single-field values (like Optional) or final fields. And
>> we can always fall back to no-flat whenever the VM feels like it.

>> One of the questions that has been raised is how similar B3.ref is to B2,
>> specifically with respect to atomicity. We've gone back and forth on this.

>> Having shaken the tree quite a bit, what feels like the low energy state to me
>> right now is:

>> - The ref types of all non-identity classes are treated uniformly; B3.ref and
>> B2.ref are translated the same, treated the same, have the same atomicity, the
>> same nullity, etc.
>> - The only difference across the spectrum of non-identity classes is the
>> treatment of the val type. For B2, this means the val type is *illegal*; for
>> B3, this means it is atomic; for B3n, it is non-atomic (which in practice will
>> mean more flatness.)
>> - (controversial) For all types, the ref type is the default. This means that
>> some current value-based classes can migrate not only to B2, but to B3 or B3n.
>> (And that we could migrate to B2 today and further to B3 tomorrow.)

>> While this is technically four flavors, I don't think it needs to feel that
>> complex. I'll pick some obviously silly modifiers for exposition:

>> - class B1 { }
>> - zero-hostile value class B2 { }
>> - value class B3 { }
>> - tearing-happy value class B3n { }

>> In other words: one new concept ("value class"), with two sub-modifiers
>> (zero-hostile, and tearing-happy) which affect the behavior of the val type
>> (forbidden for B2, loosened for B3n.)

>> For heap flattening, what this gets us is:

>> - B1 -- no-flat
>> - B2, B3.ref, B3n.ref -- low-flat atomic (with null channel)
>> - B3 -- low-flat (atomic, no null channel)
>> - B3n -- full-flat (non-atomic, no null channel)
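The flattening table above can be combined with the rule that every non-identity ref type is treated uniformly and written as one small function. The enum and record names are hypothetical. The function only restates the proposal. It does not describe what a VM must do, since the VM may always fall back to no-flat.

```java
// Hypothetical encoding of the proposal's heap layouts; all names are invented.
final class HeapLayout {
    enum Bucket { B1, B2, B3, B3N }
    enum Repr { NO_FLAT, LOW_FLAT, FULL_FLAT }
    record Layout(Repr repr, boolean atomic, boolean nullable) {}

    // useVal == true selects the .val projection; false selects the (default) ref type.
    static Layout layout(Bucket b, boolean useVal) {
        if (b == Bucket.B1) {
            if (useVal) throw new IllegalArgumentException("identity classes have no .val");
            return new Layout(Repr.NO_FLAT, true, true);
        }
        if (!useVal) {
            // B2, B3.ref and B3n.ref are all the same: atomic, with a null channel.
            return new Layout(Repr.LOW_FLAT, true, true);
        }
        return switch (b) {
            case B2 -> throw new IllegalArgumentException("B2 has no .val in this model");
            case B3 -> new Layout(Repr.LOW_FLAT, true, false);
            case B3N -> new Layout(Repr.FULL_FLAT, false, false);
            default -> throw new AssertionError(b);
        };
    }
}
```

For example, `layout(Bucket.B3N, true)` is the only combination that gives up atomicity.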

>> This is a slight departure from earlier tree-shakings with respect to tearing.
>> In particular, refs do not tear at all, so programs that use all refs will
>> never see tearing (but it is still possible to get a torn value using .val and
>> then box that into a ref.)

>> If you turn this around, the declaration-site decision tree becomes:

>> - Do I need identity (mutability, subclassing, aliasing)? Then B1.
>> - Are uninitialized values unacceptable? Then B2.
>> - Am I willing to tolerate tearing to enable more flattening? Then B3n.
>> - Otherwise, B3.
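The declaration-site questions are ordered, so the first "yes" decides the bucket. Here is a sketch with invented names (`DeclarationSite`, `Bucket`, `choose`):

```java
// Hypothetical encoding of the declaration-site decision tree.
final class DeclarationSite {
    enum Bucket { B1, B2, B3, B3N }

    // Questions are asked in the order listed above; the first "yes" wins.
    static Bucket choose(boolean needsIdentity, boolean zeroUnacceptable, boolean tearingTolerable) {
        if (needsIdentity) return Bucket.B1;     // mutability, subclassing, aliasing
        if (zeroUnacceptable) return Bucket.B2;  // uninitialized values are not acceptable
        if (tearingTolerable) return Bucket.B3N; // trade integrity for more flattening
        return Bucket.B3;
    }
}
```

Because of the ordering, a class that needs identity is B1 even if it could also tolerate tearing.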

>> And the use-site decision tree becomes:

>> - For B1, B2 -- no choices to make.
>> - Do I need nullity? Then .ref
>> - Do I need atomicity, and the class doesn't already provide it? Then .ref
>> - Otherwise, can use .val
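The use-site tree can be sketched the same way, using the same hypothetical bucket names. Here `VAL` means ".val is permitted", not that it is the default.

```java
// Hypothetical encoding of the use-site decision tree.
final class UseSite {
    enum Bucket { B1, B2, B3, B3N }
    enum Projection { REF, VAL }

    static Projection choose(Bucket b, boolean needNull, boolean needAtomic) {
        if (b == Bucket.B1 || b == Bucket.B2) return Projection.REF; // no choice to make
        if (needNull) return Projection.REF;                         // only the ref type is nullable
        if (needAtomic && b == Bucket.B3N) return Projection.REF;    // B3n.val may tear
        return Projection.VAL;                                       // .val is an option
    }
}
```

A use site needing atomicity gets `.val` for B3 but `.ref` for B3n, because only B3's val type already provides atomicity.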

>> The main downside of making ref the default is that people will grumble about
>> having to say .val at the use site all the time. And they will! And it does
>> feel a little odd that you have to opt into val-ness at both the declaration
>> and use sites. But it unlocks a lot of things (see Kevin's list for more):

>> - The default name is the safest version.
>> - Every unadorned name works the same way; it's always a reference type. You
>> don't need to maintain a mental database around "which kind of name is this".
>> - Migration from B1 -> B2 -> B3 is possible. This is huge (and more than we had
>> hoped for when we started this game.)

>> (The one thing to still worry about is that while refs can't tear, you can still
>> observe a torn value through a ref, if someone tore it and then boxed it. I
>> don't see how we defend against this, but the non-atomic label should be enough
>> of a warning.)

>> On 5/6/2022 10:04 AM, Brian Goetz wrote:

>>> In this model, (non-atomic B3).ref takes the place of (non-atomic B2) in the
>>> stacking I've been discussing. Is that what you're saying?

>>> class B1 { } // ref, identity, atomic
>>> value-based class B2 { } // ref, non-identity, atomic
>>> [ non-atomic ] value class B3 { } // ref or val, zero is ok, both projections
>>> share atomicity

>>> If we go with ref-default, then this is a small leap from yesterday's stacking,
>>> because "B3" and "B2" are both reference types, so if you want a tearable,
>>> non-atomic reference type, saying `non-atomic value class B3` and then just
>>> using B3 gets you that. Then:

>>> - B2 is like B1, minus identity
>>> - B3 means "uninitialized values are OK, you get two types, a zero-default and a
>>> non-default"
>>> - Non-atomicity is an extra property we can add to B3, to get more flattening in
>>> exchange for less integrity
>>> - The use cases for non-atomic B2 are served by non-atomic B3 (when .ref is the
>>> default)

>>> I think this still has the properties I want; I can freely choose the reasonable
>>> subsets of { identity, has-zero, nullable, atomicity } that I want; the
>>> orthogonality of non-atomic across buckets becomes orthogonality of non-atomic
>>> with nullity, and the "B3.ref is just like B2" is shown to be the "false
>>> friend."