User model stacking: current status

Sun May 8 16:32:09 UTC 2022

To track the progress of the spiral:

  - We originally came up with the B2/B3 division to carve off B2 as the 
"safe subset", where you get less flattening but keep nulls and get more 
integrity.  This provided a safe migration target for existing 
value-based classes (VBCs), as well as a reasonable target for creating 
new VBCs that want to be mostly class-like but enjoy some additional 
optimization (and shed accidental identity for safety reasons).

  - When we put all the flesh on the bones of B2/B3, there were some 
undesirable consequences, such as (a) tearing was too subtle, and (b) 
both the semantics and cost model differences between B2/B3 were going 
to be hard to explain (and in some cases, users would face bad choices 
between semantics and performance).

  - A few weeks ago, we decided to more seriously consider separating 
atomicity out as an explicit thing on its own.  This had the benefit of 
putting semantics first, and offered a clearer cost model: you could 
give up identity but keep null-default and integrity (B2), further give 
up nulls to get some more density (B3.val), and further further give up 
atomicity to get more flatness (non-atomic B3.)  This was honest, but 
led people to complain "great, now there are four buckets."

  - We explored making non-atomicity a cross-cutting concern, so there 
are two new buckets (VBC and primitive-like), either of which can choose 
their atomicity constraints, and then within the primitive-like bucket, 
the .val and .ref projections differ only with respect to the 
consequences of nullity.  This felt cleaner (more orthogonal), but the 
notion of a non-atomic B2 itself is kind of weird.

So where this brings us is back to something that might feel like the 
four-bucket approach in the third bullet above, but with two big 
differences: atomicity is an explicit property of a class, rather than a 
property of reference-ness, and a B3.ref is not necessarily the same as 
a B2.  This recognizes that the main distinction between B2 and B3 is 
*whether a class can tolerate its zero value.*
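
To make "tolerating the zero value" concrete, here is a small sketch 
using a pair of types that Java already has.  It is an analogy only: 
`int` and `Integer` stand in for a hypothetical B3.val / B3.ref pair, 
and the class and method names are made up for illustration.

```java
// Analogy only: int/Integer stand in for a hypothetical B3.val/B3.ref pair.
class ZeroDefaultDemo {

    // A "val-like" slot that was never explicitly initialized holds zero.
    static int defaultOfVal() {
        int[] slots = new int[1];
        return slots[0];
    }

    // A "ref-like" slot that was never explicitly initialized holds null.
    static Integer defaultOfRef() {
        Integer[] slots = new Integer[1];
        return slots[0];
    }

    public static void main(String[] args) {
        System.out.println("val-like default: " + defaultOfVal());
        System.out.println("ref-like default: " + defaultOfRef());
    }
}
```

A class for which the first result is a perfectly good value (as 0 is 
for int) is a B3 candidate; a class for which it is not would want to 
stay in B2.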

More explicitly:

  - B1 remains unchanged

  - B2 is for "ordinary" value-based classes.  Always atomic, always 
nullable, always reference; the only difference with B1 is that it has 
shed its identity, enabling routine stack-based flattening, and perhaps 
some heap flattening depending on VM sophistication and heroics.  B2 is 
a good target for migrating many existing value-based classes.

  - B3 means that a class can tolerate its zero (uninitialized) value, 
and therefore gives rise to two types, which we'll call B3.ref and 
B3.val.  The former is a reference type and is therefore nullable and 
null-default; the latter is a direct/immediate/value type whose default 
is zero.

  - B3 classes can further be marked non-atomic; this unlocks greater 
flattening in the heap at the cost of tearing under race, and is 
suitable for classes without cross-field invariants.  Non-atomicity 
accrues equally to B3.ref and B3.val; a non-atomic B3.ref still tears 
(and therefore might expose its zero under race, as per Friday's 
discussions).
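
As a rough illustration of why non-atomicity is only suitable for 
classes without cross-field invariants, the following sketch simulates 
one possible interleaving in plain Java, with no threads involved.  
`Range`, `FlatRangeSlot`, and `TearingDemo` are hypothetical names, and 
the "slot" only models flattened storage; what a real VM does will 
depend on layout and hardware.

```java
// Hypothetical value with a cross-field invariant: lo <= hi.
final class Range {
    final int lo;
    final int hi;

    Range(int lo, int hi) {
        if (lo > hi) throw new IllegalArgumentException("lo > hi");
        this.lo = lo;
        this.hi = hi;
    }
}

// Models a flattened, non-atomic heap slot: the fields live inline and a
// write is performed as two separate stores.
final class FlatRangeSlot {
    private int lo;   // the slot starts out as the all-zero value (0, 0)
    private int hi;

    void writeLo(Range r) { lo = r.lo; }   // first half of a write
    void writeHi(Range r) { hi = r.hi; }   // second half of a write

    // A read copies whatever is in the slot; no constructor is re-run.
    int[] read() { return new int[] { lo, hi }; }
}

class TearingDemo {

    // Returns what a reader observes if it runs between the two halves
    // of a write (a deterministic stand-in for a data race).
    static int[] tornRead() {
        FlatRangeSlot slot = new FlatRangeSlot();

        Range first = new Range(1, 2);
        slot.writeLo(first);
        slot.writeHi(first);            // slot now holds (1, 2)

        Range second = new Range(10, 20);
        slot.writeLo(second);           // writer is "interrupted" here
        int[] seen = slot.read();       // reader sees new lo, old hi
        slot.writeHi(second);           // writer finishes

        return seen;
    }

    public static void main(String[] args) {
        int[] seen = tornRead();
        System.out.println("observed lo=" + seen[0] + ", hi=" + seen[1]);
    }
}
```

The observed pair mixes fields from two different writes, so it violates 
lo <= hi even though every `Range` that was ever constructed satisfied 
it.  A class with no such invariant loses nothing by this.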

Syntactically (reminder: NOT an invitation to discuss syntax at this 
point), this might look like:

     class B1 { }                // identity, reference, atomic

     value-based class B2 { }    // non-identity, reference, atomic

     value class B3 { }          // non-identity, .ref and .val, both atomic

     non-atomic value class B3 { }  // like B3, but both are non-atomic

So, two new (but related) class modifiers, of which one has an 
additional modifier.  (The spelling of all of these can be discussed 
after the user model is entirely nailed down.)

So, there's a monotonic sequence of "give stuff up, get other stuff":

  - B2 gives up identity relative to B1, gains some flattening
  - B3 optionally gives up null-defaultness relative to B2, yielding two 
types, one of which sheds some footprint
  - non-atomic B3 gives up atomicity relative to B3, gaining more 
flatness, for both type projections
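
One way to sanity-check the "monotonic" claim is to write the sequence 
down as data.  The sketch below is only a model: the enum names are 
invented here, and the chain follows the densest projection at each 
step (the .ref projections of B3 would keep NULL_DEFAULT).

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical names for the guarantees that get traded away.
enum Guarantee { IDENTITY, NULL_DEFAULT, ATOMIC }

// Each step of the stacking, with the guarantees it still keeps.
enum Step {
    B1(EnumSet.allOf(Guarantee.class)),
    B2(EnumSet.of(Guarantee.NULL_DEFAULT, Guarantee.ATOMIC)),
    B3_VAL(EnumSet.of(Guarantee.ATOMIC)),
    NON_ATOMIC_B3_VAL(EnumSet.noneOf(Guarantee.class));

    final Set<Guarantee> keeps;

    Step(Set<Guarantee> keeps) { this.keeps = keeps; }
}

class StackingDemo {

    // True if every step keeps a subset of the previous step's
    // guarantees and gives up exactly one of them.
    static boolean isMonotonic() {
        Step[] steps = Step.values();
        for (int i = 1; i < steps.length; i++) {
            Set<Guarantee> prev = steps[i - 1].keeps;
            Set<Guarantee> cur = steps[i].keeps;
            if (!prev.containsAll(cur) || prev.size() != cur.size() + 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (Step s : Step.values()) {
            System.out.println(s + " keeps " + s.keeps);
        }
        System.out.println("monotonic: " + isMonotonic());
    }
}
```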

On 5/6/2022 10:04 AM, Brian Goetz wrote:
> Thinking more about Dan's concerns here ...
>
> On 5/5/2022 6:00 PM, Dan Smith wrote:
>> This is significant because the primary reason to declare a B2 rather 
>> than a B3 is to guarantee that the all-zeros value cannot be created. 
>
> This is a little bit of a circular argument; it takes a property that 
> an atomic B2 has, but a non-atomic B2 lacks, and declares that to be 
> "the whole point" of B2.  It may be that exposure of the zero is so 
> bad we may eventually want to back away from the idea, but let's come 
> up with a fair picture of what a non-atomic B2 means, and ask if 
> that's sufficiently useful.
>
>> This leads me to conclude that if you're declaring a non-atomic B2, 
>> you might as well just declare a non-atomic B3. 
>
> Fair point, but let's pull on this string for a moment.  Suppose I 
> want a null-default, flattenable value, and I'm willing to take the 
> tearing to get there.  So you're saying "then declare a B3 and use 
> B3.ref".  But B3.ref was supposed to have the same semantics as an 
> equivalent B2!  (I realize I'm doing the same thing I just accused you 
> of above -- taking an old invariant and positioning it as "the 
> point".  Stay tuned.)  Which means either that we lose flattening, 
> again, or we create yet another asymmetry between B3.ref and B2. Maybe 
> you're saying that the combination of nullable and full-flat is just 
> too much to ask, but I am not sure it is; in any case, let's convince 
> ourselves of this before we rule it out.
>
> Or maybe, what you're saying is that my claim that B3.ref and B2 are 
> the same thing is the stale thing here, and we can let it go and get 
> it back in another form.  In which case you're positing a model where:
>
>  - B1 is unchanged
>  - B2 is always atomic, reference, nullable
>  - B3 really means "the zero is OK", comes with .ref and .val, and 
> (non-atomic B3).ref is still tearable?
>
> In this model, (non-atomic B3).ref takes the place of (non-atomic B2) 
> in the stacking I've been discussing.  Is that what you're saying?
>
>     class B1 { }  // ref, identity, atomic
>     value-based class B2 { }  // ref, non-identity, atomic
>     [ non-atomic ] value class B3 { }  // ref or val, zero is ok;
>                                        // both projections share atomicity
>
> If we go with ref-default, then this is a small leap from yesterday's 
> stacking, because "B3" and "B2" are both reference types, so if you 
> want a tearable, non-atomic reference type, saying `non-atomic value 
> class B3` and then just using B3 gets you that. Then:
>
>  - B2 is like B1, minus identity
>  - B3 means "uninitialized values are OK, you get two types, a 
> zero-default and a null-default"
>  - Non-atomicity is an extra property we can add to B3, to get more 
> flattening in exchange for less integrity
>  - The use cases for non-atomic B2 are served by non-atomic B3 (when 
> .ref is the default)
>
> I think this still has the properties I want; I can freely choose the 
> reasonable subsets of { identity, has-zero, nullable, atomicity } that 
> I want; the orthogonality of non-atomic across buckets becomes 
> orthogonality of non-atomic with nullity, and the "B3.ref is just like 
> B2" is shown to be the "false friend."
>
>