Re: User model stacking
Brian Goetz
brian.goetz at oracle.com
Wed Apr 27 23:17:42 UTC 2022
(Somehow two versions of this got sent, along with some cut-and-paste from another thread; please disregard whatever looks weird.)
> On Apr 27, 2022, at 5:50 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>
> Let me try and put some more color on the bike shed (but, again, let’s focus on model, not syntax, for now.)
>
> We have two axes of variation we want to express with non-identity classes: atomicity constraints, and whether there is an additional zero-default companion type. These can be mostly orthogonal; you can have either, neither, or both. We've been previously assuming that "primitiveness" lumps this all together; primitives get more flattening, primitives can be non-nullable/zero-default, primitives means the good name goes to the "val" type. Primitive-ness implicitly flips the "safety vs performance" priority, which has been bothering us because primitives also code like a class. So we were trying to claw back some atomicity for primitives.
>
> But also, we're a little unhappy with B2 because B2 comes with _more_ atomicity than is strictly needed; a B2 with no invariants still gets less flattening than a B3. That's a little sad. It also seems like a gratuitous difference, which makes the user model more complicated. So we’re suggesting restacking towards:
>
> - Value classes are those without identity
> - Value classes can be atomic or non-atomic, the default is atomic (safe by default)
> - Value classes can further opt into having a "val" projection (name TBD, val is probably not it)
> - Val projections are non-nullable, zero-default — this is the only difference
> - Both the ref and val projections inherit the atomicity constraints of the class, making atomicity mostly orthogonal to ref/val/zero/null
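The restacked model can be read as a small table: declaration-site choices go in, use-site types come out. The following is a toy Java model of that table, not a proposed API; `Modifier`, `Projection`, and the `.val` suffix are hypothetical placeholders for syntax the thread has not settled.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class StackingModel {
    // Placeholder names for the proposed declaration-site modifiers (syntax TBD).
    enum Modifier { NON_ATOMIC, ZERO_CAPABLE }

    // One use-site type derived from a value class declaration.
    record Projection(String name, boolean nullable, boolean zeroDefault, boolean atomic) {}

    static List<Projection> projections(String className, Set<Modifier> mods) {
        // Atomicity is fixed at the declaration site and is on unless opted out (safe by default).
        boolean atomic = !mods.contains(Modifier.NON_ATOMIC);
        List<Projection> result = new ArrayList<>();
        // The unadorned name is always the nullable (ref) projection.
        result.add(new Projection(className, true, false, atomic));
        if (mods.contains(Modifier.ZERO_CAPABLE)) {
            // The companion differs only in nullity and inherits the class's atomicity.
            result.add(new Projection(className + ".val", false, true, atomic));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(projections("B2a", EnumSet.noneOf(Modifier.class)));
        System.out.println(projections("B2n", EnumSet.of(Modifier.NON_ATOMIC)));
        System.out.println(projections("B3a", EnumSet.of(Modifier.ZERO_CAPABLE)));
        System.out.println(projections("B3n", EnumSet.of(Modifier.NON_ATOMIC, Modifier.ZERO_CAPABLE)));
    }
}
```

Modeling atomicity as a single per-class flag, copied into every projection, mirrors the point that it is "mostly orthogonal to ref/val/zero/null".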
>
> Example: classic B2
>
> value class B2a { }
>
> Because the default is atomic, we get the classic B2 semantics -- no identity, but full final field safety guarantees. The VM has several strategies for flattening in the heap: single-field classes are always flattened (“full flat”), multi-field classes can be flattened with "fat load and store" heroics in the future (“low flat”), and otherwise the VM falls back to indirection (“no flat”).
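As a rough user-level analogy for "low flat" storage of a small multi-field value, two `int` fields can be packed into one 64-bit word so that each element is read and written in a single step. This sketch assumes a hypothetical two-`int` point; it uses `AtomicLongArray` because JLS §17.7 allows a non-volatile `long` write to be split into two 32-bit halves. It only illustrates the idea and is not how the VM would implement it.

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class PackedPointArray {
    // Each element holds both int fields of a point packed into one 64-bit word.
    private final AtomicLongArray cells;

    PackedPointArray(int size) {
        cells = new AtomicLongArray(size);
    }

    static long pack(int x, int y) {
        // High 32 bits hold x; low 32 bits hold y (masked to avoid sign extension).
        return ((long) x << 32) | (y & 0xFFFF_FFFFL);
    }

    static int unpackX(long bits) { return (int) (bits >>> 32); }

    static int unpackY(long bits) { return (int) bits; }

    // One 64-bit write: a reader never sees x from one point and y from another.
    void set(int i, int x, int y) {
        cells.set(i, pack(x, y));
    }

    // One 64-bit read, then both fields are unpacked from the same snapshot.
    int[] get(int i) {
        long bits = cells.get(i);
        return new int[] { unpackX(bits), unpackY(bits) };
    }

    public static void main(String[] args) {
        PackedPointArray points = new PackedPointArray(4);
        points.set(2, -1, 7);
        int[] p = points.get(2);
        System.out.println(p[0] + "," + p[1]);
    }
}
```

The trick only works while the combined fields fit in a width the hardware can load and store atomically, which is why larger multi-field values fall back to indirection.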
>
> Example: non-atomic B2
>
> non-atomic value class B2n { }
>
> Here, the user has said "I have no atomicity requirements." A B2n is a loose aggregation of fields that can be individually written and read (full B3-like flattening), with maybe an extra boolean field to encode null (the VM chooses how to encode it; it could use slack pointer bits, etc.)
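To make "loose aggregation of fields plus a null bit" concrete, here is a hypothetical flattened array layout for a non-atomic, nullable two-field value: one primitive array per field and a `boolean[]` null channel. The names `FlatNullableArray` and `Pair` are illustrative, and a real VM might encode null differently (for example, in slack bits).

```java
public class FlatNullableArray {
    // Hypothetical value class with two int fields and no multi-field invariant.
    record Pair(int x, int y) {}

    // Flattened storage: one array per field, plus a boolean "null channel".
    private final int[] xs;
    private final int[] ys;
    private final boolean[] present; // false (the array default) encodes null

    FlatNullableArray(int size) {
        xs = new int[size];
        ys = new int[size];
        present = new boolean[size];
    }

    void set(int i, Pair p) {
        if (p == null) {
            present[i] = false;
            return;
        }
        // Three independent writes with no atomicity guarantee between them.
        xs[i] = p.x();
        ys[i] = p.y();
        present[i] = true;
    }

    Pair get(int i) {
        return present[i] ? new Pair(xs[i], ys[i]) : null;
    }

    public static void main(String[] args) {
        FlatNullableArray arr = new FlatNullableArray(3);
        arr.set(1, new Pair(4, 5));
        System.out.println(arr.get(0) + " " + arr.get(1));
    }
}
```

Because `boolean[]` elements start as `false`, a freshly allocated array reads as all nulls, so the value keeps its null default even though the storage underneath is zero-filled.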
>
> Example: atomic B3
>
> zero-capable value class B3a { }
>
> This says I am declaring two types, B3a and B3a.zero. (The syntax in this quadrant sucks; we need to find better.) B3a is just like B2a above, because we haven’t activated the zero capability at the use site. B3a.zero/val/flat/whatever is non-nullable and zero-default, *but still has full B2-classic atomicity*, with the same set of flattening choices on the part of the VM.
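Today's `Integer`/`int` pair is the closest existing analogue to a class with a nullable reference type and a zero-default companion: the two types differ in their default value. This minimal sketch shows the defaults a reader gets from freshly allocated arrays of each; it says nothing about atomicity, which is the separate axis above.

```java
public class DefaultValues {
    // Default element of a reference-typed array: null.
    static Integer refDefault() {
        Integer[] refs = new Integer[1];
        return refs[0];
    }

    // Default element of a primitive-typed array: zero.
    static int primitiveDefault() {
        int[] vals = new int[1];
        return vals[0];
    }

    public static void main(String[] args) {
        System.out.println(refDefault());       // null
        System.out.println(primitiveDefault()); // 0
    }
}
```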
>
> Example: full primitive
>
> non-atomic zero-capable value class B3n { }
>
> Here, B3n is like B2n, and B3n.zero is a full classic-B3 Q primitive with full flattening.
>
> So:
>
> - value-ness means "no identity, == means state equality"
> - You can add non-atomic to value-ness, meaning you give up state integrity
> - You can orthogonally add zero-capable to value-ness, meaning you get a non-null, zero-happy companion, which inherits the atomic-ness
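The phrase "== means state equality" can be contrasted with how records behave today. In current Java, two separately allocated records with equal components are distinct objects, so `==` is false even though `equals` is true; per the summary above, declaring `Point` as a value class would make `==` compare state instead. `Point` here is a hypothetical example type.

```java
public class StateEquality {
    record Point(int x, int y) {}

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        System.out.println(a == b);      // false: two allocations, two identities
        System.out.println(a.equals(b)); // true: record equals compares components
    }
}
```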
>
> Some of the characteristics of this scheme:
>
> - The default is atomicity / integrity FOR ALL BUCKETS (safe by default)
> - The default is nullability FOR ALL BUCKETS
> - All unadorned type names are reference types / nullable
> - All Val-adorned type names (X.val) are non-nullable (or .zero, or .whatever)
> - Atomicity is determined by declaration site, can’t be changed at use site
>
> The main syntactic hole is finding the right spelling for "zeroable" / .val. There is some chance we can get away with spelling it `T!`, though this has risks.
>
> Spelling zero-happy as any form of “flat” is probably a bad idea, because B2 can still be flat.
>
> A possible spelling for “non-atomic” is “relaxed”:
>
> relaxed value class B3n { }
>
> Boilerplate-measurers would point out that to get full flattening, you have to say three things at the declaration site and one extra thing at the use site:
>
> relaxed zero-happy value class Complex { }
> …
> Complex! c;
>
> If you forget relaxed, you might get atomicity (though it might not cost anything if the value is small). If you forget zero-happy, you can’t say `Complex!`, only `Complex`, and the compiler will remind you. If you forget the !, you may get some extra footprint for the null bit. None of these are too bad, but the verbosity police might want to issue a warning here.
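The footprint cost of forgetting the `!` can be estimated with simple arithmetic. This sketch assumes, purely for illustration, that the VM stores the null marker as one extra byte and pads each flattened element to 8-byte alignment; if the VM can hide the marker in slack bits, the overhead disappears. `Complex` is taken to hold two `double` fields.

```java
public class FootprintEstimate {
    // Round n up to a multiple of align (align must be a power of two).
    static int alignUp(int n, int align) {
        return (n + align - 1) & -align;
    }

    // Bytes per flattened element, assuming a one-byte null marker when nullable.
    static int elementBytes(int payloadBytes, boolean nullable, int align) {
        return alignUp(payloadBytes + (nullable ? 1 : 0), align);
    }

    public static void main(String[] args) {
        int complex = 2 * Double.BYTES; // re + im
        System.out.println(elementBytes(complex, false, 8)); // 16 for Complex!
        System.out.println(elementBytes(complex, true, 8));  // 24 for Complex
    }
}
```

Under these assumptions the nullable form costs 50% more per element, which is "extra footprint", not a change in semantics.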
>
> It is possible we might want to flip the declaration of zero-capable, where classes with no good default can opt OUT of the zero companion, rather than the other way around:
>
> null-default value class LocalDate { }
>
> which says that LocalDate must use the nullable (LocalDate) form, not the non-nullable (LocalDate.val/zero/bang) form.
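`LocalDate` is a good example of a class without a sensible zero. Assuming its state is a year, a month, and a day, the all-zero combination is not a valid date, which the real `java.time` API confirms by rejecting it:

```java
import java.time.DateTimeException;
import java.time.LocalDate;

public class ZeroLocalDate {
    // True if year 0, month 0, day 0 would form a valid LocalDate.
    static boolean allZeroIsValid() {
        try {
            LocalDate.of(0, 0, 0);
            return true;
        } catch (DateTimeException e) {
            // Month 0 (and day 0) are out of range.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(allZeroIsValid()); // false
    }
}
```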
>
>
> On 4/22/2022 2:24 PM, Brian Goetz wrote:
> I think I have a restack of Dan's idea that feels like fewer buckets.
>
> We have two axes of variation we want to express with flattenable types: atomicity constraints, and whether there is an additional zero-default companion type.
>
> We've been assuming that "primitiveness" lumps this all together; primitives get more flattening, primitives can be non-nullable/zero-default, primitives means the good name goes to the "val" type. Primitive-ness implicitly flips the "safety vs performance" priority, which is bothering us because primitives also code like a class. So we're trying to claw back some atomicity for primitives.
>
> But also, we're a little unhappy with B2 because B2 comes with _more_ atomicity than is necessarily needed; a B2 with no invariants still gets less flattening. That's a little sad. Let's restack the pieces (again).
>
> - Value classes are those without identity
> - Value classes can be atomic or non-atomic, the default is atomic (safe)
> - Value classes can further opt into having a "val" projection (name TBD, val is probably not it)
> - Val projections are non-nullable, zero-default
> - Both the ref and val projections inherit the atomicity constraints of the class, making atomicity mostly orthogonal to ref/val/zero/null
>
> Example: classic B2
>
> value class B2 { }
>
> Because the default is atomic, we get the classic B2 semantics -- no identity, but full final field safety guarantees. The VM has several strategies for flattening in the heap: single-field classes are always flattened, multi-field classes can be flattened with "fat load and store" heroics in the future, and otherwise the VM falls back to indirection.
>
> Example: non-atomic B2
>
> non-atomic value class B2a { }
>
> Here, the user has said "I have no atomicity requirements." A B2a is a loose aggregation of fields that can be individually written and read (full B3-like flattening), with maybe an extra boolean field to encode null (the VM chooses how to encode it.)
>
> Example: atomic B3
>
> zero-capable value class B3a { }
>
> This says I am declaring two types, B3a and B3a.zero. (These names suck; we need better ones.) B3a is just like B2 above. B3a.zero is non-nullable and zero-default, *but still has full B2-classic atomicity*, with the same set of flattening choices.
>
> Example: full primitive
>
> non-atomic zero-capable value class B3b { }
>
> Here, B3b is like B2a, and B3b.zero is a full classic-B3 Q primitive with full flattening.
>
>
> So the stacking is:
>
> - value-ness means "no identity, == means state equality"
> - You can add non-atomic to value-ness, meaning you give up state integrity
> - You can orthogonally add zero-capable to value-ness, meaning you get a non-null, zero-happy companion
>
> This is starting to feel more honest....
>
>
>
>
>
> On 4/19/2022 6:45 PM, Brian Goetz wrote:
> By choosing to modify the class, we are implicitly splitting into Buckets 3a and 3n:
>
> - B2 gives up identity
> - B3a further gives up nullity
> - B3n further gives up atomicity
>
> Which opens us up to a new complaint: people didn't even like the B2/B3 split ("why does there have to be two"), and now there are three.
>
> Given that atomic/non-atomic only applies to primitives, maybe there's a way to compress this further?
>
> On 4/19/2022 6:25 PM, Dan Smith wrote:
> On Apr 19, 2022, at 2:49 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>
> So, what shall we do when the user says non-atomic, but the constructor expresses a multi-field invariant?
>
> Lint warning, if we can detect it and that warning is turned on.
>
>
> On Apr 19, 2022, at 3:22 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>
> Stepping back, what you're saying is that we manage atomicity among a subset of fields by asking the user to arrange the related fields in a separate class, and give that class extra atomicity. If we wanted to express ColoredDiagonalPoint, in this model we'd say something like:
>
> non-atomic primitive ColoredDiagonalPoint {
>     private DiagonalPoint p;
>     private Color c;
>
>     private atomic primitive DiagonalPoint {
>         private int x, y;
>
>         DiagonalPoint(int x, int y) {
>             if (x != y) throw new IllegalArgumentException();
>             ...
>         }
>     }
> }
>
> Right?
>
> Yep. Good illustration of how just providing a class modifier gives programmers significant fine-grained control.
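In today's Java, the same decomposition can be approximated with records; the `atomic`/`non-atomic` modifiers have no current equivalent, so they appear only in comments. The inner `DiagonalPoint` owns the two-field invariant, while `ColoredDiagonalPoint` relates no fields to one another. `Color` is a hypothetical enum added to make the sketch self-contained.

```java
public class ColoredDiagonalPointDemo {
    enum Color { RED, GREEN, BLUE }

    // Owns the two-field invariant x == y (would be "atomic" in the proposal).
    record DiagonalPoint(int x, int y) {
        DiagonalPoint {
            if (x != y) {
                throw new IllegalArgumentException("not diagonal: " + x + ", " + y);
            }
        }
    }

    // Relates no fields to each other (would be "non-atomic" in the proposal).
    record ColoredDiagonalPoint(DiagonalPoint p, Color c) {}

    public static void main(String[] args) {
        ColoredDiagonalPoint cdp = new ColoredDiagonalPoint(new DiagonalPoint(3, 3), Color.RED);
        System.out.println(cdp);
        try {
            new DiagonalPoint(1, 2);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because `p` is currently a reference, replacing it is already a single atomic write. The point of the `atomic primitive` inner class is to keep that guarantee for `x` and `y` even once the outer value is flattened and its `p` and `c` may tear independently.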
>
>
> We exempt the single-field classes from having an opinion. We could also exempt primitive records with no constructor behavior.
>
> Yeah, but (1) hard to identify all assumed invariants—some might appear in factories, etc., or informally in javadoc; and (2) even in a class with no invariants, it's probably useful for the author to explicitly acknowledge that they understand tearing risks.
>
>
> What it gives up (without either a change in programming model, or compiler heroics), is the ability to correlate between user-written invariants and the corresponding atomicity constraints, which could guide users away from errors. Right?
>
> Right. Could still do that if we wanted, but my opinion is that it's too much language surface for the scale of the problem. If we did have additional construction constraints, I'd prefer that atomic primitives allow full imperative construction logic & encapsulation.
>
> This feels analogous to advanced typing analyses that might prove certain casts to be safe/unsafe. Sure, the language could try to be helpful by implementing that analysis, but it would add lots of complexity, and ultimately it's either a best-effort check or annoyingly restrictive.
>
>> On Apr 27, 2022, at 2:51 PM, Dan Heidinga <heidinga at redhat.com> wrote:
>>
>> I'm trying to understand how this refactoring fits the VM physics.
>>
>> In particular, __non-atomic & __zero-ok fit together at the VM level
>> because the VM's natural state for non-atomic (flattened) data is zero
>> filled. When those two items are decoupled, I'm unclear on what the
>> VM would offer in that case. Thoughts?
>>
>> How does "__non-atomic __non-id class B2a { }" fit with the "no new
>> nulls" requirements?
>>
>> --Dan
>>
>> On Wed, Apr 27, 2022 at 12:45 PM Brian Goetz <brian.goetz at oracle.com> wrote:
>>>
>>> Here are some considerations for stacking the user model. (Again, please let’s resist the temptation to jump to the answer and then defend it.)
>>>
>>> We have a stacking today which says:
>>>
>>> - B1 is ordinary identity classes, giving rise to a single reference type
>>> - B2 are identity-free classes, giving rise to a single reference type
>>> - B3 are flattenable identity-free classes, giving rise to both a reference (L/ref) and primitive (Q/val) type.
>>>
>>> This stacking has some pleasant aspects. B2 differs from B1 by “only one bit”: identity. The constraints on B2 are those that come from the lack of identity (mutability, extensibility, locking, etc.) B2 references behave like the object references we are familiar with; nullability, final field guarantees, etc. B3 further makes reference-ness optional; reference-free B3 values give up the affordances of references: they are zero-default and tearable. This stacking is nice because it can be framed as a sequence of “give up some X, get some Y”.
>>>
>>> People keep asking “do we need B2, or could we get away with B1/B3”. The main reason for having this distinction is that some id-free classes have no sensible default, and so want to use null as their default. This is a declaration-site property; B3 means that the zero value is reasonable, and use sites can opt into / out of zero-default / nullity. We’d love to compress away this bucket but forcing a zero on classes that can’t give it a reasonable interpretation is problematic. But perhaps we can reduce the visibility of this in the model.
>>>
>>> The degrees of freedom we could conceivably offer are
>>>
>>> { identity or not, zero-capable or not, atomic or not } x { use-site, declaration-site }
>>>
>>> In actuality, not all of these boxes make sense (e.g., disavowing the identity of an ArrayList at the use site), and some have been disallowed by the stacking (some characteristics have been lumped.) Here’s another way to stack the declaration:
>>>
>>> - Some classes can disavow identity
>>> - Identity-free classes can further opt into zero-default (currently, B3, polarity chosen at use site)
>>> - Identity-free classes can further opt into tearability (currently, B3, polarity chosen at use site)
>>>
>>> It might seem the sensible move here is to further split B3 into B3a and B3b (where all B3 support zero default, and a/b differ with regard to whether immediate values are tearable). But that may not be the ideal stacking, because we want good flattening for B2 (and B3.ref) also. Ideally, the difference between B2 and B3.val is nullity only (Kevin’s antennae just went up.)
>>>
>>> So another possible restacking is to say that atomicity is something that has to be *opted out of* at the declaration site (and maybe also at the use site.) With deliberately-wrong syntax:
>>>
>>> __non-id class B2 { }
>>>
>>> __non-atomic __non-id class B2a { }
>>>
>>> __zero-ok __non-id class B3 { }
>>>
>>> __non-atomic __zero-ok __non-id class B3a { }
>>>
>>> In this model, you can opt out of identity, and then you can further opt out of atomicity and/or null-default. This “pulls up” the atomicity/tearability to a property of the class (I’d prefer safe by default, with opt out), and makes zero-*capability* an opt-in property of the class. Then for those that have opted into zero-capability, at the use site, you can select .ref (null) / .val (zero). Obviously these all need better spellings. This model frames specific capabilities as modifiers on the main bucket, so it could be considered either a two-bucket or a four-bucket model, depending on how you look.
>>>
>>> The author is in the best place to make the atomicity decision, since they know the integrity constraints. Single field classes, or classes with only single field invariants (denominator != 0), do not need atomicity. Classes with multi-field invariants do.
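The single-field versus multi-field distinction can be checked mechanically. This sketch simulates a torn read deterministically (no threads) by combining the first field of one valid write with the second field of another, then tests the invariant. The `rationalOk` and `rangeOk` predicates are hypothetical stand-ins for `denominator != 0` and a `lo <= hi` range invariant.

```java
import java.util.function.BiPredicate;

public class TornReads {
    // Single-field invariant: only the denominator is constrained.
    static boolean rationalOk(int num, int den) {
        return den != 0;
    }

    // Multi-field invariant: relates lo to hi.
    static boolean rangeOk(int lo, int hi) {
        return lo <= hi;
    }

    // Given two valid two-field values a and b, report whether either torn mix
    // (first field of one, second field of the other) violates the invariant.
    static boolean tearCanViolate(int[] a, int[] b, BiPredicate<Integer, Integer> ok) {
        return !ok.test(a[0], b[1]) || !ok.test(b[0], a[1]);
    }

    public static void main(String[] args) {
        System.out.println(tearCanViolate(new int[] {1, 2}, new int[] {3, 4}, TornReads::rationalOk));
        System.out.println(tearCanViolate(new int[] {0, 1}, new int[] {5, 10}, TornReads::rangeOk));
    }
}
```

The rational check prints `false` (no torn mix of two nonzero denominators can produce a zero one), while the range check prints `true` (`lo = 5` from one write and `hi = 1` from the other violates `lo <= hi`).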
>>>
>>> This differs from the previous stacking in that it moves the spotlight from _references_ and their properties, to the properties themselves. It says to class writers: you should declare the ways in which you are willing to trade safety for performance; you can opt out of the requirement for references and nulls (saving some footprint) and atomicity (faster access). It says to class *users*, you can pick the combination of characteristics, allowed by the author, that meet your needs (can always choose null default if you want, just use a ref.)
>>>
>>> There are many choices here about “what are the defaults”. More opting in at the declaration site might mean less need to opt in at the use site. Or not.
>>>
>>> (We are now in the stage which I call “shake the box”; we’ve named all the moving parts, and now we’re looking for the lowest-energy state we can get them into.)
>>>
>>
>
More information about the valhalla-spec-observers
mailing list