From john.r.rose at oracle.com Sun May 1 03:22:20 2022
From: john.r.rose at oracle.com (John Rose)
Date: Sat, 30 Apr 2022 20:22:20 -0700
Subject: [External] : Re: User model stacking
In-Reply-To: <8C6F4291-889A-4A61-9872-8476F9ABAEEA@oracle.com>
References: <8C6F4291-889A-4A61-9872-8476F9ABAEEA@oracle.com>
Message-ID: <55B81113-FF77-4C1F-BBAE-6E680DAC5B7D@oracle.com>

On 27 Apr 2022, at 16:12, Brian Goetz wrote:

> We can divide the VM flattening strategy into three rough categories (would you like some milk with your eclair?):
>
> - non-flat -- use a pointer
> - full-flat -- inline the layout into the enclosing container, access with narrow loads
> - low-flat -- use some combination of atomic operations to cram multiple fields into 64 or 128 bits, access with wide loads

There's another kind of strategy here; call it "fat-flat". That would encompass any hardware and/or software transactional memory mechanism that uses storage of more than 64 bits. I think all such techniques include a fast and slow path, which means unpredictable performance. Such techniques usually require "slack" of some sort in the data structure, either otherwise unencoded states (like pseudo-oops) or extra words (injected STM headers).

This is not completely off the table, because (remember) we are often going to inject an extra word just to represent the null state. In for a penny, in for a pound: If we add a word to encode the null state, it can also encode an inflated "synchronized access" state. That's part of the "VM physics" that Dan is asking about.

> B1 will always take the non-flat strategy. Non-volatile B3 that are smaller than some threshold (e.g., full cache line) will prefer the full-flat strategy. Non-atomic B2 can also pursue the full-flat strategy, but may have an extra field for the null channel. Atomic B2/B3 may try the low-flat strategy, and fall back to non-flat where necessary. Volatiles will likely choose non-flat, unless they fit in the CAS window. But it is always VM's choice.

A fat-flat strategy can cover atomic B2/B3, even volatiles.

Thing to remember: Even if a class designer selects the non-atomic option, a use-site volatile annotation surely overrides that. A non-atomic B2 is a funny type: It is usually non-atomic, except for volatile variables. That suggests to me there's a hole in the user model, a way to select atomic-but-not-volatile use sites (variables and array elements, in particular) for non-atomic B2's.

From brian.goetz at oracle.com Tue May 3 17:56:04 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 3 May 2022 13:56:04 -0400
Subject: Null channels (was: User model stacking)
In-Reply-To: References: <560C92D3-ED77-4CB6-837A-A87FC6FC22D7@oracle.com> <0FDEFA76-E212-4636-9E64-A603F703D0A5@oracle.com> Message-ID:

About six months ago we started working on flattening references in calling conventions in the Valhalla repos. We use the Preload attribute to force preloading of classes that are known to be (or expected to be) value classes, but which are referenced only via L descriptors, so that at the (early) time that the calling convention is chosen, we have the additional information that this is an identity-free class. In these cases, we scalarize the calling convention as we do with Q types, but we add an extra boolean channel for null; it is as if we add a boolean field to the object layout. When we adapt between the scalarized and indirected forms (e.g., c2i adapters), we apply the obvious semantics to the null channel.
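To make the "extra boolean channel" concrete, here is a minimal sketch in today's Java of what the scalarized form and its adapters logically do. The names (MyDate, ScalarizedDate) are hypothetical stand-ins; the real mechanism lives in the JIT's calling convention and the c2i adapters, not in user-visible code.

    // A value-class-like carrier, referenced via an L descriptor.
    record MyDate(int year, int month, int day) { }

    // The logical shape of the scalarized calling convention: the fields travel
    // as scalars, plus one extra boolean "null channel".
    record ScalarizedDate(int year, int month, int day, boolean nonNull) {

        // Adapter: indirect (reference) form -> scalarized form.
        static ScalarizedDate fromReference(MyDate ref) {
            return ref == null
                    ? new ScalarizedDate(0, 0, 0, false)   // null: field values are don't-cares
                    : new ScalarizedDate(ref.year(), ref.month(), ref.day(), true);
        }

        // Adapter: scalarized form -> indirect (reference) form.
        MyDate toReference() {
            return nonNull ? new MyDate(year, month, day) : null;
        }
    }

The heap-layout analogue of the same idea is the injected boolean field described next, which the VM is free to encode more cheaply when it can find slack bits.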
We have not yet applied the same treatment to field layout, but we can (and it has the same timing constraints, so it also needs Preload), and the VM has additional degrees of implementation freedom in doing so.? The simplest is to let the layout engine choose to flatten a preloaded L value type by injecting a boolean field which represents nullity, and adapting null checks to check this field (which can be hoisted etc.) The layout engine has other tricks available to it as well, to further reduce the footprint of representing "might be null", if it can find suitable slack space in the representation.? Such tricks could include using slack bits in boolean fields (potentially seven of them), low order bits of pointers (a la compressed OOPs), unused color bits of 64 bit pointers, etc.? Some of these choices require transforms on load/store (e.g., those that use pointer bits), not unlike what we do with compressed OOPs.? This is entirely "VM's choice" and affects only quality of implementation; there is nothing in the classfile that conditions this, other than the ACC_VALUE indication and L/Q type carriers.? So the VM has a rich set of footprint/computation tradeoffs for encoding the null channel, but logically, it is an "extra boolean field" that all nullable value types have. > I'd like to reserve judgement on this stacking as I'm uncomfortable > (uncertain maybe?) about the practicality of the extra null channel. > Without having validated the extra null channel, I'm concerned we're > exposing a broader set of options in the language that will, in > practice, map down to the existing 3 buckets we've been talking about. > Maybe this factoring allows a slightly larger number of classes to be > flattened or leaves the door open for them to get it in the future? What I'm trying to do here is decomplect flattening from nullity. Right now, we have an unfortunate interaction which both makes certain combinations impossible, and makes the user model harder to reason about. Identity-freedom unlocks flattening in the stack (calling convention.)? The lesson of that exercise (which was somewhat surprising, but good) is that nullity is mostly a non-issue here -- we can treat the nullity information as just being an extra state component when scalarizing, with some straightforward fixups when we adapt between direct and indirect representations.? This is great, because we're not asking users to choose between nullability and flattening; users pick the combination of { identity, nullability } they want, and they get the best flattening we can give: ??? case (identity, _) -> 1; // no flattening ??? case (non-identity, non-nullable) -> nFields;? // scalarize fields ??? case (non-identity, nullable) -> nFields + 1;? // scalarize fields with extra null channel Asking for nullability on top of non-identity means only that there is a little more "footprint" in the calling convention, but not a qualitative difference.? That's good. In the heap, it is a different story.? What unlocks flattening in the heap (in addition to identity-freedom) is some permission for _non-atomicity_ of loads and stores.? For sufficiently simple classes (e.g., one int field) this is a non-issue, but because loads and stores of references must be atomic (at least, according to the current JMM), references to wide values (B2 and B3.ref) cannot be flattened as much as B3.val.? 
There are various tricks we can do (e.g., stuffing two 32 bit fields into a 64 bit atomic) to increase the number of classes that can get good flattening, but it hits a wall much faster than "primitives". What I'd like is for the flattening story on the heap and the stack to be as similar as possible.? Imagine, for a moment, that tearing was not an issue.? Then where we would be in the heap is the same story as above: no flattening for identity classes, scalarization in the heap for non-nullable values, and scalarization with an extra boolean field (maybe, same set of potential optimizations as on the stack) for nullable values.? This is very desirable, because it is so much easier to reason about: ?- non-identity unlocks scalarization on the stack ?- non-atomicity unlocks flattening in the heap ?- in both, ref-ness / nullity means maybe an extra byte of footprint compared to the baseline (with additional opportunistic optimizations that let us get more flattening / better footprint in various special cases, such as very small values.) > In previous discussions around the extra null channel for flattened > values, we were really looking at narrowly applicable optimization - > basically for nullable values that would fit within 64bits. With this > stacking, and the info about intel allowing atomicity up to 128bits, > the extra null channel becomes more widely applicable. Yes.? What I'm trying to do is separate this all from the details of what instructions CPU X has, and instead connect optimizations to semantics: nullity requires extra footprint (unless it can be optimized away by stealing bits somehow), and does so uniformly across the buckets / heap / stack / whatever.? Nullability is a semantic property; providing this property may have a cost, but the more uniform we can make it, the simpler it is to reason about, and the simpler to implement (since we can use the same encoding tricks in both stack and heap.) > Some of my hesitation comes from experiences writing structs or > multi-field invariants in C where memory barriers and careful > read/write protocols are important to ensure consistent data in the > face of races. Widening the set of cases that have a multifield > invariant *created and enforced by the VM* by adding an additional > null channel will make it more likely the VM (and optimized jit code!) > can do the wrong thing. Yes, this is why I want to bring it into the programming model.? I don't want to magically analyze the constructor and say "whoa, that looks like a cross-field invariant"; I want the class author to say "you have permission to shred" or "you do not have permission to shred", and we optimize within the semantic properties declared by the author. In addition to cross-field invariants being part of the boundary between whether or not we need atomicity, transparency also comes into play.? When we "construct" a long, we have a pretty clear idea how the value maps to all the bits; with encapsulation, we do not (but for records, we do again, because we've constrained away the ability to let representation diverge from interface.)? Again, though, I think we are better off having the author declare the required atomicity properties rather than trying to derive them from other things (e.g., constructor body, record-ness, etc.) > I have always been somewhat uneasy about the injected nullchannel > approach and concerned about how difficult it will be for service > engineers to support when something goes wrong. 
> If there's experience that can be shared that shows this works well in an implementation, then I'll be less concerned.

Perhaps Tobias and Frederic can share more about what we've discovered here?

From brian.goetz at oracle.com Tue May 3 22:17:29 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 3 May 2022 18:17:29 -0400
Subject: User model stacking
In-Reply-To: <4F70E6B7-FAC8-4845-8969-8D545B6FB4FB@oracle.com>
References: <4F70E6B7-FAC8-4845-8969-8D545B6FB4FB@oracle.com>
Message-ID:

> Just so we don't lose this history, a reminder that back when we settled on the 3 buckets, we viewed it as a useful simplification from a more general approach with lots of "knobs". Instead of asking developers to think about 3-4 mostly-orthogonal properties and set them all appropriately, we preferred a model in which *objects* and *primitive values* were distinct entities with distinct properties. Atomicity, nullability, etc., weren't extra things to have to reason about independently, they were natural consequences of what it meant to be (or not) a variable that stores objects.

Indeed; it is often a process of "spiraling", where we seem to return to places we've already been, but perhaps in a lower energy state. We came by the earlier bucket model honestly, as it approximated the use cases we envisioned as most important. I think it's time to rethink the three-bucket model, not because three is too big or small a number, but because (a) the relationship between the buckets is complex, (b) it puts users to some difficult choices between semantics and performance, and (c) we have real concerns that hiding the permission to tear behind some proxy (e.g., "non-null" or "B3") will be too subtle and potentially astonishing.

> That was awhile ago, we may have learned some things since then, but I think there's still something to the idea that we can expect everybody to understand the difference between objects and primitives, even if they don't totally understand all the implications. (When they eventually discover some corner of the implications, we hope they'll say, "oh, sure, that makes sense because this is/isn't an object.")

I think this is true for all the aspects _except_ tearing. I tried the argument "it can tear because it's not an object" on for size, and I just can't imagine people not forgetting it routinely.

> My inclination would probably be to abandon the object/value dichotomy, revert to "everything is an object", perhaps revisit our ideas about conversions/subtyping between ref and val types, and develop a model that allows tearing of some objects. Probably all do-able, but I'm not sure it's a better model.

I don't think we have to go so far as this. Just as Valhalla questions the previously-universal property of "all objects have identity", we can play the same game with "all objects provide integrity guarantees" (final field semantics.) Some classes can shed identity; some further can shed the integrity requirements. (Both require a judgement on the part of the class author.) We can then optimize accordingly.

By factoring out atomicity/integrity as an orthogonal semantic constraint, we get to a lower energy state for B2 vs B3: "does this class have a good zero". Complex does; LocalDate does not. And we get to a simpler performance consequence of B3.ref vs B3.val: at most an extra bit of footprint. These are both easier to understand.
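As a plain-Java illustration of the "good zero" question (the record names below are stand-ins, not proposed API): the all-zeros bit pattern is what a flattened, non-nullable slot contains before anything has been written to it, so the class author has to decide whether that pattern is a legitimate instance.

    // For a Complex-like class, the all-zeros instance is a perfectly good value.
    record Complex(double re, double im) { }

    // For a date-like class, the all-zeros instance ("year 0, month 0, day 0")
    // is not a value the domain admits, which is why such a class would rather
    // stay nullable/reference-only than expose a bad default.
    record Ymd(int year, int month, int day) { }

    class GoodZeroDemo {
        public static void main(String[] args) {
            System.out.println(new Complex(0.0, 0.0));  // fine as a default: 0.0 + 0.0i
            System.out.println(new Ymd(0, 0, 0));       // constructible here, but meaningless as a date
        }
    }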
From forax at univ-mlv.fr Tue May 3 22:52:21 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 4 May 2022 00:52:21 +0200 (CEST) Subject: Null channels (was: User model stacking) In-Reply-To: References: <560C92D3-ED77-4CB6-837A-A87FC6FC22D7@oracle.com> <0FDEFA76-E212-4636-9E64-A603F703D0A5@oracle.com> Message-ID: <738292834.20526867.1651618341891.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "Brian Goetz" [...] > > What I'm trying to do here is decomplect flattening from nullity. Right > now, we have an unfortunate interaction which both makes certain > combinations impossible, and makes the user model harder to reason about. > > Identity-freedom unlocks flattening in the stack (calling convention.) > The lesson of that exercise (which was somewhat surprising, but good) is > that nullity is mostly a non-issue here -- we can treat the nullity > information as just being an extra state component when scalarizing, > with some straightforward fixups when we adapt between direct and > indirect representations.? This is great, because we're not asking users > to choose between nullability and flattening; users pick the combination > of { identity, nullability } they want, and they get the best flattening > we can give: > > ??? case (identity, _) -> 1; // no flattening > ??? case (non-identity, non-nullable) -> nFields;? // scalarize fields > ??? case (non-identity, nullable) -> nFields + 1;? // scalarize fields > with extra null channel > > Asking for nullability on top of non-identity means only that there is a > little more "footprint" in the calling convention, but not a qualitative > difference.? That's good. > > In the heap, it is a different story.? What unlocks flattening in the > heap (in addition to identity-freedom) is some permission for > _non-atomicity_ of loads and stores.? For sufficiently simple classes > (e.g., one int field) this is a non-issue, but because loads and stores > of references must be atomic (at least, according to the current JMM), > references to wide values (B2 and B3.ref) cannot be flattened as much as > B3.val.? There are various tricks we can do (e.g., stuffing two 32 bit > fields into a 64 bit atomic) to increase the number of classes that can > get good flattening, but it hits a wall much faster than "primitives". > > What I'd like is for the flattening story on the heap and the stack to > be as similar as possible.? Imagine, for a moment, that tearing was not > an issue.? Then where we would be in the heap is the same story as > above: no flattening for identity classes, scalarization in the heap for > non-nullable values, and scalarization with an extra boolean field > (maybe, same set of potential optimizations as on the stack) for > nullable values.? This is very desirable, because it is so much easier > to reason about: > > ?- non-identity unlocks scalarization on the stack > ?- non-atomicity unlocks flattening in the heap > ?- in both, ref-ness / nullity means maybe an extra byte of footprint > compared to the baseline > > (with additional opportunistic optimizations that let us get more > flattening / better footprint in various special cases, such as very > small values.) yes, choosing (non-)identity x (non-)nullability x (non-)atomicity at declaration site makes the performance model easier to understand. At declaration site, there are still nullability x atomicity with .ref and volatile respectively. I agree with John that being able to declare array items volatile is missing but i believe it's an Array 2.0 feature. 
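To sketch the two layers being distinguished here, in the provisional spellings used elsewhere in this thread ("value", "non-atomic", ".ref"/".val" are all still under discussion, and none of this compiles with any shipping javac):

    // Declaration site: the author chooses identity, nullability of the val
    // projection, and atomicity once, for every use of the class.
    non-atomic value class Complex {
        double re, im;
    }

    // Use sites: individual variables can still tighten things back up.
    class Holder {
        Complex.val flat;           // flattened, non-nullable
        Complex.ref maybeNull;      // nullable, reference-shaped
        volatile Complex.val safe;  // volatile restores atomicity for this one field,
                                    // per John's point that use-site volatile overrides
                                    // a non-atomic declaration
    }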
Once we get universal generics, what we win is that not only ArrayList is compact on heap but ArrayList too. R?mi From brian.goetz at oracle.com Wed May 4 14:27:52 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 4 May 2022 10:27:52 -0400 Subject: User model: ref as default, vs universal generics Message-ID: <83ace890-ab9d-f3ac-b8cf-9300b33f08e2@oracle.com> Just to record a constraint: there's somewhat of a conflict between the idea of "make ref the default", as Kevin advocated, and universal generics, which we need to keep in mind as we stack the whole tower. If a B3 class gives us Foo and Foo.val, then Map::get (currently) has no way to declare its return value as "ref T". The plan of record has been: ??? V.ref get(K key) but if V.ref is not denotable, we have a problem.? That means we can't *just* have Foo and Foo.val; we need at least to be able to say T.ref for type variables, if not Foo.ref for all B3 classes. If we can manage to use T!, then this is an obvious application for T?, but this approach brings new questions. From daniel.smith at oracle.com Wed May 4 14:31:45 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 4 May 2022 14:31:45 +0000 Subject: EG meeting, 2022-05-04 Message-ID: EG Zoom meeting today at 4pm UTC (9am PDT, 12pm EDT). We've had a flurry of activity in the last couple of weeks. I think we can summarize as follows: - "Spec change documents for Value Objects": revised JVMS to align with previous discussions about Value Objects, and a new JLS changes document to match - "We need help to migrate from bucket 1 to 2; and, the == problem": Kevin asked about JEP 390 applying to non-JDK classes, and about whether javac should discourage use of '==' - "Foo / Foo.ref is a backward default": Kevin and Brian argued that we should prefer treating B3 classes as reference-default, with something like '.val' to opt in to a primitive value type - "User model stacking": Brian discussed treating atomicity as an orthogonal property, no longer tied to B3 From brian.goetz at oracle.com Wed May 4 15:05:24 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 4 May 2022 11:05:24 -0400 Subject: User model: terminology Message-ID: Let's talk about terminology.? (This is getting dangerously close to a call-for-bikeshed, so let's exercise restraint.) Currently, we have primitives and classes/references, where primitives have box/wrapper reference companions.? The original goal of Bucket 3 was to model primitive/box pairs. We have tentatively been calling these "primitives", but there are good arguments why we should not overload this term. We have tentatively assigned the phrase "value class" to all identity-free classes, but it is also possible we can use value to describe what we've been calling primitives, and use something else (identity-free, non-identity) to describe the bigger family. So, in our search for how to stack the user model, we should bear in mind that names that have been tentatively assigned to one thing might be a better fit for something else (e.g., the "new primitives").? We are looking for: ?- A term for all non-identity classes.? (Previously, all classes had identity.) - A term for? what we've been calling atomicity: that instances cannot appear to be torn, even when published under race.? (Previously, all classes had this property.) ?- A term for those non-identity classes which do not _require_ a reference.? These must have a valid zero, and give rise to two types, what we've been calling the "ref" and "val" projections. 
- A term for what we've been calling the "ref" and "val" projections.

Let's start with _terms_, not _declaration syntax_.

From forax at univ-mlv.fr Wed May 4 15:44:09 2022
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 4 May 2022 17:44:09 +0200 (CEST)
Subject: User model: terminology
In-Reply-To: References: Message-ID: <589500778.20959312.1651679049275.JavaMail.zimbra@u-pem.fr>

> From: "Brian Goetz"
> To: "valhalla-spec-experts"
> Sent: Wednesday, May 4, 2022 5:05:24 PM
> Subject: User model: terminology

> Let's talk about terminology. (This is getting dangerously close to a call-for-bikeshed, so let's exercise restraint.)

> Currently, we have primitives and classes/references, where primitives have box/wrapper reference companions. The original goal of Bucket 3 was to model primitive/box pairs. We have tentatively been calling these "primitives", but there are good arguments why we should not overload this term.

> We have tentatively assigned the phrase "value class" to all identity-free classes, but it is also possible we can use value to describe what we've been calling primitives, and use something else (identity-free, non-identity) to describe the bigger family.

> So, in our search for how to stack the user model, we should bear in mind that names that have been tentatively assigned to one thing might be a better fit for something else (e.g., the "new primitives"). We are looking for:

> - A term for all non-identity classes. (Previously, all classes had identity.)

I've used the term "immediate": immediate object vs reference object.

> - A term for what we've been calling atomicity: that instances cannot appear to be torn, even when published under race. (Previously, all classes had this property.)

As you said, the default should be non-tearable. I believe that we should use a term that indicates that the object is composed of several values, a term like "compound", "composite" or perhaps "aggregate". I think I prefer "compound" due to its Latin root.

The other solution, instead of saying that it's non-tearable by default, is to force users to always use a keyword to indicate the "atomicity" state: (non-)splittable, (non-)secable (secable is more or less the Latin equivalent of the Greek atomic).

> - A term for those non-identity classes which do not _require_ a reference. These must have a valid zero, and give rise to two types, what we've been calling the "ref" and "val" projections.

I like "zero-default" (as the opposite of null-default), but mostly because it's a valid hyphenated keyword.

> - A term for what we've been calling the "ref" and "val" projections.

Technically, what we called the ref projection is now a nullable projection; we are adding null into the set of possible values.

> Let's start with _terms_, not _declaration syntax_.

Rémi

From brian.goetz at oracle.com Wed May 4 17:32:39 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 4 May 2022 13:32:39 -0400
Subject: [External] : Re: User model: terminology
In-Reply-To: <589500778.20959312.1651679049275.JavaMail.zimbra@u-pem.fr>
References: <589500778.20959312.1651679049275.JavaMail.zimbra@u-pem.fr>
Message-ID:

> - A term for those non-identity classes which do not _require_ a reference. These must have a valid zero, and give rise to two types, what we've been calling the "ref" and "val" projections.
>
> I like "zero-default" (as the opposite of null-default), but mostly because it's a valid hyphenated keyword.
In addition to staying away from declaration syntax for purposes of this thread, let's also stay away from defaults, so we can stay focused on concepts. "Default is zero" as a concept is part of the story, but I worry it may be the dependent part.? Because before you can get to "default is zero", you need a way to say "has a sensible zero".? For LocalDate, the zero is not sensible (well, it could be, but 1970 is a pretty lousy zero value), whereas for Complex, zero is not only valid, but is arguably a great value.? This is a semantic statement about the domain. For a "has no sensible zero" type, the only choice is a reference, which brings its own default -- null.? So "has a sensible zero" gates "has a val projection", but does not yet say anything about which (ref/val) is the default. It's nice to say "zero" directly, but I'm not sure it says what we mean by "zero-default", since the default of the ref projection is null, like all other refs.?? Obviously the example I had in my earlier mail ("zero-happy") are silly and were meant only to be evocative. So what we're looking for is a word for "the zero value is good, so the concept of a non-nullable instance makes sense". Which brings me to another observation: this is a different sense of non-nullable than what we might mean by: ??? void foo(String! s) { ... } Because, a Foo.val is a type we can use as, say, an array component type (because the zero is valid), but the traditional interpretation of `Foo!` makes it ineligible for use as an array component (and probably a field), because references in the heap are null-default. So when we talk about non-nullable instances, we're really saying "there is *another* good default other than null." From kevinb at google.com Wed May 4 18:27:38 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 4 May 2022 11:27:38 -0700 Subject: User model: terminology In-Reply-To: References: Message-ID: My favorite kind of thread... At the risk of inducing groans, a reminder that much of my own terminology backstory is found in Data in Java Programs , and that when there are places we disagree below, the disagreement is probably highlighted by something in *that* document first. Since for all we know it might be out of step with how typical Java devs really think, I'll just mention that it's at least been well-received by reddit twice (if reddit can find something to complain about, they usually do!). On Wed, May 4, 2022 at 8:05 AM Brian Goetz wrote: Currently, we have primitives and classes/references, where primitives have > box/wrapper reference companions. The original goal of Bucket 3 was to > model primitive/box pairs. We have tentatively been calling these > "primitives", but there are good arguments why we should not overload this > term. > > We have tentatively assigned the phrase "value class" to all identity-free > classes, but it is also possible we can use value to describe what we've > been calling primitives, and use something else (identity-free, > non-identity) to describe the bigger family. > > So, in our search for how to stack the user model, we should bear in mind > that names that have been tentatively assigned to one thing might be a > better fit for something else (e.g., the "new primitives"). We are looking > for: > > - A term for all non-identity classes. (Previously, all classes had > identity.) > The term applies to the objects first and foremost. The object either has identity or does not. What *is* identity? 
I'll claim it's exactly like an ordinary immutable field-based property, with one special provision: it is *always* auto-assigned to be unique, and thus can never be copied. That feels to me like it tells the whole story. So the difference between these kinds of objects is exactly a "with identity" / "without identity" distinction, and as we know from interface naming ("HasFoo"), it is often impossible to turn that into adjective form. The second complication here is the backward default. *Having* identity is actually the special property! I do think we should lean into that. Part of upgrading your code to be "Java 21-modern" (or whatever) really should be marking all your classes that you really *want* to have identity and letting the rest lose it. The terms that feel right are "identity object" and "class that produces identity objects" shortened to "identity class". For the most part I think we'll end up talking about "identity classes" and "classes in general", and more rarely needing to refer to "classes without identity" or "non-identity classes". So I think it's okay to let them use "A \ B"-style terminology as I've done here. (I furthermore still think it's okay to have an IdentityObject interface but no ValueObject interface, as the latter doesn't really embody additional client-facing capabilities.) This is one of at least four examples of backward defaults in the language. We are either stuck with painful/awkward terminology choices in all of them, or we could pursue the idea of letting source files declare their language level, upon which the problem vanishes. > - A term for what we've been calling atomicity: that instances cannot > appear to be torn, even when published under race. (Previously, all > classes had this property.) > I think this term we really need is this one's negation. You never need to (or can) mention it with identity classes; with the rest you can use it to opt into more risk. The English words that come to mind are https://www.thesaurus.com/browse/fragile. > - A term for those non-identity classes which do not _require_ a > reference. These must have a valid zero, and give rise to two types, what > we've been calling the "ref" and "val" projections. > I think we need to name the *type* first before the class. Today we have 1. primitive types (the values are the instances) 2. reference types (the values are references to the instances) But this isn't the *heart* of what it means to be "primitive"; it just happens to be true of primitives so far. And sure, we'll certainly explain all of this *partly* by saying these types are "primitive-LIKE". But what is the quality that they and true primitive types have in common? It's "the values are the instances", so this can either lead to "value type" or go back to "inline/direct/immediate type". At this moment I like both "value type" and "inline type" well enough. Value *is* overloaded, to be sure, because of "value semantics" (aka why AutoValue is called AutoValue). But the connection is strong enough imho. I can delve deep into this topic if desired. Then, back to your question, what is the name for a *class* that *also* gives rise to a value type -- a "valuable class"? > - A term for what we've been calling the "ref" and "val" projections. > Note I think we should only invoke the concept of "projection" once we get into type variables. Otherwise we simply have two types for one class. (And the reason for that is very solid / easy to defend, just by appealing to how we'd've preferred int and Integer had worked.) 
I would just call them the reference type and the (name debated just above) type, simple as that. > Let's start with _terms_, not _declaration syntax_. > Yes, and even the term we like 2nd best for a thing can still be useful in the documentation of that thing. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From kevinb at google.com Wed May 4 18:36:27 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 4 May 2022 11:36:27 -0700 Subject: EG meeting, 2022-05-04 In-Reply-To: References: Message-ID: I wish I hadn't missed this meeting, but I was still paying the consequences for a bad decision to take an "overnight layover" coming home Monday night/Tuesday morning. On Wed, May 4, 2022 at 7:31 AM Dan Smith wrote: > > - "We need help to migrate from bucket 1 to 2; and, the == problem": Kevin > asked about JEP 390 applying to non-JDK classes, and about whether javac > should discourage use of '==' > I will try to pitch this `obj==` problem more comprehensively soon. > - "Foo / Foo.ref is a backward default": Kevin and Brian argued that we > should prefer treating B3 classes as reference-default, with something like > '.val' to opt in to a primitive value type > I will say that I have not personally found the opposition to this change to be nearly as strong as the principal arguments in favor. It creates a very valuable uniformity in how things work. I hope it goes this way. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Wed May 4 19:01:17 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 4 May 2022 15:01:17 -0400 Subject: [External] : Re: User model: terminology In-Reply-To: References: Message-ID: > > What *is* identity? I'll claim it's exactly like an ordinary immutable > field-based property, with one special provision: it is *always* > auto-assigned to be unique, and thus can never be copied. That feels > to me like it tells the whole story. So the difference between these > kinds of objects is exactly a "with identity" / "without identity" > distinction, and as we know from interface naming ("HasFoo"), it is > often impossible to turn that into adjective form. There's an interesting parallel with nullability here, where nullability is also like an immutable field-based property, which is automatically checked before accessing other fields. > The second complication here is the backward default. *Having* > identity is actually?the special property! I do think we should lean > into that. Part of upgrading your code to be "Java 21-modern" (or > whatever) really should be marking all your classes that you really > *want* to have identity and letting the rest lose it. The terms that > feel right are "identity object" and "class that produces identity > objects" shortened to "identity class". In addition to having picked a few wrong defaults in the past, we have also committed the sin of not making both states denotable; there are no keywords for the opposite of static, abstract, or final, or for package-private access.? (Part of the motivation for putting the non-X stake in the ground that did with non-sealed is to provide an easy extension to non-abstract, non-final, non-static, if we later want.)? Not being able to denote "identity class" except by the absence of some other keywords would be another instance of that. > - A term for what we've been calling atomicity: that instances > cannot appear to be torn, even when published under race. 
> (Previously, all classes had this property.)
>
> I think this term we really need is this one's negation. You never need to (or can) mention it with identity classes; with the rest you can use it to opt into more risk. The English words that come to mind are https://www.thesaurus.com/browse/fragile.

"Fragile" certainly will make people think twice about using it (and is effective estoppel against "but something bad happened").

> - A term for those non-identity classes which do not _require_ a reference. These must have a valid zero, and give rise to two types, what we've been calling the "ref" and "val" projections.
>
> I think we need to name the *type* first before the class. Today we have
>
> 1. primitive types (the values are the instances)
> 2. reference types (the values are references to the instances)
>
> But this isn't the *heart* of what it means to be "primitive"; it just happens to be true of primitives so far. And sure, we'll certainly explain all of this *partly* by saying these types are "primitive-LIKE". But what is the quality that they and true primitive types have in common? It's "the values are the instances", so this can either lead to "value type" or go back to "inline/direct/immediate type".

You can make a good argument that this is where we should use the V-word (primitives are value types, as are the val projection of B3 classes), and come up with a better name for the whole B2/B3 spectrum (such as non-identity classes.) It connects to why we chose value in the first place -- to evoke "passed by value".

> Then, back to your question, what is the name for a *class* that *also* gives rise to a value type -- a "valuable class"?
>
> - A term for what we've been calling the "ref" and "val" projections.
>
> Note I think we should only invoke the concept of "projection" once we get into type variables. Otherwise we simply have two types for one class. (And the reason for that is very solid / easy to defend, just by appealing to how we'd've preferred int and Integer had worked.) I would just call them the reference type and the (name debated just above) type, simple as that.

So a "valuable" class has a reference type and a value type. How does that relate to nullity? Obviously the reference type is nullable and the value type is not, but do we want to use nullability in the user description / type denotation, or should we stick with value and reference?

From brian.goetz at oracle.com Wed May 4 19:36:40 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 4 May 2022 15:36:40 -0400
Subject: [External] : Re: User model: terminology
In-Reply-To: References: Message-ID:

While on the subject of defaults: we've been treating B2 as the default kind of non-identity class, on the theory that it is the smallest hop away from identity classes, and also that it covers a broader range (all the existing value-based classes.) Is that still the default we want?

Flipping that default might make framing the B2/B3 distinction easier: rather than "tolerant of zero", what we'd opt into is "ref only".

Pulling farther, there's a bucket-inversion we might be able to pull here, just by moving some terminology around:

    class B1 { }                 // ref only
    value class B3 { }           // ref and val projections
    value-based class B2 { }     // ref only

And then we can apply non-atomic / fragile (or whatever we call it) to either B2 or B3.
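To put concrete names on that inversion (provisional syntax; every keyword here is still a bikeshed, and the class names are merely illustrative):

    class Employee { }                    // B1: identity class, ref only
    value-based class Instant { }         // B2: non-identity, ref only
    value class IntRange { }              // B3, atomic by default (cross-field invariant)
    non-atomic value class Complex { }    // B3 that additionally permits tearing under race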
This has a few positive properties: ?- Connection to the existing term "value-based", which means "follows the value constraints, but is a ref type", and has the connotation of "approximation of a value class" ?- .val makes sense in the context of a "value class" ?- We get the orthogonality we are seeking, but avoid piling up lots of modifiers (non-atomic zero-happy value class) as well as not having to invent crazy words like "zero-happy" ?- Practical difference between .val and .ref is just about nullity now ?- We get the "must opt into non-atomicity" that Brian has been ranting about This basically leaves the bucket model intact, with some flipped terminology, but importantly, factors atomicity out of being an implicit bucket property, and instead an explicit global choice.? This is by far the most important aspect of the restack I am pushing. It is an orthogonal choice as to whether .val or .ref gets the "good" name for value classes. On the negative side, there is an extra syntactic burden to get to B2 compared to B3 (value-based instead of value) which might cause some developers to reach for it when they might prefer VBC. But if the default for B3 is .ref, it might not matter as much (they're both ref types and you still get integrity), so the only risk is accidental exposure of the zero value. On 5/4/2022 2:27 PM, Kevin Bourrillion wrote: > My favorite kind of thread... > > At the risk of inducing groans, a reminder that much of my own > terminology backstory is found in Data in Java Programs > , > and that when there are places we disagree below, the disagreement is > probably highlighted by something?in *that* document first. > > Since for all we know it might be out of step with how typical Java > devs really think, I'll just mention that it's at least been > well-received by reddit > > twice (if reddit can find something to complain about, they usually do!). > > > On Wed, May 4, 2022 at 8:05 AM Brian Goetz wrote: > > Currently, we have primitives and classes/references, where > primitives have box/wrapper reference companions.? The original > goal of Bucket 3 was to model primitive/box pairs.? We have > tentatively been calling these "primitives", but there are good > arguments why we should not overload this term. > > We have tentatively assigned the phrase "value class" to all > identity-free classes, but it is also possible we can use value to > describe what we've been calling primitives, and use something > else (identity-free, non-identity) to describe the bigger family. > > So, in our search for how to stack the user model, we should bear > in mind that names that have been tentatively assigned to one > thing might be a better fit for something else (e.g., the "new > primitives").? We are looking for: > > ?- A term for all non-identity classes. (Previously, all classes > had identity.) > > > The term applies to the objects first and foremost. The object either > has identity or does not. > > What *is* identity? I'll claim it's exactly like an ordinary immutable > field-based property, with one special provision: it is *always* > auto-assigned to be unique, and thus can never be copied. That feels > to me like it tells the whole story. So the difference between these > kinds of objects is exactly a "with identity" / "without identity" > distinction, and as we know from interface naming ("HasFoo"), it is > often impossible to turn that into adjective form. > > The second complication here is the backward default. *Having* > identity is actually?the special property! 
I do think we should lean > into that. Part of upgrading your code to be "Java 21-modern" (or > whatever) really should be marking all your classes that you really > *want* to have identity and letting the rest lose it. The terms that > feel right are "identity object" and "class that produces identity > objects" shortened to "identity class". > > For the most part I think we'll end up talking about "identity > classes" and "classes in general", and more rarely needing to refer to > "classes without identity" or "non-identity classes". So I think it's > okay to let them use "A \ B"-style terminology as I've done here. (I > furthermore still think it's okay to have an IdentityObject interface > but no ValueObject interface, as the latter doesn't really embody > additional client-facing capabilities.) > > This is one of at least four examples of backward defaults in the > language. We are either stuck with painful/awkward terminology choices > in all of them, or we could pursue the idea of letting source files > declare their language level, upon which the problem vanishes. > > - A term for what we've been calling atomicity: that instances > cannot appear to be torn, even when published under race. > (Previously, all classes had this property.) > > > I think this term we really need is this one's negation. You never > need to (or can) mention it with identity classes; with the rest you > can use it to opt into more risk. The English words that come to mind > are https://www.thesaurus.com/browse/fragile > . > > ?- A term for those non-identity classes which do not _require_ a > reference.? These must have a valid zero, and give rise to two > types, what we've been calling the "ref" and "val" projections. > > > I think we need to name the *type* first before the class. Today we have > > 1. primitive types (the values are the instances) > 2. reference types (the values are references to the instances) > > But this isn't the *heart* of what it means to be "primitive"; it just > happens to be true of primitives so far. And sure, we'll certainly > explain all of this *partly* by saying these types are > "primitive-LIKE". But what is the quality that they and true primitive > types have in common? It's "the values are the instances", so this can > either lead to "value type" or go back to "inline/direct/immediate type". > > At this moment I like both "value type" and "inline type" well enough. > Value *is* overloaded, to be sure, because of "value semantics" (aka > why AutoValue is called AutoValue). But the connection is strong > enough imho. I can delve deep into this topic if desired. > > Then, back to your question, what is the name for a *class* that > *also* gives rise to a value type -- a "valuable class"? > > ?- A term for what we've been calling the "ref" and "val" > projections. > > > Note I think we should only invoke the concept of "projection" once we > get into type variables. Otherwise we simply have two types for one > class. (And the reason for that is very solid / easy to defend, just > by appealing to how we'd've preferred int and Integer had worked.) I > would just call them the reference type and the (name debated just > above) type, simple as that. > > Let's start with _terms_, not _declaration syntax_. > > > Yes, and even the term we like 2nd best for a thing can still be > useful in the documentation of that thing. 
> > -- > Kevin Bourrillion?|?Java Librarian |?Google, Inc.?|kevinb at google.com From john.r.rose at oracle.com Wed May 4 20:18:49 2022 From: john.r.rose at oracle.com (John Rose) Date: Wed, 04 May 2022 13:18:49 -0700 Subject: EG meeting, 2022-05-04 In-Reply-To: References: Message-ID: <3A489891-F0C1-4079-827E-1D513BEF56E6@oracle.com> On 4 May 2022, at 11:36, Kevin Bourrillion wrote: >> - "Foo / Foo.ref is a backward default": Kevin and Brian argued that we >> should prefer treating B3 classes as reference-default, with something like >> '.val' to opt in to a primitive value type >> > > I will say that I have not personally found the opposition to this change > to be nearly as strong as the principal arguments in favor. It creates a > very valuable uniformity in how things work. I hope it goes this way. (This is hard to parse without that last little sentence. I think I agree.) For one thing, you can instantly see, by inspection of the source code, whether a given variable permits null. That advantage holds for simple variable declarations, array declarations. Maybe even with generic type vars. For another, Integer can just be itself, with Integer.val ? int. From kevinb at google.com Wed May 4 21:42:21 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 4 May 2022 14:42:21 -0700 Subject: User model: terminology In-Reply-To: <589500778.20959312.1651679049275.JavaMail.zimbra@u-pem.fr> References: <589500778.20959312.1651679049275.JavaMail.zimbra@u-pem.fr> Message-ID: On Wed, May 4, 2022 at 8:44 AM Remi Forax wrote: > - A term for all non-identity classes. (Previously, all classes had > identity.) > > I've used the term "immediate", immediate object vs reference object. > Note that the temporal meaning (right now) is much much stronger in people's minds than the spatial one ("immediately next to"). And this here isn't even quite spatial. So for me, this doesn't work. I believe that we should use a term that indicates that the object is > composed of several values, a term like "compound", "composite" or perhaps > "aggregate". > I think i prefer compound due to its Latin root. > How strong do we think the parallels are with the Gamma et al "composite pattern"? If strong, we should stick to "composite", and if not, maybe we shouldn't, falling back on "compound". > The other solution is instead of saying that it's non-terable by default, > is to force users to always use a keyword to indicate the "atomiciy" state, > (I think that would be extremely unfortunate, though.) -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Thu May 5 13:51:34 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 5 May 2022 09:51:34 -0400 Subject: Reader mail bag Message-ID: As the topic has turned to how Valhalla will extend into the language syntax, we've had a significant uptick of postings on the valhalla-spec-comments list. As a public service announcement, let me remind people of the role of the -comments list: ?- Postings should be self-contained; think "suggestion box." ?- The most helpful sort of comments center around providing information that is genuinely new to the EG ("you missed this case".)? The least helpful are those that are entirely subjective reactions ("I don't like the .val syntax".) ?- When there is an active discussion, it is usually best to let it play out before commenting.? 
EG discussions often operate on a longer time scale, and take a more meandering path, than the design discussions you may be used to, but there is a method to the madness. ?- It is not a general mechanism for "I would like to inject a reply into the EG discussion." On to the mail bag. ?- https://mail.openjdk.java.net/pipermail/valhalla-spec-comments/2022-April/000028.html (Quan Anh) ?- https://mail.openjdk.java.net/pipermail/valhalla-spec-comments/2022-May/000032.html (Tim Feuerbach) ?- https://mail.openjdk.java.net/pipermail/valhalla-spec-comments/2022-April/000030.html (Mateusz Romanowski) ?- https://mail.openjdk.java.net/pipermail/valhalla-spec-comments/2022-April/000031.html (Izz Rainy) Quan raises two questions / observations: ?- Could primitives explictly choose the name of their "box"? ?- Isn't the atomicity question overblown? We considered the "choose both names" approach early in the process (in fact, in a much earlier version, there were two declared classes.)? But since naming things is hard, asking the user to name two things is harder, and it also asks readers to carry around the mapping of "X is the box of Y."? (It is bad enough that the existing primitives have this problem, and that the box of `int` is called `Integer`, not `Int`.) It is very tempting to reach this conclusion about atomicity, but IMNSHO this is a siren song.? Since bad things can only happen when the program is broken (data races), it seems reasonable initially to blame the user for their broken program.? But unfortunately, I think this is too easy (though I still understand the temptation.)? Users are used to the idea that constructors establish objects with their invariants intact; seeing instances that don't obey invariants enforced by constructors would be astonishing.? (This is the biggest criticism of serialization; that it allows the integrity model of the language to be undermined in ways that are not visible in the source code.? More of that would not be good.)? Further, people have internalized the notion that "immutable objects are thread-safe" (and this is a really good thing for the ecosystem to have learned); we break this at our peril.? (Further, evil actors can maliciously create torn values through deliberate races, and then inject them into innocent victim code.) Tim asks whether the non-nullability of the .val projection is the same feature as the non-nullable `String!` that people have been asking for for years.? Indeed, this is a question that has been at the back of our mind for most of this project.? While we are unsure that we can spell `.val` with a bang, we also are apprehensive about painting ourselves into a future where we are tempted to have both. The unfortunate answer is that they are not the same (though this doesn't mean they can't be unified.)? For Point.val, null is *unrepresentable*; for `String!`, this would surely erase to `String`, so the possibility of null pollution is still present.? (As a side note, this means that migrating `Point` to `Point.ref` is binary-incompatible, while `String` to `String!` is binary compatible (though may be source incompatible.))? It is still an open question whether the natural interpretation of `String!` is to erase null checks at compile time, or to reify runtime checks at each assignment.? I plan to have some more in-depth discussions about this, but having them now would divert us from solving the more immediate problems. 
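A rough analogy in today's Java for the distinction drawn above (illustration only; none of these names are proposed API): `int` behaves like `Point.val`, where null simply has no encoding, while a "non-null by convention" `Integer` behaves like an erased `String!`, where null pollution can still arrive at runtime.

    import java.util.ArrayList;
    import java.util.List;

    public class NullabilityAnalogy {
        public static void main(String[] args) {
            int[] slots = new int[3];         // like Point.val: the element type cannot represent null at all

            List<Integer> boxed = new ArrayList<>();
            boxed.add(null);                  // a by-convention non-null element type is not enforced after erasure
            Integer polluted = boxed.get(0);  // like String! erased to String: null sneaks in anyway

            System.out.println(slots[0] + ", " + polluted);
        }
    }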
(Tim also observes that, to the degree that primitives are the only way to get truly non-nullable types, people will abuse them as a way of "sticking it to the stupid compiler" for not giving them general non-nullable types, shooting other people's feet in the process.? Sadly true; developers are their most dangerous when they think they are being clever.? But this only underscores the importance of making surprising properties (like non-atomicity) explicit, rather than having them come for the ride with other things.) From brian.goetz at oracle.com Thu May 5 16:13:33 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 5 May 2022 12:13:33 -0400 Subject: [External] : Re: User model: terminology In-Reply-To: References: Message-ID: As a general meta-observation, the whole point of the "let's throw all the pieces in the air" discussions is that often, when you break some existing assumptions, you can reassemble the pieces in a lower energy state -- but you usually can't get there in one move.? So usually in the middle of that process, you find yourself transiting through states which may be more mathematically attractive but less syntactically attractive; the key is to sit on your aesthetic reaction and realize that this may well be an intermediate state. There was a lot of pushback to the "User model stacking" thread (including in the -comments postings), on the basis of "these names suck" or "there are too many knobs".? But its unlikely we get to the right stacking without first going through a less attractive, but more general stacking. The inversion below might not be part of the final answer, but let's let the process play out, I think we're making progress. On 5/4/2022 3:36 PM, Brian Goetz wrote: > > > Pulling farther, there's a bucket-inversion we might be able to pull > here, just by moving some terminology around: > > ??? class B1 { }???????????????? // ref only > ??? value class B3 { }?????????? // ref and val projections > ??? value-based class B2 { }???? // ref only From brian.goetz at oracle.com Thu May 5 17:51:26 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 5 May 2022 13:51:26 -0400 Subject: User model stacking: current status Message-ID: The current stacking discussion is motivated by several factors: ?- experiences prototyping both B2 and B3 ?- recently discovered hardware improvements in atomic operations (e.g., Intel's recent specification strengthening around 128-bit vector loads and stores) ?- further thought on the consequences of the B2/B3 model, particularly with regard to tearing The B2/B3 split was a useful proxy during prototyping, with each being built around a known use case: B2 around value-based classes, and B3 around numeric abstractions.? My main objection is twofold: there are gratuitous-seeming differences in performance model (B3s flatten much better currently), which puts users to bad choices between semantics and performance, and the degree to which tearing is hidden behind some other proxy ("primitive-ness", non-nullity, etc), which is likely to surprise users when invariants are checked in the constructor but not necessarily obeyed at runtime.? I want the observed behavioral distinctions between buckets to be clearly related to their semantic differences, and we're not there yet. The differences in flattening and performance between the current B2/B3 derives directly from the possibility of tearing. 
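As a concrete reminder of what tearing means, here is a runnable sketch in today's Java. It uses a plain mutable class as a stand-in (real value-class tearing would happen in a flattened layout rather than through a shared mutable object), the names are made up for the example, and because it relies on a data race, any particular run may or may not catch the torn state.

    public class TearingDemo {
        static class Range { int lo; int hi; }   // intended invariant: lo == hi after every update
        static final Range shared = new Range();

        public static void main(String[] args) throws InterruptedException {
            Thread writer = new Thread(() -> {
                for (int i = 1; i <= 50_000_000; i++) {
                    shared.lo = i;               // racy, unsynchronized writes
                    shared.hi = i;
                }
            });
            writer.start();
            for (int i = 0; i < 50_000_000; i++) {
                int lo = shared.lo;              // racy reads, interleaving with the writes
                int hi = shared.hi;
                if (lo != hi) {
                    System.out.println("observed a state no single write produced: lo=" + lo + ", hi=" + hi);
                    break;
                }
            }
            writer.join();
        }
    }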
When tearing is unacceptable, we are likely to fall back on using indirections to make loads and stores of references atomic (the "non-flat" option); even where we are able to gain some flattening through compiler heroics (the "low-flat" option), these hit the ceiling pretty fast (we're unlikely to get above 128 bits any time soon, and may need at least one bit for null), and these also have other costs (wider loads and stores mean more data movement and more register shuffling, in addition to the complexity of the required compiler heroics.) Full-flat requires tearing. But I don't see an intrinsic reason (yet) why we can't have full-flat for VBCs like Optional.

The most encouraging direction is to factor atomicity out of the bucket model. We can make both buckets (VBC and primitive-like) atomic by default; this still gets us all the calling convention optimizations, and for very small values (such as single-field ones, like Optional), we can probably achieve full flattening in the heap, and more flattening for small-ish values with low-flat heroics. We can allow both buckets to opt into non-atomicity, which unlocks full-flat layout in the heap, with the only difference being whether we have to perturb the representation to make null representable. This gets us to something like:

    [ atomic | non-atomic ] __value class B2 { }
    [ atomic | non-atomic ] __primitive class B3 { }

There are many bikesheds here, including the spelling of all these things, and whether or not we say "class" or "struct" or "primitive" or nothing at all, or whether these work with records, but painting can come later. There are also many other decisions to make, but I'll observe several properties we've already gained by this stacking:

 - non-atomicity is explicit, rather than hiding it behind "primitive" or "non-nullable" or "zero-happy"
 - non-atomicity is orthogonal, which means that the performance difference between B2 and B3 (or B3.val and B3.ref), for either polarity of atomicity, is exclusively that imposed by the null-encoding requirement
 - safe by default, can opt into more performance by opting out of some safety
 - non-atomic sounds "just scary enough" to make people think twice, or at least learn what non-atomic means

Atomicity is only needed when a class has cross-field invariants (or when its construction API varies significantly from its representation.) Numeric classes like Complex have no invariants, and Rational has only single-field invariants, but classes like IntRange would have cross-field invariants. In cases where the VM can provide atomicity for free (e.g., single-field classes), it wouldn't make a difference.

If we further opt for Kevin's "ref is default" proposal, then we add another:

 - All unadorned type names are reference types

Separately, I think we can reconsider where we spend the "value" keyword. Previously "value" meant "non-identity", but I think it is better spent meaning "has a value projection", which leads us to the minor reshuffling presented yesterday:

    class B1 { }                 // ref only, == based on identity
    value-based class B2 { }     // ref only, == based on state
    value class B3 { }           // Has ref and val projections

This affirms B2 as "value-lite", connects to the term we colonized in Java 8 for "classes that have value-like semantics", and moves away from "primitive".

Let's work through Kevin's examples here:

 - Rational. Here, the default value is particularly bad (denominator should not be zero).
This leads to an uncomfortable choice: choose B2, or choose B3 and deal with the DBZE (divide-by-zero exception) as "user error" when it happens. Internal methods (e.g., multiplying two rationals) can treat the default value as "0/1" instead and produce a valid rational, but any code that pulls out the denominator and operates on it externally will confront the zero anyway. Whichever way one chooses, people will complain "but that's bad". Rational is interesting because it _has_ a sensible default; it is just not the zero representation.

 - EmployeeId. Similar, but maybe more tolerable to treat as a B2, and doesn't require atomicity.
 - Instant. Seems this is a (probably non-atomic) B2.
 - Complex. Solid non-atomic B3.
 - Optional, OptionalInt, etc. In a world where B3 is ref-default, these can be B3; otherwise B2.
 - IntRange: atomic B3 (cross-field invariant.)

There are lots of other things to discuss here, including a discussion of what non-atomic B2 really means, and whether there are additional risks that come from tearing _between the null and the fields_. I'll address that in a separate mail, but I think that factoring out atomicity into its own explicit thing is a pure win, and that in turn exposes some sensible terminology shuffling in the other buckets.

Also, bikeshed topics to cover (please, let's not let this drown the discussion):

 - How to spell atomic / non-atomic
 - How to spell B2 and B3
 - How to spell .ref and .val
 - ref-default vs val-default for B3
   - if we go ref-default, reconciling this with universal generics
   - reconciling this with nullable types

From brian.goetz at oracle.com Thu May 5 19:21:28 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 5 May 2022 15:21:28 -0400
Subject: User model stacking: current status
In-Reply-To: References: Message-ID:

> There are lots of other things to discuss here, including a discussion
> of what non-atomic B2 really means, and whether there are
> additional risks that come from tearing _between the null and the
> fields_.

So, let's discuss non-atomic B2s. (First, note that atomicity is only relevant in the heap; on the stack, everything is thread-confined, so there will be no tearing.)

If we have:

    non-atomic __b2 class DateTime {
        long date;
        long time;
    }

then the layout of a B2 (or a B3.ref) is really (long, long, boolean), not just (long, long), because of the null channel. (We may be able to hide the null channel elsewhere, but that's an optimization.)

If two threads racily write (d1, t1) and (d2, t2) to a shared mutable DateTime, it is possible for an observer to observe (d1, t2) or (d2, t1). Saying non-atomic says "this is the cost of data races". But additionally, if we have a race between writing null and (d, t), there is another possible form of tearing.

Let's write this out more explicitly. Suppose that T1 writes a non-null value (d, t, true), and T2 writes null as (0, 0, false). Then it would be possible to observe (0, 0, true), which means that we would conceivably be exposing the zero value to the user, even though a B2 class might want to hide its zero.

So, suppose instead that we implemented writing a null as simply storing false to the synthetic boolean field. Then, in the event of a race between reader and writer, we could only see values for date and time that were previously put there by some thread. This satisfies the OOTA (out of thin air) safety requirements of the JMM.

The other consequence we might have from this sort of tearing is if one of the other fields is an OOP.
If the GC is unaware of the significance of the null field (and we'd like for the GC to stay unaware of this), then it is possible to have a null value where one of the oop fields (from a previous write) is non-null, keeping that object reachable even when it is logically not reachable.? (As an interesting connection, the boolean here is "special" in the same way as the synthetic boolean channel is in pattern matching -- it dictates whether the _other_ channels are valid.? Which makes nullable values a good implementation strategy for pattern carriers.) So we have a choice for how we implement writing nulls, with a pick-your-poison consequence: ?- If we do a wide write, and write all the fields to zero, we risk exposing a zero value even when the zero is a bad value; ?- If we do a narrow write, and only write the null field, we risk pinning other OOPs in memory From daniel.smith at oracle.com Thu May 5 22:00:23 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Thu, 5 May 2022 22:00:23 +0000 Subject: User model stacking: current status In-Reply-To: References: Message-ID: > On May 5, 2022, at 1:21 PM, Brian Goetz wrote: > > Let's write this out more explicitly. Suppose that T1 writes a non-null value (d, t, true), and T2 writes null as (0, 0, false). Then it would be possible to observe (0, 0, true), which means that we would be conceivably exposing the zero value to the user, even though a B2 class might want to hide its zero. > > So, suppose instead that we implemented writing a null as simply storing false to the synthetic boolean field. Then, in the event of a race between reader and writer, we could only see values for date and time that were previously put there by some thread. This satisfies the OOTA (out of thin air) safety requirements of the JMM. (0, 0, false) is the initial value of a field/array, even if the VM implements a "narrow write" strategy. That is, if I write (1, 1, true) at the moment of reading from a fresh field, I could easily get (0, 0, true). This is significant because the primary reason to declare a B2 rather than a B3 is to guarantee that the all-zeros value cannot be created. (A secondary reason, valid but one I'm less sympathetic to, is that the all-zeros value is okay but inconvenient, and it would be nice to reduce how much it pops up. A third reason is reference-defaultness, important for migration if we don't offer it in B3.) This leads me to conclude that if you're declaring a non-atomic B2, you might as well just declare a non-atomic B3. Said differently: a B2 author usually wants to associate a cross-field invariant with the null flag (zero-value fields iff null). But in declaring the class non-atomic, they've sworn off cross-field invariants. This was a useful discovery for me yesterday: that, in fact, nullability and atomicity are closely related. There's a strong theoretical defense for the idea that opting out of identity and supporting a non-null type (i.e., B3) are prerequisites to non-atomic flattening. From brian.goetz at oracle.com Thu May 5 22:03:26 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 5 May 2022 18:03:26 -0400 Subject: User model stacking: current status In-Reply-To: References: Message-ID: <8de623d6-a7db-bcf9-25bf-1991b938e027@oracle.com> >> Let's write this out more explicitly. Suppose that T1 writes a non-null value (d, t, true), and T2 writes null as (0, 0, false). 
Then it would be possible to observe (0, 0, true), which means that we would be conceivably exposing the zero value to the user, even though a B2 class might want to hide its zero. The OOTA guarantee we get is: threads that read a variable (fields and array elements) will only see a value that has been put there by a "prior" write in some thread.? And every variable is treated as if it has an initial write of the default value for that variable, as per JLS 17.4.4: > The write of the default value (zero, false, or null) to each variable > synchronizes-with the first action in every thread. Ignoring fields that are themselves composite values for a moment, this means that, if we treat nulls as a full-width all-zeroes value, then when we write a null to our DateTime example, we are _returning_ date and time to a value that has already been written there.? So reading 0 for date or time is not OOTA, though might be surprising.? And writing all the fields seems simpler and more uniform, and avoids the GC issue, right? So one of the other consequences of a non-atomic B2 is that not only will races result in a torn value, but they may also expose the zero value (or torn parts of it.)? This doesn't seem entirely out of hand for something that explicitly permits tearing. I tried to sketch what a JLS section on "non-atomic values" might look like, by cribbing liberally from JLS 17.7: > For the purposes of the Java programming language memory model, a > single write to, or read of, a variable whose type is a non-atomic > value class or value-based class may be treated as separate writes or > reads of its fields.? This can result in a situation where a thread > sees some field values from one write, and some field values from > another write. This is a start. ? (Plus the business about volatile.)? It basically says that from a JMM perspective, a non-volatile variable whose type is a non-atomic value class is really a tuple of its fields.? In correctly synchronized programs, this should not be observable. It may be the case that we can exempt _final_ variables whose type is a non-atomic value class. The section on final field guarantees will need heavier work (because a final field can be nested many levels deep). From brian.goetz at oracle.com Thu May 5 22:06:10 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 5 May 2022 18:06:10 -0400 Subject: User model stacking: current status In-Reply-To: References: Message-ID: <968c3ffd-6b13-a7c5-511d-1e75be53dd48@oracle.com> Maybe :)? But I don't want to prune this exploration just yet. On 5/5/2022 6:00 PM, Dan Smith wrote: >> On May 5, 2022, at 1:21 PM, Brian Goetz wrote: >> >> Let's write this out more explicitly. Suppose that T1 writes a non-null value (d, t, true), and T2 writes null as (0, 0, false). Then it would be possible to observe (0, 0, true), which means that we would be conceivably exposing the zero value to the user, even though a B2 class might want to hide its zero. >> >> So, suppose instead that we implemented writing a null as simply storing false to the synthetic boolean field. Then, in the event of a race between reader and writer, we could only see values for date and time that were previously put there by some thread. This satisfies the OOTA (out of thin air) safety requirements of the JMM. > (0, 0, false) is the initial value of a field/array, even if the VM implements a "narrow write" strategy. That is, if I write (1, 1, true) at the moment of reading from a fresh field, I could easily get (0, 0, true). 
> > This is significant because the primary reason to declare a B2 rather than a B3 is to guarantee that the all-zeros value cannot be created. (A secondary reason, valid but one I'm less sympathetic to, is that the all-zeros value is okay but inconvenient, and it would be nice to reduce how much it pops up. A third reason is reference-defaultness, important for migration if we don't offer it in B3.) > > This leads me to conclude that if you're declaring a non-atomic B2, you might as well just declare a non-atomic B3. > > Said differently: a B2 author usually wants to associate a cross-field invariant with the null flag (zero-value fields iff null). But in declaring the class non-atomic, they've sworn off cross-field invariants. > > This was a useful discovery for me yesterday: that, in fact, nullability and atomicity are closely related. There's a strong theoretical defense for the idea that opting out of identity and supporting a non-null type (i.e., B3) are prerequisites to non-atomic flattening. > From forax at univ-mlv.fr Fri May 6 12:32:12 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Fri, 6 May 2022 14:32:12 +0200 (CEST) Subject: User model: terminology In-Reply-To: References: <589500778.20959312.1651679049275.JavaMail.zimbra@u-pem.fr> Message-ID: <181951155.22004792.1651840332866.JavaMail.zimbra@u-pem.fr> > From: "Kevin Bourrillion" > To: "Remi Forax" > Cc: "Brian Goetz" , "valhalla-spec-experts" > > Sent: Wednesday, May 4, 2022 11:42:21 PM > Subject: Re: User model: terminology > On Wed, May 4, 2022 at 8:44 AM Remi Forax < [ mailto:forax at univ-mlv.fr | > forax at univ-mlv.fr ] > wrote: >>> - A term for all non-identity classes. (Previously, all classes had identity.) >> I've used the term "immediate", immediate object vs reference object. > Note that the temporal meaning (right now) is much much stronger in people's > minds than the spatial one ("immediately next to"). And this here isn't even > quite spatial. So for me, this doesn't work. The temporal meaning re-enforce the idea, you do not have to follow a pointer and wait for the value to arrive, the value is already here. R?mi From brian.goetz at oracle.com Fri May 6 14:04:13 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 6 May 2022 10:04:13 -0400 Subject: User model stacking: current status In-Reply-To: References: Message-ID: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> Thinking more about Dan's concerns here ... On 5/5/2022 6:00 PM, Dan Smith wrote: > This is significant because the primary reason to declare a B2 rather > than a B3 is to guarantee that the all-zeros value cannot be created. This is a little bit of a circular argument; it takes a property that an atomic B2 has, but a non-atomic B2 lacks, and declares that to be "the whole point" of B2.? It may be that exposure of the zero is so bad we may eventually want to back away from the idea, but let's come up with a fair picture of what a non-atomic B2 means, and ask if that's sufficiently useful. > This leads me to conclude that if you're declaring a non-atomic B2, > you might as well just declare a non-atomic B3. Fair point, but let's pull on this string for a moment.? Suppose I want a null-default, flattenable value, and I'm willing to take the tearing to get there.? So you're saying "then declare a B3 and use B3.ref".? But B3.ref was supposed to have the same semantics as an equivalent B2!? (I realize I'm doing the same thing I just accused you of above -- taking an old invariant and positiioning it as "the point".? 
Stay tuned.)? Which means either that we lose flattening, again, or we create yet another asymmetry between B3.ref and B2. Maybe you're saying that the combination of nullable and full-flat is just too much to ask, but I am not sure it is; in any case, let's convince ourselves of this before we rule it out. Or maybe, what you're saying is that my claim that B3.ref and B2 are the same thing is the stale thing here, and we can let it go and get it back in another form.? In which case you're positing a model where: ?- B1 is unchanged ?- B2 is always atomic, reference, nullable ?- B3 really means "the zero is OK", comes with .ref and .val, and (non-atomic B3).ref is still tearable? In this model, (non-atomic B3).ref takes the place of (non-atomic B2) in the stacking I've been discussing.? Is that what you're saying? ??? class B1 { }? // ref, identity, atomic ??? value-based class B2 { }? // ref, non-identity, atomic ??? [ non-atomic ] value class B3 { }? // ref or val, zero is ok, both projections share atomicity If we go with ref-default, then this is a small leap from yesterday's stacking, because "B3" and "B2" are both reference types, so if you want a tearable, non-atomic reference type, saying `non-atomic value class B3` and then just using B3 gets you that. Then: ?- B2 is like B1, minus identity ?- B3 means "uninitialized values are OK, you get two types, a zero-default and a non-default" ?- Non-atomicity is an extra property we can add to B3, to get more flattening in exchange for less integrity ?- The use cases for non-atomic B2 are served by non-atomic B3 (when .ref is the default) I think this still has the properties I want; I can freely choose the reasonable subsets of { identity, has-zero, nullable, atomicity } that I want; the orthogonality of non-atomic across buckets becomes orthogonality of non-atomic with nullity, and the "B3.ref is just like B2" is shown to be the "false friend." From daniel.smith at oracle.com Fri May 6 15:15:52 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Fri, 6 May 2022 15:15:52 +0000 Subject: User model stacking: current status In-Reply-To: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> Message-ID: <59B138BF-4940-4EA3-AFB8-E03591060BB3@oracle.com> > On May 6, 2022, at 8:04 AM, Brian Goetz wrote: > > Thinking more about Dan's concerns here ... > > On 5/5/2022 6:00 PM, Dan Smith wrote: >> This is significant because the primary reason to declare a B2 rather than a B3 is to guarantee that the all-zeros value cannot be created. > > This is a little bit of a circular argument; it takes a property that an atomic B2 has, but a non-atomic B2 lacks, and declares that to be "the whole point" of B2. It may be that exposure of the zero is so bad we may eventually want to back away from the idea, but let's come up with a fair picture of what a non-atomic B2 means, and ask if that's sufficiently useful. Fair. My interpretation is that we decided to create B2 because we weren't satisfied with the lack of guarantees offered to no-good-default classes that were reference-default B3s. So in that historical sense, B2s exist to offer guarantees. >> This leads me to conclude that if you're declaring a non-atomic B2, you might as well just declare a non-atomic B3. > > Fair point, but let's pull on this string for a moment. Suppose I want a null-default, flattenable value, and I'm willing to take the tearing to get there. So you're saying "then declare a B3 and use B3.ref". 
But B3.ref was supposed to have the same semantics as an equivalent B2! (I realize I'm doing the same thing I just accused you of above -- taking an old invariant and positiioning it as "the point". Stay tuned.) Which means either that we lose flattening, again, or we create yet another asymmetry between B3.ref and B2. Maybe you're saying that the combination of nullable and full-flat is just too much to ask, but I am not sure it is; in any case, let's convince ourselves of this before we rule it out. Yeah, I think my mindset has been here?non-atomic flat nulls are just more trouble than they're worth?but I'm open to discovering a compelling use case. From brian.goetz at oracle.com Sun May 8 16:32:09 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 8 May 2022 12:32:09 -0400 Subject: User model stacking: current status In-Reply-To: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> Message-ID: To track the progress of the spiral: ?- We originally came up with the B2/B3 division to carve off B2 as the "safe subset", where you get less flattening but nulls and more integrity.? This provided a safe migration target for existing VBCs, as well as a reasonable target for creating new VBCs that want to be mostly class-like but enjoy some additional optimization (and shed accidental identity for safety reasons.) ?- When we put all the flesh on the bones of B2/B3, there were some undesirable consequences, such as (a) tearing was too subtle, and (b) both the semantics and cost model differences between B2/B3 were going to be hard to explain (and in some cases, users have bad choices between semantics and performance.) ?- A few weeks ago, we decided to more seriously consider separating atomicity out as an explicit thing on its own.? This had the benefit of putting semantics first, and offered a clearer cost model: you could give up identity but keep null-default and integrity (B2), further give up nulls to get some more density (B3.val), and further further give up atomicity to get more flatness (non-atomic B3.)? This was honest, but led people to complain "great, now there are four buckets." ?- We explored making non-atomicity a cross-cutting concern, so there are two new buckets (VBC and primitive-like), either of which can choose their atomicity constraints, and then within the primitive-like bucket, the .val and .ref projections differ only with respect to the consequences of nullity.? This felt cleaner (more orthogonal), but the notion of a non-atomic B2 itself is kind of weird. So where this brings us is back to something that might feel like the four-bucket approach in the third bullet above, but with two big differences: atomicity is an explicit property of a class, rather than a property of reference-ness, and a B3.ref is not necessarily the same as a B2.? This recognizes that the main distinction between B2 or B3 is *whether a class can tolerate its zero value.* More explicitly: ?- B1 remains unchanged ?- B2 is for "ordinary" value-based classes.? Always atomic, always nullable, always reference; the only difference with B1 is that it has shed its identity, enabling routine stack-based flattening, and perhaps some heap flattening depending on VM sophistication and heroics.? B2 is a good target for migrating many existing value-based classes. ?- B3 means that a class can tolerate its zero (uninitialized) value, and therefore gives rise to two types, which we'll call B3.ref and B3.val.? 
The former is a reference type and is therefore nullable and null-default; the latter is a direct/immediate/value type whose default is zero. ?- B3 classes can further be marked non-atomic; this unlocks greater flattening in the heap at the cost of tearing under race, and is suitable for classes without cross-field invariants.? Non-atomicity accrues equally to B3.ref and B3.val; a non-atomic B3.ref still tears (and therefore might expose its zero under race, as per friday's discussions.) Syntactically (reminder: NOT an invitation to discuss syntax at this point), this might look like: ??? class B1 { }??????????????? // identity, reference, atomic ??? value-based class B2 { }??? // non-identity, reference, atomic ??? value class B3 { }????????? // non-identity, .ref and .val, both atomic ??? non-atomic value class B3 { }? // similar to B3, but both are non-atomic So, two new (but related) class modifiers, of which one has an additional modifier.? (The spelling of all of these can be discussed after the user model is entirely nailed down.) So, there's a monotonic sequence of "give stuff up, get other stuff": ?- B2 gives up identity relative to B1, gains some flattening ?- B3 optionally gives up null-defaultness relative to B2, yielding two types, one of which sheds some footprint ?- non-atomic B3 gives up atomicity relative to B3, gaining more flatness, for both type projections On 5/6/2022 10:04 AM, Brian Goetz wrote: > Thinking more about Dan's concerns here ... > > On 5/5/2022 6:00 PM, Dan Smith wrote: >> This is significant because the primary reason to declare a B2 rather >> than a B3 is to guarantee that the all-zeros value cannot be created. > > This is a little bit of a circular argument; it takes a property that > an atomic B2 has, but a non-atomic B2 lacks, and declares that to be > "the whole point" of B2.? It may be that exposure of the zero is so > bad we may eventually want to back away from the idea, but let's come > up with a fair picture of what a non-atomic B2 means, and ask if > that's sufficiently useful. > >> This leads me to conclude that if you're declaring a non-atomic B2, >> you might as well just declare a non-atomic B3. > > Fair point, but let's pull on this string for a moment.? Suppose I > want a null-default, flattenable value, and I'm willing to take the > tearing to get there.? So you're saying "then declare a B3 and use > B3.ref".? But B3.ref was supposed to have the same semantics as an > equivalent B2!? (I realize I'm doing the same thing I just accused you > of above -- taking an old invariant and positiioning it as "the > point".? Stay tuned.)? Which means either that we lose flattening, > again, or we create yet another asymmetry between B3.ref and B2. Maybe > you're saying that the combination of nullable and full-flat is just > too much to ask, but I am not sure it is; in any case, let's convince > ourselves of this before we rule it out. > > Or maybe, what you're saying is that my claim that B3.ref and B2 are > the same thing is the stale thing here, and we can let it go and get > it back in another form.? In which case you're positing a model where: > > ?- B1 is unchanged > ?- B2 is always atomic, reference, nullable > ?- B3 really means "the zero is OK", comes with .ref and .val, and > (non-atomic B3).ref is still tearable? > > In this model, (non-atomic B3).ref takes the place of (non-atomic B2) > in the stacking I've been discussing.? Is that what you're saying? > > ??? class B1 { }? // ref, identity, atomic > ??? value-based class B2 { }? 
// ref, non-identity, atomic > ??? [ non-atomic ] value class B3 { }? // ref or val, zero is ok, both > projections share atomicity > > If we go with ref-default, then this is a small leap from yesterday's > stacking, because "B3" and "B2" are both reference types, so if you > want a tearable, non-atomic reference type, saying `non-atomic value > class B3` and then just using B3 gets you that. Then: > > ?- B2 is like B1, minus identity > ?- B3 means "uninitialized values are OK, you get two types, a > zero-default and a non-default" > ?- Non-atomicity is an extra property we can add to B3, to get more > flattening in exchange for less integrity > ?- The use cases for non-atomic B2 are served by non-atomic B3 (when > .ref is the default) > > I think this still has the properties I want; I can freely choose the > reasonable subsets of { identity, has-zero, nullable, atomicity } that > I want; the orthogonality of non-atomic across buckets becomes > orthogonality of non-atomic with nullity, and the "B3.ref is just like > B2" is shown to be the "false friend." > > From kevinb at google.com Mon May 9 15:43:56 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Mon, 9 May 2022 08:43:56 -0700 Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> Message-ID: On Sun, May 8, 2022 at 9:32 AM Brian Goetz wrote: - When we put all the flesh on the bones of B2/B3, there were some > undesirable consequences, such as (a) tearing was too subtle, and (b) both > the semantics and cost model differences between B2/B3 were going to be > hard to explain (and in some cases, users have bad choices between > semantics and performance.) > Explaining the semantic model doesn't feel hard to me right now, at least not in any ways that I see current proposals addressing. Explaining the cost model can mean two very different things. The general user is fine with asterisks all over the place saying "* the VM may do something different if it is convinced it knows better, and it will mostly be right, just go with the flow" -- that is already how everything about the cost model for everything works. If someone needs to understand more deeply than that, it's expected to be difficult. Putting these together, (b) doesn't sound valid to my ears. So where this brings us is back to something that might feel like the > four-bucket approach in the third bullet above, but with two big > differences: atomicity is an explicit property of a class, rather than a > property of reference-ness, and a B3.ref is not necessarily the same as a > B2. > I don't follow how a B3.ref != a B2, unless you just mean that you can have a reference to a bogus instance more easily than B2 can (which takes serialization/reflection to do that). > - B3 classes can further be marked non-atomic; this unlocks greater > flattening in the heap at the cost of tearing under race, and is suitable > for classes without cross-field invariants. Non-atomicity accrues equally > to B3.ref and B3.val; a non-atomic B3.ref still tears (and therefore might > expose its zero under race, as per friday's discussions.) > Am I right that this "non-atomic" marker would be ignored for classes like Integer where the vm can tell that it can just give you the best of both worlds? 
> Syntactically (reminder: NOT an invitation to discuss syntax at this > point), this might look like: > > class B1 { } // identity, reference, atomic > > value-based class B2 { } // non-identity, reference, atomic > > value class B3 { } // non-identity, .ref and .val, both atomic > > non-atomic value class B3 { } // similar to B3, but both are > non-atomic > Buckets.java:7: error: duplicate class: B3 But seriously, we won't get away with pretending there are just 3 buckets if we do this. Let's be honest and call it B4. Would I be right that we can achieve primitive unification even without B4? There is nothing wrong with our delivering many performance gains while leaving others on the table for later. > On 5/6/2022 10:04 AM, Brian Goetz wrote: > > Thinking more about Dan's concerns here ... > > On 5/5/2022 6:00 PM, Dan Smith wrote: > > This is significant because the primary reason to declare a B2 rather than > a B3 is to guarantee that the all-zeros value cannot be created. > > > This is a little bit of a circular argument; it takes a property that an > atomic B2 has, but a non-atomic B2 lacks, and declares that to be "the > whole point" of B2. It may be that exposure of the zero is so bad we may > eventually want to back away from the idea, but let's come up with a fair > picture of what a non-atomic B2 means, and ask if that's sufficiently > useful. > > This leads me to conclude that if you're declaring a non-atomic B2, you > might as well just declare a non-atomic B3. > > > Fair point, but let's pull on this string for a moment. Suppose I want a > null-default, flattenable value, and I'm willing to take the tearing to get > there. So you're saying "then declare a B3 and use B3.ref". But B3.ref > was supposed to have the same semantics as an equivalent B2! (I realize > I'm doing the same thing I just accused you of above -- taking an old > invariant and positiioning it as "the point". Stay tuned.) Which means > either that we lose flattening, again, or we create yet another asymmetry > between B3.ref and B2. Maybe you're saying that the combination of nullable > and full-flat is just too much to ask, but I am not sure it is; in any > case, let's convince ourselves of this before we rule it out. > > Or maybe, what you're saying is that my claim that B3.ref and B2 are the > same thing is the stale thing here, and we can let it go and get it back in > another form. In which case you're positing a model where: > > - B1 is unchanged > - B2 is always atomic, reference, nullable > - B3 really means "the zero is OK", comes with .ref and .val, and > (non-atomic B3).ref is still tearable? > > In this model, (non-atomic B3).ref takes the place of (non-atomic B2) in > the stacking I've been discussing. Is that what you're saying? > > class B1 { } // ref, identity, atomic > value-based class B2 { } // ref, non-identity, atomic > [ non-atomic ] value class B3 { } // ref or val, zero is ok, both > projections share atomicity > > If we go with ref-default, then this is a small leap from yesterday's > stacking, because "B3" and "B2" are both reference types, so if you want a > tearable, non-atomic reference type, saying `non-atomic value class B3` and > then just using B3 gets you that. 
Then: > > - B2 is like B1, minus identity > - B3 means "uninitialized values are OK, you get two types, a > zero-default and a non-default" > - Non-atomicity is an extra property we can add to B3, to get more > flattening in exchange for less integrity > - The use cases for non-atomic B2 are served by non-atomic B3 (when .ref > is the default) > > I think this still has the properties I want; I can freely choose the > reasonable subsets of { identity, has-zero, nullable, atomicity } that I > want; the orthogonality of non-atomic across buckets becomes orthogonality > of non-atomic with nullity, and the "B3.ref is just like B2" is shown to be > the "false friend." > > > > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Mon May 9 15:51:53 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 9 May 2022 11:51:53 -0400 Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> Message-ID: > > So where this brings us is back to something that might feel like > the four-bucket approach in the third bullet above, but with two > big differences: atomicity is an explicit property of a class, > rather than a property of reference-ness, and a B3.ref is not > necessarily the same as a B2. > > > I don't follow how a B3.ref != a B2, unless you just mean that you can > have a reference to a bogus instance more easily than B2 can (which > takes serialization/reflection to do that). It means that a B3.ref is exactly as subject to tearing as the same-atomicity B3, whereas a B2 is not. > ?- B3 classes can further be marked non-atomic; this unlocks > greater flattening in the heap at the cost of tearing under race, > and is suitable for classes without cross-field invariants.? > Non-atomicity accrues equally to B3.ref and B3.val; a non-atomic > B3.ref still tears (and therefore might expose its zero under > race, as per friday's discussions.) > > > Am I right that this "non-atomic" marker would be ignored for classes > like Integer where the vm can tell that it can just give you the best > of both worlds? We can provide atomicity semantics for sufficiently small objects at no cost.? In practicality this probably means "classes whose layout boils down to a single 32-bit-or-smaller primitive, or a single reference". > > But seriously, we won't get away with pretending there are just 3 > buckets if we do this. Let's be honest and call it B4. "Bucket" is a term that makes sense in language design, but need not flow into the user model.? But yes, there really are three things that the user needs control over: identity, zero-friendliness, atomicity.? If you want to call that four buckets, I won't argue. The real discussion here is whether these controls need to be *separate*.? And I think they do: ?- The premise of Valhalla is that the VM can't guess whether identity is needed, so the user has to explicitly disavow it to enable more goodies; ?- Classes like LocalDate have no good zero, so the user needs to be able to disavow the zero value when it doesn't fit the semantics of the class; ?- (the controversial one) Atomicity is simply too confusing and potentially astonishing to piggyback on "primitive-ness" or "reference-ness" in a codes-like-a-class world. > Would I be right that we can achieve primitive unification even > without B4? There is nothing wrong with our delivering many > performance gains while leaving others on the table for later. 
Yes, delivering primitive unification first means you can't have flat Complex yet. From kevinb at google.com Mon May 9 16:10:37 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Mon, 9 May 2022 09:10:37 -0700 Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> Message-ID: On Mon, May 9, 2022 at 8:52 AM Brian Goetz wrote: We can provide atomicity semantics for sufficiently small objects at no > cost. In practicality this probably means "classes whose layout boils down > to a single 32-bit-or-smaller primitive, or a single reference". > Right, and for long and double we can say they are as atomic as they ever were. > But seriously, we won't get away with pretending there are just 3 buckets > if we do this. Let's be honest and call it B4. > > "Bucket" is a term that makes sense in language design, but need not flow > into the user model. But yes, there really are three things that the user > needs control over: identity, zero-friendliness, atomicity. If you want to > call that four buckets, I won't argue. > I *am* of course only caring about the user model, and that's where I'm saying we would not get away with pretending this isn't a 4th kind of concrete class. > - Classes like LocalDate have no good zero, so the user needs to be able > to disavow the zero value when it doesn't fit the semantics of the class; > > - (the controversial one) Atomicity is simply too confusing and > potentially astonishing to piggyback on "primitive-ness" or > "reference-ness" in a codes-like-a-class world. > (Controversial with me at least; I keep thinking who are these people who can understand the rest of how to safely write non-locking concurrent code yet would struggle with this?) Would I be right that we can achieve primitive unification even without B4? > There is nothing wrong with our delivering many performance gains while > leaving others on the table for later. > > Yes, delivering primitive unification first means you can't have flat > Complex yet. > But they still get *often-flat* Complex? Sounds like *always-flat* Complex is the perfect thing to punt on then. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Mon May 9 16:54:24 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 9 May 2022 12:54:24 -0400 Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> Message-ID: > ?- (the controversial one) Atomicity is simply too confusing and > potentially astonishing to piggyback on "primitive-ness" or > "reference-ness" in a codes-like-a-class world. > > > (Controversial with me at least; I keep thinking who are these people > who can understand the rest of how to safely write non-locking > concurrent code yet would struggle with this?) So, the reason I'm being so dogmatic about this is that it undermines the belief that "immutable classes are always thread-safe".? I think the "objects vs values" distinction is too subtle; hiding atomicity behind primitive-ness is too subtle.? I can get behind saying "immutable classes are always thread-safe, unless they have been explicitly marked as non-atomic", because this is a clear indication that can't be confused for anything else. > >> Would I be right that we can achieve primitive unification even >> without B4? There is nothing wrong with our delivering many >> performance gains while leaving others on the table for later. 
> Yes, delivering primitive unification first means you can't have > flat Complex yet. > > > But they still get /often-flat/?Complex? Sounds like > /always-flat/?Complex is the perfect thing to punt on then. They'll get flattening on the stack, but the layout of a Complex[] will likely be an array of pointers for a long time, until some heroics kick in.? I don't necessarily have a problem with a phased delivery where flattening comes later, but I'll note, too, that this is where *most of the heap flattening win is* -- arrays of nontrivial numerics.? Because there are lots of them, and such code will likely iterate over the arrays plenty, doing small amounts of CPU work per element, and then stalling when the memory subsystem chokes on a cache miss.? So we can't punt for that long. From forax at univ-mlv.fr Mon May 9 17:34:01 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 9 May 2022 19:34:01 +0200 (CEST) Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> Message-ID: <1864439853.23284976.1652117641027.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "daniel smith" > Cc: "valhalla-spec-experts" > Sent: Sunday, May 8, 2022 6:32:09 PM > Subject: Re: User model stacking: current status > To track the progress of the spiral: > - We originally came up with the B2/B3 division to carve off B2 as the "safe > subset", where you get less flattening but nulls and more integrity. This > provided a safe migration target for existing VBCs, as well as a reasonable > target for creating new VBCs that want to be mostly class-like but enjoy some > additional optimization (and shed accidental identity for safety reasons.) > - When we put all the flesh on the bones of B2/B3, there were some undesirable > consequences, such as (a) tearing was too subtle, and (b) both the semantics > and cost model differences between B2/B3 were going to be hard to explain (and > in some cases, users have bad choices between semantics and performance.) > - A few weeks ago, we decided to more seriously consider separating atomicity > out as an explicit thing on its own. This had the benefit of putting semantics > first, and offered a clearer cost model: you could give up identity but keep > null-default and integrity (B2), further give up nulls to get some more density > (B3.val), and further further give up atomicity to get more flatness > (non-atomic B3.) This was honest, but led people to complain "great, now there > are four buckets." > - We explored making non-atomicity a cross-cutting concern, so there are two new > buckets (VBC and primitive-like), either of which can choose their atomicity > constraints, and then within the primitive-like bucket, the .val and .ref > projections differ only with respect to the consequences of nullity. This felt > cleaner (more orthogonal), but the notion of a non-atomic B2 itself is kind of > weird. > So where this brings us is back to something that might feel like the > four-bucket approach in the third bullet above, but with two big differences: > atomicity is an explicit property of a class, rather than a property of > reference-ness, and a B3.ref is not necessarily the same as a B2. This > recognizes that the main distinction between B2 or B3 is *whether a class can > tolerate its zero value.* > More explicitly: > - B1 remains unchanged > - B2 is for "ordinary" value-based classes. 
> Always atomic, always nullable, always reference; the only difference with B1
> is that it has shed its identity, enabling routine stack-based flattening, and
> perhaps some heap flattening depending on VM sophistication and heroics. B2 is
> a good target for migrating many existing value-based classes.
>
> - B3 means that a class can tolerate its zero (uninitialized) value, and
> therefore gives rise to two types, which we'll call B3.ref and B3.val. The
> former is a reference type and is therefore nullable and null-default; the
> latter is a direct/immediate/value type whose default is zero.
>
> - B3 classes can further be marked non-atomic; this unlocks greater flattening
> in the heap at the cost of tearing under race, and is suitable for classes
> without cross-field invariants. Non-atomicity accrues equally to B3.ref and
> B3.val; a non-atomic B3.ref still tears (and therefore might expose its zero
> under race, as per friday's discussions.)
>
> Syntactically (reminder: NOT an invitation to discuss syntax at this point),
> this might look like:
>
>     class B1 { }                   // identity, reference, atomic
>     value-based class B2 { }       // non-identity, reference, atomic
>     value class B3 { }             // non-identity, .ref and .val, both atomic
>     non-atomic value class B3 { }  // similar to B3, but both are non-atomic
>
> So, two new (but related) class modifiers, of which one has an additional
> modifier. (The spelling of all of these can be discussed after the user model
> is entirely nailed down.)
>
> So, there's a monotonic sequence of "give stuff up, get other stuff":
>
> - B2 gives up identity relative to B1, gains some flattening
> - B3 optionally gives up null-defaultness relative to B2, yielding two types,
>   one of which sheds some footprint
> - non-atomic B3 gives up atomicity relative to B3, gaining more flatness, for
>   both type projections

There is also something we should talk about: using non-atomic value classes does not automatically mean better performance. It's something I've discovered trying to implement HashMap (more precisely Map.of()) using value classes. Updating a value class in the heap requires more writes, more memory traffic, than just updating pointers, so depending on the algorithm you may see performance degradation compared to a pointer-based implementation.

So even if we provide non-atomic B3, performance can be worse than using atomic B3; sadly, gaining more flatness does not necessarily translate into better performance.

Rémi

Side note: using more pointers means more pressure on the GC, but it's very hard to quantify that pressure, so maybe overall the system is more performant; but given that JMH tests usually do not take GCs into account, it's an effect that we do not see.

From brian.goetz at oracle.com Mon May 9 17:46:17 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 9 May 2022 13:46:17 -0400
Subject: User model stacking: current status
In-Reply-To: <1864439853.23284976.1652117641027.JavaMail.zimbra@u-pem.fr>
References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <1864439853.23284976.1652117641027.JavaMail.zimbra@u-pem.fr>
Message-ID:

Yes, Doug posted some data a while back about sorting, where the breakeven between sorting references (and taking the indirection hit) and sorting values (and taking the "more memory movement" hit) was not obvious.

Flattening means ... flattening. Sometimes it means faster, but sometimes not. This is yet another reason why we should focus on providing semantic knobs, not "performance-labeled" knobs.
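(A rough model of the effect Remi describes, in today's Java without value classes; the class and helper names here are invented for illustration. Updating one element of a "flat" representation has to write all of the element's components back into the array, while the "boxed" representation stores a single new pointer -- cheaper stores, at the price of an allocation and a pointer chase on every read.)

    final class FlatVsBoxed {
        // Flat: the components of element i live inline at [2*i] and [2*i + 1].
        static void updateFlat(long[] flat, int i, long date, long time) {
            flat[2 * i]     = date;   // two 8-byte stores into the array
            flat[2 * i + 1] = time;
        }

        record DateTime(long date, long time) { }

        // Boxed: element i is a pointer to an immutable (date, time) box.
        static void updateBoxed(DateTime[] boxed, int i, long date, long time) {
            boxed[i] = new DateTime(date, time);  // allocate, then one pointer store
        }
    }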
On 5/9/2022 1:34 PM, Remi Forax wrote: > There is also something we should talk, using non-atomic value classes > does not mean automatically better performance. > It's something i've discovered trying to implement HashMap (more > Map.of() in fact) using value classes. > Updating a value class in the heap requires more writes, more memory > traffic than just updating pointers so depending on the algorithm, you > may see performance degradation compared to a pointer based > implementation. > > So even if we provide non-atomic B3, performance can be worst than > using atomic B3, sadly gaining more flatness does not necessarily > translate into better performance. From daniel.smith at oracle.com Mon May 9 20:47:19 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Mon, 9 May 2022 20:47:19 +0000 Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> Message-ID: > On May 9, 2022, at 10:10 AM, Kevin Bourrillion wrote: > >>> But seriously, we won't get away with pretending there are just 3 buckets if we do this. Let's be honest and call it B4. >> "Bucket" is a term that makes sense in language design, but need not flow into the user model. But yes, there really are three things that the user needs control over: identity, zero-friendliness, atomicity. If you want to call that four buckets, I won't argue. >> > I *am* of course only caring about the user model, and that's where I'm saying we would not get away with pretending this isn't a 4th kind of concrete class. Here's a presentation that doesn't feel to me like it's describing a menu with four choices: In Java, there are object references and there are primitives. For which kinds of values are you trying to declare a class? If object references: okay, do your objects need identity or not? If primitives: okay, do your primitives need atomicity or not? From brian.goetz at oracle.com Mon May 9 21:14:09 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 9 May 2022 17:14:09 -0400 Subject: Nullity (was: User model stacking: current status) In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> Message-ID: <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> Assuming the stacking here is satisfactory, let's talk about .ref and .val. Kevin made a strong argument for .ref as default, so let's pull on that string for a bit. Universal generics need a way to express .ref at least for type variables, so if we're going to make .ref the default, we still need a way to denote it.? Calling the types Foo.ref and Foo.val, where Foo is an alias for Foo.ref, is one way to achieve this. Now, let's step out onto some controversial territory: how do we spell .ref and .val?? Specifically, how good a fit is `!` and `?` (henceforth, emotional types), now that the B3 election is _solely_ about the existence of a zero-default .val?? (Before, it was a poor fit, but now it might be credible.? Yet another reason I wanted to tease apart what "primitive" meant into independent axes.) Pro: users think they really want emotional types. Pro: to the extent we eventually acquire full emotional types, and to the extent these align cleanly with primitive type projections, it avoids weirdnesses like `Foo.val?`, where there are two ways to talk about nullity. 
Con: These will surely not initially be the full emotional types users think they want, and so may well be met by "you idiots, these are not the emotional types we want".
Con: To the extent full emotional types do not align clearly with primitive type projections, we might be painted into a corner and it might be harder to do emotional types.

Risk: the language treatment of emotional types is one thing, but the real cost in introducing them into the language is annotating the libraries. Having them in the language but not annotating the libraries on a timely basis may well be a step backwards.

If we had full emotional types, some would have their non-nullity erased (`String!` erases to the same type descriptor as ordinary `String`) and some would have it reified (`Integer!` translates to a separate type, the `I` carrier.) This means that migrating `String` to `String!` might be binary-compatible, but `Integer` to `Integer!` would not be. (This is probably an acceptable asymmetry.)

But a bigger question is whether an erased `String!` should be backed up by a synthetic null check at the boundary between checked and unchecked code, such as method entry points (just as unpacking a T from a generic is backed up by a synthetic cast at the boundary between generic and explicit code.) This is reasonable (and cheap enough), but may be on a collision course with some interpretations of `String!`.

Initially, we probably would restrict the use of `!` to val-projections of primitive classes, but the pressure to extend it would always be just around the corner (e.g., having them in type patterns would likely address many people's initial discomfort about null handling in patterns).

My goal here is not to dive into the details of "let's design nullable types", as that would be a distraction at this point, so much as to gauge sentiment on whether this is worth exploring further, and to gather considerations I may have missed in this brief summary.

On 5/8/2022 12:32 PM, Brian Goetz wrote:
> To track the progress of the spiral:
>
> - We originally came up with the B2/B3 division to carve off B2 as
> the "safe subset", where you get less flattening but nulls and more
> integrity. This provided a safe migration target for existing VBCs,
> as well as a reasonable target for creating new VBCs that want to be
> mostly class-like but enjoy some additional optimization (and shed
> accidental identity for safety reasons.)
>
> - When we put all the flesh on the bones of B2/B3, there were some
> undesirable consequences, such as (a) tearing was too subtle, and (b)
> both the semantics and cost model differences between B2/B3 were going
> to be hard to explain (and in some cases, users have bad choices
> between semantics and performance.)
>
> - A few weeks ago, we decided to more seriously consider separating
> atomicity out as an explicit thing on its own. This had the benefit of
> putting semantics first, and offered a clearer cost model: you could
> give up identity but keep null-default and integrity (B2), further
> give up nulls to get some more density (B3.val), and further further
> give up atomicity to get more flatness (non-atomic B3.) This was
> honest, but led people to complain "great, now there are four buckets."
> > ?- We explored making non-atomicity a cross-cutting concern, so there > are two new buckets (VBC and primitive-like), either of which can > choose their atomicity constraints, and then within the primitive-like > bucket, the .val and .ref projections differ only with respect to the > consequences of nullity.? This felt cleaner (more orthogonal), but the > notion of a non-atomic B2 itself is kind of weird. > > So where this brings us is back to something that might feel like the > four-bucket approach in the third bullet above, but with two big > differences: atomicity is an explicit property of a class, rather than > a property of reference-ness, and a B3.ref is not necessarily the same > as a B2.? This recognizes that the main distinction between B2 or B3 > is *whether a class can tolerate its zero value.* > > More explicitly: > > ?- B1 remains unchanged > > ?- B2 is for "ordinary" value-based classes.? Always atomic, always > nullable, always reference; the only difference with B1 is that it has > shed its identity, enabling routine stack-based flattening, and > perhaps some heap flattening depending on VM sophistication and > heroics.? B2 is a good target for migrating many existing value-based > classes. > > ?- B3 means that a class can tolerate its zero (uninitialized) value, > and therefore gives rise to two types, which we'll call B3.ref and > B3.val.? The former is a reference type and is therefore nullable and > null-default; the latter is a direct/immediate/value type whose > default is zero. > > ?- B3 classes can further be marked non-atomic; this unlocks greater > flattening in the heap at the cost of tearing under race, and is > suitable for classes without cross-field invariants.? Non-atomicity > accrues equally to B3.ref and B3.val; a non-atomic B3.ref still tears > (and therefore might expose its zero under race, as per friday's > discussions.) > > Syntactically (reminder: NOT an invitation to discuss syntax at this > point), this might look like: > > ??? class B1 { }??????????????? // identity, reference, atomic > > ??? value-based class B2 { }??? // non-identity, reference, atomic > > ??? value class B3 { }????????? // non-identity, .ref and .val, both > atomic > > ??? non-atomic value class B3 { }? // similar to B3, but both are > non-atomic > > So, two new (but related) class modifiers, of which one has an > additional modifier.? (The spelling of all of these can be discussed > after the user model is entirely nailed down.) > > So, there's a monotonic sequence of "give stuff up, get other stuff": > > ?- B2 gives up identity relative to B1, gains some flattening > ?- B3 optionally gives up null-defaultness relative to B2, yielding > two types, one of which sheds some footprint > ?- non-atomic B3 gives up atomicity relative to B3, gaining more > flatness, for both type projections > > > > > > > On 5/6/2022 10:04 AM, Brian Goetz wrote: >> Thinking more about Dan's concerns here ... >> >> On 5/5/2022 6:00 PM, Dan Smith wrote: >>> This is significant because the primary reason to declare a B2 >>> rather than a B3 is to guarantee that the all-zeros value cannot be >>> created. >> >> This is a little bit of a circular argument; it takes a property that >> an atomic B2 has, but a non-atomic B2 lacks, and declares that to be >> "the whole point" of B2.? It may be that exposure of the zero is so >> bad we may eventually want to back away from the idea, but let's come >> up with a fair picture of what a non-atomic B2 means, and ask if >> that's sufficiently useful. 
>> >>> This leads me to conclude that if you're declaring a non-atomic B2, >>> you might as well just declare a non-atomic B3. >> >> Fair point, but let's pull on this string for a moment.? Suppose I >> want a null-default, flattenable value, and I'm willing to take the >> tearing to get there.? So you're saying "then declare a B3 and use >> B3.ref".? But B3.ref was supposed to have the same semantics as an >> equivalent B2!? (I realize I'm doing the same thing I just accused >> you of above -- taking an old invariant and positiioning it as "the >> point".? Stay tuned.)? Which means either that we lose flattening, >> again, or we create yet another asymmetry between B3.ref and B2. >> Maybe you're saying that the combination of nullable and full-flat is >> just too much to ask, but I am not sure it is; in any case, let's >> convince ourselves of this before we rule it out. >> >> Or maybe, what you're saying is that my claim that B3.ref and B2 are >> the same thing is the stale thing here, and we can let it go and get >> it back in another form.? In which case you're positing a model where: >> >> ?- B1 is unchanged >> ?- B2 is always atomic, reference, nullable >> ?- B3 really means "the zero is OK", comes with .ref and .val, and >> (non-atomic B3).ref is still tearable? >> >> In this model, (non-atomic B3).ref takes the place of (non-atomic B2) >> in the stacking I've been discussing.? Is that what you're saying? >> >> ??? class B1 { }? // ref, identity, atomic >> ??? value-based class B2 { }? // ref, non-identity, atomic >> ??? [ non-atomic ] value class B3 { }? // ref or val, zero is ok, >> both projections share atomicity >> >> If we go with ref-default, then this is a small leap from yesterday's >> stacking, because "B3" and "B2" are both reference types, so if you >> want a tearable, non-atomic reference type, saying `non-atomic value >> class B3` and then just using B3 gets you that. Then: >> >> ?- B2 is like B1, minus identity >> ?- B3 means "uninitialized values are OK, you get two types, a >> zero-default and a non-default" >> ?- Non-atomicity is an extra property we can add to B3, to get more >> flattening in exchange for less integrity >> ?- The use cases for non-atomic B2 are served by non-atomic B3 (when >> .ref is the default) >> >> I think this still has the properties I want; I can freely choose the >> reasonable subsets of { identity, has-zero, nullable, atomicity } >> that I want; the orthogonality of non-atomic across buckets becomes >> orthogonality of non-atomic with nullity, and the "B3.ref is just >> like B2" is shown to be the "false friend." >> >> > From mariell.hoversholm at paf.com Wed May 11 12:47:52 2022 From: mariell.hoversholm at paf.com (Mariell Hoversholm) Date: Wed, 11 May 2022 14:47:52 +0200 Subject: Nullity (was: User model stacking: current status) In-Reply-To: <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> Message-ID: Hi, Would nullable types be `T?`? This is what I've inferred, but would appreciate it being made explicit. I will continue with this assumption in the rest of my answer. I personally very much enjoy Kotlin's and Rust's forced nullity. I believe a clear majority of the other users of these languages do the same. Because of this, I would absolutely encourage you/another team to go down the path of considering designing and implementing nullness as part of the type-system. 
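(To make the assumption above explicit, a tiny illustration using the `?`/`!` spellings from the previous mail, written entirely as comments because this syntax is purely hypothetical at this point and accepted by no compiler:)

    // Hypothetical "emotional type" spellings, assuming `T?` / `T!` as discussed:
    // String?    name  = null;               // nullable reference, as today
    // String!    label = "x";                // non-nullable; erased, null-checked at boundaries
    // LocalDate! date  = LocalDate.now();    // non-nullable value-based class, no zero exposed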
Regarding changing the types to be `.ref` by default: I think this would be a beneficial change with regards to the current behaviour of types in Java (i.e. `.ref` being the only option, and have primitives be exceptions to this rule). This could potentially lead to some smaller mess-ups in the future, however as I would _imagine_ most developers would like the benefits of `.val` types in most instances, but they may forget. To put the `.val` case into a perspective of how easy it could become to forget: if you have a game where you need positions, you could have e.g. `Pos3d` (let's model this as `primitive record Pos3d(double x, double y, double z)` for completeness' sake). This is a light type that would be defaulted to being stored on the heap. Larger games will require you to write `.val` everywhere, which may easily be forgotten in hot code. Given the possibility of `Optional.val`, we could potentially be missing a good practice here. It would(/could) be much cheaper to allocate an `Optional.val!` than it would be to allocate a `Pos3d.ref?`. Please also note that I have avoided the topic of binary- and source-compatibility in this entirely; they may very much be important aspects to consider, given the defaults would change existing code, even in the JDK. I'm not sure of how much help I am to your gauging interest, but hope it could, at the very least, be a small indication of how users of other languages may find the ideas brought up. Kind regards, Mariell Hoversholm (she/her) On Mon, 9 May 2022 at 23:14, Brian Goetz wrote: > Assuming the stacking here is satisfactory, let's talk about .ref and .val. > > Kevin made a strong argument for .ref as default, so let's pull on that > string for a bit. > > Universal generics need a way to express .ref at least for type > variables, so if we're going to make .ref the default, we still need a > way to denote it. Calling the types Foo.ref and Foo.val, where Foo is > an alias for Foo.ref, is one way to achieve this. > > > > Now, let's step out onto some controversial territory: how do we spell > .ref and .val? Specifically, how good a fit is `!` and `?` (henceforth, > emotional types), now that the B3 election is _solely_ about the > existence of a zero-default .val? (Before, it was a poor fit, but now > it might be credible. Yet another reason I wanted to tease apart what > "primitive" meant into independent axes.) > > Pro: users think they really want emotional types. > Pro: to the extent we eventually acquire full emotional types, and to > the extent these align cleanly with primitive type projections, it > avoids weirdnesses like `Foo.val?`, where there are two ways to talk > about nullity. > > Con: These will surely not initially be the full emotional types users > think they want, and so may well be met by "you idiots, these are not > the emotional types we want" > Con: To the extent full emotional types do not align clearly with > primitive type projections, we might be painted into a corner and it > might be harder to do emotional types. > > Risk: the language treatment of emotional types is one thing, but the > real cost in introducing them into the language is annotating the > libraries. Having them in the language but not annotating the libraries > on a timely basis may well be a step backwards. > > > If we had full emotional types, some would have their non-nullity erased > (`String!` erases to the same type descriptor as ordinary `String`) and > some would have it reified (Integer! translates to a separate type, the > `I` carrier.) 
This means that migrating `String` to `String` might be > binary-compatible, but `Integer` to `Integer!` would not be. (This is > probably an acceptable asymmetry.) > > But a bigger question is whether an erased `String!` should be backed up > by a synthetic null check at the boundary between checked and unchecked > code, such as method entry points (just as unpacking a T from a generic > is backed up by a synthetic cast at the boundary between generic and > explicit code.) This is reasonable (and cheap enough), but may be on a > collision course with some interpretations of `String!`. > > Initially, we probably would restrict the use of `!` to val-projections > of primitive classes, but the pressure to extend it would always be just > around the corner (e.g., having them in type patterns would likely > address many people's initial discomfort about null handling in patterns). > > > > My goal here is not to dive into the details of "let's design nullable > types", as that would be a distraction at this point, as much as to > gauge sentiment on whether this is worth exploring further, and gather > considerations I may have missed in this brief summary. > > > On 5/8/2022 12:32 PM, Brian Goetz wrote: > > To track the progress of the spiral: > > > > - We originally came up with the B2/B3 division to carve off B2 as > > the "safe subset", where you get less flattening but nulls and more > > integrity. This provided a safe migration target for existing VBCs, > > as well as a reasonable target for creating new VBCs that want to be > > mostly class-like but enjoy some additional optimization (and shed > > accidental identity for safety reasons.) > > > > - When we put all the flesh on the bones of B2/B3, there were some > > undesirable consequences, such as (a) tearing was too subtle, and (b) > > both the semantics and cost model differences between B2/B3 were going > > to be hard to explain (and in some cases, users have bad choices > > between semantics and performance.) > > > > - A few weeks ago, we decided to more seriously consider separating > > atomicity out as an explicit thing on its own. This had the benefit of > > putting semantics first, and offered a clearer cost model: you could > > give up identity but keep null-default and integrity (B2), further > > give up nulls to get some more density (B3.val), and further further > > give up atomicity to get more flatness (non-atomic B3.) This was > > honest, but led people to complain "great, now there are four buckets." > > > > - We explored making non-atomicity a cross-cutting concern, so there > > are two new buckets (VBC and primitive-like), either of which can > > choose their atomicity constraints, and then within the primitive-like > > bucket, the .val and .ref projections differ only with respect to the > > consequences of nullity. This felt cleaner (more orthogonal), but the > > notion of a non-atomic B2 itself is kind of weird. > > > > So where this brings us is back to something that might feel like the > > four-bucket approach in the third bullet above, but with two big > > differences: atomicity is an explicit property of a class, rather than > > a property of reference-ness, and a B3.ref is not necessarily the same > > as a B2. This recognizes that the main distinction between B2 or B3 > > is *whether a class can tolerate its zero value.* > > > > More explicitly: > > > > - B1 remains unchanged > > > > - B2 is for "ordinary" value-based classes. 
Always atomic, always > > nullable, always reference; the only difference with B1 is that it has > > shed its identity, enabling routine stack-based flattening, and > > perhaps some heap flattening depending on VM sophistication and > > heroics. B2 is a good target for migrating many existing value-based > > classes. > > > > - B3 means that a class can tolerate its zero (uninitialized) value, > > and therefore gives rise to two types, which we'll call B3.ref and > > B3.val. The former is a reference type and is therefore nullable and > > null-default; the latter is a direct/immediate/value type whose > > default is zero. > > > > - B3 classes can further be marked non-atomic; this unlocks greater > > flattening in the heap at the cost of tearing under race, and is > > suitable for classes without cross-field invariants. Non-atomicity > > accrues equally to B3.ref and B3.val; a non-atomic B3.ref still tears > > (and therefore might expose its zero under race, as per friday's > > discussions.) > > > > Syntactically (reminder: NOT an invitation to discuss syntax at this > > point), this might look like: > > > > class B1 { } // identity, reference, atomic > > > > value-based class B2 { } // non-identity, reference, atomic > > > > value class B3 { } // non-identity, .ref and .val, both > > atomic > > > > non-atomic value class B3 { } // similar to B3, but both are > > non-atomic > > > > So, two new (but related) class modifiers, of which one has an > > additional modifier. (The spelling of all of these can be discussed > > after the user model is entirely nailed down.) > > > > So, there's a monotonic sequence of "give stuff up, get other stuff": > > > > - B2 gives up identity relative to B1, gains some flattening > > - B3 optionally gives up null-defaultness relative to B2, yielding > > two types, one of which sheds some footprint > > - non-atomic B3 gives up atomicity relative to B3, gaining more > > flatness, for both type projections > > > > > > > > > > > > > > On 5/6/2022 10:04 AM, Brian Goetz wrote: > >> Thinking more about Dan's concerns here ... > >> > >> On 5/5/2022 6:00 PM, Dan Smith wrote: > >>> This is significant because the primary reason to declare a B2 > >>> rather than a B3 is to guarantee that the all-zeros value cannot be > >>> created. > >> > >> This is a little bit of a circular argument; it takes a property that > >> an atomic B2 has, but a non-atomic B2 lacks, and declares that to be > >> "the whole point" of B2. It may be that exposure of the zero is so > >> bad we may eventually want to back away from the idea, but let's come > >> up with a fair picture of what a non-atomic B2 means, and ask if > >> that's sufficiently useful. > >> > >>> This leads me to conclude that if you're declaring a non-atomic B2, > >>> you might as well just declare a non-atomic B3. > >> > >> Fair point, but let's pull on this string for a moment. Suppose I > >> want a null-default, flattenable value, and I'm willing to take the > >> tearing to get there. So you're saying "then declare a B3 and use > >> B3.ref". But B3.ref was supposed to have the same semantics as an > >> equivalent B2! (I realize I'm doing the same thing I just accused > >> you of above -- taking an old invariant and positiioning it as "the > >> point". Stay tuned.) Which means either that we lose flattening, > >> again, or we create yet another asymmetry between B3.ref and B2. 
> >> Maybe you're saying that the combination of nullable and full-flat is > >> just too much to ask, but I am not sure it is; in any case, let's > >> convince ourselves of this before we rule it out. > >> > >> Or maybe, what you're saying is that my claim that B3.ref and B2 are > >> the same thing is the stale thing here, and we can let it go and get > >> it back in another form. In which case you're positing a model where: > >> > >> - B1 is unchanged > >> - B2 is always atomic, reference, nullable > >> - B3 really means "the zero is OK", comes with .ref and .val, and > >> (non-atomic B3).ref is still tearable? > >> > >> In this model, (non-atomic B3).ref takes the place of (non-atomic B2) > >> in the stacking I've been discussing. Is that what you're saying? > >> > >> class B1 { } // ref, identity, atomic > >> value-based class B2 { } // ref, non-identity, atomic > >> [ non-atomic ] value class B3 { } // ref or val, zero is ok, > >> both projections share atomicity > >> > >> If we go with ref-default, then this is a small leap from yesterday's > >> stacking, because "B3" and "B2" are both reference types, so if you > >> want a tearable, non-atomic reference type, saying `non-atomic value > >> class B3` and then just using B3 gets you that. Then: > >> > >> - B2 is like B1, minus identity > >> - B3 means "uninitialized values are OK, you get two types, a > >> zero-default and a non-default" > >> - Non-atomicity is an extra property we can add to B3, to get more > >> flattening in exchange for less integrity > >> - The use cases for non-atomic B2 are served by non-atomic B3 (when > >> .ref is the default) > >> > >> I think this still has the properties I want; I can freely choose the > >> reasonable subsets of { identity, has-zero, nullable, atomicity } > >> that I want; the orthogonality of non-atomic across buckets becomes > >> orthogonality of non-atomic with nullity, and the "B3.ref is just > >> like B2" is shown to be the "false friend." > >> > >> > > > -- *Mariell Hoversholm *(she/her) Software Developer Integrations (Slack #integration-team-public) Paf Mobile: +46 73 329 40 18 Br?ddgatan 11 SE602 22, Norrk?ping Sweden *Working remote from Uppsala* This email is confidential and may contain legally privileged information. If you are not the intended recipient, please contact the sender and delete the email from your system without producing, distributing or retaining copies thereof. Thank you. From brian.goetz at oracle.com Wed May 11 12:53:10 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 11 May 2022 08:53:10 -0400 Subject: Nullity (was: User model stacking: current status) In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> Message-ID: <25c654c1-2f8d-ffed-b289-cbe34dd392f9@oracle.com> > I'm not sure of how much help I am to your gauging interest, but hope > it could, > at the very least, be a small indication of how users of other > languages may > find the ideas brought up. Oh, trust me, we're well aware that millions of Java developers would jump for joy -- at least, initially -- if we pulled this trigger. My concern is what they do _after_ that.? Giving people something that looks superficially like something they think they like from another language does not always create lasting joy.? What happens in the first five minutes is much less important than what happens in the following ten years. 
From mariell.hoversholm at paf.com Wed May 11 13:32:00 2022 From: mariell.hoversholm at paf.com (Mariell Hoversholm) Date: Wed, 11 May 2022 15:32:00 +0200 Subject: Nullity (was: User model stacking: current status) In-Reply-To: <25c654c1-2f8d-ffed-b289-cbe34dd392f9@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> <25c654c1-2f8d-ffed-b289-cbe34dd392f9@oracle.com> Message-ID: I would also like to bring up the fact that this could already be possible in the long term (though not as baked into the language as the idea at hand). Currently, the ecosystem has many different libraries for nullity annotations; this ranges from JetBrains Annotations to CheckerQual (with annotation processing to find bugs; same exists with NullAway and Google Find Bugs) to Spring/Quarkus. While we do not have the language feature, code is commonly annotated with annotations instead, e.g.: @NonNull String getSomeValue(); @Nullable Integer fetchSomething(@Nullable String reference); and with tools such as lombok, we can even get automatic null-checks on these annotations. This has been underway for a while now (at least a few years). However, I think only recently (~1-2 yrs ago) did proper large projects pick them up (referring to Spring and Guava). I do not see much of a reason for this to change, given it increases safety in code, but we may all be surprised there. That being said, I fully understand the implications any such changes would have on the language: it would be an official, set-in-stone solution to an issue we haven't seen culminate for a significant time period. Different languages, libraries, and tools come with different solutions constantly, featuring different helpers: the Elvis operator, `?:` (Kotlin, Groovy) or `??` (C#) is a big one, along with the null-safe accessor, `?.`. Perhaps it would be a good idea to bring up a separate language topic for these solutions as serious ideas, as opposed to wild speculation? My apologies if anything is incoherent; I've had a full day already :-). Cheers. On Wed, 11 May 2022 at 14:53, Brian Goetz wrote: > > > > I'm not sure of how much help I am to your gauging interest, but hope > > it could, > > at the very least, be a small indication of how users of other > > languages may > > find the ideas brought up. > > Oh, trust me, we're well aware that millions of Java developers would > jump for joy -- at least, initially -- if we pulled this trigger. > > My concern is what they do _after_ that. Giving people something that > looks superficially like something they think they like from another > language does not always create lasting joy. What happens in the first > five minutes is much less important than what happens in the following > ten years. > > > From kevinb at google.com Thu May 12 01:45:23 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 11 May 2022 18:45:23 -0700 Subject: Nullity (was: User model stacking: current status) In-Reply-To: <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> Message-ID: On Mon, May 9, 2022 at 2:14 PM Brian Goetz wrote: Now, let's step out onto some controversial territory: how do we spell .ref > and .val? Specifically, how good a fit is `!` and `?` (henceforth, > emotional types), now that the B3 election is _solely_ about the > existence of a zero-default .val? (Before, it was a poor fit, but now it > might be credible. 
Yet another reason I wanted to tease apart what > "primitive" meant into independent axes.) > I'm certainly open to the possibility that `?` or `!` can help us out here. But the only way it can fly is if it is clearly a stepping stone toward a proper nullable-types feature. We would not want to get stuck here. That unfortunately forces us to have some clear idea how we would want/expect such a feature to look and work. My goal here is not to dive into the details of "let's design nullable > types", as that would be a distraction at this point, > (out of order reply) Well... I'm sorry for what follows, then. I think there is no way to know whether current proposals would be painting ourselves into a corner unless we explore the topic a bit. Here is my current concept of this beast: * bare `String` means what it always has ("String of ambiguous nullness") * `String!` indicates "an actual string" (I don't like to say "a non-null string" because *null is not a string!*) * `String?` indicates "a string or null". * `!` and `?` also work to project a type variable in either direction. * Exclamation fatigue would be very real, so assume there is some way to make `!` the default for some scope * javac (possibly behind a flag) would need to treat `?` and a suitably-blessed `@Nullable` identically, and same for `!` and `@NonNull`; there is just no way to survive a transition otherwise Enter Valhalla: * (Let's say we have B1, B2a/B3a (atomic), and B2b/B3b ("b"reakable?)) * On a B3 value type like `int`, `?` would be nonsense and `!` redundant. * That's equally true of a B3 value type spelled `Complex.val` (if such a thing exists). * (assuming `Complex` is ref-default) all three of `Complex`, `Complex?`, and `Complex!` have valid and distinct meanings. Now, imagining that we reached this point... would B3a/B3b (as a language-level thing) become immediately vestigial?. With Complex as a B2a or B2b, would `Complex!` ever not optimize to the B3-like *implementation*? I think the (standard) primitives could be understood as B2 themselves, with `int` just being an alias for `Integer!`. Obviously, if it would become vestigial, then we should try to avoid ever having it all, by simply :-) delaying it and solving B2-then-nullness. Pro: users think they really want emotional types. > Quibble: nah, we *know* we want them... > Con: These will surely not initially be the full emotional types users > think they want, and so may well be met by "you idiots, these are not the > emotional types we want" > We don't have to worry about this if we have a good story that it's a stepping stone. The stepping stone could be that it just doesn't work for B1 types yet. I would say that there's a moral hazard that people might choose B2 just to get that... but since that only happens if they don't *need* identity... we'd like them to do that anyway! > Con: To the extent full emotional types do not align clearly with > primitive type projections, we might be painted into a corner and it might > be harder to do emotional types. > I'm questioning whether we would need primitive type projections at all, just nullable/non-null type projections. > Risk: the language treatment of emotional types is one thing, but the real > cost in introducing them into the language is annotating the libraries. > Having them in the language but not annotating the libraries on a timely > basis may well be a step backwards. > For a while we'd only have to annotate as we migrate B1 -> B2. 
And it can be automated to a significant degree, more than halfway I think. If we had full emotional types, some would have their non-nullity erased > (`String!` erases to the same type descriptor as ordinary `String`) and > some would have it reified (Integer! translates to a separate type, the `I` > carrier.) This means that migrating `String` to `String` might be > binary-compatible, but `Integer` to `Integer!` would not be. (This is > probably an acceptable asymmetry.) > Agree acceptable. > But a bigger question is whether an erased `String!` should be backed up > by a synthetic null check at the boundary between checked and unchecked > code, such as method entry points (just as unpacking a T from a generic is > backed up by a synthetic cast at the boundary between generic and explicit > code.) This is reasonable (and cheap enough), but may be on a collision > course with some interpretations of `String!`. > There seem to be a continuum of approaches from "more checking/less pollution" to "more pollution/problems get found far from where they really happened." The generics experience was that few people bothered to use `checkedCollection()`, and I doubt many added type checks via bytecode either, and it all worked well enough, buuut there are a few reasons for that that don't translate to null. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Thu May 12 12:22:06 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 12 May 2022 08:22:06 -0400 Subject: Nullity (was: User model stacking: current status) In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> Message-ID: <3171fc30-6a77-3d20-1eeb-5d4904326e49@oracle.com> > Here is my current concept of this beast: > The next installment of this is: how does assignment and conversion work?? Presumably, it starts with: ?- there is a null-discarding conversion from T? to T! (this is a narrowing conversion) ?- there is a nullability-injecting conversion from T! to T? (this is a widening conversion) and then we get to decide: which conversions are allowed in assignment context?? Clearly a nullability-injecting conversion is OK here (assigning String! to String? is clearly OK, it's a widening), so the question is: how do you go from `T?` to `T!` ? Options include: ?- it's like unboxing, let the assignment through, perhaps with a warning, and NPE if it fails ?- require a narrowing cast > Enter Valhalla: > > * (Let's say we have B1, B2a/B3a (atomic), and B2b/B3b ("b"reakable?)) > * On a B3 value type like `int`, `?` would be nonsense and `!` redundant. > * That's equally true of a B3 value type spelled `Complex.val` (if > such a thing exists). > * (assuming `Complex` is ref-default) all three of `Complex`, > `Complex?`, and `Complex!` have valid and distinct meanings. If we have both .val and nullity annotations I think we are losing. The idea here would be that B3.val *is literally spelled* `B3!`. The declaration story is unchanged: class B1 / value-based class B2 / [ non-atomic ] value class B3, for some suitable spellings. > Now, imagining that we reached this point... would B3a/B3b (as a > language-level thing) become immediately vestigial?. Unfortunately not.? We need permission to unleash the zero-default type, because many B2 types (e.g., LocalDate) have no good zero.? So B3 is needed to unlock that. > With Complex as a B2a or B2b, would `Complex!` ever not optimize to > the B3-like *implementation*? 
I think the (standard) primitives could > be understood as B2 themselves, with?`int` just being an alias for > `Integer!`. A B3: ??? value class Integer { ... } // int is alias for Integer! What this short discussion has revealed is that there really are two interpretations of non-null here: ?- In the traditional cardinality-based interpretation, T! means: "a reference, but it definitely holds an instance of T, so you better have initialized it properly" ?- In the B3 interpretation, it means: "the zero (uninitialized, not-run-through-the-ctor) value is a valid value, so you don't need to have initialized it." > Con: To the extent full emotional types do not align clearly with > primitive type projections, we might be painted into a corner and > it might be harder to do emotional types. > > > I'm questioning whether we would need primitive type projections at > all, just nullable/non-null type projections. Indeed, that was the point of my query. From kevinb at google.com Thu May 12 15:25:52 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 12 May 2022 08:25:52 -0700 Subject: Nullity (was: User model stacking: current status) In-Reply-To: <3171fc30-6a77-3d20-1eeb-5d4904326e49@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> <3171fc30-6a77-3d20-1eeb-5d4904326e49@oracle.com> Message-ID: On Thu, May 12, 2022 at 5:22 AM Brian Goetz wrote: > - there is a nullability-injecting conversion from T! to T? (this is a > widening conversion) > I think we'd expect full subtyping here, right? It needs to work for covariant arrays, covariant returns, type argument bounds, etc. > and then we get to decide: which conversions are allowed in assignment > context? Clearly a nullability-injecting conversion is OK here (assigning > String! to String? is clearly OK, it's a widening), so the question is: how > do you go from `T?` to `T!` ? Options include: > > - it's like unboxing, let the assignment through, perhaps with a warning, > and NPE if it fails > - require a narrowing cast > Yes, I do think we want a cast there (a special operator for it is very helpful so you don't have to repeat the base type), but as far as I know the case could be made either way for error vs. warning if the cast isn't there. Enter Valhalla: > > * (Let's say we have B1, B2a/B3a (atomic), and B2b/B3b ("b"reakable?)) > * On a B3 value type like `int`, `?` would be nonsense and `!` redundant. > * That's equally true of a B3 value type spelled `Complex.val` (if such a > thing exists). > * (assuming `Complex` is ref-default) all three of `Complex`, `Complex?`, > and `Complex!` have valid and distinct meanings. > > > If we have both .val and nullity annotations I think we are losing. The > idea here would be that B3.val *is literally spelled* `B3!`. The > declaration story is unchanged: class B1 / value-based class B2 / [ > non-atomic ] value class B3, for some suitable spellings. > I have tried to write the above to account for *that* possibility and for the subtly different possibility that you don't "spell .val" at all, you just express your nullability needs and the system optimizes to a value type when it can. Now, imagining that we reached this point... would B3a/B3b (as a > language-level thing) become immediately vestigial?. > > Unfortunately not. We need permission to unleash the zero-default type, > because many B2 types (e.g., LocalDate) have no good zero. So B3 is needed > to unlock that. 
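Concretely, the distinction you're drawing is something like this (plain records below are just stand-ins for the candidate value classes; the names are made up):

    // All-zeros is a perfectly reasonable Complex, so a zero-default .val is tolerable:
    record Complex(double re, double im) { }      // the default would decode to 0.0 + 0.0i

    // All-zeros is not a date at all, which is why LocalDate-like classes want B2:
    record Date(int year, int month, int day) { } // the default would decode to year 0, month 0, day 0

    public class ZeroDefaults {
        public static void main(String[] args) {
            System.out.println(new Complex(0.0, 0.0)); // Complex[re=0.0, im=0.0]
            System.out.println(new Date(0, 0, 0));     // Date[year=0, month=0, day=0] -- nonsense
        }
    }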
> (Sorry to be a skipping record, but *no* type has a great default value. It's just about tolerable levels of badness. We tolerate `long` and will tolerate `ulong` because we're habituated to it. At best it is sometimes a tiny convenience. It's never exactly an *advantage* to be unable to distinguish whether a variable was ever initialized.) But, suppose the *class* is identifiable in some way as friendly to that default value. I'm still struggling to think through whether we also strictly need to have something at the use site equivalent to `.val`. Or if just knowing the nullness bit is enough. It may be fundamentally the same question you're asking; I'm not sure. What this short discussion has revealed is that there really are two > interpretations of non-null here: > > - In the traditional cardinality-based interpretation, T! means: "a > reference, but it definitely holds an instance of T, so you better have > initialized it properly" > - In the B3 interpretation, it means: "the zero (uninitialized, > not-run-through-the-ctor) value is a valid value, so you don't need to have > initialized it." > I'm not sure these are that different. I think that as types they are the same. It's the conjuring of default values, specifically, that differs: we can do it for B2, B3, and B3!, and we don't know how to find one for B2!. But that's not a complication, it's just precisely what we're saying B2 exists for: to stop that from happening. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Thu May 12 15:59:28 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 12 May 2022 11:59:28 -0400 Subject: Nullity (was: User model stacking: current status) In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> <3171fc30-6a77-3d20-1eeb-5d4904326e49@oracle.com> Message-ID: > On Thu, May 12, 2022 at 5:22 AM Brian Goetz > wrote: > > ?- there is a nullability-injecting conversion from T! to T? (this > is a widening conversion) > > > I think we'd expect full subtyping here, right? It needs to work for > covariant arrays, covariant returns, type argument bounds, etc. There's two questions here, one at the language level and one at the VM level. At the VM level, `I` is not going to be a subtype of `LInteger`.? At the language level, we have a choice of whether to use subtyping or widening conversions, but given that the VM is expecting a widening conversion, it is probably better to align to that.? (Similarly, the distinction between int and Integer in overload selection is based on the assumption that they are not subtypes, but instead related by conversions.) So while abstractly, the value sets may form subsets, which says that at least _structurally_ they are subtypes, we try to avoid having subtype relationships between things that use different representations, because it creates difficult seams in translation, and lean on conversion machinery instead. In practice, the distinction between "int widens to long" and "int <: long" is not particularly visible, except in corner cases like "boxing is allowed in loose invocation contexts but not strict invocation contexts." > > and then we get to decide: which conversions are allowed in > assignment context?? Clearly a nullability-injecting conversion is > OK here (assigning String! to String? is clearly OK, it's a > widening), so the question is: how do you go from `T?` to `T!` ?? 
> Options include: > > ?- it's like unboxing, let the assignment through, perhaps with a > warning, and NPE if it fails > ?- require a narrowing cast > > > Yes, I do think we want a cast there (a special operator for it is > very helpful so you don't have to repeat the base type), but as far as > I know the case could be made either way for error vs. warning if the > cast isn't there. This is the decision point I want to highlight; while one might at first assume "well obviously you should explicitly convert", there are actually more choices than the obvious one, and it is a decision that should be made deliberately. > But, suppose the *class* is identifiable in some way as friendly to > that default value. I'm still struggling to think through whether we > also strictly need to have something at the use site equivalent to > `.val`. Or if just knowing the nullness bit is enough. It may be > fundamentally the same question you're asking; I'm not sure. I think we may be saying the same thing.? It is a declaration-site property as to whether we want to tolerate uninitialized values.? We do for int; we probably also do for Complex, not only because "its a number and the existing numbers work that way", but because there's a performance tradeoff, which is that being intolerant of uninitialized values has a footprint cost, and effectively doubling the size of a flat `Complex[]` will not be appreciated. For such a zero-tolerant class, there is still room to make the choice at the use site which flavor you want.? One positive consequence of having decomplected atomicity from { nullity, primitive-ness } is that it becomes *possible* to spell this distinction with emotional sigils, rather than some weirder thing (e.g., .val.) > > What this short discussion has revealed is that there really are > two interpretations of non-null here: > > ?- In the traditional cardinality-based interpretation, T! means: > "a reference, but it definitely holds an instance of T, so you > better have initialized it properly" > ?- In the B3 interpretation, it means: "the zero (uninitialized, > not-run-through-the-ctor) value is a valid value, so you don't > need to have initialized it." > > > I'm not sure these are that different. I think that as types they are > the same. It's the conjuring of default values, specifically, that > differs: we can do it for B2, B3, and B3!, and we don't know how to > find one for B2!. But that's not a complication, it's just precisely > what we're saying B2 exists for: to stop that from happening. > This question is at the heart of this sub-thread. I think what you are saying is that for ref-only classes (B1 and B2), then T! is a _restriction_ type (which we will probably erase to the erasure of T), whereas for for zero-capable classes (B3), then `T!` is a true projection which makes the null value *unrepresentable*, and that you're OK with that. From daniel.smith at oracle.com Thu May 12 17:07:08 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Thu, 12 May 2022 17:07:08 +0000 Subject: Nullity (was: User model stacking: current status) In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> Message-ID: <66340928-4EB3-44BF-A72A-1DBEB5AED375@oracle.com> > On May 11, 2022, at 7:45 PM, Kevin Bourrillion wrote: > > * `String!` indicates "an actual string" (I don't like to say "a non-null string" because *null is not a string!*) The thread talks around this later, but... 
what do I get initially if I declare a field/array component of type 'String!'? I think in most approaches this would end up being a warning, with the field/array erased to LString and storing a null. (Alternatively, we build 'String!' into the JVM, and I think that has to come with "uninitialized" detection on reads. We talked through that strategy quite a bit in the context of B2 before settling on "just use 'null'".) So this is potentially a fundamental difference between String! and Point!: 'new String![5]' and 'new Point![5]' give you very different arrays. > * Exclamation fatigue would be very real, so assume there is some way to make `!` the default for some scope +1 Yes, I think it's a dead end to expect users to sprinkle '!' everywhere they don't want nulls?this is usually the informal default in common programming practice, so we need some way to enable flipping the default. Lesson for B3: if B3! is primarily meant to be interpreted as a null-free type, people will naturally want to use that null-free type everywhere, and will want it to be default. (Reference default makes more sense where you generally want to use the nullable type, and only occasionally will opt in to the value type, probably for reasons other than whether 'null' is semantically meaningful.) Also, a danger for B3 is that a rather casual flipping of defaults doesn't just affect compiler behavior?it changes the initial value and possibly atomicity of a field/array. So a little more scary for a random switch somewhere to change all your 'Point' usages from ref-default to val-default. From brian.goetz at oracle.com Thu May 12 17:17:53 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 12 May 2022 13:17:53 -0400 Subject: Nullity (was: User model stacking: current status) In-Reply-To: <66340928-4EB3-44BF-A72A-1DBEB5AED375@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> <66340928-4EB3-44BF-A72A-1DBEB5AED375@oracle.com> Message-ID: <98276008-57ed-5eb7-539e-2b966d7bde34@oracle.com> >> * Exclamation fatigue would be very real, so assume there is some way to make `!` the default for some scope > +1 > > Yes, I think it's a dead end to expect users to sprinkle '!' everywhere they don't want nulls?this is usually the informal default in common programming practice, so we need some way to enable flipping the default. On the other hand, this is on a collision course with Kevin's "ref-default" recommendation, which had many strong supporting reasons, whether this is spelled `!` or `.val`.? The "but it will be tiring for people to type" doesn't feel like a very good reason to flip the default from something that has such strong objective justifications. (Dan was never sold on ref-default, but Kevin was, so I'll leave it to him to reconcile "ref-default is the right default" with "but, exclamation fatigue.") From kevinb at google.com Thu May 12 22:14:02 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 12 May 2022 15:14:02 -0700 Subject: Nullity (was: User model stacking: current status) In-Reply-To: <98276008-57ed-5eb7-539e-2b966d7bde34@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> <66340928-4EB3-44BF-A72A-1DBEB5AED375@oracle.com> <98276008-57ed-5eb7-539e-2b966d7bde34@oracle.com> Message-ID: I don't see the conflict. I'm saying, yeah, there *will* be exclamation fatigue until a feature comes along eventually to relieve it. 
(In the worst case, that's `public null-marked class...`; in the best case it's just `language-level 22;` or what have you.) But I still think it's the right thing to do anyway. On Thu, May 12, 2022 at 10:18 AM Brian Goetz wrote: > > > >> * Exclamation fatigue would be very real, so assume there is some way > to make `!` the default for some scope > > +1 > > > > Yes, I think it's a dead end to expect users to sprinkle '!' everywhere > they don't want nulls?this is usually the informal default in common > programming practice, so we need some way to enable flipping the default. > > On the other hand, this is on a collision course with Kevin's > "ref-default" recommendation, which had many strong supporting reasons, > whether this is spelled `!` or `.val`. The "but it will be tiring for > people to type" doesn't feel like a very good reason to flip the default > from something that has such strong objective justifications. > > (Dan was never sold on ref-default, but Kevin was, so I'll leave it to > him to reconcile "ref-default is the right default" with "but, > exclamation fatigue.") > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From john.r.rose at oracle.com Mon May 16 21:46:23 2022 From: john.r.rose at oracle.com (John Rose) Date: Mon, 16 May 2022 14:46:23 -0700 Subject: [External] : Re: On tearing In-Reply-To: <4917C3C1-B0DC-48A5-B987-F7AB95FA1DB4@oracle.com> References: <423731687.17626456.1651073926931.JavaMail.zimbra@u-pem.fr> <4917C3C1-B0DC-48A5-B987-F7AB95FA1DB4@oracle.com> Message-ID: <2C6D8B90-7551-4064-9608-1B59EC2B6FA3@oracle.com> On 27 Apr 2022, at 9:50, Brian Goetz wrote: > ?This whole area seems extremely prone to wishful thinking; we hate > the idea of making something slower than it could be, that we convince > ourselves that ?the user can reason about this.? Whether or not > it is ?too big a leap?, I think it is a bigger leap than you are > thinking. > >> For me, we should make the model clear, the compiler should insert a >> non user overridable default constructor but not more because using a >> primitive class is already an arcane construct. > > This might help a little bit, but it is addressing the smaller part of > the problem (zeroes); we need to address the bigger problem (tearing). I think I mostly agree with Remi on this point. A tearable primitive class (call it T-B3 as opposed A-B3 which is atomic) can, as you describe, have its invariants broken by races that have the effect of writing arbitrary (or almost arbitrary) values into fields at any time. A regular mutable B1 class has a similar problem, except it can be defended by a constructor and/or mutator methods that check per-field values being stored. Let?s look at the simplest case (which is rare in practice, since it is scary): Suppose a class has public fields which are mutable. Call such a class a OM-B1 class meaning ?open mutable B1?. I think that we can (and probably should) address this educational issue by making T-B3 classes look (somehow) like OM-B1 classes. Then every bit of training which leads users to be watchful in their use of OM-B1 will apply to T-B3 classes. How to make T-B3 look like OM-B1? Well, Remi?s idea of a mandated open constructor gets most of the way there. Mandating that the B3 fields are public is also helpful. (Records kinda-sorta do that, but through component reader methods.) 
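A minimal sketch of that analogy, with the caveat that `non-atomic value` is just this thread's placeholder spelling and nothing here is settled syntax:

    // OM-B1: an ordinary identity class with open mutable fields.  Anyone with a
    // reference can write any combination of x and y into it at any time.
    class OpenPoint {
        public int x;
        public int y;
    }

    // T-B3: the tearable flavor.  Mandated public fields plus a dumb, non-checking
    // constructor advertise that a shared container of these can likewise take on
    // any combination of field values under race.
    non-atomic value class Point {
        public int x;
        public int y;
        public Point(int x, int y) { this.x = x; this.y = y; }   // no invariant checks
    }

The training that tells people to be careful handing out an OpenPoint is exactly the training we want to kick in for a Point stored in an unprotected container.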
I truly think those two steps are enough, to make it clear to an author of a T-B3 that, if a T-B3 container is accessible to untrusted parties, then it is free to take on any combination of field values at any time. (And I'm using the word "free" here in the rigorous math sense, as in a free product type.) A further step to nail down the message that the components are independently variable would be to provide a reconstructor syntax of some sort that amounted to an open invitation to (a) take an instance of the T-B3, (b) modify or replace any or all of its field values, and then (c) put it back in the container it came from. By "open" I mean "public to all comers", which means that every baseline Java programmer, who knows about public mutable fields (we can't cure world hunger or negligent Java scribblers), will know that, using that syntax, anybody can write anything into any T-B3 value stored in an unprotected container. Just like an OM-B1 object. Nothing new to see, and all the old warnings apply! We would have to be careful about our messaging about immutability here, to prevent folks from mistakenly confusing a T-B3 with an immutable B1 (I-B1) or B2 (all of which are truly immutable). One way to do this, that would be blindingly obvious (and IMO too blinding), would be to (a) allow a `non-final` modifier on fields, canceling any implicit immutability property, and (b) *require* `non-final` modifiers on all fields in a T-B3 class. I put this forward in the service of brainstorming, to show an extreme (too extreme IMO) way to forcibly advertise the T- in T-B3 classes. But as I said, I think in practice it will be enough to make T-B3 classes look like OM-B1 classes, which are clearly not immutable, even without a `non-final` modifier. > > I don't think we have to go so far as to outlaw tearing, but there > have to be enough cues, at the use and declaration site, that > something interesting is happening here. Yes, cues. And my point above, mainly, is that to the extent such cues are available in the world of OM-B1 classes already, we should make use of them for T-B3 classes. And where not, such cues should make it really clear that there is an open invitation (public to untrusted parties) to make piecemeal edits to the fields of a T-B3 class. > >> There is no point to nanny people here given that only experts will >> want to play with it. > > This is *definitely* wishful thinking. People will hear that this is > a tool for performance; 99% of Java developers will convince > themselves they are experts because, performance! Developers > pathologically over-rotate towards whatever the Stack Overflow crowd > says is faster. (And so will Copilot.) So, definitely no. This > argument is pure wishful thinking. (I will admit to being > occasionally tempted by this argument too, but then I snap out of it.) I'm with Brian on this. >> But we (the EG) can also fail, and make a primitive class too easy to >> use, what scares me is people using a primitive class just because it's >> not nullable. > > Yes, this is one of the many pitfalls we have to avoid! > > This game is hard. Yep. Removing null for footprint, by moving from B2 to B3, is a normal thing people will do, but if it also introduces the T- part (tearability) secretly, that's probably a lose. Which leads to the current consideration of tearability as partially independent from the B2/B3 axis. So B2 XOR B3 = nullability alone, not = nullability+atomicity.
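To keep the hazard concrete, here is roughly what a torn read looks like; the holder below is an ordinary mutable object with two fields, used only to simulate the field-at-a-time updates a flattened non-atomic value would be subject to under race:

    // Writers maintain lo <= hi; a racy reader can still observe a mix of two writes.
    class Range {
        int lo, hi;
    }

    public class TearDemo {
        static final Range shared = new Range();

        public static void main(String[] args) throws InterruptedException {
            Thread w1 = new Thread(() -> { for (int i = 0; i < 1_000_000; i++) { shared.lo = 0; shared.hi = 1; } });
            Thread w2 = new Thread(() -> { for (int i = 0; i < 1_000_000; i++) { shared.lo = 5; shared.hi = 9; } });
            Thread r  = new Thread(() -> {
                for (int i = 0; i < 1_000_000; i++) {
                    int lo = shared.lo, hi = shared.hi;
                    if (lo > hi) System.out.println("torn: lo=" + lo + ", hi=" + hi);  // e.g. lo=5, hi=1
                }
            });
            w1.start(); w2.start(); r.start();
            w1.join(); w2.join(); r.join();
        }
    }

Neither writer ever writes a (lo, hi) pair with lo > hi, yet the reader can observe one; that is the property a T-B3 author has to sign up for.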
Separately, I *do* think T-B3 is more likely to be useful than A-B3 (atomic B3), and likewise T-B2 has limited use compared to A-B2. This is why I?ve been content with the conflation of T-B3 with B3-simple, for so long. But, embracing the current conversation, I do think that T-B3 needs to be *really clearly componentwise mutable*. I think that whether T-B3 is the default setting of B3 or some further opt-in (from default A-B3 to T-B3). And, to summarize, mandated wide-open fields and/or mandated dumb non-checking constructors are a legitimate way to advertise the open-ness of T-B3 classes. Then the tearability part is a small corollary of the Big Story, which is the openness of the fields to all comers. A final point: This is why in our last few meetings I keep mentioning the C++ idea of a `struct`, which is not a non-class, but rather a class whose defaults are set to be open to all comers. I think if we do a ?struct-like? design for T-B3 we can win. From daniel.smith at oracle.com Wed May 18 14:24:12 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 18 May 2022 14:24:12 +0000 Subject: EG meeting, 2022-05-18 Message-ID: EG Zoom meeting today at 4pm UTC (9am PDT, 12pm EDT). Recent threads to discuss: - "User model stacking: current status": Brian talked about factoring atomicity out of the B2/B3 choice, as an extra choice applying to B3 (and perhaps B2, too) - "Nullity (was: User model stacking: current status)": Brian explored the possibility of using '?' and '!' as alternatives to '.ref' and '.val' for B3 classes, anticipating more general support in the language for null-free types - "User model: terminology": Brian summarized the different features that need labels (non-identity classes, non-identity classes with a valid zero, tearable classes, types with and without null) From daniel.smith at oracle.com Wed May 18 18:47:29 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 18 May 2022 18:47:29 +0000 Subject: EG meeting, 2022-05-18 In-Reply-To: References: Message-ID: <4E552391-A87A-41F6-A148-388F8F61FCD8@oracle.com> > On May 18, 2022, at 8:24 AM, Dan Smith wrote: > > EG Zoom meeting today at 4pm UTC (9am PDT, 12pm EDT). > > Recent threads to discuss: > > - "User model stacking: current status": Brian talked about factoring atomicity out of the B2/B3 choice, as an extra choice applying to B3 (and perhaps B2, too) > > - "Nullity (was: User model stacking: current status)": Brian explored the possibility of using '?' and '!' as alternatives to '.ref' and '.val' for B3 classes, anticipating more general support in the language for null-free types > > - "User model: terminology": Brian summarized the different features that need labels (non-identity classes, non-identity classes with a valid zero, tearable classes, types with and without null) Summary of this discussion: Reviewed how we ended up with concerns about the status quo approach to primitive classes (documented in JEP 401), how we wanted a better story for tearing, and different strategies that have been considered there. Nothing new here, just summarizing. Dug into some details of the nullable+tearable combination: - A tearable B2 class is probably a mismatch?if you can tear, you can create a zero value, but the B2 has declared itself zero-hostile. No objections, then, to the idea that atomic/non-atomic is a property of B3 only (or equivalently, by giving up atomicity you've entered a new category, B4). 
- Tearable+nullable B3 types (e.g., 'LPoint;' could be considered tearable) remain a possible area to explore. There's some concern about user model?tearing a null leads to surprising outcomes after a null check and possible hard-to-observe memory leaks?and implementation. It would help to ground this conversation in some more concrete examples, though. From daniel.smith at oracle.com Thu May 19 23:14:07 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Thu, 19 May 2022 23:14:07 +0000 Subject: Spec change documents for Value Objects In-Reply-To: <12C0C3B4-1A4C-4FCF-AEFD-A577F2333B27@oracle.com> References: <12C0C3B4-1A4C-4FCF-AEFD-A577F2333B27@oracle.com> Message-ID: <92660EAE-70A6-4FD0-8ECD-4A795D139F2E@oracle.com> On Apr 27, 2022, at 5:01 PM, Dan Smith > wrote: Please see these two spec change documents for JLS and JVMS changes in support of the Value Objects feature. Here's a revision, including some additional language checks that I missed in the first iteration. http://cr.openjdk.java.net/~dlsmith/jep8277163/jep8277163-20220519/specs/value-objects-jls.html http://cr.openjdk.java.net/~dlsmith/jep8277163/jep8277163-20220519/specs/value-objects-jvms.html ---------- Diff of the changes: diff --git a/closed/src/java.se/share/specs/value-objects-jls.md b/closed/src/java.se/share/specs/value-objects-jls.md index 3e8e44aa2c..392242efb9 100644 --- a/closed/src/java.se/share/specs/value-objects-jls.md +++ b/closed/src/java.se/share/specs/value-objects-jls.md @@ -501,9 +501,9 @@ It is permitted for the class declaration to redundantly specify the `final` modifier. The `identity` and `value` modifiers limit the set of classes that can extend -an `abstract` class ([8.1.4]). +a non-`final` class ([8.1.4]). -Special restrictions apply to the field declarations ([8.3.1.2]), method +Special restrictions apply to the field declarations ([8.3.1]), method declarations ([8.4.3.6]), and constructors ([8.8.7]) of a class that is not an `identity` class. @@ -524,6 +524,61 @@ Should there be? +#### 8.1.3 Inner Classes and Enclosing Instances {#jls-8.1.3} + +... + +An inner class *C* is a *direct inner class of a class or interface O* if *O* is +the immediately enclosing class or interface declaration of *C* and the +declaration of *C* does not occur in a static context. + +> If an inner class is a local class or an anonymous class, it may be declared +> in a static context, and in that case is not considered an inner class of any +> enclosing class or interface. + +A class *C* is an *inner class of class or interface O* if it is either a direct +inner class of *O* or an inner class of an inner class of *O*. + +> It is unusual, but possible, for the immediately enclosing class or interface +> declaration of an inner class to be an interface. +> This only occurs if the class is a local or anonymous class declared in a +> `default` or `static` method body ([9.4]). + +A class or interface *O* is the *zeroth lexically enclosing class or interface +declaration of itself*. + +A class *O* is the *n'th lexically enclosing class declaration of a class C* if +it is the immediately enclosing class declaration of the *n-1*'th lexically +enclosing class declaration of *C*. + +An instance *i* of a direct inner class *C* of a class or interface *O* is +associated with an instance of *O*, known as the *immediately enclosing instance +of i*. +The immediately enclosing instance of an object, if any, is determined when the +object is created ([15.9.2]). 
+ +An object *o* is the *zeroth lexically enclosing instance of itself*. + +An object *o* is the *n'th lexically enclosing instance of an instance i* if it +is the immediately enclosing instance of the *n-1*'th lexically enclosing +instance of *i*. + +An instance of an inner local class or an anonymous class whose declaration +occurs in a static context has no immediately enclosing instance. +Also, an instance of a `static` nested class ([8.1.1.4]) has no immediately +enclosing instance. + +**It is a compile-time error if an inner class has an immediately enclosing +instance but is declared an `abstract` `value` class ([8.1.1.1], [8.1.1.5]).** + +> **If an abstract class is declared with neither the `value` nor the `identity` +> modifier, but it is an inner class and has an immediately enclosing instance, +> it is implicitly an `identity` class, per [8.1.1.5].** + +... + + + #### 8.1.4 Superclasses and Subclasses {#jls-8.1.4} The optional `extends` clause in a normal class declaration specifies the @@ -761,8 +816,110 @@ instance method.** +### 8.6 Instance Initializers {#jls-8.6} + +An *instance initializer* declared in a class is executed when an instance of +the class is created ([12.5], [15.9], [8.8.7.1]). + +*InstanceInitializer:* +: *Block* + +**It is a compile-time error for an `abstract` `value` class to declare an +instance initializer.** + +> **If an abstract class is declared with neither the `value` nor the `identity` +> modifier, but it declares an instance initializer, it is implicitly an +> `identity` class, per [8.1.1.5].** + +It is a compile-time error if an instance initializer cannot complete normally +([14.22]). + +It is a compile-time error if a `return` statement ([14.17]) appears anywhere +within an instance initializer. + +An instance initializer is permitted to refer to the current object using the +keyword `this` ([15.8.3]) or the keyword `super` ([15.11.2], [15.12]), and to +use any type variables in scope. + +Restrictions on how an instance initializer may refer to instance variables, +even when the instance variables are in scope, are specified in [8.3.3]. + +Exception checking for an instance initializer is specified in [11.2.3]. + + + ### 8.8 Constructor Declarations {#jls-8.8} +A *constructor* is used in the creation of an object that is an instance of a +class ([12.5], [15.9]). + +*ConstructorDeclaration:* +: {*ConstructorModifier*} *ConstructorDeclarator* [*Throws*] *ConstructorBody* + +*ConstructorDeclarator:* +: [*TypeParameters*] *SimpleTypeName*\ + `(` [*ReceiverParameter* `,`] [*FormalParameterList*] `)` + +*SimpleTypeName:* +: *TypeIdentifier* + +The rules in this section apply to constructors in all class declarations, +including enum declarations and record declarations. +However, special rules apply to enum declarations with regard to constructor +modifiers, constructor bodies, and default constructors; these rules are stated +in [8.9.2]. +Special rules also apply to record declarations with regard to constructors, as +stated in [8.10.4]. + +The *SimpleTypeName* in the *ConstructorDeclarator* must be the simple name of +the class that contains the constructor declaration, or a compile-time error +occurs. + +In all other respects, a constructor declaration looks just like a method +declaration that has no result ([8.4.5]). + +Constructor declarations are not members. +They are never inherited and therefore are not subject to hiding or overriding. 
+ +**It is a compile-time error for an `abstract` `value` class to declare a +nontrivial constructor ([8.1.1.5]).** + +> **If an abstract class is declared with neither the `value` nor the `identity` +> modifier, but it declares a nontrivial constructor, it is implicitly an +> `identity` class, per [8.1.1.5].** + +:::editorial +It's not ideal to define a new term just for the purpose of this rule. But the +list of things to check is long, and we don't want to repeat it. Perhaps it +would be helpful to somehow overlap this definition with the discussion of +default constructors in [8.8.9]. +::: + +Constructors are invoked by class instance creation expressions ([15.9]), by the +conversions and concatenations caused by the string concatenation operator `+` +([15.18.1]), and by explicit constructor invocations from other constructors +([8.8.7]). +Access to constructors is governed by access modifiers ([6.6]), so it is +possible to prevent class instantiation by declaring an inaccessible constructor +([8.8.10]). + +Constructors are never invoked by method invocation expressions ([15.12]). + +:::example + +Example 8.8-1. Constructor Declarations + +``` +class Point { + int x, y; + Point(int x, int y) { this.x = x; this.y = y; } +} +``` + +::: + + + #### 8.8.7 Constructor Body {#jls-8.8.7} The first statement of a constructor body may be an explicit invocation of @@ -2231,7 +2388,7 @@ synchronization. [8.1.1.4]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.1.1.4 [8.1.1.5]: #jls-8.1.1.5 [8.1.2]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.1.2 -[8.1.3]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.1.3 +[8.1.3]: #jls-8.1.3 [8.1.4]: #jls-8.1.4 [8.1.5]: #jls-8.1.5 [8.1.6]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.1.6 @@ -2260,9 +2417,9 @@ synchronization. [8.4.8.3]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.4.8.3 [8.4.9]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.4.9 [8.5]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.5 -[8.6]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.6 +[8.6]: #jls-8.6 [8.7]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.7 -[8.8]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.8 +[8.8]: #jls-8.8 [8.8.1]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.8.1 [8.8.2]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.8.2 [8.8.3]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.8.3 diff --git a/closed/src/java.se/share/specs/value-objects-jvms.md b/closed/src/java.se/share/specs/value-objects-jvms.md index fe747ad3bd..70e24541ce 100644 --- a/closed/src/java.se/share/specs/value-objects-jvms.md +++ b/closed/src/java.se/share/specs/value-objects-jvms.md @@ -1561,6 +1561,70 @@ Attribute Location `class` +#### 4.7.6 The `InnerClasses` Attribute {#jvms-4.7.6} + +... + +inner_class_access_flags + +: The value of the `inner_class_access_flags` item is a mask of flags used + to denote access permissions to and properties of class or interface *C* + as declared in the source code from which this `class` file was + compiled. + It is used by a compiler to recover the original information when source + code is not available. + The flags are specified in [Table 4.7.6-A]. + + ::: {.table #jvms-4.7.6-300-D.1-D.1} + + Table 4.7.6-A. 
Nested class access and property flags + + ---------------------------------------------------------------------------- + Flag Name Value Interpretation + ------------------------ ----------- --------------------------------------- + `ACC_PUBLIC` 0x0001 Marked or implicitly `public` in + source. + + `ACC_PRIVATE` 0x0002 Marked `private` in source. + + `ACC_PROTECTED` 0x0004 Marked `protected` in source. + + `ACC_STATIC` 0x0008 Marked or implicitly `static` in + source. + + `ACC_FINAL` 0x0010 Marked or implicitly `final` in + source. + + **`ACC_IDENTITY`** **0x0020** **Declared as an `identity` class or + interface.** + + **`ACC_VALUE`** **0x0040** **Declared as a `value` class or + interface.** + + `ACC_INTERFACE` 0x0200 Was an `interface` in source. + + `ACC_ABSTRACT` 0x0400 Marked or implicitly `abstract` in + source. + + `ACC_SYNTHETIC` 0x1000 Declared synthetic; not present in the + source code. + + `ACC_ANNOTATION` 0x2000 Declared as an annotation interface. + + `ACC_ENUM` 0x4000 Declared as an `enum` class. + ---------------------------------------------------------------------------- + + ::: + + All bits of the `inner_class_access_flags` item not assigned in [Table + 4.7.6-A] are reserved for future use. + They should be set to zero in generated `class` files and should be + ignored by Java Virtual Machine implementations. + +... + + + #### **4.7.31 The `Preload` Attribute** {#jvms-4.7.31} :::inserted From kevinb at google.com Thu May 26 17:12:56 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 26 May 2022 10:12:56 -0700 Subject: Nullity (was: User model stacking: current status) In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> <66340928-4EB3-44BF-A72A-1DBEB5AED375@oracle.com> <98276008-57ed-5eb7-539e-2b966d7bde34@oracle.com> Message-ID: Returning to this thread and going up a level or two: The real impact of this discussion, imho, should not be "now let's rush a declarative nullness feature out asap", or even "let's solve bucket 3 now in a way nullness will have to be harmonious with later". What I humbly suggest it points to is, maybe: "let's shift focus right now to delivering just bucket 2 asap, so that we keep our options open longer for the rest". Is that fair? It seems like a very good plan to me. Bucket 2 is pretty non-invasive to the language model and still improves matters for Integer. Thoughts? On Thu, May 12, 2022 at 3:14 PM Kevin Bourrillion wrote: > I don't see the conflict. I'm saying, yeah, there *will* be exclamation > fatigue until a feature comes along eventually to relieve it. (In the worst > case, that's `public null-marked class...`; in the best case it's just > `language-level 22;` or what have you.) But I still think it's the right > thing to do anyway. > > > On Thu, May 12, 2022 at 10:18 AM Brian Goetz > wrote: > >> >> >> >> * Exclamation fatigue would be very real, so assume there is some way >> to make `!` the default for some scope >> > +1 >> > >> > Yes, I think it's a dead end to expect users to sprinkle '!' everywhere >> they don't want nulls?this is usually the informal default in common >> programming practice, so we need some way to enable flipping the default. >> >> On the other hand, this is on a collision course with Kevin's >> "ref-default" recommendation, which had many strong supporting reasons, >> whether this is spelled `!` or `.val`. 
The "but it will be tiring for >> people to type" doesn't feel like a very good reason to flip the default >> from something that has such strong objective justifications. >> >> (Dan was never sold on ref-default, but Kevin was, so I'll leave it to >> him to reconcile "ref-default is the right default" with "but, >> exclamation fatigue.") >> > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Thu May 26 17:19:57 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 26 May 2022 13:19:57 -0400 Subject: Nullity (was: User model stacking: current status) In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> <66340928-4EB3-44BF-A72A-1DBEB5AED375@oracle.com> <98276008-57ed-5eb7-539e-2b966d7bde34@oracle.com> Message-ID: <2c6b7895-4427-d561-6969-777f038a1c2c@oracle.com> I agree that Bucket 2 is largely uncontroversial (and largely implemented) and makes a sensible unit of delivery -- with the proviso that we need to properly message that it will not yet deliver the performance improvements that most users are hoping to get out of Valhalla. There'll be no heap flattening, and no user-definable primitives.? There'll be improved optimization for on-stack values (which will appear to most users as "better escape analysis"). That said, I don't think this reduces the urgency to find a bucket-3 *design* that we like. On 5/26/2022 1:12 PM, Kevin Bourrillion wrote: > Returning to this thread and going up a level or two: > > The real impact of this discussion, imho, should not be "now let's > rush a declarative nullness feature out asap", or even "let's solve > bucket?3 now in a way nullness will have to be harmonious with later". > What I humbly suggest it points to is,?maybe: "let's shift focus right > now to delivering just bucket 2 asap, so that we keep our options open > longer for the rest". Is that fair? It seems like a very good plan to > me. Bucket 2 is pretty non-invasive to the language model and still > improves matters for Integer. > > Thoughts? > > > > On Thu, May 12, 2022 at 3:14 PM Kevin Bourrillion > wrote: > > I don't see the conflict. I'm saying, yeah, there *will* be > exclamation fatigue until a feature?comes along eventually > to?relieve it. (In the worst case, that's `public null-marked > class...`; in the best case it's just `language-level 22;` or what > have you.) But I still think it's the right thing to do anyway. > > > On Thu, May 12, 2022 at 10:18 AM Brian Goetz > wrote: > > > > >> * Exclamation fatigue would be very real, so assume there > is some way to make `!` the default for some scope > > +1 > > > > Yes, I think it's a dead end to expect users to sprinkle '!' > everywhere they don't want nulls?this is usually the informal > default in common programming practice, so we need some way to > enable flipping the default. > > On the other hand, this is on a collision course with Kevin's > "ref-default" recommendation, which had many strong supporting > reasons, > whether this is spelled `!` or `.val`.? The "but it will be > tiring for > people to type" doesn't feel like a very good reason to flip > the default > from something that has such strong objective justifications. 
> > (Dan was never sold on ref-default, but Kevin was, so I'll > leave it to > him to reconcile "ref-default is the right default" with "but, > exclamation fatigue.") > > > > -- > Kevin Bourrillion?|?Java Librarian |?Google, Inc.?|kevinb at google.com > > > > -- > Kevin Bourrillion?|?Java Librarian |?Google, Inc.?|kevinb at google.com From kevinb at google.com Thu May 26 17:33:49 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 26 May 2022 10:33:49 -0700 Subject: We need help to migrate from bucket 1 to 2; and, the == problem In-Reply-To: References: <38DF0B35-3F89-484F-8A35-FF2F5924859C@oracle.com> <034E48A2-8AB2-4156-A30C-F6F79F8CABC3@oracle.com> Message-ID: I'd like to bump this thread, as it seems to me to be the biggest obstacle to bucket 2 being able to deliver value. * A warning not just on synchronization, but on *any* identity-dependence. * Not special for Integer etc.; it all needs to work through a general facility that anyone can use. * We don't need the constructor warnings, though. * The annotation should evoke the idea of "this class is becoming a bucket 2 class". * It would be vestigial once the class *is* bucket-2. * I would lean against enshrining the "value-based" terminology even further (we can get into this if necessary). * I think we need an explicit way to clearly and *intentionally* depend on identity. This code would *prefer to break* if the objects in use became bucket-2. e.g.: * o1.identity() == o2.identity() // I like this * System.identity(o1) == System.identity(o2) // this too * System.identityEquals(o1, o2) * o1 === o2 Thoughts? On Tue, Apr 26, 2022 at 3:09 PM Kevin Bourrillion wrote: > Above, when I said the proposed `==` behavior is "not a behavior that > anyone ever *actually wants* -- unless they just happen to have no fields > of reference types at all", I did leave out some other cases. Like when > your only field types (recursing down fields of value types) that are > reference types are types that don't override `equals()` (e.g. `Function`). > In a way this sort of furthers my argument that the boundary between when > `==` is safely an `equals` synonym and when it isn't is going to be > difficult to perceive. Yet, since people hunger for `==` to really mean > `equals`, they are highly overwhelmingly likely to do it as much as > possible whenever they are convinced it looks safe. And then one addition > of a string field in some leaf-level type can break a whole lot of code. > > > On Tue, Apr 26, 2022 at 2:53 PM Dan Smith wrote: > > Yes, a public annotation was the original proposal. At some point we >> scaled that back to just JDK-internal. The discussions were a long time >> ago, but if I remember right the main concern was that a formalized, Java >> SE notion of "value-based class" would lead to some unwanted complexity >> when we eventually get to *real* value classes (e.g., a misguided CS 101 >> course question: "what's the difference between a value-based class and a >> value class? which one should you use?"). >> > > Yeah, I hear that. The word "value" does have multiple confusable > meanings. I'd say the key difference is that "value semantics" are > logically a *recursive* rejection of identity, while a Valhalla B2/B3 class > on its own addresses only one level deep. > > Anyway, I think what I'm proposing avoids trouble by specifically labeling > one state as simply the transitional state to the other. I'm not sure > there'd be much to get hung up on. 
>
>
>> It seemed like producing some special warnings for JDK classes would
>> address the bulk of the problem without needing to fall into this trap.
>>
>
> I'd just say it addresses a more specific problem: how *those* particular
> classes can become B2/B3 (non-identity) classes.
>
>
>
>> Would an acceptable compromise be for a third-party tool to support its
>> own annotations, while also recognizing @jdk.internal.ValueBased as an
>> alternative spelling of the same thing?
>>
>
> I think it's "a" compromise :-), I will just have to work through how
> acceptable.
>
> Is there any such thing as a set of criteria for when a warning deserves
> to be handled by javac instead of left to all the world's aftermarket
> static analyzers to handle?
>
> (Secondarily... why are we warning only on synchronization, and not on
>> `==` or (marginal) `identityHC`?)
>>
>> I think this was simply not a battle that we wanted to fight --
>> discouraging all uses of '==' on type Integer, for example.
>>
>
> Who would be fighting the other side of that battle? Not anyone having
> some *need* to use `==` over `.equals()`, because we'll be breaking them
> when Integer changes buckets anyway. So... just the users saying "we should
> get to use `==` as a shortcut for `.equals()` as long as we stay within the
> cached range"? Oh, wait:
>
>
> Within these constraints, there are reasonable things that can be done
>> with '==', like optimizing for a situation where 'equals' is likely to be
>> true.
>>
>
> Ok, that too. Fair I suppose... it's just that it's such a very special
> case...
>
> -- 
> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com



-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From kevinb at google.com  Thu May 26 19:57:38 2022
From: kevinb at google.com (Kevin Bourrillion)
Date: Thu, 26 May 2022 12:57:38 -0700
Subject: We need help to migrate from bucket 1 to 2; and, the == problem
In-Reply-To: 
References: <38DF0B35-3F89-484F-8A35-FF2F5924859C@oracle.com>
 <034E48A2-8AB2-4156-A30C-F6F79F8CABC3@oracle.com>
Message-ID: 

On Thu, May 26, 2022 at 10:57 AM Dan Heidinga wrote:

This will have high costs in the regular performance model as it will
>

Sorry, I should have mentioned up front that we'd have to be content with
only the warnings we can spot at compile-time. I also should have been
clear that none of this is super well thought-out yet; I'm just hoping to
get a conversation going.


> * I think we need an explicit way to clearly and *intentionally* depend
> on identity. This code would *prefer to break* if the objects in use became
> bucket-2. e.g.:
>
> * o1.identity() == o2.identity() // I like this
>
> * System.identity(o1) == System.identity(o2) // this too
>
> Are these marker methods? What would they return?

In the name of including a wide range of possibilities, I included some
crazy ones that almost certainly won't pan out. I should have thought more
and ruled them out before sending. I like them "syntactically", supporting
the notion that an object's identity is like an attribute, but to expose
that identity as a type of its own will not make sense.

Thanks for responding.


> --Dan
>
> >
> > Thoughts?
> >
> >
> > On Tue, Apr 26, 2022 at 3:09 PM Kevin Bourrillion
> wrote:
> >>
> >> Above, when I said the proposed `==` behavior is "not a behavior that
> anyone ever *actually wants* -- unless they just happen to have no fields
> of reference types at all", I did leave out some other cases.
Like when > your only field types (recursing down fields of value types) that are > reference types are types that don't override `equals()` (e.g. `Function`). > In a way this sort of furthers my argument that the boundary between when > `==` is safely an `equals` synonym and when it isn't is going to be > difficult to perceive. Yet, since people hunger for `==` to really mean > `equals`, they are highly overwhelmingly likely to do it as much as > possible whenever they are convinced it looks safe. And then one addition > of a string field in some leaf-level type can break a whole lot of code. > >> > >> > >> On Tue, Apr 26, 2022 at 2:53 PM Dan Smith > wrote: > >> > >>> Yes, a public annotation was the original proposal. At some point we > scaled that back to just JDK-internal. The discussions were a long time > ago, but if I remember right the main concern was that a formalized, Java > SE notion of "value-based class" would lead to some unwanted complexity > when we eventually get to *real* value classes (e.g., a misguided CS 101 > course question: "what's the difference between a value-based class and a > value class? which one should you use?"). > >> > >> > >> Yeah, I hear that. The word "value" does have multiple confusable > meanings. I'd say the key difference is that "value semantics" are > logically a *recursive* rejection of identity, while a Valhalla B2/B3 class > on its own addresses only one level deep. > >> > >> Anyway, I think what I'm proposing avoids trouble by specifically > labeling one state as simply the transitional state to the other. I'm not > sure there'd be much to get hung up on. > >> > >> > >>> > >>> It seemed like producing some special warnings for JDK classes would > address the bulk of the problem without needing to fall into this trap. > >> > >> > >> I'd just say it addresses a more specific problem: how *those* > particular classes can become B2/B3 (non-identity) classes. > >> > >> > >>> > >>> Would an acceptable compromise be for a third-party tool to support > its own annotations, while also recognizing @jdk.internal.ValueBased as an > alternative spelling of the same thing? > >> > >> > >> I think it's "a" compromise :-), I will just have to work through how > acceptable. > >> > >> Is there any such thing as a set of criteria for when a warning > deserves to be handled by javac instead of left to all the world's > aftermarket static analyzers to handle? > >> > >>> (Secondarily... why are we warning only on synchronization, and not on > `==` or (marginal) `identityHC`?) > >>> > >>> I think this was simply not a battle that we wanted to > fight?discouraging all uses of '==' on type Integer, for example. > >> > >> > >> Who would be fighting the other side of that battle? Not anyone having > some need to use `==` over `.equals()`, because we'll be breaking them when > Integer changes buckets anyway. So... just the users saying "we should get > to use `==` as a shortcut for `.equals()` as long as we stay within the > cached range"? Oh, wait: > >> > >> > >>> Within these constraints, there are reasonable things that can be done > with '==', like optimizing for a situation where 'equals' is likely to be > true. > >> > >> > >> Ok, that too. Fair I suppose... it's just that it's such a very special > case... > >> > >> -- > >> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > > > > > > > > -- > > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > > -- Kevin Bourrillion | Java Librarian | Google, Inc. 
| kevinb at google.com
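
To make the `==` hazard discussed in this last thread concrete, here is a
minimal sketch written against the proposed `value class` syntax. It assumes
a Valhalla early-access build (it will not compile on a released JDK), and
the `Name` class, its field, and its `equals` override are invented purely
for illustration:

```
// Hypothetical example; `value` classes are a Valhalla proposal, not released Java.
value class Name {
    final String text;   // a reference-typed field is what creates the hazard

    Name(String text) { this.text = text; }

    @Override
    public boolean equals(Object o) {
        return o instanceof Name other && text.equals(other.text);
    }

    @Override
    public int hashCode() { return text.hashCode(); }
}

class EqualsVersusAcmp {
    public static void main(String[] args) {
        // Equal contents, but two distinct String instances.
        Name n1 = new Name(new String("Ada"));
        Name n2 = new Name(new String("Ada"));

        // == on value objects compares field values; the String field is a
        // reference, so it is compared by reference, not by contents.
        System.out.println(n1 == n2);       // false
        System.out.println(n1.equals(n2));  // true

        // With interned literals the String references coincide, so == happens
        // to agree with equals() -- until a field changes in some leaf class.
        Name n3 = new Name("Ada");
        Name n4 = new Name("Ada");
        System.out.println(n3 == n4);       // true, but only by accident
    }
}
```

This is the trap described above: code that uses `==` as an `equals()`
shortcut keeps working right up until a reference-typed field is added or
stops being shared in some leaf-level value class.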