From john.r.rose at oracle.com Sun May 1 03:22:20 2022
From: john.r.rose at oracle.com (John Rose)
Date: Sat, 30 Apr 2022 20:22:20 -0700
Subject: [External] : Re: User model stacking
In-Reply-To: <8C6F4291-889A-4A61-9872-8476F9ABAEEA@oracle.com>
References: <8C6F4291-889A-4A61-9872-8476F9ABAEEA@oracle.com>
Message-ID: <55B81113-FF77-4C1F-BBAE-6E680DAC5B7D@oracle.com>

On 27 Apr 2022, at 16:12, Brian Goetz wrote:

> We can divide the VM flattening strategy into three rough categories (would you like some milk with your eclair?):
>
> - non-flat -- use a pointer
> - full-flat -- inline the layout into the enclosing container, access with narrow loads
> - low-flat -- use some combination of atomic operations to cram multiple fields into 64 or 128 bits, access with wide loads

There's another kind of strategy here; call it "fat-flat". That would encompass any hardware and/or software transactional memory mechanism that uses storage of more than 64 bits. I think all such techniques include a fast and slow path, which means unpredictable performance. Such techniques usually require "slack" of some sort in the data structure, either otherwise unencoded states (like pseudo-oops) or extra words (injected STM headers).

This is not completely off the table, because (remember) we are often going to inject an extra word just to represent the null state. In for a penny, in for a pound: If we add a word to encode the null state, it can also encode an inflated "synchronized access" state. That's part of the "VM physics" that Dan is asking about.

> B1 will always take the non-flat strategy. Non-volatile B3 that are smaller than some threshold (e.g., full cache line) will prefer the full-flat strategy. Non-atomic B2 can also pursue the full-flat strategy, but may have an extra field for the null channel. Atomic B2/B3 may try the low-flat strategy, and fall back to non-flat where necessary. Volatiles will likely choose non-flat, unless they fit in the CAS window. But it is always VM's choice.

A fat-flat strategy can cover atomic B2/B3, even volatiles.

Thing to remember: Even if a class designer selects the non-atomic option, a use-site volatile annotation surely overrides that. A non-atomic B2 is a funny type: It is usually non-atomic, except for volatile variables. That suggests to me there's a hole in the user model, a way to select atomic-but-not-volatile use sites (variables and array elements, in particular) for non-atomic B2's.

From brian.goetz at oracle.com Tue May 3 17:56:04 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 3 May 2022 13:56:04 -0400
Subject: Null channels (was: User model stacking)
In-Reply-To: References: <560C92D3-ED77-4CB6-837A-A87FC6FC22D7@oracle.com> <0FDEFA76-E212-4636-9E64-A603F703D0A5@oracle.com> Message-ID:

About six months ago we started working on flattening references in calling conventions in the Valhalla repos. We use the Preload attribute to force preloading of classes that are known to be (or expected to be) value classes, but which are referenced only via L descriptors, so that at the (early) time that the calling convention is chosen, we have the additional information that this is an identity-free class. In these cases, we scalarize the calling convention as we do with Q types, but we add an extra boolean channel for null; it is as if we add a boolean field to the object layout. When we adapt between the scalarized and indirected forms (e.g., c2i adapters), we apply the obvious semantics to the null channel.
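To make the "extra boolean channel" concrete, here is a minimal sketch in today's Java of what the scalarized form and its adapters logically do. The names (MyDate, ScalarizedDate) are hypothetical stand-ins; the real mechanism lives in the JIT's calling convention and the c2i adapters, not in user-visible code.

    // A value-class-like carrier, referenced via an L descriptor.
    record MyDate(int year, int month, int day) { }

    // The logical shape of the scalarized calling convention: the fields travel
    // as scalars, plus one extra boolean "null channel".
    record ScalarizedDate(int year, int month, int day, boolean nonNull) {

        // Adapter: indirect (reference) form -> scalarized form.
        static ScalarizedDate fromReference(MyDate ref) {
            return ref == null
                    ? new ScalarizedDate(0, 0, 0, false)   // null: field values are don't-cares
                    : new ScalarizedDate(ref.year(), ref.month(), ref.day(), true);
        }

        // Adapter: scalarized form -> indirect (reference) form.
        MyDate toReference() {
            return nonNull ? new MyDate(year, month, day) : null;
        }
    }

The heap-layout analogue of the same idea is the injected boolean field described next, which the VM is free to encode more cheaply when it can find slack bits.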
We have not yet applied the same treatment to field layout, but we can (and it has the same timing constraints, so it also needs Preload), and the VM has additional degrees of implementation freedom in doing so.? The simplest is to let the layout engine choose to flatten a preloaded L value type by injecting a boolean field which represents nullity, and adapting null checks to check this field (which can be hoisted etc.) The layout engine has other tricks available to it as well, to further reduce the footprint of representing "might be null", if it can find suitable slack space in the representation.? Such tricks could include using slack bits in boolean fields (potentially seven of them), low order bits of pointers (a la compressed OOPs), unused color bits of 64 bit pointers, etc.? Some of these choices require transforms on load/store (e.g., those that use pointer bits), not unlike what we do with compressed OOPs.? This is entirely "VM's choice" and affects only quality of implementation; there is nothing in the classfile that conditions this, other than the ACC_VALUE indication and L/Q type carriers.? So the VM has a rich set of footprint/computation tradeoffs for encoding the null channel, but logically, it is an "extra boolean field" that all nullable value types have. > I'd like to reserve judgement on this stacking as I'm uncomfortable > (uncertain maybe?) about the practicality of the extra null channel. > Without having validated the extra null channel, I'm concerned we're > exposing a broader set of options in the language that will, in > practice, map down to the existing 3 buckets we've been talking about. > Maybe this factoring allows a slightly larger number of classes to be > flattened or leaves the door open for them to get it in the future? What I'm trying to do here is decomplect flattening from nullity. Right now, we have an unfortunate interaction which both makes certain combinations impossible, and makes the user model harder to reason about. Identity-freedom unlocks flattening in the stack (calling convention.)? The lesson of that exercise (which was somewhat surprising, but good) is that nullity is mostly a non-issue here -- we can treat the nullity information as just being an extra state component when scalarizing, with some straightforward fixups when we adapt between direct and indirect representations.? This is great, because we're not asking users to choose between nullability and flattening; users pick the combination of { identity, nullability } they want, and they get the best flattening we can give: ??? case (identity, _) -> 1; // no flattening ??? case (non-identity, non-nullable) -> nFields;? // scalarize fields ??? case (non-identity, nullable) -> nFields + 1;? // scalarize fields with extra null channel Asking for nullability on top of non-identity means only that there is a little more "footprint" in the calling convention, but not a qualitative difference.? That's good. In the heap, it is a different story.? What unlocks flattening in the heap (in addition to identity-freedom) is some permission for _non-atomicity_ of loads and stores.? For sufficiently simple classes (e.g., one int field) this is a non-issue, but because loads and stores of references must be atomic (at least, according to the current JMM), references to wide values (B2 and B3.ref) cannot be flattened as much as B3.val.? 
There are various tricks we can do (e.g., stuffing two 32 bit fields into a 64 bit atomic) to increase the number of classes that can get good flattening, but it hits a wall much faster than "primitives". What I'd like is for the flattening story on the heap and the stack to be as similar as possible.? Imagine, for a moment, that tearing was not an issue.? Then where we would be in the heap is the same story as above: no flattening for identity classes, scalarization in the heap for non-nullable values, and scalarization with an extra boolean field (maybe, same set of potential optimizations as on the stack) for nullable values.? This is very desirable, because it is so much easier to reason about: ?- non-identity unlocks scalarization on the stack ?- non-atomicity unlocks flattening in the heap ?- in both, ref-ness / nullity means maybe an extra byte of footprint compared to the baseline (with additional opportunistic optimizations that let us get more flattening / better footprint in various special cases, such as very small values.) > In previous discussions around the extra null channel for flattened > values, we were really looking at narrowly applicable optimization - > basically for nullable values that would fit within 64bits. With this > stacking, and the info about intel allowing atomicity up to 128bits, > the extra null channel becomes more widely applicable. Yes.? What I'm trying to do is separate this all from the details of what instructions CPU X has, and instead connect optimizations to semantics: nullity requires extra footprint (unless it can be optimized away by stealing bits somehow), and does so uniformly across the buckets / heap / stack / whatever.? Nullability is a semantic property; providing this property may have a cost, but the more uniform we can make it, the simpler it is to reason about, and the simpler to implement (since we can use the same encoding tricks in both stack and heap.) > Some of my hesitation comes from experiences writing structs or > multi-field invariants in C where memory barriers and careful > read/write protocols are important to ensure consistent data in the > face of races. Widening the set of cases that have a multifield > invariant *created and enforced by the VM* by adding an additional > null channel will make it more likely the VM (and optimized jit code!) > can do the wrong thing. Yes, this is why I want to bring it into the programming model.? I don't want to magically analyze the constructor and say "whoa, that looks like a cross-field invariant"; I want the class author to say "you have permission to shred" or "you do not have permission to shred", and we optimize within the semantic properties declared by the author. In addition to cross-field invariants being part of the boundary between whether or not we need atomicity, transparency also comes into play.? When we "construct" a long, we have a pretty clear idea how the value maps to all the bits; with encapsulation, we do not (but for records, we do again, because we've constrained away the ability to let representation diverge from interface.)? Again, though, I think we are better off having the author declare the required atomicity properties rather than trying to derive them from other things (e.g., constructor body, record-ness, etc.) > I have always been somewhat uneasy about the injected nullchannel > approach and concerned about how difficult it will be for service > engineers to support when something goes wrong. 
> If there's experience that can be shared that shows this works well in an implementation, then I'll be less concerned.

Perhaps Tobias and Frederic can share more about what we've discovered here?

From brian.goetz at oracle.com Tue May 3 22:17:29 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 3 May 2022 18:17:29 -0400
Subject: User model stacking
In-Reply-To: <4F70E6B7-FAC8-4845-8969-8D545B6FB4FB@oracle.com>
References: <4F70E6B7-FAC8-4845-8969-8D545B6FB4FB@oracle.com>
Message-ID:

> Just so we don't lose this history, a reminder that back when we settled on the 3 buckets, we viewed it as a useful simplification from a more general approach with lots of "knobs". Instead of asking developers to think about 3-4 mostly-orthogonal properties and set them all appropriately, we preferred a model in which *objects* and *primitive values* were distinct entities with distinct properties. Atomicity, nullability, etc., weren't extra things to have to reason about independently, they were natural consequences of what it meant to be (or not) a variable that stores objects.

Indeed; it is often a process of "spiraling", where we seem to return to places we've already been, but perhaps in a lower energy state. We came by the earlier bucket model honestly, as it approximated the use cases we envisioned as most important. I think it's time to rethink the three-bucket model, not because three is too big or small a number, but because (a) the relationship between the buckets is complex, (b) it puts users to some difficult choices between semantics and performance, and (c) we have real concerns that hiding the permission to tear behind some proxy (e.g., "non-null" or "B3") will be too subtle and potentially astonishing.

> That was awhile ago, we may have learned some things since then, but I think there's still something to the idea that we can expect everybody to understand the difference between objects and primitives, even if they don't totally understand all the implications. (When they eventually discover some corner of the implications, we hope they'll say, "oh, sure, that makes sense because this is/isn't an object.")

I think this is true for all the aspects _except_ tearing. I tried the argument "it can tear because it's not an object" on for size, and I just can't imagine people not forgetting it routinely.

> My inclination would probably be to abandon the object/value dichotomy, revert to "everything is an object", perhaps revisit our ideas about conversions/subtyping between ref and val types, and develop a model that allows tearing of some objects. Probably all do-able, but I'm not sure it's a better model.

I don't think we have to go so far as this. Just as Valhalla questions the previously-universal property of "all objects have identity", we can play the same game with "all objects provide integrity guarantees" (final field semantics.) Some classes can shed identity; some further can shed the integrity requirements. (Both require a judgement on the part of the class author.) We can then optimize accordingly.

By factoring out atomicity/integrity as an orthogonal semantic constraint, we get to a lower energy state for B2 vs B3: "does this class have a good zero". Complex does; LocalDate does not. And we get to a simpler performance consequence of B3.ref vs B3.val: at most an extra bit of footprint. These are both easier to understand.
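As a plain-Java illustration of the "good zero" question (the record names below are stand-ins, not proposed API): the all-zeros bit pattern is what a flattened, non-nullable slot contains before anything has been written to it, so the class author has to decide whether that pattern is a legitimate instance.

    // For a Complex-like class, the all-zeros instance is a perfectly good value.
    record Complex(double re, double im) { }

    // For a date-like class, the all-zeros instance ("year 0, month 0, day 0")
    // is not a value the domain admits, which is why such a class would rather
    // stay nullable/reference-only than expose a bad default.
    record Ymd(int year, int month, int day) { }

    class GoodZeroDemo {
        public static void main(String[] args) {
            System.out.println(new Complex(0.0, 0.0));  // fine as a default: 0.0 + 0.0i
            System.out.println(new Ymd(0, 0, 0));       // constructible here, but meaningless as a date
        }
    }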
From forax at univ-mlv.fr Tue May 3 22:52:21 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 4 May 2022 00:52:21 +0200 (CEST) Subject: Null channels (was: User model stacking) In-Reply-To: References: <560C92D3-ED77-4CB6-837A-A87FC6FC22D7@oracle.com> <0FDEFA76-E212-4636-9E64-A603F703D0A5@oracle.com> Message-ID: <738292834.20526867.1651618341891.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "Brian Goetz" [...] > > What I'm trying to do here is decomplect flattening from nullity. Right > now, we have an unfortunate interaction which both makes certain > combinations impossible, and makes the user model harder to reason about. > > Identity-freedom unlocks flattening in the stack (calling convention.) > The lesson of that exercise (which was somewhat surprising, but good) is > that nullity is mostly a non-issue here -- we can treat the nullity > information as just being an extra state component when scalarizing, > with some straightforward fixups when we adapt between direct and > indirect representations.? This is great, because we're not asking users > to choose between nullability and flattening; users pick the combination > of { identity, nullability } they want, and they get the best flattening > we can give: > > ??? case (identity, _) -> 1; // no flattening > ??? case (non-identity, non-nullable) -> nFields;? // scalarize fields > ??? case (non-identity, nullable) -> nFields + 1;? // scalarize fields > with extra null channel > > Asking for nullability on top of non-identity means only that there is a > little more "footprint" in the calling convention, but not a qualitative > difference.? That's good. > > In the heap, it is a different story.? What unlocks flattening in the > heap (in addition to identity-freedom) is some permission for > _non-atomicity_ of loads and stores.? For sufficiently simple classes > (e.g., one int field) this is a non-issue, but because loads and stores > of references must be atomic (at least, according to the current JMM), > references to wide values (B2 and B3.ref) cannot be flattened as much as > B3.val.? There are various tricks we can do (e.g., stuffing two 32 bit > fields into a 64 bit atomic) to increase the number of classes that can > get good flattening, but it hits a wall much faster than "primitives". > > What I'd like is for the flattening story on the heap and the stack to > be as similar as possible.? Imagine, for a moment, that tearing was not > an issue.? Then where we would be in the heap is the same story as > above: no flattening for identity classes, scalarization in the heap for > non-nullable values, and scalarization with an extra boolean field > (maybe, same set of potential optimizations as on the stack) for > nullable values.? This is very desirable, because it is so much easier > to reason about: > > ?- non-identity unlocks scalarization on the stack > ?- non-atomicity unlocks flattening in the heap > ?- in both, ref-ness / nullity means maybe an extra byte of footprint > compared to the baseline > > (with additional opportunistic optimizations that let us get more > flattening / better footprint in various special cases, such as very > small values.) yes, choosing (non-)identity x (non-)nullability x (non-)atomicity at declaration site makes the performance model easier to understand. At declaration site, there are still nullability x atomicity with .ref and volatile respectively. I agree with John that being able to declare array items volatile is missing but i believe it's an Array 2.0 feature. 
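To sketch the two layers being distinguished here, in the provisional spellings used elsewhere in this thread ("value", "non-atomic", ".ref"/".val" are all still under discussion, and none of this compiles with any shipping javac):

    // Declaration site: the author chooses identity, nullability of the val
    // projection, and atomicity once, for every use of the class.
    non-atomic value class Complex {
        double re, im;
    }

    // Use sites: individual variables can still tighten things back up.
    class Holder {
        Complex.val flat;           // flattened, non-nullable
        Complex.ref maybeNull;      // nullable, reference-shaped
        volatile Complex.val safe;  // volatile restores atomicity for this one field,
                                    // per John's point that use-site volatile overrides
                                    // a non-atomic declaration
    }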
Once we get universal generics, what we win is that not only ArrayList is compact on heap but ArrayList too. R?mi From brian.goetz at oracle.com Wed May 4 14:27:52 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 4 May 2022 10:27:52 -0400 Subject: User model: ref as default, vs universal generics Message-ID: <83ace890-ab9d-f3ac-b8cf-9300b33f08e2@oracle.com> Just to record a constraint: there's somewhat of a conflict between the idea of "make ref the default", as Kevin advocated, and universal generics, which we need to keep in mind as we stack the whole tower. If a B3 class gives us Foo and Foo.val, then Map::get (currently) has no way to declare its return value as "ref T". The plan of record has been: ??? V.ref get(K key) but if V.ref is not denotable, we have a problem.? That means we can't *just* have Foo and Foo.val; we need at least to be able to say T.ref for type variables, if not Foo.ref for all B3 classes. If we can manage to use T!, then this is an obvious application for T?, but this approach brings new questions. From daniel.smith at oracle.com Wed May 4 14:31:45 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 4 May 2022 14:31:45 +0000 Subject: EG meeting, 2022-05-04 Message-ID: EG Zoom meeting today at 4pm UTC (9am PDT, 12pm EDT). We've had a flurry of activity in the last couple of weeks. I think we can summarize as follows: - "Spec change documents for Value Objects": revised JVMS to align with previous discussions about Value Objects, and a new JLS changes document to match - "We need help to migrate from bucket 1 to 2; and, the == problem": Kevin asked about JEP 390 applying to non-JDK classes, and about whether javac should discourage use of '==' - "Foo / Foo.ref is a backward default": Kevin and Brian argued that we should prefer treating B3 classes as reference-default, with something like '.val' to opt in to a primitive value type - "User model stacking": Brian discussed treating atomicity as an orthogonal property, no longer tied to B3 From brian.goetz at oracle.com Wed May 4 15:05:24 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 4 May 2022 11:05:24 -0400 Subject: User model: terminology Message-ID: Let's talk about terminology.? (This is getting dangerously close to a call-for-bikeshed, so let's exercise restraint.) Currently, we have primitives and classes/references, where primitives have box/wrapper reference companions.? The original goal of Bucket 3 was to model primitive/box pairs. We have tentatively been calling these "primitives", but there are good arguments why we should not overload this term. We have tentatively assigned the phrase "value class" to all identity-free classes, but it is also possible we can use value to describe what we've been calling primitives, and use something else (identity-free, non-identity) to describe the bigger family. So, in our search for how to stack the user model, we should bear in mind that names that have been tentatively assigned to one thing might be a better fit for something else (e.g., the "new primitives").? We are looking for: ?- A term for all non-identity classes.? (Previously, all classes had identity.) - A term for? what we've been calling atomicity: that instances cannot appear to be torn, even when published under race.? (Previously, all classes had this property.) ?- A term for those non-identity classes which do not _require_ a reference.? These must have a valid zero, and give rise to two types, what we've been calling the "ref" and "val" projections. 
- A term for what we've been calling the "ref" and "val" projections.

Let's start with _terms_, not _declaration syntax_.

From forax at univ-mlv.fr Wed May 4 15:44:09 2022
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 4 May 2022 17:44:09 +0200 (CEST)
Subject: User model: terminology
In-Reply-To: References: Message-ID: <589500778.20959312.1651679049275.JavaMail.zimbra@u-pem.fr>

> From: "Brian Goetz"
> To: "valhalla-spec-experts"
> Sent: Wednesday, May 4, 2022 5:05:24 PM
> Subject: User model: terminology

> Let's talk about terminology. (This is getting dangerously close to a call-for-bikeshed, so let's exercise restraint.)

> Currently, we have primitives and classes/references, where primitives have box/wrapper reference companions. The original goal of Bucket 3 was to model primitive/box pairs. We have tentatively been calling these "primitives", but there are good arguments why we should not overload this term.

> We have tentatively assigned the phrase "value class" to all identity-free classes, but it is also possible we can use value to describe what we've been calling primitives, and use something else (identity-free, non-identity) to describe the bigger family.

> So, in our search for how to stack the user model, we should bear in mind that names that have been tentatively assigned to one thing might be a better fit for something else (e.g., the "new primitives"). We are looking for:

> - A term for all non-identity classes. (Previously, all classes had identity.)

I've used the term "immediate": immediate object vs reference object.

> - A term for what we've been calling atomicity: that instances cannot appear to be torn, even when published under race. (Previously, all classes had this property.)

As you said, the default should be non-tearable. I believe that we should use a term that indicates that the object is composed of several values, a term like "compound", "composite" or perhaps "aggregate". I think I prefer "compound" due to its Latin root.

The other solution, instead of saying that it's non-tearable by default, is to force users to always use a keyword to indicate the "atomicity" state: (non-)splittable, (non-)secable (secable is more or less the Latin equivalent of the Greek atomic).

> - A term for those non-identity classes which do not _require_ a reference. These must have a valid zero, and give rise to two types, what we've been calling the "ref" and "val" projections.

I like "zero-default" (as the opposite of null-default), but mostly because it's a valid hyphenated keyword.

> - A term for what we've been calling the "ref" and "val" projections.

Technically, what we called the ref projection is now a nullable projection; we are adding null into the set of possible values.

> Let's start with _terms_, not _declaration syntax_.

Rémi

From brian.goetz at oracle.com Wed May 4 17:32:39 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 4 May 2022 13:32:39 -0400
Subject: [External] : Re: User model: terminology
In-Reply-To: <589500778.20959312.1651679049275.JavaMail.zimbra@u-pem.fr>
References: <589500778.20959312.1651679049275.JavaMail.zimbra@u-pem.fr>
Message-ID:

> - A term for those non-identity classes which do not _require_ a reference. These must have a valid zero, and give rise to two types, what we've been calling the "ref" and "val" projections.
>
> I like "zero-default" (as the opposite of null-default), but mostly because it's a valid hyphenated keyword.
In addition to staying away from declaration syntax for purposes of this thread, let's also stay away from defaults, so we can stay focused on concepts. "Default is zero" as a concept is part of the story, but I worry it may be the dependent part.? Because before you can get to "default is zero", you need a way to say "has a sensible zero".? For LocalDate, the zero is not sensible (well, it could be, but 1970 is a pretty lousy zero value), whereas for Complex, zero is not only valid, but is arguably a great value.? This is a semantic statement about the domain. For a "has no sensible zero" type, the only choice is a reference, which brings its own default -- null.? So "has a sensible zero" gates "has a val projection", but does not yet say anything about which (ref/val) is the default. It's nice to say "zero" directly, but I'm not sure it says what we mean by "zero-default", since the default of the ref projection is null, like all other refs.?? Obviously the example I had in my earlier mail ("zero-happy") are silly and were meant only to be evocative. So what we're looking for is a word for "the zero value is good, so the concept of a non-nullable instance makes sense". Which brings me to another observation: this is a different sense of non-nullable than what we might mean by: ??? void foo(String! s) { ... } Because, a Foo.val is a type we can use as, say, an array component type (because the zero is valid), but the traditional interpretation of `Foo!` makes it ineligible for use as an array component (and probably a field), because references in the heap are null-default. So when we talk about non-nullable instances, we're really saying "there is *another* good default other than null." From kevinb at google.com Wed May 4 18:27:38 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 4 May 2022 11:27:38 -0700 Subject: User model: terminology In-Reply-To: References: Message-ID: My favorite kind of thread... At the risk of inducing groans, a reminder that much of my own terminology backstory is found in Data in Java Programs , and that when there are places we disagree below, the disagreement is probably highlighted by something in *that* document first. Since for all we know it might be out of step with how typical Java devs really think, I'll just mention that it's at least been well-received by reddit twice (if reddit can find something to complain about, they usually do!). On Wed, May 4, 2022 at 8:05 AM Brian Goetz wrote: Currently, we have primitives and classes/references, where primitives have > box/wrapper reference companions. The original goal of Bucket 3 was to > model primitive/box pairs. We have tentatively been calling these > "primitives", but there are good arguments why we should not overload this > term. > > We have tentatively assigned the phrase "value class" to all identity-free > classes, but it is also possible we can use value to describe what we've > been calling primitives, and use something else (identity-free, > non-identity) to describe the bigger family. > > So, in our search for how to stack the user model, we should bear in mind > that names that have been tentatively assigned to one thing might be a > better fit for something else (e.g., the "new primitives"). We are looking > for: > > - A term for all non-identity classes. (Previously, all classes had > identity.) > The term applies to the objects first and foremost. The object either has identity or does not. What *is* identity? 
I'll claim it's exactly like an ordinary immutable field-based property, with one special provision: it is *always* auto-assigned to be unique, and thus can never be copied. That feels to me like it tells the whole story. So the difference between these kinds of objects is exactly a "with identity" / "without identity" distinction, and as we know from interface naming ("HasFoo"), it is often impossible to turn that into adjective form. The second complication here is the backward default. *Having* identity is actually the special property! I do think we should lean into that. Part of upgrading your code to be "Java 21-modern" (or whatever) really should be marking all your classes that you really *want* to have identity and letting the rest lose it. The terms that feel right are "identity object" and "class that produces identity objects" shortened to "identity class". For the most part I think we'll end up talking about "identity classes" and "classes in general", and more rarely needing to refer to "classes without identity" or "non-identity classes". So I think it's okay to let them use "A \ B"-style terminology as I've done here. (I furthermore still think it's okay to have an IdentityObject interface but no ValueObject interface, as the latter doesn't really embody additional client-facing capabilities.) This is one of at least four examples of backward defaults in the language. We are either stuck with painful/awkward terminology choices in all of them, or we could pursue the idea of letting source files declare their language level, upon which the problem vanishes. > - A term for what we've been calling atomicity: that instances cannot > appear to be torn, even when published under race. (Previously, all > classes had this property.) > I think this term we really need is this one's negation. You never need to (or can) mention it with identity classes; with the rest you can use it to opt into more risk. The English words that come to mind are https://www.thesaurus.com/browse/fragile. > - A term for those non-identity classes which do not _require_ a > reference. These must have a valid zero, and give rise to two types, what > we've been calling the "ref" and "val" projections. > I think we need to name the *type* first before the class. Today we have 1. primitive types (the values are the instances) 2. reference types (the values are references to the instances) But this isn't the *heart* of what it means to be "primitive"; it just happens to be true of primitives so far. And sure, we'll certainly explain all of this *partly* by saying these types are "primitive-LIKE". But what is the quality that they and true primitive types have in common? It's "the values are the instances", so this can either lead to "value type" or go back to "inline/direct/immediate type". At this moment I like both "value type" and "inline type" well enough. Value *is* overloaded, to be sure, because of "value semantics" (aka why AutoValue is called AutoValue). But the connection is strong enough imho. I can delve deep into this topic if desired. Then, back to your question, what is the name for a *class* that *also* gives rise to a value type -- a "valuable class"? > - A term for what we've been calling the "ref" and "val" projections. > Note I think we should only invoke the concept of "projection" once we get into type variables. Otherwise we simply have two types for one class. (And the reason for that is very solid / easy to defend, just by appealing to how we'd've preferred int and Integer had worked.) 
I would just call them the reference type and the (name debated just above) type, simple as that. > Let's start with _terms_, not _declaration syntax_. > Yes, and even the term we like 2nd best for a thing can still be useful in the documentation of that thing. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From kevinb at google.com Wed May 4 18:36:27 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 4 May 2022 11:36:27 -0700 Subject: EG meeting, 2022-05-04 In-Reply-To: References: Message-ID: I wish I hadn't missed this meeting, but I was still paying the consequences for a bad decision to take an "overnight layover" coming home Monday night/Tuesday morning. On Wed, May 4, 2022 at 7:31 AM Dan Smith wrote: > > - "We need help to migrate from bucket 1 to 2; and, the == problem": Kevin > asked about JEP 390 applying to non-JDK classes, and about whether javac > should discourage use of '==' > I will try to pitch this `obj==` problem more comprehensively soon. > - "Foo / Foo.ref is a backward default": Kevin and Brian argued that we > should prefer treating B3 classes as reference-default, with something like > '.val' to opt in to a primitive value type > I will say that I have not personally found the opposition to this change to be nearly as strong as the principal arguments in favor. It creates a very valuable uniformity in how things work. I hope it goes this way. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Wed May 4 19:01:17 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 4 May 2022 15:01:17 -0400 Subject: [External] : Re: User model: terminology In-Reply-To: References: Message-ID: > > What *is* identity? I'll claim it's exactly like an ordinary immutable > field-based property, with one special provision: it is *always* > auto-assigned to be unique, and thus can never be copied. That feels > to me like it tells the whole story. So the difference between these > kinds of objects is exactly a "with identity" / "without identity" > distinction, and as we know from interface naming ("HasFoo"), it is > often impossible to turn that into adjective form. There's an interesting parallel with nullability here, where nullability is also like an immutable field-based property, which is automatically checked before accessing other fields. > The second complication here is the backward default. *Having* > identity is actually?the special property! I do think we should lean > into that. Part of upgrading your code to be "Java 21-modern" (or > whatever) really should be marking all your classes that you really > *want* to have identity and letting the rest lose it. The terms that > feel right are "identity object" and "class that produces identity > objects" shortened to "identity class". In addition to having picked a few wrong defaults in the past, we have also committed the sin of not making both states denotable; there are no keywords for the opposite of static, abstract, or final, or for package-private access.? (Part of the motivation for putting the non-X stake in the ground that did with non-sealed is to provide an easy extension to non-abstract, non-final, non-static, if we later want.)? Not being able to denote "identity class" except by the absence of some other keywords would be another instance of that. > - A term for what we've been calling atomicity: that instances > cannot appear to be torn, even when published under race. 
> (Previously, all classes had this property.)
>
> I think this term we really need is this one's negation. You never need to (or can) mention it with identity classes; with the rest you can use it to opt into more risk. The English words that come to mind are https://www.thesaurus.com/browse/fragile.

"Fragile" certainly will make people think twice about using it (and is effective estoppel against "but something bad happened").

> - A term for those non-identity classes which do not _require_ a reference. These must have a valid zero, and give rise to two types, what we've been calling the "ref" and "val" projections.
>
> I think we need to name the *type* first before the class. Today we have
>
> 1. primitive types (the values are the instances)
> 2. reference types (the values are references to the instances)
>
> But this isn't the *heart* of what it means to be "primitive"; it just happens to be true of primitives so far. And sure, we'll certainly explain all of this *partly* by saying these types are "primitive-LIKE". But what is the quality that they and true primitive types have in common? It's "the values are the instances", so this can either lead to "value type" or go back to "inline/direct/immediate type".

You can make a good argument that this is where we should use the V-word (primitives are value types, as are the val projection of B3 classes), and come up with a better name for the whole B2/B3 spectrum (such as non-identity classes.) It connects to why we chose value in the first place -- to evoke "passed by value".

> Then, back to your question, what is the name for a *class* that *also* gives rise to a value type -- a "valuable class"?
>
> - A term for what we've been calling the "ref" and "val" projections.
>
> Note I think we should only invoke the concept of "projection" once we get into type variables. Otherwise we simply have two types for one class. (And the reason for that is very solid / easy to defend, just by appealing to how we'd've preferred int and Integer had worked.) I would just call them the reference type and the (name debated just above) type, simple as that.

So a "valuable" class has a reference type and a value type. How does that relate to nullity? Obviously the reference type is nullable and the value type is not, but do we want to use nullability in the user description / type denotation, or should we stick with value and reference?

From brian.goetz at oracle.com Wed May 4 19:36:40 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 4 May 2022 15:36:40 -0400
Subject: [External] : Re: User model: terminology
In-Reply-To: References: Message-ID:

While on the subject of defaults: we've been treating B2 as the default kind of non-identity class, on the theory that it is the smallest hop away from identity classes, and also that it covers a broader range (all the existing value-based classes.) Is that still the default we want?

Flipping that default might make framing the B2/B3 distinction easier: rather than "tolerant of zero", what we'd opt into is "ref only".

Pulling farther, there's a bucket-inversion we might be able to pull here, just by moving some terminology around:

    class B1 { }                 // ref only
    value class B3 { }           // ref and val projections
    value-based class B2 { }     // ref only

And then we can apply non-atomic / fragile (or whatever we call it) to either B2 or B3.
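To put concrete names on that inversion (provisional syntax; every keyword here is still a bikeshed, and the class names are merely illustrative):

    class Employee { }                    // B1: identity class, ref only
    value-based class Instant { }         // B2: non-identity, ref only
    value class IntRange { }              // B3, atomic by default (cross-field invariant)
    non-atomic value class Complex { }    // B3 that additionally permits tearing under race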
This has a few positive properties: ?- Connection to the existing term "value-based", which means "follows the value constraints, but is a ref type", and has the connotation of "approximation of a value class" ?- .val makes sense in the context of a "value class" ?- We get the orthogonality we are seeking, but avoid piling up lots of modifiers (non-atomic zero-happy value class) as well as not having to invent crazy words like "zero-happy" ?- Practical difference between .val and .ref is just about nullity now ?- We get the "must opt into non-atomicity" that Brian has been ranting about This basically leaves the bucket model intact, with some flipped terminology, but importantly, factors atomicity out of being an implicit bucket property, and instead an explicit global choice.? This is by far the most important aspect of the restack I am pushing. It is an orthogonal choice as to whether .val or .ref gets the "good" name for value classes. On the negative side, there is an extra syntactic burden to get to B2 compared to B3 (value-based instead of value) which might cause some developers to reach for it when they might prefer VBC. But if the default for B3 is .ref, it might not matter as much (they're both ref types and you still get integrity), so the only risk is accidental exposure of the zero value. On 5/4/2022 2:27 PM, Kevin Bourrillion wrote: > My favorite kind of thread... > > At the risk of inducing groans, a reminder that much of my own > terminology backstory is found in Data in Java Programs > , > and that when there are places we disagree below, the disagreement is > probably highlighted by something?in *that* document first. > > Since for all we know it might be out of step with how typical Java > devs really think, I'll just mention that it's at least been > well-received by reddit > > twice (if reddit can find something to complain about, they usually do!). > > > On Wed, May 4, 2022 at 8:05 AM Brian Goetz wrote: > > Currently, we have primitives and classes/references, where > primitives have box/wrapper reference companions.? The original > goal of Bucket 3 was to model primitive/box pairs.? We have > tentatively been calling these "primitives", but there are good > arguments why we should not overload this term. > > We have tentatively assigned the phrase "value class" to all > identity-free classes, but it is also possible we can use value to > describe what we've been calling primitives, and use something > else (identity-free, non-identity) to describe the bigger family. > > So, in our search for how to stack the user model, we should bear > in mind that names that have been tentatively assigned to one > thing might be a better fit for something else (e.g., the "new > primitives").? We are looking for: > > ?- A term for all non-identity classes. (Previously, all classes > had identity.) > > > The term applies to the objects first and foremost. The object either > has identity or does not. > > What *is* identity? I'll claim it's exactly like an ordinary immutable > field-based property, with one special provision: it is *always* > auto-assigned to be unique, and thus can never be copied. That feels > to me like it tells the whole story. So the difference between these > kinds of objects is exactly a "with identity" / "without identity" > distinction, and as we know from interface naming ("HasFoo"), it is > often impossible to turn that into adjective form. > > The second complication here is the backward default. *Having* > identity is actually?the special property! 
I do think we should lean > into that. Part of upgrading your code to be "Java 21-modern" (or > whatever) really should be marking all your classes that you really > *want* to have identity and letting the rest lose it. The terms that > feel right are "identity object" and "class that produces identity > objects" shortened to "identity class". > > For the most part I think we'll end up talking about "identity > classes" and "classes in general", and more rarely needing to refer to > "classes without identity" or "non-identity classes". So I think it's > okay to let them use "A \ B"-style terminology as I've done here. (I > furthermore still think it's okay to have an IdentityObject interface > but no ValueObject interface, as the latter doesn't really embody > additional client-facing capabilities.) > > This is one of at least four examples of backward defaults in the > language. We are either stuck with painful/awkward terminology choices > in all of them, or we could pursue the idea of letting source files > declare their language level, upon which the problem vanishes. > > - A term for what we've been calling atomicity: that instances > cannot appear to be torn, even when published under race. > (Previously, all classes had this property.) > > > I think this term we really need is this one's negation. You never > need to (or can) mention it with identity classes; with the rest you > can use it to opt into more risk. The English words that come to mind > are https://www.thesaurus.com/browse/fragile > . > > ?- A term for those non-identity classes which do not _require_ a > reference.? These must have a valid zero, and give rise to two > types, what we've been calling the "ref" and "val" projections. > > > I think we need to name the *type* first before the class. Today we have > > 1. primitive types (the values are the instances) > 2. reference types (the values are references to the instances) > > But this isn't the *heart* of what it means to be "primitive"; it just > happens to be true of primitives so far. And sure, we'll certainly > explain all of this *partly* by saying these types are > "primitive-LIKE". But what is the quality that they and true primitive > types have in common? It's "the values are the instances", so this can > either lead to "value type" or go back to "inline/direct/immediate type". > > At this moment I like both "value type" and "inline type" well enough. > Value *is* overloaded, to be sure, because of "value semantics" (aka > why AutoValue is called AutoValue). But the connection is strong > enough imho. I can delve deep into this topic if desired. > > Then, back to your question, what is the name for a *class* that > *also* gives rise to a value type -- a "valuable class"? > > ?- A term for what we've been calling the "ref" and "val" > projections. > > > Note I think we should only invoke the concept of "projection" once we > get into type variables. Otherwise we simply have two types for one > class. (And the reason for that is very solid / easy to defend, just > by appealing to how we'd've preferred int and Integer had worked.) I > would just call them the reference type and the (name debated just > above) type, simple as that. > > Let's start with _terms_, not _declaration syntax_. > > > Yes, and even the term we like 2nd best for a thing can still be > useful in the documentation of that thing. 
> > -- > Kevin Bourrillion?|?Java Librarian |?Google, Inc.?|kevinb at google.com From john.r.rose at oracle.com Wed May 4 20:18:49 2022 From: john.r.rose at oracle.com (John Rose) Date: Wed, 04 May 2022 13:18:49 -0700 Subject: EG meeting, 2022-05-04 In-Reply-To: References: Message-ID: <3A489891-F0C1-4079-827E-1D513BEF56E6@oracle.com> On 4 May 2022, at 11:36, Kevin Bourrillion wrote: >> - "Foo / Foo.ref is a backward default": Kevin and Brian argued that we >> should prefer treating B3 classes as reference-default, with something like >> '.val' to opt in to a primitive value type >> > > I will say that I have not personally found the opposition to this change > to be nearly as strong as the principal arguments in favor. It creates a > very valuable uniformity in how things work. I hope it goes this way. (This is hard to parse without that last little sentence. I think I agree.) For one thing, you can instantly see, by inspection of the source code, whether a given variable permits null. That advantage holds for simple variable declarations, array declarations. Maybe even with generic type vars. For another, Integer can just be itself, with Integer.val ? int. From kevinb at google.com Wed May 4 21:42:21 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 4 May 2022 14:42:21 -0700 Subject: User model: terminology In-Reply-To: <589500778.20959312.1651679049275.JavaMail.zimbra@u-pem.fr> References: <589500778.20959312.1651679049275.JavaMail.zimbra@u-pem.fr> Message-ID: On Wed, May 4, 2022 at 8:44 AM Remi Forax wrote: > - A term for all non-identity classes. (Previously, all classes had > identity.) > > I've used the term "immediate", immediate object vs reference object. > Note that the temporal meaning (right now) is much much stronger in people's minds than the spatial one ("immediately next to"). And this here isn't even quite spatial. So for me, this doesn't work. I believe that we should use a term that indicates that the object is > composed of several values, a term like "compound", "composite" or perhaps > "aggregate". > I think i prefer compound due to its Latin root. > How strong do we think the parallels are with the Gamma et al "composite pattern"? If strong, we should stick to "composite", and if not, maybe we shouldn't, falling back on "compound". > The other solution is instead of saying that it's non-terable by default, > is to force users to always use a keyword to indicate the "atomiciy" state, > (I think that would be extremely unfortunate, though.) -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Thu May 5 13:51:34 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 5 May 2022 09:51:34 -0400 Subject: Reader mail bag Message-ID: As the topic has turned to how Valhalla will extend into the language syntax, we've had a significant uptick of postings on the valhalla-spec-comments list. As a public service announcement, let me remind people of the role of the -comments list: ?- Postings should be self-contained; think "suggestion box." ?- The most helpful sort of comments center around providing information that is genuinely new to the EG ("you missed this case".)? The least helpful are those that are entirely subjective reactions ("I don't like the .val syntax".) ?- When there is an active discussion, it is usually best to let it play out before commenting.? 
EG discussions often operate on a longer time scale, and take a more meandering path, than the design discussions you may be used to, but there is a method to the madness. ?- It is not a general mechanism for "I would like to inject a reply into the EG discussion." On to the mail bag. ?- https://mail.openjdk.java.net/pipermail/valhalla-spec-comments/2022-April/000028.html (Quan Anh) ?- https://mail.openjdk.java.net/pipermail/valhalla-spec-comments/2022-May/000032.html (Tim Feuerbach) ?- https://mail.openjdk.java.net/pipermail/valhalla-spec-comments/2022-April/000030.html (Mateusz Romanowski) ?- https://mail.openjdk.java.net/pipermail/valhalla-spec-comments/2022-April/000031.html (Izz Rainy) Quan raises two questions / observations: ?- Could primitives explictly choose the name of their "box"? ?- Isn't the atomicity question overblown? We considered the "choose both names" approach early in the process (in fact, in a much earlier version, there were two declared classes.)? But since naming things is hard, asking the user to name two things is harder, and it also asks readers to carry around the mapping of "X is the box of Y."? (It is bad enough that the existing primitives have this problem, and that the box of `int` is called `Integer`, not `Int`.) It is very tempting to reach this conclusion about atomicity, but IMNSHO this is a siren song.? Since bad things can only happen when the program is broken (data races), it seems reasonable initially to blame the user for their broken program.? But unfortunately, I think this is too easy (though I still understand the temptation.)? Users are used to the idea that constructors establish objects with their invariants intact; seeing instances that don't obey invariants enforced by constructors would be astonishing.? (This is the biggest criticism of serialization; that it allows the integrity model of the language to be undermined in ways that are not visible in the source code.? More of that would not be good.)? Further, people have internalized the notion that "immutable objects are thread-safe" (and this is a really good thing for the ecosystem to have learned); we break this at our peril.? (Further, evil actors can maliciously create torn values through deliberate races, and then inject them into innocent victim code.) Tim asks whether the non-nullability of the .val projection is the same feature as the non-nullable `String!` that people have been asking for for years.? Indeed, this is a question that has been at the back of our mind for most of this project.? While we are unsure that we can spell `.val` with a bang, we also are apprehensive about painting ourselves into a future where we are tempted to have both. The unfortunate answer is that they are not the same (though this doesn't mean they can't be unified.)? For Point.val, null is *unrepresentable*; for `String!`, this would surely erase to `String`, so the possibility of null pollution is still present.? (As a side note, this means that migrating `Point` to `Point.ref` is binary-incompatible, while `String` to `String!` is binary compatible (though may be source incompatible.))? It is still an open question whether the natural interpretation of `String!` is to erase null checks at compile time, or to reify runtime checks at each assignment.? I plan to have some more in-depth discussions about this, but having them now would divert us from solving the more immediate problems. 
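A rough analogy in today's Java for the distinction drawn above (illustration only; none of these names are proposed API): `int` behaves like `Point.val`, where null simply has no encoding, while a "non-null by convention" `Integer` behaves like an erased `String!`, where null pollution can still arrive at runtime.

    import java.util.ArrayList;
    import java.util.List;

    public class NullabilityAnalogy {
        public static void main(String[] args) {
            int[] slots = new int[3];         // like Point.val: the element type cannot represent null at all

            List<Integer> boxed = new ArrayList<>();
            boxed.add(null);                  // a by-convention non-null element type is not enforced after erasure
            Integer polluted = boxed.get(0);  // like String! erased to String: null sneaks in anyway

            System.out.println(slots[0] + ", " + polluted);
        }
    }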
(Tim also observes that, to the degree that primitives are the only way to get truly non-nullable types, people will abuse them as a way of "sticking it to the stupid compiler" for not giving them general non-nullable types, shooting other people's feet in the process.? Sadly true; developers are their most dangerous when they think they are being clever.? But this only underscores the importance of making surprising properties (like non-atomicity) explicit, rather than having them come for the ride with other things.) From brian.goetz at oracle.com Thu May 5 16:13:33 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 5 May 2022 12:13:33 -0400 Subject: [External] : Re: User model: terminology In-Reply-To: References: Message-ID: As a general meta-observation, the whole point of the "let's throw all the pieces in the air" discussions is that often, when you break some existing assumptions, you can reassemble the pieces in a lower energy state -- but you usually can't get there in one move.? So usually in the middle of that process, you find yourself transiting through states which may be more mathematically attractive but less syntactically attractive; the key is to sit on your aesthetic reaction and realize that this may well be an intermediate state. There was a lot of pushback to the "User model stacking" thread (including in the -comments postings), on the basis of "these names suck" or "there are too many knobs".? But its unlikely we get to the right stacking without first going through a less attractive, but more general stacking. The inversion below might not be part of the final answer, but let's let the process play out, I think we're making progress. On 5/4/2022 3:36 PM, Brian Goetz wrote: > > > Pulling farther, there's a bucket-inversion we might be able to pull > here, just by moving some terminology around: > > ??? class B1 { }???????????????? // ref only > ??? value class B3 { }?????????? // ref and val projections > ??? value-based class B2 { }???? // ref only From brian.goetz at oracle.com Thu May 5 17:51:26 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 5 May 2022 13:51:26 -0400 Subject: User model stacking: current status Message-ID: The current stacking discussion is motivated by several factors: ?- experiences prototyping both B2 and B3 ?- recently discovered hardware improvements in atomic operations (e.g., Intel's recent specification strengthening around 128-bit vector loads and stores) ?- further thought on the consequences of the B2/B3 model, particularly with regard to tearing The B2/B3 split was a useful proxy during prototyping, with each being built around a known use case: B2 around value-based classes, and B3 around numeric abstractions.? My main objection is twofold: there are gratuitous-seeming differences in performance model (B3s flatten much better currently), which puts users to bad choices between semantics and performance, and the degree to which tearing is hidden behind some other proxy ("primitive-ness", non-nullity, etc), which is likely to surprise users when invariants are checked in the constructor but not necessarily obeyed at runtime.? I want the observed behavioral distinctions between buckets to be clearly related to their semantic differences, and we're not there yet. The differences in flattening and performance between the current B2/B3 derives directly from the possibility of tearing. 
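As a concrete reminder of what tearing means, here is a runnable sketch in today's Java. It uses a plain mutable class as a stand-in (real value-class tearing would happen in a flattened layout rather than through a shared mutable object), the names are made up for the example, and because it relies on a data race, any particular run may or may not catch the torn state.

    public class TearingDemo {
        static class Range { int lo; int hi; }   // intended invariant: lo == hi after every update
        static final Range shared = new Range();

        public static void main(String[] args) throws InterruptedException {
            Thread writer = new Thread(() -> {
                for (int i = 1; i <= 50_000_000; i++) {
                    shared.lo = i;               // racy, unsynchronized writes
                    shared.hi = i;
                }
            });
            writer.start();
            for (int i = 0; i < 50_000_000; i++) {
                int lo = shared.lo;              // racy reads, interleaving with the writes
                int hi = shared.hi;
                if (lo != hi) {
                    System.out.println("observed a state no single write produced: lo=" + lo + ", hi=" + hi);
                    break;
                }
            }
            writer.join();
        }
    }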
When tearing is unacceptable, we are likely to fall back on using indirections to make loads and stores of references atomic (the "non-flat" option); even where we are able to gain some flattening through compiler heroics (the "low-flat" option), these hit the ceiling pretty fast (we're unlikely to get above 128 bits any time soon, and may need at least one bit for null), and these also have other costs (wider loads and stores mean more data movement and more register shuffling, in addition to the complexity of the required compiler heroics.) Full-flat requires tearing. But I don't see an intrinsic reason (yet) why we can't have full-flat for VBCs like Optional.

The most encouraging direction is to factor atomicity out of the bucket model. We can make both buckets (VBC and primitive-like) atomic by default; this still gets us all the calling convention optimizations, and for very small values (such as single-field ones, like Optional), we can probably achieve full flattening in the heap, and more flattening for small-ish values with low-flat heroics. We can allow both buckets to opt into non-atomicity, which unlocks full-flat layout in the heap, with the only difference being whether we have to perturb the representation to make null representable. This gets us to something like:

    [ atomic | non-atomic ] __value class B2 { }
    [ atomic | non-atomic ] __primitive class B3 { }

There are many bikesheds here, including the spelling of all these things, and whether or not we say "class" or "struct" or "primitive" or nothing at all, or whether these work with records, but painting can come later. There are also many other decisions to make, but I'll observe several properties we've already gained by this stacking:

 - non-atomicity is explicit, rather than hiding it behind "primitive" or "non-nullable" or "zero-happy"
 - non-atomicity is orthogonal, which means that the performance difference between B2 and B3 (or B3.val and B3.ref), for either polarity of atomicity, is exclusively that imposed by the null-encoding requirement
 - safe by default, can opt into more performance by opting out of some safety
 - non-atomic sounds "just scary enough" to make people think twice, or at least learn what non-atomic means

Atomicity is only needed when a class has cross-field invariants (or when its construction API varies significantly from its representation.) Numeric classes like Complex have no invariants, and Rational has only single-field invariants, but classes like IntRange would have cross-field invariants. In cases where the VM can provide atomicity for free (e.g., single-field classes), it wouldn't make a difference.

If we further opt for Kevin's "ref is default" proposal, then we add another:

 - All unadorned type names are reference types

Separately, I think we can reconsider where we spend the "value" keyword. Previously "value" meant "non-identity", but I think it is better spent meaning "has a value projection", which leads us to the minor reshuffling presented yesterday:

    class B1 { }                 // ref only, == based on identity
    value-based class B2 { }     // ref only, == based on state
    value class B3 { }           // Has ref and val projections

This affirms B2 as "value-lite", connects to the term we colonized in Java 8 for "classes that have value-like semantics", and moves away from "primitive".

Let's work through Kevin's examples here:

 - Rational. Here, the default value is particularly bad (denominator should not be zero).
This leads to an uncomfortable choice: choose B2, or choose B3 and deal with the DBZE (divide-by-zero exception) as "user error" when it happens. Internal methods (e.g., multiplying two rationals) can treat the default value as "0/1" instead and produce a valid rational, but any code that pulls out the denominator and operates on it externally will confront the zero anyway. Whichever way one chooses, people will complain "but that's bad". Rational is interesting because it _has_ a sensible default; it is just not the zero representation.

 - EmployeeId. Similar, but maybe more tolerable to treat as a B2, and doesn't require atomicity.
 - Instant. Seems this is a (probably non-atomic) B2.
 - Complex. Solid non-atomic B3.
 - Optional, OptionalInt, etc. In a world where B3 is ref-default, these can be B3; otherwise B2.
 - IntRange: atomic B3 (cross-field invariant.)

There are lots of other things to discuss here, including a discussion of what non-atomic B2 really means, and whether there are additional risks that come from tearing _between the null and the fields_. I'll address that in a separate mail, but I think that factoring out atomicity into its own explicit thing is a pure win, and that in turn exposes some sensible terminology shuffling in the other buckets.

Also, bikeshed topics to cover (please, let's not let this drown the discussion):

 - How to spell atomic / non-atomic
 - How to spell B2 and B3
 - How to spell .ref and .val
 - ref-default vs val-default for B3
   - if we go ref-default, reconciling this with universal generics
   - reconciling this with nullable types

From brian.goetz at oracle.com Thu May 5 19:21:28 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 5 May 2022 15:21:28 -0400
Subject: User model stacking: current status
In-Reply-To: References: Message-ID:

> There are lots of other things to discuss here, including a discussion
> of what non-atomic B2 really means, and whether there are
> additional risks that come from tearing _between the null and the
> fields_.

So, let's discuss non-atomic B2s. (First, note that atomicity is only relevant in the heap; on the stack, everything is thread-confined, so there will be no tearing.)

If we have:

    non-atomic __b2 class DateTime {
        long date;
        long time;
    }

then the layout of a B2 (or a B3.ref) is really (long, long, boolean), not just (long, long), because of the null channel. (We may be able to hide the null channel elsewhere, but that's an optimization.)

If two threads racily write (d1, t1) and (d2, t2) to a shared mutable DateTime, it is possible for an observer to observe (d1, t2) or (d2, t1). Saying non-atomic says "this is the cost of data races". But additionally, if we have a race between writing null and (d, t), there is another possible form of tearing.

Let's write this out more explicitly. Suppose that T1 writes a non-null value (d, t, true), and T2 writes null as (0, 0, false). Then it would be possible to observe (0, 0, true), which means that we would conceivably be exposing the zero value to the user, even though a B2 class might want to hide its zero.

So, suppose instead that we implemented writing a null as simply storing false to the synthetic boolean field. Then, in the event of a race between reader and writer, we could only see values for date and time that were previously put there by some thread. This satisfies the OOTA (out of thin air) safety requirements of the JMM.

The other consequence we might have from this sort of tearing is if one of the other fields is an OOP.
If the GC is unaware of the significance of the null field (and we'd like for the GC to stay unaware of this), then it is possible to have a null value where one of the oop fields (from a previous write) is non-null, keeping that object reachable even when it is logically not reachable.? (As an interesting connection, the boolean here is "special" in the same way as the synthetic boolean channel is in pattern matching -- it dictates whether the _other_ channels are valid.? Which makes nullable values a good implementation strategy for pattern carriers.) So we have a choice for how we implement writing nulls, with a pick-your-poison consequence: ?- If we do a wide write, and write all the fields to zero, we risk exposing a zero value even when the zero is a bad value; ?- If we do a narrow write, and only write the null field, we risk pinning other OOPs in memory From daniel.smith at oracle.com Thu May 5 22:00:23 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Thu, 5 May 2022 22:00:23 +0000 Subject: User model stacking: current status In-Reply-To: References: Message-ID: > On May 5, 2022, at 1:21 PM, Brian Goetz wrote: > > Let's write this out more explicitly. Suppose that T1 writes a non-null value (d, t, true), and T2 writes null as (0, 0, false). Then it would be possible to observe (0, 0, true), which means that we would be conceivably exposing the zero value to the user, even though a B2 class might want to hide its zero. > > So, suppose instead that we implemented writing a null as simply storing false to the synthetic boolean field. Then, in the event of a race between reader and writer, we could only see values for date and time that were previously put there by some thread. This satisfies the OOTA (out of thin air) safety requirements of the JMM. (0, 0, false) is the initial value of a field/array, even if the VM implements a "narrow write" strategy. That is, if I write (1, 1, true) at the moment of reading from a fresh field, I could easily get (0, 0, true). This is significant because the primary reason to declare a B2 rather than a B3 is to guarantee that the all-zeros value cannot be created. (A secondary reason, valid but one I'm less sympathetic to, is that the all-zeros value is okay but inconvenient, and it would be nice to reduce how much it pops up. A third reason is reference-defaultness, important for migration if we don't offer it in B3.) This leads me to conclude that if you're declaring a non-atomic B2, you might as well just declare a non-atomic B3. Said differently: a B2 author usually wants to associate a cross-field invariant with the null flag (zero-value fields iff null). But in declaring the class non-atomic, they've sworn off cross-field invariants. This was a useful discovery for me yesterday: that, in fact, nullability and atomicity are closely related. There's a strong theoretical defense for the idea that opting out of identity and supporting a non-null type (i.e., B3) are prerequisites to non-atomic flattening. From brian.goetz at oracle.com Thu May 5 22:03:26 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 5 May 2022 18:03:26 -0400 Subject: User model stacking: current status In-Reply-To: References: Message-ID: <8de623d6-a7db-bcf9-25bf-1991b938e027@oracle.com> >> Let's write this out more explicitly. Suppose that T1 writes a non-null value (d, t, true), and T2 writes null as (0, 0, false). 
Then it would be possible to observe (0, 0, true), which means that we would be conceivably exposing the zero value to the user, even though a B2 class might want to hide its zero. The OOTA guarantee we get is: threads that read a variable (fields and array elements) will only see a value that has been put there by a "prior" write in some thread.? And every variable is treated as if it has an initial write of the default value for that variable, as per JLS 17.4.4: > The write of the default value (zero, false, or null) to each variable > synchronizes-with the first action in every thread. Ignoring fields that are themselves composite values for a moment, this means that, if we treat nulls as a full-width all-zeroes value, then when we write a null to our DateTime example, we are _returning_ date and time to a value that has already been written there.? So reading 0 for date or time is not OOTA, though might be surprising.? And writing all the fields seems simpler and more uniform, and avoids the GC issue, right? So one of the other consequences of a non-atomic B2 is that not only will races result in a torn value, but they may also expose the zero value (or torn parts of it.)? This doesn't seem entirely out of hand for something that explicitly permits tearing. I tried to sketch what a JLS section on "non-atomic values" might look like, by cribbing liberally from JLS 17.7: > For the purposes of the Java programming language memory model, a > single write to, or read of, a variable whose type is a non-atomic > value class or value-based class may be treated as separate writes or > reads of its fields.? This can result in a situation where a thread > sees some field values from one write, and some field values from > another write. This is a start. ? (Plus the business about volatile.)? It basically says that from a JMM perspective, a non-volatile variable whose type is a non-atomic value class is really a tuple of its fields.? In correctly synchronized programs, this should not be observable. It may be the case that we can exempt _final_ variables whose type is a non-atomic value class. The section on final field guarantees will need heavier work (because a final field can be nested many levels deep). From brian.goetz at oracle.com Thu May 5 22:06:10 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 5 May 2022 18:06:10 -0400 Subject: User model stacking: current status In-Reply-To: References: Message-ID: <968c3ffd-6b13-a7c5-511d-1e75be53dd48@oracle.com> Maybe :)? But I don't want to prune this exploration just yet. On 5/5/2022 6:00 PM, Dan Smith wrote: >> On May 5, 2022, at 1:21 PM, Brian Goetz wrote: >> >> Let's write this out more explicitly. Suppose that T1 writes a non-null value (d, t, true), and T2 writes null as (0, 0, false). Then it would be possible to observe (0, 0, true), which means that we would be conceivably exposing the zero value to the user, even though a B2 class might want to hide its zero. >> >> So, suppose instead that we implemented writing a null as simply storing false to the synthetic boolean field. Then, in the event of a race between reader and writer, we could only see values for date and time that were previously put there by some thread. This satisfies the OOTA (out of thin air) safety requirements of the JMM. > (0, 0, false) is the initial value of a field/array, even if the VM implements a "narrow write" strategy. That is, if I write (1, 1, true) at the moment of reading from a fresh field, I could easily get (0, 0, true). 
> > This is significant because the primary reason to declare a B2 rather than a B3 is to guarantee that the all-zeros value cannot be created. (A secondary reason, valid but one I'm less sympathetic to, is that the all-zeros value is okay but inconvenient, and it would be nice to reduce how much it pops up. A third reason is reference-defaultness, important for migration if we don't offer it in B3.) > > This leads me to conclude that if you're declaring a non-atomic B2, you might as well just declare a non-atomic B3. > > Said differently: a B2 author usually wants to associate a cross-field invariant with the null flag (zero-value fields iff null). But in declaring the class non-atomic, they've sworn off cross-field invariants. > > This was a useful discovery for me yesterday: that, in fact, nullability and atomicity are closely related. There's a strong theoretical defense for the idea that opting out of identity and supporting a non-null type (i.e., B3) are prerequisites to non-atomic flattening. > From forax at univ-mlv.fr Fri May 6 12:32:12 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Fri, 6 May 2022 14:32:12 +0200 (CEST) Subject: User model: terminology In-Reply-To: References: <589500778.20959312.1651679049275.JavaMail.zimbra@u-pem.fr> Message-ID: <181951155.22004792.1651840332866.JavaMail.zimbra@u-pem.fr> > From: "Kevin Bourrillion" > To: "Remi Forax" > Cc: "Brian Goetz" , "valhalla-spec-experts" > > Sent: Wednesday, May 4, 2022 11:42:21 PM > Subject: Re: User model: terminology > On Wed, May 4, 2022 at 8:44 AM Remi Forax < [ mailto:forax at univ-mlv.fr | > forax at univ-mlv.fr ] > wrote: >>> - A term for all non-identity classes. (Previously, all classes had identity.) >> I've used the term "immediate", immediate object vs reference object. > Note that the temporal meaning (right now) is much much stronger in people's > minds than the spatial one ("immediately next to"). And this here isn't even > quite spatial. So for me, this doesn't work. The temporal meaning re-enforce the idea, you do not have to follow a pointer and wait for the value to arrive, the value is already here. R?mi From brian.goetz at oracle.com Fri May 6 14:04:13 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 6 May 2022 10:04:13 -0400 Subject: User model stacking: current status In-Reply-To: References: Message-ID: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> Thinking more about Dan's concerns here ... On 5/5/2022 6:00 PM, Dan Smith wrote: > This is significant because the primary reason to declare a B2 rather > than a B3 is to guarantee that the all-zeros value cannot be created. This is a little bit of a circular argument; it takes a property that an atomic B2 has, but a non-atomic B2 lacks, and declares that to be "the whole point" of B2.? It may be that exposure of the zero is so bad we may eventually want to back away from the idea, but let's come up with a fair picture of what a non-atomic B2 means, and ask if that's sufficiently useful. > This leads me to conclude that if you're declaring a non-atomic B2, > you might as well just declare a non-atomic B3. Fair point, but let's pull on this string for a moment.? Suppose I want a null-default, flattenable value, and I'm willing to take the tearing to get there.? So you're saying "then declare a B3 and use B3.ref".? But B3.ref was supposed to have the same semantics as an equivalent B2!? (I realize I'm doing the same thing I just accused you of above -- taking an old invariant and positiioning it as "the point".? 
Stay tuned.)? Which means either that we lose flattening, again, or we create yet another asymmetry between B3.ref and B2. Maybe you're saying that the combination of nullable and full-flat is just too much to ask, but I am not sure it is; in any case, let's convince ourselves of this before we rule it out. Or maybe, what you're saying is that my claim that B3.ref and B2 are the same thing is the stale thing here, and we can let it go and get it back in another form.? In which case you're positing a model where: ?- B1 is unchanged ?- B2 is always atomic, reference, nullable ?- B3 really means "the zero is OK", comes with .ref and .val, and (non-atomic B3).ref is still tearable? In this model, (non-atomic B3).ref takes the place of (non-atomic B2) in the stacking I've been discussing.? Is that what you're saying? ??? class B1 { }? // ref, identity, atomic ??? value-based class B2 { }? // ref, non-identity, atomic ??? [ non-atomic ] value class B3 { }? // ref or val, zero is ok, both projections share atomicity If we go with ref-default, then this is a small leap from yesterday's stacking, because "B3" and "B2" are both reference types, so if you want a tearable, non-atomic reference type, saying `non-atomic value class B3` and then just using B3 gets you that. Then: ?- B2 is like B1, minus identity ?- B3 means "uninitialized values are OK, you get two types, a zero-default and a non-default" ?- Non-atomicity is an extra property we can add to B3, to get more flattening in exchange for less integrity ?- The use cases for non-atomic B2 are served by non-atomic B3 (when .ref is the default) I think this still has the properties I want; I can freely choose the reasonable subsets of { identity, has-zero, nullable, atomicity } that I want; the orthogonality of non-atomic across buckets becomes orthogonality of non-atomic with nullity, and the "B3.ref is just like B2" is shown to be the "false friend." From daniel.smith at oracle.com Fri May 6 15:15:52 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Fri, 6 May 2022 15:15:52 +0000 Subject: User model stacking: current status In-Reply-To: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> Message-ID: <59B138BF-4940-4EA3-AFB8-E03591060BB3@oracle.com> > On May 6, 2022, at 8:04 AM, Brian Goetz wrote: > > Thinking more about Dan's concerns here ... > > On 5/5/2022 6:00 PM, Dan Smith wrote: >> This is significant because the primary reason to declare a B2 rather than a B3 is to guarantee that the all-zeros value cannot be created. > > This is a little bit of a circular argument; it takes a property that an atomic B2 has, but a non-atomic B2 lacks, and declares that to be "the whole point" of B2. It may be that exposure of the zero is so bad we may eventually want to back away from the idea, but let's come up with a fair picture of what a non-atomic B2 means, and ask if that's sufficiently useful. Fair. My interpretation is that we decided to create B2 because we weren't satisfied with the lack of guarantees offered to no-good-default classes that were reference-default B3s. So in that historical sense, B2s exist to offer guarantees. >> This leads me to conclude that if you're declaring a non-atomic B2, you might as well just declare a non-atomic B3. > > Fair point, but let's pull on this string for a moment. Suppose I want a null-default, flattenable value, and I'm willing to take the tearing to get there. So you're saying "then declare a B3 and use B3.ref". 
But B3.ref was supposed to have the same semantics as an equivalent B2! (I realize I'm doing the same thing I just accused you of above -- taking an old invariant and positiioning it as "the point". Stay tuned.) Which means either that we lose flattening, again, or we create yet another asymmetry between B3.ref and B2. Maybe you're saying that the combination of nullable and full-flat is just too much to ask, but I am not sure it is; in any case, let's convince ourselves of this before we rule it out. Yeah, I think my mindset has been here?non-atomic flat nulls are just more trouble than they're worth?but I'm open to discovering a compelling use case. From brian.goetz at oracle.com Sun May 8 16:32:09 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 8 May 2022 12:32:09 -0400 Subject: User model stacking: current status In-Reply-To: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> Message-ID: To track the progress of the spiral: ?- We originally came up with the B2/B3 division to carve off B2 as the "safe subset", where you get less flattening but nulls and more integrity.? This provided a safe migration target for existing VBCs, as well as a reasonable target for creating new VBCs that want to be mostly class-like but enjoy some additional optimization (and shed accidental identity for safety reasons.) ?- When we put all the flesh on the bones of B2/B3, there were some undesirable consequences, such as (a) tearing was too subtle, and (b) both the semantics and cost model differences between B2/B3 were going to be hard to explain (and in some cases, users have bad choices between semantics and performance.) ?- A few weeks ago, we decided to more seriously consider separating atomicity out as an explicit thing on its own.? This had the benefit of putting semantics first, and offered a clearer cost model: you could give up identity but keep null-default and integrity (B2), further give up nulls to get some more density (B3.val), and further further give up atomicity to get more flatness (non-atomic B3.)? This was honest, but led people to complain "great, now there are four buckets." ?- We explored making non-atomicity a cross-cutting concern, so there are two new buckets (VBC and primitive-like), either of which can choose their atomicity constraints, and then within the primitive-like bucket, the .val and .ref projections differ only with respect to the consequences of nullity.? This felt cleaner (more orthogonal), but the notion of a non-atomic B2 itself is kind of weird. So where this brings us is back to something that might feel like the four-bucket approach in the third bullet above, but with two big differences: atomicity is an explicit property of a class, rather than a property of reference-ness, and a B3.ref is not necessarily the same as a B2.? This recognizes that the main distinction between B2 or B3 is *whether a class can tolerate its zero value.* More explicitly: ?- B1 remains unchanged ?- B2 is for "ordinary" value-based classes.? Always atomic, always nullable, always reference; the only difference with B1 is that it has shed its identity, enabling routine stack-based flattening, and perhaps some heap flattening depending on VM sophistication and heroics.? B2 is a good target for migrating many existing value-based classes. ?- B3 means that a class can tolerate its zero (uninitialized) value, and therefore gives rise to two types, which we'll call B3.ref and B3.val.? 
The former is a reference type and is therefore nullable and null-default; the latter is a direct/immediate/value type whose default is zero. ?- B3 classes can further be marked non-atomic; this unlocks greater flattening in the heap at the cost of tearing under race, and is suitable for classes without cross-field invariants.? Non-atomicity accrues equally to B3.ref and B3.val; a non-atomic B3.ref still tears (and therefore might expose its zero under race, as per friday's discussions.) Syntactically (reminder: NOT an invitation to discuss syntax at this point), this might look like: ??? class B1 { }??????????????? // identity, reference, atomic ??? value-based class B2 { }??? // non-identity, reference, atomic ??? value class B3 { }????????? // non-identity, .ref and .val, both atomic ??? non-atomic value class B3 { }? // similar to B3, but both are non-atomic So, two new (but related) class modifiers, of which one has an additional modifier.? (The spelling of all of these can be discussed after the user model is entirely nailed down.) So, there's a monotonic sequence of "give stuff up, get other stuff": ?- B2 gives up identity relative to B1, gains some flattening ?- B3 optionally gives up null-defaultness relative to B2, yielding two types, one of which sheds some footprint ?- non-atomic B3 gives up atomicity relative to B3, gaining more flatness, for both type projections On 5/6/2022 10:04 AM, Brian Goetz wrote: > Thinking more about Dan's concerns here ... > > On 5/5/2022 6:00 PM, Dan Smith wrote: >> This is significant because the primary reason to declare a B2 rather >> than a B3 is to guarantee that the all-zeros value cannot be created. > > This is a little bit of a circular argument; it takes a property that > an atomic B2 has, but a non-atomic B2 lacks, and declares that to be > "the whole point" of B2.? It may be that exposure of the zero is so > bad we may eventually want to back away from the idea, but let's come > up with a fair picture of what a non-atomic B2 means, and ask if > that's sufficiently useful. > >> This leads me to conclude that if you're declaring a non-atomic B2, >> you might as well just declare a non-atomic B3. > > Fair point, but let's pull on this string for a moment.? Suppose I > want a null-default, flattenable value, and I'm willing to take the > tearing to get there.? So you're saying "then declare a B3 and use > B3.ref".? But B3.ref was supposed to have the same semantics as an > equivalent B2!? (I realize I'm doing the same thing I just accused you > of above -- taking an old invariant and positiioning it as "the > point".? Stay tuned.)? Which means either that we lose flattening, > again, or we create yet another asymmetry between B3.ref and B2. Maybe > you're saying that the combination of nullable and full-flat is just > too much to ask, but I am not sure it is; in any case, let's convince > ourselves of this before we rule it out. > > Or maybe, what you're saying is that my claim that B3.ref and B2 are > the same thing is the stale thing here, and we can let it go and get > it back in another form.? In which case you're positing a model where: > > ?- B1 is unchanged > ?- B2 is always atomic, reference, nullable > ?- B3 really means "the zero is OK", comes with .ref and .val, and > (non-atomic B3).ref is still tearable? > > In this model, (non-atomic B3).ref takes the place of (non-atomic B2) > in the stacking I've been discussing.? Is that what you're saying? > > ??? class B1 { }? // ref, identity, atomic > ??? value-based class B2 { }? 
// ref, non-identity, atomic > ??? [ non-atomic ] value class B3 { }? // ref or val, zero is ok, both > projections share atomicity > > If we go with ref-default, then this is a small leap from yesterday's > stacking, because "B3" and "B2" are both reference types, so if you > want a tearable, non-atomic reference type, saying `non-atomic value > class B3` and then just using B3 gets you that. Then: > > ?- B2 is like B1, minus identity > ?- B3 means "uninitialized values are OK, you get two types, a > zero-default and a non-default" > ?- Non-atomicity is an extra property we can add to B3, to get more > flattening in exchange for less integrity > ?- The use cases for non-atomic B2 are served by non-atomic B3 (when > .ref is the default) > > I think this still has the properties I want; I can freely choose the > reasonable subsets of { identity, has-zero, nullable, atomicity } that > I want; the orthogonality of non-atomic across buckets becomes > orthogonality of non-atomic with nullity, and the "B3.ref is just like > B2" is shown to be the "false friend." > > From kevinb at google.com Mon May 9 15:43:56 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Mon, 9 May 2022 08:43:56 -0700 Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> Message-ID: On Sun, May 8, 2022 at 9:32 AM Brian Goetz wrote: - When we put all the flesh on the bones of B2/B3, there were some > undesirable consequences, such as (a) tearing was too subtle, and (b) both > the semantics and cost model differences between B2/B3 were going to be > hard to explain (and in some cases, users have bad choices between > semantics and performance.) > Explaining the semantic model doesn't feel hard to me right now, at least not in any ways that I see current proposals addressing. Explaining the cost model can mean two very different things. The general user is fine with asterisks all over the place saying "* the VM may do something different if it is convinced it knows better, and it will mostly be right, just go with the flow" -- that is already how everything about the cost model for everything works. If someone needs to understand more deeply than that, it's expected to be difficult. Putting these together, (b) doesn't sound valid to my ears. So where this brings us is back to something that might feel like the > four-bucket approach in the third bullet above, but with two big > differences: atomicity is an explicit property of a class, rather than a > property of reference-ness, and a B3.ref is not necessarily the same as a > B2. > I don't follow how a B3.ref != a B2, unless you just mean that you can have a reference to a bogus instance more easily than B2 can (which takes serialization/reflection to do that). > - B3 classes can further be marked non-atomic; this unlocks greater > flattening in the heap at the cost of tearing under race, and is suitable > for classes without cross-field invariants. Non-atomicity accrues equally > to B3.ref and B3.val; a non-atomic B3.ref still tears (and therefore might > expose its zero under race, as per friday's discussions.) > Am I right that this "non-atomic" marker would be ignored for classes like Integer where the vm can tell that it can just give you the best of both worlds? 
> Syntactically (reminder: NOT an invitation to discuss syntax at this > point), this might look like: > > class B1 { } // identity, reference, atomic > > value-based class B2 { } // non-identity, reference, atomic > > value class B3 { } // non-identity, .ref and .val, both atomic > > non-atomic value class B3 { } // similar to B3, but both are > non-atomic > Buckets.java:7: error: duplicate class: B3 But seriously, we won't get away with pretending there are just 3 buckets if we do this. Let's be honest and call it B4. Would I be right that we can achieve primitive unification even without B4? There is nothing wrong with our delivering many performance gains while leaving others on the table for later. > On 5/6/2022 10:04 AM, Brian Goetz wrote: > > Thinking more about Dan's concerns here ... > > On 5/5/2022 6:00 PM, Dan Smith wrote: > > This is significant because the primary reason to declare a B2 rather than > a B3 is to guarantee that the all-zeros value cannot be created. > > > This is a little bit of a circular argument; it takes a property that an > atomic B2 has, but a non-atomic B2 lacks, and declares that to be "the > whole point" of B2. It may be that exposure of the zero is so bad we may > eventually want to back away from the idea, but let's come up with a fair > picture of what a non-atomic B2 means, and ask if that's sufficiently > useful. > > This leads me to conclude that if you're declaring a non-atomic B2, you > might as well just declare a non-atomic B3. > > > Fair point, but let's pull on this string for a moment. Suppose I want a > null-default, flattenable value, and I'm willing to take the tearing to get > there. So you're saying "then declare a B3 and use B3.ref". But B3.ref > was supposed to have the same semantics as an equivalent B2! (I realize > I'm doing the same thing I just accused you of above -- taking an old > invariant and positiioning it as "the point". Stay tuned.) Which means > either that we lose flattening, again, or we create yet another asymmetry > between B3.ref and B2. Maybe you're saying that the combination of nullable > and full-flat is just too much to ask, but I am not sure it is; in any > case, let's convince ourselves of this before we rule it out. > > Or maybe, what you're saying is that my claim that B3.ref and B2 are the > same thing is the stale thing here, and we can let it go and get it back in > another form. In which case you're positing a model where: > > - B1 is unchanged > - B2 is always atomic, reference, nullable > - B3 really means "the zero is OK", comes with .ref and .val, and > (non-atomic B3).ref is still tearable? > > In this model, (non-atomic B3).ref takes the place of (non-atomic B2) in > the stacking I've been discussing. Is that what you're saying? > > class B1 { } // ref, identity, atomic > value-based class B2 { } // ref, non-identity, atomic > [ non-atomic ] value class B3 { } // ref or val, zero is ok, both > projections share atomicity > > If we go with ref-default, then this is a small leap from yesterday's > stacking, because "B3" and "B2" are both reference types, so if you want a > tearable, non-atomic reference type, saying `non-atomic value class B3` and > then just using B3 gets you that. 
Then: > > - B2 is like B1, minus identity > - B3 means "uninitialized values are OK, you get two types, a > zero-default and a non-default" > - Non-atomicity is an extra property we can add to B3, to get more > flattening in exchange for less integrity > - The use cases for non-atomic B2 are served by non-atomic B3 (when .ref > is the default) > > I think this still has the properties I want; I can freely choose the > reasonable subsets of { identity, has-zero, nullable, atomicity } that I > want; the orthogonality of non-atomic across buckets becomes orthogonality > of non-atomic with nullity, and the "B3.ref is just like B2" is shown to be > the "false friend." > > > > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Mon May 9 15:51:53 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 9 May 2022 11:51:53 -0400 Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> Message-ID: > > So where this brings us is back to something that might feel like > the four-bucket approach in the third bullet above, but with two > big differences: atomicity is an explicit property of a class, > rather than a property of reference-ness, and a B3.ref is not > necessarily the same as a B2. > > > I don't follow how a B3.ref != a B2, unless you just mean that you can > have a reference to a bogus instance more easily than B2 can (which > takes serialization/reflection to do that). It means that a B3.ref is exactly as subject to tearing as the same-atomicity B3, whereas a B2 is not. > ?- B3 classes can further be marked non-atomic; this unlocks > greater flattening in the heap at the cost of tearing under race, > and is suitable for classes without cross-field invariants.? > Non-atomicity accrues equally to B3.ref and B3.val; a non-atomic > B3.ref still tears (and therefore might expose its zero under > race, as per friday's discussions.) > > > Am I right that this "non-atomic" marker would be ignored for classes > like Integer where the vm can tell that it can just give you the best > of both worlds? We can provide atomicity semantics for sufficiently small objects at no cost.? In practicality this probably means "classes whose layout boils down to a single 32-bit-or-smaller primitive, or a single reference". > > But seriously, we won't get away with pretending there are just 3 > buckets if we do this. Let's be honest and call it B4. "Bucket" is a term that makes sense in language design, but need not flow into the user model.? But yes, there really are three things that the user needs control over: identity, zero-friendliness, atomicity.? If you want to call that four buckets, I won't argue. The real discussion here is whether these controls need to be *separate*.? And I think they do: ?- The premise of Valhalla is that the VM can't guess whether identity is needed, so the user has to explicitly disavow it to enable more goodies; ?- Classes like LocalDate have no good zero, so the user needs to be able to disavow the zero value when it doesn't fit the semantics of the class; ?- (the controversial one) Atomicity is simply too confusing and potentially astonishing to piggyback on "primitive-ness" or "reference-ness" in a codes-like-a-class world. > Would I be right that we can achieve primitive unification even > without B4? There is nothing wrong with our delivering many > performance gains while leaving others on the table for later. 
Yes, delivering primitive unification first means you can't have flat Complex yet. From kevinb at google.com Mon May 9 16:10:37 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Mon, 9 May 2022 09:10:37 -0700 Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> Message-ID: On Mon, May 9, 2022 at 8:52 AM Brian Goetz wrote: We can provide atomicity semantics for sufficiently small objects at no > cost. In practicality this probably means "classes whose layout boils down > to a single 32-bit-or-smaller primitive, or a single reference". > Right, and for long and double we can say they are as atomic as they ever were. > But seriously, we won't get away with pretending there are just 3 buckets > if we do this. Let's be honest and call it B4. > > "Bucket" is a term that makes sense in language design, but need not flow > into the user model. But yes, there really are three things that the user > needs control over: identity, zero-friendliness, atomicity. If you want to > call that four buckets, I won't argue. > I *am* of course only caring about the user model, and that's where I'm saying we would not get away with pretending this isn't a 4th kind of concrete class. > - Classes like LocalDate have no good zero, so the user needs to be able > to disavow the zero value when it doesn't fit the semantics of the class; > > - (the controversial one) Atomicity is simply too confusing and > potentially astonishing to piggyback on "primitive-ness" or > "reference-ness" in a codes-like-a-class world. > (Controversial with me at least; I keep thinking who are these people who can understand the rest of how to safely write non-locking concurrent code yet would struggle with this?) Would I be right that we can achieve primitive unification even without B4? > There is nothing wrong with our delivering many performance gains while > leaving others on the table for later. > > Yes, delivering primitive unification first means you can't have flat > Complex yet. > But they still get *often-flat* Complex? Sounds like *always-flat* Complex is the perfect thing to punt on then. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Mon May 9 16:54:24 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 9 May 2022 12:54:24 -0400 Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> Message-ID: > ?- (the controversial one) Atomicity is simply too confusing and > potentially astonishing to piggyback on "primitive-ness" or > "reference-ness" in a codes-like-a-class world. > > > (Controversial with me at least; I keep thinking who are these people > who can understand the rest of how to safely write non-locking > concurrent code yet would struggle with this?) So, the reason I'm being so dogmatic about this is that it undermines the belief that "immutable classes are always thread-safe".? I think the "objects vs values" distinction is too subtle; hiding atomicity behind primitive-ness is too subtle.? I can get behind saying "immutable classes are always thread-safe, unless they have been explicitly marked as non-atomic", because this is a clear indication that can't be confused for anything else. > >> Would I be right that we can achieve primitive unification even >> without B4? There is nothing wrong with our delivering many >> performance gains while leaving others on the table for later. 
> Yes, delivering primitive unification first means you can't have > flat Complex yet. > > > But they still get /often-flat/?Complex? Sounds like > /always-flat/?Complex is the perfect thing to punt on then. They'll get flattening on the stack, but the layout of a Complex[] will likely be an array of pointers for a long time, until some heroics kick in.? I don't necessarily have a problem with a phased delivery where flattening comes later, but I'll note, too, that this is where *most of the heap flattening win is* -- arrays of nontrivial numerics.? Because there are lots of them, and such code will likely iterate over the arrays plenty, doing small amounts of CPU work per element, and then stalling when the memory subsystem chokes on a cache miss.? So we can't punt for that long. From forax at univ-mlv.fr Mon May 9 17:34:01 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 9 May 2022 19:34:01 +0200 (CEST) Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> Message-ID: <1864439853.23284976.1652117641027.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "daniel smith" > Cc: "valhalla-spec-experts" > Sent: Sunday, May 8, 2022 6:32:09 PM > Subject: Re: User model stacking: current status > To track the progress of the spiral: > - We originally came up with the B2/B3 division to carve off B2 as the "safe > subset", where you get less flattening but nulls and more integrity. This > provided a safe migration target for existing VBCs, as well as a reasonable > target for creating new VBCs that want to be mostly class-like but enjoy some > additional optimization (and shed accidental identity for safety reasons.) > - When we put all the flesh on the bones of B2/B3, there were some undesirable > consequences, such as (a) tearing was too subtle, and (b) both the semantics > and cost model differences between B2/B3 were going to be hard to explain (and > in some cases, users have bad choices between semantics and performance.) > - A few weeks ago, we decided to more seriously consider separating atomicity > out as an explicit thing on its own. This had the benefit of putting semantics > first, and offered a clearer cost model: you could give up identity but keep > null-default and integrity (B2), further give up nulls to get some more density > (B3.val), and further further give up atomicity to get more flatness > (non-atomic B3.) This was honest, but led people to complain "great, now there > are four buckets." > - We explored making non-atomicity a cross-cutting concern, so there are two new > buckets (VBC and primitive-like), either of which can choose their atomicity > constraints, and then within the primitive-like bucket, the .val and .ref > projections differ only with respect to the consequences of nullity. This felt > cleaner (more orthogonal), but the notion of a non-atomic B2 itself is kind of > weird. > So where this brings us is back to something that might feel like the > four-bucket approach in the third bullet above, but with two big differences: > atomicity is an explicit property of a class, rather than a property of > reference-ness, and a B3.ref is not necessarily the same as a B2. This > recognizes that the main distinction between B2 or B3 is *whether a class can > tolerate its zero value.* > More explicitly: > - B1 remains unchanged > - B2 is for "ordinary" value-based classes. 
> Always atomic, always nullable, always reference; the only difference with B1
> is that it has shed its identity, enabling routine stack-based flattening, and
> perhaps some heap flattening depending on VM sophistication and heroics. B2 is
> a good target for migrating many existing value-based classes.
>
> - B3 means that a class can tolerate its zero (uninitialized) value, and
> therefore gives rise to two types, which we'll call B3.ref and B3.val. The
> former is a reference type and is therefore nullable and null-default; the
> latter is a direct/immediate/value type whose default is zero.
>
> - B3 classes can further be marked non-atomic; this unlocks greater flattening
> in the heap at the cost of tearing under race, and is suitable for classes
> without cross-field invariants. Non-atomicity accrues equally to B3.ref and
> B3.val; a non-atomic B3.ref still tears (and therefore might expose its zero
> under race, as per friday's discussions.)
>
> Syntactically (reminder: NOT an invitation to discuss syntax at this point),
> this might look like:
>
>     class B1 { }                   // identity, reference, atomic
>     value-based class B2 { }       // non-identity, reference, atomic
>     value class B3 { }             // non-identity, .ref and .val, both atomic
>     non-atomic value class B3 { }  // similar to B3, but both are non-atomic
>
> So, two new (but related) class modifiers, of which one has an additional
> modifier. (The spelling of all of these can be discussed after the user model
> is entirely nailed down.)
>
> So, there's a monotonic sequence of "give stuff up, get other stuff":
>
> - B2 gives up identity relative to B1, gains some flattening
> - B3 optionally gives up null-defaultness relative to B2, yielding two types,
>   one of which sheds some footprint
> - non-atomic B3 gives up atomicity relative to B3, gaining more flatness, for
>   both type projections

There is also something we should talk about: using non-atomic value classes does not automatically mean better performance. It's something I've discovered trying to implement HashMap (more precisely Map.of()) using value classes. Updating a value class in the heap requires more writes, more memory traffic, than just updating pointers, so depending on the algorithm you may see performance degradation compared to a pointer-based implementation.

So even if we provide non-atomic B3, performance can be worse than using atomic B3; sadly, gaining more flatness does not necessarily translate into better performance.

Rémi

Side note: using more pointers means more pressure on the GC, but it's very hard to quantify that pressure, so maybe overall the system is more performant; but given that JMH tests usually do not take GCs into account, it's an effect that we do not see.

From brian.goetz at oracle.com Mon May 9 17:46:17 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 9 May 2022 13:46:17 -0400
Subject: User model stacking: current status
In-Reply-To: <1864439853.23284976.1652117641027.JavaMail.zimbra@u-pem.fr>
References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <1864439853.23284976.1652117641027.JavaMail.zimbra@u-pem.fr>
Message-ID:

Yes, Doug posted some data a while back about sorting, where the breakeven between sorting references (and taking the indirection hit) and sorting values (and taking the "more memory movement" hit) was not obvious.

Flattening means ... flattening. Sometimes it means faster, but sometimes not. This is yet another reason why we should focus on providing semantic knobs, not "performance-labeled" knobs.
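(A rough model of the effect Remi describes, in today's Java without value classes; the class and helper names here are invented for illustration. Updating one element of a "flat" representation has to write all of the element's components back into the array, while the "boxed" representation stores a single new pointer -- cheaper stores, at the price of an allocation and a pointer chase on every read.)

    final class FlatVsBoxed {
        // Flat: the components of element i live inline at [2*i] and [2*i + 1].
        static void updateFlat(long[] flat, int i, long date, long time) {
            flat[2 * i]     = date;   // two 8-byte stores into the array
            flat[2 * i + 1] = time;
        }

        record DateTime(long date, long time) { }

        // Boxed: element i is a pointer to an immutable (date, time) box.
        static void updateBoxed(DateTime[] boxed, int i, long date, long time) {
            boxed[i] = new DateTime(date, time);  // allocate, then one pointer store
        }
    }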
On 5/9/2022 1:34 PM, Remi Forax wrote: > There is also something we should talk, using non-atomic value classes > does not mean automatically better performance. > It's something i've discovered trying to implement HashMap (more > Map.of() in fact) using value classes. > Updating a value class in the heap requires more writes, more memory > traffic than just updating pointers so depending on the algorithm, you > may see performance degradation compared to a pointer based > implementation. > > So even if we provide non-atomic B3, performance can be worst than > using atomic B3, sadly gaining more flatness does not necessarily > translate into better performance. From daniel.smith at oracle.com Mon May 9 20:47:19 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Mon, 9 May 2022 20:47:19 +0000 Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> Message-ID: > On May 9, 2022, at 10:10 AM, Kevin Bourrillion wrote: > >>> But seriously, we won't get away with pretending there are just 3 buckets if we do this. Let's be honest and call it B4. >> "Bucket" is a term that makes sense in language design, but need not flow into the user model. But yes, there really are three things that the user needs control over: identity, zero-friendliness, atomicity. If you want to call that four buckets, I won't argue. >> > I *am* of course only caring about the user model, and that's where I'm saying we would not get away with pretending this isn't a 4th kind of concrete class. Here's a presentation that doesn't feel to me like it's describing a menu with four choices: In Java, there are object references and there are primitives. For which kinds of values are you trying to declare a class? If object references: okay, do your objects need identity or not? If primitives: okay, do your primitives need atomicity or not? From brian.goetz at oracle.com Mon May 9 21:14:09 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 9 May 2022 17:14:09 -0400 Subject: Nullity (was: User model stacking: current status) In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> Message-ID: <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> Assuming the stacking here is satisfactory, let's talk about .ref and .val. Kevin made a strong argument for .ref as default, so let's pull on that string for a bit. Universal generics need a way to express .ref at least for type variables, so if we're going to make .ref the default, we still need a way to denote it.? Calling the types Foo.ref and Foo.val, where Foo is an alias for Foo.ref, is one way to achieve this. Now, let's step out onto some controversial territory: how do we spell .ref and .val?? Specifically, how good a fit is `!` and `?` (henceforth, emotional types), now that the B3 election is _solely_ about the existence of a zero-default .val?? (Before, it was a poor fit, but now it might be credible.? Yet another reason I wanted to tease apart what "primitive" meant into independent axes.) Pro: users think they really want emotional types. Pro: to the extent we eventually acquire full emotional types, and to the extent these align cleanly with primitive type projections, it avoids weirdnesses like `Foo.val?`, where there are two ways to talk about nullity. 
Con: These will surely not initially be the full emotional types users think they want, and so may well be met by "you idiots, these are not the emotional types we want".
Con: To the extent full emotional types do not align clearly with primitive type projections, we might be painted into a corner and it might be harder to do emotional types.

Risk: the language treatment of emotional types is one thing, but the real cost in introducing them into the language is annotating the libraries. Having them in the language but not annotating the libraries on a timely basis may well be a step backwards.

If we had full emotional types, some would have their non-nullity erased (`String!` erases to the same type descriptor as ordinary `String`) and some would have it reified (`Integer!` translates to a separate type, the `I` carrier.) This means that migrating `String` to `String!` might be binary-compatible, but `Integer` to `Integer!` would not be. (This is probably an acceptable asymmetry.)

But a bigger question is whether an erased `String!` should be backed up by a synthetic null check at the boundary between checked and unchecked code, such as method entry points (just as unpacking a T from a generic is backed up by a synthetic cast at the boundary between generic and explicit code.) This is reasonable (and cheap enough), but may be on a collision course with some interpretations of `String!`.

Initially, we probably would restrict the use of `!` to val-projections of primitive classes, but the pressure to extend it would always be just around the corner (e.g., having them in type patterns would likely address many people's initial discomfort about null handling in patterns).

My goal here is not to dive into the details of "let's design nullable types", as that would be a distraction at this point, so much as to gauge sentiment on whether this is worth exploring further, and to gather considerations I may have missed in this brief summary.

On 5/8/2022 12:32 PM, Brian Goetz wrote:
> To track the progress of the spiral:
>
> - We originally came up with the B2/B3 division to carve off B2 as
> the "safe subset", where you get less flattening but nulls and more
> integrity. This provided a safe migration target for existing VBCs,
> as well as a reasonable target for creating new VBCs that want to be
> mostly class-like but enjoy some additional optimization (and shed
> accidental identity for safety reasons.)
>
> - When we put all the flesh on the bones of B2/B3, there were some
> undesirable consequences, such as (a) tearing was too subtle, and (b)
> both the semantics and cost model differences between B2/B3 were going
> to be hard to explain (and in some cases, users have bad choices
> between semantics and performance.)
>
> - A few weeks ago, we decided to more seriously consider separating
> atomicity out as an explicit thing on its own. This had the benefit of
> putting semantics first, and offered a clearer cost model: you could
> give up identity but keep null-default and integrity (B2), further
> give up nulls to get some more density (B3.val), and further further
> give up atomicity to get more flatness (non-atomic B3.) This was
> honest, but led people to complain "great, now there are four buckets."
> > ?- We explored making non-atomicity a cross-cutting concern, so there > are two new buckets (VBC and primitive-like), either of which can > choose their atomicity constraints, and then within the primitive-like > bucket, the .val and .ref projections differ only with respect to the > consequences of nullity.? This felt cleaner (more orthogonal), but the > notion of a non-atomic B2 itself is kind of weird. > > So where this brings us is back to something that might feel like the > four-bucket approach in the third bullet above, but with two big > differences: atomicity is an explicit property of a class, rather than > a property of reference-ness, and a B3.ref is not necessarily the same > as a B2.? This recognizes that the main distinction between B2 or B3 > is *whether a class can tolerate its zero value.* > > More explicitly: > > ?- B1 remains unchanged > > ?- B2 is for "ordinary" value-based classes.? Always atomic, always > nullable, always reference; the only difference with B1 is that it has > shed its identity, enabling routine stack-based flattening, and > perhaps some heap flattening depending on VM sophistication and > heroics.? B2 is a good target for migrating many existing value-based > classes. > > ?- B3 means that a class can tolerate its zero (uninitialized) value, > and therefore gives rise to two types, which we'll call B3.ref and > B3.val.? The former is a reference type and is therefore nullable and > null-default; the latter is a direct/immediate/value type whose > default is zero. > > ?- B3 classes can further be marked non-atomic; this unlocks greater > flattening in the heap at the cost of tearing under race, and is > suitable for classes without cross-field invariants.? Non-atomicity > accrues equally to B3.ref and B3.val; a non-atomic B3.ref still tears > (and therefore might expose its zero under race, as per friday's > discussions.) > > Syntactically (reminder: NOT an invitation to discuss syntax at this > point), this might look like: > > ??? class B1 { }??????????????? // identity, reference, atomic > > ??? value-based class B2 { }??? // non-identity, reference, atomic > > ??? value class B3 { }????????? // non-identity, .ref and .val, both > atomic > > ??? non-atomic value class B3 { }? // similar to B3, but both are > non-atomic > > So, two new (but related) class modifiers, of which one has an > additional modifier.? (The spelling of all of these can be discussed > after the user model is entirely nailed down.) > > So, there's a monotonic sequence of "give stuff up, get other stuff": > > ?- B2 gives up identity relative to B1, gains some flattening > ?- B3 optionally gives up null-defaultness relative to B2, yielding > two types, one of which sheds some footprint > ?- non-atomic B3 gives up atomicity relative to B3, gaining more > flatness, for both type projections > > > > > > > On 5/6/2022 10:04 AM, Brian Goetz wrote: >> Thinking more about Dan's concerns here ... >> >> On 5/5/2022 6:00 PM, Dan Smith wrote: >>> This is significant because the primary reason to declare a B2 >>> rather than a B3 is to guarantee that the all-zeros value cannot be >>> created. >> >> This is a little bit of a circular argument; it takes a property that >> an atomic B2 has, but a non-atomic B2 lacks, and declares that to be >> "the whole point" of B2.? It may be that exposure of the zero is so >> bad we may eventually want to back away from the idea, but let's come >> up with a fair picture of what a non-atomic B2 means, and ask if >> that's sufficiently useful. 
>> >>> This leads me to conclude that if you're declaring a non-atomic B2, >>> you might as well just declare a non-atomic B3. >> >> Fair point, but let's pull on this string for a moment.? Suppose I >> want a null-default, flattenable value, and I'm willing to take the >> tearing to get there.? So you're saying "then declare a B3 and use >> B3.ref".? But B3.ref was supposed to have the same semantics as an >> equivalent B2!? (I realize I'm doing the same thing I just accused >> you of above -- taking an old invariant and positiioning it as "the >> point".? Stay tuned.)? Which means either that we lose flattening, >> again, or we create yet another asymmetry between B3.ref and B2. >> Maybe you're saying that the combination of nullable and full-flat is >> just too much to ask, but I am not sure it is; in any case, let's >> convince ourselves of this before we rule it out. >> >> Or maybe, what you're saying is that my claim that B3.ref and B2 are >> the same thing is the stale thing here, and we can let it go and get >> it back in another form.? In which case you're positing a model where: >> >> ?- B1 is unchanged >> ?- B2 is always atomic, reference, nullable >> ?- B3 really means "the zero is OK", comes with .ref and .val, and >> (non-atomic B3).ref is still tearable? >> >> In this model, (non-atomic B3).ref takes the place of (non-atomic B2) >> in the stacking I've been discussing.? Is that what you're saying? >> >> ??? class B1 { }? // ref, identity, atomic >> ??? value-based class B2 { }? // ref, non-identity, atomic >> ??? [ non-atomic ] value class B3 { }? // ref or val, zero is ok, >> both projections share atomicity >> >> If we go with ref-default, then this is a small leap from yesterday's >> stacking, because "B3" and "B2" are both reference types, so if you >> want a tearable, non-atomic reference type, saying `non-atomic value >> class B3` and then just using B3 gets you that. Then: >> >> ?- B2 is like B1, minus identity >> ?- B3 means "uninitialized values are OK, you get two types, a >> zero-default and a non-default" >> ?- Non-atomicity is an extra property we can add to B3, to get more >> flattening in exchange for less integrity >> ?- The use cases for non-atomic B2 are served by non-atomic B3 (when >> .ref is the default) >> >> I think this still has the properties I want; I can freely choose the >> reasonable subsets of { identity, has-zero, nullable, atomicity } >> that I want; the orthogonality of non-atomic across buckets becomes >> orthogonality of non-atomic with nullity, and the "B3.ref is just >> like B2" is shown to be the "false friend." >> >> > From mariell.hoversholm at paf.com Wed May 11 12:47:52 2022 From: mariell.hoversholm at paf.com (Mariell Hoversholm) Date: Wed, 11 May 2022 14:47:52 +0200 Subject: Nullity (was: User model stacking: current status) In-Reply-To: <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> Message-ID: Hi, Would nullable types be `T?`? This is what I've inferred, but would appreciate it being made explicit. I will continue with this assumption in the rest of my answer. I personally very much enjoy Kotlin's and Rust's forced nullity. I believe a clear majority of the other users of these languages do the same. Because of this, I would absolutely encourage you/another team to go down the path of considering designing and implementing nullness as part of the type-system. 
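(To make the assumption above explicit, a tiny illustration using the `?`/`!` spellings from the previous mail, written entirely as comments because this syntax is purely hypothetical at this point and accepted by no compiler:)

    // Hypothetical "emotional type" spellings, assuming `T?` / `T!` as discussed:
    // String?    name  = null;               // nullable reference, as today
    // String!    label = "x";                // non-nullable; erased, null-checked at boundaries
    // LocalDate! date  = LocalDate.now();    // non-nullable value-based class, no zero exposed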
Regarding changing the types to be `.ref` by default: I think this would be a beneficial change with regards to the current behaviour of types in Java (i.e. `.ref` being the only option, and have primitives be exceptions to this rule). This could potentially lead to some smaller mess-ups in the future, however as I would _imagine_ most developers would like the benefits of `.val` types in most instances, but they may forget. To put the `.val` case into a perspective of how easy it could become to forget: if you have a game where you need positions, you could have e.g. `Pos3d` (let's model this as `primitive record Pos3d(double x, double y, double z)` for completeness' sake). This is a light type that would be defaulted to being stored on the heap. Larger games will require you to write `.val` everywhere, which may easily be forgotten in hot code. Given the possibility of `Optional.val`, we could potentially be missing a good practice here. It would(/could) be much cheaper to allocate an `Optional.val!` than it would be to allocate a `Pos3d.ref?`. Please also note that I have avoided the topic of binary- and source-compatibility in this entirely; they may very much be important aspects to consider, given the defaults would change existing code, even in the JDK. I'm not sure of how much help I am to your gauging interest, but hope it could, at the very least, be a small indication of how users of other languages may find the ideas brought up. Kind regards, Mariell Hoversholm (she/her) On Mon, 9 May 2022 at 23:14, Brian Goetz wrote: > Assuming the stacking here is satisfactory, let's talk about .ref and .val. > > Kevin made a strong argument for .ref as default, so let's pull on that > string for a bit. > > Universal generics need a way to express .ref at least for type > variables, so if we're going to make .ref the default, we still need a > way to denote it. Calling the types Foo.ref and Foo.val, where Foo is > an alias for Foo.ref, is one way to achieve this. > > > > Now, let's step out onto some controversial territory: how do we spell > .ref and .val? Specifically, how good a fit is `!` and `?` (henceforth, > emotional types), now that the B3 election is _solely_ about the > existence of a zero-default .val? (Before, it was a poor fit, but now > it might be credible. Yet another reason I wanted to tease apart what > "primitive" meant into independent axes.) > > Pro: users think they really want emotional types. > Pro: to the extent we eventually acquire full emotional types, and to > the extent these align cleanly with primitive type projections, it > avoids weirdnesses like `Foo.val?`, where there are two ways to talk > about nullity. > > Con: These will surely not initially be the full emotional types users > think they want, and so may well be met by "you idiots, these are not > the emotional types we want" > Con: To the extent full emotional types do not align clearly with > primitive type projections, we might be painted into a corner and it > might be harder to do emotional types. > > Risk: the language treatment of emotional types is one thing, but the > real cost in introducing them into the language is annotating the > libraries. Having them in the language but not annotating the libraries > on a timely basis may well be a step backwards. > > > If we had full emotional types, some would have their non-nullity erased > (`String!` erases to the same type descriptor as ordinary `String`) and > some would have it reified (Integer! translates to a separate type, the > `I` carrier.) 
This means that migrating `String` to `String` might be > binary-compatible, but `Integer` to `Integer!` would not be. (This is > probably an acceptable asymmetry.) > > But a bigger question is whether an erased `String!` should be backed up > by a synthetic null check at the boundary between checked and unchecked > code, such as method entry points (just as unpacking a T from a generic > is backed up by a synthetic cast at the boundary between generic and > explicit code.) This is reasonable (and cheap enough), but may be on a > collision course with some interpretations of `String!`. > > Initially, we probably would restrict the use of `!` to val-projections > of primitive classes, but the pressure to extend it would always be just > around the corner (e.g., having them in type patterns would likely > address many people's initial discomfort about null handling in patterns). > > > > My goal here is not to dive into the details of "let's design nullable > types", as that would be a distraction at this point, as much as to > gauge sentiment on whether this is worth exploring further, and gather > considerations I may have missed in this brief summary. > > > On 5/8/2022 12:32 PM, Brian Goetz wrote: > > To track the progress of the spiral: > > > > - We originally came up with the B2/B3 division to carve off B2 as > > the "safe subset", where you get less flattening but nulls and more > > integrity. This provided a safe migration target for existing VBCs, > > as well as a reasonable target for creating new VBCs that want to be > > mostly class-like but enjoy some additional optimization (and shed > > accidental identity for safety reasons.) > > > > - When we put all the flesh on the bones of B2/B3, there were some > > undesirable consequences, such as (a) tearing was too subtle, and (b) > > both the semantics and cost model differences between B2/B3 were going > > to be hard to explain (and in some cases, users have bad choices > > between semantics and performance.) > > > > - A few weeks ago, we decided to more seriously consider separating > > atomicity out as an explicit thing on its own. This had the benefit of > > putting semantics first, and offered a clearer cost model: you could > > give up identity but keep null-default and integrity (B2), further > > give up nulls to get some more density (B3.val), and further further > > give up atomicity to get more flatness (non-atomic B3.) This was > > honest, but led people to complain "great, now there are four buckets." > > > > - We explored making non-atomicity a cross-cutting concern, so there > > are two new buckets (VBC and primitive-like), either of which can > > choose their atomicity constraints, and then within the primitive-like > > bucket, the .val and .ref projections differ only with respect to the > > consequences of nullity. This felt cleaner (more orthogonal), but the > > notion of a non-atomic B2 itself is kind of weird. > > > > So where this brings us is back to something that might feel like the > > four-bucket approach in the third bullet above, but with two big > > differences: atomicity is an explicit property of a class, rather than > > a property of reference-ness, and a B3.ref is not necessarily the same > > as a B2. This recognizes that the main distinction between B2 or B3 > > is *whether a class can tolerate its zero value.* > > > > More explicitly: > > > > - B1 remains unchanged > > > > - B2 is for "ordinary" value-based classes. 
Always atomic, always > > nullable, always reference; the only difference with B1 is that it has > > shed its identity, enabling routine stack-based flattening, and > > perhaps some heap flattening depending on VM sophistication and > > heroics. B2 is a good target for migrating many existing value-based > > classes. > > > > - B3 means that a class can tolerate its zero (uninitialized) value, > > and therefore gives rise to two types, which we'll call B3.ref and > > B3.val. The former is a reference type and is therefore nullable and > > null-default; the latter is a direct/immediate/value type whose > > default is zero. > > > > - B3 classes can further be marked non-atomic; this unlocks greater > > flattening in the heap at the cost of tearing under race, and is > > suitable for classes without cross-field invariants. Non-atomicity > > accrues equally to B3.ref and B3.val; a non-atomic B3.ref still tears > > (and therefore might expose its zero under race, as per friday's > > discussions.) > > > > Syntactically (reminder: NOT an invitation to discuss syntax at this > > point), this might look like: > > > > class B1 { } // identity, reference, atomic > > > > value-based class B2 { } // non-identity, reference, atomic > > > > value class B3 { } // non-identity, .ref and .val, both > > atomic > > > > non-atomic value class B3 { } // similar to B3, but both are > > non-atomic > > > > So, two new (but related) class modifiers, of which one has an > > additional modifier. (The spelling of all of these can be discussed > > after the user model is entirely nailed down.) > > > > So, there's a monotonic sequence of "give stuff up, get other stuff": > > > > - B2 gives up identity relative to B1, gains some flattening > > - B3 optionally gives up null-defaultness relative to B2, yielding > > two types, one of which sheds some footprint > > - non-atomic B3 gives up atomicity relative to B3, gaining more > > flatness, for both type projections > > > > > > > > > > > > > > On 5/6/2022 10:04 AM, Brian Goetz wrote: > >> Thinking more about Dan's concerns here ... > >> > >> On 5/5/2022 6:00 PM, Dan Smith wrote: > >>> This is significant because the primary reason to declare a B2 > >>> rather than a B3 is to guarantee that the all-zeros value cannot be > >>> created. > >> > >> This is a little bit of a circular argument; it takes a property that > >> an atomic B2 has, but a non-atomic B2 lacks, and declares that to be > >> "the whole point" of B2. It may be that exposure of the zero is so > >> bad we may eventually want to back away from the idea, but let's come > >> up with a fair picture of what a non-atomic B2 means, and ask if > >> that's sufficiently useful. > >> > >>> This leads me to conclude that if you're declaring a non-atomic B2, > >>> you might as well just declare a non-atomic B3. > >> > >> Fair point, but let's pull on this string for a moment. Suppose I > >> want a null-default, flattenable value, and I'm willing to take the > >> tearing to get there. So you're saying "then declare a B3 and use > >> B3.ref". But B3.ref was supposed to have the same semantics as an > >> equivalent B2! (I realize I'm doing the same thing I just accused > >> you of above -- taking an old invariant and positiioning it as "the > >> point". Stay tuned.) Which means either that we lose flattening, > >> again, or we create yet another asymmetry between B3.ref and B2. 
> >> Maybe you're saying that the combination of nullable and full-flat is > >> just too much to ask, but I am not sure it is; in any case, let's > >> convince ourselves of this before we rule it out. > >> > >> Or maybe, what you're saying is that my claim that B3.ref and B2 are > >> the same thing is the stale thing here, and we can let it go and get > >> it back in another form. In which case you're positing a model where: > >> > >> - B1 is unchanged > >> - B2 is always atomic, reference, nullable > >> - B3 really means "the zero is OK", comes with .ref and .val, and > >> (non-atomic B3).ref is still tearable? > >> > >> In this model, (non-atomic B3).ref takes the place of (non-atomic B2) > >> in the stacking I've been discussing. Is that what you're saying? > >> > >> class B1 { } // ref, identity, atomic > >> value-based class B2 { } // ref, non-identity, atomic > >> [ non-atomic ] value class B3 { } // ref or val, zero is ok, > >> both projections share atomicity > >> > >> If we go with ref-default, then this is a small leap from yesterday's > >> stacking, because "B3" and "B2" are both reference types, so if you > >> want a tearable, non-atomic reference type, saying `non-atomic value > >> class B3` and then just using B3 gets you that. Then: > >> > >> - B2 is like B1, minus identity > >> - B3 means "uninitialized values are OK, you get two types, a > >> zero-default and a non-default" > >> - Non-atomicity is an extra property we can add to B3, to get more > >> flattening in exchange for less integrity > >> - The use cases for non-atomic B2 are served by non-atomic B3 (when > >> .ref is the default) > >> > >> I think this still has the properties I want; I can freely choose the > >> reasonable subsets of { identity, has-zero, nullable, atomicity } > >> that I want; the orthogonality of non-atomic across buckets becomes > >> orthogonality of non-atomic with nullity, and the "B3.ref is just > >> like B2" is shown to be the "false friend." > >> > >> > > > -- *Mariell Hoversholm *(she/her) Software Developer Integrations (Slack #integration-team-public) Paf Mobile: +46 73 329 40 18 Br?ddgatan 11 SE602 22, Norrk?ping Sweden *Working remote from Uppsala* This email is confidential and may contain legally privileged information. If you are not the intended recipient, please contact the sender and delete the email from your system without producing, distributing or retaining copies thereof. Thank you. From brian.goetz at oracle.com Wed May 11 12:53:10 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 11 May 2022 08:53:10 -0400 Subject: Nullity (was: User model stacking: current status) In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> Message-ID: <25c654c1-2f8d-ffed-b289-cbe34dd392f9@oracle.com> > I'm not sure of how much help I am to your gauging interest, but hope > it could, > at the very least, be a small indication of how users of other > languages may > find the ideas brought up. Oh, trust me, we're well aware that millions of Java developers would jump for joy -- at least, initially -- if we pulled this trigger. My concern is what they do _after_ that.? Giving people something that looks superficially like something they think they like from another language does not always create lasting joy.? What happens in the first five minutes is much less important than what happens in the following ten years. 
From mariell.hoversholm at paf.com Wed May 11 13:32:00 2022 From: mariell.hoversholm at paf.com (Mariell Hoversholm) Date: Wed, 11 May 2022 15:32:00 +0200 Subject: Nullity (was: User model stacking: current status) In-Reply-To: <25c654c1-2f8d-ffed-b289-cbe34dd392f9@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> <25c654c1-2f8d-ffed-b289-cbe34dd392f9@oracle.com> Message-ID: I would also like to bring up the fact that this could already be possible in the long term (though not as baked into the language as the idea at hand). Currently, the ecosystem has many different libraries for nullity annotations; this ranges from JetBrains Annotations to CheckerQual (with annotation processing to find bugs; same exists with NullAway and Google Find Bugs) to Spring/Quarkus. While we do not have the language feature, code is commonly annotated with annotations instead, e.g.: @NonNull String getSomeValue(); @Nullable Integer fetchSomething(@Nullable String reference); and with tools such as lombok, we can even get automatic null-checks on these annotations. This has been underway for a while now (at least a few years). However, I think only recently (~1-2 yrs ago) did proper large projects pick them up (referring to Spring and Guava). I do not see much of a reason for this to change, given it increases safety in code, but we may all be surprised there. That being said, I fully understand the implications any such changes would have on the language: it would be an official, set-in-stone solution to an issue we haven't seen culminate for a significant time period. Different languages, libraries, and tools come with different solutions constantly, featuring different helpers: the Elvis operator, `?:` (Kotlin, Groovy) or `??` (C#) is a big one, along with the null-safe accessor, `?.`. Perhaps it would be a good idea to bring up a separate language topic for these solutions as serious ideas, as opposed to wild speculation? My apologies if anything is incoherent; I've had a full day already :-). Cheers. On Wed, 11 May 2022 at 14:53, Brian Goetz wrote: > > > > I'm not sure of how much help I am to your gauging interest, but hope > > it could, > > at the very least, be a small indication of how users of other > > languages may > > find the ideas brought up. > > Oh, trust me, we're well aware that millions of Java developers would > jump for joy -- at least, initially -- if we pulled this trigger. > > My concern is what they do _after_ that. Giving people something that > looks superficially like something they think they like from another > language does not always create lasting joy. What happens in the first > five minutes is much less important than what happens in the following > ten years. > > > From kevinb at google.com Thu May 12 01:45:23 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 11 May 2022 18:45:23 -0700 Subject: Nullity (was: User model stacking: current status) In-Reply-To: <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> Message-ID: On Mon, May 9, 2022 at 2:14 PM Brian Goetz wrote: Now, let's step out onto some controversial territory: how do we spell .ref > and .val? Specifically, how good a fit is `!` and `?` (henceforth, > emotional types), now that the B3 election is _solely_ about the > existence of a zero-default .val? (Before, it was a poor fit, but now it > might be credible. 
Yet another reason I wanted to tease apart what > "primitive" meant into independent axes.) > I'm certainly open to the possibility that `?` or `!` can help us out here. But the only way it can fly is if it is clearly a stepping stone toward a proper nullable-types feature. We would not want to get stuck here. That unfortunately forces us to have some clear idea how we would want/expect such a feature to look and work. My goal here is not to dive into the details of "let's design nullable > types", as that would be a distraction at this point, > (out of order reply) Well... I'm sorry for what follows, then. I think there is no way to know whether current proposals would be painting ourselves into a corner unless we explore the topic a bit. Here is my current concept of this beast: * bare `String` means what it always has ("String of ambiguous nullness") * `String!` indicates "an actual string" (I don't like to say "a non-null string" because *null is not a string!*) * `String?` indicates "a string or null". * `!` and `?` also work to project a type variable in either direction. * Exclamation fatigue would be very real, so assume there is some way to make `!` the default for some scope * javac (possibly behind a flag) would need to treat `?` and a suitably-blessed `@Nullable` identically, and same for `!` and `@NonNull`; there is just no way to survive a transition otherwise Enter Valhalla: * (Let's say we have B1, B2a/B3a (atomic), and B2b/B3b ("b"reakable?)) * On a B3 value type like `int`, `?` would be nonsense and `!` redundant. * That's equally true of a B3 value type spelled `Complex.val` (if such a thing exists). * (assuming `Complex` is ref-default) all three of `Complex`, `Complex?`, and `Complex!` have valid and distinct meanings. Now, imagining that we reached this point... would B3a/B3b (as a language-level thing) become immediately vestigial?. With Complex as a B2a or B2b, would `Complex!` ever not optimize to the B3-like *implementation*? I think the (standard) primitives could be understood as B2 themselves, with `int` just being an alias for `Integer!`. Obviously, if it would become vestigial, then we should try to avoid ever having it all, by simply :-) delaying it and solving B2-then-nullness. Pro: users think they really want emotional types. > Quibble: nah, we *know* we want them... > Con: These will surely not initially be the full emotional types users > think they want, and so may well be met by "you idiots, these are not the > emotional types we want" > We don't have to worry about this if we have a good story that it's a stepping stone. The stepping stone could be that it just doesn't work for B1 types yet. I would say that there's a moral hazard that people might choose B2 just to get that... but since that only happens if they don't *need* identity... we'd like them to do that anyway! > Con: To the extent full emotional types do not align clearly with > primitive type projections, we might be painted into a corner and it might > be harder to do emotional types. > I'm questioning whether we would need primitive type projections at all, just nullable/non-null type projections. > Risk: the language treatment of emotional types is one thing, but the real > cost in introducing them into the language is annotating the libraries. > Having them in the language but not annotating the libraries on a timely > basis may well be a step backwards. > For a while we'd only have to annotate as we migrate B1 -> B2. 
And it can be automated to a significant degree, more than halfway I think. If we had full emotional types, some would have their non-nullity erased > (`String!` erases to the same type descriptor as ordinary `String`) and > some would have it reified (Integer! translates to a separate type, the `I` > carrier.) This means that migrating `String` to `String` might be > binary-compatible, but `Integer` to `Integer!` would not be. (This is > probably an acceptable asymmetry.) > Agree acceptable. > But a bigger question is whether an erased `String!` should be backed up > by a synthetic null check at the boundary between checked and unchecked > code, such as method entry points (just as unpacking a T from a generic is > backed up by a synthetic cast at the boundary between generic and explicit > code.) This is reasonable (and cheap enough), but may be on a collision > course with some interpretations of `String!`. > There seem to be a continuum of approaches from "more checking/less pollution" to "more pollution/problems get found far from where they really happened." The generics experience was that few people bothered to use `checkedCollection()`, and I doubt many added type checks via bytecode either, and it all worked well enough, buuut there are a few reasons for that that don't translate to null. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Thu May 12 12:22:06 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 12 May 2022 08:22:06 -0400 Subject: Nullity (was: User model stacking: current status) In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> Message-ID: <3171fc30-6a77-3d20-1eeb-5d4904326e49@oracle.com> > Here is my current concept of this beast: > The next installment of this is: how does assignment and conversion work?? Presumably, it starts with: ?- there is a null-discarding conversion from T? to T! (this is a narrowing conversion) ?- there is a nullability-injecting conversion from T! to T? (this is a widening conversion) and then we get to decide: which conversions are allowed in assignment context?? Clearly a nullability-injecting conversion is OK here (assigning String! to String? is clearly OK, it's a widening), so the question is: how do you go from `T?` to `T!` ? Options include: ?- it's like unboxing, let the assignment through, perhaps with a warning, and NPE if it fails ?- require a narrowing cast > Enter Valhalla: > > * (Let's say we have B1, B2a/B3a (atomic), and B2b/B3b ("b"reakable?)) > * On a B3 value type like `int`, `?` would be nonsense and `!` redundant. > * That's equally true of a B3 value type spelled `Complex.val` (if > such a thing exists). > * (assuming `Complex` is ref-default) all three of `Complex`, > `Complex?`, and `Complex!` have valid and distinct meanings. If we have both .val and nullity annotations I think we are losing. The idea here would be that B3.val *is literally spelled* `B3!`. The declaration story is unchanged: class B1 / value-based class B2 / [ non-atomic ] value class B3, for some suitable spellings. > Now, imagining that we reached this point... would B3a/B3b (as a > language-level thing) become immediately vestigial?. Unfortunately not.? We need permission to unleash the zero-default type, because many B2 types (e.g., LocalDate) have no good zero.? So B3 is needed to unlock that. > With Complex as a B2a or B2b, would `Complex!` ever not optimize to > the B3-like *implementation*? 
I think the (standard) primitives could > be understood as B2 themselves, with?`int` just being an alias for > `Integer!`. A B3: ??? value class Integer { ... } // int is alias for Integer! What this short discussion has revealed is that there really are two interpretations of non-null here: ?- In the traditional cardinality-based interpretation, T! means: "a reference, but it definitely holds an instance of T, so you better have initialized it properly" ?- In the B3 interpretation, it means: "the zero (uninitialized, not-run-through-the-ctor) value is a valid value, so you don't need to have initialized it." > Con: To the extent full emotional types do not align clearly with > primitive type projections, we might be painted into a corner and > it might be harder to do emotional types. > > > I'm questioning whether we would need primitive type projections at > all, just nullable/non-null type projections. Indeed, that was the point of my query. From kevinb at google.com Thu May 12 15:25:52 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 12 May 2022 08:25:52 -0700 Subject: Nullity (was: User model stacking: current status) In-Reply-To: <3171fc30-6a77-3d20-1eeb-5d4904326e49@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> <3171fc30-6a77-3d20-1eeb-5d4904326e49@oracle.com> Message-ID: On Thu, May 12, 2022 at 5:22 AM Brian Goetz wrote: > - there is a nullability-injecting conversion from T! to T? (this is a > widening conversion) > I think we'd expect full subtyping here, right? It needs to work for covariant arrays, covariant returns, type argument bounds, etc. > and then we get to decide: which conversions are allowed in assignment > context? Clearly a nullability-injecting conversion is OK here (assigning > String! to String? is clearly OK, it's a widening), so the question is: how > do you go from `T?` to `T!` ? Options include: > > - it's like unboxing, let the assignment through, perhaps with a warning, > and NPE if it fails > - require a narrowing cast > Yes, I do think we want a cast there (a special operator for it is very helpful so you don't have to repeat the base type), but as far as I know the case could be made either way for error vs. warning if the cast isn't there. Enter Valhalla: > > * (Let's say we have B1, B2a/B3a (atomic), and B2b/B3b ("b"reakable?)) > * On a B3 value type like `int`, `?` would be nonsense and `!` redundant. > * That's equally true of a B3 value type spelled `Complex.val` (if such a > thing exists). > * (assuming `Complex` is ref-default) all three of `Complex`, `Complex?`, > and `Complex!` have valid and distinct meanings. > > > If we have both .val and nullity annotations I think we are losing. The > idea here would be that B3.val *is literally spelled* `B3!`. The > declaration story is unchanged: class B1 / value-based class B2 / [ > non-atomic ] value class B3, for some suitable spellings. > I have tried to write the above to account for *that* possibility and for the subtly different possibility that you don't "spell .val" at all, you just express your nullability needs and the system optimizes to a value type when it can. Now, imagining that we reached this point... would B3a/B3b (as a > language-level thing) become immediately vestigial?. > > Unfortunately not. We need permission to unleash the zero-default type, > because many B2 types (e.g., LocalDate) have no good zero. So B3 is needed > to unlock that. 
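Concretely, the distinction you're drawing is something like this (plain records below are just stand-ins for the candidate value classes; the names are made up):

    // All-zeros is a perfectly reasonable Complex, so a zero-default .val is tolerable:
    record Complex(double re, double im) { }      // the default would decode to 0.0 + 0.0i

    // All-zeros is not a date at all, which is why LocalDate-like classes want B2:
    record Date(int year, int month, int day) { } // the default would decode to year 0, month 0, day 0

    public class ZeroDefaults {
        public static void main(String[] args) {
            System.out.println(new Complex(0.0, 0.0)); // Complex[re=0.0, im=0.0]
            System.out.println(new Date(0, 0, 0));     // Date[year=0, month=0, day=0] -- nonsense
        }
    }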
> (Sorry to be a skipping record, but *no* type has a great default value. It's just about tolerable levels of badness. We tolerate `long` and will tolerate `ulong` because we're habituated to it. At best it is sometimes a tiny convenience. It's never exactly an *advantage* to be unable to distinguish whether a variable was ever initialized.) But, suppose the *class* is identifiable in some way as friendly to that default value. I'm still struggling to think through whether we also strictly need to have something at the use site equivalent to `.val`. Or if just knowing the nullness bit is enough. It may be fundamentally the same question you're asking; I'm not sure. What this short discussion has revealed is that there really are two > interpretations of non-null here: > > - In the traditional cardinality-based interpretation, T! means: "a > reference, but it definitely holds an instance of T, so you better have > initialized it properly" > - In the B3 interpretation, it means: "the zero (uninitialized, > not-run-through-the-ctor) value is a valid value, so you don't need to have > initialized it." > I'm not sure these are that different. I think that as types they are the same. It's the conjuring of default values, specifically, that differs: we can do it for B2, B3, and B3!, and we don't know how to find one for B2!. But that's not a complication, it's just precisely what we're saying B2 exists for: to stop that from happening. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Thu May 12 15:59:28 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 12 May 2022 11:59:28 -0400 Subject: Nullity (was: User model stacking: current status) In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> <3171fc30-6a77-3d20-1eeb-5d4904326e49@oracle.com> Message-ID: > On Thu, May 12, 2022 at 5:22 AM Brian Goetz > wrote: > > ?- there is a nullability-injecting conversion from T! to T? (this > is a widening conversion) > > > I think we'd expect full subtyping here, right? It needs to work for > covariant arrays, covariant returns, type argument bounds, etc. There's two questions here, one at the language level and one at the VM level. At the VM level, `I` is not going to be a subtype of `LInteger`.? At the language level, we have a choice of whether to use subtyping or widening conversions, but given that the VM is expecting a widening conversion, it is probably better to align to that.? (Similarly, the distinction between int and Integer in overload selection is based on the assumption that they are not subtypes, but instead related by conversions.) So while abstractly, the value sets may form subsets, which says that at least _structurally_ they are subtypes, we try to avoid having subtype relationships between things that use different representations, because it creates difficult seams in translation, and lean on conversion machinery instead. In practice, the distinction between "int widens to long" and "int <: long" is not particularly visible, except in corner cases like "boxing is allowed in loose invocation contexts but not strict invocation contexts." > > and then we get to decide: which conversions are allowed in > assignment context?? Clearly a nullability-injecting conversion is > OK here (assigning String! to String? is clearly OK, it's a > widening), so the question is: how do you go from `T?` to `T!` ?? 
> Options include: > > ?- it's like unboxing, let the assignment through, perhaps with a > warning, and NPE if it fails > ?- require a narrowing cast > > > Yes, I do think we want a cast there (a special operator for it is > very helpful so you don't have to repeat the base type), but as far as > I know the case could be made either way for error vs. warning if the > cast isn't there. This is the decision point I want to highlight; while one might at first assume "well obviously you should explicitly convert", there are actually more choices than the obvious one, and it is a decision that should be made deliberately. > But, suppose the *class* is identifiable in some way as friendly to > that default value. I'm still struggling to think through whether we > also strictly need to have something at the use site equivalent to > `.val`. Or if just knowing the nullness bit is enough. It may be > fundamentally the same question you're asking; I'm not sure. I think we may be saying the same thing.? It is a declaration-site property as to whether we want to tolerate uninitialized values.? We do for int; we probably also do for Complex, not only because "its a number and the existing numbers work that way", but because there's a performance tradeoff, which is that being intolerant of uninitialized values has a footprint cost, and effectively doubling the size of a flat `Complex[]` will not be appreciated. For such a zero-tolerant class, there is still room to make the choice at the use site which flavor you want.? One positive consequence of having decomplected atomicity from { nullity, primitive-ness } is that it becomes *possible* to spell this distinction with emotional sigils, rather than some weirder thing (e.g., .val.) > > What this short discussion has revealed is that there really are > two interpretations of non-null here: > > ?- In the traditional cardinality-based interpretation, T! means: > "a reference, but it definitely holds an instance of T, so you > better have initialized it properly" > ?- In the B3 interpretation, it means: "the zero (uninitialized, > not-run-through-the-ctor) value is a valid value, so you don't > need to have initialized it." > > > I'm not sure these are that different. I think that as types they are > the same. It's the conjuring of default values, specifically, that > differs: we can do it for B2, B3, and B3!, and we don't know how to > find one for B2!. But that's not a complication, it's just precisely > what we're saying B2 exists for: to stop that from happening. > This question is at the heart of this sub-thread. I think what you are saying is that for ref-only classes (B1 and B2), then T! is a _restriction_ type (which we will probably erase to the erasure of T), whereas for for zero-capable classes (B3), then `T!` is a true projection which makes the null value *unrepresentable*, and that you're OK with that. From daniel.smith at oracle.com Thu May 12 17:07:08 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Thu, 12 May 2022 17:07:08 +0000 Subject: Nullity (was: User model stacking: current status) In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> Message-ID: <66340928-4EB3-44BF-A72A-1DBEB5AED375@oracle.com> > On May 11, 2022, at 7:45 PM, Kevin Bourrillion wrote: > > * `String!` indicates "an actual string" (I don't like to say "a non-null string" because *null is not a string!*) The thread talks around this later, but... 
what do I get initially if I declare a field/array component of type 'String!'? I think in most approaches this would end up being a warning, with the field/array erased to LString and storing a null. (Alternatively, we build 'String!' into the JVM, and I think that has to come with "uninitialized" detection on reads. We talked through that strategy quite a bit in the context of B2 before settling on "just use 'null'".) So this is potentially a fundamental difference between String! and Point!: 'new String![5]' and 'new Point![5]' give you very different arrays. > * Exclamation fatigue would be very real, so assume there is some way to make `!` the default for some scope +1 Yes, I think it's a dead end to expect users to sprinkle '!' everywhere they don't want nulls?this is usually the informal default in common programming practice, so we need some way to enable flipping the default. Lesson for B3: if B3! is primarily meant to be interpreted as a null-free type, people will naturally want to use that null-free type everywhere, and will want it to be default. (Reference default makes more sense where you generally want to use the nullable type, and only occasionally will opt in to the value type, probably for reasons other than whether 'null' is semantically meaningful.) Also, a danger for B3 is that a rather casual flipping of defaults doesn't just affect compiler behavior?it changes the initial value and possibly atomicity of a field/array. So a little more scary for a random switch somewhere to change all your 'Point' usages from ref-default to val-default. From brian.goetz at oracle.com Thu May 12 17:17:53 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 12 May 2022 13:17:53 -0400 Subject: Nullity (was: User model stacking: current status) In-Reply-To: <66340928-4EB3-44BF-A72A-1DBEB5AED375@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> <66340928-4EB3-44BF-A72A-1DBEB5AED375@oracle.com> Message-ID: <98276008-57ed-5eb7-539e-2b966d7bde34@oracle.com> >> * Exclamation fatigue would be very real, so assume there is some way to make `!` the default for some scope > +1 > > Yes, I think it's a dead end to expect users to sprinkle '!' everywhere they don't want nulls?this is usually the informal default in common programming practice, so we need some way to enable flipping the default. On the other hand, this is on a collision course with Kevin's "ref-default" recommendation, which had many strong supporting reasons, whether this is spelled `!` or `.val`.? The "but it will be tiring for people to type" doesn't feel like a very good reason to flip the default from something that has such strong objective justifications. (Dan was never sold on ref-default, but Kevin was, so I'll leave it to him to reconcile "ref-default is the right default" with "but, exclamation fatigue.") From kevinb at google.com Thu May 12 22:14:02 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 12 May 2022 15:14:02 -0700 Subject: Nullity (was: User model stacking: current status) In-Reply-To: <98276008-57ed-5eb7-539e-2b966d7bde34@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> <66340928-4EB3-44BF-A72A-1DBEB5AED375@oracle.com> <98276008-57ed-5eb7-539e-2b966d7bde34@oracle.com> Message-ID: I don't see the conflict. I'm saying, yeah, there *will* be exclamation fatigue until a feature comes along eventually to relieve it. 
(In the worst case, that's `public null-marked class...`; in the best case it's just `language-level 22;` or what have you.) But I still think it's the right thing to do anyway. On Thu, May 12, 2022 at 10:18 AM Brian Goetz wrote: > > > >> * Exclamation fatigue would be very real, so assume there is some way > to make `!` the default for some scope > > +1 > > > > Yes, I think it's a dead end to expect users to sprinkle '!' everywhere > they don't want nulls?this is usually the informal default in common > programming practice, so we need some way to enable flipping the default. > > On the other hand, this is on a collision course with Kevin's > "ref-default" recommendation, which had many strong supporting reasons, > whether this is spelled `!` or `.val`. The "but it will be tiring for > people to type" doesn't feel like a very good reason to flip the default > from something that has such strong objective justifications. > > (Dan was never sold on ref-default, but Kevin was, so I'll leave it to > him to reconcile "ref-default is the right default" with "but, > exclamation fatigue.") > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From john.r.rose at oracle.com Mon May 16 21:46:23 2022 From: john.r.rose at oracle.com (John Rose) Date: Mon, 16 May 2022 14:46:23 -0700 Subject: [External] : Re: On tearing In-Reply-To: <4917C3C1-B0DC-48A5-B987-F7AB95FA1DB4@oracle.com> References: <423731687.17626456.1651073926931.JavaMail.zimbra@u-pem.fr> <4917C3C1-B0DC-48A5-B987-F7AB95FA1DB4@oracle.com> Message-ID: <2C6D8B90-7551-4064-9608-1B59EC2B6FA3@oracle.com> On 27 Apr 2022, at 9:50, Brian Goetz wrote: > ?This whole area seems extremely prone to wishful thinking; we hate > the idea of making something slower than it could be, that we convince > ourselves that ?the user can reason about this.? Whether or not > it is ?too big a leap?, I think it is a bigger leap than you are > thinking. > >> For me, we should make the model clear, the compiler should insert a >> non user overridable default constructor but not more because using a >> primitive class is already an arcane construct. > > This might help a little bit, but it is addressing the smaller part of > the problem (zeroes); we need to address the bigger problem (tearing). I think I mostly agree with Remi on this point. A tearable primitive class (call it T-B3 as opposed A-B3 which is atomic) can, as you describe, have its invariants broken by races that have the effect of writing arbitrary (or almost arbitrary) values into fields at any time. A regular mutable B1 class has a similar problem, except it can be defended by a constructor and/or mutator methods that check per-field values being stored. Let?s look at the simplest case (which is rare in practice, since it is scary): Suppose a class has public fields which are mutable. Call such a class a OM-B1 class meaning ?open mutable B1?. I think that we can (and probably should) address this educational issue by making T-B3 classes look (somehow) like OM-B1 classes. Then every bit of training which leads users to be watchful in their use of OM-B1 will apply to T-B3 classes. How to make T-B3 look like OM-B1? Well, Remi?s idea of a mandated open constructor gets most of the way there. Mandating that the B3 fields are public is also helpful. (Records kinda-sorta do that, but through component reader methods.) 
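A minimal sketch of that analogy, with the caveat that `non-atomic value` is just this thread's placeholder spelling and nothing here is settled syntax:

    // OM-B1: an ordinary identity class with open mutable fields.  Anyone with a
    // reference can write any combination of x and y into it at any time.
    class OpenPoint {
        public int x;
        public int y;
    }

    // T-B3: the tearable flavor.  Mandated public fields plus a dumb, non-checking
    // constructor advertise that a shared container of these can likewise take on
    // any combination of field values under race.
    non-atomic value class Point {
        public int x;
        public int y;
        public Point(int x, int y) { this.x = x; this.y = y; }   // no invariant checks
    }

The training that tells people to be careful handing out an OpenPoint is exactly the training we want to kick in for a Point stored in an unprotected container.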
I truly think those two steps are enough, to make it clear to an author of a T-B3 that, if a T-B3 container is accessible to untrusted parties, then it is free to take on any combination of field values at any time. (And I'm using the word "free" here in the rigorous math sense, as in a free product type.) A further step to nail down the message that the components are independently variable would be to provide a reconstructor syntax of some sort that amounted to an open invitation to (a) take an instance of the T-B3, (b) modify or replace any or all of its field values, and then (c) put it back in the container it came from. By "open" I mean "public to all comers", which means that every baseline Java programmer, who knows about public mutable fields (we can't cure world hunger or negligent Java scribblers), will know that, using that syntax, anybody can write anything into any T-B3 value stored in an unprotected container. Just like an OM-B1 object. Nothing new to see, and all the old warnings apply! We would have to be careful about our messaging about immutability here, to prevent folks from mistakenly confusing a T-B3 with an immutable B1 (I-B1) or B2 (all of which are truly immutable). One way to do this, that would be blindingly obvious (and IMO too blinding), would be to (a) allow a `non-final` modifier on fields, canceling any implicit immutability property, and (b) *require* `non-final` modifiers on all fields in a T-B3 class. I put this forward in the service of brainstorming, to show an extreme (too extreme IMO) way to forcibly advertise the T- in T-B3 classes. But as I said, I think in practice it will be enough to make T-B3 classes look like OM-B1 classes, which are clearly not immutable, even without a `non-final` modifier. > > I don't think we have to go so far as to outlaw tearing, but there > have to be enough cues, at the use and declaration site, that > something interesting is happening here. Yes, cues. And my point above, mainly, is that to the extent such cues are available in the world of OM-B1 classes already, we should make use of them for T-B3 classes. And where not, such cues should make it really clear that there is an open invitation (public to untrusted parties) to make piecemeal edits to the fields of a T-B3 class. > >> There is no point to nanny people here given that only experts will >> want to play with it. > > This is *definitely* wishful thinking. People will hear that this is > a tool for performance; 99% of Java developers will convince > themselves they are experts because, performance! Developers > pathologically over-rotate towards whatever the Stack Overflow crowd > says is faster. (And so will Copilot.) So, definitely no. This > argument is pure wishful thinking. (I will admit to being > occasionally tempted by this argument too, but then I snap out of it.) I'm with Brian on this. >> But we (the EG) can also fail, and make a primitive class too easy to >> use, what scares me is people using a primitive class just because it's >> not nullable. > > Yes, this is one of the many pitfalls we have to avoid! > > This game is hard. Yep. Removing null for footprint, by moving from B2 to B3, is a normal thing people will do, but if it also introduces the T- part (tearability) secretly, that's probably a lose. Which leads to the current consideration of tearability as partially independent from the B2/B3 axis. So B2 XOR B3 = nullability alone, not = nullability+atomicity.
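To keep the hazard concrete, here is roughly what a torn read looks like; the holder below is an ordinary mutable object with two fields, used only to simulate the field-at-a-time updates a flattened non-atomic value would be subject to under race:

    // Writers maintain lo <= hi; a racy reader can still observe a mix of two writes.
    class Range {
        int lo, hi;
    }

    public class TearDemo {
        static final Range shared = new Range();

        public static void main(String[] args) throws InterruptedException {
            Thread w1 = new Thread(() -> { for (int i = 0; i < 1_000_000; i++) { shared.lo = 0; shared.hi = 1; } });
            Thread w2 = new Thread(() -> { for (int i = 0; i < 1_000_000; i++) { shared.lo = 5; shared.hi = 9; } });
            Thread r  = new Thread(() -> {
                for (int i = 0; i < 1_000_000; i++) {
                    int lo = shared.lo, hi = shared.hi;
                    if (lo > hi) System.out.println("torn: lo=" + lo + ", hi=" + hi);  // e.g. lo=5, hi=1
                }
            });
            w1.start(); w2.start(); r.start();
            w1.join(); w2.join(); r.join();
        }
    }

Neither writer ever writes a (lo, hi) pair with lo > hi, yet the reader can observe one; that is the property a T-B3 author has to sign up for.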
Separately, I *do* think T-B3 is more likely to be useful than A-B3 (atomic B3), and likewise T-B2 has limited use compared to A-B2. This is why I?ve been content with the conflation of T-B3 with B3-simple, for so long. But, embracing the current conversation, I do think that T-B3 needs to be *really clearly componentwise mutable*. I think that whether T-B3 is the default setting of B3 or some further opt-in (from default A-B3 to T-B3). And, to summarize, mandated wide-open fields and/or mandated dumb non-checking constructors are a legitimate way to advertise the open-ness of T-B3 classes. Then the tearability part is a small corollary of the Big Story, which is the openness of the fields to all comers. A final point: This is why in our last few meetings I keep mentioning the C++ idea of a `struct`, which is not a non-class, but rather a class whose defaults are set to be open to all comers. I think if we do a ?struct-like? design for T-B3 we can win. From daniel.smith at oracle.com Wed May 18 14:24:12 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 18 May 2022 14:24:12 +0000 Subject: EG meeting, 2022-05-18 Message-ID: EG Zoom meeting today at 4pm UTC (9am PDT, 12pm EDT). Recent threads to discuss: - "User model stacking: current status": Brian talked about factoring atomicity out of the B2/B3 choice, as an extra choice applying to B3 (and perhaps B2, too) - "Nullity (was: User model stacking: current status)": Brian explored the possibility of using '?' and '!' as alternatives to '.ref' and '.val' for B3 classes, anticipating more general support in the language for null-free types - "User model: terminology": Brian summarized the different features that need labels (non-identity classes, non-identity classes with a valid zero, tearable classes, types with and without null) From daniel.smith at oracle.com Wed May 18 18:47:29 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 18 May 2022 18:47:29 +0000 Subject: EG meeting, 2022-05-18 In-Reply-To: References: Message-ID: <4E552391-A87A-41F6-A148-388F8F61FCD8@oracle.com> > On May 18, 2022, at 8:24 AM, Dan Smith wrote: > > EG Zoom meeting today at 4pm UTC (9am PDT, 12pm EDT). > > Recent threads to discuss: > > - "User model stacking: current status": Brian talked about factoring atomicity out of the B2/B3 choice, as an extra choice applying to B3 (and perhaps B2, too) > > - "Nullity (was: User model stacking: current status)": Brian explored the possibility of using '?' and '!' as alternatives to '.ref' and '.val' for B3 classes, anticipating more general support in the language for null-free types > > - "User model: terminology": Brian summarized the different features that need labels (non-identity classes, non-identity classes with a valid zero, tearable classes, types with and without null) Summary of this discussion: Reviewed how we ended up with concerns about the status quo approach to primitive classes (documented in JEP 401), how we wanted a better story for tearing, and different strategies that have been considered there. Nothing new here, just summarizing. Dug into some details of the nullable+tearable combination: - A tearable B2 class is probably a mismatch?if you can tear, you can create a zero value, but the B2 has declared itself zero-hostile. No objections, then, to the idea that atomic/non-atomic is a property of B3 only (or equivalently, by giving up atomicity you've entered a new category, B4). 
- Tearable+nullable B3 types (e.g., 'LPoint;' could be considered tearable) remain a possible area to explore. There's some concern about user model?tearing a null leads to surprising outcomes after a null check and possible hard-to-observe memory leaks?and implementation. It would help to ground this conversation in some more concrete examples, though. From daniel.smith at oracle.com Thu May 19 23:14:07 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Thu, 19 May 2022 23:14:07 +0000 Subject: Spec change documents for Value Objects In-Reply-To: <12C0C3B4-1A4C-4FCF-AEFD-A577F2333B27@oracle.com> References: <12C0C3B4-1A4C-4FCF-AEFD-A577F2333B27@oracle.com> Message-ID: <92660EAE-70A6-4FD0-8ECD-4A795D139F2E@oracle.com> On Apr 27, 2022, at 5:01 PM, Dan Smith > wrote: Please see these two spec change documents for JLS and JVMS changes in support of the Value Objects feature. Here's a revision, including some additional language checks that I missed in the first iteration. http://cr.openjdk.java.net/~dlsmith/jep8277163/jep8277163-20220519/specs/value-objects-jls.html http://cr.openjdk.java.net/~dlsmith/jep8277163/jep8277163-20220519/specs/value-objects-jvms.html ---------- Diff of the changes: diff --git a/closed/src/java.se/share/specs/value-objects-jls.md b/closed/src/java.se/share/specs/value-objects-jls.md index 3e8e44aa2c..392242efb9 100644 --- a/closed/src/java.se/share/specs/value-objects-jls.md +++ b/closed/src/java.se/share/specs/value-objects-jls.md @@ -501,9 +501,9 @@ It is permitted for the class declaration to redundantly specify the `final` modifier. The `identity` and `value` modifiers limit the set of classes that can extend -an `abstract` class ([8.1.4]). +a non-`final` class ([8.1.4]). -Special restrictions apply to the field declarations ([8.3.1.2]), method +Special restrictions apply to the field declarations ([8.3.1]), method declarations ([8.4.3.6]), and constructors ([8.8.7]) of a class that is not an `identity` class. @@ -524,6 +524,61 @@ Should there be? +#### 8.1.3 Inner Classes and Enclosing Instances {#jls-8.1.3} + +... + +An inner class *C* is a *direct inner class of a class or interface O* if *O* is +the immediately enclosing class or interface declaration of *C* and the +declaration of *C* does not occur in a static context. + +> If an inner class is a local class or an anonymous class, it may be declared +> in a static context, and in that case is not considered an inner class of any +> enclosing class or interface. + +A class *C* is an *inner class of class or interface O* if it is either a direct +inner class of *O* or an inner class of an inner class of *O*. + +> It is unusual, but possible, for the immediately enclosing class or interface +> declaration of an inner class to be an interface. +> This only occurs if the class is a local or anonymous class declared in a +> `default` or `static` method body ([9.4]). + +A class or interface *O* is the *zeroth lexically enclosing class or interface +declaration of itself*. + +A class *O* is the *n'th lexically enclosing class declaration of a class C* if +it is the immediately enclosing class declaration of the *n-1*'th lexically +enclosing class declaration of *C*. + +An instance *i* of a direct inner class *C* of a class or interface *O* is +associated with an instance of *O*, known as the *immediately enclosing instance +of i*. +The immediately enclosing instance of an object, if any, is determined when the +object is created ([15.9.2]). 
+ +An object *o* is the *zeroth lexically enclosing instance of itself*. + +An object *o* is the *n'th lexically enclosing instance of an instance i* if it +is the immediately enclosing instance of the *n-1*'th lexically enclosing +instance of *i*. + +An instance of an inner local class or an anonymous class whose declaration +occurs in a static context has no immediately enclosing instance. +Also, an instance of a `static` nested class ([8.1.1.4]) has no immediately +enclosing instance. + +**It is a compile-time error if an inner class has an immediately enclosing +instance but is declared an `abstract` `value` class ([8.1.1.1], [8.1.1.5]).** + +> **If an abstract class is declared with neither the `value` nor the `identity` +> modifier, but it is an inner class and has an immediately enclosing instance, +> it is implicitly an `identity` class, per [8.1.1.5].** + +... + + + #### 8.1.4 Superclasses and Subclasses {#jls-8.1.4} The optional `extends` clause in a normal class declaration specifies the @@ -761,8 +816,110 @@ instance method.** +### 8.6 Instance Initializers {#jls-8.6} + +An *instance initializer* declared in a class is executed when an instance of +the class is created ([12.5], [15.9], [8.8.7.1]). + +*InstanceInitializer:* +: *Block* + +**It is a compile-time error for an `abstract` `value` class to declare an +instance initializer.** + +> **If an abstract class is declared with neither the `value` nor the `identity` +> modifier, but it declares an instance initializer, it is implicitly an +> `identity` class, per [8.1.1.5].** + +It is a compile-time error if an instance initializer cannot complete normally +([14.22]). + +It is a compile-time error if a `return` statement ([14.17]) appears anywhere +within an instance initializer. + +An instance initializer is permitted to refer to the current object using the +keyword `this` ([15.8.3]) or the keyword `super` ([15.11.2], [15.12]), and to +use any type variables in scope. + +Restrictions on how an instance initializer may refer to instance variables, +even when the instance variables are in scope, are specified in [8.3.3]. + +Exception checking for an instance initializer is specified in [11.2.3]. + + + ### 8.8 Constructor Declarations {#jls-8.8} +A *constructor* is used in the creation of an object that is an instance of a +class ([12.5], [15.9]). + +*ConstructorDeclaration:* +: {*ConstructorModifier*} *ConstructorDeclarator* [*Throws*] *ConstructorBody* + +*ConstructorDeclarator:* +: [*TypeParameters*] *SimpleTypeName*\ + `(` [*ReceiverParameter* `,`] [*FormalParameterList*] `)` + +*SimpleTypeName:* +: *TypeIdentifier* + +The rules in this section apply to constructors in all class declarations, +including enum declarations and record declarations. +However, special rules apply to enum declarations with regard to constructor +modifiers, constructor bodies, and default constructors; these rules are stated +in [8.9.2]. +Special rules also apply to record declarations with regard to constructors, as +stated in [8.10.4]. + +The *SimpleTypeName* in the *ConstructorDeclarator* must be the simple name of +the class that contains the constructor declaration, or a compile-time error +occurs. + +In all other respects, a constructor declaration looks just like a method +declaration that has no result ([8.4.5]). + +Constructor declarations are not members. +They are never inherited and therefore are not subject to hiding or overriding. 
+ +**It is a compile-time error for an `abstract` `value` class to declare a +nontrivial constructor ([8.1.1.5]).** + +> **If an abstract class is declared with neither the `value` nor the `identity` +> modifier, but it declares a nontrivial constructor, it is implicitly an +> `identity` class, per [8.1.1.5].** + +:::editorial +It's not ideal to define a new term just for the purpose of this rule. But the +list of things to check is long, and we don't want to repeat it. Perhaps it +would be helpful to somehow overlap this definition with the discussion of +default constructors in [8.8.9]. +::: + +Constructors are invoked by class instance creation expressions ([15.9]), by the +conversions and concatenations caused by the string concatenation operator `+` +([15.18.1]), and by explicit constructor invocations from other constructors +([8.8.7]). +Access to constructors is governed by access modifiers ([6.6]), so it is +possible to prevent class instantiation by declaring an inaccessible constructor +([8.8.10]). + +Constructors are never invoked by method invocation expressions ([15.12]). + +:::example + +Example 8.8-1. Constructor Declarations + +``` +class Point { + int x, y; + Point(int x, int y) { this.x = x; this.y = y; } +} +``` + +::: + + + #### 8.8.7 Constructor Body {#jls-8.8.7} The first statement of a constructor body may be an explicit invocation of @@ -2231,7 +2388,7 @@ synchronization. [8.1.1.4]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.1.1.4 [8.1.1.5]: #jls-8.1.1.5 [8.1.2]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.1.2 -[8.1.3]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.1.3 +[8.1.3]: #jls-8.1.3 [8.1.4]: #jls-8.1.4 [8.1.5]: #jls-8.1.5 [8.1.6]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.1.6 @@ -2260,9 +2417,9 @@ synchronization. [8.4.8.3]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.4.8.3 [8.4.9]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.4.9 [8.5]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.5 -[8.6]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.6 +[8.6]: #jls-8.6 [8.7]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.7 -[8.8]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.8 +[8.8]: #jls-8.8 [8.8.1]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.8.1 [8.8.2]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.8.2 [8.8.3]: https://docs.oracle.com/javase/specs/jls/se18/html/jls-8.html#jls-8.8.3 diff --git a/closed/src/java.se/share/specs/value-objects-jvms.md b/closed/src/java.se/share/specs/value-objects-jvms.md index fe747ad3bd..70e24541ce 100644 --- a/closed/src/java.se/share/specs/value-objects-jvms.md +++ b/closed/src/java.se/share/specs/value-objects-jvms.md @@ -1561,6 +1561,70 @@ Attribute Location `class` +#### 4.7.6 The `InnerClasses` Attribute {#jvms-4.7.6} + +... + +inner_class_access_flags + +: The value of the `inner_class_access_flags` item is a mask of flags used + to denote access permissions to and properties of class or interface *C* + as declared in the source code from which this `class` file was + compiled. + It is used by a compiler to recover the original information when source + code is not available. + The flags are specified in [Table 4.7.6-A]. + + ::: {.table #jvms-4.7.6-300-D.1-D.1} + + Table 4.7.6-A. 
Nested class access and property flags + + ---------------------------------------------------------------------------- + Flag Name Value Interpretation + ------------------------ ----------- --------------------------------------- + `ACC_PUBLIC` 0x0001 Marked or implicitly `public` in + source. + + `ACC_PRIVATE` 0x0002 Marked `private` in source. + + `ACC_PROTECTED` 0x0004 Marked `protected` in source. + + `ACC_STATIC` 0x0008 Marked or implicitly `static` in + source. + + `ACC_FINAL` 0x0010 Marked or implicitly `final` in + source. + + **`ACC_IDENTITY`** **0x0020** **Declared as an `identity` class or + interface.** + + **`ACC_VALUE`** **0x0040** **Declared as a `value` class or + interface.** + + `ACC_INTERFACE` 0x0200 Was an `interface` in source. + + `ACC_ABSTRACT` 0x0400 Marked or implicitly `abstract` in + source. + + `ACC_SYNTHETIC` 0x1000 Declared synthetic; not present in the + source code. + + `ACC_ANNOTATION` 0x2000 Declared as an annotation interface. + + `ACC_ENUM` 0x4000 Declared as an `enum` class. + ---------------------------------------------------------------------------- + + ::: + + All bits of the `inner_class_access_flags` item not assigned in [Table + 4.7.6-A] are reserved for future use. + They should be set to zero in generated `class` files and should be + ignored by Java Virtual Machine implementations. + +... + + + #### **4.7.31 The `Preload` Attribute** {#jvms-4.7.31} :::inserted From kevinb at google.com Thu May 26 17:12:56 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 26 May 2022 10:12:56 -0700 Subject: Nullity (was: User model stacking: current status) In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> <66340928-4EB3-44BF-A72A-1DBEB5AED375@oracle.com> <98276008-57ed-5eb7-539e-2b966d7bde34@oracle.com> Message-ID: Returning to this thread and going up a level or two: The real impact of this discussion, imho, should not be "now let's rush a declarative nullness feature out asap", or even "let's solve bucket 3 now in a way nullness will have to be harmonious with later". What I humbly suggest it points to is, maybe: "let's shift focus right now to delivering just bucket 2 asap, so that we keep our options open longer for the rest". Is that fair? It seems like a very good plan to me. Bucket 2 is pretty non-invasive to the language model and still improves matters for Integer. Thoughts? On Thu, May 12, 2022 at 3:14 PM Kevin Bourrillion wrote: > I don't see the conflict. I'm saying, yeah, there *will* be exclamation > fatigue until a feature comes along eventually to relieve it. (In the worst > case, that's `public null-marked class...`; in the best case it's just > `language-level 22;` or what have you.) But I still think it's the right > thing to do anyway. > > > On Thu, May 12, 2022 at 10:18 AM Brian Goetz > wrote: > >> >> >> >> * Exclamation fatigue would be very real, so assume there is some way >> to make `!` the default for some scope >> > +1 >> > >> > Yes, I think it's a dead end to expect users to sprinkle '!' everywhere >> they don't want nulls?this is usually the informal default in common >> programming practice, so we need some way to enable flipping the default. >> >> On the other hand, this is on a collision course with Kevin's >> "ref-default" recommendation, which had many strong supporting reasons, >> whether this is spelled `!` or `.val`. 
The "but it will be tiring for >> people to type" doesn't feel like a very good reason to flip the default >> from something that has such strong objective justifications. >> >> (Dan was never sold on ref-default, but Kevin was, so I'll leave it to >> him to reconcile "ref-default is the right default" with "but, >> exclamation fatigue.") >> > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Thu May 26 17:19:57 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 26 May 2022 13:19:57 -0400 Subject: Nullity (was: User model stacking: current status) In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7af12a5b-56ea-6918-7b29-f995f7697883@oracle.com> <66340928-4EB3-44BF-A72A-1DBEB5AED375@oracle.com> <98276008-57ed-5eb7-539e-2b966d7bde34@oracle.com> Message-ID: <2c6b7895-4427-d561-6969-777f038a1c2c@oracle.com> I agree that Bucket 2 is largely uncontroversial (and largely implemented) and makes a sensible unit of delivery -- with the proviso that we need to properly message that it will not yet deliver the performance improvements that most users are hoping to get out of Valhalla. There'll be no heap flattening, and no user-definable primitives.? There'll be improved optimization for on-stack values (which will appear to most users as "better escape analysis"). That said, I don't think this reduces the urgency to find a bucket-3 *design* that we like. On 5/26/2022 1:12 PM, Kevin Bourrillion wrote: > Returning to this thread and going up a level or two: > > The real impact of this discussion, imho, should not be "now let's > rush a declarative nullness feature out asap", or even "let's solve > bucket?3 now in a way nullness will have to be harmonious with later". > What I humbly suggest it points to is,?maybe: "let's shift focus right > now to delivering just bucket 2 asap, so that we keep our options open > longer for the rest". Is that fair? It seems like a very good plan to > me. Bucket 2 is pretty non-invasive to the language model and still > improves matters for Integer. > > Thoughts? > > > > On Thu, May 12, 2022 at 3:14 PM Kevin Bourrillion > wrote: > > I don't see the conflict. I'm saying, yeah, there *will* be > exclamation fatigue until a feature?comes along eventually > to?relieve it. (In the worst case, that's `public null-marked > class...`; in the best case it's just `language-level 22;` or what > have you.) But I still think it's the right thing to do anyway. > > > On Thu, May 12, 2022 at 10:18 AM Brian Goetz > wrote: > > > > >> * Exclamation fatigue would be very real, so assume there > is some way to make `!` the default for some scope > > +1 > > > > Yes, I think it's a dead end to expect users to sprinkle '!' > everywhere they don't want nulls?this is usually the informal > default in common programming practice, so we need some way to > enable flipping the default. > > On the other hand, this is on a collision course with Kevin's > "ref-default" recommendation, which had many strong supporting > reasons, > whether this is spelled `!` or `.val`.? The "but it will be > tiring for > people to type" doesn't feel like a very good reason to flip > the default > from something that has such strong objective justifications. 
> > (Dan was never sold on ref-default, but Kevin was, so I'll > leave it to > him to reconcile "ref-default is the right default" with "but, > exclamation fatigue.") > > > > -- > Kevin Bourrillion?|?Java Librarian |?Google, Inc.?|kevinb at google.com > > > > -- > Kevin Bourrillion?|?Java Librarian |?Google, Inc.?|kevinb at google.com From kevinb at google.com Thu May 26 17:33:49 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 26 May 2022 10:33:49 -0700 Subject: We need help to migrate from bucket 1 to 2; and, the == problem In-Reply-To: References: <38DF0B35-3F89-484F-8A35-FF2F5924859C@oracle.com> <034E48A2-8AB2-4156-A30C-F6F79F8CABC3@oracle.com> Message-ID: I'd like to bump this thread, as it seems to me to be the biggest obstacle to bucket 2 being able to deliver value. * A warning not just on synchronization, but on *any* identity-dependence. * Not special for Integer etc.; it all needs to work through a general facility that anyone can use. * We don't need the constructor warnings, though. * The annotation should evoke the idea of "this class is becoming a bucket 2 class". * It would be vestigial once the class *is* bucket-2. * I would lean against enshrining the "value-based" terminology even further (we can get into this if necessary). * I think we need an explicit way to clearly and *intentionally* depend on identity. This code would *prefer to break* if the objects in use became bucket-2. e.g.: * o1.identity() == o2.identity() // I like this * System.identity(o1) == System.identity(o2) // this too * System.identityEquals(o1, o2) * o1 === o2 Thoughts? On Tue, Apr 26, 2022 at 3:09 PM Kevin Bourrillion wrote: > Above, when I said the proposed `==` behavior is "not a behavior that > anyone ever *actually wants* -- unless they just happen to have no fields > of reference types at all", I did leave out some other cases. Like when > your only field types (recursing down fields of value types) that are > reference types are types that don't override `equals()` (e.g. `Function`). > In a way this sort of furthers my argument that the boundary between when > `==` is safely an `equals` synonym and when it isn't is going to be > difficult to perceive. Yet, since people hunger for `==` to really mean > `equals`, they are highly overwhelmingly likely to do it as much as > possible whenever they are convinced it looks safe. And then one addition > of a string field in some leaf-level type can break a whole lot of code. > > > On Tue, Apr 26, 2022 at 2:53 PM Dan Smith wrote: > > Yes, a public annotation was the original proposal. At some point we >> scaled that back to just JDK-internal. The discussions were a long time >> ago, but if I remember right the main concern was that a formalized, Java >> SE notion of "value-based class" would lead to some unwanted complexity >> when we eventually get to *real* value classes (e.g., a misguided CS 101 >> course question: "what's the difference between a value-based class and a >> value class? which one should you use?"). >> > > Yeah, I hear that. The word "value" does have multiple confusable > meanings. I'd say the key difference is that "value semantics" are > logically a *recursive* rejection of identity, while a Valhalla B2/B3 class > on its own addresses only one level deep. > > Anyway, I think what I'm proposing avoids trouble by specifically labeling > one state as simply the transitional state to the other. I'm not sure > there'd be much to get hung up on. 
>
>
>> It seemed like producing some special warnings for JDK classes would
>> address the bulk of the problem without needing to fall into this trap.
>>
>
> I'd just say it addresses a more specific problem: how *those* particular
> classes can become B2/B3 (non-identity) classes.
>
>
>
>> Would an acceptable compromise be for a third-party tool to support its
>> own annotations, while also recognizing @jdk.internal.ValueBased as an
>> alternative spelling of the same thing?
>>
>
> I think it's "a" compromise :-), I will just have to work through how
> acceptable.
>
> Is there any such thing as a set of criteria for when a warning deserves
> to be handled by javac instead of left to all the world's aftermarket
> static analyzers to handle?
>
> (Secondarily... why are we warning only on synchronization, and not on
>> `==` or (marginal) `identityHC`?)
>>
>> I think this was simply not a battle that we wanted to fight --
>> discouraging all uses of '==' on type Integer, for example.
>>
>
> Who would be fighting the other side of that battle? Not anyone having
> some *need* to use `==` over `.equals()`, because we'll be breaking them
> when Integer changes buckets anyway. So... just the users saying "we should
> get to use `==` as a shortcut for `.equals()` as long as we stay within the
> cached range"? Oh, wait:
>
>
> Within these constraints, there are reasonable things that can be done
>> with '==', like optimizing for a situation where 'equals' is likely to be
>> true.
>>
>
> Ok, that too. Fair I suppose... it's just that it's such a very special
> case...
>
> -- 
> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com



-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From kevinb at google.com  Thu May 26 19:57:38 2022
From: kevinb at google.com (Kevin Bourrillion)
Date: Thu, 26 May 2022 12:57:38 -0700
Subject: We need help to migrate from bucket 1 to 2; and, the == problem
In-Reply-To: 
References: <38DF0B35-3F89-484F-8A35-FF2F5924859C@oracle.com>
 <034E48A2-8AB2-4156-A30C-F6F79F8CABC3@oracle.com>
Message-ID: 

On Thu, May 26, 2022 at 10:57 AM Dan Heidinga wrote:

This will have high costs in the regular performance model as it will
>

Sorry, I should have mentioned up front that we'd have to be content with
only the warnings we can spot at compile-time. I also should have been
clear that none of this is super well thought-out yet; I'm just hoping to
get a conversation going.


> * I think we need an explicit way to clearly and *intentionally* depend
> on identity. This code would *prefer to break* if the objects in use became
> bucket-2. e.g.:
>
> * o1.identity() == o2.identity() // I like this
>
> * System.identity(o1) == System.identity(o2) // this too
>
> Are these marker methods? What would they return?

In the name of including a wide range of possibilities, I included some
crazy ones that almost certainly won't pan out. I should have thought more
and ruled them out before sending. I like them "syntactically", supporting
the notion that an object's identity is like an attribute, but to expose
that identity as a type of its own will not make sense.

Thanks for responding.


> --Dan
>
> >
> > Thoughts?
> >
> >
> > On Tue, Apr 26, 2022 at 3:09 PM Kevin Bourrillion
> wrote:
> >>
> >> Above, when I said the proposed `==` behavior is "not a behavior that
> anyone ever *actually wants* -- unless they just happen to have no fields
> of reference types at all", I did leave out some other cases.
Like when > your only field types (recursing down fields of value types) that are > reference types are types that don't override `equals()` (e.g. `Function`). > In a way this sort of furthers my argument that the boundary between when > `==` is safely an `equals` synonym and when it isn't is going to be > difficult to perceive. Yet, since people hunger for `==` to really mean > `equals`, they are highly overwhelmingly likely to do it as much as > possible whenever they are convinced it looks safe. And then one addition > of a string field in some leaf-level type can break a whole lot of code. > >> > >> > >> On Tue, Apr 26, 2022 at 2:53 PM Dan Smith > wrote: > >> > >>> Yes, a public annotation was the original proposal. At some point we > scaled that back to just JDK-internal. The discussions were a long time > ago, but if I remember right the main concern was that a formalized, Java > SE notion of "value-based class" would lead to some unwanted complexity > when we eventually get to *real* value classes (e.g., a misguided CS 101 > course question: "what's the difference between a value-based class and a > value class? which one should you use?"). > >> > >> > >> Yeah, I hear that. The word "value" does have multiple confusable > meanings. I'd say the key difference is that "value semantics" are > logically a *recursive* rejection of identity, while a Valhalla B2/B3 class > on its own addresses only one level deep. > >> > >> Anyway, I think what I'm proposing avoids trouble by specifically > labeling one state as simply the transitional state to the other. I'm not > sure there'd be much to get hung up on. > >> > >> > >>> > >>> It seemed like producing some special warnings for JDK classes would > address the bulk of the problem without needing to fall into this trap. > >> > >> > >> I'd just say it addresses a more specific problem: how *those* > particular classes can become B2/B3 (non-identity) classes. > >> > >> > >>> > >>> Would an acceptable compromise be for a third-party tool to support > its own annotations, while also recognizing @jdk.internal.ValueBased as an > alternative spelling of the same thing? > >> > >> > >> I think it's "a" compromise :-), I will just have to work through how > acceptable. > >> > >> Is there any such thing as a set of criteria for when a warning > deserves to be handled by javac instead of left to all the world's > aftermarket static analyzers to handle? > >> > >>> (Secondarily... why are we warning only on synchronization, and not on > `==` or (marginal) `identityHC`?) > >>> > >>> I think this was simply not a battle that we wanted to > fight?discouraging all uses of '==' on type Integer, for example. > >> > >> > >> Who would be fighting the other side of that battle? Not anyone having > some need to use `==` over `.equals()`, because we'll be breaking them when > Integer changes buckets anyway. So... just the users saying "we should get > to use `==` as a shortcut for `.equals()` as long as we stay within the > cached range"? Oh, wait: > >> > >> > >>> Within these constraints, there are reasonable things that can be done > with '==', like optimizing for a situation where 'equals' is likely to be > true. > >> > >> > >> Ok, that too. Fair I suppose... it's just that it's such a very special > case... > >> > >> -- > >> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > > > > > > > > -- > > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > > -- Kevin Bourrillion | Java Librarian | Google, Inc. 
| kevinb at google.com
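
To make the `==` hazard discussed in this last thread concrete, here is a
minimal sketch written against the proposed `value class` syntax. It assumes
a Valhalla early-access build (it will not compile on a released JDK), and
the `Name` class, its field, and its `equals` override are invented purely
for illustration:

```
// Hypothetical example; `value` classes are a Valhalla proposal, not released Java.
value class Name {
    final String text;   // a reference-typed field is what creates the hazard

    Name(String text) { this.text = text; }

    @Override
    public boolean equals(Object o) {
        return o instanceof Name other && text.equals(other.text);
    }

    @Override
    public int hashCode() { return text.hashCode(); }
}

class EqualsVersusAcmp {
    public static void main(String[] args) {
        // Equal contents, but two distinct String instances.
        Name n1 = new Name(new String("Ada"));
        Name n2 = new Name(new String("Ada"));

        // == on value objects compares field values; the String field is a
        // reference, so it is compared by reference, not by contents.
        System.out.println(n1 == n2);       // false
        System.out.println(n1.equals(n2));  // true

        // With interned literals the String references coincide, so == happens
        // to agree with equals() -- until a field changes in some leaf class.
        Name n3 = new Name("Ada");
        Name n4 = new Name("Ada");
        System.out.println(n3 == n4);       // true, but only by accident
    }
}
```

This is the trap described above: code that uses `==` as an `equals()`
shortcut keeps working right up until a reference-typed field is added or
stops being shared in some leaf-level value class.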