From brian.goetz at oracle.com Fri Apr 1 13:28:44 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 1 Apr 2022 09:28:44 -0400 Subject: Object as a concrete class In-Reply-To: References: Message-ID: <3265acd1-cf7f-089a-2940-e0d19f27343a@oracle.com> > assert new Object().hasIdentity(); > assert !new Point().hasIdentity(); > > But the 'hasIdentity' method can contain arbitrary logic, and doesn't necessarily need to correlate with 'getClass().isIdentityClass()'. More precisely, we were being held hostage to the nature of interfaces; by using `implements IdentityObject` as our measure of id-ness, we forced legacy Object instances to be instances of some other class, so that Object didn't need to implement the interface. > I don't see a useful way to generalize this to other "both kinds" classes (for example, any class with an instance field must be an identity class or a value class). But since we have to make special allowances for Object one way or another, it does seem plausible that we let 'new Object()' continue to create direct instances of class Object, and then specify the following special rules: > > - All concrete, non-value classes are implicitly identity classes *except for Object* > > - The 'new' bytecode is allowed to be used with concrete identity classes *and class Object* > > - Identity objects include instances of identity classes, arrays, *and instances of Object*; 'hasIdentity' reflects this > > - [anything else?] > > There's some extra complexity here, but balanced against the cost of making every Java programmer adjust their model of what 'new Object()' means, and corresponding coding style refactorings, it seems like a win. > > Thoughts? Seems like a win. 
From brian.goetz at oracle.com Fri Apr 1 14:19:59 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 1 Apr 2022 10:19:59 -0400 Subject: [External] : Re: Object as a concrete class In-Reply-To: References: <3265acd1-cf7f-089a-2940-e0d19f27343a@oracle.com> Message-ID: > The careful API design will be key as I can see a lot of corner cases > related to `obj.getClass().isIdentityClass() != obj.hasIdentity()`. > Do we have a sketch of what the apis for this would look like? I'm > assuming these are just for expository purposes as isIdentityClass() > really needs to return a trinary value - {true, false, maybe}. Let's step back and ask "what is the purpose of Class::isIdentityClass"? It kind of got conflated with the dynamic check of "does this object have identity", but really, this should be a query about how the class is declared -- identity class, value class, or identity-agnostic class. The latter bucket includes all interfaces, Object, and some abstract classes. So to the extent we have this method at all, a tri-value return seems almost a forced move. From forax at univ-mlv.fr Fri Apr 1 14:28:16 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 1 Apr 2022 16:28:16 +0200 (CEST) Subject: Object as a concrete class In-Reply-To: References: <3265acd1-cf7f-089a-2940-e0d19f27343a@oracle.com> Message-ID: <1299918887.5703073.1648823296629.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "Dan Heidinga" > To: "Brian Goetz" > Cc: "daniel smith" , "valhalla-spec-experts" > Sent: Friday, April 1, 2022 3:50:20 PM > Subject: Re: Object as a concrete class > On Fri, Apr 1, 2022 at 9:29 AM Brian Goetz wrote: >> >> >> > assert new Object().hasIdentity(); >> > assert !new Point().hasIdentity(); >> > >> > But the 'hasIdentity' method can contain arbitrary logic, and doesn't >> > necessarily need to correlate with 'getClass().isIdentityClass()'.
>> >> More precisely, we were being held hostage to the nature of interfaces; >> by using `implements IdentityObject` as our measure of id-ness, we >> forced legacy Object instances to be instances of some other class, so >> that Object didn't need to implement the interface. >> >> > I don't see a useful way to generalize this to other "both kinds" classes (for >> > example, any class with an instance field must be an identity class or a value >> > class). But since we have to make special allowances for Object one way or >> > another, it does seem plausible that we let 'new Object()' continue to create >> > direct instances of class Object, and then specify the following special rules: >> > >> > - All concrete, non-value classes are implicitly identity classes *except for >> > Object* >> > >> > - The 'new' bytecode is allowed to be used with concrete identity classes *and >> > class Object* >> > >> > - Identity objects include instances of identity classes, arrays, *and instances >> > of Object*; 'hasIdentity' reflects this >> > >> > - [anything else?] >> > >> > There's some extra complexity here, but balanced against the cost of making >> > every Java programmer adjust their model of what 'new Object()' means, and >> > corresponding coding style refactorings, it seems like a win. >> > >> > Thoughts? >> >> Seems like a win. >> > > The alternative - which we've been exploring till now - is to have the > VM do some magic to turn:
>
> 0: new #2           // class java/lang/Object
> 3: dup
> 4: invokespecial #1 // Method java/lang/Object."<init>":()V
>
> into the `new` of a VM-specified Object subclass, and to allow > invoking a super-class constructor on it rather than its own > (remember, the instance is an Object subclass). > > That's a lot of magic we can do away with through specification and > some careful API design. Seems like a win.
It is too magic for me, especially because new Object().getClass() != Object.class > > The careful API design will be key as I can see a lot of corner cases > related to `obj.getClass().isIdentityClass() != obj.hasIdentity()`. > Do we have a sketch of what the apis for this would look like? I'm > assuming these are just for expository purposes as isIdentityClass() > really needs to return a trinary value - {true, false, maybe}. Whatever we choose, whether the VM does a rewriting trick (**) or not, the API is the same anyway, and as you said, it has to be carefully written. We have several ways to frame the discrepancy around Object.class. One is to say that if Object acts as a class, it's an identity class; if Object acts as a type (the type of a parameter, a field, etc.), then it accepts both identity and value classes. I believe that most of the questions about a Class as a type are better answered by using isAssignableFrom. The case where someone needs to know whether a type allows only identity classes or only value classes seems a weird question compared to "can I assign that value?" So I think we may be fine implementing only a very few methods - Class.isValueClass() and Class.isPrimitiveClass(). All other questions can be implemented in user code, for example type == Object.class || type.isInterface() || !type.isValueClass() > > --Dan Rémi ** and un-rewriting when the bytecode is asked for through JVMTI?
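Remi's expression above can be packaged as a small user-code helper. This is only a sketch: Class::isValueClass does not exist in any released Java, so it is modeled here as an injected predicate, and the helper name admitsIdentityObjects is invented for illustration.

```java
import java.util.function.Predicate;

public class TypeQueries {
    // Remi's example expression: a type can hold identity objects if it is
    // Object, an interface, or not a value class. 'isValueClass' stands in
    // for the proposed (hypothetical) Class.isValueClass() method.
    static boolean admitsIdentityObjects(Class<?> type,
                                         Predicate<Class<?>> isValueClass) {
        return type == Object.class
            || type.isInterface()
            || !isValueClass.test(type);
    }

    public static void main(String[] args) {
        // In today's (pre-Valhalla) world, no class is a value class.
        Predicate<Class<?>> noValueClasses = c -> false;
        System.out.println(admitsIdentityObjects(Runnable.class, noValueClasses)); // true
        System.out.println(admitsIdentityObjects(String.class, noValueClasses));   // true
    }
}
```

The point is not this particular helper, but that the richer queries can live outside java.lang.Class once the minimal primitives exist.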
From kevinb at google.com Fri Apr 1 15:48:16 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 1 Apr 2022 08:48:16 -0700 Subject: Object as a concrete class In-Reply-To: <1299918887.5703073.1648823296629.JavaMail.zimbra@u-pem.fr> References: <3265acd1-cf7f-089a-2940-e0d19f27343a@oracle.com> <1299918887.5703073.1648823296629.JavaMail.zimbra@u-pem.fr> Message-ID: On Fri, Apr 1, 2022 at 7:28 AM Remi Forax wrote: It is too magic for me, especially because > new Object().getClass() != Object.class > Sounds
* not weird enough to rule out, but
* weird enough to "deprecate" `new Object()` and make Object abstract
From daniel.smith at oracle.com Fri Apr 1 17:02:38 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Fri, 1 Apr 2022 17:02:38 +0000 Subject: Alternative to IdentityObject & ValueObject interfaces In-Reply-To: References: Message-ID: <7843DB53-2E1D-4371-A2FD-AF0F9EE928ED@oracle.com> On Mar 22, 2022, at 10:52 PM, Dan Smith > wrote: On Mar 22, 2022, at 7:21 PM, Dan Heidinga > wrote: A couple of comments on the encoding and questions related to descriptors. JVM proposal:
- Same conceptual framework.
- Classes can be ACC_VALUE, ACC_IDENTITY, or neither.
- Legacy-version classes are implicitly ACC_IDENTITY. Legacy interfaces are not. Optionally, modern-version concrete classes are also implicitly ACC_IDENTITY.
Maybe this is too clever, but if we added ACC_VALUE and ACC_NEITHER bits, then any class without one of the bits set (including all the legacy classes) is an identity class. (Trying out this alternative approach to abstract classes: there's no more ACC_PERMITS_VALUE; instead, legacy-version abstract classes are automatically ACC_IDENTITY, and modern-version abstract classes permit value subclasses unless they opt out with ACC_IDENTITY. It's the bytecode generator's responsibility to set these flags appropriately. Conceptually cleaner, maybe too risky...)
With the "clever" encoding, every class is implicitly identity unless it sets ACC_VALUE or ACC_NEITHER, and bytecode generators have to explicitly flag modern abstract classes. This is kind of growing on me. A problem is that interfaces are ACC_NEITHER by default, not ACC_IDENTITY. Abstract classes and interfaces have to get two different behaviors based on the same 0 bits. Here's another more stable encoding, though, that feels less fiddly to me than what I originally wrote:
ACC_VALUE means "allows value object instances"
ACC_IDENTITY means "allows identity object instances"
If you set *both*, you're a "neither" class/interface. (That is, you allow both kinds of instances.) If you set *none*, you get the default/legacy behavior implicitly: classes are ACC_IDENTITY only, interfaces are ACC_IDENTITY & ACC_VALUE. Update on encoding: after some internal discussion, I've found this to be the most natural fit:
- ACC_VALUE (0x0040) corresponds to the 'value' keyword in source files
- ACC_IDENTITY (0x0020) corresponds to the (often implicit) 'identity' keyword in source files
- If neither is set, the class/interface supports both kinds of subclasses (and must be abstract)
- If both are set, or any supers' flags conflict, it's an error
- In older-version classes (not interfaces), ACC_IDENTITY is assumed to be set
What about newer-version classes that use old encodings? (E.g., a tool bumps its output version number but isn't aware of these flags.) There's a sneaky trick here that minimizes the risk: ACC_IDENTITY is re-using the old ACC_SUPER, which no longer has any effect and which we've encouraged to be set since Java 1.0.2. So if you're already setting ACC_SUPER in your classes, you've automatically opted in to ACC_IDENTITY; doing something different requires making changes to the generated code.
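The decoding rules above can be sketched as plain bit tests. This is a sketch of the encoding proposed in this mail, not a real class-file API; it covers classes only, since interfaces get different defaults, and the method and enum names are invented.

```java
public class AccFlags {
    static final int ACC_IDENTITY = 0x0020; // recycles the old ACC_SUPER bit
    static final int ACC_VALUE    = 0x0040;

    enum Kind { IDENTITY, VALUE, BOTH }

    // Decode a class's (not an interface's) declared kind from its flags,
    // following the rules in the mail above.
    static Kind decodeClass(int flags, boolean legacyVersion) {
        if (legacyVersion) return Kind.IDENTITY; // old classes: implicit identity
        boolean id  = (flags & ACC_IDENTITY) != 0;
        boolean val = (flags & ACC_VALUE)    != 0;
        if (id && val) throw new IllegalArgumentException("conflicting flags");
        if (id)  return Kind.IDENTITY;
        if (val) return Kind.VALUE;
        return Kind.BOTH; // supports both kinds of subclasses; must be abstract
    }
}
```

Note how the legacy branch captures the ACC_SUPER trick: an old class file that set ACC_SUPER decodes the same way whether or not its generator knew about the new flag.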
So the remaining incompatibility risk is that someone generates a class (not an interface) with a newer version number and with neither flag set (violating the "always set ACC_SUPER" advice), and then either the class won't load (it's concrete, it declares an instance field, etc.), or it's abstract and accidentally supports value subclasses, and so can be instantiated without running logic. The number of unlikely events in this scenario seems like enough for us not to be concerned. From daniel.smith at oracle.com Wed Apr 6 16:00:48 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 6 Apr 2022 16:00:48 +0000 Subject: EG meeting, 2022-04-06 Message-ID: Sorry, missed putting this mail together earlier. Planning to meet today, I'll be a little late. Let's plan on starting about 15 minutes after. Thanks, Dan From daniel.smith at oracle.com Wed Apr 20 05:18:31 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 20 Apr 2022 05:18:31 +0000 Subject: EG meeting *canceled*, 2022-04-20 Message-ID: No new email threads, we'll cancel this time. From kevinb at google.com Fri Apr 22 22:38:12 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 22 Apr 2022 15:38:12 -0700 Subject: Objects vs. values, the continuation Message-ID: I'd like to remind everyone about this (self-important-sounding) document I shared some months ago: Data in Java programs: a basic conceptual model I may have undersold it a bit last time. True, it's not the final word on the only possible conceptual model anyone could ever form; however, it is at least a very extensively thought-out and reviewed and self-consistent one. I've also revised it a lot since you saw it (and it's still open for review). If nothing else, at least when I make arguments on this list you don't have to wonder what they are based on; I've laid it all out in black and white. And on that subject...
The crux of that doc for Valhalla purposes is its clear separation between *objects* and *values* as wholly disjoint concepts. An *object*: has its own independent existence; is self-describing, thus can be polymorphic; is always indirected / accessed via reference; is eligible to have identity. A *value*: has no independent existence; is container-described, thus is strictly monomorphic; is always used directly / inline; cannot have identity. (Yes, I'm glossing over that references are also values, here.) What *unifies* objects and values (setting aside references) is that they are all *instances*. (First, to parrot Brian of a while ago: for a primitive type, the values are the instances of the type; for a reference type, the values are references to the instances of the type, those instances being objects.) Some instances are of a type that came from a class, so they get to have *members*. Some instances are of a type that never *used* to have a class, but will now (int, etc.) -- yay. And some are of array types, which act like halfway-fake classes with a few halfway-fake members. Members for everybody, more or less! Though we have at times said "the goal of Valhalla is to make everything an object", I claim the unification we really want is for everything to be a class instance. I think that gives us enough common ground to focus on when we don't otherwise know which thing the thing is (e.g. with a type variable). One thing I like very much about this is that it fits perfectly with the fact that Integer is a subtype of Object and int is not. The way I think bucket 2 can and should be explained is: "in the *programming model* it absolutely is an object. In the *performance model*, the VM can replace it undetectably with a (compound) value. But that is behind the scenes; it's still behaviorally an object and don't feel bad about calling it an object." To me this is very simple and sensible.
If we instead want to say "the int value 5 is an object now too", then we have some problems:
* I think it ruins those clean explanations just given
* we'd need to coin some new term to mean exactly what I've just proposed that "object" mean, and I have no idea what that would be (do you?)
What are the latest thoughts on this? -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Sun Apr 24 15:57:57 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 24 Apr 2022 15:57:57 +0000 Subject: Objects vs. values, the continuation In-Reply-To: References: Message-ID: Overall I find a lot to like about this presentation. I'm still a little iffy about whether we can redefine the letters o-b-j-e-c-t in this way, but that is largely a "syntax" reaction to your statements; the substance of the statements sounds about right. I especially like this bit: The way I think bucket 2 can and should be explained is: "in the programming model it absolutely is an object. In the performance model, the VM can replace it undetectably with a (compound) value. But that is behind the scenes; it's still behaviorally an object and don't feel bad about calling it an object." To me this is very simple and sensible. I think what is missing from our presentation -- and likely key to succeeding -- is how to describe "compound value" in a way that feels like a thing. Users mostly understand how primitives are different from objects, but "loose bag of primitives with limited integrity constraints" is a new and complex concept that I worry users will have a hard time keeping separate from their conception of object. Once we start aggregating fields, the temptation is to say "that's like an object"
and then carry with it some incorrect assumptions of integrity (e.g., final field guarantees) On Apr 22, 2022, at 6:38 PM, Kevin Bourrillion > wrote: I'd like to remind everyone about this (self-important-sounding) document I shared some months ago: Data in Java programs: a basic conceptual model I may have undersold it a bit last time. True, it's not the final word on the only possible conceptual model anyone could ever form; however, it is at least a very extensively thought-out and reviewed and self-consistent one. I've also revised it a lot since you saw it (and it's still open for review). If nothing else, at least when I make arguments on this list you don't have to wonder what they are based on; I've laid it all out in black and white. And on that subject... The crux of that doc for Valhalla purposes is its clear separation between objects and values as wholly disjoint concepts. An object: has its own independent existence; is self-describing, thus can be polymorphic; is always indirected / accessed via reference; is eligible to have identity. A value: has no independent existence; is container-described, thus is strictly monomorphic; is always used directly / inline; cannot have identity. (Yes, I'm glossing over that references are also values, here.) What unifies objects and values (setting aside references) is that they are all instances. (First, to parrot Brian of a while ago: for a primitive type, the values are the instances of the type; for a reference type, the values are references to the instances of the type, those instances being objects.) Some instances are of a type that came from a class, so they get to have members. Some instances of are of a type that never used to have a class, but will now (int, etc.) -- yay. And some are of array types, which act like halfway-fake classes with a few halfway-fake members. Members for everybody, more or less! 
Though we have at times said "the goal of Valhalla is to make everything an object", I claim the unification we really want is for everything to be a class instance. I think that gives us enough common ground to focus on when we don't otherwise know which thing the thing is (e.g. with a type variable). One thing I like very much about this is that it fits perfectly with the fact that Integer is a subtype of Object and int is not. The way I think bucket 2 can and should be explained is: "in the programming model it absolutely is an object. In the performance model, the VM can replace it undetectably with a (compound) value. But that is behind the scenes; it's still behaviorally an object and don't feel bad about calling it an object." To me this is very simple and sensible. If we instead want to say "the int value 5 is an object now too", then we have some problems: * I think it ruins those clean explanations just given * we'd need to coin some new term to mean exactly what I've just proposed that "object" mean, and I have no idea what that would be (do you?) What are the latest thoughts on this? -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From forax at univ-mlv.fr Sun Apr 24 22:23:32 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 25 Apr 2022 00:23:32 +0200 (CEST) Subject: Objects vs. values, the continuation In-Reply-To: References: Message-ID: <662061423.16075805.1650839012022.JavaMail.zimbra@u-pem.fr> > From: "Kevin Bourrillion" > To: "valhalla-spec-experts" > Sent: Saturday, April 23, 2022 12:38:12 AM > Subject: Objects vs. values, the continuation > I'd like to remind everyone about this (self-important-sounding) document I > shared some months ago: [ > https://docs.google.com/document/d/1J-a_K87P-R3TscD4uW2Qsbt5BlBR_7uX_BekwJ5BLSE/preview > | Data in Java programs: a basic conceptual model ] > I may have undersold it a bit last time. 
True, it's not the final word on the > only possible conceptual model anyone could ever form; however, it is at least > a very extensively thought-out and reviewed and self-consistent one. I've also > revised it a lot since you saw it (and it's still open for review). If nothing > else, at least when I make arguments on this list you don't have to wonder what > they are based on; I've laid it all out in black and white. And on that > subject... > The crux of that doc for Valhalla purposes is its clear separation between > objects and values as wholly disjoint concepts. > An object : has its own independent existence; is self-describing, thus can be > polymorphic; is always indirected / accessed via reference; is eligible to have > identity. > A value : has no independent existence; is container-described, thus is strictly > monomorphic; is always used directly / inline; cannot have identity. (Yes, I'm > glossing over that references are also values, here.) > What unifies objects and values (setting aside references) is that they are all > instances . > (First, to parrot Brian of a while ago: for a primitive type, the values are the > instances of the type; for a reference type, the values are references to the > instances of the type, those instances being objects.) > Some instances are of a type that came from a class, so they get to have members > . Some instances of are of a type that never used to have a class, but will now > (int, etc.) -- yay. And some are of array types, which act like halfway-fake > classes with a few halfway-fake members. Members for everybody, more or less! > Though we have at times said "the goal of Valhalla is to make everything an > object", I claim the unification we really want is for everything to be a class > instance. I think that gives us enough common ground to focus on when we don't > otherwise know which thing the thing is (e.g. with a type variable). 
As we discussed earlier, there are two approaches. One is to say that
instance of class = object | value
the other is to say that
instance of class = object = reference object | immediate object
I prefer the latter to the former, because it does not go against what people already think; said differently, we add more vocabulary instead of trying to refine the existing vocabulary. I've done several talks on Valhalla, > One thing I like very much about this is that it fits perfectly with the fact > that Integer is a subtype of Object and int is not. I don't have a clear idea of how primitives will be retrofitted to be "immediate classes", but at some point we will want ArrayList<E> to be valid with E still bounded by Object, so the sentence "int is not a subtype of java.lang.Object" may be wrong depending on how we retrofit existing primitive types. > The way I think bucket 2 can and should be explained is: "in the programming > model it absolutely is an object. In the performance model , the VM can replace > it undetectably with a (compound) value. But that is behind the scenes; it's > still behaviorally an object and don't feel bad about calling it an object." To > me this is very simple and sensible. B2 are nullable immediate objects. > If we instead want to say "the int value 5 is an object now too", then we have > some problems: > * I think it ruins those clean explanations just given > * we'd need to coin some new term to mean exactly what I've just proposed that > "object" mean, and I have no idea what that would be (do you?) > What are the latest thoughts on this? I like "immediate" to characterize objects without a stable address in memory. Rémi From forax at univ-mlv.fr Sun Apr 24 22:30:19 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 25 Apr 2022 00:30:19 +0200 (CEST) Subject: Objects vs.
values, the continuation In-Reply-To: References: Message-ID: <432082437.16075908.1650839417856.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "Kevin Bourrillion" > Cc: "valhalla-spec-experts" > Sent: Sunday, April 24, 2022 5:57:57 PM > Subject: Re: Objects vs. values, the continuation > Overall I find a lot to like about this presentation. I'm still a little iffy > about whether we can redefine the letters o-b-j-e-c-t in this way, but that is > largely a "syntax" reaction to your statements; the substance of the statements > sounds about right. > I especially like this bit: >> The way I think bucket 2 can and should be explained is: "in the programming >> model it absolutely is an object. In the performance model , the VM can replace >> it undetectably with a (compound) value. But that is behind the scenes; it's >> still behaviorally an object and don't feel bad about calling it an object." To >> me this is very simple and sensible. > I think what is missing from our presentation -- and likely key to succeeding -- > is how to describe "compound value" in a way that feels like a thing. Users > mostly understand how primitives are different from objects, but "loose bag of > primitives with limited integrity constraints" is a new and complex concept > that I worry users will have a hard time keeping separate from their conception > of object. Once we start aggregating fields, the temptation is to say "that's > like an object" and then carry with it some incorrect assumptions of integrity > (e.g., final field guarantees) Having loose integrity is a property of primitive classes, but it's also a property of double and long. This is not how we describe long and double, at least not until we introduce the notion of concurrency. I think that having a default value / not being null is a property that is easier to understand and easier to grasp than the concept of integrity. Or maybe I'm not understanding what integrity really means.
Rémi >> On Apr 22, 2022, at 6:38 PM, Kevin Bourrillion < [ mailto:kevinb at google.com | >> kevinb at google.com ] > wrote: >> I'd like to remind everyone about this (self-important-sounding) document I >> shared some months ago: [ >> https://docs.google.com/document/d/1J-a_K87P-R3TscD4uW2Qsbt5BlBR_7uX_BekwJ5BLSE/preview >> | Data in Java programs: a basic >> conceptual model ] >> I may have undersold it a bit last time. True, it's not the final word on the >> only possible conceptual model anyone could ever form; however, it is at least >> a very extensively thought-out and reviewed and self-consistent one. I've also >> revised it a lot since you saw it (and it's still open for review). If nothing >> else, at least when I make arguments on this list you don't have to wonder what >> they are based on; I've laid it all out in black and white. And on that >> subject... >> The crux of that doc for Valhalla purposes is its clear separation between >> objects and values as wholly disjoint concepts. >> An object : has its own independent existence; is self-describing, thus can be >> polymorphic; is always indirected / accessed via reference; is eligible to have >> identity. >> A value : has no independent existence; is container-described, thus is strictly >> monomorphic; is always used directly / inline; cannot have identity. (Yes, I'm >> glossing over that references are also values, here.) >> What unifies objects and values (setting aside references) is that they are all >> instances . >> (First, to parrot Brian of a while ago: for a primitive type, the values are the >> instances of the type; for a reference type, the values are references to the >> instances of the type, those instances being objects.) >> Some instances are of a type that came from a class, so they get to have members >> . Some instances of are of a type that never used to have a class, but will now >> (int, etc.) -- yay.
And some are of array types, which act like halfway-fake >> classes with a few halfway-fake members. Members for everybody, more or less! >> Though we have at times said "the goal of Valhalla is to make everything an >> object", I claim the unification we really want is for everything to be a class >> instance. I think that gives us enough common ground to focus on when we don't >> otherwise know which thing the thing is (e.g. with a type variable). >> One thing I like very much about this is that it fits perfectly with the fact >> that Integer is a subtype of Object and int is not. >> The way I think bucket 2 can and should be explained is: "in the programming >> model it absolutely is an object. In the performance model , the VM can replace >> it undetectably with a (compound) value. But that is behind the scenes; it's >> still behaviorally an object and don't feel bad about calling it an object." To >> me this is very simple and sensible. >> If we instead want to say "the int value 5 is an object now too", then we have >> some problems: >> * I think it ruins those clean explanations just given >> * we'd need to coin some new term to mean exactly what I've just proposed that >> "object" mean, and I have no idea what that would be (do you?) >> What are the latest thoughts on this? >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. | [ mailto:kevinb at google.com | >> kevinb at google.com ] From forax at univ-mlv.fr Sun Apr 24 22:37:19 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 25 Apr 2022 00:37:19 +0200 (CEST) Subject: Objects vs. values, the continuation In-Reply-To: <662061423.16075805.1650839012022.JavaMail.zimbra@u-pem.fr> References: <662061423.16075805.1650839012022.JavaMail.zimbra@u-pem.fr> Message-ID: <1959582551.16076201.1650839839215.JavaMail.zimbra@u-pem.fr> > From: "Remi Forax" > To: "Kevin Bourrillion" > Cc: "valhalla-spec-experts" > Sent: Monday, April 25, 2022 12:23:32 AM > Subject: Re: Objects vs. 
values, the continuation >> From: "Kevin Bourrillion" >> To: "valhalla-spec-experts" >> Sent: Saturday, April 23, 2022 12:38:12 AM >> Subject: Objects vs. values, the continuation >> I'd like to remind everyone about this (self-important-sounding) document I >> shared some months ago: [ >> https://docs.google.com/document/d/1J-a_K87P-R3TscD4uW2Qsbt5BlBR_7uX_BekwJ5BLSE/preview >> | Data in Java programs: a basic conceptual model ] >> I may have undersold it a bit last time. True, it's not the final word on the >> only possible conceptual model anyone could ever form; however, it is at least >> a very extensively thought-out and reviewed and self-consistent one. I've also >> revised it a lot since you saw it (and it's still open for review). If nothing >> else, at least when I make arguments on this list you don't have to wonder what >> they are based on; I've laid it all out in black and white. And on that >> subject... >> The crux of that doc for Valhalla purposes is its clear separation between >> objects and values as wholly disjoint concepts. >> An object : has its own independent existence; is self-describing, thus can be >> polymorphic; is always indirected / accessed via reference; is eligible to have >> identity. >> A value : has no independent existence; is container-described, thus is strictly >> monomorphic; is always used directly / inline; cannot have identity. (Yes, I'm >> glossing over that references are also values, here.) >> What unifies objects and values (setting aside references) is that they are all >> instances . >> (First, to parrot Brian of a while ago: for a primitive type, the values are the >> instances of the type; for a reference type, the values are references to the >> instances of the type, those instances being objects.) >> Some instances are of a type that came from a class, so they get to have members >> . Some instances of are of a type that never used to have a class, but will now >> (int, etc.) -- yay. 
And some are of array types, which act like halfway-fake >> classes with a few halfway-fake members. Members for everybody, more or less! >> Though we have at times said "the goal of Valhalla is to make everything an >> object", I claim the unification we really want is for everything to be a class >> instance. I think that gives us enough common ground to focus on when we don't >> otherwise know which thing the thing is (e.g. with a type variable). > As we discussed earlier, there are two approaches, one is to say that > instance of class = object | value > the other is to say that > instance of class = object = reference object | immediate object > I prefer the later to the former, because it does not goes against what people > already think, said differently we add more vocabulary instead of trying to > refine the existing vocabulary. > I've done several talks on Valhalla, (oops, part of that sentence went missing) I've done several talks on Valhalla; talking about value objects or immediate objects resonates more with the attendees than trying to redefine what an instance of a class is. Rémi From brian.goetz at oracle.com Sun Apr 24 23:57:28 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 24 Apr 2022 23:57:28 +0000 Subject: [External] : Re: Objects vs. values, the continuation In-Reply-To: <432082437.16075908.1650839417856.JavaMail.zimbra@u-pem.fr> References: <432082437.16075908.1650839417856.JavaMail.zimbra@u-pem.fr> Message-ID: I agree totally, the former are semantic properties and the latter is a side effect of representation. But that doesn't help us much, because if people assume that these have the same final field safety / integrity properties as reference objects, they will be in for a painful surprise. So this has to be part of the story.
Sent from my iPad > On Apr 24, 2022, at 6:30 PM, Remi Forax wrote: > > > I think that having a default value / not being null is a property that is easier to understand and easier to grasp than the concept of integrity. From kevinb at google.com Mon Apr 25 02:17:19 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Sun, 24 Apr 2022 22:17:19 -0400 Subject: Objects vs. values, the continuation In-Reply-To: <662061423.16075805.1650839012022.JavaMail.zimbra@u-pem.fr> References: <662061423.16075805.1650839012022.JavaMail.zimbra@u-pem.fr> Message-ID: On Sun, Apr 24, 2022 at 6:23 PM Remi Forax wrote: As we discussed earlier, there are two approaches, one is to say that > instance of class = object | value > the other is to say that > instance of class = object = reference object | immediate object > > I prefer the latter to the former, because it does not go against what > people already think, > Please let us not mistake this: BOTH choices involve redefining what a large number of people already think. That is inescapable and is part of the cost of doing this project. People think "all objects can be locked". We want to break that. They think "everything I can access members on is an object". I want to break that. They think "an object is always accessed via a reference". You want to break that. Etc. etc. This is just what happens when there's been a missing quadrant for decades. One thing I like very much about this is that it fits perfectly with the > fact that Integer is a subtype of Object and int is not. > > I don't have a clear idea how primitives will be retrofitted to be "immediate > classes" but at some point, we will want ArrayList<E> to be valid with E > still bounded by Object, so the sentence "int is not a subtype of > java.lang.Object" may be wrong depending on how we retrofit existing primitive > types. > No, this is my whole point. It's important for them to know that that is NOT subtyping.
(The only kind of subtyping people know and the only kind they *should* think of is polymorphic subtyping.) They should continue to think of "A is a subtype of B" as meaning that they can interact with a B that is actually itself really an A; it's what Integer (but not int) gives you. This relationship is called something else ("extends", maybe?) and is what array covariance is also being rebased on. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From kevinb at google.com Mon Apr 25 02:37:19 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Sun, 24 Apr 2022 22:37:19 -0400 Subject: Objects vs. values, the continuation In-Reply-To: References: Message-ID: On Sun, Apr 24, 2022 at 11:58 AM Brian Goetz wrote: I think what is missing from our presentation -- and likely key to > succeeding -- is how to describe "compound value" in a way that feels like a > thing. > Well, a `double` is already a compound value that feels like a thing. Java just hides the internal structure instead of having us access d.exponent directly etc. Is that a useful angle? I'm not sure, but right now I think it is. > Users mostly understand how primitives are different from objects, but > "loose bag of primitives with limited integrity constraints" is a new and > complex concept > that I worry users will have a hard time keeping separate from their > conception of object. Once we start aggregating fields, the temptation is > to say "that's like an object" and then carry with it some incorrect > assumptions of integrity (e.g., final field guarantees)
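Kevin's claim here, that Integer participates in polymorphic subtyping while int does not, is checkable with today's reflection API. A small illustrative sketch (the `SubtypeDemo` class name is mine, not anything from the thread):

```java
public class SubtypeDemo {
    // Polymorphic subtyping is visible to reflection: Integer is a
    // subtype of Object; the primitive int is not.
    public static boolean isSubtypeOfObject(Class<?> c) {
        return Object.class.isAssignableFrom(c);
    }

    public static void main(String[] args) {
        System.out.println(isSubtypeOfObject(Integer.class)); // true
        System.out.println(isSubtypeOfObject(int.class));     // false

        // An int can still flow into Object-typed code, but only by boxing:
        // the Object you end up with is an Integer, not the int itself.
        Object o = 42;
        System.out.println(o.getClass() == Integer.class);    // true
    }
}
```

In other words, what looks like "int is usable where Object is expected" is a conversion, not the subtype relationship the reflection check reports.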
So in what I propose, one of those mantras is "my custom value types are like int, but composite(compound) --> an int was never an object and it still isn't --> so neither are my value types" (Of course, I'm still on the path that we *want* users to lean on their existing understanding of int-vs.-Integer for understanding MyType.val-vs.-MyType.ref, which I know has been controversial in the past.) -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From kevinb at google.com Mon Apr 25 02:52:50 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Sun, 24 Apr 2022 22:52:50 -0400 Subject: [External] Foo / Foo.ref is a backward default; should be Foo.val / Foo Message-ID: Hi, The current plan for `primitive class Foo` -- to call the value type `Foo` and the reference type `Foo.ref` -- is causing a few problems that I think are unnecessary. I've felt for a while now that we are favoring the wrong default. We should let `Foo` be the reference type and require `Foo.val` (precise syntax aside) for the value type. I started to list reasons and came up with more than expected. 1. The option with fewer hazards should usually be the default. Users won't opt themselves into extra safety, but they will sometimes opt out of it. Here, the value type is the one that has attendant risks -- risk of a bad default value, risk of a bad torn value. We want using `Foo.val` to *feel like* cracking open the shell of a `Foo` object and using its innards directly. But if it's spelled as plain `Foo` it won't "feel like" anything at all. 2. In the current plan a `Foo.ref` should be a well-behaved bucket 2 object. But it sure looks like that `.ref` is specifically telling it NOT to be -- like it's saying "no, VM, *don't* optimize this to be a value even if you can!" That's of course not what we mean. 
With the change I'm proposing, `Foo.val` does make sense: it's just saying "hey runtime, while you already *might* have represented this as a value, now I'm demanding that you *definitely* do". That's a normal kind of a thing to do. 3. This change would permit compatible migration of an id-less class to a primitive class. It's a no-op, and use sites are free to migrate to the value type if and when ready. And if they already expose the type in their API, they are free to weigh the costs/benefits of foisting an incompatible change onto *their* users. They have facilities like method deprecation to do it with. In the current plan, this all seems impossible; you would have to fix all your problematic call sites *atomically* with migrating the class. 4. It's much (much) easier on the mental model because *every (id-less) class works in the exact same way*. Some just *also* give you something extra, that's all. This pulls no rugs out from under anyone, which is very very good. 5. The two kinds of types have always been easily distinguishable to date. The current plan would change that. But they have important differences (nullability vs. the default value chief among them) just as Long and long do, and users will need to distinguish them. For example you can spot the redundant check easily in `Foo.val foo = ...; requireNonNull(foo);`. 6. It's very nice when the *new syntax* corresponds directly to the *new thing*. That is, until a casual developer *sees* `.val` for the first time, they won't have to worry about it. 7. John seemed to like my last fruit analogy, so consider these two equivalent fruit stand signs: a) "for $1, get one apple OR one orange . . . with every orange purchased you must also take a free apple" b) "apples $1 . . . optional free orange with each purchase" Enough said I think :-) 8. The predefined primitives would need less magic. `int` simply acts like a type alias for `Integer.val`, simple as that.
This actually shows that the whole feature will be easier to learn because it works very nearly how people already know primitives to work. Contrast with: we hack it so that what would normally be called `Integer` gets called `int` and what normally gets called `Integer.ref` or maybe `int.ref` gets called `Integer` ... that is much stranger. What are the opposing arguments? -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Mon Apr 25 12:22:16 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 25 Apr 2022 12:22:16 +0000 Subject: [External] : Re: Objects vs. values, the continuation In-Reply-To: References: Message-ID: <356610B9-C463-405A-B9EF-6D153DD743CF@oracle.com> I think what is missing from our presentation -- and likely key to succeeding -- is how to describe "compound value" in a way that feels like a thing. Well, a `double` is already a compound value that feels like a thing. Java just hides the internal structure instead of having us access d.exponent directly etc. Is that a useful angle? I'm not sure, but right now I think it is. I wish I could be compelled by that argument (and I tried), but I can't be. I think if we asked 1M Java developers, pretty much all of them would say something like "double is a primitive 64 bit value". Yes, long and double have been allowed to tear forever, but (a) implementations have delivered 64 bit atomicity for almost forever, (b) most users don't use long or double nearly as often as they use int, and (c) tearing is some weird concurrency black voodoo magic that people don't want to pay attention to. The upshot is that I suspect that only 0.001% of developers have actually spent any significant amount of time thinking about long and double tearing, let alone encountered it in the wild. I don't think we get away with "well, that could have happened with long, too."
It gets worse when compound values "code like a class", because they have constructors, and constructors exist to establish invariants. It gets worse when we realize the only lesson of JCiP that most developers have internalized is "immutable classes are thread-safe." Having a weird new tearing behavior from immutable classes will be astonishing. (One of the reasons to not allow capture of mutable locals back in the Lambda days was that would subject locals to data races -- invalidating one of the few "free safe concurrency" guarantees we had.) So I think we have to confront the tearing beast head on. From brian.goetz at oracle.com Mon Apr 25 14:05:21 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 25 Apr 2022 14:05:21 +0000 Subject: [External] Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: References: Message-ID: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> tl;dr: I find pretty much everything about this compelling. And it comes at a good time, too, because now that we've figured out what we can deliver, we can figure out the sensible stacking of the object model. As a refresher, recall that we've been loosely organizing classes into buckets: Bucket 1 -- good old identity classes. Bucket 2 -- Identity classes, minus the identity. This has some restrictions (no representational polymorphism, no mutability), but a B2 class is still a reference type. That means it can be null (nullity is a property of references) and comes with all the existing guarantees of initialization safety (no tearing.) This is the obvious migration target for value-based classes, and enables us to migrate things like Optional safely because we can preserve all of the intended semantics, keep the L descriptors, keep the name, handle nulls, etc. (As it turns out, we can get more flattening than you might think out of these, even with nullity, but less than we'd ideally like. I'll write another mail about performance reality.) Bucket 3 --
here's where it gets a little fuzzier how we stack it. Bucket 3 drops reference-ness, or more precisely, gives you the option to drop reference-ness. (And it is referenceness that enables nullability, and prevents tearing.) A B3 class has two types, a "val" and a "ref" type, which have a relationship to each other that is not-coincidentally similar to int/Integer. I think we are all happy with Bucket 2; it has a single and understandable difference from B1, with clear consequences, it supports migration, has surprisingly good flattening *on the stack*, but doesn't yet offer all the heap flattening we might want. I have a hard time imagining this part of the design isn't "done", modulo syntax. I think we are all still bargaining with Bucket 3, because there is a certain amount of wanting to have the cake and eat it inherent in "codes like a class, works like an int." Who gets "custody of the good name" is part of it, but for me, the main question is "how do we let people get more flattening without fooling themselves into thinking that there aren't additional concurrency risks (tearing)." But, let's address Kevin's arguments about who should get custody of the good name. That one class gives rise to two types is already weird, and creates opportunity for people to think that one is the "real" type and one is the "hanger on." Unfortunately, depending on which glasses you are wearing, the relationship inverts. We see this with int and Integer. From a user perspective, int is usually the real type, and Integer is this weird compatibility shim. But when you look at their class literals, for example, Integer.class is a fully functional class literal, with member lookup and operational access, but int.class is the weird compatibility shim. The int.class literal is only useful for reflecting over descriptors with primitive types, but does none of the other things reflection does. This should be a hint that there's a custody battle brewing.
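The shim asymmetry Brian describes between the two class literals is observable in today's reflection API. A minimal sketch (the `MirrorDemo` class is mine, for illustration):

```java
public class MirrorDemo {
    // Integer.class supports full member lookup; int.class exposes no members.
    public static int publicMethodCount(Class<?> c) {
        return c.getMethods().length;
    }

    public static void main(String[] args) {
        System.out.println(publicMethodCount(Integer.class) > 0); // true
        System.out.println(publicMethodCount(int.class));         // 0
        System.out.println(int.class.isPrimitive());               // true
        // The only built-in link between the two mirrors is a static field:
        System.out.println(Integer.TYPE == int.class);             // true
    }
}
```

So `int.class` really is little more than a token for reading descriptors, while `Integer.class` carries all the reflective machinery.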
In the future world, which of these declarations do we expect to see? public final class Integer { ... } or public mumble value class int { ... } The tension is apparent here too; I think most Java developers would hope that, were we writing the world from scratch, that we'd declare the latter, and then do something to associate the compatibility shim with the real type. (Whatever we do, we still need an Integer.class on our class path, because existing code will want to load it.) This tension carries over into how we declare Complex; are we declaring the "box", or are we declaring the primitive? Let's state the opposing argument up front, because it was our starting point: having to say "Complex.val" for 99% of the utterances of Complex would likely be perceived as "boy those Java guys love their boilerplate" (call this the "lol java" argument for short.) But, since then, our understanding of how this will all actually work has evolved, so it is appropriate to question whether this argument still holds the weight we thought it did at the outset. > 1. The option with fewer hazards should usually be the default. Users won't opt themselves into extra safety, but they will sometimes opt out of it. Here, the value type is the one that has attendant risks -- risk of a bad default value, risk of a bad torn value. We want using `Foo.val` to *feel like* cracking open the shell of a `Foo` object and using its innards directly. But if it's spelled as plain `Foo` it won't "feel like" anything at all. Let me state it more strongly: unboxed "primitives" are less safe. Despite all the efforts from the brain trust, the computational physics still points us towards "the default is zero, even if you don't like that value" and "these things can tear under race, even though they resemble immutable objects, which don't." The insidious thing about tearing is that it is only exhibited in subtly broken programs. The "subtly" part is the really bad part.
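The "default is zero, even if you don't like that value" physics is already visible with today's primitives and their boxes. A minimal sketch (class and method names are mine), with long/Long standing in for a hypothetical Foo.val/Foo pair:

```java
public class DefaultsDemo {
    // An uninitialized primitive slot silently holds zero, whether or not
    // zero is a meaningful value for the domain; an uninitialized reference
    // slot holds null, which at least fails fast the moment it is used.
    public static long firstPrimitive() {
        return new long[1][0];   // default-initialized element: 0
    }

    public static Long firstReference() {
        return new Long[1][0];   // default-initialized element: null
    }

    public static void main(String[] args) {
        System.out.println(firstPrimitive());  // 0
        System.out.println(firstReference());  // null
    }
}
```

The zero sneaks into downstream arithmetic unnoticed; the null throws at first dereference, which is the "fewer hazards" contrast point 1 leans on.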
So we have four broad options: - neuter primitives so they are always as safe as we might naively hope, which will result in either less performance or a worse programming model; - keep a strong programming model, but allow users to trade some safety (which non-broken programs won't suffer for) with an explicit declaration-site and/or use-site opt-in (".val") - same, but try to educate users about the risk of tearing under data race (good luck) - decide the tradeoff is impossible, and keep the status quo The previous stake in the ground was #3; you are arguing towards #2. > 2. In the current plan a `Foo.ref` should be a well-behaved bucket 2 object. But it sure looks like that `.ref` is specifically telling it NOT to be -- like it's saying "no, VM, *don't* optimize this to be a value even if you can!" That's of course not what we mean. With the change I'm proposing, `Foo.val` does make sense: it's just saying "hey runtime, while you already *might* have represented this as a value, now I'm demanding that you *definitely* do". That's a normal kind of a thing to do. A key aspect of this is the bike shed tint; .val is not really the right indicator given that the reference type is also a "value class". I think we're comfortable giving the "value" name to the whole family of identity-free classes, which means that .val needs a new name. Bonus points if the name connotes "having burst free of the constraints of reference-hood": unbound, loose, exploded, compound value, etc. And also is pretty short. > 3. This change would permit compatible migration of an id-less class to a primitive class. It's a no-op, and use sites are free to migrate to the value type if and when ready. And if they already expose the type in their API, they are free to weigh the costs/benefits of foisting an incompatible change onto *their* users. They have facilities like method deprecation to do it with.
In the current plan, this all seems impossible; you would have to fix all your problematic call sites *atomically* with migrating the class. This is one of my favorite aspects of this direction. If you recall, you were skeptical from the outset about migrating classes in place at all; the previous stake in the ground said "well, they can migrate to value classes, but will never be able to shed their null footprint or get ultimate flattening." With this, we can migrate easily from VBC to B2 with no change in client code, and then _further_ have a crack at migrating to full flatness inside the implementation capsule. That's sweet. > 4. It's much (much) easier on the mental model because *every (id-less) class works in the exact same way*. Some just *also* give you something extra, that's all. This pulls no rugs out from under anyone, which is very very good. > > 5. The two kinds of types have always been easily distinguishable to date. The current plan would change that. But they have important differences (nullability vs. the default value chief among them) just as Long and long do, and users will need to distinguish them. For example you can spot the redundant check easily in `Foo.val foo = ...; requireNonNull(foo);`. It is really nice that *any* unadorned identifier is immediately recognizable as being a reference, with all that entails -- initialization safety and nullity. The "mental database" burden is lower, because Foo is always a reference, and Foo.whatever is always direct/immediate/flat/whatever. > 6. It's very nice when the *new syntax* corresponds directly to the *new thing*. That is, until a casual developer *sees* `.val` for the first time, they won't have to worry about it. > > 7. John seemed to like my last fruit analogy, so consider these two equivalent fruit stand signs: > > a) "for $1, get one apple OR one orange . . . with every orange purchased you must also take a free apple" > b) "apples $1 . . .
optional free orange with each purchase" > > Enough said I think :-) > > 8. The predefined primitives would need less magic. `int` simply acts like a type alias for `Integer.val`, simple as that. This actually shows that the whole feature will be easier to learn because it works very nearly how people already know primitives to work. Contrast with: we hack it so that what would normally be called `Integer` gets called `int` and what normally gets called `Integer.ref` or maybe `int.ref` gets called `Integer` ... that is much stranger. One more: the .getClass() anomaly goes away. If we have mumble primitive mumble Complex { ... } Complex.val c = ... then what do we get when we ask c for its getClass? The physics again point us at returning Complex.ref.class, not Complex.val.class, but under the old scheme, where the val projection gets the good name, it would seem anomalous, since we ask a val for its class and get the ref mirror. But under the Kevin interpretation, we can say "well, the CLASS is Complex, so if you ask getClass(), you get Complex.class." From brian.goetz at oracle.com Mon Apr 25 14:52:17 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 25 Apr 2022 14:52:17 +0000 Subject: Flattening to date Message-ID: <90BDED4C-6573-47DE-A547-F3C98253B080@oracle.com> Let me give a brief overview of where things are with respect to flattening, since some of this influences the user-model discussion Kevin has initiated. (This is a very rough sketch, and not written for a general audience, so if you're tempted to post this to Twitter because it seems cool and curiosity-satisfying, while I can't stop you, you're probably anti-helping.) Layout is always at the discretion of the JVM; that's how we like it. There will be no directives for "forcing" any kind of layout, including flattening. The JVM always has the option of indirecting with a pointer. Currently it always does this for object references, and never does this for primitives.
For Bucket 1 classes, we will almost certainly continue to lay out an LBucket1 as a pointer. (Remember that layout of an object with an LFoo field often happens before Foo is loaded; flattening introduces an ordering edge into the class loading graph.) Most people think of flattening as being only flattening of heap layouts, but there is also flattening in the calling convention, and this can be a huge source of benefit. Flattening in the calling convention means that rather than passing an aggregate to or from an out-of-line call via a pointer, we scalarize the value and pass the field values instead. Calling convention is generally determined early in the run, so if we load the class after the calling convention is set, we may miss out on this. For a reference type (e.g., B2 classes, and B3.ref), we are constrained by two properties of reference-ness; the need to represent null, and the JMM constraint that loads and stores of references are atomic with respect to one another. (This is where tear-freedom comes from.) Nullity can be represented as some sort of footprint tax (inject a boolean, or reinterpret slack bits such as low order pointer bits in existing fields.) Tearing is not relevant to stack (calling convention) flattening, so even L types can get flattening on the stack. I'll pause because this is sort of amazing: an LB2, while a reference type, is, in the current implementation, routinely flattened in calling convention, using an extra synthetic field for null. If you thought references were always indirections, you'll be surprised. Long chains of things like Optional.map(...).flatMap(...) are routinely allocation-free in C2-compiled code, even for out-of-line calls. (The interpreter and C1 still use indirections on the stack and in locals.) In the heap, this is where reference types (including B2) have some trouble. The atomicity requirement bites hard here. References in the heap are routinely laid out as indirections.
Final references to id-free instances _could_ safely be flattened, but they are not yet. Mutable references to id-free instances are problematic because of potential tearing. We *could* (but do not yet, and it's complicated) flatten 64 bit values by stuffing multiple 32 bit values into a single synthetic field or by storing/loading multiple fields with a single load ("fat loads"), and on platforms with fast 128 bit atomics (which include some Intel cores where the spec was recently revised to commit to atomicity), but the complexity cost here is high, and flattening would be limited by the instruction set. This is under investigation but unlikely to be a magic bullet. In the heap, Q types (B3.val) can be fully flattened (though the VM will likely impose a threshold above which it uses indirections anyway, such as 512 bits.) Full flattening means not only the layout, but that we can access the fields of the nested object with narrow single-field loads and stores. Scorecard: - Identity-free reference (L) types can be flattened, within limits (which is amazing) - Identity-free reference types usually pay some footprint tax for the null channel - Identity-free reference types are routinely flattened on stack, and may get some more heap flattening in the future - Identity-free immediate types have no null channel, and can be fully flattened and accessed with narrow loads and stores, because they're allowed to tear To the extent we treat B2 and B3.ref the same way (which we want to), any flattening wins for refs (e.g., final fields, fat access) will apply to both. From brian.goetz at oracle.com Mon Apr 25 14:59:14 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 25 Apr 2022 14:59:14 +0000 Subject: Objects vs.
values, the continuation In-Reply-To: References: Message-ID: <60CE55F5-57D8-4EB1-A61B-2C646D768521@oracle.com> The fact that these are "small" (at most 64 bits) is incidental, not essential; introducing a new quadruple type would not destabilize our concept of a primitive value. If we can tip the user's mental model so that they believe "small is good" for B3 values, then we aid them in hitting the sweet space of the design and help them avoid tearing issues. It doesn't change the model but the more we can encourage the belief that B3 values should be <= 64 bits the happier users will be with the results. I think it's reasonable to say that "we can flatten 64 bits better than we can flatten 256, but go ahead and write the code you want, and we'll do what we can." Recent data suggests that we can get to 128 more quickly than we had initially expected, and (especially if we can drop the null footprint tax, as B3 does), you can do a lot in 128. Presumably in some future hardware generation this number will go up again, whether that is 5 or 10 years from now, we don't know right now. The tangible things I think we want permission from the user to do are: - drop identity - drop nullity - drop atomicity (non-tearing) B2, as currently sketched, drops the first; B3.val further drops nullity and atomicity together. Whether this is the right stacking is a good discussion to be having now, but ultimately we need permission for each of these. While "small" objects may sidestep the atomicity constraint, we'd like this to remain an implementation detail, not an in-your-face aspect of the programming model. From forax at univ-mlv.fr Mon Apr 25 15:17:14 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 25 Apr 2022 17:17:14 +0200 (CEST) Subject: 128bits value type and VarHandle.compareAndSet() Was: Objects vs.
values, the continuation In-Reply-To: <60CE55F5-57D8-4EB1-A61B-2C646D768521@oracle.com> References: <60CE55F5-57D8-4EB1-A61B-2C646D768521@oracle.com> Message-ID: <354619275.16578069.1650899834396.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "Dan Heidinga" > Cc: "Kevin Bourrillion" , "valhalla-spec-experts" > > Sent: Monday, April 25, 2022 4:59:14 PM > Subject: Re: Objects vs. values, the continuation >>> The fact that these are "small" (at most 64 bits) is incidental, not essential; >>> introducing a new quadruple type would not destabilize our concept of a >>> primitive value. >> If we can tip the user's mental model so that they believe "small is >> good" for B3 values, then we aid them in hitting the sweet space of >> the design and help them avoid tearing issues. It doesn't change the >> model but the more we can encourage the belief that B3 values should >> be <= 64 bits the happier users will be with the results. > I think it's reasonable to say that "we can flatten 64 bits better than we can > flatten 256, but go ahead and write the code you want, and we'll do what we > can." Recent data suggests that we can get to 128 more quickly than we had > initially expected, and (especially if we can drop the null footprint tax, as > B3 does), you can do a lot in 128. Presumably in some future hardware > generation this number will go up again, whether that is 5 or 10 years from > now, we don't know right now. This is tangential but I write it here because I will forget again. There is an issue with representing B2 as a 128 bits value: while Intel and ARM both provide 128 bit atomic reads/writes if vectorized registers are used, they do not provide a CAS (or the equivalent on ARM) on 128 bits. This is an issue for VarHandle because - one can create a VarHandle on any field (array cell), it does not have to be volatile (there is no way to declare the content of an array volatile) - VarHandle.compareAndSet() has to work.
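Both constraints Remi lists hold in today's VarHandle API: a handle can target a plain, non-volatile field, and compareAndSet is then required to work on it. A minimal sketch (the `CasDemo`/`Holder` names are mine):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class CasDemo {
    public static class Holder {
        public long value;  // deliberately NOT declared volatile
    }

    static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findVarHandle(Holder.class, "value", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // compareAndSet must work even on this plain field, which is why the
    // VM needs a hardware CAS covering the field's full width.
    public static boolean casFromZeroTo(Holder h, long newValue) {
        return VALUE.compareAndSet(h, 0L, newValue);
    }

    public static void main(String[] args) {
        Holder h = new Holder();
        System.out.println(casFromZeroTo(h, 42L)); // true: 0 -> 42
        System.out.println(casFromZeroTo(h, 7L));  // false: value is now 42
    }
}
```

Since any B2-typed field could later become the target of such a handle, a 128-bit flat layout would need a 128-bit CAS to keep this contract.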
If we keep the exact same semantics for VarHandle, we cannot use 128 bits for fields declared as B2 because a VarHandle may be constructed on it later. Rémi From forax at univ-mlv.fr Mon Apr 25 15:54:45 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Mon, 25 Apr 2022 17:54:45 +0200 (CEST) Subject: Objects vs. values, the continuation In-Reply-To: References: <662061423.16075805.1650839012022.JavaMail.zimbra@u-pem.fr> Message-ID: <513228774.16596152.1650902085338.JavaMail.zimbra@u-pem.fr> > From: "Kevin Bourrillion" > To: "Remi Forax" > Cc: "valhalla-spec-experts" > Sent: Monday, April 25, 2022 4:17:19 AM > Subject: Re: Objects vs. values, the continuation > On Sun, Apr 24, 2022 at 6:23 PM Remi Forax < [ mailto:forax at univ-mlv.fr | > forax at univ-mlv.fr ] > wrote: >> As we discussed earlier, there are two approaches, one is to say that >> instance of class = object | value >> the other is to say that >> instance of class = object = reference object | immediate object >> I prefer the latter to the former, because it does not go against what people >> already think, > Please let us not mistake this: > BOTH choices involve redefining what a large number of people already think. > That is inescapable and is part of the cost of doing this project. > People think "all objects can be locked". We want to break that. They think > "everything I can access members on is an object". I want to break that. They > think "an object is always accessed via a reference". You want to break that. > Etc. etc. This is just what happens when there's been a missing quadrant for > decades. You have to change some existing assumptions, that's true, but you are proposing a kind of refactoring of the world before adding a new concept. It's easier to introduce a new form of something already existing. >>> One thing I like very much about this is that it fits perfectly with the fact >>> that Integer is a subtype of Object and int is not.
>> I don't have a clear idea how primitives will be retrofitted to be "immediate >> classes" but at some point, we will want ArrayList<E> to be valid with E still bounded by >> Object, so the sentence "int is not a subtype of java.lang.Object" may be wrong >> depending on how we retrofit existing primitive types. > No, this is my whole point. It's important for them to know that that is NOT > subtyping. (The only kind of subtyping people know and the only kind they > *should* think of is polymorphic subtyping.) They should continue to think of > "A is a subtype of B" as meaning that they can interact with a B that is > actually itself really an A; it's what Integer (but not int) gives you. This > relationship is called something else ("extends", maybe?) and is what array > covariance is also being rebased on. Again, subtyping is already an existing concept, there is no win in trying to scooby-doo it. We are introducing a new way to do subtyping, the subtyping between a java.lang.Object and a QPoint; and it's a form of subtyping polymorphism given that you can call toString/equals and hashCode. Rémi From forax at univ-mlv.fr Mon Apr 25 16:05:13 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Mon, 25 Apr 2022 18:05:13 +0200 (CEST) Subject: [External] : Re: Objects vs. values, the continuation In-Reply-To: References: <432082437.16075908.1650839417856.JavaMail.zimbra@u-pem.fr> Message-ID: <1596992756.16601677.1650902713026.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "Brian Goetz" > To: "Remi Forax" > Cc: "Kevin Bourrillion" , "valhalla-spec-experts" > Sent: Monday, April 25, 2022 1:57:28 AM > Subject: Re: [External] : Re: Objects vs. values, the continuation > I agree totally, the former are semantic properties and the latter is a side > effect of representation. But that doesn't help us much, because if people > assume that these have the same final field safety / integrity properties as > reference objects, they will be in for a painful surprise.
So this has to be > part of the story. Interfaces in Golang are tearable and nobody cares; I have never seen somebody introducing interfaces in Go saying that they do not have integrity. It's important when talking about the memory model, but first you have to talk about what the memory model is. I think the fact that you can bypass the constructor is a bigger deal; that's why I still think that the compiler should add an empty constructor to the primitive class and not allow the user to override it. Rémi > > Sent from my iPad > >> On Apr 24, 2022, at 6:30 PM, Remi Forax wrote: >> >> >> I think that having a default value / not being null is a property that is > > easier to understand and easier to grasp than the concept of integrity. From kevinb at google.com Mon Apr 25 16:31:52 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Mon, 25 Apr 2022 12:31:52 -0400 Subject: [External] Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> References: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> Message-ID: On Mon, Apr 25, 2022 at 10:05 AM Brian Goetz wrote: Bucket 2: This is the obvious migration target for value-based classes. It also seems useful as the migration stepping-stone for bucket 1 -> 3. Which makes me feel good about the possibility of shipping 2 first. Bucket 3: here's where it gets a little fuzzier how we stack it. Bucket 3 > drops reference-ness, or more precisely, gives you the option to drop > reference-ness. (and notice how much nicer this phrase is than "... drops reference-ness and gives you the option to claw it back") I think we are all happy with Bucket 2; it has a single and understandable > difference from B1, with clear consequences, it supports migration, There is still one major problem which I'll try to take to another thread soon. I think we are all still bargaining with Bucket 3, ...
for me, the main > question is "how do we let people get more flattening without fooling > themselves into thinking that there aren't additional concurrency risks > (tearing)." > The degree of worry over tearing is something we will have to figure out how to size appropriately. From my/Google's perspective I will continue arguing that we are making too much of it. If I'm learning how all this works, and I read/hear a statement like, "By the way, writing racy code can work out more badly than usual when these things are involved" ... my reaction would be "okay, noted. I'll keep right on trying to never write racy code, and if I'm ever diagnosing a puzzling concurrency error I'll come back and learn what this is all about. Maybe I'll check that my static analysis tool has a 'data race when accessing value class' finding enabled. Okay they're working on it, cool...." And then I'd be fine. My mental model would be fine. (I'm *much* more concerned about the proliferation of 1970-type bugs from people using uninitialized values, or the proliferation of pseudo-nullability patterns required to prevent those bugs. If not for just this alone, I think I'd favor letting every B2 class *automatically* be B3 with no extra permission!) That one class gives rise to two types is already weird, and creates > opportunity for people to think that one is the "real" type and one is the > "hanger on." Unfortunately, depending on which glasses you are wearing, > the relationship inverts. We see this with int and Integer. From a user > perspective, int is usually the real type, and Integer is this weird > compatibility shim. But I think delivering Valhalla means -- perhaps ironically, sure -- that we can and should invert that expectation. If you can always think of the reference type as the real thing, then you're getting the "unification" we promised. Substitute a value type when x-y-or-z.
Your static analysis tool will propose refactoring your code to use the value type when x-y (and maybe it misses case z). I think it's only on a surface level that this story makes the value types look "lesser". We'd still move a lot of units because of their genuinely compelling advantages. In the future world, which of these declarations do we expect to see? > > public final class Integer { ... } > > or > > public mumble value class int { ... } > > The tension is apparent here too; I think most Java developers would hope > that, were we writing the world from scratch, that we'd declare the latter, > and then do something to associate the compatibility shim with the real > type. imho, we will "just" have to retrain them on this. And as I'll keep repeating, we can't escape this need for retraining no matter which way we go. I think the story is to a point now where the retraining won't be *nearly* as hard as it once was. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Mon Apr 25 17:19:32 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 25 Apr 2022 17:19:32 +0000 Subject: [External] Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: References: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> Message-ID: <3DB1B18D-96C3-42C5-9ECA-9FB496F21ED8@oracle.com> My understanding was we were going to guide most users towards B2 values and would treat B3 as the rare, "expert" mode, for when density really matters. Does that decrease the education problem? I am not convinced it does, but am open minded to see if there are other things we can do at the declaration site to mitigate. (I also know that this topic is one where we've tried to convince ourselves of a lot of things, and in the cold light of morning, had regrets.)
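The int/Integer "compatibility shim" relationship debated above shows up concretely in the identity behavior of boxes today; a small sketch (class name is mine, not from the thread):

```java
// Shows why Integer reads as a reference "shim" with identity quirks,
// while int simply denotes a value.
public class BoxIdentityDemo {
    public static void main(String[] args) {
        // JLS 5.1.7 guarantees valueOf caches values in [-128, 127],
        // so these two boxes are the *same* object:
        System.out.println(Integer.valueOf(127) == Integer.valueOf(127)); // true

        // Explicit construction always creates a fresh identity (the
        // constructor is deprecated since 9, precisely to discourage
        // identity-sensitive uses of boxes):
        @SuppressWarnings("deprecation")
        Integer a = new Integer(5), b = new Integer(5);
        System.out.println(a == b);      // false: distinct identities
        System.out.println(a.equals(b)); // true: same value

        // int has none of this: 5 == 5 is plain value equality.
    }
}
```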
The problem is that from the client perspective, Complex is just another class in a library; if there's anything they have to think about before using it in its most elemental form, they should have a chance to do so. ".prim" anyone? (backs slowly away from the bikeshed) "Not terrible" 3. This change would permit compatible migration of an id-less class to a primitive class. It's a no-op, and use sites are free to migrate to the value type if and when ready. And if they already expose the type in their API, they are free to weigh the costs/benefits of foisting an incompatible change onto *their* users. They have facilities like method deprecation to do it with. In the current plan, this all seems impossible; you would have to fix all your problematic call sites *atomically* with migrating the class. This is one of my favorite aspects of this direction. If you recall, you were skeptical from the outset about migrating classes in place at all; the previous stake in the ground said "well, they can migrate to value classes, but will never be able to shed their null footprint or get ultimate flattening." With this, we can migrate easily from VBC to B2 with no change in client code, and then _further_ have a crack at migrating to full flatness inside the implementation capsule. That's sweet. Changing from a B2 -> B3 changes the default spelling from "L" -> "Q". Why does this have to be done atomically? Existing descriptors - spelled with "L" - would still work. Code that's recompiled would pick up the Q descriptors. If the author wants Qs, and gets them either "for free" or by adding ".val", there are the same compatibility concerns.... they have to take explicit action to get what they want and to keep descriptors working. What I'm thinking here about migration is that existing APIs can say "Optional" but field declarations can say Optional.val, getting additional footprint / flattening benefits, without perturbing the APIs (and with cheap conversions.) 6.
It's very nice when the *new syntax* corresponds directly to the *new thing*. That is, until a casual developer *sees* `.val` for the first time, they won't have to worry about it. That's nice initially but a few releases after B3 values are available will we still want the syntax to highlight (scream?) "new thing"? Yes, that's the risk. (Still, primitives today LOOK DIFFERENT from class names.) From kevinb at google.com Mon Apr 25 17:49:57 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Mon, 25 Apr 2022 13:49:57 -0400 Subject: Objects vs. values, the continuation In-Reply-To: References: Message-ID: Thanks Dan! On Mon, Apr 25, 2022 at 10:47 AM Dan Heidinga wrote: And I'm unclear on why the ephemeral information presentation is > preferred to the Platonic meaning? > Indeed I've gotten this feedback before too. My response was that it's not so much that one is "preferred", as that there just isn't much to say about the other. 42 as a Platonic concept just sort of ... is? But as a piece of *information* we can trace its flow throughout the code. There is always some source speaking it and some sink hearing it (modulo StatementExpressions). Frankly I don't know the terminology that people who do data flow analysis would use here, but toward my goal of having a clear picture of "what's happening inside my program" (that's at least a level higher than things like registers), I've found this mental picture helpful. *What* is it that gets lost by a StatementExpression, or by a second write to a variable with no read in between? We came to see those as cases of "value not used" in a very analogous sense to our long-existing checks for "variable not used". Or in nullness analysis we have been looking at "where can a null value flow in, and where can it flow out". Each "flowing" or whatever that happened is of interest. You could probably say that the thing I'm really describing is a single "transmission" of a value and not a value itself.
I do agree that there is *something* a little off about my presentation here. I'm slightly concerned about the presentation of "small" as being > incidental. While size isn't a critical factor from the programming > model perspective, it is incredibly important for aligning with the > natural physics of the hardware. > > The fact that these are "small" (at most 64 bits) is incidental, not > essential; introducing a new quadruple type would not destabilize our > concept of a primitive value. > Good point, thanks. Obviously I'm only attempting to pave the way for seeing that compound values are still values, but I certainly don't mean to imply we'd ever want a value that's 1000 bytes long. Upon looking at it again, this text just seemed unnecessary and so I've nuked it. > On Fri, Apr 22, 2022 at 6:38 PM Kevin Bourrillion > wrote: > > > > I'd like to remind everyone about this (self-important-sounding) > document I shared some months ago: Data in Java programs: a basic > conceptual model > > > > I may have undersold it a bit last time. True, it's not the final word > on the only possible conceptual model anyone could ever form; however, it > is at least a very extensively thought-out and reviewed and self-consistent > one. I've also revised it a lot since you saw it (and it's still open for > review). If nothing else, at least when I make arguments on this list you > don't have to wonder what they are based on; I've laid it all out in black > and white. And on that subject... > > > > The crux of that doc for Valhalla purposes is its clear separation > between objects and values as wholly disjoint concepts. > > > > An object: has its own independent existence; is self-describing, thus > can be polymorphic; is always indirected / accessed via reference; is > eligible to have identity. > > > > A value: has no independent existence; is container-described, thus is > strictly monomorphic; is always used directly / inline; cannot have > identity.
(Yes, I'm glossing over that references are also values, here.) > > > > What unifies objects and values (setting aside references) is that they > are all instances. > > > > (First, to parrot Brian of a while ago: for a primitive type, the values > are the instances of the type; for a reference type, the values are > references to the instances of the type, those instances being objects.) > > > > Some instances are of a type that came from a class, so they get to have > members. Some instances are of a type that never used to have a class, > but will now (int, etc.) -- yay. And some are of array types, which act > like halfway-fake classes with a few halfway-fake members. Members for > everybody, more or less! > > > > Though we have at times said "the goal of Valhalla is to make everything > an object", I claim the unification we really want is for everything to be > a class instance. I think that gives us enough common ground to focus on > when we don't otherwise know which thing the thing is (e.g. with a type > variable). > > > > One thing I like very much about this is that it fits perfectly with the > fact that Integer is a subtype of Object and int is not. > > > > The way I think bucket 2 can and should be explained is: "in the > programming model it absolutely is an object. In the performance model, the > VM can replace it undetectably with a (compound) value. But that is behind > the scenes; it's still behaviorally an object and don't feel bad about > calling it an object." To me this is very simple and sensible. > > > > If we instead want to say "the int value 5 is an object now too", then > we have some problems: > > > > * I think it ruins those clean explanations just given > > * we'd need to coin some new term to mean exactly what I've just > proposed that "object" mean, and I have no idea what that would be (do you?) > > > > What are the latest thoughts on this? > > > > -- > > Kevin Bourrillion | Java Librarian | Google, Inc.
| kevinb at google.com > > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From kevinb at google.com Mon Apr 25 17:58:51 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Mon, 25 Apr 2022 13:58:51 -0400 Subject: [External] Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: <3DB1B18D-96C3-42C5-9ECA-9FB496F21ED8@oracle.com> References: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> <3DB1B18D-96C3-42C5-9ECA-9FB496F21ED8@oracle.com> Message-ID: On Mon, Apr 25, 2022 at 1:19 PM Brian Goetz wrote: > Changing from a B2 -> B3 changes the default spelling from "L" -> "Q". > Why does this have to be done atomically? > > (First note that I'm thinking in terms of source compatibility, if that makes a difference.) I mostly just mean that anyone anywhere might be counting on the type being nullable. Is it possible to fix all that code ahead of time to be nullness-agnostic, i.e. the very same code works whether the type is nullable or not? I feel like it would be hard to get *everything* that way. > 6. It's very nice when the *new syntax* corresponds directly to the *new > thing*. That is, until a casual developer *sees* `.val` for the first time, > they won't have to worry about it. > > That's nice initially but a few releases after B3 values are available > will we still want the syntax to highlight (scream?) "new thing"? > > Yes, that's the risk. (Still, primitives today LOOK DIFFERENT from class > names.) > Indeed I could have phrased it that way: it's nice when the thing that IS different is the thing that LOOKS different. It's not the language wearing its history on its sleeve. It's a good outcome in perpetuity imho. -- Kevin Bourrillion | Java Librarian | Google, Inc.
| kevinb at google.com From brian.goetz at oracle.com Mon Apr 25 18:26:02 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 25 Apr 2022 18:26:02 +0000 Subject: [External] Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: References: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> <3DB1B18D-96C3-42C5-9ECA-9FB496F21ED8@oracle.com> Message-ID: <9D2BCF50-DE06-4C71-9700-2C5FD8AAB4F7@oracle.com> What I'm thinking here about migration is that existing APIs can say "Optional" but field declarations can say Optional.val, getting additional footprint / flattening benefits, without perturbing the APIs (and with cheap conversions.) Aren't most of the migration cases (at least for existing VBC) targeting B2? They need to keep the reference semantics due to existing code using null and will still get optimized inside a compiled body. Sort of. For existing uses, we're stuck with compatibility with the L protocol, certainly. But consider this example: class Foo { private Optional f; public Optional f() { return f; } } The API point has to stay LOptional, but the field could migrate further to QOptional, and there's definitely value in that. With the current stake in the ground, we have no way to get there, but with Kevin's proposal, we have the option to go further. I'm a little concerned we're starting to undo the separation between B2 & B3 (just add .val to any B2 class) and will drag ourselves back into the quagmire. B2 and B3 are different points on the spectrum and we should respect the user's intention when they pick one of those points. But they're coupled; a B3.ref is a B2. Yes, that's the risk. (Still, primitives today LOOK DIFFERENT from class names.) So we should guide primitive B3 classes to use lower case names? "complex" rather than "Complex"? This started tongue in cheek but it's kind of growing on me as a convention.
It matches the existing primitives, makes it clear at a glance (no need to carry a type dictionary in your head), and works fine with the ".ref" escape hatch. Give it time, you'll cycle back :) From kevinb at google.com Mon Apr 25 18:58:31 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Mon, 25 Apr 2022 14:58:31 -0400 Subject: [External] Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: References: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> <3DB1B18D-96C3-42C5-9ECA-9FB496F21ED8@oracle.com> Message-ID: On Mon, Apr 25, 2022 at 1:53 PM Dan Heidinga wrote: > Users who have already opted into using a B3 will be annoyed that they > have to use the bad name to get what they already said they wanted. > .... > Users who want the default for B3 to be a reference should > probably have picked a B2 already. I strongly disagree with this. Picking B2 is about *taking the choice away* from your use-sites. By picking B3 they are only saying that the value type should exist at all. It can't say both that *and* which one is the better default at the same time. So we should guide primitive B3 classes to use lower case names? > "complex" rather than "Complex"? This started tongue in cheek but > it's kind of growing on me as a convention. It matches the existing > primitives, makes it clear at a glance (no need to carry a type > dictionary in your head), and works fine with the ".ref" escape hatch. > It doesn't match the existing primitives if you have to write `complex.ref`. I think there's a better way to get this :-) _value class Complex {} // generates types Complex and Complex.val _typealias complex = Complex.val This would be the way to make it look and work just like int/Integer. I can at least _imagine_ this being a recommended convention. (But we have to have a separate conversation on how useful typealiases could be if done right!) -- Kevin Bourrillion | Java Librarian | Google, Inc.
| kevinb at google.com From kevinb at google.com Mon Apr 25 19:00:23 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Mon, 25 Apr 2022 15:00:23 -0400 Subject: [External] Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: References: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> <3DB1B18D-96C3-42C5-9ECA-9FB496F21ED8@oracle.com> Message-ID: On Mon, Apr 25, 2022 at 2:47 PM Dan Heidinga wrote: As I said above, migration is one use case. It's also only one of the 8 reasons I wrote up. 9 with Brian's. Do we want to make this worse to read for the migration use case? > Yes, and for 8 other reasons! -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From forax at univ-mlv.fr Mon Apr 25 19:40:32 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 25 Apr 2022 21:40:32 +0200 (CEST) Subject: [External] Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: <9D2BCF50-DE06-4C71-9700-2C5FD8AAB4F7@oracle.com> References: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> <3DB1B18D-96C3-42C5-9ECA-9FB496F21ED8@oracle.com> <9D2BCF50-DE06-4C71-9700-2C5FD8AAB4F7@oracle.com> Message-ID: <301627367.16660299.1650915632653.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "Dan Heidinga" > Cc: "Kevin Bourrillion" , "valhalla-spec-experts" > > Sent: Monday, April 25, 2022 8:26:02 PM > Subject: Re: [External] Foo / Foo.ref is a backward default; should be Foo.val / > Foo >>> What I'm thinking here about migration is that existing APIs can say "Optional" >>> but field declarations can say Optional.val, getting additional footprint / >>> flattening benefits, without perturbing the APIs (and with cheap conversions.) >> Aren't most of the migration cases (at least for existing VBC) >> targeting B2? They need to keep the reference semantics due to >> existing code using null and will still get optimized inside a >> compiled body. > Sort of.
For existing uses, we're stuck with compatibility with the L protocol, > certainly. But consider this example: > class Foo { > private Optional f; > public Optional f() { return f; } > } > The API point has to stay LOptional, but the field could migrate further to > QOptional, and there's definitely value in that. With the current stake in the > ground, we have no way to get there, but with Kevin's proposal, we have the > option to go further. This seems very specific to Optional, for Optional storing null is always a mistake, but that's not true for other VBC, for example a deadline can be typed as an Instant with null meaning no deadline. Rémi From brian.goetz at oracle.com Mon Apr 25 19:54:26 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 25 Apr 2022 19:54:26 +0000 Subject: [External] Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: <301627367.16660299.1650915632653.JavaMail.zimbra@u-pem.fr> References: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> <3DB1B18D-96C3-42C5-9ECA-9FB496F21ED8@oracle.com> <9D2BCF50-DE06-4C71-9700-2C5FD8AAB4F7@oracle.com> <301627367.16660299.1650915632653.JavaMail.zimbra@u-pem.fr> Message-ID: <564971DD-9772-4D06-BF1D-514E6112230F@oracle.com> This seems very specific to Optional, for Optional storing null is always a mistake, but that's not true for other VBC, for example a deadline can be typed as an Instant with null meaning no deadline. No, it is not specific to Optional at all. Many domains exclude null on an ad-hoc basis. It is about giving the user the choice to reserve space in the heap for the null or not. If other logic has already excluded null (which is common), they can use an Instant.val and get better footprint. If they like the semantics of having null, they can use Instant.
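Brian's Foo sketch and Remi's deadline example can both be rendered in today's Java; a runnable sketch (the Instant payload and member names are my own, and the mailing list's example elided its Optional type parameter):

```java
import java.time.Instant;
import java.util.Optional;

// An encapsulated Optional-typed field behind a reference-typed API.
// Under the proposal discussed above, the *private field* alone could
// migrate to a flattened Optional.val without perturbing callers, since
// the API point keeps the L-typed Optional.
class Foo {
    // null-free by construction: a candidate for Optional.val flattening
    private Optional<Instant> deadline = Optional.empty();

    public Optional<Instant> deadline() { return deadline; }

    // Remi's convention, adapted: a null argument means "no deadline"
    public void deadline(Instant d) { this.deadline = Optional.ofNullable(d); }
}

public class MigrationDemo {
    public static void main(String[] args) {
        Foo foo = new Foo();
        System.out.println(foo.deadline().isPresent()); // false
        foo.deadline(Instant.EPOCH);
        System.out.println(foo.deadline().isPresent()); // true
    }
}
```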
From brian.goetz at oracle.com Mon Apr 25 19:56:51 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 25 Apr 2022 19:56:51 +0000 Subject: [External] Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: References: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> <3DB1B18D-96C3-42C5-9ECA-9FB496F21ED8@oracle.com> Message-ID: <6A19015A-3A2A-4C17-AF6B-CA8C6B858FC7@oracle.com> I think this is one of the areas where opinions are going to differ, because there is not necessarily a unitary notion of "the user". In a small program where one person wrote all the code, I agree that minimizing intrusion will make that person happy. But Java's strength is that it makes good *libraries* easy to write, and surely Valhalla will enable new numerical libraries. In which case there are TWO users, the one who wrote the library, and the one using it, and they are communicating through a thin pipe. This is where intent can be lost, and being a "libraries guy", I think this is where Kevin is coming from. On Apr 25, 2022, at 1:53 PM, Dan Heidinga > wrote: If a user has picked B2 From forax at univ-mlv.fr Mon Apr 25 20:13:47 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 25 Apr 2022 22:13:47 +0200 (CEST) Subject: [External] Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: References: Message-ID: <455351122.16665063.1650917627546.JavaMail.zimbra@u-pem.fr> > From: "Kevin Bourrillion" > To: "valhalla-spec-experts" > Sent: Monday, April 25, 2022 4:52:50 AM > Subject: [External] Foo / Foo.ref is a backward default; should be Foo.val / Foo > Hi, > The current plan for `primitive class Foo` -- to call the value type `Foo` and > the reference type `Foo.ref` -- is causing a few problems that I think are > unnecessary. I've felt for a while now that we are favoring the wrong default. > We should let `Foo` be the reference type and require `Foo.val` (precise syntax > aside) for the value type.
> I started to list reasons and came up with more than expected. If ref is the default for B3 then B3 is a worse B2, it's like saying let's transform all long to Long. > 1. The option with fewer hazards should usually be the default. Users won't opt > themselves into extra safety, but they will sometimes opt out of it. Here, the > value type is the one that has attendant risks -- risk of a bad default value, > risk of a bad torn value. We want using `Foo.val` to *feel like* cracking open > the shell of a `Foo` object and using its innards directly. But if it's spelled > as plain `Foo` it won't "feel like" anything at all. Users should use B2 by default, you did agree about that. If users want B3 we should give them B3, asking for B3.val is a kind of double opt-in. > 2. In the current plan a `Foo.ref` should be a well-behaved bucket 2 object. But > it sure looks like that `.ref` is specifically telling it NOT to be -- like > it's saying "no, VM, *don't* optimize this to be a value even if you can!" > That's of course not what we mean. With the change I'm proposing, `Foo.val` > does make sense: it's just saying "hey runtime, while you already *might* have > represented this as a value, now I'm demanding that you *definitely* do". > That's a normal kind of a thing to do. .ref should be rare in the end, it's mostly a stopgap measure because we do not have universal generics. Once we have universal generics, Foo.val makes even less sense. > 3. This change would permit compatible migration of an id-less class to a primitive > class. It's a no-op, and use sites are free to migrate to the value type if and > when ready. And if they already expose the type in their API, they are free to > weigh the costs/benefits of foisting an incompatible change onto *their* users. > They have facilities like method deprecation to do it with. In the current > plan, this all seems impossible; you would have to fix all your problematic > call sites *atomically* with migrating the class.
B3 requires a default value that makes sense and that bypassing the constructor is fine (because you can construct any values by "merging" existing values). Maybe we should disallow users to even write constructors to avoid giving them false hope. Anyway, those constraints mean that you will not be able to refactor most of the existing classes to primitive classes because you are losing encapsulation by doing that. > 4. It's much (much) easier on the mental model because *every (id-less) class > works in the exact same way*. Some just *also* give you something extra, that's > all. This pulls no rugs out from under anyone, which is very very good. No, B2 and B3 are different runtime models, even if B3 is ref by default. The idea of B3 being ref by default is in fact dangerous exactly for the reason you explain, it looks like the two models are the same. The problem is that they are not. > 5. The two kinds of types have always been easily distinguishable to date. The > current plan would change that. But they have important differences > (nullability vs. the default value chief among them) just as Long and long do, > and users will need to distinguish them. For example you can spot the redundant > check easily in `Foo.val foo = ...; / requireNonNull(foo);`. You want a use site way to see if a type is a B3 as opposed to B1 and B2 that are both nullable. It's something that can be discussed separately. > 6. It's very nice when the *new syntax* corresponds directly to the *new thing*. > That is, until a casual developer *sees* `.val` for the first time, they won't > have to worry about it. But it's not true, compare Complex.val c = new Complex(1, 2); and Complex c = new Complex(1, 2); > 7. John seemed to like my last fruit analogy, so consider these two equivalent > fruit stand signs: > a) "for $1, get one apple OR one orange . . . with every orange purchased you > must also take a free apple" > b) "apples $1 . . .
optional free orange with each purchase" > Enough said I think :-) > 8. The predefined primitives would need less magic. `int` simply acts like a > type alias for `Integer.val`, simple as that. This actually shows that the > whole feature will be easier to learn because it works very nearly how people > already know primitives to work. Contrast with: we hack it so that what would > normally be called `Integer` gets called `int` and what normally gets called > `Integer.ref` or maybe `int.ref` gets called `Integer` ... that is much > stranger. int cannot be an alias of Integer.val, it's a little more complex than that, for example int.class.getName().equals("int") but at the same time, we want ArrayList to be ArrayList. > What are the opposing arguments? The runtime models of B2 and B3 are not the same. Defaulting B3 to B3.ref makes things dangerous because users will have trouble seeing how B2 and B3 are different at runtime. Rémi From brian.goetz at oracle.com Mon Apr 25 20:31:07 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 25 Apr 2022 20:31:07 +0000 Subject: [External] Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: <455351122.16665063.1650917627546.JavaMail.zimbra@u-pem.fr> References: <455351122.16665063.1650917627546.JavaMail.zimbra@u-pem.fr> Message-ID: I think this is getting out of hand. Kevin presented a carefully thought-out argument, which I am sure he spent many hours on before sending, about why we might have (yet again) gotten the defaults wrong. I think we owe it to Kevin (and all the Java developers) to think as carefully about his observations as he did before he sent them; blasting back with knee-jerk "but this is the answer" is not helpful. We're not finished designing the user-facing part of this, and we dare not close our minds to the points Kevin has raised, whether or not the eventual answer goes that way.
On Apr 25, 2022, at 4:13 PM, Remi Forax > wrote: ________________________________ From: "Kevin Bourrillion" > To: "valhalla-spec-experts" > Sent: Monday, April 25, 2022 4:52:50 AM Subject: [External] Foo / Foo.ref is a backward default; should be Foo.val / Foo Hi, The current plan for `primitive class Foo` -- to call the value type `Foo` and the reference type `Foo.ref` -- is causing a few problems that I think are unnecessary. I've felt for a while now that we are favoring the wrong default. We should let `Foo` be the reference type and require `Foo.val` (precise syntax aside) for the value type. I started to list reasons and came up with more than expected. If ref is the default for B3 then B3 is a worst B2, it's like saying let's transform all long to Long. 1. The option with fewer hazards should usually be the default. Users won't opt themselves into extra safety, but they will sometimes opt out of it. Here, the value type is the one that has attendant risks -- risk of a bad default value, risk of a bad torn value. We want using `Foo.val` to *feel like* cracking open the shell of a `Foo` object and using its innards directly. But if it's spelled as plain `Foo` it won't "feel like" anything at all. Users should use B2 by default, you did agree about that. if users want B3 we should give them B3, asking for B3.val is a kind of double opt-in. 2. In the current plan a `Foo.ref` should be a well-behaved bucket 2 object. But it sure looks like that `.ref` is specifically telling it NOT to be -- like it's saying "no, VM, *don't* optimize this to be a value even if you can!" That's of course not what we mean. With the change I'm proposing, `Foo.val` does make sense: it's just saying "hey runtime, while you already *might* have represented this as a value, now I'm demanding that you *definitely* do". That's a normal kind of a thing to do. .ref should be rare in the end, it's mostly a stopgap measure because we do not have universal generics. 
Once we have universal generics, Foo.val makes even less sense. 3. This change would permit compatible migration of an id-less class to a primitive class. It's a no-op, and use sites are free to migrate to the value type if and when ready. And if they already expose the type in their API, they are free to weigh the costs/benefits of foisting an incompatible change onto *their* users. They have facilities like method deprecation to do it with. In the current plan, this all seems impossible; you would have to fix all your problematic call sites *atomically* with migrating the class. B3 requires a default value that makes sense, and that bypassing the constructor is fine (because you can construct any values by "merging" existing values). Maybe we should disallow users from even writing constructors, to avoid giving them false hope. Anyway, those constraints mean that you will not be able to refactor most existing classes to primitive classes, because you lose encapsulation by doing that. 4. It's much (much) easier on the mental model because *every (id-less) class works in the exact same way*. Some just *also* give you something extra, that's all. This pulls no rugs out from under anyone, which is very very good. No, B2 and B3 are different runtime models, even if B3 is ref by default. The idea of B3 being ref by default is in fact dangerous exactly for the reason you explain: it looks like the two models are the same. The problem is that they are not. 5. The two kinds of types have always been easily distinguishable to date. The current plan would change that. But they have important differences (nullability vs. the default value chief among them) just as Long and long do, and users will need to distinguish them. For example you can spot the redundant check easily in `Foo.val foo = ...; requireNonNull(foo);`. You want a use-site way to see if a type is a B3, as opposed to B1 and B2, which are both nullable. It's something that can be discussed separately. 6.
It's very nice when the *new syntax* corresponds directly to the *new thing*. That is, until a casual developer *sees* `.val` for the first time, they won't have to worry about it. But it's not true; compare `Complex.val c = new Complex(1, 2);` and `Complex c = new Complex(1, 2);`. 7. John seemed to like my last fruit analogy, so consider these two equivalent fruit stand signs: a) "for $1, get one apple OR one orange . . . with every orange purchased you must also take a free apple" b) "apples $1 . . . optional free orange with each purchase" Enough said I think :-) 8. The predefined primitives would need less magic. `int` simply acts like a type alias for `Integer.val`, simple as that. This actually shows that the whole feature will be easier to learn because it works very nearly how people already know primitives to work. Contrast with: we hack it so that what would normally be called `Integer` gets called `int` and what normally gets called `Integer.ref` or maybe `int.ref` gets called `Integer` ... that is much stranger. int cannot be an alias of Integer.val; it's a little more complex than that. For example, int.class.getName().equals("int"), but at the same time we want ArrayList<int> to be ArrayList<Integer>. What are the opposing arguments? The runtime models of B2 and B3 are not the same. Defaulting B3 to B3.ref makes things dangerous because users will have trouble seeing how B2 and B3 are different at runtime.
Rémi From forax at univ-mlv.fr Mon Apr 25 20:36:45 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Mon, 25 Apr 2022 22:36:45 +0200 (CEST) Subject: [External] Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: <564971DD-9772-4D06-BF1D-514E6112230F@oracle.com> References: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> <3DB1B18D-96C3-42C5-9ECA-9FB496F21ED8@oracle.com> <9D2BCF50-DE06-4C71-9700-2C5FD8AAB4F7@oracle.com> <301627367.16660299.1650915632653.JavaMail.zimbra@u-pem.fr> <564971DD-9772-4D06-BF1D-514E6112230F@oracle.com> Message-ID: <1208897719.16669075.1650919005990.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "Remi Forax" > Cc: "Dan Heidinga" , "Kevin Bourrillion" > , "valhalla-spec-experts" > > Sent: Monday, April 25, 2022 9:54:26 PM > Subject: Re: [External] Foo / Foo.ref is a backward default; should be Foo.val / > Foo >> This seems very specific to Optional, for Optional storing null is always a >> mistake, but that's not true for other VBC; for example, a deadline can be typed >> as an Instant with null meaning no deadline. > No, it is not specific to Optional at all. Many domains exclude null on an > ad-hoc basis. > It is about giving the user the choice to reserve space in the heap for the null > or not. If other logic has already excluded null (which is common), they can > use an Instant.val and get better footprint. If they like the semantics of > having null, they can use Instant. As the maintainer of Instant, you do not want people to be able to misuse your API: if you declare Instant as a B3, then they will see the tearing, so you will declare it as a B2, and the whole point of having B3 be a ref by default is moot. Optional is special not only because null makes no sense but also because its size is small enough to be loaded/stored in one (micro-)instruction.
Rémi From kevinb at google.com Mon Apr 25 20:56:23 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Mon, 25 Apr 2022 16:56:23 -0400 Subject: [External] Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: <455351122.16665063.1650917627546.JavaMail.zimbra@u-pem.fr> References: <455351122.16665063.1650917627546.JavaMail.zimbra@u-pem.fr> Message-ID: On Mon, Apr 25, 2022 at 3:56 PM Brian Goetz wrote: > there are TWO users, the one who wrote the library, and the one using it Yes, precisely. In the general case, these are distinct parties. In the specific case, they'll sometimes be the same. The owner of a non-identity class only gets to choose whether the value type (if B3) exists at all or (if B2) does not. The use site then either (if B3) has a choice of which type to use, or (if B2) has no choice. Many of the responses seem to just be overlooking or misunderstanding this distinction. I'm sorry if I wasn't being clear. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From forax at univ-mlv.fr Mon Apr 25 21:08:51 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 25 Apr 2022 23:08:51 +0200 (CEST) Subject: B3 ref model Message-ID: <1071018906.16673227.1650920931468.JavaMail.zimbra@u-pem.fr> OK, maybe I've not understood correctly how the B3 model works; for me, being a B3 is a runtime property, not a type property. For example, if there is an Object but the VM knows the only possible type is a B3 and the value is not null, then the VM is free to emit several stores, because it's a B3, so tearing can occur. Said differently, B3 allows tearing, so B3.val and B3.ref allow tearing. If I do not want tearing, then the B3 has to be stored in a volatile field, or I have to declare the class as a B2. Did I get it right?
Rémi From daniel.smith at oracle.com Mon Apr 25 23:52:44 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Mon, 25 Apr 2022 23:52:44 +0000 Subject: Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> References: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> Message-ID: <3A62E9B7-2AF0-4E27-857D-7294FC9FA236@oracle.com> On Apr 25, 2022, at 8:05 AM, Brian Goetz > wrote: Let's state the opposing argument up front, because it was our starting point: having to say "Complex.val" for 99% of the utterances of Complex would likely be perceived as "boy those Java guys love their boilerplate" (call this the "lol java" argument for short.) But, since then, our understanding of how this will all actually work has evolved, so it is appropriate to question whether this argument still holds the weight we thought it did at the outset. Yeah, I think this has to be the starting place, before we get into whatever other model simplifications, compatible migrations, etc., might be gained. The expectation for any two-types-for-one-name approach should be, I think, that almost all types referencing the class should use the simple name. The non-default form is for special cases only. So if we're considering an approach in which the reference type is used almost all the time, we need to establish that doing so will not be considered a "bad practice" for performance reasons. Specifically: - Are we confident that flattened L types on the stack have negligible costs compared to Q types? (E.g., is there no significant cost to using extra registers to track and check nulls?) - Are we confident that we can achieve atomic, flattened L types on the heap for common cases? - Are we confident that the performance cliff required to guarantee atomicity for heap-flattened L types is acceptable in general programming settings?
- Are we also confident that the extra null-tracking overhead of flattened L types on the heap is acceptable in most cases, and only needs to be compressed out by performance-tuning experts? If the answer to all of those is "yes", *then* I think there's an argument that the model simplifications, etc., could be worth asking performance-crucial code to sprinkle in some '.val' types. But I'm sure we're not ready to say "yes" to all those yet... A good test for me is this: if we asked everybody to stop saying 'int' all the time, and prefer 'Integer' instead except in performance-critical code, could we effectively convince them to set aside their misgivings about performance and trust the JVM to be reasonably efficient? From daniel.smith at oracle.com Tue Apr 26 00:15:17 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Tue, 26 Apr 2022 00:15:17 +0000 Subject: B3 ref model In-Reply-To: <1071018906.16673227.1650920931468.JavaMail.zimbra@u-pem.fr> References: <1071018906.16673227.1650920931468.JavaMail.zimbra@u-pem.fr> Message-ID: > On Apr 25, 2022, at 3:08 PM, Remi Forax wrote: > > Ok, maybe i've not understood correctly how B3 model works, > for me being a B3 is a runtime property, not a type property. > > By example, if there is an Object but the VM knows the only possible type is a B3 and the value is not null then the VM is free to emit several stores, because it's a B3, so tearing can occur. > > Said differently, B3 allows tearing, so B3.val and B3.ref allow tearing. > > If i do not want tearing, then B3 has to be stored in a field volatile or i have to declare the class as a B2. > > Did i get it right ? The model we've designed is that B3 instances can be represented as *objects* or *primitive values*. Objects enforce atomicity as part of their encapsulation behavior; primitive values do not. Whether something is an object or not is a property of types?ref and val at the language level, L and Q at the JVM level. 
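Dan's distinction above -- objects enforce atomicity as part of encapsulation, primitive values do not -- can be sketched in the proposed Valhalla surface syntax. This is a sketch of the design under discussion, not compilable on any shipping JDK, and the class is invented; both projections are written explicitly to sidestep the default-naming question the thread is debating:

```
primitive class Complex {
    double re, im;
    Complex(double re, double im) { this.re = re; this.im = im; }
}

Complex.val a = new Complex(1, 2); // Q type: flattened primitive value;
                                   // never null; writes may tear under races
Complex.ref b = a;                 // L type: a reference; nullable; loads and
                                   // stores are atomic like any other object
b = null;                          // legal only for the ref projection
```

On Remi's question: in this model, whether tearing is possible follows the type (val vs. ref), not a runtime property of the instance, which is what Dan's reply is saying.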
From kevinb at google.com Tue Apr 26 02:20:37 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Mon, 25 Apr 2022 22:20:37 -0400 Subject: Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: <3A62E9B7-2AF0-4E27-857D-7294FC9FA236@oracle.com> References: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> <3A62E9B7-2AF0-4E27-857D-7294FC9FA236@oracle.com> Message-ID: On Mon, Apr 25, 2022 at 7:52 PM Dan Smith wrote: Yeah, I think this has to be the starting place, before we get into > whatever other model simplifications, compatible migrations, etc., might be > gained. > > The expectation for any two-types-for-one-name approach should be, I > think, that almost all types referencing the class should use the simple > name. The non-default form is for special cases only. > Whose expectation is that -- do you mean it will be what users expect? Because they might, but that's not the same as good design. The primary consideration in choosing which behavior should be the default behavior is the problem of false acceptance. i.e., If a user who did not really want this default behavior sleepwalked into it anyway, * When does that problem get discovered? Sooner, or later? Perhaps _too_ late? * Who discovers it? The same party, or someone else? * How much damage has been caused in the interim? Is any of it irreversible? These usually point pretty clearly to one of the options. But if not, the second consideration is readability: which option benefits the reader more from being called out _explicitly_? Only after that I'd move on to "which option do more users want more of the time". Sorry, that was longwinded (and possibly misplaced). So if we're considering an approach in which the reference type is used > almost all the time, we need to establish that doing so will not be > considered a "bad practice" for performance reasons. Specifically: > I don't see why this is. 
If there's bad performance, the users have the freedom to help themselves to the better performance any time they want to, for the minor cost of a little "sprinkling". That sounds like Valhalla success to me. Isn't it? A good test for me is this: if we asked everybody to stop saying 'int' all > the time, and prefer 'Integer' instead except in performance-critical code, > could we effectively convince them to set aside their misgivings about > performance and trust the JVM to be reasonably efficient? > Well, the thing forcing our hand in our case is the need to work within the limitations of a language with 28 years of expectations already rooted in brains. That is what says we need Foo/Foo.val. So your thought experiment is sort of a false equivalence; the same argument for Foo/Foo.val also says "but keep using int, not Integer". -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From kevinb at google.com Tue Apr 26 03:12:54 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Mon, 25 Apr 2022 23:12:54 -0400 Subject: We need help to migrate from bucket 1 to 2; and, the == problem Message-ID: So I want to make my class identityless. But -- whoops! -- I released it years ago and it has lots of usages. And though I've labeled it as "value-based", any number of these callers are depending on its identity in some way or other. I'd like to put -- let's say an annotation on my class, like `@_FutureNonIdentityClass` or whatever, with the following effects: * I get a warning if I'm breaking any of the rules of identityless classes, like if I have a non-final field. * Use sites get a warning if they do _anything_ identity-dependent with it (==, identity hc, synchronization, ...?) This would leave me in a good position to add the real identity-forsaking keyword later (at which time the annotation becomes redundant and should cause a warning until it's removed). 
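A sketch of what Kevin's marker could look like as an ordinary annotation. All names here are invented (Kevin's `@_FutureNonIdentityClass` is a placeholder too), and the actual warnings would need javac or Error Prone support that this sketch does not provide:

```java
import java.lang.annotation.*;

// Hypothetical marker: this class intends to give up identity later, so a
// checker should warn on identity-dependent uses at use sites (==,
// System.identityHashCode, synchronized) and on rule violations here
// (e.g. non-final fields). The checker itself is not shown.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface FutureNonIdentityClass {}

// A compliant candidate: final class, all fields final, state-based equals.
@FutureNonIdentityClass
final class Meters {
    private final long value;
    Meters(long value) { this.value = value; }
    long value() { return value; }
    @Override public boolean equals(Object o) {
        return o instanceof Meters m && m.value == value;
    }
    @Override public int hashCode() { return Long.hashCode(value); }
}

public class AnnotationDemo {
    public static void main(String[] args) {
        // A tool could discover annotated classes reflectively and audit uses.
        System.out.println(
            Meters.class.isAnnotationPresent(FutureNonIdentityClass.class));
    }
}
```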
We can address all this in Error Prone, but I'm not sure it should be left to that, partly because a bunch of JDK value-based types need this same treatment themselves (apparently only the synchronization warning has been rolled out so far?). Could we get this supported in javac itself? The best thing would be to roll it out in an even earlier release than bucket 2 types themselves... the sooner the better (maybe we could help?). I think the annotation could be relegated to some one-off module so it doesn't pollute the beautiful jdk.base forever. ~~~ One of the things this means is that people should stop using `==` on these value-based classes. And that is really really good, because what we are planning to do to `==` is... really really bad. Don't misread me: if compatibility is sacrosanct then it is probably the least-bad thing we can do! But honestly, it's bad, because it's not a behavior that anyone ever *actually wants* -- unless they just happen to have no fields of reference types at all. But the fact that it does work in that case just makes the whole thing worse, because code like that will be a ticking time bomb waiting to do the wrong thing as soon as one reference-type field is added at any nested level below that point. What if we give users support for their migration path, so there *are no* usages of `==` that need to remain compatible for these types? Then we could make `==` *not do anything* at all for bucket-2 classes. This approach could save us from a lot of pain (longstanding pain and new pain) for int and Integer and friends too. I think Java's historical priority of "compatibility at all costs" has been something of an illusion; it still leaves us high and dry when *we* want to adopt new features as *we* end up having to make incompatible changes to do it. But if we always gave proper support to users' migration scenarios then we wouldn't always *need* the absolute compatibility at the language level. 
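The `==`-vs-`equals` divergence described above is already observable with records, which get state-based `equals` but keep identity `==` (`Money` is an invented example type):

```java
// Invented example: a record gets state-based equals/hashCode for free,
// but == still compares object identity.
record Money(long cents) {}

public class EqualityPitfall {
    public static void main(String[] args) {
        Money a = new Money(100);
        Money b = new Money(100);
        System.out.println(a.equals(b)); // true  -- same state
        System.out.println(a == b);      // false -- distinct instances
    }
}
```

Under the plan Kevin is criticizing, `==` on a bucket-2 class would instead compare fields, with reference-typed fields compared by reference -- which is exactly the "works until a reference field appears" hazard described above.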
~~~ (Just to try to move the old overton window, what I really think we should do is go further and deprecate `==` entirely, introducing `System.identityEquals(a, b)` (or maybe `===`) which would only work for identity types. Then in time `==` could be reintroduced as a synonym for `Object.equals()` and everyone would be happy and write shiny bug-free programs.... I know this would be a large deal. Sometime I will have to write at length about just how bad the problem of identity-equality overuse/abuse has been.) -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From forax at univ-mlv.fr Tue Apr 26 08:20:51 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Tue, 26 Apr 2022 10:20:51 +0200 (CEST) Subject: B3 ref model In-Reply-To: References: <1071018906.16673227.1650920931468.JavaMail.zimbra@u-pem.fr> Message-ID: <949882043.16817087.1650961251284.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "daniel smith" > To: "Remi Forax" > Cc: "valhalla-spec-experts" > Sent: Tuesday, April 26, 2022 2:15:17 AM > Subject: Re: B3 ref model >> On Apr 25, 2022, at 3:08 PM, Remi Forax wrote: >> >> Ok, maybe i've not understood correctly how B3 model works, >> for me being a B3 is a runtime property, not a type property. >> >> By example, if there is an Object but the VM knows the only possible type is a >> B3 and the value is not null then the VM is free to emit several stores, >> because it's a B3, so tearing can occur. >> >> Said differently, B3 allows tearing, so B3.val and B3.ref allow tearing. >> >> If i do not want tearing, then B3 has to be stored in a field volatile or i have >> to declare the class as a B2. >> >> Did i get it right ? > > The model we've designed is that B3 instances can be represented as *objects* or > *primitive values*. Objects enforce atomicity as part of their encapsulation > behavior; primitive values do not. 
Whether something is an object or not is a > property of types -- ref and val at the language level, L and Q at the JVM level. So if we have

primitive class Prim {
    long value;
}

class Container {
    LPrim; prim;
}

and Prim has been loaded before Container is seen by the VM, the VM cannot decide to flatten LPrim; to a long + a bit for nullability, because the VM has to ensure atomicity even if the user has declared Prim as a primitive class. I really dislike these semantics; for me, we are re-creating the kind of trouble we have with escape analysis, but for field storage. And how is the Preload attribute supposed to work in that context, given that it declares that an L-type is in fact a Q-type? Rémi From brian.goetz at oracle.com Tue Apr 26 14:09:04 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 26 Apr 2022 14:09:04 +0000 Subject: We need help to migrate from bucket 1 to 2; and, the == problem In-Reply-To: References: Message-ID: <38DF0B35-3F89-484F-8A35-FF2F5924859C@oracle.com> How much of this is already covered by https://openjdk.java.net/jeps/390 ? On Apr 25, 2022, at 11:12 PM, Kevin Bourrillion > wrote: So I want to make my class identityless. But -- whoops! -- I released it years ago and it has lots of usages. And though I've labeled it as "value-based", any number of these callers are depending on its identity in some way or other. I'd like to put -- let's say an annotation on my class, like `@_FutureNonIdentityClass` or whatever, with the following effects: * I get a warning if I'm breaking any of the rules of identityless classes, like if I have a non-final field. * Use sites get a warning if they do _anything_ identity-dependent with it (==, identity hc, synchronization, ...?) This would leave me in a good position to add the real identity-forsaking keyword later (at which time the annotation becomes redundant and should cause a warning until it's removed).
We can address all this in Error Prone, but I'm not sure it should be left to that, partly because a bunch of JDK value-based types need this same treatment themselves (apparently only the synchronization warning has been rolled out so far?). Could we get this supported in javac itself? The best thing would be to roll it out in an even earlier release than bucket 2 types themselves... the sooner the better (maybe we could help?). I think the annotation could be relegated to some one-off module so it doesn't pollute the beautiful jdk.base forever. ~~~ One of the things this means is that people should stop using `==` on these value-based classes. And that is really really good, because what we are planning to do to `==` is... really really bad. Don't misread me: if compatibility is sacrosanct then it is probably the least-bad thing we can do! But honestly, it's bad, because it's not a behavior that anyone ever *actually wants* -- unless they just happen to have no fields of reference types at all. But the fact that it does work in that case just makes the whole thing worse, because code like that will be a ticking time bomb waiting to do the wrong thing as soon as one reference-type field is added at any nested level below that point. What if we give users support for their migration path, so there *are no* usages of `==` that need to remain compatible for these types? Then we could make `==` *not do anything* at all for bucket-2 classes. This approach could save us from a lot of pain (longstanding pain and new pain) for int and Integer and friends too. I think Java's historical priority of "compatibility at all costs" has been something of an illusion; it still leaves us high and dry when *we* want to adopt new features as *we* end up having to make incompatible changes to do it. But if we always gave proper support to users' migration scenarios then we wouldn't always *need* the absolute compatibility at the language level. 
~~~ (Just to try to move the old overton window, what I really think we should do is go further and deprecate `==` entirely, introducing `System.identityEquals(a, b)` (or maybe `===`) which would only work for identity types. Then in time `==` could be reintroduced as a synonym for `Object.equals()` and everyone would be happy and write shiny bug-free programs.... I know this would be a large deal. Sometime I will have to write at length about just how bad the problem of identity-equality overuse/abuse has been.) -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Tue Apr 26 14:12:43 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 26 Apr 2022 14:12:43 +0000 Subject: B3 ref model In-Reply-To: <949882043.16817087.1650961251284.JavaMail.zimbra@u-pem.fr> References: <1071018906.16673227.1650920931468.JavaMail.zimbra@u-pem.fr> <949882043.16817087.1650961251284.JavaMail.zimbra@u-pem.fr> Message-ID: > so if we have > > primitive class Prim { > long value; > } > > class Container { > LPrim; prim; > } > > and Prim has been loaded before Container is seen by the VM, the VM can not decide to flatten LPrim; to a long + a bit for nullability because the VM has to ensure atomicity even if the user has declared Prim as a primitive class. Of course, we don't put L or Q in the source code, but we put .val or .ref which translates appropriately. If you have a field of type X.ref, the compiler should put X in the preload attribute. If this doesn't happen, then yes, there will be no flattening. If it does happen, there may be flattening. > And how the Preload attribute is supposed to work in that context, given that it declares that a L-type is in fact a Q-type ? Preload contains names of classes, not types. At the time Container is compiled, the compiler will find Prim, and will look at whether it is a value type or not, and maybe put something in Preload.
Of course the class could migrate between compile and runtime, in which case we'll lose some flattening we might have had with a consistent compilation. Not a big deal. From kevinb at google.com Tue Apr 26 14:22:38 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Tue, 26 Apr 2022 10:22:38 -0400 Subject: We need help to migrate from bucket 1 to 2; and, the == problem In-Reply-To: <38DF0B35-3F89-484F-8A35-FF2F5924859C@oracle.com> References: <38DF0B35-3F89-484F-8A35-FF2F5924859C@oracle.com> Message-ID: It's a great start, but the key difference is that we need to be able to apply this process to *our own* types, not just the JDK types. Really, we should see whatever we need to do for JDK types as a clue to what other library owners will need as well. Understanding this now, I hope you'll reread the proposal? (Secondarily... why are we warning only on synchronization, and not on `==` or (marginal) `identityHC`?) On Tue, Apr 26, 2022 at 10:09 AM Brian Goetz wrote: > How much of this is already covered by https://openjdk.java.net/jeps/390 ? > > On Apr 25, 2022, at 11:12 PM, Kevin Bourrillion wrote: > > So I want to make my class identityless. But -- whoops! -- I released it > years ago and it has lots of usages. And though I've labeled it as > "value-based", any number of these callers are depending on its identity in > some way or other. > > I'd like to put -- let's say an annotation on my class, like > `@_FutureNonIdentityClass` or whatever, with the following effects: > > * I get a warning if I'm breaking any of the rules of > identityless classes, like if I have a non-final field. > * Use sites get a warning if they do _anything_ identity-dependent with it > (==, identity hc, synchronization, ...?) > > This would leave me in a good position to add the real identity-forsaking > keyword later (at which time the annotation becomes redundant and should > cause a warning until it's removed).
> > We can address all this in Error Prone, but I'm not sure it should be left > to that, partly because a bunch of JDK value-based types need this same > treatment themselves (apparently only the synchronization warning has been > rolled out so far?). > > Could we get this supported in javac itself? The best thing would be to > roll it out in an even earlier release than bucket 2 types themselves... > the sooner the better (maybe we could help?). > > I think the annotation could be relegated to some one-off module so it > doesn't pollute the beautiful jdk.base forever. > > ~~~ > > One of the things this means is that people should stop using `==` on > these value-based classes. > > And that is really really good, because what we are planning to do to `==` > is... really really bad. Don't misread me: if compatibility is sacrosanct > then it is probably the least-bad thing we can do! But honestly, it's bad, > because it's not a behavior that anyone ever *actually wants* -- unless > they just happen to have no fields of reference types at all. But the fact > that it does work in that case just makes the whole thing worse, because > code like that will be a ticking time bomb waiting to do the wrong thing as > soon as one reference-type field is added at any nested level below that > point. > > What if we give users support for their migration path, so there *are no* > usages of `==` that need to remain compatible for these types? Then we > could make `==` *not do anything* at all for bucket-2 classes. > > This approach could save us from a lot of pain (longstanding pain and new > pain) for int and Integer and friends too. > > I think Java's historical priority of "compatibility at all costs" has > been something of an illusion; it still leaves us high and dry when *we* > want to adopt new features as *we* end up having to make incompatible > changes to do it. 
But if we always gave proper support to users' migration > scenarios then we wouldn't always *need* the absolute compatibility at the > language level. > > ~~~ > > (Just to try to move the old overton window, what I really think we should > do is go further and deprecate `==` entirely, introducing > `System.identityEquals(a, b)` (or maybe `===`) which would only work for > identity types. Then in time `==` could be reintroduced as a synonym for > `Object.equals()` and everyone would be happy and write shiny bug-free > programs.... I know this would be a large deal. Sometime I will have to > write at length about just how bad the problem of identity-equality > overuse/abuse has been.) > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > > > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From daniel.smith at oracle.com Tue Apr 26 14:31:02 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Tue, 26 Apr 2022 14:31:02 +0000 Subject: Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: References: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> <3A62E9B7-2AF0-4E27-857D-7294FC9FA236@oracle.com> Message-ID: <514F5431-0BBD-434F-A78B-10A48E5A782D@oracle.com> On Apr 25, 2022, at 8:20 PM, Kevin Bourrillion > wrote: On Mon, Apr 25, 2022 at 7:52 PM Dan Smith > wrote: Yeah, I think this has to be the starting place, before we get into whatever other model simplifications, compatible migrations, etc., might be gained. The expectation for any two-types-for-one-name approach should be, I think, that almost all types referencing the class should use the simple name. The non-default form is for special cases only. Whose expectation is that -- do you mean it will be what users expect? Because they might, but that's not the same as good design. It's how I interpret our requirements, I guess? 
The vision of B3 is "user-defined primitives": that someone can define in a library a type that can be used interchangeably with the existing built-in primitive types. (We can debate whether "primitive" is the right word here, but the concept persists under whatever naming scheme.) If the expectation is that a typical programmer is going to look over their menu of types and choose between 'int', 'long', or 'Integer128.val', I think we've heavily biased them against the third one. The syntactic overhead is just too much. Whereas if we're saying "just use plain reference type 'Integer128', it'll usually be fine", that's probably something we can sell (if we can deliver on "usually fine"), even though the menu will be more like 'Integer', 'Long', and 'Integer128'. So if we're considering an approach in which the reference type is used almost all the time, we need to establish that doing so will not be considered a "bad practice" for performance reasons. Specifically: I don't see why this is. If there's bad performance, the users have the freedom to help themselves to the better performance any time they want to, for the minor cost of a little "sprinkling". That sounds like Valhalla success to me. Isn't it? I think our success will come from widespread high-performance use of these classes. Like how 'int' works. If the L types are not "high-performance" (a subjective measure, I know), and the Q types are a pain to use, I worry that won't be perceived as successful. (Either "Valhalla is a pain to use" or "Valhalla rarely delivers the promised performance".) A good test for me is this: if we asked everybody to stop saying 'int' all the time, and prefer 'Integer' instead except in performance-critical code, could we effectively convince them to set aside their misgivings about performance and trust the JVM to be reasonably efficient?
Well, the thing forcing our hand in our case is the need to work within the limitations of a language with 28 years of expectations already rooted in brains. I'm thinking about this test more from a clean slate perspective, I think: rephrased, in a new language (something like Kotlin, say), could we leave out 'int', and convince people to do everything with 'Integer', or in performance-sensitive cases say 'Integer.val'? Would that language be perceived as worse (on either performance or syntactic grounds) than Java? From brian.goetz at oracle.com Tue Apr 26 14:40:50 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 26 Apr 2022 14:40:50 +0000 Subject: Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: <514F5431-0BBD-434F-A78B-10A48E5A782D@oracle.com> References: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> <3A62E9B7-2AF0-4E27-857D-7294FC9FA236@oracle.com> <514F5431-0BBD-434F-A78B-10A48E5A782D@oracle.com> Message-ID: > It's how I interpret our requirements, I guess? > > The vision of B3 is "user-defined primitives": that someone can define in a library a type that can be used interchangeably with the existing built-in primitive types. (We can debate whether "primitive" is the right word here, but the concept persists under whatever naming scheme.) > > If the expectation is that a typical programmer is going to look over their menu of types and choose between 'int', 'long', or 'Integer128.val', I think we've heavily biased them against the third one. The syntactic overhead is just too much. s/is/may be/ I've gone back and forth on this a few times as our understanding of the reality gets refined, and surely there's room for multiple opinions. There are clearly useful choices to be had at both the declaration and use site. If a class has no sensible default, then it's a B2, and there's nothing more to be done; if it does, then *maybe* it can be a B3. So this choice -- about zero defaults -- has to be made by the class author.
If the class author says that zero is a sensible default, the client has choices at the use site, such as nullity. Do they want to represent only dates, or the possibility of no date being present? While clearly the latter choice depends on the former (you can't choose .val for a B2), we want to empower both users -- the class writer and the class user -- to make these choices for themselves. > Whereas if we're saying "just use plain reference type 'Integer128', it'll usually be fine", that's probably something we can sell (if we can deliver on "usually fine"), even though the menu will be more like 'Integer', 'Long', and 'Integer128'. Fine is relative. For mainstream users, the .ref type probably is fine most of the time; it's better than B1 would be, and B1 is often fine now. Where the real money is, is when we have big arrays of flattenable types, because both the footprint and any accidental indirections / atomicity costs are multiplied. I guess I'd like to defer judgment on this bit of low-level syntax for a bit, while we let Kevin's points sink in, and more importantly, figure out the role of the various opt-ins (id-free, null-free, tear-free) in the story. From kevinb at google.com Tue Apr 26 14:45:50 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Tue, 26 Apr 2022 10:45:50 -0400 Subject: Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: <514F5431-0BBD-434F-A78B-10A48E5A782D@oracle.com> References: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> <3A62E9B7-2AF0-4E27-857D-7294FC9FA236@oracle.com> <514F5431-0BBD-434F-A78B-10A48E5A782D@oracle.com> Message-ID: On Tue, Apr 26, 2022 at 10:31 AM Dan Smith wrote: If the expectation is that a typical programmer is going to look over their > menu of types and choose between 'int', 'long', or 'Integer128.val', I > think we've heavily biased them against the third one. The syntactic > overhead is just too much.
> But this bias affects only the specific slice of use cases that both (a) *should* use `Integer128.val` but (b) can get by fine with using `int`. I think that slice is probably marginal. And a tool could come by and fix up the code to squeeze more performance out of it later. I think our success will come from widespread high-performance use of these > classes. Like how 'int' works. If the L types are not "high-performance" (a > subjective measure, I know), and the Q types are pain to use, I worry that > won't be perceived as successful. (Either "Valhalla is a pain to use" or > "Valhalla rarely delivers the promised performance".) > As long as we are distinguishing the "perception issue" from the reality and weighting the perception issue appropriately -- which is not zero, for sure. I'm thinking about this test more from a clean slate perspective, I think: > rephrased, in a new language (something like Kotlin, say), could we leave > out 'int', and convince people to do everything with 'Integer', or in > performance-sensitive cases say 'Integer.val'? Would that language be > perceived as worse (on either performance or syntactic grounds) than Java? > I think I would insist that `.val` be spelled with only one additional character... or even that the value type be generated as the snake_case form of the name! -- Kevin Bourrillion | Java Librarian | Google, Inc.
| kevinb at google.com From forax at univ-mlv.fr Tue Apr 26 15:19:01 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Tue, 26 Apr 2022 17:19:01 +0200 (CEST) Subject: B3 ref model In-Reply-To: References: <1071018906.16673227.1650920931468.JavaMail.zimbra@u-pem.fr> <949882043.16817087.1650961251284.JavaMail.zimbra@u-pem.fr> Message-ID: <440578227.17116422.1650986341177.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "Brian Goetz" > To: "Remi Forax" > Cc: "daniel smith" , "valhalla-spec-experts" > Sent: Tuesday, April 26, 2022 4:12:43 PM > Subject: Re: B3 ref model >> so if we have >> >> primitive class Prim { >> long value; >> } >> >> class Container { >> LPrim; prim; >> } >> >> and Prim has been loaded before Container is seen by the VM, the VM cannot >> decide to flatten LPrim; to a long + a bit for nullability because the VM has >> to ensure atomicity even if the user has declared Prim as a primitive class. > Of course, we don't put L or Q in the source code, but we put .val or .ref which > translates appropriately. If you have a field of type X.ref, the compiler > should put X in the preload attribute. If this doesn't happen, then yes, there > will be no flattening. If it does happen, there may be flattening. But the VM has already loaded Prim, so it knows that it's a B3. The preload attribute is there to pre-load the class if the VM does not know whether it's a value type or not. It seems you want to restrict the VM from taking local decisions independently of the classes already loaded? For me, L-type means: if you do not already know, you will discover later whether it's a B1/B2/B3 when the class is loaded. The preload attribute means: if you do not already know, you should load the class now (at least when you want to take a decision based on the class being a B1/B2/B3 or not).
An L-type does not mean it's a pointer and will always be a pointer, because if a user has chosen a class to be a B3, the VM should do whatever is possible to flatten it, even if the declared type is an L-type. For example, if you have code like this record Holder(Object o) { } primitive record Complex(double re, double im) { } Holder holder = new Holder(new Complex(2., 3.)); ... // more code The VM may find that "holder" escapes but at the same time that Complex is a B3; in that case the instance of Holder is allocated on the heap but the instance of Complex can be flattened inside the heap representation of Holder. In this example, there is no QComplex; in the bytecode of Holder, there is no Preload attribute either; the VM knowing that Complex is a B3 is enough. And I'm not suggesting that a particular VM should do that, just that a VM should be able to do that. Rémi From forax at univ-mlv.fr Tue Apr 26 15:28:34 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 26 Apr 2022 17:28:34 +0200 (CEST) Subject: We need help to migrate from bucket 1 to 2; and, the == problem In-Reply-To: References: Message-ID: <2099336126.17122176.1650986914963.JavaMail.zimbra@u-pem.fr> > From: "Kevin Bourrillion" > To: "valhalla-spec-experts" > Sent: Tuesday, April 26, 2022 5:12:54 AM > Subject: We need help to migrate from bucket 1 to 2; and, the == problem > So I want to make my class identityless. But -- whoops! -- I released it years > ago and it has lots of usages. And though I've labeled it as "value-based", any > number of these callers are depending on its identity in some way or other. > I'd like to put -- let's say an annotation on my class, like > `@_FutureNonIdentityClass` or whatever, with the following effects: > * I get a warning if I'm breaking any of the rules of identityless classes, like > if I have a non-final field. > * Use sites get a warning if they do _anything_ identity-dependent with it (==, > identity hc, synchronization, ...?)
> This would leave me in a good position to add the real identity-forsaking > keyword later (at which time the annotation becomes redundant and should cause > a warning until it's removed). > We can address all this in Error Prone, but I'm not sure it should be left to > that, partly because a bunch of JDK value-based types need this same treatment > themselves (apparently only the synchronization warning has been rolled out so > far?). > Could we get this supported in javac itself? The best thing would be to roll it > out in an even earlier release than bucket 2 types themselves... the sooner the > better (maybe we could help?). > I think the annotation could be relegated to some one-off module so it doesn't > pollute the beautiful jdk.base forever. > ~~~ > One of the things this means is that people should stop using `==` on these > value-based classes. > And that is really really good, because what we are planning to do to `==` is... > really really bad. Don't misread me: if compatibility is sacrosanct then it is > probably the least-bad thing we can do! But honestly, it's bad, because it's > not a behavior that anyone ever *actually wants* -- unless they just happen to > have no fields of reference types at all. But the fact that it does work in > that case just makes the whole thing worse, because code like that will be a > ticking time bomb waiting to do the wrong thing as soon as one reference-type > field is added at any nested level below that point. > What if we give users support for their migration path, so there *are no* usages > of `==` that need to remain compatible for these types? Then we could make `==` > *not do anything* at all for bucket-2 classes. > This approach could save us from a lot of pain (longstanding pain and new pain) > for int and Integer and friends too. 
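[The hazard described above -- callers silently depending on identity -- is easy to reproduce in today's Java. In the sketch below, `sameInstance` is a hypothetical helper, standing in for any utility (an assertion library, a cache, an interner) that compares `Object` arguments with `==`; no inspection of the call site reveals the identity dependence:]

```java
public class HiddenIdentityDemo {
    // Hypothetical helper: any Object-typed utility that uses == internally.
    // Call sites never see the reference comparison, so nothing flags them.
    static boolean sameInstance(Object expected, Object actual) {
        return expected == actual;
    }

    public static void main(String[] args) {
        // Two logically-equal instances; 'new' guarantees distinct identities.
        String a = new String("payload");
        String b = new String("payload");

        System.out.println(a.equals(b));        // true: logically equal
        System.out.println(sameInstance(a, b)); // false: identity-dependent
        System.out.println(sameInstance(a, a)); // true: same reference
    }
}
```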
> I think Java's historical priority of "compatibility at all costs" has been > something of an illusion; it still leaves us high and dry when *we* want to > adopt new features as *we* end up having to make incompatible changes to do it. > But if we always gave proper support to users' migration scenarios then we > wouldn't always *need* the absolute compatibility at the language level. From my experience, there are uses of == that are buried deeply, masked as innocuous == on Object. A simple example is JUnit's assertSame(): it is defined on Object but will perform == on B2 instances if the arguments are B2. Any non-interprocedural analysis will miss them. Perhaps an agent that rewrites all acmp instructions to use invokedynamic would be able to trap those runtime calls to == and emit a warning with the corresponding stack trace. Rémi From daniel.smith at oracle.com Tue Apr 26 17:56:35 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Tue, 26 Apr 2022 17:56:35 +0000 Subject: Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: References: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> <3A62E9B7-2AF0-4E27-857D-7294FC9FA236@oracle.com> <514F5431-0BBD-434F-A78B-10A48E5A782D@oracle.com> Message-ID: On Apr 26, 2022, at 8:45 AM, Kevin Bourrillion > wrote: I think I would insist that `.val` be spelled with only one additional character... or even that the value type be generated as the snake_case form of the name! Okay, this is a meaningful refinement that I find less objectionable. If it's '#Integer128' or 'Integer128!' instead of 'Integer128.val', we've trimmed away a chunk of the typing/reading overhead (though it's still there, and I think some of the overhead is directly in service of what you find attractive -- the idea that the value type is something unnatural/a departure from the norm). If it's 'integer128' and 'Integer128', well now there is no "default" type, and I think we're talking about something categorically different. There are some new (surmountable?)
problems, but my earlier objections don't apply. From daniel.smith at oracle.com Tue Apr 26 18:25:42 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Tue, 26 Apr 2022 18:25:42 +0000 Subject: B3 ref model In-Reply-To: <440578227.17116422.1650986341177.JavaMail.zimbra@u-pem.fr> References: <1071018906.16673227.1650920931468.JavaMail.zimbra@u-pem.fr> <949882043.16817087.1650961251284.JavaMail.zimbra@u-pem.fr> <440578227.17116422.1650986341177.JavaMail.zimbra@u-pem.fr> Message-ID: On Apr 26, 2022, at 9:19 AM, forax at univ-mlv.fr wrote: For me, L-type means: if you do not already know, you will discover later whether it's a B1/B2/B3 when the class is loaded. The preload attribute means: if you do not already know, you should load the class now (at least when you want to take a decision based on the class being a B1/B2/B3 or not). Yes, this is right. Where you're going wrong (or at least in a different direction than the plan of record) is in the expectation that it should matter whether the class is a B2 or a B3. If you look at JEP 401, you'll see that 'ACC_PRIMITIVE' just means "I'm a value class that also supports Q types." An L-type does not mean it's a pointer and will always be a pointer, because if a user has chosen a class to be a B3, the VM should do whatever is possible to flatten it, even if the declared type is an L-type. L types for both B2 and B3 classes may be flattened; in both cases, there's a requirement that atomicity be preserved. In the plan of record model, this is not by fiat, but a consequence of the fact that an L type is a reference type, and reference types come with traditional expectations about integrity.
From daniel.smith at oracle.com Tue Apr 26 18:53:13 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Tue, 26 Apr 2022 18:53:13 +0000 Subject: We need help to migrate from bucket 1 to 2; and, the == problem In-Reply-To: References: <38DF0B35-3F89-484F-8A35-FF2F5924859C@oracle.com> Message-ID: <034E48A2-8AB2-4156-A30C-F6F79F8CABC3@oracle.com> On Apr 26, 2022, at 8:22 AM, Kevin Bourrillion > wrote: It's a great start, but the key difference is that we need to be able to apply this process to *our own* types, not just the JDK types. Really, we should see whatever we need to do for JDK types as a clue to what other library owners will need as well. Yes, a public annotation was the original proposal. At some point we scaled that back to just JDK-internal. The discussions were a long time ago, but if I remember right the main concern was that a formalized, Java SE notion of "value-based class" would lead to some unwanted complexity when we eventually get to *real* value classes (e.g., a misguided CS 101 course question: "what's the difference between a value-based class and a value class? which one should you use?"). It seemed like producing some special warnings for JDK classes would address the bulk of the problem without needing to fall into this trap. Would an acceptable compromise be for a third-party tool to support its own annotations, while also recognizing @jdk.internal.ValueBased as an alternative spelling of the same thing? (Secondarily... why are we warning only on synchronization, and not on `==` or (marginal) `identityHC`?) I think this was simply not a battle that we wanted to fight -- discouraging all uses of '==' on type Integer, for example.
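[The reason '==' on Integer is so treacherous is the boxing cache: the JLS only guarantees interning of boxed values in -128..127, so '==' appears to work in small tests and then silently fails for larger values. A runnable illustration (the cache's upper bound is implementation-configurable, so the out-of-range comparison is deliberately not asserted either way):]

```java
public class IntegerCacheDemo {
    public static void main(String[] args) {
        Integer a = 127, b = 127;   // within the guaranteed cache: same instance
        Integer c = 1000, d = 1000; // outside the guaranteed range: typically distinct

        System.out.println(a == b);      // true: guaranteed by the -128..127 cache
        System.out.println(c.equals(d)); // true: always the comparison you want
        System.out.println(c == d);      // usually false, but not guaranteed either way
    }
}
```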
We spent some time trying to figure out what to say about '==', and came up with this: "the class does not provide any instance creation mechanism that promises a unique identity on each method call—in particular, any factory method's contract must allow for the possibility that if two independently-produced instances are equal according to equals(), they may also be equal according to ==;" and "When two instances of a value-based class are equal (according to `equals`), a program should not attempt to distinguish between their identities, whether directly via reference equality or indirectly via an appeal to synchronization, identity hashing, serialization, or any other identity-sensitive mechanism." (See https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/doc-files/ValueBased.html) Within these constraints, there are reasonable things that can be done with '==', like optimizing for a situation where 'equals' is likely to be true. (I'm sympathetic to "don't do that anyway!", but it's more of a convention thing that javac would tend not to get involved with.) From daniel.smith at oracle.com Tue Apr 26 19:21:33 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Tue, 26 Apr 2022 19:21:33 +0000 Subject: Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: References: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> <3A62E9B7-2AF0-4E27-857D-7294FC9FA236@oracle.com> <514F5431-0BBD-434F-A78B-10A48E5A782D@oracle.com> Message-ID: <01AED389-566B-4978-BC80-A925C50801BC@oracle.com> On Apr 26, 2022, at 12:37 PM, Dan Heidinga > wrote: The question again is what's the primary reason(s) for exposing a B3 (.val) vs B2 instance in APIs? What guidance would we give API designers around the use of B3 .val instances? So one piece of guidance we could give is: "always use value types unless you have a good reason not to."
If those semantics are acceptable, we're giving the JVM the best information we have to maximize possible performance gains (both today and in future JVMs). Exactly what JVMs do with the information can be a black box. Alternatively, we can recommend "always use reference types unless you're sure there's a performance need for .val" (which you've nicely expanded into a more detailed set of rules). The nature of those rules depends on the answers to my list of performance questions:
- Are we confident that flattened L types on the stack have negligible costs compared to Q types? (E.g., is there no significant cost to using extra registers to track and check nulls?)
- Are we confident that we can achieve atomic, flattened L types on the heap for common cases?
- Are we confident that the performance cliff required to guarantee atomicity for heap-flattened L types is acceptable in general programming settings?
- Are we also confident that the extra null-tracking overhead of flattened L types on the heap is acceptable in most cases, and only needs to be compressed out by performance-tuning experts?
The goal of these questions is to ensure that "there's a performance need for .val" is a corner case. In going down this path, we've opened the box and tied the guidance to properties of current/near-term implementations. So in addition to needing to validate these expectations, we'd want to be confident that the guidance won't look silly in 10 years as implementations change. A risk in either case is that people disagree about how to interpret the guidance, and then you have mismatches between component boundaries, leading to unnecessary problems like expensive heap allocations, noisy null warnings, or incompatible data structures.
(Syntactically, I've been assuming that the "good name" would align with this guidance, going to the type we'd recommend using in most cases, but it's definitely possible to discourage general use of the good name, or not provide a "good name" at all.) From brian.goetz at oracle.com Tue Apr 26 20:49:28 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 26 Apr 2022 20:49:28 +0000 Subject: [External] : Re: Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: References: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> <3A62E9B7-2AF0-4E27-857D-7294FC9FA236@oracle.com> <514F5431-0BBD-434F-A78B-10A48E5A782D@oracle.com> Message-ID: I think this analysis is largely right. On the stack (parameters, returns, locals), the difference between B2/B3.ref and B3.val will be minimal; some extra register allocation pressure for the null channel, and that's it. So I think it's reasonable to say "doesn't matter" for these. Where B3.val pays for itself in performance (but has a cost in other complexity) is on the heap, and doubly so when dealing with larger volumes of data (e.g., arrays.) > With this kind of guidance, a focus on the density of storage, and a > good way to spell ".val", I can see the benefit of Kevin's approach. This is mostly where I am (though, there's more to stack, and the ref/val thing is only part of it, so we have to look at the whole picture.) > On Apr 26, 2022, at 2:37 PM, Dan Heidinga wrote: > > Where is a B3 (.val) instance preferable to a B2 (B3 .ref) instance? > What are the criteria that would make an API designer expose a B3 > (.val) instance in their method descriptors? > > The primary benefit of a B3 (.val) instance over a B2 instance is > memory density. This relates to the containers holding the value. > The benefits are most evident when dealing with larger numbers of > flattened instances or arrays that live in the heap. The B3 (.val) is > primarily a benefit for backing storage. Agreed?
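[As a strawman, the heap/array distinction discussed here might look like the sketch below. This is hypothetical Valhalla syntax from this thread, not valid Java today; the `Complex` class and the flattening comments are illustrative assumptions, not committed behavior:]

```java
// Hypothetical Valhalla syntax -- does not compile in any released Java.
primitive class Complex {
    double re, im;
    Complex(double re, double im) { this.re = re; this.im = im; }
}

Complex[]     boxed = new Complex[1_000_000];     // references: a pointer per element
Complex.val[] flat  = new Complex.val[1_000_000]; // flattened: ~16 bytes per element, no headers

Complex     r = new Complex(1, 2); // reference type: nullable, atomic
Complex.val v = new Complex(1, 2); // value type: flattened, null-free, may tear under race
```

The sketch is only meant to make the "where the density lives" point concrete: on the stack the two spellings behave nearly alike, while the array declarations are where the layouts diverge.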
> > For API points, the .val instance ensures that nulls are impossible > and the methods can avoid null checks but may need special handling > for the all-zero value. There are potentially performance benefits > from the calling conventions knowing a priori that null isn't in the > value set. The impact of this is TBD. Alternatively, the default > .ref has the benefits Kevin pointed out in his original email (not > repeating them here). > > Tearing only comes into play when writing to storage. It's a > non-issue for API points. Anything controversial here? > > The question again is what's the primary reason(s) for exposing a B3 > (.val) vs B2 instance in APIs? What guidance would we give API > designers around the use of B3 .val instances? > > My initial attempt at this: > * Use B3 .val instances for backing storage - so instance variables, > arrays, and static fields (Flattening of statics tbd). > * Use B3 .val instances for internal private API points - so private > methods or package private methods where the instance has already come > through some front door > * Use B2 (B3 .ref) in most public APIs for migration, avoiding bad > default values, etc as per Kevin's initial email - at the risk of > slightly worse performance? > * Use B3 .val instances for the small set of public APIs where every > inch of performance matters (and only after proving it matters) > > With this kind of guidance, a focus on the density of storage, and a > good way to spell ".val", I can see the benefit of Kevin's approach. > Still not 100% on board but persuadable. > > In the B3 defaults to .ref model, what does the constructor return? > An L or Q? Can the user control that? > > Remi's "new Complex(r, i)" example left me wondering do users say: > Complex.val c = new Complex(1, 2); > or > Complex.val c = new Complex.val(1, 2); > > Does the Complex class author decide by providing one or the other > form of constructor: > Complex.val(int r, int i) { /* Return QComplex */ .... 
} > and > Complex(int r, int i) { /* Return LComplex */ .... } > > The instance itself doesn't change but there may be conversions > (checkcast) needed one way or the other. > > --Dan > > On Tue, Apr 26, 2022 at 1:56 PM Dan Smith wrote: >> >> On Apr 26, 2022, at 8:45 AM, Kevin Bourrillion wrote: >> >> I think I would insist that `.val` be spelled with only one additional character... or even that the value type be generated as the snake_case form of the name! >> >> >> Okay, this is a meaningful refinement that I find less objectionable. >> >> If it's '#Integer128' or 'Integer128!' instead of 'Integer128.val', we've trimmed away a chunk of the typing/reading overhead (though it's still there, and I think some of the overhead is directly in service of what you find attractive?the idea that the value type is something unnatural/a departure from the norm). >> >> If it's 'integer128' and 'Integer128', well now there is no "default" type, and I think we're talking about something categorically different. There are some new (surmountable?) problems, but my earlier objections don't apply. >> > From kevinb at google.com Tue Apr 26 21:11:05 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Tue, 26 Apr 2022 17:11:05 -0400 Subject: Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: References: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> <3A62E9B7-2AF0-4E27-857D-7294FC9FA236@oracle.com> <514F5431-0BBD-434F-A78B-10A48E5A782D@oracle.com> Message-ID: On Tue, Apr 26, 2022 at 2:37 PM Dan Heidinga wrote: The question again is what's the primary reason(s) for exposing a B3 > (.val) vs B2 instance in APIs? What guidance would we give API > designers around the use of B3 .val instances? > Nice question! I thought about it a little bit and this is my own first take. I think *most* of the advice would be cross-cutting across param types, return types, field types, etc.: If 1. I don't want null to be included as a value 2. 
I'm definitely not abusing some value as a fake "pseudo-null" sentinel 3. (for a value I'm declaring) I'm willing to take care that it gets initialized properly 4. I'm properly chastened about racy access ... then I believe the .val is *acceptable*, and I further think that whenever it's acceptable it's probably *preferred*. But until I go through this checklist, I'm safer with the reference type. In the B3 defaults to .ref model, what does the constructor return? > An L or Q? Can the user control that? > > Remi's "new Complex(r, i)" example left me wondering do users say: > Complex.val c = new Complex(1, 2); > or > Complex.val c = new Complex.val(1, 2); > The criteria above seem to say that a constructor should always return the value type (with no need for `new Foo.val`). And good thing, because that's the way that lets you easily store it into either kind of variable. Yay? > > On Tue, Apr 26, 2022 at 1:56 PM Dan Smith wrote: > > > > On Apr 26, 2022, at 8:45 AM, Kevin Bourrillion > wrote: > > > > I think I would insist that `.val` be spelled with only one additional > character... or even that the value type be generated as the snake_case > form of the name! > > > > > > Okay, this is a meaningful refinement that I find less objectionable. > > > > If it's '#Integer128' or 'Integer128!' instead of 'Integer128.val', > we've trimmed away a chunk of the typing/reading overhead (though it's > still there, and I think some of the overhead is directly in service of > what you find attractive?the idea that the value type is something > unnatural/a departure from the norm). > > > > If it's 'integer128' and 'Integer128', well now there is no "default" > type, and I think we're talking about something categorically different. > There are some new (surmountable?) problems, but my earlier objections > don't apply. > > > > -- Kevin Bourrillion | Java Librarian | Google, Inc. 
| kevinb at google.com From kevinb at google.com Tue Apr 26 21:14:49 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Tue, 26 Apr 2022 17:14:49 -0400 Subject: Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: References: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> <3A62E9B7-2AF0-4E27-857D-7294FC9FA236@oracle.com> <514F5431-0BBD-434F-A78B-10A48E5A782D@oracle.com> Message-ID: On Tue, Apr 26, 2022 at 5:11 PM Kevin Bourrillion wrote: Nice question! I thought about it a little bit and this is my own first > take. I think *most* of the advice would be cross-cutting across param > types, return types, field types, etc.: > > If > 1. I don't want null to be included as a value > 2. I'm definitely not abusing some value as a fake "pseudo-null" sentinel > 3. (for a value I'm declaring) I'm willing to take care that it gets > initialized properly > 4. I'm properly chastened about racy access > For a field type, or especially for the type argument of a collection (cpt type of an array), I might entertain *some* extent of #2 "abuse", with great caution. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From kevinb at google.com Tue Apr 26 22:09:05 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Tue, 26 Apr 2022 18:09:05 -0400 Subject: We need help to migrate from bucket 1 to 2; and, the == problem In-Reply-To: <034E48A2-8AB2-4156-A30C-F6F79F8CABC3@oracle.com> References: <38DF0B35-3F89-484F-8A35-FF2F5924859C@oracle.com> <034E48A2-8AB2-4156-A30C-F6F79F8CABC3@oracle.com> Message-ID: Above, when I said the proposed `==` behavior is "not a behavior that anyone ever *actually wants* -- unless they just happen to have no fields of reference types at all", I did leave out some other cases. Like when your only field types (recursing down fields of value types) that are reference types are types that don't override `equals()` (e.g. `Function`). 
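[Kevin's scenario can be simulated in today's Java: the record below stands in for a future value class, and `fieldwiseRefEq` is a made-up helper approximating the proposed fieldwise `==` semantics (records today actually use identity for `==`). With all-primitive fields the helper would agree with `equals`; one String field breaks the agreement:]

```java
public class FieldwiseEqDemo {
    record Endpoint(int port, String host) {}

    // Made-up helper mimicking the proposed value-class '==':
    // compare each field with '==' (a reference comparison for host).
    static boolean fieldwiseRefEq(Endpoint a, Endpoint b) {
        return a.port() == b.port() && a.host() == b.host();
    }

    public static void main(String[] args) {
        Endpoint a = new Endpoint(80, new String("example.com"));
        Endpoint b = new Endpoint(80, new String("example.com"));

        System.out.println(a.equals(b));          // true: record equals uses String.equals
        System.out.println(fieldwiseRefEq(a, b)); // false: the host fields are distinct objects
    }
}
```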
In a way this sort of furthers my argument that the boundary between when `==` is safely an `equals` synonym and when it isn't is going to be difficult to perceive. Yet, since people hunger for `==` to really mean `equals`, they are highly overwhelmingly likely to do it as much as possible whenever they are convinced it looks safe. And then one addition of a string field in some leaf-level type can break a whole lot of code. On Tue, Apr 26, 2022 at 2:53 PM Dan Smith wrote: Yes, a public annotation was the original proposal. At some point we scaled > that back to just JDK-internal. The discussions were a long time ago, but > if I remember right the main concern was that a formalized, Java SE notion > of "value-based class" would lead to some unwanted complexity when we > eventually get to *real* value classes (e.g., a misguided CS 101 course > question: "what's the difference between a value-based class and a value > class? which one should you use?"). > Yeah, I hear that. The word "value" does have multiple confusable meanings. I'd say the key difference is that "value semantics" are logically a *recursive* rejection of identity, while a Valhalla B2/B3 class on its own addresses only one level deep. Anyway, I think what I'm proposing avoids trouble by specifically labeling one state as simply the transitional state to the other. I'm not sure there'd be much to get hung up on. > It seemed like producing some special warnings for JDK classes would > address the bulk of the problem without needing to fall into this trap. > I'd just say it addresses a more specific problem: how *those* particular classes can become B2/B3 (non-identity) classes. > Would an acceptable compromise be for a third-party tool to support its > own annotations, while also recognizing @jdk.internal.ValueBased as an > alternative spelling of the same thing? > I think it's "a" compromise :-), I will just have to work through how acceptable. 
Is there any such thing as a set of criteria for when a warning deserves to be handled by javac instead of left to all the world's aftermarket static analyzers to handle? (Secondarily... why are we warning only on synchronization, and not on `==` > or (marginal) `identityHC`?) > > I think this was simply not a battle that we wanted to fight -- discouraging > all uses of '==' on type Integer, for example. > Who would be fighting the other side of that battle? Not anyone having some *need* to use `==` over `.equals()`, because we'll be breaking them when Integer changes buckets anyway. So... just the users saying "we should get to use `==` as a shortcut for `.equals()` as long as we stay within the cached range"? Oh, wait: Within these constraints, there are reasonable things that can be done with > '==', like optimizing for a situation where 'equals' is likely to be true. > Ok, that too. Fair I suppose... it's just that it's such a very special case... -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From kevinb at google.com Tue Apr 26 23:22:17 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Tue, 26 Apr 2022 19:22:17 -0400 Subject: [External] Foo / Foo.ref is a backward default; should be Foo.val / Foo In-Reply-To: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> References: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> Message-ID: On Mon, Apr 25, 2022 at 10:05 AM Brian Goetz wrote: > > 1. The option with fewer hazards should usually be the default. Users > won't opt themselves into extra safety, but they will sometimes opt out of > it. Here, the value type is the one that has attendant risks -- risk of a > bad default value, risk of a bad torn value. We want using `Foo.val` to > *feel like* cracking open the shell of a `Foo` object and using its innards > directly. But if it's spelled as plain `Foo` it won't "feel like" anything > at all. > > Let me state it more strongly: unboxed "primitives" are less safe.
> Despite all the efforts from the brain trust, the computational physics > still points us towards "the default is zero, even if you don't like that > value" and "these things can tear under race, even though they resemble > immutable objects, which don't." The insidious thing about tearing is that > it is only exhibited in subtly broken programs. The "subtly" part is the > really bad part. So we have four broad options:
> - neuter primitives so they are always as safe as we might naively hope, > which will result in either less performance or a worse programming model;
> - keep a strong programming model, but allow users to trade some safety > (which non-broken programs won't suffer for) with an explicit > declaration-site and/or use-site opt-in ('.val')
> - same, but try to educate users about the risk of tearing under data > race (good luck)
> - decide the tradeoff is impossible, and keep the status quo
> The previous stake in the ground was #3; you are arguing towards #2. I'm confused here -- I don't recognize that as being what I'm arguing. > > 2. In the current plan a `Foo.ref` should be a well-behaved bucket 2 > object. But it sure looks like that `.ref` is specifically telling it NOT > to be -- like it's saying "no, VM, *don't* optimize this to be a value even > if you can!" That's of course not what we mean. With the change I'm > proposing, `Foo.val` does make sense: it's just saying "hey runtime, while > you already *might* have represented this as a value, now I'm demanding > that you *definitely* do". That's a normal kind of a thing to do. > > A key aspect of this is the bike shed tint; .val is not really the right > indicator given that the reference type is also a "value class". I think > we're comfortable giving the "value" name to the whole family of > identity-free classes, which means that .val needs a new name.
> Bonus points if the name connotes 'having burst free of the constraints of
> reference-hood': unbound, loose, exploded, compound value, etc. And also
> is pretty short.

FWIW, I have at no point been comfortable with that decision. My little manifesto puts forth a strong meaning for "value" that bucket 2 has nothing to do with. Bucket 2 objects don't even safely have "value semantics" either (which are recursive).

The nomenclature I would like to see would center, somehow, around

B1: an identity class
B2: a non-identity class
B3: a non-identity class (that also gets a .val type)

(Thought experiment: if we had an annotation meaning "using the .val type is not a great idea for this class and you should get a compile-time warning if you do" .... would we really and I mean *really* even need bucket 2 at all?)

One more: the .getClass() anomaly goes away.

> If we have
>
>     mumble primitive mumble Complex { ... }
>
>     Complex.val c = ...
>
> then what do we get when we ask c for its getClass? The physics again
> point us at returning Complex.ref.class, not Complex.val.class, but under
> the old scheme, where the val projection gets the good name, it would seem
> anomalous, since we ask a val for its class and get the ref mirror. But
> under the Kevin interpretation, we can say "well, the CLASS is Complex, so
> if you ask getClass(), you get Complex.class."

First, I don't think we can really appeal to "well, the CLASS is..." because people will know that there are two Class instances for that same class, so it doesn't explain which one they'd get.

But actually I am increasingly convinced that this method shouldn't return anything. If it must exist, it should just throw, because its purpose is to get an object's dynamic type, and there is no object here, and no dynamic type here. But even better: throwing would force the user to just write whichever of `Complex.class` or `Complex.val.class` they actually mean, and everyone would be better off for that.
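Kevin's "two Class instances for that same class" point has a present-day analogue in the int/Integer pair, where getClass() on a boxed value already yields the reference mirror rather than the primitive one. A small illustrative sketch (the class name is invented here):

```java
public class Mirrors {
    public static void main(String[] args) {
        Object boxed = 42;  // autoboxing: the int becomes an Integer reference
        System.out.println(boxed.getClass());           // class java.lang.Integer
        System.out.println(int.class == Integer.class); // false: two distinct mirrors
    }
}
```

Asking the boxed value for its class can only ever answer with one of the two mirrors, which is exactly the ambiguity Kevin argues should be resolved by making the user spell out the mirror they mean.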
-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com Wed Apr 27 13:59:31 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 27 Apr 2022 13:59:31 +0000
Subject: On tearing
Message-ID:

Several people have asked why I am so paranoid about tearing. This mail is about tearing; there'll be another about user model stacking and performance models. (Please, let's try to resist the temptation to jump to "the answer".)

Many people are tempted to say "let it tear." The argument for "let it tear" is a natural-sounding one; after all, tearing only happens when someone else has made a mistake (data race). It is super-tempting to say "Well, they made a mistake, they get the consequences".

While there are conditions under which this would be a reasonable argument, I don't think those conditions quite hold here, because from both the inside and the outside, B3 classes "code like a class." Authors will feel free to use constructors to enforce invariants, and if the use site just looks like "Point", clients will not be wanting to keep track of "is this one of those classes with, or without, integrity?"

Add to this, tearing is already weird, and while it is currently allowed for longs and doubles, 99.9999% of Java developers have never actually seen it or had to think about it very carefully, because implementations have had atomic loads and stores for decades.

As our poster child, let's take an integer range:

    __B3 record IntRange(int low, int high) {
        public IntRange {
            if (low > high) throw new IllegalArgumentException();
        }
    }

Here, the author has introduced an invariant which is enforced by the constructor. Clients would be surprised to find an IntRange in the wild that disobeys the invariant. Ranges have a reasonable zero value. This is an obvious candidate for B3.

But, I can make this tear. Imagine a mutable field:

    /* mutable */ IntRange r;

and two threads racing to write to r. One writes IntRange(5, 10); the other writes IntRange(2, 4).
If the writes are broken up into two writes, then a client could read IntRange(5, 4). Worse, unlike more traditional races which might be eventually consistent, this torn value will be observable forever.

Why does this seem worse than a routine long tearing (which no one ever sees and most users have never heard of)? Because by reading the code, it surely seems like the code is telling me that IntRange(5, 4) is impossible, and having one show up would be astonishing. Worse, a malicious user can create such a bad value (statistically) at will, and then inject that bad value into code that depends on the invariants holding.

Not all values are at risk of such astonishment, though. Consider a class like:

    __B3 record LongHolder(long x) { }

Given that a LongHolder can contain any long value, users of LongHolder are not expecting that the range is carefully controlled. There are no invariants for which breaking them would be astonishing; LongHolder(0x1234567887654321) is just as valid a value as LongHolder(3).

There are two factors here: invariants and transparency. The above examples span the range of invariants (from none at all, to invariants that constrain multiple fields). But there's also transparency. The second example was unsurprising because the API allowed us to pass any long in, so we were not surprised to see big values coming out. But if the relationship between the representation and the construction API is more complicated, one could imagine thinking the constructor has deterred all the surprising values, and then still see a surprising value. That longs might tear is less surprising because any combination of bits is a valid long, and there's no way to exclude certain values when "constructing" a long.

Separately, there are different considerations at the declaration and use site. A user can always avoid tearing by avoiding data races, such as marking the field volatile (that's the usual cure for longs and doubles).
But what we're missing is something at the declaration site, where the author can say "I have integrity concerns" and constrain layout/access accordingly. We experimented with something earlier ("extends NonTearable") in this area.

Coming back to "why do we care so much?". PLT_Hulk summarized JCiP in one sentence: https://twitter.com/PLT_Hulk/status/509302821091831809

If Java developers have learned one thing about concurrency, it is: "immutable objects are always thread-safe." While we can equivocate about whether B3.val are objects or not, this distinction is more subtle than we can expect people to internalize. (If people internalized "Write immutable classes, they will always be thread-safe", that would be pretty much the same thing.) We cannot deprive them of the most powerful and useful guideline for writing safe code.

(To make another analogy: serialization lets objects appear to not obey invariants established in the constructor. We generally don't like this; we should not want to encourage more of this.)

There are options here, but none are a slam dunk:

- Force all B3 values to be atomic, which will have a performance cost;
- Deny the ability to enforce invariants on B3 classes (no NonNegativeInt, no IntRange);
- Try to educate people about tearing (good luck);
- Put out bigger warning signs (e.g., IntRange.tearable) that people can't miss;
- More declaration-site control over atomicity, so classes with invariants can ensure their invariants are defended.

I think the last is probably the most sane.
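For reference, the IntRange sketch can be tried today as an ordinary (identity) record; the __B3 modifier is hypothetical, and with an identity class the torn IntRange(5, 4) cannot actually arise, so this only demonstrates the invariant the constructor promises to defend:

```java
public class IntRangeDemo {
    // Identity-class stand-in for the hypothetical __B3 IntRange.
    record IntRange(int low, int high) {
        IntRange {  // compact constructor enforcing the invariant
            if (low > high) throw new IllegalArgumentException(low + " > " + high);
        }
    }

    public static void main(String[] args) {
        System.out.println(new IntRange(5, 10));  // IntRange[low=5, high=10]
        try {
            new IntRange(5, 4);  // the value tearing could produce, rejected here
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());  // rejected: 5 > 4
        }
    }
}
```

A flattened, non-atomic IntRange written from two racing threads could bypass this check entirely, which is exactly the astonishment described above.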
From daniel.smith at oracle.com Wed Apr 27 15:03:47 2022
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 27 Apr 2022 15:03:47 +0000
Subject: [External] Foo / Foo.ref is a backward default; should be Foo.val / Foo
In-Reply-To:
References: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com>
Message-ID: <512539E0-7FE8-4EE1-B903-FD5DBFC2EFDE@oracle.com>

On Apr 26, 2022, at 5:22 PM, Kevin Bourrillion wrote:

> (Thought experiment: if we had an annotation meaning "using the .val type
> is not a great idea for this class and you should get a compile-time
> warning if you do" .... would we really and I mean *really* even need
> bucket 2 at all?)

Yes, because some (many?) class authors want strong guarantees that the initial (all-zeros) instance is never available in the wild. This is the most prominent encapsulation-breaking compromise that an author makes when moving from B2 to B3.

From kevinb at google.com Wed Apr 27 15:19:30 2022
From: kevinb at google.com (Kevin Bourrillion)
Date: Wed, 27 Apr 2022 11:19:30 -0400
Subject: [External] Foo / Foo.ref is a backward default; should be Foo.val / Foo
In-Reply-To: <512539E0-7FE8-4EE1-B903-FD5DBFC2EFDE@oracle.com>
References: <455043A6-08BD-43F6-AE4F-F5B6E8BB695D@oracle.com> <512539E0-7FE8-4EE1-B903-FD5DBFC2EFDE@oracle.com>
Message-ID:

On Wed, Apr 27, 2022 at 11:03 AM Dan Smith wrote:

>> (Thought experiment: if we had an annotation meaning "using the .val type
>> is not a great idea for this class and you should get a compile-time
>> warning if you do" .... would we really and I mean *really* even need
>> bucket 2 at all?)
>
> Yes, because some (many?) class authors want strong guarantees that the
> initial (all-zeros) instance is never available in the wild. This is the
> most prominent encapsulation-breaking compromise that an author makes when
> moving from B2 to B3.
Yeah, I'm somewhere on that end of the spectrum too, about types like `Instant` (in fact I started calling this the 1970 problem). I only even say the above because in such cases there *is* the fallback of gritting teeth and getting comfortable in bucket 1. Sad, but my thought was that three kinds of concrete classes (times three kinds of concrete classes) is sad too.

-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From forax at univ-mlv.fr Wed Apr 27 15:38:46 2022
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 27 Apr 2022 17:38:46 +0200 (CEST)
Subject: On tearing
In-Reply-To:
References:
Message-ID: <423731687.17626456.1651073926931.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Brian Goetz"
> To: "valhalla-spec-experts"
> Sent: Wednesday, April 27, 2022 3:59:31 PM
> Subject: On tearing

> Several people have asked why I am so paranoid about tearing. [...]
> There are options here, but none are a slam dunk: [...]
> I think the last is probably the most sane.

Writing immutable objects in Java is hard; there is already a checklist:
- be sure that your class is not only unmodifiable but really immutable; storing a mutable class in a field is an issue
- have you declared all fields final? otherwise you have a publication issue
- your constructors do not leak "this", right!

So adding a fourth item
- the class is not a primitive class
does not seem to be a big leap to me.

I agree with the idea that this is like serialization: you have two ways to bypass the constructor. It's not as bad as serialization, where usually you can generate instances with any values. By contrast, with a primitive class
- the default value bypasses the constructors
- you can create a new value by merging components, bypassing the constructors.

So, supposing a straightforward implementation, creating a NonNegativeInt is OK but creating an Int256Range is not.
(NonNegativeInt: to have a negative integer, you have to change the sign bit; the default value does not set the sign bit, and merging values will not set the sign bit out of thin air.)
(Int256Range: you can smash the components of a bigger range into the components of a smaller range.)

So the rules are quite arcane, and the question is: should we try to avoid people shooting themselves in the foot?
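Remi's sign-bit argument can be checked mechanically: any bitwise mix of two valid non-negative ints still has a clear sign bit, while mixing the two fields of two valid ranges can break low <= high. A sketch (NonNegativeInt and the range are Remi's hypothetical examples, modeled here with plain ints):

```java
public class MergeDemo {
    public static void main(String[] args) {
        // NonNegativeInt case: both sign bits are 0, so any torn bit-mix
        // of the two representations is still non-negative.
        int a = 123456, b = 42;
        int mask = 0x0000FFFF;                // an arbitrary torn split of the bits
        int mixed = (a & mask) | (b & ~mask);
        System.out.println(mixed >= 0);       // true

        // Range case: the torn mix (low of r1, high of r2) was never
        // constructed by anyone, and it violates low <= high.
        int[] r1 = {5, 10}, r2 = {2, 4};      // stand-ins for two valid ranges
        int low = r1[0], high = r2[1];        // torn combination: (5, 4)
        System.out.println(low <= high);      // false
    }
}
```

This is why single-field invariants survive a straightforward (non-atomic) implementation while multi-field invariants do not: a torn read can only mix whole fields, and with one field there is nothing to mix.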
For me, we should make the model clear; the compiler should insert a non-user-overridable default constructor but not more, because using a primitive class is already an arcane construct. There is no point in nannying people here, given that only experts will want to play with it. This is very similar to the way Java lets you use volatile and do a ++ on the field: that's usually a bad idea, but volatile is an arcane construct, not something people use every day.

But we (the EG) can also fail and make a primitive class too easy to use; what scares me is people using a primitive class just because it's not nullable.

Rémi

From brian.goetz at oracle.com Wed Apr 27 16:44:01 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 27 Apr 2022 16:44:01 +0000
Subject: User model stacking
Message-ID:

Here's some considerations for stacking the user model. (Again, please let's resist the temptation to jump to the answer and then defend it.)

We have a stacking today which says:

- B1 is ordinary identity classes, giving rise to a single reference type
- B2 are identity-free classes, giving rise to a single reference type
- B3 are flattenable identity-free classes, giving rise to both a reference (L/ref) and primitive (Q/val) type.

This stacking has some pleasant aspects. B2 differs from B1 by "only one bit": identity. The constraints on B2 are those that come from the lack of identity (mutability, extensibility, locking, etc.) B2 references behave like the object references we are familiar with: nullability, final field guarantees, etc. B3 further makes reference-ness optional; reference-free B3 values give up the affordances of references: they are zero-default and tearable. This stacking is nice because it can be framed as a sequence of "give up some X, get some Y".

People keep asking "do we need B2, or could we get away with B1/B3?"
This is a declaration-site property; B3 means that the zero value is reasonable, and use sites can opt into / out of zero-default / nullity. We?d love to compress away this bucket but forcing a zero on classes that can?t give it a reasonable interpretation is problematic. But perhaps we can reduce the visibility of this in the model. The degrees of freedom we could conceivably offer are { identity or not, zero-capable or not, atomic or not } x { use-site, declaration-site } In actuality, not all of these boxes make sense (disavowing the identity of an ArrayList at the use site), and some have been disallowed by the stacking (some characteristics have been lumped.) Here?s another way to stack the declaration: - Some classes can disavow identity - Identity-free classes can further opt into zero-default (currently, B3, polarity chosen at use site) - Identity-free classes can further opt into tearability (currently, B3, polarity chosen at use site) It might seem the sensible move here is to further split B3 into B3a and B3b (where all B3 support zero default, and a/b differ with regard to whether immediate values are tearable). But that may not be the ideal stacking, because we want good flattening for B2 (and B3.ref) also. Ideally, the difference between B2 and B3.val is nullity only (Kevin?s antennae just went up.) So another possible restacking is to say that atomicity is something that has to be *opted out of* at the declaration site (and maybe also at the use site.) With deliberately-wrong syntax: __non-id class B2 { } __non-atomic __non-id class B2a { } __zero-ok __non-id class B3 { } __non-atomic __zero-ok __non-id class B3a { } In this model, you can opt out of identity, and then you can further opt out of atomicity and/or null-default. This ?pulls up? the atomicity/tearaiblity to a property of the class (I?d prefer safe by default, with opt out), and makes zero-*capability* an opt-in property of the class. 
Then for those that have opted into zero-capability, at the use site, you can select .ref (null) / .val (zero). Obviously these all need better spellings. This model frames specific capabilities as modifiers on the main bucket, so it could be considered either a two bucket, or a four bucket model, depending on how you look.

The author is in the best place to make the atomicity decision, since they know the integrity constraints. Single field classes, or classes with only single-field invariants (denominator != 0), do not need atomicity. Classes with multi-field invariants do.

This differs from the previous stacking in that it moves the spotlight from _references_ and their properties, to the properties themselves. It says to class writers: you should declare the ways in which you are willing to trade safety for performance; you can opt out of the requirement for references and nulls (saving some footprint) and atomicity (faster access). It says to class *users*: you can pick the combination of characteristics, allowed by the author, that meet your needs (you can always choose null default if you want, just use a ref.)

There are many choices here about "what are the defaults?". More opting in at the declaration site might mean less need to opt in at the use site. Or not.

(We are now in the stage which I call "shake the box"; we've named all the moving parts, and now we're looking for the lowest-energy state we can get them into.)
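The declaration shapes above can be tabulated as combinations of the three knobs. The record below is only an illustrative model of the design space, not proposed syntax (the B1/B2/B2a/B3/B3a names follow the email; the field names are invented here):

```java
import java.util.List;

public class Knobs {
    // One row per declaration shape: {identity, zero-capable, atomic}.
    record Shape(String name, boolean identity, boolean zeroOk, boolean atomic) {}

    public static void main(String[] args) {
        List<Shape> shapes = List.of(
            new Shape("B1",  true,  false, true),   // ordinary identity class
            new Shape("B2",  false, false, true),   // __non-id
            new Shape("B2a", false, false, false),  // __non-atomic __non-id
            new Shape("B3",  false, true,  true),   // __zero-ok __non-id
            new Shape("B3a", false, true,  false)); // __non-atomic __zero-ok __non-id
        shapes.forEach(System.out::println);
    }
}
```

Reading the table this way makes the "two bucket or four bucket" framing concrete: one identity bucket, plus a non-identity family distinguished by two independent opt-outs.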
From brian.goetz at oracle.com Wed Apr 27 16:50:19 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 27 Apr 2022 16:50:19 +0000
Subject: [External] : Re: On tearing
In-Reply-To: <423731687.17626456.1651073926931.JavaMail.zimbra@u-pem.fr>
References: <423731687.17626456.1651073926931.JavaMail.zimbra@u-pem.fr>
Message-ID: <4917C3C1-B0DC-48A5-B987-F7AB95FA1DB4@oracle.com>

> Writing immutable objects in Java is hard, there is already a check list:
> - be sure that your class is not only unmodifiable but really immutable, storing a mutable class in a field is an issue
> - have you declared all fields final, otherwise you have a publication issue
> - your constructors do not leak "this", right!
>
> so adding a fourth item
> - the class is not a primitive class
> does not seem to be a big leap to me.

This whole area seems extremely prone to wishful thinking; we so hate the idea of making something slower than it could be that we convince ourselves that "the user can reason about this." Whether or not it is "too big a leap", I think it is a bigger leap than you are thinking.

> For me, we should make the model clear, the compiler should insert a non-user-overridable default constructor but not more, because using a primitive class is already an arcane construct.

This might help a little bit, but it is addressing the smaller part of the problem (zeroes); we need to address the bigger problem (tearing). I don't think we have to go so far as to outlaw tearing, but there have to be enough cues, at the use and declaration site, that something interesting is happening here.

> There is no point to nanny people here given that only experts will want to play with it.

This is *definitely* wishful thinking. People will hear that this is a tool for performance; 99% of Java developers will convince themselves they are experts because, performance! Developers pathologically over-rotate towards whatever the Stack Overflow crowd says is faster. (And so will Copilot.) So, definitely no.
This argument is pure wishful thinking. (I will admit to being occasionally tempted by this argument too, but then I snap out of it.)

> But we (the EG) can also fail, and make a primitive class too easy to use; what scares me is people using a primitive class just because it's not nullable.

Yes, this is one of the many pitfalls we have to avoid! This game is hard.

From forax at univ-mlv.fr Wed Apr 27 17:13:05 2022
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 27 Apr 2022 19:13:05 +0200 (CEST)
Subject: User model stacking
In-Reply-To:
References:
Message-ID: <1596680034.17715796.1651079585403.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Brian Goetz"
> To: "valhalla-spec-experts"
> Sent: Wednesday, April 27, 2022 6:44:01 PM
> Subject: User model stacking

> Here's some considerations for stacking the user model. (Again, please let's
> resist the temptation to jump to the answer and then defend it.) [...]
> (We are now in the stage which I call "shake the box"; we've named all the
> moving parts, and now we're looking for the lowest-energy state we can get them
> into.)
I really like the clean separation between declaration site and use site, the properties being declared as class properties make more sense to me (for whatever reason, i was able to convince myself that it was the actual model). I like the fact that being tearable is a property that has to be enabled (again at declaration site). But know i want all those knobs :) R?mi From forax at univ-mlv.fr Wed Apr 27 19:18:36 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 27 Apr 2022 21:18:36 +0200 (CEST) Subject: User model stacking In-Reply-To: References: Message-ID: <2107571357.17745968.1651087116095.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "Dan Heidinga" > To: "Brian Goetz" > Cc: "valhalla-spec-experts" > Sent: Wednesday, April 27, 2022 8:51:15 PM > Subject: Re: User model stacking > I'm trying to understand how this refactoring fits the VM physics. > > In particular, __non-atomic & __zero-ok fit together at the VM level > because the VM's natural state for non-atomic (flattened) data is zero > filled. When those two items are decoupled, I'm unclear on what the > VM would offer in that case. Thoughts? __non-atomic but ! __zero-ok means you have an additional bit that indicate if it's null or not. from the VM POV, fields are still zero filled but you have a way to encode null. > > How does "__non-atomic __non-id class B2a { }" fit with the "no new > nulls" requirements? For me, i may be wrong, the "no new nulls" requirements is from the POV of the language / user, not from the POV of the VM. The VM may have several encodings of null internally. > > --Dan R?mi > > On Wed, Apr 27, 2022 at 12:45 PM Brian Goetz wrote: >> >> Here?s some considerations for stacking the user model. (Again, please let?s >> resist the temptation to jump to the answer and then defend it.) 
>> >> We have a stacking today which says: >> >> - B1 is ordinary identity classes, giving rise to a single reference type >> - B2 are identity-free classes, giving rise to a single reference type >> - B3 are flattenable identity-free classes, giving rise to both a reference >> (L/ref) and primitive (Q/val) type. >> >> This stacking has some pleasant aspects. B2 differs from B1 by ?only one bit?: >> identity. The constraints on B2 are those that come from the lack of identity >> (mutability, extensibility, locking, etc.) B2 references behave like the >> object references we are familiar with; nullability, final field guarantees, >> etc. B3 further makes reference-ness optional; reference-free B3 values give >> up the affordances of references: they are zero-default and tearable. This >> stacking is nice because it can framed as a sequence of ?give up some X, get >> some Y?. >> >> People keep asking ?do we need B2, or could we get away with B1/B3?. The main >> reason for having this distinction is that some id-free classes have no >> sensible default, and so want to use null as their default. This is a >> declaration-site property; B3 means that the zero value is reasonable, and use >> sites can opt into / out of zero-default / nullity. We?d love to compress >> away this bucket but forcing a zero on classes that can?t give it a reasonable >> interpretation is problematic. But perhaps we can reduce the visibility of >> this in the model. >> >> The degrees of freedom we could conceivably offer are >> >> { identity or not, zero-capable or not, atomic or not } x { use-site, >> declaration-site } >> >> In actuality, not all of these boxes make sense (disavowing the identity of an >> ArrayList at the use site), and some have been disallowed by the stacking (some >> characteristics have been lumped.) 
Here's another way to stack the >> declaration: >> >> - Some classes can disavow identity >> - Identity-free classes can further opt into zero-default (currently, B3, >> polarity chosen at use site) >> - Identity-free classes can further opt into tearability (currently, B3, >> polarity chosen at use site) >> >> It might seem the sensible move here is to further split B3 into B3a and B3b >> (where all B3 support zero default, and a/b differ with regard to whether >> immediate values are tearable). But that may not be the ideal stacking, >> because we want good flattening for B2 (and B3.ref) also. Ideally, the >> difference between B2 and B3.val is nullity only (Kevin's antennae just went >> up.) >> >> So another possible restacking is to say that atomicity is something that has to >> be *opted out of* at the declaration site (and maybe also at the use site.) >> With deliberately-wrong syntax: >> >> __non-id class B2 { } >> >> __non-atomic __non-id class B2a { } >> >> __zero-ok __non-id class B3 { } >> >> __non-atomic __zero-ok __non-id class B3a { } >> >> In this model, you can opt out of identity, and then you can further opt out of >> atomicity and/or null-default. This "pulls up" the atomicity/tearability to a >> property of the class (I'd prefer safe by default, with opt out), and makes >> zero-*capability* an opt-in property of the class. Then for those that have >> opted into zero-capability, at the use site, you can select .ref (null) / .val >> (zero). Obviously these all need better spellings. This model frames specific >> capabilities as modifiers on the main bucket, so it could be considered either >> a two bucket, or a four bucket model, depending on how you look. >> >> The author is in the best place to make the atomicity decision, since they know >> the integrity constraints. Single field classes, or classes with only single >> field invariants (denominator != 0), do not need atomicity. Classes with >> multi-field invariants do.
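The multi-field-invariant point above can be illustrated with a sketch in today's Java. This is illustrative only: an ordinary mutable object stands in for a non-atomic flattened value, and the interleaving is written out sequentially rather than with real threads; all names (Range, TearingSketch) are invented.

```java
// Sketch only: simulates the torn read a racing reader could observe if a
// two-field value with invariant lo <= hi were stored non-atomically,
// field by field. A single-field class could not tear this way.
final class Range {
    int lo, hi; // intended invariant: lo <= hi
}

class TearingSketch {
    // Interleaves a "write of (20, 30)" with a read between the two field
    // stores, as a concurrent reader might see on a non-atomic flat field.
    static boolean observesTornState() {
        Range r = new Range();
        r.lo = 0;
        r.hi = 10;                  // writer A fully stores (0, 10)
        r.lo = 20;                  // writer B has stored only its first field...
        boolean torn = r.lo > r.hi; // ...so a read here sees (20, 10)
        r.hi = 30;                  // writer B completes; invariant holds again
        return torn;                // the intermediate state violated lo <= hi
    }
}
```

The torn intermediate value (20, 10) was never written by either writer as a whole, which is exactly why a class with a multi-field invariant would keep atomicity rather than opt out.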
>> >> This differs from the previous stacking in that it moves the spotlight from >> _references_ and their properties, to the properties themselves. It says to >> class writers: you should declare the ways in which you are willing to trade >> safety for performance; you can opt out of the requirement for references and >> nulls (saving some footprint) and atomicity (faster access). It says to class >> *users*, you can pick the combination of characteristics, allowed by the >> author, that meet your needs (can always choose null default if you want, just >> use a ref.) >> >> There are many choices here about "what are the defaults?". More opting in at >> the declaration site might mean less need to opt in at the use site. Or not. >> >> (We are now in the stage which I call "shake the box"; we've named all the >> moving parts, and now we're looking for the lowest-energy state we can get them >> into.) From daniel.smith at oracle.com Wed Apr 27 23:01:28 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 27 Apr 2022 23:01:28 +0000 Subject: Spec change documents for Value Objects Message-ID: <12C0C3B4-1A4C-4FCF-AEFD-A577F2333B27@oracle.com> Please see these two spec change documents for JLS and JVMS changes in support of the Value Objects feature.
http://cr.openjdk.java.net/~dlsmith/jep8277163/jep8277163-20220427/specs/value-objects-jls.html http://cr.openjdk.java.net/~dlsmith/jep8277163/jep8277163-20220427/specs/value-objects-jvms.html These are synced up with the latest iteration of the draft JEP, found here: https://openjdk.java.net/jeps/8277163 I've applied the changes we discussed recently on this list: - Replacing the 'IdentityObject' and 'ValueObject' interfaces with class modifiers - Updating the treatment of class 'Object' so that it can continue to be instantiated - Solidifying the details of special '' instance creation methods The JVMS document is layered on top of some JVMS cleanups that I need to circle back on (mostly as part of separate JEP https://openjdk.java.net/jeps/8267650). I think we've reached a point where it's worth getting these production-ready. Please let me know if you notice missing pieces, think something needs better treatment, or just catch a typo. Thanks! From brian.goetz at oracle.com Wed Apr 27 23:12:47 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 27 Apr 2022 23:12:47 +0000 Subject: [External] : Re: User model stacking In-Reply-To: References: Message-ID: <3C0FB3E1-CCA3-4A1D-94F1-8172CE36E575@oracle.com> Let me try and put some more color on the bike shed (but, again, let's focus on model, not syntax, for now.) We have two axes of variation we want to express with non-identity classes: atomicity constraints, and whether there is an additional zero-default companion type. These can be mostly orthogonal; you can have either, neither, or both. We've been previously assuming that "primitiveness" lumps this all together; primitives get more flattening, primitives can be non-nullable/zero-default, primitives means the good name goes to the "val" type. Primitive-ness implicitly flips the "safety vs performance" priority, which has been bothering us because primitives also code like a class. So we were trying to claw back some atomicity for primitives.
But also, we're a little unhappy with B2 because B2 comes with _more_ atomicity than is necessarily needed; a B2 with no invariants still gets less flattening than a B3. That's a little sad. And it also seems like a gratuitous difference, which makes the user model more complicated. So we're suggesting restacking towards: - Value classes are those without identity - Value classes can be atomic or non-atomic, the default is atomic (safe by default) - Value classes can further opt into having a "val" projection (name TBD, val is probably not it) - Val projections are non-nullable, zero-default -- this is the only difference - Both the ref and val projections inherit the atomicity constraints of the class, making atomicity mostly orthogonal to ref/val/zero/null Example: classic B2 value class B2a { } Because the default is atomic, we get the classic B2 semantics -- no identity, but full final field safety guarantees. VM has several strategies for flattening in the heap: single-field classes always flattened ("full flat"), multi-field classes can be flattened with "fat load and store" heroics in the future ("low flat"), otherwise, indirection ("no flat") Example: non-atomic B2 non-atomic value class B2n { } Here, the user has said "I have no atomicity requirements." A B2n is a loose aggregation of fields that can be individually written and read (full B3-like flattening), with maybe an extra boolean field to encode null (VM's choice how to encode, could use slack pointer bits etc.) Example: atomic B3 zero-capable value class B3a { } This says I am declaring two types, B3a and B3a.zero. (The syntax in this quadrant sucks; need to find better.) B3a is just like B2a above, because we haven't activated the zero capability at the use site. B3a.zero/val/flat/whatever is non-nullable, zero-default, *but still has full B2-classic atomicity*. With the same set of flattening choices on the part of the VM.
Example: full primitive non-atomic zero-capable value class B3n { } Here, B3n is like B2n, and B3n.zero is a full classic-B3 Q primitive with full flattening. So: - value-ness means "no identity, == means state equality" - You can add non-atomic to value-ness, meaning you give up state integrity - You can orthogonally add zero-capable to value-ness, meaning you get a non-null, zero-happy companion, which inherits the atomic-ness Some of the characteristics of this scheme: - The default is atomicity / integrity FOR ALL BUCKETS (safe by default) - The default is nullability FOR ALL BUCKETS - All unadorned type names are reference types / nullable - All Val-adorned type names (X.val) are non-nullable (or .zero, or .whatever) - Atomicity is determined by declaration site, can't be changed at use site The main syntactic hole is finding the right spelling for "zeroable" / .val. There is some chance we can get away with spelling it `T!`, though this has risks. Spelling zero-happy as any form of "flat" is probably a bad idea, because B2 can still be flat. A possible spelling for "non-atomic" is "relaxed": relaxed value class B3n { } Boilerplate-measurers would point out that to get full flattening, you have to say three things at the declaration site and one extra thing at the use site: relaxed zero-happy value class Complex { } ... Complex! c; If you forget relaxed, you might get atomicity (but might not cost anything, if the value is small.) If you forget zero-happy, you can't say `Complex!`, you can only say Complex, and the compiler will remind you. If you forget the !, you maybe get some extra footprint for the null bit. None of these are too bad, but the verbosity police might want to issue a warning here.
It is possible we might want to flip the declaration of zero-capable, where classes with no good default can opt OUT of the zero companion, rather than the other way around: null-default value class LocalDate { } which says that LocalDate must use the nullable (LocalDate) form, not the non-nullable (LocalDate.val/zero/bang) form. From brian.goetz at oracle.com Wed Apr 27 23:12:51 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 27 Apr 2022 23:12:51 +0000 Subject: [External] : Re: User model stacking In-Reply-To: References: Message-ID: <8C6F4291-889A-4A61-9872-8476F9ABAEEA@oracle.com> We can divide the VM flattening strategy into three rough categories (would you like some milk with your eclair?): - non-flat -- use a pointer - full-flat -- inline the layout into the enclosing container, access with narrow loads - low-flat -- use some combination of atomic operations to cram multiple fields into 64 or 128 bits, access with wide loads B1 will always take the non-flat strategy. Non-volatile B3 that are smaller than some threshold (e.g., full cache line) will prefer the full-flat strategy. Non-atomic B2 can also pursue the full-flat strategy, but may have an extra field for the null channel. Atomic B2/B3 may try the low-flat strategy, and fall back to non-flat where necessary. Volatiles will likely choose non-flat, unless they fit in the CAS window. But it is always the VM's choice. The user model may ask for nullability (represented however the VM wants, zero for non-flat, extra channel for low/full flat), and may ask for atomicity (which influences the layout choice too, likely dropping down a notch from full to low or low to non.) So from a class file perspective, we need an extra bit (ACC_ATOMIC) which is acted on at layout time. (B2 / B3.ref translated as L with Preload, B3.val as Q, as today.) So I think this mostly fits into the existing VM, with the addition of an ATOMIC bit which will constrain which flattening strategy we take at layout time.
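The layout-time selection sketched in that email can be written down as a small decision function. This is a sketch only: the thresholds, the fallback order above the size limits, and all names (Layout, LayoutSketch, CAS_WINDOW_BITS, CACHE_LINE_BITS) are invented for illustration; the real choice belongs to the VM.

```java
// Sketch of the layout-time strategy choice described above. The
// ACC_ATOMIC-style declaration bit arrives as 'atomic'; sizes and
// thresholds are made-up numbers, not anything the VM promises.
enum Layout { NON_FLAT, LOW_FLAT, FULL_FLAT }

class LayoutSketch {
    static final int CAS_WINDOW_BITS = 128; // assumed atomic-access window
    static final int CACHE_LINE_BITS = 512; // assumed full-flat threshold

    static Layout choose(boolean identity, boolean atomic,
                         boolean isVolatile, int sizeBits) {
        if (identity)
            return Layout.NON_FLAT;               // B1: always a pointer
        if (isVolatile && sizeBits > CAS_WINDOW_BITS)
            return Layout.NON_FLAT;               // volatile, too big to CAS
        if (atomic)                               // atomic B2/B3: low-flat if it
            return sizeBits <= CAS_WINDOW_BITS    // fits the CAS window, else
                 ? Layout.LOW_FLAT                // fall back to a pointer
                 : Layout.NON_FLAT;
        return sizeBits <= CACHE_LINE_BITS        // non-atomic: full-flat up to
             ? Layout.FULL_FLAT                   // a size threshold
             : Layout.LOW_FLAT;
    }
}
```

Note how asking for atomicity "drops down a notch": the same 64-bit non-atomic value gets FULL_FLAT, but the atomic version only LOW_FLAT.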
This doesn't sound like a complex new trick, just another reason to fall back to a weaker flattening strategy for a given layout. We originally split B2 out from B3 to support no-good-default values (aka allow null), support atomicity and avoid tearing. Anything missing in that list? These are the moving parts. B3s are more akin to a hint than a promise. B2s too. A conformant VM could use pointers for everything, though might have to do extra null checks in things like checkcast. Many of the properties we want for B2 classes are possible because we adopted references (L carriers). If we shift towards guaranteed atomicity for (some) B3.vals, we're going to need to re-examine the VM model and look at how we represent these additional constraints so the VM can enforce them. Yes. I believe this is limited to the moral equivalent of an ACC_ATOMIC bit, plus some extra steps in the layout / field access instruction selection. The VM can provide some tearing-related guarantees for Qs without indirection but they are hardware dependent - 64bit for sure on all 64bit hardware, 128bit on some newer Intel hardware, possibly different constraints on still other platforms - but maybe that's OK? Declaring that a type must not tear makes it harder for the VM to provide better density. Yes. In this model, an atomic B3 is basically a B2 without the need for a null channel, so a slightly thinner B2. When the user asks for more atomicity, they are constraining flattening (but not necessarily throwing it all out the window, maybe low-flat will work for them.) All the hardware-dependent stuff lives in the layout algorithm; whether the current processor can support the desired atomics may be used to select between {low, no, full}-flat. The biggest concern I have with this approach is that instead of having 3 buckets, we're now exposing more of a buffet of options to users.
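The "null channel" idea that keeps coming up in this exchange can be sketched in plain Java. This is illustrative only: a real VM would pick its own encoding (possibly stealing slack pointer bits rather than spending a whole boolean field), and the names here (NullableSlot, present) are hypothetical.

```java
// Sketch of a nullable, flattened slot: zero-filled payload fields plus an
// extra "null channel" bit, as Remi describes for non-atomic but not
// zero-ok values. The fields' natural zero-filled state decodes to null.
final class NullableSlot {
    int x, y;        // payload, zero-filled by default (the VM's natural state)
    boolean present; // null channel: false means "this slot holds null"

    // Reading decodes null from the channel; null itself is never stored.
    int[] read() {
        return present ? new int[] { x, y } : null;
    }

    void write(int x, int y) {
        this.x = x;
        this.y = y;
        this.present = true; // slot becomes non-null once a value is stored
    }
}
```

This also shows why an atomic B3 is "a B2 without the need for a null channel": drop the extra bit and the zero-filled state simply becomes the legitimate default value.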
Circling back to where I started this email - good defaults are critical and so is good guidance on when to pick each of the options or performance cargo cults will undercut the work to split out the different cases. Yes. This is a subjective question, whether three equally spaced buckets feels more complex than two buckets with some knobs on the second bucket. We surely want to avoid overwhelming the user with too big a menu, but we already have one user (hi Remi!) clamoring for the full buffet, and he's hungry. On Apr 27, 2022, at 2:51 PM, Dan Heidinga > wrote: I'm trying to understand how this refactoring fits the VM physics. In particular, __non-atomic & __zero-ok fit together at the VM level because the VM's natural state for non-atomic (flattened) data is zero filled. When those two items are decoupled, I'm unclear on what the VM would offer in that case. Thoughts? How does "__non-atomic __non-id class B2a { }" fit with the "no new nulls" requirements? --Dan On Wed, Apr 27, 2022 at 12:45 PM Brian Goetz > wrote: Here's some considerations for stacking the user model. (Again, please let's resist the temptation to jump to the answer and then defend it.) We have a stacking today which says: - B1 is ordinary identity classes, giving rise to a single reference type - B2 are identity-free classes, giving rise to a single reference type - B3 are flattenable identity-free classes, giving rise to both a reference (L/ref) and primitive (Q/val) type. This stacking has some pleasant aspects. B2 differs from B1 by "only one bit": identity. The constraints on B2 are those that come from the lack of identity (mutability, extensibility, locking, etc.) B2 references behave like the object references we are familiar with; nullability, final field guarantees, etc. B3 further makes reference-ness optional; reference-free B3 values give up the affordances of references: they are zero-default and tearable.
This stacking is nice because it can be framed as a sequence of "give up some X, get some Y". People keep asking "do we need B2, or could we get away with B1/B3?". The main reason for having this distinction is that some id-free classes have no sensible default, and so want to use null as their default. This is a declaration-site property; B3 means that the zero value is reasonable, and use sites can opt into / out of zero-default / nullity. We'd love to compress away this bucket but forcing a zero on classes that can't give it a reasonable interpretation is problematic. But perhaps we can reduce the visibility of this in the model. The degrees of freedom we could conceivably offer are { identity or not, zero-capable or not, atomic or not } x { use-site, declaration-site } In actuality, not all of these boxes make sense (disavowing the identity of an ArrayList at the use site), and some have been disallowed by the stacking (some characteristics have been lumped.) Here's another way to stack the declaration: - Some classes can disavow identity - Identity-free classes can further opt into zero-default (currently, B3, polarity chosen at use site) - Identity-free classes can further opt into tearability (currently, B3, polarity chosen at use site) It might seem the sensible move here is to further split B3 into B3a and B3b (where all B3 support zero default, and a/b differ with regard to whether immediate values are tearable). But that may not be the ideal stacking, because we want good flattening for B2 (and B3.ref) also. Ideally, the difference between B2 and B3.val is nullity only (Kevin's antennae just went up.) So another possible restacking is to say that atomicity is something that has to be *opted out of* at the declaration site (and maybe also at the use site.)
With deliberately-wrong syntax: __non-id class B2 { } __non-atomic __non-id class B2a { } __zero-ok __non-id class B3 { } __non-atomic __zero-ok __non-id class B3a { } In this model, you can opt out of identity, and then you can further opt out of atomicity and/or null-default. This "pulls up" the atomicity/tearability to a property of the class (I'd prefer safe by default, with opt out), and makes zero-*capability* an opt-in property of the class. Then for those that have opted into zero-capability, at the use site, you can select .ref (null) / .val (zero). Obviously these all need better spellings. This model frames specific capabilities as modifiers on the main bucket, so it could be considered either a two bucket, or a four bucket model, depending on how you look. The author is in the best place to make the atomicity decision, since they know the integrity constraints. Single field classes, or classes with only single field invariants (denominator != 0), do not need atomicity. Classes with multi-field invariants do. This differs from the previous stacking in that it moves the spotlight from _references_ and their properties, to the properties themselves. It says to class writers: you should declare the ways in which you are willing to trade safety for performance; you can opt out of the requirement for references and nulls (saving some footprint) and atomicity (faster access). It says to class *users*, you can pick the combination of characteristics, allowed by the author, that meet your needs (can always choose null default if you want, just use a ref.) There are many choices here about "what are the defaults?". More opting in at the declaration site might mean less need to opt in at the use site. Or not. (We are now in the stage which I call "shake the box"; we've named all the moving parts, and now we're looking for the lowest-energy state we can get them into.)
From brian.goetz at oracle.com Wed Apr 27 23:15:08 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 27 Apr 2022 23:15:08 +0000 Subject: [External] : Re: User model stacking In-Reply-To: References: Message-ID: <560C92D3-ED77-4CB6-837A-A87FC6FC22D7@oracle.com> Let me try and put some more color on the bike shed (but, again, let's focus on model, not syntax, for now.) We have two axes of variation we want to express with non-identity classes: atomicity constraints, and whether there is an additional zero-default companion type. These can be mostly orthogonal; you can have either, neither, or both. We've been previously assuming that "primitiveness" lumps this all together; primitives get more flattening, primitives can be non-nullable/zero-default, primitives means the good name goes to the "val" type. Primitive-ness implicitly flips the "safety vs performance" priority, which has been bothering us because primitives also code like a class. So we were trying to claw back some atomicity for primitives. But also, we're a little unhappy with B2 because B2 comes with _more_ atomicity than is necessarily needed; a B2 with no invariants still gets less flattening than a B3. That's a little sad. And it also seems like a gratuitous difference, which makes the user model more complicated. So we're suggesting restacking towards: - Value classes are those without identity - Value classes can be atomic or non-atomic, the default is atomic (safe by default) - Value classes can further opt into having a "val" projection (name TBD, val is probably not it) - Val projections are non-nullable, zero-default -- this is the only difference - Both the ref and val projections inherit the atomicity constraints of the class, making atomicity mostly orthogonal to ref/val/zero/null Example: classic B2 value class B2a { } Because the default is atomic, we get the classic B2 semantics -- no identity, but full final field safety guarantees.
VM has several strategies for flattening in the heap: single-field classes always flattened ("full flat"), multi-field classes can be flattened with "fat load and store" heroics in the future ("low flat"), otherwise, indirection ("no flat") Example: non-atomic B2 non-atomic value class B2n { } Here, the user has said "I have no atomicity requirements." A B2n is a loose aggregation of fields that can be individually written and read (full B3-like flattening), with maybe an extra boolean field to encode null (VM's choice how to encode, could use slack pointer bits etc.) Example: atomic B3 zero-capable value class B3a { } This says I am declaring two types, B3a and B3a.zero. (The syntax in this quadrant sucks; need to find better.) B3a is just like B2a above, because we haven't activated the zero capability at the use site. B3a.zero/val/flat/whatever is non-nullable, zero-default, *but still has full B2-classic atomicity*. With the same set of flattening choices on the part of the VM. Example: full primitive non-atomic zero-capable value class B3n { } Here, B3n is like B2n, and B3n.zero is a full classic-B3 Q primitive with full flattening. So: - value-ness means "no identity, == means state equality" - You can add non-atomic to value-ness, meaning you give up state integrity - You can orthogonally add zero-capable to value-ness, meaning you get a non-null, zero-happy companion, which inherits the atomic-ness Some of the characteristics of this scheme: - The default is atomicity / integrity FOR ALL BUCKETS (safe by default) - The default is nullability FOR ALL BUCKETS - All unadorned type names are reference types / nullable - All Val-adorned type names (X.val) are non-nullable (or .zero, or .whatever) - Atomicity is determined by declaration site, can't be changed at use site The main syntactic hole is finding the right spelling for "zeroable" / .val. There is some chance we can get away with spelling it `T!`, though this has risks.
Spelling zero-happy as any form of "flat" is probably a bad idea, because B2 can still be flat. A possible spelling for "non-atomic" is "relaxed": relaxed value class B3n { } Boilerplate-measurers would point out that to get full flattening, you have to say three things at the declaration site and one extra thing at the use site: relaxed zero-happy value class Complex { } ... Complex! c; If you forget relaxed, you might get atomicity (but might not cost anything, if the value is small.) If you forget zero-happy, you can't say `Complex!`, you can only say Complex, and the compiler will remind you. If you forget the !, you maybe get some extra footprint for the null bit. None of these are too bad, but the verbosity police might want to issue a warning here. It is possible we might want to flip the declaration of zero-capable, where classes with no good default can opt OUT of the zero companion, rather than the other way around: null-default value class LocalDate { } which says that LocalDate must use the nullable (LocalDate) form, not the non-nullable (LocalDate.val/zero/bang) form. On 4/22/2022 2:24 PM, Brian Goetz wrote: I think I have a restack of Dan's idea that feels like fewer buckets. We have two axes of variation we want to express with flattenable types: atomicity constraints, and whether there is an additional zero-default companion type. We've been assuming that "primitiveness" lumps this all together; primitives get more flattening, primitives can be non-nullable/zero-default, primitives means the good name goes to the "val" type. Primitive-ness implicitly flips the "safety vs performance" priority, which is bothering us because primitives also code like a class. So we're trying to claw back some atomicity for primitives. But also, we're a little unhappy with B2 because B2 comes with _more_ atomicity than is necessarily needed; a B2 with no invariants still gets less flattening. That's a little sad. Let's restack the pieces (again).
- Value classes are those without identity - Value classes can be atomic or non-atomic, the default is atomic (safe) - Value classes can further opt into having a "val" projection (name TBD, val is probably not it) - Val projections are non-nullable, zero-default - Both the ref and val projections inherit the atomicity constraints of the class, making atomicity mostly orthogonal to ref/val/zero/null Example: classic B2 value class B2 { } Because the default is atomic, we get the classic B2 semantics -- no identity, but full final field safety guarantees. VM has several strategies for flattening in the heap: single-field classes always flattened, multi-field classes can be flattened with "fat load and store" heroics in the future, otherwise, indirection. Example: non-atomic B2 non-atomic value class B2a { } Here, the user has said "I have no atomicity requirements." A B2a is a loose aggregation of fields that can be individually written and read (full B3-like flattening), with maybe an extra boolean field to encode null (VM's choice how to encode.) Example: atomic B3 zero-capable value class B3a { } This says I am declaring two types, B3a and B3a.zero. (These names suck; need better ones.) B3a is just like B2 above. B3a.zero is non-nullable, zero-default, *but still has full B2-classic atomicity*. With the same set of flattening choices. Example: full primitive non-atomic zero-capable value class B3b { } Here, B3b is like B2a, and B3b.zero is a full classic-B3 Q primitive with full flattening. So the stacking is: - value-ness means "no identity, == means state equality" - You can add non-atomic to value-ness, meaning you give up state integrity - You can orthogonally add zero-capable to value-ness, meaning you get a non-null, zero-happy companion This is starting to feel more honest....
On 4/19/2022 6:45 PM, Brian Goetz wrote: By choosing to modify the class, we are implicitly splitting into Buckets 3a and 3n: - B2 gives up identity - B3a further gives up nullity - B3n further gives up atomicity Which opens us up to a new complaint: people didn't even like the B2/B3 split ("why does there have to be two"), and now there are three. Given that atomic/non-atomic only work with primitive, maybe there's a way to compress this further? On 4/19/2022 6:25 PM, Dan Smith wrote: On Apr 19, 2022, at 2:49 PM, Brian Goetz wrote: So, what shall we do when the user says non-atomic, but the constructor expresses a multi-field invariant? Lint warning, if we can detect it and that warning is turned on. On Apr 19, 2022, at 3:22 PM, Brian Goetz wrote: Stepping back, what you're saying is that we manage atomicity among a subset of fields by asking the user to arrange the related fields in a separate class, and give that class extra atomicity. If we wanted to express ColoredDiagonalPoint, in this model we'd say something like: non-atomic primitive ColoredDiagonalPoint { private DiagonalPoint p; private Color c; private atomic primitive DiagonalPoint { private int x, y; DiagonalPoint(int x, int y) { if (x != y) throw; ... } } } Right? Yep. Good illustration of how just providing a class modifier gives programmers significant fine-grained control. We exempt the single-field classes from having an opinion. We could also exempt primitive records with no constructor behavior. Yeah, but (1) hard to identify all assumed invariants -- some might appear in factories, etc., or informally in javadoc; and (2) even in a class with no invariants, it's probably useful for the author to explicitly acknowledge that they understand tearing risks. What it gives up (without either a change in programming model, or compiler heroics), is the ability to correlate between user-written invariants and the corresponding atomicity constraints, which could guide users away from errors. Right? Right.
Could still do that if we wanted, but my opinion is that it's too much language surface for the scale of the problem. If we did have additional construction constraints, I'd prefer that atomic primitives allow full imperative construction logic & encapsulation. This feels analogous to advanced typing analyses that might prove certain casts to be safe/unsafe. Sure, the language could try to be helpful by implementing that analysis, but it would add lots of complexity, and ultimately it's either a best-effort check or annoyingly restrictive. On Apr 27, 2022, at 2:51 PM, Dan Heidinga wrote: I'm trying to understand how this refactoring fits the VM physics. In particular, __non-atomic & __zero-ok fit together at the VM level because the VM's natural state for non-atomic (flattened) data is zero filled. When those two items are decoupled, I'm unclear on what the VM would offer in that case. Thoughts? How does "__non-atomic __non-id class B2a { }" fit with the "no new nulls" requirements? --Dan On Wed, Apr 27, 2022 at 12:45 PM Brian Goetz wrote: Here's some considerations for stacking the user model. (Again, please let's resist the temptation to jump to the answer and then defend it.) We have a stacking today which says: - B1 is ordinary identity classes, giving rise to a single reference type - B2 are identity-free classes, giving rise to a single reference type - B3 are flattenable identity-free classes, giving rise to both a reference (L/ref) and primitive (Q/val) type. This stacking has some pleasant aspects. B2 differs from B1 by "only one bit": identity. The constraints on B2 are those that come from the lack of identity (mutability, extensibility, locking, etc.) B2 references behave like the object references we are familiar with; nullability, final field guarantees, etc. B3 further makes reference-ness optional; reference-free B3 values give up the affordances of references: they are zero-default and tearable.
This stacking is nice because it can be framed as a sequence of "give up some X, get some Y". People keep asking "do we need B2, or could we get away with B1/B3?". The main reason for having this distinction is that some id-free classes have no sensible default, and so want to use null as their default. This is a declaration-site property; B3 means that the zero value is reasonable, and use sites can opt into / out of zero-default / nullity. We'd love to compress away this bucket but forcing a zero on classes that can't give it a reasonable interpretation is problematic. But perhaps we can reduce the visibility of this in the model. The degrees of freedom we could conceivably offer are { identity or not, zero-capable or not, atomic or not } x { use-site, declaration-site } In actuality, not all of these boxes make sense (disavowing the identity of an ArrayList at the use site), and some have been disallowed by the stacking (some characteristics have been lumped.) Here's another way to stack the declaration: - Some classes can disavow identity - Identity-free classes can further opt into zero-default (currently, B3, polarity chosen at use site) - Identity-free classes can further opt into tearability (currently, B3, polarity chosen at use site) It might seem the sensible move here is to further split B3 into B3a and B3b (where all B3 support zero default, and a/b differ with regard to whether immediate values are tearable). But that may not be the ideal stacking, because we want good flattening for B2 (and B3.ref) also. Ideally, the difference between B2 and B3.val is nullity only (Kevin's antennae just went up.) So another possible restacking is to say that atomicity is something that has to be *opted out of* at the declaration site (and maybe also at the use site.)
With deliberately-wrong syntax: __non-id class B2 { } __non-atomic __non-id class B2a { } __zero-ok __non-id class B3 { } __non-atomic __zero-ok __non-id class B3a { } In this model, you can opt out of identity, and then you can further opt out of atomicity and/or null-default. This "pulls up" the atomicity/tearability to a property of the class (I'd prefer safe by default, with opt out), and makes zero-*capability* an opt-in property of the class. Then for those that have opted into zero-capability, at the use site, you can select .ref (null) / .val (zero). Obviously these all need better spellings. This model frames specific capabilities as modifiers on the main bucket, so it could be considered either a two bucket, or a four bucket model, depending on how you look. The author is in the best place to make the atomicity decision, since they know the integrity constraints. Single field classes, or classes with only single field invariants (denominator != 0), do not need atomicity. Classes with multi-field invariants do. This differs from the previous stacking in that it moves the spotlight from _references_ and their properties, to the properties themselves. It says to class writers: you should declare the ways in which you are willing to trade safety for performance; you can opt out of the requirement for references and nulls (saving some footprint) and atomicity (faster access). It says to class *users*, you can pick the combination of characteristics, allowed by the author, that meet your needs (can always choose null default if you want, just use a ref.) There are many choices here about "what are the defaults?". More opting in at the declaration site might mean less need to opt in at the use site. Or not. (We are now in the stage which I call "shake the box"; we've named all the moving parts, and now we're looking for the lowest-energy state we can get them into.)
From brian.goetz at oracle.com Wed Apr 27 23:17:42 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 27 Apr 2022 23:17:42 +0000 Subject: [External] : Re: User model stacking In-Reply-To: <560C92D3-ED77-4CB6-837A-A87FC6FC22D7@oracle.com> References: <560C92D3-ED77-4CB6-837A-A87FC6FC22D7@oracle.com> Message-ID: <87261EA5-09A7-48B0-A8D8-58FB5FA3D5E8@oracle.com> (somehow two versions of this got sent, along with some cut and paste from another thread; please disregard whatever looks weird.) > On Apr 27, 2022, at 5:50 PM, Brian Goetz wrote: > > Let me try and put some more color on the bike shed (but, again, let's focus on model, not syntax, for now.) > > We have two axes of variation we want to express with non-identity classes: atomicity constraints, and whether there is an additional zero-default companion type. These can be mostly orthogonal; you can have either, neither, or both. We've been previously assuming that "primitiveness" lumps this all together; primitives get more flattening, primitives can be non-nullable/zero-default, primitives means the good name goes to the "val" type. Primitive-ness implicitly flips the "safety vs performance" priority, which has been bothering us because primitives also code like a class. So we were trying to claw back some atomicity for primitives. > > But also, we're a little unhappy with B2 because B2 comes with _more_ atomicity than is necessarily needed; a B2 with no invariants still gets less flattening than a B3. That's a little sad. And it also seems like a gratuitous difference, which makes the user model more complicated. So we're suggesting restacking towards: > > - Value classes are those without identity > - Value classes can be atomic or non-atomic, the default is atomic (safe by default) > - Value classes can further opt into having a "val" projection (name TBD, val is probably not it) > - Val projections are non-nullable, zero-default --
this is the only difference > - Both the ref and val projections inherit the atomicity constraints of the class, making atomicity mostly orthogonal to ref/val/zero/null > > Example: classic B2 > > value class B2a { } > > Because the default is atomic, we get the classic B2 semantics -- no identity, but full final field safety guarantees. VM has several strategies for flattening in the heap: single-field classes always flattened ("full flat"), multi-field classes can be flattened with "fat load and store" heroics in the future ("low flat"), otherwise, indirection ("no flat") > > Example: non-atomic B2 > > non-atomic value class B2n { } > > Here, the user has said "I have no atomicity requirements." A B2n is a loose aggregation of fields that can be individually written and read (full B3-like flattening), with maybe an extra boolean field to encode null (VM's choice how to encode, could use slack pointer bits etc.) > > Example: atomic B3 > > zero-capable value class B3a { } > > This says I am declaring two types, B3a and B3a.zero. (The syntax in this quadrant sucks; need to find better.) B3a is just like B2a above, because we haven't activated the zero capability at the use site. B3a.zero/val/flat/whatever is non-nullable, zero-default, *but still has full B2-classic atomicity*. With the same set of flattening choices on the part of the VM. > > Example: full primitive > > non-atomic zero-capable value class B3n { } > > Here, B3n is like B2n, and B3n.zero is a full classic-B3 Q primitive with full flattening.
> > So: > > - value-ness means "no identity, == means state equality" > - You can add non-atomic to value-ness, meaning you give up state integrity > - You can orthogonally add zero-capable to value-ness, meaning you get a non-null, zero-happy companion, which inherits the atomic-ness > > Some of the characteristics of this scheme: > > - The default is atomicity / integrity FOR ALL BUCKETS (safe by default) > - The default is nullability FOR ALL BUCKETS > - All unadorned type names are reference types / nullable > - All Val-adorned type names (X.val) are non-nullable (or .zero, or .whatever) > - Atomicity is determined by declaration site, can't be changed at use site > > The main syntactic hole is finding the right spelling for "zeroable" / .val. There is some chance we can get away with spelling it `T!`, though this has risks. > > Spelling zero-happy as any form of "flat" is probably a bad idea, because B2 can still be flat. > > A possible spelling for "non-atomic" is "relaxed": > > relaxed value class B3n { } > > Boilerplate-measurers would point out that to get full flattening, you have to say three things at the declaration site and one extra thing at the use site: > > relaxed zero-happy value class Complex { } > ... > Complex! c; > > If you forget relaxed, you might get atomicity (but it might not cost anything, if the value is small.) If you forget zero-happy, you can't say `Complex!`, you can only say Complex, and the compiler will remind you. If you forget the !, you maybe get some extra footprint for the null bit. None of these are too bad, but the verbosity police might want to issue a warning here. > > It is possible we might want to flip the declaration of zero-capable, where classes with no good default can opt OUT of the zero companion, rather than the other way around: > > null-default value class LocalDate { } > > which says that LocalDate must use the nullable (LocalDate) form, not the non-nullable (LocalDate.val/zero/bang) form.
> > > On 4/22/2022 2:24 PM, Brian Goetz wrote: > I think I have a restack of Dan's idea that feels like fewer buckets. > > We have two axes of variation we want to express with flattenable types: atomicity constraints, and whether there is an additional zero-default companion type. > > We've been assuming that "primitiveness" lumps this all together; primitives get more flattening, primitives can be non-nullable/zero-default, primitives means the good name goes to the "val" type. Primitive-ness implicitly flips the "safety vs performance" priority, which is bothering us because primitives also code like a class. So we're trying to claw back some atomicity for primitives. > > But also, we're a little unhappy with B2 because B2 comes with _more_ atomicity than is necessarily needed; a B2 with no invariants still gets less flattening. That's a little sad. Let's restack the pieces (again). > > - Value classes are those without identity > - Value classes can be atomic or non-atomic, the default is atomic (safe) > - Value classes can further opt into having a "val" projection (name TBD, val is probably not it) > - Val projections are non-nullable, zero-default > - Both the ref and val projections inherit the atomicity constraints of the class, making atomicity mostly orthogonal to ref/val/zero/null > > Example: classic B2 > > value class B2 { } > > Because the default is atomic, we get the classic B2 semantics -- no identity, but full final field safety guarantees. VM has several strategies for flattening in the heap: single-field classes always flattened, multi-field classes can be flattened with "fat load and store" heroics in the future, otherwise, indirection. > > Example: non-atomic B2 > > non-atomic value class B2a { } > > Here, the user has said "I have no atomicity requirements." A B2a is a loose aggregation of fields that can be individually written and read (full B3-like flattening), with maybe an extra boolean field to encode null (VM's choice how to encode.)
> > Example: atomic B3 > > zero-capable value class B3a { } > > This says I am declaring two types, B3a and B3a.zero. (These names suck; need better ones.) B3a is just like B2 above. B3a.zero is non-nullable, zero-default, *but still has full B2-classic atomicity*. With the same set of flattening choices. > > Example: full primitive > > non-atomic zero-capable value class B3b { } > > Here, B3b is like B2a, and B3b.zero is a full classic-B3 Q primitive with full flattening. > > > So the stacking is: > > - value-ness means "no identity, == means state equality" > - You can add non-atomic to value-ness, meaning you give up state integrity > - You can orthogonally add zero-capable to value-ness, meaning you get a non-null, zero-happy companion > > This is starting to feel more honest.... > > > > > > On 4/19/2022 6:45 PM, Brian Goetz wrote: > By choosing to modify the class, we are implicitly splitting into Buckets 3a and 3n: > > - B2 gives up identity > - B3a further gives up nullity > - B3n further gives up atomicity > > Which opens us up to a new complaint: people didn't even like the B2/B3 split ("why does there have to be two"), and now there are three. > > Given that atomic/non-atomic only work with primitive, maybe there's a way to compress this further? > > On 4/19/2022 6:25 PM, Dan Smith wrote: > On Apr 19, 2022, at 2:49 PM, Brian Goetz > wrote: > > So, what shall we do when the user says non-atomic, but the constructor expresses a multi-field invariant? > > Lint warning, if we can detect it and that warning is turned on. > > > On Apr 19, 2022, at 3:22 PM, Brian Goetz > wrote: > > Stepping back, what you're saying is that we manage atomicity among a subset of fields by asking the user to arrange the related fields in a separate class, and give that class extra atomicity. 
If we wanted to express ColoredDiagonalPoint, in this model we'd say something like:
>
> non-atomic primitive ColoredDiagonalPoint {
>     private DiagonalPoint p;
>     private Color c;
>
>     private atomic primitive DiagonalPoint {
>         private int x, y;
>
>         DiagonalPoint(int x, int y) {
>             if (x != y) throw;
>             ...
>         }
>     }
> }
>
> Right?
>
> Yep. Good illustration of how just providing a class modifier gives programmers significant fine-grained control. > > > We exempt the single-field classes from having an opinion. We could also exempt primitive records with no constructor behavior. > > Yeah, but (1) hard to identify all assumed invariants -- some might appear in factories, etc., or informally in javadoc; and (2) even in a class with no invariants, it's probably useful for the author to explicitly acknowledge that they understand tearing risks. > > > What it gives up (without either a change in programming model, or compiler heroics), is the ability to correlate between user-written invariants and the corresponding atomicity constraints, which could guide users away from errors. Right? > > Right. Could still do that if we wanted, but my opinion is that it's too much language surface for the scale of the problem. If we did have additional construction constraints, I'd prefer that atomic primitives allow full imperative construction logic & encapsulation. > > This feels analogous to advanced typing analyses that might prove certain casts to be safe/unsafe. Sure, the language could try to be helpful by implementing that analysis, but it would add lots of complexity, and ultimately it's either a best-effort check or annoyingly restrictive. > >> On Apr 27, 2022, at 2:51 PM, Dan Heidinga wrote: >> >> I'm trying to understand how this refactoring fits the VM physics. >> >> In particular, __non-atomic & __zero-ok fit together at the VM level >> because the VM's natural state for non-atomic (flattened) data is zero >> filled.
When those two items are decoupled, I'm unclear on what the >> VM would offer in that case. Thoughts? >> >> How does "__non-atomic __non-id class B2a { }" fit with the "no new >> nulls" requirements? >> >> --Dan >> >> On Wed, Apr 27, 2022 at 12:45 PM Brian Goetz wrote: >>> >>> Here's some considerations for stacking the user model. (Again, please let's resist the temptation to jump to the answer and then defend it.) >>> >>> We have a stacking today which says: >>> >>> - B1 is ordinary identity classes, giving rise to a single reference type >>> - B2 are identity-free classes, giving rise to a single reference type >>> - B3 are flattenable identity-free classes, giving rise to both a reference (L/ref) and primitive (Q/val) type. >>> >>> This stacking has some pleasant aspects. B2 differs from B1 by "only one bit": identity. The constraints on B2 are those that come from the lack of identity (mutability, extensibility, locking, etc.) B2 references behave like the object references we are familiar with; nullability, final field guarantees, etc. B3 further makes reference-ness optional; reference-free B3 values give up the affordances of references: they are zero-default and tearable. This stacking is nice because it can be framed as a sequence of "give up some X, get some Y". >>> >>> People keep asking "do we need B2, or could we get away with B1/B3?" The main reason for having this distinction is that some id-free classes have no sensible default, and so want to use null as their default. This is a declaration-site property; B3 means that the zero value is reasonable, and use sites can opt into / out of zero-default / nullity. We'd love to compress away this bucket, but forcing a zero on classes that can't give it a reasonable interpretation is problematic. But perhaps we can reduce the visibility of this in the model.
>>> >>> The degrees of freedom we could conceivably offer are >>> >>> { identity or not, zero-capable or not, atomic or not } x { use-site, declaration-site } >>> >>> In actuality, not all of these boxes make sense (disavowing the identity of an ArrayList at the use site), and some have been disallowed by the stacking (some characteristics have been lumped.) Here's another way to stack the declaration: >>> >>> - Some classes can disavow identity >>> - Identity-free classes can further opt into zero-default (currently, B3, polarity chosen at use site) >>> - Identity-free classes can further opt into tearability (currently, B3, polarity chosen at use site) >>> >>> It might seem the sensible move here is to further split B3 into B3a and B3b (where all B3 support zero default, and a/b differ with regard to whether immediate values are tearable). But that may not be the ideal stacking, because we want good flattening for B2 (and B3.ref) also. Ideally, the difference between B2 and B3.val is nullity only (Kevin's antennae just went up.) >>> >>> So another possible restacking is to say that atomicity is something that has to be *opted out of* at the declaration site (and maybe also at the use site.) With deliberately-wrong syntax: >>> >>> __non-id class B2 { } >>> >>> __non-atomic __non-id class B2a { } >>> >>> __zero-ok __non-id class B3 { } >>> >>> __non-atomic __zero-ok __non-id class B3a { } >>> >>> In this model, you can opt out of identity, and then you can further opt out of atomicity and/or null-default. This "pulls up" the atomicity/tearability to a property of the class (I'd prefer safe by default, with opt out), and makes zero-*capability* an opt-in property of the class. Then for those that have opted into zero-capability, at the use site, you can select .ref (null) / .val (zero). Obviously these all need better spellings.
This model frames specific capabilities as modifiers on the main bucket, so it could be considered either a two bucket, or a four bucket model, depending on how you look. >>> >>> The author is in the best place to make the atomicity decision, since they know the integrity constraints. Single field classes, or classes with only single field invariants (denominator != 0), do not need atomicity. Classes with multi-field invariants do. >>> >>> This differs from the previous stacking in that it moves the spotlight from _references_ and their properties, to the properties themselves. It says to class writers: you should declare the ways in which you are willing to trade safety for performance; you can opt out of the requirement for references and nulls (saving some footprint) and atomicity (faster access). It says to class *users*, you can pick the combination of characteristics, allowed by the author, that meet your needs (can always choose null default if you want, just use a ref.) >>> >>> There are many choices here about "what are the defaults". More opting in at the declaration site might mean less need to opt in at the use site. Or not. >>> >>> (We are now in the stage which I call "shake the box"; we've named all the moving parts, and now we're looking for the lowest-energy state we can get them into.) >>> >> > From kevinb at google.com Thu Apr 28 01:36:23 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 27 Apr 2022 21:36:23 -0400 Subject: User model stacking In-Reply-To: References: Message-ID: This is kinda reading as...

* First we have 3 buckets
* But people ask if there could be just 2 buckets
* No, so let's have 5 buckets.

I don't understand why this is happening, but I take it back! I take back what I said about 2 buckets! I'm not sure what problem this is needed to solve. ~~ By the way, as we talk about this zero problem, these are the example cases that go through my head: (Type R) e.g.
Rational, EmployeeId: the default value is illegal; can't even construct it on purpose. Every method on it *should* call `checkValid()` first. Might as well repurpose it as a pseudo-null. Bugs could be prevented by some analogue of aftermarket nullness analysis. (Type I) e.g. Instant: the default value is legal, but it's a bad default value (while moderately guessable, it's arbitrary/meaningless). This makes the strongest case for being reference-only. Or it has to add a `boolean isValid` field (always set to true) to join Type R above. (Type C) e.g. Complex: the default value is a decent choice -- guessable, but probably not the identity of the *most* common reduction op (which I would guess is multiplication). (Type O) e.g. Optional, OptionalInt, UnsignedLong: the default value is the best possible kind -- guessable, and the identity of the presumably most common reduction operation. For type I, we would probably ban nonempty array instance creation expressions! This would force the arrays to be created by `Collection.toArray()` or by new alternative value-capable versions of `Arrays.fill()` and `Arrays.setAll()` which accept a size instead of a premade array. Actually, if the new Arrays.fill() could short-circuit when passed `TheType.default` then we might want to do this for types C and O too; why not make users be explicit. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From kevinb at google.com Thu Apr 28 01:38:35 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 27 Apr 2022 21:38:35 -0400 Subject: User model stacking In-Reply-To: References: Message-ID: On Wed, Apr 27, 2022 at 9:36 PM Kevin Bourrillion wrote: (Type R) e.g. Rational, EmployeeId: the default value is illegal; can't > even construct it on purpose. Every method on it *should* call > `checkValid()` first. Might as well repurpose it as a pseudo-null. Bugs > could be prevented by some analogue of aftermarket nullness analysis. 
> This is me admitting defeat on the rule *I've* meant by "no new nulls". Not sure if it's how others have used it too. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From asviraspossible at gmail.com Thu Apr 28 10:30:06 2022 From: asviraspossible at gmail.com (Victor Nazarov) Date: Thu, 28 Apr 2022 12:30:06 +0200 Subject: [External] : Re: User model stacking In-Reply-To: <3C0FB3E1-CCA3-4A1D-94F1-8172CE36E575@oracle.com> References: <3C0FB3E1-CCA3-4A1D-94F1-8172CE36E575@oracle.com> Message-ID: Hello Brian, I've got some comments on the current discussion. I hope that they may be helpful. On Thu, Apr 28, 2022 at 1:13 AM Brian Goetz wrote: > Some of the characteristics of this scheme: > > - The default is atomicity / integrity FOR ALL BUCKETS (safe by default) > - The default is nullability FOR ALL BUCKETS > - All unadorned type names are reference types / nullable > - All Val-adorned type names (X.val) are non-nullable (or .zero, or > .whatever) > - Atomicity is determined by declaration site, can't be changed at use site > > Why do we need a use-site opt-in to zero-happiness? I thought that the main concern with zero-happiness was a risk of tearing, but in the proposed model tearing is always controlled by the declaration-site annotation, so why do we need use-site opt-in? For me "zero-happiness" of the type means that the type is happy to be zero, which sounds like zero is a better default from the type point of view than null. If a user knows better about a particular use-site, then it's ok to override this with ".ref". We already have primitives that have zero-default values (int, long, ...), they are not written as "int.val" or "int!", are we going to recommend everybody to write "Integer!" instead of "int"? > The main syntactic hole is finding the right spelling for "zeroable" / > .val. There is some chance we can get away with spelling it `T!`, though > this has risks. > > Spelling zero-happy as any form of "flat"
is probably a bad idea, because > B2 can still be flat. > Syntax-wise, I think I would support something that Remi Forax promoted for quite some time: highlighting that zero-happy classes have a default constructor, something like:

relaxed value class Complex {
    // Default value of type Complex is one generated by default constructor of class Complex
    default Complex();
}

From forax at univ-mlv.fr Thu Apr 28 13:09:38 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 28 Apr 2022 15:09:38 +0200 (CEST) Subject: [External] : Re: User model stacking In-Reply-To: <560C92D3-ED77-4CB6-837A-A87FC6FC22D7@oracle.com> References: <560C92D3-ED77-4CB6-837A-A87FC6FC22D7@oracle.com> Message-ID: <1818860660.18081533.1651151378274.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "Dan Heidinga" > Cc: "valhalla-spec-experts" > Sent: Thursday, April 28, 2022 1:15:08 AM > Subject: Re: [External] : Re: User model stacking > Let me try and put some more color on the bike shed (but, again, let's focus on > model, not syntax, for now.) > We have two axes of variation we want to express with non-identity classes: > atomicity constraints, and whether there is an additional zero-default > companion type. These can be mostly orthogonal; you can have either, neither, > or both. We've been previously assuming that "primitiveness" lumps this all > together; primitives get more flattening, primitives can be > non-nullable/zero-default, primitives means the good name goes to the "val" > type. Primitive-ness implicitly flips the "safety vs performance" priority, > which has been bothering us because primitives also code like a class. So we > were trying to claw back some atomicity for primitives. > But also, we're a little unhappy with B2 because B2 comes with _more_ atomicity > than is necessarily needed; a B2 with no invariants still gets less flattening > than a B3. That's a little sad. And it also seems like a gratuitous > difference, which makes the user model more complicated.
So we're suggesting > restacking towards: > - Value classes are those without identity > - Value classes can be atomic or non-atomic, the default is atomic (safe by > default) > - Value classes can further opt into having a "val" projection (name TBD, val is > probably not it) > - Val projections are non-nullable, zero-default -- this is the only difference > - Both the ref and val projections inherit the atomicity constraints of the > class, making atomicity mostly orthogonal to ref/val/zero/null Now that the model is clearer, let's try to discuss the val projection. Once we have universal generics, we will have an issue with zero-default value types: there are a lot of APIs in the JDK that explicitly specify that they return/pass null as a parameter, for example Map.get(); for those calls, we need a way to say that the type is not T but T | null. The current proposal is to use T.ref for that. Now, Kevin and Brian think that for a zero-default value type, in the language, Complex.val should be used instead of Complex. Let's see how it goes. 1/ There is a difference between Foo and Foo.ref for generics, Foo is a class while Foo.ref is a type. The idea of using Complex.val means that the relationship is reversed, Complex is the type and Complex.val is the class. If we ride with that horse, it means that in universal generics, we should not use T but T.val, except when we want T.val | null, which can be spelled T. 2/ Because Complex.val is a class and Complex is a type, we have a weird asymmetry: users will declare a class Complex, but to create a Complex, they will have to use new Complex.val(). As a user this is weird. 3/ This may change but currently, Foo.class exists but Foo.ref.class is not allowed, you have to use a method to get the projection, something like Foo.class.getMeTheProjectionPlease(). With .val being the default, it means that Complex.val.class exists while Complex.class does not.
Same to get the default value: Complex.class.getDefaultValue() will not compile, it should be Complex.val.class.getDefaultValue(). Again weird. 4/ It's a double opt-in, people have to opt-in at the declaration site by asking for a zero-default value type but that is not enough, it only works if the val type is used at the use site. I don't know any feature in Java that requires a double opt-in. 5/ It's easy to forget a ".val". To work, people will have to pepper .val everywhere and it will be easy to miss one occurrence. Depending on where the ".val" is missed, performance will suffer. This is something I see when beginners dabble with generics for the first time: the compiler emits a warning because they miss one pair of <> somewhere in the code. To avoid missing one, compilers/IDEs will try to rely on null analysis to emit a warning when Complex.val should be used instead of Complex. Again, the relationship is seen in the wrong direction; with .ref, you get performance by default, the compiler does not compile if you try to return/store null, so by design the compiler helps you to write the right code. 6/ Zero-default value type does not imply non-atomic anymore, so a zero-default value type is not more dangerous than a null-default value type anymore. regards, Rémi From brian.goetz at oracle.com Thu Apr 28 13:43:59 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 28 Apr 2022 13:43:59 +0000 Subject: [External] : Re: User model stacking In-Reply-To: <1818860660.18081533.1651151378274.JavaMail.zimbra@u-pem.fr> References: <560C92D3-ED77-4CB6-837A-A87FC6FC22D7@oracle.com> <1818860660.18081533.1651151378274.JavaMail.zimbra@u-pem.fr> Message-ID: <9916411A-CF1E-4BA8-BFC8-7694A4862DBF@oracle.com> On Apr 28, 2022, at 9:09 AM, Remi Forax > wrote: 1/ There is a difference between Foo and Foo.ref for generics, Foo is a class while Foo.ref is a type. The idea of using Complex.val means that the relationship is reversed, Complex is the type and Complex.val is the class.
This is simply wrong. Classes and types are separate things; declaring classes gives rise to types. The class is Complex, in both cases. In both cases, it gives rise to two types, which might be denoted Complex.ref and Complex.val, or one of which might also be called Complex. But just because they have the same name, doesn't change what they are. Classes != types. The class is Complex. Always. I think most of the other points depend on this misconception, but I'll skip to ... 4/ It's a double opt-in, people have to opt-in at the declaration site by asking for a zero-default value type but that is not enough, it only works if the val type is used at the use site. I don't know any feature in Java that requires a double opt-in. To say "it doesn't work" is like saying "I have a sink with hot and cold taps, I turned on the cold tap, and no hot water came out, this sink doesn't work." The declaration site enables the existence of the hot water tap; whether you turn it or not is your choice. I think what 4/5 are trying to get at is "it feels like too much ceremony to have to say something at both the declaration and use sites in order to get full-flat vals." This is a valid opinion! But it's also pretty obvious that this is a potential concern, so I'm not sure what you're getting at by raising it in this way (supported by the preceding misguided arguments?) 5/ It's easy to forget a ".val". To work, people will have to pepper .val everywhere and it will be easy to miss one occurrence. Depending on where the ".val" is missed, performance will suffer. I think this is your whole point: "people will have to type .val a lot, and they might miss one", right? This is exactly the sort of argument I was talking about by "let's not try to jump to the end and design the final syntax immediately." As should be clear, a lot of thought has gone into teasing out the elements of the model; give yourself some time to internalize them.
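The class/type distinction being argued here has a rough precedent in today's int/Integer pair: one logical numeric value reachable through a nullable reference type and a zero-default non-reference type. This is an analogy only (int and Integer really are two distinct classes today, unlike the proposed single class Complex with two projections), but it previews the default-value difference between a ref and a val projection:

```java
// Analogy only, not the proposed Valhalla mechanism: today's int/Integer
// pair previews the ref/val split. The reference flavor defaults to null;
// the value flavor defaults to zero.
class RefValToday {
    static Integer boxedDefault;  // reference type: implicitly null
    static int primitiveDefault;  // value type: implicitly 0

    public static void main(String[] args) {
        System.out.println(boxedDefault);     // null
        System.out.println(primitiveDefault); // 0
    }
}
```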
From kevinb at google.com Thu Apr 28 13:51:07 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 28 Apr 2022 09:51:07 -0400 Subject: [External] : Re: User model stacking In-Reply-To: <1818860660.18081533.1651151378274.JavaMail.zimbra@u-pem.fr> References: <560C92D3-ED77-4CB6-837A-A87FC6FC22D7@oracle.com> <1818860660.18081533.1651151378274.JavaMail.zimbra@u-pem.fr> Message-ID: On Thu, Apr 28, 2022 at 9:09 AM Remi Forax wrote: So we're suggesting restacking towards: > > - Value classes are those without identity > - Value classes can be atomic or non-atomic, the default is atomic (safe > by default) > - Value classes can further opt into having a "val" projection (name TBD, > val is probably not it) > - Val projections are non-nullable, zero-default -- this is the only > difference > - Both the ref and val projections inherit the atomicity constraints of > the class, making atomicity mostly orthogonal to ref/val/zero/null > > > Now that the model is clearer, let's try to discuss the val > projection. > (For the record, I don't think the messages of the last 48 hours have made the model "clearer", just floated a lot of possibilities.) But I do want to say I appreciate you providing all these opposing arguments to my proposal (which I asked for!). I'm going to engage with your specific arguments, but I don't recall if you ever engaged properly with all of mine. I feel like if you took them into account also, your overall position might be more balanced? In particular, it is a *huge* simplification to be able to say that every class does the exact same thing, and some just do extra. Once we have universal generics, we will have an issue with zero-default > value types: there are a lot of APIs in the JDK that explicitly specify > that they return/pass null as a parameter, > for example Map.get(); for those calls, we need a way to say that the type > is not T but T | null. > The current proposal is to use T.ref for that.
> Yes, for comparison, in the JSpecify nullness project we've found we can't avoid needing to support type projections in both directions for type variables. In this context, for now we can just call those `T.val` and `T.ref`. I'll note, though, that there will always be some methods that were designed in an older world that won't be a super fantastic experience to use anymore; many `Map.get()` users will feel compelled to switch to `Map.getOrDefault()`, and I think we'll have to be okay with some of that. Now, Kevin and Brian thinks that for zero-default value type, in the > language, Complex.val should be used instead of Complex. > Lets see how it goes > 1/ There is a difference between Foo and Foo.ref for generics, Foo is a > class while Foo.ref is a type. > The idea of using Complex.val means that the relationship is reversed, > Complex is the type and Complex.val is the class. > Not how I would put it, no. In the world of classes, there is only `Complex`. In the world of types, there is the type you're used to getting, `Complex`. And there is a second type `Complex.val`. The main trouble is that Java developers are not 100% comfortable with / accustomed to thinking about the difference between classes and types. I think they get it more than they *think* they do, but they wouldn't be able to explain. java.lang.Class will confuse some people. There will be both a `Complex.class` and a `Complex.val.class`. I'm currently thinking it should work similarly to the difference between `Complex.class` and `Complex[].class`: one actually represents *the class*, which gets loaded and initialized; the other is a special type that gets composed out of the first one. You can navigate between the two. 
We have no precedent for two `Class` instances that represent the exact same class, but there are three different precedents for there being "extra" `Class` instances beyond just one-per-class: `String[].class`, `int.class` -- and even `void.class`, which has nothing to do with any class *or* any type. If we ride with that horse, it means that in universal generics, we > should not use T but T.val, except when we want T.val | null, which can be > spelled T. > I'm not following, but again I think I'm naively assuming a type variable might need to be projected in either direction. 2/ Because Complex.val is a class and Complex is a type, we have a weird > asymmetry: > users will declare a class Complex, but to create a Complex, they will > have to use new Complex.val(). > As a user this is weird. > The class name in a CICE isn't a type usage, just a class name. It should always be just `new Complex()`. That should produce a value of type `Complex.val` so that it can be trivially assigned to either kind of variable. 3/ This may change, but currently Foo.class exists while Foo.ref.class is > not allowed; you have to use a method to get the projection, > something like Foo.class.getMeTheProjectionPlease(). > With .val being the default, it means that Complex.val.class exists > while Complex.class does not. > Same to get the default value: Complex.class.getDefaultValue() will > not compile; it should be Complex.val.class.getDefaultValue(). > Again weird. > But *why* is it weird? > 4/ It's a double opt-in: people have to opt in at the declaration site by > asking for a zero-default value type, but that is not enough; > it only works if the val type is used at the use site. I don't know any > feature in Java that requires a double opt-in. > You have to opt into a class being subclassable, then you have to opt into subclassing it. There's tons of examples. I'm not sure that's a good framing anyway. The use-site doesn't really opt in or out. The class just opts in to generating two types.
Now there are two types and clients use those types however they want. > 5/ It's easy to forget a ".val". To work, people will have to pepper .val > everywhere and it will be easy to miss one occurrence. > Depending on where the ".val" is missed, performance will suffer. > People can come back and purchase that better performance for the price of dealing with the safety hazards. imho, this is exactly as it should be. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From maurizio.cimadamore at oracle.com Thu Apr 28 14:02:44 2022 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Thu, 28 Apr 2022 15:02:44 +0100 Subject: [External] : Re: User model stacking In-Reply-To: References: <560C92D3-ED77-4CB6-837A-A87FC6FC22D7@oracle.com> <1818860660.18081533.1651151378274.JavaMail.zimbra@u-pem.fr> Message-ID: On 28/04/2022 14:51, Kevin Bourrillion wrote: > > If we ride with that horse, it means that in universal generics, > we should not use T but T.val, except when we want T.val | null, which > can be spelled T. > > > I'm not following, but again I think I'm naively assuming a type > variable might need to be projected in either direction. In my opinion, this represents a change that is "less syntactic" than it looks. Saying "T.ref" means "give me the reference projection of T". Reference projections are defined for both reference classes (String.ref = String) and for value classes (of course!). By analogy, saying "T.val" means "give me the value projection of T". But here we have an issue: while value classes do have value projections (again, of course!), reference classes do not. This seems to be at odds with the "for all" semantics of type variables. What we need to make "for all" work is a function that gives us the value projection _if it exists_, or the type unchanged otherwise.
T.valOrT Naming aside (this is a bikeshed that will be painted later), one thing stands: this is no longer a "projection"; it feels like a more ad-hoc type mapping (and one which might be a bit hard to explain). Maurizio From brian.goetz at oracle.com Thu Apr 28 14:13:00 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 28 Apr 2022 14:13:00 +0000 Subject: [External] : Re: User model stacking In-Reply-To: <560C92D3-ED77-4CB6-837A-A87FC6FC22D7@oracle.com> References: <560C92D3-ED77-4CB6-837A-A87FC6FC22D7@oracle.com> Message-ID: <0FDEFA76-E212-4636-9E64-A603F703D0A5@oracle.com> I threw a lot of ideas out there, and people have reacted to various corners of them. That's fair; it's a lot to take in at once. Let me focus on what I think the most important bit of what I wrote is, which is, where we get permission for relaxation of atomicity. Until now, we've been treating atomicity as a property of ref-ness, because the JMM requires that loads and stores of refs be atomic wrt each other. But I think this is (a) too much of a simplification and (b) too risky, because it will not be obvious that by opting into val, you are opting out of atomicity. Worse, people will write classes that are intended to be used with val, but for which the loss of atomicity will be astonishing. Another problem with the "until now" straw man is that B2 and B3 are gratuitously different with respect to the flattening they can get. This makes the user performance model, and the recommendations of which to use in which situations, harder to understand, and leaves holes for "but what if I want X and Y?", even if we could deliver both. My conclusion is that the problem here is that we're piggybacking atomicity on other things, in non-obvious ways. The author of the class knows when atomicity is needed to protect invariants (specifically, cross-field invariants), and when it is not, so let that simply be selected at the declaration site.
Opting out of atomicity is safer and less surprising, so that argues for tagging classes that don't need atomicity as `non-atomic`. (For some classes, such as single-field classes, it makes no difference, because preserving atomicity has no cost, so the VM will just do it.) In addition to the explicitness benefits, now atomicity works uniformly across B2 and B3, ref and val. Not only does this eliminate the asymmetries, but it means that classes that are B2 because they don't have a good default can *routinely get better flattening* than they would have under the status quo straw man; previously there was a big flattening gap, even with heroics like stuffing four ints into 128-bit atomic loads. When the user says "this B2 is non-atomic", we can immediately go full-flat, maybe with some extra footprint for null. So: - No difference between an atomic B2 and an atomic B3.ref - Only difference between atomic B2 and atomic B3.val is the null footprint - Only difference between non-atomic B2 and non-atomic B3.val is the null footprint This is a very nice place to be. (There are interesting discussions to be had about the null/zero part, whether we even need it now, how to denote it, what the default is, etc.), but before we dive back into those, I'd like to sync on this, because this is the high-order bit of what I'm getting at here. Factor out atomicity in the user model, which in turn renders the matrix much simpler. A side benefit is that `non-atomic` is new and weird! Which will immediately cause anyone who stumbles across it to run to Stack Overflow ("What is a non-atomic value class?"), where they'll find an excellent and very scary explanation of tearing. As part of adding value types, we've exposed a previously hidden, scary thing, but done so in an explicit way. I think this is much better than stapling it to one corner of the matrix, hidden behind something that looks like something else.
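To make the declaration-site idea concrete, atomicity under this model might be declared roughly like this (hypothetical syntax; the keyword spellings and class names are illustrative, not settled):

```java
// Hypothetical syntax: 'non-atomic' opts a value class out of tearing protection.

value class Money {            // atomic by default: the cross-field invariant
    long amount;               // (amount and currency must be read together)
    Currency currency;         // is protected, so flattening may be limited
}

non-atomic value class Point { // the author asserts there is no cross-field
    double x, y;               // invariant to protect, so the VM may go
}                              // full-flat; racy reads may observe tearing
```

The design point is that the class author, who knows whether cross-field invariants exist, makes the call once at the declaration, rather than every use site deciding implicitly via ref-vs-val.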
> - The default is atomicity / integrity FOR ALL BUCKETS (safe by default) > - The default is nullability FOR ALL BUCKETS > - All unadorned type names are reference types / nullable > - All val-adorned type names (X.val) are non-nullable (or .zero, or .whatever) > - Atomicity is determined by declaration site, can't be changed at use site From forax at univ-mlv.fr Thu Apr 28 15:11:23 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Thu, 28 Apr 2022 17:11:23 +0200 (CEST) Subject: [External] : Re: User model stacking In-Reply-To: <9916411A-CF1E-4BA8-BFC8-7694A4862DBF@oracle.com> References: <560C92D3-ED77-4CB6-837A-A87FC6FC22D7@oracle.com> <1818860660.18081533.1651151378274.JavaMail.zimbra@u-pem.fr> <9916411A-CF1E-4BA8-BFC8-7694A4862DBF@oracle.com> Message-ID: <699538633.18174807.1651158683093.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "Remi Forax" > Cc: "Dan Heidinga" , "valhalla-spec-experts" > > Sent: Thursday, April 28, 2022 3:43:59 PM > Subject: Re: [External] : Re: User model stacking >> On Apr 28, 2022, at 9:09 AM, Remi Forax <forax at univ-mlv.fr> wrote: >> 1/ There is a difference between Foo and Foo.ref for generics, Foo is a class >> while Foo.ref is a type. >> The idea of using Complex.val means that the relationship is reversed, >> Complex is the type and Complex.val is the class. > This is simply wrong. > Classes and types are separate things; declaring classes gives rise to types. > The class is Complex, in both cases. In both cases, it gives rise to two types, > which might be denoted Complex.ref and Complex.val, or one of which might be > also called Complex. But just because they have the same name doesn't change > what they are. Classes != types. The class is Complex. Always. It can work both ways, either Complex.val c = new Complex.val(); or Complex.val c = new Complex(); I've chosen the former notation because in a previous message, Dan H. expected the former too.
The problem with Complex.val c = new Complex(); is obviously that the class and its corresponding type are not the same. We will have to teach everybody that var in var c = new Complex(); is inferred as Complex.val and not Complex. Again, it goes against the principle of least surprise: the type of new X() is X. > I think most of the other points depend on this misconception, but I'll skip to > ... Even if creating a Complex is done by using new Complex, you still have all the same issues, because the type Complex and the class Complex disagree about who they are. Let's suppose we have class Holder { Complex c; } What is its default value? One may think that one can ask reflection for it: Holder.class.getField("c").getType().getDefaultValue() but it does not work, because the type Complex and the class Complex disagree. >> 4/ It's a double opt-in: people have to opt in at the declaration site by asking for >> a zero-default value type, but that is not enough; >> it only works if the val type is used at the use site. I don't know any feature in >> Java that requires a double opt-in. > To say "it doesn't work" is like saying "I have a sink with hot and cold taps, I > turned on the cold tap, and no hot water came out; this sink doesn't work." The > declaration site enables the existence of the hot water tap; whether you turn > it on or not is your choice. > I think what 4/5 are trying to get at is "it feels like too much ceremony to > have to say something at both the declaration and use sites in order to get > full-flat vals." > This is a valid opinion! But it's also pretty obvious that this is a potential > concern, so I'm not sure what you're getting at by raising it in this way > (supported by the preceding misguided arguments?) It's not about too much ceremony, it's about adding a construct that breaks how people think that Java, or any typed language, works. Here, as a user, I declare that the class Complex is zero-default but the type Complex is null-default.
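The zero-default vs. null-default split being debated here already exists in today's Java between primitive fields and reference fields, which makes for a runnable analog (the Holder and DefaultDemo classes below are just illustrations, not anything from the proposal):

```java
// Current Java: primitive fields are zero-default, reference fields are
// null-default; this is the split the proposed val/ref projections generalize.
class Holder {
    double re;      // primitive: defaults to 0.0, like a hypothetical Complex.val field
    Double boxedRe; // reference: defaults to null, like a Complex (ref) field
}

public class DefaultDemo {
    public static void main(String[] args) {
        Holder h = new Holder();
        System.out.println(h.re);      // prints 0.0
        System.out.println(h.boxedRe); // prints null
        // h.boxedRe.doubleValue() would throw a NullPointerException here
    }
}
```

In other words, today the declaration site alone (double vs. Double) picks the default; the question in this thread is what happens when one class can be used both ways.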
For example, new Complex().re == 0 is true while new Object() { Complex c; }.c.re throws an NPE. As I said, Java has no construct where you need to double opt-in. >> 5/ It's easy to forget a ".val". To work, people will have to pepper .val >> everywhere and it will be easy to miss one occurrence. >> Depending on where the ".val" is missed, performance will suffer. > I think this is your whole point: "people will have to type .val a lot, and they > might miss one", right? > This is exactly the sort of argument I was talking about by "let's not try to > jump to the end and design the final syntax immediately." As should be clear, a > lot of thought has gone into teasing out the elements of the model; give > yourself some time to internalize them. Having Complex mean two different things depending on whether it's a class or a type is something I don't want to internalize. We should keep things simple. Rémi From forax at univ-mlv.fr Thu Apr 28 15:19:43 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Thu, 28 Apr 2022 17:19:43 +0200 (CEST) Subject: [External] : Re: User model stacking In-Reply-To: References: <560C92D3-ED77-4CB6-837A-A87FC6FC22D7@oracle.com> <1818860660.18081533.1651151378274.JavaMail.zimbra@u-pem.fr> Message-ID: <97433679.18178916.1651159183517.JavaMail.zimbra@u-pem.fr> > From: "Kevin Bourrillion" > To: "Remi Forax" > Cc: "Brian Goetz" , "Dan Heidinga" > , "valhalla-spec-experts" > > Sent: Thursday, April 28, 2022 3:51:07 PM > Subject: Re: [External] : Re: User model stacking > On Thu, Apr 28, 2022 at 9:09 AM Remi Forax <forax at univ-mlv.fr> wrote: >>> So we're suggesting restacking towards: >>> - Value classes are those without identity >>> - Value classes can be atomic or non-atomic, the default is atomic (safe by >>> default) >>> - Value classes can further opt into having a "val" projection (name TBD, val is >>> probably not it) >>> - Val projections are non-nullable, zero-default;
this is the only difference >>> - Both the ref and val projections inherit the atomicity constraints of the >>> class, making atomicity mostly orthogonal to ref/val/zero/null >> Now that the model is clearer, let's try to discuss the val projection. > (For the record, I don't think the messages of the last 48 hours have made the > model "clearer", just floated a lot of possibilities.) It's clearer to me: the properties of value types, whatever way we decide to group them, are runtime properties, not type properties. Using an L-type instead of a Q-type just adds null into the set of possible values. > But I do want to say I appreciate you providing all these opposing arguments to > my proposal (which I asked for!). > I'm going to engage with your specific arguments, but I don't recall if you ever > engaged properly with all of mine. I feel like if you took them into account > also, your overall position might be more balanced? In particular, it is a > *huge* simplification to be able to say that every class does the exact same > thing, and some just do extra. [...] See my answers to Brian. >> 4/ It's a double opt-in: people have to opt in at the declaration site by asking for >> a zero-default value type, but that is not enough; >> it only works if the val type is used at the use site. I don't know any feature in >> Java that requires a double opt-in. > You have to opt into a class being subclassable, then you have to opt into > subclassing it. > There's tons of examples. Subclassable and subclassing are not the same property. Here we are talking about zero-default and zero-default. zero-default value class Complex { ... } // opt-in here class Holder { Complex.val x; // and opt-in here } You need both; otherwise it's not zero-default. > I'm not sure that's a good framing anyway. The use-site doesn't really opt in or > out. The class just opts in to generating two types.
>> 5/ It's easy to forget a ".val". To work, people will have to pepper .val >> everywhere and it will be easy to miss one occurrence. >> Depending on where the ".val" is missed, performance will suffer. > People can come back and purchase that better performance for the price of > dealing with the safety hazards. imho, this is exactly as it should be. This is only true if you control all the code; otherwise you are using a library, and the missing .val is inside the library. > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com Rémi From daniel.smith at oracle.com Fri Apr 29 23:48:18 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Fri, 29 Apr 2022 23:48:18 +0000 Subject: User model stacking In-Reply-To: References: Message-ID: <4F70E6B7-FAC8-4845-8969-8D545B6FB4FB@oracle.com> > On Apr 27, 2022, at 7:36 PM, Kevin Bourrillion wrote: > > This is kinda reading as... > > * First we have 3 buckets > * But people ask if there could be just 2 buckets > * No, so let's have 5 buckets. > > I don't understand why this is happening, but I take it back! I take back what I said about 2 buckets! Just so we don't lose this history, a reminder that back when we settled on the 3 buckets, we viewed it as a useful simplification from a more general approach with lots of "knobs". Instead of asking developers to think about 3-4 mostly-orthogonal properties and set them all appropriately, we preferred a model in which *objects* and *primitive values* were distinct entities with distinct properties. Atomicity, nullability, etc., weren't extra things to have to reason about independently; they were natural consequences of what it meant to be (or not) a variable that stores objects.
That was a while ago; we may have learned some things since then, but I think there's still something to the idea that we can expect everybody to understand the difference between objects and primitives, even if they don't totally understand all the implications. (When they eventually discover some corner of the implications, we hope they'll say, "oh, sure, that makes sense because this is/isn't an object.") > On Apr 28, 2022, at 8:13 AM, Brian Goetz wrote: > > My conclusion is that the problem here is that we're piggybacking atomicity on other things, in non-obvious ways. The author of the class knows when atomicity is needed to protect invariants (specifically, cross-field invariants), and when it is not, so let that simply be selected at the declaration site. Opting out of atomicity is safer and less surprising, so that argues for tagging classes that don't need atomicity as `non-atomic`. (For some classes, such as single-field classes, it makes no difference, because preserving atomicity has no cost, so the VM will just do it.) > > In addition to the explicitness benefits, now atomicity works uniformly across B2 and B3, ref and val. Not only does this eliminate the asymmetries, but it means that classes that are B2 because they don't have a good default can *routinely get better flattening* than they would have under the status quo straw man; previously there was a big flattening gap, even with heroics like stuffing four ints into 128-bit atomic loads. When the user says "this B2 is non-atomic", we can immediately go full-flat, maybe with some extra footprint for null. As a specific example, yes, there are some advantages to non-atomic B2s. But at the cost of disrupting the notion that B2 instances are always objects, and objects are, naturally, safely encapsulated. Would we say that objects are not necessarily atomic anymore? Or that these B2 instances aren't objects?
My inclination would probably be to abandon the object/value dichotomy, revert to "everything is an object", perhaps revisit our ideas about conversions/subtyping between ref and val types, and develop a model that allows tearing of some objects. Probably all do-able, but I'm not sure it's a better model. If the main goal here is to have an intuitive story that minimizes surprises, I'm currently pretty happy with (all terms here subject to further bikeshedding): - Primitive classes (or just "primitives") have primitive value instances - Like the primitives you know, these tend to be stored directly in memory - Like the primitives you know, because of their storage sometimes there's a risk of tearing - If you're declaring a multi-field primitive, you need to understand this risk and choose whether to allow tearing (via 'atomic' or 'non-atomic') A critique here is that now we have more ad hoc "buckets", but the 'atomic' modifiers feel to me more like a minor piece of B3, not an entirely new bucket (and, bonus, a property that already exists within the space of primitives!). E.g., I can totally see javadoc having a tab for "Value Classes" and a separate tab for "Primitives", but I wouldn't expect tabs for "Atomic Primitives" and "Non-atomic Primitives". (Instead, maybe there's some boilerplate on the class page along the lines of "Note that this primitive is not atomic".)
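Dan's framing might render in source roughly as follows (hypothetical syntax; the 'primitive', 'atomic', and 'non-atomic' spellings and the class names are illustrative assumptions, not anything specified):

```java
// Hypothetical syntax for the "primitives with an atomicity knob" framing.

primitive class HalfFloat {       // single field: atomicity costs nothing,
    short bits;                   // so no modifier is needed
}

atomic primitive class Range {    // multi-field, with the invariant lo <= hi:
    long lo, hi;                  // the author keeps tearing protection
}

non-atomic primitive class Vec2 { // multi-field, no cross-field invariant:
    double x, y;                  // the author accepts possible tearing in
}                                 // exchange for full flattening
```

This matches the javadoc intuition above: one "Primitives" category, with atomicity appearing as a per-class note rather than as a separate top-level bucket.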