From daniel.smith at oracle.com Wed Jun 1 14:28:47 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 1 Jun 2022 14:28:47 +0000 Subject: EG meeting *canceled*, 2022-06-01 Message-ID: Not much recent traffic, let's cancel today. Kevin had some comments about == and pre-migration warnings that are worth your attention if you haven't reviewed that thread... From daniel.smith at oracle.com Fri Jun 3 16:15:57 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Fri, 3 Jun 2022 16:15:57 +0000 Subject: Anonymous value classes Message-ID: <05B5F014-70A0-4F4A-8C2C-426FFB16F61C@oracle.com> Our javac prototype has long included support for a 'value' keyword after 'new' to indicate that an anonymous class is a value class: Runnable r = new value Runnable() { public void run() { x.foo(); } }; Is this something we'd like to preserve as a language feature? Arguments for: - Allows the semantics of "I don't need identity" to be conveyed (often true for anonymous classes). - Gives the JVM more information for optimization. If we don't need a heap object, evaluating the expression may be a no-op. Arguments against: - Opens a Pandora's box of syntax: what other keywords can go there? 'identity'? 'primitive'? 'static'? 'record'? - Because there's no named type, there are significantly fewer opportunities for optimization -- you're probably going to end up with a heap object anyway. - Value classes are primarily focused on simple data-carrying use cases, but any data being carried by an anonymous class is usually incidental. A new language feature would draw a lot of attention to this out-of-the-mainstream use case. - In the simplest cases, you can use a lambda instead, and there the API implementation has freedom to implement lambdas with value classes if it turns out to be useful. - The workaround -- declare a local class instead -- is reasonably straightforward for the scenarios where there's a real performance bottleneck that 'value' can help with. From daniel.smith at oracle.com Fri Jun 3 16:18:42 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Fri, 3 Jun 2022 16:18:42 +0000 Subject: Anonymous value classes In-Reply-To: <05B5F014-70A0-4F4A-8C2C-426FFB16F61C@oracle.com> References: <05B5F014-70A0-4F4A-8C2C-426FFB16F61C@oracle.com> Message-ID: <0B5FC883-4E1B-4953-9C3F-C7B437586040@oracle.com> > On Jun 3, 2022, at 10:15 AM, Dan Smith wrote: > > Our javac prototype has long included support for a 'value' keyword after 'new' to indicate that an anonymous class is a value class (I see Remi brought this up in the list in July 2018, which is probably what inspired the prototype implementation. There wasn't really any followup discussion.) From brian.goetz at oracle.com Fri Jun 3 16:21:26 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 3 Jun 2022 12:21:26 -0400 Subject: Anonymous value classes In-Reply-To: <05B5F014-70A0-4F4A-8C2C-426FFB16F61C@oracle.com> References: <05B5F014-70A0-4F4A-8C2C-426FFB16F61C@oracle.com> Message-ID: <59c29885-c6a1-a00e-475c-ca0d24aee844@oracle.com> There is no chance to get any calling-convention optimization here, since the concrete class name will not show up in any method descriptor (or preload attribute). There is no chance to get any heap flattening here, since the concrete class name will not show up in any field descriptor or `newarray` operand. So the main argument is "for completeness", which seems weak.
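(For illustration, a minimal sketch of the alternatives mentioned above -- a lambda, or the named local class workaround -- assuming the prototype 'value' modifier; 'FooRunner' and the surrounding context are hypothetical, not from the thread:)

    // Prototype-only anonymous form under discussion:
    Runnable r1 = new value Runnable() {
        public void run() { x.foo(); }
    };

    // Alternative: a lambda; the implementation is free to use a value class under the hood.
    Runnable r2 = () -> x.foo();

    // Alternative: the local class workaround -- a named value class declared where it is needed.
    value class FooRunner implements Runnable {
        public void run() { x.foo(); }
    }
    Runnable r3 = new FooRunner();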
On 6/3/2022 12:15 PM, Dan Smith wrote: > Our javac prototype has long included support for a 'value' keyword after 'new' to indicate that an anonymous class is a value class: > > Runnable r = new value Runnable() { > public void run() { x.foo(); } > }; > > Is this something we'd like to preserve as a language feature? > > Arguments for: > > - Allows the semantics of "I don't need identity" to be conveyed (often true for anonymous classes). > > - Gives the JVM more information for optimization. If we don't need a heap object, evaluating the expression may be a no-op. > > Arguments against: > > - Opens a Pandora's box of syntax: what other keywords can go there? 'identity'? 'primitive'? 'static'? 'record'? > > - Because there's no named type, there are significantly fewer opportunities for optimization?you're probably going to end up with a heap object anyway. > > - Value classes are primarily focused on simple data-carrying use cases, but any data being carried by an anonymous class is usually incidental. A new language feature would draw a lot of attention to this out-of-the-mainstream use case. > > - In the simplest cases, you can use a lambda instead, and there the API implementation has freedom to implement lambdas with value classes if it turns out to be useful. > > - The workaround?declare a local class instead?is reasonably straightforward for the scenarios where there's a real performance bottleneck that 'value' can help with. From forax at univ-mlv.fr Fri Jun 3 17:39:44 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 3 Jun 2022 19:39:44 +0200 (CEST) Subject: Anonymous value classes In-Reply-To: <59c29885-c6a1-a00e-475c-ca0d24aee844@oracle.com> References: <05B5F014-70A0-4F4A-8C2C-426FFB16F61C@oracle.com> <59c29885-c6a1-a00e-475c-ca0d24aee844@oracle.com> Message-ID: <643250617.1814571.1654277984929.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "daniel smith" , "valhalla-spec-experts" > > Sent: Friday, June 3, 2022 6:21:26 PM > Subject: Re: Anonymous value classes > There is no chance to get any calling-convention optimization here, since the > concrete class name will not show up in any method descriptor (or preload > attribute). There is no chance to get any heap flattening here, since the > concrete class name will not show up in any field descriptor or `newarray` > operand. Nope, anonymous classes are anonymous only for Java not in the bytecode, by example var box = new Object() { }; Supplier supplier = () -> box; var list = Arrays.asList(box); is translated to Code: 0: new #7 // class AnonymousClassNameLeaking$1 3: dup 4: invokespecial #9 // Method AnonymousClassNameLeaking$1."":()V 7: astore_1 8: aload_1 9: invokedynamic #10, 0 // InvokeDynamic #0:get:(LAnonymousClassNameLeaking$1;)Ljava/util/function/Supplier; 14: astore_2 15: iconst_1 16: anewarray #7 // class AnonymousClassNameLeaking$1 19: dup 20: iconst_0 21: aload_1 22: aastore 23: invokestatic #14 // Method java/util/Arrays.asList:([Ljava/lang/Object;)Ljava/util/List; 26: astore_3 Here, the anonymous class name appears as parameter of invokedynamic, at runtime the field (box is captured) of the lambda proxy is also typed LAnonymousClassNameLeaking$1; and any varargs will create an array of the anonymous class. 
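(A small source-level illustration of the same point; the generated name shown in the comments is hypothetical and depends on the enclosing class:)

    var box = new Object() { int x = 42; };
    System.out.println(box.getClass());                    // prints the generated name, e.g. Example$1
    var list = java.util.Arrays.asList(box);               // the varargs array has that same component class
    System.out.println(list.get(0).getClass().getName());  // same "anonymous" name, observable at runtime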
R?mi > On 6/3/2022 12:15 PM, Dan Smith wrote: >> Our javac prototype has long included support for a 'value' keyword after 'new' >> to indicate that an anonymous class is a value class: >> Runnable r = new value Runnable() { >> public void run() { x.foo(); } >> }; >> Is this something we'd like to preserve as a language feature? >> Arguments for: >> - Allows the semantics of "I don't need identity" to be conveyed (often true for >> anonymous classes). >> - Gives the JVM more information for optimization. If we don't need a heap >> object, evaluating the expression may be a no-op. >> Arguments against: >> - Opens a Pandora's box of syntax: what other keywords can go there? 'identity'? >> 'primitive'? 'static'? 'record'? >> - Because there's no named type, there are significantly fewer opportunities for >> optimization?you're probably going to end up with a heap object anyway. >> - Value classes are primarily focused on simple data-carrying use cases, but any >> data being carried by an anonymous class is usually incidental. A new language >> feature would draw a lot of attention to this out-of-the-mainstream use case. >> - In the simplest cases, you can use a lambda instead, and there the API >> implementation has freedom to implement lambdas with value classes if it turns >> out to be useful. >> - The workaround?declare a local class instead?is reasonably straightforward for >> the scenarios where there's a real performance bottleneck that 'value' can help >> with. From forax at univ-mlv.fr Fri Jun 3 17:43:33 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 3 Jun 2022 19:43:33 +0200 (CEST) Subject: Anonymous value classes In-Reply-To: <0B5FC883-4E1B-4953-9C3F-C7B437586040@oracle.com> References: <05B5F014-70A0-4F4A-8C2C-426FFB16F61C@oracle.com> <0B5FC883-4E1B-4953-9C3F-C7B437586040@oracle.com> Message-ID: <761111773.1816088.1654278213040.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "daniel smith" > To: "valhalla-spec-experts" > Sent: Friday, June 3, 2022 6:18:42 PM > Subject: Re: Anonymous value classes >> On Jun 3, 2022, at 10:15 AM, Dan Smith wrote: >> >> Our javac prototype has long included support for a 'value' keyword after 'new' >> to indicate that an anonymous class is a value class > > (I see Remi brought this up in the list in July 2018, which is probably what > inspired the prototype implementation. There wasn't really any followup > discussion.) I need it when trying to implement List.of()/Map.of() with few elements using value classes. R?mi From brian.goetz at oracle.com Fri Jun 3 17:59:11 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 3 Jun 2022 13:59:11 -0400 Subject: Anonymous value classes In-Reply-To: <643250617.1814571.1654277984929.JavaMail.zimbra@u-pem.fr> References: <05B5F014-70A0-4F4A-8C2C-426FFB16F61C@oracle.com> <59c29885-c6a1-a00e-475c-ca0d24aee844@oracle.com> <643250617.1814571.1654277984929.JavaMail.zimbra@u-pem.fr> Message-ID: <15cb91b4-a951-9cf7-59b0-656255cc70b0@oracle.com> On 6/3/2022 1:39 PM, Remi Forax wrote: > > > ------------------------------------------------------------------------ > > *From: *"Brian Goetz" > *To: *"daniel smith" , > "valhalla-spec-experts" > *Sent: *Friday, June 3, 2022 6:21:26 PM > *Subject: *Re: Anonymous value classes > > There is no chance to get any calling-convention optimization > here, since the concrete class name will not show up in any method > descriptor (or preload attribute).? 
There is no chance to get any > heap flattening here, since the concrete class name will not show > up in any field descriptor or `newarray` operand. > > > Nope, anonymous classes are anonymous only for Java not in the > bytecode, by example OK, correction: such a vanishingly microscopic chance as to be completely ignorable :) From maurizio.cimadamore at oracle.com Fri Jun 3 18:18:44 2022 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Fri, 3 Jun 2022 19:18:44 +0100 Subject: Anonymous value classes In-Reply-To: <15cb91b4-a951-9cf7-59b0-656255cc70b0@oracle.com> References: <05B5F014-70A0-4F4A-8C2C-426FFB16F61C@oracle.com> <59c29885-c6a1-a00e-475c-ca0d24aee844@oracle.com> <643250617.1814571.1654277984929.JavaMail.zimbra@u-pem.fr> <15cb91b4-a951-9cf7-59b0-656255cc70b0@oracle.com> Message-ID: And `var` ? (but I agree this feels a niche) Maurizio On 03/06/2022 18:59, Brian Goetz wrote: > > > On 6/3/2022 1:39 PM, Remi Forax wrote: >> >> >> ------------------------------------------------------------------------ >> >> *From: *"Brian Goetz" >> *To: *"daniel smith" , >> "valhalla-spec-experts" >> *Sent: *Friday, June 3, 2022 6:21:26 PM >> *Subject: *Re: Anonymous value classes >> >> There is no chance to get any calling-convention optimization >> here, since the concrete class name will not show up in any >> method descriptor (or preload attribute). There is no chance to >> get any heap flattening here, since the concrete class name will >> not show up in any field descriptor or `newarray` operand. >> >> >> Nope, anonymous classes are anonymous only for Java not in the >> bytecode, by example > > OK, correction: such a vanishingly microscopic chance as to be > completely ignorable :) > > From brian.goetz at oracle.com Fri Jun 3 19:14:39 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 3 Jun 2022 15:14:39 -0400 Subject: User model stacking: current status In-Reply-To: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> Message-ID: <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> Continuing to shake this tree. I'm glad we went through the exploration of "flattenable B3.ref"; while I think we probably could address the challenges of tearing across the null channel / data channels boundary, I'm pretty willing to let this one go. Similarly I'm glad we went through the "atomicity orthogonal to buckets" exploration, and am ready to let that one go too. What I'm not willing to let go of is making atomicity explicit in the model. Not only is piggybacking non-atomicity on something like val-ness too subtle and surprising, but non-atomicity seems like it is a property that the class author needs to ask for. Flatness is an important benefit, but only when it doesn't get in the way of safety. Recall that we have three different representation techniques:
 - no-flat -- use a pointer
 - low-flat -- for sufficiently small (depending on size of atomic instructions provided by the hardware) values, pack multiple fields into a single, atomically accessed unit.
 - full-flat -- flatten the layout, access individual fields directly, may allow tearing.
The "low-flat" bucket got some attention recently when we discovered that there are usable 128-bit atomics on Intel (based on a recent revision of the chip spec), but this is not a slam-dunk; it requires some serious compiler heroics to pack multiple values into single accesses.
But there may be targets of opportunity here for single-field values (like Optional) or final fields. And we can always fall back to no-flat whenever the VM feels like it. One of the questions that has been raised is how similar B3.ref is to B2, specifically with respect to atomicity. We've gone back and forth on this. Having shaken the tree quite a bit, what feels like the low energy state to me right now is:
 - The ref types of all non-identity classes are treated uniformly; B3.ref and B2.ref are translated the same, treated the same, have the same atomicity, the same nullity, etc.
 - The only difference across the spectrum of non-identity classes is the treatment of the val type. For B2, this means the val type is *illegal*; for B3, this means it is atomic; for B3n, it is non-atomic (which in practice will mean more flatness.)
 - (controversial) For all types, the ref type is the default. This means that some current value-based classes can migrate not only to B2, but to B3 or B3n. (And that we could migrate to B2 today and further to B3 tomorrow.)
While this is technically four flavors, I don't think it needs to feel that complex. I'll pick some obviously silly modifiers for exposition:
 - class B1 { }
 - zero-hostile value class B2 { }
 - value class B3 { }
 - tearing-happy value class B3n { }
In other words: one new concept ("value class"), with two sub-modifiers (zero-hostile, and tearing-happy) which affect the behavior of the val type (forbidden for B2, loosened for B3n.) For heap flattening, what this gets us is:
 - B1 -- no-flat
 - B2, B3.ref, B3n.ref -- low-flat atomic (with null channel)
 - B3 -- low-flat (atomic, no null channel)
 - B3n -- full-flat (non-atomic, no null channel)
This is a slight departure from earlier tree-shakings with respect to tearing. In particular, refs do not tear at all, so programs that use all refs will never see tearing (but it is still possible to get a torn value using .val and then box that into a ref.) If you turn this around, the declaration-site decision tree becomes:
 - Do I need identity (mutability, subclassing, aliasing)? Then B1.
 - Are uninitialized values unacceptable? Then B2.
 - Am I willing to tolerate tearing to enable more flattening? Then B3n.
 - Otherwise, B3.
And the use-site decision tree becomes:
 - For B1, B2 -- no choices to make.
 - Do I need nullity? Then .ref
 - Do I need atomicity, and the class doesn't already provide it? Then .ref
 - Otherwise, can use .val
The main downside of making ref the default is that people will grumble about having to say .val at the use site all the time. And they will! And it does feel a little odd that you have to opt into val-ness at both the declaration and use sites. But it unlocks a lot of things (see Kevin's list for more):
 - The default name is the safest version.
 - Every unadorned name works the same way; it's always a reference type. You don't need to maintain a mental database around "which kind of name is this".
 - Migration from B1 -> B2 -> B3 is possible. This is huge (and more than we had hoped for when we started this game.)
(The one thing to still worry about is that while refs can't tear, you can still observe a torn value through a ref, if someone tore it and then boxed it. I don't see how we defend against this, but the non-atomic label should be enough of a warning.) On 5/6/2022 10:04 AM, Brian Goetz wrote: > In this model, (non-atomic B3).ref takes the place of (non-atomic B2) > in the stacking I've been discussing. Is that what you're saying? > >
class B1 { }? // ref, identity, atomic > ??? value-based class B2 { }? // ref, non-identity, atomic > ??? [ non-atomic ] value class B3 { }? // ref or val, zero is ok, both > projections share atomicity > > If we go with ref-default, then this is a small leap from yesterday's > stacking, because "B3" and "B2" are both reference types, so if you > want a tearable, non-atomic reference type, saying `non-atomic value > class B3` and then just using B3 gets you that. Then: > > ?- B2 is like B1, minus identity > ?- B3 means "uninitialized values are OK, you get two types, a > zero-default and a non-default" > ?- Non-atomicity is an extra property we can add to B3, to get more > flattening in exchange for less integrity > ?- The use cases for non-atomic B2 are served by non-atomic B3 (when > .ref is the default) > > I think this still has the properties I want; I can freely choose the > reasonable subsets of { identity, has-zero, nullable, atomicity } that > I want; the orthogonality of non-atomic across buckets becomes > orthogonality of non-atomic with nullity, and the "B3.ref is just like > B2" is shown to be the "false friend." > From forax at univ-mlv.fr Sat Jun 4 09:33:51 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Sat, 4 Jun 2022 11:33:51 +0200 (CEST) Subject: Anonymous value classes In-Reply-To: References: <05B5F014-70A0-4F4A-8C2C-426FFB16F61C@oracle.com> <59c29885-c6a1-a00e-475c-ca0d24aee844@oracle.com> <643250617.1814571.1654277984929.JavaMail.zimbra@u-pem.fr> <15cb91b4-a951-9cf7-59b0-656255cc70b0@oracle.com> Message-ID: <1106698295.2016847.1654335231501.JavaMail.zimbra@u-pem.fr> > From: "Maurizio Cimadamore" > To: "Brian Goetz" , "Remi Forax" > Cc: "daniel smith" , "valhalla-spec-experts" > > Sent: Friday, June 3, 2022 8:18:44 PM > Subject: Re: Anonymous value classes > And `var` ? > (but I agree this feels a niche) in conjunction of anything like a VarHandle, a MethodHandle, a lambda, string concatenation, etc. Anyway, Brian spins it as we do not get full-flattening and that may be correct, but half-flattening (flattening on stack) is as important, there is a lot of libraries that have APIs using interfaces that are implemented by anonymous classes, the collection API is one of them, fluent loggers (anything fluent in fact) is another, and those will benefit to have better than escape analysis performance. For half-flattening, being monomorphic and a value-based class at a callsite is enough for the JITs, you do not have to have the precise concrete class name mentioned. > Maurizio R?mi > On 03/06/2022 18:59, Brian Goetz wrote: >> On 6/3/2022 1:39 PM, Remi Forax wrote: >>>> From: "Brian Goetz" [ mailto:brian.goetz at oracle.com | ] >>>> To: "daniel smith" [ mailto:daniel.smith at oracle.com | >>>> ] , "valhalla-spec-experts" [ mailto:valhalla-spec-experts at openjdk.java.net | >>>> ] >>>> Sent: Friday, June 3, 2022 6:21:26 PM >>>> Subject: Re: Anonymous value classes >>>> There is no chance to get any calling-convention optimization here, since the >>>> concrete class name will not show up in any method descriptor (or preload >>>> attribute). There is no chance to get any heap flattening here, since the >>>> concrete class name will not show up in any field descriptor or `newarray` >>>> operand. 
>>> Nope, anonymous classes are anonymous only for Java not in the bytecode, by >>> example >> OK, correction: such a vanishingly microscopic chance as to be completely >> ignorable :) From forax at univ-mlv.fr Mon Jun 6 13:05:20 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 6 Jun 2022 15:05:20 +0200 (CEST) Subject: User model stacking: current status In-Reply-To: <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> Message-ID: <411385158.2732602.1654520720122.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "daniel smith" > Cc: "valhalla-spec-experts" > Sent: Friday, June 3, 2022 9:14:39 PM > Subject: Re: User model stacking: current status > Continuing to shake this tree. > I'm glad we went through the exploration of "flattenable B3.ref"; while I think > we probably could address the challenges of tearing across the null channel / > data channels boundary, I'm pretty willing to let this one go. Similarly I'm > glad we went through the "atomicity orthogonal to buckets" exploration, and am > ready to let that one go too. > What I'm not willing to let go of us making atomicity explicit in the model. Not > only is piggybacking non-atomicity on something like val-ness too subtle and > surprising, but non-atomicity seems like it is a property that the class author > needs to ask for. Flatness is an important benefit, but only when it doesn't > get in the way of safety. > Recall that we have three different representation techniques: > - no-flat -- use a pointer > - low-flat -- for sufficiently small (depending on size of atomic instructions > provided by the hardware) values, pack multiple fields into a single, > atomically accessed unit. > - full-flat -- flatten the layout, access individual individual fields directly, > may allow tearing. > The "low-flat" bucket got some attention recently when we discovered that there > are usable 128-bit atomics on Intel (based on a recent revision of the chip > spec), but this is not a slam-dunk; it requires some serious compiler heroics > to pack multiple values into single accesses. But there may be targets of > opportunity here for single-field values (like Optional) or final fields. And > we can always fall back to no-flat whenever the VM feels like it. > One of the questions that has been raised is how similar B3.ref is to B2, > specifically with respect to atomicity. We've gone back and forth on this. > Having shaken the tree quite a bit, what feels like the low energy state to me > right now is: > - The ref type of all on-identity classes are treated uniformly; B3.ref and > B2.ref are translated the same, treated the same, have the same atomicity, the > same nullity, etc. > - The only difference across the spectrum of non-identity classes is the > treatment of the val type. For B2, this means the val type is *illegal*; for > B3, this means it is atomic; for B3n, it is non-atomic (which in practice will > mean more flatness.) > - (controversial) For all types, the ref type is the default. This means that > some current value-based classes can migrate not only to B2, but to B3 or B3n. > (And that we could migrate to B2 today and further to B3 tomorrow.) > While this is technically four flavors, I don't think it needs to feel that > complex. 
I'll pick some obviously silly modifiers for exposition: > - class B1 { } > - zero-hostile value class B2 { } > - value class B3 { } > - tearing-happy value class B3n { } > In other words: one new concept ("value class"), with two sub-modifiers > (zero-hostile, and tearing-happy) which affect the behavior of the val type > (forbidden for B2, loosened for B3n.) > For heap flattening, what this gets us is: > - B1 -- no-flat > - B2, B3.ref, B3n.ref -- low-flat atomic (with null channel) > - B3 -- low-flat (atomic, no null channel) > - B3n -- full-flat (non-atomic, no null channel) > This is a slight departure from earlier tree-shakings with respect to tearing. > In particular, refs do not tear at all, so programs that use all refs will > never see tearing (but it is still possible to get a torn value using .val and > then box that into a ref.) > If you turn this around, the declaration-site decision tree becomes: > - Do I need identity (mutability, subclassing, aliasing)? Then B1. > - Are uninitialized values unacceptable? Then B2. > - Am I willing to tolerate tearing to enable more flattening? Then B3n. > - Otherwise, B3. > And the use-site decision tree becomes: > - For B1, B2 -- no choices to make. > - Do I need nullity? Then .ref > - Do I need atomicity, and the class doesn't already provide it? Then .ref > - Otherwise, can use .val > The main downside of making ref the default is that people will grumble about > having to say .val at the use site all the time. And they will! And it does > feel a little odd that you have to opt into val-ness at both the declaration > and use sites. But it unlocks a lot of things (see Kevin's list for more): > - The default name is the safest version. > - Every unadorned name works the same way; it's always a reference type. You > don't need to maintain a mental database around "which kind of name is this". > - Migration from B1 -> B2 -> B3 is possible. This is huge (and more than we had > hoped for when we started this game.) > (The one thing to still worry about is that while refs can't tear, you can still > observe a torn value through a ref, if someone tore it and then boxed it. I > don't see how we defend against this, but the non-atomic label should be enough > of a warning.) I think B3 being ref by default is a mistake, but this is mistake that stem from a more important mistake, the notion of reference type. I don't think it's a good idea to introduce the notion of reference type in the Java spec. We have spend a lot of time in thinking that identity and value are two different types, they are not, being a value is a runtime capability not a capability inherited from a type. We have remove the idea of the interfaces ValueObject / IdentityObject for this exact reason. I think the moto code like a class works like an int fails us, because what we want is more code like a class, being optimized like an int. The VM representation, being flatten or not, is not something the Java spec should be aware of. This change makes the spec easier to write and the semantics easier to explain. Let me try to explain what i think is a better model: The addition of value class does not change the existing Java model, apart from primitive types, everything is an object, an instance of class. that's why we declare a class, use new to create it, can call methods on a value class exactly like on an identity class. What change as Brian said several times is that a value class does not have null as default value but all fields with zeroes. 
It's not the only difference, == (acmp) tests the fields, synchronized and weak refs do not work but having a different default value is the most important difference. Thus the fact that the VM can directly use the value of a value class (the immediate value) is not a property by itself, it's something VM implementations are free to do so it should not be part of the Java spec. We do not need to introduce the concept of reference type vs value type, but only the concept of reference projection (as C# does with what they call the nullable value types). So from the user POV, everything is an object, an instance of a class. A value class is a special kind of class where the default value is not null but a bunch of zeroes and has no observable identity (hence the semantics of ==). From that, we offer three different trade-offs:
- you may want to keep null as the default value, using a zero-hostile value class, in that case the VM may not be able to do all optimizations, but in exchange, you have a mostly binary backward compatible class with an identity class.
- you may want to use existing code that supposes that null is the default value (generic collections, for example), for that you can use the .ref projection that allows a value class to be nullable. In the future, you will need less of the .ref projection because generics will be overhauled to work with classes with a non null default. Note that .ref is a type projection, not a class. You can not write new Point.ref() but you can write Point.ref point = new Point();
- you may want the read/write of the value class to be non-atomic, so the VM can do more optimization when storing/reading an instance of the value class from fields/arrays.
Why is this better than making everything nullable by default? First, making everything nullable by default goes against the idea that a value class is just a classical class with a different default value. If a value class is nullable by default, then a value class does not have a different default? Right? If a value class is nullable by default, it inherently makes the model hard to understand because of the discrepancy between how we explain the model (all zeroes by default) and the semantics (nullable by default). Then, making everything nullable makes the performance model murky, nullable by default is equivalent to say, let's use Integer instead of int by default so we will have the very same kind of performance pot holes. Having the right defaults in terms of the performance model is very important here because we have started Valhalla because of these performance issues. And making everything nullable does not work with the future generics code which uses T.ref as a type projection. To summarize, making value class nullable by default is an unproven design (remember that what C# calls value types is not nullable by default) which is based on the idea of the model being described in terms of reference type vs value type, which is IMO not the right way to describe the model. It does not work like an int, it is optimizable (we used flattenable in the past) as an int. Obviously there is at least a drawback to not use nullable value class by default, you can not refactor an identity class or a null-hostile value class to a value class because .ref is a type projection and not a real class. I can live with that. Rémi > On 5/6/2022 10:04 AM, Brian Goetz wrote: >> In this model, (non-atomic B3).ref takes the place of (non-atomic B2) in the >> stacking I've been discussing. Is that what you're saying?
>> class B1 { } // ref, identity, atomic >> value-based class B2 { } // ref, non-identity, atomic >> [ non-atomic ] value class B3 { } // ref or val, zero is ok, both projections >> share atomicity >> If we go with ref-default, then this is a small leap from yesterday's stacking, >> because "B3" and "B2" are both reference types, so if you want a tearable, >> non-atomic reference type, saying `non-atomic value class B3` and then just >> using B3 gets you that. Then: >> - B2 is like B1, minus identity >> - B3 means "uninitialized values are OK, you get two types, a zero-default and a >> non-default" >> - Non-atomicity is an extra property we can add to B3, to get more flattening in >> exchange for less integrity >> - The use cases for non-atomic B2 are served by non-atomic B3 (when .ref is the >> default) >> I think this still has the properties I want; I can freely choose the reasonable >> subsets of { identity, has-zero, nullable, atomicity } that I want; the >> orthogonality of non-atomic across buckets becomes orthogonality of non-atomic >> with nullity, and the "B3.ref is just like B2" is shown to be the "false >> friend." From brian.goetz at oracle.com Mon Jun 6 14:14:53 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 6 Jun 2022 14:14:53 +0000 Subject: User model stacking: current status In-Reply-To: <411385158.2732602.1654520720122.JavaMail.zimbra@u-pem.fr> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <411385158.2732602.1654520720122.JavaMail.zimbra@u-pem.fr> Message-ID: <899946D0-CBB7-4900-8E91-BFBB464F68F8@oracle.com> This makes no sense. We are not introducing a notion of reference type into the spec. The spec is already completely riddled with the concept of references and reference types. In fact many of the constraints that influence the current design come from that fact. Sent from my iPad > On Jun 6, 2022, at 9:10 AM, Remi Forax wrote: > > I don't think it's a good idea to introduce the notion of reference type in the Java spec. From daniel.smith at oracle.com Mon Jun 6 17:56:40 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Mon, 6 Jun 2022 17:56:40 +0000 Subject: Anonymous value classes In-Reply-To: <1106698295.2016847.1654335231501.JavaMail.zimbra@u-pem.fr> References: <05B5F014-70A0-4F4A-8C2C-426FFB16F61C@oracle.com> <59c29885-c6a1-a00e-475c-ca0d24aee844@oracle.com> <643250617.1814571.1654277984929.JavaMail.zimbra@u-pem.fr> <15cb91b4-a951-9cf7-59b0-656255cc70b0@oracle.com> <1106698295.2016847.1654335231501.JavaMail.zimbra@u-pem.fr> Message-ID: <14DA7BC8-1E0D-4C29-8AE3-AE58D768E83D@oracle.com> > On Jun 4, 2022, at 3:33 AM, forax at univ-mlv.fr wrote: > > there is a lot of libraries that have APIs using interfaces that are implemented by anonymous classes, the collection API is one of them, fluent loggers (anything fluent in fact) is another, and those will benefit to have better than escape analysis performance. This could use validation. My very high-level sense is that within inlined code, escape analysis will do just fine with identity classes, with no observable performance gain when switching to a value class. *Across calls*, we can do much better with value classes, but at that point current HotSpot optimizations need a name in the descriptor. (Huge caveat that my understanding of this situation is very high-level, and there may be important things I'm missing.) 
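(A rough sketch of the distinction being drawn here about descriptors; 'Point', 'dist2', and 'run' are hypothetical, and this is not a claim about current HotSpot behavior:)

    value class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // The value class name appears in the method descriptor: a candidate for a
    // scalarized (fields-in-registers) calling convention across the call boundary.
    static int dist2(Point p) { return p.x * p.x + p.y * p.y; }

    // Only the interface name appears in the descriptor: the value-ness of the
    // concrete implementation (anonymous or otherwise) is invisible at the boundary.
    static void run(Runnable r) { r.run(); }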
Also note that if it's necessary to opt in anyway, it's not particularly much to ask these performance-sensitive users to declare a local class rather than an anonymous value class. From forax at univ-mlv.fr Tue Jun 7 09:55:24 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Tue, 7 Jun 2022 11:55:24 +0200 (CEST) Subject: Anonymous value classes In-Reply-To: <14DA7BC8-1E0D-4C29-8AE3-AE58D768E83D@oracle.com> References: <05B5F014-70A0-4F4A-8C2C-426FFB16F61C@oracle.com> <59c29885-c6a1-a00e-475c-ca0d24aee844@oracle.com> <643250617.1814571.1654277984929.JavaMail.zimbra@u-pem.fr> <15cb91b4-a951-9cf7-59b0-656255cc70b0@oracle.com> <1106698295.2016847.1654335231501.JavaMail.zimbra@u-pem.fr> <14DA7BC8-1E0D-4C29-8AE3-AE58D768E83D@oracle.com> Message-ID: <869467424.3340290.1654595724962.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "daniel smith" > To: "Remi Forax" > Cc: "Maurizio Cimadamore" , "Brian Goetz" , > "valhalla-spec-experts" > Sent: Monday, June 6, 2022 7:56:40 PM > Subject: Re: Anonymous value classes >> On Jun 4, 2022, at 3:33 AM, forax at univ-mlv.fr wrote: >> >> there is a lot of libraries that have APIs using interfaces that are implemented >> by anonymous classes, the collection API is one of them, fluent loggers >> (anything fluent in fact) is another, and those will benefit to have better >> than escape analysis performance. > > This could use validation. My very high-level sense is that within inlined code, > escape analysis will do just fine with identity classes, with no observable > performance gain when switching to a value class. In practice, escape analysis is weaker than what you think. That's why we need Valhalla in the first place, i believe John has written a text about why Escape Analysis is not good enough at the start of the project. > *Across calls*, we can do > much better with value classes, but at that point current HotSpot optimizations > need a name in the descriptor. (Huge caveat that my understanding of this > situation is very high-level, and there may be important things I'm missing.) Anonymous class are only anonymous for the Java code not for the VM, javac desugars anonymous classes to real classes with a funny names full or '$' so the VM considers them as real classes. > > Also note that if it's necessary to opt in anyway, it's not particularly much to > ask these performance-sensitive users to declare a local class rather than an > anonymous value class. I understand the initial reaction in front of a new value AbstractList<>() { ... } We can single out anonymous class as as you propose, but not if it's based on a misunderstanding (anonymous class name can not appear in descriptor) or because the syntax is a kind of ugly (the whole anonymous class syntax is ugly). R?mi From brian.goetz at oracle.com Mon Jun 13 23:04:39 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 13 Jun 2022 19:04:39 -0400 Subject: User model stacking: current status In-Reply-To: <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> Message-ID: <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> I've done a little more shaking of this tree.? It involves keeping the notion that the non-identity buckets differ only in the treatment of their val projection, but makes a further normalization that enables the buckets to mostly collapse away. 
"value class X" means: ?- Instances are identity-free ?- There are two types, X.ref (reference, nullable) and X.val (direct, non-nullable) ?- Reference types are atomic, as always ?- X is an alias for X.ref Now, what is the essence of B2?? B2 means not "I hate zeros", but "I don't like that uninitialized variables are initialized to zero."? It doesn't mean the .val projection is meaningless, it means that we don't trust arbitrary clients with it.? So, we can make a slight adjustment: ?- The .val type is always there, but for "B2" classes, it is *inaccessible outside the nest*, as per ordinary accessibility. This means that within the nest, code that understands the restrictions can, say, create `new X.val[7]` and expose it as an `X[]`, as long as it doesn't let the zero escape.? This gives B2 classes a lot more latitude to use the .val type in safe ways. Basically: if you don't trust people with the .val type, don't let the val type escape. There's a bikeshed to paint, but it might look something like: ??? value class B2 { ??????? private class val { } ??? } or, flipping the default: ??? value class B3a { ??????? public class val { } ??? } So B2 is really a B3a whose value projection is encapsulated. The other bucket, B3n, I think can live with a modifier: ??? non-atomic value class B3n { } While these are all the same buckets as before, this feels much more like "one new bucket" (the `non-atomic` modifier is like `volatile` on a field; we don't think of this as creating a different bucket of fields.) Summary: ??? class B1 { } ??? value class B2 { private class val { } } ??? value class B3a { } ??? non-atomic value class B3n { } Value class here is clearly the star of the show; all value classes are treated uniformly (ref-default, have a val); some value classes encapsulate the val type; some value classes further relax the integrity requirements of instances on the heap, to get better flattening and performance, when their semantics don't require it. It's an orthogonal choice whether the default is "val is private" and "val is public". On 6/3/2022 3:14 PM, Brian Goetz wrote: > Continuing to shake this tree. > > I'm glad we went through the exploration of "flattenable B3.ref"; > while I think we probably could address the challenges of tearing > across the null channel / data channels boundary, I'm pretty willing > to let this one go.? Similarly I'm glad we went through the "atomicity > orthogonal to buckets" exploration, and am ready to let that one go too. > > What I'm not willing to let go of us making atomicity explicit in the > model.? Not only is piggybacking non-atomicity on something like > val-ness too subtle and surprising, but non-atomicity seems like it is > a property that the class author needs to ask for.? Flatness is an > important benefit, but only when it doesn't get in the way of safety. > > Recall that we have three different representation techniques: > > ?- no-flat -- use a pointer > ?- low-flat -- for sufficiently small (depending on size of atomic > instructions provided by the hardware) values, pack multiple fields > into a single, atomically accessed unit. > ?- full-flat -- flatten the layout, access individual individual > fields directly, may allow tearing. > > The "low-flat" bucket got some attention recently when we discovered > that there are usable 128-bit atomics on Intel (based on a recent > revision of the chip spec), but this is not a slam-dunk; it requires > some serious compiler heroics to pack multiple values into single > accesses.? 
But there may be targets of opportunity here for > single-field values (like Optional) or final fields.? And we can > always fall back to no-flat whenever the VM feels like it. > > One of the questions that has been raised is how similar B3.ref is to > B2, specifically with respect to atomicity. We've gone back and forth > on this. > > Having shaken the tree quite a bit, what feels like the low energy > state to me right now is: > > ?- The ref type of all on-identity classes are treated uniformly; > B3.ref and B2.ref are translated the same, treated the same, have the > same atomicity, the same nullity, etc. > ?- The only difference across the spectrum of non-identity classes is > the treatment of the val type.? For B2, this means the val type is > *illegal*; for B3, this means it is atomic; for B3n, it is non-atomic > (which in practice will mean more flatness.) > ?- (controversial) For all types, the ref type is the default.? This > means that some current value-based classes can migrate not only to > B2, but to B3 or B3n.? (And that we could migrate to B2 today and > further to B3 tomorrow.) > > While this is technically four flavors, I don't think it needs to feel > that complex.? I'll pick some obviously silly modifiers for exposition: > > ?- class B1 { } > ?- zero-hostile value class B2 { } > ?- value class B3 { } > ?- tearing-happy value class B3n { } > > In other words: one new concept ("value class"), with two > sub-modifiers (zero-hostile, and tearing-happy) which affect the > behavior of the val type (forbidden for B2, loosened for B3n.) > > For heap flattening, what this gets us is: > > ?- B1 -- no-flat > ?- B2, B3.ref, B3n.ref -- low-flat atomic (with null channel) > ?- B3 -- low-flat (atomic, no null channel) > ?- B3n -- full-flat (non-atomic, no null channel) > > This is a slight departure from earlier tree-shakings with respect to > tearing.? In particular, refs do not tear at all, so programs that use > all refs will never see tearing (but it is still possible to get a > torn value using .val and then box that into a ref.) > > If you turn this around, the declaration-site decision tree becomes: > > ?- Do I need identity (mutability, subclassing, aliasing)? Then B1. > ?- Are uninitialized values unacceptable?? Then B2. > ?- Am I willing to tolerate tearing to enable more flattening?? Then B3n. > ?- Otherwise, B3. > > And the use-site decision tree becomes: > > ?- For B1, B2 -- no choices to make. > ?- Do I need nullity?? Then .ref > ?- Do I need atomicity, and the class doesn't already provide it?? > Then .ref > ?- Otherwise, can use .val > > The main downside of making ref the default is that people will > grumble about having to say .val at the use site all the time.? And > they will!? And it does feel a little odd that you have to opt into > val-ness at both the declaration and use sites.? But it unlocks a lot > of things (see Kevin's list for more): > > ?- The default name is the safest version. > ?- Every unadorned name works the same way; it's always a reference > type.? You don't need to maintain a mental database around "which kind > of name is this". > ?- Migration from B1 -> B2 -> B3 is possible.? This is huge (and more > than we had hoped for when we started this game.) > > (The one thing to still worry about is that while refs can't tear, you > can still observe a torn value through a ref, if someone tore it and > then boxed it.? I don't see how we defend against this, but the > non-atomic label should be enough of a warning.) 
> > > > On 5/6/2022 10:04 AM, Brian Goetz wrote: >> In this model, (non-atomic B3).ref takes the place of (non-atomic B2) >> in the stacking I've been discussing.? Is that what you're saying? >> >> ??? class B1 { }? // ref, identity, atomic >> ??? value-based class B2 { }? // ref, non-identity, atomic >> ??? [ non-atomic ] value class B3 { }? // ref or val, zero is ok, >> both projections share atomicity >> >> If we go with ref-default, then this is a small leap from yesterday's >> stacking, because "B3" and "B2" are both reference types, so if you >> want a tearable, non-atomic reference type, saying `non-atomic value >> class B3` and then just using B3 gets you that. Then: >> >> ?- B2 is like B1, minus identity >> ?- B3 means "uninitialized values are OK, you get two types, a >> zero-default and a non-default" >> ?- Non-atomicity is an extra property we can add to B3, to get more >> flattening in exchange for less integrity >> ?- The use cases for non-atomic B2 are served by non-atomic B3 (when >> .ref is the default) >> >> I think this still has the properties I want; I can freely choose the >> reasonable subsets of { identity, has-zero, nullable, atomicity } >> that I want; the orthogonality of non-atomic across buckets becomes >> orthogonality of non-atomic with nullity, and the "B3.ref is just >> like B2" is shown to be the "false friend." >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From forax at univ-mlv.fr Tue Jun 14 07:13:05 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 14 Jun 2022 09:13:05 +0200 (CEST) Subject: User model stacking: current status In-Reply-To: <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> Message-ID: <187309785.7141993.1655190785532.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "daniel smith" > Cc: "valhalla-spec-experts" > Sent: Tuesday, June 14, 2022 1:04:39 AM > Subject: Re: User model stacking: current status > I've done a little more shaking of this tree. It involves keeping the notion > that the non-identity buckets differ only in the treatment of their val > projection, but makes a further normalization that enables the buckets to > mostly collapse away. > "value class X" means: > - Instances are identity-free > - There are two types, X.ref (reference, nullable) and X.val (direct, > non-nullable) > - Reference types are atomic, as always > - X is an alias for X.ref > Now, what is the essence of B2? B2 means not "I hate zeros", but "I don't like > that uninitialized variables are initialized to zero." It doesn't mean the .val > projection is meaningless, it means that we don't trust arbitrary clients with > it. So, we can make a slight adjustment: > - The .val type is always there, but for "B2" classes, it is *inaccessible > outside the nest*, as per ordinary accessibility. > This means that within the nest, code that understands the restrictions can, > say, create `new X.val[7]` and expose it as an `X[]`, as long as it doesn't let > the zero escape. This gives B2 classes a lot more latitude to use the .val type > in safe ways. Basically: if you don't trust people with the .val type, don't > let the val type escape. I don't trust myself with a B2.val. The val type for B2 should not exist at all, otherwise any libraries using the reflection can do getClass() on a X.val[] (even typed as a X[]). 
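(A sketch of the reflective observation being described; 'X' and 'makeFlatArray' are hypothetical:)

    // Some nest-internal code creates a flat X.val[] and exposes it under the ref-typed interface:
    X[] xs = X.makeFlatArray(10);                           // static type is X[]
    // Any library can still observe the encapsulated val type reflectively:
    Class<?> component = xs.getClass().getComponentType();  // reports the X.val component type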
> There's a bikeshed to paint, but it might look something like: > value class B2 { > private class val { } > } > or, flipping the default: > value class B3a { > public class val { } > } > So B2 is really a B3a whose value projection is encapsulated. and here you lost me, .ref and .val are supposed to be projection types not classes, at runtime there is only one class. > The other bucket, B3n, I think can live with a modifier: > non-atomic value class B3n { } > While these are all the same buckets as before, this feels much more like "one > new bucket" (the `non-atomic` modifier is like `volatile` on a field; we don't > think of this as creating a different bucket of fields.) yes ! > Summary: > class B1 { } > value class B2 { private class val { } } > value class B3a { } > non-atomic value class B3n { } > Value class here is clearly the star of the show; all value classes are treated > uniformly (ref-default, have a val); some value classes encapsulate the val > type; some value classes further relax the integrity requirements of instances > on the heap, to get better flattening and performance, when their semantics > don't require it. > It's an orthogonal choice whether the default is "val is private" and "val is > public". It makes B2.val a reality, but B3 has no sane default value otherwise it's a B3, so B2.val should not exist. regards, R?mi > On 6/3/2022 3:14 PM, Brian Goetz wrote: >> Continuing to shake this tree. >> I'm glad we went through the exploration of "flattenable B3.ref"; while I think >> we probably could address the challenges of tearing across the null channel / >> data channels boundary, I'm pretty willing to let this one go. Similarly I'm >> glad we went through the "atomicity orthogonal to buckets" exploration, and am >> ready to let that one go too. >> What I'm not willing to let go of us making atomicity explicit in the model. Not >> only is piggybacking non-atomicity on something like val-ness too subtle and >> surprising, but non-atomicity seems like it is a property that the class author >> needs to ask for. Flatness is an important benefit, but only when it doesn't >> get in the way of safety. >> Recall that we have three different representation techniques: >> - no-flat -- use a pointer >> - low-flat -- for sufficiently small (depending on size of atomic instructions >> provided by the hardware) values, pack multiple fields into a single, >> atomically accessed unit. >> - full-flat -- flatten the layout, access individual individual fields directly, >> may allow tearing. >> The "low-flat" bucket got some attention recently when we discovered that there >> are usable 128-bit atomics on Intel (based on a recent revision of the chip >> spec), but this is not a slam-dunk; it requires some serious compiler heroics >> to pack multiple values into single accesses. But there may be targets of >> opportunity here for single-field values (like Optional) or final fields. And >> we can always fall back to no-flat whenever the VM feels like it. >> One of the questions that has been raised is how similar B3.ref is to B2, >> specifically with respect to atomicity. We've gone back and forth on this. >> Having shaken the tree quite a bit, what feels like the low energy state to me >> right now is: >> - The ref type of all on-identity classes are treated uniformly; B3.ref and >> B2.ref are translated the same, treated the same, have the same atomicity, the >> same nullity, etc. 
>> - The only difference across the spectrum of non-identity classes is the >> treatment of the val type. For B2, this means the val type is *illegal*; for >> B3, this means it is atomic; for B3n, it is non-atomic (which in practice will >> mean more flatness.) >> - (controversial) For all types, the ref type is the default. This means that >> some current value-based classes can migrate not only to B2, but to B3 or B3n. >> (And that we could migrate to B2 today and further to B3 tomorrow.) >> While this is technically four flavors, I don't think it needs to feel that >> complex. I'll pick some obviously silly modifiers for exposition: >> - class B1 { } >> - zero-hostile value class B2 { } >> - value class B3 { } >> - tearing-happy value class B3n { } >> In other words: one new concept ("value class"), with two sub-modifiers >> (zero-hostile, and tearing-happy) which affect the behavior of the val type >> (forbidden for B2, loosened for B3n.) >> For heap flattening, what this gets us is: >> - B1 -- no-flat >> - B2, B3.ref, B3n.ref -- low-flat atomic (with null channel) >> - B3 -- low-flat (atomic, no null channel) >> - B3n -- full-flat (non-atomic, no null channel) >> This is a slight departure from earlier tree-shakings with respect to tearing. >> In particular, refs do not tear at all, so programs that use all refs will >> never see tearing (but it is still possible to get a torn value using .val and >> then box that into a ref.) >> If you turn this around, the declaration-site decision tree becomes: >> - Do I need identity (mutability, subclassing, aliasing)? Then B1. >> - Are uninitialized values unacceptable? Then B2. >> - Am I willing to tolerate tearing to enable more flattening? Then B3n. >> - Otherwise, B3. >> And the use-site decision tree becomes: >> - For B1, B2 -- no choices to make. >> - Do I need nullity? Then .ref >> - Do I need atomicity, and the class doesn't already provide it? Then .ref >> - Otherwise, can use .val >> The main downside of making ref the default is that people will grumble about >> having to say .val at the use site all the time. And they will! And it does >> feel a little odd that you have to opt into val-ness at both the declaration >> and use sites. But it unlocks a lot of things (see Kevin's list for more): >> - The default name is the safest version. >> - Every unadorned name works the same way; it's always a reference type. You >> don't need to maintain a mental database around "which kind of name is this". >> - Migration from B1 -> B2 -> B3 is possible. This is huge (and more than we had >> hoped for when we started this game.) >> (The one thing to still worry about is that while refs can't tear, you can still >> observe a torn value through a ref, if someone tore it and then boxed it. I >> don't see how we defend against this, but the non-atomic label should be enough >> of a warning.) >> On 5/6/2022 10:04 AM, Brian Goetz wrote: >>> In this model, (non-atomic B3).ref takes the place of (non-atomic B2) in the >>> stacking I've been discussing. Is that what you're saying? >>> class B1 { } // ref, identity, atomic >>> value-based class B2 { } // ref, non-identity, atomic >>> [ non-atomic ] value class B3 { } // ref or val, zero is ok, both projections >>> share atomicity >>> If we go with ref-default, then this is a small leap from yesterday's stacking, >>> because "B3" and "B2" are both reference types, so if you want a tearable, >>> non-atomic reference type, saying `non-atomic value class B3` and then just >>> using B3 gets you that. 
Then: >>> - B2 is like B1, minus identity >>> - B3 means "uninitialized values are OK, you get two types, a zero-default and a >>> non-default" >>> - Non-atomicity is an extra property we can add to B3, to get more flattening in >>> exchange for less integrity >>> - The use cases for non-atomic B2 are served by non-atomic B3 (when .ref is the >>> default) >>> I think this still has the properties I want; I can freely choose the reasonable >>> subsets of { identity, has-zero, nullable, atomicity } that I want; the >>> orthogonality of non-atomic across buckets becomes orthogonality of non-atomic >>> with nullity, and the "B3.ref is just like B2" is shown to be the "false >>> friend." From brian.goetz at oracle.com Tue Jun 14 13:16:41 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 14 Jun 2022 09:16:41 -0400 Subject: User model stacking: current status In-Reply-To: <187309785.7141993.1655190785532.JavaMail.zimbra@u-pem.fr> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> <187309785.7141993.1655190785532.JavaMail.zimbra@u-pem.fr> Message-ID: <55fa3f2f-7571-b94c-0b90-2bbeb7937a6a@oracle.com> > The val type for B2 should not exist at all > > So B2 is really a B3a whose value projection is encapsulated. > > > and here you lost me, .ref and .val are supposed to be projection > types not classes, at runtime there is only one class. And apparently I have to say this again .... It's fine to not understand what is being proposed. If so, ask questions, or think about it for a few days before responding. But it's Not OK to jump to dogmatic "should not" / "wrong" pronouncements before you understand what is being proposed. That's just unhelpful. > > > Summary: > > class B1 { } > value class B2 { private class val { } } > value class B3a { } > non-atomic value class B3n { } > > Value class here is clearly the star of the show; all value > classes are treated uniformly (ref-default, have a val); some > value classes encapsulate the val type; some value classes further > relax the integrity requirements of instances on the heap, to get > better flattening and performance, when their semantics don't > require it. > > It's an orthogonal choice whether the default is "val is private" > and "val is public". > > > It makes B2.val a reality, but B3 has no sane default value otherwise > it's a B3, so B2.val should not exist. > Let me try explaining again. All value types have .ref and .val types. They have the properties we've been discussing for a long time: ref types are references, and are therefore nullable and atomic; val types are direct values, are not nullable, and are _not necessarily_ atomic. We've been describing B2 classes as those with "no good default", but that doesn't mean that they can't have a .val type. It means we *can't trust arbitrary code to properly initialize a B2.val type.* Once initialized, B2.val is fine, and has the benefit of greater flatness. We explored language and VM features to ensure B2.val types are properly initialized, but that ran into the rocks. But we can allow the B2 class itself to mediate access to the .val type. This has two benefits: - We can get back some of the benefit of flattening B2.val types - Uniformity Here are two examples of where a B2 class could safely and beneficially use B2.val:

    value class Rational {
Rational[] harmonicSeq(int n) {
              Rational.val[] rs = new Rational.val[n];
              for (int i=0; i<n; i++)
                  rs[i] = new Rational(1, n);
              return rs;
          }
     }

Here, we've made a _flat_ array of Rational.val, properly initialized it, and returned it to the user. The user gets the benefit of flatness, but can't screw it up, because of the array store check. If Rational.val were illegal, then no array of rationals could be flat.

Similarly, a nestmate could take advantage of packing:

     value class Complex {
         value class C3 {
             Complex.val x, y, z;
             ...
         }
         C3 c3(Complex x, Complex y, Complex z) { return new C3(x, y, z); }
     }

C3 gets the benefit of full flattening, which it can do because it's in the nest; it can share the flattened instances safely with code outside the nest. (Access control is a powerful thing.)

From brian.goetz at oracle.com Tue Jun 14 13:29:02 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 14 Jun 2022 09:29:02 -0400 Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> <187309785.7141993.1655190785532.JavaMail.zimbra@u-pem.fr> <55fa3f2f-7571-b94c-0b90-2bbeb7937a6a@oracle.com> Message-ID: > And with Rational.val requiring atomic access, we can only flatten it > if the underlying HW supports it (in this case, 2 ints fit nicely in > 64 bits so we're good). Larger .val's can only be flattened if marked > as "non-atomic" (the B3n case). And because there's no tearing, > handing out the flattened Rational.val[] is safe. > > Do I have that right? Correct. If Rational were non-atomic, the values could tear, but the array would still be null/init-safe. I'd add there is one more thing going on that makes handing out the Rational.val[] as a Rational[] safe: the user can't put zeroes in it, because they can't construct the zero. (And they can't put nulls in it, because the array store check will prevent that.) From brian.goetz at oracle.com Tue Jun 14 14:19:35 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 14 Jun 2022 10:19:35 -0400 Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> Message-ID: It took me a while to understand your concern, but I think I have it now -- it is that we're effectively doing separate access control on LFoo and QFoo. At the language level this is no problem, but the VM needs a story here. Is this the whole of your concern, or is there more? >> - The .val type is always there, but for "B2" classes, it is *inaccessible outside the nest*, as per ordinary accessibility. > Is this the first time we'll be checking nest mate accessibility at > class creation? If so (and I think it is) we'll need to update the > spec to define when the nest mates + nest host can be loaded to > complete this check in the (already complicated) class loading > process. > > The case I'm thinking of is needing to do the accessibility check on > the defining class of a static field (and possibly an instance field) > when defining a class like: > > class Foo { > static QRational myRational; > } > > To know if Foo can have a field of Rational.val, we need to check both > Foo and Rational are in the same nest. First you need to check that Rational is accessible, and *then* you need to check that QRational satisfies the additional accessibility requirements, based on the public/package/private accessibility of the Q type. Right? > This will require additional > class loads mitigated somewhat by the existing rules for preloading > Qs. So maybe we can do the nest check there? We'll probably need to > make this explicit in the spec that these additional classes can be > loaded as part of the accessibility check during class definition. > > Another option would be to delay the nest check until either the <clinit> or instance <init> methods, or until the "new" bytecode? I like that > less but it may be easier to fit into the spec.
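The array store check being relied on above is not new machinery; it is the same runtime check that already protects today's covariant reference arrays. A minimal, runnable sketch in plain Java (no Valhalla syntax) of a more specific array handed out under a broader type:

    public class StoreCheckDemo {
        public static void main(String[] args) {
            String[] strings = {"a", "b", "c"};
            Object[] exposed = strings;          // covariance: a String[] viewed as Object[]
            try {
                exposed[0] = Integer.valueOf(1); // the runtime array store check rejects this
            } catch (ArrayStoreException e) {
                System.out.println("store rejected: " + e);
            }
            System.out.println(strings[0]);      // still "a"; the array was never corrupted
        }
    }

The Valhalla analogue under discussion would similarly keep a flat Rational.val[] free of nulls when it is handed out as a Rational[].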
> > --Dan > >> This means that within the nest, code that understands the restrictions can, say, create `new X.val[7]` and expose it as an `X[]`, as long as it doesn't let the zero escape. This gives B2 classes a lot more latitude to use the .val type in safe ways. Basically: if you don't trust people with the .val type, don't let the val type escape. >> >> There's a bikeshed to paint, but it might look something like: >> >> value class B2 { >> private class val { } >> } >> >> or, flipping the default: >> >> value class B3a { >> public class val { } >> } >> >> So B2 is really a B3a whose value projection is encapsulated. >> >> The other bucket, B3n, I think can live with a modifier: >> >> non-atomic value class B3n { } >> >> While these are all the same buckets as before, this feels much more like "one new bucket" (the `non-atomic` modifier is like `volatile` on a field; we don't think of this as creating a different bucket of fields.) >> >> Summary: >> >> class B1 { } >> value class B2 { private class val { } } >> value class B3a { } >> non-atomic value class B3n { } >> >> Value class here is clearly the star of the show; all value classes are treated uniformly (ref-default, have a val); some value classes encapsulate the val type; some value classes further relax the integrity requirements of instances on the heap, to get better flattening and performance, when their semantics don't require it. >> >> It's an orthogonal choice whether the default is "val is private" and "val is public". >> >> >> >> On 6/3/2022 3:14 PM, Brian Goetz wrote: >> >> Continuing to shake this tree. >> >> I'm glad we went through the exploration of "flattenable B3.ref"; while I think we probably could address the challenges of tearing across the null channel / data channels boundary, I'm pretty willing to let this one go. Similarly I'm glad we went through the "atomicity orthogonal to buckets" exploration, and am ready to let that one go too. >> >> What I'm not willing to let go of us making atomicity explicit in the model. Not only is piggybacking non-atomicity on something like val-ness too subtle and surprising, but non-atomicity seems like it is a property that the class author needs to ask for. Flatness is an important benefit, but only when it doesn't get in the way of safety. >> >> Recall that we have three different representation techniques: >> >> - no-flat -- use a pointer >> - low-flat -- for sufficiently small (depending on size of atomic instructions provided by the hardware) values, pack multiple fields into a single, atomically accessed unit. >> - full-flat -- flatten the layout, access individual individual fields directly, may allow tearing. >> >> The "low-flat" bucket got some attention recently when we discovered that there are usable 128-bit atomics on Intel (based on a recent revision of the chip spec), but this is not a slam-dunk; it requires some serious compiler heroics to pack multiple values into single accesses. But there may be targets of opportunity here for single-field values (like Optional) or final fields. And we can always fall back to no-flat whenever the VM feels like it. >> >> One of the questions that has been raised is how similar B3.ref is to B2, specifically with respect to atomicity. We've gone back and forth on this. 
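For readers who want "tearing" made concrete: the JMM (17.7) already permits non-volatile long and double accesses to be split into two 32-bit halves, so a racy reader may observe a value neither thread ever wrote. A minimal harness in today's Java (it may print nothing on VMs where 64-bit accesses happen to be atomic; the point is that the spec allows the tear, which is the integrity a non-atomic full-flat layout trades away):

    public class TearingDemo {
        static long word = 0L;                     // deliberately NOT volatile

        public static void main(String[] args) throws InterruptedException {
            Thread writer = new Thread(() -> {
                for (long i = 0; i < 1_000_000; i++) {
                    // alternate between two patterns whose 32-bit halves differ
                    word = (i % 2 == 0) ? 0L : 0xFFFF_FFFF_FFFF_FFFFL;
                }
            });
            writer.start();
            for (int i = 0; i < 1_000_000; i++) {
                long seen = word;                  // racy read; the JMM allows it to be torn
                if (seen != 0L && seen != -1L) {
                    System.out.println("torn read observed: " + Long.toHexString(seen));
                }
            }
            writer.join();
        }
    }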
>> >> Having shaken the tree quite a bit, what feels like the low energy state to me right now is: >> >> - The ref type of all on-identity classes are treated uniformly; B3.ref and B2.ref are translated the same, treated the same, have the same atomicity, the same nullity, etc. >> - The only difference across the spectrum of non-identity classes is the treatment of the val type. For B2, this means the val type is *illegal*; for B3, this means it is atomic; for B3n, it is non-atomic (which in practice will mean more flatness.) >> - (controversial) For all types, the ref type is the default. This means that some current value-based classes can migrate not only to B2, but to B3 or B3n. (And that we could migrate to B2 today and further to B3 tomorrow.) >> >> While this is technically four flavors, I don't think it needs to feel that complex. I'll pick some obviously silly modifiers for exposition: >> >> - class B1 { } >> - zero-hostile value class B2 { } >> - value class B3 { } >> - tearing-happy value class B3n { } >> >> In other words: one new concept ("value class"), with two sub-modifiers (zero-hostile, and tearing-happy) which affect the behavior of the val type (forbidden for B2, loosened for B3n.) >> >> For heap flattening, what this gets us is: >> >> - B1 -- no-flat >> - B2, B3.ref, B3n.ref -- low-flat atomic (with null channel) >> - B3 -- low-flat (atomic, no null channel) >> - B3n -- full-flat (non-atomic, no null channel) >> >> This is a slight departure from earlier tree-shakings with respect to tearing. In particular, refs do not tear at all, so programs that use all refs will never see tearing (but it is still possible to get a torn value using .val and then box that into a ref.) >> >> If you turn this around, the declaration-site decision tree becomes: >> >> - Do I need identity (mutability, subclassing, aliasing)? Then B1. >> - Are uninitialized values unacceptable? Then B2. >> - Am I willing to tolerate tearing to enable more flattening? Then B3n. >> - Otherwise, B3. >> >> And the use-site decision tree becomes: >> >> - For B1, B2 -- no choices to make. >> - Do I need nullity? Then .ref >> - Do I need atomicity, and the class doesn't already provide it? Then .ref >> - Otherwise, can use .val >> >> The main downside of making ref the default is that people will grumble about having to say .val at the use site all the time. And they will! And it does feel a little odd that you have to opt into val-ness at both the declaration and use sites. But it unlocks a lot of things (see Kevin's list for more): >> >> - The default name is the safest version. >> - Every unadorned name works the same way; it's always a reference type. You don't need to maintain a mental database around "which kind of name is this". >> - Migration from B1 -> B2 -> B3 is possible. This is huge (and more than we had hoped for when we started this game.) >> >> (The one thing to still worry about is that while refs can't tear, you can still observe a torn value through a ref, if someone tore it and then boxed it. I don't see how we defend against this, but the non-atomic label should be enough of a warning.) >> >> >> >> On 5/6/2022 10:04 AM, Brian Goetz wrote: >> >> In this model, (non-atomic B3).ref takes the place of (non-atomic B2) in the stacking I've been discussing. Is that what you're saying? 
>> >> class B1 { } // ref, identity, atomic >> value-based class B2 { } // ref, non-identity, atomic >> [ non-atomic ] value class B3 { } // ref or val, zero is ok, both projections share atomicity >> >> If we go with ref-default, then this is a small leap from yesterday's stacking, because "B3" and "B2" are both reference types, so if you want a tearable, non-atomic reference type, saying `non-atomic value class B3` and then just using B3 gets you that. Then: >> >> - B2 is like B1, minus identity >> - B3 means "uninitialized values are OK, you get two types, a zero-default and a non-default" >> - Non-atomicity is an extra property we can add to B3, to get more flattening in exchange for less integrity >> - The use cases for non-atomic B2 are served by non-atomic B3 (when .ref is the default) >> >> I think this still has the properties I want; I can freely choose the reasonable subsets of { identity, has-zero, nullable, atomicity } that I want; the orthogonality of non-atomic across buckets becomes orthogonality of non-atomic with nullity, and the "B3.ref is just like B2" is shown to be the "false friend." >> >> >> From forax at univ-mlv.fr Tue Jun 14 16:06:19 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Tue, 14 Jun 2022 18:06:19 +0200 (CEST) Subject: User model stacking: current status In-Reply-To: <55fa3f2f-7571-b94c-0b90-2bbeb7937a6a@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> <187309785.7141993.1655190785532.JavaMail.zimbra@u-pem.fr> <55fa3f2f-7571-b94c-0b90-2bbeb7937a6a@oracle.com> Message-ID: <378455064.7555429.1655222779146.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "Remi Forax" , "valhalla-spec-experts" > > Cc: "daniel smith" , "valhalla-spec-experts" > > Sent: Tuesday, June 14, 2022 3:16:41 PM > Subject: Re: User model stacking: current status [...] >>> Summary: >>> class B1 { } >>> value class B2 { private class val { } } >>> value class B3a { } >>> non-atomic value class B3n { } >>> Value class here is clearly the star of the show; all value classes are treated >>> uniformly (ref-default, have a val); some value classes encapsulate the val >>> type; some value classes further relax the integrity requirements of instances >>> on the heap, to get better flattening and performance, when their semantics >>> don't require it. >>> It's an orthogonal choice whether the default is "val is private" and "val is >>> public". >> It makes B2.val a reality, but B3 has no sane default value otherwise it's a B3, >> so B2.val should not exist. > Let me try explaining again. > All value types have .ref and .val types. They have the properties we've been > discussing for a long time: ref types are references, and are therefore > nullable and atomic; val types are direct values, are not nullable, and are > _not necessarily_ atomic. > We've been describing B2 classes as those with "no good default", but that > doesn't mean that they can't have a .val type. It means we *can't trust > arbitrary code to properly initialize a B2.val type.* Once initialized, B2.val > is fine, and have the benefit of greater flatness. We explored language and VM > features to ensure B2.val types are properly initialized, but that ran into the > rocks. > But we can allow the B2 class itself to mediate access to the .val type. 
This > has two benefits: > - We can get back some of the benefit of flattening B2.val types > - Uniformity > Here are two examples of where a B2 class could safely and beneficially use > B2.val: > value class Rational { > Rational[] harmonicSeq(int n) { > Rational.val[] rs = new Rational.val[n]; > for (int i=0; i rs[i] = new Rational(1, n); > return rs; > } > } > Here, we've made a _flat_ array of Rational.val, properly initialized it, and > returned it to the user. THe user gets the benefit of flatness, but can't screw > it up, because of the array store check. If Rational.val were illegal, then no > array of rationals could be flat. but you are leaking Rational.val as the class of the array, so one can write var array = Rational. harmonicSeq(3); var array2 = Arrays.copyOf(array, array.length + 1); var defaultValue = array2[array2.length - 1]; Currently we live in a world where apart of using jdk.unsupported, there is no way to get an uninitialized object even by reflection, you propose to shatter that idea and to allow to bypass the constructor and have access to the default value directly in the language. I think it may be reasonable to allow access to Rational.val through a specific reflection API but it means disallowing some methods like Arrays.copyOf() and add more like createAValueArrayFromAnExistingRefArray(), but accessing directly to B2.val is just a big security hole. > Similarly, a nestmate could take advantage of packing: > value class Complex { > value class C3 { > Complex.val x, y, z; > ... > } > C3 c3(Complex x, Complex y, Complex z) { return new C3(x, y, z); } > } > C3 gets the benefit of full flattening, which it can do because its in the nest; > it can share the flattened instances safely with code outside the nest. > (Access control is powerful thing.) and it's hard to get it right. regards, R?mi -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian.goetz at oracle.com Tue Jun 14 16:28:47 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 14 Jun 2022 12:28:47 -0400 Subject: User model stacking: current status In-Reply-To: <378455064.7555429.1655222779146.JavaMail.zimbra@u-pem.fr> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> <187309785.7141993.1655190785532.JavaMail.zimbra@u-pem.fr> <55fa3f2f-7571-b94c-0b90-2bbeb7937a6a@oracle.com> <378455064.7555429.1655222779146.JavaMail.zimbra@u-pem.fr> Message-ID: <12d768b8-8ca1-d979-e495-7d5d08d35ded@oracle.com> > > Here are two examples of where a B2 class could safely and > beneficially use B2.val: > > ???? value class Rational { > ????????? Rational[] harmonicSeq(int n) { > ????????????? Rational.val[] rs = new Rational.val[n]; > ????????????? for (int i=0; i ????????????????? rs[i] = new Rational(1, n); > ????????????? return rs; > ????????? } > ???? } > > Here, we've made a _flat_ array of Rational.val, properly > initialized it, and returned it to the user.? THe user gets the > benefit of flatness, but can't screw it up, because of the array > store check.? If Rational.val were illegal, then no array of > rationals could be flat. > > > but you are leaking Rational.val as the class of the array, so one can > write > ? var array = Rational.harmonicSeq(3); > ?var array2 = Arrays.copyOf(array, array.length + 1); > ?var defaultValue = array2[array2.length - 1]; I'm going to ask you, again, to choose your words more carefully. 
It seems you are suggesting "this model is bad because look, here's a hole."? If that's what you mean, this is a very inappropriate way to engage.? This is a high-level discussion of user model; to pick nits over bad assumptions about low-level details that haven't even been discussed yet, and then use to cast doubt over the design, is the height of unconstructive interaction. If what you mean is "This is great, but don't forget we'll have to address the accessibility model", then you should say that.? But OF COURSE we have to defend the accessibility model.? This is an obvious and normal part of the design process (in fact, John and I were talking about this not one hour ago.)? Don't forget that Object::getClass leaks class mirrors too!? But there are other checks to prevent bad things from happening. > Currently we live in a world where apart of using jdk.unsupported, > there is no way to get an uninitialized object even by reflection, > you propose to shatter that idea and to allow to bypass the > constructor and have access to the default value directly in the language. Words like "shatter" are unnecessarily inflammatory.? Worse, leveling inflammatory accusations before you've even understood what is being proposed, is completely unconstructive. Further, you have been working with us for long enough that you know that we don't casually undermine the safety of the programming model.? So if what we're proposing sounds dangerous, you should think first that maybe you don't fully understand it, before making accusations? Please -- do better. -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian.goetz at oracle.com Tue Jun 14 17:23:45 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 14 Jun 2022 13:23:45 -0400 Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> Message-ID: <484c248e-10f6-9b3b-9026-1db44bb35521@oracle.com> >> The concern is that we need Foo.nestHost == Rational.nestHost and that >> the common nestHost includes both Foo and Rational as nestMembers. To >> do that, we need to load the nestHost class (if it isn't already). >> Getting the interaction between the access check and the additional >> class loads right (and clearly spec'd) is my concern. > My assumption - which I'm starting to question - is that Foo is an > invalid class if it isn't a nestmate to Rational and that attempts to > load Foo should fail. > > Thinking about this more, there's a second model here which says Foo > is fine (after all we allow other classes to have fields of types they > can never fill in) but all attempts to resolve the 'myRational' field > will fail. This moves the nest mates check to resolution (similar to > existing nest checks) rather than during class definition. Is this > second model more what you had in mind? > Now its my turn to say you're ahead of me :) From a language perspective, if X is inccessible to Y, then ??? class Y { ??????? X x; ???? } will fail compilation.? If such a class sneaks by anyway, whether we reject it when we load Y or when we try to resolve field x, those seem mostly indistinguishable to me from a design perspective, since they'll never be able to do the bad thing, which is use x when it is uninitialized. 
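As a small, runnable reminder of why "reads an uninitialized field" is the failure mode being designed against: even in today's Java, class-initialization order can expose a static field's default value before its initializer has run. (The names here are purely illustrative.)

    public class DefaultValueDemo {
        static final int OBSERVED = peek();  // initializers run top to bottom: this runs first
        static int GOOD_DEFAULT = 42;        // not yet assigned when peek() executes

        static int peek() {
            return GOOD_DEFAULT;             // observes the field's default value, 0
        }

        public static void main(String[] args) {
            System.out.println(OBSERVED);      // prints 0, not 42
            System.out.println(GOOD_DEFAULT);  // prints 42
        }
    }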
But (and there's a whole conversation to be had here) it does mean that there is separate access control on LFoo vs QFoo, and we have to either prevent or detect leaks before they let us do something bad (like Y reflectively creating an array of X.val).? But this seems manageable, and not all the different from the sort of leak detection and plugging we do with reflection today. From brian.goetz at oracle.com Tue Jun 14 19:18:39 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 14 Jun 2022 15:18:39 -0400 Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> <484c248e-10f6-9b3b-9026-1db44bb35521@oracle.com> Message-ID: <09b8a72e-fdce-2599-4d47-b93dd6b717c8@oracle.com> On 6/14/2022 2:54 PM, Dan Heidinga wrote: >> But (and there's a whole conversation to be had here) it does mean that >> there is separate access control on LFoo vs QFoo, > Pulling on this thread a little, is it the class that has different > access control or something else? We've meandered a bit over the years on the distinction between the class Foo, the types Foo.ref and Foo.val, and their respective mirrors.? It's probably time for a check-in on where we are there. Today, Integer is a class, with a full-power mirror Integer.class; int is a type, with a limited mirror int.class, whose job is mostly limited to reflecting over field and method descriptors. With Valhalla, Point is a class, with types Point.ref and Point.val; Point.class is a full-power mirror for Point.ref, and Point.val.class is a limited mirror that is analogous to the int mirror.? If you ask a Point for its getClass(), it always returns Point.class.? It would not bother me if the Point.val.class mirror is fully limited, and the only way to do lookups (e.g., getMethods) is to switch over to the class mirror (for which we'd probably have a `Class::getPrimaryClass` method.) Having the two encode separate accessibilities sounds a little messy, but the various Lookup::checkAccess methods key off of a Class, so that seems a reasonable place to hang this information.? I would assume such checks would check both the primary class and then the secondary class, or we'd arrange that the primary mirror always was at least as accessible as the secondary mirror.? (Protected isn't a useful option, and public/package/private are suitably ordered.) At the language level, there is a question about how to present the val class.? One obvious (but possibly bad) idea is to pretend it is a special kind of nested class (which is supported by the Point.val naming convention): ??? value class Rational { ??????? private class val { } ??? } This is a little cheesy, but may fit mostly cleanly into user's mental models.?? In this case, the accessibility is going on something that looks like a class declaration at the source level, and which has a class mirror at the VM level, but it really a "shadow class", just like the class described by int.class (or String[].class.)? This might be OK. At one point I considered whether we could hang this on the accessibility of the no-arg constructor, but I quickly soured on this idea.? But that has an accessibility too. Or we could invent a new kind of member, to describe the val projection, and hang accessibility on that. > To create an identity object, we do access control on the both class > (public/package) and the constructor (public/package/private (aka > nest)). 
> To create a value object, we do nest mate access control (aka private) > on the bytecodes that create values (aconst_init / withfield). This > proposal extends the nest mates access check to the default values of > Qs. It's not just nestmate access control; it would be reasonable to declare the val as package-access, and trust your package mates too. > In both cases, we're looking at the access control of two things - the > class and the "creator of instances". Are we applying different > access control to LFoo vs QFoo, or to construction mechanisms? The thing we're trying to protect is creation of uninitialized heap-based instances.? But it felt a little weird to draw such a complex line; seemed simpler (and not giving up much) to access-control the type name.? But we can explore this some more. Maybe we are access-controlling the `defaultvalue` bytecode, since its effectively public if someone can create a flat array. -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.smith at oracle.com Tue Jun 14 23:40:46 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Tue, 14 Jun 2022 23:40:46 +0000 Subject: EG meeting *canceled*, 2022-06-15 Message-ID: I'm on vacation this week, so let's cancel the EG meeting. I'll also be gone next time (June 29), but I'll ask around about interest in meeting anyway. TBD. From brian.goetz at oracle.com Wed Jun 15 15:14:27 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 15 Jun 2022 11:14:27 -0400 Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> <484c248e-10f6-9b3b-9026-1db44bb35521@oracle.com> <09b8a72e-fdce-2599-4d47-b93dd6b717c8@oracle.com> Message-ID: >> With Valhalla, Point is a class, with types Point.ref and Point.val; Point.class is a full-power mirror for Point.ref, and Point.val.class is a limited mirror that is analogous to the int mirror. If you ask a Point for its getClass(), it always returns Point.class. It would not bother me if the Point.val.class mirror is fully limited, and the only way to do lookups (e.g., getMethods) is to switch over to the class mirror (for which we'd probably have a `Class::getPrimaryClass` method.) > That approach seems reasonable for Reflection. For MethodHandles, I > think we'll need to support MethodHandles.Lookup with both the L & Q > version to correctly type the receiver argument, at least for virtual > calls. Agreed.? Which is a straightforward extension of how we already handle the int/Integer story, except that now a receiver is involved too.? Will we need new API surface to allow this to be expressed? >> It's not just nestmate access control; it would be reasonable to declare the val as package-access, and trust your package mates too. > At the VM-level, classes are either public or package. While the > source code also allows specifying private or protected for > innerclasses, the VM doesn't use those bits for access control (though > Reflection sometimes does). That's part of why I'm reaching for > something else to hang the private bit on (and thus trigger the nest > check). > > I don't think we want to teach the VM to use the inner class flags for > *some* access checks as it will be easy to confuse when each set of > flags is used (long bug tail) and will lead to inconsistencies with > existing programs. 
> > Without some extra "thing" to hang the accessibility bits on, I don't > think we can express public / package / private (nest) in the existing > public/package bits used for classes at the VM level. OK, thanks for connecting those dots for me. From kevinb at google.com Wed Jun 15 16:41:14 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 15 Jun 2022 09:41:14 -0700 Subject: User model stacking: current status In-Reply-To: <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> Message-ID: All else being equal, the idea to use "inaccessible value type" over "value type doesn't exist" feels very good and simplifying, with the main problem that the syntax can't help but be gross. So this makes it a local maximum, but I am persistently troubled by at least 2 broader things. * It feels wrong to restrict access to the type only because of two very specific things we don't want people to do with the type. We don't want them to write `new TheType.val[size]`, and we don't want them to write `TheType.val someUnintializedField;`. Is there a third? And can we really not just prevent those specific things? It feels like baby/bathwater, especially since delayed initialization scenarios like those are already problematic in many ways as it is. * I still am saddled with the deep feeling that ultimate victory here looks like "we don't need a val type, because by capturing the nullness bit and tearability info alone we will make *enough* usage patterns always-optimizable, and we can live with the downsides". To me the upsides of this simplification are enormous, so if we really must reject it, I may need some help understanding why. It's been stated that a non-null value type means something slightly different from a non-null reference type, but I'm not convinced of this; it's just that sometimes you have the technical ability to conjure a "default" instance and sometimes you don't, but nullness of the type means what it means either way. * I think if we plan to go this way (.val), and then we one day have a nullable types feature, some things will then be permanently gross that I would hope we can avoid. For example, nullness *also* demands the concept of bidirectional projection of type variables, and for very overlapping reasons. This puts things in a super weird place. On Mon, Jun 13, 2022 at 4:35 PM Brian Goetz wrote: or, flipping the default: > > value class B3a { > public class val { } > } > (assuming we go this way) A minor point: if we wanted, we could provide a way for this to also *name* the value type, but not allow anything outside java.lang to use it. The benefit would be if it means the word "Integer.val" doesn't have to exist at all, and overall the more we can demystifies how the whole int/Integer business works the more people can understand value types by comparison to them. The drawback is "no fair, why can't we do it too", but the answer to that is easy and compelling and it's easy to see why Integer and friends deserve an exception to it. > It's an orthogonal choice whether the default is "val is private" and "val > is public". > "The default should always be to expose fewer capabilities to users and let them opt into what they actually need" -- earnestly, does anyone know a good counterexample to this rule? 
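The int/Integer split being appealed to here -- a full class mirror alongside a limited mirror whose main job is showing up in descriptors -- is easy to see with plain reflection today; a minimal sketch:

    import java.lang.reflect.Method;

    public class MirrorDemo {
        public static void main(String[] args) throws Exception {
            System.out.println(int.class.isPrimitive());       // true  -- the limited mirror
            System.out.println(Integer.class.isPrimitive());   // false -- the full class mirror
            System.out.println(int.class == Integer.TYPE);     // true  -- same limited mirror

            // The limited mirror's main job: appearing in method and field descriptors.
            Method parse = Integer.class.getMethod("parseInt", String.class);
            System.out.println(parse.getReturnType() == int.class);  // true
        }
    }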
An awkwardness of the default being private would just be that it's slightly confusing what is being accomplished by `class val { }` before you realize oh yeah, that's letting other classes in the package access it. Would that be justification for making the default be package visibility (or whatever it's really called), I'm not sure. > On 6/3/2022 3:14 PM, Brian Goetz wrote: > > Continuing to shake this tree. > > I'm glad we went through the exploration of "flattenable B3.ref"; while I > think we probably could address the challenges of tearing across the null > channel / data channels boundary, I'm pretty willing to let this one go. > Similarly I'm glad we went through the "atomicity orthogonal to buckets" > exploration, and am ready to let that one go too. > > What I'm not willing to let go of us making atomicity explicit in the > model. Not only is piggybacking non-atomicity on something like val-ness > too subtle and surprising, but non-atomicity seems like it is a property > that the class author needs to ask for. Flatness is an important benefit, > but only when it doesn't get in the way of safety. > > Recall that we have three different representation techniques: > > - no-flat -- use a pointer > - low-flat -- for sufficiently small (depending on size of atomic > instructions provided by the hardware) values, pack multiple fields into a > single, atomically accessed unit. > - full-flat -- flatten the layout, access individual individual fields > directly, may allow tearing. > > The "low-flat" bucket got some attention recently when we discovered that > there are usable 128-bit atomics on Intel (based on a recent revision of > the chip spec), but this is not a slam-dunk; it requires some serious > compiler heroics to pack multiple values into single accesses. But there > may be targets of opportunity here for single-field values (like Optional) > or final fields. And we can always fall back to no-flat whenever the VM > feels like it. > > One of the questions that has been raised is how similar B3.ref is to B2, > specifically with respect to atomicity. We've gone back and forth on > this. > > Having shaken the tree quite a bit, what feels like the low energy state > to me right now is: > > - The ref type of all on-identity classes are treated uniformly; B3.ref > and B2.ref are translated the same, treated the same, have the same > atomicity, the same nullity, etc. > - The only difference across the spectrum of non-identity classes is the > treatment of the val type. For B2, this means the val type is *illegal*; > for B3, this means it is atomic; for B3n, it is non-atomic (which in > practice will mean more flatness.) > - (controversial) For all types, the ref type is the default. This means > that some current value-based classes can migrate not only to B2, but to B3 > or B3n. (And that we could migrate to B2 today and further to B3 > tomorrow.) > > While this is technically four flavors, I don't think it needs to feel > that complex. I'll pick some obviously silly modifiers for exposition: > > - class B1 { } > - zero-hostile value class B2 { } > - value class B3 { } > - tearing-happy value class B3n { } > > In other words: one new concept ("value class"), with two sub-modifiers > (zero-hostile, and tearing-happy) which affect the behavior of the val type > (forbidden for B2, loosened for B3n.) 
> > For heap flattening, what this gets us is: > > - B1 -- no-flat > - B2, B3.ref, B3n.ref -- low-flat atomic (with null channel) > - B3 -- low-flat (atomic, no null channel) > - B3n -- full-flat (non-atomic, no null channel) > > This is a slight departure from earlier tree-shakings with respect to > tearing. In particular, refs do not tear at all, so programs that use all > refs will never see tearing (but it is still possible to get a torn value > using .val and then box that into a ref.) > > If you turn this around, the declaration-site decision tree becomes: > > - Do I need identity (mutability, subclassing, aliasing)? Then B1. > - Are uninitialized values unacceptable? Then B2. > - Am I willing to tolerate tearing to enable more flattening? Then B3n. > - Otherwise, B3. > > And the use-site decision tree becomes: > > - For B1, B2 -- no choices to make. > - Do I need nullity? Then .ref > - Do I need atomicity, and the class doesn't already provide it? Then > .ref > - Otherwise, can use .val > > The main downside of making ref the default is that people will grumble > about having to say .val at the use site all the time. And they will! And > it does feel a little odd that you have to opt into val-ness at both the > declaration and use sites. But it unlocks a lot of things (see Kevin's > list for more): > > - The default name is the safest version. > - Every unadorned name works the same way; it's always a reference type. > You don't need to maintain a mental database around "which kind of name is > this". > - Migration from B1 -> B2 -> B3 is possible. This is huge (and more than > we had hoped for when we started this game.) > > (The one thing to still worry about is that while refs can't tear, you can > still observe a torn value through a ref, if someone tore it and then boxed > it. I don't see how we defend against this, but the non-atomic label > should be enough of a warning.) > > > > On 5/6/2022 10:04 AM, Brian Goetz wrote: > > In this model, (non-atomic B3).ref takes the place of (non-atomic B2) in > the stacking I've been discussing. Is that what you're saying? > > class B1 { } // ref, identity, atomic > value-based class B2 { } // ref, non-identity, atomic > [ non-atomic ] value class B3 { } // ref or val, zero is ok, both > projections share atomicity > > If we go with ref-default, then this is a small leap from yesterday's > stacking, because "B3" and "B2" are both reference types, so if you want a > tearable, non-atomic reference type, saying `non-atomic value class B3` and > then just using B3 gets you that. Then: > > - B2 is like B1, minus identity > - B3 means "uninitialized values are OK, you get two types, a > zero-default and a non-default" > - Non-atomicity is an extra property we can add to B3, to get more > flattening in exchange for less integrity > - The use cases for non-atomic B2 are served by non-atomic B3 (when .ref > is the default) > > I think this still has the properties I want; I can freely choose the > reasonable subsets of { identity, has-zero, nullable, atomicity } that I > want; the orthogonality of non-atomic across buckets becomes orthogonality > of non-atomic with nullity, and the "B3.ref is just like B2" is shown to be > the "false friend." > > > > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From brian.goetz at oracle.com Wed Jun 15 17:51:06 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 15 Jun 2022 13:51:06 -0400 Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> Message-ID: <9d4f7e1b-e727-4205-907d-32353556e14c@oracle.com> > All else being equal, the idea to use "inaccessible value type" over > "value type doesn't exist" feels very good and simplifying, with the > main problem that the syntax can't help but be gross. Yep. > * It feels wrong to restrict access to the type only because of two > very specific things we don't want people to do with the type. We > don't want them to write `new TheType.val[size]`, and we don't want > them to write `TheType.val someUnintializedField;`. Is there a third? > And can we really not just prevent those specific things? It feels > like baby/bathwater, especially since delayed initialization scenarios > like those are already problematic in many ways as it is. This is where I started; that the thing really being protected against is _creating heap locations with their default value_, which is done by fields and by array creation expressions.? (When we get to specializable generics, there will be a third: using the .val class as a type parameter, since `Foo` might have uninitialized T-valued fields, so a `Foo` could have the same problem as any other class that declares a `P.val` field.) The reason I backed off is that this seemed (a) a hard border to explain, (b) calling users attention to low-level details like heap vs stack values, and (c) the value of being able to use P.val here-but-not-there is not all that strong. Hard to explain.? Understanding this requires understanding a lot of low-level things that many Java developers have never thought about, such as heap vs stack, or the fact that, even if a static field has an initializer, you can still observe its default value with unlucky timing.? Drawing the border to keep the zeroes out will require a lot of context, focus on details we would rather users not dwell on, and will invariably be perceived as a "weird" restriction.? Whereas "this type is private, you can't see it" seems pretty normal. Small incremental benefit.? This one requires appealing to the low-level details, but basically, the benefit of using P.val instead of P.ref in stack contexts (method parameter and returns, locals) appears to be pretty small, because of the excellent calling convention optimization that we get even on (preloaded) L-types of value classes.? The jury is out on "appears", but it is looking that way.? So the argument of "don't sweat the difference between P.ref and P.val except in the heap" seems reasonable.? \ Similarly, if P wants to, it can dispense safely constructed instances of P.val[], which are covariant to the exported P.ref[]. So this felt like a reasonable "worse is better" move, but I'm open to new ideas. > * I still am saddled with the deep feeling that ultimate victory here > looks like "we don't need a val type, because by capturing the > nullness bit and tearability info alone we will make /enough/ usage > patterns always-optimizable, and we can live with the downsides". To > me the upsides of this simplification are enormous, so if we really > must reject it, I may need some help understanding why. 
It's been > stated that a non-null value type means something slightly different > from a non-null reference type, but I'm not convinced of this; it's > just that sometimes you have the technical ability to conjure a > "default" instance and sometimes you don't, but nullness of the type > means what it means either way. Here's the chain of reasoning that works to get to this state of affairs. ?- Reference types don't tear.? The JMM gives us strong safety guarantees about references to objects with final fields.? We want this to work the same way for references to value objects as well as references to identity objects, because otherwise, the "immutability means thread safety" promise is undermined. ?- P.ref is a reference type; P.ref[] is an array of references. ?- For non-atomic value classes, P.val fields can tear under race (and similarly elements of P.val[]). ?- If we spelled .val as !, then switching from P[] to P![] not only prohibits null elements, but changes the layout and _introduces tearing_.? Hiding tearability behind "non-null" is likely to be a lifetime subscription to Astonishment Digest, since 99.9999 out of 100 Java developers will not be able to say "non-null, oh, that also means I sacrifice atomicity." The link you probably want to attack is this last one, where you are likely to say "well, that's what you opted into when you said `non-atomic`; you just happen to get atomicity for free with references, but that's a bonus." -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevinb at google.com Wed Jun 15 17:51:14 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 15 Jun 2022 10:51:14 -0700 Subject: Concerns about the plan for `==` Message-ID: What I think I understand so far: The current plan for `==` for all bucket 2+ types (except the 8 _primitive_ types, as I still use the word) is to have it perform a fieldwise `==` comparison: identity equality for bucket 1 fields, what it's always done for primitive fields, and of course recurse for the rest. If we consider that the broadest meaning of `a == b` has always been "a and b are definitely absolutely indistinguishable no matter what", then this plan seems to compatibly preserve that, which makes sense for purposes of transition. What concerns me: It's good for transition, at least on the surface, but it's a bad long-term outcome. Users hunger for a shorter way to write `.equals()`, and they will think this is it. I would not underestimate the pushback they will experience to writing it out the long way in cases where `==` at least *seems* to do the right thing. Because in some number of cases, it *will* do the same thing; specifically, if you can recurse through your fields and never hit a type that overrides equals(). This is extremely fragile. A legitimate change to one type can break these expectations for all the types directly or indirectly depending on it, no matter how far away. In supporting our Java users here, there's no good stance we can take on it: if we forbid this practice and require them to call `.equals`, we're being overzealous. If we try to help them use it carefully, at best users will stop seeing `Object==Object` as a code smell (as we have spent years training them to do) and then will start misusing it even for reference types again. btw, why did I say it's good for transition "on the surface"? 
Because for any class a user might migrate to bucket 2+, any existing calls to `==` in the wild are extremely suspect and *should* be revisited anyway; this is no less true here than it is for existing synchronization etc. code. What's an alternative?: I'm sure what I propose is flawed, but I hope the core arguments are compelling enough to at least help me fix it. The problem is that while we *can* retcon `==` as described above, it's not behavior anyone really *wants*. So instead we double down on the idea that non-primitive `==` has always been about identity and must continue to be. That means it has to be invalid for bucket 2+ (at compile-time for the .val type; failing later otherwise?). This would break some usages, but again, only at sites that deserve to be reconsidered anyway. Some bugs will get fixed in the process. And at least it's not the language upgrade itself that breaks them, only the specific decision to move some type to new bucket. Lastly, we don't need to break anyone abruptly; we can roll out warnings as I proposed in the email "We need help to migrate from bucket 1 to 2". A non-record class that forgets to override equals() from Object even upon migrating to bucket 2+ is also suspect. If nothing special is done, it would fail at runtime just like any other usage of `Foo.ref==Foo.ref`, and maybe that's fine. Again, I'm probably missing things, maybe even big things, but I'm just trying to start a discussion. And if this can't happen I am just searching for a solid understanding of why. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevinb at google.com Wed Jun 15 18:10:39 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 15 Jun 2022 11:10:39 -0700 Subject: User model stacking: current status In-Reply-To: <9d4f7e1b-e727-4205-907d-32353556e14c@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> <9d4f7e1b-e727-4205-907d-32353556e14c@oracle.com> Message-ID: On Wed, Jun 15, 2022 at 10:51 AM Brian Goetz wrote: - If we spelled .val as !, then switching from P[] to P![] not only > prohibits null elements, but changes the layout and _introduces tearing_. > Hiding tearability behind "non-null" is likely to be a lifetime > subscription to Astonishment Digest, since 99.9999 out of 100 Java > developers will not be able to say "non-null, oh, that also means I > sacrifice atomicity." > Well, that's what you opted into when you... wait a minute... > The link you probably want to attack is this last one, where you are > likely to say "well, that's what you opted into when you said `non-atomic`; > you just happen to get atomicity for free with references, but that's a > bonus." > Your Kevin's Brain Emulator has gotten pretty decent over time... check whether the next things it said were these (probably so): A good clean Basic Conceptual Model For Novices is allowed to have a bunch of asterisks, of the form "well, in $circumstance, this will be revealed to be totally false", and that's not always a strike against the model. How do we discern the difference between a good asterisk and a bad one? How common the circumstance; how recognizable as *being* a special circumstance; how disproportionate a truth discrepancy we're talking about; etc. I know I've said this before. 
If I'm in a class being taught how this stuff works, and the teacher says "Now unsafe concurrent code can break this in horrible ways, and in $otherClass you will learn what's really going on in the presence of data races" ... I feel fully satisfied by that. I know I won't get away with playing fast and loose with The Concurrency Rules; I'm not advanced enough and might never be. (Many people aren't but *don't *know it, and therein lies the problem, but do we really have much power to protect such people from themselves?) I could be wrong, but I suspect this kind of viewpoint might be more common and respected in the wider world than it is among the rarefied kind of individuals who join expert groups, no offense to anyone here meant. You're always going to see all the details, and you're always going to *want* to see all the details. The general public just hopes the details stay out of their way. When they don't, they have a bad day, but it doesn't mean they were better served by a complex model that tried to account for everything. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian.goetz at oracle.com Wed Jun 15 18:14:36 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 15 Jun 2022 14:14:36 -0400 Subject: Concerns about the plan for `==` In-Reply-To: References: Message-ID: <3753c56c-59c2-61fa-8b0f-32d21c939a85@oracle.com> The way I would interpret this is "Great, you just made == slightly more reliable, but *still unreliable*.? Now people will be even more likely to make mistakes with it."? Right? The basic problem stems from that fact that users want to think in terms of an `eq` operation, and the language needs a "primitive" equals which gets used at the base of the tower of overloaded equals() methods, but the language has given the good name to the thing that users usually don't want.? (Javascript bit the bullet and introduced `===` for this reason, though they had managed to bork up equality much more dramatically before they pulled this ripcord.) Let's also observe that there is currently not just one `==` operation, but nine; there is one for object references, and one for each primitive type, each with its own ad-hoc meaning.? If Valhalla is to deliver on the promise of "programmable primitives", not being able to compare "new primitives" with `==` doesn't feel like "works like an int." The obvious first counter-argument here is "then let me overload `==` for primitives, because otherwise, I can't even write classes like `float` which treat 0.00 as == to -0.00.? To which I might say "I agree, we need to address operator overloading, at least for user-definable numerics, anyway", at which point the current `==` is just the default for classes that don't override `==`. At which point you counter again with "Great, so I can do that for identity classes too?"? At which point we have to face the real problem. The root problem of `==` and `.equals()` is that *Object::equals was fundamentally the wrong design.*? Josh spends half a dozen Items in EJ about "how not to shoot your foot with equals".? And this comes from two roots: equals never should have been an instance method in the first place, and we want to be able to say that instances of C can sometimes be equals to instances of subclasses of C, under the right conditions.? This is not a stable design.? 
(An example of a more stable design is where equality is a witness to `Eq t`, where both sides have to agree on the definition before they can be compared.)? But that runs into the desire for equality across subclasses.? The design for equality, in the context of extensible classes, is on a collision course with reality. What this says is that `==` for identity classes will remain a permanent toxic waste zone, but it can be made to behave well -- in fact, the way people want -- for values.? Which is truly a glass half everything; the fact that we can get what we want on one side, makes it more galling that we can never get what we want on the other side. We can choose to drain the glass preemptively to avoid regret, or fill the glass halfway with something good now (and better later), with the understanding that this glass will not be fully filled.? Both choices suck.? (As does papering over both with `===`.) How's your day going? On 6/15/2022 1:51 PM, Kevin Bourrillion wrote: > What I think I understand so far: > > The current plan for `==` for all bucket 2+ types (except the 8 > _primitive_ types, as I still use the word) is to have it perform a > fieldwise `==` comparison: identity equality for bucket 1 fields, what > it's always done for primitive fields, and of course recurse for the rest. > > If we consider that the broadest meaning of `a == b` has always been > "a and b are definitely absolutely indistinguishable no matter what", > then this plan seems to compatibly?preserve that, which makes sense > for purposes of transition. > > What concerns me: > > It's good for transition, at least on the surface, but it's a bad > long-term outcome. > > Users hunger for a shorter way to write `.equals()`, and they will > think this is it. I would not underestimate the pushback they will > experience to writing it out the long way in cases where `==` at least > *seems* to do the right thing. Because in some number of cases, it > *will* do the same thing; specifically, if you can recurse through > your fields and never hit a type that overrides equals(). > > This is extremely fragile. A legitimate change to one type can break > these expectations for all the types directly or indirectly depending > on it, no matter how far away. > > In supporting our Java users here, there's no good stance we can take > on it: if we forbid this practice and require them to call `.equals`, > we're being overzealous. If we try to help them use it carefully, at > best users will stop seeing `Object==Object` as a code smell (as we > have spent years training them to do) and then will start misusing it > even for reference types again. > > btw, why did I say it's good for transition "on the surface"? Because > for any class a user might migrate to bucket 2+, any existing calls to > `==` in the wild are extremely suspect and *should* be revisited > anyway; this is no less true here than it is for existing > synchronization etc. code. > > What's an alternative?: > > I'm sure what I propose is flawed, but I hope the core arguments are > compelling enough to at least help me fix it. > > The problem is that while we /can/?retcon `==` as described above, > it's not behavior anyone? really /wants/. So instead we double down on > the idea that non-primitive `==` has always been about identity and > must continue to be. That means it has to be invalid for bucket 2+ (at > compile-time for the .val type; failing later otherwise?). 
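The fragility being described has a close analogue in today's Java: `==` on boxed Integers "seems to work" for small values because of the valueOf cache, then silently stops agreeing with equals() outside it. A minimal, runnable illustration:

    public class EqualityDemo {
        public static void main(String[] args) {
            Integer a = 127, b = 127;     // boxed via Integer.valueOf: within the cached range
            Integer c = 1000, d = 1000;   // outside the cache: usually distinct objects

            System.out.println(a == b);        // true  -- looks like it "works"
            System.out.println(c == d);        // usually false -- same code, different identity
            System.out.println(c.equals(d));   // true  -- the comparison people actually meant
        }
    }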
> > This would break some usages, but again, only at sites that deserve to > be reconsidered anyway. Some bugs will get fixed in the process. And > at least it's not the language upgrade itself that breaks them, only > the specific decision to move some type to new bucket. Lastly, we > don't need to break anyone abruptly; we can roll out warnings as I > proposed in the email "We need help to migrate from bucket 1 to 2". > > A non-record class that forgets to override equals() from Object even > upon migrating to bucket 2+ is also suspect. If nothing special is > done, it would fail at runtime just like any other usage of > `Foo.ref==Foo.ref`, and maybe that's fine. > > Again, I'm probably missing things, maybe even big things, but I'm > just trying to start a discussion. And if this can't happen I am just > searching for a solid understanding of why. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian.goetz at oracle.com Wed Jun 15 18:19:11 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 15 Jun 2022 14:19:11 -0400 Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> <9d4f7e1b-e727-4205-907d-32353556e14c@oracle.com> Message-ID: All fair.? So what this comes down to is whether the uniformity of "ref types are all the same" is one that we want to have trumped by "non-atomic classes tear in all conditions."? Truly a battle for conceptual supremacy. On 6/15/2022 2:10 PM, Kevin Bourrillion wrote: > On Wed, Jun 15, 2022 at 10:51 AM Brian Goetz > wrote: > > ?- If we spelled .val as !, then switching from P[] to P![] not > only prohibits null elements, but changes the layout and > _introduces tearing_.? Hiding tearability behind "non-null" is > likely to be a lifetime subscription to Astonishment Digest, since > 99.9999 out of 100 Java developers will not be able to say > "non-null, oh, that also means I sacrifice atomicity." > > > Well, that's what you opted into when you... wait a minute... > > The link you probably want to attack is this last one, where you > are likely to say "well, that's what you opted into when you said > `non-atomic`; you just happen to get atomicity for free with > references, but that's a bonus." > > > Your Kevin's Brain Emulator has gotten pretty decent over time... > check whether the next things it said were these (probably so): > > A good clean Basic Conceptual Model For Novices is allowed to have a > bunch of asterisks, of the form "well, in $circumstance, this will be > revealed to be totally false", and that's not always a strike against > the model. How do we discern the difference between a good asterisk > and a bad one? How common the circumstance; how recognizable as > /being/?a special circumstance; how disproportionate a truth > discrepancy we're talking about; etc. > > I know I've said this before. If I'm in a class being taught how this > stuff works, and the teacher says "Now unsafe concurrent code can > break this in horrible ways, and in $otherClass you will learn what's > really going on in the presence of data races" ... I feel fully > satisfied by that. I know I won't get away with playing fast and loose > with The Concurrency Rules; I'm not advanced enough and might never > be. (Many people aren't but /don't /know it, and therein lies the > problem, but do we really have much power to protect such people from > themselves?) 
> > I could be wrong, but I suspect this kind of viewpoint might be more > common and respected in the wider world than it is among the rarefied > kind of individuals who join expert groups, no offense to anyone here > meant. You're always going to see all the details, and you're always > going to /want/?to see all the details. The general public just hopes > the details stay out of their way. When they don't, they have a bad > day, but it doesn't mean they were better served by a complex model > that tried to account for everything. > > > -- > Kevin Bourrillion?|?Java Librarian |?Google, Inc.?|kevinb at google.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian.goetz at oracle.com Wed Jun 15 19:01:51 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 15 Jun 2022 15:01:51 -0400 Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> <9d4f7e1b-e727-4205-907d-32353556e14c@oracle.com> Message-ID: <009a5d59-0d32-2ebb-4f20-72e99eb18c51@oracle.com> OK, let's say for sake of argument that "well, that's what you opted into."? Non-atomic means no one can count on cross-field integrity; don't select non-atomic if you have invariants to protect.? OK fine.? And let's flip over to what T! means. Let's say that T! is a restriction type; it can take on the values of T, except for those prohibited by the restriction "t != null".? So, what is the default value of `String!`? For locals, it's pretty clear we don't have to answer, because locals cannot be accessed unless they are DA at the point of access.? But for fields, we have a problem -- and for arrays, a bigger one.? We can try to require that fields have initializers, but there are all sorts of situations in which a field can be read before its initializer runs.? And arrays are much worse. Which I think connects back to your question about "are we throwing out the baby with the bathwater when we choose to encapsulate the whole type rather than just its use in fields or array components" -- that `String!` is a type that we can really only use in locals, parameters, and return types, but not in fields or array components.? !!!!? Didn't see that connection coming, though I guess I should have.? (I'm sure John did.) So one possible perverse answer here -- one that you probably hate -- is that we *can* spell .val as !, but then ! in fields / array components are restricted to classes that have a good default -- and that excludes all identity classes. I swear I didn't think that's where this mail was going to end up. On 6/15/2022 2:10 PM, Kevin Bourrillion wrote: > On Wed, Jun 15, 2022 at 10:51 AM Brian Goetz > wrote: > > ?- If we spelled .val as !, then switching from P[] to P![] not > only prohibits null elements, but changes the layout and > _introduces tearing_.? Hiding tearability behind "non-null" is > likely to be a lifetime subscription to Astonishment Digest, since > 99.9999 out of 100 Java developers will not be able to say > "non-null, oh, that also means I sacrifice atomicity." > > > Well, that's what you opted into when you... wait a minute... > > The link you probably want to attack is this last one, where you > are likely to say "well, that's what you opted into when you said > `non-atomic`; you just happen to get atomicity for free with > references, but that's a bonus." 
> > > Your Kevin's Brain Emulator has gotten pretty decent over time... > check whether the next things it said were these (probably so): > > A good clean Basic Conceptual Model For Novices is allowed to have a > bunch of asterisks, of the form "well, in $circumstance, this will be > revealed to be totally false", and that's not always a strike against > the model. How do we discern the difference between a good asterisk > and a bad one? How common the circumstance; how recognizable as > /being/?a special circumstance; how disproportionate a truth > discrepancy we're talking about; etc. > > I know I've said this before. If I'm in a class being taught how this > stuff works, and the teacher says "Now unsafe concurrent code can > break this in horrible ways, and in $otherClass you will learn what's > really going on in the presence of data races" ... I feel fully > satisfied by that. I know I won't get away with playing fast and loose > with The Concurrency Rules; I'm not advanced enough and might never > be. (Many people aren't but /don't /know it, and therein lies the > problem, but do we really have much power to protect such people from > themselves?) > > I could be wrong, but I suspect this kind of viewpoint might be more > common and respected in the wider world than it is among the rarefied > kind of individuals who join expert groups, no offense to anyone here > meant. You're always going to see all the details, and you're always > going to /want/?to see all the details. The general public just hopes > the details stay out of their way. When they don't, they have a bad > day, but it doesn't mean they were better served by a complex model > that tried to account for everything. > > > -- > Kevin Bourrillion?|?Java Librarian |?Google, Inc.?|kevinb at google.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From heidinga at redhat.com Tue Jun 14 13:26:27 2022 From: heidinga at redhat.com (Dan Heidinga) Date: Tue, 14 Jun 2022 09:26:27 -0400 Subject: User model stacking: current status In-Reply-To: <55fa3f2f-7571-b94c-0b90-2bbeb7937a6a@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> <187309785.7141993.1655190785532.JavaMail.zimbra@u-pem.fr> <55fa3f2f-7571-b94c-0b90-2bbeb7937a6a@oracle.com> Message-ID: On Tue, Jun 14, 2022 at 9:17 AM Brian Goetz wrote: > > > The val type for B2 should not exist at all > > > > So B2 is really a B3a whose value projection is encapsulated. > > > and here you lost me, .ref and .val are supposed to be projection types not classes, at runtime there is only one class. > > > And apparently I have to say this again .... It's fine to not understand what is being proposed. If so, ask questions, or think about it for a few days before responding. But it's Not OK to jump to dogmatic "should not" / "wrong" pronouncements before you understand what is being proposed. That's just unhelpful. > > > > Summary: > > class B1 { } > value class B2 { private class val { } } > value class B3a { } > non-atomic value class B3n { } > > Value class here is clearly the star of the show; all value classes are treated uniformly (ref-default, have a val); some value classes encapsulate the val type; some value classes further relax the integrity requirements of instances on the heap, to get better flattening and performance, when their semantics don't require it. 
> > It's an orthogonal choice whether the default is "val is private" and "val is public". > > > It makes B2.val a reality, but B3 has no sane default value otherwise it's a B3, so B2.val should not exist. > > > Let me try explaining again. > > All value types have .ref and .val types. They have the properties we've been discussing for a long time: ref types are references, and are therefore nullable and atomic; val types are direct values, are not nullable, and are _not necessarily_ atomic. > > We've been describing B2 classes as those with "no good default", but that doesn't mean that they can't have a .val type. It means we *can't trust arbitrary code to properly initialize a B2.val type.* Once initialized, B2.val is fine, and have the benefit of greater flatness. We explored language and VM features to ensure B2.val types are properly initialized, but that ran into the rocks. > > But we can allow the B2 class itself to mediate access to the .val type. This has two benefits: > > - We can get back some of the benefit of flattening B2.val types > - Uniformity > > Here are two examples of where a B2 class could safely and beneficially use B2.val: > > value class Rational { > Rational[] harmonicSeq(int n) { > Rational.val[] rs = new Rational.val[n]; > for (int i=0; i rs[i] = new Rational(1, n); > return rs; > } > } > > Here, we've made a _flat_ array of Rational.val, properly initialized it, and returned it to the user. THe user gets the benefit of flatness, but can't screw it up, because of the array store check. If Rational.val were illegal, then no array of rationals could be flat. And with Rational.val requiring atomic access, we can only flatten it if the underlying HW supports it (in this case, 2 ints fits nicely in 64bits so we're good). Larger .val's can only be flattened if marked as "non-atomic" (the B3n case). And because there's no tearing, handing out the flattened Rational.val[] is safe. Do I have that right? --Dan > > Similarly, a nestmate could take advantage of packing: > > value class Complex { > value class C3 { > Complex.val x, y, z; > > ... > } > > C3 c3(Complex x, Complex y, Complex z) { return new C3(x, y, z); } > > } > > C3 gets the benefit of full flattening, which it can do because its in the nest; it can share the flattened instances safely with code outside the nest. > > (Access control is powerful thing.) > > From heidinga at redhat.com Tue Jun 14 14:12:25 2022 From: heidinga at redhat.com (Dan Heidinga) Date: Tue, 14 Jun 2022 10:12:25 -0400 Subject: User model stacking: current status In-Reply-To: <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> Message-ID: Overall, I like this proposal. It gives us one new "thing" value classes, and then some additional rules on exposing the zero (or not) and right defaults for atomic-ness. One concern, sketched out below, about using the nest for accessibility during class loading. Mostly an area where we'll need to look at the existing spec closely. Details below. On Mon, Jun 13, 2022 at 7:36 PM Brian Goetz wrote: > > I've done a little more shaking of this tree. It involves keeping the notion that the non-identity buckets differ only in the treatment of their val projection, but makes a further normalization that enables the buckets to mostly collapse away. 
> > "value class X" means: > > - Instances are identity-free > - There are two types, X.ref (reference, nullable) and X.val (direct, non-nullable) > - Reference types are atomic, as always > - X is an alias for X.ref > > Now, what is the essence of B2? B2 means not "I hate zeros", but "I don't like that uninitialized variables are initialized to zero." It doesn't mean the .val projection is meaningless, it means that we don't trust arbitrary clients with it. So, we can make a slight adjustment: > > - The .val type is always there, but for "B2" classes, it is *inaccessible outside the nest*, as per ordinary accessibility. Is this the first time we'll be checking nest mate accessibility at class creation? If so (and I think it is) we'll need to update the spec to define when the nest mates + nest host can be loaded to complete this check in the (already complicated) class loading process. The case I'm thinking of is needing to do the accessibility check on the defining class of a static field (and possibly an instance field) when defining a class like: class Foo { static QRational myRational; } To know if Foo can have a field of Rational.val, we need to check both Foo and Rational are in the same nest. This will require additional class loads mitigated somewhat by the existing rules for preloading Qs. So maybe we can do the nest check there? We'll probably need to make this explicit in the spec that these additional classes can be loaded as part of the accessible check during class definition. Another option would be to delay the nest check until either the or instance methods, until the "new" bytecode? I like that less but it may be easier to fit into the spec. --Dan > > This means that within the nest, code that understands the restrictions can, say, create `new X.val[7]` and expose it as an `X[]`, as long as it doesn't let the zero escape. This gives B2 classes a lot more latitude to use the .val type in safe ways. Basically: if you don't trust people with the .val type, don't let the val type escape. > > There's a bikeshed to paint, but it might look something like: > > value class B2 { > private class val { } > } > > or, flipping the default: > > value class B3a { > public class val { } > } > > So B2 is really a B3a whose value projection is encapsulated. > > The other bucket, B3n, I think can live with a modifier: > > non-atomic value class B3n { } > > While these are all the same buckets as before, this feels much more like "one new bucket" (the `non-atomic` modifier is like `volatile` on a field; we don't think of this as creating a different bucket of fields.) > > Summary: > > class B1 { } > value class B2 { private class val { } } > value class B3a { } > non-atomic value class B3n { } > > Value class here is clearly the star of the show; all value classes are treated uniformly (ref-default, have a val); some value classes encapsulate the val type; some value classes further relax the integrity requirements of instances on the heap, to get better flattening and performance, when their semantics don't require it. > > It's an orthogonal choice whether the default is "val is private" and "val is public". > > > > On 6/3/2022 3:14 PM, Brian Goetz wrote: > > Continuing to shake this tree. > > I'm glad we went through the exploration of "flattenable B3.ref"; while I think we probably could address the challenges of tearing across the null channel / data channels boundary, I'm pretty willing to let this one go. 
Similarly I'm glad we went through the "atomicity orthogonal to buckets" exploration, and am ready to let that one go too. > > What I'm not willing to let go of us making atomicity explicit in the model. Not only is piggybacking non-atomicity on something like val-ness too subtle and surprising, but non-atomicity seems like it is a property that the class author needs to ask for. Flatness is an important benefit, but only when it doesn't get in the way of safety. > > Recall that we have three different representation techniques: > > - no-flat -- use a pointer > - low-flat -- for sufficiently small (depending on size of atomic instructions provided by the hardware) values, pack multiple fields into a single, atomically accessed unit. > - full-flat -- flatten the layout, access individual individual fields directly, may allow tearing. > > The "low-flat" bucket got some attention recently when we discovered that there are usable 128-bit atomics on Intel (based on a recent revision of the chip spec), but this is not a slam-dunk; it requires some serious compiler heroics to pack multiple values into single accesses. But there may be targets of opportunity here for single-field values (like Optional) or final fields. And we can always fall back to no-flat whenever the VM feels like it. > > One of the questions that has been raised is how similar B3.ref is to B2, specifically with respect to atomicity. We've gone back and forth on this. > > Having shaken the tree quite a bit, what feels like the low energy state to me right now is: > > - The ref type of all on-identity classes are treated uniformly; B3.ref and B2.ref are translated the same, treated the same, have the same atomicity, the same nullity, etc. > - The only difference across the spectrum of non-identity classes is the treatment of the val type. For B2, this means the val type is *illegal*; for B3, this means it is atomic; for B3n, it is non-atomic (which in practice will mean more flatness.) > - (controversial) For all types, the ref type is the default. This means that some current value-based classes can migrate not only to B2, but to B3 or B3n. (And that we could migrate to B2 today and further to B3 tomorrow.) > > While this is technically four flavors, I don't think it needs to feel that complex. I'll pick some obviously silly modifiers for exposition: > > - class B1 { } > - zero-hostile value class B2 { } > - value class B3 { } > - tearing-happy value class B3n { } > > In other words: one new concept ("value class"), with two sub-modifiers (zero-hostile, and tearing-happy) which affect the behavior of the val type (forbidden for B2, loosened for B3n.) > > For heap flattening, what this gets us is: > > - B1 -- no-flat > - B2, B3.ref, B3n.ref -- low-flat atomic (with null channel) > - B3 -- low-flat (atomic, no null channel) > - B3n -- full-flat (non-atomic, no null channel) > > This is a slight departure from earlier tree-shakings with respect to tearing. In particular, refs do not tear at all, so programs that use all refs will never see tearing (but it is still possible to get a torn value using .val and then box that into a ref.) > > If you turn this around, the declaration-site decision tree becomes: > > - Do I need identity (mutability, subclassing, aliasing)? Then B1. > - Are uninitialized values unacceptable? Then B2. > - Am I willing to tolerate tearing to enable more flattening? Then B3n. > - Otherwise, B3. > > And the use-site decision tree becomes: > > - For B1, B2 -- no choices to make. > - Do I need nullity? 
Then .ref > - Do I need atomicity, and the class doesn't already provide it? Then .ref > - Otherwise, can use .val > > The main downside of making ref the default is that people will grumble about having to say .val at the use site all the time. And they will! And it does feel a little odd that you have to opt into val-ness at both the declaration and use sites. But it unlocks a lot of things (see Kevin's list for more): > > - The default name is the safest version. > - Every unadorned name works the same way; it's always a reference type. You don't need to maintain a mental database around "which kind of name is this". > - Migration from B1 -> B2 -> B3 is possible. This is huge (and more than we had hoped for when we started this game.) > > (The one thing to still worry about is that while refs can't tear, you can still observe a torn value through a ref, if someone tore it and then boxed it. I don't see how we defend against this, but the non-atomic label should be enough of a warning.) > > > > On 5/6/2022 10:04 AM, Brian Goetz wrote: > > In this model, (non-atomic B3).ref takes the place of (non-atomic B2) in the stacking I've been discussing. Is that what you're saying? > > class B1 { } // ref, identity, atomic > value-based class B2 { } // ref, non-identity, atomic > [ non-atomic ] value class B3 { } // ref or val, zero is ok, both projections share atomicity > > If we go with ref-default, then this is a small leap from yesterday's stacking, because "B3" and "B2" are both reference types, so if you want a tearable, non-atomic reference type, saying `non-atomic value class B3` and then just using B3 gets you that. Then: > > - B2 is like B1, minus identity > - B3 means "uninitialized values are OK, you get two types, a zero-default and a non-default" > - Non-atomicity is an extra property we can add to B3, to get more flattening in exchange for less integrity > - The use cases for non-atomic B2 are served by non-atomic B3 (when .ref is the default) > > I think this still has the properties I want; I can freely choose the reasonable subsets of { identity, has-zero, nullable, atomicity } that I want; the orthogonality of non-atomic across buckets becomes orthogonality of non-atomic with nullity, and the "B3.ref is just like B2" is shown to be the "false friend." > > > From heidinga at redhat.com Tue Jun 14 15:29:28 2022 From: heidinga at redhat.com (Dan Heidinga) Date: Tue, 14 Jun 2022 11:29:28 -0400 Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> Message-ID: On Tue, Jun 14, 2022 at 10:19 AM Brian Goetz wrote: > > It took me a while to understand your concern, but I think I have it now > -- it is that we're effectively doing separate access control on LFoo > and QFoo. At the language level this is no problem, but the VM needs a > story here. Is this the whole of your concern, or is there more? > That's further along than I had thought. My concern was more base level relating to the rules around class loading during a class definition. We had to carefully spec when the Qs and Preload classes are loaded to ensure it fit well in the existing class definition process. Nest access checks add another set of classes to load at potentially a new point in the process. Separate access control is fine assuming we can express that with a single NestMembers attribute as there is only one classfile shared by both the L & Q. 
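As a rough sketch of how one class file serves both projections under the L/Q translation being discussed (Rational and Holder are illustrative names; the descriptor spellings follow the Q-world strawman and are not a settled design):

value class Rational { int num, den; }   // one Rational.class, one NestMembers attribute

class Holder {
    Rational r;        // field descriptor LRational; -- the reference projection, ordinary access rules
    Rational.val v;    // field descriptor QRational; -- the value projection, listed in Preload and
                       // subject to the additional (nest-based) accessibility check discussed above
}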
> >> - The .val type is always there, but for "B2" classes, it is *inaccessible outside the nest*, as per ordinary accessibility. > > Is this the first time we'll be checking nest mate accessibility at > > class creation? If so (and I think it is) we'll need to update the > > spec to define when the nest mates + nest host can be loaded to > > complete this check in the (already complicated) class loading > > process. > > > > The case I'm thinking of is needing to do the accessibility check on > > the defining class of a static field (and possibly an instance field) > > when defining a class like: > > > > class Foo { > > static QRational myRational; > > } > > > > To know if Foo can have a field of Rational.val, we need to check both > > Foo and Rational are in the same nest. > > First you need to check that Rational is accessible, and *then* you need > to check that QRational satisfies the additional accessibility > requirements, based on the public/package/private accessibility of the Q > type. Right? The concern is that we need Foo.nestHost == Rational.nestHost and that the common nestHost includes both Foo and Rational as nestMembers. To do that, we need to load the nestHost class (if it isn't already). Getting the interaction between the access check and the additional class loads right (and clearly spec'd) is my concern. I tried looking through Dan S.'s latest spec draft and I don't see the special "go and look" class load treatment for Q types or when classes listed in the Preload attribute are loaded. I'm not sure when to do the nestHost loading to complete the access check in this process. --Dan > > > This will require additional > > class loads mitigated somewhat by the existing rules for preloading > > Qs. So maybe we can do the nest check there? We'll probably need to > > make this explicit in the spec that these additional classes can be > > loaded as part of the accessible check during class definition. > > > > Another option would be to delay the nest check until either the > > or instance methods, until the "new" bytecode? I like that > > less but it may be easier to fit into the spec. > > > > --Dan > > > >> This means that within the nest, code that understands the restrictions can, say, create `new X.val[7]` and expose it as an `X[]`, as long as it doesn't let the zero escape. This gives B2 classes a lot more latitude to use the .val type in safe ways. Basically: if you don't trust people with the .val type, don't let the val type escape. > >> > >> There's a bikeshed to paint, but it might look something like: > >> > >> value class B2 { > >> private class val { } > >> } > >> > >> or, flipping the default: > >> > >> value class B3a { > >> public class val { } > >> } > >> > >> So B2 is really a B3a whose value projection is encapsulated. > >> > >> The other bucket, B3n, I think can live with a modifier: > >> > >> non-atomic value class B3n { } > >> > >> While these are all the same buckets as before, this feels much more like "one new bucket" (the `non-atomic` modifier is like `volatile` on a field; we don't think of this as creating a different bucket of fields.) 
> >> > >> Summary: > >> > >> class B1 { } > >> value class B2 { private class val { } } > >> value class B3a { } > >> non-atomic value class B3n { } > >> > >> Value class here is clearly the star of the show; all value classes are treated uniformly (ref-default, have a val); some value classes encapsulate the val type; some value classes further relax the integrity requirements of instances on the heap, to get better flattening and performance, when their semantics don't require it. > >> > >> It's an orthogonal choice whether the default is "val is private" and "val is public". > >> > >> > >> > >> On 6/3/2022 3:14 PM, Brian Goetz wrote: > >> > >> Continuing to shake this tree. > >> > >> I'm glad we went through the exploration of "flattenable B3.ref"; while I think we probably could address the challenges of tearing across the null channel / data channels boundary, I'm pretty willing to let this one go. Similarly I'm glad we went through the "atomicity orthogonal to buckets" exploration, and am ready to let that one go too. > >> > >> What I'm not willing to let go of us making atomicity explicit in the model. Not only is piggybacking non-atomicity on something like val-ness too subtle and surprising, but non-atomicity seems like it is a property that the class author needs to ask for. Flatness is an important benefit, but only when it doesn't get in the way of safety. > >> > >> Recall that we have three different representation techniques: > >> > >> - no-flat -- use a pointer > >> - low-flat -- for sufficiently small (depending on size of atomic instructions provided by the hardware) values, pack multiple fields into a single, atomically accessed unit. > >> - full-flat -- flatten the layout, access individual individual fields directly, may allow tearing. > >> > >> The "low-flat" bucket got some attention recently when we discovered that there are usable 128-bit atomics on Intel (based on a recent revision of the chip spec), but this is not a slam-dunk; it requires some serious compiler heroics to pack multiple values into single accesses. But there may be targets of opportunity here for single-field values (like Optional) or final fields. And we can always fall back to no-flat whenever the VM feels like it. > >> > >> One of the questions that has been raised is how similar B3.ref is to B2, specifically with respect to atomicity. We've gone back and forth on this. > >> > >> Having shaken the tree quite a bit, what feels like the low energy state to me right now is: > >> > >> - The ref type of all on-identity classes are treated uniformly; B3.ref and B2.ref are translated the same, treated the same, have the same atomicity, the same nullity, etc. > >> - The only difference across the spectrum of non-identity classes is the treatment of the val type. For B2, this means the val type is *illegal*; for B3, this means it is atomic; for B3n, it is non-atomic (which in practice will mean more flatness.) > >> - (controversial) For all types, the ref type is the default. This means that some current value-based classes can migrate not only to B2, but to B3 or B3n. (And that we could migrate to B2 today and further to B3 tomorrow.) > >> > >> While this is technically four flavors, I don't think it needs to feel that complex. 
I'll pick some obviously silly modifiers for exposition: > >> > >> - class B1 { } > >> - zero-hostile value class B2 { } > >> - value class B3 { } > >> - tearing-happy value class B3n { } > >> > >> In other words: one new concept ("value class"), with two sub-modifiers (zero-hostile, and tearing-happy) which affect the behavior of the val type (forbidden for B2, loosened for B3n.) > >> > >> For heap flattening, what this gets us is: > >> > >> - B1 -- no-flat > >> - B2, B3.ref, B3n.ref -- low-flat atomic (with null channel) > >> - B3 -- low-flat (atomic, no null channel) > >> - B3n -- full-flat (non-atomic, no null channel) > >> > >> This is a slight departure from earlier tree-shakings with respect to tearing. In particular, refs do not tear at all, so programs that use all refs will never see tearing (but it is still possible to get a torn value using .val and then box that into a ref.) > >> > >> If you turn this around, the declaration-site decision tree becomes: > >> > >> - Do I need identity (mutability, subclassing, aliasing)? Then B1. > >> - Are uninitialized values unacceptable? Then B2. > >> - Am I willing to tolerate tearing to enable more flattening? Then B3n. > >> - Otherwise, B3. > >> > >> And the use-site decision tree becomes: > >> > >> - For B1, B2 -- no choices to make. > >> - Do I need nullity? Then .ref > >> - Do I need atomicity, and the class doesn't already provide it? Then .ref > >> - Otherwise, can use .val > >> > >> The main downside of making ref the default is that people will grumble about having to say .val at the use site all the time. And they will! And it does feel a little odd that you have to opt into val-ness at both the declaration and use sites. But it unlocks a lot of things (see Kevin's list for more): > >> > >> - The default name is the safest version. > >> - Every unadorned name works the same way; it's always a reference type. You don't need to maintain a mental database around "which kind of name is this". > >> - Migration from B1 -> B2 -> B3 is possible. This is huge (and more than we had hoped for when we started this game.) > >> > >> (The one thing to still worry about is that while refs can't tear, you can still observe a torn value through a ref, if someone tore it and then boxed it. I don't see how we defend against this, but the non-atomic label should be enough of a warning.) > >> > >> > >> > >> On 5/6/2022 10:04 AM, Brian Goetz wrote: > >> > >> In this model, (non-atomic B3).ref takes the place of (non-atomic B2) in the stacking I've been discussing. Is that what you're saying? > >> > >> class B1 { } // ref, identity, atomic > >> value-based class B2 { } // ref, non-identity, atomic > >> [ non-atomic ] value class B3 { } // ref or val, zero is ok, both projections share atomicity > >> > >> If we go with ref-default, then this is a small leap from yesterday's stacking, because "B3" and "B2" are both reference types, so if you want a tearable, non-atomic reference type, saying `non-atomic value class B3` and then just using B3 gets you that. 
Then: > >> > >> - B2 is like B1, minus identity > >> - B3 means "uninitialized values are OK, you get two types, a zero-default and a non-default" > >> - Non-atomicity is an extra property we can add to B3, to get more flattening in exchange for less integrity > >> - The use cases for non-atomic B2 are served by non-atomic B3 (when .ref is the default) > >> > >> I think this still has the properties I want; I can freely choose the reasonable subsets of { identity, has-zero, nullable, atomicity } that I want; the orthogonality of non-atomic across buckets becomes orthogonality of non-atomic with nullity, and the "B3.ref is just like B2" is shown to be the "false friend." > >> > >> > >> > From heidinga at redhat.com Tue Jun 14 17:13:52 2022 From: heidinga at redhat.com (Dan Heidinga) Date: Tue, 14 Jun 2022 13:13:52 -0400 Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> Message-ID: On Tue, Jun 14, 2022 at 11:29 AM Dan Heidinga wrote: > > On Tue, Jun 14, 2022 at 10:19 AM Brian Goetz wrote: > > > > It took me a while to understand your concern, but I think I have it now > > -- it is that we're effectively doing separate access control on LFoo > > and QFoo. At the language level this is no problem, but the VM needs a > > story here. Is this the whole of your concern, or is there more? > > > > That's further along than I had thought. My concern was more base > level relating to the rules around class loading during a class > definition. We had to carefully spec when the Qs and Preload classes > are loaded to ensure it fit well in the existing class definition > process. Nest access checks add another set of classes to load at > potentially a new point in the process. > > Separate access control is fine assuming we can express that with a > single NestMembers attribute as there is only one classfile shared by > both the L & Q. > > > >> - The .val type is always there, but for "B2" classes, it is *inaccessible outside the nest*, as per ordinary accessibility. > > > Is this the first time we'll be checking nest mate accessibility at > > > class creation? If so (and I think it is) we'll need to update the > > > spec to define when the nest mates + nest host can be loaded to > > > complete this check in the (already complicated) class loading > > > process. > > > > > > The case I'm thinking of is needing to do the accessibility check on > > > the defining class of a static field (and possibly an instance field) > > > when defining a class like: > > > > > > class Foo { > > > static QRational myRational; > > > } > > > > > > To know if Foo can have a field of Rational.val, we need to check both > > > Foo and Rational are in the same nest. > > > > First you need to check that Rational is accessible, and *then* you need > > to check that QRational satisfies the additional accessibility > > requirements, based on the public/package/private accessibility of the Q > > type. Right? > > The concern is that we need Foo.nestHost == Rational.nestHost and that > the common nestHost includes both Foo and Rational as nestMembers. To > do that, we need to load the nestHost class (if it isn't already). > Getting the interaction between the access check and the additional > class loads right (and clearly spec'd) is my concern. 
My assumption - which I'm starting to question - is that Foo is an invalid class if it isn't a nestmate to Rational and that attempts to load Foo should fail. Thinking about this more, there's a second model here which says Foo is fine (after all we allow other classes to have fields of types they can never fill in) but all attempts to resolve the 'myRational' field will fail. This moves the nest mates check to resolution (similar to existing nest checks) rather than during class definition. Is this second model more what you had in mind? If the second model is the intended, that means Foo.myRational can be a Q type, painted with zeros by the VM, but no one can ever access it as the resolve check will always fail.... which seems OK. Reflection already does the access checking here (apart from #setAccessible) so I think this avoids my concern. --Dan > > I tried looking through Dan S.'s latest spec draft and I don't see the > special "go and look" class load treatment for Q types or when classes > listed in the Preload attribute are loaded. I'm not sure when to do > the nestHost loading to complete the access check in this process. > > --Dan > > > > > > > This will require additional > > > class loads mitigated somewhat by the existing rules for preloading > > > Qs. So maybe we can do the nest check there? We'll probably need to > > > make this explicit in the spec that these additional classes can be > > > loaded as part of the accessible check during class definition. > > > > > > Another option would be to delay the nest check until either the > > > or instance methods, until the "new" bytecode? I like that > > > less but it may be easier to fit into the spec. > > > > > > --Dan > > > > > >> This means that within the nest, code that understands the restrictions can, say, create `new X.val[7]` and expose it as an `X[]`, as long as it doesn't let the zero escape. This gives B2 classes a lot more latitude to use the .val type in safe ways. Basically: if you don't trust people with the .val type, don't let the val type escape. > > >> > > >> There's a bikeshed to paint, but it might look something like: > > >> > > >> value class B2 { > > >> private class val { } > > >> } > > >> > > >> or, flipping the default: > > >> > > >> value class B3a { > > >> public class val { } > > >> } > > >> > > >> So B2 is really a B3a whose value projection is encapsulated. > > >> > > >> The other bucket, B3n, I think can live with a modifier: > > >> > > >> non-atomic value class B3n { } > > >> > > >> While these are all the same buckets as before, this feels much more like "one new bucket" (the `non-atomic` modifier is like `volatile` on a field; we don't think of this as creating a different bucket of fields.) > > >> > > >> Summary: > > >> > > >> class B1 { } > > >> value class B2 { private class val { } } > > >> value class B3a { } > > >> non-atomic value class B3n { } > > >> > > >> Value class here is clearly the star of the show; all value classes are treated uniformly (ref-default, have a val); some value classes encapsulate the val type; some value classes further relax the integrity requirements of instances on the heap, to get better flattening and performance, when their semantics don't require it. > > >> > > >> It's an orthogonal choice whether the default is "val is private" and "val is public". > > >> > > >> > > >> > > >> On 6/3/2022 3:14 PM, Brian Goetz wrote: > > >> > > >> Continuing to shake this tree. 
> > >> > > >> I'm glad we went through the exploration of "flattenable B3.ref"; while I think we probably could address the challenges of tearing across the null channel / data channels boundary, I'm pretty willing to let this one go. Similarly I'm glad we went through the "atomicity orthogonal to buckets" exploration, and am ready to let that one go too. > > >> > > >> What I'm not willing to let go of us making atomicity explicit in the model. Not only is piggybacking non-atomicity on something like val-ness too subtle and surprising, but non-atomicity seems like it is a property that the class author needs to ask for. Flatness is an important benefit, but only when it doesn't get in the way of safety. > > >> > > >> Recall that we have three different representation techniques: > > >> > > >> - no-flat -- use a pointer > > >> - low-flat -- for sufficiently small (depending on size of atomic instructions provided by the hardware) values, pack multiple fields into a single, atomically accessed unit. > > >> - full-flat -- flatten the layout, access individual individual fields directly, may allow tearing. > > >> > > >> The "low-flat" bucket got some attention recently when we discovered that there are usable 128-bit atomics on Intel (based on a recent revision of the chip spec), but this is not a slam-dunk; it requires some serious compiler heroics to pack multiple values into single accesses. But there may be targets of opportunity here for single-field values (like Optional) or final fields. And we can always fall back to no-flat whenever the VM feels like it. > > >> > > >> One of the questions that has been raised is how similar B3.ref is to B2, specifically with respect to atomicity. We've gone back and forth on this. > > >> > > >> Having shaken the tree quite a bit, what feels like the low energy state to me right now is: > > >> > > >> - The ref type of all on-identity classes are treated uniformly; B3.ref and B2.ref are translated the same, treated the same, have the same atomicity, the same nullity, etc. > > >> - The only difference across the spectrum of non-identity classes is the treatment of the val type. For B2, this means the val type is *illegal*; for B3, this means it is atomic; for B3n, it is non-atomic (which in practice will mean more flatness.) > > >> - (controversial) For all types, the ref type is the default. This means that some current value-based classes can migrate not only to B2, but to B3 or B3n. (And that we could migrate to B2 today and further to B3 tomorrow.) > > >> > > >> While this is technically four flavors, I don't think it needs to feel that complex. I'll pick some obviously silly modifiers for exposition: > > >> > > >> - class B1 { } > > >> - zero-hostile value class B2 { } > > >> - value class B3 { } > > >> - tearing-happy value class B3n { } > > >> > > >> In other words: one new concept ("value class"), with two sub-modifiers (zero-hostile, and tearing-happy) which affect the behavior of the val type (forbidden for B2, loosened for B3n.) > > >> > > >> For heap flattening, what this gets us is: > > >> > > >> - B1 -- no-flat > > >> - B2, B3.ref, B3n.ref -- low-flat atomic (with null channel) > > >> - B3 -- low-flat (atomic, no null channel) > > >> - B3n -- full-flat (non-atomic, no null channel) > > >> > > >> This is a slight departure from earlier tree-shakings with respect to tearing. 
In particular, refs do not tear at all, so programs that use all refs will never see tearing (but it is still possible to get a torn value using .val and then box that into a ref.) > > >> > > >> If you turn this around, the declaration-site decision tree becomes: > > >> > > >> - Do I need identity (mutability, subclassing, aliasing)? Then B1. > > >> - Are uninitialized values unacceptable? Then B2. > > >> - Am I willing to tolerate tearing to enable more flattening? Then B3n. > > >> - Otherwise, B3. > > >> > > >> And the use-site decision tree becomes: > > >> > > >> - For B1, B2 -- no choices to make. > > >> - Do I need nullity? Then .ref > > >> - Do I need atomicity, and the class doesn't already provide it? Then .ref > > >> - Otherwise, can use .val > > >> > > >> The main downside of making ref the default is that people will grumble about having to say .val at the use site all the time. And they will! And it does feel a little odd that you have to opt into val-ness at both the declaration and use sites. But it unlocks a lot of things (see Kevin's list for more): > > >> > > >> - The default name is the safest version. > > >> - Every unadorned name works the same way; it's always a reference type. You don't need to maintain a mental database around "which kind of name is this". > > >> - Migration from B1 -> B2 -> B3 is possible. This is huge (and more than we had hoped for when we started this game.) > > >> > > >> (The one thing to still worry about is that while refs can't tear, you can still observe a torn value through a ref, if someone tore it and then boxed it. I don't see how we defend against this, but the non-atomic label should be enough of a warning.) > > >> > > >> > > >> > > >> On 5/6/2022 10:04 AM, Brian Goetz wrote: > > >> > > >> In this model, (non-atomic B3).ref takes the place of (non-atomic B2) in the stacking I've been discussing. Is that what you're saying? > > >> > > >> class B1 { } // ref, identity, atomic > > >> value-based class B2 { } // ref, non-identity, atomic > > >> [ non-atomic ] value class B3 { } // ref or val, zero is ok, both projections share atomicity > > >> > > >> If we go with ref-default, then this is a small leap from yesterday's stacking, because "B3" and "B2" are both reference types, so if you want a tearable, non-atomic reference type, saying `non-atomic value class B3` and then just using B3 gets you that. Then: > > >> > > >> - B2 is like B1, minus identity > > >> - B3 means "uninitialized values are OK, you get two types, a zero-default and a non-default" > > >> - Non-atomicity is an extra property we can add to B3, to get more flattening in exchange for less integrity > > >> - The use cases for non-atomic B2 are served by non-atomic B3 (when .ref is the default) > > >> > > >> I think this still has the properties I want; I can freely choose the reasonable subsets of { identity, has-zero, nullable, atomicity } that I want; the orthogonality of non-atomic across buckets becomes orthogonality of non-atomic with nullity, and the "B3.ref is just like B2" is shown to be the "false friend." 
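A use-site sketch of the ref-default stacking quoted above (Point is illustrative; spellings follow the .ref/.val convention used in this thread):

value class Point { int x, y; }          // a plain (atomic) value class, i.e. "B3"

Point p;                                 // same as Point.ref: nullable reference, never tears
Point.val q;                             // direct value: non-nullable, zero-default, flattenable
Point.val[] flat = new Point.val[8];     // flat array; elements start as the all-zero Point
Point[] refs = new Point[8];             // array of references; elements start as null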
> > >> > > >> > > >> > > From heidinga at redhat.com Tue Jun 14 18:54:34 2022 From: heidinga at redhat.com (Dan Heidinga) Date: Tue, 14 Jun 2022 14:54:34 -0400 Subject: User model stacking: current status In-Reply-To: <484c248e-10f6-9b3b-9026-1db44bb35521@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> <484c248e-10f6-9b3b-9026-1db44bb35521@oracle.com> Message-ID: On Tue, Jun 14, 2022 at 1:23 PM Brian Goetz wrote: > > > >> The concern is that we need Foo.nestHost == Rational.nestHost and that > >> the common nestHost includes both Foo and Rational as nestMembers. To > >> do that, we need to load the nestHost class (if it isn't already). > >> Getting the interaction between the access check and the additional > >> class loads right (and clearly spec'd) is my concern. > > My assumption - which I'm starting to question - is that Foo is an > > invalid class if it isn't a nestmate to Rational and that attempts to > > load Foo should fail. > > > > Thinking about this more, there's a second model here which says Foo > > is fine (after all we allow other classes to have fields of types they > > can never fill in) but all attempts to resolve the 'myRational' field > > will fail. This moves the nest mates check to resolution (similar to > > existing nest checks) rather than during class definition. Is this > > second model more what you had in mind? > > > > Now its my turn to say you're ahead of me :) > > From a language perspective, if X is inccessible to Y, then > > class Y { > X x; > } > > will fail compilation. If such a class sneaks by anyway, This is the challenge for the VM. We can't count on all classes being generated by correct compilers - too many bytecode spinners doing the wrong thing either intentionally or unintentionally. > whether we > reject it when we load Y or when we try to resolve field x, those seem > mostly indistinguishable to me from a design perspective, since they'll > never be able to do the bad thing, which is use x when it is uninitialized. >From my perspective, the two approaches are very similar but differ in how many places need to be touched. Refusing to load the class slams the door shut and ensures no one can access the unitialized "x" value in Y while refusing access (failing resolution) means we need to look at all the places the VM uses its zero brush to paint memory. Of course we'll need to look at those places anyway for e.g. the array example so either model appears workable. > But (and there's a whole conversation to be had here) it does mean that > there is separate access control on LFoo vs QFoo, Pulling on this thread a little, is it the class that has different access control or something else? To create an identity object, we do access control on the both class (public/package) and the constructor (public/package/private (aka nest)). To create a value object, we do nest mate access control (aka private) on the bytecodes that create values (aconst_init / withfield). This proposal extends the nest mates access check to the default values of Qs. In both cases, we're looking at the access control of two things - the class and the "creator of instances". Are we applying different access control to LFoo vs QFoo, or to construction mechanisms? 
Since we're compiling value constructors to "" factory methods, there isn't a convenient 1-1 mapping from the access control of an identity constructor to the value creation bytecodes + default values. Do we want to say the LvsQ has different access control or is there another thing (like constructors) we can lean on here? > and we have to either > prevent or detect leaks before they let us do something bad (like Y > reflectively creating an array of X.val). But this seems manageable, > and not all the different from the sort of leak detection and plugging > we do with reflection today. > We're agreed here. There's some level of patching that will be needed but it shouldn't be insurmountable. --Dan From jf.mend at gmail.com Wed Jun 15 09:34:47 2022 From: jf.mend at gmail.com (=?UTF-8?B?Sm/Do28gTWVuZG9uw6dh?=) Date: Wed, 15 Jun 2022 10:34:47 +0100 Subject: Valhalla user-model Message-ID: Hello, I would like to bring to your consideration the following set of observations and user-model suggestions, in the hope that they will bring some useful ideas to the development of the Valhalla project. *Definition* *shared-mutable* - a variable that is mutable (non-final) and can be shared between threads; shared-mutables are the non-final subset of the shared-variables (?17.4.1. ) *Observations* Shared-mutables are the only variables that have these two apparently independent properties: 1. lack definite-assignment (?16. ) - the variable is initialized with a default-value if not definitely-assigned ( ?4.12.5. ) 2. allow data-races - the variable may be read/written while being written by another thread, with both events happening in an unpredictable order (?17.4.5. ) Via properties 1 and 2, nullability and encoding-mode, respectively, affect the semantics of variables in a way that is unique to shared-mutables: 1. if not definitely-assigned, the variable is initialized with: - if nullable: the null value, regardless of type - if not nullable: the zero-value of the type 2. in a data-race, the value read/written: - if reference: has unpredictable origin in *one* of the various writes - if inline, either: - has unpredictable origin in one of the various writes - is torn i.e. has distinct internal parts with separate unpredictable origins in *more than one* of the various writes ( ?17.7. ) These are the 3 kinds of shared-mutable variables (?4.12.3. ): - non-final class variables - non-final instance variables - array components The remaining kinds of variables don't have any of the above properties: - final class variables - final instance variables - method parameters - constructor parameters - lambda parameters - exception parameters - local variables *Ideal user-model* In this user-model, the encoding-mode is not complected with nullability. For class-authors: - *value-knob* to reject identity - Applicable on class declarations, if used by a class-author to indicate that the class instances don't require identity (a value-class), the runtime will be free to copy these values and choose between reference or inline encoding everywhere except in shared-mutables, as doing so does not introduce any semantic changes to the program. In shared-mutables, however, value-instances can only be inlined if atomicity is guaranteed, which will depend on the hardware and the variable bit-size (value plus nullability). 
- *tearable-knob* to allow tearing - Applicable on value-class declarations, may be used by the class-author to hand the class-user the responsibility of how to avoid tearing, freeing the runtime to always inline instances in shared-mutables (bikeshedding: when dealing with tearable value-classes, the "terrible" sound can work as a warning for the dangers of neglecting this responsibility). Conversely, if this knob is not used, instances will be kept atomic, which allows the class-author to guarantee constructor invariants, which may be useful for the class implementation and class-users to rely upon. - *no-default-knob* to forbid using the zero-value as a default - Applicable on value-class declarations, may be used by the class-author to force shared-mutables of this type to either be definitely-assigned or nullable. (Actually, I don't believe this knob will be useful, since I think all zero-values are equally bad: moving across the spectrum from useful to useless to out-of-domain default-values is moving from hidden to noticeable to obvious missed-initialization-bugs. Ex: false -> 0.0 -> Jan 1, 1970 -> 0/0 -> null. The only good solution is definite-assignment on as many kinds of shared-mutables as possible.) For class-users: - *nullable-knob* to include null in the value-set of variables - Applicable on any variable declaration. In either encoding-mode, the runtime is free to choose the encoding for the extra bit of information required to represent null. In shared-mutables that are not definitely-assigned, controls the default-value: either null or the zero-value of the type. Since identity-types lack a zero-value, any non-nullable shared-mutable with an identity-type must be definitely-assigned. If, on some kinds of shared-mutables, definite-assignment can't be enforced, then those kinds of variables (probably array components) cannot have an identity-type and be non-nullable. - *atomic-knob* to avoid tearing - Applicable on shared-mutable declarations, may be used by the class-user to reverse the effect of the tearable-knob, thereby restoring atomicity. *Complected user-model* If, for compatibility, the encoding-mode must remain complected with nullability, the above user-model can be adapted as follows. The knobs for class-users are replaced with: - *inline-knob* to require inline encoding - Applicable on shared-mutable declarations with a value-type, gives the class-user control over the tradeoffs between performance, footprint, nullability, nullability's implied default-values, and how to avoid tearing. The knobs for class-authors get their scope restricted: - *value-knob* - by itself, no longer allows the runtime to inline any value-instances on shared-mutables. - *tearable-knob* - only applicable if the instance bit-size is higher than 32 bits, will simply enable/prohibit using the inline-knob on variables of this type. - *no-default-knob* - only affecting inline shared-mutables of this type, forces them to be definitely-assigned. For any kind of shared-mutable where definite-assignment can't be enforced, this knob prevents those variables (probably array components) to have values of this type encoded inline. If, on the other hand, definite-assignment can be enforced in all kinds of shared-mutables, then it should always be enforced, as this solves both the useless zero-values problem and the missed-initialization-bugs problem. However, in that case, this knob no longer has a reason to exist. 
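Purely as a strawman to make the knobs concrete (none of these spellings are being proposed; they just name the knobs described above, and the class bodies are elided):

value class Money { ... }                      // value-knob: identity rejected
tearable value class Vector4 { ... }           // tearable-knob: author hands tearing responsibility to users
nodefault value class Isbn { ... }             // no-default-knob: the zero-value may not stand in as a default

class Account {
    nullable Money balance;                    // nullable-knob: null joins the value set; default is null
    atomic Vector4 position = new Vector4();   // atomic-knob: class-user restores atomicity for this field
}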
Kind regards, Jo?o Mendon?a -------------- next part -------------- An HTML attachment was scrubbed... URL: From jf.mend at gmail.com Sun Jun 12 11:55:02 2022 From: jf.mend at gmail.com (=?UTF-8?B?Sm/Do28gTWVuZG9uw6dh?=) Date: Sun, 12 Jun 2022 12:55:02 +0100 Subject: Valhalla user-model Message-ID: Hello, I would like to bring to your consideration the following set of observations and user-model suggestions as I am hopeful they will be useful in the context of the Valhalla project. *Definition* *shared-mutable* - a variable that is mutable (non-final) and can be shared between threads; shared-mutables are the non-final subset of the shared-variables (?17.4.1.) *Observations* Shared-mutables are the only variables that have these two apparently independent properties: 1. lack definite-assignment (?16.) - the variable is initialized with a default-value if not definitely assigned (?4.12.5.) 2. allow data-races - the variable may be read/written while being written by another thread, with both events happening with an unpredictable order (?17.4.5.) Property 1 and 2 share a unique characteristic: the external behavior of the variable is automatically altered when, respectively, the nullability and the representation mode (inlined/reference) of the variable is changed: 1. if not definitely-assigned, the variable is initialized with: - if nullable: the null value, regardless of the variable's type - if not nullable: the zero_value of the variable's type 2. in a data-race, the value read/written: - if reference: has unpredictable origin in *one* of the various writes - if inlined, either: - has unpredictable origin in one of the various writes - is teared i.e. has distinct internal parts with separate unpredictable origins in *more than one* of the various writes (?17.7.) These are the 3 kinds of shared-mutable variables (?4.12.3.): - non-final class variables - non-final instance variables - array components The remaining kinds of variables don't have any of the above properties: - final class variables - final instance variables - method parameters - constructor parameters - lambda parameters - exception parameters - local variables *User-model* For class-authors: - *value-knob* to reject identity - Applicable on class declarations, if used by a class-author to indicate that the class instances don't require identity (a value-class), the runtime will be free to copy these instances and swap between reference and inlined representations everywhere except in shared-mutables, as doing so does not introduce any changes to the external behavior of the program. - *atomicity-knob* to forbid tearing - Applicable on value-class declarations, may be used by the class-author to ensure tearing cannot happen on shared-mutables in order to preserve constructor invariants that may be useful for class-users to rely upon. The runtime is free to implement this through a reference representation or by ensuring inlined-write atomicity. - *noDefault-knob* forbids the zero-value being used as a default-value - Aplicable on value-class declarations, may be used by the class-author to force shared-mutables of this class to either be definitely-assigned or nullable. (Actually, I don't believe this knob will be useful, since I think all zero-values are equally bad. Moving across the spectrum from useful to useless to out-of-domain default-values is moving from hidden to noticeable to obvious missed-initialization-bugs. 
Ex: false -> 0.0 -> Jan 1, 1970 -> 0/0 -> null) For class-users: - *nullable-knob* includes null in the value-set of a variable - Applicable on any variable declaration, in either mode of representation. The runtime is free to choose the encoding for the extra bit of information required to represent null. In shared-mutables that are not definitely-assigned, controls the default-value: either null or the zero_value of the value-type. Since identity-types lack a default-value, any non-nullable shared-mutable of an identity-type must be definitely-assigned. - *atomic-knob* to avoid tearing - Applicable on shared-mutable declarations, may be used by the class-user to ensure tearing does not happen. The runtime is free to implement this through a reference representation or by ensuring inlined-write atomicity. If, for compatibility, the representation modes must remain complected with nullability, the nullable-knob and the atomic-knob above can be replaced with: - *inline-knob* to require inlined/reference representation - Applicable on shared-mutable declarations with a value-class type, gives the class-user control over the tradeoffs between performance, footprint, nullability and implied default-values, and if/how to manually avoid tearing. Kind regards, Jo?o Mendon?a -------------- next part -------------- An HTML attachment was scrubbed... URL: From heidinga at redhat.com Wed Jun 15 14:57:49 2022 From: heidinga at redhat.com (Dan Heidinga) Date: Wed, 15 Jun 2022 10:57:49 -0400 Subject: User model stacking: current status In-Reply-To: <09b8a72e-fdce-2599-4d47-b93dd6b717c8@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> <484c248e-10f6-9b3b-9026-1db44bb35521@oracle.com> <09b8a72e-fdce-2599-4d47-b93dd6b717c8@oracle.com> Message-ID: On Tue, Jun 14, 2022 at 3:18 PM Brian Goetz wrote: > > > > On 6/14/2022 2:54 PM, Dan Heidinga wrote: > > But (and there's a whole conversation to be had here) it does mean that > there is separate access control on LFoo vs QFoo, > > Pulling on this thread a little, is it the class that has different > access control or something else? > > > We've meandered a bit over the years on the distinction between the class Foo, the types Foo.ref and Foo.val, and their respective mirrors. It's probably time for a check-in on where we are there. > > Today, Integer is a class, with a full-power mirror Integer.class; int is a type, with a limited mirror int.class, whose job is mostly limited to reflecting over field and method descriptors. > > With Valhalla, Point is a class, with types Point.ref and Point.val; Point.class is a full-power mirror for Point.ref, and Point.val.class is a limited mirror that is analogous to the int mirror. If you ask a Point for its getClass(), it always returns Point.class. It would not bother me if the Point.val.class mirror is fully limited, and the only way to do lookups (e.g., getMethods) is to switch over to the class mirror (for which we'd probably have a `Class::getPrimaryClass` method.) That approach seems reasonable for Reflection. For MethodHandles, I think we'll need to support MethodHandles.Lookup with both the L & Q version to correctly type the receiver argument, at least for virtual calls. > > Having the two encode separate accessibilities sounds a little messy, but the various Lookup::checkAccess methods key off of a Class, so that seems a reasonable place to hang this information. 
I would assume such checks would check both the primary class and then the secondary class, or we'd arrange that the primary mirror always was at least as accessible as the secondary mirror. (Protected isn't a useful option, and public/package/private are suitably ordered.) > > At the language level, there is a question about how to present the val class. One obvious (but possibly bad) idea is to pretend it is a special kind of nested class (which is supported by the Point.val naming convention): > > value class Rational { > private class val { } > } > > This is a little cheesy, but may fit mostly cleanly into user's mental models. In this case, the accessibility is going on something that looks like a class declaration at the source level, and which has a class mirror at the VM level, but it really a "shadow class", just like the class described by int.class (or String[].class.) This might be OK. > > At one point I considered whether we could hang this on the accessibility of the no-arg constructor, but I quickly soured on this idea. But that has an accessibility too. > > Or we could invent a new kind of member, to describe the val projection, and hang accessibility on that. > > To create an identity object, we do access control on the both class > (public/package) and the constructor (public/package/private (aka > nest)). > To create a value object, we do nest mate access control (aka private) > on the bytecodes that create values (aconst_init / withfield). This > proposal extends the nest mates access check to the default values of > Qs. > > > It's not just nestmate access control; it would be reasonable to declare the val as package-access, and trust your package mates too. At the VM-level, classes are either public or package. While the source code also allows specifying private or protected for innerclasses, the VM doesn't use those bits for access control (though Reflection sometimes does). That's part of why I'm reaching for something else to hang the private bit on (and thus trigger the nest check). I don't think we want to teach the VM to use the inner class flags for *some* access checks as it will be easy to confuse when each set of flags is used (long bug tail) and will lead to inconsistencies with existing programs. Without some extra "thing" to hang the accessibility bits on, I don't think we can express public / package / private (nest) in the existing public/package bits used for classes at the VM level. > > In both cases, we're looking at the access control of two things - the > class and the "creator of instances". Are we applying different > access control to LFoo vs QFoo, or to construction mechanisms? > > > The thing we're trying to protect is creation of uninitialized heap-based instances. But it felt a little weird to draw such a complex line; seemed simpler (and not giving up much) to access-control the type name. But we can explore this some more. Maybe we are access-controlling the `defaultvalue` bytecode, since its effectively public if someone can create a flat array. > I think the latter is where we'll need to end up - it's the ability to create a defaultvalue and leak the zeros that we're protecting against. 
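As a concrete point of comparison, today's arrays already hand out default values without any constructor running; the new ingredient is that a flat array of a value class would hand out the all-zeros instance rather than null. A small sketch in current Java (the flattened-array line is hypothetical syntax, shown only as a comment):

import java.time.Instant;

class DefaultLeak {
    public static void main(String[] args) {
        int[] ints = new int[3];
        Instant[] instants = new Instant[3];
        System.out.println(ints[0]);      // 0 -- a usable default, by design
        System.out.println(instants[0]);  // null -- no Instant was ever constructed

        // Hypothetical, for illustration only: with a public flattened companion,
        //     Instant.val[] flat = new Instant.val[3];
        // would expose an all-zeros Instant (the epoch) that no constructor produced.
    }
}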
--Dan From jf.mend at gmail.com Thu Jun 16 11:17:25 2022 From: jf.mend at gmail.com (=?UTF-8?B?Sm/Do28gTWVuZG9uw6dh?=) Date: Thu, 16 Jun 2022 12:17:25 +0100 Subject: No subject Message-ID: Hello, I would like to bring to your consideration the following set of observations and user-model suggestions, in the hope that they will bring some useful ideas to the development of the Valhalla project. *Definition* *shared-mutable* - a variable that is mutable (non-final) and can be shared between threads; shared-mutables are the non-final subset of the shared-variables (?17.4.1. ) *Observations* Shared-mutables are the only variables that have these two apparently independent properties: 1. lack definite-assignment (?16. ) - the variable is initialized with a default-value if not definitely-assigned ( ?4.12.5. ) 2. allow data-races - the variable may be read/written while being written by another thread, with both events happening in an unpredictable order (?17.4.5. ) Via properties 1 and 2, nullability and encoding-mode, respectively, affect the semantics of variables in a way that is unique to shared-mutables: 1. if not definitely-assigned, the variable is initialized with: - if nullable: the null value, regardless of type - if not nullable: the zero-value of the type 2. in a data-race, the value read/written: - if reference: has unpredictable origin in *one* of the various writes - if inline, either: - has unpredictable origin in one of the various writes - is torn i.e. has distinct internal parts with separate unpredictable origins in *more than one* of the various writes ( ?17.7. ) These are the 3 kinds of shared-mutable variables (?4.12.3. ): - non-final class variables - non-final instance variables - array components The remaining kinds of variables don't have any of the above properties: - final class variables - final instance variables - method parameters - constructor parameters - lambda parameters - exception parameters - local variables *User-model* For class-authors: - *value-knob* to reject identity - Applicable on class declarations, if used by a class-author to indicate that the class instances don't require identity (a value-class), the runtime will be free to copy these values and choose between reference or inline encoding everywhere except in shared-mutables, as doing so does not introduce any semantic changes to the program. In shared-mutables, however, value-instances can only be inlined if atomicity is guaranteed, which will depend on the hardware and the variable bit-size (value plus nullability). - *tearable-knob* to allow tearing - Applicable on value-class declarations, may be used by the class-author to hand the class-user the responsibility of how to avoid tearing, freeing the runtime to always inline instances in shared-mutables (bikeshedding: when dealing with tearable value-classes, the "terrible" sound can work as a warning for the dangers of neglecting this responsibility). Conversely, if this knob is not used, instances will be kept atomic, which allows the class-author to guarantee constructor invariants, which may be useful for the class implementation and class-users to rely upon. - *zero-knob* to allow using the zero-value as a default - by omission, the class-author will force shared-mutables of this type to either be definitely-assigned or nullable. For class-users: - *not-nullable-knob* to exclude null from a variable's value-set - Applicable on any variable declaration. 
Since identity-types lack a zero-value, any non-nullable shared-mutable with an identity-type must be definitely-assigned. For nullable variables, in either encoding-mode, the runtime is free to choose the encoding for the extra bit of information required to represent null. In shared-mutables that are not definitely-assigned, this knob controls the default-value: either null or the zero-value of the type. - *atomic-knob* to avoid tearing - Applicable on shared-mutable declarations, may be used by the class-user to reverse the effect of the tearable-knob, thereby restoring atomicity. *Nullable types* - For compatibility, we cannot have a nullable-knob in the new user-model since unadorned types must remain nullable as they are now - (!) as a non-nullable-knob is pretty concise although not very readable - In method bodies, var will mitigate the majority of the noise of (!) - In method signatures, the proliferation of (!) in arguments and return types will look ugly - The compiler will be able to help us avoid the majority of NullPointerExceptions - Old APIs can compatibly update return types to be non-nullable where appropriate, which is more convenient for new client code. Also, removing the nullability overhead and may increase performance. Ex: Stream::findAny can be updated to return Optional! *Zero-knob vs no-zero-knob* I am going with the zero-knob because I feel it gives us the safest and most common default: - Allows a more cautious API introduction - A late addition of the zero-default to a class doesn't break client code, but a late removal does. - Definite-assignment is safe - Without a zero-default, class-users are forced to definitely-assign their shared-mutables, preventing missed-initialization-bugs. - It's the right default for value Records - The vast majority of Records are semantically value classes, since using any identity operations on them would be a bug (locking or identity comparison). Making these Records value-classes will prevent such bugs. So I am predicting that the vast majority of value-classes written by average developers will be Records, which mostly don't have a sensible zero-value. *Migration of value-based classes* For compatibility with existing code, no value-based class can be tearable, and somewhat amazingly, not even Double or Long. The reason is that where in the current model we have a field declaration such as: ValueBasedClass v = someValue; v is always reference encoded and, therefore, atomic. In the new model, the encoding-mode is fully encapsulated, so the only way for v to remain atomic is all the migrated value-based classes not being declared tearable. For Double and Long, this is a bit awkward, because it means that for these two primitives, and for them alone, each of these pair of field declarations will not be semantically equivalent: long v; // tearable Long! v; // atomic double d; // tearable Double! d; // atomic Regardless of this peculiarity, the major downside of being forced to make all value-based classes atomic is that, depending on: target architecture, primitive bit-size and nullability, we may not get inline encoding where we otherwise could. So, even though in the new model we can still achieve the same inlining as before (as the old primitives are still available), in a few situations, the runtime may have to resort to reference encoding to ensure atomicity, even if atomicity is not needed. I think this is a relatively small price to pay for compatibility. 
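For reference, the tearing permitted for a plain long today (§17.7), and the atomicity restored by volatile -- the closest existing analogue of the atomic-knob -- can be probed with a small racy program (names are made up; on most 64-bit JVMs no torn value will ever be observed, but the spec allows it):

class TearingDemo {
    static long plain;             // may tear under a data race (§17.7)
    // static volatile long plain; // volatile long reads/writes are atomic

    public static void main(String[] args) {
        Thread writer = new Thread(() -> {
            for (long i = 0; ; i++) {
                plain = (i & 1) == 0 ? 0L : -1L;  // only all-zeros or all-ones are ever written
            }
        });
        writer.setDaemon(true);
        writer.start();

        long deadline = System.currentTimeMillis() + 2_000;
        while (System.currentTimeMillis() < deadline) {
            long observed = plain;
            if (observed != 0L && observed != -1L) {
                System.out.println("torn read: 0x" + Long.toHexString(observed));
                return;
            }
        }
        System.out.println("no tearing observed on this JVM");
    }
}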
*Sample code* // For brevity, imports and the modifiers public, final, extends and implements are omitted. // Declaration of the primitive wrappers. zero value class Boolean {...} zero value class Char {...} zero value class Byte {...} zero value class Short {...} zero value class Integer {...} zero value class Float {...} zero value class Long {...} zero value class Double {...} // declaration of some value-based classes zero value class Optional {...} value class Instant {...} value class LocalDate {...} // declaration of some value classes tearable value class Rational {...} tearable zero value class Complex {...} // Fields class C { double _tearable_0d; Double! _atomic_0d; Integer! _atomic_2i = 2; // Integer! <==> int Instant t_null; Intant! t_error; // error: Blank field not initialized Instant! t = Instant.now(); LocalDate! ld; // error: Blank field not initialized atomic Rational! r = new Rational(2, 3); final atomic Rational r2; // error: Final fields are already atomic Rational r_null; Rational! r3; // error: Blank field not initialized } // Local Variables and Arrays var _2zeros_d = new Double![2]; var _3zeros_L= new atomic Long![3]; var ints = new atomic Integer![3]; // error: Integer is already atomic atomic Complex nullableComplex; // error: local variables are already atomic var s_nulls = new String[3]; var s_error = new String![3]; // error: array components not initialized var letters = new String![]{"a", "b"}; String nullable_letter_a = letters[0]; var nonNullable_letter_b = letters[1]; letters[0] = "z"; letters[1] = null; // error: cannot convert from null to String! var _3emptyOpts = Optional![3]; Kind regards, Jo?o Menodn?a -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevinb at google.com Thu Jun 16 19:16:43 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 16 Jun 2022 12:16:43 -0700 Subject: User model stacking: current status In-Reply-To: <009a5d59-0d32-2ebb-4f20-72e99eb18c51@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> <9d4f7e1b-e727-4205-907d-32353556e14c@oracle.com> <009a5d59-0d32-2ebb-4f20-72e99eb18c51@oracle.com> Message-ID: On Wed, Jun 15, 2022 at 12:01 PM Brian Goetz wrote: OK, let's say for sake of argument that "well, that's what you opted > into." Non-atomic means no one can count on cross-field integrity; don't > select non-atomic if you have invariants to protect. OK fine. And let's > flip over to what T! means. > > Let's say that T! is a restriction type; it can take on the values of T, > except for those prohibited by the restriction "t != null". So, what is > the default value of `String!`? > I'd like to slightly rephrase your question to "What do we do when we need a default value *for* `String!`", because I don't like the framing that suggests that a default value is an inherent property *of a type* itself. (I've no idea what type theorists would say.) And we were talking about bucket 2, a "value class with no good default" so I'll substitute `Instant!` instead of `String!` for most of this. So, what to do when we need a default value for `Instant!`? I guess "just blow up" is a non-option because every field has to start off somewhere. So I guess we have to answer "it's `fromEpochMilli(0)` because it can't be anything else, but we're going to do what we can to prevent its users from depending on that fact." 
Definitely, making an explicit value type that's nonpublic is a way to do just that. Is there a more surgical way to do it? One easy way to get surgical is to have OpenJDK just stop worrying about the bad-default-values problem, and let aftermarket static analyzers like ours take up that mantle. We can have an annotation to mark classes like Instant, and we can issue warnings when we see bogus usages (some of which we warn on anyway). In fact, if you do exactly what you're planning (so flip back from Instant! to Instant.val, and give the val type an access modifier), I guess we might end up doing this in Error Prone anyway, so that people can make their value types public safely. That would feel actually totally fine to me. And in the `Instant!` world, there's not much to hang a modifier on, but we wouldn't care if we were doing this checking anyway. You don't need to explain that "we'd rather release language features that *don't* need aftermarket tools to use safely", I know it. But it is just a platitude really. I think that any language design expressive enough to users do good things will inevitably be expressive enough to let them do bad things too; static analysis always has a crucial role to play imho. And of course it is always the trio of language/libraries/tools together that drives the user's ultimate experience. (Now changing back to `String`, a bucket-1 class, I've been expecting it will be much longer before we'd roll out !/? to those types, but when we do, I think your particular question comes out better. "What's the default for `String!` when we absolutely must have one?" Well, when we must we must, so we must commit null pollution. We try to issue enough of the right warnings to live with the fallout. If we ever make a transition like this, we have to level expectations; I'm convinced null pollution will be a part of all of our lives, more so than heap pollution of the generics kind ever was, but I'm also still optimistic that it will still be worth it. You could say that today we live with 100% null pollution...) For locals, it's pretty clear we don't have to answer, because locals > cannot be accessed unless they are DA at the point of access. But for > fields, we have a problem -- and for arrays, a bigger one. We can try to > require that fields have initializers, but there are all sorts of > situations in which a field can be read before its initializer runs. > ... which situations already lead to bad behavior / puzzlers as it is. We might miss a warning we'd rather have been able to give, but life goes on? > And arrays are much worse. > Arrays in general, or just the one single construction path `new TheType![size]` (or `new TheType.val[size]`)? I would just say please give us new Arrays methods or syntax that create and fill at once, and we'll get busy clamping down on everything else. > On 6/15/2022 2:10 PM, Kevin Bourrillion wrote: > > On Wed, Jun 15, 2022 at 10:51 AM Brian Goetz > wrote: > > - If we spelled .val as !, then switching from P[] to P![] not only >> prohibits null elements, but changes the layout and _introduces tearing_. >> Hiding tearability behind "non-null" is likely to be a lifetime >> subscription to Astonishment Digest, since 99.9999 out of 100 Java >> developers will not be able to say "non-null, oh, that also means I >> sacrifice atomicity." >> > > Well, that's what you opted into when you... wait a minute... 
> > > >> The link you probably want to attack is this last one, where you are >> likely to say "well, that's what you opted into when you said `non-atomic`; >> you just happen to get atomicity for free with references, but that's a >> bonus." >> > > Your Kevin's Brain Emulator has gotten pretty decent over time... check > whether the next things it said were these (probably so): > > A good clean Basic Conceptual Model For Novices is allowed to have a bunch > of asterisks, of the form "well, in $circumstance, this will be revealed to > be totally false", and that's not always a strike against the model. How do > we discern the difference between a good asterisk and a bad one? How common > the circumstance; how recognizable as *being* a special circumstance; how > disproportionate a truth discrepancy we're talking about; etc. > > I know I've said this before. If I'm in a class being taught how this > stuff works, and the teacher says "Now unsafe concurrent code can break > this in horrible ways, and in $otherClass you will learn what's really > going on in the presence of data races" ... I feel fully satisfied by that. > I know I won't get away with playing fast and loose with The Concurrency > Rules; I'm not advanced enough and might never be. (Many people aren't but *don't > *know it, and therein lies the problem, but do we really have much power > to protect such people from themselves?) > > I could be wrong, but I suspect this kind of viewpoint might be more > common and respected in the wider world than it is among the rarefied kind > of individuals who join expert groups, no offense to anyone here meant. > You're always going to see all the details, and you're always going to > *want* to see all the details. The general public just hopes the details > stay out of their way. When they don't, they have a bad day, but it doesn't > mean they were better served by a complex model that tried to account for > everything. > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > > > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian.goetz at oracle.com Thu Jun 16 19:21:42 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 16 Jun 2022 15:21:42 -0400 Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> <9d4f7e1b-e727-4205-907d-32353556e14c@oracle.com> <009a5d59-0d32-2ebb-4f20-72e99eb18c51@oracle.com> Message-ID: Quick observation: the default value of `String!` is less dangerous to allow free, because its "null", and the user can already freely make nulls.?? Whereas the default value of `Instant` is not something the user can get their hands on, if they can't observe fields of type `Instant.val` during initialization.? So the problem may be less severe for B1. On 6/16/2022 3:16 PM, Kevin Bourrillion wrote: > On Wed, Jun 15, 2022 at 12:01 PM Brian Goetz > wrote: > > OK, let's say for sake of argument that "well, that's what you > opted into."? Non-atomic means no one can count on cross-field > integrity; don't select non-atomic if you have invariants to > protect.? OK fine.? And let's flip over to what T! means. > > Let's say that T! is a restriction type; it can take on the values > of T, except for those prohibited by the restriction "t != null".? 
> So, what is the default value of `String!`? > > > I'd like to slightly rephrase your question to "What do we do when we > need a default value /for/?`String!`", because I don't like the > framing that suggests that a default value is an inherent property /of > a type/?itself. (I've no idea what type theorists would say.) > > And we were talking about bucket 2, a "value class with no good > default" so I'll substitute `Instant!` instead of `String!` for most > of this. > > So, what to do?when we need a default value for `Instant!`? I guess > "just blow up" is a non-option because every field has to start off > somewhere. So I guess we have to answer "it's `fromEpochMilli(0)` > because it can't be anything else, but we're going to do what we can > to prevent its users from depending on that fact." Definitely, making > an explicit value type that's nonpublic is a way to do just that. Is > there a more surgical way to do it? > > One easy way to get surgical is to have OpenJDK just stop worrying > about the bad-default-values problem, and let aftermarket static > analyzers like ours take up that mantle. We can have an annotation to > mark classes like Instant, and we can issue warnings when we see bogus > usages (some of which we warn on anyway). In fact, if you do exactly > what you're planning (so flip back from Instant! to Instant.val, and > give the val type an access modifier), I guess we might end up doing > this in Error Prone anyway, so that people can make their value types > public safely. That would feel actually totally fine to me. And in the > `Instant!` world, there's not much to hang a modifier on, but we > wouldn't care if we were doing this checking anyway. > > You don't need to explain that "we'd rather release language features > that /don't/?need aftermarket tools to use safely", I know it. But it > is just a platitude really. I think that any language design > expressive enough to users do good things will inevitably be > expressive enough to let them do bad things too; static analysis > always has a crucial role to play imho. And of course it is always the > trio of language/libraries/tools together that drives the user's > ultimate experience. > > (Now changing back to `String`, a bucket-1 class, I've been expecting > it will be much longer before we'd roll out !/? to those types, but > when we do, I think your particular question comes out better. "What's > the default for `String!` when we absolutely must have one?" Well, > when we must we must, so we must commit null pollution. We try to > issue enough of the right warnings to live with the fallout. If we > ever make a transition like this, we have to level expectations; I'm > convinced null pollution will be a part of all of our lives, more so > than heap pollution of the generics kind ever was, but I'm also still > optimistic that it will still be worth it. You could say that today we > live with 100% null pollution...) > > > For locals, it's pretty clear we don't have to answer, because > locals cannot be accessed unless they are DA at the point of > access.? But for fields, we have a problem -- and for arrays, a > bigger one.? We can try to require that fields have initializers, > but there are all sorts of situations in which a field can be read > before its initializer runs. > > > ... which situations already lead to bad behavior / puzzlers as it is. > We might miss a warning we'd rather have been able to give, but life > goes on? > > ? And arrays are much worse. 
> > > Arrays in general, or just the one single construction?path `new > TheType![size]` (or `new TheType.val[size]`)? I would just say please > give us new Arrays methods or syntax that create and fill at once, and > we'll get busy clamping down on everything else. > > > > On 6/15/2022 2:10 PM, Kevin Bourrillion wrote: >> On Wed, Jun 15, 2022 at 10:51 AM Brian Goetz >> wrote: >> >> ?- If we spelled .val as !, then switching from P[] to P![] >> not only prohibits null elements, but changes the layout and >> _introduces tearing_.? Hiding tearability behind "non-null" >> is likely to be a lifetime subscription to Astonishment >> Digest, since 99.9999 out of 100 Java developers will not be >> able to say "non-null, oh, that also means I sacrifice >> atomicity." >> >> >> Well, that's what you opted into when you... wait a minute... >> >> The link you probably want to attack is this last one, where >> you are likely to say "well, that's what you opted into when >> you said `non-atomic`; you just happen to get atomicity for >> free with references, but that's a bonus." >> >> >> Your Kevin's Brain Emulator has gotten pretty decent over time... >> check whether the next things it said were these (probably so): >> >> A good clean Basic Conceptual Model For Novices is allowed to >> have a bunch of asterisks, of the form "well, in $circumstance, >> this will be revealed to be totally false", and that's not always >> a strike against the model. How do we discern the difference >> between a good asterisk and a bad one? How common the >> circumstance; how recognizable as /being/?a special circumstance; >> how disproportionate a truth discrepancy we're talking about; etc. >> >> I know I've said this before. If I'm in a class being taught how >> this stuff works, and the teacher says "Now unsafe concurrent >> code can break this in horrible ways, and in $otherClass you will >> learn what's really going on in the presence of data races" ... I >> feel fully satisfied by that. I know I won't get away with >> playing fast and loose with The Concurrency Rules; I'm not >> advanced enough and might never be. (Many people aren't but >> /don't /know it, and therein lies the problem, but do we really >> have much power to protect such people from themselves?) >> >> I could be wrong, but I suspect this kind of viewpoint might be >> more common and respected in the wider world than it is among the >> rarefied kind of individuals who join expert groups, no offense >> to anyone here meant. You're always going to see all the details, >> and you're always going to /want/?to see all the details. The >> general public just hopes the details stay out of their way. When >> they don't, they have a bad day, but it doesn't mean they were >> better served by a complex model that tried to account for >> everything. >> >> >> -- >> Kevin Bourrillion?|?Java Librarian |?Google, Inc.?|kevinb at google.com > > > > -- > Kevin Bourrillion?|?Java Librarian |?Google, Inc.?|kevinb at google.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From forax at univ-mlv.fr Thu Jun 16 19:55:24 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 16 Jun 2022 21:55:24 +0200 (CEST) Subject: User model stacking: current status In-Reply-To: References: <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> <9d4f7e1b-e727-4205-907d-32353556e14c@oracle.com> <009a5d59-0d32-2ebb-4f20-72e99eb18c51@oracle.com> Message-ID: <600302983.731476.1655409324115.JavaMail.zimbra@u-pem.fr> > From: "Kevin Bourrillion" > To: "Brian Goetz" > Cc: "daniel smith" , "valhalla-spec-experts" > > Sent: Thursday, June 16, 2022 9:16:43 PM > Subject: Re: User model stacking: current status Hi Kevin, >> And arrays are much worse. > Arrays in general, or just the one single construction path `new TheType![size]` > (or `new TheType.val[size]`)? I would just say please give us new Arrays > methods or syntax that create and fill at once, and we'll get busy clamping > down on everything else. It does work even in simple cases, by example try to write ArrayList or ArrayDeque with that primitive (array + fill). R?mi >> On 6/15/2022 2:10 PM, Kevin Bourrillion wrote: >>> On Wed, Jun 15, 2022 at 10:51 AM Brian Goetz < [ mailto:brian.goetz at oracle.com | >>> brian.goetz at oracle.com ] > wrote: >>>> - If we spelled .val as !, then switching from P[] to P![] not only prohibits >>>> null elements, but changes the layout and _introduces tearing_. Hiding >>>> tearability behind "non-null" is likely to be a lifetime subscription to >>>> Astonishment Digest, since 99.9999 out of 100 Java developers will not be able >>>> to say "non-null, oh, that also means I sacrifice atomicity." >>> Well, that's what you opted into when you... wait a minute... >>>> The link you probably want to attack is this last one, where you are likely to >>>> say "well, that's what you opted into when you said `non-atomic`; you just >>>> happen to get atomicity for free with references, but that's a bonus." >>> Your Kevin's Brain Emulator has gotten pretty decent over time... check whether >>> the next things it said were these (probably so): >>> A good clean Basic Conceptual Model For Novices is allowed to have a bunch of >>> asterisks, of the form "well, in $circumstance, this will be revealed to be >>> totally false", and that's not always a strike against the model. How do we >>> discern the difference between a good asterisk and a bad one? How common the >>> circumstance; how recognizable as being a special circumstance; how >>> disproportionate a truth discrepancy we're talking about; etc. >>> I know I've said this before. If I'm in a class being taught how this stuff >>> works, and the teacher says "Now unsafe concurrent code can break this in >>> horrible ways, and in $otherClass you will learn what's really going on in the >>> presence of data races" ... I feel fully satisfied by that. I know I won't get >>> away with playing fast and loose with The Concurrency Rules; I'm not advanced >>> enough and might never be. (Many people aren't but don't know it, and therein >>> lies the problem, but do we really have much power to protect such people from >>> themselves?) >>> I could be wrong, but I suspect this kind of viewpoint might be more common and >>> respected in the wider world than it is among the rarefied kind of individuals >>> who join expert groups, no offense to anyone here meant. You're always going to >>> see all the details, and you're always going to want to see all the details. 
>>> The general public just hopes the details stay out of their way. When they >>> don't, they have a bad day, but it doesn't mean they were better served by a >>> complex model that tried to account for everything. >>> -- >>> Kevin Bourrillion | Java Librarian | Google, Inc. | [ mailto:kevinb at google.com | >>> kevinb at google.com ] > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | [ mailto:kevinb at google.com | > kevinb at google.com ] -------------- next part -------------- An HTML attachment was scrubbed... URL: From jf.mend at gmail.com Fri Jun 17 18:32:15 2022 From: jf.mend at gmail.com (=?UTF-8?B?Sm/Do28gTWVuZG9uw6dh?=) Date: Fri, 17 Jun 2022 19:32:15 +0100 Subject: User model stacking: current status Message-ID: > For locals, it's pretty clear we don't have to answer, because locals > cannot be accessed unless they are DA at the point of access. But for > fields, we have a problem -- and for arrays, a bigger one. We can try to > require that fields have initializers, but there are all sorts of > situations in which a field can be read before its initializer runs. And > arrays are much worse. I'm sorry, Brian, are you saying that *the compiler* can't enforce definite-assignment on non-final fields and arrays when declared with a non-nullable no-zero type like String! or Rational!? The user-model I layed out (my last no-subject email) depends on getting compiler errors like these: class C { String null_str; String! hello_str = "hello"; // error: field not initialized, String has no zero value: String! zero_str; Integer null_int; // OK: Integer is has a zero value: Integer! _0_int; Optional null_opt; // OK: Optional has a zero value Optional! empty_opt; // OK: Optional.of returns an Optional! Optional! helloStr_opt = Optional.of("hello"); Rational null_ratio; Rational! _2Thirds_ratio = new Rational(2, 3); // error: field not initialized, Rational has no zero value: Rational! zero_ratio; Double[] _3_nulls = new Double[3]; Double![] _3_zeros = new Double![3]; String[] _2_nulls = new String[2]; String![] _2_strings = {"a", "b"}; // error: array components not initialized, String has no zero value: String![] _2_zeroStrings = new String![2]; String![] nonNullable_strings = nonNullableStrings_array(); // error: cannot convert from String[] to String![] String![] nullable_strings = nullableStrings_array(); } -------------- next part -------------- An HTML attachment was scrubbed... URL: From webseiten.designer at googlemail.com Sat Jun 18 13:12:26 2022 From: webseiten.designer at googlemail.com (Tim Feuerbach) Date: Sat, 18 Jun 2022 15:12:26 +0200 Subject: User model stacking: current status [Observer discussion] In-Reply-To: References: Message-ID: Hi?Jo?o, > Date: Fri, 17 Jun 2022 19:32:15 +0100 > From: Jo?o Mendon?a > To: valhalla-spec-observers at openjdk.java.net > Subject: User model stacking: current status > Message-ID: > > Content-Type: text/plain; charset="utf-8" > >> For locals, it's pretty clear we don't have to answer, because locals >> cannot be accessed unless they are DA at the point of access. But for >> fields, we have a problem -- and for arrays, a bigger one. We can try to >> require that fields have initializers, but there are all sorts of >> situations in which a field can be read before its initializer runs. And >> arrays are much worse. > > I'm sorry, Brian, are you saying that *the compiler* can't enforce > definite-assignment on non-final fields and arrays when declared with a > non-nullable no-zero type like String! or Rational!? 
By letting control leave the constructor before definite assignment (here, via a method call), you can read an uninitialized final field:

class A {
  final String definitelyAssigned;

  A() {
    printAssigned();
    this.definitelyAssigned = "assigned";
  }

  void printAssigned() {
    System.out.println(definitelyAssigned);
  }
}

The problem is not limited to methods of your own class; you can leak `this` in the constructor through subclasses or just by handing it over to someone else. Of course this is bad style, but the compiler does not prevent you from doing it.

If I interpret Brian's mail correctly, he is concerned about leaking a default value that could actually be illegal from the point of view of the value class:

> Quick observation: the default value of `String!` is less dangerous to
> allow free, because its "null", and the user can already freely make
> nulls. Whereas the default value of `Instant` is not something the
> user can get their hands on, if they can't observe fields of type
> `Instant.val` during initialization. So the problem may be less
> severe for B1.

A big part of Java's integrity model is that instance creation should not bypass the constructor (that doesn't mean you can't, e.g. using serialization or sun.misc.Unsafe). When you declare a B3 class, you explicitly allow the JVM *not* to call the constructor on initialization. This is counterintuitive but, as John explains here, necessary: https://mail.openjdk.org/pipermail/valhalla-spec-experts/2021-December/001710.html

So from my point of view as a layman, I see four options (leaving out arrays here):

    1) "!" and "val" are really separate. B2! is a non-nullable _reference_ type, and observing the null before its first assignment is unfortunate, but as Brian wrote, B2 = null is a value anyone can already create. B3! may be a non-nullable reference type as well for consistency (it is "untearable").

    2) B2! *is* B2.val and requires initial assignment following the rules of definite assignment, but it carries a separate "initialized" bit, and you get null pollution as in 1) if you observe it before its first assignment. The JVM has to check this flag whenever you read the field.

    3) Like 2), but instead of an "initialization lock", you live with the fact that it is possible to obtain the illegal default, and blame the user of your class for exposing it. Thus as a class author, if you opt in to "value", you buy into instances of your class being in an illegal state (for a short time). In this world, instead of making the "val" public, B3 rather makes its all-zero default public, so you could say "Point! p = Point.default", which is optimized away by the compiler to just leave the field as is after spraying zeroes over it. I think this aligns well with how Rémi asked for a mandatory no-body constructor in the thread linked above.

    4) Like 3), but you limit the places where the default can escape, by making up strict rules for using "!" on a field. For the above example, the rule simply would be "do not expose `this` before first assignment to all non-nullable fields". Unfortunately, it doesn't stop at instance members. As Brian points out:

> even if a static field has an initializer, you can still observe its
> default value with unlucky timing.

You can leak an uninitialized class member in the same vein as above, but it's even worse, because this time you have less control over the thing that escapes; if there is a publicly accessible static path to the field, anyone you call could potentially access it during initialization.
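A minimal version of that static leak in today's Java (names invented for illustration; static initializers run in textual order, so the first one observes the second field's default, with no compiler diagnostic):

class InitOrder {
    static final Integer LEAKED = peek();                   // runs first, sees the default
    static final Integer CONFIGURED = Integer.valueOf(42);  // deliberately not a compile-time constant

    static Integer peek() {
        return CONFIGURED;  // still null at this point
    }

    public static void main(String[] args) {
        System.out.println(InitOrder.LEAKED);      // prints: null
        System.out.println(InitOrder.CONFIGURED);  // prints: 42
    }
}

With a flattened non-nullable field, the same ordering would presumably expose the all-zeros instance instead of null.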
But you can't forbid calling anything when assigning to a static field, otherwise how do you construct the value class instance you want to assign? 2)-4) all have the additional downside that they mingle always-atomic references and non-atomic value classes behind the same symbol "!", and in the future painted by Kevin where "!" would become the default and "?" is the exception, you need to consult the docs whether you atomically update a non-null reference type, or are in the process of tearing a non-null value type. A way to mitigate this could be to force the type user to turn off the atomic knob on their side as well. Alternatively, you teach developers about the dangers of tearing (Kevin's argument) and always use `volatile`, `AtomicReferenceArray`, locks etc. for updating state concurrently. Tim From jf.mend at gmail.com Sun Jun 19 07:29:53 2022 From: jf.mend at gmail.com (=?UTF-8?B?Sm/Do28gTWVuZG9uw6dh?=) Date: Sun, 19 Jun 2022 08:29:53 +0100 Subject: User model stacking: current status [Observer discussion] Message-ID: Hi Tim, Thank you very much for all your excellent explanations. The main take away for me was: It's gonna be next to impossible to guarantee that illigal zero-values are kept hidden. But, if I understood you correctly, for no-zero types, the compiler can still enforce definite-assignment as it does for final-fields and locals, if it determines, albeit in a limited way, that the shared-mutable (non-final field or array) is used before initialization. Did I get this right? So here is another, parhaps dumb, question: why is it so bad that sensless zero-values may be leaked? If the language cannot guarantee that illegal zero-values won't be seen, then Java users will need to keep one rule in mind when creating/reading a value-class: *No invariant can exclude the zero-value.* For example: if the constructor of Rational throws on a zero denominator, that is not an invariant. It's just a fail-fast that works for most situations. This is certainly not ideal, but is it a big problem? Or is there something else that I'm still missing? Now, here's the main question: Why can't shared-mutables be inlined by the runtime in an opaque/automatic way? I.e. when encountering a shared-mutable, why can't the runtime decide the encoding-mode based on something similar to this ternary expression: var encodingMode = hasIdentity(varType) ? reference : tooBig(varType.bitSize) ? reference : atomicWrite(varType.bitSize) ? inline : atomicClass(varType) ? reference : inline; With varType.bitSize depending on nullability I.e. a nullable Rational would be less likely to be inlined than a non-nullable Rational!, simply due to the extra bits required to also represent null. The predicates tooBig and atomicWrite would depend on the hardware specs. If this could be done, users would not have to know about ref-projections or val-projections or about inlining or references. The only Valhalla related concepts that would matter for users would be: - object identity - variable nullability - object write atomicity Jo?o -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jf.mend at gmail.com Tue Jun 21 18:51:08 2022 From: jf.mend at gmail.com (=?UTF-8?B?Sm/Do28gTWVuZG9uw6dh?=) Date: Tue, 21 Jun 2022 19:51:08 +0100 Subject: User model stacking: current status [Observer discussion] Message-ID: Hello again, Pondering a bit more on Tim Feuerbach's fine analysis of the current state of the discussion concerning the illegal zeros leak problem, has made me reconsider its effects on the user-model I had suggested in my "no-subject" email: *Inlining in shared-variables (?17.4.1.) makes it impossible to hide the zero-value* therefore: *No constructor invariant of a value-class can exclude the zero-value* This is actually bad. It seems to me that most Records are effectively value-based classes, and I was thinking that it would be nice to make them value-classes, not for the inlining, but to clearly express the intended semantics and get help from the compiler/runtime to avoid bugs, such as synchronizing on them or comparing their identity. This is a problem for the user-model I had suggested, because all these Records are filled with "requireNonNull" in their constructors (using "!" is the same), and having to keep in mind that none of this can really be trusted upon is just awful. So, biting the bullet, how about this user-model: For class-authors: - *value-knob* to reject identity - Applicable on class declarations. Used by a class-author to indicate that the class instances don't require identity (a value-class). - *zero-knob* to indicate that the value-class has a zero-value - if a value-class does not have a zero-value, its instances won't be inlined in any shared-variables since this is the only way for the language to ensure the non-existence of the zero-value. If the value-class is declared with a zero-value, then care must be taken when reading/writing constructors since *no constructor invariant can exclude the zero-value*. - *tearable-knob* to allow tearing - Applicable on zero value-class declarations, may be used by the class-author to hand the class-user the responsibility of how to avoid tearing, freeing the runtime to always inline instances in shared-mutables (non-final shared-variables). Conversely, if this knob is not used, instances will be kept atomic, which allows the class-author to guarantee constructor invariants *provided they're not broken by the zero-value*, which may be useful for the class implementation and class-users to rely upon. For class-users: - *not-nullable-knob* to exclude null from a variable's value-set - Applicable on any variable declaration. *All non-nullable shared-mutables must be definitely-assigned*. For nullable variables, the default value is null and, in either encoding-mode, the runtime is free to choose the encoding for the extra bit of information required to represent whether or not a variable is null. - *atomic-knob* to avoid tearing - Applicable on shared-mutable declarations, may be used by the class-user to reverse the effect of the tearable-knob, thereby restoring atomicity. Requiring definite-assignment to all non-nullable shared-mutables is useful to get rid of missed-initialization-bugs. As Tim explained to me, to initialize non-nullables with zero-values we can assign the constant ValueClass.zero or the array constructor new ValueClass.zero[size] which can be optimized away by the compiler. In this model, all value-based classes are migrated to (non-tearable) zero value-classes. 
So, even though LocalDate is a zero value-class, due to definite-assignment, it will be very hard to get an accidental "Jan 1, 1970". Rational can also be a zero value-class but users will have to keep in mind that it's possible to get a zero-denominator Rational, even if the constructor throws when we try to build one. Most Records, on the other hand, will be (no-zero) value-classes, making all their constructor invariants always reliable. They won't be inlined in shared-variables but, due to their big size, most wouldn't be anyway. The runtime chooses the encoding-mode of a ClassType variable according to this ternary expression: var encodingMode = : !valueClass(variable.type) ? REFERENCE : tooBig(variable.type.bitSize) ? REFERENCE // Illegal zeros: : !shared(variable) ? INLINE : !zeroValueClass(variable.type) ? REFERENCE // Atomicity: : final(variable) ? INLINE : atomicWrite(variable.type.bitSize) ? INLINE : atomic(variable) ? REFERENCE // atomic-knob : tearableValueClass(variable.type) ? INLINE : REFERENCE; The variable.type.bitSize changes with nullability i.e. a nullable type will have a higher bitSize due to the extra bits required to also represent null. The predicates tooBig and atomicWrite depend on the hardware specs. For example, on my laptop they could be (written in the "Concise Method Bodies" style): boolean tooBig(int bitSize) -> bitSize>256; boolean atomicWrite(int bitSize) -> bitSize<=64; Jo?o Mendon?a PS: I realize that, probably, all of this has been analyzed and discussed to death during the past 7 years in this mailing list, and I, for only having learned a tiny bit of it, am just making a fool out of myself with all these silly suggestions. If that is the case, please let me know and I apologize in advance for wasting your time. -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian.goetz at oracle.com Thu Jun 23 19:01:24 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 23 Jun 2022 15:01:24 -0400 Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> Message-ID: <1750a350-4e83-cd0e-8a52-44d193b766af@oracle.com> On 6/15/2022 12:41 PM, Kevin Bourrillion wrote: > All else being equal, the idea to use "inaccessible value type" over > "value type doesn't exist" feels very good and simplifying, with the > main problem that the syntax can't help but be gross. A few weeks in, and this latest stacking is still feeling pretty good: ?- There are no coarse buckets any more; there are just identity classes and value classes. ?- Value classes have ref and val companion types with the obvious properties.? (Notably, refs are always atomic.) ?- For `value class C`, C as a type is an alias for `C.ref`. ?- The bucket formerly known as B2 becomes "value class, whose .val type is private."? This is the default for a value class. ?- The bucket formerly known as B3a is denoted by explicitly making the val companion public, with a public modifier on a "member" of the class. ?- The bucket formerly known as B3n is denoted by explicitly making the val companion public and non-atomic, again using modifiers. I went and updated the State of the Values document to use the new terminology, test-driving some new syntax.? (Usual rules: syntax comments are premature at this time.)? 
I was very pleased with the result, because almost all the changes were small changes in terminology (e.g., "value companion type"), and eliminating the clumsy distinction between value classes and primitive classes.? Overall the structure remains the same, but feels more compact and clean.? MD source is below, for review. Kevin's two questions remain, but I don't think they get in the way of refining the model in this way: ?- Have we made the right choices around == ? ?- Are we missing a big opportunity by not spelling Complex.val with a bang? # State of Valhalla ## Part 2: The Language Model {.subtitle} #### Brian Goetz {.author} #### June 2022 {.date} > _This is the second of three documents describing the current State of ? Valhalla.? The first is [The Road to Valhalla](01-background); the ? third is [The JVM Model](03-vm-model)._ This document describes the directions for the Java _language_ charted by Project Valhalla.? (In this document, we use "currently" to describe the language as it stands today, without value classes.) Valhalla started with the goal of providing user-programmable classes which can be flat and dense in memory.? Numerics are one of the motivating use cases; adding new primitive types directly to the language has a very high barrier.? As we learned from [Growing a Language][growing] there are infinitely many numeric types we might want to add to Java, but the proper way to do that is via libraries, not as a language feature. ## Primitive and reference types in Java today Java currently has eight built-in primitive types.? Primitives represent pure _values_; any `int` value of "3" is equivalent to, and indistinguishable from, any other `int` value of "3".? Primitives are monolithic (their bits cannot be addressed individually) and have no canonical location, and so are _freely copyable_. With the exception of the unusual treatment of exotic floating point values such as `NaN`, the `==` operator performs a _substitutibility test_ -- it asks "are these two values the same value". Java also has _objects_, and each object has a unique _object identity_. Because of identity, objects are not freely copyable; each object lives in exactly one place at any given time, and to access its state we have to go to that place. But we mostly don't notice this because objects are not manipulated or accessed directly, but instead through _object references_.? Object references are also a kind of value -- they encode the identity of the object to which they refer, and the `==` operator on object references asks "do these two references refer to the same object."? Accordingly, object _references_ (like other values) can be freely copied, but the objects they refer to cannot. Primitives and objects differ in almost every conceivable way: | Primitives???????????????????????????????? | Objects??????????????????????????? | | ------------------------------------------ | ---------------------------------- | | No identity (pure values)????????????????? | Identity?????????????????????????? | | `==` compares values?????????????????????? | `==` compares object identity????? | | Built-in?????????????????????????????????? | Declared in classes??????????????? | | No members (fields, methods, constructors) | Members (including mutable fields) | | No supertypes or subtypes????????????????? | Class and interface inheritance??? | | Accessed directly????????????????????????? | Accessed via object references???? | | Not nullable?????????????????????????????? | Nullable?????????????????????????? 
| | Default value is zero????????????????????? | Default value is null????????????? | | Arrays are monomorphic???????????????????? | Arrays are covariant?????????????? | | May tear under race??????????????????????? | Initialization safety guarantees?? | | Have reference companions (boxes)????????? | Don't need reference companions??? | The design of primitives represents various tradeoffs aimed at maximizing performance and usability of the primtive types.? Reference types default to `null`, meaning "referring to no object"; primitives default to a usable zero value (which for most primitives is the additive identity). Reference types provide initialization safety guarantees against a certain category of data races; primitives allow tearing under race for larger-than-32-bit values. We could characterize the design principles behind these tradeoffs are "make objects safer, make primitives faster." The following figure illustrates the current universe of Java's types.? The upper left quadrant is the built-in primitives; the rest of the space is reference types.? In the upper-right, we have the abstract reference types -- abstract classes, interfaces, and `Object` (which, though concrete, acts more like an interface than a concrete class).? The built-in primitives have wrappers or boxes, which are reference types.
[Figure: Current universe of Java field types]
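As a small reminder of how different the two columns of the table are in practice, `==` on the boxes is an identity comparison, and only the `Integer.valueOf` cache makes it look value-like for small numbers (an illustrative snippet; the cache bounds are a library detail):

```
Integer a = 127, b = 127;          // autoboxing goes through the Integer.valueOf cache
Integer c = 1000, d = 1000;        // typically outside the cache: two distinct objects
System.out.println(a == b);        // true  -- same cached object
System.out.println(c == d);        // usually false -- identity comparison of two boxes
System.out.println(c.equals(d));   // true  -- value comparison
System.out.println(1000 == 1000);  // true  -- primitives compare by value
```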
Valhalla aims to unify primitives and objects in that they can both be declared with classes, but maintains the special runtime characteristics primitives have.? But while everyone likes the flatness and density that user-definable value types promise, in some cases we want them to be more like classical objects (nullable, non-tearable), and in other cases we want them to be more like classical primitives (trading some safety for performance). ## Value classes: separating references from identity Many of the impediments to optimization that Valhalla seeks to remove center around _unwanted object identity_.? The primitive wrapper classes have identity, but it is a purely accidental one.? Not only is it not directly useful, it can be a source of bugs.? For example, due to caching, `Integer` can be accidentally compared correctly with `==` just often enough that people keep doing it. Similarly, [value-based classes][valuebased] such as `Optional` have no need for identity, but pay the costs of having identity anyway. Our first step is allowing class declarations to explicitly disavow identity, by declaring themselves as _value classes_.? The instances of a value class are called _value objects_. ``` value class ArrayCursor { ??? T[] array; ??? int offset; ??? public ArrayCursor(T[] array, int offset) { ??????? this.array = array; ??????? this.offset = offset; ??? } ??? public boolean hasNext() { ??????? return offset < array.length; ??? } ??? public T next() { ??????? return array[offset]; ??? } ??? public ArrayCursor advance() { ??????? return new ArrayCursor(array, offset+1); ??? } } ``` This says that an `ArrayCursor` is a class whose instances have no identity -- that instead they have _value semantics_.? As a consequence, it must give up the things that depend on identity; the class and its fields are implicitly final. But, value classes are still classes, and can have most of the things classes can have -- fields, methods, constructors, type parameters, superclasses (with some restrictions), nested classes, class literals, interfaces, etc.? The classes they can extend are restricted: `Object` or abstract classes with no instance fields, empty no-arg constructor bodies, no other constructors, no instance initializers, no synchronized methods, and whose superclasses all meet this same set of conditions.? (`Number` meets these conditions.) Classes in Java give rise to types; the class `ArrayCursor` gives rise to a type `ArrayCursor` (actually a parametric family of instantiations `ArrayCursor`.) `ArrayCursor` is still a reference type, just one whose references refer to value objects rather than identity objects. For the types in the upper-right quadrant of the diagram (interfaces, abstract classes, and `Object`), references to these types might refer to either an identity object or a value object. (Historically, JVMs were effectively forced to represent object references with pointers; for references to value objects, JVMs now have more flexibility.) Because `ArrayCursor` is a reference type, it is nullable (because references are nullable), its default value is null, and loads and stores of references are atomic with respect to each other even in the presence of data races, providing the initialization safety we are used to with classical objects. Because instances of `ArrayCursor` have value semantics, `==` compares by state rather than identity.? 
This means that value objects, like primitives, are _freely copyable_; we can
explode them into their fields and re-aggregate them into another value object,
and we cannot tell the difference.  (Because they have no identity, some
identity-sensitive operations, such as synchronization, are disallowed.)

So far we've addressed the first two lines of the table of differences above;
rather than identity being a property of all object instances, classes can
decide whether their instances have identity or not.  By allowing classes that
don't need identity to exclude it, we free the runtime to make better layout
and compilation decisions -- and avoid a whole category of bugs.

In looking at the code for `ArrayCursor`, we might mistakenly assume it will be
inefficient, as each loop iteration appears to allocate a new cursor:

```
for (ArrayCursor<T> c = Arrays.cursor(array);
     c.hasNext();
     c = c.advance()) {
    // use c.next();
}
```

One should generally expect here that _no_ cursors are actually allocated.
Because an `ArrayCursor` is just its two fields, these fields will routinely
get scalarized and hoisted into registers, and the constructor call in
`advance` will typically compile down to incrementing one of these registers.

### Migration

The JDK (as well as other libraries) has many [value-based classes][valuebased]
such as `Optional` and `LocalDateTime`.  Value-based classes adhere to the
semantic restrictions of value classes, but are still identity classes -- even
though they don't want to be.  Value-based classes can be migrated to true
value classes simply by redeclaring them as value classes, which is both
source- and binary-compatible.

We plan to migrate many value-based classes in the JDK to value classes.
Additionally, the primitive wrappers can be migrated to value classes as well,
making the conversion between `int` and `Integer` cheaper; see the section
"Legacy Primitives" below.  (In some cases, this may be _behaviorally_
incompatible for code that synchronizes on the primitive wrappers.  [JEP
390][jep390] has supported both compile-time and runtime warnings for
synchronizing on primitive wrappers since Java 16.)
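As a hedged illustration of this kind of migration (the `Distance` class here
is invented for the example, and the `value` modifier is the proposed syntax,
not something today's javac accepts):

```
// Today: a typical value-based class -- final, immutable, state-based equals,
// but still (accidentally) an identity class.
final class Distance {
    private final long meters;
    private Distance(long meters) { this.meters = meters; }
    public static Distance ofMeters(long m) { return new Distance(m); }
    public long meters() { return meters; }
    @Override public boolean equals(Object o) {
        return o instanceof Distance d && d.meters == meters;
    }
    @Override public int hashCode() { return Long.hashCode(meters); }
}

// After migration: only the declaration line changes; the body stays the same.
// (Value classes are implicitly final, so the `final` modifier can be dropped.)
value class Distance { /* same members as above */ }
```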
[Figure: Java field types adding value classes]
### Equality

Earlier we said that `==` compares value objects by state rather than by
identity.  More precisely, two value objects are `==` if they are of the same
type, and their fields are pairwise equal, where equality is given by `==` for
primitives (except `float` and `double`, which are compared with
`Float::equals` and `Double::equals` to avoid anomalies), `==` for references
to identity objects, and recursively with `==` for references to value objects.
In no case is a value object ever `==` to a reference to an identity object.

### Value records

While records have a lot in common with value classes -- they are final and
their fields are final -- they are still identity classes.  Records embody a
tradeoff: give up on decoupling the API from the representation, and in return
get various syntactic and semantic benefits.  Value classes embody another
tradeoff: give up identity, and get various semantic and performance benefits.
If we are willing to give up both, we can get both sets of benefits.

```
value record NameAndScore(String name, int score) { }
```

Value records combine the data-carrier idiom of records with the improved
scalarization and flattening benefits of value classes.

In theory, it would be possible to apply `value` to certain enums as well, but
this is not currently possible because the `java.lang.Enum` base class that
enums extend does not meet the requirements for superclasses of value classes
(it has fields and non-empty constructors).

## Unboxing values for flatness and density

Value classes shed object identity, gaining a host of performance and
predictability benefits in the process.  They are an ideal replacement for many
of today's value-based classes, fully preserving their semantics (except for
the accidental identity these classes never wanted).  But identity-free
reference types are only one point on a spectrum of tradeoffs between
abstraction and performance, and other desired use cases -- such as numerics --
may want a different set of tradeoffs.

Reference types are nullable, and therefore must account for null somehow in
their representation, which may involve additional footprint.  Similarly, they
offer the initialization safety guarantees for final fields that we have come
to expect from identity objects, which may entail limits on flatness.  For
certain use cases, it may be desirable to additionally give up something else
to make further flatness and footprint gains -- and that something else is
reference-ness.

The built-in primitives are best understood as _pairs_ of types: a primitive
type (e.g., `int`) and its reference companion or box (`Integer`), with
conversions between the two (boxing and unboxing.)  We have both types because
the two have different characteristics.  Primitives are optimized for efficient
storage and access: they are not nullable, they tolerate uninitialized (zero)
values, and larger primitive types (`long`, `double`) may tear under racy
access.  References err on the side of safety and flexibility; they support
nullity, polymorphism, and offer initialization safety (freedom from tearing),
but by comparison to primitives, they pay a footprint and indirection cost.

For these reasons, value classes give rise to pairs of types as well: a
reference type and a _value companion type_.  We've seen the reference type so
far; for a value class `Point`, the reference type is called `Point`.  (The
full name for the reference type is `Point.ref`; `Point` is an alias for that.)
The value companion type is called `Point.val`, and the two types have the same
conversions between them as primitives do today with their boxes.  (If we are
talking explicitly about the value companion type of a value class, we may
sometimes describe the corresponding reference type as its _reference
companion_.)

```
value class Point implements Serializable {
    int x;
    int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    Point scale(int s) {
        return new Point(s*x, s*y);
    }
}
```

The default value of the value companion type is the one for which all fields
take on their default value; the default value of the reference type is, like
all reference types, null.

In our diagram, these new types show up as another entity that straddles the
line between primitives and identity-free references, alongside the legacy
primitives:

** UPDATE DIAGRAM **
[Figure: Java field types with extended primitives]
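The default-value difference just described can be sketched as follows; this is
illustrative only, in the proposed syntax (not something today's javac
accepts):

```
// Array elements make the two defaults visible: a Point.val element starts
// out as the all-zero Point, while a Point (reference) element starts out
// as null.
Point.val[] vals = new Point.val[4];
Point[]     refs = new Point[4];

assert vals[0].x == 0 && vals[0].y == 0;   // default Point.val: all fields zero
assert refs[0] == null;                    // default reference: null
```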
### Member access

Both the reference and value companion types are seen to have the same instance
members.  Unlike today's primitives, value companion types can be used as
receivers to access fields and invoke methods, subject to accessibility
constraints:

```
Point.val p = new Point(1, 2);
assert p.x == 1;

p = p.scale(2);
assert p.x == 2;
```

### Polymorphism

When we declare a class today, we set up a subtyping (is-a) relationship
between the declared class and its supertypes.  When we declare a value class,
we set up a subtyping relationship between the _reference type_ and the
declared supertypes.  This means that if we declare:

```
value class UnsignedShort extends Number
                          implements Comparable<UnsignedShort> {
   ...
}
```

then `UnsignedShort` is a subtype of `Number` and `Comparable<UnsignedShort>`,
and we can ask questions about subtyping using `instanceof` or pattern
matching.  What happens if we ask such a question of the value companion type?

```
UnsignedShort.val us = ...
if (us instanceof Number) { ... }
```

Since subtyping is defined only on reference types, the `instanceof` operator
(and corresponding type patterns) will behave as if both sides were lifted to
the appropriate reference type, and we can answer the question that way.
(This may trigger fears of expensive boxing conversions, but in reality no
actual allocation will happen.)

We introduce a new relationship based on `extends` / `implements` clauses,
which we'll call "extends"; we define `A extends B` as meaning `A <: B` when A
is a reference type, and `A.ref <: B` when A is a value companion type.  The
`instanceof` relation, reflection, and pattern matching are updated to use
"extends".

### Arrays

Arrays of reference types are _covariant_; this means that if `A <: B`, then
`A[] <: B[]`.  This allows `Object[]` to be the "top array type", at least for
arrays of references.  But arrays of primitives are currently left out of this
story.  We can unify the treatment of arrays by defining array covariance over
the new "extends" relationship; if A extends B, then `A[] <: B[]`.  For a value
class P, `P.val[] <: P.ref[] <: Object[]`, finally making `Object[]` the top
type for all arrays.

### Equality

Just as with `instanceof`, we define `==` on values by appealing to the
reference companion (though no actual boxing need occur).  Evaluating `a == b`,
where one or both operands are of a value companion type, can be defined as if
the operands are first converted to their corresponding reference type, and the
results then compared.  This means that the following will succeed:

```
Point.val p = new Point(3, 4);
Point pr = p;
assert p == pr;
```

The base implementation of `Object::equals` delegates to `==`, which is a
suitable default for both reference and value classes.

### Serialization

If a value class implements `Serializable`, this is also really a statement
about the reference type.  Just as with other aspects described here,
serialization of value companions can be defined by converting to the
corresponding reference type and serializing that, and reversing the process at
deserialization time.

Serialization currently uses object identity to preserve the topology of an
object graph.  This generalizes cleanly to objects without identity, because
`==` on value objects treats two identical copies of a value object as equal.
So any observations we make about graph topology prior to serialization with
`==` are consistent with those after deserialization.
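As a hedged sketch of the covariance story from the Arrays subsection above
(again in the proposed syntax, using the earlier `Point` example class):

```
// Covariance over "extends": Point.val[] <: Point[] <: Object[].
Point.val[] flat = new Point.val[8];   // dense, flattened storage
Point[]     refs = flat;               // widen to the reference-companion array
Object[]    top  = refs;               // Object[] is the top type for all arrays

Object o = top[0];                     // reads lift to a reference
assert o instanceof Point;             // the default element, viewed as a Point
```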
### Identity-sensitive operations

Certain operations are currently defined in terms of object identity.  As we've
already seen, some of these, like equality, can be sensibly extended to cover
all instances.  Others, like synchronization, will become partial.
Identity-sensitive operations include:

  - **Equality.**  We extend `==` on references to include references to value
    objects.  Where it currently has a meaning, the new definition coincides
    with that meaning.

  - **System::identityHashCode.**  The main use of `identityHashCode` is in the
    implementation of data structures such as `IdentityHashMap`.  We can extend
    `identityHashCode` in the same way we extend equality -- deriving a hash on
    primitive objects from the hash of all the fields.

  - **Synchronization.**  This becomes a partial operation.  If we can
    statically detect that a synchronization will fail at runtime (including
    declaring a `synchronized` method in a value class), we can issue a
    compilation error; if not, attempts to lock on a value object result in
    `IllegalMonitorStateException`.  This is justifiable because it is
    intrinsically imprudent to lock on an object for which you do not have a
    clear understanding of its locking protocol; locking on an arbitrary
    `Object` or interface instance is doing exactly that.

  - **Weak, soft, and phantom references.**  Capturing an exotic reference to a
    value object becomes a partial operation, as these are intrinsically tied
    to reachability (and hence to identity).  However, we will likely make
    enhancements to `WeakHashMap` to support mixed identity and value keys.

### What about Object?

The root class `Object` poses an unusual problem, in that every class must
extend it directly or indirectly, but it is also instantiable (non-abstract),
and its instances have identity -- it is common to use `new Object()` as a way
to obtain a new object identity for purposes of locking.

## Why two types?

It is sensible to ask: why do we need companion types at all?  This is
analogous to the need for boxes in 1995: we'd made one set of tradeoffs for
primitives, favoring performance (non-nullable, zero-default, tolerant of
non-initialization, tolerant of tearing under race, unrelated to `Object`), and
another for references, favoring flexibility and safety.  Most of the time, we
ignored the primitive wrapper classes, but sometimes we needed to temporarily
suppress one of these properties, such as when interoperating with code that
expects an `Object` or the ability to express "no value".  The reasons we
needed boxes in 1995 still apply today: sometimes we need the affordances of
references, and in those cases, we appeal to the reference companion.

Reasons we might want to use the reference companion include:

 - **Interoperation with reference types.**  Value classes can implement
   interfaces and extend classes (including `Object` and some abstract
   classes), which means some class and interface types are going to be
   polymorphic over both identity and primitive objects.  This polymorphism is
   achieved through object references; a reference to `Object` may be a
   reference to an identity object, or a reference to a value object.

 - **Nullability.**  Nullability is an affordance of object _references_, not
   objects themselves.  Most of the time, it makes sense that primitive types
   are non-nullable (as the primitives are today), but there may be situations
   where null is a semantically important value.
   Using the reference companion when nullability is required is semantically
   clear, and avoids the need to invent new sentinel values for "no value."

   This need comes up when migrating existing classes; the method `Map::get`
   uses `null` to signal that the requested key was not present in the map.
   But, if the `V` parameter to `Map` is a primitive class, `null` is not a
   valid value.  We can capture the "`V` or null" requirement by changing the
   descriptor of `Map::get` to:

   ```
   public V.ref get(K key);
   ```

   where, whatever type `V` is instantiated as, `Map::get` returns the
   reference companion.  (For a type `V` that already is a reference type, this
   is just `V` itself.)  This captures the notion that the return type of
   `Map::get` will either be a reference to a `V`, or the `null` reference.
   (This is a compatible change, since both erase to the same thing.)

 - **Self-referential types.**  Some types may want to directly or indirectly
   refer to themselves, such as the "next" field in the node type of a linked
   list:

   ```
   class Node<T> {
       T theValue;
       Node<T> nextNode;
   }
   ```

   We might want to represent this as a value class, but if the type of
   `nextNode` were `Node.val`, the layout of `Node` would be self-referential,
   since we would be trying to flatten a `Node` into its own layout.

 - **Protection from tearing.**  For a value class with a non-atomic value
   companion type, we may want to use the reference companion in cases where we
   are concerned about tearing; because loads and stores of references are
   atomic, `P.ref` is immune to the tearing under race that `P.val` might be
   subject to.

 - **Compatibility with existing boxing.**  Autoboxing is convenient, in that
   it lets us pass a primitive where a reference is required.  But boxing
   affects far more than assignment conversion; it also affects method overload
   selection.  The rules are designed to prefer overloads that require no
   conversions to those requiring boxing (or varargs) conversions.  Having both
   a value and reference type for every value class means that these rules can
   be cleanly and intuitively extended to cover value classes.

## Refining the value companion

Value classes have several options for refining the behavior of the value
companion type and how they are exposed to clients.

### Classes with no good default value

For a value class `C`, the default value of `C.ref` is the same as any other
reference type: `null`.  For the value companion type `C.val`, the default
value is the one where all of its fields are initialized to their default
value.

The built-in primitives reflect the design assumption that zero is a reasonable
default.  The choice to use a zero default for uninitialized variables was one
of the central tradeoffs in the design of the built-in primitives.  It gives us
a usable initial value (most of the time), and requires less storage footprint
than a representation that supports null (`int` uses all 2^32 of its bit
patterns, so a nullable `int` would have to either make some 32-bit signed
integers unrepresentable, or use a 33rd bit).  This was a reasonable tradeoff
for the built-in primitives, and is also a reasonable tradeoff for many (but
not all) other potential value classes (such as complex numbers, 2D points,
half-floats, etc).

But for other potential value classes, such as `LocalDate`, there _is_ no
reasonable default.
If we choose to represent a date as the number of days since some epoch, there
will invariably be bugs that stem from uninitialized dates; we've all been
mistakenly told by computers that something will happen on or near 1 January
1970.  Even if we could choose a default other than the zero representation, an
uninitialized date is still likely to be an error -- there simply is no good
default date value.

For this reason, value classes have the choice of encapsulating or exposing
their value companion type.  If the class is willing to tolerate an
uninitialized (zero) value, it can freely share its `.val` companion with the
world; if uninitialized values are dangerous (such as for `LocalDate`), it can
be encapsulated to the class or package.

Encapsulation is accomplished using ordinary access control.  By default, the
value companion is `private`, and need not be declared explicitly; a class that
wishes to share its value companion can make it public:

```
public value record Complex(double real, double imag) {
    public value companion Complex.val;
}
```

### Atomicity and tearing

For the primitive types longer than 32 bits (`long` and `double`), it is not
guaranteed that reads and writes from different threads (without suitable
coordination) are atomic with respect to each other.  The result is that, if
accessed under data race, a `long` or `double` field or array element can be
seen to "tear", and a read might see the low 32 bits of one write and the high
32 bits of another.  (Declaring the containing field `volatile` is sufficient
to restore atomicity, as is properly coordinating with locks or other
concurrency control, or not sharing across threads in the first place.)

This was a pragmatic tradeoff given the hardware of the time; the cost of
64-bit atomicity on 1995 hardware would have been prohibitive, and problems
only arise when the program already has data races -- and most numeric code
deals with thread-local data.  Just like with the tradeoff of nulls vs zeros,
the design of the built-in primitives permits tearing as part of a tradeoff
between performance and correctness, where primitives chose "as fast as
possible" and reference types chose more safety.

Today, most JVMs give us atomic loads and stores of 64-bit primitives, because
the hardware makes them cheap enough.  But value classes bring us back to 1995;
atomic loads and stores of larger-than-64-bit values are still expensive on
many CPUs, leaving us with a choice of "make operations on primitives slower"
or permitting tearing when accessed under race.

It would not be wise for the language to select a one-size-fits-all policy
about tearing; choosing "no tearing" means that types like `Complex` are slower
than they need to be, even in a single-threaded program; choosing "tearing"
means that classes like `Range` can be seen to not exhibit invariants asserted
by their constructor.  Class authors have to choose, with full knowledge of
their domain, whether their types can tolerate tearing.  The default is no
tearing (safe by default); a class can opt for greater flattening at the cost
of potential tearing by declaring the value companion as `non-atomic`:

```
public value record Complex(double real, double imag) {
    public non-atomic value companion Complex.val;
}
```

For classes like `Complex`, all of whose bit patterns are valid, this is very
much like the choice around `long` in 1995.  Other classes, which might have
nontrivial representational invariants, will likely want to stick to the
default of atomicity.
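By way of contrast with `Complex`, here is a hedged sketch of the kind of class
that should keep the default; `Range` is mentioned above, but its body here is
invented for illustration, again in the proposed syntax:

```
// A torn read of a Range could observe lo > hi, violating the invariant its
// constructor establishes, so it keeps the default (atomic) value companion.
public value record Range(int lo, int hi) {
    public Range {
        if (lo > hi) throw new IllegalArgumentException("lo > hi");
    }
    public value companion Range.val;   // atomic by default; no non-atomic modifier
}
```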
## Migrating legacy primitives

As part of generalizing primitives, we want to adjust the built-in primitives
to behave as consistently with value classes as possible.  While we can't
change the fact that `int`'s reference companion is the oddly-named `Integer`,
we can give them more uniform aliases (`int.ref` is an alias for `Integer`;
`int` is an alias for `Integer.val`) -- so that we can use a consistent rule
for naming companions.  Similarly, we can extend member access to the legacy
primitives, and allow `int[]` to be a subtype of `Integer[]` (and therefore of
`Object[]`.)

We will redeclare `Integer` as a value class with a public value companion:

```
value class Integer {
    public value companion Integer.val;

    // existing methods
}
```

where the type name `int` is an alias for `Integer.val`.  The primitive array
types will be retrofitted such that arrays of primitives are subtypes of arrays
of their boxes (`int[] <: Integer[]`).

## Unifying primitives with classes

Earlier, we had a chart of the differences between primitive and reference
types:

| Primitives                                  | Objects                            |
| ------------------------------------------- | ---------------------------------- |
| No identity (pure values)                   | Identity                           |
| `==` compares values                        | `==` compares object identity      |
| Built-in                                    | Declared in classes                |
| No members (fields, methods, constructors)  | Members (including mutable fields) |
| No supertypes or subtypes                   | Class and interface inheritance    |
| Accessed directly                           | Accessed via object references     |
| Not nullable                                | Nullable                           |
| Default value is zero                       | Default value is null              |
| Arrays are monomorphic                      | Arrays are covariant               |
| May tear under race                         | Initialization safety guarantees   |
| Have reference companions (boxes)           | Don't need reference companions    |

The addition of value classes addresses many of these directly.  Rather than
saying "classes have identity, primitives do not", we make identity an optional
characteristic of classes (and derive equality semantics from that.)  Rather
than primitives being built in, we derive all types, including primitives, from
classes, and endow value companion types with the members and supertypes
declared with the value class.  Rather than having primitive arrays be
monomorphic, we make all arrays covariant under the `extends` relation.

The remaining differences now become differences between reference types and
value types:

| Value types                                   | Reference types                  |
| --------------------------------------------- | -------------------------------- |
| Accessed directly                             | Accessed via object references   |
| Not nullable                                  | Nullable                         |
| Default value is zero                         | Default value is null            |
| May tear under race, if declared `non-atomic` | Initialization safety guarantees |

### Choosing which to use

How would we choose between declaring an identity class or a value class, and
the various options on value companions?  Here are some quick rules of thumb:

 - If you need mutability, subclassing, or aliasing, choose an identity class.
 - If uninitialized (zero) values are unacceptable, choose a value class with
   the value companion encapsulated.

 - If you have no cross-field invariants and are willing to tolerate tearing to
   enable more flattening, choose a value class with a non-atomic value
   companion.

## Summary

Valhalla unifies, to the extent possible, primitives and objects.  The
following table summarizes the transition from the current world to Valhalla.

| Current World                               | Valhalla                                                   |
| -------------------------------------------- | ---------------------------------------------------------- |
| All objects have identity                    | Some objects have identity                                 |
| Fixed, built-in set of primitives            | Open-ended set of primitives, declared via classes         |
| Primitives don't have methods or supertypes  | Primitives are classes, with methods and supertypes        |
| Primitives have ad-hoc boxes                 | Primitives have regularized reference companions           |
| Boxes have accidental identity               | Reference companions have no identity                      |
| Boxing and unboxing conversions              | Primitive reference and value conversions, but same rules  |
| Primitive arrays are monomorphic             | All arrays are covariant                                   |

[valuebased]: https://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html
[growing]: https://dl.acm.org/doi/abs/10.1145/1176617.1176621
[jep390]: https://openjdk.java.net/jeps/390

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From brian.goetz at oracle.com Fri Jun 24 15:04:17 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 24 Jun 2022 11:04:17 -0400
Subject: Concerns about the plan for `==`
In-Reply-To: 
References: 
Message-ID: <41c6c1d3-0139-c3a2-b0a4-1a6d44875ee2@oracle.com>

I don't have an answer for you, but I can add some information to the mix.

Currently there are _nine_ "implementations" of `==`; one for references, and
one for each of the eight primitives.  Regardless of whether or not they are
perfect tests of substitutability (curse you, floating point), the eight
primitive `==` functions are highly domain-specific.  They can be so because
the primitives are monomorphic.  In a sense, we've allowed primitives to
"overload" `==` because monomorphism means we can define `==` with full
knowledge of the domain, and without worry about non-well-definedness or the
various other problems of `equals` in extensible class hierarchies (as EJ
exhaustively catalogued.)

What's being proposed here is that we evolve `Object==` from "compare
identities" to a case analysis, to account for the fact that Object will
describe more things:

    case (IdentityObject a, IdentityObject b) -> identity==(a, b)
    case (ValueObject a, ValueObject b) ->
        (isNull(a) == isNull(b)) && (type(a) == type(b)) && (state(a) == state(b))
    default -> false

Just as `identity==` was the best we could do as a default on polymorphic
identity objects, this is the best we can do on polymorphic mixed
identity/value objects.  (There's a whole digression into overloading `==` on
value types, but I'm not going to go there right now.)
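A hedged, Java-flavored sketch of that case analysis (the `IdentityObject` and
`ValueObject` interfaces are the ones named above; `identityEq` and
`fieldwiseEq` are hypothetical stand-ins for behavior the runtime provides, not
real APIs):

    static boolean objectEq(Object a, Object b) {
        if (a == null || b == null)
            return a == b;                            // equal only if both are null
        if (a instanceof IdentityObject && b instanceof IdentityObject)
            return identityEq(a, b);                  // today's identity comparison
        if (a instanceof ValueObject && b instanceof ValueObject)
            return a.getClass() == b.getClass()       // same value class...
                && fieldwiseEq(a, b);                 // ...and pairwise-== fields
        return false;                                 // mixed identity/value: never ==
    }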
While we're not making the problem of "`==` is unreliable" better, and arguably
making it incrementally worse by making it work in more cases that look a
little like the cases in which it is unreliable, we *are* making something
better here: you can now use `.equals()` everywhere.  One of the complaints
about `==` is that sometimes you use `==` and sometimes you use `.equals()` and
sometimes you can accidentally use one where you should use the other.  But
this is because you couldn't previously use .equals() on primitives, so an
`equals()` method would necessarily do things like:

    boolean equals(Object o) {
        return o instanceof Foo f
            && f.size == this.size
            && f.name.equals(this.name);
    }

What stinks here is that at each point, you have to ask yourself "equals, or
=="?  Now you can have a fixed rule: always say `.equals()`:

    boolean equals(Object o) {
        return o instanceof Foo f
            && f.size.equals(this.size)   // works on int!
            && f.name.equals(this.name);
    }

(The equals method on primitives is monomorphic so will JIT away, for anyone
worried about the performance.)

It is a little sad because we had to resolve the problem by using the
unfortunate spelling all the time, because `==` got the good name, but that's
not a new problem.  But it means the cognitive load can disappear if we train
ourselves to uniformly use `.equals()`.

We will surely have about a million calls to make `===` or `eq` or something
else sugar for `.equals()`.  We can consider that, but I don't think it's
essential to do that now.

On 6/15/2022 1:51 PM, Kevin Bourrillion wrote:
> What I think I understand so far:
>
> The current plan for `==` for all bucket 2+ types (except the 8
> _primitive_ types, as I still use the word) is to have it perform a
> fieldwise `==` comparison: identity equality for bucket 1 fields, what
> it's always done for primitive fields, and of course recurse for the rest.
>
> If we consider that the broadest meaning of `a == b` has always been
> "a and b are definitely absolutely indistinguishable no matter what",
> then this plan seems to compatibly preserve that, which makes sense
> for purposes of transition.
>
> What concerns me:
>
> It's good for transition, at least on the surface, but it's a bad
> long-term outcome.
>
> Users hunger for a shorter way to write `.equals()`, and they will
> think this is it. I would not underestimate the pushback they will
> experience to writing it out the long way in cases where `==` at least
> *seems* to do the right thing. Because in some number of cases, it
> *will* do the same thing; specifically, if you can recurse through
> your fields and never hit a type that overrides equals().
>
> This is extremely fragile. A legitimate change to one type can break
> these expectations for all the types directly or indirectly depending
> on it, no matter how far away.
>
> In supporting our Java users here, there's no good stance we can take
> on it: if we forbid this practice and require them to call `.equals`,
> we're being overzealous. If we try to help them use it carefully, at
> best users will stop seeing `Object==Object` as a code smell (as we
> have spent years training them to do) and then will start misusing it
> even for reference types again.
>
> btw, why did I say it's good for transition "on the surface"?
Because > for any class a user might migrate to bucket 2+, any existing calls to > `==` in the wild are extremely suspect and *should* be revisited > anyway; this is no less true here than it is for existing > synchronization etc. code. > > What's an alternative?: > > I'm sure what I propose is flawed, but I hope the core arguments are > compelling enough to at least help me fix it. > > The problem is that while we /can/?retcon `==` as described above, > it's not behavior anyone? really /wants/. So instead we double down on > the idea that non-primitive `==` has always been about identity and > must continue to be. That means it has to be invalid for bucket 2+ (at > compile-time for the .val type; failing later otherwise?). > > This would break some usages, but again, only at sites that deserve to > be reconsidered anyway. Some bugs will get fixed in the process. And > at least it's not the language upgrade itself that breaks them, only > the specific decision to move some type to new bucket. Lastly, we > don't need to break anyone abruptly; we can roll out warnings as I > proposed in the email "We need help to migrate from bucket 1 to 2". > > A non-record class that forgets to override equals() from Object even > upon migrating to bucket 2+ is also suspect. If nothing special is > done, it would fail at runtime just like any other usage of > `Foo.ref==Foo.ref`, and maybe that's fine. > > Again, I'm probably missing things, maybe even big things, but I'm > just trying to start a discussion. And if this can't happen I am just > searching for a solid understanding of why. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.r.rose at oracle.com Sat Jun 25 01:38:58 2022 From: john.r.rose at oracle.com (John Rose) Date: Fri, 24 Jun 2022 18:38:58 -0700 Subject: Concerns about the plan for `==` In-Reply-To: <41c6c1d3-0139-c3a2-b0a4-1a6d44875ee2@oracle.com> References: <41c6c1d3-0139-c3a2-b0a4-1a6d44875ee2@oracle.com> Message-ID: On 24 Jun 2022, at 8:04, Brian Goetz wrote: > ? > > It is a little sad because we had to resolve the problem by using the unfortunate spelling all the time, because `==` got the good name, but that's not a new problem. But it means the cognitive load can disappear if we train ourselves to uniformly use `.equals()`. > > We will surely have about a million calls to make `===` or `eq` or something else sugar for `.equals()`. We can consider that, but I don't think its essential to do that now. Well said. I agree. The good names are not the important part of this story. And we can improve them later; it doesn?t have to be now. From brian.goetz at oracle.com Mon Jun 27 18:48:07 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 27 Jun 2022 14:48:07 -0400 Subject: User model stacking: current status In-Reply-To: References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> Message-ID: <4e1e09aa-2ec8-6141-3b52-d0c39ea6965a@oracle.com> I've been bothered by an uncomfortable feeling that .val and ! are somehow different in nature, but haven't been able to put my finger on it.? Let me make another attempt. The "bang" and "question" operators operate on types.? In the strictest form, the bang operator takes a type that has null in its value set, and returns a type whose value set is the same, except for null.?? But observe that if the value set contains null, then the type has to be a reference type.? 
And the resulting type also has to be a reference type (except maybe for weird
classes like Void) because we're preserving the remaining values, which are
references.  So we could say:

    bang :: RefType -> RefType

Bang doesn't change the ref-ness, or id-ness, of a type, it just excludes a
specific value from the value set.

Now, what do ref and val do?  They don't operate on types, they operate on
_classes_, to produce a type.  Val can only be applied to value classes, and
produces a value type.  In the strictest interpretation (for consistency with
bang), ref also only operates on value classes.  So:

    val :: ValClass -> ValType
    ref :: ValClass -> RefType

Now, we've been strict with bang and ref to say they only work when they have a
nontrivial effect, and could totalize them in the obvious way (ref is a no-op
on an id class; bang is a no-op on a value type.)  Which would give us:

    bang :: Type -> Type
    val :: ValClass -> ValType
    ref :: Class -> RefType

with the added invariant that bang preserves id-ness/val-ness/ref-ness of
types.

But still, bang and ref operate on different things, and produce different
things; one takes a type and yields a slightly refined type with similar
characteristics, the other takes a class and yields a type with highly specific
characteristics.  We can conclude a lot from `val` (it's a value type, which
already says a lot), but we cannot conclude anything other than non-nullity
from `bang`; it might be a ref or a val type, it might come from an identity or
value class.

What this says to me is "val is a subtype of bang"; all vals are bangs, but not
all bangs are vals.

A harder problem is what to do about `question`.  The strict interpretation
says we can only apply `question` to a type that is already non-null.  In our
world, that's ValType.

    question :: ValType -> Type

Or we could totalize as we did with bang, and we get an invariant that question
preserves id-ness, val-ness, ref-ness.  But, what does `question` really mean?
Null is a reference.  So there are two interpretations: that question always
yields a reference type (which means non-references need to be lifted/boxed),
or that question yields a union type.

It turns out that the latter is super-useful on the stack but kind of sucks in
the heap.  The return value of `Map::get`, which we've been calling `T.ref`,
really wants a union type (T or Null); similarly, many difficult questions in
pattern matching might be made less difficult with a `T or Null` type.  But
there is no efficient heap-based representation for such a union type; we could
use tagged unions (blech) or just fall back to boxing.  Which leaves us with
the asymmetry that bang is representation-preserving (as well as other things),
but question is not.  (Which makes sense in that one is subtractive and the
other is additive.)

So, to your question: is this permanently gross?  I think if we adopt the
strictest interpretations:

 - bang is only allowed on types that are already nullable
 - question is only allowed on types that are not nullable (or on type
   variables)
 - val is only allowed on value classes
 - ref is only allowed on value classes (or on type variables)

(And we can possibly boil away the last one, since if we can say `T?`, there is
no need for `T.ref` anywhere.)

What this means is that you can say `String!`, but not `Optional!`, because
Optional is already null-free.  Which means there is never any question whether
you say `X.val` or `X!` or `X.val!` (or `X.ref!` if we exclude ref entirely).
So then, rather than two ways to say the same thing, there are two ways to say
two different things, which have different absolute strengths.

This is somewhat unfortunate, but not "permanently gross."

If we drop `ref` in favor of `?` (not necessarily a slam-dunk), we can consider
finding another way to spell `.val` which is less intrusive, though there are
not too many options that don't look like line noise.

On 6/15/2022 12:41 PM, Kevin Bourrillion wrote:
>
> * I still am saddled with the deep feeling that ultimate victory here
> looks like "we don't need a val type, because by capturing the
> nullness bit and tearability info alone we will make /enough/ usage
> patterns always-optimizable, and we can live with the downsides". To
> me the upsides of this simplification are enormous, so if we really
> must reject it, I may need some help understanding why. It's been
> stated that a non-null value type means something slightly different
> from a non-null reference type, but I'm not convinced of this; it's
> just that sometimes you have the technical ability to conjure a
> "default" instance and sometimes you don't, but nullness of the type
> means what it means either way.
>
>     * I think if we plan to go this way (.val), and then we one day
>     have a nullable types feature, some things will then be
>     permanently gross that I would hope we can avoid. For example,
>     nullness *also* demands the concept of bidirectional projection of
>     type variables, and for very overlapping reasons. This puts things
>     in a super weird place.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From brian.goetz at oracle.com Tue Jun 28 19:25:42 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 28 Jun 2022 15:25:42 -0400
Subject: Bang, question, ref, and val (was: User model stacking: current status)
In-Reply-To: <4e1e09aa-2ec8-6141-3b52-d0c39ea6965a@oracle.com>
References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com>
 <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com>
 <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com>
 <4e1e09aa-2ec8-6141-3b52-d0c39ea6965a@oracle.com>
Message-ID: <6f53f206-a979-ae92-2a36-ffbe8799d5e7@oracle.com>

Some further thoughts on the nature of bang, question, ref, and val.

The model outlined in my mail from yesterday accounted for the distinction
between class and type, but left something important out: carriers.  Adding
these into the mix, I think this clarifies why `.val` and `!` are different,
and why `!` and `?` are not pure inverses.

The user declares _classes_, which include identity and value classes.
Ignoring generics for the moment, we derive _types_ from classes.  Identity
classes give rise to a single principal type (whose name is written the same as
the class, but let's call this `C.ref` for clarity); value classes give rise to
two principal types, `C.ref` and `C.val`.

So `val` and `ref` are functions from Class to Type (val is partial):

    val :: ValueClass -> Type
    ref :: Class -> Type

What's missing is Carrier.  Ignoring the legacy primitive carriers (I, J, F,
D), we have two carriers, L and Q.  Every type has a carrier.  For the "ref"
types, the carrier is L; for the "val" types, the carrier is Q:

    carrier ref T = L
    carrier val T = Q

Now, bang and question.  These are operators on types.  Bang restricts the
value set; question (potentially) augments the value set to include null.
Question is best described as yielding a union type: `T? === T|Null`.
(Note that for all reference types T, T|Null == T, because Null <: T.)

What are the carriers for bang and question types?  We define the carrier on
union types by taking the stronger of the two carriers:

    carrier T|U = max (carrier T) (carrier U)

which means that

    carrier question T = L

since we need an L carrier to represent null.  But for "bang", we can preserve
the carrier, since we're representing fewer values:

    carrier bang T = carrier T

(Why wouldn't we downgrade the carrier of `Point!` to Q?  Because the carrier
means more than nullity; it affects atomicity, layout, initialization strategy,
etc.)

What this means is that `question` is always information-losing, and that:

    carrier bang question T = L
    carrier question bang T = L

So, the ugly fact here is that "bang" and "question" are not inverses; `T!?` is
not always T, nor is `T?!`.

But what I want to know is this: how do we want to denote "T or null", when T
is a type variable?  This turns out to be the only place we currently have to
utter `.ref`.  And uttering `.ref` here feels like asking the user to do the
language's job; what the user wants is to describe the union type "T|Null".
(Since the only sensible representation for this is a reference type, the
language will translate it as such anyway, but that's the language's job.)

This is related to how we ask people to describe "nullable int".  There are
three choices: `int?`, `int.ref`, and `Integer`.  I would argue that the first
is closest to what the user wants: a statement about value sets.  `int.ref`
brings in carriers, which is unrelated to what the user really wants here;
`Integer` is even worse because the relationship between int and Integer is
ad-hoc.  Of course, they will all translate the same way (the L carrier), but
that's the compiler's job.

For the only remaining use of `.ref` (returning V.ref from Map::get and
friends), I think we want the same; Map::get wants to return "V or null".
Again, ref-ness is a dependent thing, not the essence; the essence is "T|Null".
(Also there's a connection with type patterns, where we may want to expand a
null-rejecting type pattern to a null-including one.)

The problem, of course, is that once people see `?`, they will think it is
"obvious" that we left out "!" by mistake, because of course they go together.
But they don't, really; they're different things.  But let's set bang aside,
and turn to Kevin's next question, which is: if `?` is a union type with the
null type, what does that say about `String?`?  This seems to be on a collision
course, in that null-analysis efforts would want to treat `String?` as "String,
with explicit nullness", but the union interpretation will collapse to just
`String`.

Which points the way towards what seems the proper role for bang and question
in the surface syntax, if any: to *modify* types with respect to their
inclusion of null.  So `String?` and `int!` should probably be errors, since
String is already nullable and int is already non-nullable.

Bottom line: as we've discovered half a dozen times already in this project,
nearly every time we think that nullity is perfectly correlated to something,
we discover it is not.  Bang/question are not val/ref; we might be able to get
away with using `int.ref` to describe nullable ints, but that doesn't help us
at all with nullable or non-nullable type patterns; and none of these are the
same as "known vs unknown nullity" (or known vs unknown initialization status.)
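For illustration only, the two candidate spellings for `Map::get` being weighed
here would look roughly like this (neither is settled syntax, and the interface
is a sketch, not the real `java.util.Map` declaration):

    interface Map<K, V> {
        V.ref get(K key);    // spelling via the reference companion
        // V? get(K key);    // spelling via the union type "V | Null"
    }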
On 6/27/2022 2:48 PM, Brian Goetz wrote: > I've been bothered by an uncomfortable feeling that .val and ! are > somehow different in nature, but haven't been able to put my finger on > it.? Let me make another attempt. > > The "bang" and "question" operators operate on types.? In the > strictest form, the bang operator takes a type that has null in its > value set, and returns a type whose value set is the same, except for > null.?? But observe that if the value set contains null, then the type > has to be a reference type.? And the resulting type also has to be a > reference type (except maybe for weird classes like Void) because > we're preserving the remaining values, which are references.? So we > could say: > > ??? bang :: RefType -> RefType > > Bang doesn't change the ref-ness, or id-ness, of a type, it just > excludes a specific value from the value set. > > Now, what do ref and val do?? They don't operate on types, they > operates on _classes_, to produce a type.? Val can only be applied to > value classes, and produces a value type.? In the strictest > interpretation (for consistency with bang), ref also only operates on > value classes.? So: > > ??? val :: ValClass -> ValType > ??? ref :: ValClass -> RefType > > Now, we've been strict with bang and ref to say they only work when > they have a nontrivial effect, and could totalize them in the obvious > way (ref is a no-op on an id class; bang is a no-op on a value type.)? > Which would give us: > > ??? bang :: Type -> Type > ??? val :: ValClass -> ValType > ??? ref :: Class -> RefType > > with the added invariant that bang preserves id-ness/val-ness/ref-ness > of types. > > But still, bang and ref operate on different things, and and produce > different things; one takes a type and yields a slightly refined type > with similar characteristics, the other takes a class and yields a > type with highly specific characteristics.? We can conclude a lot from > `val` (its a value type, which already says a lot), but we cannot > conclude anything other than? non-nullity from `bang`; it might be a > ref or a val type, it might come from an identity or value class. > > What this says to me is "val is a subtype of bang"; all vals are > bangs, but not all bangs are vals. > > A harder problem is what to do about `question`.? The strict > interpretation says we can only apply `question` to a type that is > already non-null.? In our world, that's ValType. > > ??? question :: ValType -> Type > > Or we could totalize as we did with bang, and we get an invariant that > question preserves id-ness, val-ness, ref-ness.? But, what does > `question` really mean?? Null is a reference.? So there are two > interpretations: that question always yields a reference type (which > means non-references need to be lifted/boxed), or that question yields > a union type. > > It turns out that the latter is super-useful on the stack but kind of > sucks in the heap.? The return value of `Map::get`, which we've been > calling `T.ref`, really wants a union type (T or Null); similarly, > many difficult questions in pattern matching might be made less > difficult with a `T or Null` Type.? But there is no efficient > heap-based representation for such a union type; we could use tagged > unions (blech) or just fall back to boxing. Which leaves us with the > asymmetry that bang is representation-preserving (as well as other > things), but question is not.? (Which makes sense in that one is > subtractive and the other is additive.) 
> > So, to your question: is this permanently gross?? I think if we adopt > the strictest intepretations: > > ?- bang is only allowed on types that are already nullable > ?- question is only allowed on types that are not nullable (or on type > variables) > ?- val is only allowed on value classes > ?- ref is only allowed on value classes (or on type variables) > > (And we can possibly boil away the last one, since if we can say `T?`, > there is no need for `T.ref` anywhere.) > > What this means is that you can say `String!`, but not `Optional!`, > because Optional is already null-free.? Which means there is never any > question whether you say `X.val` or `X!` or `X.val!` (or `X.ref!` if > we exclude ref entirely).? So then, rather than two ways to say the > same thing, there are two ways to say two different things, which have > different absolute strengths. > > This is somewhat unfortunate, but not "permanently gross." > > If we drop `ref` in favor of `?` (not necessarily a slam-dunk), we can > consider finding another way to spell `.val` which is less intrusive, > though there are not too many options that don't look like line noise. > > > > > > On 6/15/2022 12:41 PM, Kevin Bourrillion wrote: >> >> * I still am saddled with the deep feeling that ultimate victory here >> looks like "we don't need a val type, because by capturing the >> nullness bit and tearability info alone we will make /enough/ usage >> patterns always-optimizable, and we can live with the downsides". To >> me the upsides of this simplification are enormous, so if we really >> must reject it, I may need some help understanding why. It's been >> stated that a non-null value type means something slightly different >> from a non-null reference type, but I'm not convinced of this; it's >> just that sometimes you have the technical ability to conjure a >> "default" instance and sometimes you don't, but nullness of the type >> means what it means either way. >> >> * I think if we plan to go this way (.val), and then we one day >> have a nullable types feature, some things will then be >> permanently gross that I would hope we can avoid. For example, >> nullness *also* demands the concept of bidirectional projection >> of type variables, and for very overlapping reasons. This puts >> things in a super weird place. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From forax at univ-mlv.fr Wed Jun 29 14:38:08 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 29 Jun 2022 16:38:08 +0200 (CEST) Subject: User model stacking: current status In-Reply-To: <1750a350-4e83-cd0e-8a52-44d193b766af@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> <1750a350-4e83-cd0e-8a52-44d193b766af@oracle.com> Message-ID: <2091250374.631588.1656513488696.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "Kevin Bourrillion" > Cc: "daniel smith" , "valhalla-spec-experts" > > Sent: Thursday, June 23, 2022 9:01:24 PM > Subject: Re: User model stacking: current status > On 6/15/2022 12:41 PM, Kevin Bourrillion wrote: >> All else being equal, the idea to use "inaccessible value type" over "value type >> doesn't exist" feels very good and simplifying, with the main problem that the >> syntax can't help but be gross. > A few weeks in, and this latest stacking is still feeling pretty good: > - There are no coarse buckets any more; there are just identity classes and > value classes. 
> - Value classes have ref and val companion types with the obvious properties.
> (Notably, refs are always atomic.)
> - For `value class C`, C as a type is an alias for `C.ref`.
> - The bucket formerly known as B2 becomes "value class, whose .val type is
> private."  This is the default for a value class.
> - The bucket formerly known as B3a is denoted by explicitly making the val
> companion public, with a public modifier on a "member" of the class.
> - The bucket formerly known as B3n is denoted by explicitly making the val
> companion public and non-atomic, again using modifiers.
>
> I went and updated the State of the Values document to use the new
> terminology, test-driving some new syntax.  (Usual rules: syntax comments are
> premature at this time.)  I was very pleased with the result, because almost
> all the changes were small changes in terminology (e.g., "value companion
> type"), and eliminating the clumsy distinction between value classes and
> primitive classes.  Overall the structure remains the same, but feels more
> compact and clean.  MD source is below, for review.
>
> Kevin's two questions remain, but I don't think they get in the way of
> refining the model in this way:
>
> - Have we made the right choices around == ?
> - Are we missing a big opportunity by not spelling Complex.val with a bang?

I think you have done a good job describing the pros of that model but, oddly,
not its cons.  I see three reasons why your proposed model -- let's call it the
companion class model -- needs improvement: it fails our motto, the companion
class model and the VM model are not aligned, and the performance model is a
"sigil for performance" model.

It fails our motto (code like a class, works like an int):

If I say that an Image is an array of pixels, with each pixel having three
colors, the obvious translation is not the right one:

  class Image {
    Pixel[][] pixels;
  }
  value record Pixel(Color red, Color green, Color blue) {}
  value record Color(byte value) {}

because a value class is nullable; only its companion class is not nullable.
The correct code is

  class Image {
    Pixel.val[][] pixels;
  }
  value record Pixel(Color.val red, Color.val green, Color.val blue) {}
  value record Color(byte value) {}

Color and byte do not work the same way, so it's not "code like a class, works
like an int" but "code like a class, works like an Integer".

The VM model and the Java model are not aligned:

For the VM model, L-types and Q-types are on equal footing, neither more
important than the other, but the companion class model you propose makes the
value class a first-class citizen and the companion class a second-class
citizen.  We know that when the Java model and the VM model are not aligned,
bugs will lie in between.  Those can be mild bugs -- for example, you can throw
a checked exception from a method not declaring that exception -- or painful
bugs, as in the case of generics or serialization.  I think we should list all
the cases where the Java model and the VM model disagree, to see the kind of
bugs we will ask the future generation to solve.  For example, having a value
class with a default constructor and a public companion class looks a lot like
a deserialization bug to me; in both cases you are able to produce an instance
that bypasses the constructor.

The other problem is for languages other than Java.  Will those languages have
to define a companion class, or is a companion class purely a javac artifact,
the same way an attribute like InnerClass is?

The proposed performance model is a "sigil for performance" model.
There is a tradeoff between the safety of the reference vs the performance of flattened value type. In the proposed model, the choice is not done by the maintainer of the class but by the user of the class. This is not fully true, the maintainer of the class can make the companion class private choosing safety but it can not choose performance. The performance has to be chosen by the user of the class. This is unlike everything we know in Java, this kind of model where the user choose performance is usually called "sigil for performance", the user has to add some magical keywords or sigil to get performance. A good example of such performance model is the keyword "register" in C. You have to opt-in at use site to get performance. Moreover unlike in C, in Java we also have to take care of the fact that adding .val is not a backward compatible change, if a value class is used in a public method a user can not change it to its companion class after the fact. We know from the errors of past that a "sigil for performance" model is a terrible model. Overall, i don't think it's the wrong model, but it over-rotates on the notion of reference value class, it's refreshing because in the past we had the tendency to over-rotate on the notion of flattened value class. I really think that this model can be improved by allowing top-level value class to be declared either as reference or as value and the companion class to be either a value class projection or a reference class projection so the Java model and the VM model will be more in sync. R?mi > # State of Valhalla > ## Part 2: The Language Model {.subtitle} > #### Brian Goetz {.author} > #### June 2022 {.date} > > _This is the second of three documents describing the current State of > Valhalla. The first is [The Road to Valhalla](01-background); the > third is [The JVM Model](03-vm-model)._ > This document describes the directions for the Java _language_ charted by > Project Valhalla. (In this document, we use "currently" to describe the > language as it stands today, without value classes.) > Valhalla started with the goal of providing user-programmable classes which can > be flat and dense in memory. Numerics are one of the motivating use cases; > adding new primitive types directly to the language has a very high barrier. As > we learned from [Growing a Language][growing] there are infinitely many numeric > types we might want to add to Java, but the proper way to do that is via > libraries, not as a language feature. > ## Primitive and reference types in Java today > Java currently has eight built-in primitive types. Primitives represent pure > _values_; any `int` value of "3" is equivalent to, and indistinguishable from, > any other `int` value of "3". Primitives are monolithic (their bits cannot be > addressed individually) and have no canonical location, and so are _freely > copyable_. With the exception of the unusual treatment of exotic floating point > values such as `NaN`, the `==` operator performs a _substitutibility test_ -- it > asks "are these two values the same value". > Java also has _objects_, and each object has a unique _object identity_. Because > of identity, objects are not freely copyable; each object lives in exactly one > place at any given time, and to access its state we have to go to that place. > But we mostly don't notice this because objects are not manipulated or accessed > directly, but instead through _object references_. 
> Object references are also a kind of value -- they encode the identity of the
> object to which they refer, and the `==` operator on object references asks "do
> these two references refer to the same object."  Accordingly, object
> _references_ (like other values) can be freely copied, but the objects they
> refer to cannot.
>
> Primitives and objects differ in almost every conceivable way:
>
> | Primitives                                  | Objects                             |
> | ------------------------------------------- | ----------------------------------- |
> | No identity (pure values)                   | Identity                            |
> | `==` compares values                        | `==` compares object identity       |
> | Built-in                                    | Declared in classes                 |
> | No members (fields, methods, constructors)  | Members (including mutable fields)  |
> | No supertypes or subtypes                   | Class and interface inheritance     |
> | Accessed directly                           | Accessed via object references      |
> | Not nullable                                | Nullable                            |
> | Default value is zero                       | Default value is null               |
> | Arrays are monomorphic                      | Arrays are covariant                |
> | May tear under race                         | Initialization safety guarantees    |
> | Have reference companions (boxes)           | Don't need reference companions     |
>
> The design of primitives represents various tradeoffs aimed at maximizing
> performance and usability of the primitive types.  Reference types default to
> `null`, meaning "referring to no object"; primitives default to a usable zero
> value (which for most primitives is the additive identity).  Reference types
> provide initialization safety guarantees against a certain category of data
> races; primitives allow tearing under race for larger-than-32-bit values.  We
> could characterize the design principles behind these tradeoffs as "make
> objects safer, make primitives faster."
>
> The following figure illustrates the current universe of Java's types.  The
> upper left quadrant is the built-in primitives; the rest of the space is
> reference types.  In the upper-right, we have the abstract reference types --
> abstract classes, interfaces, and `Object` (which, though concrete, acts more
> like an interface than a concrete class).  The built-in primitives have wrappers
> or boxes, which are reference types.
>
> [Figure: Current universe of Java field types]
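>
> To make the `==` contrast above concrete, here is a small example that compiles
> today (the particular values are arbitrary; boxing values outside the small
> `Integer` cache typically yields distinct objects):
>
> ```
> int a = 1000, b = 1000;
> System.out.println(a == b);        // true: primitives compare by value
>
> Integer x = 1000, y = 1000;        // two boxes, usually distinct objects
> System.out.println(x == y);        // typically false: compares object identity
> System.out.println(x.equals(y));   // true: compares state
> ```
>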
> Valhalla aims to unify primitives and objects in that they can both be
> declared with classes, but maintains the special runtime characteristics
> primitives have.  But while everyone likes the flatness and density that
> user-definable value types promise, in some cases we want them to be more like
> classical objects (nullable, non-tearable), and in other cases we want them to
> be more like classical primitives (trading some safety for performance).
>
> ## Value classes: separating references from identity
>
> Many of the impediments to optimization that Valhalla seeks to remove center
> around _unwanted object identity_.  The primitive wrapper classes have identity,
> but it is a purely accidental one.  Not only is it not directly useful, it can
> be a source of bugs.  For example, due to caching, `Integer` can be accidentally
> compared correctly with `==` just often enough that people keep doing it.
> Similarly, [value-based classes][valuebased] such as `Optional` have no need for
> identity, but pay the costs of having identity anyway.
>
> Our first step is allowing class declarations to explicitly disavow identity, by
> declaring themselves as _value classes_.  The instances of a value class are
> called _value objects_.
>
> ```
> value class ArrayCursor<T> {
>     T[] array;
>     int offset;
>
>     public ArrayCursor(T[] array, int offset) {
>         this.array = array;
>         this.offset = offset;
>     }
>
>     public boolean hasNext() {
>         return offset < array.length;
>     }
>
>     public T next() {
>         return array[offset];
>     }
>
>     public ArrayCursor<T> advance() {
>         return new ArrayCursor<>(array, offset+1);
>     }
> }
> ```
>
> This says that an `ArrayCursor` is a class whose instances have no identity --
> that instead they have _value semantics_.  As a consequence, it must give up the
> things that depend on identity; the class and its fields are implicitly final.
>
> But, value classes are still classes, and can have most of the things classes
> can have -- fields, methods, constructors, type parameters, superclasses (with
> some restrictions), nested classes, class literals, interfaces, etc.  The
> classes they can extend are restricted: `Object` or abstract classes with no
> instance fields, empty no-arg constructor bodies, no other constructors, no
> instance initializers, no synchronized methods, and whose superclasses all meet
> this same set of conditions.  (`Number` meets these conditions.)
>
> Classes in Java give rise to types; the class `ArrayCursor` gives rise to a type
> `ArrayCursor` (actually a parametric family of instantiations `ArrayCursor<T>`.)
> `ArrayCursor` is still a reference type, just one whose references refer to
> value objects rather than identity objects.  For the types in the upper-right
> quadrant of the diagram (interfaces, abstract classes, and `Object`), references
> to these types might refer to either an identity object or a value object.
> (Historically, JVMs were effectively forced to represent object references with
> pointers; for references to value objects, JVMs now have more flexibility.)
>
> Because `ArrayCursor` is a reference type, it is nullable (because references
> are nullable), its default value is null, and loads and stores of references are
> atomic with respect to each other even in the presence of data races, providing
> the initialization safety we are used to with classical objects.
>
> Because instances of `ArrayCursor` have value semantics, `==` compares by state
> rather than identity.
This means that value objects, like primitives, are > _freely copyable_; we can explode them into their fields and re-aggregate them > into another value object, and we cannot tell the difference. (Because they > have no identity, some identity-sensitive operations, such as synchronization, > are disallowed.) > So far we've addressed the first two lines of the table of differences above; > rather than identity being a property of all object instances, classes can > decide whether their instances have identity or not. By allowing classes that > don't need identity to exclude it, we free the runtime to make better layout and > compilation decisions -- and avoid a whole category of bugs. > In looking at the code for `ArrayCursor`, we might mistakenly assume it will be > inefficient, as each loop iteration appears to allocate a new cursor: > ``` > for (ArrayCursor c = Arrays.cursor(array); > c.hasNext(); > c = c.advance()) { > // use c.next(); > } > ``` > One should generally expect here that _no_ cursors are actually allocated. > Because an `ArrayCursor` is just its two fields, these fields will routinely get > scalarized and hoisted into registers, and the constructor call in `advance` > will typically compile down to incrementing one of these registers. > ### Migration > The JDK (as well as other libraries) has many [value-based classes][valuebased] > such as `Optional` and `LocalDateTime`. Value-based classes adhere to the > semantic restrictions of value classes, but are still identity classes -- even > though they don't want to be. Value-based classes can be migrated to true value > classes simply by redeclaring them as value classes, which is both source- and > binary-compatible. > We plan to migrate many value-based classes in the JDK to value classes. > Additionally, the primitive wrappers can be migrated to value classes as well, > making the conversion between `int` and `Integer` cheaper; see the section > "Legacy Primitives" below. (In some cases, this may be _behaviorally_ > incompatible for code that synchronizes on the primitive wrappers. [JEP > 390][jep390] has supported both compile-time and runtime warnings for > synchronizing on primitive wrappers since Java 16.) >
> [Figure: Java field types adding value classes]
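>
> To make the synchronization caveat concrete: the following compiles today, but
> javac (since Java 16, under JEP 390's `synchronization` lint category) warns
> about it, because `Integer` is a value-based class and such locks become
> meaningless once the wrappers are migrated to value classes:
>
> ```
> Integer count = 42;
> synchronized (count) {          // javac warns: synchronizing on a value-based class
>     System.out.println(count);
> }
> ```
>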
> ### Equality
>
> Earlier we said that `==` compares value objects by state rather than by
> identity.  More precisely, two value objects are `==` if they are of the same
> type, and their fields are pairwise equal, where equality is given by
> `==` for primitives (except `float` and `double`, which are compared with
> `Float::equals` and `Double::equals` to avoid anomalies), `==` for references to
> identity objects, and recursively with `==` for references to value objects.  In
> no case is a value object ever `==` to a reference to an identity object.
>
> ### Value records
>
> While records have a lot in common with value classes -- they are final and
> their fields are final -- they are still identity classes.  Records embody a
> tradeoff: give up on decoupling the API from the representation, and in return
> get various syntactic and semantic benefits.  Value classes embody another
> tradeoff: give up identity, and get various semantic and performance benefits.
> If we are willing to give up both, we can get both sets of benefits.
>
> ```
> value record NameAndScore(String name, int score) { }
> ```
>
> Value records combine the data-carrier idiom of records with the improved
> scalarization and flattening benefits of value classes.
>
> In theory, it would be possible to apply `value` to certain enums as well, but
> this is not currently possible because the `java.lang.Enum` base class that
> enums extend does not meet the requirements for superclasses of value classes
> (it has fields and non-empty constructors).
>
> ## Unboxing values for flatness and density
>
> Value classes shed object identity, gaining a host of performance and
> predictability benefits in the process.  They are an ideal replacement for many
> of today's value-based classes, fully preserving their semantics (except for the
> accidental identity these classes never wanted).  But identity-free reference
> types are only one point on a spectrum of tradeoffs between abstraction and
> performance, and other desired use cases -- such as numerics -- may want a
> different set of tradeoffs.
>
> Reference types are nullable, and therefore must account for null somehow in
> their representation, which may involve additional footprint.  Similarly, they
> offer the initialization safety guarantees for final fields that we have come to
> expect from identity objects, which may entail limits on flatness.  For certain
> use cases, it may be desirable to additionally give up something else to make
> further flatness and footprint gains -- and that something else is
> reference-ness.
>
> The built-in primitives are best understood as _pairs_ of types: a primitive
> type (e.g., `int`) and its reference companion or box (`Integer`), with
> conversions between the two (boxing and unboxing.)  We have both types because
> the two have different characteristics.  Primitives are optimized for efficient
> storage and access: they are not nullable, they tolerate uninitialized (zero)
> values, and larger primitive types (`long`, `double`) may tear under racy
> access.  References err on the side of safety and flexibility; they support
> nullity, polymorphism, and offer initialization safety (freedom from tearing),
> but by comparison to primitives, they pay a footprint and indirection cost.
>
> For these reasons, value classes give rise to pairs of types as well: a
> reference type and a _value companion type_.  We've seen the reference type so
> far; for a value class `Point`, the reference type is called `Point`.
(The full > name for the reference type is `Point.ref`; `Point` is an alias for that.) The > value companion type is called `Point.val`, and the two types have the same > conversions between them as primitives do today with their boxes. (If we are > talking explicitly about the value companion type of a value class, we may > sometimes describe the corresponding reference type as its _reference > companion_.) > ``` > value class Point implements Serializable { > int x; > int y; > Point(int x, int y) { > this.x = x; > this.y = y; > } > Point scale(int s) { > return new Point(s*x, s*y); > } > } > ``` > The default value of the value companion type is the one for which all fields > take on their default value; the default value of the reference type is, like > all reference types, null. > In our diagram, these new types show up as another entity that straddles the > line between primitives and identity-free references, alongside the legacy > primitives: > ** UPDATE DIAGRAM ** >
> [Figure: Java field types with extended primitives]
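>
> A short sketch of how the two types would be used (proposed syntax, so none of
> this compiles today; it assumes the `Point` class above, and that its value
> companion has been made public as described later):
>
> ```
> Point pr = null;                        // Point (i.e., Point.ref) is nullable
> Point.val pv = new Point(1, 2);         // the value companion is never null
>
> Point[] refs = new Point[4];            // elements default to null
> Point.val[] vals = new Point.val[4];    // elements default to Point(0, 0), laid out flat
> ```
>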
> ### Member access
>
> Both the reference and value companion types are seen to have the same instance
> members.  Unlike today's primitives, value companion types can be used as
> receivers to access fields and invoke methods, subject to accessibility
> constraints:
>
> ```
> Point.val p = new Point(1, 2);
> assert p.x == 1;
>
> p = p.scale(2);
> assert p.x == 2;
> ```
>
> ### Polymorphism
>
> When we declare a class today, we set up a subtyping (is-a) relationship between
> the declared class and its supertypes.  When we declare a value class, we set up
> a subtyping relationship between the _reference type_ and the declared
> supertypes.  This means that if we declare:
>
> ```
> value class UnsignedShort extends Number
>                           implements Comparable<UnsignedShort> {
>    ...
> }
> ```
>
> then `UnsignedShort` is a subtype of `Number` and `Comparable<UnsignedShort>`,
> and we can ask questions about subtyping using `instanceof` or pattern matching.
> What happens if we ask such a question of the value companion type?
>
> ```
> UnsignedShort.val us = ...
> if (us instanceof Number) { ... }
> ```
>
> Since subtyping is defined only on reference types, the `instanceof` operator
> (and corresponding type patterns) will behave as if both sides were lifted to
> the appropriate reference type, and we can answer the question that way.  (This
> may trigger fears of expensive boxing conversions, but in reality no actual
> allocation will happen.)
>
> We introduce a new relationship based on `extends` / `implements` clauses, which
> we'll call "extends"; we define `A extends B` as meaning `A <: B` when A is a
> reference type, and `A.ref <: B` when A is a value companion type.  The
> `instanceof` relation, reflection, and pattern matching are updated to use
> "extends".
>
> ### Arrays
>
> Arrays of reference types are _covariant_; this means that if `A <: B`, then
> `A[] <: B[]`.  This allows `Object[]` to be the "top array type", at least for
> arrays of references.  But arrays of primitives are currently left out of this
> story.  We can unify the treatment of arrays by defining array covariance over
> the new "extends" relationship; if A extends B, then `A[] <: B[]`.  For a value
> class P, `P.val[] <: P.ref[] <: Object[]`, finally making `Object[]` the top
> type for all arrays.
>
> ### Equality
>
> Just as with `instanceof`, we define `==` on values by appealing to the
> reference companion (though no actual boxing need occur).  Evaluating `a == b`,
> where one or both operands are of a value companion type, can be defined as if
> the operands are first converted to their corresponding reference type, and then
> comparing the results.  This means that the following will succeed:
>
> ```
> Point.val p = new Point(3, 4);
> Point pr = p;
> assert p == pr;
> ```
>
> The base implementation of `Object::equals` delegates to `==`, which is a
> suitable default for both reference and value classes.
>
> ### Serialization
>
> If a value class implements `Serializable`, this is also really a statement
> about the reference type.  Just as with other aspects described here,
> serialization of value companions can be defined by converting to the
> corresponding reference type and serializing that, and reversing the process at
> deserialization time.
>
> Serialization currently uses object identity to preserve the topology of an
> object graph.  This generalizes cleanly to objects without identity, because
> `==` on value objects treats two identical copies of a value object as equal.
> So any observations we make about graph topology prior to serialization with
> `==` are consistent with those after deserialization.
>
> ### Identity-sensitive operations
>
> Certain operations are currently defined in terms of object identity.  As we've
> already seen, some of these, like equality, can be sensibly extended to cover
> all instances.  Others, like synchronization, will become partial.
> Identity-sensitive operations include:
>
>  - **Equality.**  We extend `==` on references to include references to value
>    objects.  Where it currently has a meaning, the new definition coincides
>    with that meaning.
>
>  - **System::identityHashCode.**  The main use of `identityHashCode` is in the
>    implementation of data structures such as `IdentityHashMap`.  We can extend
>    `identityHashCode` in the same way we extend equality -- deriving a hash on
>    value objects from the hash of all the fields.
>
>  - **Synchronization.**  This becomes a partial operation.  If we can
>    statically detect that a synchronization will fail at runtime (including
>    declaring a `synchronized` method in a value class), we can issue a
>    compilation error; if not, attempts to lock on a value object result in
>    `IllegalMonitorStateException`.  This is justifiable because it is
>    intrinsically imprudent to lock on an object for which you do not have a
>    clear understanding of its locking protocol; locking on an arbitrary
>    `Object` or interface instance is doing exactly that.
>
>  - **Weak, soft, and phantom references.**  Capturing an exotic reference to a
>    value object becomes a partial operation, as these are intrinsically tied to
>    reachability (and hence to identity).  However, we will likely make
>    enhancements to `WeakHashMap` to support mixed identity and value keys.
>
> ### What about Object?
>
> The root class `Object` poses an unusual problem, in that every class must
> extend it directly or indirectly, but it is also instantiable (non-abstract),
> and its instances have identity -- it is common to use `new Object()` as a way
> to obtain a new object identity for purposes of locking.
>
> ## Why two types?
>
> It is sensible to ask: why do we need companion types at all?  This is analogous
> to the need for boxes in 1995: we'd made one set of tradeoffs for primitives,
> favoring performance (non-nullable, zero-default, tolerant of
> non-initialization, tolerant of tearing under race, unrelated to `Object`), and
> another for references, favoring flexibility and safety.  Most of the time, we
> ignored the primitive wrapper classes, but sometimes we needed to temporarily
> suppress one of these properties, such as when interoperating with code that
> expects an `Object` or the ability to express "no value".  The reasons we needed
> boxes in 1995 still apply today: sometimes we need the affordances of
> references, and in those cases, we appeal to the reference companion.
>
> Reasons we might want to use the reference companion include:
>
>  - **Interoperation with reference types.**  Value classes can implement
>    interfaces and extend classes (including `Object` and some abstract classes),
>    which means some class and interface types are going to be polymorphic over
>    both identity and value objects.  This polymorphism is achieved through
>    object references; a reference to `Object` may be a reference to an identity
>    object, or a reference to a value object.
>
>  - **Nullability.**  Nullability is an affordance of object _references_, not
>    objects themselves.
>    Most of the time, it makes sense that primitive types are non-nullable (as
>    the primitives are today), but there may be situations where null is a
>    semantically important value.  Using the reference companion when
>    nullability is required is semantically clear, and avoids the need to
>    invent new sentinel values for "no value."
>
>    This need comes up when migrating existing classes; the method `Map::get`
>    uses `null` to signal that the requested key was not present in the map.
>    But, if the `V` parameter to `Map` is a primitive class, `null` is not a
>    valid value.  We can capture the "`V` or null" requirement by changing the
>    descriptor of `Map::get` to:
>
>    ```
>    public V.ref get(K key);
>    ```
>
>    where, whatever type `V` is instantiated as, `Map::get` returns the reference
>    companion.  (For a type `V` that already is a reference type, this is just
>    `V` itself.)  This captures the notion that the return type of `Map::get`
>    will either be a reference to a `V`, or the `null` reference.  (This is a
>    compatible change, since both erase to the same thing.)
>
>  - **Self-referential types.**  Some types may want to directly or indirectly
>    refer to themselves, such as the "next" field in the node type of a linked
>    list:
>
>    ```
>    class Node<T> {
>        T theValue;
>        Node<T> nextNode;
>    }
>    ```
>
>    We might want to represent this as a value class, but if the type of
>    `nextNode` were `Node.val`, the layout of `Node` would be self-referential,
>    since we would be trying to flatten a `Node` into its own layout.
>
>  - **Protection from tearing.**  For a value class with a non-atomic value
>    companion type, we may want to use the reference companion in cases where we
>    are concerned about tearing; because loads and stores of references are
>    atomic, `P.ref` is immune to the tearing under race that `P.val` might be
>    subject to.
>
>  - **Compatibility with existing boxing.**  Autoboxing is convenient, in that it
>    lets us pass a primitive where a reference is required.  But boxing affects
>    far more than assignment conversion; it also affects method overload
>    selection.  The rules are designed to prefer overloads that require no
>    conversions to those requiring boxing (or varargs) conversions.  Having both
>    a value and reference type for every value class means that these rules can
>    be cleanly and intuitively extended to cover value classes.
>
> ## Refining the value companion
>
> Value classes have several options for refining the behavior of the value
> companion type and how they are exposed to clients.
>
> ### Classes with no good default value
>
> For a value class `C`, the default value of `C.ref` is the same as any other
> reference type: `null`.  For the value companion type `C.val`, the default value
> is the one where all of its fields are initialized to their default value.
>
> The built-in primitives reflect the design assumption that zero is a reasonable
> default.  The choice to use a zero default for uninitialized variables was one
> of the central tradeoffs in the design of the built-in primitives.  It gives us
> a usable initial value (most of the time), and requires less storage footprint
> than a representation that supports null (`int` uses all 2^32 of its bit
> patterns, so a nullable `int` would have to either make some 32-bit signed
> integers unrepresentable, or use a 33rd bit).  This was a reasonable tradeoff
> for the built-in primitives, and is also a reasonable tradeoff for many (but not
> all) other potential value classes (such as complex numbers, 2D points,
> half-floats, etc).
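>
> As a present-day illustration of that zero-default tradeoff: fields of
> primitive type start out as usable zeros, while reference-typed fields start
> out as null (the class here is purely illustrative):
>
> ```
> class Defaults {
>     int count;       // defaults to 0 -- usable immediately
>     double ratio;    // defaults to 0.0
>     Integer boxed;   // defaults to null -- must be checked before use
> }
> ```
>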
> But for other potential value classes, such as `LocalDate`, there _is_ no
> reasonable default.  If we choose to represent a date as the number of days
> since some epoch, there will invariably be bugs that stem from uninitialized
> dates; we've all been mistakenly told by computers that something will happen on
> or near 1 January 1970.  Even if we could choose a default other than the zero
> representation, an uninitialized date is still likely to be an error -- there
> simply is no good default date value.
>
> For this reason, value classes have the choice of encapsulating or exposing
> their value companion type.  If the class is willing to tolerate an
> uninitialized (zero) value, it can freely share its `.val` companion with the
> world; if uninitialized values are dangerous (such as for `LocalDate`), it can
> be encapsulated to the class or package.
>
> Encapsulation is accomplished using ordinary access control.  By default, the
> value companion is `private`, and need not be declared explicitly; a class that
> wishes to share its value companion can make it public:
>
> ```
> public value record Complex(double real, double imag) {
>     public value companion Complex.val;
> }
> ```
>
> ### Atomicity and tearing
>
> For the primitive types longer than 32 bits (long and double), it is not
> guaranteed that reads and writes from different threads (without suitable
> coordination) are atomic with respect to each other.  The result is that, if
> accessed under data race, a long or double field or array element can be seen to
> "tear", and a read might see the low 32 bits of one write and the high 32 bits
> of another.  (Declaring the containing field `volatile` is sufficient to restore
> atomicity, as is properly coordinating with locks or other concurrency control,
> or not sharing across threads in the first place.)
>
> This was a pragmatic tradeoff given the hardware of the time; the cost of 64-bit
> atomicity on 1995 hardware would have been prohibitive, and problems only arise
> when the program already has data races -- and most numeric code deals with
> thread-local data.  Just like with the tradeoff of nulls vs zeros, the design of
> the built-in primitives permits tearing as part of a tradeoff between
> performance and correctness, where primitives chose "as fast as possible" and
> reference types chose more safety.
>
> Today, most JVMs give us atomic loads and stores of 64-bit primitives, because
> the hardware makes them cheap enough.  But value classes bring us back to
> 1995; atomic loads and stores of larger-than-64-bit values are still expensive
> on many CPUs, leaving us with a choice of "make operations on primitives slower"
> or permitting tearing when accessed under race.
>
> It would not be wise for the language to select a one-size-fits-all policy about
> tearing; choosing "no tearing" means that types like `Complex` are slower than
> they need to be, even in a single-threaded program; choosing "tearing" means
> that classes like `Range` can be seen to not exhibit invariants asserted by
> their constructor.  Class authors have to choose, with full knowledge of their
> domain, whether their types can tolerate tearing.
> The default is no tearing (safe by default); a class can opt for greater
> flattening at the cost of potential tearing by declaring the value companion as
> `non-atomic`:
>
> ```
> public value record Complex(double real, double imag) {
>     public non-atomic value companion Complex.val;
> }
> ```
>
> For classes like `Complex`, all of whose bit patterns are valid, this is very
> much like the choice around `long` in 1995.  For other classes that might have
> nontrivial representational invariants, they likely want to stick to the default
> of atomicity.
>
> ## Migrating legacy primitives
>
> As part of generalizing primitives, we want to adjust the built-in primitives to
> behave as consistently with value classes as possible.  While we can't change
> the fact that `int`'s reference companion is the oddly-named `Integer`, we can
> give them more uniform aliases (`int.ref` is an alias for `Integer`; `int` is an
> alias for `Integer.val`) -- so that we can use a consistent rule for naming
> companions.  Similarly, we can extend member access to the legacy primitives,
> and allow `int[]` to be a subtype of `Integer[]` (and therefore of `Object[]`.)
>
> We will redeclare `Integer` as a value class with a public value companion:
>
> ```
> value class Integer {
>     public value companion Integer.val;
>     // existing methods
> }
> ```
>
> where the type name `int` is an alias for `Integer.val`.  The primitive array
> types will be retrofitted such that arrays of primitives are subtypes of arrays
> of their boxes (`int[] <: Integer[]`).
>
> ## Unifying primitives with classes
>
> Earlier, we had a chart of the differences between primitive and reference
> types:
>
> | Primitives                                  | Objects                             |
> | ------------------------------------------- | ----------------------------------- |
> | No identity (pure values)                   | Identity                            |
> | `==` compares values                        | `==` compares object identity       |
> | Built-in                                    | Declared in classes                 |
> | No members (fields, methods, constructors)  | Members (including mutable fields)  |
> | No supertypes or subtypes                   | Class and interface inheritance     |
> | Accessed directly                           | Accessed via object references      |
> | Not nullable                                | Nullable                            |
> | Default value is zero                       | Default value is null               |
> | Arrays are monomorphic                      | Arrays are covariant                |
> | May tear under race                         | Initialization safety guarantees    |
> | Have reference companions (boxes)           | Don't need reference companions     |
>
> The addition of value classes addresses many of these directly.  Rather than
> saying "classes have identity, primitives do not", we make identity an optional
> characteristic of classes (and derive equality semantics from that.)  Rather
> than primitives being built in, we derive all types, including primitives, from
> classes, and endow value companion types with the members and supertypes
> declared with the value class.  Rather than having primitive arrays be
> monomorphic, we make all arrays covariant under the `extends` relation.
>
> The remaining differences now become differences between reference types and
> value types:
>
> | Value types                                    | Reference types                   |
> | ---------------------------------------------- | --------------------------------- |
> | Accessed directly                              | Accessed via object references    |
> | Not nullable                                   | Nullable                          |
> | Default value is zero                          | Default value is null             |
> | May tear under race, if declared `non-atomic`  | Initialization safety guarantees  |
>
> ### Choosing which to use
>
> How would we choose between declaring an identity class or a value class, and
> the various options on value companions?
> Here are some quick rules of thumb:
>
>  - If you need mutability, subclassing, or aliasing, choose an identity class.
>  - If uninitialized (zero) values are unacceptable, choose a value class with
>    the value companion encapsulated.
>  - If you have no cross-field invariants and are willing to tolerate tearing to
>    enable more flattening, choose a value class with a non-atomic value
>    companion.
>
> ## Summary
>
> Valhalla unifies, to the extent possible, primitives and objects.  The
> following table summarizes the transition from the current world to Valhalla.
>
> | Current World                                | Valhalla                                                   |
> | -------------------------------------------- | ---------------------------------------------------------- |
> | All objects have identity                    | Some objects have identity                                 |
> | Fixed, built-in set of primitives            | Open-ended set of primitives, declared via classes         |
> | Primitives don't have methods or supertypes  | Primitives are classes, with methods and supertypes        |
> | Primitives have ad-hoc boxes                 | Primitives have regularized reference companions           |
> | Boxes have accidental identity               | Reference companions have no identity                      |
> | Boxing and unboxing conversions              | Primitive reference and value conversions, but same rules  |
> | Primitive arrays are monomorphic             | All arrays are covariant                                   |
>
> [valuebased]: https://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html
> [growing]: https://dl.acm.org/doi/abs/10.1145/1176617.1176621
> [jep390]: https://openjdk.java.net/jeps/390

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From brian.goetz at oracle.com Wed Jun 29 15:32:38 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 29 Jun 2022 11:32:38 -0400
Subject: User model stacking: current status
In-Reply-To: <2091250374.631588.1656513488696.JavaMail.zimbra@u-pem.fr>
References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> <1750a350-4e83-cd0e-8a52-44d193b766af@oracle.com> <2091250374.631588.1656513488696.JavaMail.zimbra@u-pem.fr>
Message-ID: <6be1527f-5047-9e04-75cb-c8ecab131941@oracle.com>

> I think you have done a good job describing the pros of that model but,
> oddly, you don't list its cons.

I think we described the con pretty clearly: .val is ugly, and this puts it in
people's faces.  This point was mentioned multiple times during the
discussions.  But the notable thing is: no one has raised other cons.  The con
is syntax.

All your points here are basically a dressed-up version of this same issue: at
least in some cases, some users will be grumpy that the good name goes to the
thing they don't want.  And this is a point we are painfully aware of, so none
of this is particularly new.  And we have explored all the positions on this
(Point is ref, Point is val, let the user pick two names, let the declarer
choose, etc), and they all have downsides.  Specifically, we explored having
`ref-default` and `val-default` as declaration-site options; this "gives the
user more control" (developers love knobs!)  But it also imposes a significant
cognitive load on all developers: people no longer know what `Point` means.
Is it nullable?  Is it a reference?  You have to look it up, or "carry around a
mental database."
If anyone has the choices, then everyone has more responsibility.  And given
that the performance differences between Point.ref and Point.val accrue pretty
much exclusively in the heap, which is to say, apply only to implementation
code and not API, sticking the implementation with this burden seems
reasonable.

Honestly, I think this is entirely a syntax concern; .val is ugly.  Open to
better ideas here, though many attempts have already been made.  (If we're at
the "all we have left to complain about is syntax" point, then we're winning!)
value record Pixel(Color.val red, Color.val green, Color.val blue) {} > ?? value record Color(byte value) {} > > Color and byte does not work the same way, it's not code like a class > works like an int but code like a class, works like an Integer. > > > The VM models and the Java model are not aligned: > For the VM model, L-type and Q-type on equal footing, not one is more > important than the other, but the companion class model you propose > makes the value class a first citizen and the companion class a second > citizen. > We know that when the Java model and the VM model are not aligned, > bugs will lie in between. Those can be mild bugs, by example you can > throw a checked exception from a method not declaring that exception > or painful bugs in the case of generics or serialization. > I think we should list all the cases where the Java Model and the VM > model disagree to see the kind of bugs we will ask the future > generation to solve. > By example, having a value class with a default constructor and public > companion class looks like a lot like a deserialization bug to me, in > both case you are able to produce an instance that bypass the constructor. > The other problem is for the other languages than Java. Do those > languages will have to define a companion class or a companion class > is purely a javac artifact the same way an attribute like InnerClass is. > > The proposed performance model is a "sigil for performance" model. > There is a tradeoff between the safety of the reference vs the > performance of flattened value type. In the proposed model, the choice > is not done by the maintainer of the class but by the user of the > class. This is not fully true, the maintainer of the class can make > the companion class private choosing safety but it can not choose > performance. The performance has to be chosen by the user of the class. > This is unlike everything we know in Java, this kind of model where > the user choose performance is usually called "sigil for performance", > the user has to add some magical keywords or sigil to get performance. > A good example of such performance model is the keyword "register" in > C. You have to opt-in at use site to get performance. > Moreover unlike in C, in Java we also have to take care of the fact > that adding .val is not a backward compatible change, if a value class > is used in a public method a user can not change it to its companion > class after the fact. > We know from the errors of past that a "sigil for performance" model > is a terrible model. > > Overall, i don't think it's the wrong model, but it over-rotates on > the notion of reference value class, it's refreshing because in the > past we had the tendency to over-rotate on the notion of flattened > value class. > I really think that this model can be improved by allowing top-level > value class to be declared either as reference or as value and the > companion class to be either a value class projection or a reference > class projection so the Java model and the VM model will be more in sync. > > R?mi > > > > > # State of Valhalla > ## Part 2: The Language Model {.subtitle} > > #### Brian Goetz {.author} > #### June 2022 {.date} > > > _This is the second of three documents describing the current > State of > ? Valhalla.? The first is [The Road to Valhalla](01-background); the > ? third is [The JVM Model](03-vm-model)._ > > This document describes the directions for the Java _language_ > charted by > Project Valhalla.? 
(In this document, we use "currently" to > describe the > language as it stands today, without value classes.) > > Valhalla started with the goal of providing user-programmable > classes which can > be flat and dense in memory.? Numerics are one of the motivating > use cases; > adding new primitive types directly to the language has a very > high barrier.? As > we learned from [Growing a Language][growing] there are infinitely > many numeric > types we might want to add to Java, but the proper way to do that > is via > libraries, not as a language feature. > > ## Primitive and reference types in Java today > > Java currently has eight built-in primitive types. Primitives > represent pure > _values_; any `int` value of "3" is equivalent to, and > indistinguishable from, > any other `int` value of "3".? Primitives are monolithic (their > bits cannot be > addressed individually) and have no canonical location, and so are > _freely > copyable_. With the exception of the unusual treatment of exotic > floating point > values such as `NaN`, the `==` operator performs a > _substitutibility test_ -- it > asks "are these two values the same value". > > Java also has _objects_, and each object has a unique _object > identity_. Because > of identity, objects are not freely copyable; each object lives in > exactly one > place at any given time, and to access its state we have to go to > that place. > But we mostly don't notice this because objects are not > manipulated or accessed > directly, but instead through _object references_. Object > references are also a > kind of value -- they encode the identity of the object to which > they refer, and > the `==` operator on object references asks "do these two > references refer to > the same object."? Accordingly, object _references_ (like other > values) can be > freely copied, but the objects they refer to cannot. > > Primitives and objects differ in almost every conceivable way: > > | Primitives???????????????????????????????? | > Objects??????????????????????????? | > | ------------------------------------------ | > ---------------------------------- | > | No identity (pure values)????????????????? | > Identity?????????????????????????? | > | `==` compares values?????????????????????? | `==` compares > object identity????? | > | Built-in?????????????????????????????????? | Declared in > classes??????????????? | > | No members (fields, methods, constructors) | Members (including > mutable fields) | > | No supertypes or subtypes????????????????? | Class and interface > inheritance??? | > | Accessed directly????????????????????????? | Accessed via object > references???? | > | Not nullable?????????????????????????????? | > Nullable?????????????????????????? | > | Default value is zero????????????????????? | Default value is > null????????????? | > | Arrays are monomorphic???????????????????? | Arrays are > covariant?????????????? | > | May tear under race??????????????????????? | Initialization > safety guarantees?? | > | Have reference companions (boxes)????????? | Don't need > reference companions??? | > > The design of primitives represents various tradeoffs aimed at > maximizing > performance and usability of the primtive types. Reference types > default to > `null`, meaning "referring to no object"; primitives default to a > usable zero > value (which for most primitives is the additive identity).? 
> Reference types > provide initialization safety guarantees against a certain > category of data > races; primitives allow tearing under race for larger-than-32-bit > values. > We could characterize the design principles behind these tradeoffs > are "make > objects safer, make primitives faster." > > The following figure illustrates the current universe of Java's > types.? The > upper left quadrant is the built-in primitives; the rest of the > space is > reference types.? In the upper-right, we have the abstract > reference types -- > abstract classes, interfaces, and `Object` (which, though > concrete, acts more > like an interface than a concrete class).? The built-in primitives > have wrappers > or boxes, which are reference types. > >
> ? > ??? Current universe of
>     Java field types > ? >
> > Valhalla aims to unify primitives and objects in that they can both be > declared with classes, but maintains the special runtime > characteristics > primitives have.? But while everyone likes the flatness and > density that > user-definable value types promise, in some cases we want them to > be more like > classical objects (nullable, non-tearable), and in other cases we > want them to > be more like classical primitives (trading some safety for > performance). > > ## Value classes: separating references from identity > > Many of the impediments to optimization that Valhalla seeks to > remove center > around _unwanted object identity_.? The primitive wrapper classes > have identity, > but it is a purely accidental one.? Not only is it not directly > useful, it can > be a source of bugs.? For example, due to caching, `Integer` can > be accidentally > compared correctly with `==` just often enough that people keep > doing it. > Similarly, [value-based classes][valuebased] such as `Optional` > have no need for > identity, but pay the costs of having identity anyway. > > Our first step is allowing class declarations to explicitly > disavow identity, by > declaring themselves as _value classes_.? The instances of a value > class are > called _value objects_. > > ``` > value class ArrayCursor { > ??? T[] array; > ??? int offset; > > ??? public ArrayCursor(T[] array, int offset) { > ??????? this.array = array; > ??????? this.offset = offset; > ??? } > > ??? public boolean hasNext() { > ??????? return offset < array.length; > ??? } > > ??? public T next() { > ??????? return array[offset]; > ??? } > > ??? public ArrayCursor advance() { > ??????? return new ArrayCursor(array, offset+1); > ??? } > } > ``` > > This says that an `ArrayCursor` is a class whose instances have no > identity -- > that instead they have _value semantics_.? As a consequence, it > must give up the > things that depend on identity; the class and its fields are > implicitly final. > > But, value classes are still classes, and can have most of the > things classes > can have -- fields, methods, constructors, type parameters, > superclasses (with > some restrictions), nested classes, class literals, interfaces, > etc.? The > classes they can extend are restricted: `Object` or abstract > classes with no > instance fields, empty no-arg constructor bodies, no other > constructors, no instance > initializers, no synchronized methods, and whose superclasses all > meet this same > set of conditions.? (`Number` meets these conditions.) > > Classes in Java give rise to types; the class `ArrayCursor` gives > rise to a type > `ArrayCursor` (actually a parametric family of instantiations > `ArrayCursor`.) > `ArrayCursor` is still a reference type, just one whose references > refer to > value objects rather than identity objects. For the types in the > upper-right > quadrant of the diagram (interfaces, abstract classes, and > `Object`), references > to these types might refer to either an identity object or a value > object. > (Historically, JVMs were effectively forced to represent object > references with > pointers; for references to value objects, JVMs now have more > flexibility.) > > Because `ArrayCursor` is a reference type, it is nullable (because > references > are nullable), its default value is null, and loads and stores of > references are > atomic with respect to each other even in the presence of data > races, providing > the initialization safety we are used to with classical objects. 
> > Because instances of `ArrayCursor` have value semantics, `==` > compares by state > rather than identity.? This means that value objects, like > primitives, are > _freely copyable_; we can explode them into their fields and > re-aggregate them > into another value object, and we cannot tell the difference.? > (Because they > have no identity, some identity-sensitive operations, such as > synchronization, > are disallowed.) > > So far we've addressed the first two lines of the table of > differences above; > rather than identity being a property of all object instances, > classes can > decide whether their instances have identity or not.? By allowing > classes that > don't need identity to exclude it, we free the runtime to make > better layout and > compilation decisions -- and avoid a whole category of bugs. > > In looking at the code for `ArrayCursor`, we might mistakenly > assume it will be > inefficient, as each loop iteration appears to allocate a new cursor: > > ``` > for (ArrayCursor c = Arrays.cursor(array); > ???? c.hasNext(); > ???? c = c.advance()) { > ??? // use c.next(); > } > ``` > > One should generally expect here that _no_ cursors are actually > allocated. > Because an `ArrayCursor` is just its two fields, these fields will > routinely get > scalarized and hoisted into registers, and the constructor call in > `advance` > will typically compile down to incrementing one of these registers. > > ### Migration > > The JDK (as well as other libraries) has many [value-based > classes][valuebased] > such as `Optional` and `LocalDateTime`.? Value-based classes > adhere to the > semantic restrictions of value classes, but are still identity > classes -- even > though they don't want to be.? Value-based classes can be migrated > to true value > classes simply by redeclaring them as value classes, which is both > source- and > binary-compatible. > > We plan to migrate many value-based classes in the JDK to value > classes. > Additionally, the primitive wrappers can be migrated to value > classes as well, > making the conversion between `int` and `Integer` cheaper; see the > section > "Legacy Primitives" below.? (In some cases, this may be _behaviorally_ > incompatible for code that synchronizes on the primitive > wrappers.? [JEP > 390][jep390] has supported both compile-time and runtime warnings for > synchronizing on primitive wrappers since Java 16.) > >
> ? > ??? Java field types adding
>     value classes > ? >
> > ### Equality > > Earlier we said that `==` compares value objects by state rather > than by > identity.? More precisely, two value objects are `==` if they are > of the same > type, and each of their fields are pairwise equal, where equality > is given by > `==` for primitives (except `float` and `double`, which are > compared with > `Float::equals` and `Double::equals` to avoid anomalies), `==` for > references to > identity objects, and recursively with `==` for references to > value objects.? In > no case is a value object ever `==` to a reference to an identity > object. > > ### Value records > > While records have a lot in common with value classes -- they are > final and > their fields are final -- they are still identity classes.? > Records embody a > tradeoff: give up on decoupling the API from the representation, > and in return > get various syntactic and semantic benefits.? Value classes embody > another > tradeoff: give up identity, and get various semantic and > performance benefits. > If we are willing to give up both, we can get both sets of benefits. > > ``` > value record NameAndScore(String name, int score) { } > ``` > > Value records combine the data-carrier idiom of records with the > improved > scalarization and flattening benefits of value classes. > > In theory, it would be possible to apply `value` to certain enums > as well, but > this is not currently possible because the `java.lang.Enum` base > class that > enums extend do not meet the requirements for superclasses of > value classes (it > has fields and non-empty constructors). > > ## Unboxing values for flatness and density > > Value classes shed object identity, gaining a host of performance and > predictability benefits in the process.? They are an ideal > replacement for many > of today's value-based classes, fully preserving their semantics > (except for the > accidental identity these classes never wanted).? But > identity-free reference > types are only one point a spectrum of tradeoffs between > abstraction and > performance, and other desired use cases -- such as numerics -- > may want a > different set of tradeoffs. > > Reference types are nullable, and therefore must account for null > somehow in > their representation, which may involve additional footprint.? > Similarly, they > offer the initialization safety guarantees for final fields that > we come to > expect from identity objects, which may entail limits on > flatness.? For certain > use cases, it may be desire to additionally give up something else > to make > further flatness and footprint gains -- and that something else is > reference-ness. > > The built-in primitives are best understood as _pairs_ of types: a > primitive > type (e.g., `int`) and its reference companion or box (`Integer`), > with > conversions between the two (boxing and unboxing.)? We have both > types because > the two have different characteristics.? Primitives are optimized > for efficient > storage and access: they are not nullable, they tolerate > uninitialized (zero) > values, and larger primitive types (`long`, `double`) may tear > under racy > access.? References err on the side of safety and flexibility; > they support > nullity, polymorphism, and offer initialization safety (freedom > from tearing), > but by comparison to primitives, they pay a footprint and > indirection cost. > > For these reasons, value classes give rise to pairs of types as > well: a > reference type and a _value companion type_.? 
We've seen the > reference type so > far; for a value class `Point`, the reference type is called > `Point`.? (The full > name for the reference type is `Point.ref`; `Point` is an alias > for that.)? The > value companion type is called `Point.val`, and the two types have > the same > conversions between them as primitives do today with their boxes.? > (If we are > talking explicitly about the value companion type of a value > class, we may > sometimes describe the corresponding reference type as its _reference > companion_.) > > ``` > value class Point implements Serializable { > ??? int x; > ??? int y; > > ??? Point(int x, int y) { > ??????? this.x = x; > ??????? this.y = y; > ??? } > > ??? Point scale(int s) { > ??????? return new Point(s*x, s*y); > ??? } > } > ``` > > The default value of the value companion type is the one for which > all fields > take on their default value; the default value of the reference > type is, like > all reference types, null. > > In our diagram, these new types show up as another entity that > straddles the > line between primitives and identity-free references, alongside > the legacy > primitives: > > ** UPDATE DIAGRAM ** > >
> ? > ??? Java field types with
>     extended primitives > ? >
> > ### Member access > > Both the reference and value companion types are seen to have the > same instance > members.? Unlike today's primitives, value companion types can be > used as > receivers to access fields and invoke methods, subject to > accessibility > constraints: > > ``` > Point.val p = new Point(1, 2); > assert p.x == 1; > > p = p.scale(2); > assert p.x == 2; > ``` > > ### Polymorphism > > When we declare a class today, we set up a subtyping (is-a) > relationship between > the declared class and its supertypes.? When we declare a value > class, we set up > a subtyping relationship between the _reference type_ and the declared > supertypes. This means that if we declare: > > ``` > value class UnsignedShort extends Number > ????????????????????????? implements Comparable { > ?? ... > } > ``` > > then `UnsignedShort` is a subtype of `Number` and > `Comparable`, > and we can ask questions about subtyping using `instanceof` or > pattern matching. > What happens if we ask such a question of the value companion type? > > ``` > UnsignedShort.val us = ... > if (us instanceof Number) { ... } > ``` > > Since subtyping is defined only on reference types, the > `instanceof` operator > (and corresponding type patterns) will behave as if both sides > were lifted to > the approrpriate reference type, and we can answer the question > that way.? (This > may trigger fears of expensive boxing conversions, but in reality > no actual > allocation will happen.) > > We introduce a new relationship based on `extends` / `implements` > clauses, which > we'll call "extends"; we define `A extends B` as meaning `A <: B` > when A is a > reference type, and `A.ref <: B` when A is a value companion > type.? The > `instanceof` relation, reflection, and pattern matching are > updated to use > "extends". > > ### Arrays > > Arrays of reference types are _covariant_; this means that if `A > <: B`, then > `A[] <: B[]`.? This allows `Object[]` to be the "top array type", > at least for > arrays of references.? But arrays of primitives are currently left > out of this > story.?? We can unify the treatment of arrays by defining array > covariance over > the new "extends" relationship; if A extends B, then `A[] <: > B[]`.? For a value > class P, `P.val[] <: P.ref[] <: Object[]`, finally making > `Object[]` the top > type for all arrays. > > ### Equality > > Just as with `instanceof`, we define `==` on values by appealing > to the > reference companion (though no actual boxing need occur).? > Evaluating `a == b`, > where one or both operands are of a value companion type, can be > defined as if > the operands are first converted to their corresponding reference > type, and then > comparing the results.? This means that the following will succeed: > > ``` > Point.val p = new Point(3, 4); > Point pr = p; > assert p == pr; > ``` > > The base implementation of `Object::equals` delegates to `==`, > which is a > suitable default for both reference and value classes. > > ### Serialization > > If a value class implements `Serializable`, this is also really a > statement > about the reference type.? Just as with other aspects described here, > serialization of value companions can be defined by converting to the > corresponding reference type and serializing that, and reversing > the process at > deserialization time. > > Serialization currently uses object identity to preserve the > topology of an > object graph.? 
>
> Serialization currently uses object identity to preserve the topology of an object
> graph.  This generalizes cleanly to objects without identity, because `==` on value
> objects treats two identical copies of a value object as equal.  So any
> observations we make about graph topology prior to serialization with `==` are
> consistent with those after deserialization.
>
> ### Identity-sensitive operations
>
> Certain operations are currently defined in terms of object identity.  As we've
> already seen, some of these, like equality, can be sensibly extended to cover all
> instances.  Others, like synchronization, will become partial.
> Identity-sensitive operations include:
>
>   - **Equality.**  We extend `==` on references to include references to value
>     objects.  Where it currently has a meaning, the new definition coincides with
>     that meaning.
>
>   - **System::identityHashCode.**  The main use of `identityHashCode` is in the
>     implementation of data structures such as `IdentityHashMap`.  We can extend
>     `identityHashCode` in the same way we extend equality -- deriving a hash on
>     primitive objects from the hash of all the fields.
>
>   - **Synchronization.**  This becomes a partial operation.  If we can statically
>     detect that a synchronization will fail at runtime (including declaring a
>     `synchronized` method in a value class), we can issue a compilation error; if
>     not, attempts to lock on a value object result in
>     `IllegalMonitorStateException`.  This is justifiable because it is
>     intrinsically imprudent to lock on an object for which you do not have a clear
>     understanding of its locking protocol; locking on an arbitrary `Object` or
>     interface instance is doing exactly that.
>
>   - **Weak, soft, and phantom references.**  Capturing an exotic reference to a
>     value object becomes a partial operation, as these are intrinsically tied to
>     reachability (and hence to identity).  However, we will likely make
>     enhancements to `WeakHashMap` to support mixed identity and value keys.
>
> ### What about Object?
>
> The root class `Object` poses an unusual problem, in that every class must extend
> it directly or indirectly, but it is also instantiable (non-abstract), and its
> instances have identity -- it is common to use `new Object()` as a way to obtain a
> new object identity for purposes of locking.
>
> ## Why two types?
>
> It is sensible to ask: why do we need companion types at all?  This is analogous
> to the need for boxes in 1995: we'd made one set of tradeoffs for primitives,
> favoring performance (non-nullable, zero-default, tolerant of non-initialization,
> tolerant of tearing under race, unrelated to `Object`), and another for
> references, favoring flexibility and safety.  Most of the time, we ignored the
> primitive wrapper classes, but sometimes we needed to temporarily suppress one of
> these properties, such as when interoperating with code that expects an `Object`
> or the ability to express "no value".  The reasons we needed boxes in 1995 still
> apply today: sometimes we need the affordances of references, and in those cases,
> we appeal to the reference companion.
>
> Reasons we might want to use the reference companion include:
>
>  - **Interoperation with reference types.**  Value classes can implement
>    interfaces and extend classes (including `Object` and some abstract classes),
>    which means some class and interface types are going to be polymorphic over
>    both identity and primitive objects.
>    This polymorphism is achieved through object references; a reference to
>    `Object` may be a reference to an identity object, or a reference to a value
>    object.
>
>  - **Nullability.**  Nullability is an affordance of object _references_, not
>    objects themselves.  Most of the time, it makes sense that primitive types are
>    non-nullable (as the primitives are today), but there may be situations where
>    null is a semantically important value.  Using the reference companion when
>    nullability is required is semantically clear, and avoids the need to invent
>    new sentinel values for "no value."
>
>    This need comes up when migrating existing classes; the method `Map::get` uses
>    `null` to signal that the requested key was not present in the map.  But, if
>    the `V` parameter to `Map` is a primitive class, `null` is not a valid value.
>    We can capture the "`V` or null" requirement by changing the descriptor of
>    `Map::get` to:
>
>    ```
>    public V.ref get(K key);
>    ```
>
>    where, whatever type `V` is instantiated as, `Map::get` returns the reference
>    companion.  (For a type `V` that already is a reference type, this is just `V`
>    itself.)  This captures the notion that the return type of `Map::get` will
>    either be a reference to a `V`, or the `null` reference.  (This is a compatible
>    change, since both erase to the same thing.)
>
>  - **Self-referential types.**  Some types may want to directly or indirectly
>    refer to themselves, such as the "next" field in the node type of a linked
>    list:
>
>    ```
>    class Node<T> {
>        T theValue;
>        Node<T> nextNode;
>    }
>    ```
>
>    We might want to represent this as a value class, but if the type of
>    `nextNode` were `Node.val`, the layout of `Node` would be self-referential,
>    since we would be trying to flatten a `Node` into its own layout.
>
>  - **Protection from tearing.**  For a value class with a non-atomic value
>    companion type, we may want to use the reference companion in cases where we
>    are concerned about tearing; because loads and stores of references are
>    atomic, `P.ref` is immune to the tearing under race that `P.val` might be
>    subject to.
>
>  - **Compatibility with existing boxing.**  Autoboxing is convenient, in that it
>    lets us pass a primitive where a reference is required.  But boxing affects
>    far more than assignment conversion; it also affects method overload
>    selection.  The rules are designed to prefer overloads that require no
>    conversions to those requiring boxing (or varargs) conversions.  Having both a
>    value and reference type for every value class means that these rules can be
>    cleanly and intuitively extended to cover value classes.
>
> ## Refining the value companion
>
> Value classes have several options for refining the behavior of the value
> companion type and how they are exposed to clients.
>
> ### Classes with no good default value
>
> For a value class `C`, the default value of `C.ref` is the same as any other
> reference type: `null`.  For the value companion type `C.val`, the default value
> is the one where all of its fields are initialized to their default value.
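>
> For example, with the `Point` declaration from earlier (illustrative):
>
> ```
> Point     pr;                        // reference-typed field: defaults to null
> Point.val pv;                        // companion-typed field: defaults to the all-zero Point(0, 0)
>
> Point[]     prs = new Point[4];      // four nulls
> Point.val[] pvs = new Point.val[4];  // four all-zero Points; no constructor has run
> ```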
>
> The built-in primitives reflect the design assumption that zero is a reasonable
> default.  The choice to use a zero default for uninitialized variables was one of
> the central tradeoffs in the design of the built-in primitives.  It gives us a
> usable initial value (most of the time), and requires less storage footprint than
> a representation that supports null (`int` uses all 2^32 of its bit patterns, so
> a nullable `int` would have to either make some 32 bit signed integers
> unrepresentable, or use a 33rd bit).  This was a reasonable tradeoff for the
> built-in primitives, and is also a reasonable tradeoff for many (but not all)
> other potential value classes (such as complex numbers, 2D points, half-floats,
> etc).
>
> But for other potential value classes, such as `LocalDate`, there _is_ no
> reasonable default.  If we choose to represent a date as the number of days since
> some epoch, there will invariably be bugs that stem from uninitialized dates;
> we've all been mistakenly told by computers that something will happen on or near
> 1 January 1970.  Even if we could choose a default other than the zero
> representation, an uninitialized date is still likely to be an error -- there
> simply is no good default date value.
>
> For this reason, value classes have the choice of encapsulating or exposing their
> value companion type.  If the class is willing to tolerate an uninitialized
> (zero) value, it can freely share its `.val` companion with the world; if
> uninitialized values are dangerous (such as for `LocalDate`), it can be
> encapsulated to the class or package.
>
> Encapsulation is accomplished using ordinary access control.  By default, the
> value companion is `private`, and need not be declared explicitly; a class that
> wishes to share its value companion can make it public:
>
> ```
> public value record Complex(double real, double imag) {
>     public value companion Complex.val;
> }
> ```
>
> ### Atomicity and tearing
>
> For the primitive types longer than 32 bits (long and double), it is not
> guaranteed that reads and writes from different threads (without suitable
> coordination) are atomic with respect to each other.  The result is that, if
> accessed under data race, a long or double field or array element can be seen to
> "tear", and a read might see the low 32 bits of one write and the high 32 bits of
> another.  (Declaring the containing field `volatile` is sufficient to restore
> atomicity, as is properly coordinating with locks or other concurrency control,
> or not sharing across threads in the first place.)
>
> This was a pragmatic tradeoff given the hardware of the time; the cost of 64-bit
> atomicity on 1995 hardware would have been prohibitive, and problems only arise
> when the program already has data races -- and most numeric code deals with
> thread-local data.  Just like with the tradeoff of nulls vs zeros, the design of
> the built-in primitives permits tearing as part of a tradeoff between performance
> and correctness, where primitives chose "as fast as possible" and reference types
> chose more safety.
>
> Today, most JVMs give us atomic loads and stores of 64-bit primitives, because
> the hardware makes them cheap enough.  But value classes bring us back to 1995;
> atomic loads and stores of larger-than-64-bit values are still expensive on many
> CPUs, leaving us with a choice of "make operations on primitives slower" or
> permitting tearing when accessed under race.
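>
> To make the hazard concrete, here is a sketch of a class with a cross-field
> invariant that a data race could expose if its value companion were non-atomic
> (illustrative code):
>
> ```
> value class Range {
>     int lo;
>     int hi;    // invariant: lo <= hi
>
>     Range(int lo, int hi) {
>         if (lo > hi) throw new IllegalArgumentException("lo > hi");
>         this.lo = lo;
>         this.hi = hi;
>     }
> }
>
> // Under a race, a reader of a non-atomic Range.val field could observe the lo
> // of one write and the hi of another -- e.g., (5, 3) -- a value that no
> // constructor ever produced.
> ```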
>
> It would not be wise for the language to select a one-size-fits-all policy about
> tearing; choosing "no tearing" means that types like `Complex` are slower than
> they need to be, even in a single-threaded program; choosing "tearing" means that
> classes like `Range` can be seen to not exhibit invariants asserted by their
> constructor.  Class authors have to choose, with full knowledge of their domain,
> whether their types can tolerate tearing.  The default is no tearing (safe by
> default); a class can opt for greater flattening at the cost of potential tearing
> by declaring the value companion as `non-atomic`:
>
> ```
> public value record Complex(double real, double imag) {
>     public non-atomic value companion Complex.val;
> }
> ```
>
> For classes like `Complex`, all of whose bit patterns are valid, this is very
> much like the choice around `long` in 1995.  For other classes that might have
> nontrivial representational invariants, they likely want to stick to the default
> of atomicity.
>
> ## Migrating legacy primitives
>
> As part of generalizing primitives, we want to adjust the built-in primitives to
> behave as consistently with value classes as possible.  While we can't change the
> fact that `int`'s reference companion is the oddly-named `Integer`, we can give
> them more uniform aliases (`int.ref` is an alias for `Integer`; `int` is an alias
> for `Integer.val`) -- so that we can use a consistent rule for naming companions.
> Similarly, we can extend member access to the legacy primitives, and allow
> `int[]` to be a subtype of `Integer[]` (and therefore of `Object[]`.)
>
> We will redeclare `Integer` as a value class with a public value companion:
>
> ```
> value class Integer {
>     public value companion Integer.val;
>
>     // existing methods
> }
> ```
>
> where the type name `int` is an alias for `Integer.val`.  The primitive array
> types will be retrofitted such that arrays of primitives are subtypes of arrays
> of their boxes (`int[] <: Integer[]`).
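>
> A sketch of what this retrofit means for ordinary code (illustrative; it assumes
> the `int` / `Integer.val` aliasing and the array subtyping described above):
>
> ```
> int[] ints = { 1, 2, 3 };
> Object[] objs = ints;                 // int[] <: Integer[] <: Object[]
> assert objs[0] instanceof Integer;    // elements are read through the reference companion
>
> int i = 42;
> String s = i.toString();              // member access extended to a legacy primitive
> ```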
>
> ## Unifying primitives with classes
>
> Earlier, we had a chart of the differences between primitive and reference types:
>
> | Primitives                                  | Objects                            |
> | ------------------------------------------- | ---------------------------------- |
> | No identity (pure values)                   | Identity                           |
> | `==` compares values                        | `==` compares object identity      |
> | Built-in                                    | Declared in classes                |
> | No members (fields, methods, constructors)  | Members (including mutable fields) |
> | No supertypes or subtypes                   | Class and interface inheritance    |
> | Accessed directly                           | Accessed via object references     |
> | Not nullable                                | Nullable                           |
> | Default value is zero                       | Default value is null              |
> | Arrays are monomorphic                      | Arrays are covariant               |
> | May tear under race                         | Initialization safety guarantees   |
> | Have reference companions (boxes)           | Don't need reference companions    |
>
> The addition of value classes addresses many of these directly.  Rather than
> saying "classes have identity, primitives do not", we make identity an optional
> characteristic of classes (and derive equality semantics from that.)  Rather than
> primitives being built in, we derive all types, including primitives, from
> classes, and endow value companion types with the members and supertypes declared
> with the value class.  Rather than having primitive arrays be monomorphic, we
> make all arrays covariant under the `extends` relation.
>
> The remaining differences now become differences between reference types and
> value types:
>
> | Value types                                    | Reference types                  |
> | ---------------------------------------------- | -------------------------------- |
> | Accessed directly                              | Accessed via object references   |
> | Not nullable                                   | Nullable                         |
> | Default value is zero                          | Default value is null            |
> | May tear under race, if declared `non-atomic`  | Initialization safety guarantees |
>
> ### Choosing which to use
>
> How would we choose between declaring an identity class or a value class, and the
> various options on value companions?  Here are some quick rules of thumb:
>
>  - If you need mutability, subclassing, or aliasing, choose an identity class.
>  - If uninitialized (zero) values are unacceptable, choose a value class with the
>    value companion encapsulated.
>  - If you have no cross-field invariants and are willing to tolerate tearing to
>    enable more flattening, choose a value class with a non-atomic value
>    companion.
>
> ## Summary
>
> Valhalla unifies, to the extent possible, primitives and objects.  The following
> table summarizes the transition from the current world to Valhalla.
>
> | Current World                               | Valhalla                                                   |
> | ------------------------------------------- | ---------------------------------------------------------- |
> | All objects have identity                   | Some objects have identity                                 |
> | Fixed, built-in set of primitives           | Open-ended set of primitives, declared via classes         |
> | Primitives don't have methods or supertypes | Primitives are classes, with methods and supertypes        |
> | Primitives have ad-hoc boxes                | Primitives have regularized reference companions           |
> | Boxes have accidental identity              | Reference companions have no identity                      |
> | Boxing and unboxing conversions             | Primitive reference and value conversions, but same rules  |
> | Primitive arrays are monomorphic            | All arrays are covariant                                   |
>
> [valuebased]: https://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html
> [growing]: https://dl.acm.org/doi/abs/10.1145/1176617.1176621
> [jep390]: https://openjdk.java.net/jeps/390
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From forax at univ-mlv.fr  Wed Jun 29 15:57:30 2022
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Wed, 29 Jun 2022 17:57:30 +0200 (CEST)
Subject: User model stacking: current status
In-Reply-To: <6be1527f-5047-9e04-75cb-c8ecab131941@oracle.com>
References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com>
 <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com>
 <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com>
 <1750a350-4e83-cd0e-8a52-44d193b766af@oracle.com>
 <2091250374.631588.1656513488696.JavaMail.zimbra@u-pem.fr>
 <6be1527f-5047-9e04-75cb-c8ecab131941@oracle.com>
Message-ID: <2116995606.733122.1656518250393.JavaMail.zimbra@u-pem.fr>

> From: "Brian Goetz" 
> To: "Remi Forax" 
> Cc: "Kevin Bourrillion" , "daniel smith" , "valhalla-spec-experts" 
> Sent: Wednesday, June 29, 2022 5:32:38 PM
> Subject: Re: User model stacking: current status

>> I think you have done a good job describing the pro of that model but weirdly
>> not list the cons of that model.

> I think we described the con pretty clearly: .val is ugly, and this puts it in
> people's face. This point was mentioned multiple times during the discussions.
> But the notable thing is: no one has raised other cons. The con is syntax.

No, the major con is the fact that the model you propose and the VM model are not aligned.

> All your points here are basically a dressed-up version of this same issue: at
> least in some cases, some users will be grumpy that the good name goes to the
> thing they don't want. And this is a point we are painfully aware of, so none
> of this is particularly new.

> And we have explored all the positions on this (Point is ref, Point is val, let
> the user pick two names, let the declarer choose, etc), and they all have
> downsides. Specifically, we explored having `ref-default` and `val-default` as
> declaration-site options; this "gives the user more control" (developers love
> knobs!) But it also imposes a significant cognitive load on all developers:
> people no longer know what `Point` means. Is it nullable? Is it a reference?
> You have to look it up, or "carry around a mental database."

Let's suppose we offer a model with ref-default and val-default at declaration site.
In that case, "is it nullable" and "is it a reference" are questions from the past;
nullability becomes less important because there is a notion of default value. And
knowing whether something is a reference or not is not something people really care
about. In Python, everything is a reference, even integers, but nobody cares. Do VMs
do escape analysis or not? No one cares. What is important is whether there is a
difference in behavior between being a reference or not.
Those questions that you have to carry around are only important if we make them
important. You are judging your model with the questions of the past, not the
questions we will have 10 years after the new model is introduced.

> If anyone has the choices, then everyone has more responsibility. And given that
> the performance differences between Point.ref and Point.val accrue pretty much
> exclusively in the heap, which is to say, apply only to implementation code and
> not API, sticking the implementation with this burden seems reasonable.

No, you cannot change a Point.ref to a Point.val without breaking backward
compatibility, so it's an issue for APIs.

If your description of the world were true, then we would not need Q-types; the
Preload attribute, which says that an L-type is a value type, would be enough. In
that case, the VM model and the language model you propose would be more in sync.
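To make the API-compatibility point above concrete, take a hypothetical `Distance`
value class (illustrative code):

```
// version 1 of a published API
public Distance center(Distance a, Distance b) { return a; }

// version 2 tries to opt into flattening at the API boundary
public Distance.val center(Distance.val a, Distance.val b) { return a; }
```

Unlike the `V` / `V.ref` case, `Distance` and `Distance.val` would not translate to
the same descriptor, so clients compiled against version 1 no longer link; the
ref/val choice leaks into the API, not only into the implementation.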
R?mi > On 6/29/2022 10:38 AM, Remi Forax wrote: >>> From: "Brian Goetz" [ mailto:brian.goetz at oracle.com | ] >>> To: "Kevin Bourrillion" [ mailto:kevinb at google.com | ] >>> Cc: "daniel smith" [ mailto:daniel.smith at oracle.com | >>> ] , "valhalla-spec-experts" [ mailto:valhalla-spec-experts at openjdk.java.net | >>> ] >>> Sent: Thursday, June 23, 2022 9:01:24 PM >>> Subject: Re: User model stacking: current status >>> On 6/15/2022 12:41 PM, Kevin Bourrillion wrote: >>>> All else being equal, the idea to use "inaccessible value type" over "value type >>>> doesn't exist" feels very good and simplifying, with the main problem that the >>>> syntax can't help but be gross. >>> A few weeks in, and this latest stacking is still feeling pretty good: >>> - There are no coarse buckets any more; there are just identity classes and >>> value classes. >>> - Value classes have ref and val companion types with the obvious properties. >>> (Notably, refs are always atomic.) >>> - For `value class C`, C as a type is an alias for `C.ref`. >>> - The bucket formerly known as B2 becomes "value class, whose .val type is >>> private." This is the default for a value class. >>> - The bucket formerly known as B3a is denoted by explicitly making the val >>> companion public, with a public modifier on a "member" of the class. >>> - The bucket formerly known as B3n is denoted by explicitly making the val >>> companion public and non-atomic, again using modifiers. >>> I went and updated the State of the Values document to use the new terminology, >>> test-driving some new syntax. (Usual rules: syntax comments are premature at >>> this time.) I was very pleased with the result, because almost all the changes >>> were small changes in terminology (e.g., "value companion type"), and >>> eliminating the clumsy distinction between value classes and primitive classes. >>> Overall the structure remains the same, but feels more compact and clean. MD >>> source is below, for review. >>> Kevin's two questions remain, but I don't think they get in the way of refining >>> the model in this way: >>> - Have we made the right choices around == ? >>> - Are we missing a big opportunity by not spelling Complex.val with a bang? >> I think you have done a good job describing the pro of that model but weirdly >> not list the cons of that model. >> I see three reasons your proposed model, let's call it the companion class >> model, needs improvements. >> It fails our moto, the companion class model and the VM models are not aligned >> and the performance model is a "sigil for performance" model. >> It fails our moto (code like a class, works like an int): >> If i say that an Image is an array of pixels with each pixel have three colors, >> the obvious translation is not the right one: >> class Image { >> Pixel[][] pixels; >> } >> value record Pixel(Color red, Color green, Color blue) {} >> value record Color(byte value) {} >> because a value class is nullable, only it's companion class is not nullable, >> the correct code is >> class Image { >> Pixel.val[][] pixels; >> } >> value record Pixel(Color.val red, Color.val green, Color.val blue) {} >> value record Color(byte value) {} >> Color and byte does not work the same way, it's not code like a class works like >> an int but code like a class, works like an Integer. 
>> The VM models and the Java model are not aligned: >> For the VM model, L-type and Q-type on equal footing, not one is more important >> than the other, but the companion class model you propose makes the value class >> a first citizen and the companion class a second citizen. >> We know that when the Java model and the VM model are not aligned, bugs will lie >> in between. Those can be mild bugs, by example you can throw a checked >> exception from a method not declaring that exception or painful bugs in the >> case of generics or serialization. >> I think we should list all the cases where the Java Model and the VM model >> disagree to see the kind of bugs we will ask the future generation to solve. >> By example, having a value class with a default constructor and public companion >> class looks like a lot like a deserialization bug to me, in both case you are >> able to produce an instance that bypass the constructor. >> The other problem is for the other languages than Java. Do those languages will >> have to define a companion class or a companion class is purely a javac >> artifact the same way an attribute like InnerClass is. >> The proposed performance model is a "sigil for performance" model. >> There is a tradeoff between the safety of the reference vs the performance of >> flattened value type. In the proposed model, the choice is not done by the >> maintainer of the class but by the user of the class. This is not fully true, >> the maintainer of the class can make the companion class private choosing >> safety but it can not choose performance. The performance has to be chosen by >> the user of the class. >> This is unlike everything we know in Java, this kind of model where the user >> choose performance is usually called "sigil for performance", the user has to >> add some magical keywords or sigil to get performance. >> A good example of such performance model is the keyword "register" in C. You >> have to opt-in at use site to get performance. >> Moreover unlike in C, in Java we also have to take care of the fact that adding >> .val is not a backward compatible change, if a value class is used in a public >> method a user can not change it to its companion class after the fact. >> We know from the errors of past that a "sigil for performance" model is a >> terrible model. >> Overall, i don't think it's the wrong model, but it over-rotates on the notion >> of reference value class, it's refreshing because in the past we had the >> tendency to over-rotate on the notion of flattened value class. >> I really think that this model can be improved by allowing top-level value class >> to be declared either as reference or as value and the companion class to be >> either a value class projection or a reference class projection so the Java >> model and the VM model will be more in sync. >> R?mi >>> # State of Valhalla >>> ## Part 2: The Language Model {.subtitle} >>> #### Brian Goetz {.author} >>> #### June 2022 {.date} >>> > _This is the second of three documents describing the current State of >>> Valhalla. The first is [The Road to Valhalla](01-background); the >>> third is [The JVM Model](03-vm-model)._ >>> This document describes the directions for the Java _language_ charted by >>> Project Valhalla. (In this document, we use "currently" to describe the >>> language as it stands today, without value classes.) >>> Valhalla started with the goal of providing user-programmable classes which can >>> be flat and dense in memory. 
Numerics are one of the motivating use cases; >>> adding new primitive types directly to the language has a very high barrier. As >>> we learned from [Growing a Language][growing] there are infinitely many numeric >>> types we might want to add to Java, but the proper way to do that is via >>> libraries, not as a language feature. >>> ## Primitive and reference types in Java today >>> Java currently has eight built-in primitive types. Primitives represent pure >>> _values_; any `int` value of "3" is equivalent to, and indistinguishable from, >>> any other `int` value of "3". Primitives are monolithic (their bits cannot be >>> addressed individually) and have no canonical location, and so are _freely >>> copyable_. With the exception of the unusual treatment of exotic floating point >>> values such as `NaN`, the `==` operator performs a _substitutibility test_ -- it >>> asks "are these two values the same value". >>> Java also has _objects_, and each object has a unique _object identity_. Because >>> of identity, objects are not freely copyable; each object lives in exactly one >>> place at any given time, and to access its state we have to go to that place. >>> But we mostly don't notice this because objects are not manipulated or accessed >>> directly, but instead through _object references_. Object references are also a >>> kind of value -- they encode the identity of the object to which they refer, and >>> the `==` operator on object references asks "do these two references refer to >>> the same object." Accordingly, object _references_ (like other values) can be >>> freely copied, but the objects they refer to cannot. >>> Primitives and objects differ in almost every conceivable way: >>> | Primitives | Objects | >>>| ------------------------------------------ | ---------------------------------- >>> | | >>> | No identity (pure values) | Identity | >>> | `==` compares values | `==` compares object identity | >>> | Built-in | Declared in classes | >>>| No members (fields, methods, constructors) | Members (including mutable fields) >>> | | >>> | No supertypes or subtypes | Class and interface inheritance | >>> | Accessed directly | Accessed via object references | >>> | Not nullable | Nullable | >>> | Default value is zero | Default value is null | >>> | Arrays are monomorphic | Arrays are covariant | >>> | May tear under race | Initialization safety guarantees | >>> | Have reference companions (boxes) | Don't need reference companions | >>> The design of primitives represents various tradeoffs aimed at maximizing >>> performance and usability of the primtive types. Reference types default to >>> `null`, meaning "referring to no object"; primitives default to a usable zero >>> value (which for most primitives is the additive identity). Reference types >>> provide initialization safety guarantees against a certain category of data >>> races; primitives allow tearing under race for larger-than-32-bit values. >>> We could characterize the design principles behind these tradeoffs are "make >>> objects safer, make primitives faster." >>> The following figure illustrates the current universe of Java's types. The >>> upper left quadrant is the built-in primitives; the rest of the space is >>> reference types. In the upper-right, we have the abstract reference types -- >>> abstract classes, interfaces, and `Object` (which, though concrete, acts more >>> like an interface than a concrete class). The built-in primitives have wrappers >>> or boxes, which are reference types. >>>
>>> [Figure: Current universe of Java field types]
>>> Valhalla aims to unify primitives and objects in that they can both be >>> declared with classes, but maintains the special runtime characteristics >>> primitives have. But while everyone likes the flatness and density that >>> user-definable value types promise, in some cases we want them to be more like >>> classical objects (nullable, non-tearable), and in other cases we want them to >>> be more like classical primitives (trading some safety for performance). >>> ## Value classes: separating references from identity >>> Many of the impediments to optimization that Valhalla seeks to remove center >>> around _unwanted object identity_. The primitive wrapper classes have identity, >>> but it is a purely accidental one. Not only is it not directly useful, it can >>> be a source of bugs. For example, due to caching, `Integer` can be accidentally >>> compared correctly with `==` just often enough that people keep doing it. >>> Similarly, [value-based classes][valuebased] such as `Optional` have no need for >>> identity, but pay the costs of having identity anyway. >>> Our first step is allowing class declarations to explicitly disavow identity, by >>> declaring themselves as _value classes_. The instances of a value class are >>> called _value objects_. >>> ``` >>> value class ArrayCursor { >>> T[] array; >>> int offset; >>> public ArrayCursor(T[] array, int offset) { >>> this.array = array; >>> this.offset = offset; >>> } >>> public boolean hasNext() { >>> return offset < array.length; >>> } >>> public T next() { >>> return array[offset]; >>> } >>> public ArrayCursor advance() { >>> return new ArrayCursor(array, offset+1); >>> } >>> } >>> ``` >>> This says that an `ArrayCursor` is a class whose instances have no identity -- >>> that instead they have _value semantics_. As a consequence, it must give up the >>> things that depend on identity; the class and its fields are implicitly final. >>> But, value classes are still classes, and can have most of the things classes >>> can have -- fields, methods, constructors, type parameters, superclasses (with >>> some restrictions), nested classes, class literals, interfaces, etc. The >>> classes they can extend are restricted: `Object` or abstract classes with no >>> instance fields, empty no-arg constructor bodies, no other constructors, no >>> instance >>> initializers, no synchronized methods, and whose superclasses all meet this same >>> set of conditions. (`Number` meets these conditions.) >>> Classes in Java give rise to types; the class `ArrayCursor` gives rise to a type >>> `ArrayCursor` (actually a parametric family of instantiations `ArrayCursor`.) >>> `ArrayCursor` is still a reference type, just one whose references refer to >>> value objects rather than identity objects. For the types in the upper-right >>> quadrant of the diagram (interfaces, abstract classes, and `Object`), references >>> to these types might refer to either an identity object or a value object. >>> (Historically, JVMs were effectively forced to represent object references with >>> pointers; for references to value objects, JVMs now have more flexibility.) >>> Because `ArrayCursor` is a reference type, it is nullable (because references >>> are nullable), its default value is null, and loads and stores of references are >>> atomic with respect to each other even in the presence of data races, providing >>> the initialization safety we are used to with classical objects. 
>>> Because instances of `ArrayCursor` have value semantics, `==` compares by state >>> rather than identity. This means that value objects, like primitives, are >>> _freely copyable_; we can explode them into their fields and re-aggregate them >>> into another value object, and we cannot tell the difference. (Because they >>> have no identity, some identity-sensitive operations, such as synchronization, >>> are disallowed.) >>> So far we've addressed the first two lines of the table of differences above; >>> rather than identity being a property of all object instances, classes can >>> decide whether their instances have identity or not. By allowing classes that >>> don't need identity to exclude it, we free the runtime to make better layout and >>> compilation decisions -- and avoid a whole category of bugs. >>> In looking at the code for `ArrayCursor`, we might mistakenly assume it will be >>> inefficient, as each loop iteration appears to allocate a new cursor: >>> ``` >>> for (ArrayCursor c = Arrays.cursor(array); >>> c.hasNext(); >>> c = c.advance()) { >>> // use c.next(); >>> } >>> ``` >>> One should generally expect here that _no_ cursors are actually allocated. >>> Because an `ArrayCursor` is just its two fields, these fields will routinely get >>> scalarized and hoisted into registers, and the constructor call in `advance` >>> will typically compile down to incrementing one of these registers. >>> ### Migration >>> The JDK (as well as other libraries) has many [value-based classes][valuebased] >>> such as `Optional` and `LocalDateTime`. Value-based classes adhere to the >>> semantic restrictions of value classes, but are still identity classes -- even >>> though they don't want to be. Value-based classes can be migrated to true value >>> classes simply by redeclaring them as value classes, which is both source- and >>> binary-compatible. >>> We plan to migrate many value-based classes in the JDK to value classes. >>> Additionally, the primitive wrappers can be migrated to value classes as well, >>> making the conversion between `int` and `Integer` cheaper; see the section >>> "Legacy Primitives" below. (In some cases, this may be _behaviorally_ >>> incompatible for code that synchronizes on the primitive wrappers. [JEP >>> 390][jep390] has supported both compile-time and runtime warnings for >>> synchronizing on primitive wrappers since Java 16.) >>>
>>> [Figure: Java field types adding value classes]
>>> ### Equality >>> Earlier we said that `==` compares value objects by state rather than by >>> identity. More precisely, two value objects are `==` if they are of the same >>> type, and each of their fields are pairwise equal, where equality is given by >>> `==` for primitives (except `float` and `double`, which are compared with >>> `Float::equals` and `Double::equals` to avoid anomalies), `==` for references to >>> identity objects, and recursively with `==` for references to value objects. In >>> no case is a value object ever `==` to a reference to an identity object. >>> ### Value records >>> While records have a lot in common with value classes -- they are final and >>> their fields are final -- they are still identity classes. Records embody a >>> tradeoff: give up on decoupling the API from the representation, and in return >>> get various syntactic and semantic benefits. Value classes embody another >>> tradeoff: give up identity, and get various semantic and performance benefits. >>> If we are willing to give up both, we can get both sets of benefits. >>> ``` >>> value record NameAndScore(String name, int score) { } >>> ``` >>> Value records combine the data-carrier idiom of records with the improved >>> scalarization and flattening benefits of value classes. >>> In theory, it would be possible to apply `value` to certain enums as well, but >>> this is not currently possible because the `java.lang.Enum` base class that >>> enums extend do not meet the requirements for superclasses of value classes (it >>> has fields and non-empty constructors). >>> ## Unboxing values for flatness and density >>> Value classes shed object identity, gaining a host of performance and >>> predictability benefits in the process. They are an ideal replacement for many >>> of today's value-based classes, fully preserving their semantics (except for the >>> accidental identity these classes never wanted). But identity-free reference >>> types are only one point a spectrum of tradeoffs between abstraction and >>> performance, and other desired use cases -- such as numerics -- may want a >>> different set of tradeoffs. >>> Reference types are nullable, and therefore must account for null somehow in >>> their representation, which may involve additional footprint. Similarly, they >>> offer the initialization safety guarantees for final fields that we come to >>> expect from identity objects, which may entail limits on flatness. For certain >>> use cases, it may be desire to additionally give up something else to make >>> further flatness and footprint gains -- and that something else is >>> reference-ness. >>> The built-in primitives are best understood as _pairs_ of types: a primitive >>> type (e.g., `int`) and its reference companion or box (`Integer`), with >>> conversions between the two (boxing and unboxing.) We have both types because >>> the two have different characteristics. Primitives are optimized for efficient >>> storage and access: they are not nullable, they tolerate uninitialized (zero) >>> values, and larger primitive types (`long`, `double`) may tear under racy >>> access. References err on the side of safety and flexibility; they support >>> nullity, polymorphism, and offer initialization safety (freedom from tearing), >>> but by comparison to primitives, they pay a footprint and indirection cost. >>> For these reasons, value classes give rise to pairs of types as well: a >>> reference type and a _value companion type_. 
We've seen the reference type so >>> far; for a value class `Point`, the reference type is called `Point`. (The full >>> name for the reference type is `Point.ref`; `Point` is an alias for that.) The >>> value companion type is called `Point.val`, and the two types have the same >>> conversions between them as primitives do today with their boxes. (If we are >>> talking explicitly about the value companion type of a value class, we may >>> sometimes describe the corresponding reference type as its _reference >>> companion_.) >>> ``` >>> value class Point implements Serializable { >>> int x; >>> int y; >>> Point(int x, int y) { >>> this.x = x; >>> this.y = y; >>> } >>> Point scale(int s) { >>> return new Point(s*x, s*y); >>> } >>> } >>> ``` >>> The default value of the value companion type is the one for which all fields >>> take on their default value; the default value of the reference type is, like >>> all reference types, null. >>> In our diagram, these new types show up as another entity that straddles the >>> line between primitives and identity-free references, alongside the legacy >>> primitives: >>> ** UPDATE DIAGRAM ** >>>
>>> [Figure: Java field types with extended primitives]
>>> ### Member access >>> Both the reference and value companion types are seen to have the same instance >>> members. Unlike today's primitives, value companion types can be used as >>> receivers to access fields and invoke methods, subject to accessibility >>> constraints: >>> ``` >>> Point.val p = new Point(1, 2); >>> assert p.x == 1; >>> p = p.scale(2); >>> assert p.x == 2; >>> ``` >>> ### Polymorphism >>> When we declare a class today, we set up a subtyping (is-a) relationship between >>> the declared class and its supertypes. When we declare a value class, we set up >>> a subtyping relationship between the _reference type_ and the declared >>> supertypes. This means that if we declare: >>> ``` >>> value class UnsignedShort extends Number >>> implements Comparable { >>> ... >>> } >>> ``` >>> then `UnsignedShort` is a subtype of `Number` and `Comparable`, >>> and we can ask questions about subtyping using `instanceof` or pattern matching. >>> What happens if we ask such a question of the value companion type? >>> ``` >>> UnsignedShort.val us = ... >>> if (us instanceof Number) { ... } >>> ``` >>> Since subtyping is defined only on reference types, the `instanceof` operator >>> (and corresponding type patterns) will behave as if both sides were lifted to >>> the approrpriate reference type, and we can answer the question that way. (This >>> may trigger fears of expensive boxing conversions, but in reality no actual >>> allocation will happen.) >>> We introduce a new relationship based on `extends` / `implements` clauses, which >>> we'll call "extends"; we define `A extends B` as meaning `A <: B` when A is a >>> reference type, and `A.ref <: B` when A is a value companion type. The >>> `instanceof` relation, reflection, and pattern matching are updated to use >>> "extends". >>> ### Arrays >>> Arrays of reference types are _covariant_; this means that if `A <: B`, then >>> `A[] <: B[]`. This allows `Object[]` to be the "top array type", at least for >>> arrays of references. But arrays of primitives are currently left out of this >>> story. We can unify the treatment of arrays by defining array covariance over >>> the new "extends" relationship; if A extends B, then `A[] <: B[]`. For a value >>> class P, `P.val[] <: P.ref[] <: Object[]`, finally making `Object[]` the top >>> type for all arrays. >>> ### Equality >>> Just as with `instanceof`, we define `==` on values by appealing to the >>> reference companion (though no actual boxing need occur). Evaluating `a == b`, >>> where one or both operands are of a value companion type, can be defined as if >>> the operands are first converted to their corresponding reference type, and then >>> comparing the results. This means that the following will succeed: >>> ``` >>> Point.val p = new Point(3, 4); >>> Point pr = p; >>> assert p == pr; >>> ``` >>> The base implementation of `Object::equals` delegates to `==`, which is a >>> suitable default for both reference and value classes. >>> ### Serialization >>> If a value class implements `Serializable`, this is also really a statement >>> about the reference type. Just as with other aspects described here, >>> serialization of value companions can be defined by converting to the >>> corresponding reference type and serializing that, and reversing the process at >>> deserialization time. >>> Serialization currently uses object identity to preserve the topology of an >>> object graph. 
This generalizes cleanly to objects without identity, because >>> `==` on value objects treats two identical copies of a value object as equal. >>> So any observations we make about graph topology prior to serialization with >>> `==` are consistent with those after deserialization. >>> ### Identity-sensitive operations >>> Certain operations are currently defined in terms of object identity. As we've >>> already seen, some of these, like equality, can be sensibly extended to cover >>> all instances. Others, like synchronization, will become partial. >>> Identity-sensitive operations include: >>> - **Equality.** We extend `==` on references to include references to value >>> objects. Where it currently has a meaning, the new definition coincides >>> with that meaning. >>> - **System::identityHashCode.** The main use of `identityHashCode` is in the >>> implementation of data structures such as `IdentityHashMap`. We can extend >>> `identityHashCode` in the same way we extend equality -- deriving a hash on >>> primitive objects from the hash of all the fields. >>> - **Synchronization.** This becomes a partial operation. If we can >>> statically detect that a synchronization will fail at runtime (including >>> declaring a `synchronized` method in a value class), we can issue a >>> compilation error; if not, attempts to lock on a value object results in >>> `IllegalMonitorStateException`. This is justifiable because it is >>> intrinsically imprudent to lock on an object for which you do not have a >>> clear understanding of its locking protocol; locking on an arbitrary >>> `Object` or interface instance is doing exactly that. >>> - **Weak, soft, and phantom references.** Capturing an exotic reference to a >>> value object becomes a partial operation, as these are intrinsically tied to >>> reachability (and hence to identity). However, we will likely make >>> enhancements to `WeakHashMap` to support mixed identity and value keys. >>> ### What about Object? >>> The root class `Object` poses an unusual problem, in that every class must >>> extend it directly or indirectly, but it is also instantiable (non-abstract), >>> and its instances have identity -- it is common to use `new Object()` as a way >>> to obtain a new object identity for purposes of locking. >>> ## Why two types? >>> It is sensible to ask: why do we need companion types at all? This is analogous >>> to the need for boxes in 1995: we'd made one set of tradeoffs for primitives, >>> favoring performance (non-nullable, zero-default, tolerant of >>> non-initialization, tolerant of tearing under race, unrelated to `Object`), and >>> another for references, favoring flexibility and safety. Most of the time, we >>> ignored the primitive wrapper classes, but sometimes we needed to temporarily >>> suppress one of these properties, such as when interoperating with code that >>> expects an `Object` or the ability to express "no value". The reasons we needed >>> boxes in 1995 still apply today: sometimes we need the affordances of >>> references, and in those cases, we appeal to the reference companion. >>> Reasons we might want to use the reference companion include: >>> - **Interoperation with reference types.** Value classes can implement >>> interfaces and extend classes (including `Object` and some abstract classes), >>> which means some class and interface types are going to be polymorphic over >>> both identity and primitive objects. 
This polymorphism is achieved through >>> object references; a reference to `Object` may be a reference to an identity >>> object, or a reference to a value object. >>> - **Nullability.** Nullability is an affordance of object _references_, not >>> objects themselves. Most of the time, it makes sense that primitive types >>> are non-nullable (as the primitives are today), but there may be situations >>> where null is a semantically important value. Using the reference companion >>> when nullability is required is semantically clear, and avoids the need to >>> invent new sentinel values for "no value." >>> This need comes up when migrating existing classes; the method `Map::get` >>> uses `null` to signal that the requested key was not present in the map. But, >>> if the `V` parameter to `Map` is a primitive class, `null` is not a valid >>> value. We can capture the "`V` or null" requirement by changing the >>> descriptor of `Map::get` to: >>> ``` >>> public V.ref get(K key); >>> ``` >>> where, whatever type `V` is instantiated as, `Map::get` returns the reference >>> companion. (For a type `V` that already is a reference type, this is just `V` >>> itself.) This captures the notion that the return type of `Map::get` will >>> either be a reference to a `V`, or the `null` reference. (This is a >>> compatible change, since both erase to the same thing.) >>> - **Self-referential types.** Some types may want to directly or indirectly >>> refer to themselves, such as the "next" field in the node type of a linked >>> list: >>> ``` >>> class Node { >>> T theValue; >>> Node nextNode; >>> } >>> ``` >>> We might want to represent this as a value class, but if the type of >>> `nextNode` were `Node.val`, the layout of `Node` would be >>> self-referential, since we would be trying to flatten a `Node` into its own >>> layout. >>> - **Protection from tearing.** For a value class with a non-atomic value >>> companion type, we may want to use the reference companion in cases where we >>> are concerned about tearing; because loads and stores of references are >>> atomic, `P.ref` is immune to the tearing under race that `P.val` might be >>> subject to. >>> - **Compatibility with existing boxing.** Autoboxing is convenient, in that it >>> lets us pass a primitive where a reference is required. But boxing affects >>> far more than assignment conversion; it also affects method overload >>> selection. The rules are designed to prefer overloads that require no >>> conversions to those requiring boxing (or varargs) conversions. Having both >>> a value and reference type for every value class means that these rules can >>> be cleanly and intuitively extended to cover value classes. >>> ## Refining the value companion >>> Value classes have several options for refining the behavior of the value >>> companion type and how they are exposed to clients. >>> ### Classes with no good default value >>> For a value class `C`, the default value of `C.ref` is the same as any other >>> reference type: `null`. For the value companion type `C.val`, the default value >>> is the one where all of its fields are initialized to their default value. >>> The built-in primitives reflect the design assumption that zero is a reasonable >>> default. The choice to use a zero default for uninitialized variables was one >>> of the central tradeoffs in the design of the built-in primitives. 
It gives us >>> a usable initial value (most of the time), and requires less storage footprint >>> than a representation that supports null (`int` uses all 2^32 of its bit >>> patterns, so a nullable `int` would have to either make some 32 bit signed >>> integers unrepresentable, or use a 33rd bit). This was a reasonable tradeoff >>> for the built-in primitives, and is also a reasonable tradeoff for many (but not >>> all) other potential value classes (such as complex numbers, 2D points, >>> half-floats, etc). >>> But for others potential value classes, such as `LocalDate`, there _is_ no >>> reasonable default. If we choose to represent a date as the number of days >>> since some some epoch, there will invariably be bugs that stem from >>> uninitialized dates; we've all been mistakenly told by computers that something >>> will happen on or near 1 January 1970. Even if we could choose a default other >>> than the zero representation, an uninitialized date is still likely to be an >>> error -- there simply is no good default date value. >>> For this reason, value classes have the choice of encapsulating or exposing >>> their value companion type. If the class is willing to tolerate an >>> uninitialized (zero) value, it can freely share its `.val` companion with the >>> world; if uninitialized values are dangerous (such as for `LocalDate`), it can >>> be encapsulated to the class or package. >>> Encapsulation is accomplished using ordinary access control. By default, the >>> value companion is `private`, and need not be declared explicitly; a class that >>> wishes to share its value companion can make it public: >>> ``` >>> public value record Complex(double real, double imag) { >>> public value companion Complex.val; >>> } >>> ``` >>> ### Atomicity and tearing >>> For the primitive types longer than 32 bits (long and double), it is not >>> guaranteed that reads and writes from different threads (without suitable >>> coordination) are atomic with respect to each other. The result is that, if >>> accessed under data race, a long or double field or array element can be seen to >>> "tear", and a read might see the low 32 bits of one write and the high 32 bits >>> of another. (Declaring the containing field `volatile` is sufficient to restore >>> atomicity, as is properly coordinating with locks or other concurrency control, >>> or not sharing across threads in the first place.) >>> This was a pragmatic tradeoff given the hardware of the time; the cost of 64-bit >>> atomicity on 1995 hardware would have been prohibitive, and problems only arise >>> when the program already has data races -- and most numeric code deals with >>> thread-local data. Just like with the tradeoff of nulls vs zeros, the design of >>> the built-in primitives permits tearing as part of a tradeoff between >>> performance and correctness, where primitives chose "as fast as possible" and >>> reference types chose more safety. >>> Today, most JVMs give us atomic loads and stores of 64-bit primitives, because >>> the hardware makes them cheap enough. But value classes bring us back to >>> 1995; atomic loads and stores of larger-than-64-bit values are still expensive >>> on many CPUs, leaving us with a choice of "make operations on primitives slower" >>> or permitting tearing when accessed under race. 
>>> It would not be wise for the language to select a one-size-fits-all policy about >>> tearing; choosing "no tearing" means that types like `Complex` are slower than >>> they need to be, even in a single-threaded program; choosing "tearing" means >>> that classes like `Range` can be seen to not exhibit invariants asserted by >>> their constructor. Class authors have to choose, with full knowledge of their >>> domain, whether their types can tolerate tearing. The default is no tearing >>> (safe by default); a class can opt for greater flattening at the cost of >>> potential tearing by declaring the value companion as `non-atomic`: >>> ``` >>> public value record Complex(double real, double imag) { >>> public non-atomic value companion Complex.val; >>> } >>> ``` >>> For classes like `Complex`, all of whose bit patterns are valid, this is very >>> much like the choice around `long` in 1995. For other classes that might have >>> nontrivial representational invariants, they likely want to stick to the default >>> of atomicity. >>> ## Migrating legacy primitives >>> As part of generalizing primitives, we want to adjust the built-in primitives to >>> behave as consistently with value classes as possible. While we can't change >>> the fact that `int`'s reference companion is the oddly-named `Integer`, we can >>> give them >>> more uniform aliases (`int.ref` is an alias for `Integer`; `int` is an alias for >>> `Integer.val`) -- so that we can use a consistent rule for naming companions. >>> Similarly, we can extend member access to the legacy primitives, and allow >>> `int[]` to be a subtype of `Integer[]` (and therefore of `Object[]`.) >>> We will redeclare `Integer` as a value class with a public value companion: >>> ``` >>> value class Integer { >>> public value companion Integer.val; >>> // existing methods >>> } >>> ``` >>> where the type name `int` is an alias for `Integer.val`. The primitive array >>> types will be retrofitted such that arrays of primitives are subtypes of arrays >>> of their boxes (`int[] <: Integer[]`). >>> ## Unifying primitives with classes >>> Earlier, we had a chart of the differences between primitive and reference >>> types: >>> | Primitives | Objects | >>>| ------------------------------------------ | ---------------------------------- >>> | | >>> | No identity (pure values) | Identity | >>> | `==` compares values | `==` compares object identity | >>> | Built-in | Declared in classes | >>>| No members (fields, methods, constructors) | Members (including mutable fields) >>> | | >>> | No supertypes or subtypes | Class and interface inheritance | >>> | Accessed directly | Accessed via object references | >>> | Not nullable | Nullable | >>> | Default value is zero | Default value is null | >>> | Arrays are monomorphic | Arrays are covariant | >>> | May tear under race | Initialization safety guarantees | >>> | Have reference companions (boxes) | Don't need reference companions | >>> The addition of value classes addresses many of these directly. Rather than >>> saying "classes have identity, primitives do not", we make identity an optional >>> characteristic of classes (and derive equality semantics from that.) Rather >>> than primitives being built in, we derive all types, including primitives, from >>> classes, and endow value companion types with the members and supertypes >>> declared with the value class. Rather than having primitive arrays be >>> monomorphic, we make all arrays covariant under the `extends` relation. 
>>> The remaining differences now become differences between reference types and
>>> value types:
>>> | Value types                                   | Reference types                  |
>>> | --------------------------------------------- | -------------------------------- |
>>> | Accessed directly                             | Accessed via object references   |
>>> | Not nullable                                  | Nullable                         |
>>> | Default value is zero                         | Default value is null            |
>>> | May tear under race, if declared `non-atomic` | Initialization safety guarantees |
>>> ### Choosing which to use
>>> How would we choose between declaring an identity class or a value class, and
>>> the various options on value companions? Here are some quick rules of thumb:
>>> - If you need mutability, subclassing, or aliasing, choose an identity class.
>>> - If uninitialized (zero) values are unacceptable, choose a value class with
>>>   the value companion encapsulated.
>>> - If you have no cross-field invariants and are willing to tolerate tearing to
>>>   enable more flattening, choose a value class with a non-atomic value
>>>   companion.
>>> ## Summary
>>> Valhalla unifies, to the extent possible, primitives and objects. The
>>> following table summarizes the transition from the current world to Valhalla.
>>> | Current World                                | Valhalla                                                   |
>>> | -------------------------------------------- | ---------------------------------------------------------- |
>>> | All objects have identity                    | Some objects have identity                                 |
>>> | Fixed, built-in set of primitives            | Open-ended set of primitives, declared via classes         |
>>> | Primitives don't have methods or supertypes  | Primitives are classes, with methods and supertypes        |
>>> | Primitives have ad-hoc boxes                 | Primitives have regularized reference companions           |
>>> | Boxes have accidental identity               | Reference companions have no identity                      |
>>> | Boxing and unboxing conversions              | Primitive reference and value conversions, but same rules  |
>>> | Primitive arrays are monomorphic             | All arrays are covariant                                   |
>>> [valuebased]: https://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html
>>> [growing]: https://dl.acm.org/doi/abs/10.1145/1176617.1176621
>>> [jep390]: https://openjdk.java.net/jeps/390
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From heidinga at redhat.com Wed Jun 29 19:22:04 2022 From: heidinga at redhat.com (Dan Heidinga) Date: Wed, 29 Jun 2022 15:22:04 -0400 Subject: User model stacking: current status In-Reply-To: <2116995606.733122.1656518250393.JavaMail.zimbra@u-pem.fr> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> <1750a350-4e83-cd0e-8a52-44d193b766af@oracle.com> <2091250374.631588.1656513488696.JavaMail.zimbra@u-pem.fr> <6be1527f-5047-9e04-75cb-c8ecab131941@oracle.com> <2116995606.733122.1656518250393.JavaMail.zimbra@u-pem.fr> Message-ID: On Wed, Jun 29, 2022 at 2:11 PM wrote: > > > > ________________________________ > > From: "Brian Goetz" > To: "Remi Forax" > Cc: "Kevin Bourrillion" , "daniel smith" , "valhalla-spec-experts" > Sent: Wednesday, June 29, 2022 5:32:38 PM > Subject: Re: User model stacking: current status > > I think you have done a good job describing the pro of that model but weirdly not list the cons of that model. > > > I think we described the con pretty clearly: .val is ugly, and this puts it in people's face. This point was mentioned multiple times during the discussions. But the notable thing is: no one has raised other cons. The con is syntax. > > > no, the major con is the fact that the model you propose and the VM model are not aligned. Didn't we cover this at the EG meeting today? The consensus was that they *are* aligned. Both the VM and language default to the ref type (ie: CONSTANT_Class "Foo" vs CONSTANT_Class "QFoo;") and other examples discussed. Where is the misalignment? > > All your points here are basically a dressed-up version of this same issue: at least in some cases, some users will be grumpy that the good name goes to the thing they don't want. And this is a point we are painfully aware of, so none of this is particularly new. > > And we have explored all the positions on this (Point is ref, Point is val, let the user pick two names, let the declarer choose, etc), and they all have downsides. Specifically, we explored having `ref-default` and `val-default` as declaration-site options; this "gives the user more control" (developers love knobs!) But it also imposes a significant cognitive load on all developers: people no longer know what `Point` means. Is it nullable? Is it a reference? You have to look it up, or "carry around a mental database." > > > Let suppose we offer a model with with ref-default and val-default at declaration site. > In that case, is it a nullable or is it reference are questions from the past, nullable becomes less important because there is a notion of default value. And knowing if something is a reference or not is not something people really care. In Python, everything is reference, even integers, but nobody cares. Does VMs do escape analysis or not, noone care. What is important is if there is a difference in behavior between being a reference or not. > Those questions that you have to carry around are only important if we make them important. The model we've been working towards is (roughly) expressed as "codes like a class; works like an int" based on both user requirements and the underlying vm physics. There is a difference between being a reference and being a value - though we've done an incredible job of bringing the benefits of Valhalla to non-identity reference types (a bigger win than we expected when we started!). 
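For concreteness, a minimal sketch of the kind of declaration under discussion, written in the provisional syntax from the State of Valhalla draft (the class name and members here are illustrative only, not a proposed API):

```
// Codes like a class: ordinary fields, constructors, and methods.
// Works like an int: no identity, so == compares state, and the JIT is free
// to scalarize instances into registers rather than allocating them.
value class Distance {
    private final int meters;

    Distance(int meters) { this.meters = meters; }

    int meters() { return meters; }

    Distance plus(Distance other) {
        return new Distance(meters + other.meters);
    }
}
```

By default `Distance` is still used through ordinary references -- nullable, null-defaulted, atomic -- and only an explicitly exposed `.val` companion trades those properties away for additional flattening.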
I'm confused by your assertion that "nullable becomes less important because there is a notion of default value." That default value - the all zeros value that the VM paints on freshly allocated instances - is something we've agreed many value classes want to encapsulate. That's the whole story of "no good default" value classes. We've spent a lot of time plumbing those depths before arriving at this point where such NGD classes want to be expressed with references to ensure their "bad" default values don't leak. So I'm kind of confused by this assertion. Overall - we're winning more than we expected to with this model. More cases can be scalarized on the stack than we initially thought and we can still offer heap flattening for the smaller set of use cases that really benefit from it. > > You are judging your model with the questions of the past, not the questions we will have 10 years after the new model is introduced. As always, today's solutions are tomorrow's problems. Can you be more specific about the questions you think will be asked in the next 10 years so we can dig into those? > > If anyone has the choices, then everyone has more responsibility. And given that the performance differences between Point.ref and Point.val accrue pretty much exclusively in the heap, which is to say, apply only to implementation code and not API, sticking the implementation with this burden seems reasonable. > > > no, you can not change a Point.ref to a Point.val without breaking the backward compatibility, so it's an issue for APIs. Point.ref (the "L" carrier) and Point.val (the "Q" carrier) are spelled differently from a VM perspective. So changing from one to the other is making a new API. The benefit of the approach we've landed on though, is that the difference should be small for API points as we can scalarize the identity-less L on the stack. For backwards compatibility, just leave it! Better to use the L in api signatures and limit the Q's to heap storage (fields and arrays). > > If your description of the world was true, then we do not need Q-type, the attribute Preload which say that a L-type is a value type is enough. > In that case, then the VM model and the language model you propose are more in sync. Preload and L-type give identity-less values flattening on the stack. That's part of the story. For heap flattening we still need the Q. I thought we covered this in the EG discussion. Are you just reading into the record the concerns raised in the meeting to get the answers captured? --Dan > > R?mi > > > > > > On 6/29/2022 10:38 AM, Remi Forax wrote: > > > > ________________________________ > > From: "Brian Goetz" > To: "Kevin Bourrillion" > Cc: "daniel smith" , "valhalla-spec-experts" > Sent: Thursday, June 23, 2022 9:01:24 PM > Subject: Re: User model stacking: current status > > > On 6/15/2022 12:41 PM, Kevin Bourrillion wrote: > > All else being equal, the idea to use "inaccessible value type" over "value type doesn't exist" feels very good and simplifying, with the main problem that the syntax can't help but be gross. > > > A few weeks in, and this latest stacking is still feeling pretty good: > > - There are no coarse buckets any more; there are just identity classes and value classes. > - Value classes have ref and val companion types with the obvious properties. (Notably, refs are always atomic.) > - For `value class C`, C as a type is an alias for `C.ref`. > - The bucket formerly known as B2 becomes "value class, whose .val type is private." This is the default for a value class. 
> - The bucket formerly known as B3a is denoted by explicitly making the val companion public, with a public modifier on a "member" of the class. > - The bucket formerly known as B3n is denoted by explicitly making the val companion public and non-atomic, again using modifiers. > > I went and updated the State of the Values document to use the new terminology, test-driving some new syntax. (Usual rules: syntax comments are premature at this time.) I was very pleased with the result, because almost all the changes were small changes in terminology (e.g., "value companion type"), and eliminating the clumsy distinction between value classes and primitive classes. Overall the structure remains the same, but feels more compact and clean. MD source is below, for review. > > Kevin's two questions remain, but I don't think they get in the way of refining the model in this way: > > - Have we made the right choices around == ? > - Are we missing a big opportunity by not spelling Complex.val with a bang? > > > I think you have done a good job describing the pro of that model but weirdly not list the cons of that model. > > I see three reasons your proposed model, let's call it the companion class model, needs improvements. > It fails our moto, the companion class model and the VM models are not aligned and the performance model is a "sigil for performance" model. > > > It fails our moto (code like a class, works like an int): > If i say that an Image is an array of pixels with each pixel have three colors, > the obvious translation is not the right one: > > class Image { > Pixel[][] pixels; > } > value record Pixel(Color red, Color green, Color blue) {} > value record Color(byte value) {} > > because a value class is nullable, only it's companion class is not nullable, the correct code is > class Image { > Pixel.val[][] pixels; > } > value record Pixel(Color.val red, Color.val green, Color.val blue) {} > value record Color(byte value) {} > > Color and byte does not work the same way, it's not code like a class works like an int but code like a class, works like an Integer. > > > The VM models and the Java model are not aligned: > For the VM model, L-type and Q-type on equal footing, not one is more important than the other, but the companion class model you propose makes the value class a first citizen and the companion class a second citizen. > We know that when the Java model and the VM model are not aligned, bugs will lie in between. Those can be mild bugs, by example you can throw a checked exception from a method not declaring that exception or painful bugs in the case of generics or serialization. > I think we should list all the cases where the Java Model and the VM model disagree to see the kind of bugs we will ask the future generation to solve. > By example, having a value class with a default constructor and public companion class looks like a lot like a deserialization bug to me, in both case you are able to produce an instance that bypass the constructor. > The other problem is for the other languages than Java. Do those languages will have to define a companion class or a companion class is purely a javac artifact the same way an attribute like InnerClass is. > > The proposed performance model is a "sigil for performance" model. > There is a tradeoff between the safety of the reference vs the performance of flattened value type. In the proposed model, the choice is not done by the maintainer of the class but by the user of the class. 
This is not fully true, the maintainer of the class can make the companion class private choosing safety but it can not choose performance. The performance has to be chosen by the user of the class. > This is unlike everything we know in Java, this kind of model where the user choose performance is usually called "sigil for performance", the user has to add some magical keywords or sigil to get performance. > A good example of such performance model is the keyword "register" in C. You have to opt-in at use site to get performance. > Moreover unlike in C, in Java we also have to take care of the fact that adding .val is not a backward compatible change, if a value class is used in a public method a user can not change it to its companion class after the fact. > We know from the errors of past that a "sigil for performance" model is a terrible model. > > Overall, i don't think it's the wrong model, but it over-rotates on the notion of reference value class, it's refreshing because in the past we had the tendency to over-rotate on the notion of flattened value class. > I really think that this model can be improved by allowing top-level value class to be declared either as reference or as value and the companion class to be either a value class projection or a reference class projection so the Java model and the VM model will be more in sync. > > R?mi > > > > > # State of Valhalla > ## Part 2: The Language Model {.subtitle} > > #### Brian Goetz {.author} > #### June 2022 {.date} > > > _This is the second of three documents describing the current State of > Valhalla. The first is [The Road to Valhalla](01-background); the > third is [The JVM Model](03-vm-model)._ > > This document describes the directions for the Java _language_ charted by > Project Valhalla. (In this document, we use "currently" to describe the > language as it stands today, without value classes.) > > Valhalla started with the goal of providing user-programmable classes which can > be flat and dense in memory. Numerics are one of the motivating use cases; > adding new primitive types directly to the language has a very high barrier. As > we learned from [Growing a Language][growing] there are infinitely many numeric > types we might want to add to Java, but the proper way to do that is via > libraries, not as a language feature. > > ## Primitive and reference types in Java today > > Java currently has eight built-in primitive types. Primitives represent pure > _values_; any `int` value of "3" is equivalent to, and indistinguishable from, > any other `int` value of "3". Primitives are monolithic (their bits cannot be > addressed individually) and have no canonical location, and so are _freely > copyable_. With the exception of the unusual treatment of exotic floating point > values such as `NaN`, the `==` operator performs a _substitutibility test_ -- it > asks "are these two values the same value". > > Java also has _objects_, and each object has a unique _object identity_. Because > of identity, objects are not freely copyable; each object lives in exactly one > place at any given time, and to access its state we have to go to that place. > But we mostly don't notice this because objects are not manipulated or accessed > directly, but instead through _object references_. Object references are also a > kind of value -- they encode the identity of the object to which they refer, and > the `==` operator on object references asks "do these two references refer to > the same object." 
Accordingly, object _references_ (like other values) can be > freely copied, but the objects they refer to cannot. > > Primitives and objects differ in almost every conceivable way: > > | Primitives | Objects | > | ------------------------------------------ | ---------------------------------- | > | No identity (pure values) | Identity | > | `==` compares values | `==` compares object identity | > | Built-in | Declared in classes | > | No members (fields, methods, constructors) | Members (including mutable fields) | > | No supertypes or subtypes | Class and interface inheritance | > | Accessed directly | Accessed via object references | > | Not nullable | Nullable | > | Default value is zero | Default value is null | > | Arrays are monomorphic | Arrays are covariant | > | May tear under race | Initialization safety guarantees | > | Have reference companions (boxes) | Don't need reference companions | > > The design of primitives represents various tradeoffs aimed at maximizing > performance and usability of the primtive types. Reference types default to > `null`, meaning "referring to no object"; primitives default to a usable zero > value (which for most primitives is the additive identity). Reference types > provide initialization safety guarantees against a certain category of data > races; primitives allow tearing under race for larger-than-32-bit values. > We could characterize the design principles behind these tradeoffs are "make > objects safer, make primitives faster." > > The following figure illustrates the current universe of Java's types. The > upper left quadrant is the built-in primitives; the rest of the space is > reference types. In the upper-right, we have the abstract reference types -- > abstract classes, interfaces, and `Object` (which, though concrete, acts more > like an interface than a concrete class). The built-in primitives have wrappers > or boxes, which are reference types. > >
> [figure: Current universe of Java field types]
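> To make the `==` rows of the table above concrete, a small illustrative
> snippet (the boxed case depends on the `Integer` cache, so "typically" rather
> than "always"):
>
> ```
> int i = 200, j = 200;
> System.out.println(i == j);         // true: primitive == compares values
>
> Integer a = Integer.valueOf(200);   // 200 is outside the default Integer cache,
> Integer b = Integer.valueOf(200);   // so these are typically two distinct boxes
> System.out.println(a == b);         // typically false: reference == compares identity
> System.out.println(a.equals(b));    // true: comparing the values needs equals()
> ```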
> > Valhalla aims to unify primitives and objects in that they can both be > declared with classes, but maintains the special runtime characteristics > primitives have. But while everyone likes the flatness and density that > user-definable value types promise, in some cases we want them to be more like > classical objects (nullable, non-tearable), and in other cases we want them to > be more like classical primitives (trading some safety for performance). > > ## Value classes: separating references from identity > > Many of the impediments to optimization that Valhalla seeks to remove center > around _unwanted object identity_. The primitive wrapper classes have identity, > but it is a purely accidental one. Not only is it not directly useful, it can > be a source of bugs. For example, due to caching, `Integer` can be accidentally > compared correctly with `==` just often enough that people keep doing it. > Similarly, [value-based classes][valuebased] such as `Optional` have no need for > identity, but pay the costs of having identity anyway. > > Our first step is allowing class declarations to explicitly disavow identity, by > declaring themselves as _value classes_. The instances of a value class are > called _value objects_. > > ``` > value class ArrayCursor { > T[] array; > int offset; > > public ArrayCursor(T[] array, int offset) { > this.array = array; > this.offset = offset; > } > > public boolean hasNext() { > return offset < array.length; > } > > public T next() { > return array[offset]; > } > > public ArrayCursor advance() { > return new ArrayCursor(array, offset+1); > } > } > ``` > > This says that an `ArrayCursor` is a class whose instances have no identity -- > that instead they have _value semantics_. As a consequence, it must give up the > things that depend on identity; the class and its fields are implicitly final. > > But, value classes are still classes, and can have most of the things classes > can have -- fields, methods, constructors, type parameters, superclasses (with > some restrictions), nested classes, class literals, interfaces, etc. The > classes they can extend are restricted: `Object` or abstract classes with no > instance fields, empty no-arg constructor bodies, no other constructors, no instance > initializers, no synchronized methods, and whose superclasses all meet this same > set of conditions. (`Number` meets these conditions.) > > Classes in Java give rise to types; the class `ArrayCursor` gives rise to a type > `ArrayCursor` (actually a parametric family of instantiations `ArrayCursor`.) > `ArrayCursor` is still a reference type, just one whose references refer to > value objects rather than identity objects. For the types in the upper-right > quadrant of the diagram (interfaces, abstract classes, and `Object`), references > to these types might refer to either an identity object or a value object. > (Historically, JVMs were effectively forced to represent object references with > pointers; for references to value objects, JVMs now have more flexibility.) > > Because `ArrayCursor` is a reference type, it is nullable (because references > are nullable), its default value is null, and loads and stores of references are > atomic with respect to each other even in the presence of data races, providing > the initialization safety we are used to with classical objects. > > Because instances of `ArrayCursor` have value semantics, `==` compares by state > rather than identity. 
This means that value objects, like primitives, are > _freely copyable_; we can explode them into their fields and re-aggregate them > into another value object, and we cannot tell the difference. (Because they > have no identity, some identity-sensitive operations, such as synchronization, > are disallowed.) > > So far we've addressed the first two lines of the table of differences above; > rather than identity being a property of all object instances, classes can > decide whether their instances have identity or not. By allowing classes that > don't need identity to exclude it, we free the runtime to make better layout and > compilation decisions -- and avoid a whole category of bugs. > > In looking at the code for `ArrayCursor`, we might mistakenly assume it will be > inefficient, as each loop iteration appears to allocate a new cursor: > > ``` > for (ArrayCursor c = Arrays.cursor(array); > c.hasNext(); > c = c.advance()) { > // use c.next(); > } > ``` > > One should generally expect here that _no_ cursors are actually allocated. > Because an `ArrayCursor` is just its two fields, these fields will routinely get > scalarized and hoisted into registers, and the constructor call in `advance` > will typically compile down to incrementing one of these registers. > > ### Migration > > The JDK (as well as other libraries) has many [value-based classes][valuebased] > such as `Optional` and `LocalDateTime`. Value-based classes adhere to the > semantic restrictions of value classes, but are still identity classes -- even > though they don't want to be. Value-based classes can be migrated to true value > classes simply by redeclaring them as value classes, which is both source- and > binary-compatible. > > We plan to migrate many value-based classes in the JDK to value classes. > Additionally, the primitive wrappers can be migrated to value classes as well, > making the conversion between `int` and `Integer` cheaper; see the section > "Legacy Primitives" below. (In some cases, this may be _behaviorally_ > incompatible for code that synchronizes on the primitive wrappers. [JEP > 390][jep390] has supported both compile-time and runtime warnings for > synchronizing on primitive wrappers since Java 16.) > >
> [figure: Java field types adding value classes]
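> As a sketch of what such a migration looks like (`Money` is an illustrative
> value-based class, not an actual JDK type):
>
> ```
> // Before: a value-based class -- final, immutable, equals/hashCode based on
> // state -- but still an identity class underneath
> final class Money {
>     private final long cents;
>     Money(long cents) { this.cents = cents; }
>     long cents() { return cents; }
> }
>
> // After: the same members, redeclared as a value class; source- and
> // binary-compatible, and instances no longer carry accidental identity
> value class Money {
>     private final long cents;
>     Money(long cents) { this.cents = cents; }
>     long cents() { return cents; }
> }
> ```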
> > ### Equality > > Earlier we said that `==` compares value objects by state rather than by > identity. More precisely, two value objects are `==` if they are of the same > type, and each of their fields are pairwise equal, where equality is given by > `==` for primitives (except `float` and `double`, which are compared with > `Float::equals` and `Double::equals` to avoid anomalies), `==` for references to > identity objects, and recursively with `==` for references to value objects. In > no case is a value object ever `==` to a reference to an identity object. > > ### Value records > > While records have a lot in common with value classes -- they are final and > their fields are final -- they are still identity classes. Records embody a > tradeoff: give up on decoupling the API from the representation, and in return > get various syntactic and semantic benefits. Value classes embody another > tradeoff: give up identity, and get various semantic and performance benefits. > If we are willing to give up both, we can get both sets of benefits. > > ``` > value record NameAndScore(String name, int score) { } > ``` > > Value records combine the data-carrier idiom of records with the improved > scalarization and flattening benefits of value classes. > > In theory, it would be possible to apply `value` to certain enums as well, but > this is not currently possible because the `java.lang.Enum` base class that > enums extend do not meet the requirements for superclasses of value classes (it > has fields and non-empty constructors). > > ## Unboxing values for flatness and density > > Value classes shed object identity, gaining a host of performance and > predictability benefits in the process. They are an ideal replacement for many > of today's value-based classes, fully preserving their semantics (except for the > accidental identity these classes never wanted). But identity-free reference > types are only one point a spectrum of tradeoffs between abstraction and > performance, and other desired use cases -- such as numerics -- may want a > different set of tradeoffs. > > Reference types are nullable, and therefore must account for null somehow in > their representation, which may involve additional footprint. Similarly, they > offer the initialization safety guarantees for final fields that we come to > expect from identity objects, which may entail limits on flatness. For certain > use cases, it may be desire to additionally give up something else to make > further flatness and footprint gains -- and that something else is > reference-ness. > > The built-in primitives are best understood as _pairs_ of types: a primitive > type (e.g., `int`) and its reference companion or box (`Integer`), with > conversions between the two (boxing and unboxing.) We have both types because > the two have different characteristics. Primitives are optimized for efficient > storage and access: they are not nullable, they tolerate uninitialized (zero) > values, and larger primitive types (`long`, `double`) may tear under racy > access. References err on the side of safety and flexibility; they support > nullity, polymorphism, and offer initialization safety (freedom from tearing), > but by comparison to primitives, they pay a footprint and indirection cost. > > For these reasons, value classes give rise to pairs of types as well: a > reference type and a _value companion type_. We've seen the reference type so > far; for a value class `Point`, the reference type is called `Point`. 
(The full > name for the reference type is `Point.ref`; `Point` is an alias for that.) The > value companion type is called `Point.val`, and the two types have the same > conversions between them as primitives do today with their boxes. (If we are > talking explicitly about the value companion type of a value class, we may > sometimes describe the corresponding reference type as its _reference > companion_.) > > ``` > value class Point implements Serializable { > int x; > int y; > > Point(int x, int y) { > this.x = x; > this.y = y; > } > > Point scale(int s) { > return new Point(s*x, s*y); > } > } > ``` > > The default value of the value companion type is the one for which all fields > take on their default value; the default value of the reference type is, like > all reference types, null. > > In our diagram, these new types show up as another entity that straddles the > line between primitives and identity-free references, alongside the legacy > primitives: > > ** UPDATE DIAGRAM ** > >
> [figure: Java field types with extended primitives]
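> A small illustration of the two defaults, using `Point` from above (`.val` is
> the provisional spelling of the value companion type):
>
> ```
> Point[] refs = new Point[2];          // reference type: elements default to null
> Point.val[] vals = new Point.val[2];  // value companion: elements default to Point(0, 0)
>
> Point p = refs[0];                    // p is null
> Point.val q = vals[0];                // q is the all-zeros Point: q.x == 0, q.y == 0
> ```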
> > ### Member access > > Both the reference and value companion types are seen to have the same instance > members. Unlike today's primitives, value companion types can be used as > receivers to access fields and invoke methods, subject to accessibility > constraints: > > ``` > Point.val p = new Point(1, 2); > assert p.x == 1; > > p = p.scale(2); > assert p.x == 2; > ``` > > ### Polymorphism > > When we declare a class today, we set up a subtyping (is-a) relationship between > the declared class and its supertypes. When we declare a value class, we set up > a subtyping relationship between the _reference type_ and the declared > supertypes. This means that if we declare: > > ``` > value class UnsignedShort extends Number > implements Comparable { > ... > } > ``` > > then `UnsignedShort` is a subtype of `Number` and `Comparable`, > and we can ask questions about subtyping using `instanceof` or pattern matching. > What happens if we ask such a question of the value companion type? > > ``` > UnsignedShort.val us = ... > if (us instanceof Number) { ... } > ``` > > Since subtyping is defined only on reference types, the `instanceof` operator > (and corresponding type patterns) will behave as if both sides were lifted to > the approrpriate reference type, and we can answer the question that way. (This > may trigger fears of expensive boxing conversions, but in reality no actual > allocation will happen.) > > We introduce a new relationship based on `extends` / `implements` clauses, which > we'll call "extends"; we define `A extends B` as meaning `A <: B` when A is a > reference type, and `A.ref <: B` when A is a value companion type. The > `instanceof` relation, reflection, and pattern matching are updated to use > "extends". > > ### Arrays > > Arrays of reference types are _covariant_; this means that if `A <: B`, then > `A[] <: B[]`. This allows `Object[]` to be the "top array type", at least for > arrays of references. But arrays of primitives are currently left out of this > story. We can unify the treatment of arrays by defining array covariance over > the new "extends" relationship; if A extends B, then `A[] <: B[]`. For a value > class P, `P.val[] <: P.ref[] <: Object[]`, finally making `Object[]` the top > type for all arrays. > > ### Equality > > Just as with `instanceof`, we define `==` on values by appealing to the > reference companion (though no actual boxing need occur). Evaluating `a == b`, > where one or both operands are of a value companion type, can be defined as if > the operands are first converted to their corresponding reference type, and then > comparing the results. This means that the following will succeed: > > ``` > Point.val p = new Point(3, 4); > Point pr = p; > assert p == pr; > ``` > > The base implementation of `Object::equals` delegates to `==`, which is a > suitable default for both reference and value classes. > > ### Serialization > > If a value class implements `Serializable`, this is also really a statement > about the reference type. Just as with other aspects described here, > serialization of value companions can be defined by converting to the > corresponding reference type and serializing that, and reversing the process at > deserialization time. > > Serialization currently uses object identity to preserve the topology of an > object graph. This generalizes cleanly to objects without identity, because > `==` on value objects treats two identical copies of a value object as equal. 
> So any observations we make about graph topology prior to serialization with > `==` are consistent with those after deserialization. > > ### Identity-sensitive operations > > Certain operations are currently defined in terms of object identity. As we've > already seen, some of these, like equality, can be sensibly extended to cover > all instances. Others, like synchronization, will become partial. > Identity-sensitive operations include: > > - **Equality.** We extend `==` on references to include references to value > objects. Where it currently has a meaning, the new definition coincides > with that meaning. > > - **System::identityHashCode.** The main use of `identityHashCode` is in the > implementation of data structures such as `IdentityHashMap`. We can extend > `identityHashCode` in the same way we extend equality -- deriving a hash on > primitive objects from the hash of all the fields. > > - **Synchronization.** This becomes a partial operation. If we can > statically detect that a synchronization will fail at runtime (including > declaring a `synchronized` method in a value class), we can issue a > compilation error; if not, attempts to lock on a value object results in > `IllegalMonitorStateException`. This is justifiable because it is > intrinsically imprudent to lock on an object for which you do not have a > clear understanding of its locking protocol; locking on an arbitrary > `Object` or interface instance is doing exactly that. > > - **Weak, soft, and phantom references.** Capturing an exotic reference to a > value object becomes a partial operation, as these are intrinsically tied to > reachability (and hence to identity). However, we will likely make > enhancements to `WeakHashMap` to support mixed identity and value keys. > > ### What about Object? > > The root class `Object` poses an unusual problem, in that every class must > extend it directly or indirectly, but it is also instantiable (non-abstract), > and its instances have identity -- it is common to use `new Object()` as a way > to obtain a new object identity for purposes of locking. > > ## Why two types? > > It is sensible to ask: why do we need companion types at all? This is analogous > to the need for boxes in 1995: we'd made one set of tradeoffs for primitives, > favoring performance (non-nullable, zero-default, tolerant of > non-initialization, tolerant of tearing under race, unrelated to `Object`), and > another for references, favoring flexibility and safety. Most of the time, we > ignored the primitive wrapper classes, but sometimes we needed to temporarily > suppress one of these properties, such as when interoperating with code that > expects an `Object` or the ability to express "no value". The reasons we needed > boxes in 1995 still apply today: sometimes we need the affordances of > references, and in those cases, we appeal to the reference companion. > > Reasons we might want to use the reference companion include: > > - **Interoperation with reference types.** Value classes can implement > interfaces and extend classes (including `Object` and some abstract classes), > which means some class and interface types are going to be polymorphic over > both identity and primitive objects. This polymorphism is achieved through > object references; a reference to `Object` may be a reference to an identity > object, or a reference to a value object. > > - **Nullability.** Nullability is an affordance of object _references_, not > objects themselves. 
Most of the time, it makes sense that primitive types > are non-nullable (as the primitives are today), but there may be situations > where null is a semantically important value. Using the reference companion > when nullability is required is semantically clear, and avoids the need to > invent new sentinel values for "no value." > > This need comes up when migrating existing classes; the method `Map::get` > uses `null` to signal that the requested key was not present in the map. But, > if the `V` parameter to `Map` is a primitive class, `null` is not a valid > value. We can capture the "`V` or null" requirement by changing the > descriptor of `Map::get` to: > > ``` > public V.ref get(K key); > ``` > > where, whatever type `V` is instantiated as, `Map::get` returns the reference > companion. (For a type `V` that already is a reference type, this is just `V` > itself.) This captures the notion that the return type of `Map::get` will > either be a reference to a `V`, or the `null` reference. (This is a > compatible change, since both erase to the same thing.) > > > - **Self-referential types.** Some types may want to directly or indirectly > refer to themselves, such as the "next" field in the node type of a linked > list: > > ``` > class Node { > T theValue; > Node nextNode; > } > ``` > > We might want to represent this as a value class, but if the type of > `nextNode` were `Node.val`, the layout of `Node` would be > self-referential, since we would be trying to flatten a `Node` into its own > layout. > > - **Protection from tearing.** For a value class with a non-atomic value > companion type, we may want to use the reference companion in cases where we > are concerned about tearing; because loads and stores of references are > atomic, `P.ref` is immune to the tearing under race that `P.val` might be > subject to. > > - **Compatibility with existing boxing.** Autoboxing is convenient, in that it > lets us pass a primitive where a reference is required. But boxing affects > far more than assignment conversion; it also affects method overload > selection. The rules are designed to prefer overloads that require no > conversions to those requiring boxing (or varargs) conversions. Having both > a value and reference type for every value class means that these rules can > be cleanly and intuitively extended to cover value classes. > > ## Refining the value companion > > Value classes have several options for refining the behavior of the value > companion type and how they are exposed to clients. > > ### Classes with no good default value > > For a value class `C`, the default value of `C.ref` is the same as any other > reference type: `null`. For the value companion type `C.val`, the default value > is the one where all of its fields are initialized to their default value. > > The built-in primitives reflect the design assumption that zero is a reasonable > default. The choice to use a zero default for uninitialized variables was one > of the central tradeoffs in the design of the built-in primitives. It gives us > a usable initial value (most of the time), and requires less storage footprint > than a representation that supports null (`int` uses all 2^32 of its bit > patterns, so a nullable `int` would have to either make some 32 bit signed > integers unrepresentable, or use a 33rd bit). This was a reasonable tradeoff > for the built-in primitives, and is also a reasonable tradeoff for many (but not > all) other potential value classes (such as complex numbers, 2D points, > half-floats, etc). 
> > But for others potential value classes, such as `LocalDate`, there _is_ no > reasonable default. If we choose to represent a date as the number of days > since some some epoch, there will invariably be bugs that stem from > uninitialized dates; we've all been mistakenly told by computers that something > will happen on or near 1 January 1970. Even if we could choose a default other > than the zero representation, an uninitialized date is still likely to be an > error -- there simply is no good default date value. > > For this reason, value classes have the choice of encapsulating or exposing > their value companion type. If the class is willing to tolerate an > uninitialized (zero) value, it can freely share its `.val` companion with the > world; if uninitialized values are dangerous (such as for `LocalDate`), it can > be encapsulated to the class or package. > > Encapsulation is accomplished using ordinary access control. By default, the > value companion is `private`, and need not be declared explicitly; a class that > wishes to share its value companion can make it public: > > ``` > public value record Complex(double real, double imag) { > public value companion Complex.val; > } > ``` > > ### Atomicity and tearing > > For the primitive types longer than 32 bits (long and double), it is not > guaranteed that reads and writes from different threads (without suitable > coordination) are atomic with respect to each other. The result is that, if > accessed under data race, a long or double field or array element can be seen to > "tear", and a read might see the low 32 bits of one write and the high 32 bits > of another. (Declaring the containing field `volatile` is sufficient to restore > atomicity, as is properly coordinating with locks or other concurrency control, > or not sharing across threads in the first place.) > > This was a pragmatic tradeoff given the hardware of the time; the cost of 64-bit > atomicity on 1995 hardware would have been prohibitive, and problems only arise > when the program already has data races -- and most numeric code deals with > thread-local data. Just like with the tradeoff of nulls vs zeros, the design of > the built-in primitives permits tearing as part of a tradeoff between > performance and correctness, where primitives chose "as fast as possible" and > reference types chose more safety. > > Today, most JVMs give us atomic loads and stores of 64-bit primitives, because > the hardware makes them cheap enough. But value classes bring us back to > 1995; atomic loads and stores of larger-than-64-bit values are still expensive > on many CPUs, leaving us with a choice of "make operations on primitives slower" > or permitting tearing when accessed under race. > > It would not be wise for the language to select a one-size-fits-all policy about > tearing; choosing "no tearing" means that types like `Complex` are slower than > they need to be, even in a single-threaded program; choosing "tearing" means > that classes like `Range` can be seen to not exhibit invariants asserted by > their constructor. Class authors have to choose, with full knowledge of their > domain, whether their types can tolerate tearing. 
The default is no tearing > (safe by default); a class can opt for greater flattening at the cost of > potential tearing by declaring the value companion as `non-atomic`: > > ``` > public value record Complex(double real, double imag) { > public non-atomic value companion Complex.val; > } > ``` > > For classes like `Complex`, all of whose bit patterns are valid, this is very > much like the choice around `long` in 1995. For other classes that might have > nontrivial representational invariants, they likely want to stick to the default > of atomicity. > > ## Migrating legacy primitives > > As part of generalizing primitives, we want to adjust the built-in primitives to > behave as consistently with value classes as possible. While we can't change > the fact that `int`'s reference companion is the oddly-named `Integer`, we can give them > more uniform aliases (`int.ref` is an alias for `Integer`; `int` is an alias for > `Integer.val`) -- so that we can use a consistent rule for naming companions. > Similarly, we can extend member access to the legacy primitives, and allow > `int[]` to be a subtype of `Integer[]` (and therefore of `Object[]`.) > > We will redeclare `Integer` as a value class with a public value companion: > > ``` > value class Integer { > public value companion Integer.val; > > // existing methods > } > ``` > > where the type name `int` is an alias for `Integer.val`. The primitive array > types will be retrofitted such that arrays of primitives are subtypes of arrays > of their boxes (`int[] <: Integer[]`). > > ## Unifying primitives with classes > > Earlier, we had a chart of the differences between primitive and reference > types: > > | Primitives | Objects | > | ------------------------------------------ | ---------------------------------- | > | No identity (pure values) | Identity | > | `==` compares values | `==` compares object identity | > | Built-in | Declared in classes | > | No members (fields, methods, constructors) | Members (including mutable fields) | > | No supertypes or subtypes | Class and interface inheritance | > | Accessed directly | Accessed via object references | > | Not nullable | Nullable | > | Default value is zero | Default value is null | > | Arrays are monomorphic | Arrays are covariant | > | May tear under race | Initialization safety guarantees | > | Have reference companions (boxes) | Don't need reference companions | > > The addition of value classes addresses many of these directly. Rather than > saying "classes have identity, primitives do not", we make identity an optional > characteristic of classes (and derive equality semantics from that.) Rather > than primitives being built in, we derive all types, including primitives, from > classes, and endow value companion types with the members and supertypes > declared with the value class. Rather than having primitive arrays be > monomorphic, we make all arrays covariant under the `extends` relation. > > The remaining differences now become differences between reference types and > value types: > > | Value types | Reference types | > | --------------------------------------------- | -------------------------------- | > | Accessed directly | Accessed via object references | > | Not nullable | Nullable | > | Default value is zero | Default value is null | > | May tear under race, if declared `non-atomic` | Initialization safety guarantees | > > > ### Choosing which to use > > How would we choose between declaring an identity class or a value class, and > the various options on value companiones? 
Here are some quick rules of thumb: > > - If you need mutability, subclassing, or aliasing, choose an identity class. > - If uninitialized (zero) values are unacceptable, choose a value class with > the value companion encapsulated. > - If you have no cross-field invariants and are willing to tolerate tearing to > enable more flattening, choose a value class with a non-atomic value > companion. > > ## Summary > > Valhalla unifies, to the extent possible, primitives and objects. The > following table summarizes the transition from the current world to Valhalla. > > | Current World | Valhalla | > | ------------------------------------------- | --------------------------------------------------------- | > | All objects have identity | Some objects have identity | > | Fixed, built-in set of primitives | Open-ended set of primitives, declared via classes | > | Primitives don't have methods or supertypes | Primitives are classes, with methods and supertypes | > | Primitives have ad-hoc boxes | Primitives have regularized reference companions | > | Boxes have accidental identity | Reference companions have no identity | > | Boxing and unboxing conversions | Primitive reference and value conversions, but same rules | > | Primitive arrays are monomorphic | All arrays are covariant | > > > [valuebased]: https://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html > [growing]: https://dl.acm.org/doi/abs/10.1145/1176617.1176621 > [jep390]: https://openjdk.java.net/jeps/390 > > > > From forax at univ-mlv.fr Thu Jun 30 11:52:02 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Thu, 30 Jun 2022 13:52:02 +0200 (CEST) Subject: User model stacking: current status In-Reply-To: References: <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> <1750a350-4e83-cd0e-8a52-44d193b766af@oracle.com> <2091250374.631588.1656513488696.JavaMail.zimbra@u-pem.fr> <6be1527f-5047-9e04-75cb-c8ecab131941@oracle.com> <2116995606.733122.1656518250393.JavaMail.zimbra@u-pem.fr> Message-ID: <1393328147.1270048.1656589922539.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "Dan Heidinga" > To: "Remi Forax" > Cc: "Brian Goetz" , "Kevin Bourrillion" , "daniel smith" > , "valhalla-spec-experts" > Sent: Wednesday, June 29, 2022 9:22:04 PM > Subject: Re: User model stacking: current status > On Wed, Jun 29, 2022 at 2:11 PM wrote: >> >> >> >> ________________________________ >> >> From: "Brian Goetz" >> To: "Remi Forax" >> Cc: "Kevin Bourrillion" , "daniel smith" >> , "valhalla-spec-experts" >> >> Sent: Wednesday, June 29, 2022 5:32:38 PM >> Subject: Re: User model stacking: current status >> >> I think you have done a good job describing the pro of that model but weirdly >> not list the cons of that model. >> >> >> I think we described the con pretty clearly: .val is ugly, and this puts it in >> people's face. This point was mentioned multiple times during the discussions. >> But the notable thing is: no one has raised other cons. The con is syntax. >> >> >> no, the major con is the fact that the model you propose and the VM model are >> not aligned. > > Didn't we cover this at the EG meeting today? The consensus was that > they *are* aligned. Both the VM and language default to the ref type > (ie: CONSTANT_Class "Foo" vs CONSTANT_Class "QFoo;") and other > examples discussed. Where is the misalignment? yes, > >> >> All your points here are basically a dressed-up version of this same issue: at >> least in some cases, some users will be grumpy that the good name goes to the >> thing they don't want. 
And this is a point we are painfully aware of, so none >> of this is particularly new. >> >> And we have explored all the positions on this (Point is ref, Point is val, let >> the user pick two names, let the declarer choose, etc), and they all have >> downsides. Specifically, we explored having `ref-default` and `val-default` as >> declaration-site options; this "gives the user more control" (developers love >> knobs!) But it also imposes a significant cognitive load on all developers: >> people no longer know what `Point` means. Is it nullable? Is it a reference? >> You have to look it up, or "carry around a mental database." >> >> >> Let suppose we offer a model with with ref-default and val-default at >> declaration site. >> In that case, is it a nullable or is it reference are questions from the past, >> nullable becomes less important because there is a notion of default value. >> And knowing if something is a reference or not is not something people really >> care. In Python, everything is reference, even integers, but nobody cares. Does >> VMs do escape analysis or not, noone care. What is important is if there is a >> difference in behavior between being a reference or not. >> Those questions that you have to carry around are only important if we make them >> important. > > The model we've been working towards is (roughly) expressed as "codes > like a class; works like an int" based on both user requirements and > the underlying vm physics. There is a difference between being a > reference and being a value - though we've done an incredible job of > bringing the benefits of Valhalla to non-identity reference types (a > bigger win than we expected when we started!). yes, > > I'm confused by your assertion that "nullable becomes less important > because there is a notion of default value." That default value - the > all zeros value that the VM paints on freshly allocated instances - is > something we've agreed many value classes want to encapsulate. That's > the whole story of "no good default" value classes. We've spent a lot > of time plumbing those depths before arriving at this point where such > NGD classes want to be expressed with references to ensure their "bad" > default values don't leak. So I'm kind of confused by this assertion. I would like to separate the concern about null, you have the perspective of the maintainer/writer of a class and the perspective of the user of a class. I was not talking about the maintainer POV which as to deal with the no good default class but from the user POV, that only need to deal with fields and array being initialized with the default value instead of null. I don't disagree with the current model, i think the model is not enough, not exposing a way to declare primary val classes (val is always secondary in the proposed model) is moving the burden to dealing with the val/ref world from the maintainer of a class to the users of a class. I will develop that in a later mail. > > Overall - we're winning more than we expected to with this model. > More cases can be scalarized on the stack than we initially thought > and we can still offer heap flattening for the smaller set of use > cases that really benefit from it. > >> >> You are judging your model with the questions of the past, not the questions we >> will have 10 years after the new model is introduced. > > As always, today's solutions are tomorrow's problems. Can you be more > specific about the questions you think will be asked in the next 10 > years so we can dig into those ? 
The proposed model is similar to the eclair model from the POV of the users of the value class. I think we did not do a good postmortem on why the eclair model failed from the user POV, because we discovered that the VM could be much smarter than we previously thought. So the proposed model exhibits the same issue. I will dig out my notes on the eclair model and rewrite them in terms of the current model.

>
>>
>> If anyone has the choices, then everyone has more responsibility. And given
>> that the performance differences between Point.ref and Point.val accrue pretty
>> much exclusively in the heap, which is to say, apply only to implementation
>> code and not API, sticking the implementation with this burden seems
>> reasonable.
>>
>>
>> no, you can not change a Point.ref to a Point.val without breaking the backward
>> compatibility, so it's an issue for APIs.
>
> Point.ref (the "L" carrier) and Point.val (the "Q" carrier) are
> spelled differently from a VM perspective. So changing from one to
> the other is making a new API. The benefit of the approach we've
> landed on though, is that the difference should be small for API
> points as we can scalarize the identity-less L on the stack. For
> backwards compatibility, just leave it! Better to use the L in api
> signatures and limit the Q's to heap storage (fields and arrays).

I think we can get both: I would like a Point.ref followed by an Objects.requireNonNull to be equivalent to a Point.val from the user POV. For example,

  public void foo(Point p) {
    Objects.requireNonNull(p);
    ...
  }

should be equivalent to

  public void foo(Point.val p) {
    ...
  }

This requires never having a Point.val in the method descriptor, and using the attribute TypeRestriction when Point.val is used. I believe this is the kind of heroic effort we will have to make so users can add ".val" to a parameter type of a method without thinking too much. Obviously, I would prefer a world where the maintainer of a value class has to deal with this kind of stuff instead of the users, but if we keep the proposed model, I think we will have to polish it around the edges.

>
>>
>> If your description of the world was true, then we do not need Q-type, the
>> attribute Preload which say that a L-type is a value type is enough.
>> In that case, then the VM model and the language model you propose are more in
>> sync.
>
> Preload and L-type give identity-less values flattening on the stack.
> That's part of the story. For heap flattening we still need the Q.

Yes, I had forgotten that we need Q-types for generics, as Brian reminded us during our meeting.

>
> I thought we covered this in the EG discussion. Are you just reading
> into the record the concerns raised in the meeting to get the answers
> captured ?

I think the meeting was very useful to me because I did not understand the proposed model correctly. I have another set of worries now, but, as I said, I want to comb through my notes before raising another set of concerns.
R?mi > > --Dan > >> >> R?mi >> >> >> >> >> >> On 6/29/2022 10:38 AM, Remi Forax wrote: >> >> >> >> ________________________________ >> >> From: "Brian Goetz" >> To: "Kevin Bourrillion" >> Cc: "daniel smith" , "valhalla-spec-experts" >> >> Sent: Thursday, June 23, 2022 9:01:24 PM >> Subject: Re: User model stacking: current status >> >> >> On 6/15/2022 12:41 PM, Kevin Bourrillion wrote: >> >> All else being equal, the idea to use "inaccessible value type" over "value type >> doesn't exist" feels very good and simplifying, with the main problem that the >> syntax can't help but be gross. >> >> >> A few weeks in, and this latest stacking is still feeling pretty good: >> >> - There are no coarse buckets any more; there are just identity classes and >> value classes. >> - Value classes have ref and val companion types with the obvious properties. >> (Notably, refs are always atomic.) >> - For `value class C`, C as a type is an alias for `C.ref`. >> - The bucket formerly known as B2 becomes "value class, whose .val type is >> private." This is the default for a value class. >> - The bucket formerly known as B3a is denoted by explicitly making the val >> companion public, with a public modifier on a "member" of the class. >> - The bucket formerly known as B3n is denoted by explicitly making the val >> companion public and non-atomic, again using modifiers. >> >> I went and updated the State of the Values document to use the new terminology, >> test-driving some new syntax. (Usual rules: syntax comments are premature at >> this time.) I was very pleased with the result, because almost all the changes >> were small changes in terminology (e.g., "value companion type"), and >> eliminating the clumsy distinction between value classes and primitive classes. >> Overall the structure remains the same, but feels more compact and clean. MD >> source is below, for review. >> >> Kevin's two questions remain, but I don't think they get in the way of refining >> the model in this way: >> >> - Have we made the right choices around == ? >> - Are we missing a big opportunity by not spelling Complex.val with a bang? >> >> >> I think you have done a good job describing the pro of that model but weirdly >> not list the cons of that model. >> >> I see three reasons your proposed model, let's call it the companion class >> model, needs improvements. >> It fails our moto, the companion class model and the VM models are not aligned >> and the performance model is a "sigil for performance" model. >> >> >> It fails our moto (code like a class, works like an int): >> If i say that an Image is an array of pixels with each pixel have three colors, >> the obvious translation is not the right one: >> >> class Image { >> Pixel[][] pixels; >> } >> value record Pixel(Color red, Color green, Color blue) {} >> value record Color(byte value) {} >> >> because a value class is nullable, only it's companion class is not nullable, >> the correct code is >> class Image { >> Pixel.val[][] pixels; >> } >> value record Pixel(Color.val red, Color.val green, Color.val blue) {} >> value record Color(byte value) {} >> >> Color and byte does not work the same way, it's not code like a class works like >> an int but code like a class, works like an Integer. >> >> >> The VM models and the Java model are not aligned: >> For the VM model, L-type and Q-type on equal footing, not one is more important >> than the other, but the companion class model you propose makes the value class >> a first citizen and the companion class a second citizen. 
>> We know that when the Java model and the VM model are not aligned, bugs will lie >> in between. Those can be mild bugs, by example you can throw a checked >> exception from a method not declaring that exception or painful bugs in the >> case of generics or serialization. >> I think we should list all the cases where the Java Model and the VM model >> disagree to see the kind of bugs we will ask the future generation to solve. >> By example, having a value class with a default constructor and public companion >> class looks like a lot like a deserialization bug to me, in both case you are >> able to produce an instance that bypass the constructor. >> The other problem is for the other languages than Java. Do those languages will >> have to define a companion class or a companion class is purely a javac >> artifact the same way an attribute like InnerClass is. >> >> The proposed performance model is a "sigil for performance" model. >> There is a tradeoff between the safety of the reference vs the performance of >> flattened value type. In the proposed model, the choice is not done by the >> maintainer of the class but by the user of the class. This is not fully true, >> the maintainer of the class can make the companion class private choosing >> safety but it can not choose performance. The performance has to be chosen by >> the user of the class. >> This is unlike everything we know in Java, this kind of model where the user >> choose performance is usually called "sigil for performance", the user has to >> add some magical keywords or sigil to get performance. >> A good example of such performance model is the keyword "register" in C. You >> have to opt-in at use site to get performance. >> Moreover unlike in C, in Java we also have to take care of the fact that adding >> .val is not a backward compatible change, if a value class is used in a public >> method a user can not change it to its companion class after the fact. >> We know from the errors of past that a "sigil for performance" model is a >> terrible model. >> >> Overall, i don't think it's the wrong model, but it over-rotates on the notion >> of reference value class, it's refreshing because in the past we had the >> tendency to over-rotate on the notion of flattened value class. >> I really think that this model can be improved by allowing top-level value class >> to be declared either as reference or as value and the companion class to be >> either a value class projection or a reference class projection so the Java >> model and the VM model will be more in sync. >> >> R?mi >> >> >> >> >> # State of Valhalla >> ## Part 2: The Language Model {.subtitle} >> >> #### Brian Goetz {.author} >> #### June 2022 {.date} >> >> > _This is the second of three documents describing the current State of >> Valhalla. The first is [The Road to Valhalla](01-background); the >> third is [The JVM Model](03-vm-model)._ >> >> This document describes the directions for the Java _language_ charted by >> Project Valhalla. (In this document, we use "currently" to describe the >> language as it stands today, without value classes.) >> >> Valhalla started with the goal of providing user-programmable classes which can >> be flat and dense in memory. Numerics are one of the motivating use cases; >> adding new primitive types directly to the language has a very high barrier. 
As >> we learned from [Growing a Language][growing] there are infinitely many numeric >> types we might want to add to Java, but the proper way to do that is via >> libraries, not as a language feature. >> >> ## Primitive and reference types in Java today >> >> Java currently has eight built-in primitive types. Primitives represent pure >> _values_; any `int` value of "3" is equivalent to, and indistinguishable from, >> any other `int` value of "3". Primitives are monolithic (their bits cannot be >> addressed individually) and have no canonical location, and so are _freely >> copyable_. With the exception of the unusual treatment of exotic floating point >> values such as `NaN`, the `==` operator performs a _substitutibility test_ -- it >> asks "are these two values the same value". >> >> Java also has _objects_, and each object has a unique _object identity_. Because >> of identity, objects are not freely copyable; each object lives in exactly one >> place at any given time, and to access its state we have to go to that place. >> But we mostly don't notice this because objects are not manipulated or accessed >> directly, but instead through _object references_. Object references are also a >> kind of value -- they encode the identity of the object to which they refer, and >> the `==` operator on object references asks "do these two references refer to >> the same object." Accordingly, object _references_ (like other values) can be >> freely copied, but the objects they refer to cannot. >> >> Primitives and objects differ in almost every conceivable way: >> >> | Primitives | Objects >> | | >> | ------------------------------------------ | ---------------------------------- >> | | >> | No identity (pure values) | Identity >> | | >> | `==` compares values | `==` compares object identity >> | | >> | Built-in | Declared in classes >> | | >> | No members (fields, methods, constructors) | Members (including mutable fields) >> | | >> | No supertypes or subtypes | Class and interface inheritance >> | | >> | Accessed directly | Accessed via object references >> | | >> | Not nullable | Nullable >> | | >> | Default value is zero | Default value is null >> | | >> | Arrays are monomorphic | Arrays are covariant >> | | >> | May tear under race | Initialization safety guarantees >> | | >> | Have reference companions (boxes) | Don't need reference companions >> | | >> >> The design of primitives represents various tradeoffs aimed at maximizing >> performance and usability of the primtive types. Reference types default to >> `null`, meaning "referring to no object"; primitives default to a usable zero >> value (which for most primitives is the additive identity). Reference types >> provide initialization safety guarantees against a certain category of data >> races; primitives allow tearing under race for larger-than-32-bit values. >> We could characterize the design principles behind these tradeoffs are "make >> objects safer, make primitives faster." >> >> The following figure illustrates the current universe of Java's types. The >> upper left quadrant is the built-in primitives; the rest of the space is >> reference types. In the upper-right, we have the abstract reference types -- >> abstract classes, interfaces, and `Object` (which, though concrete, acts more >> like an interface than a concrete class). The built-in primitives have wrappers >> or boxes, which are reference types. >> >>
>> >> Current universe of Java field types >> >>
>> >> Valhalla aims to unify primitives and objects in that they can both be >> declared with classes, but maintains the special runtime characteristics >> primitives have. But while everyone likes the flatness and density that >> user-definable value types promise, in some cases we want them to be more like >> classical objects (nullable, non-tearable), and in other cases we want them to >> be more like classical primitives (trading some safety for performance). >> >> ## Value classes: separating references from identity >> >> Many of the impediments to optimization that Valhalla seeks to remove center >> around _unwanted object identity_. The primitive wrapper classes have identity, >> but it is a purely accidental one. Not only is it not directly useful, it can >> be a source of bugs. For example, due to caching, `Integer` can be accidentally >> compared correctly with `==` just often enough that people keep doing it. >> Similarly, [value-based classes][valuebased] such as `Optional` have no need for >> identity, but pay the costs of having identity anyway. >> >> Our first step is allowing class declarations to explicitly disavow identity, by >> declaring themselves as _value classes_. The instances of a value class are >> called _value objects_. >> >> ``` >> value class ArrayCursor { >> T[] array; >> int offset; >> >> public ArrayCursor(T[] array, int offset) { >> this.array = array; >> this.offset = offset; >> } >> >> public boolean hasNext() { >> return offset < array.length; >> } >> >> public T next() { >> return array[offset]; >> } >> >> public ArrayCursor advance() { >> return new ArrayCursor(array, offset+1); >> } >> } >> ``` >> >> This says that an `ArrayCursor` is a class whose instances have no identity -- >> that instead they have _value semantics_. As a consequence, it must give up the >> things that depend on identity; the class and its fields are implicitly final. >> >> But, value classes are still classes, and can have most of the things classes >> can have -- fields, methods, constructors, type parameters, superclasses (with >> some restrictions), nested classes, class literals, interfaces, etc. The >> classes they can extend are restricted: `Object` or abstract classes with no >> instance fields, empty no-arg constructor bodies, no other constructors, no >> instance >> initializers, no synchronized methods, and whose superclasses all meet this same >> set of conditions. (`Number` meets these conditions.) >> >> Classes in Java give rise to types; the class `ArrayCursor` gives rise to a type >> `ArrayCursor` (actually a parametric family of instantiations `ArrayCursor`.) >> `ArrayCursor` is still a reference type, just one whose references refer to >> value objects rather than identity objects. For the types in the upper-right >> quadrant of the diagram (interfaces, abstract classes, and `Object`), references >> to these types might refer to either an identity object or a value object. >> (Historically, JVMs were effectively forced to represent object references with >> pointers; for references to value objects, JVMs now have more flexibility.) >> >> Because `ArrayCursor` is a reference type, it is nullable (because references >> are nullable), its default value is null, and loads and stores of references are >> atomic with respect to each other even in the presence of data races, providing >> the initialization safety we are used to with classical objects. >> >> Because instances of `ArrayCursor` have value semantics, `==` compares by state >> rather than identity. 
This means that value objects, like primitives, are >> _freely copyable_; we can explode them into their fields and re-aggregate them >> into another value object, and we cannot tell the difference. (Because they >> have no identity, some identity-sensitive operations, such as synchronization, >> are disallowed.) >> >> So far we've addressed the first two lines of the table of differences above; >> rather than identity being a property of all object instances, classes can >> decide whether their instances have identity or not. By allowing classes that >> don't need identity to exclude it, we free the runtime to make better layout and >> compilation decisions -- and avoid a whole category of bugs. >> >> In looking at the code for `ArrayCursor`, we might mistakenly assume it will be >> inefficient, as each loop iteration appears to allocate a new cursor: >> >> ``` >> for (ArrayCursor c = Arrays.cursor(array); >> c.hasNext(); >> c = c.advance()) { >> // use c.next(); >> } >> ``` >> >> One should generally expect here that _no_ cursors are actually allocated. >> Because an `ArrayCursor` is just its two fields, these fields will routinely get >> scalarized and hoisted into registers, and the constructor call in `advance` >> will typically compile down to incrementing one of these registers. >> >> ### Migration >> >> The JDK (as well as other libraries) has many [value-based classes][valuebased] >> such as `Optional` and `LocalDateTime`. Value-based classes adhere to the >> semantic restrictions of value classes, but are still identity classes -- even >> though they don't want to be. Value-based classes can be migrated to true value >> classes simply by redeclaring them as value classes, which is both source- and >> binary-compatible. >> >> We plan to migrate many value-based classes in the JDK to value classes. >> Additionally, the primitive wrappers can be migrated to value classes as well, >> making the conversion between `int` and `Integer` cheaper; see the section >> "Legacy Primitives" below. (In some cases, this may be _behaviorally_ >> incompatible for code that synchronizes on the primitive wrappers. [JEP >> 390][jep390] has supported both compile-time and runtime warnings for >> synchronizing on primitive wrappers since Java 16.) >> >>
>> >> Java field types adding value classes >> >>
>> >> ### Equality >> >> Earlier we said that `==` compares value objects by state rather than by >> identity. More precisely, two value objects are `==` if they are of the same >> type, and each of their fields are pairwise equal, where equality is given by >> `==` for primitives (except `float` and `double`, which are compared with >> `Float::equals` and `Double::equals` to avoid anomalies), `==` for references to >> identity objects, and recursively with `==` for references to value objects. In >> no case is a value object ever `==` to a reference to an identity object. >> >> ### Value records >> >> While records have a lot in common with value classes -- they are final and >> their fields are final -- they are still identity classes. Records embody a >> tradeoff: give up on decoupling the API from the representation, and in return >> get various syntactic and semantic benefits. Value classes embody another >> tradeoff: give up identity, and get various semantic and performance benefits. >> If we are willing to give up both, we can get both sets of benefits. >> >> ``` >> value record NameAndScore(String name, int score) { } >> ``` >> >> Value records combine the data-carrier idiom of records with the improved >> scalarization and flattening benefits of value classes. >> >> In theory, it would be possible to apply `value` to certain enums as well, but >> this is not currently possible because the `java.lang.Enum` base class that >> enums extend do not meet the requirements for superclasses of value classes (it >> has fields and non-empty constructors). >> >> ## Unboxing values for flatness and density >> >> Value classes shed object identity, gaining a host of performance and >> predictability benefits in the process. They are an ideal replacement for many >> of today's value-based classes, fully preserving their semantics (except for the >> accidental identity these classes never wanted). But identity-free reference >> types are only one point a spectrum of tradeoffs between abstraction and >> performance, and other desired use cases -- such as numerics -- may want a >> different set of tradeoffs. >> >> Reference types are nullable, and therefore must account for null somehow in >> their representation, which may involve additional footprint. Similarly, they >> offer the initialization safety guarantees for final fields that we come to >> expect from identity objects, which may entail limits on flatness. For certain >> use cases, it may be desire to additionally give up something else to make >> further flatness and footprint gains -- and that something else is >> reference-ness. >> >> The built-in primitives are best understood as _pairs_ of types: a primitive >> type (e.g., `int`) and its reference companion or box (`Integer`), with >> conversions between the two (boxing and unboxing.) We have both types because >> the two have different characteristics. Primitives are optimized for efficient >> storage and access: they are not nullable, they tolerate uninitialized (zero) >> values, and larger primitive types (`long`, `double`) may tear under racy >> access. References err on the side of safety and flexibility; they support >> nullity, polymorphism, and offer initialization safety (freedom from tearing), >> but by comparison to primitives, they pay a footprint and indirection cost. >> >> For these reasons, value classes give rise to pairs of types as well: a >> reference type and a _value companion type_. 
We've seen the reference type so >> far; for a value class `Point`, the reference type is called `Point`. (The full >> name for the reference type is `Point.ref`; `Point` is an alias for that.) The >> value companion type is called `Point.val`, and the two types have the same >> conversions between them as primitives do today with their boxes. (If we are >> talking explicitly about the value companion type of a value class, we may >> sometimes describe the corresponding reference type as its _reference >> companion_.) >> >> ``` >> value class Point implements Serializable { >> int x; >> int y; >> >> Point(int x, int y) { >> this.x = x; >> this.y = y; >> } >> >> Point scale(int s) { >> return new Point(s*x, s*y); >> } >> } >> ``` >> >> The default value of the value companion type is the one for which all fields >> take on their default value; the default value of the reference type is, like >> all reference types, null. >> >> In our diagram, these new types show up as another entity that straddles the >> line between primitives and identity-free references, alongside the legacy >> primitives: >> >> ** UPDATE DIAGRAM ** >> >>
>> Java field types with extended primitives
>>
>> >> ### Member access >> >> Both the reference and value companion types are seen to have the same instance >> members. Unlike today's primitives, value companion types can be used as >> receivers to access fields and invoke methods, subject to accessibility >> constraints: >> >> ``` >> Point.val p = new Point(1, 2); >> assert p.x == 1; >> >> p = p.scale(2); >> assert p.x == 2; >> ``` >> >> ### Polymorphism >> >> When we declare a class today, we set up a subtyping (is-a) relationship between >> the declared class and its supertypes. When we declare a value class, we set up >> a subtyping relationship between the _reference type_ and the declared >> supertypes. This means that if we declare: >> >> ``` >> value class UnsignedShort extends Number >> implements Comparable { >> ... >> } >> ``` >> >> then `UnsignedShort` is a subtype of `Number` and `Comparable`, >> and we can ask questions about subtyping using `instanceof` or pattern matching. >> What happens if we ask such a question of the value companion type? >> >> ``` >> UnsignedShort.val us = ... >> if (us instanceof Number) { ... } >> ``` >> >> Since subtyping is defined only on reference types, the `instanceof` operator >> (and corresponding type patterns) will behave as if both sides were lifted to >> the approrpriate reference type, and we can answer the question that way. (This >> may trigger fears of expensive boxing conversions, but in reality no actual >> allocation will happen.) >> >> We introduce a new relationship based on `extends` / `implements` clauses, which >> we'll call "extends"; we define `A extends B` as meaning `A <: B` when A is a >> reference type, and `A.ref <: B` when A is a value companion type. The >> `instanceof` relation, reflection, and pattern matching are updated to use >> "extends". >> >> ### Arrays >> >> Arrays of reference types are _covariant_; this means that if `A <: B`, then >> `A[] <: B[]`. This allows `Object[]` to be the "top array type", at least for >> arrays of references. But arrays of primitives are currently left out of this >> story. We can unify the treatment of arrays by defining array covariance over >> the new "extends" relationship; if A extends B, then `A[] <: B[]`. For a value >> class P, `P.val[] <: P.ref[] <: Object[]`, finally making `Object[]` the top >> type for all arrays. >> >> ### Equality >> >> Just as with `instanceof`, we define `==` on values by appealing to the >> reference companion (though no actual boxing need occur). Evaluating `a == b`, >> where one or both operands are of a value companion type, can be defined as if >> the operands are first converted to their corresponding reference type, and then >> comparing the results. This means that the following will succeed: >> >> ``` >> Point.val p = new Point(3, 4); >> Point pr = p; >> assert p == pr; >> ``` >> >> The base implementation of `Object::equals` delegates to `==`, which is a >> suitable default for both reference and value classes. >> >> ### Serialization >> >> If a value class implements `Serializable`, this is also really a statement >> about the reference type. Just as with other aspects described here, >> serialization of value companions can be defined by converting to the >> corresponding reference type and serializing that, and reversing the process at >> deserialization time. >> >> Serialization currently uses object identity to preserve the topology of an >> object graph. 
This generalizes cleanly to objects without identity, because >> `==` on value objects treats two identical copies of a value object as equal. >> So any observations we make about graph topology prior to serialization with >> `==` are consistent with those after deserialization. >> >> ### Identity-sensitive operations >> >> Certain operations are currently defined in terms of object identity. As we've >> already seen, some of these, like equality, can be sensibly extended to cover >> all instances. Others, like synchronization, will become partial. >> Identity-sensitive operations include: >> >> - **Equality.** We extend `==` on references to include references to value >> objects. Where it currently has a meaning, the new definition coincides >> with that meaning. >> >> - **System::identityHashCode.** The main use of `identityHashCode` is in the >> implementation of data structures such as `IdentityHashMap`. We can extend >> `identityHashCode` in the same way we extend equality -- deriving a hash on >> primitive objects from the hash of all the fields. >> >> - **Synchronization.** This becomes a partial operation. If we can >> statically detect that a synchronization will fail at runtime (including >> declaring a `synchronized` method in a value class), we can issue a >> compilation error; if not, attempts to lock on a value object results in >> `IllegalMonitorStateException`. This is justifiable because it is >> intrinsically imprudent to lock on an object for which you do not have a >> clear understanding of its locking protocol; locking on an arbitrary >> `Object` or interface instance is doing exactly that. >> >> - **Weak, soft, and phantom references.** Capturing an exotic reference to a >> value object becomes a partial operation, as these are intrinsically tied to >> reachability (and hence to identity). However, we will likely make >> enhancements to `WeakHashMap` to support mixed identity and value keys. >> >> ### What about Object? >> >> The root class `Object` poses an unusual problem, in that every class must >> extend it directly or indirectly, but it is also instantiable (non-abstract), >> and its instances have identity -- it is common to use `new Object()` as a way >> to obtain a new object identity for purposes of locking. >> >> ## Why two types? >> >> It is sensible to ask: why do we need companion types at all? This is analogous >> to the need for boxes in 1995: we'd made one set of tradeoffs for primitives, >> favoring performance (non-nullable, zero-default, tolerant of >> non-initialization, tolerant of tearing under race, unrelated to `Object`), and >> another for references, favoring flexibility and safety. Most of the time, we >> ignored the primitive wrapper classes, but sometimes we needed to temporarily >> suppress one of these properties, such as when interoperating with code that >> expects an `Object` or the ability to express "no value". The reasons we needed >> boxes in 1995 still apply today: sometimes we need the affordances of >> references, and in those cases, we appeal to the reference companion. >> >> Reasons we might want to use the reference companion include: >> >> - **Interoperation with reference types.** Value classes can implement >> interfaces and extend classes (including `Object` and some abstract classes), >> which means some class and interface types are going to be polymorphic over >> both identity and primitive objects. 
This polymorphism is achieved through >> object references; a reference to `Object` may be a reference to an identity >> object, or a reference to a value object. >> >> - **Nullability.** Nullability is an affordance of object _references_, not >> objects themselves. Most of the time, it makes sense that primitive types >> are non-nullable (as the primitives are today), but there may be situations >> where null is a semantically important value. Using the reference companion >> when nullability is required is semantically clear, and avoids the need to >> invent new sentinel values for "no value." >> >> This need comes up when migrating existing classes; the method `Map::get` >> uses `null` to signal that the requested key was not present in the map. But, >> if the `V` parameter to `Map` is a primitive class, `null` is not a valid >> value. We can capture the "`V` or null" requirement by changing the >> descriptor of `Map::get` to: >> >> ``` >> public V.ref get(K key); >> ``` >> >> where, whatever type `V` is instantiated as, `Map::get` returns the reference >> companion. (For a type `V` that already is a reference type, this is just `V` >> itself.) This captures the notion that the return type of `Map::get` will >> either be a reference to a `V`, or the `null` reference. (This is a >> compatible change, since both erase to the same thing.) >> >> >> - **Self-referential types.** Some types may want to directly or indirectly >> refer to themselves, such as the "next" field in the node type of a linked >> list: >> >> ``` >> class Node { >> T theValue; >> Node nextNode; >> } >> ``` >> >> We might want to represent this as a value class, but if the type of >> `nextNode` were `Node.val`, the layout of `Node` would be >> self-referential, since we would be trying to flatten a `Node` into its own >> layout. >> >> - **Protection from tearing.** For a value class with a non-atomic value >> companion type, we may want to use the reference companion in cases where we >> are concerned about tearing; because loads and stores of references are >> atomic, `P.ref` is immune to the tearing under race that `P.val` might be >> subject to. >> >> - **Compatibility with existing boxing.** Autoboxing is convenient, in that it >> lets us pass a primitive where a reference is required. But boxing affects >> far more than assignment conversion; it also affects method overload >> selection. The rules are designed to prefer overloads that require no >> conversions to those requiring boxing (or varargs) conversions. Having both >> a value and reference type for every value class means that these rules can >> be cleanly and intuitively extended to cover value classes. >> >> ## Refining the value companion >> >> Value classes have several options for refining the behavior of the value >> companion type and how they are exposed to clients. >> >> ### Classes with no good default value >> >> For a value class `C`, the default value of `C.ref` is the same as any other >> reference type: `null`. For the value companion type `C.val`, the default value >> is the one where all of its fields are initialized to their default value. >> >> The built-in primitives reflect the design assumption that zero is a reasonable >> default. The choice to use a zero default for uninitialized variables was one >> of the central tradeoffs in the design of the built-in primitives. 
It gives us >> a usable initial value (most of the time), and requires less storage footprint >> than a representation that supports null (`int` uses all 2^32 of its bit >> patterns, so a nullable `int` would have to either make some 32 bit signed >> integers unrepresentable, or use a 33rd bit). This was a reasonable tradeoff >> for the built-in primitives, and is also a reasonable tradeoff for many (but not >> all) other potential value classes (such as complex numbers, 2D points, >> half-floats, etc). >> >> But for others potential value classes, such as `LocalDate`, there _is_ no >> reasonable default. If we choose to represent a date as the number of days >> since some some epoch, there will invariably be bugs that stem from >> uninitialized dates; we've all been mistakenly told by computers that something >> will happen on or near 1 January 1970. Even if we could choose a default other >> than the zero representation, an uninitialized date is still likely to be an >> error -- there simply is no good default date value. >> >> For this reason, value classes have the choice of encapsulating or exposing >> their value companion type. If the class is willing to tolerate an >> uninitialized (zero) value, it can freely share its `.val` companion with the >> world; if uninitialized values are dangerous (such as for `LocalDate`), it can >> be encapsulated to the class or package. >> >> Encapsulation is accomplished using ordinary access control. By default, the >> value companion is `private`, and need not be declared explicitly; a class that >> wishes to share its value companion can make it public: >> >> ``` >> public value record Complex(double real, double imag) { >> public value companion Complex.val; >> } >> ``` >> >> ### Atomicity and tearing >> >> For the primitive types longer than 32 bits (long and double), it is not >> guaranteed that reads and writes from different threads (without suitable >> coordination) are atomic with respect to each other. The result is that, if >> accessed under data race, a long or double field or array element can be seen to >> "tear", and a read might see the low 32 bits of one write and the high 32 bits >> of another. (Declaring the containing field `volatile` is sufficient to restore >> atomicity, as is properly coordinating with locks or other concurrency control, >> or not sharing across threads in the first place.) >> >> This was a pragmatic tradeoff given the hardware of the time; the cost of 64-bit >> atomicity on 1995 hardware would have been prohibitive, and problems only arise >> when the program already has data races -- and most numeric code deals with >> thread-local data. Just like with the tradeoff of nulls vs zeros, the design of >> the built-in primitives permits tearing as part of a tradeoff between >> performance and correctness, where primitives chose "as fast as possible" and >> reference types chose more safety. >> >> Today, most JVMs give us atomic loads and stores of 64-bit primitives, because >> the hardware makes them cheap enough. But value classes bring us back to >> 1995; atomic loads and stores of larger-than-64-bit values are still expensive >> on many CPUs, leaving us with a choice of "make operations on primitives slower" >> or permitting tearing when accessed under race. 
>> >> It would not be wise for the language to select a one-size-fits-all policy about >> tearing; choosing "no tearing" means that types like `Complex` are slower than >> they need to be, even in a single-threaded program; choosing "tearing" means >> that classes like `Range` can be seen to not exhibit invariants asserted by >> their constructor. Class authors have to choose, with full knowledge of their >> domain, whether their types can tolerate tearing. The default is no tearing >> (safe by default); a class can opt for greater flattening at the cost of >> potential tearing by declaring the value companion as `non-atomic`: >> >> ``` >> public value record Complex(double real, double imag) { >> public non-atomic value companion Complex.val; >> } >> ``` >> >> For classes like `Complex`, all of whose bit patterns are valid, this is very >> much like the choice around `long` in 1995. For other classes that might have >> nontrivial representational invariants, they likely want to stick to the default >> of atomicity. >> >> ## Migrating legacy primitives >> >> As part of generalizing primitives, we want to adjust the built-in primitives to >> behave as consistently with value classes as possible. While we can't change >> the fact that `int`'s reference companion is the oddly-named `Integer`, we can >> give them >> more uniform aliases (`int.ref` is an alias for `Integer`; `int` is an alias for >> `Integer.val`) -- so that we can use a consistent rule for naming companions. >> Similarly, we can extend member access to the legacy primitives, and allow >> `int[]` to be a subtype of `Integer[]` (and therefore of `Object[]`.) >> >> We will redeclare `Integer` as a value class with a public value companion: >> >> ``` >> value class Integer { >> public value companion Integer.val; >> >> // existing methods >> } >> ``` >> >> where the type name `int` is an alias for `Integer.val`. The primitive array >> types will be retrofitted such that arrays of primitives are subtypes of arrays >> of their boxes (`int[] <: Integer[]`). >> >> ## Unifying primitives with classes >> >> Earlier, we had a chart of the differences between primitive and reference >> types: >> >> | Primitives | Objects >> | | >> | ------------------------------------------ | ---------------------------------- >> | | >> | No identity (pure values) | Identity >> | | >> | `==` compares values | `==` compares object identity >> | | >> | Built-in | Declared in classes >> | | >> | No members (fields, methods, constructors) | Members (including mutable fields) >> | | >> | No supertypes or subtypes | Class and interface inheritance >> | | >> | Accessed directly | Accessed via object references >> | | >> | Not nullable | Nullable >> | | >> | Default value is zero | Default value is null >> | | >> | Arrays are monomorphic | Arrays are covariant >> | | >> | May tear under race | Initialization safety guarantees >> | | >> | Have reference companions (boxes) | Don't need reference companions >> | | >> >> The addition of value classes addresses many of these directly. Rather than >> saying "classes have identity, primitives do not", we make identity an optional >> characteristic of classes (and derive equality semantics from that.) Rather >> than primitives being built in, we derive all types, including primitives, from >> classes, and endow value companion types with the members and supertypes >> declared with the value class. Rather than having primitive arrays be >> monomorphic, we make all arrays covariant under the `extends` relation. 
>> >> The remaining differences now become differences between reference types and >> value types: >> >> | Value types | Reference types >> | | >> | --------------------------------------------- | -------------------------------- >> | | >> | Accessed directly | Accessed via object references >> | | >> | Not nullable | Nullable >> | | >> | Default value is zero | Default value is null >> | | >> | May tear under race, if declared `non-atomic` | Initialization safety guarantees >> | | >> >> >> ### Choosing which to use >> >> How would we choose between declaring an identity class or a value class, and >> the various options on value companiones? Here are some quick rules of thumb: >> >> - If you need mutability, subclassing, or aliasing, choose an identity class. >> - If uninitialized (zero) values are unacceptable, choose a value class with >> the value companion encapsulated. >> - If you have no cross-field invariants and are willing to tolerate tearing to >> enable more flattening, choose a value class with a non-atomic value >> companion. >> >> ## Summary >> >> Valhalla unifies, to the extent possible, primitives and objects. The >> following table summarizes the transition from the current world to Valhalla. >> >> | Current World | Valhalla >> | | >> | ------------------------------------------- | >> | --------------------------------------------------------- | >> | All objects have identity | Some objects have identity >> | | >> | Fixed, built-in set of primitives | Open-ended set of primitives, >> | declared via classes | >> | Primitives don't have methods or supertypes | Primitives are classes, with >> | methods and supertypes | >> | Primitives have ad-hoc boxes | Primitives have regularized >> | reference companions | >> | Boxes have accidental identity | Reference companions have no >> | identity | >> | Boxing and unboxing conversions | Primitive reference and value >> | conversions, but same rules | >> | Primitive arrays are monomorphic | All arrays are covariant >> | | >> >> >> [valuebased]: >> https://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html >> [growing]: https://dl.acm.org/doi/abs/10.1145/1176617.1176621 >> [jep390]: https://openjdk.java.net/jeps/390 >> >> >> From heidinga at redhat.com Thu Jun 30 13:35:16 2022 From: heidinga at redhat.com (Dan Heidinga) Date: Thu, 30 Jun 2022 09:35:16 -0400 Subject: User model stacking: current status In-Reply-To: <1393328147.1270048.1656589922539.JavaMail.zimbra@u-pem.fr> References: <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> <1750a350-4e83-cd0e-8a52-44d193b766af@oracle.com> <2091250374.631588.1656513488696.JavaMail.zimbra@u-pem.fr> <6be1527f-5047-9e04-75cb-c8ecab131941@oracle.com> <2116995606.733122.1656518250393.JavaMail.zimbra@u-pem.fr> <1393328147.1270048.1656589922539.JavaMail.zimbra@u-pem.fr> Message-ID: > > > > I'm confused by your assertion that "nullable becomes less important > > because there is a notion of default value." That default value - the > > all zeros value that the VM paints on freshly allocated instances - is > > something we've agreed many value classes want to encapsulate. That's > > the whole story of "no good default" value classes. We've spent a lot > > of time plumbing those depths before arriving at this point where such > > NGD classes want to be expressed with references to ensure their "bad" > > default values don't leak. So I'm kind of confused by this assertion. 
> > I would like to separate the concern about null, you have the perspective of the maintainer/writer of a class and the perspective of the user of a class. > I was not talking about the maintainer POV which as to deal with the no good default class but from the user POV, that only need to deal with fields and array being initialized with the default value instead of null. > > I don't disagree with the current model, i think the model is not enough, not exposing a way to declare primary val classes (val is always secondary in the proposed model) is moving the burden to dealing with the val/ref world from the maintainer of a class to the users of a class. I will develop that in a later mail. > Remember, this was a deliberate choice. We started with exposing values (val default) and allowing tearing by default before finally interating to this solution (ref default). Now, the "good name" is always safe. It's always the reference type, can't be torn, and doesn't leak the uninitialized value. On the downside, it requires users to say ".val" when they want the direct value. And that's ugly. But if we're down to arguing syntax, then we're in a pretty good place. Looking forward to the email that clearly outlines the problem you see here. > > > > Overall - we're winning more than we expected to with this model. > > More cases can be scalarized on the stack than we initially thought > > and we can still offer heap flattening for the smaller set of use > > cases that really benefit from it. > > > >> > >> You are judging your model with the questions of the past, not the questions we > >> will have 10 years after the new model is introduced. > > > > As always, today's solutions are tomorrow's problems. Can you be more > > specific about the questions you think will be asked in the next 10 > > years so we can dig into those ? > > The proposed model is similar to the eclair model from the POV of the users of the value class, i think we did not do a good postmortem on why the eclair model fails from the user POV because we discover that the VM could be must smarter that we previously though. So the proposed model exhibits the same issue. I will dig for my note on the eclair model and rewrite them in terms of the current model. > The downfall of the eclair model was the use of interfaces. The VM can't enforce them during verification due to long standing verifier rules which we won't change. It also forced subtyping relationships between the wrapper and the filling of the eclair. And that's problematic in other places. Brian laid out on the EG call that the current model is more of a boxing / unboxing (projection / embedding) model that uses similar language models to insert the conversions (if needed). > > > >> > >> If anyone has the choices, then everyone has more responsibility. And given > >> that the performance differences between Point.ref and Point.val accrue pretty > >> much exclusively in the heap, which is to say, apply only to implementation > >> code and not API, sticking the implementation with this burden seems > >> reasonable. > >> > >> > >> no, you can not change a Point.ref to a Point.val without breaking the backward > >> compatibility, so it's an issue for APIs. > > > > Point.ref (the "L" carrier) and Point.val (the "Q" carrier) are > > spelled differently from a VM perspective. So changing from one to > > the other is making a new API. 
The benefit of the approach we've > > landed on though, is that the difference should be small for API > > points as we can scalarize the identity-less L on the stack. For > > backwards compatibility, just leave it! Better to use the L in api > > signatures and limit the Q's to heap storage (fields and arrays). > > I think we can get both, i would like a Point.ref followed by a Objects.requireNonNull to be equivalent to a Point.val from the user POV. > By example > public void foo(Point p) { > Object.requireNonNull(p); > ... > } > > should be equivalent to > public void foo(Point.val p) { > ... > } > > This requires to never have a Point.val in the method descriptor and to use the attribute TypeRestriction when Point.val is used. Is this a question/concern about the parametric VM proposal? If not, I'm confused by the mention of "TypeRestriction". Looking at the example, those two functions will be equivalent from a user point of view, except that they can assign null to "p" in the first copy of foo() and can't in the second. From a VM point of view, we should be able to scalarize the value type in both cases (though we'll need some extra metadata for the first case). Even when calling them, given the box/unbox rules, I'm not clear where you see the difference showing up? That they may have to cast "(Point.val)p" (not clear on the language rules here) to call the second if they have a Point in hand? > > I believe this is the kind of heroic efforts we will have to do so users can add ".val" to a parameter type of a method without thinking too much. > Obviously, i would prefer a world were the maintainer of a value class have to deal with this kind of stuff instead of the users but if we keep the proposed model, i think we will have to polish it around the edges. Can you be more clear on the heroic efforts you see required? I can speculate but I'll probably be wrong =) > > > > >> > >> If your description of the world was true, then we do not need Q-type, the > >> attribute Preload which say that a L-type is a value type is enough. > >> In that case, then the VM model and the language model you propose are more in > >> sync. > > > > Preload and L-type give identity-less values flattening on the stack. > > That's part of the story. For heap flattening we still need the Q. > > Yes, i've forgotten that we need Q-type for generics as Brian remember me/us during our meeting. > > > > > I thought we covered this in the EG discussion. Are you just reading > > into the record the concerns raised in the meeting to get the answers > > captured ? > > > I think the meeting was very useful to me because i did not understand correctly the proposed model. > I have another set of worries now, but as i said, i want to comb through my note before raising another set of concerns. +1 --Dan From brian.goetz at oracle.com Thu Jun 30 14:11:38 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 30 Jun 2022 10:11:38 -0400 Subject: Fwd: Please don't restrict access to the companion-type! In-Reply-To: References: Message-ID: <88162d4d-2584-c4ee-af0a-2b89fce66ba5@oracle.com> This was received on the spec-comments list.? My comments here: I'm a little mystified by this take, really.? Previously, you had "bucket 2" classes, which didn't even have a value projection; now all value classes have a value projection, but some have the option to encapsulate them.? But that is less restrictive than what we had before, so its odd to get this argument now?? 
If you don't want to restrict the value projection for your class, then you are free to expose it. So it seems you are mostly afraid of _other_ class authors being too restrictive. > Yes, I know what it is supposed to achieve: prevent users from > accidentally creating zero-initialized values for value-types "with no > reasonable default". The goal is bigger than this: it is to allow class authors to write safe classes. If uninitialized values are OK (as they are with many numerics), then that's easy. If they are not, then the class has two choices: write the class so as to defend against uninitialized values as receivers and arguments (this may be hard), or encapsulate the initialization of flat values to ensure that bad values do not escape. Requiring that the value projection always be public forces class authors down the first path, which they may not realize they need to do, and may well get wrong even if they try. Secondarily, I think you are overestimating the downside of using the ref type; you probably have an outdated performance model that says "ref is as bad as boxing." But the ref projection optimizes on the stack (parameters, returns, locals) almost as well as the val projection. The major performance difference is in the heap. Your Accumulator example is correct, but I think you are overestimating the novelty of the problem. Arrays have always had a dynamic check; I can cast String[] to Object[] and hand that to you; if you try to put an Integer in it, you'll get an ASE. Handing out arrays for someone else to write to should always specify what the bounds on those writes are; "don't write nulls" is novel in degree but not in concept. > 3. Let the compiler treat fields of companion-types like final fields > today, i.e. enforce initialization. If this were possible to do reliably, we would have gone this way. But initializing final fields today has holes where the null is visible to the user, such as class initialization circularity and receiver escape. (And a reliable protocol is even harder for arrays.) Exposing a null in this way is bad, but exposing the zero in this way would be worse, because now every user has a way to get the zero and inject it into unsuspecting (and unguarded) implementation code. There is simply no way we can reasonably expect everyone to write perfectly defensive code against a threat they don't fully understand and believe to be vanishingly rare -- and this is a perfect recipe for tomorrow's security exploits. > 4. Provide the users with a convenient API for creating arrays with > all elements initialized to a specific value. We explored this as well; it is a good start but ultimately not flexible enough to be "the solution". If a class has no good default, what value should it initialize array elements to? There's still no good default. And the model of "here's a lambda to initialize each element" is similarly an 80% solution. The goal here is to let people write classes that can be used safely. If non-initialization is a mistake, then we can make that mistake impossible. That's much better than trying to detect and recover from that mistake.

-------- Forwarded Message --------
Subject: Please don't restrict access to the companion-type!
Date: Thu, 30 Jun 2022 09:33:42 +0200
From: Gernot Neppert
To: valhalla-spec-comments at openjdk.java.net

I've been following the valhalla-development for a very long time, and have also posted quite a few comments, some of them raising valid concerns, some of them proving my ignorance.
This comment hopefully falls into the first category: My concern is that allowing access-restriction for a value-type's "companion-type" is a severe case of "throwing the baby out with the bathwater". Yes, I know what it is supposed to achieve: prevent users from accidentally creating zero-initialized values for value-types "with no reasonable default". However, the proposed solution of hiding the companion-type will force programmers to use the reference-type even if they do not want to. Please have a look at the following class "Accumulator". It assumes that "Sample" is a value-class in the same package with a non-public companion-type. The Javadoc must now explicitly mention some pitfalls that would not be there if "Sample.val" were accessible. Especially the necessary precaution about the returned array-type is rather ugly, right?!

public class Accumulator {

    private Sample.val[] samples;

    /** Yields the samples that were taken.
        Note: the returned array is actually a "flat" array! No element can be null.
        While processing this array, do not try to set any of its elements to null,
        as that may trigger an ArrayStoreException! */
    public Sample[] samples() {
        return samples.clone();
    }
}

To sum it up, my proposal is:

1. Make the companion-type public always.
2. When introducing value-classes, document the risks of having "uninitialized" values under very specific circumstances (uninitialized fields, flat arrays).
3. Let the compiler treat fields of companion-types like final fields today, i.e. enforce initialization.
4. The risk of still encountering uninitialized fields is really really low, and is, btw, absolutely not new.
4. Provide the users with a convenient API for creating arrays with all elements initialized to a specific value.
5. In Java, one could possibly also use this currently disallowed syntax for creating initialized arrays: new Sample.val[20] { Sample.of("Hello") };

From forax at univ-mlv.fr Thu Jun 30 14:21:14 2022
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Thu, 30 Jun 2022 16:21:14 +0200 (CEST)
Subject: User model stacking: current status
In-Reply-To:
References: <1750a350-4e83-cd0e-8a52-44d193b766af@oracle.com> <2091250374.631588.1656513488696.JavaMail.zimbra@u-pem.fr> <6be1527f-5047-9e04-75cb-c8ecab131941@oracle.com> <2116995606.733122.1656518250393.JavaMail.zimbra@u-pem.fr> <1393328147.1270048.1656589922539.JavaMail.zimbra@u-pem.fr>
Message-ID: <474429472.1465382.1656598874880.JavaMail.zimbra@u-pem.fr>

----- Original Message ----- > From: "Dan Heidinga" > To: "Remi Forax" > Cc: "Brian Goetz" , "Kevin Bourrillion" , "daniel smith" > , "valhalla-spec-experts" > Sent: Thursday, June 30, 2022 3:35:16 PM > Subject: Re: User model stacking: current status > >> > >> > I'm confused by your assertion that "nullable becomes less important >> > because there is a notion of default value." That default value - the >> > all zeros value that the VM paints on freshly allocated instances - is >> > something we've agreed many value classes want to encapsulate. That's >> > the whole story of "no good default" value classes. We've spent a lot >> > of time plumbing those depths before arriving at this point where such >> > NGD classes want to be expressed with references to ensure their "bad" >> > default values don't leak. So I'm kind of confused by this assertion.
>> >> I would like to separate the concern about null, you have the perspective of the >> maintainer/writer of a class and the perspective of the user of a class. >> I was not talking about the maintainer POV which as to deal with the no good >> default class but from the user POV, that only need to deal with fields and >> array being initialized with the default value instead of null. >> >> I don't disagree with the current model, i think the model is not enough, not >> exposing a way to declare primary val classes (val is always secondary in the >> proposed model) is moving the burden to dealing with the val/ref world from the >> maintainer of a class to the users of a class. I will develop that in a later >> mail. >> > > Remember, this was a deliberate choice. We started with exposing > values (val default) and allowing tearing by default before finally > iterating to this solution (ref default). Now, the "good name" is > always safe. It's always the reference type, can't be torn, and > doesn't leak the uninitialized value. On the downside, it requires > users to say ".val" when they want the direct value. And that's ugly. > But if we're down to arguing syntax, then we're in a pretty good > place. There is a swat of val value classes that are also safe by default, all the ones that can not tear (because they are not declared as tearable) and support the all zeroes default. By example, Scala and Kotlin restrict their value classes to a single field/single property, which is enough to cover all the things like quantities, units, etc where all you need a lightweigth wrapper on top of a reference or a primitive type. > > Looking forward to the email that clearly outlines the problem you see here. > >> > >> > Overall - we're winning more than we expected to with this model. >> > More cases can be scalarized on the stack than we initially thought >> > and we can still offer heap flattening for the smaller set of use >> > cases that really benefit from it. >> > >> >> >> >> You are judging your model with the questions of the past, not the questions we >> >> will have 10 years after the new model is introduced. >> > >> > As always, today's solutions are tomorrow's problems. Can you be more >> > specific about the questions you think will be asked in the next 10 >> > years so we can dig into those ? >> >> The proposed model is similar to the eclair model from the POV of the users of >> the value class, i think we did not do a good postmortem on why the eclair >> model fails from the user POV because we discover that the VM could be must >> smarter that we previously though. So the proposed model exhibits the same >> issue. I will dig for my note on the eclair model and rewrite them in terms of >> the current model. >> > > The downfall of the eclair model was the use of interfaces. The VM > can't enforce them during verification due to long standing verifier > rules which we won't change. It also forced subtyping relationships > between the wrapper and the filling of the eclair. And that's > problematic in other places. Brian laid out on the EG call that the > current model is more of a boxing / unboxing (projection / embedding) > model that uses similar language models to insert the conversions (if > needed). yes, the ref/val model is better if like for the VM there is a conversion (it can be an auto-convertion) between val and ref. But because of that, we have not explored the other drawbacks of exposing a twin types model front and center to the users. 
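To make the single-field case concrete, here is a minimal sketch in the currently proposed syntax (the class is hypothetical, and the companion declaration borrows the "value companion" spelling from Brian's draft):

// A one-field wrapper: the all-zeroes default (0.0) is a usable value, and a single
// 64-bit field cannot tear any more than a plain double already can, so exposing
// the val companion gives up no safety.
public value class Meters {
    private final double value;

    public Meters(double value) { this.value = value; }
    public double value() { return value; }

    public value companion Meters.val;
}

For classes in this category, keeping the companion private (or making users opt in with ".val" everywhere) buys no additional safety; that is the case I have in mind.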
> >> > >> >> >> >> If anyone has the choices, then everyone has more responsibility. And given >> >> that the performance differences between Point.ref and Point.val accrue pretty >> >> much exclusively in the heap, which is to say, apply only to implementation >> >> code and not API, sticking the implementation with this burden seems >> >> reasonable. >> >> >> >> >> >> no, you can not change a Point.ref to a Point.val without breaking the backward >> >> compatibility, so it's an issue for APIs. >> > >> > Point.ref (the "L" carrier) and Point.val (the "Q" carrier) are >> > spelled differently from a VM perspective. So changing from one to >> > the other is making a new API. The benefit of the approach we've >> > landed on though, is that the difference should be small for API >> > points as we can scalarize the identity-less L on the stack. For >> > backwards compatibility, just leave it! Better to use the L in api >> > signatures and limit the Q's to heap storage (fields and arrays). >> >> I think we can get both, i would like a Point.ref followed by a >> Objects.requireNonNull to be equivalent to a Point.val from the user POV. >> By example >> public void foo(Point p) { >> Object.requireNonNull(p); >> ... >> } >> >> should be equivalent to >> public void foo(Point.val p) { >> ... >> } >> >> This requires to never have a Point.val in the method descriptor and to use the >> attribute TypeRestriction when Point.val is used. > > Is this a question/concern about the parametric VM proposal? If not, > I'm confused by the mention of "TypeRestriction". TypeRestriction is a tool to keep binary backward compatibility and a source backward compatibility change if we have auto-boxing between ref and val so people can modify an existing method signature without having to take car about backward compatibility. > > Looking at the example, those two functions will be equivalent from a > user point of view, except that they can assign null to "p" in the > first copy of foo() and can't in the second. yes, one can say that the version with .val is more safe because a passing null explicitly will result in a compile-time error instead of a runtime error. If the idea of the language model is that .val is used in implementation and .ref is used in API, then TypeRestriction is a good tool for that because it allows user to have an increment approach, use .ref everywhere and later add .val when you want performance. It has a price, because it's a kind of erasure, we will have signature clashes like with generics. > From a VM point of view, > we should be able to scalarize the value type in both cases (though > we'll need some extra metadata for the first case). scalarized yes, but will it work in term of calling convention if those methods are overridden. > > Even when calling them, given the box/unbox rules, I'm not clear where > you see the difference showing up? That they may have to cast > "(Point.val)p" (not clear on the language rules here) to call the > second if they have a Point in hand? yes, it depends if we have the equivalent of auto-unboxing or not. > >> >> I believe this is the kind of heroic efforts we will have to do so users can add >> ".val" to a parameter type of a method without thinking too much. >> Obviously, i would prefer a world were the maintainer of a value class have to >> deal with this kind of stuff instead of the users but if we keep the proposed >> model, i think we will have to polish it around the edges. > > Can you be more clear on the heroic efforts you see required? 
> I can speculate but I'll probably be wrong =)

Yes, I will.

> 
>> 
>> > 
>> >> 
>> >> If your description of the world was true, then we would not need Q-types; the
>> >> Preload attribute, which says that an L-type is a value type, would be enough.
>> >> In that case, the VM model and the language model you propose are more in
>> >> sync.
>> > 
>> > Preload and L-type give identity-less values flattening on the stack.
>> > That's part of the story. For heap flattening we still need the Q.
>> 
>> Yes, I had forgotten that we need Q-types for generics, as Brian reminded me/us
>> during our meeting.
>> 
>> > 
>> > I thought we covered this in the EG discussion. Are you just reading
>> > into the record the concerns raised in the meeting to get the answers
>> > captured ?
>> 
>> 
>> I think the meeting was very useful to me, because I had not correctly understood
>> the proposed model.
>> I have another set of worries now, but as I said, I want to comb through my notes
>> before raising another set of concerns.
> 
> +1
> 
> --Dan

Rémi

From brian.goetz at oracle.com Thu Jun 30 21:24:46 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 30 Jun 2022 17:24:46 -0400
Subject: Valhalla performance model
Message-ID: <85b9cc41-8078-3a7d-e89b-156ae4bcb3b2@oracle.com>

Here's a *first draft* of a document to go into the SoV docset on the
performance model.

# State of Valhalla
## Part 5: Performance Model {.subtitle}

#### Brian Goetz {.author}
#### June 2022 {.date}

This document describes performance considerations for value classes under
Project Valhalla.  While we describe the optimizations that we expect the
HotSpot JVM to routinely make, other JVMs may make their own choices, and of
course these choices may vary over time and situations.

## Flattening

Project Valhalla has two broad classes of goals.  The first is _unification_:
unifying the treatment of primitives and references in the Java type system.
The second is _performance_: enabling the declaration of aggregates with
_flatter_ and _denser_ layouts than the layout we get with today's identity
classes.

By _flatness_, we mean the number of memory indirections that must be traversed
to get to the leaf data in an object graph.  If all object references are
implemented as pointers -- as they almost always are for identity objects --
then each object becomes an "island" of data, requiring indirections each time
we hop to another island.  Flatness begets density; each indirection requires
an object header, and eliminating indirections also reduces the number of
object headers.  Flatness also reduces garbage collection costs, since there
are fewer objects in the heap to process.

### Heap flattening

The form of flattening that comes most readily to mind is flattening on the
heap: _inlining_ the layout of some objects into that of other objects (and
arrays), which eliminates island-hopping.  If `C` is an identity class, then a
field of type `C` is laid out as a pointer, and an array of type `C[]` is laid
out as an array of pointers.  If `V` is a value type, we have the option to lay
out fields of type `V` by inlining the fields of `V` into the layout of the
enclosing type, and to lay out arrays of type `V[]` as repeating (aligned)
groups of the fields of `V`.  These layout choices reduce the number of
indirections to get to the fields of `V` by one hop.
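As a rough illustration of this layout difference: the `value` modifier below
follows the draft syntax and does not compile on current JDKs, the class names
are invented, and the layouts in the comments are possibilities the JVM may
choose, not guarantees.

```java
value class Point {          // identity-free; a candidate for flattening
    double x;
    double y;
    Point(double x, double y) { this.x = x; this.y = y; }
}

class Path {
    Point start;             // may be laid out inline as (start.x, start.y)
    Point end;               //   directly in Path's layout, no pointer hop
    Point[] waypoints;       // may be laid out as repeating (x, y) groups
}                            //   rather than as an array of pointers
```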
### Calling convention flattening

A less obvious, but also important, form of flattening is flattening in _method
calling conventions_.  If a method argument or return is a reference to `C`,
where `C` is an identity class or polymorphic type (interfaces and abstract
classes), the argument will usually be passed as a pointer on the stack or in a
register.  If `V` is a value type, we have another option: to _scalarize_ `V`
(explode it into its fields) and pass its fields on the stack or in registers.
(Perhaps surprisingly, under some situations we have the same option if `V` is
a strongly typed reference to a value object (a `V.ref`) as well.)

Both heap layouts and calling conventions are determined fairly early in the
execution of a program.  This means that any information needed to make these
flattening choices must be available early in the execution as well.  The `Q`
descriptors used by value types act as a preload signal, as does the `Preload`
attribute used for reference companions of value classes.

### Locals

Local variables have even more latitude over representation because, unlike
with layouts and calling conventions, there is no need for separately compiled
code to agree on a representation.  Values and references to values may be
routinely scalarized in local variables.

### Which is more important?

It is tempting to assume that heap flattening is more important, but this is a
bias we need to overcome.  Developers tend to be more aware of heap allocation
(we can see the `new` in the code), and heap utilization is more easily measured
with monitoring tools.  But this is mostly observability bias.  Both are
important to performance, and they serve complementary goals.

Stack flattening is what makes much of the cost of using boxing and wrapper
classes like `Optional` go away.  As developers, we all flinch a bit when we
have to return a wrapper like `Optional` or a record type that the client is
just going to unpack; this feels like "needless motion".  Stack flattening
allows us to get the benefits of these abstractions without paying this cost,
which shows up as a streamlining of general computational costs.

Heap flattening serves a different role; it is about flattening and compacting
object graphs.  This has a bigger impact on data-intensive code, allowing us to
pack more data into a given-sized heap and traverse data in the heap more
efficiently.  It also means that the garbage collector has less work to do,
making more CPU cycles and memory bandwidth available for business calculation.

## Additional considerations

There are two additional considerations that affect performance indirectly:
nullity and tearing.

### Nulls

Nullity is a property of references; null is how a reference refers to no
instance at all.  Values (historically primitives, but now also value types)
are never null, so they are directly amenable to scalarization.  Perhaps
surprisingly, _references_ to value types may also be scalarized, by adjoining a
synthetic boolean _null channel_ to represent whether or not the reference is
null.  (If the null channel indicates the reference is null, the data in the
other channels should be ignored.)  This null channel may require additional
space, since many value types (e.g., `int`) use all their bit patterns and
therefore would need additional bits to represent nullity.  However, the JVM
has a variety of possible tricks at its disposal to eliminate this extra
footprint in some cases, such as using slack in pointers, booleans, or the
alignment shadow.
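To make the null-channel idea concrete, here is a hand-rolled sketch of what a
scalarized nullable reference to a small value might carry.  The class and
field names are invented for this illustration; this is neither a JVM-internal
representation nor a proposed API.

```java
// A nullable reference to a two-field value, represented as data channels
// plus a synthetic boolean null channel instead of a pointer.
final class ScalarizedNullablePoint {
    final boolean isNull;   // the synthetic null channel
    final double x;         // data channels; ignored when isNull is true
    final double y;

    private ScalarizedNullablePoint(boolean isNull, double x, double y) {
        this.isNull = isNull;
        this.x = x;
        this.y = y;
    }

    static ScalarizedNullablePoint ofNull() {
        return new ScalarizedNullablePoint(true, 0.0, 0.0);
    }

    static ScalarizedNullablePoint of(double x, double y) {
        return new ScalarizedNullablePoint(false, x, y);
    }
}
```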
### Tearing

Whether or not to flatten heap-based variables has an additional consideration:
the possibility of _tearing_.  Tearing occurs when a read of a logical quantity
(such as a 64-bit integer) is broken up into multiple physical reads (such as
two 32-bit reads), and the results of those reads correspond to different
writes of the logical quantity.

The Java platform has always allowed for some form of tearing: reads and writes
of 64-bit primitives (`long` and `double`) may be broken up into two 32-bit
reads and writes.  In the presence of a data race (which is a logic error),
these two reads could return data corresponding to two different logical
writes.  This possible inconsistency was permitted because, at the time, most
hardware lacked the ability to perform atomic 64-bit operations efficiently,
and the problem only occurs in concurrent programs that already have a serious
concurrency bug.  (The recommended cure is to declare the field `volatile`, but
any technique that eliminates the data race, such as guarding the data with a
lock, will also work.)

On modern hardware, most JVMs now use atomic 64-bit instructions for reads and
writes of `long` and `double` variables, as the performance of these
instructions has caught up, and so JVMs can provide greater safety at negligible
cost.  However, with the advent of value classes, tearing under race again
becomes a possibility, since one can easily declare a value class whose layout
exceeds the maximum atomic load and store size of any hardware.

Because values are aggregates, some value classes may be less tolerant of
tearing than others.  Specifically, some value classes have representational
invariants across their fields (e.g., a `Range` class that requires that the
lower bound not exceed the upper bound), and exposing code to instances that do
not respect these invariants may be surprising or dangerous.  Accordingly, some
value classes may be declared with stronger or weaker atomicity requirements
(e.g., `non-atomic`) that affect whether or not instances may tear under race
-- and which potentially constrain how these are flattened in the heap.
(Tearing is not an issue for local variables or method parameters or returns,
as these are entirely within-thread and therefore are not at risk for data
races.)

## Layout

Today, object layout is simple: reference types are represented as pointers,
and primitive types are represented directly (flat); similarly, arrays of
reference types are arrays of pointers, and arrays of primitives are flattened.
These layout choices are common to heap, calling convention, and local
representation.

References to identity objects, and the built-in primitives, will surely
continue to use this layout.  But we have additional latitude with value types
and references to value objects.  The choice of layout for these new types will
depend on a number of factors: whether they have atomicity requirements, their
size, the context (heap, stack, or local), and mutability.

There are effectively three possible flattening strategies available, which
we'll call non-, low-, and full-flat.  Non-flat is the same old strategy as for
identity objects: pointers.  JVMs are free to fall back to non-flat in any
situation.  Full-flat is full inlining of the layout into the enclosing class
or array layout, as we get with primitives today.  Low-flat chooses between
these based on the size of the object layout and the hardware -- if the object
fits into cheap-enough atomic loads and stores, flatten as per full-flat,
otherwise fall back to non-flat.  (In addition to requiring suitable atomic
loads and stores, the low-flat treatment may also require compiler heroics to
support reading and writing multiple fields in a single memory access.)
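Before the table below, a concrete sketch of the atomicity trade-off described
in the tearing section above.  The `value` and `non-atomic` modifiers follow
the draft's placeholders and do not compile on current JDKs; the comments
describe possible behavior under the model, not guarantees.

```java
value class Range {
    long lo;
    long hi;
    Range(long lo, long hi) {
        if (lo > hi) throw new IllegalArgumentException("lo > hi");
        this.lo = lo;
        this.hi = hi;
    }
}
// Atomic (the default): a racy reader sees some Range that was actually
// written, so lo <= hi always holds, but the 16-byte layout may exceed the
// hardware's cheap atomic access size and fall back to a pointer in the heap.
//
// Declared non-atomic: the JVM may flatten it fully in fields and arrays,
// but a racy reader could observe the lo of one write and the hi of another
// -- an instance no thread ever constructed, violating lo <= hi.
```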
The following table outlines the best we can do based on the desired semantics:

| Kind                  | Stack                         | Heap                         |
| --------------------- | ----------------------------- | ---------------------------- |
| Identity object       | Non-flat                      | Non-flat                     |
| Primitive             | Full-flat                     | Full-flat                    |
| Non-atomic value type | Full-flat                     | Full-flat                    |
| Atomic value type     | Full-flat                     | Low-flat (unless final)      |
| Ref to value type     | Full-flat (with null channel) | Low-flat (with null channel) |

There are two significant attributes of this chart.  First, note that we can
still get full or partial flattening even for some reference types.  The other
is that we can flatten more uniformly on the stack (calling convention, locals)
than we can on the heap (fields, arrays).  This is because of the intrusion of
atomicity and the possibility of data races.  The stack is immune from data
races since it is confined to a single thread, but in the heap there is always
the possibility of concurrent access.  If there are atomicity requirements
(atomic value types, and all references), then the best we can do is to flatten
up to the threshold of atomicity.  For references to value types, the footprint
cost may be larger, to accommodate the null channel (absent heroics to encode
the null channel in slack bits).

For final fields, we may be able to upgrade to full-flat even for atomic value
types and references to value types, because final fields are not subject to
tearing.  (Technically, they are if the receiver escapes construction, but in
this case the Java Memory Model voids the initialization safety guarantees that
would prevent tearing.)

For very large values, we would likely choose non-flat even when we could
otherwise flatten; flattening a 1000-field value type is likely to be
counterproductive.

### Expected performance

The table above reveals some insights.  It means that we can routinely expect
full flattening in locals and calling convention, regardless of atomicity,
nullity, or reference-ness -- the only thing we need to avoid is identity.  In
turn, this means the performance difference between `V.ref` and `V.val` for
method parameters, returns, and locals is minimal (though there are some
second-order effects due to the null channel, such as register pressure and
null-check instructions).

The real difference between `V.ref` and `V.val` shows up in the heap, which
means fields and array elements.  (Arrays are particularly important because
the same structure is repeated many times, so any footprint and indirection
costs are multiplied by the array size.)  While identity was an absolute
impediment to flattening in the heap, once that is removed, the next impediment
reveals itself: atomicity.  Java has long offered a powerful safety guarantee
of _initialization safety_ for final fields, even when the object reference is
published via a race.  This is where the oft-repeated wisdom of "immutable
objects are automatically thread-safe" comes from; it relies on the atomicity
of loading object references.
To avoid consistency surprises, value types provide atomicity by default, but
can be marked as `non-atomic` to achieve greater flattening (at the cost of
tearing under race).

Mutable variables of atomic types -- atomic value types, and references to all
value types -- will only be flattened in the heap if they are small enough to
fit into the atomic memory operations available.  For references, if there is
any additional footprint required to represent null, this additional footprint
is included in the size for purposes of evaluating "small enough."

### Coding guidelines

On the stack (method parameters and returns, locals) the performance difference
between using `V.ref` and `V.val` is minimal; disavowing identity is enough.
Among other things, this means that migrating value-based classes to true value
classes should provide an immediate boost with no code changes.  (Further gains
can be had by using the `.val` companion in the heap, but it is generally not
necessary to use it in method signatures or locals.)

The most important consideration for heap flattening is whether the type
disavows not only identity, but atomicity.  If a type disavows atomicity, then
its value companion will get full flattening in the heap.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From brian.goetz at oracle.com Thu Jun 30 23:54:12 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 30 Jun 2022 19:54:12 -0400
Subject: Fwd: nullable-inlined-values on the heap
In-Reply-To: 
References: 
Message-ID: <799cc618-a3c6-2b8d-c03f-d154e5bf6107@oracle.com>

From the -comments list.

My comments: this posting feels mostly like a solution without stating what
problem it is trying to solve, so it's pretty hard to comment on.  But ...

> Would it be possible to decomplect nullability from a variable's
> encoding-mode (reference or inline)?

Not in reality.  A null is fundamentally a *reference* (or the absence of a
reference).  In theory, we could construct the union type int|Null, but this
type doesn't have a practical representation in memory, and it drags in all
sorts of mismatches because union types would then flow throughout the system.
So the only practical way to represent "int or null" is "reference to int."
Which is to say, Integer (minus identity).

> If this is possible, maybe Valhalla's Java could have a user-model
> like this:

You should probably start with what problem you are trying to solve.

-------- Forwarded Message --------
Subject: nullable-inlined-values on the heap
Date: Thu, 30 Jun 2022 23:02:17 +0100
From: João Mendonça 
To: valhalla-spec-comments at openjdk.org

Hello,

Would it be possible to decomplect nullability from a variable's encoding-mode
(reference or inline)?

I have been looking at the C# spec on "nullable-value-types" and I wonder if
the Java runtime could do something similar under the hood to allow
nullable-inlined-values, even on the heap.

I think that, compared to C#'s "value-types", Java can take advantage of the
fact that its value-class instances are immutable, which means that
pass-by-value and pass-by-reference are indistinguishable, which, with
nullable-inlined-values, could mean that Java can have the variable
encoding-mode completely encapsulated/hidden from the user-model as a runtime
implementation detail.
If this is possible, maybe Valhalla's Java could have a user-model like this:

*** A decomplected user-model ***

For class-authors:

 - *value-knob* to reject identity - Applicable on class declarations;
indicates that the class instances don't require identity (a value-class).

 - *zero-knob* to indicate that the value-class has a zero-value - If a
value-class does not have a zero-value, its instances won't be inlined in any
shared-variables (§17.4.1.), since this is the only way for the language to
ensure the non-existence of the zero-value. If the value-class is declared with
a zero-value, then care must be taken when reading/writing constructors, since
*no constructor invariant can exclude the zero-value*.

 - *tearable-knob* to allow tearing - Applicable on zero value-class
declarations with bitSize > 32 bits; may be used by the class-author to hand
the class-user the responsibility of how to avoid tearing, freeing the runtime
to always inline instances in shared-mutables (non-final shared-variables).
Conversely, if this knob is not used, instances will be kept atomic, which
allows the class-author to guarantee constructor invariants *provided they're
not broken by the zero-value*, which may be useful for the class implementation
and class-users to rely upon.

For class-users:

 - *not-nullable-knob (!)* to exclude null from a variable's value-set -
Applicable on any variable declaration. On nullable variables, the default
value is null and, in either encoding-mode (reference or inline), the runtime
is free to choose the encoding for the extra bit of information required to
represent the null state.

 - *atomic-knob* to avoid tearing - Applicable on shared-mutable declarations;
may be used to reverse the effect of the tearable-knob, thereby restoring
atomicity.

The encoding-mode of a variable is decided at runtime according to this ternary
expression:

    var encodingMode =
          !valueClass(variable.type)         ? REFERENCE   // value-knob
        : tooBig(variable.type.bitSize)      ? REFERENCE
        : !shared(variable)                  ? INLINE      // (§17.4.1.)
        : !zeroValueClass(variable.type)     ? REFERENCE   // zero-knob
        : final(variable)                    ? INLINE
        : atomicWrite(variable.type.bitSize) ? INLINE
        : atomic(variable)                   ? REFERENCE   // atomic-knob
        : tearableValueClass(variable.type)  ? INLINE      // tearable-knob
        :                                      REFERENCE;

The variable.type.bitSize depends on nullability, as nullable types may require
more space. The predicates tooBig and atomicWrite depend on the hardware. As an
example, they could be:

    boolean tooBig(int bitSize)      { return bitSize > 256; }
    boolean atomicWrite(int bitSize) { return bitSize <= 64; }

Table-view of the user-model knobs:

identity            |  (identity)  |                      value                          |
zeroness            |  (no-zero)   |    (no-zero)     |              zero                |
atomicity           |  (atomic)    |    (atomic)      |    (atomic)      |   tearable    |
nullability         | (?)  |   !   |  (?)   |   !     | (?)  |    !      | (?)  |   !    |
==========================================================================================
encoding-mode       |  reference   |               inline/reference                      |
needs reference     |  everywhere  | shared-variables | no/shared-mutables |     no      |
definite-assignment |  no  |  yes  |  no    |  yes    | no   |   yes      | yes  |  yes   |
default             | null |  n.a. | null   |  n.a.   | null |   n.a.     | n.a. |  n.a.  |
init-default        |     null     |       null       | null | zero/null  | null |zero/null|
Notes:
 - tokens in parentheses are the default when no knob is used
 - definite-assignment (§16.) means that the compiler enforces (to the best of
its ability) variable initialization before usage
 - default is the default-value of a variable when not definitely-assigned
 - init-default is the default-value of a variable before any initialization
code runs
 - on non-nullable zero value-classes, the init-default (zero or null) depends
on the encoding-mode chosen by the runtime
 - on atomic zero value-classes, reference-encoding is needed on shared-mutables
if the instance bitSize cannot be written atomically

*** Migration of value-based classes ***

Requiring definite-assignment on all non-nullable shared-mutables is useful to
get rid of missed-initialization bugs, so I think it's a good idea to require
it wherever source-compatibility allows.

In this model, all value-based classes can be migrated to (atomic) zero
value-classes. Due to definite-assignment, even if LocalDate is migrated to a
zero value-class, it will be hard to get an accidental "Jan 1, 1970". Rational
can also be a zero value-class, but users will have to keep in mind that it's
possible to get a zero-denominator Rational, even if the constructor throws
when we try to build one.

To maintain source-compatibility, no migrated value-based class can be
tearable, not even Double or Long, since wherever in existing code we have a
field declaration such as:

    ValueBasedClass v;

v is always reference-encoded and, therefore, atomic. For Double and Long, this
is a bit of an anomaly, because it means that for these two primitives, and for
them alone, the field declarations in each of these pairs will not be
semantically equivalent:

    long v;    // tearable
    Long! v;   // atomic

    double d;  // tearable
    Double! d; // atomic

João Mendonça

-------------- next part --------------
An HTML attachment was scrubbed...
URL: