From daniel.smith at oracle.com Wed Apr 5 02:40:12 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 5 Apr 2023 02:40:12 +0000
Subject: EG meeting *canceled*, 2023-04-05
Message-ID:

Brian and I are both unavailable tomorrow, so no EG meeting. Next time we'll come back to Brian's mail about implicit initialization. I'll also have a JVMS update to share in support of JEP 401.

From daniel.smith at oracle.com Wed Apr 19 15:14:05 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 19 Apr 2023 15:14:05 +0000
Subject: EG meeting, 2023-04-19
Message-ID: <294B6A5F-4663-4626-B8B3-37A7B2858C5C@oracle.com>

EG meeting today, April 19, at 4pm UTC (9am PDT, 12pm EDT).

Not a lot on the agenda, but we'll have a quick check-in. We may want to discuss some of Brian's "B3, default values, and implicit initialization" mail.

From forax at univ-mlv.fr Wed Apr 19 15:31:06 2023
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 19 Apr 2023 17:31:06 +0200 (CEST)
Subject: EG meeting, 2023-04-19
In-Reply-To: <294B6A5F-4663-4626-B8B3-37A7B2858C5C@oracle.com>
References: <294B6A5F-4663-4626-B8B3-37A7B2858C5C@oracle.com>
Message-ID: <33150119.39216445.1681918266770.JavaMail.zimbra@univ-eiffel.fr>

----- Original Message -----
> From: "daniel smith"
> To: "valhalla-spec-experts"
> Sent: Wednesday, April 19, 2023 5:14:05 PM
> Subject: EG meeting, 2023-04-19

> EG meeting today, April 19, at 4pm UTC (9am PDT, 12pm EDT).
>
> Not a lot on the agenda, but we'll have a quick check-in. We may want to discuss some of Brian's "B3, default values, and implicit initialization" mail.

I would like to also ask whether we need an explicit, user-available boxing operation for value classes.

For example, https://github.com/JosePaumard/play-with-valhalla-for-devoxx-fr/blob/master/src/main/java/org/paumard/amber/model2/D_specialized_default_value/CitiesNameSwitchMinArrays.java calculates a minimum (lexicographically) of the names of several non-null value classes with a default instance. Doing the boxing when the value is read from the array is more efficient than doing the boxing each time the names are compared.

Rémi

From kevinb at google.com Wed Apr 19 16:43:41 2023
From: kevinb at google.com (Kevin Bourrillion)
Date: Wed, 19 Apr 2023 09:43:41 -0700
Subject: EG meeting, 2023-04-19
In-Reply-To: <294B6A5F-4663-4626-B8B3-37A7B2858C5C@oracle.com>
References: <294B6A5F-4663-4626-B8B3-37A7B2858C5C@oracle.com>
Message-ID:

I might be less involved for a while now (aside from anything nullness-oriented). But I'll take this chance to say clearly that I'm very happy with where the design is. I can give my usual quibbles before it's too late :-) but they're no longer major. All I want now is for it to ship :-) and if there's something I could do to improve confidence in its readiness I'll do it!

To the extent that my complaints have slowed it down, sorry about that! But I try to take the long view and I do think we're doing a really good thing now. This will breathe a lot of new life into JVM languages.

Onward and upwards :-)

On Wed, Apr 19, 2023 at 8:14 AM Dan Smith wrote:

> EG meeting today, April 19, at 4pm UTC (9am PDT, 12pm EDT).
>
> Not a lot on the agenda, but we'll have a quick check-in. We may want to discuss some of Brian's "B3, default values, and implicit initialization" mail.

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com
From john.r.rose at oracle.com Wed Apr 19 22:51:35 2023
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 19 Apr 2023 15:51:35 -0700
Subject: FAQ: why no user-defined default instances?
Message-ID: <2F56029E-C694-42B7-B6D7-91BC5F718D32@oracle.com>

This is a recurrent question, so I wrote an answer.

https://cr.openjdk.org/~jrose/values/why-zero-defaults.html

To avoid link problems, here is the markdown source as well:

--------

Question: Why not just allow me, the Valhalla user, to define my own default instance for my class? After all, for a `Person` value class, the `String name` field should be an empty string by default, not `null`. And for a `RationalNumber` value class, the `int denominator` of the default instance should be `1` not `0`, and if it is a `BigInteger` it should be `BigInteger.ONE`. Why is Java refusing to give me control over those decisions?

Since the default instance is derived from a special constructor in the class declaration, why can't that allow the user to specify a different (not all-zero) default instance, as an ad hoc constructor body? How hard can it be for a class to collect some default values for fields, and then "stamp out" their bit pattern into every new uninitialized variable of the class? Isn't it an unwelcome burden for users to cope with a mandated default value for their class?

Answer: After long discussion and multiple re-evaluations, the Valhalla design team will support default values already naturally present in the Java language, but not others. In short, fields will always start out with their natural zero value (which is `null` for references). Valhalla values can expose their own natural default values (bundles of zero field-values) but no other defaults.

This is a compromise of complexity versus expressiveness. (Such compromises are typical in Java.) There are a number of reasons for this one.

1. The complexity at the VM level does not seem to be justified, since there are simple workarounds at the user level. (See below for notes on workarounds and VM complexity.)

2. "Stamping out" zeroes is fundamentally simpler to do than "stamping out" some other pattern. Especially for managed references: if you "stamp out" GC-managed references, you have to have a store-barrier conversation with the GC at each reference, if the GC is designed that way. This factor all by itself will slow array creation down, no matter what other design decisions are taken.

3. Also, the all-zero value will be visible in some states, even though we are trying to make it disappear -- at least while initial field initializers are executed, and probably at other times as well. That will lead to confusion.

Note that none of the options being discussed here makes any use of field initializers, because those apply to all constructors. A new form of field initializer would be required to create a low-ceremony syntax for non-zero default instances.

The VM-defined default field values are admittedly not suitable in many cases. A class abstraction author has a right and duty to define and enforce the valid states of fields, and that may not include nulls or zeroes in the class fields. In refusing to supply user-defined default field values, Valhalla requires users to employ workarounds for unsuitably initialized fields.

1. The simplest workaround is not to expose the default value. Allow `null` to be the default value of your class, just as with all pre-Valhalla classes. The standard workarounds for the `null` value apply in all cases.
Valhalla can succeed even if it doesn't fully solve all problems with `null`.

2. A field being null (or some other zero) can be kept inside the class abstraction by defining an access method. There is no particular reason why class fields must be made accessible to the public. Instead, an access method can detect the unsuitable state (often `null`) and replace it with a better state, defined by the class author. This involves a little more ceremony (a test and branch) than offered by a special syntax (one-time assignment), but it works. It is of course a workaround used even before Valhalla.

3. As a variation of the test-and-branch workaround, a numeric zero value, if unsuitable, can be converted using exclusive OR to a different value, yielding code which is probably as fast as a raw field read. The class would define a private static final constant `FDV` of the preferred field default value, and encode and decode field values appropriately, as `this.f0=f^FDV` and then `return f0^FDV`, where `f0` is the zero-default field in the class's private implementation.
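A minimal sketch of workarounds 2 and 3, reusing the `Person` and `RationalNumber` examples from the question above; the draft `value class` syntax and the exact member names here are assumptions, not part of any settled design:

    // Workaround 2: keep the unsuitable zero (a null name) inside the
    // abstraction and substitute a better value in the accessor.
    value class Person {
        private final String name;            // null in the default instance

        public Person(String name) { this.name = name; }

        public String name() {
            return name == null ? "" : name;  // test and branch on read
        }
    }

    // Workaround 3: XOR-encode a numeric field so that the stored zero
    // decodes to the preferred default (a denominator of 1).
    value class RationalNumber {
        private static final int FDV = 1;     // preferred field default value
        private final int num;
        private final int den0;               // zero-default backing field

        public RationalNumber(int num, int den) {
            this.num = num;
            this.den0 = den ^ FDV;            // encode on write
        }

        public int numerator()   { return num; }
        public int denominator() { return den0 ^ FDV; }  // stored 0 decodes to 1
    }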
We could allow an ad-hoc constructor body written by the user to "poke" non-default values into fields, and run it exactly once, capturing the instance state to use for all future default initializations. But this would have a number of problems.

1. The initial value of `this` in fact has to be an all-zero default, which means the `aconst_init` bytecode, used at the beginning of all constructors including the initial one, must always yield the all-zero default, regardless of what the language says. Therefore, there is always a risk of the all-zero default showing up later, due to an insufficiently protected use of `aconst_init`.

2. If the ad-hoc constructor body itself builds arrays or other objects requiring the default value, there is a vicious circularity, requiring new JVM specification language.

3. The ad-hoc constructor body must be assigned, by the JVM specification, an order to execute relative to the whole of `<clinit>`, and this again is not trivial to specify or implement. Normally, object constructors run after a class is fully initialized (unless they are initiated during the `<clinit>` activity itself). But in this case an object constructor -- just this one special one -- would presumably have to run either before all `<clinit>` activity (as an early initialization step) or else lazily in a separately specified and implemented phase of class setup.

In any case, array creation performance is likely to take some kind of performance hit. The issue is that every array type will have to store the bit pattern to "stamp out" when creating an array that has an element class with a non-zero default. One might think that this is a pay-as-you-go feature, incurring cost only for classes which declare non-zero defaults, but that is not completely true. There are places in the JVM and JDK where arrays (and other objects) are created from dynamically selected types. Such places are likely to need a new slow-path branch to handle the possibility that the selected type requires special handling for a non-zero default. Tests and branches are cheap, but it cannot be assumed that they are free.

As noted above, managed references are a particularly difficult cost to control when considering the initialization of non-zero defaults, particularly for arrays. It is somewhat dispiriting to contemplate a flurry of GC activity to create the default state of an array, immediately followed by a copy of non-default values, initiating a second flurry on the same locations. One might avoid this particular problem by allowing only primitive values to be "stamped out", but that reduces the utility of a user-defined default, and forces users into workarounds anyway.

--------

From daniel.smith at oracle.com Wed Apr 19 23:57:54 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 19 Apr 2023 23:57:54 +0000
Subject: FAQ: why no user-defined default instances?
In-Reply-To: <2F56029E-C694-42B7-B6D7-91BC5F718D32@oracle.com>
References: <2F56029E-C694-42B7-B6D7-91BC5F718D32@oracle.com>
Message-ID:

> On Apr 19, 2023, at 3:51 PM, John Rose wrote:
>
> "Stamping out" zeroes is fundamentally simpler to do than "stamping out" some other pattern. Especially for managed references: if you "stamp out" GC-managed references, you have to have a store-barrier conversation with the GC at each reference, if the GC is designed that way. This factor all by itself will slow array creation down, no matter what other design decisions are taken.
>
> In any case, array creation performance is likely to take some kind of performance hit. The issue is that every array type will have to store the bit pattern to "stamp out" when creating an array that has an element class with a non-zero default. One might think that this is a pay-as-you-go feature, incurring cost only for classes which declare non-zero defaults, but that is not completely true. There are places in the JVM and JDK where arrays (and other objects) are created from dynamically selected types. Such places are likely to need a new slow-path branch to handle the possibility that the selected type requires special handling for a non-zero default. Tests and branches are cheap, but it cannot be assumed that they are free.
>
> As noted above, managed references are a particularly difficult cost to control when considering the initialization of non-zero defaults, particularly for arrays. It is somewhat dispiriting to contemplate a flurry of GC activity to create the default state of an array, immediately followed by a copy of non-default values, initiating a second flurry on the same locations. One might avoid this particular problem by allowing only primitive values to be "stamped out", but that reduces the utility of a user-defined default, and forces users into workarounds anyway.

This discussion about arrays should acknowledge that we *do*, in fact, "stamp out" non-zero pointers to default instances when the class has been deemed unsuitable for flattening. At an implementation level, there are 3 kinds of value classes and !-typed value class arrays:

1) The happy case that we'll flatten, in which case the array stores a bunch of zeros

2) Value classes that are too big or atomic (or maybe fail some other criteria?) and so won't be flattened, in which case the array stores a repeated non-zero pointer (an alternative strategy would check for null on read and swap in the pointer; that's what happens with fields)
3) Value classes that don't have a default instance, in which case the array stores a bunch of nulls and the '!' is erased (an alternative VM design would check for null on read and throw)

We can argue that case (1) is expecting the best performance, so in that case we don't want to do anything to slow down its array allocations, but I think it's fair to say that examples of case (1) wouldn't mind the trade-off of moving into case (2) and getting a custom default value instead -- especially when their alternative workaround is to move into case (3) instead.

(I'm not advocating for this outcome, just critiquing the argument.)

> Also, the all-zero value will be visible in some states, even though we are trying to make it disappear -- at least while initial field initializers are executed, and probably at other times as well. That will lead to confusion.

Because of case (2), this should never happen (although we've had to do extra work in order to properly hide the null pointer in case (2)).

> The simplest workaround is not to expose the default value. Allow null to be the default value of your class, just as with all pre-Valhalla classes. The standard workarounds for the null value apply in all cases. Valhalla can succeed even if it doesn't fully solve all problems with null.

I'd add some acknowledgement of the role '!' has here, and refine the analogy -- instead of "allow null to be the default value of your class", you mean "don't give the class a default instance". Then your '!' types operate in case (3) instead of case (1) or (2), and the arrays store null. (Your nullable type, meanwhile, continues to have '!' as its default value.)

From john.r.rose at oracle.com Thu Apr 20 01:47:40 2023
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 19 Apr 2023 18:47:40 -0700
Subject: FAQ: why no user-defined default instances?
In-Reply-To:
References: <2F56029E-C694-42B7-B6D7-91BC5F718D32@oracle.com>
Message-ID: <5EC6F6DE-D14D-4045-983B-9A871011441B@oracle.com>

That's a good point about "hidden indirections", but it is not true that we always eagerly "stamp out" the canned pointer to the default value. In at least one place (`getstatic`), at the same time as we dynamically indirect the hidden indirection, we *also check for null* and substitute the default instance. That would work for arrays as well.

In any case, the JVM has the *option* of stamping out zeroes in all cases. For hidden indirections, the *option* is to either eagerly stamp out a pointer to the hidden default *or* stamp out zeroes and map on the fly to the default instance.
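Concretely, the two tactics for a non-flattened, null-restricted array might be sketched as follows (ordinary Java, with `defaultInstance` standing in for the class's hidden default instance; there is no real API for obtaining it here):

    // Lazy tactic: storage keeps null/zeroes, and reads substitute the
    // default instance on the fly (the check-and-substitute described
    // above for getstatic).
    static <T> T readLazily(T[] storage, int i, T defaultInstance) {
        T v = storage[i];
        return (v != null) ? v : defaultInstance;
    }

    // Eager tactic: "stamp out" the pointer to the default instance up front.
    static <T> void stampEagerly(T[] storage, T defaultInstance) {
        java.util.Arrays.fill(storage, defaultInstance);
    }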
Such places are likely to >> need a new slow-path branch to handle the possibility that the >> selected type requires special special handling for a non-zero >> default. Tests and branches are cheap but it cannot be assumed that >> they are free. > >> As noted above, managed references are a particularly difficult cost >> to control, whe considering the initialization of non-zero defaults, >> particularly for arrays. It is somewhat dispiriting to contemplate a >> flurry of GC activity to create the default state of an array, >> immediately followed by a copy of non-default values, initiating a >> second flurry on the same locations. One might avoid this particular >> problem by allowing only primitive values to be ?stamped out?, >> but that reduces the utility of a user-defined default, and forces >> users into workarounds anyway. > > This discussion about arrays should acknowledge that we *do*, in fact, > "stamp out" non-zero pointers to default instances when the class has > been deemed unsuitable for flattening. At an implementation level, > there are 3 kinds of value classes and !-typed value class arrays: > > 1) The happy case that we'll flatten, in which case the array stores a > bunch of zeros > > 2) Value classes that are too big or atomic (or maybe fail some other > criteria?) and so won't be flattened, in which the array stores a > repeated non-zero pointer (an alternative strategy would check for > null on read and swap in the pointer; that's what happens with fields) > > 3) Value classes that don't have a default instance, in which case the > array stores a bunch of nulls and the '!' is erased (an alternative VM > design would check for null on read and throw) > > We can argue that case (1) is expecting the best performance, so in > that case we don't want to do anything to slow down its array > allocations, but I think it's fair to say that examples of case (1) > wouldn't mind the trade-off of moving into case (2) and getting a > custom default value instead?especially when their alternative > workaround is to move into case (3) instead. > > (I'm not advocating for this outcome, just critiquing the argument.) > >> ? Also, the all-zero value will be visible in some states, even >> though we are trying to make it disappear. At least while initial >> field initializers are executed, and probably at other times as well. >> That will lead to confusion. > > Because of case (2), this should never happen (although we've had to > do extra work in order to properly hide the null pointer in case (2)). > >> ? The simplest workaround is not to expose the default value. >> Allow null to be the default value of your class, just as with all >> pre-Valhalla classes. The standard workarounds for the null value >> apply in all cases. Valhalla can succeed even if it doesn?t fully >> solve all problems with null. > > I'd add some acknowledgement of the role '!' has here, and refine the > analogy?instead of "allow null to be the default value of your > class", you mean "don't give the class a default instance". Then your > '!' types operate in case (3) instead of case (1) or (2), and the > arrays store null. (Your nullable type, meanwhile, continues to have > '!' as its default value.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.smith at oracle.com Thu Apr 20 16:03:11 2023 From: daniel.smith at oracle.com (Dan Smith) Date: Thu, 20 Apr 2023 16:03:11 +0000 Subject: FAQ: why no user-defined default instances? 
In-Reply-To: <5EC6F6DE-D14D-4045-983B-9A871011441B@oracle.com>
References: <2F56029E-C694-42B7-B6D7-91BC5F718D32@oracle.com> <5EC6F6DE-D14D-4045-983B-9A871011441B@oracle.com>
Message-ID: <7478A84D-CAE3-45FE-AB89-BEEFD9AEF146@oracle.com>

On Apr 19, 2023, at 6:47 PM, John Rose wrote:

> That's a good point about "hidden indirections", but it is not true that we always eagerly "stamp out" the canned pointer to the default value. In at least one place (getstatic), at the same time as we dynamically indirect the hidden indirection, we also check for null and substitute the default instance. That would work for arrays as well. In any case, the JVM has the option of stamping out zeroes in all cases. For hidden indirections, the option is to either eagerly stamp out a pointer to the hidden default or stamp out zeroes and map on the fly to the default instance.

Yes, agree. My understanding of the current implementation of (2) is that we eagerly stamp out arrays and lazily check-and-swap fields.

Of course both of these techniques would also work for a class with a custom default instance, which undercuts the argument that the performance of such a feature would be unacceptable.

From john.r.rose at oracle.com Thu Apr 20 17:59:26 2023
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 20 Apr 2023 10:59:26 -0700
Subject: FAQ: why no user-defined default instances?
In-Reply-To: <7478A84D-CAE3-45FE-AB89-BEEFD9AEF146@oracle.com>
References: <2F56029E-C694-42B7-B6D7-91BC5F718D32@oracle.com> <5EC6F6DE-D14D-4045-983B-9A871011441B@oracle.com> <7478A84D-CAE3-45FE-AB89-BEEFD9AEF146@oracle.com>
Message-ID:

On 20 Apr 2023, at 9:03, Dan Smith wrote:

>> On Apr 19, 2023, at 6:47 PM, John Rose wrote:
>>
>> That's a good point about "hidden indirections", but it is not true that we always eagerly "stamp out" the canned pointer to the default value. In at least one place (getstatic), at the same time as we dynamically indirect the hidden indirection, we also check for null and substitute the default instance. That would work for arrays as well. In any case, the JVM has the option of stamping out zeroes in all cases. For hidden indirections, the option is to either eagerly stamp out a pointer to the hidden default or stamp out zeroes and map on the fly to the default instance.
>
> Yes, agree. My understanding of the current implementation of (2) is that we eagerly stamp out arrays and lazily check-and-swap fields.
>
> Of course both of these techniques would also work for a class with a custom default instance, which undercuts the argument that the performance of such a feature would be unacceptable.

Only slightly. The JVM is free to modify its internal tactics if the "stamping out" becomes a problem. It is much less free to do so if the user bustles up and dumps a special pattern on the JVM.

From daniel.smith at oracle.com Thu Apr 20 20:42:09 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Thu, 20 Apr 2023 20:42:09 +0000
Subject: FAQ: why no user-defined default instances?
In-Reply-To:
References: <2F56029E-C694-42B7-B6D7-91BC5F718D32@oracle.com> <5EC6F6DE-D14D-4045-983B-9A871011441B@oracle.com> <7478A84D-CAE3-45FE-AB89-BEEFD9AEF146@oracle.com>
Message-ID: <3BCB7FE7-4982-47F7-803D-66FBAF40131F@oracle.com>

> On Apr 20, 2023, at 10:59 AM, John Rose wrote:
>
> On 20 Apr 2023, at 9:03, Dan Smith wrote:
>
>>> On Apr 19, 2023, at 6:47 PM, John Rose wrote:
>>>
>>> That's a good point about "hidden indirections", but it is not true that we always eagerly "stamp out" the canned pointer to the default value. In at least one place (getstatic), at the same time as we dynamically indirect the hidden indirection, we also check for null and substitute the default instance. That would work for arrays as well. In any case, the JVM has the option of stamping out zeroes in all cases. For hidden indirections, the option is to either eagerly stamp out a pointer to the hidden default or stamp out zeroes and map on the fly to the default instance.
>>
>> Yes, agree. My understanding of the current implementation of (2) is that we eagerly stamp out arrays and lazily check-and-swap fields.
>>
>> Of course both of these techniques would also work for a class with a custom default instance, which undercuts the argument that the performance of such a feature would be unacceptable.
>
> Only slightly. The JVM is free to modify its internal tactics if the "stamping out" becomes a problem. It is much less free to do so if the user bustles up and dumps a special pattern on the JVM.

If the problem is to find some way to logically fill an array with pointers to a heap object (that's what category (2) has to do), I think the problem is identical -- at the other end of that pointer is an object with some fields, and it doesn't much matter whether the fields' values are zeros or not.

But if the problem is to encode a *flattened* nonzero default instance, then I'm following what you're saying: in category (1), you really want zeros.

So perhaps one argument against custom default instances is that they "lock in" the less-performant category (2) behavior, with no hope of it ever improving; whereas other kinds of category (2) classes (like atomic ones) might someday start being treated as category (1) as hardware and implementation strategies evolve.

From brian.goetz at oracle.com Thu Apr 20 22:27:12 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 20 Apr 2023 18:27:12 -0400
Subject: B3, default values, and implicit initialization
In-Reply-To: <61a456f3-88bc-553a-2310-50acf0fe9cf7@oracle.com>
References: <61a456f3-88bc-553a-2310-50acf0fe9cf7@oracle.com>
Message-ID: <713ff8e0-38c3-8e27-6055-5f7bf1fc7818@oracle.com>

As I mentioned yesterday, the high order bit here is how we describe a class whose (null-restricted) instances can tolerate (and possibly even encourage) uninitialized use, just as the primitives do today. Ignoring the surface syntax, what we really need is an evocative term for such a class. This term has to be useful and evocative to multiple participants:

- The author of a class, who is making a decision about whether the zero state represents a sensible default.
- The client of a class, who may exploit the fact that instances may be safely used uninitialized, or who may want to reason about flattening.
- The specification / descriptive documents, which will need a way to talk about "classes that are friendly to uninitialized use."

This concept is made more difficult because this property will only have observable effects for variables with null-restricted types.
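For example, with the null-restriction syntax under discussion (the `Complex!` form is an assumption here, and all of this is subject to change), the difference only shows up when a variable's type excludes null:

    Complex[]  a = new Complex[10];    // nullable elements: explicitly initialized,
                                       // each element starts out null
    Complex![] b = new Complex![10];   // null-restricted elements: implicitly
                                       // initialized to the all-zero default instance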
On 3/28/2023 3:13 PM, Brian Goetz wrote:

> The recent update of JEP 401 contained a number of refinements to the user model, specifically, separating the primitive/reference distinction into a number of smaller distinctions (e.g., nullable vs non-nullable, optional vs required construction.) Overall this has been a very positive step forward.
>
> We still have a need for the distinction between what we've been calling B2 and B3; JEP 401 currently frames that in terms of "construction is optional." This is a big step forward; indeed, the key difference between them is whether the class _needs_ the "variables start out as null, and all instances are created by constructors" protection, or whether it admits the lighter-weight initialization protocol of "there's a standard zero value, null-free variables are initialized to that" that primitives enjoy today. (Note that B3 classes don't require this lighter protocol, they merely enable it, much as primitives all give you the option of boxing to get the full conservative initialization protocol.)
>
> The idea of framing this as "construction is optional" is a good one, but the expression of it proposed in JEP 401 feels "not quite there". In this note I'll propose an alternative presentation, but the main goal here is around terminology and user model rather than syntax (so please keep the syntax agitation to a reasonable level.)
>
> The key distinction between B2 and B3 is that B3 has a _default value_ which the VM can summon at will. This enables non-nullable heap variables to be flattened, because we can initialize these the same way we initialize other fields and array elements. Further, that default value is highly constrained; it is a physical zero, the result of initializing all fields to their default value.
>
> Flattening is of course a goal, but it is not something that exists in the programming model -- it's just an optimization. What exists in the programming model is the default value, and what this unlocks is the possibility for variables to be _implicitly initialized_. Reference-typed variables today are _explicitly initialized_; variables start out null and have to be initialized with a constructed value. A class with a default value has the option (opted in through null-exclusion) for its variables to be implicitly initialized, which, like primitives, means that they start out with a valid default value, and can be further assigned to.
>
> Framed this way, the Valhalla performance story simplifies to:
>
>  - Give up identity, get flattening on the stack;
>  - Further give up explicit initialization, get flattening for small objects on the heap;
>  - Further give up atomicity, get flattening for larger objects on the heap.
>
> Giving up explicit initialization entails both the class opting out of explicit initialization, _and_ the variable opting out of nullity.
>
> The key new terminology that comes out of this is implicit vs explicit initialization.
>
> Syntactically, my preference is to indicate that the default value can be summoned by giving a value class a _default constructor_:
>
>     value class Complex {
>         public final double re, im;
>
>         public default Complex();
>     }
>
> A default constructor has no arguments, no body, no throws clause, and implicitly initializes all fields to their default values.
Unlike > identity classes, value classes don't get constructions implicitly; a > value class must declare at least one constructor, default or > otherwise.? This replaces the idea of "optional constructor", which is > a negative statement about construction ("but you don't have to call > me"), with a more direct and positive statement that there is a > _default constructor_ with the required properties. > > Note that this is similar to the existing concept of "default > constructor", which you get for free in an identity class if you don't > specify any constructors.? It is possible we can unify these features > (and also with constructors in "agnostic" abstract classes), but first > let's work out what it would mean in value classes, and see if we like it. > > In this model, a B3 class is just a value class with a default > constructor -> a default constructor means that you have the choice of > implicit or explicit initialization -> non-nullity at the use site > opts into implicit initialization -> B3! gets flattening (for small > layouts.) > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From forax at univ-mlv.fr Thu Apr 20 22:38:16 2023 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 21 Apr 2023 00:38:16 +0200 (CEST) Subject: B3, default values, and implicit initialization In-Reply-To: <713ff8e0-38c3-8e27-6055-5f7bf1fc7818@oracle.com> References: <61a456f3-88bc-553a-2310-50acf0fe9cf7@oracle.com> <713ff8e0-38c3-8e27-6055-5f7bf1fc7818@oracle.com> Message-ID: <117552221.39820559.1682030296637.JavaMail.zimbra@univ-eiffel.fr> > From: "Brian Goetz" > To: "valhalla-spec-experts" > Sent: Friday, April 21, 2023 12:27:12 AM > Subject: Re: B3, default values, and implicit initialization > As I mentioned yesterday, the high order bit here is how we describe a class > whose (null-restricted) instances can tolerate (and possibly even encourage) > uninitialized use, just as the primitives do today. Ignoring the surface > syntax, what we really need is an evocative term for such a class. This term > has to be useful and evocative to multiple participants: > - The author of a class, who is making a decision about whether the zero state > represents a sensible default. > - The client of a class, who may exploit the fact that instances may be safely > used uninitialized, or who may want to reason about flattening. > - The specification / descriptive documents, which will need a way to talk about > "classes that are friendly to uninitialized use." > This concept is made more difficult because this property will only have > observable effects for variables with null-restricted types. . The effects are not even observable for all variables, but only for instance fields, array elements, and parameters if the method is not inlined (and later not parametric). R?mi > On 3/28/2023 3:13 PM, Brian Goetz wrote: >> The recent update of JEP 401 contained a number of refinements to the user >> model, specifically, separating the primitive/reference distinction into a >> number of smaller distinctions (e.g., nullable vs non-nullable, optional vs >> required construction.) Overall this has been a very positive step forward. >> We still have a need for the distinction between what we've been calling B2 and >> B3; JEP 401 currently frames that in terms of "construction is optional." 
From asviraspossible at gmail.com Fri Apr 21 09:46:18 2023
From: asviraspossible at gmail.com (Victor Nazarov)
Date: Fri, 21 Apr 2023 11:46:18 +0200
Subject: B3, default values, and implicit initialization
In-Reply-To: <713ff8e0-38c3-8e27-6055-5f7bf1fc7818@oracle.com>
References: <61a456f3-88bc-553a-2310-50acf0fe9cf7@oracle.com> <713ff8e0-38c3-8e27-6055-5f7bf1fc7818@oracle.com>
Message-ID:

On Fri, Apr 21, 2023 at 12:27 AM Brian Goetz wrote:

> As I mentioned yesterday, the high order bit here is how we describe a class whose (null-restricted) instances can tolerate (and possibly even encourage) uninitialized use, just as the primitives do today. Ignoring the surface syntax, what we really need is an evocative term for such a class.

As a quick brainstorming option, what about the term "having immediate default"?

> This term has to be useful and evocative to multiple participants:

The word "immediate" can work in multiple contexts and evoke different associations.

> - The client of a class, who may exploit the fact that instances may be safely used uninitialized, or who may want to reason about flattening.
> - The specification / descriptive documents, which will need a way to talk about "classes that are friendly to uninitialized use."

"Immediate" can mean that the default value is immediately available and can be used without initialization. A declared field immediately has a usable value. A declared array immediately has elements, without the need to initialize them. Having values immediately allows them to be efficiently packed (flattened).

"Immediate" can also mean ease of construction. A class with an "immediate default" can be constructed very easily using `T.default` notation or the `new T()` constructor without arguments. The expression that allows you to construct an instance of the class is very small and obvious, so it's very fast to write, and you get a new instance almost immediately.

> - The author of a class, who is making a decision about whether the zero state represents a sensible default.

For the author of the class, the word "immediate" can mean that the default value is obvious, i.e. you can immediately see what it is; the meaning of all zeroes is immediately clear without further explanation.

> This concept is made more difficult because this property will only have observable effects for variables with null-restricted types.

"Having immediate default" serves in this way to say that nothing special is actively happening with such value classes, but if the default value is needed for the class, then it is immediately available.

Hope this may be useful.

--
Victor Nazarov
From brian.goetz at oracle.com Fri Apr 21 19:49:50 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 21 Apr 2023 15:49:50 -0400
Subject: B3, default values, and implicit initialization
In-Reply-To: <713ff8e0-38c3-8e27-6055-5f7bf1fc7818@oracle.com>
References: <61a456f3-88bc-553a-2310-50acf0fe9cf7@oracle.com> <713ff8e0-38c3-8e27-6055-5f7bf1fc7818@oracle.com>
Message-ID: <7b23c626-4653-475d-2749-5e923d427e99@oracle.com>

Over on the -comments list, Quân Anh Mai suggested drawing inspiration from the C++ term, "trivially default constructible", which is tied to the act of construction. It's a bit wordy, but "default constructible" is probably a reasonable term, and it ties to the under-consideration syntax of

    default Foo();

as a constructor. "Trivially constructible" is also a reasonable term if we end up selecting a different syntax for the final expression.
From heidinga at redhat.com Mon Apr 24 23:18:01 2023
From: heidinga at redhat.com (Dan Heidinga)
Date: Mon, 24 Apr 2023 19:18:01 -0400
Subject: B3, default values, and implicit initialization
In-Reply-To: <713ff8e0-38c3-8e27-6055-5f7bf1fc7818@oracle.com>
References: <61a456f3-88bc-553a-2310-50acf0fe9cf7@oracle.com> <713ff8e0-38c3-8e27-6055-5f7bf1fc7818@oracle.com>
Message-ID:

Turning in my homework late because I don't have a good answer for where to specify the "non-atomic" modifier, and I have minor misgivings about reusing the term "default".

Despite my initial concerns with the default constructor (primarily how easily a consumer of a class can find this property), I've come around to the "public default Complex()" constructor model because it clearly lets a class's author indicate their intent. The default constructor should appear as a method in the classfile, though without a Code_attribute, and should be visible to reflection, etc. It's a constructor like any other which also acts as a flag to indicate the author accepts the all-zeros pattern. Expressing this through the constructor is a clean approach for users and lets us build on existing infrastructure for representing it -- similar to how interface default methods benefited from being modeled as regular methods.

Discoverability can be handled by having the javadoc for the class report the default constructor (like any other constructor), which handles most consumers. Those reading the source can either search for the "default" keyword or be helped by their IDE to flag the classes in some way.

My misgivings around the term "default" are due to having already used it to describe interface methods with a default implementation. The term has also been used in relation to the default (initial) value of variables, but that has no syntax associated with it. So precedent supports its use..... a mixed result, I guess? ....And I just found a section in the JLS (8.8.9) that already defines the default constructor for a class. That's even stronger precedent for reusing the term here, given this is a slightly different kind of default constructor.

We've previously talked about allowing value classes to extend abstract classes. What are the conditions that would allow my value class to implement a default constructor if it extends an abstract class? Would the abstract class need a default constructor? No constructor? (Probing for the edges of this model to see if it breaks down.)

I really wanted to cram the non-atomic designation onto constructors as well, but it's not really a property of the instance; rather, it describes writes to storage, which makes it a property of the class. Still trying to come up with a better intuition for where this belongs.

--Dan
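A rough sketch of the discoverability point, using only today's reflection API; whatever metadata would actually mark a default constructor is still TBD, so this can only detect "some no-arg constructor exists":

    import java.lang.reflect.Constructor;

    class FindDefaultCtor {
        static boolean hasNoArgConstructor(Class<?> c) {
            for (Constructor<?> ctor : c.getDeclaredConstructors()) {
                if (ctor.getParameterCount() == 0) {
                    // With plain reflection this cannot distinguish a value-class
                    // default constructor from an ordinary no-arg constructor;
                    // some new flag or API would be needed to surface the property.
                    return true;
                }
            }
            return false;
        }
    }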
From brian.goetz at oracle.com Tue Apr 25 00:56:56 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 24 Apr 2023 20:56:56 -0400
Subject: B3, default values, and implicit initialization
In-Reply-To:
References: <61a456f3-88bc-553a-2310-50acf0fe9cf7@oracle.com> <713ff8e0-38c3-8e27-6055-5f7bf1fc7818@oracle.com>
Message-ID:

> Despite my initial concerns with the default constructor (primarily how easily a consumer of a class can find this property), I've come around to the "public default Complex()" constructor model because it clearly lets a class's author indicate their intent. The default constructor should appear as a method in the classfile, though without a Code_attribute, and should be visible to reflection, etc. It's a constructor like any other which also acts as a flag to indicate the author accepts the all-zeros pattern. Expressing this through the constructor is a clean approach for users and lets us build on existing infrastructure for representing it -- similar to how interface default methods benefited from being modeled as regular methods.

Right, it's a clear signal at multiple levels -- source, classfile, reflection -- and this idiom works "well enough" at all of them. And it's easy to forget the classfile/reflection levels in these discussions.

> My misgivings around the term "default" are due to having already used it to describe interface methods with a default implementation.

Same. It sometimes feels like "reusing keyword disease", but on the other hand, it mostly works.

> The term has also been used related to the default (initial) value of variables but that has no syntax associated with it. So precedent supports its use..... a mixed result I guess? ....And I just found a section in the JLS (8.8.9) that already defines the default constructor for a class. That's even stronger precedent for reusing the term here given this is a slightly different kind of default constructor.

And don't forget annotations...
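For reference, the keyword is already doing several jobs in ordinary, compilable Java today:

    interface Greeter {
        default String greet() { return "hi"; }   // interface default method
    }

    @interface Retries {
        int value() default 3;                     // annotation element default
    }

    // ...plus the JLS 8.8.9 "default constructor", the default value of a
    // field or array element, and the 'default' label in a switch.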
> We've previously talked about allowing value classes to extend > abstract classes. What are the conditions that would allow my value > class to implement a default constructor if it extends an abstract > class? Would the abstract class need a default constructor? No > constructor? (Probing for the edges of this model to see if it breaks > down) The constructor in a value-capable abstract class is restricted enough that it should work fine here -- no fields, empty constructor body (save for super() call), and such constraints all the way up to Object. > I really wanted to cram the non-atomic designation on constructors as > well but it's not really a property of the instance; rather it > describes writes to storage which puts it as a property of the class. > Still trying to come up with a better intuition for where this belongs. One thing we didn't consider completely is a superinterface: value class Foo implements Terrible { ... } We had a discussion about "what does non-atomic really mean" at the EG meeting that I now realize I forgot to write up, so I will try to do that tomorrow.
From daniel.smith at oracle.com Tue Apr 25 18:59:16 2023 From: daniel.smith at oracle.com (Dan Smith) Date: Tue, 25 Apr 2023 18:59:16 +0000 Subject: B3, default values, and implicit initialization In-Reply-To: References: <61a456f3-88bc-553a-2310-50acf0fe9cf7@oracle.com> <713ff8e0-38c3-8e27-6055-5f7bf1fc7818@oracle.com> Message-ID: <2429F181-F6B7-4D79-8AB3-E04037AC5E21@oracle.com> On Apr 24, 2023, at 4:18 PM, Dan Heidinga wrote: My misgivings around the term "default" are due to having already used it to describe interface methods with a default implementation. The term has also been used related to the default (initial) value of variables but that has no syntax associated with it. So precedent supports its use..... a mixed result I guess? ....And I just found a section in the JLS (8.8.9) that already defines the default constructor for a class. That's even stronger precedent for reusing the term here given this is a slightly different kind of default constructor. Forwarding some notes I put together about other uses of "default constructor". This exploration dampened my enthusiasm for trying to unify these different concepts -- too many differences, IMHO. I think we're better off with a different keyword (currently trying out "implicit" in the JEP text, but still open for more bikeshedding). ----- Here are some different "default constructor" concepts that need to coexist in the language, with a list of their properties. special value class constructor: - must be explicit - must be public (maybe can relax this, but I don't want access checks in Arrays.newInstance) - distinct from an explicit constructor with an empty body (and maybe we allow overloading of both?) - no execution, can't throw; ignores field/instance initializers - not subject to final field initialization rule - gives the class a default instance (*preempting* field initializers!) - possibly able to opt in to non-atomic instance creation as well - implies can't have field circularity and can't be an inner class - TBD how encoded in class files/reflection, there's some sort of special metadata identity class default constructor: - implicit, or can be explicitly stated (e.g.
for documentation) with an empty body or a 'super()' body - doesn't get implicitly provided if any other constructor is declared - compatibility constraint: lots of explicitly stated default constructors in the wild - has same/greater access than the class - includes super() call and field/instance initializers, may throw - no relationship to default instances or non-atomic instance creation - error if the class has uninitialized final fields - indistinguishable from any other no-arg constructor by class files/reflection abstract class "trivial" constructor (if absent, the class is implicitly 'identity'): - implicit, or can be explicitly stated (e.g. for documentation) with an empty body or a 'super()' body - doesn't get implicitly provided if any other constructor is declared - compatibility constraint: lots of explicitly stated "trivial" constructors in the wild - has same/greater access than the class - no relationship to default instances or non-atomic instance creation record class canonical constructor: I won't dig into it, but this has its own special behavior (As an aside, there's also "default method", which uses the keyword 'default', and means something totally different.) So, concretely, could we re-use the special value class constructor syntax in identity classes as a new way to explicitly express the default value constructor? There would be many mismatches: - it shouldn't give the class a default instance - it should do super() and run field/instance initializers - it doesn't imply any field circularity or inner class restriction - it would (somewhat surprisingly) allow final fields without explicit initialization - if it's detectable via reflection, we're retroactively saying all legacy classes need to be recompiled to get that capability, and all legacy attempts at explicit default constructors have to be rewritten Abstract classes still need to recognize legacy attempts at explicit "trivial" constructors. And record classes really want canonical constructors as a distinct concept from allowing a default instance. My conclusion is I don't think this is a good opportunity for unification. I get the interest in doing so?it is a bit much to have all of these similar concepts floating around, with subtle rules in each case?but I think the need for an explicit opt-in syntax in value classes conflicts with the longstanding "convenience feature" treatment we've used elsewhere, and also the different semantics around code execution are hard to reconcile. -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian.goetz at oracle.com Wed Apr 26 18:13:45 2023 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 26 Apr 2023 14:13:45 -0400 Subject: On atomicity and tearing Message-ID: <468972c0-0d0f-6d0e-4080-af1dfd73237a@oracle.com> There has been, not surprisingly, a lot of misunderstanding about atomicity, non-atomicity, and tearing.? In particular, various syntactic expressions of non-atomicity (e.g., a `non-atomic` class keyword) tend to confuse users into thinking that non-atomic access is somehow a *feature*, rather than providing more precise control over the breakage modes of already-broken programs (to steer optimizations for non-broken programs.) I've written the following as an attempt to help people understand the role of atomicity and tearing in the model; comments are welcome (though let's steer clear of trying to paint the bikeshed in this thread.) 
# Understanding non-atomicity and tearing Almost since the beginning of Project Valhalla, the design has included some form of "non-atomicity" or "tearability".? Addressing this in the programming model is necessary if we are to achieve the heap flattening that Valhalla wants to deliver, but unfortunately this aspect of the feature set is frequently misunderstood. Whether non-atomicity is expressed syntactically as a class modifier, constructor modifier, supertype, or some other means, the concept is the same: a class indicates its willingness to give up certain guarantees in order to gain additional heap flattening. Unlike most language features, which express either the presence or absence of things that are at some level "normal" (e.g., the presence or absence of `final` means a class either can be assigned to, or cannot), non-atomicity is different; it is about what the possible observable effects are when an instance of this class is accessed with a data race.? Programs with data races are _already broken_, so rather than opting into or out of a feature, non-atomicity is expressing a choice between "breakage mode A" and "breakage mode B". > Non-atomicity is best thought of not as a _feature_ or the absence thereof, > but an alternate choice about the runtime-visible behavior of _already-broken > programs_. ## Background: flattening and tearing in built-in primitives Flattening and non-atomicity have been with us since Java 1.0. The eight built-in primitive types are routinely flattened into object layouts and arrays. This "direct" storage results from several design choices made about primitives: primitive types are non-nullable, and their zero values represent explicitly "good" default values and therefore even "uninitialized" primitives have useful initial values. Further, the two 64-bit primitive types (`long` and `double`) are explicitly permitted to _tear_ when accessed via a data race, as if they are read and written using two 32-bit loads and stores.? When a mutable `long` or `double` is read with a data race, it may be seen to have the high-order 32 bits of one previous write and the low-order 32 bits of another.? This is because at the time, atomic 64-bit loads and stores were prohibitively expensive on most processors, so we faced a tradeoff: punish well-behaved programs with poorly-performing numerics, or allow already-broken programs (concurrent programs with insufficient synchronization) to be seen to produce broken numeric results. In most similar situations, Java would have come down on the side of predictability and correctness. However, numeric performance was important enough, and data races enough of an "all bets are off" sort of thing, that this set of decisions was a pragmatic compromise.? While tearing sounds scary, it is important to reiterate that tearing only happens when programs _are already broken_, and that even if we outlawed tearing, _something else bad_ would still happen. Valhalla takes these implicit characteristics of primitives and formalizes them to explicit characteristics of value classes in the programming model, enabling user-defined classes to gain the runtime characteristics of primitives. ## Data races and consistency A _data race_ is when a nonfinal heap variable (array element or nonfinal field) is accessed by multiple threads, at least once access is a write, and the reads and writes of that variable are not ordered by _happens-before_ (see JLS Ch17 or _Java Concurrency in Practice_ Ch16.)? 
In the presence of a data race, the reading thread may see a stale (out of date) value for that variable.

"Stale" doesn't sound so bad, but in a program with multiple variables, the error states can multiply with the number and configuration of mutable variables. Suppose we have two `Range` classes:

```
class MutableRange {
    int low, high;

    // obvious constructor, accessor, and updater methods
    // constructor and updater methods validate invariant low <= high
}

class ImmutableRange {
    final int low, high;

    // obvious constructor and accessors, constructor validates invariant
}

final static MutableRange mr = new MutableRange(0, 10);
static ImmutableRange ir = new ImmutableRange(0, 10);
```

For `mr`, we have a final reference to a mutable range, so there are two mutable variables here (`mr.low` and `mr.high`.) We update our range value through a method that mutates `low` and/or `high`. By contrast, `ir` is a mutable reference to an immutable object, with one mutable variable (`ir`), and we update our range value by creating a new `ImmutableRange` and mutating the reference `ir` to refer to it.

More things can go wrong when we racily access the mutable range, because there are more mutable variables. If Thread A writes `low` and then writes `high`, and Thread B reads `low` and `high`, then under racy access B could see stale or up-to-date values for either field, and even if it sees an up-to-date value for `high` (the one written later), that still doesn't mean it would see an up-to-date value for `low`. This means that in addition to seeing out-of-date values for either or both, we could observe an instance of `MutableRange` to not obey the invariant that is checked by constructors and setters.

Suppose instead we racily access the immutable range. At least there are fewer possible error states; a reader might see a stale _reference_ to the immutable object. Access to `low` and `high` through that stale reference would see out-of-date values, but those out-of-date values would at least be consistent with each other (because of the initialization safety guarantees of final fields.)

When primitives other than `long` or `double` are accessed with a data race, the failure modes are like that of `ImmutableRange`; when we accept that `long` or `double` could tear under race, we are additionally accepting the failure modes of `MutableRange` under race for those types as well, as if the high- and low-order 32-bit quantities were separate fields (in exchange for better performance). Accepting non-atomicity of large primitives merely _increases_ the number of observable failure modes for broken programs; even with atomic access, such programs are still broken and can produce observably incorrect results.

Note that a `long` or `double` will never tear if it is `final`, `volatile`, only accessed from a single thread, or accessed concurrently with appropriate synchronization. Tearing only happens in the presence of concurrent access to mutable variables with insufficient synchronization.

## Non-atomicity and value types

Hardware has improved significantly since Java 1.0, so the specific tradeoff faced by the Java designers regarding `long` and `double` is no longer an issue, as most processors have fast atomic 64-bit load and store operations today. However, Valhalla will still face the same problem, as value types can easily exceed 64 bits in size, and whatever the limit on efficient atomic loads and stores is, we can easily write value types that will exceed that size.
This leaves us with three choices:

 - Never allow tearing of values, as with `int`;
 - Always allow tearing of values under race, as with `long`;
 - Allow tearing of values under race based on some sort of opt-in or opt-out.

Note that tearing is not anything anyone ever _wants_, but it is sometimes an acceptable tradeoff to get more flattening. It was a sensible tradeoff for `long` and `double` in 1995, and will continue to be a sensible tradeoff for at least some value types going forward.

The first choice -- values are always atomic -- offers the most safety, but means we must forgo one of the primary goals of Valhalla for all but the smallest value types.

This leaves us with "values are always like `long`", or "values can opt into / out of being like `long`." Types like `long` have the interesting property that all bit patterns correspond to valid values; there are no representational invariants for `long`. On the other hand, values are classes, and can have representation invariants that are enforced by the constructor. Having representational invariants for immutable classes be seen to not hold would be a significant and novel new failure mode, and so we took the safe route, requiring class authors to make the tradeoff between flattening and failure modes under race.

Just as with `long` and `double`, a value will never tear if the variable that holds the value is `final`, `volatile`, only accessed from a single thread, or accessed concurrently with appropriate synchronization. Tearing only happens in the presence of concurrent access to mutable variables with insufficient synchronization.

Further, tearing under race will only happen for non-nullable variables of value types that support default instances.

What remains is to offer sensible advice to authors of value classes as to when to opt into non-atomicity. If a class has any cross-field invariants (such as `ImmutableRange`), atomicity should definitely be retained. In the remaining cases, class authors (like the creators of `long` or `double`) must make a tradeoff about the perceived value of atomicity vs flattening for the expected range of users of the class.

-------------- next part -------------- An HTML attachment was scrubbed... URL: From liangchenblue at gmail.com Wed Apr 26 18:41:48 2023 From: liangchenblue at gmail.com (-) Date: Wed, 26 Apr 2023 13:41:48 -0500 Subject: On atomicity and tearing In-Reply-To: <468972c0-0d0f-6d0e-4080-af1dfd73237a@oracle.com> References: <468972c0-0d0f-6d0e-4080-af1dfd73237a@oracle.com> Message-ID: Will we get vm arguments that allow us to test with atomicity turned on/off for value classes for testing purposes, if we have these 3 choices available? Also, for atomicity, I know there is a concept of benign race (often seen in @Stable lazy patterns in the JDK), where multiple values may be constructed concurrently, but they are functionally equivalent, so using any of them is acceptable. Am I correct that the value objects we return in benign races must be atomic, as otherwise another thread will read a partial and erroneous lazy object, like in the mutable range example given here? On Wed, Apr 26, 2023 at 1:14 PM Brian Goetz wrote: > > There has been, not surprisingly, a lot of misunderstanding about atomicity, non-atomicity, and tearing.
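To make the `MutableRange` failure mode discussed above concrete, here is a small, self-contained program (plain current Java, no Valhalla features). It is deliberately broken -- the fields are accessed without synchronization -- and whether it actually observes a violation of `low <= high` depends on the JIT and the hardware; not observing one does not make the program correct:

```
// Illustrative only: a data race on two fields, mirroring the MutableRange example.
public class RangeRaceDemo {
    static final class MutableRange {
        int low, high;                       // deliberately not final, not volatile
        MutableRange(int low, int high) { this.low = low; this.high = high; }
        void set(int low, int high) {        // the writer always maintains low <= high...
            if (low > high) throw new IllegalArgumentException();
            this.low = low;                  // ...but writes the two fields separately
            this.high = high;
        }
    }

    static volatile boolean done = false;

    public static void main(String[] args) throws InterruptedException {
        MutableRange r = new MutableRange(0, 10);

        Thread reader = new Thread(() -> {
            long violations = 0;
            while (!done) {
                int lo = r.low;              // racy read
                int hi = r.high;             // racy read; may be stale relative to lo
                if (lo > hi) violations++;   // the invariant is seen not to hold
            }
            System.out.println("invariant violations observed: " + violations);
        });

        Thread writer = new Thread(() -> {
            for (int i = 0; i < 50_000_000; i++) {
                r.set(i, i + 10);            // every individual write is a valid range
            }
            done = true;
        });

        reader.start();
        writer.start();
        writer.join();
        reader.join();
    }
}
```

Swapping in the `ImmutableRange` pattern -- an immutable class behind a single mutable reference -- narrows the failures to stale-but-internally-consistent reads, which is exactly the distinction the note above draws.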
> > From forax at univ-mlv.fr Wed Apr 26 21:07:24 2023 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 26 Apr 2023 23:07:24 +0200 (CEST) Subject: On atomicity and tearing In-Reply-To: <468972c0-0d0f-6d0e-4080-af1dfd73237a@oracle.com> References: <468972c0-0d0f-6d0e-4080-af1dfd73237a@oracle.com> Message-ID: <1092576884.42867927.1682543244717.JavaMail.zimbra@univ-eiffel.fr> From: "Brian Goetz" To: "valhalla-spec-experts" Sent: Wednesday, April 26, 2023 8:13:45 PM Subject: On atomicity and tearing

Hi Brian, I'm now more confused than I was before reading your mail; it seems I've not understood how things are supposed to work.

> Unlike most language features, which express either the presence or absence of things that are at some level "normal" (e.g., the presence or absence of `final` means a class either can be assigned to, or cannot), non-atomicity is different; it is about what the possible observable effects are when an instance of this class is accessed with a data race. Programs with data races are _already broken_, so rather than opting into or out of a feature, non-atomicity is expressing a choice between "breakage mode A" and "breakage mode B".

What do you mean by "a class either can be assigned to, or cannot"? As far as I know a class can not be assigned.

As far as I know, at least in Java, programs with data races are not automatically broken; it depends on whether the states produced by the data races are valid states or not. The usual example is not declaring a field final even if the field is initialized only once, in the constructor; it may result in a publication issue, where a thread can see the instance not fully initialized. But it can be a problem or not depending on whether the default value of the field is a valid value or not.

For me, the non-atomicity of a value class may lead to more possible states than with the publication issue, where you can only either see the default value or the assigned value. So I fail to see how the non-atomicity of value classes is a different issue from the publication issue.

I'm sure I've misunderstood something?

Rémi

-------------- next part -------------- An HTML attachment was scrubbed...
URL: From brian.goetz at oracle.com Wed Apr 26 22:05:25 2023 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 26 Apr 2023 18:05:25 -0400 Subject: On atomicity and tearing In-Reply-To: <1092576884.42867927.1682543244717.JavaMail.zimbra@univ-eiffel.fr> References: <468972c0-0d0f-6d0e-4080-af1dfd73237a@oracle.com> <1092576884.42867927.1682543244717.JavaMail.zimbra@univ-eiffel.fr> Message-ID: <1be8bfc7-d898-e2ea-2604-0f91772703e8@oracle.com> > > What do you mean by "a class either can be assigned to, or cannot" ? > As far as i know a class can not be assigned. Sorry, typo -- s/class/variable/ > As far as i know, at least in Java, programs with data races are not > automatically broken, it depends on if the states produces by the data > races are valid states or not. It is true that some data races are benign; the canonical example here is the String hashCode.? But this trick is (and should be) exceedingly rare, and for purposes of explaining atomicity to the other 99.999% of Java users, can be ignored. > > The usual example is not declaring a field final even if the field is > initialized only once, in the constructor, it may result in a > publication issue, a thread can see the instance not fully initialized. > But it can be a problem or not depending on if the default value of > the field is a valid value or not. I think you may be coming at this from a different direction; you've got a collection of "interesting corner cases" where a data race might not be a problem.? These are really interesting cases, but it's focusing on the trees rather than the forest.? We don't want to encourage reasoning about "in this case, data race X is OK".? That's a game for experts -- and most experts know to try to avoid playing it. The point is that "bad things happen in data races", and that non-atomicity merely *moves the bad things around*. > > For me, the non-atomicity of a value class may lead to more possible > states than with the publication issue where you can only either see > the default value or the assigned value. Yes, it basically trades the bad effects of "mutable ref to immutable state" for the bad effects of "immutable ref to mutable state".? In the latter, more things can go wrong, because there is more mutability. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian.goetz at oracle.com Thu Apr 27 17:59:23 2023 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 27 Apr 2023 13:59:23 -0400 Subject: Fwd: B3, default values, and implicit initialization In-Reply-To: References: Message-ID: <34283df4-f328-812a-dd70-0c479566fba4@oracle.com> The following was received on valhalla-spec-comments. Without deep-diving into the specific syntax suggestion, which I'll summarize as "default values are values, and we have fields to specify values, so why wouldn't we do that?", the main thing here is that there is a concern that we might make the distinction between B2 and B3 "too subtle", and that would "lead to overuse of B3". There is a challenge with the specifics here, in that the stated goal is to explicitly specify the default, but (a) it is not possible to explicitly specify an arbitrary default (which using a field would imply), and (b) it is not possible to conveniently denote "the instance where all the fields have their default values".? So I don't think the field-centric approach is a winner, but again, that's not the main value of such a suggestion. 
I'm not a fan of arguments that start with "we should force the user..."; while we often do go down this road (e.g., subtypes of sealed types must be explicitly sealed, final, or non-sealed, to avoid ambiguities over defaults), such arguments should start with "here is what would go wrong if people didn't understand the defaults properly." So, what I'd like to see is more specifics on _how_ B3 might be overused and _why_ that is bad before considering these syntactic directions. Some additional relevant observations about the current direction: ?- Value classes, without any reference to the special property of "usable without initialization" or atomicity under race, are very simple: they are like ordinary classes without identity.? As a direct consequence, identity-sensitive operations (==, synchronized, etc) are either redefined based on state or are partialized, and features that require identity (mutability, layout polymorphism) are disallowed.? They are otherwise like identity classes in every other way -- they are reference types, variables of such class types are nullable, etc. This is the "simplest possible" interpretation of value classes, in that the only thing we take away is identity. ?- This interpretation of value classes is a "safe default"; if you don't go the extra mile and talk about uninitialized use, or atomicity, you get something with the same safety properties as identity classes.? You have to explicitly do something extra to get the B3 properties. -------- Forwarded Message -------- Subject: Re: B3, default values, and implicit initialization Date: Thu, 27 Apr 2023 11:23:54 +0100 From: Stephen Colebourne To: valhalla-spec-comments at openjdk.java.net From my perspective, the difference between B2 and B3 is vital, as I fear developers will greatly overuse B3. I don't think "default constructors" are the right focus. The initial discussions of B2 vs B3 focussed on one main question - does the type have a sensible default. `LocalDate` does not, but `Decimal` or `Optional` does (zero or empty). The big issue Valhalla faces during adoption from my perspective is the messaging, which is far too easy to simplify to B3 faster than B2 faster than B1. The net result would be many more 1979-01-01 type bugs. Without great care here we could be creating the potential for many null-like "million-dollar mistakes". To counteract this, the syntax IMO needs to place the issue at hand directly in the face of the developer. And the key question is "what is the sensible default value for an instance of this type". Given this, I think all authors of *all* value types should be forced to *explicitly* define what the default value is. ie. it isn't something where the language should choose one or the other as the default (sic). The obvious syntax is a field, which is implicitly public static final. I don't feel that a class-level keyword is the right choice here: public value LocalDate { default = null; } public value Decimal { default = new; } In each case, the author has had to explicitly choose what the sensible default is, and therefore implicitly chooses whether it is B2 and B3 - without any opportunity to be distracted by the performance model. Neither B2 or B3 is chosen as the favourite by the language. "It is a compilation error when a value class declaration does not specify a default value". By contrast, default constructors are one or two steps removed from the actual decision point that the class author should actually be thinking about, which is what the sensible default is. 
It is also the case that the default constructor is never actually invoked, which will be an ongoing point of surprise. Terminology in specs just talks about what the default value is, eg "authors should select the most appropriate default value for their domain", "arrays are initialised to the default value of a value type" or "if the default is null then ...": The syntax is intended to make it perfectly reasonable to ask for `LocalDate.default` or `Decimal.default` and get a sensible answer - it looks like a "normal" constant in code. The use of `default = new` by itself deliberately invokes the idea of a default constructor that does nothing, without the need to spell it out. Javadoc can be added to the `default` constant, which is very helpful. For example it might include justification as to why LocaleDate does not have a default value of 0000-01-01 or 1970-01-01. Stephen On Tue, 28 Mar 2023 at 20:13, Brian Goetz wrote: > The recent update of JEP 401 contained a number of refinements to the user model, specifically, separating the primitive/reference distinction into a number of smaller distinctions (e.g., nullable vs non-nullable, optional vs required construction.) Overall this has been a very positive step forward. > > We still have a need for the distinction between what we've been calling B2 and B3; JEP 401 currently frames that in terms of "construction is optional." This is a big step forward; indeed, the key difference between them is whether the class _needs_ the "variables start out as null, and all instances are created by constructors" protection, or whether it admits the lighter-weight initialization protocol of "there's a a standard zero value, null-free variables are initialized to that" that primitives enjoy today. (Note that B3 classes don't require this lighter protocol, they merely enable it, much as primitives all give you the option of boxing to get the full conservative initialization protocol.) > > The idea of framing this as "construction is optional" is a good one, but the expression of it proposed in JEP 401 feels "not quite there". In this note I'll propose an alternative presentation, but the main goal here is around terminology and user model rather than syntax (so please keep the syntax agitation to a reasonable level.) > > The key distinction between B2 and B3 is that B3 has a _default value_ which the VM can summon at will. This enables non-nullable heap variables to be flattened, because we can initialize these the same way we initialize other fields and array elements. Further, that default value is highly constrained; it is a physical zero, the result of initializing all fields to their default value. > > Flattening is of course a goal, but it is not something that exists in the programming model -- its just an optimization. What exists in the programming model is the default value, and what this unlocks is the possibility for variables to be _implicitly initializated_. Reference-typed variables today are _explicitly initialized_; variables start out null and have to be initialized with a constructed value. A class with a default value has the option (opted in through null-exclusion) for its variables to be implicitly initialized, which, like primitives, means that they start out with a valid default value, and can be further assigned to. 
> > Framed this way, the Valhalla performance story simplifies to: > > - Give up identity, get flattening on the stack; > - Further give up explicit initialization, get flattening for small objects on the heap; > - Further give up atomicity, get flattening for larger objects on the heap. > > Giving up explicit initialization entails both the class opting out of explicit initialization, _and_ the variable opting out of nullity. > > The key new terminology that comes out of this is implicit vs explicit initialization. > > > Syntactically, my preference is to indicate that the default value can be summoned by giving a value class a _default constructor_: > > value class Complex { > public final double re, im; > > public default Complex(); > } > > A default constructor has no arguments, no body, no throws clause, and implicitly initializes all fields to their default values. Unlike identity classes, value classes don't get constructions implicitly; a value class must declare at least one constructor, default or otherwise. This replaces the idea of "optional constructor", which is a negative statement about construction ("but you don't have to call me"), with a more direct and positive statement that there is a _default constructor_ with the required properties. > > Note that this is similar to the existing concept of "default constructor", which you get for free in an identity class if you don't specify any constructors. It is possible we can unify these features (and also with constructors in "agnostic" abstract classes), but first let's work out what it would mean in value classes, and see if we like it. > > In this model, a B3 class is just a value class with a default constructor -> a default constructor means that you have the choice of implicit or explicit initialization -> non-nullity at the use site opts into implicit initialization -> B3! gets flattening (for small layouts.) > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From forax at univ-mlv.fr Thu Apr 27 18:31:30 2023 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 27 Apr 2023 20:31:30 +0200 (CEST) Subject: B3, default values, and implicit initialization In-Reply-To: <34283df4-f328-812a-dd70-0c479566fba4@oracle.com> References: <34283df4-f328-812a-dd70-0c479566fba4@oracle.com> Message-ID: <1550840502.43358908.1682620290012.JavaMail.zimbra@univ-eiffel.fr> > From: "Brian Goetz" > To: "valhalla-spec-experts" > Sent: Thursday, April 27, 2023 7:59:23 PM > Subject: Fwd: B3, default values, and implicit initialization > The following was received on valhalla-spec-comments. > Without deep-diving into the specific syntax suggestion, which I'll summarize as > "default values are values, and we have fields to specify values, so why > wouldn't we do that?", the main thing here is that there is a concern that we > might make the distinction between B2 and B3 "too subtle", and that would "lead > to overuse of B3". > There is a challenge with the specifics here, in that the stated goal is to > explicitly specify the default, but (a) it is not possible to explicitly > specify an arbitrary default (which using a field would imply), and (b) it is > not possible to conveniently denote "the instance where all the fields have > their default values". So I don't think the field-centric approach is a winner, > but again, that's not the main value of such a suggestion. 
> I'm not a fan of arguments that start with "we should force the user..."; while we often do go down this road (e.g., subtypes of sealed types must be explicitly sealed, final, or non-sealed, to avoid ambiguities over defaults), such arguments should start with "here is what would go wrong if people didn't understand the defaults properly."
> So, what I'd like to see is more specifics on _how_ B3 might be overused and _why_ that is bad before considering these syntactic directions.
> Some additional relevant observations about the current direction:
> - Value classes, without any reference to the special property of "usable without initialization" or atomicity under race, are very simple: they are like ordinary classes without identity. As a direct consequence, identity-sensitive operations (==, synchronized, etc) are either redefined based on state or are partialized, and features that require identity (mutability, layout polymorphism) are disallowed. They are otherwise like identity classes in every other way -- they are reference types, variables of such class types are nullable, etc. This is the "simplest possible" interpretation of value classes, in that the only thing we take away is identity.
> - This interpretation of value classes is a "safe default"; if you don't go the extra mile and talk about uninitialized use, or atomicity, you get something with the same safety properties as identity classes. You have to explicitly do something extra to get the B3 properties.

Yes! I think this is the important point: the current incarnation of B3 is a B2; you need to add a '!' at the use site to explicitly ask for trouble.

And, not to forget, this is also the curse of B3: to work well, you need B3! everywhere along the way, but the JITs tend to reason locally, so it's easy to have a boxing in the middle (e.g. a method parameter using a B3 with no bang, or an interface), making the performance of code using B3! far worse than using B2 or B3 with no bang.

The proposed syntax, declaring a pseudo field, is not too far from the current one (default constructor); the declaration is done inside the class, not as a modifier. I do not find this syntax attractive, especially the "new" in "default = new", I can hear my students saying "new what"?

regards, Rémi

-------------- next part -------------- An HTML attachment was scrubbed... URL: From brian.goetz at oracle.com Thu Apr 27 18:54:53 2023 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 27 Apr 2023 14:54:53 -0400 Subject: B3, default values, and implicit initialization In-Reply-To: <1550840502.43358908.1682620290012.JavaMail.zimbra@univ-eiffel.fr> References: <34283df4-f328-812a-dd70-0c479566fba4@oracle.com> <1550840502.43358908.1682620290012.JavaMail.zimbra@univ-eiffel.fr> Message-ID: 
The fact that it looks like a field, but its initial value is not actually an expression of that type, is pretty much disqualifying. But, they syntax is not really the main point here.? Stephen's point is that he's worried that "performance lore" will drive people to reach for B3, even when the zero-default sucks (like LocalDate).? We can't stop developers from being moths to the performance flame, but what we can do is try to find the most clear way to represent "instances of this class can be implicitly initialized", and have users explicitly opt into that.? And we can show what good judgment looks like by leading by example in the JDK.? We're good on the "requiring opt in" part, what we're mostly debating here is whether a class modifier or field or constructor or other special member or supertype is the best way to say "implicitly initializable value". (The field syntax also teases that you can put any value there, but you can't.? Which is why the implicit constructor syntax has no body; you can't put code in there that would make you think that you get to choose the default state.) On 4/27/2023 2:31 PM, Remi Forax wrote: > > I do not find this syntax attractive, especially the "new" in "default > = new", i can hear my students saying "new what" ? > -------------- next part -------------- An HTML attachment was scrubbed... URL: