From john.r.rose at oracle.com Fri Dec 1 22:01:42 2023
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 1 Dec 2023 22:01:42 +0000
Subject: Rules for larval value object construction
In-Reply-To:
References:
Message-ID:

Thanks for the update Dan. I am very encouraged with the increased simplicity and clarity of our new primitive notions.

The original concept of value classes factors into final classes with strictly initialized finals and no non-final instance fields, plus identical restrictions on all their supers. By fiat they also are given identity masking and, optionally, implicit zero-construction and, optionally again, relaxed field consistency. This constellation of features unlocks a new set of optimizations.

> On Nov 28, 2023, at 5:03 PM, Dan Smith wrote:
>
> Here's an update on how these ideas about construction have panned out as we've dug more into them. Some of the details are still in flux, but I'll try to highlight what's settled and what's still being explored.
>
>> On Sep 19, 2023, at 5:24 PM, Dan Smith wrote:
>>
>> We've spent some time investigating the idea of larval value object construction, and are enthusiastic about this change. Here are some details.
>>
>> 1) In the JVM, we no longer need the special methods or the 'aconst_init' and 'withfield' instructions. Drop them.
>
> Yep, check.
>
>> 2) Value objects are constructed just like identity objects, via field mutation. However, the "object" being mutated is in a larval state and we want to give implementations a lot of flexibility in how these larval objects are implemented. Thus, the language and JVM must ensure that a larval object is never observable: it can be written to but not read.
>
> Yes. This is the big idea: constrain access to the mutable larval instance, only allow programs to observe the instance once we no longer need mutation/identity.
>
>> 3) New language concept/modifier: a 'regulated' constructor. (Trying a name out here, haven't put a ton of thought into naming. Feel free to bikeshed.) The 'regulated' keyword can be applied to any constructor in any class.
>>
>> 4) A 'regulated' constructor promises not to make any use of 'this'. An error occurs if this promise is violated in any of a number of different ways: direct reference, instance field access (except for assignment targets), instance method invocation, implicit use as an enclosing instance, any of these occurring in an initializer expression/block, and, importantly, a 'super()' or 'this()' call to a different non-regulated constructor. These checks are very similar to the longstanding rules for "pre-construction contexts" clarified by JEP 447.
>
> After taking a serious look at the design of object construction, we've refined this approach. Rather than putting a blanket restriction on 'this' references, we want to encourage object state to be initialized *before* the 'super()' call, guaranteeing that it will be properly initialized before the language & verification rules allow any use of 'this'. Then we don't need to impose any restrictions on 'this'.
>
> So, for example, the fields of a value class must be initialized early. (Bikeshed: the fields and the class that declares them require "strict initialization", or something like that. Note, FWIW, that the JVM has an unused ACC_STRICT flag lying around...) We can relax the language rules to allow *writes*, but not *reads*, to instance fields in a JEP 447 "pre-construction context".
>
> value class Foo {
>     int x; int y;
>
>     public Foo(int z) {
>         this.x = z / 5;
>         // can't use 'this' here: if (x > 0) ...
>         this.y = z % 5;
>         super();
>         if (this.x == this.y) ... // no problem, can use 'this'
>     }
>
> }
>
> The concept of writing to fields before the 'super()' call may seem uncomfortable at first, but it has been allowed by the JVM for a long time. Enclosing instances of inner classes, in particular, have always been "strictly initialized", with javac generating the bytecode to set the field before it calls 'super()'.
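
The contrast drawn above, between an early-set enclosing instance and an ordinary late-set final field, can be sketched in Java that compiles today. This is only an illustration under stated assumptions: the names EarlyInitDemo, Base, Inner, label and demo are hypothetical and do not come from the thread, and the sketch relies on javac storing the enclosing instance before the 'super()' call, as the message describes.

```java
final class EarlyInitDemo {
    private final String label;

    EarlyInitDemo(String label) {
        this.label = label;
    }

    // Hypothetical superclass whose constructor calls an overridable method,
    // so it observes the subclass before the subclass constructor body runs.
    static abstract class Base {
        final String seenDuringSuper;

        Base() {
            seenDuringSuper = describe();
        }

        abstract String describe();
    }

    // Inner (non-static) class: javac writes the enclosing-instance field
    // before invoking super(), so 'label' is reachable during Base().
    final class Inner extends Base {
        final int value;

        Inner(int value) {
            super();            // today, field writes must follow this call
            this.value = value; // late write: Base() saw the default 0
        }

        @Override
        String describe() {
            return label + ":" + value;
        }
    }

    // Returns what Base() observed, followed by the state after construction.
    static String demo() {
        Inner inner = new EarlyInitDemo("outer").new Inner(7);
        return inner.seenDuringSuper + " -> " + inner.describe();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The enclosing instance is already usable while the superclass constructor runs, whereas the final field 'value' is first seen as 0 and later as 7. That observable change is what early ("strict") initialization is meant to rule out.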
>
> In a simple constructor without a 'super()' call, the implicit call no longer happens right at the start, allowing the fields to be properly initialized.
>
> value class Foo {
>     int x; int y;
>
>     public Foo(int z) {
>         this.x = z / 5;
>         // can't use 'this' here: if (x > 0) ...
>         this.y = z % 5;
>         // implicit: super();
>     }
> }
>
> One open question we have about implicit 'super()': if there are extra statements after all fields are initialized, do we go ahead with the 'super()' call as soon as possible? Or do we have a blanket rule that says 'super()' always goes last? There are pros and cons in both directions.
>
> 'this'-calling constructors are similar, except that they're not allowed to write to final fields, because that's the responsibility of the delegated-to constructor. But they can freely use 'this' after the 'this()' call.
>
> In corpus analysis, we've found that, for final fields whose initialization doesn't depend on 'this', the timing of super constructor execution can be proven, using some pretty simple heuristics, to be irrelevant in the overwhelming majority of cases. (E.g., maybe the subclass just assigns a parameter to the field; or maybe the superclass constructor is empty.)
>
> Can we generalize this strict initialization capability to 'final' fields in other classes? Sure! Allowing assignments to final fields (and perhaps fields in general) before an explicit 'super()' is straightforward. For compatibility reasons, it gets trickier to adjust the timing of implicit 'super()' calls, but there may be something we can do, maybe automatically or maybe with an opt in.
>
> And note the benefits on the other side: a 'final' field that is guaranteed never to be observed to mutate (again, maybe call it a "strict final field") can be optimized in ways that our existing "mostly final" fields cannot. As one example, flattened strict-final fields will never experience read/write races, so need not have atomic encodings.
>
>> 5) The no-arg constructor of 'Object' is 'regulated'. The default constructor of any other class (the one you get if you declare no constructor) is implicitly 'regulated' if the superclass's no-arg constructor is 'regulated'. (A slight incompatibility here that I think we can tolerate: if you have an implicit constructor but also an initializer that depends on 'this', an error occurs.)
>
> In general, there's no longer a "regulated" property to worry about/infer/advertise. If you safely initialize your fields before calling 'super()', you don't care what your superclass constructor does. Strict initialization is a local, implementation-level property of the class.
>
>> 6) Every constructor of a value class is implicitly 'regulated'. (Similar to the rule that says every instance field of a value class is 'final'.)
>
> Value classes must be "strictly initialized", i.e., the (always-final) fields of value classes must be initialized early.
>
> For superclasses of (concrete) value classes, we rely on the existing notion of identity/value/unrestricted abstract classes: value classes cannot extend identity classes; and in the value/unrestricted cases, all fields (if any) must be final and early-initialized. (Still working out details for how to infer/declare which category an abstract class belongs to.)
>
>> 7) In the class file, there's an ACC_REGULATED flag for methods, only allowed to be applied to <init> methods.
>>
>> 8) At class loading, an ACC_VALUE class requires that its constructors all be ACC_REGULATED.
>>
>> 9) Verification ensures that 'this' is never read from or passed out of an ACC_REGULATED method. Specifically, in such a method, the type of 'this' is 'uninitializedThis' even after the invokespecial super/this call. Verification also ensures that the target of an invokespecial super/this call is ACC_REGULATED. (See detailed rules below.)
>
> Still working out JVM details, but for runtime safety, what we need is some way to assert that a final field must be early-initialized, and then a mechanism to reject attempts to do 'putfield' after the 'super()' call.
>
> Tentatively:
>
> - Final instance fields could be marked ACC_STRICT. Or maybe we could infer this property based on context (like ACC_VALUE on the class).
>
> - A linkage error already rejects 'putfield' outside of a final field's constructor. So we just need to manage 'putfield' within <init>. And the verifier already distinguishes between "early" and "late" putfields (because the types involved are different). It would be trivial to add a verifier condition that the target of a "late" putfield must not be ACC_STRICT.
>
> - Alternatively, we have some ideas about tracking a "larval" state on objects, and dynamically rejecting a 'putfield' to an ACC_STRICT field of a non-larval object.
>
>> 10) The point at which control returns to an invokespecial of an <init> method that *doesn't* represent a super/this call is the point at which a larval object becomes promoted to a "real" value object and the verification type can be 'LValueClass;'. No spec changes here, other than conveying that concept.
>
> As suggested by the above rules, this approach moves the larval-->adult transition a bit earlier, to the entry point of the Object.<init> constructor. This is expressed with the existing verification type system, which transitions from 'uninitializedThis' to 'LFoo;'.
>
>> Some benefits of this approach:
>> - Garbage collecting special new opcodes and the methods, a big simplification
>> - Full binary compatibility when a class is refactored from identity to new, or vice versa
>> - (I think) ability to translate an identity class to a value class by simply flipping a flag bit
>> - Surfacing of a useful general-purpose concept: constructors that can be counted on not to leak 'this'
>> - Fewer restrictions on value objects' superclass constructors: allow code, access control, and checked exceptions
>> - Support for instance fields in superclasses of value objects (!)*
>>
>> (*We can logically allow for superclass fields, anyway. It's possible we'll decide there are implementation constraints that prevent implementing this immediately.)
>
> Basically the same list. Early initialization does require some change to the bytecode, so migration is not as simple (at the bytecode level) as adding 'ACC_VALUE'. But since most value class candidates don't need an explicit super() call, at the source level migration may be as easy as adding the 'value' modifier.
>
> The (very useful) general-purpose concept here is "strict construction" or "strict-final fields".
>
> This approach asks *even less* of value objects' superclasses: if they have fields, those will need to be early-initialized. Beyond that, the constructors can do whatever they want.
>
>> One risk is that ACC_REGULATED methods must not be instrumented in ways that make use of 'this'. This restriction applies to the constructor of Object. What are the chances this breaks somebody's tooling?
>
> This is one motivation for the shift in strategy: it actually seems fairly common for instrumentation users to want to add some this-dependent code to Object.<init>. (There are examples of this in our own documentation, in fact...)
>
> If a value class is strictly constructed, there is no problem with executing arbitrary instrumentation code into the Object.<init> constructor. That code should run just fine.
>

From forax at univ-mlv.fr Sat Dec 2 07:35:43 2023
From: forax at univ-mlv.fr (Remi Forax)
Date: Sat, 2 Dec 2023 08:35:43 +0100 (CET)
Subject: Rules for larval value object construction
In-Reply-To:
References:
Message-ID: <2052129002.70270761.1701502543711.JavaMail.zimbra@univ-eiffel.fr>

----- Original Message -----
> From: "John Rose"
> To: "daniel smith"
> Cc: "valhalla-spec-experts"
> Sent: Friday, December 1, 2023 11:01:42 PM
> Subject: Re: Rules for larval value object construction

> Thanks for the update Dan. I am very encouraged with the increased simplicity
> and clarity of our new primitive notions.
>
> The original concept of value classes factors into final classes with strictly
> initialized finals and no non-final instance fields, plus identical
> restrictions on all their supers. By fiat they also are given identity masking
> and, optionally, implicit zero-construction and, optionally again, relaxed
> field consistency. This constellation of features unlocks a new set of
> optimizations.

For identity classes, I do not think we can avoid an opt-in mechanism, given that the semantics of calling the super constructor early or late are different:

/*abstract or not*/ class A {
  A() {
    System.out.println("side effect " + this);
  }
}

final class B extends A {
  final int value;

  B(int value) {
    super(); // either: early super call (inserted by the compiler)
    this.value = value;
    super(); // or: late super call (inserted by the compiler)
  }
}

If we use an opt-in mechanism, I would prefer it to be a class-wide opt-in, because we already have the concept of stricter final fields (final fields of hidden classes and records are not modifiable by reflection) and I do not really see the point of mixing early-initialized final fields and late-initialized final fields.

So let's say we have a __strict__ keyword on classes. A __strict__ class must have all its final fields initialized before calling the super-constructor.
For a constructor of a __strict__ class, the compiler inserts the call to super() at the end. If people want to use "this" after the initialization of the fields, they are required to write the super call explicitly.

At the VM level, we have a modifier bit on the class saying the class is __strict__.

All value classes are __strict__ by default, and their super classes must be strict too. Records are modified to be __strict__ by default. All non-mutable classes of the packages java.lang/java.util, like java.lang.Enum and java.lang.String, are modified to be __strict__.

At some point, javac/IDEs should warn users to declare the class __strict__ if there is at least one final field, to raise awareness. Later, like we have done with ACC_SUPER, we can mandate all classes to be __strict__ for a specific version of the class file.

Obviously, we can discuss taking a shortcut and jumping straight to mandating all classes to be __strict__ without introducing a keyword, using only the class file version. My fear is that people will remove the "final" keyword if the compiler starts to ask for all final fields to be initialized before the super() call, but I may be wrong.

regards,
Rémi

From daniel.smith at oracle.com Wed Dec 13 00:29:26 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 13 Dec 2023 00:29:26 +0000
Subject: Fwd: Field initialization before 'super'
References:
Message-ID:

FYI, proposed strategy for introducing early field initialization as part of the Statements Before Super JEP, with value classes layering on top of this.

Begin forwarded message:

From: Dan Smith
Subject: Field initialization before 'super'
Date: December 12, 2023 at 4:27:24 PM PST
To: amber-spec-experts

In Valhalla we've been building on the language changes in JEP 447 (Statements Before Super) to move towards a safer and more reliable programming pattern for initializing final fields. Some of these ideas could make their way into the next iteration of Statements Before Super, to be further augmented with Value Classes (JEP 401).

Two key observations:

- Inside <init> methods, the JVM allows writes to instance fields of "uninitialized" objects, before the 'super()' call. (In fact, javac has long used this capability to initialize fields that store captured state of inner classes.)

- When a 'final' field is written before the 'super()' call, it is impossible to observe the field prior to its initialization.
Thus, the field can be treated as truly immutable: every 'getfield' on the same instance will return the same value. (In contrast, in existing usage, uses of final fields may observe mutation if the object might still be under construction.)

To enable and take advantage of early field initialization, we've envisioned the following changes:

1) As an exception to the general rule about 'this' usage, a "pre-construction context" allows writes to blank instance fields of the class. (The terminology may need updating, since you're clearly "constructing" the object if you're writing to its fields.)

The fields are "write-only" at this stage: you can write into them but can't read them back.

The regular DA/DU rules apply for final fields: they must be initialized exactly once by an initializer or by every 'super()'-calling constructor, whether in the prologue or the epilogue.

At a 'this()' call, all final fields must be DU (because the delegated constructor will perform its own writes). No such restriction is needed for non-final fields; but it's an open question whether we should prohibit all writes before 'this()' anyway.

Writes to non-final fields with initializers are disallowed, to avoid confusion about sequencing (the field initializer will always run later, overwriting whatever you put in the constructor prologue.)

2) If a final field is written before 'super()' via every constructor in the class, it can be considered a "strict final" field. It will never be observed to mutate.

In the class file, ACC_STRICT is repurposed to indicate a strict final field. javac is responsible for identifying strict final fields. Existing early-initialized capture fields can probably be automatically counted as strict finals.

ACC_STRICT implies ACC_FINAL and !ACC_STATIC. Verification ensures that a 'putfield' for an ACC_STRICT field of the current class never occurs after the 'super()' call. (Specifically, the receiver type for the putfield must be 'uninitializedThis', not a class type.)

3) Immutability of strict finals is a strong guarantee.

JVM internals may treat strict final fields as truly immutable, without supporting any deopt paths when unexpected mutation occurs.

The 'Field.setAccessible' method, which provides a standard API mechanism for mutating final fields, considers strict finals to be "non-modifiable", and will not enable reflective writes. (It already does the same for record fields.)

Standard deserialization ensures strict finals are set, and so their values deserialized, before the object under construction is leaked to any user code. This probably means back references to an object from its own strict final fields are unsupported, and deserialize to 'null'. (Records already behave in this way.)

Unsafe and JNI are capable of performing arbitrary, type-unsafe modifications to field storage. Clients who modify strict finals do so at their own risk, and JVM optimizations won't try to account for such usage.

-----

That covers "phase 1" for this feature. Eventually, we'll want to address questions like

- What about fields with initializers?
- Can I have my implicit 'super()' call go at the end of my constructor?
- Can javac check for me that my fields are strict?

These sorts of capabilities probably make sense to introduce with value classes, and perhaps retrofit on records. Further design work needed to figure out how to release them for general consumption. All of that can be considered "phase 2", to come later.

But for Statements Before Super, we're just proposing to start with (1), (2), and (3).

I realize (2) and especially (3) are stretching the original concept of this JEP (which was purely language/compiler-oriented). But I think, from end users' perspective, it will all feel like the same feature. If wanted, though, I could see doing those pieces in their own JEP in parallel with Statements Before Super.
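
The parenthetical about record fields in item (3) can be checked on a JDK that supports records (Java 16 or later). The following is a small sketch, not part of the proposal: the names StrictFinalReflectionDemo, Plain, Rec and tryReflectiveWrite are hypothetical. It relies on the documented behavior of 'Field.set', which refuses to write a final field declared in a record class even after 'setAccessible(true)' has succeeded.

```java
import java.lang.reflect.Field;

final class StrictFinalReflectionDemo {
    // Ordinary class: its final instance field is only "mostly final".
    static final class Plain {
        final int x;

        Plain(int x) {
            this.x = x;
        }
    }

    // Record: reflection treats its final fields as non-modifiable.
    record Rec(int x) {}

    // Attempts a reflective write; returns true only if no exception occurred.
    static boolean tryReflectiveWrite(Object target, String fieldName, int newValue) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            field.setInt(target, newValue);
            return true;
        } catch (ReflectiveOperationException | RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Rec rec = new Rec(1);
        System.out.println("record write succeeded: " + tryReflectiveWrite(rec, "x", 2));
        System.out.println("record value afterwards: " + rec.x());
        // The outcome for the ordinary class depends on the JDK version and flags.
        System.out.println("plain write succeeded: " + tryReflectiveWrite(new Plain(1), "x", 2));
    }
}
```

The write to the record field is rejected and the record keeps its original value. Under the proposal, strict final fields of ordinary classes would presumably be treated the same way; that is an inference from the message, not tested behavior.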
From daniel.smith at oracle.com Wed Dec 13 15:37:04 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 13 Dec 2023 15:37:04 +0000
Subject: EG meeting 2023-12-13
Message-ID: <240E9C1B-12EB-4181-BD58-EFFF481D29DB@oracle.com>

EG meeting today, 5pm UTC (9am PST, 12pm EST).

I've updated last time's "rules for larval value object construction" with some spin-off details building on the Statements Before Super JEP, something that can be pursued in the Amber project.

From daniel.smith at oracle.com Fri Dec 15 20:22:05 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Fri, 15 Dec 2023 20:22:05 +0000
Subject: Simplifying 'value' and 'identity'
Message-ID:

When we last checked in on it, our story for superclasses/interfaces of value classes was as follows:

In general, classes/interfaces are "unconstrained" and support subclassing by both value and identity classes. The 'value' and 'identity' keywords act as a form of "sealing", restricting subclassing to only classes of the given type. Any class/interface that extends classes/interfaces with both keywords is in error.

As a special case, concrete classes and some abstract classes with certain properties are implicitly 'identity' classes.

This approach has a three-state categorization scheme expressed with two keywords ('value', 'identity', and "neither"). This corresponds to two JVM flags (ACC_IDENTITY and ACC_VALUE).

I think it holds together fairly well, but it's also somewhat difficult to communicate, and not as intuitive as we might like. I find myself periodically reminding people (and sometimes myself): "don't forget that there are both value abstract classes and unconstrained abstract classes" or "unconstrained classes can't have mutable fields, those are only for identity classes".
As we've revised the construction mechanism, we've found ourselves searching for an explicit keyword for the third, unconstrained state: "value-capable" or "universal" or something. None of these keywords are very compelling, and I don't relish the job of trying to get people to adopt these fairly obscure, infrequently used terms. ----- If we're willing to give up some fairly marginal fine-grained controls, this story can be simplified significantly. Concrete classes: nothing new: a concrete class is an identity class by default, but may opt out of identity with the 'value' keyword. Concrete value classes are implicitly final and subject to a handful of extra constraints. Abstract classes: an abstract class is also an identity class by default, but may use the 'value' modifier to indicate that it doesn't require identity. This is not the same thing as saying its subclasses *must not* make use of identity, it's just an assertion about the abstract class itself (and its supers). Both value classes and identity classes may extend an abstract value class. These abstract classes are subject to the same constraints as concrete value classes. Interfaces: all interfaces can be implemented by value classes and identity classes. Full stop. If you have a particular need to limit the kinds of implementing classes, you can use a sealed interface and provide the implementation yourself, or you can make it an informal part of the contract. But, like access control, synchronization, and other fine-grained, implementation-oriented features, identity is not something interfaces can explicitly control. This approach has a two-state categorization scheme for classes expressed with one keyword ('value' and identity). This corresponds to one JVM flag (ACC_IDENTITY, for {reasons}). Interfaces have only one state. What do we lose?
- You can't force an abstract class/interface to be implemented by *only* value classes - You can't force an interface to be implemented by *only* identity classes - You can't declare an abstract class or interface whose type will refuse to support the 'synchronized' keyword I don't think any of these are enough to justify the extra costs of 3 states or an 'identity' keyword. (For awhile, I was considering keeping around the 'identity' keyword, just for the purpose of interfaces. But, eh, nobody is going to use it.) From brian.goetz at oracle.com Fri Dec 15 20:33:58 2023 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 15 Dec 2023 15:33:58 -0500 Subject: Simplifying 'value' and 'identity' In-Reply-To: References: Message-ID: This seems perfectly sensible to me. On 12/15/2023 3:22 PM, Dan Smith wrote: > When we last checked in on it, our story for superclasses/interfaces of value classes was as follows: > > In general, classes/interfaces are "unconstrained" and support subclassing by both value and identity classes. The 'value' and 'identity' keywords act as a form of "sealing", restricting subclassing to only classes of the given type. Any class/interface that extends classes/interfaces with both keywords is in error. > > As a special case, concrete classes and some abstract classes with certain properties are implicitly 'identity' classes. > > This approach has a three-state categorization scheme expressed with two keywords ('value', 'identity', and "neither"). This corresponds to two JVM flags (ACC_IDENTITY and ACC_VALUE). > > I think it holds together fairly well, but it's also somewhat difficult to communicate, and not as intuitive as we might like. I find myself periodically reminding people (and sometimes myself): "don't forget that there are both value abstract classes and unconstrained abstract classes" or "unconstrained classes can't have mutable fields, those are only for identity classes". 
> > As we've revised the construction mechanism, we've found ourselves searching for an explicit keyword for the third, unconstrained state: "value-capable" or "universal" or something. None of these keywords are very compelling, and I don't relish the job of trying to get people to adopt these fairly obscure, infrequently used terms. > > ----- > > If we're willing to give up some fairly marginal fine-grained controls, this story can be simplified significantly. > > Concrete classes: nothing new: a concrete class is an identity class by default, but may opt out of identity with the 'value' keyword. Concrete value classes are implicitly final and subject to a handful of extra constraints. > > Abstract classes: an abstract class is also an identity class by default, but may use the 'value' modifier to indicate that it doesn't require identity. This is not the same thing as saying its subclasses *must not* make use of identity, it's just an assertion about the abstract class itself (and its supers). Both value classes and identity classes may extend an abstract value class. These abstract classes are subject to the same constraints as concrete value classes. > > Interfaces: all interfaces can be implemented by value classes and identity classes. Full stop. If you have a particular need to limit the kinds of implementing classes, you can use a sealed interface and provide the implementation yourself, or you can make it an informal part of the contract. But, like access control, synchronization, and other fine-grained, implementation-oriented features, identity is not something interfaces can explicitly control. > > This approach has a two-state categorization scheme for classes expressed with one keyword ('value' and identity). This corresponds to one JVM flag (ACC_IDENTITY, for {reasons}). Interfaces have only one state. > > What do we lose?
> - You can't force an abstract class/interface to be implemented by *only* value classes > - You can't force an interface to be implemented by *only* identity classes > - You can't declare an abstract class or interface whose type will refuse to support the 'synchronized' keyword > > I don't think any of these are enough to justify the extra costs of 3 states or an 'identity' keyword. (For awhile, I was considering keeping around the 'identity' keyword, just for the purpose of interfaces. But, eh, nobody is going to use it.) > From brian.goetz at oracle.com Fri Dec 15 21:08:03 2023 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 15 Dec 2023 16:08:03 -0500 Subject: Fwd: Field initialization before 'super' In-Reply-To: References: Message-ID: <8111f92e-e0de-4d60-9d82-1950de0be8a7@oracle.com> > The 'Field.setAccessible' method, which provides a standard API > mechanism for mutating final fields, considers strict finals to be > "non-modifiable", and will not enable reflective writes. (It already > does the same for record fields.) We discussed this at the EG meeting and I'm still uncomfortable. There are two not-completely-coincident sets of constraints: - Final instance fields that can be modified using setAccessible - Final instance fields that are trusted by the VM to stay constant The former is specified as: final instance fields that are not in records and hidden classes, as well as the modularity constraints that prevent use of setAccessible in the absence of an "opens" edge (see AccessibleObject::checkCanSetAccessible). The latter is a larger and somewhat more embarrassing set: hidden classes, box classes, record classes, String, atomic field updaters, and a whitelisted set of packages (java.lang, java.lang.{invoke,reflect}, sun.invoke, jdk.internal.{reflect,foreign.layout,foreign,vm.vector}, jdk.incubator.vector), except for the weird in/out fields in System.
Plus a flag to just turn on "trust them all, except the weird ones in System". The former is specified through JDK specifications; the latter is a VM implementation detail that conservatively gives up optimization when we cannot prove that initialization is race-free. Adding "value classes" to both lists seems reasonable. Driving towards fewer modifiable final fields is a desirable goal; driving towards all final fields being trustable is also. But this proposal also adds another entry to both lists: "final fields whose initialization precedes the super-call in all constructors." This is not an easily describable property of the programming model, and it is pretty hard to explain why this is relevant. Their presence on the first list may show up as weird breakage with existing frameworks under harmless-looking refactorings. Plus, since we don't have a reasonable description for these fields at the language level (unlike "fields in records, hidden classes, or value classes"), it's hard to even talk about. The stated purpose of the setAccessible loophole is deserialization; the existence of the non-trusting of final fields is due to the possibility that the JIT might inline the wrong value, either due to races (if the field is set after this escapes) or setAccessible shenanigans. On the other hand, I think we all agree that we don't want a `really-final-i-mean-it` modifier on fields, and that if we'd like for most classes to be strictly initialized, then having a strict modifier on classes/ctors is another "wrong default". tl;dr: I think the "strict field" formulation beyond that needed for value classes needs some more bake time.
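For readers following along, the existing record-field behavior referred to above can be observed with plain reflection. This is only a minimal sketch, not part of the proposal: the `Point` record, the `Plain` class, and the `reflectiveWriteAllowed` helper are hypothetical names made up for illustration. It assumes a JDK with records (16 or later) and code running in the unnamed module. The outcome for the ordinary final field is deliberately not asserted, since it depends on the JDK release and command-line flags.

```java
import java.lang.reflect.Field;

public class RecordFinalDemo {
    // Hypothetical record, used only for illustration.
    record Point(int x, int y) {}

    // Hypothetical ordinary class with a plain final instance field.
    static class Plain {
        private final int x;
        Plain(int x) { this.x = x; }
        int x() { return x; }
    }

    /** Returns true if a reflective int write to the named field is accepted. */
    static boolean reflectiveWriteAllowed(Object target, String fieldName)
            throws NoSuchFieldException {
        Field f = target.getClass().getDeclaredField(fieldName);
        // setAccessible itself succeeds here; the final-field check happens on write.
        f.setAccessible(true);
        try {
            f.setInt(target, 42);
            return true;
        } catch (IllegalAccessException e) {
            // Field.set rejects writes to final fields declared in record classes.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("record field writable: "
                + reflectiveWriteAllowed(new Point(1, 2), "x"));
        // Result for an ordinary final field may vary by JDK release and flags.
        System.out.println("plain final writable:  "
                + reflectiveWriteAllowed(new Plain(1), "x"));
    }
}
```

The record case is the one the quoted proposal would extend to strict finals; the plain-class case is the "setAccessible loophole" whose fate is under discussion.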
From daniel.smith at oracle.com Fri Dec 15 21:22:04 2023 From: daniel.smith at oracle.com (Dan Smith) Date: Fri, 15 Dec 2023 21:22:04 +0000 Subject: Simplifying 'value' and 'identity' In-Reply-To: References: Message-ID: Two more notes, just to keep all of these ideas in one place: > On Dec 15, 2023, at 12:22 PM, Dan Smith wrote: > > Abstract classes: an abstract class is also an identity class by default, but may use the 'value' modifier to indicate that it doesn't require identity. > What do we lose? > - You can't force an abstract class/interface to be implemented by *only* value classes > - You can't force an interface to be implemented by *only* identity classes > - You can't declare an abstract class or interface whose type will refuse to support the 'synchronized' keyword Another noteworthy difference is that stateless abstract classes must *explicitly* opt in to supporting value classes, whereas before we were doing so implicitly. This is really an orthogonal choice. If we wanted to, we could roll back this aspect of the change by continuing to infer that certain abstract classes are value-compatible (in this case, that amounts to inferring the 'value' keyword). But I've come around to the idea that a simple "all classes are identity by default" rule is good, and that the binary compatibility commitment associated with "value" (e.g., the class will never add a mutable field) is better stated explicitly. > (For awhile, I was considering keeping around the 'identity' keyword, just for the purpose of interfaces. But, eh, nobody is going to use it.) One problem I noticed with a two-state interfaces approach is that interfaces need to be "value interfaces" by default, but we probably don't want to allow a "value interface" to extend an "identity interface", because in this iteration 'value' is an assertion about supertypes. (In contrast to the 3-state case, where it was fine to allow an "unconstrained interface" to extend an "identity interface".) 
Implication: if I'm going to add the 'identity' modifier to my interface, I first have to be sure that any subinterfaces out there are also 'identity' interfaces. But if the interface is public, how could I be sure? And if the interface is not public, what is the point of applying the 'identity' constraint at all? Essentially, this version of 'identity interface' would only be useful for newly-declared interfaces. From maurizio.cimadamore at oracle.com Mon Dec 18 11:03:31 2023 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Mon, 18 Dec 2023 11:03:31 +0000 Subject: Simplifying 'value' and 'identity' In-Reply-To: References: Message-ID: On 15/12/2023 20:22, Dan Smith wrote: > it's just an assertion about the abstract class itself (and its supers) I think the rules here have to say something more about what a value abstract class can and cannot extend? E.g. I suppose (hope) this is invalid:
```java
class Foo { ... }
value abstract class Bar extends Foo { ... }
```
Is this what you mean by > (and its supers) ? Maurizio From maurizio.cimadamore at oracle.com Mon Dec 18 11:06:51 2023 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Mon, 18 Dec 2023 11:06:51 +0000 Subject: Simplifying 'value' and 'identity' In-Reply-To: References: Message-ID: > But I've come around to the idea that a simple "all classes are identity by default" rule is good, and that the binary compatibility commitment associated with "value" (e.g., the class will never add a mutable field) is better stated explicitly. I agree with this - it makes the mental map so much easier! > >> (For awhile, I was considering keeping around the 'identity' keyword, just for the purpose of interfaces. But, eh, nobody is going to use it.)
> One problem I noticed with a two-state interfaces approach is that interfaces need to be "value interfaces" by default, but we probably don't want to allow a "value interface" to extend an "identity interface", because in this iteration 'value' is an assertion about supertypes. (In contrast to the 3-state case, where it was fine to allow an "unconstrained interface" to extend an "identity interface".) Yep, allowing a class to implement a mix of value and non-value interfaces seems messy - saying that all interfaces are "value" seems much better. Maurizio From daniel.smith at oracle.com Mon Dec 18 16:15:13 2023 From: daniel.smith at oracle.com (Dan Smith) Date: Mon, 18 Dec 2023 16:15:13 +0000 Subject: Simplifying 'value' and 'identity' In-Reply-To: References: Message-ID: <6EAFC130-9E7C-46E5-9805-388C9B71A4A6@oracle.com> > On Dec 18, 2023, at 3:03 AM, Maurizio Cimadamore wrote: > > On 15/12/2023 20:22, Dan Smith wrote: >> it's just an assertion about the abstract class itself (and its supers) > > I think the rules here have to say something more about what a value abstract class can and cannot extend? E.g. I suppose (hope) this is invalid:
> ```java
> class Foo { ... }
> value abstract class Bar extends Foo { ... }
> ```
> Is this what you mean by > >> (and its supers) > ? Yes, that's right. The rule is that a value class must extend another value class (necessarily abstract), or Object.