From daniel.smith at oracle.com Wed Nov 1 15:26:02 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 1 Nov 2023 15:26:02 +0000
Subject: EG meeting *canceled* 2023-11-01
Message-ID: <97E0F3B4-BA2F-47E7-B2C7-E95F6B684803@oracle.com>

We've got some conflicts and holidays today, so let's skip the EG meeting.

From daniel.smith at oracle.com Wed Nov 15 05:49:15 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 15 Nov 2023 05:49:15 +0000
Subject: EG meeting *canceled* 2023-11-15
Message-ID: <9016F963-A22F-4074-ADEE-90A4D7163275@oracle.com>

It's been quiet on here, sorry, we've been working internally on some details of value class initialization. I think we'll have something useful to share soon. But nothing quite ready to talk about this week.

From daniel.smith at oracle.com Wed Nov 29 01:03:20 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 29 Nov 2023 01:03:20 +0000
Subject: Rules for larval value object construction
In-Reply-To:
References:
Message-ID:

Here's an update on how these ideas about construction have panned out as we've dug more into them. Some of the details are still in flux, but I'll try to highlight what's settled and what's still being explored.

> On Sep 19, 2023, at 5:24 PM, Dan Smith wrote:
>
> We've spent some time investigating the idea of larval value object construction, and are enthusiastic about this change. Here are some details.
>
> 1) In the JVM, we no longer need the special methods or the 'aconst_init' and 'withfield' instructions. Drop them.

Yep, check.

> 2) Value objects are constructed just like identity objects, via field mutation. However, the "object" being mutated is in a larval state and we want to give implementations a lot of flexibility in how these larval objects are implemented. Thus, the language and JVM must ensure that a larval object is never observable; it can be written to but not read.

Yes.
This is the big idea: constrain access to the mutable larval instance, only allow programs to observe the instance once we no longer need mutation/identity.

> 3) New language concept/modifier: a 'regulated' constructor. (Trying a name out here, haven't put a ton of thought into naming. Feel free to bikeshed.) The 'regulated' keyword can be applied to any constructor in any class.
>
> 4) A 'regulated' constructor promises not to make any use of 'this'. An error occurs if this promise is violated in any of a number of different ways: direct reference, instance field access (except for assignment targets), instance method invocation, implicit use as an enclosing instance, any of these occurring in an initializer expression/block, and, importantly, a 'super()' or 'this()' call to a different non-regulated constructor. These checks are very similar to the longstanding rules for "pre-construction contexts" clarified by JEP 447.

After taking a serious look at the design of object construction, we've refined this approach. Rather than putting a blanket restriction on 'this' references, we want to encourage object state to be initialized *before* the 'super()' call, guaranteeing that it will be properly initialized before the language & verification rules allow any use of 'this'. Then we don't need to impose any restrictions on 'this'.

So, for example, the fields of a value class must be initialized early. (Bikeshed: the fields and the class that declares them require "strict initialization", or something like that. Note, FWIW, that the JVM has an unused ACC_STRICT flag lying around...)

We can relax the language rules to allow *writes*, but not *reads*, to instance fields in a JEP 447 "pre-construction context".

value class Foo {
    int x;
    int y;

    public Foo(int z) {
        this.x = z / 5;
        // can't use 'this' here: if (x > 0) ...
        this.y = z % 5;
        super();
        if (this.x == this.y) ... // no problem, can use 'this'
    }
}

The concept of writing to fields before the 'super()' call may seem uncomfortable at first, but it has been allowed by the JVM for a long time. Enclosing instances of inner classes, in particular, have always been "strictly initialized", with javac generating the bytecode to set the field before it calls 'super()'.

In a simple constructor without a 'super()' call, the implicit call no longer happens right at the start, allowing the fields to be properly initialized.

value class Foo {
    int x;
    int y;

    public Foo(int z) {
        this.x = z / 5;
        // can't use 'this' here: if (x > 0) ...
        this.y = z % 5;
        // implicit: super();
    }
}

One open question we have about implicit 'super()': if there are extra statements after all fields are initialized, do we go ahead with the 'super()' call as soon as possible? Or do we have a blanket rule that says 'super()' always goes last? There are pros and cons in both directions.

'this'-calling constructors are similar, except that they're not allowed to write to final fields, because that's the responsibility of the delegated-to constructor. But they can freely use 'this' after the 'this()' call.

In corpus analysis, we've found that, for final fields whose initialization doesn't depend on 'this', the timing of super constructor execution can be proven, using some pretty simple heuristics, to be irrelevant in the overwhelming majority of cases. (E.g., maybe the subclass just assigns a parameter to the field; or maybe the superclass constructor is empty.)

Can we generalize this strict initialization capability to 'final' fields in other classes? Sure! Allowing assignments to final fields (and perhaps fields in general) before an explicit 'super()' is straightforward. For compatibility reasons, it gets trickier to adjust the timing of implicit 'super()' calls, but there may be something we can do, maybe automatically or maybe with an opt in.
And note the benefits on the other side: a 'final' field that is guaranteed never to be observed to mutate (again, maybe call it a "strict final field") can be optimized in ways that our existing "mostly final" fields cannot. As one example, flattened strict-final fields will never experience read/write races, so need not have atomic encodings.

> 5) The no-arg constructor of 'Object' is 'regulated'. The default constructor of any other class (the one you get if you declare no constructor) is implicitly 'regulated' if the superclass's no-arg constructor is 'regulated'. (A slight incompatibility here that I think we can tolerate: if you have an implicit constructor but also an initializer that depends on 'this', an error occurs.)

In general, there's no longer a "regulated" property to worry about/infer/advertise. If you safely initialize your fields before calling 'super()', you don't care what your superclass constructor does. Strict initialization is a local, implementation-level property of the class.

> 6) Every constructor of a value class is implicitly 'regulated'. (Similar to the rule that says every instance field of a value class is 'final'.)

Value classes must be "strictly initialized", i.e., the (always-final) fields of value classes must be initialized early.

For superclasses of (concrete) value classes, we rely on the existing notion of identity/value/unrestricted abstract classes: value classes cannot extend identity classes; and in the value/unrestricted cases, all fields (if any) must be final and early-initialized. (Still working out details for how to infer/declare which category an abstract class belongs to.)

> 7) In the class file, there's an ACC_REGULATED flag for methods, only allowed to be applied to methods.
>
> 8) At class loading, an ACC_VALUE class requires that its constructors all be ACC_REGULATED.
>
> 9) Verification ensures that 'this' is never read from or passed out of an ACC_REGULATED method.
> Specifically, in such a method, the type of 'this' is 'uninitializedThis' even after the invokespecial super/this call. Verification also ensures that the target of an invokespecial super/this call is ACC_REGULATED. (See detailed rules below.)

Still working out JVM details, but for runtime safety, what we need is some way to assert that a final field must be early-initialized, and then a mechanism to reject attempts to do 'putfield' after the 'super()' call.

Tentatively:

- Final instance fields could be marked ACC_STRICT. Or maybe we could infer this property based on context (like ACC_VALUE on the class).

- A linkage error already rejects 'putfield' outside of a final field's constructor. So we just need to manage 'putfield' within <init>. And the verifier already distinguishes between "early" and "late" putfields (because the types involved are different). It would be trivial to add a verifier condition that the target of a "late" putfield must not be ACC_STRICT.

- Alternatively, we have some ideas about tracking a "larval" state on objects, and dynamically rejecting a 'putfield' to an ACC_STRICT field of a non-larval object.

> 10) The point at which control returns to an invokespecial of an <init> method that *doesn't* represent a super/this call is the point at which a larval object becomes promoted to a "real" value object and the verification type can be 'LValueClass;'. No spec changes here, other than conveying that concept.

As suggested by the above rules, this approach moves the larval-->adult transition a bit earlier, to the entry point of the Object.<init> constructor. This is expressed with the existing verification type system, which transitions from 'uninitializedThis' to 'LFoo;'.
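
The tentative verifier condition described above (a "late" putfield must not target an ACC_STRICT field, and strict fields must be written before the 'super()' call) can be modeled in a few lines of Java. This is only a toy sketch under stated assumptions, not JVM code: the names StrictInitCheck, Insn, and check are hypothetical, the constructor body is treated as a straight-line list with no branches, and real verification works on bytecode and verification types rather than on a list of objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StrictInitCheck {
    /** Hypothetical, simplified constructor instruction: a field write or the super() call. */
    static final class Insn {
        final String putFieldTarget; // null stands for "invokespecial <init>" on the superclass

        private Insn(String putFieldTarget) { this.putFieldTarget = putFieldTarget; }
        static Insn putField(String name) { return new Insn(name); }
        static Insn invokeSuperInit() { return new Insn(null); }
        boolean isSuperCall() { return putFieldTarget == null; }
    }

    /** Returns the violations found; an empty list means the body passes this toy check. */
    static List<String> check(List<Insn> body, Set<String> strictFields) {
        List<String> errors = new ArrayList<>();
        Set<String> assignedEarly = new HashSet<>();
        boolean afterSuper = false;
        for (Insn insn : body) {
            if (insn.isSuperCall()) {
                // Assumed rule: every strict field must already be written at this point.
                for (String f : strictFields) {
                    if (!assignedEarly.contains(f)) {
                        errors.add("strict field not initialized before super(): " + f);
                    }
                }
                afterSuper = true;
            } else if (afterSuper && strictFields.contains(insn.putFieldTarget)) {
                // The "late putfield must not be ACC_STRICT" condition.
                errors.add("late putfield to strict field: " + insn.putFieldTarget);
            } else if (!afterSuper) {
                assignedEarly.add(insn.putFieldTarget);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Set<String> strict = new HashSet<>(Arrays.asList("x", "y"));
        List<Insn> early = Arrays.asList(
                Insn.putField("x"), Insn.putField("y"), Insn.invokeSuperInit());
        List<Insn> late = Arrays.asList(
                Insn.putField("x"), Insn.invokeSuperInit(), Insn.putField("y"));
        System.out.println(check(early, strict)); // no violations
        System.out.println(check(late, strict));  // two violations, both about y
    }
}
```

The first body mirrors the Foo constructor from earlier in this message (both fields written, then 'super()'); the second moves one write after the call and is rejected on both counts.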
> Some benefits of this approach:
> - Garbage collecting special new opcodes and the methods, a big simplification
> - Full binary compatibility when a class is refactored from identity to new, or vice versa
> - (I think) ability to translate an identity class to a value class by simply flipping a flag bit
> - Surfacing of a useful general-purpose concept: constructors that can be counted on not to leak 'this'
> - Fewer restrictions on value objects' superclass constructors: allow code, access control, and checked exceptions
> - Support for instance fields in superclasses of value objects (!)*
>
> (*We can logically allow for superclass fields, anyway. It's possible we'll decide there are implementation constraints that prevent implementing this immediately.)

Basically the same list.

Early initialization does require some change to the bytecode, so migration is not as simple (at the bytecode level) as adding 'ACC_VALUE'. But since most value class candidates don't need an explicit super() call, at the source level migration may be as easy as adding the 'value' modifier.

The (very useful) general-purpose concept here is "strict construction" or "strict-final fields".

This approach asks *even less* of value objects' superclasses: if they have fields, those will need to be early-initialized. Beyond that, the constructors can do whatever they want.

> One risk is that ACC_REGULATED methods must not be instrumented in ways that make use of 'this'. This restriction applies to the constructor of Object. What are the chances this breaks somebody's tooling?

This is one motivation for the shift in strategy: it actually seems fairly common for instrumentation users to want to add some this-dependent code to Object.<init>. (There are examples of this in our own documentation, in fact...)

If a value class is strictly constructed, there is no problem with executing arbitrary instrumentation code in the Object.<init> constructor. That code should run just fine.
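
The contrast drawn earlier in this message between "mostly final" fields and the always-early enclosing-instance field of an inner class can be observed in current Java. The following is a minimal sketch, assuming an ordinary javac; the class names EarlyInitDemo, Base, and Inner are made up for illustration. The superclass constructor calls an overridable method, which then sees the final field 'x' at its default value but finds the enclosing instance already in place.

```java
public class EarlyInitDemo {
    static class Base {
        final String seenBySuper;

        Base() {
            // Calling an overridable method here lets subclass code run
            // before the subclass constructor body has assigned its fields.
            seenBySuper = describe();
        }

        String describe() { return "base"; }
    }

    final String label;

    EarlyInitDemo(String label) { this.label = label; }

    // Inner (non-static) class: it reaches 'label' through its enclosing instance.
    class Inner extends Base {
        final int x;

        Inner(int x) {
            super();     // Base() runs describe() before the next line executes
            this.x = x;  // a "late" write to a final field
        }

        @Override
        String describe() {
            return label + ":" + x;
        }
    }

    public static void main(String[] args) {
        Inner inner = new EarlyInitDemo("outer").new Inner(42);
        System.out.println(inner.seenBySuper); // outer:0  (x observed before assignment)
        System.out.println(inner.describe());  // outer:42 (x observed after assignment)
    }
}
```

The final field 'x' is seen with two different values, which is the kind of observation a strict-final field would rule out, while the enclosing instance is usable even during the superclass constructor.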
From daniel.smith at oracle.com Wed Nov 29 16:52:33 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 29 Nov 2023 16:52:33 +0000
Subject: EG meeting 2023-11-29
Message-ID:

Didn't send this out earlier, sorry. But we'll hold our scheduled EG meeting today. 5pm UTC (9am PST, 12pm EST).

We can discuss the "rules for larval value object construction" updates.