From heidinga at redhat.com Thu Jun 1 14:47:49 2023
From: heidinga at redhat.com (Dan Heidinga)
Date: Thu, 1 Jun 2023 10:47:49 -0400
Subject: implicit constructor translation?

Pulling on a couple of threads to make sure I have the translation strategy for the implicit constructors straight.

Given a class like Complex with an implicit constructor:
```
value class Complex {
    private int re;
    private int im;

    public implicit Complex();
    public Complex(int re, int im) { ... }

    ...
}
```
and JEP 401's ImplicitCreation attribute:
```
ImplicitCreation_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 implicit_creation_flags;
}
```
The "obvious" translation is to generate an ImplicitCreation attribute if there is an implicit constructor, which seems reasonable.

The piece I'm looking for clarification on is whether there will also be a `method_info` in the classfile representing the implicit constructor.

If the implicit constructor has a `method_info`, then it will naturally be represented in the same way as the explicit `Complex(int, int)` constructor. This means both will be found by reflective operations (i.e., by j.l.Class::getConstructor) without special casing. Users that expect two constructors will find them both in the classfile and reflectively.

Alas, representing implicit constructors with a `method_info` is not without costs: primarily spec'ing how the method_info exists and explaining why it doesn't have a Code attribute.

I know this has been mentioned on the EG calls, and I don't recall a final decision or see it in the spec drafts / documents so far. Was a conclusion reached on how to do the translation?

--Dan

From brian.goetz at oracle.com Thu Jun 1 17:34:53 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 1 Jun 2023 13:34:53 -0400
Subject: Reader mail bag

Some comments received on the -comments list over the past month or so:

- https://mail.openjdk.org/pipermail/valhalla-spec-comments/2023-June/000049.html (Quan Mai)
- https://mail.openjdk.org/pipermail/valhalla-spec-comments/2023-April/000048.html (Dmitry Paltatzidis)
- https://mail.openjdk.org/pipermail/valhalla-spec-comments/2023-April/000047.html (Victor Nazarov)

Victor N proposes "having immediate default" for B3 classes. This is nicely evocative, but a bit long; we seem to be converging on "implicitly constructible" for this term. Also, "immediate" is a term that is better known to assembly language programmers; "constructor" is a term in the Java developer's lexicon.

Dmitry P reiterates some concerns about how users will be tempted to overuse B3 "because performance", and end up creating less safe programs as a result. He raises two examples, Rational and Range.

Rational is unfortunate because the default representation (when used improperly) can lead to DBZE, but has a sensible default of zero -- except for that pesky denominator. However, I think this is a removable discontinuity, where the author can make up for this with some careful coding:

    value class Rational {
        private int n, d;

        // obvious explicit and implicit ctor

        public int num() { return n; }
        public int denom() { return d == 0 ? 1 : d; }

        // logic uses num() and denom() rather than n/d
    }

The moral of the story here is that sometimes class authors will have to do some extra work to interpret the default representation as meaning "what it should" if they want to take advantage of the benefits of B3, but all of this can be encapsulated in the implementation.

The other moral hazard Dmitry P raises is the temptation to expose a Range class that tears, "because performance". Indeed, Java developers frequently write and publish broken code "because performance", and we can't stop them -- all we can do is educate them. It is a valid fear that people will over-rotate towards the new shiny rocket fuel, and in fact quite likely that people will do so initially. We will have to use the levers we have -- education, good examples, sufficiently scary "don't be this guy" responses on Reddit and SO, IDE inspections, etc.

Quan M raises the concern that ! opts into both non-nullity and, for non-atomic B3 classes, non-atomicity, and wonders whether there should be an explicit use-site syntax for non-atomicity. The answer to this is an emphatic "no". I direct readers to the posting "On atomicity and tearing"; non-atomicity is not a feature to be programmed with directly, as much as a way of trading off one bad consequence vs another in the presence of broken programs.

From kevinb at google.com Thu Jun 1 17:46:43 2023
From: kevinb at google.com (Kevin Bourrillion)
Date: Thu, 1 Jun 2023 10:46:43 -0700
Subject: Reader mail bag

On Thu, Jun 1, 2023 at 10:35 AM Brian Goetz wrote:

> Rational is unfortunate because the default representation (when used
> improperly) can lead to DBZE, but has a sensible default of zero -- except
> for that pesky denominator. However, I think this is a removable
> discontinuity, where the author can make up for this with some careful
> coding:

It helps a bit that you want to canonicalize all 0/n to *something* anyway, and rational operations are already busy taking the gcd and ensuring a positive denominator as it is. The need to internally represent zero as 0/0 probably adds little incremental pain in *this* case, but there will be others where it does. Still, overall it seems like a very fine trade-off.

--
Kevin Bourrillion | Java/Kotlin Ecosystem Team | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com Thu Jun 1 17:52:29 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 1 Jun 2023 13:52:29 -0400
Subject: Reader mail bag (Rational)

Yes, I would frame this one as a "lucky near miss", where there's a removable discontinuity in Rational that permits it to work with a small amount of extra effort. The author will likely do the appropriate input checking (and possibly GCD reduction) in the explicit ctor, but the key is that because of the implicit ctor, the author has to do some additional checking elsewhere too. Because it can all be factored through a `denom` accessor *in this case*, it is not very intrusive, but I am not positioning this as a general property, as much as "we got lucky with Rational and might get lucky with other similar cases that are at the boundary."
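To make the cookbook point concrete, the whole pattern might look something like this (a sketch in the working syntax; the constructor checks and the normalization comments here are illustrative, not part of any spec):

```
value class Rational {
    private int n, d;                    // default instance is 0/0

    public implicit Rational();          // "zero", encoded as 0/0
    public Rational(int n, int d) {
        if (d == 0)
            throw new ArithmeticException("zero denominator");
        // normalization (gcd reduction, positive denominator) elided
        this.n = n;
        this.d = d;
    }

    public int num()   { return n; }
    public int denom() { return d == 0 ? 1 : d; }   // interpret 0/0 as 0/1

    // all logic goes through num()/denom(), never raw n/d
    public Rational plus(Rational that) {
        return new Rational(num() * that.denom() + that.num() * denom(),
                            denom() * that.denom());
    }
}
```

The interpretation work is done once, in the accessors, and every other method inherits it for free.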
I think much of the value of the Rational example is not "building a better Rational", as much as a cookbook example of something that _almost_ fits the implicitly-constructible mold, and a how-to guide for how to deal with this flavor of "almost." If we had a few others, we'd have a cookbook cooking, and that would be a good thing.

From heidinga at redhat.com Thu Jun 1 17:53:42 2023
From: heidinga at redhat.com (Dan Heidinga)
Date: Thu, 1 Jun 2023 13:53:42 -0400
Subject: Preload attribute

A couple of questions about the spec for the Preload attribute[0]. The current spec says it indicates "certain classes contain information that may be of interest during linkage."

The Preload attribute removes one need for Q modifiers while allowing calling convention optimizations and layout decisions to be made early.

The current spec is quite vague on what classes should be included in the attribute and on when / what the VM will do with those classes (or even if it does anything). I think it's time to tighten up the spec for the Preload attribute and specify:
* what the VM will do with classes listed in the attribute
* when those classes will be loaded (ie: somewhere in JVMS 5.3)
* how invalid cases are handled, including circularities (Class A's Preload mentions B <: A)
* what types of classes can be listed (any? only values?)

And there are probably other issues to clarify. Otherwise, the current spec isn't clear enough for users to know when to add a class, how it will be treated, and any potential edge cases to avoid.

It probably makes sense to start from the current Hotspot handling of the attribute and fine tune that into the spec?

--Dan

[0] https://cr.openjdk.org/~dlsmith/jep401/jep401-20230404/specs/value-objects-jvms.html#jvms-4.7.31

From brian.goetz at oracle.com Thu Jun 1 17:59:38 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 1 Jun 2023 13:59:38 -0400
Subject: implicit constructor translation?
Message-ID: <384e2229-19a5-1ce0-8938-5fea675bafa5@oracle.com>

I think that there should be an explicit (and not synthetic) method_info for the implicit constructor, with the obvious Code attribute (`defaultvalue` / `areturn`.)

My rationale is: "the declaration said there's a no-arg constructor, I should be able to call it". And that intuition is entirely reasonable. Users should be able to say `new Complex()` and get a default complex value. (Maybe they can also say `Complex.default`; maybe we won't need that.) And the same for reflection.
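In other words, something like this should just work, with no special cases (a sketch; checked-exception handling is elided, and it assumes the implicit constructor surfaces as an ordinary public no-arg Constructor):

```
Complex c = new Complex();           // the default value: re = 0, im = 0
Complex d = Complex.class.getConstructor().newInstance();
assert c == d;                       // == compares state for value objects

// both constructors are discoverable, just as the declaration suggests
assert Complex.class.getConstructors().length == 2;
```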
Also, in case it is not obvious, the following class is illegal:

    value class X {
        implicit X();
        X() { ... }
    }

because it is trying to declare the same constructor twice. An implicit constructor is a constructor, just one for which the compiler can deduce specific known semantics.

From daniel.smith at oracle.com Thu Jun 1 18:24:05 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Thu, 1 Jun 2023 18:24:05 +0000
Subject: Preload attribute
Message-ID: <80926DBE-E25B-4622-BA74-2D78710C5959@oracle.com>

> On Jun 1, 2023, at 10:53 AM, Dan Heidinga wrote:
>
> A couple of questions about the spec for the Preload attribute[0]. The current spec says it indicates "certain classes contain information that may be of interest during linkage."
>
> The Preload attribute removes one need for Q modifiers while allowing calling convention optimizations and layout decisions to be made early.
>
> The current spec is quite vague on what classes should be included in the attribute and on when / what the VM will do with those classes (or even if it does anything).

FWIW, the JEP has more detail about when javac is expected to include classes in Preload.

> I think it's time to tighten up the spec for the Preload attribute and specify:
> * what the VM will do with classes listed in the attribute

It is intentional that the VM may choose to do nothing. So anything it does is purely an optimization.
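For example (a sketch of the javac behavior the JEP describes; the classes here are invented): given

```
value class Point {
    double x, y;
    Point(double x, double y) { this.x = x; this.y = y; }
}

class Box {
    Point p;                      // Point occurs in a field descriptor
    Point get() { return p; }     // ... and in a method descriptor
}
```

javac would list Point in Box's Preload attribute, so that a JVM that cares can load Point early enough to inform layout and calling convention decisions; a JVM that ignores the attribute entirely is equally conformant.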
> * when those classes will be loaded (ie: somewhere in JVMS 5.3)

If the VM chooses to load Preload classes, then our thinking was that JVMS 5.4 already describes the details of timing: https://docs.oracle.com/javase/specs/jvms/se20/html/jvms-5.html#jvms-5.4

So, for example, "Alternatively, an implementation may choose an "eager" linkage strategy, where all symbolic references are resolved at once when the class or interface is being verified."

That is, the Preload classes could all be loaded during verification, or at some other stage of linking. My expectation is that the natural point for processing Preload is during preparation as vtables are set up, but sometimes I get these things wrong. :-)

> * how invalid cases are handled, including circularities (Class A's Preload mentions B <: A)

"Errors detected during linkage are thrown at a point in the program where some action is taken by the program that might, directly or indirectly, require linkage to the class or interface involved in the error."

I've always found this rule super vague, but I think "require" is the key word, and implies that errors caused by Preload resolution should just be ignored. (Because Preload isn't "required" to be processed at all.)

> * what types of classes can be listed (any? only values?)

Definitely intend to support any classes of interest. Say a future optimization wants to know about a sealed superinterface, for example -- it would be fine to tweak javac to add that interface to Preload, and then use the information to facilitate the optimization.

There's a lot of nondeterminism here -- can a compliant system trigger changes to class loading timing, but just on my birthday? -- but I think it's within the scope of JVMS 5.4, which provides a lot of latitude for loading classes whenever it's convenient.

> It probably makes sense to start from the current Hotspot handling of the attribute and fine tune that into the spec?

So I've outlined our hands-off stake in the ground above. The spec would definitely benefit, at least, from a non-normative cross-reference to 5.4 and a short explanation. Beyond that, I think we'd be open to specifying more if we can agree something more is needed...

From daniel.smith at oracle.com Thu Jun 1 18:30:40 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Thu, 1 Jun 2023 18:30:40 +0000
Subject: implicit constructor translation?
In-Reply-To: <384e2229-19a5-1ce0-8938-5fea675bafa5@oracle.com>
References: <384e2229-19a5-1ce0-8938-5fea675bafa5@oracle.com>
Message-ID: <7B20DA4F-26A8-4F8B-972C-FA4718434044@oracle.com>

> On Jun 1, 2023, at 10:59 AM, Brian Goetz wrote:
>
> I think that there should be an explicit (and not synthetic) method_info for the implicit constructor, with the obvious Code attribute (`defaultvalue` / `areturn`.)
>
> My rationale is: "the declaration said there's a no-arg constructor, I should be able to call it". And that intuition is entirely reasonable. Users should be able to say `new Complex()` and get a default complex value. (Maybe they can also say `Complex.default`; maybe we won't need that.) And the same for reflection.
>
> Also, in case it is not obvious, the following class is illegal:
>
>     value class X {
>         implicit X();
>         X() { ... }
>     }
>
> because it is trying to declare the same constructor twice. An implicit constructor is a constructor, just one for which the compiler can deduce specific known semantics.

Agree with all of this.
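Concretely, the translation being agreed on here would look roughly like this, in javap-ish form (a sketch; the attribute contents and constant-pool details are placeholders, not final):

```
value class Complex
  ImplicitCreation: ...          // class-level metadata: default instances exist

  public Complex();              // the implicit constructor: a real method_info
    Code:
      defaultvalue #1            // class Complex
      areturn

  public Complex(int, int);      // the explicit constructor, translated as usual
    Code:
      ...
```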
Some of these details were initially unclear a few weeks ago, but I think we've settled on a design in which the implicit constructor is a "real", invokable constructor, in addition to signaling some metadata about the class.

> On 6/1/2023 10:47 AM, Dan Heidinga wrote:
>> Alas, representing implicit constructors with a `method_info` is not without costs: primarily spec'ing how the method_info exists and explaining why it doesn't have a code attribute.

There would be nothing special about the method. It only exists as a code path for explicit constructor invocations. The "I have a default instance" metadata comes from ImplicitCreation.

(Do we expect reflection to reconstruct ImplicitCreation+no-arg --> 'Constructor.isImplicit' or 'ACC_IMPLICIT'? Maybe. Not entirely sure yet what the reflection surface will look like.)

From heidinga at redhat.com Thu Jun 1 18:32:46 2023
From: heidinga at redhat.com (Dan Heidinga)
Date: Thu, 1 Jun 2023 14:32:46 -0400
Subject: implicit constructor translation?
In-Reply-To: <384e2229-19a5-1ce0-8938-5fea675bafa5@oracle.com>
References: <384e2229-19a5-1ce0-8938-5fea675bafa5@oracle.com>

On Thu, Jun 1, 2023 at 1:59 PM Brian Goetz wrote:

> I think that there should be an explicit (and not synthetic) method_info
> for the implicit constructor, with the obvious Code attribute
> (`defaultvalue` / `areturn`.)

I'm slightly concerned about having a Code attribute for the implicit constructor, as it allows agents (ClassFile load hook & redefinition) to modify the bytecodes to be inconsistent with the VM's behaviour, given the VM won't actually call the implicit constructor.

Telling users it's "as if" the VM called the implicit ctor, and then having the reflective behaviour be different after retransformation, is slightly uncomfortable.

> My rationale is: "the declaration said there's a no-arg constructor, I
> should be able to call it". And that intuition is entirely reasonable.
> Users should be able to say `new Complex()` and get a default complex
> value. (Maybe they can also say `Complex.default`; maybe we won't need
> that.) And the same for reflection.

Agreed on this.

> Also, in case it is not obvious, the following class is illegal:
>
>     value class X {
>         implicit X();
>         X() { ... }
>     }
>
> because it is trying to declare the same constructor twice. An implicit
> constructor is a constructor, just one for which the compiler can deduce
> specific known semantics.

And this sounds right to me as well.

I guess the only concern I have is whether there should be a Code attribute or not, and I think I lean towards not...
From kevinb at google.com Thu Jun 1 18:37:13 2023
From: kevinb at google.com (Kevin Bourrillion)
Date: Thu, 1 Jun 2023 11:37:13 -0700
Subject: Briefest summary of today's Valhalla+nullness picture

Brian's article from yesterday walks through it all carefully. This message is my attempt at a summary.

When I step back, I'm astounded that we've come this far: most developers, most of the time, will be able to reason like so: "I give up things I don't need, then I let the VM do its thing, and my performance is better. If I care to learn the gory details, I can." (Granted, the sort of developers reading *this* thread will want more than that, and in Brian's post they have it.)

But from *this* vantage point, here's where we're at today, as concisely as I can manage:

1. If your class doesn't do Identity Things, make it a value class. The VM can do smarter things now.

2. `==` isn't really an Identity Thing: it will just continue to mean "observably identical*" as it always has. System.hashCode() remains defined in terms of `==`. But actual identity dependence will fail (when feasible, at compile time).

3. Any type based on that class can be marked as nullable `?` or non-null `!`. Conceptually (if not literally), `Foo!` is a subtype of `Foo?`.

4. Nullness enforcement is "best-effort" and better than nothing. *If* you want the gory details of what's enforced you can dig into them. There's still room for third-party nullness analysis tools to help.

5. If a value class has a do-nothing constructor, and you're fine with non-null variables of that type being initialized to that value, add `implicit` to the constructor. The VM can do more smarter things now.

6. Consider adding `non-atomic` to the class; the VM can do even morer smarter things now. The downside is that racy code (already risky) might fail in worse ways; you decide if that worries you or not.

7. The above holds for the 8 primitive types too; yes, they are still special, but mostly in *additional* ways, rather than exceptions to the usual rules. `int` and `Integer!` are now nearly synonymous.

And we can finally all retire the term "primitive class" now at last. :-)

While that list is meant to have asterisks and just be "good enough for most people most of the time", it might need some important corrections nevertheless, so, I'm glad to hear them.

* yes float/double are weird

--
Kevin Bourrillion | Java/Kotlin Ecosystem Team | Google, Inc. | kevinb at google.com
From daniel.smith at oracle.com Thu Jun 1 18:49:19 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Thu, 1 Jun 2023 18:49:19 +0000
Subject: implicit constructor translation?
References: <384e2229-19a5-1ce0-8938-5fea675bafa5@oracle.com>

> On Jun 1, 2023, at 11:32 AM, Dan Heidinga wrote:
>
> On Thu, Jun 1, 2023 at 1:59 PM Brian Goetz wrote:
>> I think that there should be an explicit (and not synthetic) method_info for the implicit constructor, with the obvious Code attribute (`defaultvalue` / `areturn`.)
>
> I'm slightly concerned about having a Code attribute for the implicit constructor as it allows agents (ClassFile load hook & redefinition) to modify the bytecodes to be inconsistent with the VM's behaviour given the VM won't actually call the implicit constructor.
>
> Telling users it's "as if" the VM called the implicit ctor and then having the reflective behaviour be different after retransformation is slightly uncomfortable.

The problem here is we have a language/VM model mismatch.

In the VM model: <vnew> is a factory method that can do whatever it wants, and be included or not included. All that matters for default values is ImplicitCreation.

In the language model: the implicit constructor allows both 'Foo.default' and 'new Foo()', both of which produce the same value.

Yes, it's possible to generate bytecode that doesn't conform to the language model; that's always the risk of designing a language/VM mismatch. But this feels to me like more of the same in the constructor space -- e.g., we've already got VM-level abstract classes that can have methods that won't run for value class instance creation, but will run for identity class instance creation.

From frederic.parain at oracle.com Thu Jun 1 18:34:57 2023
From: frederic.parain at oracle.com (Frederic Parain)
Date: Thu, 1 Jun 2023 14:34:57 -0400
Subject: Preload attribute
Message-ID: <4946a01e-d205-0aa9-5ad2-148333519e25@oracle.com>

The current support of the PreLoad attribute in HotSpot is very lenient:

- the VM tries to load all classes listed in the attribute; there's no attempt to check if a listed class is used in the declaration of a field or as the type of a method argument or return value

- the VM tries to load those classes at link time (after linking of super-interfaces, before verification), but this timing is constrained by some HotSpot internal designs; timing could be different on another VM

- all errors/exceptions thrown during an attempt to load one of those classes are caught and discarded (silent failure)

- if the loading is successful, there's no check that the class is a value class. It could be an identity class or an interface

AFAICT, this implementation matches Dan Smith's answers to your questions.

Fred
From brian.goetz at oracle.com Thu Jun 1 19:34:29 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 1 Jun 2023 15:34:29 -0400
Subject: implicit constructor translation?
References: <384e2229-19a5-1ce0-8938-5fea675bafa5@oracle.com>

On 6/1/2023 2:32 PM, Dan Heidinga wrote:

> I'm slightly concerned about having a Code attribute for the implicit
> constructor as it allows agents (ClassFile load hook & redefinition)
> to modify the bytecodes to be inconsistent with the VM's behaviour
> given the VM won't actually call the implicit constructor.
>
> Telling users it's "as if" the VM called the implicit ctor and then
> having the reflective behaviour be different after retransformation is
> slightly uncomfortable.

I'm fine if the expansion of the constructor body happens at runtime rather than compile time; my suggestion was mostly "this is super-easy in the static compiler, why make more work for the JVM." But, if you're signing up for that work, I won't stop you....

From brian.goetz at oracle.com Thu Jun 1 19:38:01 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 1 Jun 2023 15:38:01 -0400
Subject: Briefest summary of today's Valhalla+nullness picture
Message-ID: <31d5bf44-579c-b755-7185-913b0283ab6f@oracle.com>

> 1. If your class doesn't do Identity Things, make it a value class.
> The VM can do smarter things now.

Which also makes your class safer too!

> 2. `==` isn't really an Identity Thing: it will just continue to mean
> "observably identical*" as it always has. System.hashCode() remains
> defined in terms of `==`. But actual identity dependence will fail
> (when feasible, at compile time).

Some readers might retort that the cost of this is that `==` went from ultra-super-cheap to maybe-but-who-knows. We might counter-retort that such micro-performance concerns are usually a distraction.

> 3. Any type based on that class can be marked as nullable `?` or
> non-null `!`. Conceptually (if not literally), `Foo!` is a subtype of
> `Foo?`.

Where by "conceptually" you are appealing to "is-a".

> 4. Nullness enforcement is "best-effort" and better than nothing. *If*
> you want the gory details of what's enforced you can dig into them.
> There's still room for third-party nullness analysis tools to help.
> 5. If a value class has a do-nothing constructor, and you're fine with
> non-null variables of that type being initialized to that value, add
> `implicit` to the constructor. The VM can do more smarter things now.
>
> 6. Consider adding `non-atomic` to the class; the VM can do even morer
> smarter things now. The downside is that racy code (already risky)
> might fail in worse ways; you decide if that worries you or not.

And specifically, if your class has cross-field invariants, it should worry you.

> 7. The above holds for the 8 primitive types too; yes, they are still
> special, but mostly in *additional* ways, rather than exceptions to
> the usual rules. `int` and `Integer!` are now nearly synonymous.
>
> And we can finally all retire the term "primitive class" now at last. :-)

Praise be.

> While that list is meant to have asterisks and just be "good enough
> for most people most of the time", it might need some important
> corrections nevertheless, so, I'm glad to hear them.
>
> * yes float/double are weird

From brian.goetz at oracle.com Thu Jun 1 19:59:22 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 1 Jun 2023 15:59:22 -0400
Subject: B3, default values, and implicit initialization
References: <61a456f3-88bc-553a-2310-50acf0fe9cf7@oracle.com> <713ff8e0-38c3-8e27-6055-5f7bf1fc7818@oracle.com>

Returning to an old syntaxy topic which is as-yet unresolved.

On 4/24/2023 7:18 PM, Dan Heidinga wrote:

> I really wanted to cram the non-atomic designation on constructors as
> well, but it's not really a property of the instance; rather it
> describes writes to storage, which puts it as a property of the class.
> Still trying to come up with a better intuition for where this belongs.

Our journey here has been pulled by multiple forces. On the one hand, there are four separate things that you can opt out of, and each of these may have consequences for semantics and performance:

- opting out of identity
- opting out of the requirement for explicit construction
- opting out of atomicity
- opting out of nullability

We've largely decided that the first three are declaration-site properties of the class and the last is a use-site declaration (there are of course pros and cons of moving the boundary.) We've also decided that we don't want to lump any of these; each of these is a decision that should be made explicitly.

The three declaration-site properties build on each other; non-atomicity makes little sense for identity objects, and while I could imagine a sensible semantics for implicit construction of identity objects, it's just not that useful. So in practice we'll see one of the following:

    __value class Foo { }
    __implicitly-constructible __value class Foo { }
    __non-atomic __implicitly-constructible __value class Foo { }

Stacking modifiers up like this has several disadvantages. It gets annoying fast. You can't use some modifiers without using prerequisite modifiers. And all the modifiers feel a little "front-loaded."

The other approach, which we considered and rejected earlier, is having three different top-level things (call them B1 class, B2 class, and B3 class.) This seemed out of line, as it seems better to focus on the commonality rather than the differences.
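For calibration, in the working vocabulary (a sketch; the bodies are elided, and LocalDate stands in for the migrating value-based classes):

```
class Name { ... }                  // B1: ordinary identity class

value class LocalDate { ... }       // B2: no identity; still nullable, atomic

value class Complex {               // B3: B2 plus implicit construction,
    public implicit Complex();      //     so non-null flattened storage
    ...                             //     can be zero-initialized
}
```

with non-atomicity as a further opt-in on top of the B3 form.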
I think we were very successful with moving __implicitly-constructible to a constructor; this reframes __implicitly-constructible as a property of the class, and making the implicit value a constructor feels natural. That whittles down the "long list of modifiers" from three to two, which is progress.

It feels like non-atomic might similarly want to be "demoted" somewhere syntactically. Possibilities include:

- A modifier like non-atomic
- A modifier on some other class member, such as the constructors or the fields
- A supertype
- Some other declaration in the class

We test-drove the modifiers-on-constructor idea and it didn't feel right; inventing a new kind of declaration in the class also doesn't seem right. So I'd like to test-drive the supertype idea.

In an early version, we actually did model this as a supertype, which we called Tearable (or maybe it was NonTearable.) But I think we can agree that the T-word isn't quite the right focus.

What seems like the right focus is a description about the _boundary of the object_. Ordinarily, an object has integrity; the fields travel together. A non-atomic object is a looser confederation of fields; we are explicitly saying "the fields can take separate routes to their destination."

I am thinking of something like:

    value class X implements WeakValue { ... }

where "WeakValue" means that the object is a weak confederation of its fields. (A WV value class would require an implicit constructor, as today with non-atomic.)

(One advantage of using a marker interface like this is that in the IDE, the user can navigate into WeakValue and see an explanation of what this means in the WV Javadoc.)
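For example (a sketch; Int128 is invented for illustration, and WeakValue is the hypothetical marker interface above):

```
value class Int128 implements WeakValue {
    private long hi, lo;

    public implicit Int128();                    // required, as with non-atomic today
    public Int128(long hi, long lo) { this.hi = hi; this.lo = lo; }
}
```

Under a data race, a reader might see the hi of one write paired with the lo of another -- the same treatment JLS 17.7 already gives to bare long and double -- in exchange for full flattening of a 128-bit payload.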
From kevinb at google.com Thu Jun 1 22:26:14 2023
From: kevinb at google.com (Kevin Bourrillion)
Date: Thu, 1 Jun 2023 15:26:14 -0700
Subject: The Good Default Value

There is one thing that I simply can never let go without comment (which Brian had to have known :-)).

On Wed, May 31, 2023 at 11:37 AM Brian Goetz wrote:

> ### Initialization
>
> The key distinction between today's primitives and objects has to do with
> _initialization requirements_. Primitives are designed to be _used
> uninitialized_; if we declare a field `int count`, it is reliably initialized to
> zero by the JVM before any code can access it. This initial value is a
> perfectly good default, and it is not a bug to read or even increment this field
> before it has been explicitly assigned a value by the program, because it has
> _already_ been initialized to a known good value by the JVM. The zero value
> pre-written by the JVM is not just a safety net; it is actually part of the
> programming model that primitives start out life with "good enough" defaults.
> This is part of what it means to be a primitive type.

Uninitialized values of primitive types do still cause bugs. Uninitialized values of primitive types do still cause bugs! If it scanned better I'd put it to music.

I think I understand the sense in which the phrase "it is not a bug to" is intended above, but to me it's still an outrageous statement.

Put succinctly, I'm still unconvinced there is any such thing as a "good default value". What I see are varying degrees of tolerable ones. Which we tolerate why? Because performance, end of story. Oh, it might happen to be the default you wanted, but in my mind that's just coincidence (if admittedly not a very rare one).

As far as I know, Valhalla could *maybe* still decide to require that non-nullable variables of value-class types be initialized explicitly. Maybe it would complicate migration too much, I'm not sure. I know it would require one new feature:

`new ValClass[100] (int i -> something())`

... which, importantly, must compile to the optimal code (that never actually loops) when `something()` is literally just a call to ValClass's implicit constructor (or is otherwise the default value for the type in question). In case `(i -> MyImplicitlyConstructableType())` is the behavior you want, you'd be a bit steamed to have to write it out. But just *how* valuable is getting to skip it? I wonder.

And I would claim with some vigor that this array syntax would *always* have been a lovely feature to have in any case. It looks to me like it wrangles and contains the initialization gap in much the same way try-with-resources wrangles and contains the acquire-release gap.

Maybe this is off the table for $reasons -- but then here's my point. If it is, I claim that's a *sacrifice* we're making -- not that it was made unnecessary by the preponderance of types with so-called "good default values". In other words, even if my plea here changes nothing about the feature design (after all, it is work), I still hope merely to convince us to talk about the feature differently than we have been, because I think it matters.

Okay. That's my highest bid. If it didn't shift anyone's view I'd really appreciate hearing the rebuttal.

--
Kevin Bourrillion | Java/Kotlin Ecosystem Team | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com Thu Jun 1 22:41:59 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 1 Jun 2023 18:41:59 -0400
Subject: The Good Default Value

Saw this coming :)

On 6/1/2023 6:26 PM, Kevin Bourrillion wrote:

> Uninitialized values of primitive types do still cause bugs.

Another way to think about this is the historical progression.

In hardware / assembly language, when you load a value from a register or memory location, and you haven't provably put a good value there, you get an out-of-thin-air value. There is no "safe initialization" of anything; there is just memory and registers.
Java 1.0 moved the ball forward for direct storage, saying that the initial value of any heap variable is zero, and requires that locals be initialized before use.? It also moves the ball forward by putting indirections in for you where they are semantically needed. But Java copies what C did for primitives; in the desire to not make arithmetic ungodly expensive, an int is just a direct machine int, like in C, with a frosting of bzero. Valhalla gives us a choice: we can flatten more non-nullable things, or we can cheapen (somewhat) the things that use null as a initialization guard.? The good thing is that we can choose which we want as the situation warrants; the bad thing is we can make bad choices. People will surely make bad choices to get the flattening benefits of B3, because, performance!? But this is not all that different from the other bad performance-overrotations that people make in Java every day, other than this one is new and so people will initially fall into it more. > As far as I know, Valhalla could *maybe* still decide to require that > non-nullable variables of value-class types be initialized > explicitly.?Maybe it would complicate migration too much,?I'm not > sure. I know it would require one new feature: > > `new ValClass[100] (int i -> something())` For fields, we can close the gap by doing null checks at constructor end (in the same place we emit memory barriers to support the final field initialization safety guarantees.)? The latter is voided when `this` escapes construction, and the former would be as well, but this seems a pragmatic choice which narrows the initially-null problem quite a bit for fields.? As you point out, arrays are harder, and it requires something much like what you suggest (which is also a useful feature in its own right.)? Note that none of this is needed for B3!, only for B1!/B2!. > ... which, importantly, must compile to the optimal code (that never > actually loops) when `something()` is literally just a call to > ValClass's implicit constructor (or is otherwise the default value for > the type in question). In case `(i -> > MyImplicitlyConstructableType())` is the behavior you want, you'd be a > bit steamed to have to write it out. But just *how* valuable is > getting to skip it? I wonder. Nothing wrong with `new B3![n]`, any more than `new int[n]`.? It's B1/B2 that have the problem. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevinb at google.com Thu Jun 1 23:10:59 2023 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 1 Jun 2023 16:10:59 -0700 Subject: The Good Default Value In-Reply-To: References: Message-ID: On Thu, Jun 1, 2023 at 3:42?PM Brian Goetz wrote: Another way to think about this is the historical progression. > > In hardware / assembly language, when you load a value from a register or > memory location, and you haven't provably put a good value there, you get > an out-of-thin-air value. There is no "safe initialization" of anything; > there is just memory and registers. > > C more or less propagates this approach; if you don't initialize a > variable, you get what you get. C doesn't force indirections on you; you > have to ask for them. > > Java 1.0 moved the ball forward for direct storage, saying that the > initial value of any heap variable is zero, and requires that locals be > initialized before use. It also moves the ball forward by putting > indirections in for you where they are semantically needed. 
But Java > copies what C did for primitives; in the desire to not make arithmetic > ungodly expensive, an int is just a direct machine int, like in C, with a > frosting of bzero. > > Valhalla gives us a choice: we can flatten more non-nullable things, or we > can cheapen (somewhat) the things that use null as a initialization guard. > The good thing is that we can choose which we want as the situation > warrants; the bad thing is we can make bad choices. > > People will surely make bad choices to get the flattening benefits of B3, > because, performance! But this is not all that different from the other > bad performance-overrotations that people make in Java every day, other > than this one is new and so people will initially fall into it more. > I'm not necessarily following the impact of these statements on the arguments I'm making. I don't think I'm raising concerns about people picking the wrong bucket. I'm trying to establish that there's never anything actually *good* about default initialization; that at the very best it's "harmless and very slightly convenient", no more. A typing saver in exchange for bug risk. Notably it's at its most harmless for nullable types, which are the more likely ones to blow up outright when used uninitialized. But those aren't the cases this thread is focusing on. As far as I know, Valhalla could *maybe* still decide to require that > non-nullable variables of value-class types be initialized > explicitly. Maybe it would complicate migration too much, I'm not sure. I > know it would require one new feature: > > `new ValClass[100] (int i -> something())` > > For fields, we can close the gap by doing null checks at constructor end > (in the same place we emit memory barriers to support the final field > initialization safety guarantees.) The latter is voided when `this` > escapes construction, and the former would be as well, but this seems a > pragmatic choice which narrows the initially-null problem quite a bit for > fields. > I do like that, but again I think it's addressing a different set of issues than I'm trying to. My brain is certainly fuzzy, though. Again I'd say that initialization problems that leave something *null* are probably the least-harmful kind, thanks to our beloved friend NullPointerException. I'm wondering why we shouldn't require fields of non-nullable value-class types to be explicitly initialized. `Complex x = new Complex(0, 0)` or `Complex x = new Complex()`. I'll stipulate "people would grumble" as self-evident. > As you point out, arrays are harder, and it requires something much like > what you suggest (which is also a useful feature in its own right.) Note > that none of this is needed for B3!, only for B1!/B2!. > . . . > Nothing wrong with `new B3![n]`, any more than `new int[n]`. It's B1/B2 > that have the problem. > I think I *am* talking about B3? If you could reread my message it might help. Or you might just tell me to reread yours (which I would). :-) -- Kevin Bourrillion | Java/Kotlin Ecosystem Team | Google, Inc. | kevinb at google.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian.goetz at oracle.com Fri Jun 2 17:31:39 2023 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 2 Jun 2023 13:31:39 -0400 Subject: The Good Default Value In-Reply-To: References: Message-ID: > I'm trying to establish that there's never anything actually *good* > about default initialization; that at the very best it's "harmless and > very slightly convenient", no more. 
A typing saver in exchange for bug risk. Notably it's at its most harmless for nullable types, which are the more likely ones to blow up outright when used uninitialized. But those aren't the cases this thread is focusing on.

>> As far as I know, Valhalla could *maybe* still decide to require that
>> non-nullable variables of value-class types be initialized
>> explicitly. Maybe it would complicate migration too much, I'm not
>> sure. I know it would require one new feature:
>>
>> `new ValClass[100] (int i -> something())`
>
> For fields, we can close the gap by doing null checks at constructor end
> (in the same place we emit memory barriers to support the final field
> initialization safety guarantees.) The latter is voided when `this`
> escapes construction, and the former would be as well, but this seems a
> pragmatic choice which narrows the initially-null problem quite a bit for
> fields.

I do like that, but again I think it's addressing a different set of issues than I'm trying to. My brain is certainly fuzzy, though. Again I'd say that initialization problems that leave something *null* are probably the least-harmful kind, thanks to our beloved friend NullPointerException.

I'm wondering why we shouldn't require fields of non-nullable value-class types to be explicitly initialized. `Complex x = new Complex(0, 0)` or `Complex x = new Complex()`. I'll stipulate "people would grumble" as self-evident.

> As you point out, arrays are harder, and it requires something much like
> what you suggest (which is also a useful feature in its own right.) Note
> that none of this is needed for B3!, only for B1!/B2!.
>
> . . .
>
> Nothing wrong with `new B3![n]`, any more than `new int[n]`. It's B1/B2
> that have the problem.

I think I *am* talking about B3? If you could reread my message it might help. Or you might just tell me to reread yours (which I would). :-)

--
Kevin Bourrillion | Java/Kotlin Ecosystem Team | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com Fri Jun 2 17:31:39 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 2 Jun 2023 13:31:39 -0400
Subject: The Good Default Value

> I'm trying to establish that there's never anything actually *good*
> about default initialization; that at the very best it's "harmless and
> very slightly convenient", no more. A typing saver in exchange for bug
> risk. Notably it's at its most harmless for nullable types, which are
> the more likely ones to blow up outright when used uninitialized. But
> those aren't the cases this thread is focusing on.

OK, let me zoom out. Primitives (and B3) support implicit construction (with zero default values) *so that* they can be effectively represented in memory. While neither C nor Java 1.0 spelled this out, there is an obvious cost to representing numerics with an indirection, and the initialization safety of null would have effectively required indirection. So numerics in C and primitives in Java (and going forward, B3 in Java) support default initialization not because the default is *semantically great*, but because it's the pragmatic choice that gets us the memory layout we want.

I think when you say "good" wrt default values, you're speaking purely about programming-model considerations (i.e., convenience, readability, safety), and when I say "good" wrt default values, I'm speaking about all of those *plus* the memory layout consequences. Which explains the difference in conclusion -- you're saying "not terrible" and I'm saying "good" because it's a good overall tradeoff. Does that track?

> I'm wondering why we shouldn't require fields of non-nullable
> value-class types to be explicitly initialized. `Complex x = new
> Complex(0, 0)` or `Complex x = new Complex()`. I'll stipulate "people
> would grumble" as self-evident.

For B1!/B2! fields, this is almost a forced move, as otherwise an object will be created with ! fields that have null in them. For B3! fields, given that the whole distinction between B3 and B2 is about implicit construction, this seems like it might be counterproductive, and it will be another seam between primitives and B3!.

From kevinb at google.com Fri Jun 2 21:21:17 2023
From: kevinb at google.com (Kevin Bourrillion)
Date: Fri, 2 Jun 2023 14:21:17 -0700
Subject: The Good Default Value

On Fri, Jun 2, 2023 at 10:31 AM Brian Goetz wrote:

> OK, let me zoom out. Primitives (and B3) support implicit construction
> (with zero default values) *so that* they can be effectively represented in
> memory. While neither C nor Java 1.0 spelled this out, there is an obvious
> cost to representing numerics with an indirection, and the initialization
> safety of null would have effectively required indirection. So numerics in
> C and primitives in Java (and going forward, B3 in Java) support default
> initialization not because the default is *semantically great*, but because
> it's the pragmatic choice that gets us the memory layout we want.
>
> I think when you say "good" wrt default values, you're speaking purely
> about programming-model considerations (i.e., convenience, readability,
> safety), and when I say "good" wrt default values, I'm speaking about all
> of those *plus* the memory layout consequences.

I low key suspect you might have that impression mostly because it's what you *expect* from me.
Or I might have communicated badly, or am missing something obvious (wouldn't be the first time today). I was hypothesizing that we could have both.

I've floated the idea of requiring explicit initialization, but did acknowledge that something would absolutely *have* to optimize a literal `new MyVal[1000] (i -> MyVal.default)` (handwave syntax) to the same thing `new MyVal[1000]` would do (if allowed). I wouldn't have it otherwise. I *think* that bridges this perceived gap? If that's impossible, this whole thread basically goes away.

Now the reaction I'd *expect* is "no one wants to have to write that". But I think that actually supports my main point: that this is a convenience argument, not that that default is in any intrinsic way "good". Then I guess my secondary claim is that it's actually *mild* convenience at that.

>> I'm wondering why we shouldn't require fields of non-nullable value-class
>> types to be explicitly initialized. `Complex x = new Complex(0, 0)` or
>> `Complex x = new Complex()`. I'll stipulate "people would grumble" as
>> self-evident.
>
> For B1!/B2! fields, this is almost a forced move, as otherwise an object will
> be created with ! fields that have null in them. For B3! fields, given
> that the whole distinction between B3 and B2 is about implicit
> construction, this seems like it might be counterproductive, and it will be
> another seam between primitives and B3!.

Making primitives more special is bad, it's true. But for what it's worth, from my experiences, what I'm saying does apply to primitives too: `new int[100] { 0 }` is just as superior to `new int[100]` as with the MyVal case above. The only difference (in my magic-wand universe here) is that the int case can draw only a warning, for $someLongPeriodOfTime, before the seam goes away. I think a seam with an expiration date is *already* a less-bad kind of seam just for that fact, even if that date is *very* far in the future.

I realize this is a hard pill to swallow, and I can't be surprised if my full argument doesn't carry. But even given today's exact design, I have to object to the *way* it keeps being justified.

Did this help?

--
Kevin Bourrillion | Java/Kotlin Ecosystem Team | Google, Inc. | kevinb at google.com

From kevinb at google.com Fri Jun 2 21:46:40 2023
From: kevinb at google.com (Kevin Bourrillion)
Date: Fri, 2 Jun 2023 14:46:40 -0700
Subject: Minor question about a `MyVal.default`-like syntax
In-Reply-To: <384e2229-19a5-1ce0-8938-5fea675bafa5@oracle.com>
References: <384e2229-19a5-1ce0-8938-5fea675bafa5@oracle.com>

This is not too important right now, but I had thoughts, so...

On Thu, Jun 1, 2023 at 10:59 AM Brian Goetz wrote:

> Users should be able to say `new Complex()` and get a default complex
> value. (Maybe they can also say `Complex.default`; maybe we won't need
> that.) And the same for reflection.

I think the argument for `MyVal.default` being *unnecessary* might go like this:

* either there's no implicit constructor and `MyVal.default` won't work
* or there is, and `MyVal.default` would have to mean the same as `new MyVal()`, so what's the point?

If that's correct, there might not be a strong argument for keeping it, but I came up with a couple weak ones.

1. Arguably, its meaning is more apparent without the reader having to dig into MyVal.java (how much does this matter, in this case?)

2. It *feels like* a well-known immutable value that just kind of "exists" and has no need to be constructed.
A constant. In fact, people might feel tempted to make such constants?

(Note I do still approve of `public implicit MyVal();` and even of calling that a "constructor", because it nearly enough plays that role. But something about the term "implicit construction" doesn't seem right. "Default initialization" seems more on point, I guess.)

(Tangent: while the choice of keyword isn't that important right now, I have already found myself saying "explicit implicit constructor" and then frowning in momentary confusion :-))

--
Kevin Bourrillion | Java/Kotlin Ecosystem Team | Google, Inc. | kevinb at google.com

From forax at univ-mlv.fr Sat Jun 3 06:12:59 2023
From: forax at univ-mlv.fr (Remi Forax)
Date: Sat, 3 Jun 2023 08:12:59 +0200 (CEST)
Subject: Design document on nullability and value types
Message-ID: <2075843456.71765771.1685772779404.JavaMail.zimbra@univ-eiffel.fr>

Hi all,

I am not convinced that adding the nullability annotations to types other than the value types with a default value is a good move; it seems to shut the door to potential futures where being non-null is more strongly enforced by the VM. If the goal of Valhalla is to introduce value types, I think we are extending our reach too much here, making decisions for Java we may regret later.

I understand the appeal of providing a unified view regarding nullability, but sadly, as explained in this document, the unification will only be true in terms of syntax, not in terms of semantics, with only the value types with a default value being reified and enforced by the VM.

As an example, John's Array 2.0 proposal is still on the table and proposes to add non-null arrays at runtime, so choosing to erase the nullability annotation now seems a bad move if in the future such kinds of arrays are added to the Java platform.

Obviously, we want to allow migration from/to identity type and value type (or from value type without a default value to a value type with a default value), so we have to specify both at compile time and at runtime a semantics that allows that. But I see erasing nullability annotations in the case of identity types as a too-easy shortcut.

I would prefer to live in a world where '!' is only available on value types with a default value at compile time, with the field and array creation being enforced at runtime only if the class is a value class with a default value. A world where adding an implicit constructor is a source backward compatible move but the opposite is not, and where the VM ignores nullability attributes at runtime if the class is not actually a value class with a default value, so moving from a value class with a default value to a value class without a default value is a binary compatible move.

With the model above, we only have null pollution because of separate compilation; in particular, we keep the property that when unboxing (i.e. the transition T to T!) null checking is done by the VM, so there is no null pollution.

Allowing more erasure only to have a more uniform syntax is not appealing to me, and seems worse if seen from the future.
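To illustrate the difference (a sketch; 'library.findOrNull' is invented, and the '!' syntax is the one from the document):

```
// '!' on an identity class would be erased: only the compiler sees it,
// so after separate recompilation of the library, null can flow in silently.
String! s = library.findOrNull(key);      // possible null pollution at runtime

// '!' on a value class with a default value is reified: the VM itself checks
// the T to T! transition, like unboxing, so a bad value fails fast.
Complex! c = maybeNullComplex;            // VM null check here, NPE if null
```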

regards,
Rémi

> From: "Brian Goetz" 
> To: "valhalla-spec-experts" 
> Sent: Wednesday, May 31, 2023 8:37:34 PM
> Subject: Design document on nullability and value types
>
> As we've hinted at, we've made some progress refining the essential differences between primitive and reference types, which has enabled us to shed the `.val` / `.ref` distinction and lean more heavily on nullability. The following document outlines the observations that have enabled this current turn of direction and some of its consequences.
>
> This document is mostly to be interpreted in the context of the Valhalla journey, and so talks about where we were a few months ago and where we're heading now.
>
> # Rehabilitating primitive classes: a nullity-centric approach
>
> Over the course of Project Valhalla, we have observed that there are two distinct groups of value types. We've tried stacking them in various ways, but there are always two groups, which we've historically described as "objects without identity" and "primitive classes", and which admit different degrees of flattening.
>
> The first group, which we are now calling "value objects" or "value classes", represents the minimal departure from traditional classes to disavow object identity. The existing classes that are described as "value-based", such as `Optional` or `LocalDate`, are candidates for migrating to value classes. Such classes give up object identity; identity-sensitive behaviors are either recast as state-based (such as for `==` and `Objects::identityHashCode`) or partialized (`synchronized`, `WeakReference`), and such classes must live without the affordances of identity (mutability, layout polymorphism.) In return, they avoid being burdened by "accidental identity", which can be a source of bugs, and gain significant optimization for stack-based values (e.g., scalarization in calling convention) and other JIT optimizations.
>
> The second group, which we had been calling "primitive classes" (we are now moving away from that term), are those that are more like the existing primitives, such as `Decimal` or `Complex`. Where ordinary value classes, like identity classes, gave rise to a single (reference) type, these classes gave rise to two types, a value type (`X.val`) and a reference type (`X.ref`). This pair of types was directly analogous to legacy primitives and their boxes. These classes come with more restrictions and more to think about, but are rewarded with greater heap flattening. This model -- after several iterations -- seemed to meet the goals for expressiveness and performance: we can express the difference between `int`-like behavior and `Integer`-like behavior, and get routine flattening for `int`-like types. But the result still had many imbalances; the distinction was heavyweight, and a significant fraction of the incremental specification complexity was centered only on these types. We eventually concluded that the source of this was trying to model the `int` / `Integer` distinction directly, and that this distinction, while grounded in user experience, was just not "primitive" enough.
>
> In this document, we will break down the characteristics of so-called "primitive classes" into more "primitive" (and hopefully less ad-hoc) distinctions. This results in a simpler model, streamlines the syntactic baggage, and enables us to finally reunite with an old friend, null-exclusion (bang) types.
> Rather than treating "value types" and "reference types" as different things, we can treat the existing primitives (and the "value projection" of user-defined primitive classes) as being restricted references, whose restrictions enable the desired runtime properties.
>
> ## Primitives and objects
>
> In a previous edition of _State of Valhalla_, we outlined a host of differences between primitives and objects:
>
> | Primitives                                  | Objects                                   |
> | ------------------------------------------- | ----------------------------------------- |
> | No identity (pure values)                   | Identity                                  |
> | `==` compares state                         | `==` compares object identity             |
> | Built-in                                    | Declared in classes                       |
> | No members (fields, methods, constructors)  | Members (including mutable fields)        |
> | No supertypes or subtypes                   | Class and interface inheritance           |
> | Represented directly in memory              | Represented indirectly through references |
> | Not nullable                                | Nullable                                  |
> | Default value is zero                       | Default value is null                     |
> | Arrays are monomorphic                      | Arrays are covariant                      |
> | May tear under race                         | Initialization safety guarantees          |
> | Have reference companions (boxes)           | Don't need reference companions           |
>
> Over many iterations, we have chipped away at this list, mostly by making classes richer: value classes can disavow identity (and thereby opt into state-based `==` comparison); the lack of members and supertypes are an accidental restriction that can go away with declarable value classes; we can make primitive arrays covariant with arrays of their boxes; we can let some class declarations opt into non-atomicity under race. That leaves the following, condensed list of differences:
>
> | Primitives                        | Objects                                   |
> | --------------------------------- | ----------------------------------------- |
> | Represented directly in memory    | Represented indirectly through references |
> | Not nullable                      | Nullable                                  |
> | Default value is zero             | Default value is null                     |
> | Have reference companions (boxes) | Don't need reference companions           |
>
> The previous approach ("primitive classes") started with the assumption that this is the list of things to be modeled by the value/reference distinction. In this document we go further, by showing that flattening (direct representation) is derived from more basic principles around nullity and initialization requirements, and perhaps surprisingly, the concept of "primitive type" can disappear almost completely, save only for historical vestiges related to the existing eight primitives. The `.val` type can be replaced by restricted references whose restrictions enable the desired representational properties. As is consistent with the goals of Valhalla, flattenability is an emergent property, gained by giving up those properties that would undermine flattenability, rather than being a linguistic concept on its own.
>
> ### Initialization
>
> The key distinction between today's primitives and objects has to do with _initialization requirements_. Primitives are designed to be _used uninitialized_; if we declare a field `int count`, it is reliably initialized to zero by the JVM before any code can access it. This initial value is a perfectly good default, and it is not a bug to read or even increment this field before it has been explicitly assigned a value by the program, because it has _already_ been initialized to a known good value by the JVM.
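>
> As a tiny illustration in today's Java (the class here is purely for exposition):
>
> ```
> class Counter {
>     int count;               // reliably zeroed by the JVM before any access
>     void hit() { count++; }  // not a bug, even if count was never assigned
> }
> ```
>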
> The zero value pre-written by the JVM is not just a safety net; it is actually part of the programming model that primitives start out life with "good enough" defaults. This is part of what it means to be a primitive type.
>
> Objects, on the other hand, are not designed for uninitialized use; they must be initialized via constructors before use. The default zero values written to an object's fields by the JVM typically do not constitute a valid state according to the class's specification, and even if they did, they are rarely a good default value. Therefore, we require that class instances be initialized by their constructors before they can be exposed to the rest of the program. To ensure that this happens, objects are referenced exclusively through _object references_, which _can_ be safely used uninitialized -- because they reliably have the usable default value of `null`. (Some may quibble with this use of "safely" and "usable", because null references are fairly limited, but they do their limited job correctly: we can easily and safely test whether a reference is null, and if we accidentally dereference a null reference, we get a clear exception rather than accessing uninitialized object state.)
>
> > Primitives can be safely used without explicit initialization; objects cannot.
> > Object references are nullable _precisely because_ objects cannot be used
> > safely without explicit initialization.
>
> ### Nullability
>
> A key difference between today's primitives and references is that primitives are non-nullable and references are nullable. One might think this was primarily a choice of convenience: null is useful for references as a universal sentinel, and not all that useful for primitives (when we want nullable primitives we can use the box classes -- but we usually don't.) But the reality is not one of convenience, but of necessity: nullability is _required_ for the safety of objects, and usually _detrimental_ to the performance of primitives.
>
> Nullability for object references is a forced move because null is what is preventing us from accessing uninitialized object state. Nullability for primitives is usually not needed, but that's not the only reason primitives are non-nullable. If primitives were nullable, `null` would be another state that would have to be represented in memory, and the costs would be out of line with the benefits. Since a 64-bit `long` uses all of its bit patterns, a nullable `long` would require at least 65 bits, and alignment requirements would likely round this up to 128 bits, doubling memory usage. (The density cost here is substantial, but it gets worse because most hardware today does not have cheap atomic 128-bit loads and stores. Since tearing might conflate a null value with a non-null value -- even worse than the usual consequences of tearing -- this would push us strongly towards using an indirection instead.) So non-nullability is a precondition for effective flattening and density of primitives, and nullable primitives would involve giving up the flatness and density that are the reason to have primitives in the first place.
>
> > Nullability interferes with heap flattening.
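>
> To make the footprint argument concrete, here is a purely illustrative sketch of the state a flattened nullable `long` would need (not a proposed API):
>
> ```
> // Illustrative only: the state a flattened nullable long must carry.
> value class NullableLong {
>     private long value;       // all 2^64 bit patterns are valid longs
>     private boolean present;  // at least one more bit to encode "null"
> }
> // That is 65 bits of logical state; alignment typically rounds the layout
> // up to 128 bits, and 128-bit atomic loads/stores are not cheap on most
> // hardware today.
> ```
>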
> To summarize, the design of primitives and objects implicitly stems from the following facts:
>
> - For most objects, the uninitialized (zeroed) state is either invalid or not a good-enough default value;
> - For primitives, the uninitialized (zeroed) state is both valid and a good-enough default value;
> - Having the uninitialized (zeroed) state be a good-enough default is a precondition for reliable flattening;
> - Nullability is required when the uninitialized (zeroed) state is not a good-enough default;
> - Nullability not only has a footprint cost, but often is an impediment to flattening.
>
> > Primitives exist in the first place because they can be flattened to give us
> > better numeric performance; flattening requires giving up nullity and
> > tolerance of uninitialized (zero) values.
>
> These observations were baked into the language (and other languages too), but the motivation for these decisions was then "erased" by the rigid distinction between primitives and objects. Valhalla seeks to put that choice back into the user's hands.
>
> ### Getting the best of both worlds
>
> Project Valhalla promises the best of both worlds: sufficiently constrained entities can "code like a class and work like an int." Classes that give up object identity can get some of the runtime benefits of primitives, but to get full heap flattening, we must embrace the two defining characteristics of primitives described so far: non-nullability and safe uninitialized use.
>
> Some candidates for value classes, such as `Complex`, are safe to use uninitialized because the default (zero) value is a good initial value. Others, like `LocalDate`, simply have no good default value (zero or otherwise), and therefore need the initialization protocol enabled by null-default object references. This distinction is inherent to the semantics of the domain; some domains simply do not have a reasonable default value, and this is a choice that the class author must capture when the code is written.
>
> There is a long list of classes that are candidates to be value classes; some are like `Complex`, but many are more like `LocalDate`. The latter group can still benefit significantly from eliminating identity, but can't necessarily get full heap flattening. The former group, which are most like today's primitives, can get all the benefits, including heap flattening -- when their instances are non-null.
>
> ### Declaring value classes
>
> As in previous iterations, a class can be declared as a _value class_:
>
> ```
> value class LocalDate { ... }
> ```
>
> A value class gives up identity and its consequences (e.g., mutability) -- and that's it. The resulting `LocalDate` type is still a reference type, and variables of type `LocalDate` are still nullable. Instances can get significant optimizations for on-stack use but are still usually represented in the heap via indirections.
>
> ### Implicitly constructible value classes
>
> In order to get the next group of benefits, a value class must additionally attest that it can be used uninitialized. Because this is a statement of how instances of this class come into existence, modeling this as a special kind of constructor seems natural:
>
> ```
> value class Complex {
>     private int re;
>     private int im;
>
>     public implicit Complex();
>     public Complex(int re, int im) { ... }
>
>     ...
> }
> ```
>
> These two constructors say that there are two ways a `Complex` instance comes into existence: the first is via the traditional constructor that takes real and imaginary values (`new Complex(1, 1)`), and the second is via the _implicit_ constructor that produces the instance used to initialize fields and array elements to their default values. That the implicit constructor cannot have a body is a signal that the "zero default" is not something the class author can fine-tune. A value class with an implicit constructor is called an _implicitly constructible_ value class.
>
> Having an implicit constructor is a necessary but not sufficient condition for heap flattening. The other required condition is that a variable that holds a `Complex` needs to be non-nullable. In the previous iteration, the `.val` type was non-nullable for the same reason primitive types were, and therefore `.val` types could be fully flattened. However, after several rounds of teasing apart the fundamental properties of primitives and value types, nullability has finally sedimented to a place in the model where a sensible reunion between value types and non-nullable types may be possible.
>
> ## Null exclusion
>
> Non-nullable reference types have been a frequent request for Java for years, having been explored in `C#`, Kotlin, and Scala. The goals of non-nullable types are sensible: richer types mean safer programs. It is a pervasive problem in Java libraries that we are not able to express within the language whether a returned object reference might be null, or is known never to be null, and programmers can therefore easily make wrong assumptions about nullability.
>
> To date, Project Valhalla has deliberately steered clear of non-nullable types as a standalone feature. This is not only because the goals of Valhalla were too ambitious to burden the project with another ambitious goal (though that is true), but for a more fundamental reason: the assumptions one might make in a vacuum about the semantics of non-nullable types would likely become hidden sources of constraints for the value type design, which was already bordering on over-constrained. Now that the project has progressed sufficiently, we are more confident that we can engage with the issue of null exclusion.
>
> A _refinement type_ (or _restriction type_) is a type that is derived from another type that excludes certain values from the derived type's value set, such as "the non-negative integers". In the most general form, a refinement type is defined by one or more predicates (Liquid Haskell and Clojure Spec are examples of this); range types in Pascal are a more constrained form of refinement type. Non-nullable types ("bang" types) can similarly be viewed as a constrained form of refinement type, characterized by the predicate `x != null`. (Note that the null-excluding refinement type `X!` of a reference type is still a reference type.)
>
> Rather than saying that primitive classes give rise to two types, `X.val` and `X.ref`, we can observe that the null-excluding type `X!` of an implicitly constructible value class can have the same runtime characteristics as the `.val` type in the previous round. Both the declaration-site property that a value class is implicitly constructible, and the use-site property that a variable is null-excluding, are necessary to routinely get flattening.
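>
> As a minimal sketch, in the syntax of this document, of how the two properties combine (using the `Complex` class above):
>
> ```
> Complex! a = new Complex(1, 2);  // implicitly constructible class, used at a
>                                  // null-excluding type: candidate for full
>                                  // heap flattening
> Complex  b = a;                  // unadorned: still a nullable reference,
>                                  // flattened on the stack at best
> ```
>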
> Related to null exclusion is _null-adjunction_; this takes a non-nullable type (such as `int`) or a type of indeterminate nullability (such as a type variable `T` in a generic class that can be instantiated with either nullable or non-nullable type parameters) and produces a type that is explicitly nullable (`int?` or `T?`.) In the current form of the design, there is only one place where the null-adjoining type is strictly needed -- when generic code needs to express "`T`, but might be null". The canonical example of this is `Map::get`; it wants to return `V?`, to capture the fact that `Map` uses `null` to represent "no mapping".
>
> For a given class `C`, the type `C!` is clearly non-nullable, and the type `C?` is clearly nullable. What of the unadorned name `C`? This has _unspecified_ nullability. Unspecified nullability is analogous to raw types in generics (we could call this "raw nullability"); we cannot be sure what the author had in mind, and so must find a balance between the desire for greater null safety and tolerance of ambiguity in author intent.
>
> Readers who are familiar with explicitly nullable and non-nullable types in other languages may be initially surprised at some of the choices made regarding null-exclusion (and null-adjunction) types here. The interpretation outlined here is not necessarily the "obvious" one, because it is constrained by the needs of null-exclusion, the needs of Valhalla, and the migration-compatibility constraints needed for the ecosystem to make a successful transition to types that have richer nullability information.
>
> While the theory outlined here will allow all class types to have a null-excluding refinement type, it is also possible that we will initially restrict null-exclusion to implicitly constructible value types. There are several reasons to consider pursuing such an incremental path, including the fact that we will be able to reify the non-nullability of implicitly constructible value types in the JVM, whereas the null-exclusion types of other classes such as `String` or of ordinary value classes such as `LocalDate` would need to be done through erasure, increasing the possible sources of null pollution.
>
> ### Goals
>
> We adopt the following set of goals for adding null-excluding refinement types:
>
> - More complete unification of primitives with classes;
> - Flatness is an emergent property that can derive from more basic semantic constraints, such as identity-freedom, implicit constructibility, and non-nullity;
> - Merge the concept of "value companion" (`.val` type) into the null-restricted refinement type of implicitly constructible value classes;
> - Allow programmers to annotate type uses to explicitly exclude or affirm nulls in the value set;
> - Provide some degree of runtime nullness checking to detect null pollution;
> - Annotating an existing API (one based on identity classes) with additional nullness information should be binary- and source-compatible.
>
> The last goal is a source of strong constraints, and not one to be taken lightly. If an existing API that specifies "this method never returns null" cannot be compatibly migrated to one where this constraint is reflected in the method declaration proper, the usefulness of null-exclusion types is greatly reduced; library maintainers will be put to a bad choice of forgoing a feature that will make their APIs safer, or making an incompatible change in order to do so.
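>
> As a hypothetical illustration of that last goal (the method is invented for exposition), a library should be able to evolve like this without breaking clients or subclasses:
>
> ```
> // Before: nullability unspecified ("raw") -- callers can't tell.
> public String name(int id) { ... }
>
> // After: the never-returns-null contract is in the declaration itself.
> // Existing call sites and overrides must keep compiling and linking.
> public String! name(int id) { ... }
> ```
>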
> If we were building a new language from scratch, the considerations might be different, but we do not have that luxury. "Just copying" what other languages have done here is a non-starter.
>
> ### Interoperation between nullable and non-nullable types
>
> We enable conversions between a nullable type and a compatible null-excluding refinement type by adding new widening and narrowing conversions between `T?` and `T!` that have analogous semantics to the existing boxing and unboxing conversions between `Integer` and `int`. Just as with boxing and unboxing, widening from a non-nullable type to a nullable type is unconditional and never fails, and narrowing from a nullable type to a non-nullable type may fail by throwing `NullPointerException`. These conversions for null-excluding types would be sensible in assignment context, cast context, and method invocation context (both loose and strict, unlike boxing for primitives today.) This would allow existing assignments, invocations, and overload applicability checks to continue to work even after migrating one of the types involved, as required for source-compatibility.
>
> Checking for bad values can mirror the approach taken for generics. When a richer compile-time type system erases to a less-rich runtime type system, type safety derives from a mix of compile-time type checking and synthetic runtime checks. In both cases, there is a possibility of pollution which can be injected at the boundary between legacy and new code, by malicious code, or through injudicious use of unchecked casts and raw types. And like generics, we would like to offer the possibility that if a program compiles in its entirety with no unchecked warnings, null-excluding types will not be observed to contain null. To achieve this, we will need a combination of runtime checks, new unchecked warnings, and possibly restrictions on initialization.
>
> The intrusion on the type-checking of generics here is considerable; nullity will have to be handled in type inference, bounds conformance, subtyping, etc. In addition, there are new sources of heap pollution and new conditions under which a variable may be polluted. The _Universal Generics_ JEP outlines a number of unchecked warnings that must be issued in order to avoid null pollution in type variables that might be instantiated either with a nullable or null-excluding type. While this work was designed for `ref` and `val` types, much of it applies directly to null-excluding types.
>
> The liberal use of conversion rather than subtyping here may be surprising to readers who are familiar with other languages that support null-excluding types. At first, it may appear to be "giving up all the benefit" of having annotated APIs for nullness, since a nullable value may be assigned directly to a non-nullable type without requiring a cast. But the reality is that for the first decade at least, we will at best be living in a mixed world where some APIs are migrated to use nullness information and some will not, and forcing users to modify code that uses these libraries (and then do so again and again as more libraries migrate) would be an unacceptable tax on Java users, and a deterrent to libraries migrating to use these features.
>
> Starting from `T! <: T?` -- and forcing explicit conversions when you want to go from nullable to non-nullable values -- does seem an obvious choice if you have the luxury of building a type system from scratch.
> But if we want to make migration to null-excluding types a source-compatible change for libraries and clients, we cannot accept a strict subtyping approach. (Even if we did, we could still only use subtyping in one direction, and would have to add an additional implicit conversion for the other direction -- a conversion that is similar to the narrowing conversion proposed here.)
>
> Further, primitives _already_ use boxing and unboxing conversions to go between their nullable (box) and non-nullable (primitive) forms. So choosing subtyping for references (plus an unbalanced implicit conversion) and boxing/unboxing conversion for primitives means our treatment of null-excluding types is gratuitously different for primitives than for other classes.
>
> Another consequence of wanting migration compatibility for annotating a library with nullness constraints is that nullness constraints cannot affect overload selection. Compatibility is not just for clients, it is also for subclasses.
>
> ### Null exclusion for implicitly constructible value classes
>
> Implicitly constructible value classes go particularly well with null exclusion, because we can choose a memory representation that _cannot_ encode null, enabling a more compact and direct representation.
>
> The Valhalla JVM has support for such a representation, and so we describe the null-exclusion type of an implicitly constructible value class as _strongly null-excluding_. This means that its null exclusion is reified by the JVM. Such a variable can never be seen to contain null, because null simply does not have a runtime representation for these types. This is only possible because these classes are implicitly constructible: the default zero value written by the JVM is known to be a valid value of the domain. As with primitives, these types are explicitly safe to use uninitialized.
>
> A strongly null-excluding type will have a type mirror, as type mirrors describe reifiable types.
>
> ### Null exclusion for other classes
>
> For identity classes and non-implicitly-constructible value classes, the story is not quite as nice. Since there is no JVM representation of "non-nullable String", the best we can do is translate `String!` to `String` (a form of erasure), and then try to keep the nulls at bay. This means that we do not get the flattening or density benefits, and null-excluding variables may still be subject to heap pollution. We can try to minimize this with a combination of static type checking and generated runtime checks. We refer to the null-exclusion type of an identity or non-implicitly-constructible value class as _weakly null-excluding_.
>
> There is an additional source of potential null pollution, aside from the sources analogous to generic heap pollution: the JVM itself. The JVM initializes references in the heap to null. If `String!` erases to an ordinary `String` reference, there is at least a small window in time when this supposedly non-nullable field contains null. We can erect barriers to reduce the window in which this can be observed, but these barriers will not be foolproof. For example, the compiler could enforce that a field of type `String!` either has an initializer or is definitely assigned in every constructor. However, if the receiver escapes during construction, all bets are off, just as they are with initialization safety for final fields.
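>
> A sketch of what such a rule would mean in practice (the class is invented for exposition):
>
> ```
> class Person {
>     String! name;              // weakly null-excluding: erases to String
>
>     Person(String! name) {
>         this.name = name;      // OK: definitely assigned in every constructor
>         // register(this);     // ...but if `this` escaped here, another
>     }                          // thread could still observe name == null
> }
> ```
>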
> We have a similar problem with arrays of `String!`; newly created arrays initialize their elements to the default value for the component type, which is `null`, and we don't even have the option of requiring an initializer as we would with fields. (Since a `String![]` is also a `String[]`, one option is to outlaw the direct creation of arrays of weakly null-excluding types, instead providing reflective API points which will safely create the array and initialize all elements to a non-null value.)
>
> A weakly null-excluding type will not have a type mirror, as the nullity information is erased for these types. Generic signatures would be extended to represent null-exclusion, and similarly the `Type` hierarchy would reflect such signatures.
>
> Because of erasure and the new possibilities for pollution, allowing null-exclusion types for identity classes introduces significant potential new complexity. For this reason, we may choose a staged approach where null-restricted types are initially limited to the strongly null-restricted ones.
>
> ### Null exclusion for other value classes
>
> Value classes that are not implicitly constructible are similar to identity classes in that their null-exclusion types are only weakly null-excluding. These classes are the ones for which the author has explicitly decided that the default zero value is not a valid member of the domain, so we must ensure that in no case does this invalid value ever escape. This effectively means that we must similarly erase these types to a nullable representation to ensure that the zero value stays contained. (There are limited heroics the VM can do with alternate representations for null when these classes are small and have readily identifiable slack bits, but this is merely a potential optimization for the future.)
>
> ### Atomicity
>
> Primitives additionally have the property that larger-than-32-bit primitives (`long` and `double`) may tear under race. The allowance for tearing was an accommodation to the fact that numeric code is often performance-critical, and so a tradeoff was made to allow for more performance at the cost of less safety for incorrect programs. The corresponding box types, as well as primitive variables declared `volatile`, are guaranteed not to tear, even under race. (See the document entitled "Understanding non-atomicity and tearing" for more detail.)
>
> Implicitly constructible value classes can be declared as "non-atomic" to indicate that their null-exclusion types may tear under race (if not declared `volatile`), just as with `long` and `double`. The classes `Long` and `Double` would be declared non-atomic (though most implementations still offer atomic access for 64-bit primitives.)
>
> ### Flattening
>
> Flattening in the heap is an emergent property, which is achieved when we give up the degrees of freedom that would prevent flattening:
>
> - Identity prevents flattening entirely;
> - Nullability prevents flattening in the absence of heroics involving exotic representations for null;
> - The inability to use a class without initialization requires nullability at the VM representation level, undermining flattening;
> - Atomicity prevents flattening for larger value objects.
>
> Putting this together, the null-exclusion type of implicitly constructible value classes is flattenable in the heap when the class is non-atomic or the layout is suitably small.
> For ordinary value classes, we can still get flattening in the calling convention: all identity-free types can be flattened on the stack, regardless of layout size or nullability.
>
> ### Summarizing null-exclusion
>
> The feature described so far is at the weak end of the spectrum of features described by "non-nullable types". We make tradeoffs to enable gradual migration compatibility, moving checks to the boundary -- where in some cases they might not happen due to erasure, separate compilation, or just dishonest clients.
>
> Users may choose to look at this as "glass X% full" or "glass (100-X)% empty". We can now more clearly say what we mean, migrate incrementally towards more explicit and safe code without forking the ecosystem, and catch many errors earlier in time. On the other hand, it is less explicit where we might experience runtime failures, because autoboxing makes unboxing implicit. And some users will surely complain merely because this is not what their favorite language does. But it is the null-exclusion we can actually have, rather than the one we wish we might have in an alternate universe.
>
> This approach yields a significant payoff for the Valhalla story. Valhalla already had to deal with considerable new complexity to handle the relationship between reference and value types -- but this new complexity applied only to primitive classes. For less incremental complexity, we can have a more uniform treatment of null-exclusion across all class types. The story is significantly simpler and more unified than we had previously:
>
> - Everything, including the legacy primitives, is an object (an instance of some class);
> - Every type, including the legacy primitives, is derived from a class;
> - All types are reference types (they refer to objects), but some reference types (non-nullable references to implicitly constructible objects) exhibit the runtime behavior of primitives;
> - Some reference types exclude null, and some null-excluding reference types are reifiable with a known-good non-null default;
> - Every type can have a corresponding null-exclusion type.
>
> ## Planning for a null-free future (?)
>
> Users prefer working with unannotated types (e.g., `Foo`) rather than explicitly annotated types (`Foo!`, `Foo?`), where possible. The unannotated type `Foo` could mean one of three things: an alias for `Foo!`, an alias for `Foo?`, or a type of "raw" (unknown) nullity. Investigations into null-excluding type systems have shown that the better default would be to treat an unannotated name as indicating non-nullability, and use explicitly nullable types (`T?`) to indicate the presence of null, because returning or accepting null is generally a less common case. Of course, today `String` means "possibly nullable String" in Java, meaning that, yet again, we seem to have chosen the wrong default.
>
> Our friends in the `C#` community have explored the possibility of a "flippening". `C#` started with the Java defaults, and later provided a compiler mode to flip the default on a per-module basis, with checking (or pollution risk) at the boundary between modules with opposite defaults. This is an interesting experiment and we look forward to seeing how this plays out in the `C#` ecosystem.
>
> Alternately, another possible approach for Java is to continue to treat the unadorned name as having "raw" or "unknown" nullity, encouraging users to annotate types with either `!` or `?`.
> This approach has been partially explored in the `JSpecify` project. Within this approach is a range of options for what the language will do with such types; there is a risk of flooding users with warnings. We may want to leave such analysis to extralinguistic type checkers, at least initially -- but we would like to not foreclose on the possibility of an eventual flippening.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From brian.goetz at oracle.com Sat Jun 3 15:50:22 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 3 Jun 2023 11:50:22 -0400
Subject: Design document on nullability and value types
In-Reply-To: <2075843456.71765771.1685772779404.JavaMail.zimbra@univ-eiffel.fr>
References: <2075843456.71765771.1685772779404.JavaMail.zimbra@univ-eiffel.fr>
Message-ID: <137cc4cc-35b9-5222-ff73-f330ce151dbd@oracle.com>

You're not wrong; having only B3! is a much simpler feature, because (a) it is enforced by the VM and (b) we don't have the split of "sometimes you can believe the bang, sometimes you can't." (On the other hand, imagine the cries of bloody murder when we tell people they can't have any form of `String!`.) And yes, arrays are particularly challenging.

The document acknowledged this in saying that we might start with B3! only, but we also have to work out what our options are for extending to other reference types. (As it turns out, the anomalies are fairly close to, though not exactly the same as, heap pollution with erased generics -- which is to say, pollution still sucks, but it's a pollution we're at least somewhat familiar with.)

I don't see where you get "erased bangs today foreclose on enforced bangs tomorrow" (and if that's true, we also have no path to generic specialization.)

I think it's a significant overstatement to say "syntax but not semantics" or "only to have a more uniform syntax"; the concepts and semantics are also the same, it is the runtime guarantees that are different under some circumstances. Your concern is valid, but don't overstate it.

All of this is to say: your concerns are valid and we've been struggling with them, but I think saying "no String! ever" is also not a realistic position, so somewhere compromises will have to be made, and our job is to find the right set of compromises.

On 6/3/2023 2:12 AM, Remi Forax wrote:
> Hi all,
> I am not convinced that adding the nullability annotations to types other than the value types with a default value is a good move; it seems to shut the door on potential futures where being non-null is more strongly enforced by the VM. If the goal of Valhalla is to introduce value types, I think we are extending our reach too much here, making decisions for Java we may regret later.
>
> I understand the appeal of providing a unified view regarding nullability, but sadly, as explained in this document, the unification will only be true in terms of syntax, not in terms of semantics, with only the value types with a default value being reified and enforced by the VM.
>
> As an example, John's Array 2.0 proposal is still on the table and proposes to add non-null arrays at runtime, so choosing to erase the nullability annotation now seems a bad move if in the future such kinds of arrays are added to the Java platform.
>
> Obviously, we want to allow migration from/to identity type and value type (or from value type without a default value to a value type with a default value), so we have to specify, both at compile time and at runtime, a semantics that allows that. But I see erasing nullability annotations in the case of identity types as too easy a shortcut.
>
> I would prefer to live in a world where '!' is only available on value types with a default value at compile time, with field and array creation being enforced at runtime only if the class is a value class with a default value. A world where adding an implicit constructor is a source-backward-compatible move but the opposite is not, and where the VM ignores nullability attributes at runtime if the class is not actually a value class with a default value, so that moving from a value class with a default value to a value class without a default value is a binary-compatible move.
>
> With the model above, we only have null pollution because of separate compilation; in particular, we keep the property that when unboxing (i.e., the transition T to T!) null checking is done by the VM, so there is no null pollution. Allowing more erasure only to have a more uniform syntax is not appealing to me, and seems worse if seen from the future.
>
> regards,
> Rémi
> > The second group, which we had been calling "primitive classes" > (we are now > moving away from that term), are those that are more like the existing > primitives, such as `Decimal` or `Complex`.? Where ordinary value > classes, like > identity classes, gave rise to a single (reference) type, these > classes gave > rise to two types, a value type (`X.val`) and a reference type > (`X.ref`).? This > pair of types was directly analogous to legacy primitives and > their boxes. These > classes come with more restrictions and more to think about, but > are rewarded > with greater heap flattening.? This model -- after several > iterations -- seemed > to meet the goals for expressiveness and performance: we can > express the > difference between `int`-like behavior and `Integer`-like > behavior, and get > routine flattening for `int`-like types.? But the result still had > many > imbalances; the distinction was heavyweight, and a significant > fraction of the > incremental specification complexity was centered only on these > types.? We > eventually concluded that the source of this was trying to model > the `int` / > `Integer` distinction directly, and that this distinction, while > grounded in > user experience, was just not "primitive" enough. > > In this document, we will break down the characteristics of > so-called "primitive > classes" into more "primitive" (and hopefully less ad-hoc) > distinctions.? This > results in a simpler model, streamlines the syntactic baggage, and > enables us to > finally reunite with an old friend, null-exclusion (bang) types.? > Rather than > treating "value types" and "reference types" as different things, > we can treat > the existing primitives (and the "value projection" of > user-defined primitive > classes) as being restricted references, whose restrictions enable > the desired > runtime properties. > > ## Primitives and objects > > In a previous edition of _State of Valhalla_, we outlined a host > of differences > between primitives and objects: > > | Primitives???????????????????????????????? | > Objects?????????????????????????????????? | > | ------------------------------------------ | > ----------------------------------------- | > | No identity (pure values)????????????????? | > Identity????????????????????????????????? | > | `==` compares state??????????????????????? | `==` compares > object identity???????????? | > | Built-in?????????????????????????????????? | Declared in > classes?????????????????????? | > | No members (fields, methods, constructors) | Members (including > mutable fields)??????? | > | No supertypes or subtypes????????????????? | Class and interface > inheritance?????????? | > | Represented directly in memory???????????? | Represented > indirectly through references | > | Not nullable?????????????????????????????? | > Nullable????????????????????????????????? | > | Default value is zero????????????????????? | Default value is > null???????????????????? | > | Arrays are monomorphic???????????????????? | Arrays are > covariant????????????????????? | > | May tear under race??????????????????????? | Initialization > safety guarantees????????? | > | Have reference companions (boxes)????????? | Don't need > reference companions?????????? 
| > > Over many iterations, we have chipped away at this list, mostly by > making > classes richer: value classes can disavow identity (and thereby > opt into > state-based `==` comparison); the lack of members and supertypes > are an > accidental restriction that can go away with declarable value > classes; we can > make primitive arrays covariant with arrays of their boxes; we can > let some > class declarations opt into non-atomicity under race. That leaves the > following, condensed list of differences: > > | Primitives??????????????????????? | > Objects?????????????????????????????????? | > | --------------------------------- | > ----------------------------------------- | > | Represented directly in memory??? | Represented indirectly > through references | > | Not nullable????????????????????? | > Nullable????????????????????????????????? | > | Default value is zero???????????? | Default value is > null???????????????????? | > | Have reference companions (boxes) | Don't need reference > companions?????????? | > > The previous approach ("primitive classes") started with the > assumption that > this is the list of things to be modeled by the value/reference > distinction.? In > this document we go further, by showing that flattening (direct > representation) > is derived from more basic principles around nullity and > initialization > requirements, and perhaps surprisingly, the concept of "primitive > type" can > disappear almost completely, save only for historical vestiges > related to the > existing eight primitives.? The `.val` type can be replaced by > restricted > references whose restrictions enable the desired representational > properties. As > is consistent with the goals of Valhalla, flattenability is an > emergent > property, gained by giving up those properties that would undermine > flattenability, rather than being a linguistic concept on its own. > > ### Initialization > > The key distinction between today's primitives and objects has to > do with > _initialization requirements_.?? Primitives are designed to be _used > uninitialized_; if we declare a field `int count`, it is reliably > initialized to > zero by the JVM before any code can access it.? This initial value > is a > perfectly good default, and it is not a bug to read or even > increment this field > before it has been explicitly assigned a value by the program, > because it has > _already_ been initialized to a known good value by the JVM.? The > zero value > pre-written by the JVM is not just a safety net; it is actually > part of the > programming model that primitives start out life with "good > enough" defaults. > This is part of what it means to be a primitive type. > > Objects, on the other hand, are not designed for uninitialized > use; they must be > initialized via constructors before use.? The default zero values > written to an > object's fields by the JVM typically don't necessarily constitute > a valid state > according to the classes specification, and, even if it did, is > rarely a good > default value.? Therefore, we require that class instances be > initialized by > their constructors before they can be exposed to the rest of the > program.? To > ensure that this happens, objects are referenced exclusively > through _object > references_, which _can_ be safely used uninitialized -- because > they reliably > have the usable default value of `null`.? 
(Some may quibble with > this use of > "safely" and "usable", because null references are fairly limited, > but they do > their limited job correctly: we can easily and safely test whether > a reference > is null, and if we accidentally dereference a null reference, we > get a clear > exception rather than accessing uninitialized object state.) > > > Primitives can be safely used without explicit initialization; > objects cannot. > > Object references are nullable _precisely because_ objects > cannot be used > > safely without explicit initialization. > > ### Nullability > > A key difference between today's primitives and references is that > primitives > are non-nullable and references are nullable.? One might think > this was > primarily a choice of convenience: null is useful for references > as a universal > sentinel, and not all that useful for primitives (when we want > nullable > primitives we can use the box classes -- but we usually don't.)? > But the > reality is not one of convenience, but of necessity: nullability > is _required_ > for the safety of objects, and usually _detrimental_ to the > performance of > primitives. > > Nullability for object references is a forced move because null is > what is > preventing us from accessing uninitialized object state.? > Nullability for > primitives is usually not needed, but that's not the only reason > primitives are > non-nullable.? If primitives were nullable, `null` would be > another state that > would have to be represented in memory, and the costs would be out > of line with > the benefits.? Since a 64-bit `long` uses all of its bit patterns, > a nullable > `long` would require at least 65 bits, and alignment requirements > would likely > round this up to 128 bits, doubling memory usage.? (The density > cost here is > substantial, but it gets worse because most hardware today does > not have cheap > atomic 128 bit loads and stores.? Since tearing might conflate a > null value with > a non-null value -- even worse than the usual consequences of > tearing -- this > would push us strongly towards using an indirection instead.)? So > non-nullability is a precondition for effective flattening and > density of > primitives, and nullable primitives would involve giving up the > flatness and > density that are the reason to have primitives in the first place. > > > Nullability interferes with heap flattening. > > To summarize, the design of primitives and objects implicitly > stems from the > following facts: > > ?- For most objects, the uninitialized (zeroed) state is either > invalid or not a > ?? good-enough default value; > ?- For primitives, the uninitialized (zeroed) state is both valid > and a > ?? good-enough default value; > ?- Having the uninitialized (zeroed) state be a good-enough > default is a > ?? precondition for reliable flattening; > ?- Nullability is required when the the uninitialized (zeroed) > state is not a > ?? good-enough default; > ?- Nullability not only has a footprint cost, but often is an > impediment to > ?? flattening. > > > Primitives exist in the first place because they can be > flattened to give us > > better numeric performance; flattening requires giving up > nullity and > > tolerance of uninitialized (zero) values. > > These observations were baked in to the language (and other > languages too), but > the motivation for these decisions was then "erased" by the rigid > distinction > between primitives and objects.? Valhalla seeks to put that choice > back into the > user's hands. 
> > ### Getting the best of both worlds > > Project Valhalla promises the best of both worlds: sufficiently > constrained > entities can "code like a class and work like an int." Classes > that give up > object identity can get some of the runtime benefits of > primitives, but to get > full heap flattening, we must embrace the two defining > characteristics of > primitives described so far: non-nullability and safe > uninitialized use. > > Some candidates for value classes, such as `Complex`, are safe to use > uninitialized because the default (zero) value is a good initial > value.? Others, > like `LocalDate`, simply have no good default value (zero or > otherwise), and > therefore need the initialzation protocol enabled by null-default > object > references.? This distinction in inherent to the semantics of the > domain; some > domains simply do not have reasonable default value, and this is a > choice that > the class author must capture when the code is written. > > There is a long list of classes that are candidates to be value > classes; some > are like `Complex`, but many are more like `LocalDate`. The latter > group can > still benefit significantly from eliminating identity, but can't > necessarily get > full heap flattening.? The former group, which are most like > today's primitives, > can get all the benefits, including heap flattening -- when their > instances are > non-null. > > ### Declaring value classes > > As in previous iterations, a class can be declared as as _value > class_: > > ``` > value class LocalDate { ... } > ``` > > A value class gives up identity and its consequences (e.g., > mutability) -- and > that's it.? The resulting? `LocalDate` type is still a reference > type, and > variables of type `LocalDate` are still nullable. Instances can > get significant > optimizations for on-stack use but are still usually represented > in the heap via > indirections. > > ### Implicitly constructible value classes > > In order to get the next group of benefits, a value class must > additionally > attest that it can be used uninitialized.? Because this is a > statement of how > instances of this class come into existence, modeling this as a > special kind of > constructor seems natural: > > ``` > value class Complex { > ??? private int re; > ??? private int im; > > ??? public implicit Complex(); > ??? public Complex(int re, int im) { ... } > > ??? ... > } > ``` > > These two constructors say that there are two ways a `Complex` > instance comes > into existence: the first is via the traditional constructor that > takes real and > imaginary values (`new Complex(1.0, 1.0)`), and the second is via > the _implicit_ > constructor that produces the instance used to initialize fields > and array > elements to their default values.? That the implicit constructor > cannot have a > body is a signal that the "zero default" is not something the > class author can > fine-tune.? A value class with an implicit constructor is called > an _implicitly > constructible_ value class. > > Having an implicit constructor is a necessary but not sufficient > condition for > heap flattening.? The other required condition is that variable > that holds a > `Complex` needs to be non-nullable.? In the previous iteration, > the `.val` type > was non-nullable for the same reason primitive types were, and > therefore `.val` > types could be fully flattened.? 
However, after several rounds of > teasing apart > the fundamental properties of primitives and value types, > nullability has > finally sedimented to a place in the model where a sensible > reunion between > value types and non-nullable types may be possible. > > ## Null exclusion > > Non-nullable reference types have been a frequent request for Java > for years, > having been explored in `C#`, Kotlin, and Scala.? The goals of > non-nullable > types are sensible: richer types means safer programs. It is a > pervasive > problem in Java libraries that we are not able to express within > the language > whether a returned object reference might be null, or is known > never to be null, > and programmers can therefore easily make wrong assumptions about > nullability. > > To date, Project Valhalla has deliberately steered clear of > non-nullable types > as a standalone feature. This is not only because the goals of > Valhalla were too > ambitious to burden the project with another ambitious goal > (though that is > true), but for a more fundamental reason: the assumptions one > might make in a > vacuum about the semantics of non-nullable types would likely > become hidden > sources of constraints for the value type design, which was > already bordering on > over-constrained.? Now that the project has progressed > sufficiently, we are more > confident that we can engage with the issue of null exclusion. > > A _refinement type_ (or _restriction type_) is a type that is > derived from > another type that excludes certain values from the derived type's > value set, > such as "the non-negative integers". In the most general form, a > refinement type > is defined by one or more predicates (Liquid Haskell and Clojure > Spec are > examples of this); range types in Pascal are a more constrained > form of > refinement type.? Non-nullable types ("bang" types) can similarly > be viewed as a > constrained form of refinement type, characterized by the > predicate `x != null`. > (Note that the null-excluding refinement type `X!` of a reference > type is still > a reference type.) > > Rather than saying that primitive classes give rise to two types, > `X.val` and > `X.ref`, we can observe the the null-excluding type `X!` of a > implicitly-constructible value class can have the same runtime > characteristic as > the `.val` type in the previous round.? Both the declaration-site > property that > a value class is implicitly constructible, and the use-site > property that a > variable is null-excluding, are necessary to routinely get > flattening. > > Related to null exclusion is _null-adjunction_; this takes a > non-nullable type > (such as `int`) or a type of indeterminate nullability (such as a > type variable > `T` in a generic class that can be instantiated with either > nullable or > non-nullable type parameters) and produces a type that is > explicitly nullable > (`int?` or `T?`.)? In the current form of the design, there is > only one place > where the null-adjoining type is strictly needed -- when generic > code needs to > express "`T`, but might be null.? The canonical example of this is > `Map::get`; > it wants to wants to return `V?`, to capture the fact that `Map` > uses `null` to > represent "no mapping". > > For a given class `C`, the type `C!` is clearly non-nullable, and > the type `C?` > is clearly nullable.? What of the unadorned name `C`? This has > _unspecified_ > nullability.? 
>
> For a given class `C`, the type `C!` is clearly non-nullable, and the type
> `C?` is clearly nullable. What of the unadorned name `C`? This has
> _unspecified_ nullability. Unspecified nullability is analogous to raw
> types in generics (we could call this "raw nullability"); we cannot be
> sure what the author had in mind, and so must find a balance between the
> desire for greater null safety and tolerance of ambiguity in author
> intent.
>
> Readers who are familiar with explicitly nullable and non-nullable types
> in other languages may be initially surprised at some of the choices made
> regarding null-exclusion (and null-adjunction) types here. The
> interpretation outlined here is not necessarily the "obvious" one, because
> it is constrained by the needs of null-exclusion, the needs of Valhalla,
> and the migration-compatibility constraints needed for the ecosystem to
> make a successful transition to types that have richer nullability
> information.
>
> While the theory outlined here will allow all class types to have a
> null-excluding refinement type, it is also possible that we will initially
> restrict null-exclusion to implicitly constructible value types. There
> are several reasons to consider pursuing such an incremental path,
> including the fact that we will be able to reify the non-nullability of
> implicitly constructible value types in the JVM, whereas the
> null-exclusion types of other classes such as `String`, or of ordinary
> value classes such as `LocalDate`, would need to be handled through
> erasure, increasing the possible sources of null pollution.
>
> ### Goals
>
> We adopt the following set of goals for adding null-excluding refinement
> types:
>
>  - More complete unification of primitives with classes;
>  - Flatness is an emergent property that can derive from more basic
>    semantic constraints, such as identity-freedom, implicit
>    constructibility, and non-nullity;
>  - Merge the concept of "value companion" (`.val` type) into the
>    null-restricted refinement type of implicitly constructible value
>    classes;
>  - Allow programmers to annotate type uses to explicitly exclude or affirm
>    nulls in the value set;
>  - Provide some degree of runtime nullness checking to detect null
>    pollution;
>  - Annotating an existing API (one based on identity classes) with
>    additional nullness information should be binary- and
>    source-compatible.
>
> The last goal is a source of strong constraints, and not one to be taken
> lightly. If an existing API that specifies "this method never returns
> null" cannot be compatibly migrated to one where this constraint is
> reflected in the method declaration proper, the usefulness of
> null-exclusion types is greatly reduced; library maintainers would face a
> bad choice between forgoing a feature that would make their APIs safer and
> making an incompatible change in order to adopt it. If we were building a
> new language from scratch, the considerations might be different, but we
> do not have that luxury. "Just copying" what other languages have done
> here is a non-starter.
>
> ### Interoperation between nullable and non-nullable types
>
> We enable conversions between a nullable type and a compatible
> null-excluding refinement type by adding new widening and narrowing
> conversions between `T?` and `T!` that have analogous semantics to the
> existing boxing and unboxing conversions between `Integer` and `int`.
> Just as with boxing and unboxing, widening from a non-nullable type to a
> nullable type is unconditional and never fails, and narrowing from a
> nullable type to a non-nullable type may fail by throwing
> `NullPointerException`. These conversions for null-excluding types would
> be sensible in assignment context, cast context, and method invocation
> context (both loose and strict, unlike boxing for primitives today.) This
> would allow existing assignments, invocations, and overload applicability
> checks to continue to work even after migrating one of the types involved,
> as required for source-compatibility.
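>
> A hedged sketch of how these conversions might read in source (all syntax
> provisional, and `String!` assumes null-exclusion is eventually allowed
> for identity classes):
>
> ```
> String! s = "hello";
> String? maybe = s;      // widening: unconditional, like int -> Integer boxing
> String! again = maybe;  // narrowing: may throw NullPointerException,
>                         // like unboxing a null Integer to int
> ```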
>
> Checking for bad values can mirror the approach taken for generics. When
> a richer compile-time type system erases to a less-rich runtime type
> system, type safety derives from a mix of compile-time type checking and
> synthetic runtime checks. In both cases, there is a possibility of
> pollution, which can be injected at the boundary between legacy and new
> code, by malicious code, or through injudicious use of unchecked casts and
> raw types. And like generics, we would like to offer the possibility that
> if a program compiles in its entirety with no unchecked warnings,
> null-excluding types will not be observed to contain null. To achieve
> this, we will need a combination of runtime checks, new unchecked
> warnings, and possibly restrictions on initialization.
>
> The intrusion on the type-checking of generics here is considerable;
> nullity will have to be handled in type inference, bounds conformance,
> subtyping, etc. In addition, there are new sources of heap pollution and
> new conditions under which a variable may be polluted. The _Universal
> Generics_ JEP outlines a number of unchecked warnings that must be issued
> in order to avoid null pollution in type variables that might be
> instantiated either with a nullable or null-excluding type. While this
> work was designed for `ref` and `val` types, much of it applies directly
> to null-excluding types.
>
> The liberal use of conversion rather than subtyping here may be surprising
> to readers who are familiar with other languages that support
> null-excluding types. At first, it may appear to be "giving up all the
> benefit" of having annotated APIs for nullness, since a nullable value may
> be assigned directly to a non-nullable type without requiring a cast. But
> the reality is that for the first decade at least, we will at best be
> living in a mixed world where some APIs are migrated to use nullness
> information and some will not, and forcing users to modify code that uses
> these libraries (and then do so again and again as more libraries migrate)
> would be an unacceptable tax on Java users, and a deterrent to libraries
> migrating to use these features.
>
> Starting from `T! <: T?` -- and forcing explicit conversions when you want
> to go from nullable to non-nullable values -- does seem an obvious choice
> if you have the luxury of building a type system from scratch. But if we
> want to make migration to null-excluding types a source-compatible change
> for libraries and clients, we cannot accept a strict subtyping approach.
> (Even if we did, we could still only use subtyping in one direction, and
> would have to add an additional implicit conversion for the other
> direction -- a conversion that is similar to the narrowing conversion
> proposed here.)
>
> Further, primitives _already_ use boxing and unboxing conversions to go
> between their nullable (box) and non-nullable (primitive) forms. So
> choosing subtyping for references (plus an unbalanced implicit conversion)
> and boxing/unboxing conversion for primitives means our treatment of
> null-excluding types is gratuitously different for primitives than for
> other classes.
>
> Another consequence of wanting migration compatibility for annotating a
> library with nullness constraints is that nullness constraints cannot
> affect overload selection. Compatibility is not just for clients, it is
> also for subclasses.
>
> ### Null exclusion for implicitly constructible value classes
>
> Implicitly constructible value classes go particularly well with null
> exclusion, because we can choose a memory representation that _cannot_
> encode null, enabling a more compact and direct representation.
>
> The Valhalla JVM has support for such a representation, and so we describe
> the null-exclusion type of an implicitly constructible value class as
> _strongly null-excluding_. This means that its null exclusion is reified
> by the JVM. Such a variable can never be seen to contain null, because
> null simply does not have a runtime representation for these types. This
> is only possible because these classes are implicitly constructible: the
> default zero value written by the JVM is known to be a valid value of the
> domain. As with primitives, these types are explicitly safe to use
> uninitialized.
>
> A strongly null-excluding type will have a type mirror, as type mirrors
> describe reifiable types.
>
> ### Null exclusion for other classes
>
> For identity classes and non-implicitly-constructible value classes, the
> story is not quite as nice. Since there is no JVM representation of
> "non-nullable String", the best we can do is translate `String!` to
> `String` (a form of erasure), and then try to keep the nulls at bay. This
> means that we do not get the flattening or density benefits, and
> null-excluding variables may still be subject to heap pollution. We can
> try to minimize this with a combination of static type checking and
> generated runtime checks. We refer to the null-exclusion type of an
> identity class or a non-implicitly-constructible value class as _weakly
> null-excluding_.
>
> There is an additional source of potential null pollution, aside from the
> sources analogous to generic heap pollution: the JVM itself. The JVM
> initializes references in the heap to null. If `String!` erases to an
> ordinary `String` reference, there is at least a small window in time when
> this supposedly non-nullable field contains null. We can erect barriers
> to reduce the window in which this can be observed, but these barriers
> will not be foolproof. For example, the compiler could enforce that a
> field of type `String!` either has an initializer or is definitely
> assigned in every constructor. However, if the receiver escapes during
> construction, all bets are off, just as they are with initialization
> safety for final fields.
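>
> A sketch of the escape hazard just described (`Registry` is a hypothetical
> sink; `String!` here is erased, weakly null-excluding):
>
> ```
> class Profile {
>     String! name;                   // definitely assigned in the constructor...
>     Profile(Registry registry) {
>         registry.register(this);    // ...but `this` escapes before `name` is
>         this.name = "initialized";  // written, so a racing observer can still
>     }                               // read null from this field
> }
> ```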
>
> We have a similar problem with arrays of `String!`; newly created arrays
> initialize their elements to the default value for the component type,
> which is `null`, and we don't even have the option of requiring an
> initializer as we would with fields. (Since a `String![]` is also a
> `String[]`, one option is to outlaw the direct creation of arrays of
> weakly null-excluding types, instead providing reflective API points which
> will safely create the array and initialize all elements to a non-null
> value.)
>
> A weakly null-excluding type will not have a type mirror, as the nullity
> information is erased for these types. Generic signatures would be
> extended to represent null-exclusion, and similarly the `Type` hierarchy
> would reflect such signatures.
>
> Because of erasure and the new possibilities for pollution, allowing
> null-exclusion types for identity classes introduces significant potential
> new complexity. For this reason, we may choose a staged approach where
> null-restricted types are initially limited to the strongly
> null-restricted ones.
>
> ### Null exclusion for other value classes
>
> Value classes that are not implicitly constructible are similar to
> identity classes in that their null-exclusion types are only weakly
> null-excluding. These classes are the ones for which the author has
> explicitly decided that the default zero value is not a valid member of
> the domain, so we must ensure that in no case does this invalid value ever
> escape. This effectively means that we must similarly erase these types
> to a nullable representation to ensure that the zero value stays
> contained. (There are limited heroics the VM can do with alternate
> representations for null when these classes are small and have readily
> identifiable slack bits, but this is merely a potential optimization for
> the future.)
>
> ### Atomicity
>
> Primitives additionally have the property that larger-than-32-bit
> primitives (`long` and `double`) may tear under race. The allowance for
> tearing was an accommodation to the fact that numeric code is often
> performance-critical, and so a tradeoff was made to allow for more
> performance at the cost of less safety for incorrect programs. The
> corresponding box types, as well as primitive variables declared
> `volatile`, are guaranteed not to tear, even under race. (See the
> document entitled "Understanding non-atomicity and tearing" for more
> detail.)
>
> Implicitly constructible value classes can be declared as "non-atomic" to
> indicate that their null-exclusion types may tear under race (if not
> declared `volatile`), just as with `long` and `double`. The classes
> `Long` and `Double` would be declared non-atomic (though most
> implementations still offer atomic access for 64-bit primitives.)
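>
> As a hedged illustration of what tearing means in practice (this is the
> long-standing JLS 17.7 allowance for `long` and `double`, not new syntax):
>
> ```
> class Race {
>     long counter;        // non-volatile: under a data race, a reader may
>                          // observe a value mixing 32 bits from one write
>                          // with 32 bits from another
>     volatile long safe;  // volatile 64-bit access is guaranteed atomic
> }
> ```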
>
> ### Flattening
>
> Flattening in the heap is an emergent property, which is achieved when we
> give up the degrees of freedom that would prevent flattening:
>
>  - Identity prevents flattening entirely;
>  - Nullability prevents flattening in the absence of heroics involving
>    exotic representations for null;
>  - The inability to use a class without initialization requires
>    nullability at the VM representation level, undermining flattening;
>  - Atomicity prevents flattening for larger value objects.
>
> Putting this together, the null-exclusion type of implicitly constructible
> value classes is flattenable in the heap when the class is non-atomic or
> the layout is suitably small. For ordinary value classes, we can still
> get flattening in the calling convention: all identity-free types can be
> flattened on the stack, regardless of layout size or nullability.
>
> ### Summarizing null-exclusion
>
> The feature described so far is at the weak end of the spectrum of
> features described by "non-nullable types". We make tradeoffs to enable
> gradual migration compatibility, moving checks to the boundary -- where in
> some cases they might not happen due to erasure, separate compilation, or
> just dishonest clients.
>
> Users may choose to look at this as "glass X% full" or "glass (100-X)%
> empty". We can now more clearly say what we mean, migrate incrementally
> towards more explicit and safe code without forking the ecosystem, and
> catch many errors earlier in time. On the other hand, it is less explicit
> where we might experience runtime failures, because autoboxing makes
> unboxing implicit. And some users will surely complain merely because
> this is not what their favorite language does. But it is the
> null-exclusion we can actually have, rather than the one we wish we might
> have in an alternate universe.
>
> This approach yields a significant payoff for the Valhalla story.
> Valhalla already had to deal with considerable new complexity to handle
> the relationship between reference and value types -- but this new
> complexity applied only to primitive classes. For less incremental
> complexity, we can have a more uniform treatment of null-exclusion across
> all class types. The story is significantly simpler and more unified than
> we had previously:
>
>  - Everything, including the legacy primitives, is an object (an instance
>    of some class);
>  - Every type, including the legacy primitives, is derived from a class;
>  - All types are reference types (they refer to objects), but some
>    reference types (non-nullable references to implicitly constructible
>    objects) exhibit the runtime behavior of primitives;
>  - Some reference types exclude null, and some null-excluding reference
>    types are reifiable with a known-good non-null default;
>  - Every type can have a corresponding null-exclusion type.
>
> ## Planning for a null-free future (?)
>
> Users prefer working with unannotated types (e.g., `Foo`) rather than
> explicitly annotated types (`Foo!`, `Foo?`), where possible. The
> unannotated type `Foo` could mean one of three things: an alias for
> `Foo!`, an alias for `Foo?`, or a type of "raw" (unknown) nullity.
> Investigations into null-excluding type systems have shown that the better
> default would be to treat an unannotated name as indicating
> non-nullability, and use explicitly nullable types (`T?`) to indicate the
> presence of null, because returning or accepting null is generally the
> less common case. Of course, today `String` means "possibly nullable
> String" in Java, meaning that, yet again, we seem to have chosen the wrong
> default.
>
> Our friends in the `C#` community have explored the possibility of a
> "flippening". `C#` started with the Java defaults, and later provided a
> compiler mode to flip the default on a per-module basis, with checking (or
> pollution risk) at the boundary between modules with opposite defaults.
> This is an interesting experiment and we look forward to seeing how this
> plays out in the `C#` ecosystem.
>
> Alternatively, another possible approach for Java is to continue to treat
> the unadorned name as having "raw" or "unknown" nullity, encouraging users
> to annotate types with either `!` or `?`. This approach has been
> partially explored in the `JSpecify` project. Within this approach is a
> range of options for what the language will do with such types; there is a
> risk of flooding users with warnings. We may want to leave such analysis
> to extralinguistic type checkers, at least initially -- but we would like
> to not foreclose on the possibility of an eventual flippening.
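>
> For concreteness, a sketch of the annotation-based flavor of this approach
> (`NullMarked` and `Nullable` are real JSpecify annotations as of its
> recent releases; `User` and `UserStore` are hypothetical):
>
> ```
> import org.jspecify.annotations.NullMarked;
> import org.jspecify.annotations.Nullable;
>
> @NullMarked                          // unannotated types default to non-null
> interface UserStore {
>     @Nullable User find(String id);  // explicitly nullable, like a `?` type
>     User load(String id);            // non-null by default, like a `!` type
> }
> ```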

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From forax at univ-mlv.fr Sat Jun 3 18:12:34 2023
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Sat, 3 Jun 2023 20:12:34 +0200 (CEST)
Subject: Design document on nullability and value types
In-Reply-To: <137cc4cc-35b9-5222-ff73-f330ce151dbd@oracle.com>
References: <2075843456.71765771.1685772779404.JavaMail.zimbra@univ-eiffel.fr>
 <137cc4cc-35b9-5222-ff73-f330ce151dbd@oracle.com>
Message-ID: <1291419694.72086863.1685815954778.JavaMail.zimbra@univ-eiffel.fr>

> From: "Brian Goetz"
> To: "Remi Forax"
> Cc: "valhalla-spec-experts"
> Sent: Saturday, June 3, 2023 5:50:22 PM
> Subject: Re: Design document on nullability and value types

> You're not wrong; having only B3! is a much simpler feature, because (a)
> it is enforced by the VM and (b) we don't have the split of "sometimes you
> can believe the bang, sometimes you can't." (On the other hand, imagine
> the cries of bloody murder when we tell people they can't have any form of
> `String!`.)

> And yes, arrays are particularly challenging.

I've done several presentations saying that '!' will be, at least at first,
limited to B3; so far, no cry :)

And I really like what Kevin* suggested last Wednesday. Let's use JSpecify
annotations and JSpecify-compatible annotation processors to gather data
before committing to a precise semantics for non-B3 types.

> The document acknowledged this in that it said we might start with B3!
> only, but we also have to work out what our options are for extending to
> other reference types. (As it turns out, the anomalies are fairly close
> to, though not exactly the same as, heap pollution with erased generics --
> which is to say, pollution still sucks, but it's a pollution we're at
> least somewhat familiar with.)

There are 4 locations (I may be missing others?) where there is a choice to
make between enforcing nullability or not, and they come in variations
because nullability can be enforced at declaration site or at use site:
- method calls (arguments at the call site, parameters at the callee site)
- casts
- fields (field initialization, field read, field write)
- arrays (array initialization, array read, array write)

I like the idea that we can use JSpecify annotations and let the user choose
whether those annotations have teeth or not. Kevin talked about using a
bytecode rewriter (it can be inserted Java source code too) to enforce
nullability at runtime, and we can also try to implement nullable arrays in
the VM to help.
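For example, here is a sketch of the kind of check such a rewriter (or an
annotation processor) could inject at a boundary where a declared-non-null
value arrives from unannotated legacy code (`User` and `legacyLookup` are
hypothetical):

```
User load(String id) {
    User u = legacyLookup(id);  // unannotated legacy code may return null
    return java.util.Objects.requireNonNull(u, "null pollution in load()");
}
```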
Once people start to use JSpecify-compatible annotation processors, we will
get a good idea of what can be done and what is too expensive, not
backward-compatible enough, etc. At that point, we can commit to the exact
semantics for non-null identity types (and non-null B2).

So for our users, the message is: at first, "!" can only be used on B3, but
we have a migration plan, and you can help us by using JSpecify-compatible
annotation processors.

> I don't see where you get "erased bangs today foreclose on enforced bangs
> tomorrow" (and if that's true, we also have no path to generic
> specialization.)

I do not think that full generics reification is achievable; only a lesser
form, what we call "generic specialization" (which is not fully defined),
seems practical. As a specific example, I do not think that casting to a
parameterized type can be enforced at runtime; too much code would stop
working.

> I think it's a significant overstatement to say "syntax but not semantics"
> or "only to have a more uniform syntax"; the concepts and semantics are
> also the same, it is the runtime guarantees that are different under some
> circumstances. Your concern is valid, but don't overstate it.

> All of this is to say: your concerns are valid and we've been struggling
> with them, but I think saying "no String! ever" is also not a realistic
> position, so somewhere compromises will have to be made, and our job is to
> find the right set of compromises.

I don't know why you come to the conclusion that I say "no String! ever".
I'm saying: let's not settle the exact semantics of String! right now,
because we would choose the lowest common denominator, which perhaps allows
too much null pollution compared to what can be achieved in practice. I
would prefer us to use JSpecify as an experiment first, gather data, and
then use those data to choose the right semantics for String!

Rémi

* warning, warning: I may have understood what I wanted to understand.

> On 6/3/2023 2:12 AM, Remi Forax wrote:
>> Hi all,
>> I am not convinced that adding nullability annotations to types other
>> than the value types with a default value is a good move; it seems to
>> shut the door to potential futures where non-nullness is more strongly
>> enforced by the VM.
>> If the goal of Valhalla is to introduce value types, I think we are
>> overextending our reach here, making decisions for Java we may regret
>> later.
>> I understand the appeal of providing a unified view of nullability, but
>> sadly, as explained in this document, the unification will only be true
>> in terms of syntax, not in terms of semantics, with only the value types
>> with a default value being reified and enforced by the VM.
>> As an example, John's Array 2.0 proposal is still on the table and
>> proposes to add non-null arrays at runtime, so choosing to erase the
>> nullability annotations now seems a bad move if such arrays are added to
>> the Java platform in the future.
>> Obviously, we want to allow migration between identity types and value
>> types (or from a value type without a default value to a value type with
>> a default value), so we have to specify, both at compile time and at
>> runtime, semantics that allow that. But I see erasing nullability
>> annotations for identity types as too easy a shortcut.
>> I would prefer to live in a world where '!' is only available, at
>> compile time, on value types with a default value, with field and array
>> creation being enforced at runtime only if the class is a value class
>> with a default value.
>> A world where adding an implicit constructor is a source-backward-compatible
>> move but the opposite is not, and where the VM ignores nullability
>> attributes at runtime if the class is not actually a value class with a
>> default value, so that moving from a value class with a default value to
>> a value class without one is a binary-compatible move.
>> With the model above, we only have null pollution because of separate
>> compilation; in particular, we keep the property that when unboxing
>> (i.e., the transition from T to T!), null checking is done by the VM, so
>> there is no null pollution.
>> Allowing more erasure only to have a more uniform syntax is not appealing
>> to me, and seems worse if seen from the future.
>> regards,
>> Rémi
>>> The story is significantly
>>> simpler and more unified than we had previously:
>>>
>>> - Everything, including the legacy primitives, is an object (an instance of
>>>   some class);
>>> - Every type, including the legacy primitives, is derived from a class;
>>> - All types are reference types (they refer to objects), but some reference
>>>   types (non-nullable references to implicitly constructible objects) exhibit
>>>   the runtime behavior of primitives;
>>> - Some reference types exclude null, and some null-excluding reference types
>>>   are reifiable with a known-good non-null default;
>>> - Every type can have a corresponding null-exclusion type.
>>>
>>> ## Planning for a null-free future (?)
>>>
>>> Users prefer working with unannotated types (e.g., `Foo`) rather than explicitly
>>> annotated types (`Foo!`, `Foo?`), where possible. The unannotated type `Foo`
>>> could mean one of three things: an alias for `Foo!`, an alias for `Foo?`, or a
>>> type of "raw" (unknown) nullity. Investigations into null-excluding type
>>> systems have shown that the better default would be to treat an unannotated name
>>> as indicating non-nullability, and use explicitly nullable types (`T?`) to
>>> indicate the presence of null, because returning or accepting null is generally
>>> a less common case. Of course, today `String` means "possibly nullable String"
>>> in Java, meaning that, yet again, we seem to have chosen the wrong default.
>>>
>>> Our friends in the `C#` community have explored the possibility of a
>>> "flippening". `C#` started with the Java defaults, and later provided a
>>> compiler mode to flip the default on a per-module basis, with checking (or
>>> pollution risk) at the boundary between modules with opposite defaults. This is
>>> an interesting experiment and we look forward to seeing how this plays out in
>>> the `C#` ecosystem.
>>>
>>> Alternatively, another possible approach for Java is to continue to treat the
>>> unadorned name as having "raw" or "unknown" nullity, encouraging users to
>>> annotate types with either `!` or `?`. This approach has been partially
>>> explored in the `JSpecify` project. Within this approach is a range of options
>>> for what the language will do with such types; there is a risk of flooding users
>>> with warnings. We may want to leave such analysis to extralinguistic type
>>> checkers, at least initially -- but we would like to not foreclose on the
>>> possibility of an eventual flippening.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From john.r.rose at oracle.com Tue Jun 6 01:52:16 2023
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 05 Jun 2023 18:52:16 -0700
Subject: Preload attribute
In-Reply-To: <80926DBE-E25B-4622-BA74-2D78710C5959@oracle.com>
References: <80926DBE-E25B-4622-BA74-2D78710C5959@oracle.com>
Message-ID: <1BCCC1A3-4B9A-4426-89AE-E80702FEAA40@oracle.com>

Overall, as we look at Preload, it is looking more and more like a no-op, as far as the JLS and JVMS is concerned. Perhaps that is a signal that it should be placed somewhere outside of the JVMS, such as in a Leyden-specific mechanism (a preload list) produced in a way decoupled from any transactions between the JLS and JVMS.

On 1 Jun 2023, at 11:24, Dan Smith wrote:

>> On Jun 1, 2023, at 10:53 AM, Dan Heidinga wrote:
>>
>> A couple of questions about the spec for the Preload attribute[0]. The current spec says it indicates "certain classes contain information that may be of interest during linkage."
>>
>> The Preload attribute removes one need for Q modifiers while allowing calling convention optimizations and layout decisions to be made early.
>>
>> The current spec is quite vague on what classes should be included in the attribute and on when / what the VM will do with those classes (or even if it does anything).
>
> FWIW, the JEP has more detail about when javac is expected to include classes in Preload.
>
>> I think it's time to tighten up the spec for Preload attribute and specify:
>> * what the VM will do with classes listed in the attribute
>
> It is intentional that the VM may choose to do nothing. So anything it does is purely an optimization.

Looks like a nop?

(In particular, it should not *reject* any inputs, because that would destabilize separate compilability in unpredictable ways.)

>> * when those classes will be loaded (ie: somewhere in JVMS 5.3)
>
> If the VM chooses to load Preload classes, then our thinking was that JVMS 5.4 already describes the details of timing:
>
> https://docs.oracle.com/javase/specs/jvms/se20/html/jvms-5.html#jvms-5.4
>
> So, for example, "Alternatively, an implementation may choose an "eager" linkage strategy, where all symbolic references are resolved at once when the class or interface is being verified." That is, the Preload classes could all be loaded during verification, or at some other stage of linking.
>
> My expectation is that the natural point for processing Preload is during preparation as vtables are set up, but sometimes I get these things wrong. :-)

Sometimes you want them early (for instance layout) and sometimes you need to wait (vtable layout or even just before `<clinit>`).

>> * how invalid cases are handled, including circularities (Class A's Preload mentions B <: A)

Silent suppression of errors, if any. Again, like a nop?

> "Errors detected during linkage are thrown at a point in the program where some action is taken by the program that might, directly or indirectly, require linkage to the class or interface involved in the error."
>
> I've always found this rule super vague, but I think "require" is the key word, and implies that errors caused by Preload resolution should just be ignored. (Because Preload isn't "required" to be processed at all.)
>
>> * what types of classes can be listed (any? only values?)
>
> Definitely intend to support any classes of interest. Say a future optimization wants to know about a sealed superinterface, for example -- it would be fine to tweak javac to add that interface to Preload, and then use the information to facilitate the optimization.
>
> There's a lot of nondeterminism here -- can a compliant system trigger changes to class loading timing, but just on my birthday? -- but I think it's within the scope of JVMS 5.4, which provides a lot of latitude for loading classes whenever it's convenient.
>
>> It probably makes sense to start from the current Hotspot handling of the attribute and fine tune that into the spec?
>
> So I've outlined our hands-off stake in the ground above. The spec would definitely benefit, at least, from a non-normative cross-reference to 5.4 and short explanation. Beyond that, I think we'd be open to specifying more if we can agree something more is needed...

I think we could get the benefits in Fred's prototype (as he describes) with a list that is decoupled from any particular class file, and Leyden could deliver this list.

As you see, I'm kind of sour on a Preload attribute these days.
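(For readers following along: the attribute being debated is tiny. The sketch below follows the same pattern as `NestMembers` and is a reconstruction from the JEP 401 drafts, not a quotation from the spec -- treat the exact layout as an assumption:)

```
Preload_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 number_of_classes;
    u2 classes[number_of_classes];   // CONSTANT_Class_info entries the JVM may load early
}
```

The entire contract under discussion above is what, if anything, the JVM promises to do with those `classes` entries.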
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From john.r.rose at oracle.com Wed Jun 7 02:29:07 2023
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 06 Jun 2023 19:29:07 -0700
Subject: implicit constructor translation?
In-Reply-To: <7B20DA4F-26A8-4F8B-972C-FA4718434044@oracle.com>
References: <384e2229-19a5-1ce0-8938-5fea675bafa5@oracle.com> <7B20DA4F-26A8-4F8B-972C-FA4718434044@oracle.com>
Message-ID: 

On 1 Jun 2023, at 11:30, Dan Smith wrote:

>> On Jun 1, 2023, at 10:59 AM, Brian Goetz wrote:
>>
>> I think that there should be an explicit (and not synthetic) method_info for the implicit constructor, with the obvious Code attribute (`defaultvalue` / `areturn`).
>>
>> My rationale is: "the declaration said there's a no-arg constructor, I should be able to call it". And that intuition is entirely reasonable. Users should be able to say `new Complex()` and get a default complex value. (Maybe they can also say `Complex.default`; maybe we won't need that.) And the same for reflection.

I like this. A non-Java language might omit such a method from its classfile, or even give it a different body. (A non-Java language might use such a body for some language-specific purpose. That's up to the translation strategy of that other language.) For Java, the constructor will either not exist or will always have the given two-instruction body. It will never have a body of explicitly user-written statements.

>> Also, in case it is not obvious, the following class is illegal:
>>
>> value class X {
>>     implicit X();
>>     X() { ... }
>> }
>>
>> because it is trying to declare the same constructor twice. An implicit constructor is a constructor, just one for which the compiler can deduce specific known semantics.

That all hangs together well. It allows `new C()` to be a valid expression with a regular translation (as a call to `<vnew>`). We could "optimize" `new C()` to `defaultvalue`, but why bother? If we keep it as `invokestatic <vnew>` then JVMTI has something to work with; you can put breakpoints on the implicit constructor. (No, those breakpoints won't fire for implicit construction, such as in a flat field or array element. But they will fire for explicit `new C()`.)

> Agree with all of this. Some of these details were initially unclear a few weeks ago, but I think we've settled on a design in which the implicit constructor is a "real", invokable constructor, in addition to signaling some metadata about the class.
>
>> On 6/1/2023 10:47 AM, Dan Heidinga wrote:
>>> Alas representing implicit constructors with a `method_info` is not without costs: primarily specing how the method_info exists and explaining why it doesn't have a code attribute.
>
> There would be nothing special about the method. It only exists as a code path for explicit constructor invocations. The "I have a default instance" metadata comes from ImplicitCreation.

Yes. Except, I'd like to note a subtle issue here: All value classes "have" default values, because the `aconst_default` bytecode works for all value classes. (Otherwise, there would be no primordial value to start from in `<vnew>` methods.) I'd prefer to say this, probably not for the JLS, but for the JVMS: For some value types, the default values are private and for some they are public. (There are no intermediate access levels; we tried that and it was horrible.) What the JLS calls a missing default value the JVMS calls a private default value. This is necessary whenever we write about translation strategies.
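(To make the agreed shape concrete, here is a minimal sketch using the draft `implicit` syntax and assuming the `<vnew>` static-factory translation discussed above; it is illustrative only and compiles only against a Valhalla prototype build:)

```java
// Sketch only -- draft Valhalla syntax, not final. The implicit constructor
// gets a real method_info whose body is just defaultvalue/areturn, so both
// construction paths below link like ordinary constructor calls.
value class Complex {
    private int re;
    private int im;

    public implicit Complex();            // translated body: defaultvalue; areturn

    public Complex(int re, int im) {      // explicit constructor, ordinary body
        this.re = re;
        this.im = im;
    }
}

class Use {
    Complex zero = new Complex();         // assumed: invokestatic Complex.<vnew>()
    Complex one  = new Complex(1, 0);     // assumed: invokestatic Complex.<vnew>(II)
}
```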
A private default value (invisible in the JLS) can have a rather rich career in the bytecodes of the JVMS. I think I'm on record as saying that both `aconst_default` and `withfield` should be private to the nest of the declared value class they operate within. Their use will initially be confined to constructors generated statically by a Java translation strategy. But in the future, with "with" statements or serialization frameworks, it will be useful to support those instructions at other places in the nest, including in dynamically injected nestmates. The JVM already knows how to make privacy checks; we should have the resolution of those instructions include the same privacy check, rather than some new ad hoc check.

> (Do we expect reflection to reconstruct ImplicitCreation+no-arg --> 'Constructor.isImplicit' or 'ACC_IMPLICIT'? Maybe. Not entirely sure yet what the reflection surface will look like.)

Perhaps ACC_MANDATED applies here, to the implicit constructor? I guess not; but it has some of the feel of a mandated parameter, which is not present in the code but visible reflectively. Then again, an implicit constructor *IS* visible in the code, though abbreviated. Kind of like a canonical constructor of a record is visible, though abbreviated.

In any case, since Java won't allow nullary ctors of value types to have arbitrary code, it's a safe bet that if you reflect a value class and see a nullary ctor, it's implicit. What else could it be?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From john.r.rose at oracle.com Wed Jun 7 02:46:19 2023
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 06 Jun 2023 19:46:19 -0700
Subject: implicit constructor translation?
In-Reply-To: 
References: <384e2229-19a5-1ce0-8938-5fea675bafa5@oracle.com>
Message-ID: <168769A6-FC8D-412C-ABB4-397996C16A54@oracle.com>

On 1 Jun 2023, at 12:34, Brian Goetz wrote:

> On 6/1/2023 2:32 PM, Dan Heidinga wrote:
>>
>> On Thu, Jun 1, 2023 at 1:59 PM Brian Goetz wrote:
>>
>> I think that there should be an explicit (and not synthetic) method_info for the implicit constructor, with the obvious Code attribute (`defaultvalue` / `areturn`).
>>
>> I'm slightly concerned about having a Code attribute for the implicit constructor as it allows agents (ClassFile load hook & redefinition) to modify the bytecodes to be inconsistent with the VM's behaviour given the VM won't actually call the implicit constructor.
>>
>> Telling users it's "as if" the VM called the implicit ctor and then having the reflective behaviour be different after retransformation is slightly uncomfortable.
>
> I'm fine if the expansion of the constructor body happens at runtime rather than compile time; my suggestion was mostly "this is super-easy in the static compiler, why make more work for the JVM." But, if you're signing up for that work, I won't stop you....

I haven't signed up the JVM for that work! It seems like busy-work to me. JVM code which implements some busy-work requirement costs money for no benefit, and its (inevitable) bug tail risks increased attack surfaces.

I'm assuming (a) the `aconst_default` bytecode is not API (it's private) and therefore that (b) any materialization of a default value will go through either the class's API (real `new C()`) or else via some reflective API point (`C.class.defaultValue()`).
Maybe something very simple like:

```
C defaultValue() {
    // if the next line throws an exception, it goes to the client
    var a1 = Array.newInstance(asNonNullable(), 1);
    // at this point we know C has a default value
    return (C) Array.get(a1, 0);
    // OR: return this.newInstance();
}
```

Regarding the problem of agents, I'd say either we don't care what a buggy agent does, or else we might add an implementation restriction that refuses to allow a buggy agent to load an implicit constructor with a bad body. Agents can modify all sorts of JDK internals. (Also non-javac-spun classfiles explore all sorts of odd states.) We can't play whack-a-mole trying to prevent all kinds of agent misbehavior. That would just end up creating an endless parade of costs and bugs. And I don't want to add a verification rule; the costs of tweaking the verifier far outweigh the benefits. Let's give some basic trust to our classfile generators and agents.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From john.r.rose at oracle.com Wed Jun 7 02:57:14 2023
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 06 Jun 2023 19:57:14 -0700
Subject: implicit constructor translation?
In-Reply-To: 
References: <384e2229-19a5-1ce0-8938-5fea675bafa5@oracle.com>
Message-ID: 

On 1 Jun 2023, at 11:49, Dan Smith wrote:

> The problem here is we have a language/VM model mismatch.

It is also an opportunity. The JVMS does not (and should not) exactly track the JLS. The JLS "erases" Java code to classfiles, and those classfiles rely on the JVMS, not the JLS. There are many JLS-level invariants and constructs which are (and should be) invisible to the JVMS. The rules for default-value processing are (and should be) different in the JLS and JVMS.

> In the VM model: `<vnew>` is a factory method that can do whatever it wants, and be included or not included. All that matters for default values is ImplicitCreation.

Yes, and `aconst_default` is yet another VM operation distinct from `<vnew>` (because the latter builds on top of the former). But the JLS should not (I think) expose the distinction. The translation strategy can use either primitive as appropriate to materialize defaults.

My suggestion is to translate `new C()` as a real `invokestatic`, but consider using something else (reflective `condy` or `aconst_default` inside the nest) for other materializations of the default, if any.

> In the language model: the implicit constructor allows both 'Foo.default' and 'new Foo()', both of which produce the same value.

(I partially agree that the syntax `C.default` might possibly have outlived its usefulness. Why not just `new C()`? Maybe `C.default` is a switch constant expression?)

> Yes, it's possible to generate bytecode that doesn't conform to the language model, that's always the risk of designing a language/VM mismatch. But this feels to me like more of the same in the constructor space -- e.g., we've already got VM-level abstract classes that can have `<init>` methods that won't run for value class instance creation, but will run for identity class instance creation.

This is what happens with two specs, the JLS and JVMS. The JVMS does not (and must not) undertake to enforce every JLS invariant.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
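(An aside for readers: the array-based trick in the sketch above can be approximated on today's JDK with plain core reflection. `asNonNullable()` in the sketch is an assumed API; everything below is ordinary `java.lang.reflect.Array` and runs today, where it simply yields null for reference types and zero for primitives -- under Valhalla it would yield a B3 class's public default instance:)

```java
import java.lang.reflect.Array;

// Minimal probe for "the default element value of a freshly created array".
class DefaultProbe {
    @SuppressWarnings("unchecked")
    static <T> T defaultOf(Class<T> type) {
        // a fresh 1-element array exposes the component type's default value
        return (T) Array.get(Array.newInstance(type, 1), 0);
    }

    public static void main(String[] args) {
        System.out.println(defaultOf(int.class));     // 0 (boxed by Array.get)
        System.out.println(defaultOf(String.class));  // null today
    }
}
```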
From heidinga at redhat.com Thu Jun 8 16:01:05 2023
From: heidinga at redhat.com (Dan Heidinga)
Date: Thu, 8 Jun 2023 12:01:05 -0400
Subject: Preload attribute
In-Reply-To: <1BCCC1A3-4B9A-4426-89AE-E80702FEAA40@oracle.com>
References: <80926DBE-E25B-4622-BA74-2D78710C5959@oracle.com> <1BCCC1A3-4B9A-4426-89AE-E80702FEAA40@oracle.com>
Message-ID: 

Thanks Dan and Fred for the clarification on the current spec and Hotspot behaviour and John for suggesting we remove the attribute.

I think the problem with the current status quo for the preload attribute is that we're trying to treat it like an optimization - a "we can have our cake" (know which classes to load at some indeterminate point) and "eat it too" (not promise users anything about when or if those classes will be loaded).

As it stands, based just on the spec, it's hard for users to know what the preload attribute will do for them. There's nothing they can depend on in the spec, and they are at the mercy of whatever behaviour VM implementers pick on a given day. Which means they will depend on the current Hotspot behaviour (de facto standardization - yuck!) or expect the behaviour to change all the time.

If we decouple the list of preloadable classes from the classfile, how would non-jdk classes be handled? Many applications are a mix of multiple jar files from different maintenance domains which would make having a combined list difficult. We'd also need to think through the classloader implications vs today's approach which attempts to load on the same loader as the class with the preload attribute.

What if instead of ditching the attribute, or treating it like an optimization, we firmed up the contract and treated it as a guarantee similar to those of superclasses and superinterfaces as in the first line in section 5.4 "Linking"? The new text would read something like:

> Linking a class or interface involves verifying and preparing that class or interface, its direct superclass, its direct superinterfaces, and its element type (if it is an array type), if necessary. Any classes listed in the preload attribute will be loaded at this point.

We don't need to say in the JVMS why the classes are in the attribute, but we should be explicit about where the attempt to load them occurs and that errors are ignored. This provides stability to the users and allows for independent implementation while avoiding de facto standardization of today's behaviour.

Before responding, I built the latest lworld branch of the Valhalla repo and played with the preload attribute to try and force a classloader-related deadlock. I was surprised to find that the attribute doesn't seem to have an effect on user-defined loaders and with the current spec, I can't even say if that's a bug or not. Example classfiles for this exploration are in https://github.com/DanHeidinga/valhalla-preload-example

--Dan

On Mon, Jun 5, 2023 at 9:52 PM John Rose wrote:

> Overall, as we look at Preload, it is looking more and more like a no-op, as far as the JLS and JVMS is concerned. Perhaps that is a signal that it should be placed somewhere outside of the JVMS, such as in a Leyden-specific mechanism (a preload list) produced in a way decoupled from any transactions between the JLS and JVMS.
>
> On 1 Jun 2023, at 11:24, Dan Smith wrote:
>
>> On Jun 1, 2023, at 10:53 AM, Dan Heidinga wrote:
>>
>> A couple of questions about the spec for the Preload attribute[0]. The current spec says it indicates "certain classes contain information that may be of interest during linkage."
> > The Preload attribute removes one need for Q modifiers while allowing > calling convention optimizations and layout decisions to be made early. > > The current spec is quite vague on what classes should be included in the > attribute and on when / what the VM will do with those classes (or even if > it does anything). > > FWIW, the JEP has more detail about when javac is expected to include > classes in Preload. > > I think it's time to tighten up the spec for Preload attribute and > specify: > * what the VM will do with classes listed in the attribute > > It is intentional that the VM may choose to do nothing. So anything it > does is purely an optimization. > > Looks like a nop? > > (In particular, it should not *reject* any inputs, because that would > destabilize separate compilability in unpredictable ways.) > > * when those classes will be loaded (ie: somewhere in JVMS 5.3) > > If the VM chooses to load Preload classes, then our thinking was that JVMS > 5.4 already describes the details of timing: > > https://docs.oracle.com/javase/specs/jvms/se20/html/jvms-5.html#jvms-5.4 > > So, for example, "Alternatively, an implementation may choose an "eager" > linkage strategy, where all symbolic references are resolved at once when > the class or interface is being verified." That is, the Preload classes > could all be loaded during verification, or at some other stage of linking. > > My expectation is that the natural point for processing Preload is during > preparation as vtables are set up, but sometimes I get these things wrong. > :-) > > Sometimes you want them early (for instance layout) and sometimes you need > to wait (vtable layout or even just before ). > > * how invalid cases are handled, including circularities (Class A's > Preload mentions B <: A) > > Silent supression of errors, if any. Again, like a nop? > > "Errors detected during linkage are thrown at a point in the program where > some action is taken by the program that might, directly or indirectly, > require linkage to the class or interface involved in the error." > > I've always found this rule super vague, but I think "require" is the key > word, and implies that errors caused by Preload resolution should just be > ignored. (Because Preload isn't "required" to be processed at all.) > > * what types of classes can be listed (any? only values?) > > Definitely intend to support any classes of interest. Say a future > optimization wants to know about a sealed superinterface, for example?it > would be fine to tweak javac to add that interface to Preload, and then use > the information to facilitate the optimization. > > There's a lot of nondeterminism here?can a compliant system trigger > changes to class loading timing, but just on my birthday??but I think it's > within the scope of JVMS 5.4, which provides a lot of latitude for loading > classes whenever it's convenient. > > It probably makes sense to start from the current Hotspot handling of the > attribute and fine tune that into the spec? > > So I've outlined our hands-off stake in the ground above. The spec would > definitely benefit, at least, from a non-normative cross-reference to 5.4 > and short explanation. Beyond that, I think we'd be open to specifying more > if we can agree something more is needed... > > I think we could get the benefits in Fred?s prototype (as he describes) > with a list that is decoupled from any particular class file, and Leyden > could deliver this list. > > As you see, I?m kind of sour on a Preload attribute these days. 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From john.r.rose at oracle.com Thu Jun 8 16:43:26 2023
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 08 Jun 2023 09:43:26 -0700
Subject: Preload attribute
In-Reply-To: 
References: <80926DBE-E25B-4622-BA74-2D78710C5959@oracle.com> <1BCCC1A3-4B9A-4426-89AE-E80702FEAA40@oracle.com>
Message-ID: <42615AC3-23F2-4DE2-B5CC-762981A31194@oracle.com>

On 8 Jun 2023, at 9:01, Dan Heidinga wrote:

> If we decouple the list of preloadable classes from the classfile, how would non-jdk classes be handled?
> What if instead of ditching the attribute, or treating it like an optimization, we firmed up the contract and treated it as a guarantee?

If we go down this route, let's consider putting the control information into a module file (only) for starters. (Maybe class file later if needed.) There would be fewer states to document and test, since (by definition) class files could not get out of sync.

A module would document, in one place, which types it would "prefer" to preload in order to optimize its APIs (internal or external).

From heidinga at redhat.com Thu Jun 8 16:52:35 2023
From: heidinga at redhat.com (Dan Heidinga)
Date: Thu, 8 Jun 2023 12:52:35 -0400
Subject: Preload attribute
In-Reply-To: <42615AC3-23F2-4DE2-B5CC-762981A31194@oracle.com>
References: <80926DBE-E25B-4622-BA74-2D78710C5959@oracle.com> <1BCCC1A3-4B9A-4426-89AE-E80702FEAA40@oracle.com> <42615AC3-23F2-4DE2-B5CC-762981A31194@oracle.com>
Message-ID: 

On Thu, Jun 8, 2023 at 12:44 PM John Rose wrote:

> On 8 Jun 2023, at 9:01, Dan Heidinga wrote:
>
> > If we decouple the list of preloadable classes from the classfile, how would non-jdk classes be handled? What if instead of ditching the attribute, or treating it like an optimization, we firmed up the contract and treated it as a guarantee?
>
> If we go down this route, let's consider putting the control information into a module file (only) for starters. (Maybe class file later if needed.) There would be fewer states to document and test, since (by definition) class files could not get out of sync.
>
> A module would document, in one place, which types it would "prefer" to preload in order to optimize its APIs (internal or external).
>
This might lead to more class loading than intended. The current approach has each classfile register the list of classes it wants preloaded to get the best linkage which means we only have to load those classes if we link the original class. There's a natural trigger for the preload and a limited set of classes to load.

Moving to a single per-module list loses the natural trigger and may pre-load more classes than the application will use. If Module A has classes {A, B, C} and each one preloads 5 separate classes, with a per-module list that's forcing the loading of 15 additional classes (plus supers, etc). With a per-class list, we only preload the classes on a per-use basis. More of a pay for what you use model.

Is there a natural trigger or way to limit the preloads to what I might use with the per-module file?

--Dan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
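(To ground the per-class model Dan describes: under the draft scheme, javac lists in a class's own Preload attribute the value classes that appear in its field and method descriptors. The sketch below is illustrative only -- `Complex` and its `re()`/`im()` accessors are assumed declarations, and the draft syntax compiles only on a Valhalla prototype:)

```java
// Assumed declaration (draft Valhalla syntax; illustrative only):
value class Complex {
    private int re, im;
    public implicit Complex();
    public Complex(int re, int im) { this.re = re; this.im = im; }
    public int re() { return re; }
    public int im() { return im; }
}

// Because Client's descriptors mention Complex, javac would (under the
// draft scheme) emit a Preload attribute naming Complex in Client.class,
// so the JVM can load Complex.class early enough to flatten the field
// and scalarize the calling convention.
class Client {
    Complex last;                        // field descriptor: LComplex;

    Complex scale(Complex c, int k) {    // descriptor: (LComplex;I)LComplex;
        return new Complex(k * c.re(), k * c.im());
    }
}
```

This is the "natural trigger" in Dan's sense: the entries are only ever consulted when Client itself links.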
From frederic.parain at oracle.com Thu Jun 8 17:14:58 2023
From: frederic.parain at oracle.com (Frederic Parain)
Date: Thu, 8 Jun 2023 13:14:58 -0400
Subject: Preload attribute
In-Reply-To: 
References: <80926DBE-E25B-4622-BA74-2D78710C5959@oracle.com> <1BCCC1A3-4B9A-4426-89AE-E80702FEAA40@oracle.com>
Message-ID: <0cef3d78-55fc-668f-0c64-4f07ccbf86e9@oracle.com>

Hi Dan,

I've looked at your exploration of the Preload attribute. The LoaderTest test loads class A but it doesn't link it. HotSpot looks at the PreLoad attribute at class link time, not class load time (yes, the name is misleading), this is why class B is not loaded.

Replacing the call to loadClass() with a call to Class.forName() to force the linking of the class will trigger the processing of the PreLoad attribute.

public static void main(String[] args) throws Throwable {
    ARGS = args;

    ClassLoader cl = new LoaderTest2();
    Class c = Class.forName(args[0], true, cl);
    System.out.println(""+c+"::loader="+c.getClassLoader());
}

Fred

On 6/8/23 12:01 PM, Dan Heidinga wrote:
> Thanks Dan and Fred for the clarification on the current spec and
> Hotspot behaviour and John for suggesting we remove the attribute.
>
> I think the problem with the current status quo for the preload
> attribute is that we're trying to treat it like an optimization - a
> "we can have our cake" (know which classes to load at some
> indeterminate point) and "eat it too" (not promise users anything
> about when or if those classes will be loaded).
>
> As it stands, based just on the spec, it's hard for users to know what
> the preload attribute will do for them. There's nothing they can
> depend on in the spec and are at the mercy of whatever behaviour VM
> implementers pick on a given day. Which means they will depend on the
> current Hotspot behaviour (de facto standardization - yuck!) or expect
> the behaviour to change all the time.
>
> If we decouple the list of preloadable classes from the classfile, how
> would non-jdk classes be handled? Many applications are a mix of
> multiple jar files from different maintenance domains which would make
> having a combined list difficult. We'd also need to think through the
> classloader implications vs today's approach which attempts to load on
> the same loader as the class with the preload attribute.
>
> What if instead of ditching the attribute, or treating it like an
> optimization, we firmed up the contract and treated it as a guarantee
> similar to those of superclasses and superinterfaces as in the first
> line in section 5.4 "Linking"? The new text would read something like:
> > Linking a class or interface involves verifying and preparing
> that class or interface, its direct superclass, its direct
> superinterfaces, and its element type (if it is an array type), if
> necessary. Any classes listed in the preload attribute will be loaded
> at this point.
>
> We don't need to say in the JVMS why the classes are in the attribute,
> but we should be explicit about where the attempt to load them occurs
> and that errors are ignored. This provides stability to the users and
> allows for independent implementation while avoiding de facto
> standardization of today's behaviour.
>
> Before responding, I built the latest lworld branch of the Valhalla
> repo and played with the preload attribute to try and force a
> classloader-related deadlock.
I was surprised to find that the > attribute doesn't seem to have an effect on user-defined loaders and > with the current spec, I can't even say if that's a bug or not.? > Example classfiles for this exploration are in > https://github.com/DanHeidinga/valhalla-preload-example > > --Dan > > On Mon, Jun 5, 2023 at 9:52?PM John Rose wrote: > > Overall, as we look at Preload, it is looking more and more like a > no-op, as far as the JLS and JVMS is concerned. Perhaps that is a > signal that it should be placed somewhere outside of the JVMS, > such as in a Leyden-specific mechanism (a preload list) produced > in a way decoupled from any transactions between the JLS and JVMS. > > On 1 Jun 2023, at 11:24, Dan Smith wrote: > > On Jun 1, 2023, at 10:53 AM, Dan Heidinga > wrote: > > A couple of questions about the spec for the Preload > attribute[0]. The current spec says it indicates "certain > classes contain information that may be of interest during > linkage." > > The Preload attribute removes one need for Q modifiers > while allowing calling convention optimizations and layout > decisions to be made early. > > The current spec is quite vague on what classes should be > included in the attribute and on when / what the VM will > do with those classes (or even if it does anything). > > FWIW, the JEP has more detail about when javac is expected to > include classes in Preload. > > I think it's time to tighten up the spec for Preload > attribute and specify: > * what the VM will do with classes listed in the attribute > > It is intentional that the VM may choose to do nothing. So > anything it does is purely an optimization. > > Looks like a nop? > > (In particular, it should not /reject/ any inputs, because that > would destabilize separate compilability in unpredictable ways.) > > * when those classes will be loaded (ie: somewhere in JVMS > 5.3) > > If the VM chooses to load Preload classes, then our thinking > was that JVMS 5.4 already describes the details of timing: > > https://docs.oracle.com/javase/specs/jvms/se20/html/jvms-5.html#jvms-5.4 > > So, for example, "Alternatively, an implementation may choose > an "eager" linkage strategy, where all symbolic references are > resolved at once when the class or interface is being > verified." That is, the Preload classes could all be loaded > during verification, or at some other stage of linking. > > My expectation is that the natural point for processing > Preload is during preparation as vtables are set up, but > sometimes I get these things wrong. :-) > > Sometimes you want them early (for instance layout) and sometimes > you need to wait (vtable layout or even just before ). > > * how invalid cases are handled, including circularities > (Class A's Preload mentions B <: A) > > Silent supression of errors, if any. Again, like a nop? > > "Errors detected during linkage are thrown at a point in the > program where some action is taken by the program that might, > directly or indirectly, require linkage to the class or > interface involved in the error." > > I've always found this rule super vague, but I think "require" > is the key word, and implies that errors caused by Preload > resolution should just be ignored. (Because Preload isn't > "required" to be processed at all.) > > * what types of classes can be listed (any? only values?) > > Definitely intend to support any classes of interest. 
Say a > future optimization wants to know about a sealed > superinterface, for example?it would be fine to tweak javac to > add that interface to Preload, and then use the information to > facilitate the optimization. > > There's a lot of nondeterminism here?can a compliant system > trigger changes to class loading timing, but just on my > birthday??but I think it's within the scope of JVMS 5.4, which > provides a lot of latitude for loading classes whenever it's > convenient. > > It probably makes sense to start from the current Hotspot > handling of the attribute and fine tune that into the spec? > > So I've outlined our hands-off stake in the ground above. The > spec would definitely benefit, at least, from a non-normative > cross-reference to 5.4 and short explanation. Beyond that, I > think we'd be open to specifying more if we can agree > something more is needed... > > I think we could get the benefits in Fred?s prototype (as he > describes) with a list that is decoupled from any particular class > file, and Leyden could deliver this list. > > As you see, I?m kind of sour on a Preload attribute these days. > From john.r.rose at oracle.com Thu Jun 8 20:51:26 2023 From: john.r.rose at oracle.com (John Rose) Date: Thu, 08 Jun 2023 13:51:26 -0700 Subject: Preload attribute In-Reply-To: References: <80926DBE-E25B-4622-BA74-2D78710C5959@oracle.com> <1BCCC1A3-4B9A-4426-89AE-E80702FEAA40@oracle.com> <42615AC3-23F2-4DE2-B5CC-762981A31194@oracle.com> Message-ID: On 8 Jun 2023, at 9:52, Dan Heidinga wrote: > On Thu, Jun 8, 2023 at 12:44?PM John Rose > wrote: > >> On 8 Jun 2023, at 9:01, Dan Heidinga wrote: >> >>> If we decouple the list of preloadable classes from the classfile, >>> how >>> would non-jdk classes be handled?> What if instead of ditching the >> attribute, or treating it like an >>> optimization, we firmed up the contract and treated it as a >>> guarantee? >> >> If we go down this route, let?s consider putting the control >> information >> into a module file (only) for starters. (Maybe class file later if >> needed.) There would be fewer states to document and test, since (by >> definition) class files could not get out of sync. >> >> A module would document, in one mplace, which types it would >> ?prefer? to >> preload in order to optimize its APIs (internal or external). >> > > This might lead to more class loading than intended. The current > approach > has each classfile register the list of classes it wants preloaded to > get > the best linkage which means we only have to load those classes if we > link > the original class. There's a natural trigger for the preload and a > limited set of classes to load. There?s a spectrum of tradeoffs here: We could put preload attributes on every method and field, to get the maximum amount of fine-grained lazy (pre-)loading, or put them in a global file per JVM instance. The more fine-grained, the harder it will be to write compliance testing, I think. > Moving to a single per-module list loses the natural trigger and may > pre-load more classes than the application will use. If Module A has > classes {A, B, C} and each one preloads 5 separate classes, with a > per-module list that's forcing the loading of 15 additional classes > (plus > supers, etc). With a per-class list, we only preload the classes on a > per-use basis. More of a pay for what you use model. > > Is there a natural trigger or way to limit the preloads to what I > might use > with the per-module file? That?s a very good question. 
I think what Preload *really is* is a list of "names that may require special handling before using in APIs". They don't need to be loaded when the preload attribute is parsed; they are simply put in a "watch list" to trigger additional loading *when necessary*. (This is already true.) So I think if we move the preload list to (say) the module level (if not a global file), then the JVM will have its watch list. (And, in fewer chunks than if we put all the stuff all the time redundantly in all class files that might need them: That requires frequent repetition.) The JVM can use its watch list as it does today, with watch lists populated separately for each class file.

To emphasize: A watch list does not require loading. It means, "if you see this name at a point where you could use extra class info, then I encourage you to load sooner rather than later". The only reason it is "a thing" at all is that the default behavior (of loading either as late as possible, or as part of a CDS-like thingy) should be changed only on an explicit signal.

And, hey, maybe CDS is all the primitive we need here: Just run -Xdump with all of your class path loaded. Et voila, no Preload at all.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From heidinga at redhat.com Fri Jun 9 15:32:30 2023
From: heidinga at redhat.com (Dan Heidinga)
Date: Fri, 9 Jun 2023 11:32:30 -0400
Subject: Preload attribute
In-Reply-To: <0cef3d78-55fc-668f-0c64-4f07ccbf86e9@oracle.com>
References: <80926DBE-E25B-4622-BA74-2D78710C5959@oracle.com> <1BCCC1A3-4B9A-4426-89AE-E80702FEAA40@oracle.com> <0cef3d78-55fc-668f-0c64-4f07ccbf86e9@oracle.com>
Message-ID: 

On Thu, Jun 8, 2023 at 2:47 PM Frederic Parain wrote:

> Hi Dan,
>
> I've looked at your exploration of the Preload attribute. The LoaderTest
> test loads class A but it doesn't link it. HotSpot looks at the PreLoad
> attribute at class link time, not class load time (yes, the name is
> misleading), this is why class B is not loaded.
>
Thanks Fred. I knew I had to be doing something wrong as I couldn't find anything in the jdk that would have accounted for it.

Interestingly, I missed this in part due to a behaviour difference between OpenJ9 and Hotspot - by the time OpenJ9 returns a j.l.Class instance, the vtables have been built and method sendTargets have been set. It appears Hotspot initializes the vtable in a separate pass.

--Dan

> Replacing the call to loadClass() with a call to Class.forName() to
> force the linking of the class will trigger the processing of the
> PreLoad attribute.
>
> public static void main(String[] args) throws Throwable {
>     ARGS = args;
>     ClassLoader cl = new LoaderTest2();
>     Class c = Class.forName(args[0], true, cl);
>     System.out.println(""+c+"::loader="+c.getClassLoader());
> }
>
> Fred
>
> On 6/8/23 12:01 PM, Dan Heidinga wrote:
> > Thanks Dan and Fred for the clarification on the current spec and
> > Hotspot behaviour and John for suggesting we remove the attribute.
> >
> > I think the problem with the current status quo for the preload
> > attribute is that we're trying to treat it like an optimization - a
> > "we can have our cake" (know which classes to load at some
> > indeterminate point) and "eat it too" (not promise users anything
> > about when or if those classes will be loaded).
> >
> > As it stands, based just on the spec, it's hard for users to know what
> > the preload attribute will do for them.
There's nothing they can > > depend on in the spec and are at the mercy of whatever behaviour VM > > implementers pick on a given day. Which means they will depend on the > > current Hotspot behaviour (defacto standardization - yuck!) or expect > > the behaviour to change all the time. > > > > If we decouple the list of preloadable classes from the classfile, how > > would non-jdk classes be handled? Many applications are a mix of > > multiple jar files from different maintenance domains which would make > > having a combined list difficult. We'd also need to think through the > > classloader implications vs today's approach which attempts to load on > > the same loader as the class with the preload attribute. > > > > What if instead of ditching the attribute, or treating it like an > > optimization, we firmed up the contract and treated it as a guarantee > > similar to those of superclasses and superinterfaces as in the first > > line in section 5.4 "Linking"? The new text would read something like: > > > Linking a class or interface involves verifying and preparing > > that class or interface, its direct superclass, its direct > > superinterfaces, and its element type (if it is an array type), if > > necessary. Any classes listed in the preload attribute will be loaded > > at this point. > > > > We don't need to say in the JVMS why the classes are in the attribute, > > but we should be explicit about where the attempt to load them occurs > > and that errors are ignored. This provides stability to the users and > > allows for independent implementation while avoiding de > > facto standardization of today's behaviour. > > > > Before responding, I built the latest lworld branch of the Valhalla > > repo and played with the preload attribute to try and force a > > classloader-related deadlock. I was surprised to find that the > > attribute doesn't seem to have an effect on user-defined loaders and > > with the current spec, I can't even say if that's a bug or not. > > Example classfiles for this exploration are in > > https://github.com/DanHeidinga/valhalla-preload-example > > > > --Dan > > > > On Mon, Jun 5, 2023 at 9:52?PM John Rose wrote: > > > > Overall, as we look at Preload, it is looking more and more like a > > no-op, as far as the JLS and JVMS is concerned. Perhaps that is a > > signal that it should be placed somewhere outside of the JVMS, > > such as in a Leyden-specific mechanism (a preload list) produced > > in a way decoupled from any transactions between the JLS and JVMS. > > > > On 1 Jun 2023, at 11:24, Dan Smith wrote: > > > > On Jun 1, 2023, at 10:53 AM, Dan Heidinga > > wrote: > > > > A couple of questions about the spec for the Preload > > attribute[0]. The current spec says it indicates "certain > > classes contain information that may be of interest during > > linkage." > > > > The Preload attribute removes one need for Q modifiers > > while allowing calling convention optimizations and layout > > decisions to be made early. > > > > The current spec is quite vague on what classes should be > > included in the attribute and on when / what the VM will > > do with those classes (or even if it does anything). > > > > FWIW, the JEP has more detail about when javac is expected to > > include classes in Preload. > > > > I think it's time to tighten up the spec for Preload > > attribute and specify: > > * what the VM will do with classes listed in the attribute > > > > It is intentional that the VM may choose to do nothing. So > > anything it does is purely an optimization. 
> > > > Looks like a nop? > > > > (In particular, it should not /reject/ any inputs, because that > > would destabilize separate compilability in unpredictable ways.) > > > > * when those classes will be loaded (ie: somewhere in JVMS > > 5.3) > > > > If the VM chooses to load Preload classes, then our thinking > > was that JVMS 5.4 already describes the details of timing: > > > > > https://docs.oracle.com/javase/specs/jvms/se20/html/jvms-5.html#jvms-5.4 > > > > So, for example, "Alternatively, an implementation may choose > > an "eager" linkage strategy, where all symbolic references are > > resolved at once when the class or interface is being > > verified." That is, the Preload classes could all be loaded > > during verification, or at some other stage of linking. > > > > My expectation is that the natural point for processing > > Preload is during preparation as vtables are set up, but > > sometimes I get these things wrong. :-) > > > > Sometimes you want them early (for instance layout) and sometimes > > you need to wait (vtable layout or even just before ). > > > > * how invalid cases are handled, including circularities > > (Class A's Preload mentions B <: A) > > > > Silent supression of errors, if any. Again, like a nop? > > > > "Errors detected during linkage are thrown at a point in the > > program where some action is taken by the program that might, > > directly or indirectly, require linkage to the class or > > interface involved in the error." > > > > I've always found this rule super vague, but I think "require" > > is the key word, and implies that errors caused by Preload > > resolution should just be ignored. (Because Preload isn't > > "required" to be processed at all.) > > > > * what types of classes can be listed (any? only values?) > > > > Definitely intend to support any classes of interest. Say a > > future optimization wants to know about a sealed > > superinterface, for example?it would be fine to tweak javac to > > add that interface to Preload, and then use the information to > > facilitate the optimization. > > > > There's a lot of nondeterminism here?can a compliant system > > trigger changes to class loading timing, but just on my > > birthday??but I think it's within the scope of JVMS 5.4, which > > provides a lot of latitude for loading classes whenever it's > > convenient. > > > > It probably makes sense to start from the current Hotspot > > handling of the attribute and fine tune that into the spec? > > > > So I've outlined our hands-off stake in the ground above. The > > spec would definitely benefit, at least, from a non-normative > > cross-reference to 5.4 and short explanation. Beyond that, I > > think we'd be open to specifying more if we can agree > > something more is needed... > > > > I think we could get the benefits in Fred?s prototype (as he > > describes) with a list that is decoupled from any particular class > > file, and Leyden could deliver this list. > > > > As you see, I?m kind of sour on a Preload attribute these days. > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.smith at oracle.com Fri Jun 9 19:37:19 2023 From: daniel.smith at oracle.com (Dan Smith) Date: Fri, 9 Jun 2023 19:37:19 +0000 Subject: The Good Default Value In-Reply-To: References: Message-ID: On Jun 1, 2023, at 4:10 PM, Kevin Bourrillion wrote: I'm wondering why we shouldn't require fields of non-nullable value-class types to be explicitly initialized. `Complex x = new Complex(0, 0)` or `Complex x = new Complex()`. 
I'll stipulate "people would grumble" as self-evident.

Just catching up on this...

How I read this is that you can imagine (and would prefer) an alternative Java language design in which primitive-typed variables (int/boolean/double/etc.) are always required to be explicitly initialized before use. So you object to Brian's framing of "uninitialized use" as a good thing.

This isn't crazy, because it's precisely the approach we take to primitive-typed local variables. It's only fields and arrays that get implicitly initialized.

If we set aside historical baggage and expectations, I think the biggest problem with this alternative language is that it's hard to enforce in bytecode. We could handle it in one of two ways:

A) Dynamically check & fail if a primitive variable hasn't been written yet. That would require an extra indirection or "uninitialized" metadata flag. Either way, this is equivalent to finding an encoding for a nullable primitive type, which intolerably increases footprint. So that's out.

B) Statically guarantee (not just *encourage*) that primitive variables are written before use. Three challenges here, in increasing order of difficulty:

i) Arrays must be initialized on creation. The instruction set makes this difficult to prove, so we'd probably handle this with a trusted API that either has a native implementation or gets special permission for its bytecode to violate this rule. Ideally, this API would be optimized for comparable performance to today's newarray, at least when the initial value is 0. One future problem here is that it doesn't generalize: generic algorithms don't know what initial value to use, so they need to be parameterized by something to use as a default, leading to API distortions. (Not an issue in today's Java, but will become one in the future.)

ii) Instance fields must be written by `<init>` methods before publishing 'this'. If the class allows subclasses, then that means `<init>` methods cannot publish 'this' at all. Presumably this implies no method calls involving 'this', because we're not going to want to track what happens inside the method. Keep in mind that `<init>` is ad hoc imperative code -- there's no such thing as a "field initializer" in bytecode. Also keep in mind that one of the most difficult/unpopular features of bytecode verification is the way it tries to track the initialization state of partially-constructed objects.

iii) Static fields must be written by `<clinit>` methods before anyone tries to access them. The class initialization protocol makes this effectively impossible -- the first time a not-yet-loaded class gets mentioned in the body of `<clinit>`, a class loader starts running user code, and there's no guarantee that from there somebody won't call back to look at a not-yet-initialized field.

Neither (A) nor (B) seem particularly viable, which leads to the default value solution.

Now, we could quibble about how strongly the language should discourage reads-before-writes of fields and arrays, but the point is that there has to be *something* explaining what you see in the corner case in which you read before initialization. Similarly, if we want to allow value classes to avoid the footprint overhead of (B), we need (some of) them to support default values.

All that said, I do think there's room for disagreement about whether a read-before-write scenario should be considered normal behavior (Brian's presentation) or a program bug that we unfortunately can't always prevent.
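(A concrete instance of the corner case, runnable on today's JDK with hypothetical class names: under unsynchronized publication, a reader may observe the object before the constructor's write to `value` becomes visible, and so read the field's default -- legal behavior under the Java Memory Model for a racy program:)

```java
// Minimal sketch of read-before-write under racy publication.
class Holder {
    int value;                  // non-final, so no safe-publication guarantee
    Holder(int v) { value = v; }
}

class RacyPublication {
    static Holder shared;       // not volatile: publication races with the write

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> shared = new Holder(42));
        writer.start();

        Holder h = shared;               // racy read
        if (h != null) {
            System.out.println(h.value); // almost always 42; 0 is also allowed
        }
        writer.join();
    }
}
```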
One argument for de-emphasizing the behavior of races and the opt-in to tearing is that these behaviors, though they must be specified, are firmly in the "program bug" category.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From heidinga at redhat.com Fri Jun 9 19:41:33 2023
From: heidinga at redhat.com (Dan Heidinga)
Date: Fri, 9 Jun 2023 15:41:33 -0400
Subject: Preload attribute
In-Reply-To: 
References: 
Message-ID: 

On Thu, Jun 8, 2023 at 4:51 PM John Rose wrote:

> On 8 Jun 2023, at 9:52, Dan Heidinga wrote:
>
> On Thu, Jun 8, 2023 at 12:44 PM John Rose wrote:
>
> On 8 Jun 2023, at 9:01, Dan Heidinga wrote:
>
> If we decouple the list of preloadable classes from the classfile, how
> would non-jdk classes be handled? What if instead of ditching the
> attribute, or treating it like an optimization, we firmed up the
> contract and treated it as a guarantee?
>
> If we go down this route, let's consider putting the control information
> into a module file (only) for starters. (Maybe class file later if
> needed.) There would be fewer states to document and test, since (by
> definition) class files could not get out of sync.
>
> A module would document, in one place, which types it would "prefer" to
> preload in order to optimize its APIs (internal or external).
>
> This might lead to more class loading than intended. The current approach
> has each classfile register the list of classes it wants preloaded to get
> the best linkage which means we only have to load those classes if we link
> the original class. There's a natural trigger for the preload and a
> limited set of classes to load.
>
> There's a spectrum of tradeoffs here: We could put preload attributes on
> every method and field, to get the maximum amount of fine-grained lazy
> (pre-)loading, or put them in a global file per JVM instance. The more
> fine-grained, the harder it will be to write compliance testing, I think.

Agreed. There's a sweet spot between expressiveness and overheads (testing, metadata, etc). Classfiles have historically been the place where the JVM tracks this kind of information as that fits well with separate compilation and avoids the "external metadata" problems of, e.g., GraalVM's extra-linguistic configuration files.

When compiling the current class, javac already requires directly referenced classes to be findable and thus has the info required to write a preload attribute. Does javac necessarily have the same info when compiling the module-info classfile? Maybe when finding the non-exported packages for the module javac (or jlink? or jmod?) could also find the value classes that need preloading?

Moving it into a separate pass like this doesn't feel like quite the right fit though as it excludes the classpath and complicates the other tools' processing of the modules.

> Moving to a single per-module list loses the natural trigger and may
> pre-load more classes than the application will use. If Module A has
> classes {A, B, C} and each one preloads 5 separate classes, with a
> per-module list that's forcing the loading of 15 additional classes (plus
> supers, etc). With a per-class list, we only preload the classes on a
> per-use basis. More of a pay for what you use model.
>
> Is there a natural trigger or way to limit the preloads to what I might
> use with the per-module file?
>
> That's a very good question.
> I think what Preload *really is* is a list of "names that may require
> special handling before using in APIs". They don't need to be loaded when
> the preload attribute is parsed; they are simply put in a "watch list" to
> trigger additional loading *when necessary*. (This is already true.) So I
> think if we move the preload list to (say) the module level (if not a
> global file), then the JVM will have its watch list. (And, in fewer chunks
> than if we put all the stuff all the time redundantly in all class files
> that might need them: That requires frequent repetition.) The JVM can use
> its watch list as it does today, with watch lists populated separately for
> each class file.

I initially thought a global list would lead to issues if two different classloaders defined classes of the same name but since this is a "go and look" signal, early loading based on name should be fine even in that case as each loader that mentions the name would be asked to load their version of the named class. So I think a per-JVM list would be OK from that perspective (though I still don't like it).

> To emphasize: A watch list does not require loading. It means, "if you see
> this name at a point where you could use extra class info, then I
> encourage you to load sooner rather than later". The only reason it is "a
> thing" at all is that the default behavior (of loading either as late as
> possible, or as part of a CDS-like thingy) should be changed only on an
> explicit signal.

While true for what the JVM needs, this is hard behaviour to explain to users and challenging for compliance test writers (or maybe not if we continue to treat preload as an optimization). Is this where we want to spend our complexity budget? Part of why I'm circling back to treating preload as a per-classfile attribute that forms a requirement on the VM rather than as an optimization is that the model becomes clearer for users, developers and testers.

> And, hey, maybe CDS is all the primitive we need here: Just run -Xdump
> with all of your class path loaded. Et voila, no Preload at all.

Users may find this behaviour surprising - I ran with a CDS archive and my JVM loaded classes earlier than it would have otherwise?

--Dan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From daniel.smith at oracle.com Fri Jun 9 22:06:21 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Fri, 9 Jun 2023 22:06:21 +0000
Subject: Minor question about a `MyVal.default`-like syntax
In-Reply-To: 
References: <384e2229-19a5-1ce0-8938-5fea675bafa5@oracle.com>
Message-ID: 

> On Jun 2, 2023, at 2:46 PM, Kevin Bourrillion wrote:
>
> This is not too important right now, but I had thoughts, so...
>
> On Thu, Jun 1, 2023 at 10:59 AM Brian Goetz wrote:
>
> Users should be able to say `new Complex()` and get a default complex value. (Maybe they can also say `Complex.default`; maybe we won't need that.) And the same for reflection.
>
> I think the argument for `MyVal.default` being *unnecessary* might go like this:
>
> * either there's no implicit constructor and `MyVal.default` won't work
> * or there is, and `MyVal.default` would have to mean the same as `new MyVal()`, so what's the point?
>
> If that's correct, there might not be a strong argument for keeping it, but I came up with a couple weak ones.
>
> 1. Arguably, its meaning is more apparent without the reader having to dig into MyVal.java (how much does this matter, in this case?)
> 2.
> 2. It *feels like* a well-known immutable value that just kind of "exists" and has no need to be constructed. A constant. In fact people might feel tempted to make such constants?

Another weak one:

3. `MyVal.default` should be a linkage error (or in some worlds, evaluate to 'null') if a B3 class evolves into a B2 class. `new MyVal()` should be fully compatible: it's the same API point whether the class is B3 or B2, and the caller using this syntax is unlikely to care about its implementation.
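Spelled out as a sketch (using the in-progress syntax from the JEP 401 drafts; `MyVal` and its field are placeholders):

```
value class MyVal {
    private int x;
    public implicit MyVal();            // the B3 ingredient under discussion
    public MyVal(int x) { this.x = x; }
}

class Client {
    MyVal viaCtor = new MyVal();        // same API point whether MyVal stays
                                        // B3 or later becomes B2; keeps linking
    MyVal viaDefault = MyVal.default;   // only meaningful while MyVal is B3;
                                        // after migration, a linkage error
                                        // (or, in some worlds, null)
}
```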
From john.r.rose at oracle.com  Sat Jun 10 01:38:48 2023
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 09 Jun 2023 18:38:48 -0700
Subject: Preload attribute
In-Reply-To: 
References: <80926DBE-E25B-4622-BA74-2D78710C5959@oracle.com>
 <1BCCC1A3-4B9A-4426-89AE-E80702FEAA40@oracle.com>
 <42615AC3-23F2-4DE2-B5CC-762981A31194@oracle.com>
Message-ID: 

On 9 Jun 2023, at 12:41, Dan Heidinga wrote:

> Agreed. There's a sweet spot between expressiveness and overheads (testing, metadata, etc). Classfiles have historically been the place where the JVM tracks this kind of information, as that fits well with separate compilation and avoids the "external metadata" problems of e.g. GraalVM's extra-linguistic configuration files.
>
> When compiling the current class, javac already requires directly referenced classes to be findable and thus has the info required to write a preload attribute. Does javac necessarily have the same info when compiling the module-info classfile? Maybe when finding the non-exported packages for the module, javac (or jlink? or jmod?) could also find the value classes that need preloading?

That is what I am assuming. The module file would be edited by those guys. Or (maybe better) a plain flat textual list is put somewhere the JVM can find it.

> Moving it into a separate pass like this doesn't feel like quite the right fit though, as it excludes the classpath and complicates the other tools' processing of the modules.

I think it's better than that. When we are assembling a program (jlink or a Leyden condenser), the responsibility of publicizing value classes (for Preload) surely belongs to the declaration, not collectively on all the uses.

So every module (jmod or whatever) that declares 1 or more value classes (if they are exported, at least) should list them on a publicized watch list.

There is no need to replicate these watch lists across all potential API clients of a value class. There are reasons *not* to do this, since the clients have only partial, provisional information about the values.

> I initially thought a global list would lead to issues if two different classloaders defined classes of the same name, but since this is a "go and look" signal, early loading based on name should be fine even in that case, as each loader that mentions the name would be asked to load their version of the named class. So I think a per-JVM list would be OK from that perspective (though I still don't like it).

Agreed.

> While true for what the JVM needs, this is hard behaviour to explain to users and challenging for compliance test writers (or maybe not, if we continue to treat preload as an optimization).

I'm trying to reduce this to a pure optimization. In that case, "watch lists" are just helpers, which are allowed to fail, and allowed to be garbage.

> Is this where we want to spend our complexity budget?

(No, hence it should be an optimization.)

> Part of why I'm circling back to treating preload as a per-classfile attribute that forms a requirement on the VM rather than as an optimization is that the model becomes clearer for users, developers and testers.

I think it's still going to be murky.
Why is putting the watch list on the API clients better than putting it on (or near) the value class definitions?

>> And, hey, maybe CDS is all the primitive we need here: Just run -Xdump with all of your class path loaded. Et voila, no Preload at all.
>
> Users may find this behaviour surprising - I ran with a CDS archive and my JVM loaded classes earlier than it would have otherwise?

CDS has the effect of making class loading happen in a more timely fashion, and (under Leyden) will almost certainly trigger reordering of loading as well. So promulgating a "watch list" has goals which align with CDS.

I'm starting to think that the right "lever" to pull for optimizing value-based APIs is to put the value classes in a CDS archive. That is a de facto watch list. The jlink guy should just make a table of all value classes. That's the best form of Preload I can imagine, frankly.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From john.r.rose at oracle.com  Sat Jun 10 01:41:09 2023
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 09 Jun 2023 18:41:09 -0700
Subject: Preload attribute
In-Reply-To: 
References: <80926DBE-E25B-4622-BA74-2D78710C5959@oracle.com>
 <1BCCC1A3-4B9A-4426-89AE-E80702FEAA40@oracle.com>
 <0cef3d78-55fc-668f-0c64-4f07ccbf86e9@oracle.com>
Message-ID: <8A11A170-D4F0-4C5F-8875-D9B443A405E7@oracle.com>

On 9 Jun 2023, at 8:32, Dan Heidinga wrote:

> On Thu, Jun 8, 2023 at 2:47 PM Frederic Parain wrote:
>
>> Hi Dan,
>>
>> I've looked at your exploration of the Preload attribute. The LoaderTest test loads class A but it doesn't link it. HotSpot looks at the PreLoad attribute at class link time, not class load time (yes, the name is misleading), this is why class B is not loaded.
>
> Thanks Fred. I knew I had to be doing something wrong, as I couldn't find anything in the jdk that would have accounted for it.
>
> Interestingly, I missed this in part due to a behaviour difference between OpenJ9 and Hotspot - by the time OpenJ9 returns a j.l.Class instance, the vtables have been built and method sendTargets have been set. It appears Hotspot initializes the vtable in a separate pass.

This is an example of how it will be hard to specify Preload, if it is not just a pure optimization, since the existing JVMS allows latitude for different implementations to sequence class loading differently, in subtle ways.
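For concreteness, the experiment in question is roughly the following sketch (class names and paths are placeholders, and JVMS 5.4 leaves the moment of linking flexible, which is exactly the latitude at play):

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;

// A's Preload attribute is assumed to name value class B.
public class LoaderTest {
    public static void main(String[] args) throws Exception {
        URL classes = Path.of("build/classes").toUri().toURL();
        try (URLClassLoader loader = new URLClassLoader(new URL[] { classes })) {
            // Loads A without initializing it; whether A is also *linked*
            // at this point is implementation latitude.
            Class<?> a = Class.forName("A", false, loader);
            // HotSpot, which processes Preload at link time, has not loaded B
            // yet; a VM that builds vtables on load may already have asked for B.
            System.out.println("loaded: " + a.getName());
        }
    }
}
```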
Both cases require a "cold run" to generate the data needed for CDS and only capture classes that have been loaded during that run (I think that's correct?). B) Use a "Watch List" to list class names that should be looked for. When the name appears, trigger loading early enough to allow calling convention optimizations to apply. Name conflicts are "safe" as the worst case is a class is loaded early in multiple loaders but is only a value in one loader. The watch list can be: global or per-module. It's possible a tool like jlink or jmod could be used to generate the watch list by scanning all the classes included in the jimage/jmod file. C) The per-class preload attribute. Each class lists the value classes it may reference to ensure they are loaded early enough. Potentially a lot of duplication as each class in an application would list many of the same value classes. Did I miss any? There's also another dimension we've touched on: how eager is eager loading. Current preload behaviour is to batch load all the listed classes. Alternatively, loading could wait until one of the classes was observed in method signature / field signature and load on an as-needed basis. We've mostly concentrated on preload as an optimization for calling conventions but there may be other uses of the mechanism as well. A user may want to ensure that classes are loaded early to prevent optimizations that need to be walked back later based on their knowledge of application behaviour. For example, ensuring there is always more than a single implementor of an interface loaded to prevent CHA optimizations on some critical path where the second implementation is normally loaded late. Or to ensure an entire sealed hierarchy is loaded together. I haven't put much thought into this yet but expect users will find interesting ways to use "preload" if it's reliable enough for them. (And of course, some will abuse it in ways that hurt their performance as well). Which of these options meets the goal ("reliable, routine calling convention optimization for values") best? --Dan On Fri, Jun 9, 2023 at 9:38?PM John Rose wrote: > On 9 Jun 2023, at 12:41, Dan Heidinga wrote: > > On Thu, Jun 8, 2023 at 4:51?PM John Rose wrote: > > On 8 Jun 2023, at 9:52, Dan Heidinga wrote: > > On Thu, Jun 8, 2023 at 12:44?PM John Rose wrote: > > On 8 Jun 2023, at 9:01, Dan Heidinga wrote: > > If we decouple the list of preloadable classes from the classfile, how > would non-jdk classes be handled?> What if instead of ditching the > > attribute, or treating it like an > > optimization, we firmed up the contract and treated it as a guarantee? > > If we go down this route, let?s consider putting the control information > into a module file (only) for starters. (Maybe class file later if > needed.) There would be fewer states to document and test, since (by > definition) class files could not get out of sync. > > A module would document, in one mplace, which types it would ?prefer? to > preload in order to optimize its APIs (internal or external). > > This might lead to more class loading than intended. The current approach > has each classfile register the list of classes it wants preloaded to get > the best linkage which means we only have to load those classes if we link > the original class. There's a natural trigger for the preload and a > limited set of classes to load. 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From forax at univ-mlv.fr  Mon Jun 12 14:05:49 2023
From: forax at univ-mlv.fr (Remi Forax)
Date: Mon, 12 Jun 2023 16:05:49 +0200 (CEST)
Subject: Preload attribute
In-Reply-To: 
References: <42615AC3-23F2-4DE2-B5CC-762981A31194@oracle.com>
Message-ID: <630433743.78782139.1686578749882.JavaMail.zimbra@univ-eiffel.fr>

> From: "Dan Heidinga" 
> Sent: Monday, June 12, 2023 3:26:53 PM
> Subject: Re: Preload attribute

> The top-line goal for the preload efforts is to trigger the necessary "go and look" behaviour to support calling convention flattening for values. We want the broadest, most reliable mechanism to ensure that we routinely get flattening in the calling convention for value types so that the flattening horizon can extend beyond a single compiled body (ie: a method and its inlines).
> Summarizing the options presented so far:
>
> A) Value classes should be put into the CDS archive to ensure they are loaded early enough, in a group, and in a form that the VM can quickly discover whether calling convention optimizations apply to them. This involves either a class list to create a static archive (allows jdk classes) or using a dynamic archive with AppCDS. Both cases require a "cold run" to generate the data needed for CDS and only capture classes that have been loaded during that run (I think that's correct?).
>
> B) Use a "Watch List" to list class names that should be looked for. When the name appears, trigger loading early enough to allow calling convention optimizations to apply. Name conflicts are "safe" as the worst case is a class loaded early in multiple loaders that is only a value in one loader. The watch list can be: global or per-module. It's possible a tool like jlink or jmod could be used to generate the watch list by scanning all the classes included in the jimage/jmod file.
>
> C) The per-class preload attribute. Each class lists the value classes it may reference to ensure they are loaded early enough. Potentially a lot of duplication as each class in an application would list many of the same value classes.
>
> Did I miss any?

For me, B is equivalent to the preload attribute (which is a watch list/set), but instead of being generated by javac, it is appended by jlink. For a module-level "watch list", the preload attribute is added to the module-info; for a global level, the preload attribute is not really an attribute but code in a well-known class which is inserted by jlink (the same way the module graph is saved by jlink). For A, if there is a runtime description of the watch list/set, it can be captured by CDS/AppCDS archives.

> There's also another dimension we've touched on: how eager is eager loading. Current preload behaviour is to batch load all the listed classes. Alternatively, loading could wait until one of the classes was observed in a method signature / field signature and load on an as-needed basis.

Yes, the question is whether it is a watch list or a watch set.

> We've mostly concentrated on preload as an optimization for calling conventions but there may be other uses of the mechanism as well. A user may want to ensure that classes are loaded early to prevent optimizations that need to be walked back later based on their knowledge of application behaviour. For example, ensuring there is always more than a single implementor of an interface loaded to prevent CHA optimizations on some critical path where the second implementation is normally loaded late. Or to ensure an entire sealed hierarchy is loaded together. I haven't put much thought into this yet but expect users will find interesting ways to use "preload" if it's reliable enough for them. (And of course, some will abuse it in ways that hurt their performance as well).
>
> Which of these options meets the goal ("reliable, routine calling convention optimization for values") best?

The general problem is that per class + generated by javac = a lot of noise; per module + generated by jar = still some noise, and no optimization if jar is not used or there is no module-info; per app + generated by jlink = no optimization if jlink is not used. So the solution seems to be: generated per class by javac, then removed from the classes and gathered into one place by jlink.
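To illustrate the global variant (an entirely hypothetical shape; the class name and the string-array encoding are invented here), the jlink-inserted code could be as simple as a generated holder class consulted at startup, the same way the module graph is restored:

```java
// Hypothetical holder emitted by jlink while scanning the image; nothing
// here is from a current draft, it only illustrates the idea.
final class GeneratedPreloadList {
    private GeneratedPreloadList() {}

    // One binary name per value class found in the image.
    static final String[] VALUE_CLASSES = {
        "com.example.Complex",
        "com.example.Rational",
    };
}
```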
Rémi
There are reasons not to do this, since the clients have only >> partial, provisional information about the values. >>>> Moving to a single per-module list loses the natural trigger and may >>>> pre-load more classes than the application will use. If Module A has >>>> classes {A, B, C} and each one preloads 5 separate classes, with a >>>> per-module list that's forcing the loading of 15 additional classes (plus >>>> supers, etc). With a per-class list, we only preload the classes on a >>>> per-use basis. More of a pay for what you use model. >>>> Is there a natural trigger or way to limit the preloads to what I might >>>> use >>>> with the per-module file? >>>> That?s a very good question. I think what Preload *really is* is a list >>>> of ?names that may require special handling before using in APIs?. They >>>> don?t need to be loaded when the preload attribute is parsed; they are >>>> simply put in a ?watch list? to trigger additional loading *when >>>> necessary*. (This is already true.) So I think if we move the preload >>>> list to (say) the module level (if not a global file), then the JVM will >>>> have its watch list. (And, in fewer chunks than if we put all the stuff all >>>> the time redundantly in all class files that might need them: That requires >>>> frequent repetition.) The JVM can use its watch list as it does today, with >>>> watch lists populated separately for each class file. >>> I initially thought a global list would lead to issues if two different >>> classloaders defined classes of the same name but since this is a "go and >>> look" signal, early loading based on name should be fine even in that case >>> as each loader that mentions the name would be asked to be asked to load >>> their version of the named class. So I think a per-JVM list would be OK >>> from that perspective (though I still don't like it). >> Agreed. >>>> To emphasize: A watch list does not require loading. It means, ?if you see >>>> this name at a point where you could use extra class info, then I encourage >>>> you to load sooner rather than later?. The only reason it is ?a thing? at >>>> all is that the default behavior (of loading either as late as possible, or >>>> as part of a CDS-like thingy) should be changed only on an explicit signal. >>> While true for what the JVM needs, this is hard behaviour to explain to >>> users and challenging for compliance test writers (or maybe not if we >>> continue to treat preload as an optimization). >> I?m trying to reduce this to a pure optimization. In that case, ?watch lists? >> are just helpers, which are allowed to fail, and allowed to be garbage. >>> Is this where we want to >>> spend our complexity budget? >> (No, hence it should be an optimization.) >>> Part of why I'm circling back to treating >>> preload as a per-classfile attribute that forms a requirement on the VM >>> rather than as an optimization is that the model becomes clearer for users, >>> developers and testers. >> I think it?s still going to be murky. Why is putting the watch list on the API >> clients better than putting it on (or near) the value class definitions? >>>> And, hey, maybe CDS is all the primitive we need here: Just run -Xdump >>>> with all of your class path loaded. Et voila, no Preload at all. >>> Users may find this behaviour surprising - I ran with a CDS archive and my >>> JVM loaded classes earlier than it would have otherwise? 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From brian.goetz at oracle.com  Mon Jun 12 14:43:59 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 12 Jun 2023 10:43:59 -0400
Subject: Preload attribute
In-Reply-To: 
References: <80926DBE-E25B-4622-BA74-2D78710C5959@oracle.com>
 <1BCCC1A3-4B9A-4426-89AE-E80702FEAA40@oracle.com>
 <42615AC3-23F2-4DE2-B5CC-762981A31194@oracle.com>
Message-ID: 

As a reminder, Leyden will give us a more general tool for "moving stuff around" at build time than CDS does, and the current CDS behavior may well be folded into a set of condensers.

We are trying to find the "perfect" place to put preload information, but we have (as usual) an overconstrained notion of perfection; what makes perfect sense for semantics or non-duplication may not make perfect sense for runtime behavior.  Leyden will let us cut this knot by letting us put the information in the classfile in the semantically sensible place, and let tooling boil it down later at pre-deployment time to a representation that is more efficient for runtime.

So what I suggest is focusing on capturing the source data, which IMO seems to still be some flavor of "class/method X needs to know more about value class V before making certain decisions".  Preloading is the mechanism of how we find out that "more", and aggregated representations such as per-module / CDS archives are a rearranging of the source data to achieve a more runtime-friendly representation _for a particular configuration of classes_.

tl;dr: Let's design what captures the semantics we need, and treat computing e.g. optimal load order as a downstream transformation.
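A toy rendering of that split, with invented inputs: the per-class source data stays in the classfiles, and a condenser-style pass later computes the aggregated runtime form:

```java
import java.util.*;

// Sketch of the "boil it down" step: given per-class source data
// (class name -> value classes its API mentions), compute the single
// de-duplicated load list a deployment-time condenser could emit.
class PreloadCondenser {
    static List<String> condense(Map<String, Set<String>> perClassPreload) {
        Set<String> aggregated = new TreeSet<>();          // stable order
        perClassPreload.values().forEach(aggregated::addAll);
        return List.copyOf(aggregated);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> source = Map.of(
                "A", Set.of("Complex", "Rational"),
                "B", Set.of("Complex"),
                "C", Set.of("Range", "Rational"));
        System.out.println(condense(source));  // [Complex, Range, Rational]
    }
}
```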
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From heidinga at redhat.com  Tue Jun 13 13:31:24 2023
From: heidinga at redhat.com (Dan Heidinga)
Date: Tue, 13 Jun 2023 09:31:24 -0400
Subject: Preload attribute
In-Reply-To: 
References: <80926DBE-E25B-4622-BA74-2D78710C5959@oracle.com>
 <1BCCC1A3-4B9A-4426-89AE-E80702FEAA40@oracle.com>
 <42615AC3-23F2-4DE2-B5CC-762981A31194@oracle.com>
Message-ID: 

On Mon, Jun 12, 2023 at 10:44 AM Brian Goetz wrote:

> tl;dr: Let's design what captures the semantics we need, and treat computing e.g. optimal load order as a downstream transformation.
That sounds reasonable.  I think my original question about how the JVMS treats preload still needs to be addressed though.  What guarantees / requirements should we impose on the JVM's handling of preload?  The current spec is not clear enough for users to understand what they get from it, and is too clever in handing off loading rules to JVMS 5.4's flexibility.

My current position is that we need to specify the behaviour and the point in the loading process where the preload attempts will occur, so users can depend on the behaviour.  From John's emails, I think he would prefer to see preload become strictly an optimization and be outside the spec (John, correct me if I've misstated).
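The observability concern is concrete: a user-supplied delegating loader sees exactly when the VM asks for a preload-listed name, so whatever timing we pick is testable the moment we write it down.  A minimal sketch (class names are placeholders):

```java
// If C's Preload attribute names "V", this loader observes *when* the VM
// requests V relative to loading and linking C.
class TracingLoader extends ClassLoader {
    TracingLoader(ClassLoader parent) { super(parent); }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        System.out.println("requested: " + name);   // visible side effect
        return super.loadClass(name, resolve);
    }
}
```

--Dan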
For example, ensuring there is always more than a single > implementor of an interface loaded to prevent CHA optimizations on some > critical path where the second implementation is normally loaded late. Or > to ensure an entire sealed hierarchy is loaded together. I haven't put > much thought into this yet but expect users will find interesting ways to > use "preload" if it's reliable enough for them. (And of course, some will > abuse it in ways that hurt their performance as well). > > Which of these options meets the goal ("reliable, routine calling > convention optimization for values") best? > > --Dan > > On Fri, Jun 9, 2023 at 9:38?PM John Rose wrote: > >> On 9 Jun 2023, at 12:41, Dan Heidinga wrote: >> >> On Thu, Jun 8, 2023 at 4:51?PM John Rose wrote: >> >> On 8 Jun 2023, at 9:52, Dan Heidinga wrote: >> >> On Thu, Jun 8, 2023 at 12:44?PM John Rose wrote: >> >> On 8 Jun 2023, at 9:01, Dan Heidinga wrote: >> >> If we decouple the list of preloadable classes from the classfile, how >> would non-jdk classes be handled?> What if instead of ditching the >> >> attribute, or treating it like an >> >> optimization, we firmed up the contract and treated it as a guarantee? >> >> If we go down this route, let?s consider putting the control information >> into a module file (only) for starters. (Maybe class file later if >> needed.) There would be fewer states to document and test, since (by >> definition) class files could not get out of sync. >> >> A module would document, in one mplace, which types it would ?prefer? to >> preload in order to optimize its APIs (internal or external). >> >> This might lead to more class loading than intended. The current approach >> has each classfile register the list of classes it wants preloaded to get >> the best linkage which means we only have to load those classes if we >> link >> the original class. There's a natural trigger for the preload and a >> limited set of classes to load. >> >> There?s a spectrum of tradeoffs here: We could put preload attributes on >> every method and field, to get the maximum amount of fine-grained lazy >> (pre-)loading, or put them in a global file per JVM instance. The more >> fine-grained, the harder it will be to write compliance testing, I think. >> >> Agreed. There's a sweet spot between expressiveness and overheads >> (testing, metadata, etc). Classfiles have historically been the place >> where the JVM tracks this kind of information as that fits well with >> separate compilation and avoids the "external metadata" problems of ie: >> GraalVM's extra-linguistic configuration files. >> >> When compiling the current class, javac already requires directly >> referenced classes to be findable and thus has the info required to write >> a >> preload attribute. Does javac necessarily have the same info when >> compiling the module-info classfile? Maybe when finding the non-exported >> packages for the module javac (or jlink? or jmod?) could also find the >> value classes that need preloading? >> >> That is what I am assuming. The module file would be edited by those >> guys. Or (maybe better) a plain flat textual list is put somewhere the JVM >> can find it. >> >> Moving it into a separate pass like this doesn't feel like quite the >> right >> fit though as it excludes the classpath and complicates the other tools >> processing of the modules. >> >> I think it?s better than that. 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From forax at univ-mlv.fr  Tue Jun 13 14:03:43 2023
From: forax at univ-mlv.fr (Remi Forax)
Date: Tue, 13 Jun 2023 16:03:43 +0200 (CEST)
Subject: Preload attribute
In-Reply-To: 
References: 
Message-ID: <1890516580.79962441.1686665023237.JavaMail.zimbra@univ-eiffel.fr>

> From: "Dan Heidinga" 
> Sent: Tuesday, June 13, 2023 3:31:24 PM
> Subject: Re: Preload attribute

> That sounds reasonable. I think my original question about how the JVMS treats preload still needs to be addressed though. What guarantees / requirements should we impose on the JVM's handling of preload? The current spec is not clear enough for users to understand what they get from it and is too clever in handing off loading rules to JVMS 5.4's flexibility.
>
> My current position is we need to specify the behaviour and the point in the loading process where the preload attempts will occur so users can depend on the behaviour.
From forax at univ-mlv.fr Tue Jun 13 14:03:43 2023
From: forax at univ-mlv.fr (Remi Forax)
Date: Tue, 13 Jun 2023 16:03:43 +0200 (CEST)
Subject: Preload attribute
In-Reply-To:
References:
Message-ID: <1890516580.79962441.1686665023237.JavaMail.zimbra@univ-eiffel.fr>

> My current position is we need to specify the behaviour and the point in the loading process where the preload attempts will occur so users can depend on the behaviour. From John's emails, I think he would prefer to see preload become strictly an optimization and be outside the spec (John correct me if I've misstated).

I'm on John's side: if the VM never reports an error when the Preload attribute is read, the user has no side effect by which to observe when the attribute is read, so there is no need to specify the exact point where it is read.

Rémi
From heidinga at redhat.com Tue Jun 13 15:10:07 2023
From: heidinga at redhat.com (Dan Heidinga)
Date: Tue, 13 Jun 2023 11:10:07 -0400
Subject: Preload attribute
In-Reply-To: <1890516580.79962441.1686665023237.JavaMail.zimbra@univ-eiffel.fr>
References: <1890516580.79962441.1686665023237.JavaMail.zimbra@univ-eiffel.fr>
Message-ID:

On Tue, Jun 13, 2023 at 10:13 AM Remi Forax wrote:

> I'm on John's side: if the VM never reports an error when the Preload attribute is read, the user has no side effect by which to observe when the attribute is read, so there is no need to specify the exact point where it is read.
Preload attempts to load the class, which does cause user-visible side effects - ClassLoader::loadClass is called, for one, which users can observe in a number of ways. JVMTI can expose this info, as can j.l.instrument.Instrumentation::getInitiatedClasses(ClassLoader) and ::getAllLoadedClasses(). I'm sure there are other ways as well.

My point being: it is observable, so we should specify it clearly.

--Dan
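[The observability point above is easy to demonstrate with APIs that exist today. Below is a minimal sketch of a java.lang.instrument agent; the class name and the "observer.jar" packaging are placeholders, and the agent jar would need a Premain-Class manifest entry. Any class pulled in early by a Preload-style mechanism would show up in this dump before the program ever uses it.]

```java
import java.lang.instrument.Instrumentation;

// Run with: java -javaagent:observer.jar MyApp
public final class LoadObserver {
    public static void premain(String args, Instrumentation inst) {
        // Dump at shutdown so we capture everything loaded during the run.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            ClassLoader app = ClassLoader.getSystemClassLoader();
            // Classes for which the app loader was the initiating loader;
            // eager (pre)loading is directly visible here.
            for (Class<?> c : inst.getInitiatedClasses(app)) {
                System.out.println("initiated by app loader: " + c.getName());
            }
            System.out.println("total loaded: " + inst.getAllLoadedClasses().length);
        }));
    }
}
```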
From heidinga at redhat.com Tue Jun 13 15:57:50 2023
From: heidinga at redhat.com (Dan Heidinga)
Date: Tue, 13 Jun 2023 11:57:50 -0400
Subject: implicit constructor translation?
In-Reply-To: <168769A6-FC8D-412C-ABB4-397996C16A54@oracle.com>
References: <384e2229-19a5-1ce0-8938-5fea675bafa5@oracle.com> <168769A6-FC8D-412C-ABB4-397996C16A54@oracle.com>
Message-ID:

On Tue, Jun 6, 2023 at 10:47 PM John Rose wrote:

> On 1 Jun 2023, at 12:34, Brian Goetz wrote:
>
>> On 6/1/2023 2:32 PM, Dan Heidinga wrote:
>>
>> On Thu, Jun 1, 2023 at 1:59 PM Brian Goetz wrote:
>>
>>> I think that there should be an explicit (and not synthetic) method_info for the implicit constructor, with the obvious Code attribute (`defaultvalue` / `areturn`.)
>>
>> I'm slightly concerned about having a Code attribute for the implicit constructor as it allows agents (ClassFile load hook & redefinition) to modify the bytecodes to be inconsistent with the VM's behaviour, given the VM won't actually call the implicit constructor.
>>
>> Telling users it's "as if" the VM called the implicit ctor and then having the reflective behaviour be different after retransformation is slightly uncomfortable.
>>
>> I'm fine if the expansion of the constructor body happens at runtime rather than compile time; my suggestion was mostly "this is super-easy in the static compiler, why make more work for the JVM." But, if you're signing up for that work, I won't stop you....
>
> I haven't signed up the JVM for that work! It seems like busy-work to me. JVM code which implements some busy-work requirement costs money for no benefit, and its (inevitable) bug tail risks increased attack surfaces.

Definitely curious about the busy-work here, as my mental model for this is that it's a fairly cheap implementation akin to the code required to throw AbstractMethodError.

> I'm assuming (a) the aconst_default bytecode is not API (it's private) and therefore that (b) any materialization of a default value will go through either the class's API (real new C()) or else via some reflective API point (C.class.defaultValue()). Maybe something very simple like:
>
> C defaultValue() {
>     // if the next line throws an exception, it goes to the client
>     var a1 = Array.newInstance(asNonNullable(), 1);
>     // at this point we know C has a default value
>     return (C) Array.get(a1, 0);
>     // OR?? return this.newInstance();
> }
>
> Regarding the problem of agents, I'd say either we don't care what a buggy agent does, or else we might add an implementation restriction that refuses to allow a buggy agent to load an implicit constructor with a bad body.
>
> Agents can modify all sorts of JDK internals. (Also non-javac-spun classfiles explore all sorts of odd states.)
> We can't play whack-a-mole trying to prevent all kinds of agent misbehavior. That would just end up creating an endless parade of costs and bugs.

Agreed. I'm not concerned about buggy agents but about user expectations and avoiding an attractive nuisance. By having bytecodes in the implicit constructor, we encourage users to think they can modify its behaviour. Again, user expectation matters. Though these typically will be power users, so there's more leeway for technical explanations.

Detailing the way the implicit language-level constructor generates both the ImplicitCreation attribute and the helper method would be a great fit for the (non-existent) translation guide. I'm OK with spec'ing our way out by saying the method generated for the implicit constructor is only a helper, but I don't know which spec we would put that info in.

--Dan

> And I don't want to add a verification rule; the costs of tweaking the verifier far outweigh the benefits. Let's give some basic trust to our classfile generators and agents.
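[To make the reflection stakes in this thread concrete: if a method_info is present for the implicit constructor, plain reflection finds both constructors with no special casing. The sketch below is hypothetical on two counts: it assumes a Valhalla prototype build that accepts the proposed `value`/`implicit` syntax, and the Point class is invented for illustration; only the reflection calls themselves are standard API.]

```java
import java.lang.reflect.Constructor;

// Hypothetical value class, using the prototype's proposed syntax.
value class Point {
    private int x, y;
    public implicit Point();
    public Point(int x, int y) { this.x = x; this.y = y; }
}

class ReflectOnPoint {
    public static void main(String[] args) throws Exception {
        // With a method_info for the implicit constructor, both
        // constructors are found without special casing:
        for (Constructor<?> c : Point.class.getDeclaredConstructors()) {
            System.out.println(c); // expect Point() and Point(int,int)
        }
        // ...and the no-arg form is invokable like any other constructor:
        Point zero = Point.class.getDeclaredConstructor().newInstance();
        System.out.println(zero);
    }
}
```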
From forax at univ-mlv.fr Wed Jun 14 06:56:38 2023
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Wed, 14 Jun 2023 08:56:38 +0200 (CEST)
Subject: Preload attribute
In-Reply-To:
References: <1890516580.79962441.1686665023237.JavaMail.zimbra@univ-eiffel.fr>
Message-ID: <2041916117.80497404.1686725798303.JavaMail.zimbra@univ-eiffel.fr>

> Preload attempts to load the class, which does cause user-visible side effects - ClassLoader::loadClass is called, for one, which users can observe in a number of ways. JVMTI can expose this info, as can j.l.instrument.Instrumentation::getInitiatedClasses(ClassLoader) and ::getAllLoadedClasses(). I'm sure there are other ways as well.
>
> My point being: it is observable, so we should specify it clearly.

Observability of classloading is an issue that Leyden has to handle; the Preload attribute is just an instance of that issue.

For me, until a class must be initialized, the VM is free to initiate class loading before that point, provided the exception is delayed to appear only at that point.

Rémi
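[Rémi's "free to load early if the error is delayed" position can be illustrated with ordinary classes; the sketch below uses invented class names. Under JVMS 5.4, the VM may load Maybe at any point it likes, as long as any resulting error surfaces only when the program takes an action that actually requires linkage to it.]

```java
// Compile both classes, then delete Maybe.class before running.
class Maybe {
    static void hello() { System.out.println("hello"); }
}

class LazyOrEager {
    public static void main(String[] args) {
        System.out.println("started");   // must print even if Maybe is gone,
                                         // no matter how eagerly the VM loads
        if (args.length > 0) {
            Maybe.hello();               // NoClassDefFoundError surfaces here,
        }                                // regardless of when loading happened
    }
}
```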
From heidinga at redhat.com Wed Jun 14 12:46:21 2023
From: heidinga at redhat.com (Dan Heidinga)
Date: Wed, 14 Jun 2023 08:46:21 -0400
Subject: Preload attribute
In-Reply-To: <2041916117.80497404.1686725798303.JavaMail.zimbra@univ-eiffel.fr>
References: <1890516580.79962441.1686665023237.JavaMail.zimbra@univ-eiffel.fr> <2041916117.80497404.1686725798303.JavaMail.zimbra@univ-eiffel.fr>
Message-ID:

On Wed, Jun 14, 2023 at 2:56 AM forax at univ-mlv.fr wrote:
> Observability of classloading is an issue that Leyden has to handle; the Preload attribute is just an instance of that issue.

Remi, we (the Valhalla EG) don't get to design Leyden's solutions. We need to work within the JVMS or extend it in ways that are appropriate for supporting our efforts. Let's let the Leyden folks solve Leyden problems =)

> For me, until a class must be initialized, the VM is free to initiate class loading before that point, provided the exception is delayed to appear only at that point.

The spec gives us a lot of leeway on when classes are loaded, provided errors are reported at the correct time. One of the major strengths of Java is the specifications and the guarantees they provide to our users. Those guarantees constrain what we JVM implementers can do, but they also provide guide rails for framework authors and app developers to know what behaviour they can rely on from the JVM. When we get too "cute" or clever in our application of the freedoms in the spec, we undermine the guarantees our users need. And we undermine the value of what we're providing.

Preload as an optimization has been a great model for us to get to where we are today - Q's removed while still getting the calling convention optimizations for values. Who would have thought we'd get here given where we started? But it's a model that we need to thank for its service and wish well, Marie Kondo-style. Now we need to spec the behaviour so users can rely on it, or they will try to reverse-engineer rules from today's behaviour that constrain us in the future. Better to author the rules we want than to be constrained by the past's that's-how-it-happened-to-work behaviour.

--Dan
So I think a per-JVM list would be OK >>>> from that perspective (though I still don't like it). >>>> >>>> Agreed. >>>> >>>> To emphasize: A watch list does not require loading. It means, ?if you >>>> see >>>> this name at a point where you could use extra class info, then I >>>> encourage >>>> you to load sooner rather than later?. The only reason it is ?a thing? >>>> at >>>> all is that the default behavior (of loading either as late as >>>> possible, or >>>> as part of a CDS-like thingy) should be changed only on an explicit >>>> signal. >>>> >>>> While true for what the JVM needs, this is hard behaviour to explain to >>>> users and challenging for compliance test writers (or maybe not if we >>>> continue to treat preload as an optimization). >>>> >>>> I?m trying to reduce this to a pure optimization. In that case, ?watch >>>> lists? are just helpers, which are allowed to fail, and allowed to be >>>> garbage. >>>> >>>> Is this where we want to >>>> spend our complexity budget? >>>> >>>> (No, hence it should be an optimization.) >>>> >>>> Part of why I'm circling back to treating >>>> preload as a per-classfile attribute that forms a requirement on the VM >>>> rather than as an optimization is that the model becomes clearer for >>>> users, >>>> developers and testers. >>>> >>>> I think it?s still going to be murky. Why is putting the watch list on >>>> the API clients better than putting it on (or near) the value class >>>> definitions? >>>> >>>> And, hey, maybe CDS is all the primitive we need here: Just run -Xdump >>>> with all of your class path loaded. Et voila, no Preload at all. >>>> >>>> Users may find this behaviour surprising - I ran with a CDS archive and >>>> my >>>> JVM loaded classes earlier than it would have otherwise? >>>> >>>> CDS has the effect of making class loading in a more timely fashion, >>>> and (under Leyden) will almost certainly trigger reordering of loading as >>>> well. So promulgating a ?watch list? has goals which align with CDS. >>>> >>>> I?m starting to think that the right ?level? to pull for optimizing >>>> value-based APIs is to put the value classes in a CDS archive. That is a >>>> defacto watch list. The jlink guy should just make a table of all value >>>> classes. That?s the best form of Preload I can imagine, frankly. >>>> >>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.smith at oracle.com Wed Jun 14 15:46:57 2023 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 14 Jun 2023 15:46:57 +0000 Subject: EG meeting 2023-06-14 Message-ID: EG meeting today, June 14, at 4pm UTC (9am PDT, 12pm EDT). 
Lots of recent discussions that we can revisit:

- "Design document on nullability and value types": Brian shared a document describing the evolution to our current design over the last few months, including making use of nullness rather than '.ref' and '.val' types
- "Preload attribute": Dan H discussing specifying stronger guarantees about the treatment of 'Preload' for Value Objects
- "implicit constructor translation?": Dan H exploring the translation strategy of implicit constructors
- "Minor question about a `MyVal.default`-like syntax": Kevin asks about the distinction between 'Foo.default' and 'new Foo()'
- "The Good Default Value": Kevin discusses the motivation behind default values for primitives

From heidinga at redhat.com  Thu Jun 15 13:31:34 2023
From: heidinga at redhat.com (Dan Heidinga)
Date: Thu, 15 Jun 2023 09:31:34 -0400
Subject: Preload attribute
In-Reply-To: 
References: <1890516580.79962441.1686665023237.JavaMail.zimbra@univ-eiffel.fr>
 <2041916117.80497404.1686725798303.JavaMail.zimbra@univ-eiffel.fr>
Message-ID: 

Following on from our discussion during the EG meeting, I think our current error handling approach (drop the errors) unfortunately violates the spec. Here's the example:

===== CPReuse.java

    public class CPReuse {
        public static void main(String[] args) {
            V.callme();
        }

        public static void forcePreload(V v) { }
    }

===== V.java

    value class V {
        final int i;

        public V() { i = 5; }

        public static void callme() { }
    }

======================

which generates a classfile with:

    Constant pool:
      ...
      #7 = Methodref  #8.#9   // V.callme:()V
      #8 = Class      #10     // V

    public static void main(java.lang.String[]);
      ...
      Code:
        stack=0, locals=1, args_size=1
           0: invokestatic #7  // Method V.callme:()V

    Classes to be preloaded:
      #8;  // value class V

The important thing to note here is that CONSTANT_Class #8 is used by both the preload attribute and the methodref for the invokestatic.

Why is this a problem? JVMS 5.4 Linking says:

    Linking also involves *resolution of symbolic references in the class or interface*, though not necessarily at the same time as the class or interface is verified and prepared. This specification allows an implementation flexibility as to when linking activities (and, because of recursion, loading) take place, provided that all of the following properties are maintained:
    ...
    - *Errors detected during linkage are thrown at a point in the program where some action is taken by the program that might, directly or indirectly, require linkage to the class or interface involved in the error.*

And JVMS 5.4.3 Resolution says:

    ... Subsequent attempts to resolve the symbolic reference always fail with the same error that was thrown as a result of the initial resolution attempt.

So using the existing flexibility in the resolution of symbolic references means we still have to report errors where they may occur in the program. Looking back at the example classfile, we're within the spec to preload Constant_class #8 using the existing rules, but if it fails during preload, we need to poison Constant_class #8 so reuse of it fails with the same exception even if the load of class "V" would succeed later.

This is not the outcome I wanted when digging through the spec - I liked our decision to ignore errors from preload.
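As a minimal sketch of what that poisoning means in practice (a hypothetical driver reusing the V class above; assume V.class is deleted after compilation so the first resolution fails):

    public class ResolutionDemo {
        public static void main(String[] args) {
            for (int i = 0; i < 2; i++) {
                try {
                    V.callme();  // both iterations use the same CONSTANT_Methodref / CONSTANT_Class entry
                } catch (Throwable t) {
                    // Per JVMS 5.4.3, the second attempt must fail with the same error
                    // as the first, even if V.class has been restored in the meantime.
                    System.out.println("attempt " + i + ": " + t);
                }
            }
        }
    }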
--Dan

On Wed, Jun 14, 2023 at 8:46 AM Dan Heidinga wrote:

> On Wed, Jun 14, 2023 at 2:56 AM wrote:
>
>> ------------------------------
>> *From: *"Dan Heidinga"
>> *To: *"Remi Forax"
>> *Cc: *"Brian Goetz" , "John Rose" <john.r.rose at oracle.com>, "daniel smith" , "valhalla-spec-experts"
>> *Sent: *Tuesday, June 13, 2023 5:10:07 PM
>> *Subject: *Re: Preload attribute
>>
>> On Tue, Jun 13, 2023 at 10:13 AM Remi Forax wrote:
>>
>>> ------------------------------
>>> *From: *"Dan Heidinga"
>>> *To: *"Brian Goetz"
>>> *Cc: *"John Rose" , "daniel smith" <daniel.smith at oracle.com>, "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
>>> *Sent: *Tuesday, June 13, 2023 3:31:24 PM
>>> *Subject: *Re: Preload attribute
>>>
>>> On Mon, Jun 12, 2023 at 10:44 AM Brian Goetz wrote:
>>>
>>>> As a reminder, Leyden will give us a more general tool for "moving stuff around" at build time than CDS does, and that the current CDS behavior may well be folded into a set of condensers.
>>>>
>>>> We are trying to find the "perfect" place to put preload information, but we have (as usual) an overconstrained notion of perfection; what makes perfect sense for semantics or non-duplication may not make perfect sense for runtime behavior.
>>>>
>>>> Leyden will let us cut this knot by letting us put the information in the classfile in the semantically sensible place, and let tooling boil it down later at pre-deployment time to a representation that is more efficient for runtime.
>>>>
>>>> So what I suggest is focusing on capturing the source data, which IMO seems to still be some flavor of "class/method X needs to know more about value class V before making certain decisions". Preloading is the mechanism of how we find out that "more", and aggregated representations such as per-module / CDS archives are a rearranging of the source data to achieve a more runtime-friendly representation _for a particular configuration of classes_.
>>>>
>>>> tl;dr: Let's design what captures the semantics we need, and treat computing e.g. optimal load order as a downstream transformation.
>>>
>>> That sounds reasonable.
>>>
>>> I think my original question about how the JVMS treats preload still needs to be addressed though. What guarantees / requirements should we impose on the JVM's handling of preload? The current spec is not clear enough for users to understand what they get from it and is too clever in handing off loading rules to JVMS 5.4's flexibility.
>>>
>>> My current position is we need to specify the behaviour and the point in the loading process where the preload attempts will occur so users can depend on the behaviour. From John's emails, I think he would prefer to see preload become strictly an optimization and be outside the spec (John correct me if I've misstated).
>>>
>>> I'm on John's side: if the VM never reports an error when the Preload attribute is read, the user has no side effect to observe when the attribute is read, so there is no need to specify the exact point where this attribute is read.
>>
>> Preload attempts to load the class, which does cause user-visible side effects - ClassLoader::loadClass is called, for one, which users can observe in a number of ways. JVMTI can expose this info, as can j.l.instrument.Instrumentation::getInitiatedClasses(ClassLoader) & ::getAllLoadedClasses().
>> I'm sure there are other ways as well.
>>
>> My point being it is observable, so we should specify it clearly.
>>
>> Observability of classloading is an issue that Leyden has to handle; the Preload attribute is just an instance of that issue.
>
> Remi, we (the Valhalla EG) don't get to design Leyden's solutions. We need to work within the JVMS or extend it in ways that are appropriate for supporting our efforts. Let's let the Leyden folks solve Leyden problems =)
>
>> For me, until a class must be initialized, the VM is free to initiate class loading before that point, as long as any exception is delayed so that it only appears at that point.
>
> The spec gives us a lot of leeway on when classes are loaded provided errors are reported at the correct time. One of the major strengths of Java is the specifications and the guarantees they provide to our users. Those guarantees constrain what we JVM implementers can do, but they also provide guide rails for framework authors and app developers to know what behaviour they can rely on from the JVM. When we get too "cute" or clever in our application of the freedoms in the spec, we undermine those guarantees our users need. And we undermine the value of what we're providing.
>
> Preload as an optimization has been a great model for us to get to where we are today - Q's removed while still getting the calling convention optimizations for values. Who would have thought we'd get here given where we started? But it's a model that we need to thank for its service and wish it well Marie Kondo-style.
>
> Now we need to spec the behaviour so users can rely on it, or they will try to reverse engineer rules from today's behaviour that constrain us in the future. Better to author the rules we want than to be constrained by the past's that's-how-it-happened-to-work behaviour.
>
> --Dan
>
>> Rémi
>>
>> --Dan
>>
>>> --Dan
>>>
>>> Rémi
>>>
>>>> On 6/12/2023 9:26 AM, Dan Heidinga wrote:
>>>>
>>>> The top-line goal for the preload efforts is to trigger the necessary "go and look" behaviour to support calling convention flattening for values. We want the broadest, most reliable mechanism to ensure that we routinely get flattening in the calling convention for value types so that the flattening horizon can extend beyond a single compiled body (i.e., a method and its inlines).
>>>>
>>>> Summarizing the options presented so far:
>>>>
>>>> A) Value classes should be put into the CDS archive to ensure they are loaded early enough, in a group, and in a form that the VM can quickly discover whether calling convention optimizations apply to them. This involves either a class list to create a static archive (allows jdk classes) or using a dynamic archive with AppCDS. Both cases require a "cold run" to generate the data needed for CDS and only capture classes that have been loaded during that run (I think that's correct?).
>>>>
>>>> B) Use a "Watch List" to list class names that should be looked for. When the name appears, trigger loading early enough to allow calling convention optimizations to apply. Name conflicts are "safe" as the worst case is a class is loaded early in multiple loaders but is only a value in one loader.
>>>> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From daniel.smith at oracle.com  Fri Jun 23 22:36:28 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Fri, 23 Jun 2023 22:36:28 +0000
Subject: Preload attribute
In-Reply-To: 
References: <1890516580.79962441.1686665023237.JavaMail.zimbra@univ-eiffel.fr>
 <2041916117.80497404.1686725798303.JavaMail.zimbra@univ-eiffel.fr>
Message-ID: 

> On Jun 15, 2023, at 6:31 AM, Dan Heidinga wrote:
>
> Following on from our discussion during the EG meeting, I think our current error handling approach (drop the errors) unfortunately violates the spec.
>
> So using the existing flexibility in the resolution of symbolic references means we still have to report errors where they may occur in the program. Looking back at the example classfile, we're within the spec to preload Constant_class #8 using the existing rules but if it fails during preload, we need to poison Constant_class #8 so reuse of it fails with the same exception even if the load of class "V" would succeed later.
>
> This is not the outcome I wanted when digging through the spec - I liked our decision to ignore errors from preload.

Let's consider a vanilla VM, without 'Preload'. According to 5.4, when can an implementation resolve symbolic references?

- At first execution of an instruction that references the constant
- During verification
- Immediately after loading
- ...

In fact, there's no constraint on resolution at all, within these bounds: sometime after loading, sometime before instruction execution (or some similar "blocking" requirement to access the resolution result).

There *are* some additional constraints on surfacing errors: they must be "thrown at a point in the program where some action is taken by the program that might, directly or indirectly, require linkage to the class or interface involved in the error". I'm not entirely sure what that means, but one implication is that resolution might happen at time X, but the error is held until time Y (Y > X) when it is allowed to be thrown. (Implemented, perhaps, by setting the state of the constant as "resolved with error", then leaving it to future resolution attempts to discover and throw the error.)
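A toy sketch of that "resolved with error" bookkeeping in plain Java (the CpClassEntry type is invented here for illustration; it is not how any particular JVM spells its metadata):

    // Models one CONSTANT_Class entry that remembers its resolution outcome.
    final class CpClassEntry {
        private Object outcome;  // null = unresolved; otherwise a Class<?> or a LinkageError

        Class<?> resolve(ClassLoader loader, String name) {
            if (outcome == null) {
                try {
                    outcome = Class.forName(name, false, loader);
                } catch (ClassNotFoundException e) {
                    outcome = new NoClassDefFoundError(name);  // poison the entry
                } catch (LinkageError e) {
                    outcome = e;                               // poison the entry
                }
            }
            if (outcome instanceof Class<?> c) return c;
            throw (LinkageError) outcome;  // JVMS 5.4.3: same error on every later attempt
        }
    }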
Within that framework, my interpretation of Preload is that it (1) forces some symbolic references into the constant pool, giving VMs the opportunity to resolve them whenever it's convenient (subject to the usual rules); and (2) encourages VMs to pay special attention to those references, perhaps resolving them earlier than other references. But this doesn't change anything about what the spec allows vs. the equivalent class file that has had its Preload attribute removed.

What would be invalid, according to JVMS, is to attempt to resolve something mentioned by 'Preload', get an error, discard the error, and then come back later and attempt to resolve it again (perhaps succeeding this time).

So that's my read of the status quo. What about this do you specifically think we should change?

From heidinga at redhat.com  Mon Jun 26 13:58:16 2023
From: heidinga at redhat.com (Dan Heidinga)
Date: Mon, 26 Jun 2023 09:58:16 -0400
Subject: Preload attribute
In-Reply-To: 
References: <1890516580.79962441.1686665023237.JavaMail.zimbra@univ-eiffel.fr>
 <2041916117.80497404.1686725798303.JavaMail.zimbra@univ-eiffel.fr>
Message-ID: 

On Fri, Jun 23, 2023 at 6:36 PM Dan Smith wrote:

> > On Jun 15, 2023, at 6:31 AM, Dan Heidinga wrote:
> >
> > Following on from our discussion during the EG meeting, I think our current error handling approach (drop the errors) unfortunately violates the spec.
> >
> > So using the existing flexibility in the resolution of symbolic references means we still have to report errors where they may occur in the program. Looking back at the example classfile, we're within the spec to preload Constant_class #8 using the existing rules but if it fails during preload, we need to poison Constant_class #8 so reuse of it fails with the same exception even if the load of class "V" would succeed later.
> >
> > This is not the outcome I wanted when digging through the spec - I liked our decision to ignore errors from preload.
>
> Let's consider a vanilla VM, without 'Preload'. According to 5.4, when can an implementation resolve symbolic references?
>
> - At first execution of an instruction that references the constant
> - During verification
> - Immediately after loading
> - ...
>
> In fact, there's no constraint on resolution at all, within these bounds: sometime after loading, sometime before instruction execution (or some similar "blocking" requirement to access the resolution result).
>
> There *are* some additional constraints on surfacing errors: they must be "thrown at a point in the program where some action is taken by the program that might, directly or indirectly, require linkage to the class or interface involved in the error". I'm not entirely sure what that means, but one implication is that resolution might happen at time X, but the error is held until time Y (Y > X) when it is allowed to be thrown. (Implemented, perhaps, by setting the state of the constant as "resolved with error", then leaving it to future resolution attempts to discover and throw the error.)

Agreed. Once resolution of a symbolic reference to a class fails, it must always fail.

> Within that framework, my interpretation of Preload is that it (1) forces some symbolic references into the constant pool, giving VMs the opportunity to resolve them whenever it's convenient (subject to the usual rules); and (2) encourages VMs to pay special attention to those references, perhaps resolving them earlier than other references. But this doesn't change anything about what the spec allows vs. the equivalent class file that has had its Preload attribute removed.

Also agreed. My concern was that during our EG discussion, we had indicated that Preload-related errors were dropped. Not reporting them at the preload point is fine, but the spec requires that failed preloads poison the symbolic reference (i.e., the Constant_Class constant pool entry) so future attempts to use it also fail. It sounds like you and I are interpreting the current spec the same way, but that it's more subtle than we expressed on the EG call.

> What would be invalid, according to JVMS, is to attempt to resolve something mentioned by 'Preload', get an error, discard the error, and then come back later and attempt to resolve it again (perhaps succeeding this time).

100%

> So that's my read of the status quo. What about this do you specifically think we should change?

As I said above, I think you and I agree on how error handling has to happen, and it's unfortunately more subtle than just "drop the errors". I would still like to see a defined point in the loading / linking sequence where the VM has to have attempted the preloads, as it makes a stronger guarantee for users. The current approach feels very much like a hack to work around limitations in how calling conventions are bound (and something we'd be happy to see go away in the future) rather than a principled addition to the VM's behaviour.

--Dan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From daniel.smith at oracle.com  Wed Jun 28 15:23:20 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 28 Jun 2023 15:23:20 +0000
Subject: EG meeting *canceled* 2023-06-28
Message-ID: <3B7A29BE-E2E8-4854-A9B5-BBC2E94746DB@oracle.com>

I'm on vacation today, so let's cancel the meeting. (If anybody shows up and wants to chat, feel free...)

From brian.goetz at oracle.com  Fri Jun 30 20:51:43 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 30 Jun 2023 16:51:43 -0400
Subject: We don't need no stinkin' Q descriptors
Message-ID: 

This mail summarizes some discussions we've been having about eliminating Q descriptors from the VM design.
Over time, we've been giving Q fewer and fewer jobs to do, to the point where (perhaps surprisingly) we can replace the remaining jobs with less intrusive mechanisms. Additionally, as the language model has simplified, the gap between the language and VM has increased, and the proposal herein offers a path to narrowing that gap.

I'll be on vacation for a while, but Dan and John will be able to carry forward this discussion. Please bear in mind that this is a very rough draft of direction; we don't need to bikeshed anything right now, as much as agree that there is a better, simpler, more aligned direction than we had previously.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From brian.goetz at oracle.com  Fri Jun 30 20:52:33 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 30 Jun 2023 16:52:33 -0400
Subject: We don't need no stinkin' Q descriptors
In-Reply-To: 
References: 
Message-ID: 

In case the HTML got mangled by the mailer, I enclose the markdown original here.

# We don't need no stinkin' Q types

In the last six months, we made a significant breakthrough at the language/user level -- to decompose B3, with its value and reference companions, into two simpler concepts: implicit constructibility (a declaration-site property) and null restriction (a use-site property). The .ref/.val distinction, and all its excess complexity, stemmed from the mistaken desire to model the int/Integer divide directly. By breaking B3-ness down into more "primitive" properties (some of which are shared with non-B3 classes), we arrived at a simpler model; no more ref/val projections, and more uniform treatment of X! (including for B1 and B2 classes).

As we worked through the language and translation details, we continued to seek a lower energy state. We concluded that we can erase `X!` to `LX;` in a number of places (locals, method descriptors, verifier type system) while still meeting our performance objectives. Doing so eliminates a number of issues with method resolution and distinguishing overloads from overrides. In fact, we found ourselves using Q for fewer and fewer things, at which point we started to ask ourselves: do we need Q descriptors at all?
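As a sketch of that overriding point (hypothetical source, using the `X!`/`X?` syntax from this note -- not compilable today):

```java
interface Mover {
    void move(Point? p);                // nullable parameter
}

class FastMover implements Mover {
    // Both signatures erase to move(LPoint;)V, so this is an ordinary
    // override of move(Point?), not a separate overload.
    public void move(Point! p) { /* p is known to be non-null here */ }
}
```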
In our VM, there is a (mostly) 1-1-1 correspondence between runtime types, descriptors, and class mirrors. In a world where QFoo and LFoo are separate runtime types, it makes sense for them to have their own descriptors and mirrors. But as `Foo!` and `Foo?` have come together in the language, mapping to a VM which sees them as separate runtime types starts to show gaps.

The role of Q has historically been one of "other", rather than something on its own; any class which had a Q type also had an L type, and Q was the "other flavor." The "two flavors" orientation made sense when we were modeling the int/Integer split; we needed two flavors for that in both language and VM. The language has since discovered that we can break down the int/Integer divide into two more primitive notions -- implicit constructibility (an int can be used without calling a constructor, an Integer cannot) and non-nullity (non-identity plus default constructibility plus non-nullity unlocks flattening.)

If Q is a valid descriptor and there is always a Q mirror, we are in a stable place with respect to runtime types. But if we intend to allow `m(Foo!)` to override `m(Foo?)`, to be tolerant of bang-mismatches in method resolution, and give Q fewer jobs, then we are moving to an unstable place. We've explored a number of "only use Q for certain things" positions, and have found many of them to be unstable in various ways. The other stable point is that there are no Q types, and no Q mirrors -- but then we need some new channel to encode the request to exclude null, and so give the VM the flattening hint that is needed.

As it turns out, there are surprisingly few places that truly need such a new channel. We basically need the VM to take "Q-ness" into account in three places:

- Field layout -- a field of type `Foo!` (where Foo is implicitly constructible) needs a hint that this field is null-restricted, so we can lay it out flat.
- Array layout -- at the point of `anewarray` and friends, we need a hint when the component type is an implicitly-constructible, null-restricted type.
- Casting -- casts need to be able to express a value-set check for the restricted value set of `Foo!` as well as the unrestricted value set of `Foo`.

We are convinced that these three are all that is truly required to get the flattening we want. So rather than invent new runtime types / mirrors / descriptors that are going to flow everywhere (into reflection, method handles, verification, etc), let's invent the minimal additional classfile surface and VM model to model that. At the same time, let's make sure that the new thing aligns with the new language model, where the star of the show is null-restricted types.
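A small source-level sketch of those three places (again hypothetical syntax; `Point` is assumed to be an implicitly constructible value class):

```java
value class Point { private int x, y; public implicit Point(); }

class Holder {
    Point! origin;                      // (1) field layout: null-restricted, may be laid out flat
    Point![] grid = new Point![100];    // (2) array layout: anewarray gets the null-restriction hint
    Point! recheck(Object o) {
        return (Point!) o;              // (3) cast: value-set check that excludes null
    }
}
```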
#### What about species?

In separate investigations, we have had a notion of "species" for a long time, which we know we're going to need when we get to specialization. Species form a partition of a class's instances; every instance of a class belongs to exactly one species, and different species may have different layouts and value set restrictions. And we struggled with species for a long time over the same runtime type affordances (mirrors and descriptors) -- what does a field descriptor for a field of type `ArrayList<int>` look like? What does `getClass` return?

In both cases, the constraints of compatibility have been pushing us towards more erasure in descriptors and reflection, with side channels to reconstruct information necessary for optimized heap layout, and with separate API points for `getClass` vs `getSpecies`. While specialization is considerably more complicated, nearly all the same considerations (descriptors, mirrors, reflection) are present for null-restriction types. We took an earlier swing at unifying the two under the rubric of "type restrictions", but I think our model wasn't quite clean enough at the time to admit this unification. But I think we are now (almost) there, and the payoff is big.

What we concluded around species and specialization is that we would have to continue to erase descriptors (`ArrayList<int>` as a method or field descriptor continues to erase to `LArrayList;`), that `getClass` returns the primary mirror (`ArrayList`), and that species information is pushed into a side channel. These are pretty much the exact same considerations as for null-restriction types.

#### Species and bang types are _refinement types_

A _refinement type_ is a type whose value set is that of another type, plus a predicate restricting the value set. A "bang" type `Point!` is a refinement of Point, where we eliminate the value `null`. (Other well-known refinement types from PL history include C enums and Pascal ranges.) Refinement types are often erased to their base type, but some refinements enable better layout. Indeed, our interest in Q types is flattening, and for an implicitly constructible class, a variable holding a null-excluding type can be flattened. Similarly, for a sufficiently constrained generic type (e.g., `Point[int,int]`), the layout of such a variable can be flattened as well.

What we previously called "type restrictions" in the [Parametric VM](https://github.com/openjdk/valhalla-docs/blob/main/site/design-notes/parametric-vm/parametric-vm.md#type-restricted-methods-and-fields-and-the-typerestriction-attribute) document is in fact a refinement type. We claim that we can design the null-restriction channel in such a way that it can be extended, in some reasonable way, to support more general specialization.

Both specialization and null-restriction are forms of refinement types. Given that we've already discovered that we need to erase these to their primary (L) type in a lot of places, let's stake out some general principles for representing refinements in the VM:

- Refinement types are erased to their base type in method and field descriptors.
- Refinement types do not have _class_ mirrors.
- `Object::getClass` returns a class mirror.
- Reflection deals in class mirrors, so refinements are erased from base reflection.
- Method handles deal in class mirrors, so refinements are erased from method handles.

That's a lot of erasure, so we have to bake refinement back in where it matters, but we want to be careful to limit the "blast radius" of the refinement information to where it does actually matter. The new channel that encodes a refinement type will appear only when needed to carry out the tasks listed above: field declaration, array creation, and casting.

- Fields are enhanced with some sort of "refinement" attribute, which (a) guards against stores of bad values (the field equivalent of `ArrayStoreException`) and (b) enables flatter layouts when the refinement permits.
- Array creation (`anewarray` / `multianewarray`) is enhanced to support creating arrays with refined component types, enabling the same benefits (storage safety / layout flattening).
- Casting is enhanced to support refinements. This is needed mostly because of erasure -- we are erasing away refinement information and sometimes need to reassert it.
- When we get to specialization, `new` is enhanced to support refinements, and possibly method declarations (to enable calling convention optimization in the presence of highly specialized types like `Point[int,int]`).
the presence of highly specialized types like `Point[int,int]`.) We had previously been assuming that `[QPoint` is somehow more of a "real" type than (specialized) `Point[int,int]`, but I think we are better served seeing them both as refinements, where we continue to report a broad type but sort-of-secretly use refinement information to optimize layout. ## A strawman What follows is a strawman that eliminates Qs completely, replacing the few jobs Q has (field layout, array layout, and casts) with a single mechanism for refinement types which stays in the background until explicitly summoned. We believe the model outlined here can extend cleanly to species, as well as `B1!` types like `String!` as well.? Call this No-Q world.? This should not be taken as a concrete proposal, as much as a sketch of the concepts and the players. We have come to believe that adding Q descriptors to the JVM specification, while perhaps the right move in a from-scratch VM design, would be overreach as an evolutionary step.? For old APIs to adopt new descriptors will require many bridge methods with complex properties.? To avoid such bridges, old APIs would be forbidden from mentioning the new types.? For these reasons, new descriptors, and the mirrors that would accompany them, are quite literally a bridge too far. Accordingly, in No-Q world, descriptors reclaim their former role: describing primitives and classes.? Field and method descriptors will use `L` descriptors, even when carrying a null-restricted value (or a species.) Similarly, class mirrors return to their former role: describing classfiles and non-refined VM-derived types (such as array types.) As a self-imposed rule of this essay, we will not appeal to runtime support, condy or indy. Everything will be done with bytecodes, descriptors, constant pool entries, and other classfile structures, and not via specially-known methods.? As this is a strawman, we may indulge in some "wasteful" design, which can be transformed or lumped in later iterations.? The new elements of the design are: ?- A new reflective concept for `RefinementType`, which represents a refinement ?? of an existing (class) type. ?- A new reflective concept for `RepresentableType`, which is the common ?? supertype between `Class` and `RefinementType`. ?- New constant pool forms representing null-restriction of classes and of ?? arrays. ?- A new field attribute called `FieldRefinement`. ?- Adjustments to various bytecodes to interact with the new constant pool ?? forms. ?- Additions to reflective APIs. ## Refined types A refined type is a combination of a type (called the base type) and a value set restriction for that type which excludes some values in the value set of the base type.? Null-restricted types, arrays of null-restricted types, and eventually, species of generics are refined types. Refined types can be represented by a reflective object ``` sealed interface RefinementType implements RepresentableType { ??? RepresentableType baseType(); } ``` The type parameter `T` represents the base type. There are initially two implementations of `RefinementType`, which may be private, and are known to the VM: ``` private record NullRestrictedClass(Class baseType) ??????? implements RefinementType { } private record NullRestrictedArray(Class baseType) ??????? implements RefinementType { } ``` #### Constant pool entries The two jobs for null restriction must be representable in the constant pool: a null-restricted B3, and an array of a null-restricted B3.? 
(These correspond to `Constant_Class_info` with a descriptor of `QFoo;` and `[QFoo;` in the traditional design.) In addition to being referenced by bytecodes and attributes, such constants should ideally be loadable, evaluating to a `RefinementType` or `RepresentableType`. The exact form of the constant pool entry (whether new bespoke constant pool entries, ad-hoc extensions to `Constant_Class_info`, or condy) can be bikeshedded at the appropriate time; there are clearly tradeoffs here.

Initially, null-restricted types must be implicitly constructible (B3), which would be checked when the constant is resolved. Eventually, we can relax null-restriction to support all class types. Similarly, we may initially restrict to one-dimensional flat arrays, and leave `multianewarray` to its old job.

#### Representable types

The new common superinterface between `Class` and `RefinementType` exists so that both classes and class refinements can be used as array components, type parameters for specializations, etc. Some operations from `Class`, such as casting, may be pulled up into this interface.

```
sealed interface RepresentableType<T> {
    T cast(Object o) throws ClassCastException;
    ...
}
```

#### Refined fields

Any field whose type is a null-restricted, implicitly constructible class may be considered by the VM as a candidate for flattening. Rather than using `field_info.descriptor_index` to encode a null-restricted type, we continue to erase to the traditional `L` descriptor, but add a `FieldRefinement` attribute on the field. Similarly, `Constant_FieldRef_info` continues to link fields using the `L` descriptor.

```
FieldRefinement {
    u2 name_index;        // "FieldRefinement"
    u4 length;
    u2 refinement_index;  // symbolic reference to a RefinementType
}
```

The symbolic reference must be to a null-restricted, implicitly constructible class type, not an array type. We may relax this restriction later.

Additionally, a field refinement may affect the behavior of `putfield`. For a null-restricted class, attempts to `putfield` a null will result in `NullPointerException` (or perhaps a more general `FieldStoreException`). Looking ahead, for the null-restriction of a B1 or B2 class, there is no change to the layout, but we could still enforce the storage restriction on `putfield`. When we get to species, the refinement for a species may affect the layout, and attempting to store a value of the wrong species may result in an exception or in an automatic conversion.

It is a free choice as to whether we want to translate a field of type `Point![]` using an array refinement or fully erase it to `Point[]`.

#### Refined casts

The operand of a `checkcast` or `instanceof` may be a symbolic reference to a class or refinement. (Since `instanceof` is null-hostile, changing `instanceof` is not necessary now, but when we get to species, we will need to be able to test for species membership.) The `cast` operation may be pulled up from `Class` to `RepresentableType` so that casts can be done reflectively with either a `Class` or a refinement.

#### Refined array creation

An `anewarray` may make a symbolic reference to a class refinement type, as well as to a class, array, or interface type.
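To give a feel for how these pieces could hang together at the reflective level, here is a hypothetical sketch (the `NullRestrictedClass.of` factory is invented for illustration -- the strawman keeps the implementations private, and such a refinement would more likely be obtained from a loadable constant):

```
// Obtain a refinement representing Point! (hypothetical factory):
RepresentableType<Point> pointBang = NullRestrictedClass.of(Point.class);

// Casting through the refinement checks the restricted value set:
Point p = pointBang.cast(new Point(1, 2));  // ok
Point q = pointBang.cast(null);             // throws: null is excluded

// Creating a flat, null-restricted Point![] reflectively, via the
// Array::newInstance overload described below:
Object arr = Array.newInstance(pointBang, 10);
```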
For a refined array, `a.getClass()` continues to return the primary mirror for the array type, and `Class::getComponentType` on that array continues to return the primary mirror for the component type, but we may provide an additional API point akin to `getComponentType` that returns a `RepresentableType`, which may be a `RefinementType`.

Arrays of null-restricted values can be created reflectively; the existing `Array::newInstance` method will get an overload that takes `RepresentableType`. `Arrays::copyOf`, when presented with a refined array type, will create a refined array.

#### Refinement information stays in the background until summoned

The place where we need discipline is avoiding the temptation of "but someone might profitably use the information that this field holds a flat array." Yes, they might -- but supporting that as a general-purpose runtime type (with descriptor and mirror) has costs. The model proposed here resists the temptation to redefine mirrors, descriptors, symbolic resolution, and reflection, instead leaning on erasure for both null-restriction and specialization, and providing a secondary reflective channel (which almost no users will actually need) to get refinement information. (An example of code that needs to summon refinement information is `Arrays::copyOf`, which would need to fetch the refined component type and instantiate an array using the refined type; most other reflective code would not need to even be aware of it.)

#### Bonus round: specialization

The framework so far seems to accommodate specialization fairly well. There will be a new subtype of `RefinementType` to represent a specialization, a reflective method for creating such specializations, such as:

    static SpecializedType specialization(Class<?> baseClass, RepresentableType<?>... arguments)

and a new way to get such a type refinement in the constant pool (possibly just a condy whose bootstrap is the above method). The `new` bytecode is extended to accept a specialization refinement. Field refinements would then be able to refer to specialization refinements.

## Conclusions

In the current world we have a (mostly) 1:1:1 relationship between runtime types, descriptors, and mirrors; a model where species/refinements are not full runtime types preserves this. The surface area where refinement information leaks to users who are not prepared for it is dramatically smaller.

Refinements are not full runtime types, so they don't have full `Class` mirrors. We erase down to real runtime types in descriptors and in reflective API points like `Object::getClass`. This seems a powerful simplification, and one that aligns with the previous language simplification. To summarize:

- Yes, we should get rid of Q descriptors, but we should do so in a more
  principled way, by getting rid of Q as a runtime type entirely and replacing
  it with a refinement type which stays in the background until it is actually
  needed.

- We should erase Q from method and field descriptors and from the obvious
  mirrors, because refinement information is on a need-to-know basis.

- Refinement information primarily flows from source -> classfile -> VM, and
  mostly does not flow in the other direction. Specialized reflection might
  expose it, but we should do so not on general principles, but based on where
  it is actually needed by the programming model.

- Null restriction is more like specialization than not; they are both
  value-set refinements that possibly enable layout optimization, and we
  should seek to treat them the same.

- While leaving the door open for additional kinds of species and type
  migration, we use our new powers, at first, only to define flattenable
  fields and flattenable one-dimensional arrays.

From john.r.rose at oracle.com  Fri Jun 30 23:57:54 2023
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 30 Jun 2023 16:57:54 -0700
Subject: We don't need no stinkin' Q descriptors
In-Reply-To: 
References: 
Message-ID: <6367D637-CE41-4502-BD9B-3580FA5970DE@oracle.com>

This is a major step forward. I have three sets of comments overall.

# Looking back

First a bit of history here, as I recall it: When we started working on VM support for Valhalla, I remember very early conversations (2015-ish), involving Brian, Mark R., Guy S., Doug L., Dan H., Karen K., Dan S., etc., in Burlington MA and Ottawa (IBM). During these conversations we had all manner of crazy ideas (as we still do today, TBH), including ornate new syntaxes for descriptors. Brian made the point that we should pick just one L-like descriptor to describe the new flavor of data, and so Q was born. Brian further said something to this effect, IIRC: "We won't necessarily keep the Q forever, but it will help us, during prototyping, to clearly map all of the places where value-ness needs to be tracked." I remember thinking, "OK, but we'll never get rid of it; it's too obviously correct."

One result of this was we were able to define everything about values in the VM more crisply and clearly. Another result was yearly struggle sessions about how we were ever going to handle migration of Optional, LocalDate, etc. I'm surprised and glad that we have come to a place of maximum erasure, where (a) all the places where Q-ness needs mapping have been mapped, and yet (b) there is now no remaining migration problem (despite no investment in high-tech API bridges).

Along the way Dan S. started quietly talking about Type Restrictions, which seemed (at first) to be some high-tech ceremony for stuff that could just as easily be spelled with Q's. I'm glad he persisted, because now they seem to be the right-sized bucket in which to place the Q-signals, after Q's go away.

So, although I am wistful to see the clarity of Q's go, it is more with nostalgia than regret. We have the clarity they bought us. And (bonus) they seem to dovetail with the next giant task of Valhalla, which is coping with generic data structure specialization (`List<int>`).

## Avoiding the slippery slope

Next, I want to point out that part of the trick of doing this well is not doing too much all at once. It's not straightforward. Our newly won insights make it clear that we could do for `String!` what we propose for `Point!`, but if we take such incremental RFEs as they occur to us we will, in fact, be falling down a slippery slope towards a Big Bang of VM functionality that gets deferred further and further. (A Big Crunch would be a more likely outcome, frankly. Happily, we have learned to deliver incrementally, yes?)

I would like to restate from Brian's proposal a guiding principle to keep us off the slippery slope, until such time as we agree to take the next steps downward. I think one key principle here is to embrace erasure, and hide the presence of new refinement types from legacy code.

(A nit: We should pick a phrase and stick with it. "Type refinements" or "refined types" are fine phrases, but it's not clear they are exact synonyms with "refinement types".
Rather arbitrarily, I prefer "refinement type", perhaps because it points to two realities: It's a type, and there was a refinement decision made.)

Here is a complementary principle: In the VM, we should choose to support exactly and only those refinement types that support Valhalla's prime goal, which is data structure improvement (flattening). Since `String!` doesn't (yet) have a flattening story, `String!` should not be a (VM) representable type. Since `Integer!` is already covered by `int`, neither should `Integer!` be a (VM) representable type. (A programmer may get fewer mirrors than expected, but note that we are not adding any mirrors at all!)

Although `Point![]` is a useful specialized data structure, `Point![][]` is not so useful; its usefulness stems from the structure of its components, not its own top-level structure. Therefore, making a distinction between `Point![][]` and `Point![]![]` (and `Point![][]!` and so on) is bookkeeping which we would have to pay for but which wouldn't pay us back.

This takes me to the following specific points about the Big Three use cases:

- Field declaration -- the refinement type can only be of the form `B3!`
- Array creation -- the component type can only be of the form `B3!`
- Casting -- the cast type can be either `B3!` or `B3![]`

I think Brian covered all that, except for the following lines, which I think are a mis-step (down that slope I mentioned):

> It is a free choice as to whether we want to translate a field of type
> `Point![]` using an array refinement or fully erase it to `Point[]`.

If we support `multianewarray` then it must take a CP reference to `B3![]`. But I don't think that pulls its weight, so let's not.

Why does `checkcast` get extra powers not enjoyed by the other two use cases? I think the answer is pretty simple: `checkcast` is the last resort for a Java compiler's T.S. (translation strategy); if some type cannot be represented on a VM container (and enforced by the verifier) then either it cannot be safely cast (leading to "unchecked" warnings) or else it must be dynamically checked (requiring a `checkcast`). In order for a Java cast like `(Point!)x` to be efficient, it seems that `checkcast` should pick up the job in one go, rather than require the T.S. to emit first `Objects::requireNN` and then a `checkcast`. (Note also our self-imposed rules for avoiding library dependencies...) And having `(Point!)x` be unchecked would be far too surprising, yes?

The case for an effective cast of the form `(Point![])a` is perhaps less obvious, but it is very useful (from my VM perspective) to let the programmer use it to communicate flattening intentions outside of a loop, before the individual `Point` values are read or written. So the T.S. puts a dynamic check on an initialized `Point![]` variable, and then all the downstream code can "know" that flat access is being performed. Note that this design pattern works great for multi-dimensional arrays (at source level), except that the type `Point![][]` is uncheckable. I'm not sure how to explain this gap to users, but the VM-level reality is that the optimizations for flat access care only about arrays of dimension one, so I'm happy the gap is there. I hope we won't be forced to fill it, because that will cause a large set of new compliance tests and a bug tail.

```
Point![][] a2d = ...;    // T.S. cannot put checkcast on a2d, I hope
for (var a1d : a2d) {    // T.S. puts checkcast on each a1d, I hope
    for (var x : a1d) {
        ... process x ...
    }
}
```

Another likely use of a `checkcast` of both kinds of type is when the T.S. emits code to load from a field of type `B3!` or `B3![]`.

```
class C {
    B3! x;
    B3![] a;
}
...
C c = ...;
var x = c.x;             // T.S. could put a checkcast here, if it helps
var a = c.a;             // ditto
c.x = x.nextX();         // T.S. is very likely to put a checkcast here
c.a = Arrays.copyOf(a);  // ditto
```

Exactly where to put each `checkcast` (and where not to bother) is an interesting question; perhaps it's too much work to place them on every read of a field. (I think it's a good idea, because redundant checks are free in the VM and earlier checks are better than later ones.) But it seems very likely that at least field writes will benefit from checkcasts, for all types that are representable. And note that the type of `new B3![]` is representable. Its class will be `B3[].class`, but its representable type will be something like `NullRestrictedArray.of(B3.class)`.

## Healing the rift?

One goal that is being held loosely at this moment is the old promise of Valhalla to "heal the rift" between `int` and `Integer`. (More generally, between primitives and references.) We've come this far; are we going to give up on that goal now?

By choosing not to allow `RefinementType` to mix with `Class`, we are committing to leaving `int.class` and other primitive classes (and `void.class`) by themselves as outliers among the "proper" classes and interfaces (`C.class`, `I.class`) and "array classes" (`T[].class`). That's not a rift-healing move, but it doesn't have to interfere with other rift-healing moves that we *could* do.

I don't think there is a rift-healing move we could do with field declarations, since flat `int` fields are already fully supported.

Although it is technically an incompatibility, we might consider allowing legacy `int[]` arrays to interoperate with `Object[]`, so that more generic code can be written. That would be close to the spirit of allowing `B3![]` arrays to be viewed transparently as possibly-null-tolerant `B3[]` arrays. But there is definitely a slippery slope here. Should `int[]` be a subtype of `Object[]`? I think that also would be required. I would like to do this, if possible. (There is no cause to ask that `int`, which isn't even a reference type, should somehow be made to look like a subtype of `Integer`.)

One rift-widening move I'd like to avoid is introducing a third representable type, between `int` and `Integer`, for the purpose of making flat arrays of `Integer` that are not `int` arrays. Any "value-ification" of `Integer` should avoid that trap. Rather, `Integer![]`, if it is representable at all, should be `int[]`. I guess Dan S. is tracking these issues; I don't recall them being discussed recently, but maybe they will ripen after we get closure on the bigger questions about Q.

There is another place where a "heal the rift" move might make sense, and that is in the API for `Class`. Brian suggests that perhaps the `Class::cast` method could be lifted to `RepresentableType`. That will make it easier to reflectively emulate `checkcast` instructions, but it will give wider exposure to an existing sharp edge in the `Class` API, which is the non-functionality of primitive mirrors. (I suppose Brian's mention of lifting `cast` is why I'm getting into the question of "healing the rift" at all. Pulling on that string brings us to that rift, IMO.)
I mean that the call `int.class.cast(x)` does not work, and lifting that non-behavior up to `RepresentableType` will make a new and unwelcome distinction between `B3!` and `int`: The mirror for `B3!` would (presumably) do a null check and cast to `B3`, while the mirror for `int` would fail. Here are options to handle this sharpened edge (a sketch of the last option follows below):

- Leave it as is. Sorry if you accidentally used `int.class`.
- Enhance `int.class::cast` (and `isInstance`) to check for `Integer`.
- Deem `cast` not liftable; make a new `RepresentableType::checkType` (and
  `isType`), and have it be total over B1/B2/B3 and primitives (B0?).

Enhancing `int.class::cast` is arguably in the same spirit as allowing `int[]` to be a subtype of `Object[]`. But I think I prefer the last. In any case, I don't look forward to a widening rift between primitives (B0!) and the other types.
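To make the last option concrete, here is a minimal sketch of how a total `checkType` might differ from `Class::cast` for a primitive mirror (the boxing behavior shown is an assumption about what "total over primitives" would mean, not settled API):

```
// Today, primitive mirrors are non-functional for casting:
int.class.cast(42);     // throws ClassCastException (42 boxes to Integer, not int)

// Hypothetically, the representable type for int could be total,
// accepting the corresponding box instead of failing:
RepresentableType<Integer> intType = ...;   // the representable type for int
Integer i = intType.checkType(42);          // ok: accepts the Integer box
boolean ok = intType.isType("abc");         // false, rather than throwing
```

This would leave `Class::cast` untouched while giving the lifted operation sensible behavior for every representable type.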