From daniel.smith at oracle.com Wed Jul 8 22:08:18 2020
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 8 Jul 2020 16:08:18 -0600
Subject: JEP draft: Identity Warnings for Inline Class Candidates
Message-ID:

Here's an initial JEP draft for the "Identity Warnings for Inline Class Candidates" feature, which I'm hoping we can target to 16.

https://bugs.openjdk.java.net/browse/JDK-8249100

Feedback is welcome.

From amalloy at google.com Wed Jul 8 23:37:30 2020
From: amalloy at google.com (Alan Malloy)
Date: Wed, 8 Jul 2020 16:37:30 -0700
Subject: JEP draft: Identity Warnings for Inline Class Candidates
In-Reply-To:
References:
Message-ID:

I had to squint a fair bit at "to use the new bytecode" before I realized that new is intended to be monospaced, referring specifically to the JVM bytecode instruction named new. At first I thought it was somehow talking about "bytecode that is new" in general, e.g. classfiles that were compiled recently could be "new" in some sense. Is there some way to refer specifically to bytecode instructions?

Otherwise I'm a fan. Looks like a clear description of a feature I'm happy to see coming.

On Wed, Jul 8, 2020 at 3:08 PM Dan Smith wrote:

> Here's an initial JEP draft for the "Identity Warnings for Inline Class
> Candidates" feature, which I'm hoping we can target to 16.
>
> https://bugs.openjdk.java.net/browse/JDK-8249100
>
> Feedback is welcome.

From daniel.smith at oracle.com Thu Jul 9 05:47:45 2020
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 8 Jul 2020 23:47:45 -0600
Subject: JEP draft: Identity Warnings for Inline Class Candidates
In-Reply-To:
References:
Message-ID: <3FB787A2-5409-4A9F-B9B7-BEDDA9400041@oracle.com>

> On Jul 8, 2020, at 5:37 PM, Alan Malloy wrote:
>
> I had to squint a fair bit at "to use the new bytecode" before I realized that new is intended to be monospaced, referring specifically to the JVM bytecode instruction named new.
> At first I thought it was somehow talking about "bytecode that is new" in general, e.g. classfiles that were compiled recently could be "new" in some sense. Is there some way to refer specifically to bytecode instructions?

Haha, yeah, 'new' is monospaced, but it's so short that the font change is subtle. Changed to "the opcode 'new'", which is hopefully a little less ambiguous.

From mcnepp02 at googlemail.com Fri Jul 10 10:06:32 2020
From: mcnepp02 at googlemail.com (Gernot Neppert)
Date: Fri, 10 Jul 2020 12:06:32 +0200
Subject: Clarification needed about primitive wrappers?
Message-ID:

Hello,

it seems some clarification is needed about the fate of the primitive wrappers in "Valhalla-world". In this and the related Mailing Lists, you can find the following two proposals, with subtle differences:

1. the primitive wrappers (java.lang.Integer etc) are designated to become inline classes. This idea has been most recently cited in the posting "Identity warnings for inline class candidates".

2. the primitive wrappers should become the reference-projections of corresponding inline classes. This has sometimes been augmented with the idea that the denominations for the primitive types (such as "int") will then become aliases for those new inline types.

So, what's it going to be? Right now, only proposal 2 makes much sense to me, because proposal 1 would simply add a second, redundant "inline" type for each primitive (also inherently "inline") type.

But then, if we are going with proposal 2, what would be so special about the reference-projections of the primitive types? Shouldn't all reference projections be treated equally? Wouldn't it mean that synchronizing on an IdentityObject obtained via reference-projection should always warrant a warning?

It may well be that all this is perfectly clear to you experts, however, it might still be advantageous to use consistent wording everywhere!
From forax at univ-mlv.fr Fri Jul 10 12:53:48 2020
From: forax at univ-mlv.fr (Remi Forax)
Date: Fri, 10 Jul 2020 14:53:48 +0200 (CEST)
Subject: Clarification needed about primitive wrappers?
In-Reply-To:
References:
Message-ID: <2135504190.1360493.1594385628115.JavaMail.zimbra@u-pem.fr>

>From valhalla-spec-observers,

----- Original Message -----
> From: "Gernot Neppert"
> To: "Valhalla Expert Group Observers"
> Sent: Friday, July 10, 2020 12:06:32
> Subject: Clarification needed about primitive wrappers?

> Hello,
>
> it seems some clarification is needed about the fate of the primitive
> wrappers in "Valhalla-world".
> In this and the related Mailing Lists, you can find the following two
> proposals, with subtle differences:
>
> 1. the primitive wrappers (java.lang.Integer etc) are designated to become
> inline classes. This idea has been most recently cited in the posting
> "Identity warnings for inline class candidates".
>
> 2. the primitive wrappers should become the reference-projections of
> corresponding inline classes. This has sometimes been augmented with the
> idea that the denominations for the primitive types (such as "int") will
> then become aliases for those new inline types.
>
> So, what's it going to be?

The idea is to retrofit primitive types to be inline types, at the Java language level, not at the VM level. Once that is done, given that a wrapper type is a way to transform a primitive type to an object, a wrapper type is the reference projection of the corresponding primitive type (which is now an inline type).

[...]

>
> But then, if we are going with proposal 2, what would be so special about
> the reference-projections of the primitive types? Shouldn't all reference
> projections be treated equally?

nothing special, it's more that the semantics of Integer now and the semantics of Integer being the reference projection of int are slightly different.
> Wouldn't it mean that synchronizing on an IdentityObject obtained via
> reference-projection should always warrant a warning?

the reference projection is a projection not a boxing, so at runtime, the reference projection of an inline type is still an inline type, so a reference projection cannot be an instance of IdentityObject.

>
> It may well be that all this is perfectly clear to you experts, however, it
> might still be advantageous to use consistent wording everywhere!

I'm not sure it's perfectly clear even to us :)

Rémi

From brian.goetz at oracle.com Fri Jul 10 14:24:57 2020
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 10 Jul 2020 10:24:57 -0400
Subject: Clarification needed about primitive wrappers?
In-Reply-To: <2135504190.1360493.1594385628115.JavaMail.zimbra@u-pem.fr>
References: <2135504190.1360493.1594385628115.JavaMail.zimbra@u-pem.fr>
Message-ID: <7825B1D1-BF9C-4CB9-A73D-67BF9D367A8D@oracle.com>

The reality is we are evolving our perspective as we gain experience with the model. Once we prove out that we have something that actually works, there will be a round of updating the terminology.

Sent from my iPad

> On Jul 10, 2020, at 8:54 AM, Remi Forax wrote:
>
> From valhalla-spec-observers,
>
> ----- Original Message -----
>> From: "Gernot Neppert"
>> To: "Valhalla Expert Group Observers"
>> Sent: Friday, July 10, 2020 12:06:32
>> Subject: Clarification needed about primitive wrappers?
>
>> Hello,
>>
>> it seems some clarification is needed about the fate of the primitive
>> wrappers in "Valhalla-world".
>> In this and the related Mailing Lists, you can find the following two
>> proposals, with subtle differences:
>>
>> 1. the primitive wrappers (java.lang.Integer etc) are designated to become
>> inline classes. This idea has been most recently cited in the posting
>> "Identity warnings for inline class candidates".
>>
>> 2. the primitive wrappers should become the reference-projections of
>> corresponding inline classes.
>> This has sometimes been augmented with the
>> idea that the denominations for the primitive types (such as "int") will
>> then become aliases for those new inline types.
>>
>> So, what's it going to be?
>
> The idea is to retrofit primitive types to be inline types, at the Java language level, not at the VM level.
> Once that is done, given that a wrapper type is a way to transform a primitive type to an object,
> a wrapper type is the reference projection of the corresponding primitive type (which is now an inline type).
>
> [...]
>
>>
>> But then, if we are going with proposal 2, what would be so special about
>> the reference-projections of the primitive types? Shouldn't all reference
>> projections be treated equally?
>
> nothing special, it's more that the semantics of Integer now and the semantics of Integer being the reference projection of int are slightly different.
>
>> Wouldn't it mean that synchronizing on an IdentityObject obtained via
>> reference-projection should always warrant a warning?
>
> the reference projection is a projection not a boxing, so at runtime, the reference projection of an inline type is still an inline type, so a reference projection cannot be an instance of IdentityObject.
>
>>
>> It may well be that all this is perfectly clear to you experts, however, it
>> might still be advantageous to use consistent wording everywhere!
>
> I'm not sure it's perfectly clear even to us :)
>
> Rémi
>

From daniel.smith at oracle.com Fri Jul 10 16:36:35 2020
From: daniel.smith at oracle.com (Dan Smith)
Date: Fri, 10 Jul 2020 10:36:35 -0600
Subject: Clarification needed about primitive wrappers?
In-Reply-To: <2135504190.1360493.1594385628115.JavaMail.zimbra@u-pem.fr>
References: <2135504190.1360493.1594385628115.JavaMail.zimbra@u-pem.fr>
Message-ID: <949FF369-2361-41D3-9E4F-52C8CF404FFE@oracle.com>

> On Jul 10, 2020, at 6:53 AM, Remi Forax wrote:
>
> From valhalla-spec-observers,
>
> ----- Original Message -----
>> From: "Gernot Neppert"
>> To: "Valhalla Expert Group Observers"
>> Sent: Friday, July 10, 2020 12:06:32
>> Subject: Clarification needed about primitive wrappers?
>
>> Hello,
>>
>> it seems some clarification is needed about the fate of the primitive
>> wrappers in "Valhalla-world".
>> In this and the related Mailing Lists, you can find the following two
>> proposals, with subtle differences:
>>
>> 1. the primitive wrappers (java.lang.Integer etc) are designated to become
>> inline classes. This idea has been most recently cited in the posting
>> "Identity warnings for inline class candidates".
>>
>> 2. the primitive wrappers should become the reference-projections of
>> corresponding inline classes. This has sometimes been augmented with the
>> idea that the denominations for the primitive types (such as "int") will
>> then become aliases for those new inline types.
>>
>> So, what's it going to be?

Answer: both.

This question reflects some confusion about the concepts (understandable, given the evolution of the concepts; but I think they're stable as of a few months ago).

A class may be declared 'inline', which indicates that its instances have no identity.

The instances of that class may be treated as values in two ways:
1) As inline objects
2) As references to objects

It's the same object in both cases, just handled in different ways.

There are, correspondingly, two types:
1) An inline type
2) A reference type

A "reference projection" is a way to think about (and, in our compilation strategy, implement) the reference type. But, in the language model, it's a *type*, not a *class* (or interface).
A better way to think about it is that there is a single class with two corresponding types.

Typically, for an inline class 'Foo', the inline type is spelled 'Foo' and the reference type is spelled 'Foo.ref'. For some inline classes, the declaration will indicate that the inline type is spelled 'Foo.val' and the reference type is spelled 'Foo'.

In the case of java.lang.Integer: it will be declared as an inline class. The inline type will be spelled 'int' (and perhaps also 'Integer.val'). The reference type will be spelled 'Integer' (and perhaps also 'int.ref'). The only special thing here is the interpretation of the keyword 'int'. (Well, and lots of compilation magic under the hood.)

The Identity Warnings JEP sidesteps much of this discussion by making no mention of types -- it's only concerned with the classes and the changing behaviors of their instances.

From daniel.smith at oracle.com Fri Jul 10 18:23:25 2020
From: daniel.smith at oracle.com (Dan Smith)
Date: Fri, 10 Jul 2020 12:23:25 -0600
Subject: Revisiting default values
Message-ID: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com>

Brian pointed out that my list of candidate inline classes in the Identity Warnings JEP (JDK-8249100) includes a number of classes that, despite being "value-based classes" and disavowing their identity, might not end up as inline classes. The problem? Default values.

This might be a good time to revisit the open design issues surrounding default values and see if we can make some progress.

Background/status quo: every inline class has a default instance, which provides the initial value of fields and array components that have the inline type (e.g., in 'new Point[10]'). It's also the prototype instance used to create all other instances (start with 'vdefault', then apply 'withfield' as needed). The default value is, by fiat, the class instance produced by setting all fields to *their* default values.
Often, but not always, this means field/array initialization amounts to setting all the bits to 0. Importantly, no user code is involved in creating a default instance.

Real code is always useful for grounding design discussions, so let's start there. Among the classes I listed as inline class candidates, we can put them in three buckets:

Bucket #1: Have a reasonable default, as declared.
- wrapper classes (the primitive zeros)
- Optional & friends (empty)
- From java.time: Instant (start of 1970-01-01), LocalTime (midnight), Duration (0s), Period (0d), Year (1 BC, if that's acceptable)

Bucket #2: Could have a reasonable default after re-interpreting fields.
- From java.time: LocalDate, YearMonth, MonthDay, LocalDateTime, ZonedDateTime, OffsetTime, OffsetDateTime, ZoneOffset, ZoneRegion, MinguoDate, HijrahDate, JapaneseDate, ThaiBuddhistDate (months and days should be nonzero; null Strings, ZoneIds, HijrahChronologies, and JapaneseEras require special handling)
- ListN, SetN, MapN (null array interpreted as empty)

Bucket #3: No good default.
- Runtime.Version (need a non-null List)
- ProcessHandleImpl (need a valid process ID)
- List12, Set12, Map1 (need a non-null value)
- All ConstantDesc implementations (need real class & method names, etc.)

There's some subjectivity between the 2nd and 3rd buckets, but the idea behind the 2nd is that, with some translation layer between physical fields and interpretation of those fields, we can come up with an intuitive default (e.g., "0 means January"; "a null String means time zone 'UTC'"). In contrast, in the third bucket, any attempt to define a default value is going to be pretty unintuitive ("A null method name means 'toString'").

The question here is how much work the JVM and language are willing to do, or how much work we're willing to ask clients to do, in order to support use cases that don't fall into Bucket #1.

I don't think totally excluding Buckets #2 and #3 is a very good outcome.
It means that, in many cases, inline classes need to be built up exclusively from primitives or other inline types, because if you use reference types, your default value will have a null field. (Sometimes, as in Optional, null fields have straightforward interpretations, but most of the time programs are designed to prevent them.)

Whether we support Bucket #2 but not Bucket #3 is a harder question. It wouldn't be so bad if none of the examples above in Bucket #3 become inline classes -- for the most part they're handled via interfaces, anyway. (Counterpoint: inline class instances that are immediately typed with interface types still potentially provide a performance boost.) But I'm also not sure this is representative. We've noted before that many use cases, like database records or data structure cursors, don't have meaningful defaults (what's a default mailing address?). The ConstantDesc classes really illustrate this, even though they happen to not be public.

Another observation is that if we support Bucket #3 but not Bucket #2, that's probably not a big deal -- I'm not sure anybody really *wants* to deal with the default instance; it's just the price you pay for being an inline class. If there's a way to opt out of that extra weirdness and move from Bucket #2 to Bucket #3, great.

With that discussion in mind, here are some summaries of approaches we've considered, or that I think we ought to consider, for supporting buckets #2 and #3. (This is as best as I recall. If there's something I've missed, add it to the list!)

[Weighing in for myself: my current preference is to do one of F, G, or I. I'm not that interested in supporting Bucket #2, for reasons given above, although Option A works for programmers who really want it.]

=== Solutions to support Bucket #2 ===

Two broad strategies here: re-interpreting fields (A, B), and re-interpreting the default instance (C, D).
--- Option A: Encourage programmers to re-interpret fields

Guidance to programmers: when you declare an inline class, identify any fields for which the default instance should hold something other than zero/null; define a mapping for your implementation from zero/null to the value you want.

One way to do this is to define a (possibly private) getter for each field, and include logic like 'return month + 1' or 'return id == null ? "UTC" : id'. Or maybe you inline that logic, as long as you're careful to do so everywhere. Importantly, you also need to reverse the logic in your constructor -- for the sake of '==', if somebody manually creates the default instance, you should set fields to zero/null.

This doesn't work if you want public fields, but that's life as an OO programmer.

In this approach, it would be important that inline classes be expected to document their default instance in Javadoc (perhaps with a new Javadoc tag) -- the interpretation of the default instance is less apparent to users than "all zeros".

Limitations:

- It's a fairly error-prone approach. Programmers will absolutely forget to apply the mapping in one place, and everything will be fine until somebody tries to invoke a particular method on the default instance. Put that bug in a security-sensitive context, and maybe you have an exploit. (Something that could help some is choosing good names -- call your field 'monthIndex', not plain 'month', to remind yourself that it's zero-based.)

- Performance impact of an extra layer of computation on all field accesses. Probably not a big deal in general, but all those null checks, etc., could have a negative impact in certain contexts. And the *appearance* of extra cost might scare programmers away from doing the right thing ("eh, I probably won't use the default value anyway, I'll just ignore it to make my code faster").
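
To make the Option A guidance concrete, here is a minimal sketch in ordinary Java. Inline classes are not available in released Java, so a plain final class stands in for one; the class name 'ZonedMonth', its fields, and the 'DEFAULT' constant are hypothetical and exist only for illustration. 'DEFAULT' simulates the all-zero instance that the VM would create without running user code, and 'equals' stands in for the fieldwise '==' an inline class would get.

```java
import java.util.Objects;

// Hypothetical stand-in for an inline class (assumption: a plain final class is used
// because the 'inline' modifier does not exist in released Java).
final class ZonedMonth {
    // Physical representation, chosen so that zero/null has a sensible meaning.
    private final int monthIndex;  // zero-based: 0 means January
    private final String zoneId;   // null means "UTC"

    // Simulates the all-zero default instance; a real VM would create it without user code.
    static final ZonedMonth DEFAULT = new ZonedMonth();

    private ZonedMonth() {
        this.monthIndex = 0;
        this.zoneId = null;
    }

    ZonedMonth(int month, String zone) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month out of range: " + month);
        }
        Objects.requireNonNull(zone, "zone");
        // Reverse the mapping, so that logically-default arguments
        // produce exactly the physical default (zero/null) fields.
        this.monthIndex = month - 1;
        this.zoneId = zone.equals("UTC") ? null : zone;
    }

    // Getters apply the mapping from the physical to the logical value.
    int month() { return monthIndex + 1; }
    String zone() { return zoneId == null ? "UTC" : zoneId; }

    // Fieldwise comparison, standing in for '==' on inline objects.
    @Override public boolean equals(Object o) {
        if (!(o instanceof ZonedMonth)) return false;
        ZonedMonth other = (ZonedMonth) o;
        return monthIndex == other.monthIndex && Objects.equals(zoneId, other.zoneId);
    }

    @Override public int hashCode() { return Objects.hash(monthIndex, zoneId); }
}
```

With this layout, 'new ZonedMonth(1, "UTC")' compares equal to 'ZonedMonth.DEFAULT', which is the point of reversing the mapping in the constructor. Every method that forgets to go through 'month()' or 'zone()' would see the raw 0/null, which is the error-proneness described under Limitations.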
--- Option B: Language support for field re-interpretation

The language allows inline classes to declare fields with mappings to/from an internal representation. Just like Option A, but with guarantees that the internal representation isn't inappropriately accessed directly.

This pulls on a thread we explored a bit for Amber a while back, some form of "abstract fields" or "virtual fields". Maybe there's something there, but it seems like a general-purpose feature, and one we're not likely to reach a final solution on anytime soon.

--- Option C: Language support for a designated default

The language provides some way for programmers to declare the "logical" default instance (something like a special static field). The compiler inserts a test for the "physical" default on any field/array access, and replaces it with the logical default. That is:

Point p = points[3];

compiles to

Point p$0 = points[3];
Point p = (p$0 == [vdefault Point]) ? Point.DEFAULT : p$0;

This is much less bug-prone than Option A -- the compiler does all the work -- and much more achievable in the short/medium term than Option B.

Compared to Option B, this pushes the computation overhead from inline class field accesses to reads of the inline type from fields/arrays. I don't know if that's good or bad -- maybe a wash, heavily dependent on the use case.

A few big problems:

- The physical default still exists, and malicious bytecode can use it. If programmers want strong guarantees, they'll have to check and throw wherever an untrusted instance is provided. (Clients with access to the inline class's fields have to do so, too.)

- Covariant arrays mean every read from any array type that might be flattened (Object[], Runnable[], ConstantDesc[], ...) has to go through translation logic.

- There's an assumption here that the programmer doesn't intend to use the physical default as a valid non-default instance.
That's hard for the compiler to enforce, and weird stuff happens in fields/arrays if the programmer doesn't prevent it. (Could be mitigated with extra implicit logic on field/array writes or in constructors.)

--- Option D: JVM support for a designated default

The VM allows inline classes to designate a logical default instance, and the field/array access instructions map from the physical default to the logical default. The 'vdefault' instruction produces the logical default instance; something else is used by the class's factories to build from the physical default.

This addresses the first two problems with Option C -- the VM gives strong guarantees, and can make the translation a virtual operation of certain arrays. To address the third problem, it seems like we'd need the more complex logic I hinted at: on writes, map the physical default to the logical default, and map the logical default to the physical default. Do the reverse on reads.

The problem here is bytecode complexity/slowdowns. We've already added some complexity to 'aaload'/'aastore' (covariant flattened arrays), and anticipate similar changes to 'putfield'/'getfield' (specialized fields), so maybe that means we might as well do more. Or maybe it means we're already over budget. :-)

From the users' perspective, if any performance reduction on reads/writes can be limited to the inline classes in Bucket #2, *all* the options have a similar cost, whether imposed by the programmer, language, or VM. So, to a first approximation, slower opcode execution is fine.

=== Solutions to support Bucket #3 ===

Two broad strategies here: rejecting member accesses on the default instance (E, F, G), and preventing programs from ever seeing the default instance (H, I).

--- Option E: Encourage programmers to guard against default instances

Guidance to programmers: if you don't like your class's default instance, check for it in your methods and throw. Maybe Java SE defines a new RuntimeException to encourage this.
The simple way to do this is with some boilerplate at the start of all your methods:

if (this == MyClass.default) throw new InvalidDefaultException();

More permissive classes could just do some validation on the fields that are relevant to a particular operation. (E.g., 'getMonth' doesn't care if 'zoneId' is null.)

This doesn't work if you want public fields, but that's life as an OO programmer.

It's not ideal that an invalid instance can float around a program until somebody trips on one of these checks, rather than detecting the invalid value earlier -- we're propagating the NPE problem. And it takes some getting used to that there are two null-like values in the reference type's domain.

--- Option F: Language support for default instance guards

An inline class declaration can indicate that the default instance is invalid. The compiler generates guards, as in Option E, at the start of all instance method bodies, and perhaps on all field accesses outside of those methods.

Programmers give up finer-grained control, but get more safety. I'm sure most would be happy with that trade.

Improper/separately-compiled bytecode can skip the field access checks, but that's a minor concern.

Same issues as Option E regarding adding a "new NPE" to the platform.

--- Option G: JVM support for default instance guards

Inline class files can indicate that their default instance is invalid. All attempts to operate on that instance (via field/method accesses, other than 'withfield') result in an exception.

This tightens up Option F, making it just as impossible to access members of the default instance as it is to access members of 'null'.

Same issues as Option E regarding adding a "new NPE" to the platform.

--- Option H: Language checks on field/array reads

An inline class declaration can indicate that the default instance is invalid.
Every field and array access that may involve an uninitialized field/array component of that inline type gets augmented with a check that rejects reads of the default value (treating it as "you forgot to initialize this variable"). That is:

Point p = points[3];

compiles to

Point p$0 = points[3];
if (p$0 == [vdefault Point]) throw new UninitializedVariableException();
Point p = p$0;

This is much like Option C, and has roughly the same advantages/problems. There's not a strong guarantee that the default value won't pop up from untrusted bytecode (or unreliable inline class authors), and lots of array types need guards.

--- Option I: JVM checks on field/array reads

Inline class files can indicate that their default instance is invalid. When reading from a field/array component of the inline type ('getfield'/'getstatic'/'aaload'), an exception is thrown if the default value is found (treating it as "you forgot to initialize this variable"). The 'vdefault' instruction, like 'withfield', is illegal outside of the inline class's nest.

Better than Option H in that it can be optimized to occur on only certain reads, and in that it provides strong guarantees -- only the inline class can ever "see" the default instance.

Well, unless the inline class chooses to share that instance with the world. Not sure how we prevent that. But maybe at that point, anything bad/weird that happens is the author's own fault. (E.g., putting the default value in an array will make that component effectively "uninitialized" again.)

Like Option D, there's a question of whether we're willing to add this complexity to the 'getfield'/'getstatic'/'aaload' instructions. My sense is that at least it's less complexity than you have in Option D.
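
For readers who want to see the two Bucket #3 strategies side by side, the following is a rough sketch in ordinary Java of a hand-written Option E guard and an Option H-style read check. It is only an approximation under stated assumptions: a plain final class 'ProcessRef' (hypothetical) stands in for an inline class, 'DEFAULT' simulates the VM-created default instance, and both exception classes are hypothetical names taken from the examples above, not Java SE types. Since an ordinary array of references starts out as nulls, the usage note fills it with 'DEFAULT' to imitate a flattened array.

```java
// Hypothetical exception for Option E; not part of Java SE.
final class InvalidDefaultException extends RuntimeException {
    InvalidDefaultException(String message) { super(message); }
}

// Hypothetical exception for Option H; not part of Java SE.
final class UninitializedVariableException extends RuntimeException {
    UninitializedVariableException(String message) { super(message); }
}

// Stand-in for an inline class with no meaningful default (Bucket #3).
final class ProcessRef {
    private final long pid;     // 0 only in the simulated default instance
    private final String name;  // null only in the simulated default instance

    // Simulates the all-zero default instance.
    static final ProcessRef DEFAULT = new ProcessRef();

    private ProcessRef() {
        this.pid = 0;
        this.name = null;
    }

    ProcessRef(long pid, String name) {
        if (pid <= 0) throw new IllegalArgumentException("pid must be positive: " + pid);
        if (name == null) throw new NullPointerException("name");
        this.pid = pid;
        this.name = name;
    }

    // Option E: boilerplate guard invoked at the start of every method.
    private void rejectDefault() {
        if (this == DEFAULT) {
            throw new InvalidDefaultException("operation on default ProcessRef");
        }
    }

    long pid() { rejectDefault(); return pid; }
    String name() { rejectDefault(); return name; }
}

final class CheckedReads {
    private CheckedReads() {}

    // Option H: the check a compiler might insert around an array read.
    static ProcessRef read(ProcessRef[] array, int index) {
        ProcessRef p = array[index];
        if (p == ProcessRef.DEFAULT) {
            throw new UninitializedVariableException("element " + index + " was never initialized");
        }
        return p;
    }
}
```

After 'java.util.Arrays.fill(refs, ProcessRef.DEFAULT)' on a 'ProcessRef[]', 'CheckedReads.read(refs, 0)' throws immediately at the read, whereas with only the Option E guard the default instance can be passed around freely and fails later, at the first call to 'pid()' or 'name()'. That difference in when the failure surfaces is the trade-off between the two families of options.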
From kevinb at google.com Fri Jul 10 18:46:55 2020
From: kevinb at google.com (Kevin Bourrillion)
Date: Fri, 10 Jul 2020 11:46:55 -0700
Subject: Revisiting default values
In-Reply-To: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com>
References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com>
Message-ID:

This response is not to the main topic; not trying to send us down a rabbit-hole but this point is very important to me (as will be clear :-)).

On Fri, Jul 10, 2020 at 11:23 AM Dan Smith wrote:

> Bucket #1: Have a reasonable default, as declared.
> - wrapper classes (the primitive zeros)
> - Optional & friends (empty)
> - From java.time: Instant (start of 1970-01-01), LocalTime (midnight),
> Duration (0s), Period (0d), Year (1 BC, if that's acceptable)
>

Duration and Period: sure.

Instant and the others: please, please put these in a separate bucket. They can have a *default*, but it is absolutely *not* a "reasonable" default. In fact many tens (hundreds?) of thousands of bug reports in the last 50 years of computing have been "why in the world did 1970-01-01 or 1969-12-31 show up on this screen??"

(Source: my team at Google has invested literally multiple person-years in an effort to stamp out bugs with how users use java.time, which I kicked off and have stayed peripherally involved in. I feel this should make our perspective worth listening to.)

Realize that primitive types having default values *already* causes some number of bugs today even though we know they are the least-bad category and that risk is acceptable.

My reason for complaining here is not just about the java.time types themselves, but to argue that this is an important 4th bucket we should be concerned about. In some ways it is a bigger problem than Bucket #3 "no good default", since it is an *actively harmful* default.

For all of these types, there is one really fantastic default value that does everything you would want it to do: null.
That is why these types should not become inline types, or *certainly* not val-default inline types, and why Error Prone will have to ban usage of `.val` if they do. (Tangent of tangent: midnight is an interesting choice of default value for LocalTime, since I think there are some LocalTimes that so far have *always happened* in every date and location in history and that's not one of them. That's not to say any other choice would work, but just to highlight how wrong it is to have any default value at all.) > Bucket #2: Could have a reasonable default after re-interpreting fields. > - From java.time: LocalDate, YearMonth, MonthDay, LocalDateTime, > ZonedDateTime, OffsetTime, OffsetDateTime, ZoneOffset, ZoneRegion, > MinguoDate, HijrahDate, JapaneseDate, ThaiBuddhistDate (months and days > should be nonzero; null Strings, ZoneIds, HijrahChronologies, and > JapaneseEras require special handling) > Echoing... default seems harmful in every one of these. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From daniel.smith at oracle.com Fri Jul 10 20:02:02 2020 From: daniel.smith at oracle.com (Dan Smith) Date: Fri, 10 Jul 2020 14:02:02 -0600 Subject: Revisiting default values In-Reply-To: References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com> Message-ID: > On Jul 10, 2020, at 12:46 PM, Kevin Bourrillion wrote: > > My reason for complaining here is not just about the java.time types themselves, but to argue that this is an important 4th bucket we should be concerned about. In some ways it is a bigger problem that Bucket #3 "no good default", since it is an actively harmful default. > > For all of these types, there is one really fantastic default value that does everything you would want it to do: null. That is why these types should not become inline types, or certainly not val-default inline types, and why Error Prone will have to ban usage of `.val` if they do. Appreciate the thoughts, this is definitely relevant. 
For the purpose of this discussion, I'd say you're arguing for these classes to move to Bucket #3. Because then the question becomes, just like for the other classes there: do we use the Bucket #3 strategies to support these as inline classes, or do we give up and leave them as identity classes? From kevinb at google.com Fri Jul 10 20:42:48 2020 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 10 Jul 2020 13:42:48 -0700 Subject: Revisiting default values In-Reply-To: References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com> Message-ID: Yes, that would satisfy me if Bucket #3 acknowledges that it's both for cases where a default is impossible as well as cases where it is simply judged to be too harmful. On Fri, Jul 10, 2020 at 1:02 PM Dan Smith wrote: > > On Jul 10, 2020, at 12:46 PM, Kevin Bourrillion > wrote: > > > > My reason for complaining here is not just about the java.time types > themselves, but to argue that this is an important 4th bucket we should be > concerned about. In some ways it is a bigger problem that Bucket #3 "no > good default", since it is an actively harmful default. > > > > For all of these types, there is one really fantastic default value that > does everything you would want it to do: null. That is why these types > should not become inline types, or certainly not val-default inline types, > and why Error Prone will have to ban usage of `.val` if they do. > > Appreciate the thoughts, this is definitely relevant. > > For the purpose of this discussion, I'd say you're arguing for these > classes to move to Bucket #3. Because then the question becomes, just like > for the other classes there: do we use the Bucket #3 strategies to support > these as inline classes, or do we give up and leave them as identity > classes? -- Kevin Bourrillion | Java Librarian | Google, Inc. 
| kevinb at google.com From orionllmain at gmail.com Mon Jul 13 04:45:34 2020 From: orionllmain at gmail.com (Zheka Kozlov) Date: Mon, 13 Jul 2020 11:45:34 +0700 Subject: Revisiting default values In-Reply-To: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com> References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com> Message-ID: Hi Dan! Sorry for a probably stupid question but aren't all classes from Bucket #2 and #3 ref-default? Which means when we are calling new LocalDate[10], all elements of the array are initialized to null. And since the constructors of these classes are private, the external user will never see the instances in their default state. So why do we need to care about the default initialization at all? Am I wrong? On Sat, 11 Jul 2020 at 01:25, Dan Smith wrote: > Brian pointed out that my list of candidate inline classes in the Identity > Warnings JEP (JDK-8249100) includes a number of classes that, despite being > "value-based classes" and disavowing their identity, might not end up as > inline classes. The problem? Default values. > > This might be a good time to revisit the open design issues surrounding > default values and see if we can make some progress. > > Background/status quo: every inline class has a default instance, which > provides the initial value of fields and array components that have the > inline type (e.g., in 'new Point[10]'). It's also the prototype instance > used to create all other instances (start with 'vdefault', then apply > 'withfield' as needed). The default value is, by fiat, the class instance > produced by setting all fields to *their* default values. Often, but not > always, this means field/array initialization amounts to setting all the > bits to 0. Importantly, no user code is involved in creating a default > instance. > > Real code is always useful for grounding design discussions, so let's > start there.
Among the classes I listed as inline class candidates, we can > put them in three buckets: > > Bucket #1: Have a reasonable default, as declared. > - wrapper classes (the primitive zeros) > - Optional & friends (empty) > - From java.time: Instant (start of 1970-01-01), LocalTime (midnight), > Duration (0s), Period (0d), Year (1 BC, if that's acceptable) > > Bucket #2: Could have a reasonable default after re-interpreting fields. > - From java.time: LocalDate, YearMonth, MonthDay, LocalDateTime, > ZonedDateTime, OffsetTime, OffsetDateTime, ZoneOffset, ZoneRegion, > MinguoDate, HijrahDate, JapaneseDate, ThaiBuddhistDate (months and days > should be nonzero; null Strings, ZoneIds, HijrahChronologies, and > JapaneseEras require special handling) > - ListN, SetN, MapN (null array interpreted as empty) > > Bucket #3: No good default. > - Runtime.Version (need a non-null List) > - ProcessHandleImpl (need a valid process ID) > - List12, Set12, Map1 (need a non-null value) > - All ConstantDesc implementations (need real class & method names, etc.) > > There's some subjectivity between the 2nd and 3rd buckets, but the idea > behind the 2nd is that, with some translation layer between physical fields > and interpretation of those fields, we can come up with an intuitive > default (e.g., "0 means January"; "a null String means time zone 'UTC'"). > In contrast, in the third bucket, any attempt to define a default value is > going to be pretty unintuitive ("A null method name means 'toString'"). > > The question here is how much work the JVM and language are willing to do, > or how much work we're willing to ask clients to do, in order to support > use cases that don't fall into Bucket #1. > > I don't think totally excluding Buckets #2 and #3 is a very good outcome. > It means that, in many cases, inline classes need to be built up > exclusively from primitives or other inline types, because if you use > reference types, your default value will have a null field. 
(Sometimes, as > in Optional, null fields have straightforward interpretations, but most of > the time programs are designed to prevent them.) > > Whether we support Bucket #2 but not Bucket #3 is a harder question. It > wouldn't be so bad if none of the examples above in Bucket #3 become inline > classes; for the most part they're handled via interfaces, anyway. > (Counterpoint: inline class instances that are immediately typed with > interface types still potentially provide a performance boost.) But I'm > also not sure this is representative. We've noted before that many use > cases, like database records or data structure cursors, don't have > meaningful defaults (what's a default mailing address?). The ConstantDesc > classes really illustrate this, even though they happen to not be public. > > Another observation is that if we support Bucket #3 but not Bucket #2, > that's probably not a big deal. I'm not sure anybody really *wants* to deal > with the default instance; it's just the price you pay for being an inline > class. If there's a way to opt out of that extra weirdness and move from > Bucket #2 to Bucket #3, great. > > With that discussion in mind, here are some summaries of approaches we've > considered, or that I think we ought to consider, for supporting buckets #2 > and #3. (This is as best as I recall. If there's something I've missed, add > it to the list!) > > [Weighing in for myself: my current preference is to do one of F, G, or I. > I'm not that interested in supporting Bucket #2, for reasons given above, > although Option A works for programmers who really want it.] > > > > === Solutions to support Bucket #2 === > > Two broad strategies here: re-interpreting fields (A, B), and > re-interpreting the default instance (C, D).
> > --- > > Option A: Encourage programmers to re-interpret fields > > Guidance to programmers: when you declare an inline class, identify any > fields for which the default instance should hold something other than > zero/null; define a mapping for your implementation from zero/null to the > value you want. > > One way to do this is to define a (possibly private) getter for each > field, and include logic like 'return month + 1' or 'return id == null ? > "UTC" : id'. Or maybe you inline that logic, as long as you're careful to > do so everywhere. Importantly, you also need to reverse the logic in your > constructor: for the sake of '==', if somebody manually creates the default > instance, you should set fields to zero/null. > > This doesn't work if you want public fields, but that's life as an OO > programmer. > > In this approach, it would be important that inline classes be expected to > document their default instance in Javadoc (perhaps with a new Javadoc > tag), since the interpretation of the default instance is less apparent to users > than "all zeros". > > Limitations: > > - It's a fairly error-prone approach. Programmers will absolutely forget > to apply the mapping in one place, and everything will be fine until > somebody tries to invoke a particular method on the default instance. Put > that bug in a security-sensitive context, and maybe you have an exploit. > (Something that could help some is choosing good names: call your field > 'monthIndex', not plain 'month', to remind yourself that it's zero-based.) > > - Performance impact of an extra layer of computation on all field > accesses. Probably not a big deal in general, but all those null checks, > etc., could have a negative impact in certain contexts. And the > *appearance* of extra cost might scare programmers away from doing the > right thing ("eh, I probably won't use the default value anyway, I'll just > ignore it to make my code faster").
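As a rough illustration of Option A, here is a minimal sketch written as an ordinary final class, since inline class syntax is not settled. The class name `ZonedMonth`, its fields, and the `zeroed()` factory (a stand-in for the default instance the VM would create without running user code) are all hypothetical and not part of the proposal.

```java
import java.util.Objects;

// Hypothetical example of Option A: fields are stored so that the
// all-zero/null state already means "January, UTC".
final class ZonedMonth {
    private final int monthIndex;   // zero-based: 0 means January
    private final String zoneId;    // null means "UTC"

    private ZonedMonth(int monthIndex, String zoneId) {
        this.monthIndex = monthIndex;
        this.zoneId = zoneId;
    }

    // Stand-in for the VM-created default instance (all fields zero/null).
    static ZonedMonth zeroed() {
        return new ZonedMonth(0, null);
    }

    // Reverse mapping on the way in, so that of(1, "UTC") ends up with
    // the same field values as the default instance.
    static ZonedMonth of(int month, String zoneId) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month out of range: " + month);
        }
        Objects.requireNonNull(zoneId, "zoneId");
        return new ZonedMonth(month - 1, zoneId.equals("UTC") ? null : zoneId);
    }

    // Forward mapping on the way out.
    int getMonth() { return monthIndex + 1; }
    String getZoneId() { return zoneId == null ? "UTC" : zoneId; }

    @Override public boolean equals(Object o) {
        return o instanceof ZonedMonth
                && ((ZonedMonth) o).monthIndex == monthIndex
                && Objects.equals(((ZonedMonth) o).zoneId, zoneId);
    }

    @Override public int hashCode() { return Objects.hash(monthIndex, zoneId); }
}
```

Every accessor has to go through the mapping; reading `monthIndex` or `zoneId` directly anywhere else in the class is exactly the kind of slip the first limitation warns about.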
> > --- > > Option B: Language support for field re-interpretation > > The language allows inline classes to declare fields with mappings to/from > an internal representation. Just like Option A, but with guarantees that > the internal representation isn't inappropriately accessed directly. > > This pulls on a thread we explored a bit for Amber a while back, some form > of "abstract fields" or "virtual fields". Maybe there's something there, > but it seems like a general-purpose feature, and one we're not likely to > reach a final solution on anytime soon. > > --- > > Option C: Language support for a designated default > > The language provides some way for programmers to declare the "logical" > default instance (something like a special static field). The compiler > inserts a test for the "physical" default on any field/array access, and > replaces it with the logical default. > > That is: > > Point p = points[3]; > > compiles to > > Point p$0 = points[3]; > Point p = (p$0 == [vdefault Point]) ? Point.DEFAULT : p$0; > > This is much less bug-prone than Option A (the compiler does all the > work) and much more achievable in the short/medium term than Option B. > > Compared to Option B, this pushes the computation overhead from inline > class field accesses to reads of the inline type from fields/arrays. I > don't know if that's good or bad; maybe a wash, heavily dependent on the use > case. > > A few big problems: > > - The physical default still exists, and malicious bytecode can use it. If > programmers want strong guarantees, they'll have to check and throw > wherever an untrusted instance is provided. (Clients with access to the > inline class's fields have to do so, too.) > > - Covariant arrays mean every read from any array type that might be > flattened (Object[], Runnable[], ConstantDesc[], ...) has to go through > translation logic.
> > - There's an assumption here that the programmer doesn't intend to use the > physical default as a valid non-default instance. That's hard for the > compiler to enforce, and weird stuff happens in fields/arrays if the > programmer doesn't prevent it. (Could be mitigated with extra implicit > logic on field/array writes or in constructors.) > > --- > > Option D: JVM support for a designated default > > The VM allows inline classes to designate a logical default instance, and > the field/array access instructions map from the physical default to the > logical default. The 'vdefault' instruction produces the logical default > instance; something else is used by the class's factories to build from the > physical default. > > This addresses the first two problems with Option C: the VM gives strong > guarantees, and can make the translation a virtual operation of certain > arrays. > > To address the third problem, it seems like we'd need the more complex > logic I hinted at: on writes, map the physical default to the logical > default, and map the logical default to the physical default. Do the > reverse on reads. > > The problem here is bytecode complexity/slowdowns. We've already added > some complexity to 'aaload'/'aastore' (covariant flattened arrays), and > anticipate similar changes to 'putfield'/'getfield' (specialized fields), > so maybe that means we might as well do more. Or maybe it means we're > already over budget. :-) > > From the users' perspective, if any performance reduction on reads/writes > can be limited to the inline classes in Bucket #2, *all* the options have a > similar cost, whether imposed by the programmer, language, or VM. So, to a > first approximation, slower opcode execution is fine. > > > > === Solutions to support Bucket #3 === > > Two broad strategies here: rejecting member accesses on the default > instance (E, F, G), and preventing programs from ever seeing the default > instance (H, I).
> > --- > > Option E: Encourage programmers to guard against default instances > > Guidance to programmers: if you don't like your class's default instance, > check for it in your methods and throw. Maybe Java SE defines a new > RuntimeException to encourage this. > > The simple way to do this is with some boilerplate at the start of all > your methods: > > if (this == MyClass.default) throw new InvalidDefaultException(); > > More permissive classes could just do some validation on the fields that > are relevant to a particular operation. (E.g., 'getMonth' doesn't care if > 'zoneId' is null.) > > This doesn't work if you want public fields, but that's life as an OO > programmer. > > It's not ideal that an invalid instance can float around a program until > somebody trips on one of these checks, rather than detecting the invalid > value earlier; we're propagating the NPE problem. And it takes some getting > used to that there are two null-like values in the reference type's domain. > > --- > > Option F: Language support for default instance guards > > An inline class declaration can indicate that the default instance is > invalid. The compiler generates guards, as in Option E, at the start of all > instance method bodies, and perhaps on all field accesses outside of those > methods. > > Programmers give up finer-grained control, but get more safety. I'm sure > most would be happy with that trade. > > Improper/separately-compiled bytecode can skip the field access checks, > but that's a minor concern. > > Same issues as Option E regarding adding a "new NPE" to the platform. > > --- > > Option G: JVM support for default instance guards > > Inline class files can indicate that their default instance is invalid. > All attempts to operate on that instance (via field/method accesses, other > than 'withfield') result in an exception. > > This tightens up Option F, making it just as impossible to access members > of the default instance as it is to access members of 'null'.
> > Same issues as Option E regarding adding a "new NPE" to the platform. > > --- > > Option H: Language checks on field/array reads > > An inline class declaration can indicate that the default instance is > invalid. Every field and array access that may involve an uninitialized > field/array component of that inline type gets augmented with a check that > rejects reads of the default value (treating it as "you forgot to > initialize this variable"). > > That is: > > Point p = points[3]; > > compiles to > > Point p$0 = points[3]; > if (p$0 == [vdefault Point]) throw new UninitializedVariableException(); > Point p = p$0; > > This is much like Option C, and has roughly the same advantages/problems. > There's not a strong guarantee that the default value won't pop up from > untrusted bytecode (or unreliable inline class authors), and lots of array > types need guards. > > --- > > Option I: JVM checks on field/array reads > > Inline class files can indicate that their default instance is invalid. > When reading from a field/array component of the inline type > ('getfield'/'getstatic'/'aaload'), an exception is thrown if the default > value is found (treating it as "you forgot to initialize this variable"). > The 'vdefault' instruction, like 'withfield', is illegal outside of the > inline class's nest. > > Better than Option H in that it can be optimized to occur on only certain > reads, and in that it provides strong guarantees: only the inline class can > ever "see" the default instance. > > Well, unless the inline class chooses to share that instance with the > world. Not sure how we prevent that. But maybe at that point, anything > bad/weird that happens is the author's own fault. (E.g., putting the > default value in an array will make that component effectively > "uninitialized" again.) > > Like Option D, there's a question of whether we're willing to add this > complexity to the 'getfield'/'getstatic'/'aaload' instructions.
My sense is > that at least it's less complexity than you have in Option D. > > From daniel.smith at oracle.com Mon Jul 13 17:36:34 2020 From: daniel.smith at oracle.com (Dan Smith) Date: Mon, 13 Jul 2020 11:36:34 -0600 Subject: Revisiting default values In-Reply-To: References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com> Message-ID: <8F9302A6-4AF0-4D7F-8F7F-32745EB38AE8@oracle.com> From valhalla-spec-observers: > On Jul 12, 2020, at 10:45 PM, Zheka Kozlov wrote: > > Sorry for a probably stupid question but aren't all classes from Bucket #2 and #3 ref-default? Which means when we are calling new LocalDate[10], all elements of the array are initialized to null. And since the constructors of these classes are private, the external user will never see the instances in their default state. True, 'new LocalDate[10]' will continue to allocate an array of nulls. The default instance is only relevant when someone does 'new LocalDate.val[10]'. Regardless of the syntax, if there exists an inline type for instances of an inline class ('LocalDate.val' above), there will also be a semantic question of how we initialize fields/arrays of that inline type. From daniel.smith at oracle.com Mon Jul 13 18:19:46 2020 From: daniel.smith at oracle.com (Dan Smith) Date: Mon, 13 Jul 2020 12:19:46 -0600 Subject: Revisiting default values In-Reply-To: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com> References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com> Message-ID: <2B3A8A59-318D-4881-A2A2-42EC70A8C525@oracle.com> > On Jul 10, 2020, at 12:23 PM, Dan Smith wrote: > > Option G: JVM support for default instance guards > > Inline class files can indicate that their default instance is invalid. All attempts to operate on that instance (via field/method accesses, other than 'withfield') result in an exception. > > This tightens up Option F, making it just as impossible to access members of the default instance as it is to access members of 'null'. 
> > Same issues as Option E regarding adding a "new NPE" to the platform. There's a variant of this that deserves spelling out: --- Option J: JVM treats default instance as 'null' Like Option G, an inline class file can indicate that its default instance is invalid (in this case, 'null'). All attempts to operate on that instance result in an NPE. Conceptually, the null instance and the null reference are the same, and should generally be indistinguishable. (We explored this a while back as a tool for migration, before going in a different direction.) Some implications: - The VM probably wants to normalize its encodings (null reference vs. null instance), meaning there's a translation layer on field/array reads, just like Option I, and also for field/array writes, just like Option D. - Casts to Q types for certain classes should also translate from null reference to null instance, rather than NPE. - For these classes, the 'withfield' instruction is uniquely able to operate on and produce 'null'. - In the language, the 'null' literal can be assigned to some inline types. (In the VM, the verifier could require using 'defaultvalue' instead, if it wants to avoid some class loading.) - We could revisit the question of whether it's possible to migrate an identity class to be an inline-default inline class as long as the default instance is 'null'. (There are additional issues, like binary compatibility. But we could re-open that exploration...) --- My sense is that Option I dominates Option J by most measures: it achieves the same result (default value is invalid), with less work at flattened storage barriers, fewer tweaks to the rest of the system, and a more useful programming model (no nulls being passed around).
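To make the first implication of Option J a little more concrete, here is a plain-Java emulation of normalized storage. It assumes a flattened array can be modeled as a `long[]` in which an all-zero slot stands for null. `FlatDateArray` and its packing scheme are hypothetical; a real VM would perform this translation inside the array access instructions, not in library code.

```java
import java.time.LocalDate;

// Hypothetical emulation: each slot packs year/month/day into one long.
// A valid date always has month >= 1, so the all-zero slot is free to mean null.
final class FlatDateArray {
    private final long[] slots;   // freshly allocated as all zeros

    FlatDateArray(int length) {
        this.slots = new long[length];
    }

    // Write side: the null reference is translated to the all-zero encoding.
    void set(int i, LocalDate d) {
        slots[i] = (d == null)
                ? 0L
                : ((long) d.getYear() << 16) | ((long) d.getMonthValue() << 8) | d.getDayOfMonth();
    }

    // Read side: the all-zero encoding is translated back to the null reference.
    LocalDate get(int i) {
        long s = slots[i];
        if (s == 0L) {
            return null;
        }
        return LocalDate.of((int) (s >> 16), (int) ((s >> 8) & 0xFF), (int) (s & 0xFF));
    }
}
```

In this emulation a freshly allocated array reads back as all nulls without any per-element initialization, and storing null simply zeroes the slot again.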
From kevinb at google.com Mon Jul 13 19:12:54 2020 From: kevinb at google.com (Kevin Bourrillion) Date: Mon, 13 Jul 2020 12:12:54 -0700 Subject: Revisiting default values In-Reply-To: <2B3A8A59-318D-4881-A2A2-42EC70A8C525@oracle.com> References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com> <2B3A8A59-318D-4881-A2A2-42EC70A8C525@oracle.com> Message-ID: It sounds like this debate is between `null` and a value which really is the *moral equivalent* of `null`. You basically would have two kinds of nullability that look different from each other. If you can surface both cases as literal `null` then nullness analysis tools could work the same way for both. That seems really appealing to me. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From daniel.smith at oracle.com Mon Jul 13 21:39:06 2020 From: daniel.smith at oracle.com (Dan Smith) Date: Mon, 13 Jul 2020 15:39:06 -0600 Subject: Revisiting default values In-Reply-To: <2B3A8A59-318D-4881-A2A2-42EC70A8C525@oracle.com> References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com> <2B3A8A59-318D-4881-A2A2-42EC70A8C525@oracle.com> Message-ID: <514E7092-A9AA-42AD-97C3-9B5EA341B9EE@oracle.com>
And here's another option that has been previously discarded, but might be worth picking back up. This one to address Bucket #2: --- Option K: JVM initializes fields/arrays to a designated default The VM allows inline classes to designate a logical default instance, and during class preparation or array allocation, any fields/components of the inline type are initialized to the logical default. Compare to Option D. Rather than adding barriers to reads/writes that interact with the storage, we simply initialize the storage "properly" in the first place. The possibly-fatal downside is that it means every array allocation for that inline type has to stamp out a bunch of copies of a particular bit pattern, rather than the simpler all-zeros pattern. But that extra cost may be worth it in exchange for faster reads/writes to the array. (Same comments for class instances, although I don't think it's as much of a concern, given the relatively small sizes of class instances.) Note that some arrays *already* have to stamp out a nonzero bit pattern, if the encoding of an inline type uses pointers rather than flattened fields (e.g., for an inline class with too many fields). --- If we're enthusiastic about addressing Bucket #2, this seems like a viable approach: quite simple, and with comparable performance to most of the other approaches. From peter.levart at gmail.com Tue Jul 14 12:39:07 2020 From: peter.levart at gmail.com (Peter Levart) Date: Tue, 14 Jul 2020 14:39:07 +0200 Subject: Revisiting default values In-Reply-To: <2B3A8A59-318D-4881-A2A2-42EC70A8C525@oracle.com> References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com> <2B3A8A59-318D-4881-A2A2-42EC70A8C525@oracle.com> Message-ID: What about a variant of G or J where an inline class would designate a single field to be used for "isDefault" checks. Instead of comparing all fields for "zero" value, a single designated field would be used in checks.
So a class is free to choose which of the existing fields is "never zero/null" in the set of valid class states or can even add a special-purpose (boolean) field to be used just for that. Often no such special field would need to be added. WDYT? Peter From Roger.Riggs at oracle.com Tue Jul 14 18:58:37 2020 From: Roger.Riggs at oracle.com (Roger Riggs) Date: Tue, 14 Jul 2020 14:58:37 -0400 Subject: Revisiting default values In-Reply-To: <8F9302A6-4AF0-4D7F-8F7F-32745EB38AE8@oracle.com> References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com> <8F9302A6-4AF0-4D7F-8F7F-32745EB38AE8@oracle.com> Message-ID: Hi Dan, I don't think it's tenable to have inline values around that are not valid because they have been 'created' without going through the 'constructors'. I know that is what you are trying to solve (the tradeoffs). On 7/13/20 1:36 PM, Dan Smith wrote: > From valhalla-spec-observers: > >> On Jul 12, 2020, at 10:45 PM, Zheka Kozlov wrote: >> >> Sorry for a probably stupid question but aren't all classes from Bucket #2 and #3 ref-default? Which means when we are calling new LocalDate[10], all elements of the array are initialized to null. And since the constructors of these classes are private, the external user will never see the instances in their default state. > True, 'new LocalDate[10]' will continue to allocate an array of nulls. The default instance is only relevant when someone does 'new LocalDate.val[10]'. If LocalDate becomes the inline class, then 'new LocalDate[10]' will be an array of inline instances, not references. At least that was the expectation when labeling LocalDate as a value class.
And they should be initialized to a value that is provided by or consistent with one of the constructors. So for arrays, that needs to happen before the first reference; some laziness is OK but would need to be enforced by the VM. This needs to be true for fields as well as arrays. Having a class-defined default value would at least provide a mechanism to make that instance be under control of the class, even if it is only able to throw an exception because there is no valid value, as in some of your previous examples. I know this point went by a while back, but allowing the default and 'withfield' bytecodes outside of a legitimate constructor seems like an integrity problem. With identity classes, the verifier and VM go to some lengths to check that a partially/not initialized instance is not published. I agree with Kevin's concerns. Regards, Roger > > Regardless of the syntax, if there exists an inline type for instances of an inline class ('LocalDate.val' above), there will also be a semantic question of how we initialize fields/arrays of that inline type. From daniel.smith at oracle.com Tue Jul 14 21:11:36 2020 From: daniel.smith at oracle.com (Dan Smith) Date: Tue, 14 Jul 2020 15:11:36 -0600 Subject: Revisiting default values In-Reply-To: References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com> <2B3A8A59-318D-4881-A2A2-42EC70A8C525@oracle.com> Message-ID: <7D815BF4-EC41-4FD6-8CDD-362BAC222626@oracle.com> > On Jul 14, 2020, at 6:39 AM, Peter Levart wrote: > > What about a variant of G or J where an inline class would designate a single field to be used for "isDefault" checks. Instead of comparing all fields for "zero" value, a single designated field would be used in checks. So a class is free to choose which of the existing fields is "never zero/null" in the set of valid class states or can even add a special-purpose (boolean) field to be used just for that. Often no such special field would need to be added. > > WDYT?
This is probably more fine-grained than I want to get into right now (let's choose a direction before drilling down on how we can make it fast), but, yes, in previous discussions we have considered using a designated field as the 'isDefault' signal, rather than doing a full 'val == Foo.default'. I don't know whether that's likely to be a worthwhile optimization or not.

From scolebourne at joda.org Tue Jul 14 23:57:04 2020
From: scolebourne at joda.org (Stephen Colebourne)
Date: Wed, 15 Jul 2020 00:57:04 +0100
Subject: Revisiting default values
In-Reply-To:
References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com> <8F9302A6-4AF0-4D7F-8F7F-32745EB38AE8@oracle.com>
Message-ID:

On Tue, 14 Jul 2020 at 19:59, Roger Riggs wrote:
> If LocalDate becomes the inline class, then 'new LocalDate[10]' will be
> an array of inline instances, not references.
> At least that was the expectation when labeling LocalDate as a value class.
> And they should be initialized to a value that is provided by or
> consistent with one of the constructors.

My view is that the only acceptable default for LocalDate is (conceptually) null. But it is also my view that this should not prevent it from being an inline class.

The complete value set of LocalDate consists of all valid dates plus null. Conceptually, this isn't the null reference as we currently know it, but an instance of the LocalDate inline type that can be treated as null at the language level, i.e. a different kind of null but one that cannot be distinguished as different at the language level. Thus new LocalDate[10] will continue to be full of nulls, but they would be inline instance LocalDate nulls, not null references.
Thus you have:
a) values where "all bits zero" has meaning as a default - int, Long128, Optional
b) values where "all bits zero" is treated as null - LocalDate, Currency, Money

I struggle with any Valhalla outcome where (a) is significantly more performant/optimised than (b), because that would push developers to expose inappropriate default values. We'd end up with a lot more of Kevin's 1970-01-01 oddities (such as a default of Currency XXX or Money XXX 0). As such, I'm unconvinced by the current .ref/.val approach to solving this.

My desired semantics are exactly as today:

    var a = new LocalDate[10];
    assert a[0] == null;  // OK
    a[0].getDayOfMonth(); // NPE

but with the memory storage of the array being flattened/inlined. This works for LocalDate because "all bits zero" is not in use and free to be interpreted as "null". The language/JVM would need to guarantee this for all nullable inline types (perhaps by checking the bits are not all zero at the end of the constructor).

Any attempt to say LocalDate has a default (or that LocalDate.val has a default) seems doomed to failure, even if it is just to say the default is invalid. The last thing I'd want to see is an "undefined" concept added to Java that is similar to null but different.

Dan's Option J is supposed to capture this direction.

Stephen

From daniel.smith at oracle.com Wed Jul 15 06:01:10 2020
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 15 Jul 2020 00:01:10 -0600
Subject: EG meeting, 2020-07-15
Message-ID:

The next EG Zoom meeting is tomorrow, 4pm UTC (9am PDT, 12pm EDT).
Recent threads to discuss:

- "JEP draft: Identity Warnings for Inline Class Candidates": I drafted a JEP that lists candidates for inline class migration and describes the compiler and runtime warnings needed to prepare for their migration

- "Revisiting default values": I described use cases for different treatments of inline types' default values, and then summarized various strategies for supporting some of the nontrivial use cases

From forax at univ-mlv.fr Wed Jul 15 14:19:49 2020
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 15 Jul 2020 16:19:49 +0200 (CEST)
Subject: Revisiting default values
In-Reply-To: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com>
References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com>
Message-ID: <295577363.1172974.1594822789211.JavaMail.zimbra@u-pem.fr>

So the default value may be a valid value or may be an invalid value; if it's an invalid value, it should be the author of the class who says so, because in Java we prefer declaration site to use site.

One way is to try to teach the VM how to do the conversions. I want to explore another way where we try to solve that issue at the language level, to avoid having a more complex VM.

A default value which is invalid should behave like null, i.e. calling any method on the default value should result in an exception. Doing that at the language level means adding a check before calling any instance method and before accessing any instance field.

So there are two parts to solve:
1/ how to specify the check: is it just this == Inline.default, or is it whatever the user wants (or something in the middle, like a field check)?
2/ how to execute that check when accessing a field or a method?

Let's explore the solution that offers the maximum freedom for the author of the inline class, i.e. for 1/, the check is user defined.
For that we can introduce a new kind of initializer, like the static block; let's call it the invariant block:

    inline class Foo {
      private final Object o;

      invariant {
        if (o == null) {
          throw new InvalidFooException();
        }
      }
    }

This invariant block is translated into a method (that has the name see later why) and is called each time a method or a field is accessed.

For 2/, we can either change the spec of the VM so the invariant block is called automatically by the VM, or we can use invokedynamic. invokedynamic has the advantage of not requiring more VM support, at the expense of the bootstrap issue.

The main issue with invokedynamic is that it's not a backward compatible change, because it requires changing the call sites. So we can lessen the requirement like this, requiring only the call to when accessing an instance method, because we suppose that people will not be foolish enough to declare the fields public. In that case, there is no need for using invokedynamic, because a call to the invariant method can be inserted by the compiler at the beginning of any instance method. This solution also has the advantage of lowering the cost at runtime compared to using invokedynamic.

In terms of performance, I believe the language spec should say that the invariant block has to be idempotent. In that case, the VM is free to not execute several calls to the method once one is executed on a specific instance (like the JITs do nullcheck collapsing currently).

To summarize, I believe we should allow more value-based classes to be retrofitted as inline classes by adding the concept of an invariant block to the Java language spec, an invariant block being a simple idempotent method called at the beginning of every instance method.
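
The translation described above can be approximated in today's Java. This is only a sketch under stated assumptions: `Foo` is an ordinary final class standing in for an inline class, `checkInvariant` and `defaultInstance` are hypothetical names (the message does not recoverably name the generated method), and the call that a compiler would insert is written by hand.

```java
import java.util.Objects;

public class InvariantSketch {

    /** Hypothetical exception, named after the one used in the message. */
    static final class InvalidFooException extends RuntimeException {
        InvalidFooException() {
            super("Foo is in its default (invalid) state");
        }
    }

    /** Ordinary final class standing in for 'inline class Foo'. */
    static final class Foo {
        private final Object o;

        private Foo(Object o) {
            this.o = o;
        }

        /** Normal construction path: never produces the default state. */
        static Foo of(Object o) {
            return new Foo(Objects.requireNonNull(o, "o"));
        }

        /** Stand-in for the all-zero default instance (o == null). */
        static Foo defaultInstance() {
            return new Foo(null);
        }

        // Hand-written equivalent of the proposed 'invariant { ... }' block.
        // It is idempotent: no side effects, same outcome on every call.
        private void checkInvariant() {
            if (o == null) {
                throw new InvalidFooException();
            }
        }

        String describe() {
            checkInvariant(); // the call the compiler would insert first
            return "Foo(" + o + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(Foo.of("x").describe());
        try {
            Foo.defaultInstance().describe();
        } catch (InvalidFooException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the check has no side effects, repeating it on the same instance cannot change the result, which is the property that would let a JIT drop redundant calls.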
Rémi

----- Original Message -----
> From: "daniel smith"
> To: "valhalla-spec-experts"
> Sent: Friday, 10 July 2020 20:23:25
> Subject: Revisiting default values

> Brian pointed out that my list of candidate inline classes in the Identity
> Warnings JEP (JDK-8249100) includes a number of classes that, despite being
> "value-based classes" and disavowing their identity, might not end up as inline
> classes. The problem? Default values.
>
> This might be a good time to revisit the open design issues surrounding default
> values and see if we can make some progress.
>
> Background/status quo: every inline class has a default instance, which provides
> the initial value of fields and array components that have the inline type
> (e.g., in 'new Point[10]'). It's also the prototype instance used to create all
> other instances (start with 'vdefault', then apply 'withfield' as needed). The
> default value is, by fiat, the class instance produced by setting all fields to
> *their* default values. Often, but not always, this means field/array
> initialization amounts to setting all the bits to 0. Importantly, no user code
> is involved in creating a default instance.
>
> Real code is always useful for grounding design discussions, so let's start
> there. Among the classes I listed as inline class candidates, we can put them
> in three buckets:
>
> Bucket #1: Have a reasonable default, as declared.
> - wrapper classes (the primitive zeros)
> - Optional & friends (empty)
> - From java.time: Instant (start of 1970-01-01), LocalTime (midnight), Duration
> (0s), Period (0d), Year (1 BC, if that's acceptable)
>
> Bucket #2: Could have a reasonable default after re-interpreting fields.
> - From java.time: LocalDate, YearMonth, MonthDay, LocalDateTime, ZonedDateTime, > OffsetTime, OffsetDateTime, ZoneOffset, ZoneRegion, MinguoDate, HijrahDate, > JapaneseDate, ThaiBuddhistDate (months and days should be nonzero; null > Strings, ZoneIds, HijrahChronologies, and JapaneseEras require special > handling) > - ListN, SetN, MapN (null array interpreted as empty) > > Bucket #3: No good default. > - Runtime.Version (need a non-null List) > - ProcessHandleImpl (need a valid process ID) > - List12, Set12, Map1 (need a non-null value) > - All ConstantDesc implementations (need real class & method names, etc.) > > There's some subjectivity between the 2nd and 3rd buckets, but the idea behind > the 2nd is that, with some translation layer between physical fields and > interpretation of those fields, we can come up with an intuitive default (e.g., > "0 means January"; "a null String means time zone 'UTC'"). In contrast, in the > third bucket, any attempt to define a default value is going to be pretty > unintuitive ("A null method name means 'toString'"). > > The question here is how much work the JVM and language are willing to do, or > how much work we're willing to ask clients to do, in order to support use cases > that don't fall into Bucket #1. > > I don't think totally excluding Buckets #2 and #3 is a very good outcome. It > means that, in many cases, inline classes need to be built up exclusively from > primitives or other inline types, because if you use reference types, your > default value will have a null field. (Sometimes, as in Optional, null fields > have straightforward interpretations, but most of the time programs are > designed to prevent them.) > > Whether we support Bucket #2 but not Bucket #3 is a harder question. It wouldn't > be so bad if none of the examples above in Bucket #3 become inline classes?for > the most part they're handled via interfaces, anyway. 
(Counterpoint: inline > class instances that are immediately typed with interface types still > potentially provide a performance boost.) But I'm also not sure this is > representative. We've noted before that many use cases, like database records > or data structure cursors, don't have meaningful defaults (what's a default > mailing address?). The ConstantDesc classes really illustrate this, even though > they happen to not be public. > > Another observation is that if we support Bucket #3 but not Bucket #2, that's > probably not a big deal?I'm not sure anybody really *wants* to deal with the > default instance; it's just the price you pay for being an inline class. If > there's a way to opt out of that extra weirdness and move from Bucket #2 to > Bucket #3, great. > > With that discussion in mind, here are some summaries of approaches we've > considered, or that I think we ought to consider, for supporting buckets #2 and > #3. (This is as best as I recall. If there's something I've missed, add it to > the list!) > > [Weighing in for myself: my current preference is to do one of F, G, or I. I'm > not that interested in supporting Bucket #2, for reasons given above, although > Option A works for programmers who really want it.] > > > > === Solutions to support Bucket #2 === > > Two broad strategies here: re-interpreting fields (A, B), and re-interpreting > the default instance (C, D). > > --- > > Option A: Encourage programmers to re-interpret fields > > Guidance to programmers: when you declare an inline class, identify any fields > for which the default instance should hold something other than zero/null; > define a mapping for your implementation from zero/null to the value you want. > > One way to do this is to define a (possibly private) getter for each field, and > include logic like 'return month + 1' or 'return id == null ? "UTC" : id'. Or > maybe you inline that logic, as long as you're careful to do so everywhere. 
> Importantly, you also need to reverse the logic in your constructor?for the > sake of '==', if somebody manually creates the default instance, you should > set fields to zero/null. > > This doesn't work if you want public fields, but that's life as an OO > programmer. > > In this approach, it would be important that inline classes be expected to > document their default instance in Javadoc (perhaps with a new Javadoc tag)?the > interpretation of the default instance is less apparent to users than "all > zeros". > > Limitations: > > - It's a fairly error-prone approach. Programmers will absolutely forget to > apply the mapping in one place, and everything will be fine until somebody > tries to invoke a particular method on the default instance. Put that bug in a > security-sensitive context, and maybe you have an exploit. (Something that > could help some is choosing good names?call your field 'monthIndex', not plain > 'month', to remind yourself that it's zero-based.) > > - Performance impact of an extra layer of computation on all field accesses. > Probably not a big deal in general, but all those null checks, etc., could have > a negative impact in certain contexts. And the *appearance* of extra cost might > scare programmers away from doing the right thing ("eh, I probably won't use > the default value anyway, I'll just ignore it to make my code faster"). > > --- > > Option B: Language support for field re-interpretation > > The language allows inline classes to declare fields with mappings to/from an > internal representation. Just like Option A, but with guarantees that the > internal representation isn't inappropriately accessed directly. > > This pulls on a thread we explored a bit for Amber awhile back, some form of > "abstract fields" or "virtual fields". Maybe there's something there, but it > seems like a general-purpose feature, and one we're not likely to reach a final > solution on anytime soon. 
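
Option A's field re-interpretation can be sketched in today's Java. This is an illustration only: `ZonedMonth` is an ordinary final class standing in for an inline class, the `DEFAULT` constant stands in for the all-zero instance the VM would produce, and `sameState` is a hypothetical field-wise comparison standing in for '==' on inline instances.

```java
import java.util.Objects;

public class FieldMappingSketch {

    /** Ordinary final class standing in for an inline class; all names are hypothetical. */
    static final class ZonedMonth {
        private final int monthIndex; // zero-based: 0 means January
        private final String zoneId;  // null means "UTC"

        /** Stand-in for the all-zero default instance. */
        static final ZonedMonth DEFAULT = new ZonedMonth();

        private ZonedMonth() {
            this.monthIndex = 0;
            this.zoneId = null;
        }

        ZonedMonth(int month, String zone) {
            if (month < 1 || month > 12) {
                throw new IllegalArgumentException("month out of range: " + month);
            }
            Objects.requireNonNull(zone, "zone");
            // Reverse the mapping, so that building "January, UTC" by hand
            // yields exactly the all-zero/null state of the default instance.
            this.monthIndex = month - 1;
            this.zoneId = zone.equals("UTC") ? null : zone;
        }

        // Getters apply the mapping from physical fields to logical values.
        int month() {
            return monthIndex + 1;
        }

        String zone() {
            return zoneId == null ? "UTC" : zoneId;
        }

        /** Field-wise comparison, standing in for '==' on inline instances. */
        boolean sameState(ZonedMonth other) {
            return monthIndex == other.monthIndex && Objects.equals(zoneId, other.zoneId);
        }
    }

    public static void main(String[] args) {
        ZonedMonth d = ZonedMonth.DEFAULT;
        System.out.println(d.month() + " " + d.zone());
        System.out.println(new ZonedMonth(1, "UTC").sameState(d));
    }
}
```

The error-prone part called out in the limitations is visible here: any method that reads `monthIndex` or `zoneId` directly, instead of going through `month()` and `zone()`, silently misbehaves only on the default instance.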
>
> ---
>
> Option C: Language support for a designated default
>
> The language provides some way for programmers to declare the "logical" default
> instance (something like a special static field). The compiler inserts a test
> for the "physical" default on any field/array access, and replaces it with the
> logical default.
>
> That is:
>
>   Point p = points[3];
>
> compiles to
>
>   Point p$0 = points[3];
>   Point p = (p$0 == [vdefault Point]) ? Point.DEFAULT : p$0;
>
> This is much less bug-prone than Option A (the compiler does all the work) and
> much more achievable in the short/medium term than Option B.
>
> Compared to Option B, this pushes the computation overhead from inline class
> field accesses to reads of the inline type from fields/arrays. I don't know if
> that's good or bad; maybe a wash, heavily dependent on the use case.
>
> A few big problems:
>
> - The physical default still exists, and malicious bytecode can use it. If
> programmers want strong guarantees, they'll have to check and throw wherever an
> untrusted instance is provided. (Clients with access to the inline class's
> fields have to do so, too.)
>
> - Covariant arrays mean every read from any array type that might be flattened
> (Object[], Runnable[], ConstantDesc[], ...) has to go through translation
> logic.
>
> - There's an assumption here that the programmer doesn't intend to use the
> physical default as a valid non-default instance. That's hard for the compiler
> to enforce, and weird stuff happens in fields/arrays if the programmer doesn't
> prevent it. (Could be mitigated with extra implicit logic on field/array writes
> or in constructors.)
>
> ---
>
> Option D: JVM support for a designated default
>
> The VM allows inline classes to designate a logical default instance, and the
> field/array access instructions map from the physical default to the logical
> default.
The 'vdefault' instruction produces the logical default instance; > something else is used by the class's factories to build from the physical > default. > > This addresses the first two problems with Option C?the VM gives strong > guarantees, and can make the translation a virtual operation of certain arrays. > > To address the second problem, it seems like we'd need the more complex logic I > hinted at: on writes, map the physical default to the logical default, and map > the logical default to the physical default. Do the reverse on reads. > > The problem here is bytecode complexity/slowdowns. We've already added some > complexity to 'aaload'/'aastore' (covariant flattened arrays), and anticipate > similar changes to 'putfield'/'getfield' (specialized fields), so maybe that > means we might as well do more. Or maybe it means we're already over budget. > :-) > > From the users' perspective, if any performance reduction on reads/writes can be > limited to the inline classes in Bucket #2, *all* the options have a similar > cost, whether imposed by the programmer, language, or VM. So, to a first > approximation, slower opcode execution is fine. > > > > === Solutions to support Bucket #3 === > > Two broad strategies here: rejecting member accesses on the default instance (E, > F, G), and preventing programs from ever seeing the default instance (H, I). > > --- > > Option E: Encourage programmers to guard against default instances > > Guidance to programmers: if you don't like your class's default instance, check > for it in your methods and throw. Maybe Java SE defines a new RuntimeException > to encourage this. > > The simple way to do this is with some boilerplate at the start of all your > methods: > > if (this == MyClass.default) throw new InvalidDefaultException(); > > More permissive classes could just do some validation on the fields that are > relevant to a particular operation. (E.g., 'getMonth' doesn't care if 'zoneId' > is null.) 
>
> This doesn't work if you want public fields, but that's life as an OO
> programmer.
>
> It's not ideal that an invalid instance can float around a program until
> somebody trips on one of these checks, rather than detecting the invalid value
> earlier: we're propagating the NPE problem. And it takes some getting used to
> that there are two null-like values in the reference type's domain.
>
> ---
>
> Option F: Language support for default instance guards
>
> An inline class declaration can indicate that the default instance is invalid.
> The compiler generates guards, as in Option E, at the start of all instance
> method bodies, and perhaps on all field accesses outside of those methods.
>
> Programmers give up finer-grained control, but get more safety. I'm sure most
> would be happy with that trade.
>
> Improper/separately-compiled bytecode can skip the field access checks, but
> that's a minor concern.
>
> Same issues as Option E regarding adding a "new NPE" to the platform.
>
> ---
>
> Option G: JVM support for default instance guards
>
> Inline class files can indicate that their default instance is invalid. All
> attempts to operate on that instance (via field/method accesses, other than
> 'withfield') result in an exception.
>
> This tightens up Option F, making it just as impossible to access members of the
> default instance as it is to access members of 'null'.
>
> Same issues as Option E regarding adding a "new NPE" to the platform.
>
> ---
>
> Option H: Language checks on field/array reads
>
> An inline class declaration can indicate that the default instance is invalid.
> Every field and array access that may involve an uninitialized field/array
> component of that inline type gets augmented with a check that rejects reads of
> the default value (treating it as "you forgot to initialize this variable").
>
> That is:
>
>   Point p = points[3];
>
> compiles to
>
>   Point p$0 = points[3];
>   if (p$0 == [vdefault Point]) throw new UninitializedVariableException();
>   Point p = p$0;
>
> This is much like Option C, and has roughly the same advantages/problems.
> There's not a strong guarantee that the default value won't pop up from
> untrusted bytecode (or unreliable inline class authors), and lots of array
> types need guards.
>
> ---
>
> Option I: JVM checks on field/array reads
>
> Inline class files can indicate that their default instance is invalid. When
> reading from a field/array component of the inline type
> ('getfield'/'getstatic'/'aaload'), an exception is thrown if the default value
> is found (treating it as "you forgot to initialize this variable"). The
> 'vdefault' instruction, like 'withfield', is illegal outside of the inline
> class's nest.
>
> Better than Option H in that it can be optimized to occur on only certain reads,
> and in that it provides strong guarantees: only the inline class can ever "see"
> the default instance.
>
> Well, unless the inline class chooses to share that instance with the world. Not
> sure how we prevent that. But maybe at that point, anything bad/weird that
> happens is the author's own fault. (E.g., putting the default value in an array
> will make that component effectively "uninitialized" again.)
>
> Like Option D, there's a question of whether we're willing to add this
> complexity to the 'getfield'/'getstatic'/'aaload' instructions. My sense is
> that at least it's less complexity than you have in Option D.

From daniel.smith at oracle.com Wed Jul 15 17:31:57 2020
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 15 Jul 2020 11:31:57 -0600
Subject: EG meeting, 2020-07-15
In-Reply-To:
References:
Message-ID: <40E0B126-0CC9-416D-A0EC-77116A507F26@oracle.com>

Notes from the discussion:

> On Jul 15, 2020, at 12:01 AM, Dan Smith wrote:
>
> The next EG Zoom meeting is tomorrow, 4pm UTC (9am PDT, 12pm EDT).
> > Recent threads to discuss: > > - "JEP draft: Identity Warnings for Inline Class Candidates": I drafted a JEP that lists candidates for inline class migration and describes the compiler and runtime warnings needed to prepare for their migration Remi wondered whether changing the semantics of '==' for wrapper classes will be a problem. Answer: constructors will be deprecated; for the small integers (-128 to 127?) produced by 'valueOf', == will have the same behavior; for the large integers produced by 'valueOf', behavior of == is currently unspecified, so we're fine to change behavior. There's a consensus that we should probably track the inline class candidates (or maybe all value-based classes?) with an annotation. > > - "Revisiting default values": I described use cases for different treatments of inline types' default values, and then summarized various strategies for supporting some of the nontrivial use cases General agreement that the "Bucket #3" group (classes without a natural default) is what we should focus on, not the "Bucket #2" group (classes that want a nontrivial default). We talked through various "Bucket #3" approaches. The preferred approaches seem to be: - Guards on member access, either compiler-generated or JVM-enforced. Fields and default methods require special attention. (May involve making the invalid default equivalent to "null", but there are complexity concerns.) - Tracking uninitialized fields/arrays in the JVM. There's concern about performance of array reads, best optimization might be adding a boolean "isInitialized" flag to each flattened value. There's also concern about tearing producing unexpected default values in corner cases, possibly mitigated with our story for atomicity. - Bucket #3 classes must be reference-default, and fields/arrays of their inline type are illegal outside of the declaring class. The declaring class can provide a flat array factory if it wants to. 
(A new idea from Tobi, he'll write it up for the thread.)

From brian.goetz at oracle.com Mon Jul 20 15:38:30 2020
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 20 Jul 2020 11:38:30 -0400
Subject: JEP draft: Identity Warnings for Inline Class Candidates
In-Reply-To:
References:
Message-ID: <6bbf0263-4310-d720-0c92-33365deddf2a@oracle.com>

Comments:

"lack a unique identity" could cause readers to puzzle over whether inline objects have multiple identities. Suggest "lack object identity" instead.

In the motivation, I would add examples like:

    void foo(Object o) {
        synchronized(o) { ... }
    }

    void foo(T t) { /* same */ }

to make it clear what we're talking about. The above code is valid today -- though may be semantically questionable, and doubly so when `o` is an `Integer`. Tomorrow, when someone passes an inline object, it will IMSE. These are the places where we want to detect when people are using questionable identities as locks.

I would also outline why such code is questionable: that for the classes you cite, the libraries are free to but generally not obligated to intern and cache instances, so you can't really be sure what lock you're talking about.

I would also remind users of the _benefits_ of migrating, say, `Duration` to inline classes, to head off the inevitable "why are you guys always changing stuff that makes more work for me" objections.

In Description, I would s/likely to become/may become/. Part of the goal of this JEP is to determine which of these have hidden constraints that would prevent them from being candidates, and, as the other discussion on default values indicates, there may be other reasons too.

Where you mention annotations, remind readers that the annotation would have no semantic value; it would only be for documentation purposes.

On 7/8/2020 6:08 PM, Dan Smith wrote:
> Here's an initial JEP draft for the "Identity Warnings for Inline Class Candidates" feature, which I'm hoping we can target to 16.
>
> https://bugs.openjdk.java.net/browse/JDK-8249100
>
> Feedback is welcome.

From brian.goetz at oracle.com Mon Jul 20 16:27:55 2020
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 20 Jul 2020 12:27:55 -0400
Subject: Revisiting default values
In-Reply-To:
References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com>
Message-ID:

Responding to Kevin's tangent:

 - Of the ones on Dan's list, one could argue that even some of the ones in Bucket 1 are questionable, such as `char` or `Instant`. The ones that really seem like slam dunks are: numerics (int, long, etc), boolean, and maybe Optional. That's a small list. (Another candidate for bucket 1: BigDecimal.)

More generally:

 - The language is schizoid about uninitialized variables. DA analysis requires that we always initialize locals (even when we want to initialize `count` to `0`), but doesn't require it for fields. This is because we know that there are windows of unfortunateness where the default value is still observable -- inside the ctor, or if `this` escapes the ctor.

> Option J: JVM treats default instance as 'null'

Implementation note: when we explored this a while back, we were interested in identifying a "pivot field" where the programmer committed (or analysis proved) that all properly initialized instances would have a non-default value for this field, as would be the case if any field had an unconditional `foo = new Foo()` assignment in the constructor. This makes detection of the default value much faster, since you only have to check the pivot field. (Peter raises this in his "what about" query later.) We were initially excited about this approach but later realized it was feeding the "optimization dopamine receptor" rather than actually solving a problem :)

> It sounds like this debate is between `null` and a value which really
> is the /moral equivalent/ of `null`. You basically would have two
> kinds of nullability that look different from each other.
John has made an impassioned plea for "no new nulls". Accordingly, we did explore a variant of J where a `withfield` that set the pivot field to its default value _actually put a null on the stack_. (We backed off.)

> And here's another option that has been previously discarded, but might be worth picking back up. This one to address Bucket #2:
>
> ---
>
> Option K: JVM initializes fields/arrays to a designated default

John has in the past pushed back on this, in part because of the problem identified above (can't close the window 100%, only 99.5%, and that 0.5% is where security bugs come from), and in part because of the cost/complexity in the JVM.

That said, doing so in the language is potentially more viable. It would mean, for classes that opt into this treatment:

 - Ensuring that `C.default` evaluates to the right thing
 - Preventing `this` from escaping the constructor (which might be a good thing to enforce for inline classes anyway)
 - Ensuring all fields are DA (which we do already), and that assignments to fields in ctors are not their default value
 - Translating `new Foo[n]` (and reflective equivalent) with something that initializes the array elements

The goal is to keep default instances from being observed. If we lock down `this` from constructors, the major cost here is instantiating arrays of these things, but we already optimize array initialization loops like this pretty well.

Overall this doesn't seem terrible. It means that the cost of this is borne by the users of classes that opt into this treatment, and keeps the complexity out of the VM. It does mean that "attackers" can generate bytecode to generate bad instances (a problem we have with multiple vectors today.)

Call this "L".

I'd suggest, though, we back away from implementation techniques (you've got a good menu going already), and focus more on "what language do we want to build." You claim:

> I don't think totally excluding Buckets #2 and #3 is a very good outcome.
Which I think is a reasonable hypothesis, but I suggest we focus the discussion on whether we believe this or not, and what we might want to do about it (and when), first.

On 7/10/2020 2:46 PM, Kevin Bourrillion wrote:
> This response is not to the main topic; not trying to send us down a
> rabbit-hole but this point is very important to me (as will be clear :-)).
>
>
> On Fri, Jul 10, 2020 at 11:23 AM Dan Smith
> wrote:
>
>     Bucket #1: Have a reasonable default, as declared.
>     - wrapper classes (the primitive zeros)
>     - Optional & friends (empty)
>     - From java.time: Instant (start of 1970-01-01), LocalTime
>     (midnight), Duration (0s), Period (0d), Year (1 BC, if that's
>     acceptable)
>
>
> Duration and Period: sure.
>
> Instant and the others: please, please put these in a separate bucket.
> They can have a /default/, but it is absolutely /not/ a "reasonable"
> default. In fact many tens (hundreds?) of thousands of bug reports in
> the last 50 years of computing have been "why in the world did
> 1970-01-01 or 1969-12-31 show up on this screen??"
>
> (Source: my team at Google has invested literally multiple
> person-years in an effort to stamp out bugs with how users use
> java.time, which I kicked off and have stayed peripherally involved
> in. I feel this should make our perspective worth listening to.)
>
> Realize that primitive types having default values /already/ causes
> some number of bugs today even though we know they are the least-bad
> category and that risk is acceptable.
>
> My reason for complaining here is not just about the java.time types
> themselves, but to argue that this is an important 4th bucket we
> should be concerned about. In some ways it is a bigger problem than
> Bucket #3 "no good default", since it is an /actively harmful/ default.
>
> For all of these types, there is one really fantastic default value
> that does everything you would want it to do: null.
> That is why these
> types should not become inline types, or /certainly/ not val-default
> inline types, and why Error Prone will have to ban usage of `.val` if
> they do.
>
> (Tangent of tangent: midnight is an interesting choice of default
> value for LocalTime, since I think there are some LocalTimes that so
> far have /always happened/ in every date and location in history and
> that's not one of them. That's not to say any other choice would work,
> but just to highlight how wrong it is to have any default value at all.)
>
>     Bucket #2: Could have a reasonable default after re-interpreting
>     fields.
>     - From java.time: LocalDate, YearMonth, MonthDay, LocalDateTime,
>     ZonedDateTime, OffsetTime, OffsetDateTime, ZoneOffset, ZoneRegion,
>     MinguoDate, HijrahDate, JapaneseDate, ThaiBuddhistDate (months and
>     days should be nonzero; null Strings, ZoneIds, HijrahChronologies,
>     and JapaneseEras require special handling)
>
>
> Echoing... default seems harmful in every one of these.
>
>
> --
> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com
>

From mcnepp02 at googlemail.com Tue Jul 21 08:07:44 2020
From: mcnepp02 at googlemail.com (Gernot Neppert)
Date: Tue, 21 Jul 2020 10:07:44 +0200
Subject: Revisiting .ref and .value
Message-ID:

If I've got this right, to specify which variant of an inline class to use, one would have to use one of the suffixes .ref or .value. Existing classes refactored as inline classes could somehow specify that .ref was the default, whereas new inline classes would have .value as the default.

What strikes me as odd is the fact that we now have 4 new terms that would have to be understood by the developer:
- the keyword 'inline' that declares a class that represents a value-type.
- the keyword-suffix '.val' that says "At this point, use the inline representation of a class".
- the keyword-suffix '.ref' that says "At this point, use the regular non-inline representation of a class".
- the Standard interface "IdentityObject" that probably most often will be encountered as a type-bound for a generic function or class. This makes me think: do we really need to learn so many different terms when they are so closely related? Couldn't we do without the '.val' and '.ref' suffixes if we shifted towards decorating the *type-use* instead of the *type* itself? Which leads me to this idea: 1. A declaration "inline class Bar" would always declare a 'ref-default" class. The purpose of 'inline' here would be simply to enforce certain restrictions, such as final fields etc. 2. inline classes would not implement the interface "IdentityObject". 3. In order to make use of the "inline characteristics", one would have to specify "inline" again: class Outer { Bar byRef; // ref-member, may be null. inline Bar embedded; // inline member, mandates the same definite-assignment rule as that for final fields. } // inline return-value and inline parameters will copy by-value. inline Bar calculate(inline Bar bar) { inline Bar temp = bar; // inline local var, will copy by-value. // inline array. Must use an initializer-instance that will be used for all array-members. inline Bar[] arr = new Bar[10] { new Bar("") }; } // ref-return-value and ref-parameters Bar calculate(Bar bar) { Bar temp = bar; // local var, assigns the reference only. } // generic function requires inline-class, so copy-by-value may be used void foo(inline C) { } // normal generic function for any ref-type. Will use implicit reference-projection if an inline-value is passed void foo(C obj) { } // generic function that explicitly requires non-inline class. // By requiring IdentityObject, we know that 'obj' cannot be the result of a reference-projection, so we can safely synchronize on it! 
void foo(C obj) { synchronized(obj) { } } // Conflicting declaration, will be rejected by compiler void foo(inline C obj) { synchronized(obj) { } } From paul.bjorkstrand at gmail.com Tue Jul 21 11:38:37 2020 From: paul.bjorkstrand at gmail.com (Paul Bjorkstrand) Date: Tue, 21 Jul 2020 06:38:37 -0500 Subject: Revisiting .ref and .value In-Reply-To: References: Message-ID: I don't find it that confusing, but I have been following this project for years now. I do see how it can get confusing with all the new terminology. Fwiw, I believe the syntax is not yet set in stone (or if I am wrong please let an expert correct me). Question: if you make a new, numeric type, that fits inside a machine word, would you really want to be forced to always say 'inline' at every use to get all the benefits of the inline type? I'm not against this idea; I see the allure of the simplicity. I had to ask though. On Tue, Jul 21, 2020, 03:12 Gernot Neppert wrote: > If I've got this right, the way for specifying which variant of an inline > class to use, one would have to use one of the suffixes .ref or .value. > Existing classes refactored as inline classes could somehow specify that > .ref was the default, whereas new inline classes would have .value the > default. > > What strikes me as odd is the fact that we are now having 4 new terms that > would have to be understood by the developer: > - the keyword 'inline' that declares a class that represents a value-type. > - the keyword-suffix '.val' that says "At this point, use the inline > representation of a class". > - the keyword-suffix '.ref' that says "At this point, use the regular > non-inline representation of a class". > - the Standard interface "IdentityObject" that probably most often will be > encountered as a type-bound for a generic function or class. > > This makes me think: do we really need to learn so many different terms > when they are so closely related? 
> > Couldn't we do without the '.val' and '.ref' suffixes if we shifted towards > decorating the *type-use* instead of the *type* itself? > Which leads me to this idea: > > 1. A declaration "inline class Bar" would always declare a 'ref-default" > class. The purpose of 'inline' here would be simply to enforce certain > restrictions, such as final fields etc. > 2. inline classes would not implement the interface "IdentityObject". > 3. In order to make use of the "inline characteristics", one would have to > specify "inline" again: > > class Outer { > Bar byRef; // ref-member, may be null. > inline Bar embedded; // inline member, mandates the same > definite-assignment rule as that for final fields. > } > > > // inline return-value and inline parameters will copy by-value. > inline Bar calculate(inline Bar bar) { > inline Bar temp = bar; // inline local var, will copy by-value. > // inline array. Must use an initializer-instance that will be used for > all array-members. > inline Bar[] arr = new Bar[10] { new Bar("") }; > } > > // ref-return-value and ref-parameters > Bar calculate(Bar bar) { > Bar temp = bar; // local var, assigns the reference only. > } > > // generic function requires inline-class, so copy-by-value may be used > void foo(inline C) { > } > > // normal generic function for any ref-type. Will use implicit > reference-projection if an inline-value is passed > void foo(C obj) { > } > > // generic function that explicitly requires non-inline class. > // By requiring IdentityObject, we know that 'obj' cannot be the result of > a reference-projection, so we can safely synchronize on it! 
> void foo(C obj) { > synchronized(obj) { > } > } > > // Conflicting declaration, will be rejected by compiler > void foo(inline C obj) { > synchronized(obj) { > } > } > From daniel.smith at oracle.com Tue Jul 21 18:41:11 2020 From: daniel.smith at oracle.com (Dan Smith) Date: Tue, 21 Jul 2020 12:41:11 -0600 Subject: Revisiting default values In-Reply-To: References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com> Message-ID: <614D2739-4756-4D61-BF45-BA36490AACB3@oracle.com> > On Jul 20, 2020, at 10:27 AM, Brian Goetz wrote: > > That said, doing so in the language is potentially more viable. It would mean, for classes that opt into this treatment: > > - Ensuring that `C.default` evaluates to the right thing > - Preventing `this` from escaping the constructor (which might be a good thing to enforce for inline classes anyway) > - Ensuring all fields are DA (which we do already), and that assignments to fields in ctors are not their default value > - Translating `new Foo[n]` (and reflective equivalent) with something that initializes the array elements > > The goal is to keep default instances from being observed. If we lock down `this` from constructors, the major cost here is instantiating arrays of these things, but we already optimize array initialization loops like this pretty well. > > Overall this doesn't seem terrible. It means that the cost of this is borne by the users of classes that opt into this treatment, and keeps the complexity out of the VM. It does mean that "attackers" can generate bytecode to generate bad instances (a problem we have with multiple vectors today.) > > Call this "L". More letters! Expanding on ways to support Bucket #3 by ensuring initialization of fields/arrays: --- Option L: Language requires field/array initialization An inline class may be declared to have no default. Fields and arrays of that class's inline type must be provably initialized (via compiler analysis) before they are read or published. 
Instance fields of the class's inline type must be initialized before a method call involving 'this' occurs. (It's already illegal to allow the constructor to return before initialization.) Static fields... seem hopeless, so maybe must have a reference type (perhaps implicitly). Maybe we can do an analysis that permits some very simple cases, but once you allow method calls of almost any sort, you've lost. (We'd have to prove that no initialization of *other* classes triggered by refers to the field before it has been initialized.) Arrays must be initialized at creation time, either with an array initializer ("Address[] as = { x, y, z };") or via a trusted API ("Address[] as = Arrays.of(i -> x);"). We might introduce a language sugar for the trusted API ("Address[] as = { i -> x };"). We *could* support two-stage initialization via things like 'Arrays.fill', but analysis to track uninitialized arrays from creation to filling doesn't seem worthwhile. This is less expressive, obviously. In particular, many comfortable idioms for initializing an array won't work. As a case study: what happens in generic code like ArrayList? When it wants to allocate its array (we're in a specialized world where T has been specialized to 'QAddress;'), what value does it fill the array with? Nothing is available, because at this point the list is empty, and it's just allocating storage for later. I guess ArrayList (and similar data structures) has to have a special back door, and we're left to trust the author not to expose the uninitialized payload. As with all language features, there's also the question of what happens when a class file doesn't conform to the language's rules. Option L can't really stand alone; it needs to be backed up by some other option when the language's guarantees fail. --- Option M: JVM requires field/array initialization Inline class files can indicate that their default instance is invalid.
Fields and arrays of that class's inline type must be provably initialized (via verification or related analysis) before they are read or published. All the compile-time analysis of Option L applies here, because the language compiler needs to be sure its generated class files are valid. We can use some new verification types to track the initialization status of 'this', the way we do to require 'super' calls today. You don't have a fully formed 'Foo', capable of being passed to other methods, etc., until all fields are initialized. This would also apply to 'defaultvalue' for an inline class with a field of a default-less inline type. Again, static fields are hopeless, it's an error to use the inline type as a static field type. 'anewarray' of the inline type is illegal, except within a trusted API. That API promises to initialize every array component before publishing the array. (We won't try to guarantee this with an analysis; the API is trusted because it has been vetted by humans.) In addition to some standard factory methods, we could decide that the inline class itself is always a trusted API. (A related approach was discussed at our last EG meeting, but with much less expressiveness: inline-typed fields are always illegal, and arrays can only be allocated by the class author.) This closes the backdoor of other bytecode not playing by the language's rules. The expressiveness problems of Option L remain: e.g., ArrayList's early allocation strategy is impossible. From Tobi_Ajila at ca.ibm.com Tue Jul 28 17:33:15 2020 From: Tobi_Ajila at ca.ibm.com (Tobi Ajila) Date: Tue, 28 Jul 2020 13:33:15 -0400 Subject: Revisiting default values In-Reply-To: <614D2739-4756-4D61-BF45-BA36490AACB3@oracle.com> References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com> <614D2739-4756-4D61-BF45-BA36490AACB3@oracle.com> Message-ID: > Bucket #3 classes must be reference-default, and fields/arrays of their inline type are illegal outside of the declaring class.
The declaring class can provide a flat array factory if it wants to. (A new idea from Tobi, he'll write it up for the thread.)
```
public sealed abstract class LegacyType permits LegacyType.val {
    // Formerly a concrete class, but now it's abstract or maybe an interface

    // factory methods
    public static LegacyType makeALegacyType(...); // in some cases this already exists
    public static LegacyType[] newALegacyTypeArray(int size); // can be flattened
}

private inline class LegacyType.val extends LegacyType {
    ...
} // this type is hidden, only LegacyType knows about it
```
This approach is based on what Kevin mentioned earlier, "For all of these types, there is one really fantastic default value that does everything you would want it to do: null. That is why these types should not become inline-types, or certainly not val-default inline types ...". Essentially, by making these types reference-default and by providing an avenue to restrict the value-projection to the reference-default type, the writer maintains control of where and when the value-projection is allowed to be observed thus solving the bad default problem. The writer also has the ability to supply a flattened array factory with initialized elements. This approach is appealing for the following reasons: no additional JVM complexity (i.e., no bytecode checks for the bad default value), no javac boilerplate (i.e., guards on member access, guards on method entries, etc.). On the other hand, there are two big drawbacks: no instance field flattening for these types, and creating flattened arrays is a bit unnatural since it has to be done via a factory. Going back to Brian's comment: > I'd suggest, though, we back away from implementation techniques (you've got a good menu going already), and focus more on "what language do we want to build." You claim: > > I don't think totally excluding Buckets #2 and #3 is a very good outcome.
> Which I think is a reasonable hypothesis, but I suggest we focus the discussion on whether we believe this or not, and what we might want to do about it (and when), first. I think it would help if we had a clear sense as to what proportion of inline-types we think will have this "bad default" problem. Last year when we discussed null-default inline types the thinking was that about 75% of the motivation for null-defaults was migrating VBC, 20% for security, 5% for "I want null in my value set.". My assumption is that the vast majority of inline-types will not be migrated types, they will be new types. If this is correct then it would appear that the default value problem is really a problem for a minority of inline-types. All the solutions proposed have some kind of cost associated with them, and these costs vary (i.e., JVM complexity, throughput overhead, JIT compilation time, etc.). If the default value problem is only for a minority of the types, I would argue that the costs should be limited to types that want to opt-in to not expose their default value or un-initialized value. How we feel about this will determine which direction we choose to take when exploring the solution space. So, in short I want to second Brian's comment, I think it's important to decide if we want this kind of feature but also what we are willing to give up to get it. --Tobi "valhalla-spec-experts" wrote on 2020/07/21 02:41:11 PM: > From: Dan Smith > To: valhalla-spec-experts > Cc: Brian Goetz > Date: 2020/07/21 02:41 PM > Subject: [EXTERNAL] Re: Revisiting default values > Sent by: "valhalla-spec-experts" > > > > On Jul 20, 2020, at 10:27 AM, Brian Goetz wrote: > > > > That said, doing so in the language is potentially more viable.
> It would mean, for classes that opt into this treatment: > > > > - Ensuring that `C.default` evaluates to the right thing > > - Preventing `this` from escaping the constructor (which might be > a good thing to enforce for inline classes anyway) > > - Ensuring all fields are DA (which we do already), and that > assignments to fields in ctors are not their default value > > - Translating `new Foo[n]` (and reflective equivalent) with > something that initializes the array elements > > > > The goal is to keep default instances from being observed. If we > lock down `this` from constructors, the major cost here is > instantiating arrays of these things, but we already optimize array > initialization loops like this pretty well. > > > > Overall this doesn't seem terrible. It means that the cost of > this is borne by the users of classes that opt into this treatment, > and keeps the complexity out of the VM. It does mean that > "attackers" can generate bytecode to generate bad instances (a > problem we have with multiple vectors today.) > > > > Call this "L". > > More letters! > > Expanding on ways to support Bucket #3 by ensuring initialization of > fields/arrays: > > --- > > Option L: Language requires field/array initialization > > An inline class may be declared to have no default. Fields and > arrays of that class's inline type must be provably initialized (via > compiler analysis) before they are read or published. > > Instance fields of the class's inline type must be initialized > before a method call involving 'this' occurs. (It's already illegal > to allow the constructor to return before initialization.) > > Static fields... seem hopeless, so maybe must have a reference type > (perhaps implicitly). Maybe we can do an analysis that permits some > very simple cases, but once you allow method calls of almost any > sort, you've lost. (We'd have to prove that no initialization of > *other* classes triggered by refers to the field before it > has been initialized.) 
> > Arrays must be initialized at creation time, either with an array > initializer ("Address[] as = { x, y, z };") or via a trusted API > ("Address[] as = Arrays.of(i -> x);"). We might introduce a language > sugar for the trusted API ("Address[] as = { i -> x };"). We *could* > support two-stage initialization via things like 'Arrays.fill', but > analysis to track uninitialized arrays from creation to filling > doesn't seem worthwhile. > > This is less expressive, obviously. In particular, many comfortable > idioms for initializing an array won't work. As a case study: what > happens in generic code like ArrayList? When it wants to allocate > its array (we're in a specialized world where T has been specialized > to 'QAddress;'), what value does it fill the array with? Nothing is > available, because at this point the list is empty, and it's just > allocating storage for later. I guess ArrayList (and similar data > structures) has to have a special back door, and we're left to trust > the author not to expose the uninitialized payload. > > As with all language features, there's also the question of what > happens when a class file doesn't conform to the language's rules. > Option L can't really stand alone; it needs to be backed up by some > other option when the language's guarantees fail. > > --- > > Option M: JVM requires field/array initialization > > Inline class files can indicate that their default instance is > invalid. Fields and arrays of that class's inline type must be > provably initialized (via verification or related analysis) before > they are read or published. > > All the compile-time analysis of Option L applies here, because the > language compiler needs to be sure its generated class files are valid. > > We can use some new verification types to track the initialization > status of 'this', the way we do to require 'super' calls today.
You > don't have a fully formed 'Foo', capable of being passed to other > methods, etc., until all fields are initialized. This would also > apply to 'defaultvalue' for an inline class with a field of a > default-less inline type. > > Again, static fields are hopeless, it's an error to use the inline > type as a static field type. > > 'anewarray' of the inline type is illegal, except within a trusted > API. That API promises to initialize every array component before > publishing the array. (We won't try to guarantee this with an > analysis; the API is trusted because it has been vetted by humans.) > In addition to some standard factory methods, we could decide that > the inline class itself is always a trusted API. > > (A related approach was discussed at our last EG meeting, but with > much less expressiveness: inline-typed fields are always illegal, > and arrays can only be allocated by the class author.) > > This closes the backdoor of other bytecode not playing by the > language's rules. The expressiveness problems of Option L remain: > e.g., ArrayList's early allocation strategy is impossible. > From brian.goetz at oracle.com Tue Jul 28 19:06:16 2020 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 28 Jul 2020 15:06:16 -0400 Subject: Revisiting default values In-Reply-To: References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com> <614D2739-4756-4D61-BF45-BA36490AACB3@oracle.com> Message-ID: > I think it would help if we had a clear sense as to what proportion of > inline-types we think will have this "bad default" problem. Last year > when we discussed null-default inline types the thinking was that > about 75% of the motivation for null-defaults was migrating VBC, 20% > for security, 5% for "I want null in my value set.". My assumption is > that the vast majority of inline-types will not be migrated types, > they will be new types.
If this is correct then it would appear that > the default value problem is really a problem for a minority of > inline-types. Indeed, we've come up with good solutions for migrating VBCs (migrate it to a ref-default inline class) and "I want null in my value set" (then just use the ref projection.) For the "migrate from VBC" crowd, we offer the advice: "keep using `Foo` (really `Foo.ref`) in your APIs, but feel free to use `Foo.val` inside your implementation, where you are confident of no nulls." And further, we offer that advice to both the VBC author and its clients. So, we can expect existing APIs to continue to return Optional, but more fields of type `Optional.ref`, to get the flattening, and doing null checks in the constructor: `this.foo = requireNonNull(foo)`. And this is one of the sources of "zero pollution"; a client may have a field of type `Foo.val` and just not initialize it in their constructor, and then later someone calls `foo.bar()`. Unlike with a reference type, which would NPE in this situation, we might enter the `bar()` method, which might not be defensively coded to check for the (meaningless) default, and it will do something dumb. Where dumb ranges from "Welcome to 1970" to "delete all my files." I think what we need for Bucket 3 (which I think we agree is more important than Bucket 2) is to (optionally, only for NGD inline classes) restore parity with reference types by ensuring that the receiver of a method invocation is never seen to be the default value. (We already do this for reference types; we NPE before the dispatch would succeed.) And the strategies we've been kicking around have ranged from "try to prevent the default from showing up in the heap" to "detect when the default shows at various times."
If the important point in time is method dispatch, then we can probably simplify to: - Let some classes mark themselves as NGD (no good default) - At the point of invocation of an NGD instance method, check the receiver against the default, throw NPE if it is - Optionally, try to optimize this check by identifying (manually or automatically) a pivot field Note that even an unoptimized check is probably pretty fast already: "are all the bits zero." But we can probably often optimize down to a single-word comparison to zero. Note too that we can implement this check in either generated bytecode or in the VM; the semantics are the same, the latter is more secure. From daniel.smith at oracle.com Tue Jul 28 22:42:21 2020 From: daniel.smith at oracle.com (Dan Smith) Date: Tue, 28 Jul 2020 16:42:21 -0600 Subject: Revisiting default values In-Reply-To: References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com> <614D2739-4756-4D61-BF45-BA36490AACB3@oracle.com> Message-ID: <157A1F4E-3ECC-4541-95E3-CA51BE00B4B1@oracle.com> > On Jul 28, 2020, at 11:33 AM, Tobi Ajila wrote: > > > Bucket #3 classes must be reference-default, and fields/arrays of their inline type are illegal outside of the declaring class. The declaring class can provide a flat array factory if it wants to. (A new idea from Tobi, he'll write it up for the thread.) I've since come to see this as a variant of Option L or Option M: we apply some restrictions + analysis to guarantee that uninitialized fields/arrays are never exposed. In this case, the guarantee is easy to prove because nobody can declare fields/arrays at all, except the class author. > This approach is appealing for the following reasons: no additional JVM complexity (ie. no bytecode checks for the bad default value), no javac boilerplate (ie. guards on member access, guards on method entries, etc.).
On the other hand, there are two big drawbacks: no instance field flattening for these types, and creating flattened arrays is a bit unnatural since it has to be done via a factory. The biggest problem I see with approaches that prevent use of 'anewarray' is that they violate our uniform bytecode design, which is crucial to specialization. That is: how do I allocate a flat array of T in something like ArrayList? I can't be calling arbitrary factory methods depending on T. There's also a problem of exactly what these array factory methods are supposed to do. Sure, we can blame the author if they choose to leak garbage data through the factory. But... what are they going to put in the array, if not garbage data? This is really more of a Bucket #2 solution, where there exists some reasonable default to fill the array with. > I think it would help if we had a clear sense as to what proportion of inline-types we think will have this "bad default" problem. Last year when we discussed null-default inline types the thinking was that about 75% of the motivation for null-defaults was migrating VBC, 20% for security, 5% for "I want null in my value set.". My assumption is that the vast majority of inline-types will not be migrated types, they will be new types. If this is correct then it would appear that the default value problem is really a problem for a minority of inline-types. My two cents: this is not about migrated vs. new types. This is about what's being modeled. A certain subset of inline classes will model some sort of numeric quantity with a natural "zero" value. Many others (I'd predict more than 50%, though it will depend a lot on how accommodating we are to these use cases) will represent non-numeric data without any "zero" analog. These will often wrap non-null references (strings, for example).
(Challenge: can we think of any use cases for inline classes that have a natural all-zeros default value *other than* a numeric zero, a singleton with no fields, or the equivalent of Optional.empty()? Maybe a collection of boolean flags? Once you've got references, it's pretty unusual to expect them to be null.) Within the subset that doesn't have a good default, it's often the case that the class has limited exposure, and some programmers might happily trade safety guarantees for performance, knowing they can trust all clients (or if there's a bug, they'll catch it in testing). So maybe they'll be fine with the all-zeros default story. But any class that belongs to a public API, or even that has significant non-public exposure, is going to want to be confident that it's operating on valid data. > I would argue that the costs should be limited to types that want to opt-in to not expose their default value or un-initialized value. Yes, agreed. Major demerits for any approach that imposes costs on programs that don't make use of no-default inline classes. > I think its important to decide if we want this kind of feature but also what we are willing to give up to get it. The right way to think about it is this: there exist many classes that don't need identity and also don't have natural defaults. We're not going to make those classes cease to exist. It's not a "yes or no" choice, it's a "what is the sanctioned approach?" choice. The "yes or no" framing leads to attempts to compare performance with or without checks. But the "which approach" choice means choosing between performance of: - An identity class - A class with hand-coded checks in methods - A class that automatically checks member accesses, like we do with null - A dynamic requirement that fields/arrays of a certain class type have to be initialized before they're read - Etc. 
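For readers who want something concrete, the second item in the list above ("a class with hand-coded checks in methods") can be roughly sketched in today's Java. This is only an illustration under stated assumptions: inline classes are not available, so an ordinary final class stands in for one, and a privately built all-null instance simulates the all-zeros default the VM would otherwise supply. The names `Name`, `ZERO`, and `requireInitialized` are hypothetical and do not come from the thread.

```java
import java.util.Objects;

// Hypothetical stand-in for an inline class that has no good default.
final class Name {
    // Simulates the all-zeros default instance (every field null). A real
    // inline class would get this value from the VM, not from a constructor.
    static final Name ZERO = new Name();

    private final String first;
    private final String last;

    // Only used to build the simulated default instance.
    private Name() {
        this.first = null;
        this.last = null;
    }

    Name(String first, String last) {
        this.first = Objects.requireNonNull(first, "first");
        this.last = Objects.requireNonNull(last, "last");
    }

    // Hand-coded guard. 'first' is never null in a properly constructed
    // instance, so checking that one field is enough to spot the default.
    private void requireInitialized() {
        if (first == null) {
            throw new NullPointerException("uninitialized Name");
        }
    }

    String display() {
        requireInitialized();
        return first + " " + last;
    }
}
```

Every method that reads state has to remember to call the guard, which is the boilerplate cost of this option; the "automatically checks member accesses" option in the same list would presumably move that check out of user code.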
From daniel.smith at oracle.com Tue Jul 28 22:48:01 2020 From: daniel.smith at oracle.com (Dan Smith) Date: Tue, 28 Jul 2020 16:48:01 -0600 Subject: EG meeting, 2020-07-29 Message-ID: <6BDD11CC-5E21-456F-B21A-7E493C627859@oracle.com> The next EG Zoom meeting is tomorrow, 4pm UTC (9am PDT, 12pm EDT). The only active topic in the mailing list is "Revisiting default values". We discussed it last time, and I'm not sure there's much new to add to the discussion right now. (I'm pursuing some internal explorations, not ready to report any conclusions yet.) So... I guess we can check in and see if there's interest in further discussion. If not, short meeting. From brian.goetz at oracle.com Fri Jul 31 19:41:19 2020 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 31 Jul 2020 15:41:19 -0400 Subject: Fwd: no good default issue In-Reply-To: References: Message-ID: <1439bf36-bf30-a561-9d70-8074f12dbbe5@oracle.com> Received in the -comments box. As far as I can tell what you're suggesting, it is that, when we detect a field is not initialized, we initialize it for you with some sort of default. But that brings us back to the main problem: what if the class _has no good default_? With what do we initialize it? A good example of such a class is a record like `inline record Person(String firstName, String lastName) { }`. If I construct a `new Person[10]`, what should the default factory stuff in there? There's no good default. Dan describes these as Bucket 2 and Bucket 3, where Bucket 2 are those that have a reasonable but nonzero default (e.g., an immutable List implementation might want to have an empty array, rather than a null), and Bucket 3 have no good default. It turns out that Bucket 3 is pretty big.
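To make the `new Person[10]` question a little more tangible, here is a small sketch in today's Java, assuming Java 16 or later for records. It uses an ordinary identity record rather than an inline one, so a fresh array holds nulls instead of all-zeros instances. The helper class `People` and its `filled` method are hypothetical; they only illustrate the "initialize every element at creation time" idea and are not an existing or proposed API.

```java
import java.util.Objects;
import java.util.function.IntFunction;

// Identity record standing in for the inline record in the message above.
record Person(String firstName, String lastName) {
    Person {
        // A valid Person never has null components, so there is no
        // meaningful all-zeros (all-null) Person to use as a default.
        Objects.requireNonNull(firstName, "firstName");
        Objects.requireNonNull(lastName, "lastName");
    }
}

// Hypothetical helper, not part of any JDK API.
final class People {
    private People() {}

    // Creates the array and initializes every element before returning it,
    // so callers never observe an unset slot.
    static Person[] filled(int length, IntFunction<Person> init) {
        Person[] result = new Person[length];
        for (int i = 0; i < length; i++) {
            result[i] = Objects.requireNonNull(init.apply(i), "element " + i);
        }
        return result;
    }
}
```

A call such as `People.filled(3, i -> new Person("Guest", "#" + i))` hands back a fully populated array, whereas a plain `new Person[10]` leaves every slot null. With a flattened inline array there would be no null to fall back on, which is exactly the gap the message describes.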
-------- Forwarded Message -------- Subject: no good default issue Date: Wed, 29 Jul 2020 01:02:02 -0400 From: Jack Ammo To: valhalla-spec-comments at openjdk.java.net feel free to disregard if this doesn't make sense, but i wonder if the definitely unassigned / definitely assigned rules coupled with an explicit opt in default could help the situation. if we know that a field has not been explicitly assigned, can there be a quick check if the class opts in to a default factory (a no arg constructor perhaps?) and then takes the slower path of calling that factory? or maybe upon accessing a definitely unassigned field, do the check and call the default factory? i don't know how expensive it would be to keep track of that, but if the opt in is a no arg constructor then there can be enough restrictions in place to only allow either throwing an exception or pass relevant defaults to the full all arg constructor. and if you're willing to go the default no arg constructor route, then i also have another idea for array initialization... what if you can define a default "array initializer" (it can sorta look like a constructor if you squint your eyes just right) except it takes in an array with definitely unassigned elements of the value type and must return an array with definitely assigned elements or throw an exception. and then that class opts in to a slower path for arrays created with the default value. just my 2 cents. -Jack From brian.goetz at oracle.com Fri Jul 31 20:04:34 2020 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 31 Jul 2020 16:04:34 -0400 Subject: Fwd: IdentityObject/InlineObject naming In-Reply-To: References: Message-ID: Received on the -comments list. If I understand it correctly, your claim is that a "plain" object is immutable/non-lockable but supports polymorphism (you need identity for mutability, otherwise you don't know what object you are mutating.)
`==` on plain objects would have to be state-based as with inlines today (there's no identity to compare.) Inline objects "subtract" from that by ruling out polymorphism; Identity objects would "add" to that to support mutability, reference-based equality, and synchronization. Essentially, this is a "split" over "lump" move; there are true values (inlines), polymorphic values, and identity objects. It's not crazy, and as you say there was a possible ordering of events that would more naturally build that tower, but that's not the world we have. (Actually, if we're splitting, we could split more than three ways; plenty of people would prefer something that is inlinable but mutable -- where mutation would be "promoted" to the innermost enclosing container with identity. You might call these "structs", which would be mutable but identity-free (they borrow an identity for mutation purposes.)) Would Java developers be well-served by splitting the world into three, rather than two, where identity is selected separately from polymorphism? I don't see it. FTR, the current thinking is that we need `IdentityObject` (because you want to be able to use it as an API type / bound) but don't really need `InlineObject`, which is more "inline" with what you are saying at the end: `class Foo implements IdentityObject` has a more additive feel. -------- Forwarded Message -------- Subject: IdentityObject/InlineObject naming Date: Wed, 20 May 2020 00:44:21 +0100 From: Stephen Colebourne To: valhalla-spec-comments at openjdk.java.net Some comments wrt IdentityObject/InlineObject naming. The current valhalla state is that "Inline classes have some restrictions compared to ordinary (identity) classes; they are final, their fields are final, and their ability to participate in inheritance is limited."
But these restrictions imply to me that there are three concepts a developer might want to express in the future: - an "identity" object (needs identity/synchronization for compatibility) - a "plain" object (no need for identity/synchronization) - an "inline" object (needs to be inline for performance and can accept the restrictions) While it is understandable that there is a reluctance to open the can of worms for "plain" objects, they are clearly a basic concept. ie. once you discuss the presence of identity, you naturally discuss the absence of it. Yet the concepts around inline are in many ways orthogonal. (Imagining a world where "plain" objects were added to Java, then 10 years afterwards "plain" objects would be the norm, and "identity" objects would be very rare). I mention the above, because it bears on the naming choices available. Even if you don't want to tackle plain objects now, the naming choice could make it harder to add them later. I also struggle with a single interface extends/implements hierarchy because developers can write "extends Object" today: - "class Foo" (default) => identity - "class Foo extends Object" => identity - "class Foo extends Object implements ObjectIdentity" => identity - "class Foo extends Object implements ObjectInline" => identity and inline ??? With a single new interface, the hierarchy is implying that "implements ObjectInline" *subtracts* identity, something that object hierarchies do not do. Keeping an ObjectInline interface appears better placed to tackle this particular pedagogical problem, as it seems more explainable that: - a class always extends Object - a class always implements either ObjectIdentity or ObjectInline - the defaults are "extends Object" and "implements ObjectIdentity" ie. this makes more sense because manually writing "extends Object" only impacts bullet point 1, not bullet points 2 and 3 Some other possible names: - ObjectIdentifiable / ObjectInlineable - JvmIdentifiable / JvmInlineable Stephen