From daniel.smith at oracle.com  Tue Mar  2 20:57:14 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Tue, 2 Mar 2021 20:57:14 +0000
Subject: Extended project meetings, March 23-24
Message-ID:

Valhalla developers and experts are invited to attend an extended Zoom project meeting, Tuesday, March 23, and Wednesday, March 24. Please reserve some time on your calendars.

We'll meet for 3-4 hours each day, starting at 3p UTC (8a PDT, 4p CET). Here's a time zone converter:

https://www.timeanddate.com/worldclock/converter.html?iso=20210323T150000&p1=1440&p2=224&p3=43&p4=78&p5=239

(I'll also create a calendar invite, starting with the EG invite list and some internal Oracle folks. If you'd like to be added, let me know.)

This session will be focused on the design of generics, in both the Java language and the JVM. If it works well, we may plan future meetings to discuss other topics.

We'll have some prepared slides and unstructured discussions. I'm exploring options for text and whiteboard sharing, to try to get us closer to an "in the same room" experience.

Looking forward to seeing you there!

From daniel.smith at oracle.com  Wed Mar 10 16:12:46 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 10 Mar 2021 16:12:46 +0000
Subject: EG meeting, 2021-03-10
Message-ID: <93E6A2FD-7BBE-40E4-ADEC-034344206AED@oracle.com>

The next EG Zoom meeting is today at 5pm UTC (9am PST, 12pm EST).

No new email threads, but John would like to spend some time talking about the specialization design.

From forax at univ-mlv.fr  Wed Mar 10 16:34:07 2021
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 10 Mar 2021 17:34:07 +0100 (CET)
Subject: EG meeting, 2021-03-10
In-Reply-To: <93E6A2FD-7BBE-40E4-ADEC-034344206AED@oracle.com>
References: <93E6A2FD-7BBE-40E4-ADEC-034344206AED@oracle.com>
Message-ID: <1444519311.2108440.1615394047575.JavaMail.zimbra@u-pem.fr>

cool!
Rémi

----- Original Message -----
> From: "daniel smith"
> To: "valhalla-spec-experts"
> Sent: Wednesday, March 10, 2021 17:12:46
> Subject: EG meeting, 2021-03-10

> The next EG Zoom meeting is today at 5pm UTC (9am PST, 12pm EST).
>
> No new email threads, but John would like to spend some time talking about the
> specialization design.

From forax at univ-mlv.fr  Wed Mar 10 21:59:49 2021
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 10 Mar 2021 22:59:49 +0100 (CET)
Subject: Parametric-vm spec / unused CONSTANT_Parameter is illegal
Message-ID: <1066610625.2321445.1615413589101.JavaMail.zimbra@u-pem.fr>

Hi all,
slowly reading the Parametric-vm spec.

With my ASM hat,
"As a structural constraint, it is illegal for a CONSTANT_Parameter constant to be unused."

This is different from all other CONSTANTs and doesn't work well with the idea that you can patch a classfile by copying the existing constant pool and replacing only the method(s) you want, because if I replace a parametric method with a non-parametric one, the existing CONSTANT_Parameter will stay in the constant pool with no reference to it anymore.

Being able to patch a classfile like this is very important in terms of speed for some transformers / agents.

If the VM is able to find that a CONSTANT_Parameter is unused, instead of throwing an error, why not ignore it?

Rémi

From john.r.rose at oracle.com  Wed Mar 10 22:25:09 2021
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 10 Mar 2021 14:25:09 -0800
Subject: Parametric-vm spec / unused CONSTANT_Parameter is illegal
In-Reply-To: <1066610625.2321445.1615413589101.JavaMail.zimbra@u-pem.fr>
References: <1066610625.2321445.1615413589101.JavaMail.zimbra@u-pem.fr>
Message-ID:

On Mar 10, 2021, at 1:59 PM, Remi Forax wrote:
>
> Hi all,
> slowly reading the Parametric-vm spec.
>
> With my ASM hat,
> "As a structural constraint, it is illegal for a CONSTANT_Parameter constant to be unused."
>
> This is different from all other CONSTANTs and doesn't work well with the idea that you can patch a classfile by copying the existing constant pool and replacing only the method(s) you want, because if I replace a parametric method with a non-parametric one, the existing CONSTANT_Parameter will stay in the constant pool with no reference to it anymore.
>
> Being able to patch a classfile like this is very important in terms of speed for some transformers / agents.
>
> If the VM is able to find that a CONSTANT_Parameter is unused, instead of throwing an error, why not ignore it?

That's possible. I agree that this makes parametric constants unusual, in a way that doesn't buy much.

Here's my motivation: The presence of a C_Param forces the JVM to do more work than if it were absent, work that is non-local (a dependency analysis over the CP). Some of that analysis is likely to create detailed CP metadata describing the parametricity of each constant. Having parts of that metadata be unused feels like it could breed bugs.

But, it's just a feeling. I'm not against your request, and maybe I'll write it into the next draft.

(I have a vague memory that there was another reason that occurred to me during prototyping the CP analysis, but it's been a while now.)

-- John

From forax at univ-mlv.fr  Wed Mar 10 23:51:59 2021
From: forax at univ-mlv.fr (Remi Forax)
Date: Thu, 11 Mar 2021 00:51:59 +0100 (CET)
Subject: parametric-vm / section other upcalls
Message-ID: <949348225.3224.1615420319610.JavaMail.zimbra@u-pem.fr>

In the sub-section "other upcalls", for type-testing, the VM does an upcall to the method "isAssignableFrom" of the Species, but it's not clear to me how a user can create such a "Species", given that the way to configure a species seems to be to return a properly configured ParameterBinding.

It looks like a vestige of an old behavior, right?
Rémi

From forax at univ-mlv.fr  Fri Mar 12 12:49:22 2021
From: forax at univ-mlv.fr (Remi Forax)
Date: Fri, 12 Mar 2021 13:49:22 +0100 (CET)
Subject: parametric-vm and stacktrace
Message-ID: <347147873.1058112.1615553362549.JavaMail.zimbra@u-pem.fr>

I wonder if the ParameterBinding should have some kind of name, or at least a textual representation, so you can see it in stack traces; it would be really helpful for debugging.

Rémi

From brian.goetz at oracle.com  Mon Mar 15 15:52:17 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 15 Mar 2021 11:52:17 -0400
Subject: Revisiting default values
In-Reply-To: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com>
References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com>
Message-ID:

Picking this issue up again. To summarize Dan's buckets:

Bucket 1 -- the zero default is in the domain, and is a sensible default value. Zero for numerics, empty optionals.

Bucket 2 -- there is a sensible default value, but all-zero-bits isn't it.

Bucket 3 -- there simply is no sensible default value.

Ultimately, though, this is not about defaults; it is about _uninitialized variables_. The default only comes into play when the user uses an uninitialized variable, which usually means (a) uninitialized fields or (b) uninitialized array elements. It is possible that the language could give us seat belts to dramatically narrow the chance of uninitialized fields, but uninitialized array elements are much harder to stamp out.

It is an attractive distraction to get caught up in designing mechanisms for supplying an alternate default ("just let the user declare a no-arg constructor"), but this is focusing on the "writing code" part of the problem, not the "keeping code safe" part of the problem.

In some sense, it is the existence (and size) of Bucket 1 that causes the problem; Bucket 1 is what gives us our sense that it is safe to use uninitialized variables.
In the current language, uninitialized reference variables are also safe in that if you use them before they are initialized, you get an exception before anything bad can happen. Uninitialized primitives in today's language are more dangerous, because we may interpret the uninitialized value, but this has been a problem we've been able to live with because today's primitives are pretty limited and zero is usually a good-enough default in most domains. As we extend primitives to look more like objects, with behavior, this gets harder.

Both buckets 2 and 3 can be remediated without help from the language or VM, perhaps inconveniently, by careful coding on the part of the author of the primitive class:

 - don't expose fields to users (a good practice anyway)
 - check for zero on entry to each method

These are options A and E. The difference between Buckets 2 (A) and 3 (E) in this model is what we do when we find a zero; for bucket 2, we substitute some pre-baked value and use that, and for bucket 3, we throw something (what we throw is a separate discussion). The various remediation techniques Dan offers represent a menu which allows us to trade off reliability/cost/intrusiveness.

I think we should lean on the model currently implemented by reference types, where _accessing_ an uninitialized field is OK, but _using_ the value in the field is not. If we have:

    String s;

All of the following are fine:

    String t = s;
    if (s == null) { ... }
    if (s == t) { ... }

The thing that is not fine is s-dot-something. These are the E/F/G options, not the H/I options.

Secondarily, H/I, which attempt to hide the default, create another problem down the road: when we get to specialized generics, `T.default` would become partial.

Some of the solutions for Bucket 3 generalize well enough to Bucket 2 that we might consider merging them (though there are still messy details). Option F, for example, injects code at the top of each method body:

    int m() {
        if (this == <zero value>)
            throw new NullPointerException();
        /* body of m */
    }

A corresponding feature for Bucket 2 might inject slightly different code:

    int m() {
        if (this == <zero value>)
            return <pre-baked value>.m();
        /* body of m */
    }

Another thing that has evolved since we started this discussion is recognizing the difference between .val and .ref projections. Imagine you could declare your membership in bucket 3:

    __bucket_3 primitive class NGD { ... }

If, in addition to some way of generating an NPE on dereference (F, G, etc), we mucked with the conversion of NGD.val to NGD.ref (which the compiler can inject code on), we could actually put a null on top of the stack. Then, code like:

    if (ngd == null) { ... }

would actually work, because to do the comparison, we'd first promote ngd to a reference type (null is already a reference), and we'd compare two nulls.

On 7/10/2020 2:23 PM, Dan Smith wrote:
> Brian pointed out that my list of candidate inline classes in the Identity Warnings JEP (JDK-8249100) includes a number of classes that, despite being "value-based classes" and disavowing their identity, might not end up as inline classes. The problem? Default values.
>
> This might be a good time to revisit the open design issues surrounding default values and see if we can make some progress.
>
> Background/status quo: every inline class has a default instance, which provides the initial value of fields and array components that have the inline type (e.g., in 'new Point[10]'). It's also the prototype instance used to create all other instances (start with 'vdefault', then apply 'withfield' as needed). The default value is, by fiat, the class instance produced by setting all fields to *their* default values. Often, but not always, this means field/array initialization amounts to setting all the bits to 0. Importantly, no user code is involved in creating a default instance.
>
> Real code is always useful for grounding design discussions, so let's start there. Among the classes I listed as inline class candidates, we can put them in three buckets:
>
> Bucket #1: Have a reasonable default, as declared.
> - wrapper classes (the primitive zeros)
> - Optional & friends (empty)
> - From java.time: Instant (start of 1970-01-01), LocalTime (midnight), Duration (0s), Period (0d), Year (1 BC, if that's acceptable)
>
> Bucket #2: Could have a reasonable default after re-interpreting fields.
> - From java.time: LocalDate, YearMonth, MonthDay, LocalDateTime, ZonedDateTime, OffsetTime, OffsetDateTime, ZoneOffset, ZoneRegion, MinguoDate, HijrahDate, JapaneseDate, ThaiBuddhistDate (months and days should be nonzero; null Strings, ZoneIds, HijrahChronologies, and JapaneseEras require special handling)
> - ListN, SetN, MapN (null array interpreted as empty)
>
> Bucket #3: No good default.
> - Runtime.Version (need a non-null List)
> - ProcessHandleImpl (need a valid process ID)
> - List12, Set12, Map1 (need a non-null value)
> - All ConstantDesc implementations (need real class & method names, etc.)
>
> There's some subjectivity between the 2nd and 3rd buckets, but the idea behind the 2nd is that, with some translation layer between physical fields and interpretation of those fields, we can come up with an intuitive default (e.g., "0 means January"; "a null String means time zone 'UTC'"). In contrast, in the third bucket, any attempt to define a default value is going to be pretty unintuitive ("A null method name means 'toString'").
>
> The question here is how much work the JVM and language are willing to do, or how much work we're willing to ask clients to do, in order to support use cases that don't fall into Bucket #1.
>
> I don't think totally excluding Buckets #2 and #3 is a very good outcome.
> It means that, in many cases, inline classes need to be built up exclusively from primitives or other inline types, because if you use reference types, your default value will have a null field. (Sometimes, as in Optional, null fields have straightforward interpretations, but most of the time programs are designed to prevent them.)
>
> Whether we support Bucket #2 but not Bucket #3 is a harder question. It wouldn't be so bad if none of the examples above in Bucket #3 become inline classes -- for the most part they're handled via interfaces, anyway. (Counterpoint: inline class instances that are immediately typed with interface types still potentially provide a performance boost.) But I'm also not sure this is representative. We've noted before that many use cases, like database records or data structure cursors, don't have meaningful defaults (what's a default mailing address?). The ConstantDesc classes really illustrate this, even though they happen to not be public.
>
> Another observation is that if we support Bucket #3 but not Bucket #2, that's probably not a big deal -- I'm not sure anybody really *wants* to deal with the default instance; it's just the price you pay for being an inline class. If there's a way to opt out of that extra weirdness and move from Bucket #2 to Bucket #3, great.
>
> With that discussion in mind, here are some summaries of approaches we've considered, or that I think we ought to consider, for supporting buckets #2 and #3. (This is as best as I recall. If there's something I've missed, add it to the list!)
>
> [Weighing in for myself: my current preference is to do one of F, G, or I. I'm not that interested in supporting Bucket #2, for reasons given above, although Option A works for programmers who really want it.]
>
>
>
> === Solutions to support Bucket #2 ===
>
> Two broad strategies here: re-interpreting fields (A, B), and re-interpreting the default instance (C, D).
>
> ---
>
> Option A: Encourage programmers to re-interpret fields
>
> Guidance to programmers: when you declare an inline class, identify any fields for which the default instance should hold something other than zero/null; define a mapping for your implementation from zero/null to the value you want.
>
> One way to do this is to define a (possibly private) getter for each field, and include logic like 'return month + 1' or 'return id == null ? "UTC" : id'. Or maybe you inline that logic, as long as you're careful to do so everywhere. Importantly, you also need to reverse the logic in your constructor -- for the sake of '==', if somebody manually creates the default instance, you should set fields to zero/null.
>
> This doesn't work if you want public fields, but that's life as an OO programmer.
>
> In this approach, it would be important that inline classes be expected to document their default instance in Javadoc (perhaps with a new Javadoc tag) -- the interpretation of the default instance is less apparent to users than "all zeros".
>
> Limitations:
>
> - It's a fairly error-prone approach. Programmers will absolutely forget to apply the mapping in one place, and everything will be fine until somebody tries to invoke a particular method on the default instance. Put that bug in a security-sensitive context, and maybe you have an exploit. (Something that could help some is choosing good names -- call your field 'monthIndex', not plain 'month', to remind yourself that it's zero-based.)
>
> - Performance impact of an extra layer of computation on all field accesses. Probably not a big deal in general, but all those null checks, etc., could have a negative impact in certain contexts. And the *appearance* of extra cost might scare programmers away from doing the right thing ("eh, I probably won't use the default value anyway, I'll just ignore it to make my code faster").
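[Editorial sketch, not part of the original message.] The field re-interpretation described in Option A might look roughly like the following. This is a minimal sketch under stated assumptions: inline classes are not available in released Java, so an ordinary final class stands in for one; the class name MonthInZone, the DEFAULT constant (which simulates the all-zero default instance), and the use of equals() in place of '==' on inline values are all hypothetical.

```java
import java.util.Objects;

// Hypothetical stand-in for an inline class; an ordinary final class is used
// because inline classes are not part of released Java.
final class MonthInZone {
    // Physical representation: all-zero/null fields form the default instance.
    private final int monthIndex;   // zero-based, so 0 is read as January
    private final String zoneId;    // null is read as "UTC"

    // Simulates the all-zero default instance the JVM would provide.
    static final MonthInZone DEFAULT = new MonthInZone();

    private MonthInZone() {
        this.monthIndex = 0;
        this.zoneId = null;
    }

    MonthInZone(int month, String zone) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month out of range: " + month);
        }
        Objects.requireNonNull(zone, "zone");
        // Reverse the mapping, so that a manually built "January in UTC"
        // has the same physical fields as the default instance.
        this.monthIndex = month - 1;
        this.zoneId = zone.equals("UTC") ? null : zone;
    }

    // Getters apply the mapping from physical fields to logical values.
    int month() { return monthIndex + 1; }
    String zone() { return zoneId == null ? "UTC" : zoneId; }

    // Field-wise equality, standing in for '==' on inline class instances.
    @Override
    public boolean equals(Object o) {
        return o instanceof MonthInZone
                && ((MonthInZone) o).monthIndex == monthIndex
                && Objects.equals(((MonthInZone) o).zoneId, zoneId);
    }

    @Override
    public int hashCode() { return 31 * monthIndex + Objects.hashCode(zoneId); }

    public static void main(String[] args) {
        System.out.println(DEFAULT.month() + " " + DEFAULT.zone());
        System.out.println(new MonthInZone(1, "UTC").equals(DEFAULT));
    }
}
```

Naming the field 'monthIndex' follows the suggestion in the limitations above; forgetting the '+ 1' in a single accessor is exactly the kind of bug the message warns about.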
>
> ---
>
> Option B: Language support for field re-interpretation
>
> The language allows inline classes to declare fields with mappings to/from an internal representation. Just like Option A, but with guarantees that the internal representation isn't inappropriately accessed directly.
>
> This pulls on a thread we explored a bit for Amber a while back, some form of "abstract fields" or "virtual fields". Maybe there's something there, but it seems like a general-purpose feature, and one we're not likely to reach a final solution on anytime soon.
>
> ---
>
> Option C: Language support for a designated default
>
> The language provides some way for programmers to declare the "logical" default instance (something like a special static field). The compiler inserts a test for the "physical" default on any field/array access, and replaces it with the logical default.
>
> That is:
>
> Point p = points[3];
>
> compiles to
>
> Point p$0 = points[3];
> Point p = (p$0 == [vdefault Point]) ? Point.DEFAULT : p$0;
>
> This is much less bug-prone than Option A -- the compiler does all the work -- and much more achievable in the short/medium term than Option B.
>
> Compared to Option B, this pushes the computation overhead from inline class field accesses to reads of the inline type from fields/arrays. I don't know if that's good or bad -- maybe a wash, heavily dependent on the use case.
>
> A few big problems:
>
> - The physical default still exists, and malicious bytecode can use it. If programmers want strong guarantees, they'll have to check and throw wherever an untrusted instance is provided. (Clients with access to the inline class's fields have to do so, too.)
>
> - Covariant arrays mean every read from any array type that might be flattened (Object[], Runnable[], ConstantDesc[], ...) has to go through translation logic.
>
> - There's an assumption here that the programmer doesn't intend to use the physical default as a valid non-default instance.
> That's hard for the compiler to enforce, and weird stuff happens in fields/arrays if the programmer doesn't prevent it. (Could be mitigated with extra implicit logic on field/array writes or in constructors.)
>
> ---
>
> Option D: JVM support for a designated default
>
> The VM allows inline classes to designate a logical default instance, and the field/array access instructions map from the physical default to the logical default. The 'vdefault' instruction produces the logical default instance; something else is used by the class's factories to build from the physical default.
>
> This addresses the first two problems with Option C -- the VM gives strong guarantees, and can make the translation a virtual operation of certain arrays.
>
> To address the third problem, it seems like we'd need the more complex logic I hinted at: on writes, map the physical default to the logical default, and map the logical default to the physical default. Do the reverse on reads.
>
> The problem here is bytecode complexity/slowdowns. We've already added some complexity to 'aaload'/'aastore' (covariant flattened arrays), and anticipate similar changes to 'putfield'/'getfield' (specialized fields), so maybe that means we might as well do more. Or maybe it means we're already over budget. :-)
>
> From the users' perspective, if any performance reduction on reads/writes can be limited to the inline classes in Bucket #2, *all* the options have a similar cost, whether imposed by the programmer, language, or VM. So, to a first approximation, slower opcode execution is fine.
>
>
>
> === Solutions to support Bucket #3 ===
>
> Two broad strategies here: rejecting member accesses on the default instance (E, F, G), and preventing programs from ever seeing the default instance (H, I).
>
> ---
>
> Option E: Encourage programmers to guard against default instances
>
> Guidance to programmers: if you don't like your class's default instance, check for it in your methods and throw.
> Maybe Java SE defines a new RuntimeException to encourage this.
>
> The simple way to do this is with some boilerplate at the start of all your methods:
>
> if (this == MyClass.default) throw new InvalidDefaultException();
>
> More permissive classes could just do some validation on the fields that are relevant to a particular operation. (E.g., 'getMonth' doesn't care if 'zoneId' is null.)
>
> This doesn't work if you want public fields, but that's life as an OO programmer.
>
> It's not ideal that an invalid instance can float around a program until somebody trips on one of these checks, rather than detecting the invalid value earlier -- we're propagating the NPE problem. And it takes some getting used to that there are two null-like values in the reference type's domain.
>
> ---
>
> Option F: Language support for default instance guards
>
> An inline class declaration can indicate that the default instance is invalid. The compiler generates guards, as in Option E, at the start of all instance method bodies, and perhaps on all field accesses outside of those methods.
>
> Programmers give up finer-grained control, but get more safety. I'm sure most would be happy with that trade.
>
> Improper/separately-compiled bytecode can skip the field access checks, but that's a minor concern.
>
> Same issues as Option E regarding adding a "new NPE" to the platform.
>
> ---
>
> Option G: JVM support for default instance guards
>
> Inline class files can indicate that their default instance is invalid. All attempts to operate on that instance (via field/method accesses, other than 'withfield') result in an exception.
>
> This tightens up Option F, making it just as impossible to access members of the default instance as it is to access members of 'null'.
>
> Same issues as Option E regarding adding a "new NPE" to the platform.
>
> ---
>
> Option H: Language checks on field/array reads
>
> An inline class declaration can indicate that the default instance is invalid.
> Every field and array access that may involve an uninitialized field/array component of that inline type gets augmented with a check that rejects reads of the default value (treating it as "you forgot to initialize this variable").
>
> That is:
>
> Point p = points[3];
>
> compiles to
>
> Point p$0 = points[3];
> if (p$0 == [vdefault Point]) throw new UninitializedVariableException();
> Point p = p$0;
>
> This is much like Option C, and has roughly the same advantages/problems. There's not a strong guarantee that the default value won't pop up from untrusted bytecode (or unreliable inline class authors), and lots of array types need guards.
>
> ---
>
> Option I: JVM checks on field/array reads
>
> Inline class files can indicate that their default instance is invalid. When reading from a field/array component of the inline type ('getfield'/'getstatic'/'aaload'), an exception is thrown if the default value is found (treating it as "you forgot to initialize this variable"). The 'vdefault' instruction, like 'withfield', is illegal outside of the inline class's nest.
>
> Better than Option H in that it can be optimized to occur on only certain reads, and in that it provides strong guarantees -- only the inline class can ever "see" the default instance.
>
> Well, unless the inline class chooses to share that instance with the world. Not sure how we prevent that. But maybe at that point, anything bad/weird that happens is the author's own fault. (E.g., putting the default value in an array will make that component effectively "uninitialized" again.)
>
> Like Option D, there's a question of whether we're willing to add this complexity to the 'getfield'/'getstatic'/'aaload' instructions. My sense is that at least it's less complexity than you have in Option D.
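[Editorial sketch, not part of the original message.] The hand-written guard from Option E, which Option F would have the compiler generate, could be sketched as below. The assumptions are loud ones: an ordinary final class stands in for an inline class, ProcessRef is a hypothetical name, DEFAULT simulates the all-zero default instance, and InvalidDefaultException is the hypothetical exception named in the message, declared locally here since no such class exists in Java SE.

```java
// Hypothetical exception from Option E; not a real Java SE class.
class InvalidDefaultException extends RuntimeException {
    InvalidDefaultException(String message) { super(message); }
}

// Hypothetical Bucket #3 class: pid 0 is the physical default and has no
// sensible meaning, so member accesses on it are rejected.
final class ProcessRef {
    private final long pid;

    // Simulates the all-zero default instance (e.g. an untouched array slot).
    static final ProcessRef DEFAULT = new ProcessRef();

    private ProcessRef() { this.pid = 0; }

    ProcessRef(long pid) {
        if (pid <= 0) {
            throw new IllegalArgumentException("pid must be positive: " + pid);
        }
        this.pid = pid;
    }

    // The boilerplate guard that every instance method starts with.
    private void rejectDefault() {
        if (pid == 0) {
            throw new InvalidDefaultException("ProcessRef was never initialized");
        }
    }

    long pid() {
        rejectDefault();
        return pid;
    }

    String describe() {
        rejectDefault();
        return "process " + pid;
    }

    public static void main(String[] args) {
        System.out.println(new ProcessRef(42).describe());
        try {
            DEFAULT.describe();
        } catch (InvalidDefaultException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

As the message notes, the default instance can still be copied and passed around freely; only the first member access trips the guard.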
> From brian.goetz at oracle.com Wed Mar 17 15:14:26 2021 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 17 Mar 2021 11:14:26 -0400 Subject: Revisiting default values In-Reply-To: References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com> Message-ID: <741e0451-0b6b-05a7-5fae-f56e5c312092@oracle.com> Let me propose another strategy for Bucket 3.? It could be implemented at either the VM or language level, but the latter probably needs some help from the VM anyway.? The idea is that the default value is _indistinguishable from null_.? Strawman: ?- Classes can be marked as default-hostile (e.g., `primitive class X implements NoGoodDefault`); ?- Prior to dereferencing a default-hostile class, a check is made against the default value, and an NPE is thrown if it is the default value; ?- When widening to a reference type, a check is made if it is the default value, and if so, is converted to null; ?- When narrowing from a reference type, a check is made for null, and if so, converted to the default value; ?- It is allowable to compare `x == null`, which is intepreted as "widen x to X.ref, and compare"; - (optional) the interface NoGoodDefault could have a method that optimizes the check, such as by using a pivot field, or the language/VM could try to automatically pick a pivot field. Classes which opt for NoGoodDefault will be slower than those that do not due to the check, but they will flatten. Essentially, this lets authors choose between "zero means default" and "zero means null", at some cost. A risk here is that ignorant users who don't understand the tradeoffs will say "oh, great, there's my nullable primitive types", overuse them, and then say "primitive types are slow, java sucks."? The goal here would be to provide _safety_ for primitive types for which the default is dangerous. On 3/15/2021 11:52 AM, Brian Goetz wrote: > Picking this issue up again.? 
To summarize Dan's buckets: > > Bucket 1 -- the zero default is in the domain, and is a sensible > default value.? Zero for numerics, empty optionals. > > Bucket 2 -- there is a sensible default value, but all-zero-bits isn't > it. > > Bucket 3 -- there simply is no sensible default value. > > > Ultimately, though, this is not about defaults; it is about > _uninitialized variables_.? The default only comes into play when the > user uses an uninitialized variable, which usually means (a) > uninitialized fields or (b) uninitialized array elements.? It is > possible that the language could give us seat belts to dramatically > narrow the chance of uninitialized fields, but uninitialized array > elements are much harder to stamp out. > > It is an attractive distraction to get caught up in designing > mechanisms for supplying an alternate default ("just let the user > declare a no-arg constructor"), but this is focusing on the "writing > code" part of the problem, not the "keeping code safe" part of the > problem. > > In some sense, it is the existence (and size) of Bucket 1 that causes > the problem; Bucket 1 is what gives us our sense that it is safe to > use uninitialized variables.? In the current language, uninitialized > reference variables are also safe in that if you use them before they > are initialized, you get an exception before anything bad can happen.? > Uninitialized primitives in today's language are more dangerous, > because we may interpret the uninitialized value, but this has been a > problem we've been able to live with because today's primitives are > pretty limited and zero is usually a good-enough default in most > domains.? As we extend primitives to look more like objects, with > behavior, this gets harder. 
> > > Both buckets 2 and 3 can be remediated without help from the language > or VM, perhaps inconveniently, by careful coding on the part of the > author of the primitive class: > > ?- don't expose fields to users (a good practice anyway) > ?- check for zero on entry to each method > > These are options A and E.? The difference between Buckets 2 (A) and 3 > (E) in this model is what do we do when we find a zero; for bucket 2, > we substitute some pre-baked value and use that, and for bucket 3, we > throw something (what we throw is a separate discussion.)? The various > remediation techniques Dan offers represents a menu which allows us to > trade off reliability/cost/intrusiveness. > > I think we should lean on the model currently implemented by reference > types, where _accessing_ an uninitialized field is OK, but _using_ the > value in the field is not. If we have: > > ??? String s; > > All of the following are fine: > > ??? String t = s; > ??? if (s == null) { ... } > ??? if (s == t) { ... } > > The thing that is not fine is s-dot-something.? These are the E/F/G > options, not the H/I options. > > Secondarily, H/I, which attempt to hide the default, create another > problem down the road: when we get to specialized generics, > `T.default` would become partial. > > Some of the solutions for Bucket 3 generalize well enough to Bucket 2 > that we might consider merging them (though there are still messy > details).? Option F, for example, injects code at the top of each > method body: > > ??? int m() { > ? ?? ?? if (this == ) > ??????????? throw new NullPointerException(); > ??????? /* body of m */ > ??? } > > into the top of each method; a corresponding feature for Bucket 2 > might inject slightly different code: > > ??? int m() { > ? ?? ?? if (this == ) > ??????????? return .m(); > ??????? /* body of m */ > ??? } > > > Another thing that has evolved since we started this discussion is > recognizing the difference between .val and .ref projections.? 
Imagine > you could declare your membership in bucket 3: > > ??? __bucket_3 primitive class NGD { ... } > > If, in addition to some way of generating an NPE on dereference (F, G, > etc), we mucked with the conversion of NGD.val to NGD.ref (which the > compiler can inject code on), we could actually put a null on top of > the stack.? Then, code like: > > ??? if (ngd == null) { ... } > > would actually work, because to do the comparison, we'd first promote > ngd to a reference type (null is already a reference), and we'd > compare two nulls. > > > > On 7/10/2020 2:23 PM, Dan Smith wrote: >> Brian pointed out that my list of candidate inline classes in the Identity Warnings JEP (JDK-8249100) includes a number of classes that, despite being "value-based classes" and disavowing their identity, might not end up as inline classes. The problem? Default values. >> >> This might be a good time to revisit the open design issues surrounding default values and see if we can make some progress. >> >> Background/status quo: every inline class has a default instance, which provides the initial value of fields and array components that have the inline type (e.g., in 'new Point[10]'). It's also the prototype instance used to create all other instances (start with 'vdefault', then apply 'withfield' as needed). The default value is, by fiat, the class instance produced by setting all fields to *their* default values. Often, but not always, this means field/array initialization amounts to setting all the bits to 0. Importantly, no user code is involved in creating a default instance. >> >> Real code is always useful for grounding design discussions, so let's start there. Among the classes I listed as inline class candidates, we can put them in three buckets: >> >> Bucket #1: Have a reasonable default, as declared. 
>> - wrapper classes (the primitive zeros) >> - Optional & friends (empty) >> - From java.time: Instant (start of 1970-01-01), LocalTime (midnight), Duration (0s), Period (0d), Year (1 BC, if that's acceptable) >> >> Bucket #2: Could have a reasonable default after re-interpreting fields. >> - From java.time: LocalDate, YearMonth, MonthDay, LocalDateTime, ZonedDateTime, OffsetTime, OffsetDateTime, ZoneOffset, ZoneRegion, MinguoDate, HijrahDate, JapaneseDate, ThaiBuddhistDate (months and days should be nonzero; null Strings, ZoneIds, HijrahChronologies, and JapaneseEras require special handling) >> - ListN, SetN, MapN (null array interpreted as empty) >> >> Bucket #3: No good default. >> - Runtime.Version (need a non-null List) >> - ProcessHandleImpl (need a valid process ID) >> - List12, Set12, Map1 (need a non-null value) >> - All ConstantDesc implementations (need real class & method names, etc.) >> >> There's some subjectivity between the 2nd and 3rd buckets, but the idea behind the 2nd is that, with some translation layer between physical fields and interpretation of those fields, we can come up with an intuitive default (e.g., "0 means January"; "a null String means time zone 'UTC'"). In contrast, in the third bucket, any attempt to define a default value is going to be pretty unintuitive ("A null method name means 'toString'"). >> >> The question here is how much work the JVM and language are willing to do, or how much work we're willing to ask clients to do, in order to support use cases that don't fall into Bucket #1. >> >> I don't think totally excluding Buckets #2 and #3 is a very good outcome. It means that, in many cases, inline classes need to be built up exclusively from primitives or other inline types, because if you use reference types, your default value will have a null field. (Sometimes, as in Optional, null fields have straightforward interpretations, but most of the time programs are designed to prevent them.) 
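As a rough illustration of the first two buckets, here is a sketch that uses only today's java.time API rather than inline classes. It assumes that the "all fields zero" state of the Bucket #1 classes corresponds to the factory calls shown, which is an assumption about their current field layouts. The class name `BucketDemo` is made up for this sketch.

```java
import java.time.DateTimeException;
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.Period;
import java.time.Year;
import java.time.temporal.ChronoField;

// Hypothetical demo class; it only calls existing java.time factories.
final class BucketDemo {

    // Bucket #1: the zero state is already an ordinary, meaningful value.
    static boolean bucketOneZerosAreValid() {
        return Instant.ofEpochSecond(0).equals(Instant.EPOCH)   // start of 1970-01-01
            && LocalTime.of(0, 0).equals(LocalTime.MIDNIGHT)    // midnight
            && Duration.ofSeconds(0).equals(Duration.ZERO)      // 0s
            && Period.of(0, 0, 0).equals(Period.ZERO)           // 0d
            && Year.of(0).get(ChronoField.YEAR_OF_ERA) == 1     // proleptic year 0 ...
            && Year.of(0).get(ChronoField.ERA) == 0;            // ... is year 1 of the BCE era
    }

    // Bucket #2: a zero month or day is outside the valid range.
    static boolean bucketTwoRejectsZeros() {
        try {
            LocalDate.of(0, 0, 0);
            return false;
        } catch (DateTimeException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(bucketOneZerosAreValid());
        System.out.println(bucketTwoRejectsZeros());
    }
}
```

The second method is why LocalDate lands in Bucket #2: its zero state is not a value the public factories accept, so some re-interpretation of the stored fields would be needed.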
>> >> Whether we support Bucket #2 but not Bucket #3 is a harder question. It wouldn't be so bad if none of the examples above in Bucket #3 become inline classes -- for the most part they're handled via interfaces, anyway. (Counterpoint: inline class instances that are immediately typed with interface types still potentially provide a performance boost.) But I'm also not sure this is representative. We've noted before that many use cases, like database records or data structure cursors, don't have meaningful defaults (what's a default mailing address?). The ConstantDesc classes really illustrate this, even though they happen to not be public. >> >> Another observation is that if we support Bucket #3 but not Bucket #2, that's probably not a big deal -- I'm not sure anybody really *wants* to deal with the default instance; it's just the price you pay for being an inline class. If there's a way to opt out of that extra weirdness and move from Bucket #2 to Bucket #3, great. >> >> With that discussion in mind, here are some summaries of approaches we've considered, or that I think we ought to consider, for supporting buckets #2 and #3. (This is as best as I recall. If there's something I've missed, add it to the list!) >> >> [Weighing in for myself: my current preference is to do one of F, G, or I. I'm not that interested in supporting Bucket #2, for reasons given above, although Option A works for programmers who really want it.] >> >> >> >> === Solutions to support Bucket #2 === >> >> Two broad strategies here: re-interpreting fields (A, B), and re-interpreting the default instance (C, D). >> >> --- >> >> Option A: Encourage programmers to re-interpret fields >> >> Guidance to programmers: when you declare an inline class, identify any fields for which the default instance should hold something other than zero/null; define a mapping for your implementation from zero/null to the value you want.
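The mapping that Option A asks for could look roughly like the sketch below. It is written as an ordinary final class, since inline classes are not available in released Java. The class `MonthAndZone`, its field names, and the `zeroDefault()` factory (which stands in for the VM-created default instance) are all hypothetical.

```java
import java.util.Objects;

// Hypothetical class illustrating Option A with ordinary Java.
final class MonthAndZone {
    private final int monthIndex;  // zero-based, so the zero default means January
    private final String zoneId;   // null is interpreted as "UTC"

    // Stand-in for the all-zero default instance (assumption: the VM would
    // create this without running user code).
    private MonthAndZone() {
        this.monthIndex = 0;
        this.zoneId = null;
    }

    static MonthAndZone zeroDefault() {
        return new MonthAndZone();
    }

    // The constructor reverses the mapping, so a manually created
    // "January, UTC" value has the same fields as the default instance.
    MonthAndZone(int month, String zone) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month must be 1..12: " + month);
        }
        Objects.requireNonNull(zone, "zone");
        this.monthIndex = month - 1;
        this.zoneId = zone.equals("UTC") ? null : zone;
    }

    // Getters apply the mapping from the stored form to the logical value.
    int month() { return monthIndex + 1; }
    String zone() { return zoneId == null ? "UTC" : zoneId; }

    // Field-wise comparison, standing in for '==' on inline class instances.
    boolean sameFields(MonthAndZone other) {
        return monthIndex == other.monthIndex && Objects.equals(zoneId, other.zoneId);
    }
}
```

With this shape, `new MonthAndZone(1, "UTC").sameFields(MonthAndZone.zeroDefault())` holds. Every method has to go through the getters, and that is where this approach becomes error-prone.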
>> >> One way to do this is to define a (possibly private) getter for each field, and include logic like 'return month + 1' or 'return id == null ? "UTC" : id'. Or maybe you inline that logic, as long as you're careful to do so everywhere. Importantly, you also need to reverse the logic in your constructor -- for the sake of '==', if somebody manually creates the default instance, you should set fields to zero/null. >> >> This doesn't work if you want public fields, but that's life as an OO programmer. >> >> In this approach, it would be important that inline classes be expected to document their default instance in Javadoc (perhaps with a new Javadoc tag) -- the interpretation of the default instance is less apparent to users than "all zeros". >> >> Limitations: >> >> - It's a fairly error-prone approach. Programmers will absolutely forget to apply the mapping in one place, and everything will be fine until somebody tries to invoke a particular method on the default instance. Put that bug in a security-sensitive context, and maybe you have an exploit. (Something that could help some is choosing good names -- call your field 'monthIndex', not plain 'month', to remind yourself that it's zero-based.) >> >> - Performance impact of an extra layer of computation on all field accesses. Probably not a big deal in general, but all those null checks, etc., could have a negative impact in certain contexts. And the *appearance* of extra cost might scare programmers away from doing the right thing ("eh, I probably won't use the default value anyway, I'll just ignore it to make my code faster"). >> >> --- >> >> Option B: Language support for field re-interpretation >> >> The language allows inline classes to declare fields with mappings to/from an internal representation. Just like Option A, but with guarantees that the internal representation isn't inappropriately accessed directly.
>> >> This pulls on a thread we explored a bit for Amber awhile back, some form of "abstract fields" or "virtual fields". Maybe there's something there, but it seems like a general-purpose feature, and one we're not likely to reach a final solution on anytime soon. >> >> --- >> >> Option C: Language support for a designated default >> >> The language provides some way for programmers to declare the "logical" default instance (something like a special static field). The compiler inserts a test for the "physical" default on any field/array access, and replaces it with the logical default. >> >> That is: >> >> Point p = points[3]; >> >> compiles to >> >> Point p$0 = points[3]; >> Point p = (p$0 == [vdefault Point]) ? Point.DEFAULT : p$0; >> >> This is much less bug-prone than Option A -- the compiler does all the work -- and much more achievable in the short/medium term than Option B. >> >> Compared to Option B, this pushes the computation overhead from inline class field accesses to reads of the inline type from fields/arrays. I don't know if that's good or bad -- maybe a wash, heavily dependent on the use case. >> >> A few big problems: >> >> - The physical default still exists, and malicious bytecode can use it. If programmers want strong guarantees, they'll have to check and throw wherever an untrusted instance is provided. (Clients with access to the inline class's fields have to do so, too.) >> >> - Covariant arrays mean every read from any array type that might be flattened (Object[], Runnable[], ConstantDesc[], ...) has to go through translation logic. >> >> - There's an assumption here that the programmer doesn't intend to use the physical default as a valid non-default instance. That's hard for the compiler to enforce, and weird stuff happens in fields/arrays if the programmer doesn't prevent it. (Could be mitigated with extra implicit logic on field/array writes or in constructors.)
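The read-side substitution in Option C, and the third problem listed for it, can be simulated with ordinary classes. This is only a sketch under stated assumptions: `Pt`, `PHYSICAL_DEFAULT` (standing in for `[vdefault Point]`), the arbitrary logical default (1, 1), and the helper class `OptionCReads` are all invented for illustration. A real flattened array would not need the explicit fill.

```java
import java.util.Arrays;

// Hypothetical stand-in for an inline class with a designated logical default.
final class Pt {
    final int x;
    final int y;

    Pt(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Stand-in for [vdefault Pt]: all fields zero.
    static final Pt PHYSICAL_DEFAULT = new Pt(0, 0);
    // The programmer-declared logical default (value chosen arbitrarily).
    static final Pt DEFAULT = new Pt(1, 1);

    boolean isPhysicalDefault() {
        return x == 0 && y == 0;
    }
}

final class OptionCReads {
    // Models a flattened array: every component starts as the physical default.
    static Pt[] newFlatArray(int length) {
        Pt[] a = new Pt[length];
        Arrays.fill(a, Pt.PHYSICAL_DEFAULT);
        return a;
    }

    // Roughly what the compiler would emit for 'Pt p = points[i];'.
    static Pt read(Pt[] points, int i) {
        Pt p0 = points[i];
        return p0.isPhysicalDefault() ? Pt.DEFAULT : p0;
    }
}
```

In this model, storing an explicit `new Pt(0, 0)` into the array and reading it back through `read` yields the logical default (1, 1). That is the "weird stuff" the last bullet warns about when the physical default is also meant to be a valid value.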
>> >> --- >> >> Option D: JVM support for a designated default >> >> The VM allows inline classes to designate a logical default instance, and the field/array access instructions map from the physical default to the logical default. The 'vdefault' instruction produces the logical default instance; something else is used by the class's factories to build from the physical default. >> >> This addresses the first two problems with Option C -- the VM gives strong guarantees, and can make the translation a virtual operation of certain arrays. >> >> To address the third problem, it seems like we'd need the more complex logic I hinted at: on writes, map the physical default to the logical default, and map the logical default to the physical default. Do the reverse on reads. >> >> The problem here is bytecode complexity/slowdowns. We've already added some complexity to 'aaload'/'aastore' (covariant flattened arrays), and anticipate similar changes to 'putfield'/'getfield' (specialized fields), so maybe that means we might as well do more. Or maybe it means we're already over budget. :-) >> >> From the users' perspective, if any performance reduction on reads/writes can be limited to the inline classes in Bucket #2, *all* the options have a similar cost, whether imposed by the programmer, language, or VM. So, to a first approximation, slower opcode execution is fine. >> >> >> >> === Solutions to support Bucket #3 === >> >> Two broad strategies here: rejecting member accesses on the default instance (E, F, G), and preventing programs from ever seeing the default instance (H, I). >> >> --- >> >> Option E: Encourage programmers to guard against default instances >> >> Guidance to programmers: if you don't like your class's default instance, check for it in your methods and throw. Maybe Java SE defines a new RuntimeException to encourage this.
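A hand-written guard in the spirit of Option E might look like the following sketch. It uses an ordinary final class, with the existing `IllegalStateException` standing in for whatever new exception Java SE might define. The class `ProcessRef`, its fields, and the `zeroDefault()` factory (a stand-in for the VM-created default instance) are hypothetical.

```java
import java.util.Objects;

// Hypothetical Bucket #3 class: there is no sensible default process.
final class ProcessRef {
    private final long pid;        // 0 in the default instance
    private final String command;  // null in the default instance

    // Stand-in for the all-zero default instance.
    private ProcessRef() {
        this.pid = 0;
        this.command = null;
    }

    static ProcessRef zeroDefault() {
        return new ProcessRef();
    }

    ProcessRef(long pid, String command) {
        if (pid <= 0) {
            throw new IllegalArgumentException("pid must be positive: " + pid);
        }
        this.pid = pid;
        this.command = Objects.requireNonNull(command, "command");
    }

    // The boilerplate check that each method has to remember to call.
    private void requireNotDefault() {
        if (pid == 0 && command == null) {
            throw new IllegalStateException("operation on the default instance");
        }
    }

    long pid() {
        requireNotDefault();
        return pid;
    }

    String command() {
        requireNotDefault();
        return command;
    }
}
```

The default instance can still be passed around and stored. It only fails when a guarded method is called, which matches the "accessing is OK, using is not" model discussed for reference types.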
>> >> The simple way to do this is with some boilerplate at the start of all your methods: >> >> if (this == MyClass.default) throw new InvalidDefaultException(); >> >> More permissive classes could just do some validation on the fields that are relevant to a particular operation. (E.g., 'getMonth' doesn't care if 'zoneId' is null.) >> >> This doesn't work if you want public fields, but that's life as an OO programmer. >> >> It's not ideal that an invalid instance can float around a program until somebody trips on one of these checks, rather than detecting the invalid value earlier -- we're propagating the NPE problem. And it takes some getting used to that there are two null-like values in the reference type's domain. >> >> --- >> >> Option F: Language support for default instance guards >> >> An inline class declaration can indicate that the default instance is invalid. The compiler generates guards, as in Option E, at the start of all instance method bodies, and perhaps on all field accesses outside of those methods. >> >> Programmers give up finer-grained control, but get more safety. I'm sure most would be happy with that trade. >> >> Improper/separately-compiled bytecode can skip the field access checks, but that's a minor concern. >> >> Same issues as Option E regarding adding a "new NPE" to the platform. >> >> --- >> >> Option G: JVM support for default instance guards >> >> Inline class files can indicate that their default instance is invalid. All attempts to operate on that instance (via field/method accesses, other than 'withfield') result in an exception. >> >> This tightens up Option F, making it just as impossible to access members of the default instance as it is to access members of 'null'. >> >> Same issues as Option E regarding adding a "new NPE" to the platform. >> >> --- >> >> Option H: Language checks on field/array reads >> >> An inline class declaration can indicate that the default instance is invalid.
Every field and array access that may involve an uninitialized field/array component of that inline type gets augmented with a check that rejects reads of the default value (treating it as "you forgot to initialize this variable"). >> >> That is: >> >> Point p = points[3]; >> >> compiles to >> >> Point p$0 = points[3]; >> if (p$0 == [vdefault Point]) throw new UninitializedVariableException(); >> Point p = p$0; >> >> This is much like Option C, and has roughly the same advantages/problems. There's not a strong guarantee that the default value won't pop up from untrusted bytecode (or unreliable inline class authors), and lots of array types need guards. >> >> --- >> >> Option I: JVM checks on field/array reads >> >> Inline class files can indicate that their default instance is invalid. When reading from a field/array component of the inline type ('getfield'/'getstatic'/'aaload'), an exception is thrown if the default value is found (treating it as "you forgot to initialize this variable"). The 'vdefault' instruction, like 'withfield', is illegal outside of the inline class's nest. >> >> Better than Option H in that it can be optimized to occur on only certain reads, and in that it provides strong guarantees -- only the inline class can ever "see" the default instance. >> >> Well, unless the inline class chooses to share that instance with the world. Not sure how we prevent that. But maybe at that point, anything bad/weird that happens is the author's own fault. (E.g., putting the default value in an array will make that component effectively "uninitialized" again.) >> >> Like Option D, there's a question of whether we're willing to add this complexity to the 'getfield'/'getstatic'/'aaload' instructions. My sense is that at least it's less complexity than you have in Option D.
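The read-side check shared by Options H and I can be modelled as a small helper in ordinary Java. This is a sketch only: `GuardedReads` and its `isDefault` predicate are hypothetical, and `IllegalStateException` stands in for the `UninitializedVariableException` named above, which does not exist in the JDK.

```java
import java.util.function.Predicate;

// Hypothetical helper modelling "reject reads of the default value".
final class GuardedReads {
    static <T> T read(T[] array, int index, Predicate<? super T> isDefault) {
        T value = array[index];
        if (isDefault.test(value)) {
            // Treat the default as "you forgot to initialize this variable".
            throw new IllegalStateException("component " + index + " holds the default value");
        }
        return value;
    }
}
```

A component that is deliberately set to the default value fails in exactly the same way as one that was never written. This is the "effectively uninitialized again" behaviour noted under Option I.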
>> > From mcnepp02 at googlemail.com Wed Mar 17 16:09:11 2021 From: mcnepp02 at googlemail.com (Gernot Neppert) Date: Wed, 17 Mar 2021 17:09:11 +0100 Subject: Revisiting default values In-Reply-To: <741e0451-0b6b-05a7-5fae-f56e5c312092@oracle.com> References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com> <741e0451-0b6b-05a7-5fae-f56e5c312092@oracle.com> Message-ID: I like your idea of having the programmer mark explicitly which primitive classes should support the 'zero-default' case. However, I suggest inverting the meaning of the marker interface to 'ZeroDefaultable'. Why? Because it better matches the idea that an implementing type is more capable than a type that does not implement it. Then it can be used as a type-bound for generic functions. As an example, have a look at Collection#toArray(IntFunction generator). This could then have the signature: toArray(IntFunction generator) The following compile-time rules would apply: For a type "ND" that does not implement 'ZeroDefaultable', the compiler would ensure two things: 1. Force initialization of a member of that type, either at declaration point or in a constructor, exactly as it does now for final members. 2. Forbid the expression "new ND[N]". This also includes disallowing the array constructor reference "ND[]::new". To make it possible to write generic code that creates arrays of any type, the JDK would provide a standard function in class java.util.Arrays such as static T[] newArray(int dimension, T initializer) This would leave us with the corner case of accessing uninitialized variables of derived classes via constructors of base-classes (or indirectly via virtual dispatch from such a constructor). In my observation, this case is extremely rare, and can and should be neglected, as it represents a programming error already today. On Wed, 17 Mar 2021 at 16:14, Brian Goetz < brian.goetz at oracle.com> wrote: > Let me propose another strategy for Bucket 3.
It could be implemented > at either the VM or language level, but the latter probably needs some > help from the VM anyway. The idea is that the default value is > _indistinguishable from null_. Strawman: > > - Classes can be marked as default-hostile (e.g., `primitive class X > implements NoGoodDefault`); > - Prior to dereferencing a default-hostile class, a check is made > against the default value, and an NPE is thrown if it is the default value; > - When widening to a reference type, a check is made if it is the > default value, and if so, is converted to null; > - When narrowing from a reference type, a check is made for null, and > if so, converted to the default value; > - It is allowable to compare `x == null`, which is interpreted as > "widen x to X.ref, and compare"; > - (optional) the interface NoGoodDefault could have a method that > optimizes the check, such as by using a pivot field, or the language/VM > could try to automatically pick a pivot field. > > Classes which opt for NoGoodDefault will be slower than those that do > not due to the check, but they will flatten. Essentially, this lets > authors choose between "zero means default" and "zero means null", at > some cost. > > A risk here is that ignorant users who don't understand the tradeoffs > will say "oh, great, there's my nullable primitive types", overuse them, > and then say "primitive types are slow, java sucks." The goal here > would be to provide _safety_ for primitive types for which the default > is dangerous. > > > On 3/15/2021 11:52 AM, Brian Goetz wrote: > > Picking this issue up again. To summarize Dan's buckets: > > > > Bucket 1 -- the zero default is in the domain, and is a sensible > > default value. Zero for numerics, empty optionals. > > > > Bucket 2 -- there is a sensible default value, but all-zero-bits isn't > > it. > > > > Bucket 3 -- there simply is no sensible default value.
> > > > > > Ultimately, though, this is not about defaults; it is about > > _uninitialized variables_. The default only comes into play when the > > user uses an uninitialized variable, which usually means (a) > > uninitialized fields or (b) uninitialized array elements. It is > > possible that the language could give us seat belts to dramatically > > narrow the chance of uninitialized fields, but uninitialized array > > elements are much harder to stamp out. > > > > It is an attractive distraction to get caught up in designing > > mechanisms for supplying an alternate default ("just let the user > > declare a no-arg constructor"), but this is focusing on the "writing > > code" part of the problem, not the "keeping code safe" part of the > > problem. > > > > In some sense, it is the existence (and size) of Bucket 1 that causes > > the problem; Bucket 1 is what gives us our sense that it is safe to > > use uninitialized variables. In the current language, uninitialized > > reference variables are also safe in that if you use them before they > > are initialized, you get an exception before anything bad can happen. > > Uninitialized primitives in today's language are more dangerous, > > because we may interpret the uninitialized value, but this has been a > > problem we've been able to live with because today's primitives are > > pretty limited and zero is usually a good-enough default in most > > domains. As we extend primitives to look more like objects, with > > behavior, this gets harder. > > > > > > Both buckets 2 and 3 can be remediated without help from the language > > or VM, perhaps inconveniently, by careful coding on the part of the > > author of the primitive class: > > > > - don't expose fields to users (a good practice anyway) > > - check for zero on entry to each method > > > > These are options A and E. 
The difference between Buckets 2 (A) and 3 > > (E) in this model is what we do when we find a zero; for bucket 2, > > we substitute some pre-baked value and use that, and for bucket 3, we > > throw something (what we throw is a separate discussion). The various > > remediation techniques Dan offers represent a menu which allows us to > > trade off reliability/cost/intrusiveness. > > > > I think we should lean on the model currently implemented by reference > > types, where _accessing_ an uninitialized field is OK, but _using_ the > > value in the field is not. If we have: > > > > String s; > > > > All of the following are fine: > > > > String t = s; > > if (s == null) { ... } > > if (s == t) { ... } > > > > The thing that is not fine is s-dot-something. These are the E/F/G > > options, not the H/I options. > > > > Secondarily, H/I, which attempt to hide the default, create another > > problem down the road: when we get to specialized generics, > > `T.default` would become partial. > > > > Some of the solutions for Bucket 3 generalize well enough to Bucket 2 > > that we might consider merging them (though there are still messy > > details). Option F, for example, injects code at the top of each > > method body: > > > > int m() { > > if (this == ) > > throw new NullPointerException(); > > /* body of m */ > > } > > > > into the top of each method; a corresponding feature for Bucket 2 > > might inject slightly different code: > > > > int m() { > > if (this == ) > > return .m(); > > /* body of m */ > > } > > > > > > Another thing that has evolved since we started this discussion is > > recognizing the difference between .val and .ref projections. Imagine > > you could declare your membership in bucket 3: > > > > __bucket_3 primitive class NGD { ...
} > > > > If, in addition to some way of generating an NPE on dereference (F, G, > > etc), we mucked with the conversion of NGD.val to NGD.ref (which the > > compiler can inject code on), we could actually put a null on top of > > the stack. Then, code like: > > > > if (ngd == null) { ... } > > > > would actually work, because to do the comparison, we'd first promote > > ngd to a reference type (null is already a reference), and we'd > > compare two nulls. > > > > > > > > On 7/10/2020 2:23 PM, Dan Smith wrote: > >> Brian pointed out that my list of candidate inline classes in the > Identity Warnings JEP (JDK-8249100) includes a number of classes that, > despite being "value-based classes" and disavowing their identity, might > not end up as inline classes. The problem? Default values. > >> > >> This might be a good time to revisit the open design issues surrounding > default values and see if we can make some progress. > >> > >> Background/status quo: every inline class has a default instance, which > provides the initial value of fields and array components that have the > inline type (e.g., in 'new Point[10]'). It's also the prototype instance > used to create all other instances (start with 'vdefault', then apply > 'withfield' as needed). The default value is, by fiat, the class instance > produced by setting all fields to *their* default values. Often, but not > always, this means field/array initialization amounts to setting all the > bits to 0. Importantly, no user code is involved in creating a default > instance. > >> > >> Real code is always useful for grounding design discussions, so let's > start there. Among the classes I listed as inline class candidates, we can > put them in three buckets: > >> > >> Bucket #1: Have a reasonable default, as declared. 
> >> - wrapper classes (the primitive zeros) > >> - Optional & friends (empty) > >> - From java.time: Instant (start of 1970-01-01), LocalTime (midnight), > Duration (0s), Period (0d), Year (1 BC, if that's acceptable) > >> > >> Bucket #2: Could have a reasonable default after re-interpreting fields. > >> - From java.time: LocalDate, YearMonth, MonthDay, LocalDateTime, > ZonedDateTime, OffsetTime, OffsetDateTime, ZoneOffset, ZoneRegion, > MinguoDate, HijrahDate, JapaneseDate, ThaiBuddhistDate (months and days > should be nonzero; null Strings, ZoneIds, HijrahChronologies, and > JapaneseEras require special handling) > >> - ListN, SetN, MapN (null array interpreted as empty) > >> > >> Bucket #3: No good default. > >> - Runtime.Version (need a non-null List) > >> - ProcessHandleImpl (need a valid process ID) > >> - List12, Set12, Map1 (need a non-null value) > >> - All ConstantDesc implementations (need real class & method names, > etc.) > >> > >> There's some subjectivity between the 2nd and 3rd buckets, but the idea > behind the 2nd is that, with some translation layer between physical fields > and interpretation of those fields, we can come up with an intuitive > default (e.g., "0 means January"; "a null String means time zone 'UTC'"). > In contrast, in the third bucket, any attempt to define a default value is > going to be pretty unintuitive ("A null method name means 'toString'"). > >> > >> The question here is how much work the JVM and language are willing to > do, or how much work we're willing to ask clients to do, in order to > support use cases that don't fall into Bucket #1. > >> > >> I don't think totally excluding Buckets #2 and #3 is a very good > outcome. It means that, in many cases, inline classes need to be built up > exclusively from primitives or other inline types, because if you use > reference types, your default value will have a null field. 
(Sometimes, as > in Optional, null fields have straightforward interpretations, but most of > the time programs are designed to prevent them.) > >> > >> Whether we support Bucket #2 but not Bucket #3 is a harder question. It > wouldn't be so bad if none of the examples above in Bucket #3 become inline > classes -- for the most part they're handled via interfaces, anyway. > (Counterpoint: inline class instances that are immediately typed with > interface types still potentially provide a performance boost.) But I'm > also not sure this is representative. We've noted before that many use > cases, like database records or data structure cursors, don't have > meaningful defaults (what's a default mailing address?). The ConstantDesc > classes really illustrate this, even though they happen to not be public. > >> > >> Another observation is that if we support Bucket #3 but not Bucket #2, > that's probably not a big deal -- I'm not sure anybody really *wants* to deal > with the default instance; it's just the price you pay for being an inline > class. If there's a way to opt out of that extra weirdness and move from > Bucket #2 to Bucket #3, great. > >> > >> With that discussion in mind, here are some summaries of approaches > we've considered, or that I think we ought to consider, for supporting > buckets #2 and #3. (This is as best as I recall. If there's something I've > missed, add it to the list!) > >> > >> [Weighing in for myself: my current preference is to do one of F, G, or > I. I'm not that interested in supporting Bucket #2, for reasons given > above, although Option A works for programmers who really want it.] > >> > >> > >> > >> === Solutions to support Bucket #2 === > >> > >> Two broad strategies here: re-interpreting fields (A, B), and > re-interpreting the default instance (C, D).
> >> > >> --- > >> > >> Option A: Encourage programmers to re-interpret fields > >> > >> Guidance to programmers: when you declare an inline class, identify any > fields for which the default instance should hold something other than > zero/null; define a mapping for your implementation from zero/null to the > value you want. > >> > >> One way to do this is to define a (possibly private) getter for each > field, and include logic like 'return month + 1' or 'return id == null ? > "UTC" : id'. Or maybe you inline that logic, as long as you're careful to > do so everywhere. Importantly, you also need to reverse the logic in your > constructor -- for the sake of '==', if somebody manually creates the default > instance, you should set fields to zero/null. > >> > >> This doesn't work if you want public fields, but that's life as an OO > programmer. > >> > >> In this approach, it would be important that inline classes be expected > to document their default instance in Javadoc (perhaps with a new Javadoc > tag) -- the interpretation of the default instance is less apparent to users > than "all zeros". > >> > >> Limitations: > >> > >> - It's a fairly error-prone approach. Programmers will absolutely > forget to apply the mapping in one place, and everything will be fine until > somebody tries to invoke a particular method on the default instance. Put > that bug in a security-sensitive context, and maybe you have an exploit. > (Something that could help some is choosing good names -- call your field > 'monthIndex', not plain 'month', to remind yourself that it's zero-based.) > >> > >> - Performance impact of an extra layer of computation on all field > accesses. Probably not a big deal in general, but all those null checks, > etc., could have a negative impact in certain contexts. And the > *appearance* of extra cost might scare programmers away from doing the > right thing ("eh, I probably won't use the default value anyway, I'll just > ignore it to make my code faster").
> >> > >> --- > >> > >> Option B: Language support for field re-interpretation > >> > >> The language allows inline classes to declare fields with mappings > to/from an internal representation. Just like Option A, but with guarantees > that the internal representation isn't inappropriately accessed directly. > >> > >> This pulls on a thread we explored a bit for Amber awhile back, some > form of "abstract fields" or "virtual fields". Maybe there's something > there, but it seems like a general-purpose feature, and one we're not > likely to reach a final solution on anytime soon. > >> > >> --- > >> > >> Option C: Language support for a designated default > >> > >> The language provides some way for programmers to declare the "logical" > default instance (something like a special static field). The compiler > inserts a test for the "physical" default on any field/array access, and > replaces it with the logical default. > >> > >> That is: > >> > >> Point p = points[3]; > >> > >> compiles to > >> > >> Point p$0 = points[3]; > >> Point p = (p$0 == [vdefault Point]) ? Point.DEFAULT : p$0; > >> > >> This is much less bug-prone than Option A -- the compiler does all the > work -- and much more achievable in the short/medium term than Option B. > >> > >> Compared to Option B, this pushes the computation overhead from inline > class field accesses to reads of the inline type from fields/arrays. I > don't know if that's good or bad -- maybe a wash, heavily dependent on the use > case. > >> > >> A few big problems: > >> > >> - The physical default still exists, and malicious bytecode can use it. > If programmers want strong guarantees, they'll have to check and throw > wherever an untrusted instance is provided. (Clients with access to the > inline class's fields have to do so, too.) > >> > >> - Covariant arrays mean every read from any array type that might be > flattened (Object[], Runnable[], ConstantDesc[], ...) has to go through > translation logic.
> >> > >> - There's an assumption here that the programmer doesn't intend to use > the physical default as a valid non-default instance. That's hard for the > compiler to enforce, and weird stuff happens in fields/arrays if the > programmer doesn't prevent it. (Could be mitigated with extra implicit > logic on field/array writes or in constructors.) > >> > >> --- > >> > >> Option D: JVM support for a designated default > >> > >> The VM allows inline classes to designate a logical default instance, > and the field/array access instructions map from the physical default to > the logical default. The 'vdefault' instruction produces the logical > default instance; something else is used by the class's factories to build > from the physical default. > >> > >> This addresses the first two problems with Option C -- the VM gives strong > guarantees, and can make the translation a virtual operation of certain > arrays. > >> > >> To address the third problem, it seems like we'd need the more complex > logic I hinted at: on writes, map the physical default to the logical > default, and map the logical default to the physical default. Do the > reverse on reads. > >> > >> The problem here is bytecode complexity/slowdowns. We've already added > some complexity to 'aaload'/'aastore' (covariant flattened arrays), and > anticipate similar changes to 'putfield'/'getfield' (specialized fields), > so maybe that means we might as well do more. Or maybe it means we're > already over budget. :-) > >> > >> From the users' perspective, if any performance reduction on > reads/writes can be limited to the inline classes in Bucket #2, *all* the > options have a similar cost, whether imposed by the programmer, language, > or VM. So, to a first approximation, slower opcode execution is fine.
> >>
> >>
> >>
> >> === Solutions to support Bucket #3 ===
> >>
> >> Two broad strategies here: rejecting member accesses on the default instance (E, F, G), and preventing programs from ever seeing the default instance (H, I).
> >>
> >> ---
> >>
> >> Option E: Encourage programmers to guard against default instances
> >>
> >> Guidance to programmers: if you don't like your class's default instance, check for it in your methods and throw. Maybe Java SE defines a new RuntimeException to encourage this.
> >>
> >> The simple way to do this is with some boilerplate at the start of all your methods:
> >>
> >> if (this == MyClass.default) throw new InvalidDefaultException();
> >>
> >> More permissive classes could just do some validation on the fields that are relevant to a particular operation. (E.g., 'getMonth' doesn't care if 'zoneId' is null.)
> >>
> >> This doesn't work if you want public fields, but that's life as an OO programmer.
> >>
> >> It's not ideal that an invalid instance can float around a program until somebody trips on one of these checks, rather than detecting the invalid value earlier; we're propagating the NPE problem. And it takes some getting used to that there are two null-like values in the reference type's domain.
> >>
> >> ---
> >>
> >> Option F: Language support for default instance guards
> >>
> >> An inline class declaration can indicate that the default instance is invalid. The compiler generates guards, as in Option E, at the start of all instance method bodies, and perhaps on all field accesses outside of those methods.
> >>
> >> Programmers give up finer-grained control, but get more safety. I'm sure most would be happy with that trade.
> >>
> >> Improper/separately-compiled bytecode can skip the field access checks, but that's a minor concern.
> >>
> >> Same issues as Option E regarding adding a "new NPE" to the platform.
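
The guard boilerplate from Option E (which Option F would have the compiler generate) might look roughly like this in released Java. This is a sketch under assumptions: the record `Account` stands in for an inline class, `PHYSICAL_DEFAULT` models `MyClass.default`, and `InvalidDefaultException` is a hypothetical class, since the message only suggests that Java SE might define such an exception.

```java
final class OptionESketch {
    // Hypothetical exception type; not part of Java SE.
    static final class InvalidDefaultException extends RuntimeException {
        InvalidDefaultException(String message) {
            super(message);
        }
    }

    // Stand-in for an inline class whose all-zero instance is meaningless.
    record Account(long id, String owner) {
        // Models 'Account.default': every field zero or null.
        static final Account PHYSICAL_DEFAULT = new Account(0L, null);

        // The guard that Option E asks authors to call at the start of each method.
        private void requireInitialized() {
            if (this.equals(PHYSICAL_DEFAULT)) {
                throw new InvalidDefaultException("Account used before initialization");
            }
        }

        String label() {
            requireInitialized();
            return owner + "#" + id;
        }
    }

    public static void main(String[] args) {
        System.out.println(new Account(42L, "dan").label());
        try {
            Account.PHYSICAL_DEFAULT.label();
        } catch (InvalidDefaultException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The guard only fires when a method is called, so the default instance can still travel through fields, arrays, and parameters unnoticed, as the text above points out.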
> >>
> >> ---
> >>
> >> Option G: JVM support for default instance guards
> >>
> >> Inline class files can indicate that their default instance is invalid. All attempts to operate on that instance (via field/method accesses, other than 'withfield') result in an exception.
> >>
> >> This tightens up Option F, making it just as impossible to access members of the default instance as it is to access members of 'null'.
> >>
> >> Same issues as Option E regarding adding a "new NPE" to the platform.
> >>
> >> ---
> >>
> >> Option H: Language checks on field/array reads
> >>
> >> An inline class declaration can indicate that the default instance is invalid. Every field and array access that may involve an uninitialized field/array component of that inline type gets augmented with a check that rejects reads of the default value (treating it as "you forgot to initialize this variable").
> >>
> >> That is:
> >>
> >> Point p = points[3];
> >>
> >> compiles to
> >>
> >> Point p$0 = points[3];
> >> if (p$0 == [vdefault Point]) throw new UninitializedVariableException();
> >> Point p = p$0;
> >>
> >> This is much like Option C, and has roughly the same advantages/problems. There's not a strong guarantee that the default value won't pop up from untrusted bytecode (or unreliable inline class authors), and lots of array types need guards.
> >>
> >> ---
> >>
> >> Option I: JVM checks on field/array reads
> >>
> >> Inline class files can indicate that their default instance is invalid. When reading from a field/array component of the inline type ('getfield'/'getstatic'/'aaload'), an exception is thrown if the default value is found (treating it as "you forgot to initialize this variable"). The 'vdefault' instruction, like 'withfield', is illegal outside of the inline class's nest.
> >>
> >> Better than Option H in that it can be optimized to occur on only certain reads, and in that it provides strong guarantees: only the inline class can ever "see" the default instance.
> >>
> >> Well, unless the inline class chooses to share that instance with the world. Not sure how we prevent that. But maybe at that point, anything bad/weird that happens is the author's own fault. (E.g., putting the default value in an array will make that component effectively "uninitialized" again.)
> >>
> >> Like Option D, there's a question of whether we're willing to add this complexity to the 'getfield'/'getstatic'/'aaload' instructions. My sense is that at least it's less complexity than you have in Option D.
> >>
> >
>
>

From jesper at selskabet.org Wed Mar 17 18:34:54 2021
From: jesper at selskabet.org (Jesper Steen Møller)
Date: Wed, 17 Mar 2021 19:34:54 +0100
Subject: Revisiting default values
In-Reply-To: <741e0451-0b6b-05a7-5fae-f56e5c312092@oracle.com>
References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com> <741e0451-0b6b-05a7-5fae-f56e5c312092@oracle.com>
Message-ID: <3803F257-D1CD-4AAE-9F7A-0AE53D2FAA65@selskabet.org>

Hi list,

Observation inline:

> On 17 Mar 2021, at 16.14, Brian Goetz wrote:
>
> Let me propose another strategy for Bucket 3. It could be implemented at either the VM or language level, but the latter probably needs some help from the VM anyway. The idea is that the default value is _indistinguishable from null_.
> Strawman:
>
> - Classes can be marked as default-hostile (e.g., `primitive class X implements NoGoodDefault`);
> - Prior to dereferencing a default-hostile class, a check is made against the default value, and an NPE is thrown if it is the default value;
> - When widening to a reference type, a check is made if it is the default value, and if so, is converted to null;
> - When narrowing from a reference type, a check is made for null, and if so, converted to the default value;
> - It is allowable to compare `x == null`, which is interpreted as "widen x to X.ref, and compare";
> - (optional) the interface NoGoodDefault could have a method that optimizes the check, such as by using a pivot field, or the language/VM could try to automatically pick a pivot field.
>
> Classes which opt for NoGoodDefault will be slower than those that do not due to the check, but they will flatten. Essentially, this lets authors choose between "zero means default" and "zero means null", at some cost.
>

To avoid confusion, a constructor of such a class should synthetically check that the finished instance is indeed "non-null". Otherwise, an implementation may encode values so that an unsuspecting user would make a "new DubiousLocalDate(1970,1,1)" but get a value indistinguishable from null. The compiler could ensure this in some cases, but not in the general case.

> A risk here is that ignorant users who don't understand the tradeoffs will say "oh, great, there's my nullable primitive types", overuse them, and then say "primitive types are slow, java sucks." The goal here would be to provide _safety_ for primitive types for which the default is dangerous.
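
The strawman conversions, together with the constructor check suggested in the reply, can be modeled in released Java as a loose sketch. All names below are illustrative assumptions: `DubiousLocalDate` is an ordinary class standing in for a default-hostile primitive class, it is assumed to encode a date as days since 1970-01-01 (so that date encodes to all zero bits), and `widen`, `narrow`, and `epochDayOf` are hypothetical helpers for the checks the language or VM would perform implicitly.

```java
import java.time.LocalDate;

final class NoGoodDefaultSketch {
    // Stand-in for 'primitive class DubiousLocalDate implements NoGoodDefault'.
    static final class DubiousLocalDate {
        final long epochDay;

        // Models the all-zero default value of the flattened representation.
        static final DubiousLocalDate ZERO = new DubiousLocalDate(0L);

        private DubiousLocalDate(long epochDay) {
            this.epochDay = epochDay;
        }

        // Factory with the suggested check: never hand out the zero encoding.
        static DubiousLocalDate of(int year, int month, int day) {
            long encoded = LocalDate.of(year, month, day).toEpochDay();
            if (encoded == 0L) {
                throw new IllegalArgumentException("value would be indistinguishable from null");
            }
            return new DubiousLocalDate(encoded);
        }

        boolean isZero() {
            return epochDay == 0L;
        }
    }

    // Widening to the reference type: the default value becomes null.
    static DubiousLocalDate widen(DubiousLocalDate flat) {
        return flat.isZero() ? null : flat;
    }

    // Narrowing from the reference type: null becomes the default value.
    static DubiousLocalDate narrow(DubiousLocalDate ref) {
        return ref == null ? DubiousLocalDate.ZERO : ref;
    }

    // Dereference check: the default value behaves like null.
    static long epochDayOf(DubiousLocalDate flat) {
        if (flat.isZero()) {
            throw new NullPointerException("default value dereferenced");
        }
        return flat.epochDay;
    }

    public static void main(String[] args) {
        DubiousLocalDate date = DubiousLocalDate.of(1970, 1, 2);
        System.out.println(epochDayOf(date));
        System.out.println(widen(DubiousLocalDate.ZERO) == null);
    }
}
```

Without the check in `of`, the call `of(1970, 1, 1)` would produce a value that `widen` turns into null, which is the confusion the reply warns about.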
>

-Jesper

From mcnepp02 at googlemail.com Mon Mar 22 16:50:47 2021
From: mcnepp02 at googlemail.com (Gernot Neppert)
Date: Mon, 22 Mar 2021 17:50:47 +0100
Subject: [External] : Re: Revisiting default values
In-Reply-To:
References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com> <741e0451-0b6b-05a7-5fae-f56e5c312092@oracle.com> <87c59552-daae-9088-e2e1-5d2894eea240@oracle.com>
Message-ID: <67979f76-8bb7-5873-b45a-f1212fde7833@gmail.com>

Hi Brian, thank you for your response.

Two things: firstly, the function wouldn't need reified generics to work. If you rename it to "newInitializedArray" (which makes more sense in the first place) and document that "initializer" must not be null, then you're done: the array's component-type would be the initializer.getClass() or initializer.getDeclaringClass() for enums.

Secondly, the term "valid default" is misleading here. You wouldn't need such a universal value for every type. You'd just need a reasonable value to pass every time you call this function. This value could well be different on every invocation to suit your needs.

On 20.03.2021 at 14:08, Brian Goetz wrote:
> Also: I think you are conflating Buckets 2 and 3. The method
>
>    static <T> T[] newArray(int dimension, T initializer)
>
> (which would also require reified generics to work) assumes that there
> *is* a valid default you can pass it, it's just not zero. Bucket 3,
> which is more important than 2, is for the cases where there is no
> reasonable default.

Yes, I fully agree that zero-default primitive classes should make up the majority of their kind. And of course, they also should be simple to devise. However, nudging the developer to think hard about whether zero-default really fits the bill in every single case is not overly burdensome, I think.
She would only have to add the two tokens "implements ZeroDefaultable" to her class-declaration - that's not too much to ask for :)

>
> On 3/20/2021 8:50 AM, Brian Goetz wrote:
>> I get where this is coming from, but I think it's misguided.
>>
>> First, zero-hostile is a bad default. The #1 use case for primitive
>> classes is numerics, so the defaults should be tailored to their needs.

Hmm, not really. The use case that I was presenting with the "Collection#toArray" example was explicitly about restricting the applicable types to the _sub-set_ of the "zero-defaultable" ones. These types do, of course, include all reference-types, which follows naturally by having "interface IdentityObject extends ZeroDefaultable". In a case like the above, the type-bound fits perfectly.

>>
>> Second, if the goal is to be able to generically abstract over
>> nullable types (which include references and, in this model,
>> zero-hostile primitives), what you really want is a _lower_ bound,
>> and to generify over . We don't have those, but
>> distorting the hierarchy to wedge it into the bounds we have would
>> likely be compounding an error.
>>
>> That said, I like the idea that the type system could have something
>> to say about array elements and construction. Let me think on that
>> some more.
>>
>> On 3/20/2021 5:02 AM, Gernot Neppert wrote:
>>>
>>> I like your idea of having the programmer mark explicitly which
>>> primitive classes should support the 'zero-default' case.
>>>
>>> However, I suggest to invert the meaning of the Marker interface to
>>> 'ZeroDefaultable'.
>>>
>>> Why? Because it better matches the idea that an implementing type is
>>> more capable than a type that does not implement it.
>>> Then it can be used as a type-bound for generic functions.
>>>
>>> As an example, have a look at Collection#toArray(IntFunction<T[]>
>>> generator).
>>> This could then have the signature:
>>> toArray(IntFunction generator)
>>>
>>> The following compile-time rules would apply:
>>> For a type "ND" that does not implement 'ZeroDefaultable', the
>>> compiler would ensure two things:
>>>
>>> 1. Force initialization of a member of that type, either at
>>> declaration point or in a constructor, exactly as it does now for
>>> final members.
>>> 2. Forbid the expression "new ND[N]". This also includes
>>> disallowing the lambda-expression "ND[]::new".
>>>
>>> To be able to write generic code that creates arrays of any type,
>>> the JDK would provide a standard function in class java.util.Arrays
>>> such as
>>> static <T> T[] newArray(int dimension, T initializer)
>>> This would leave us with the corner case of accessing uninitialized
>>> variables of derived classes via constructors of base-classes (or
>>> indirectly via virtual dispatch from such a constructor).
>>> In my observation, this case is extremely rare, and can and should
>>> be neglected, as it represents a programming error already today.
>>>
>>> (Also posted this to valhalla-spec-observers)

From brian.goetz at oracle.com Mon Mar 22 17:08:39 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 22 Mar 2021 13:08:39 -0400
Subject: [External] : Re: Revisiting default values
In-Reply-To: <67979f76-8bb7-5873-b45a-f1212fde7833@gmail.com>
References: <3D227EB1-4D86-4F97-BFCB-A5949C63A717@oracle.com> <741e0451-0b6b-05a7-5fae-f56e5c312092@oracle.com> <87c59552-daae-9088-e2e1-5d2894eea240@oracle.com> <67979f76-8bb7-5873-b45a-f1212fde7833@gmail.com>
Message-ID:

> Two things: firstly, the function wouldn't need reified generics to
> work. If you rename it to "newInitializedArray" (which makes more
> sense in the first place) and document that "initializer" must not be
> null,
> then you're done: the array's component-type would be the
> initializer.getClass() or initializer.getDeclaringClass() for enums.
>
> secondly, the term "valid default" is misleading here. You wouldn't
> need such a universal value for every type. You'd just need a
> reasonable value to pass every time you call this function.
> This value could well be different on every invocation to suit your
> needs.

Good news: now I understand what you're suggesting.

Bad news: it's not a very good idea, for many reasons.

First, it conflates language and library design; newArray() becomes a special magic method that users can't write, only the JDK can have it, and the language has to bless it. This is a tangling of concerns that should be a hint.

Second, it still doesn't solve the underlying problem of Bucket 3, which is making it safe to use uninitialized fields or array elements. You're positing a T value that is a valid initialization value, maybe not for all cases, but still for any code path that might be exposed to this array element. But you've only made the problem slightly simpler, and there still may well not be any such value. Worse, because the "uninitialized" value is an instantiation-site-specific default, no one will be able to even ask "is this the default value."

Better to just tell people "don't expose fields, and check for default explicitly on entry to every method." (Which still isn't a very good answer.)

This game is hard...

>
>> On 3/20/2021 8:50 AM, Brian Goetz wrote:
>>> I get where this is coming from, but I think it's misguided.
>>>
>>> First, zero-hostile is a bad default. The #1 use case for primitive
>>> classes is numerics, so the defaults should be tailored to their needs.
>
>
> Hmm, not really. The usecase that I was presenting with the
> "Collection#toArray" example was explicitly about restricting the
> applicable types to the _sub-set_ of the "zero-defaultable" ones.
>

I get that you want to be able to do this, and I agree it would be nice to do so.
What I'm saying is that your proposal distorts a big feature for the sake of a little one; that's another of those hints we shouldn't ignore. Zero-hostile is not the right default for primitives; flipping it for the sake of "I want to express a bound here" is the tail wagging the dog. I'm not saying the bound isn't useful; I'm saying you're getting caught up on a specific solution rather than bringing clarity to the problem. If you bring clarity to the problem, the solution is often apparent.

This game is hard.

Cheers,
-Brian

From daniel.smith at oracle.com Tue Mar 23 21:35:45 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Tue, 23 Mar 2021 21:35:45 +0000
Subject: Extended project meetings, 2021-03-23 & 2021-03-24
Message-ID: <12A89055-AA87-4CC9-969B-17901E54E000@oracle.com>

Thanks for attending today's session of our extended project meetings about generics!

For attendees, I've updated your calendar invite with a link to archived slides & video.

We'll meet again tomorrow, and I'll post a short summary here when we're done.

From daniel.smith at oracle.com Fri Mar 26 17:12:05 2021
From: daniel.smith at oracle.com (Dan Smith)
Date: Fri, 26 Mar 2021 17:12:05 +0000
Subject: Extended project meetings, 2021-03-23 & 2021-03-24
In-Reply-To: <12A89055-AA87-4CC9-969B-17901E54E000@oracle.com>
References: <12A89055-AA87-4CC9-969B-17901E54E000@oracle.com>
Message-ID: <4214CF7D-38D7-4076-A3BC-F94DBFE8D866@oracle.com>

On Mar 23, 2021, at 3:35 PM, Dan Smith wrote:

> I'll post a short summary here when we're done.

So I thought we had a useful set of presentations/discussions to encourage progress on the generics piece of Valhalla's efforts. For those who participated, please keep the momentum going by, e.g., raising questions and providing feedback in this mailing list.

For those who didn't attend, a quick overview (to be supplemented by a deeper treatment to come in JEPs and other design documents):

- The core primitive objects work in Valhalla (covered by now-candidate JEPs 401 and 402) has a significant limitation, in that Java's generics are designed to work only with reference types, erased to their bounds (e.g., Object). So generic APIs miss out on the flattening and memory efficiency benefits offered to value types in other contexts.

- We propose to address this in two stages: first, a "Universal Generics" language change that allows value types as type arguments, but continues to implement generics via erasure; and second, a "Parametric JVM" feature that Java can take advantage of to specialize classes for different value type argument layouts.

- Universal Generics will allow for compatible migration of existing generic code by introducing new warnings when developers assign nulls to type variables, and new language features to opt in to nulls in appropriate circumstances. "Null pollution" of value types will be detected at use sites.

- With the Parametric JVM, the language won't change (much), but the underlying behavior will rely on specialized classes and methods to improve performance, along with new, more eager checks to detect some forms of heap pollution. Each instance of a generic class will be associated with a specific *species* of that class.

- At the JVM level, the Parametric JVM allows for the creation of parameterized classes and methods, with parameters "passed" via the constant pool. Classes correspond to families of species, each of which has a unique parameter value and its own subset of the constant pool derived from that value. Parameterized methods have similar families of specialized methods.

- Class, method, and field references at use sites can be associated with parameters from the constant pool; these parameterized references are resolved to species and specialized methods via a handshake between the JVM and the language runtime.

- Constraints on field and method types of species are expressed with *type restrictions* derived from species parameters. These types are independent of descriptors but supplement them with additional run-time checks on passed values. The JIT is able to track type restrictions and eliminate redundant checks. Type restrictions express value and species types, allowing for flattened layout.

- Prototyping work on the Parametric JVM is occurring in the Valhalla GitHub repo, https://github.com/openjdk/valhalla, in the 'species' and 'type-restrictions' branches.