From daniel.smith at oracle.com  Wed May  3 02:06:08 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 3 May 2023 02:06:08 +0000
Subject: Revised JEP and JVMS: Flattened Heap Layouts for Value Objects
Message-ID: <035C7A77-C61D-465B-8521-C0568413EE18@oracle.com>

JEP 401 has been updated to reflect our most recent discussions about
deriving flattening from nullness and value class properties. We are now
calling this feature "Flattened Heap Layouts for Value Objects".

https://openjdk.org/jeps/401

I've also put together a revised JVMS document to specify the needed JVM
attributes.

https://cr.openjdk.org/~dlsmith/jep401/jep401-20230428/specs/flattened-heap-jvms.html

This approach to the JVM avoids any use of 'Q' types in class files, which
we decided had too much overhead for our limited needs.

At this stage, there's no JVMS feature to support flattened array creation.
Instead, these arrays are created by passing Class objects representing
null-restricted types to the 'Array.newInstance' methods.

From daniel.smith at oracle.com  Wed May  3 02:10:01 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 3 May 2023 02:10:01 +0000
Subject: EG meeting, 2023-05-03
Message-ID:

An EG meeting will be held on May 3 at 4pm UTC (9am PDT, 12pm EDT).

Topics we may discuss:

- Revised JEP and JVMS for Flattened Heap Layouts
- Brian's discussion about atomicity and tearing
- spec-comments feedback about "B3, default values, and implicit initialization"

From forax at univ-mlv.fr  Wed May  3 15:49:18 2023
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 3 May 2023 17:49:18 +0200 (CEST)
Subject: EG meeting, 2023-05-03
In-Reply-To:
References:
Message-ID: <609404436.47893882.1683128958673.JavaMail.zimbra@univ-eiffel.fr>

Sadly, I will not be able to attend. I will send my comments about the
Flattened Heap Layouts later this week.

Rémi

----- Original Message -----
> From: "daniel smith"
> To: "valhalla-spec-experts"
> Sent: Wednesday, May 3, 2023 4:10:01 AM
> Subject: EG meeting, 2023-05-03

> An EG meeting will be held on May 3 at 4pm UTC (9am PDT, 12pm EDT).
>
> Topics we may discuss:
>
> - Revised JEP and JVMS for Flattened Heap Layouts
> - Brian's discussion about atomicity and tearing
> - spec-comments feedback about "B3, default values, and implicit initialization"
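To illustrate the array-creation point from Dan's announcement, a minimal
sketch: how the null-restricted Class object is obtained is not fixed by the
JEP, and `asNullRestrictedType` is a hypothetical name borrowed from later
discussion in this thread.

```
// Assuming Complex is an implicitly constructible value class:
Class<?> complexBang = Complex.class.asNullRestrictedType();  // hypothetical API

// No new bytecode: the existing reflective entry point creates the
// flattened, null-free array, which is still usable as a Complex[].
Complex[] cs = (Complex[]) java.lang.reflect.Array.newInstance(complexBang, 16);

// Elements start out as Complex's zero instance rather than null.
```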
From v.a.ammodytes at googlemail.com  Wed May  3 19:11:07 2023
From: v.a.ammodytes at googlemail.com (Arne Siegel)
Date: Wed, 3 May 2023 21:11:07 +0200
Subject: Revised JEP and JVMS: Flattened Heap Layouts for Value Objects
In-Reply-To: <035C7A77-C61D-465B-8521-C0568413EE18@oracle.com>
References: <035C7A77-C61D-465B-8521-C0568413EE18@oracle.com>
Message-ID:

Hi,

regarding the "implements NonAtomic" approach, what about the corner case
where a class implementing NonAtomic contains a field of a second value
type not implementing NonAtomic? Special measures in the JVM will be
necessary to guarantee the intention of the second value type's author.

Best regards
Arne Siegel

On Wed, May 3, 2023 at 04:06, Dan Smith wrote:

> JEP 401 has been updated to reflect our most recent discussions about
> deriving flattening from nullness and value class properties. Now calling
> this feature "Flattened Heap Layouts for Value Objects".
>
> https://openjdk.org/jeps/401
>
> I've also put together a revised JVMS document to specify the needed JVM
> attributes.
>
> https://cr.openjdk.org/~dlsmith/jep401/jep401-20230428/specs/flattened-heap-jvms.html
>
> This approach to the JVM avoids any use of 'Q' types in class files,
> which we decided had too much overhead for our limited needs.
>
> At this stage, there's no JVMS feature to support flattened array
> creation. Instead, these are created by passing Class objects
> representing null-restricted types to the 'Array.newInstance' methods.
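To make the corner case concrete, a hypothetical sketch (the NonAtomic
marker interface and both classes are illustrative, not settled API):

```
value class Timestamp {         // did NOT opt into tearing: readers must never
    long seconds;               // observe seconds from one write combined with
    int nanos;                  // nanos from another
    ...
}

value class Range implements NonAtomic {  // author allows Range itself to tear,
    Timestamp start;            // but each embedded Timestamp still carries its
    Timestamp end;              // own atomicity guarantee, which the JVM must
    ...                         // honor even inside this non-atomic container
}
```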
From john.r.rose at oracle.com  Wed May  3 19:11:32 2023
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 03 May 2023 12:11:32 -0700
Subject: Revised JEP and JVMS: Flattened Heap Layouts for Value Objects
In-Reply-To: <035C7A77-C61D-465B-8521-C0568413EE18@oracle.com>
References: <035C7A77-C61D-465B-8521-C0568413EE18@oracle.com>
Message-ID: <43E01CCA-CF10-4534-876C-368AF1F41B49@oracle.com>

Here's a bit of "translation strategy lore" that doesn't need to go into
the JEP, but I think might be useful to contemplate.

When we employed Q-descriptors, we translated Foo! (or Foo.val) uniformly
to /QFoo;/ but now we do something different. Here are assembly sequences
that used to employ Q-descriptors but which will have to change:

// Foo foo = ...; return (Foo!)foo;  (one ref on stack)
checkcast[QFoo;]  (if TOS is already LFoo;)
 => dup
    invokestatic Objects::requireNonNull
    pop

// (Foo!)obj  (one ref on stack)
checkcast[QFoo;]  (otherwise)
 => invokestatic Objects::requireNonNull
    checkcast[LFoo;]

// obj instanceof Foo!  (one ref on stack)
instanceof[QFoo;]
 => instanceof[LFoo;]  (note that loading is lazier here)

// new Foo![dim]  (one int on stack)
anewarray[QFoo;]
 => ldc[Condy[Foo.class.asNullRestrictedType]]
    swap
    invokestatic Array::newInstance(Class,int)
    checkcast[[LFoo;]  (as necessary)

// new Foo![dim0][dim1...]  (#Dim ints on stack)
multianewarray[('['*#Dim)QFoo;,#Dim]
 => ldc #Dim  (or bipush etc.)
    dup; istore $P
    newarray[T_INTEGER]
    dup2; swap; iinc $P, -1; iload $P; iastore  (#Dim times)
    invokestatic Array::newInstance(Class,int[])
    checkcast[[[...LFoo;]  (as necessary)

For arrays, generally speaking, we erase Foo![][] to Foo[][], not
translating to multi-level /[[QFoo;/. Thus remaining hypothetical uses of
/[QFoo;/ would get replaced by /[LFoo;/. For example:

multianewarray[('['*#Dim)[QFoo;,#Dim]
 => multianewarray[('['*#Dim)[LFoo;,#Dim]

From john.r.rose at oracle.com  Wed May  3 19:19:28 2023
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 03 May 2023 12:19:28 -0700
Subject: Revised JEP and JVMS: Flattened Heap Layouts for Value Objects
In-Reply-To: <43E01CCA-CF10-4534-876C-368AF1F41B49@oracle.com>
References: <035C7A77-C61D-465B-8521-C0568413EE18@oracle.com> <43E01CCA-CF10-4534-876C-368AF1F41B49@oracle.com>
Message-ID:

> dup2; swap; iinc $P, -1; iload $P; iastore  (#Dim times)
> invokestatic Array::newInstance(Class,int[])

Oops, forgot to push the component mirror in that code. There may be other
bugs too. The JDK could special-case the condy BSM. A constant dimension
array could be condy-fied as well. YMMV. HTH!

From brian.goetz at oracle.com  Wed May  3 20:20:00 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 3 May 2023 20:20:00 +0000
Subject: B3, default values, and implicit initialization
In-Reply-To:
References: <34283df4-f328-812a-dd70-0c479566fba4@oracle.com> <1550840502.43358908.1682620290012.JavaMail.zimbra@univ-eiffel.fr>
Message-ID: <54C7FD75-A46E-43E0-A85C-A24D35FB5505@oracle.com>

FWIW, Dan has brought me around to "implicitly constructible value class"
as my preferred way (so far) to describe a B3 class.

On Apr 27, 2023, at 2:54 PM, Brian Goetz wrote:

Agreed. The fact that it looks like a field, but its initial value is not
actually an expression of that type, is pretty much disqualifying. But the
syntax is not really the main point here.

Stephen's point is that he's worried that "performance lore" will drive
people to reach for B3, even when the zero default sucks (like LocalDate).
We can't stop developers from being moths to the performance flame, but
what we can do is try to find the clearest way to represent "instances of
this class can be implicitly initialized", and have users explicitly opt
into that. And we can show what good judgment looks like by leading by
example in the JDK.

We're good on the "requiring opt-in" part; what we're mostly debating here
is whether a class modifier, field, constructor, or other special member or
supertype is the best way to say "implicitly initializable value".

(The field syntax also teases that you can put any value there, but you
can't. Which is why the implicit constructor syntax has no body; you can't
put code in there that would make you think that you get to choose the
default state.)

On 4/27/2023 2:31 PM, Remi Forax wrote:

I do not find this syntax attractive, especially the "new" in
"default = new"; I can hear my students saying "new what"?
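For readers skimming the thread, the two surfaces being compared look
roughly like this (illustrative only; neither is settled syntax):

```
// Field-like strawman from the quoted discussion -- disqualified above,
// since the initializer position suggests an arbitrary expression fits there:
value class Complex {
    double re, im;
    default = new;               // "new what?"
}

// Direction described above: a bodiless implicit constructor, signaling
// that the zero default is not something the author gets to customize:
value class Complex {
    double re, im;
    public implicit Complex();
}
```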
From daniel.smith at oracle.com  Wed May 17 14:42:07 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 17 May 2023 14:42:07 +0000
Subject: EG meeting, 2023-05-17
Message-ID: <8492C270-31E1-42A6-9C62-6EECFF6D1888@oracle.com>

An EG meeting will be held today, May 17, at 4pm UTC (9am PDT, 12pm EDT).

A lot of people couldn't make it last time, so today we can revisit the
agenda from that meeting:

- Revised JEP and JVMS for Flattened Heap Layouts
- Brian's discussion about atomicity and tearing
- spec-comments feedback about "B3, default values, and implicit initialization"

From daniel.smith at oracle.com  Wed May 31 14:20:08 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 31 May 2023 14:20:08 +0000
Subject: EG meeting, 2023-05-31
Message-ID:

An EG meeting will be held today, May 31, at 4pm UTC (9am PDT, 12pm EDT).

We've asked Kevin to review with us some of the lessons learned about
nullness in the JSpecify project.

From brian.goetz at oracle.com  Wed May 31 18:37:34 2023
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 31 May 2023 14:37:34 -0400
Subject: Design document on nullability and value types
Message-ID:

As we've hinted at, we've made some progress refining the essential
differences between primitive and reference types, which has enabled us to
shed the `.val` / `.ref` distinction and lean more heavily on nullability.
The following document outlines the observations that have enabled this
current turn of direction and some of its consequences.

This document is mostly to be interpreted in the context of the Valhalla
journey, and so talks about where we were a few months ago and where we're
heading now.

# Rehabilitating primitive classes: a nullity-centric approach

Over the course of Project Valhalla, we have observed that there are two
distinct groups of value types. We've tried stacking them in various ways,
but there are always two groups, which we've historically described as
"objects without identity" and "primitive classes", and which admit
different degrees of flattening.

The first group, which we are now calling "value objects" or "value
classes", represent the minimal departure from traditional classes needed
to disavow object identity. The existing classes that are described as
"value-based", such as `Optional` or `LocalDate`, are candidates for
migrating to value classes. Such classes give up object identity;
identity-sensitive behaviors are either recast as state-based (such as for
`==` and `Objects::identityHashCode`) or partialized (`synchronized`,
`WeakReference`), and such classes must live without the affordances of
identity (mutability, layout polymorphism.) In return, they avoid being
burdened by "accidental identity", which can be a source of bugs, and gain
significant optimization for stack-based values (e.g., scalarization in
calling convention) and other JIT optimizations.

The second group, which we had been calling "primitive classes" (we are now
moving away from that term), are those that are more like the existing
primitives, such as `Decimal` or `Complex`. Where ordinary value classes,
like identity classes, gave rise to a single (reference) type, these
classes gave rise to two types, a value type (`X.val`) and a reference type
(`X.ref`). This pair of types was directly analogous to legacy primitives
and their boxes. These classes come with more restrictions and more to
think about, but are rewarded with greater heap flattening. This model --
after several iterations -- seemed to meet the goals for expressiveness and
performance: we can express the difference between `int`-like behavior and
`Integer`-like behavior, and get routine flattening for `int`-like types.
But the result still had many imbalances; the distinction was heavyweight,
and a significant fraction of the incremental specification complexity was
centered only on these types. We eventually concluded that the source of
this was trying to model the `int` / `Integer` distinction directly, and
that this distinction, while grounded in user experience, was just not
"primitive" enough.

In this document, we will break down the characteristics of so-called
"primitive classes" into more "primitive" (and hopefully less ad-hoc)
distinctions. This results in a simpler model, streamlines the syntactic
baggage, and enables us to finally reunite with an old friend,
null-exclusion (bang) types. Rather than treating "value types" and
"reference types" as different things, we can treat the existing primitives
(and the "value projection" of user-defined primitive classes) as being
restricted references, whose restrictions enable the desired runtime
properties.

## Primitives and objects

In a previous edition of _State of Valhalla_, we outlined a host of
differences between primitives and objects:

| Primitives                                  | Objects                                   |
| ------------------------------------------- | ----------------------------------------- |
| No identity (pure values)                   | Identity                                  |
| `==` compares state                         | `==` compares object identity             |
| Built-in                                    | Declared in classes                       |
| No members (fields, methods, constructors)  | Members (including mutable fields)        |
| No supertypes or subtypes                   | Class and interface inheritance           |
| Represented directly in memory              | Represented indirectly through references |
| Not nullable                                | Nullable                                  |
| Default value is zero                       | Default value is null                     |
| Arrays are monomorphic                      | Arrays are covariant                      |
| May tear under race                         | Initialization safety guarantees          |
| Have reference companions (boxes)           | Don't need reference companions           |

Over many iterations, we have chipped away at this list, mostly by making
classes richer: value classes can disavow identity (and thereby opt into
state-based `==` comparison); the lack of members and supertypes is an
accidental restriction that can go away with declarable value classes; we
can make primitive arrays covariant with arrays of their boxes; we can let
some class declarations opt into non-atomicity under race. That leaves the
following, condensed list of differences:

| Primitives                        | Objects                                   |
| --------------------------------- | ----------------------------------------- |
| Represented directly in memory    | Represented indirectly through references |
| Not nullable                      | Nullable                                  |
| Default value is zero             | Default value is null                     |
| Have reference companions (boxes) | Don't need reference companions           |

The previous approach ("primitive classes") started with the assumption
that this is the list of things to be modeled by the value/reference
distinction.
In this document we go further, by showing that flattening (direct
representation) is derived from more basic principles around nullity and
initialization requirements. Perhaps surprisingly, the concept of
"primitive type" can disappear almost completely, save only for historical
vestiges related to the existing eight primitives. The `.val` type can be
replaced by restricted references whose restrictions enable the desired
representational properties. As is consistent with the goals of Valhalla,
flattenability is an emergent property, gained by giving up those
properties that would undermine flattenability, rather than being a
linguistic concept on its own.

### Initialization

The key distinction between today's primitives and objects has to do with
_initialization requirements_. Primitives are designed to be _used
uninitialized_; if we declare a field `int count`, it is reliably
initialized to zero by the JVM before any code can access it. This initial
value is a perfectly good default, and it is not a bug to read or even
increment this field before it has been explicitly assigned a value by the
program, because it has _already_ been initialized to a known good value by
the JVM. The zero value pre-written by the JVM is not just a safety net; it
is actually part of the programming model that primitives start out life
with "good enough" defaults. This is part of what it means to be a
primitive type.

Objects, on the other hand, are not designed for uninitialized use; they
must be initialized via constructors before use. The default zero values
written to an object's fields by the JVM typically don't constitute a valid
state according to the class's specification, and even if they did, they
would rarely be a good default value. Therefore, we require that class
instances be initialized by their constructors before they can be exposed
to the rest of the program. To ensure that this happens, objects are
referenced exclusively through _object references_, which _can_ be safely
used uninitialized -- because they reliably have the usable default value
of `null`. (Some may quibble with this use of "safely" and "usable",
because null references are fairly limited, but they do their limited job
correctly: we can easily and safely test whether a reference is null, and
if we accidentally dereference a null reference, we get a clear exception
rather than accessing uninitialized object state.)

> Primitives can be safely used without explicit initialization; objects
> cannot. Object references are nullable _precisely because_ objects cannot
> be used safely without explicit initialization.
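To make this asymmetry concrete, a small sketch in today's Java (the class
is invented purely for illustration):

```
class Counter {
    int hits;      // implicitly initialized to 0 by the JVM; reading or
                   // incrementing before any assignment is well-defined
    String label;  // implicitly initialized to null; not safe to use
                   // before the program assigns it

    void record() {
        hits++;                              // fine: zero was a good default
        System.out.println(label.length());  // NullPointerException if label
                                             // was never assigned
    }
}
```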
### Nullability

A key difference between today's primitives and references is that
primitives are non-nullable and references are nullable. One might think
this was primarily a choice of convenience: null is useful for references
as a universal sentinel, and not all that useful for primitives (when we
want nullable primitives we can use the box classes -- but we usually
don't.) But the reality is not one of convenience, but of necessity:
nullability is _required_ for the safety of objects, and usually
_detrimental_ to the performance of primitives.

Nullability for object references is a forced move because null is what
prevents us from accessing uninitialized object state. Nullability for
primitives is usually not needed, but that's not the only reason primitives
are non-nullable. If primitives were nullable, `null` would be another
state that would have to be represented in memory, and the costs would be
out of line with the benefits. Since a 64-bit `long` uses all of its bit
patterns, a nullable `long` would require at least 65 bits, and alignment
requirements would likely round this up to 128 bits, doubling memory usage.
(The density cost here is substantial, but it gets worse because most
hardware today does not have cheap atomic 128-bit loads and stores. Since
tearing might conflate a null value with a non-null value -- even worse
than the usual consequences of tearing -- this would push us strongly
towards using an indirection instead.) So non-nullability is a precondition
for effective flattening and density of primitives, and nullable primitives
would involve giving up the flatness and density that are the reason to
have primitives in the first place.

> Nullability interferes with heap flattening.

To summarize, the design of primitives and objects implicitly stems from
the following facts:

- For most objects, the uninitialized (zeroed) state is either invalid or
  not a good-enough default value;
- For primitives, the uninitialized (zeroed) state is both valid and a
  good-enough default value;
- Having the uninitialized (zeroed) state be a good-enough default is a
  precondition for reliable flattening;
- Nullability is required when the uninitialized (zeroed) state is not a
  good-enough default;
- Nullability not only has a footprint cost, but often is an impediment to
  flattening.

> Primitives exist in the first place because they can be flattened to give
> us better numeric performance; flattening requires giving up nullity and
> tolerance of uninitialized (zero) values.

These observations were baked into the language (and other languages too),
but the motivation for these decisions was then "erased" by the rigid
distinction between primitives and objects. Valhalla seeks to put that
choice back into the user's hands.

### Getting the best of both worlds

Project Valhalla promises the best of both worlds: sufficiently constrained
entities can "code like a class and work like an int." Classes that give up
object identity can get some of the runtime benefits of primitives, but to
get full heap flattening, we must embrace the two defining characteristics
of primitives described so far: non-nullability and safe uninitialized use.

Some candidates for value classes, such as `Complex`, are safe to use
uninitialized because the default (zero) value is a good initial value.
Others, like `LocalDate`, simply have no good default value (zero or
otherwise), and therefore need the initialization protocol enabled by
null-default object references. This distinction is inherent to the
semantics of the domain; some domains simply do not have a reasonable
default value, and this is a choice that the class author must capture when
the code is written.

There is a long list of classes that are candidates to be value classes;
some are like `Complex`, but many are more like `LocalDate`. The latter
group can still benefit significantly from eliminating identity, but can't
necessarily get full heap flattening. The former group, which are most like
today's primitives, can get all the benefits, including heap flattening --
when their instances are non-null.

### Declaring value classes

As in previous iterations, a class can be declared as a _value class_:

```
value class LocalDate { ... }
```
A value class gives up identity and its consequences (e.g., mutability) --
and that's it. The resulting `LocalDate` type is still a reference type,
and variables of type `LocalDate` are still nullable. Instances can get
significant optimizations for on-stack use but are still usually
represented in the heap via indirections.

### Implicitly constructible value classes

In order to get the next group of benefits, a value class must additionally
attest that it can be used uninitialized. Because this is a statement of
how instances of this class come into existence, modeling this as a special
kind of constructor seems natural:

```
value class Complex {
    private double re;
    private double im;

    public implicit Complex();
    public Complex(double re, double im) { ... }

    ...
}
```

These two constructors say that there are two ways a `Complex` instance
comes into existence: the first is via the traditional constructor that
takes real and imaginary values (`new Complex(1.0, 1.0)`), and the second
is via the _implicit_ constructor that produces the instance used to
initialize fields and array elements to their default values. That the
implicit constructor cannot have a body is a signal that the "zero default"
is not something the class author can fine-tune. A value class with an
implicit constructor is called an _implicitly constructible_ value class.

Having an implicit constructor is a necessary but not sufficient condition
for heap flattening. The other required condition is that a variable that
holds a `Complex` needs to be non-nullable. In the previous iteration, the
`.val` type was non-nullable for the same reason primitive types were, and
therefore `.val` types could be fully flattened. However, after several
rounds of teasing apart the fundamental properties of primitives and value
types, nullability has finally sedimented to a place in the model where a
sensible reunion between value types and non-nullable types may be
possible.
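As a sketch of how the implicit constructor surfaces in use -- borrowing
the `!` null-exclusion marker introduced below, and treating the details as
illustrative:

```
Complex c = new Complex(1.0, 2.0);  // the traditional constructor, as today

class Accumulator {
    Complex! sum;  // null-restricted field: before any assignment it holds
                   // the instance produced by the implicit constructor
                   // (all fields zero), never null
}
```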
## Null exclusion

Non-nullable reference types have been a frequent request for Java for
years, having been explored in `C#`, Kotlin, and Scala. The goals of
non-nullable types are sensible: richer types mean safer programs. It is a
pervasive problem in Java libraries that we are not able to express within
the language whether a returned object reference might be null, or is known
never to be null, and programmers can therefore easily make wrong
assumptions about nullability.

To date, Project Valhalla has deliberately steered clear of non-nullable
types as a standalone feature. This is not only because the goals of
Valhalla were too ambitious to burden the project with another ambitious
goal (though that is true), but for a more fundamental reason: the
assumptions one might make in a vacuum about the semantics of non-nullable
types would likely become hidden sources of constraints for the value type
design, which was already bordering on over-constrained. Now that the
project has progressed sufficiently, we are more confident that we can
engage with the issue of null exclusion.

A _refinement type_ (or _restriction type_) is a type that is derived from
another type by excluding certain values from the derived type's value set,
such as "the non-negative integers". In the most general form, a refinement
type is defined by one or more predicates (Liquid Haskell and Clojure Spec
are examples of this); range types in Pascal are a more constrained form of
refinement type. Non-nullable types ("bang" types) can similarly be viewed
as a constrained form of refinement type, characterized by the predicate
`x != null`. (Note that the null-excluding refinement type `X!` of a
reference type is still a reference type.)

Rather than saying that primitive classes give rise to two types, `X.val`
and `X.ref`, we can observe that the null-excluding type `X!` of an
implicitly constructible value class can have the same runtime
characteristics as the `.val` type in the previous round. Both the
declaration-site property that a value class is implicitly constructible,
and the use-site property that a variable is null-excluding, are necessary
to routinely get flattening.

Related to null exclusion is _null-adjunction_; this takes a non-nullable
type (such as `int`) or a type of indeterminate nullability (such as a type
variable `T` in a generic class that can be instantiated with either
nullable or non-nullable type parameters) and produces a type that is
explicitly nullable (`int?` or `T?`.) In the current form of the design,
there is only one place where the null-adjoining type is strictly needed --
when generic code needs to express "`T`, but might be null." The canonical
example of this is `Map::get`; it wants to return `V?`, to capture the fact
that `Map` uses `null` to represent "no mapping".
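For instance, a generic declaration along these lines (hypothetical; the
real `Map` would be annotated in place rather than redeclared):

```
interface Map<K, V> {
    V? get(Object key);  // "V, but might be null": even when V is
                         // instantiated with a null-excluding type, get may
                         // return null to mean "no mapping"
}
```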
For a given class `C`, the type `C!` is clearly non-nullable, and the type
`C?` is clearly nullable. What of the unadorned name `C`? This has
_unspecified_ nullability. Unspecified nullability is analogous to raw
types in generics (we could call this "raw nullability"); we cannot be sure
what the author had in mind, and so must find a balance between the desire
for greater null safety and tolerance of ambiguity in author intent.

Readers who are familiar with explicitly nullable and non-nullable types in
other languages may be initially surprised at some of the choices made
regarding null-exclusion (and null-adjunction) types here. The
interpretation outlined here is not necessarily the "obvious" one, because
it is constrained both by the needs of null-exclusion and of Valhalla, and
by the migration-compatibility constraints needed for the ecosystem to make
a successful transition to types that have richer nullability information.

While the theory outlined here will allow all class types to have a
null-excluding refinement type, it is also possible that we will initially
restrict null-exclusion to implicitly constructible value types. There are
several reasons to consider pursuing such an incremental path, including
the fact that we will be able to reify the non-nullability of implicitly
constructible value types in the JVM, whereas the null-exclusion types of
other classes such as `String`, or of ordinary value classes such as
`LocalDate`, would need to be done through erasure, increasing the possible
sources of null pollution.

### Goals

We adopt the following set of goals for adding null-excluding refinement
types:

- More complete unification of primitives with classes;
- Flatness is an emergent property that can derive from more basic semantic
  constraints, such as identity-freedom, implicit constructibility, and
  non-nullity;
- Merge the concept of "value companion" (`.val` type) into the
  null-restricted refinement type of implicitly constructible value classes;
- Allow programmers to annotate type uses to explicitly exclude or affirm
  nulls in the value set;
- Provide some degree of runtime nullness checking to detect null pollution;
- Annotating an existing API (one based on identity classes) with
  additional nullness information should be binary- and source-compatible.

The last goal is a source of strong constraints, and not one to be taken
lightly. If an existing API that specifies "this method never returns null"
cannot be compatibly migrated to one where this constraint is reflected in
the method declaration proper, the usefulness of null-exclusion types is
greatly reduced; library maintainers will be put to a bad choice of
forgoing a feature that would make their APIs safer, or making an
incompatible change in order to adopt it. If we were building a new
language from scratch, the considerations might be different, but we do not
have that luxury. "Just copying" what other languages have done here is a
non-starter.

### Interoperation between nullable and non-nullable types

We enable conversions between a nullable type and a compatible
null-excluding refinement type by adding new widening and narrowing
conversions between `T?` and `T!` that have analogous semantics to the
existing boxing and unboxing conversions between `Integer` and `int`. Just
as with boxing and unboxing, widening from a non-nullable type to a
nullable type is unconditional and never fails, and narrowing from a
nullable type to a non-nullable type may fail by throwing
`NullPointerException`. These conversions for null-excluding types would be
sensible in assignment context, cast context, and method invocation context
(both loose and strict, unlike boxing for primitives today.) This would
allow existing assignments, invocations, and overload applicability checks
to continue to work even after migrating one of the types involved, as
required for source-compatibility.
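Concretely, the intent is something like the following (a hypothetical
snippet using the `!` and `?` markers; the conversion points are flagged in
comments):

```
String! a = "hello";  // non-nullable
String? b = a;        // widening (T! to T?): unconditional, never fails
String! c = b;        // narrowing (T? to T!): implicit, like unboxing;
                      // throws NullPointerException if b is null at run time
```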
Checking for bad values can mirror the approach taken for generics. When a
richer compile-time type system erases to a less-rich runtime type system,
type safety derives from a mix of compile-time type checking and synthetic
runtime checks. In both cases, there is a possibility of pollution which
can be injected at the boundary between legacy and new code, by malicious
code, or through injudicious use of unchecked casts and raw types. And like
generics, we would like to offer the possibility that if a program compiles
in its entirety with no unchecked warnings, null-excluding types will not
be observed to contain null. To achieve this, we will need a combination of
runtime checks, new unchecked warnings, and possibly restrictions on
initialization.

The intrusion on the type-checking of generics here is considerable;
nullity will have to be handled in type inference, bounds conformance,
subtyping, etc. In addition, there are new sources of heap pollution and
new conditions under which a variable may be polluted. The _Universal
Generics_ JEP outlines a number of unchecked warnings that must be issued
in order to avoid null pollution in type variables that might be
instantiated either with a nullable or null-excluding type. While this work
was designed for `ref` and `val` types, much of it applies directly to
null-excluding types.

The liberal use of conversion rather than subtyping here may be surprising
to readers who are familiar with other languages that support
null-excluding types. At first, it may appear to be "giving up all the
benefit" of having annotated APIs for nullness, since a nullable value may
be assigned directly to a non-nullable type without requiring a cast. But
the reality is that for the first decade at least, we will at best be
living in a mixed world where some APIs are migrated to use nullness
information and some are not, and forcing users to modify code that uses
these libraries (and then do so again and again as more libraries migrate)
would be an unacceptable tax on Java users, and a deterrent to libraries
migrating to use these features.

Starting from `T! <: T?` -- and forcing explicit conversions when you want
to go from nullable to non-nullable values -- does seem an obvious choice
if you have the luxury of building a type system from scratch. But if we
want to make migration to null-excluding types a source-compatible change
for libraries and clients, we cannot accept a strict subtyping approach.
(Even if we did, we could still only use subtyping in one direction, and
would have to add an additional implicit conversion for the other direction
-- a conversion that is similar to the narrowing conversion proposed here.)

Further, primitives _already_ use boxing and unboxing conversions to go
between their nullable (box) and non-nullable (primitive) forms. So
choosing subtyping for references (plus an unbalanced implicit conversion)
and boxing/unboxing conversion for primitives means our treatment of
null-excluding types is gratuitously different for primitives than for
other classes.

Another consequence of wanting migration compatibility for annotating a
library with nullness constraints is that nullness constraints cannot
affect overload selection. Compatibility is not just for clients; it is
also for subclasses.

### Null exclusion for implicitly constructible value classes

Implicitly constructible value classes go particularly well with null
exclusion, because we can choose a memory representation that _cannot_
encode null, enabling a more compact and direct representation. The
Valhalla JVM has support for such a representation, and so we describe the
null-exclusion type of an implicitly constructible value class as _strongly
null-excluding_. This means that its null exclusion is reified by the JVM.
Such a variable can never be seen to contain null, because null simply does
not have a runtime representation for these types. This is only possible
because these classes are implicitly constructible, so that the default
zero value written by the JVM is known to be a valid value of the domain.
As with primitives, these types are explicitly safe to use uninitialized.

A strongly null-excluding type will have a type mirror, as type mirrors
describe reifiable types.

### Null exclusion for other classes

For identity classes and non-implicitly-constructible value classes, the
story is not quite as nice. Since there is no JVM representation of
"non-nullable String", the best we can do is translate `String!` to
`String` (a form of erasure), and then try to keep the nulls at bay. This
means that we do not get the flattening or density benefits, and
null-excluding variables may still be subject to heap pollution. We can try
to minimize this with a combination of static type checking and generated
runtime checks. We refer to the null-exclusion type of an identity class or
a non-implicitly constructible value class as _weakly null-excluding_.

There is an additional source of potential null pollution, aside from the
sources analogous to generic heap pollution: the JVM itself. The JVM
initializes references in the heap to null.
If `String!` erases to an ordinary `String` reference, there is at least a
small window in time when this supposedly non-nullable field contains null.
We can erect barriers to reduce the window in which this can be observed,
but these barriers will not be foolproof. For example, the compiler could
enforce that a field of type `String!` either has an initializer or is
definitely assigned in every constructor. However, if the receiver escapes
during construction, all bets are off, just as they are with initialization
safety for final fields.

We have a similar problem with arrays of `String!`; newly created arrays
initialize their elements to the default value for the component type,
which is `null`, and we don't even have the option of requiring an
initializer as we would with fields. (Since a `String![]` is also a
`String[]`, one option is to outlaw the direct creation of arrays of weakly
null-excluding types, instead providing reflective API points which will
safely create the array and initialize all elements to a non-null value.)

A weakly null-excluding type will not have a type mirror, as the nullity
information is erased for these types. Generic signatures would be extended
to represent null-exclusion, and similarly the `Type` hierarchy would
reflect such signatures.

Because of erasure and the new possibilities for pollution, allowing
null-exclusion types for identity classes introduces significant potential
new complexity. For this reason, we may choose a staged approach where
null-restricted types are initially limited to the strongly null-restricted
ones.

### Null exclusion for other value classes

Value classes that are not implicitly constructible are similar to identity
classes in that their null-exclusion types are only weakly null-excluding.
These classes are the ones for which the author has explicitly decided that
the default zero value is not a valid member of the domain, so we must
ensure that in no case does this invalid value ever escape. This
effectively means that we must similarly erase these types to a nullable
representation to ensure that the zero value stays contained. (There are
limited heroics the VM can do with alternate representations for null when
these classes are small and have readily identifiable slack bits, but this
is merely a potential optimization for the future.)

### Atomicity

Primitives additionally have the property that larger-than-32-bit
primitives (`long` and `double`) may tear under race. The allowance for
tearing was an accommodation to the fact that numeric code is often
performance-critical, and so a tradeoff was made to allow for more
performance at the cost of less safety for incorrect programs. The
corresponding box types, as well as primitive variables declared
`volatile`, are guaranteed not to tear, even under race. (See the document
entitled "Understanding non-atomicity and tearing" for more detail.)

Implicitly constructible value classes can be declared as "non-atomic" to
indicate that their null-exclusion type may tear under race (if not
declared `volatile`), just as with `long` and `double`. The classes `Long`
and `Double` would be declared non-atomic (though most implementations
still offer atomic access for 64-bit primitives.)
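A sketch of the declaration-site opt-in and its use-site consequences (the
"non-atomic" surface syntax here is illustrative, not settled):

```
non-atomic value class BigPair {
    long x;
    long y;
    public implicit BigPair();
    public BigPair(long x, long y) { ... }
}

class Holder {
    BigPair! racy;            // may tear under race: a reader could observe
                              // the x of one write paired with the y of another
    volatile BigPair! safe;   // volatile forbids tearing, as with long/double
}
```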
### Flattening

Flattening in the heap is an emergent property, which is achieved when we
give up the degrees of freedom that would prevent flattening:

- Identity prevents flattening entirely;
- Nullability prevents flattening in the absence of heroics involving
  exotic representations for null;
- The inability to use a class without initialization requires nullability
  at the VM representation level, undermining flattening;
- Atomicity prevents flattening for larger value objects.

Putting this together, the null-exclusion type of an implicitly
constructible value class is flattenable in the heap when the class is
non-atomic or the layout is suitably small. For ordinary value classes, we
can still get flattening in the calling convention: all identity-free types
can be flattened on the stack, regardless of layout size or nullability.
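Putting the cases side by side, as a rough illustrative sketch:

```
value class Complex   { ... public implicit Complex(); ... }  // implicitly constructible
value class LocalDate { ... }                                 // no implicit constructor

class Layouts {
    Complex! a;    // candidate for heap flattening: identity-free, null-free,
                   // good zero default (given non-atomicity or a small layout)
    Complex b;     // nullable: scalarizes on the stack, but the heap layout
                   // generally remains an indirection
    LocalDate c;   // weakly null-excluding at best (even as LocalDate!); gets
                   // calling-convention flattening, not full heap flattening
}
```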
### Summarizing null-exclusion

The feature described so far is at the weak end of the spectrum of features
described by "non-nullable types". We make tradeoffs to enable gradual
migration compatibility, moving checks to the boundary -- where in some
cases they might not happen due to erasure, separate compilation, or just
dishonest clients.

Users may choose to look at this as "glass X% full" or "glass (100-X)%
empty". We can now more clearly say what we mean, migrate incrementally
towards more explicit and safe code without forking the ecosystem, and
catch many errors earlier in time. On the other hand, it is less explicit
where we might experience runtime failures, because autoboxing makes
unboxing implicit. And some users will surely complain merely because this
is not what their favorite language does. But it is the null-exclusion we
can actually have, rather than the one we wish we might have in an
alternate universe.

This approach yields a significant payoff for the Valhalla story. Valhalla
already had to deal with considerable new complexity to handle the
relationship between reference and value types -- but this new complexity
applied only to primitive classes. For less incremental complexity, we can
have a more uniform treatment of null-exclusion across all class types. The
story is significantly simpler and more unified than we had previously:

- Everything, including the legacy primitives, is an object (an instance of
  some class);
- Every type, including the legacy primitives, is derived from a class;
- All types are reference types (they refer to objects), but some reference
  types (non-nullable references to implicitly constructible objects)
  exhibit the runtime behavior of primitives;
- Some reference types exclude null, and some null-excluding reference
  types are reifiable with a known-good non-null default;
- Every type can have a corresponding null-exclusion type.

## Planning for a null-free future (?)

Users prefer working with unannotated types (e.g., `Foo`) rather than
explicitly annotated types (`Foo!`, `Foo?`), where possible. The
unannotated type `Foo` could mean one of three things: an alias for `Foo!`,
an alias for `Foo?`, or a type of "raw" (unknown) nullity. Investigations
into null-excluding type systems have shown that the better default would
be to treat an unannotated name as indicating non-nullability, and to use
explicitly nullable types (`T?`) to indicate the presence of null, because
returning or accepting null is generally the less common case. Of course,
today `String` means "possibly nullable String" in Java, meaning that, yet
again, we seem to have chosen the wrong default.

Our friends in the `C#` community have explored the possibility of a
"flippening". `C#` started with the Java defaults, and later provided a
compiler mode to flip the default on a per-module basis, with checking (or
pollution risk) at the boundary between modules with opposite defaults.
This is an interesting experiment and we look forward to seeing how this
plays out in the `C#` ecosystem.

Alternately, another possible approach for Java is to continue to treat the
unadorned name as having "raw" or "unknown" nullity, encouraging users to
annotate types with either `!` or `?`. This approach has been partially
explored in the `JSpecify` project. Within this approach is a range of
options for what the language will do with such types; there is a risk of
flooding users with warnings. We may want to leave such analysis to
extralinguistic type checkers, at least initially -- but we would like to
not foreclose on the possibility of an eventual flippening.