From brian.goetz at oracle.com Fri Jul 1 11:01:58 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 1 Jul 2022 07:01:58 -0400
Subject: Fwd: nullable-inlined-values on the heap
In-Reply-To: <799cc618-a3c6-2b8d-c03f-d154e5bf6107@oracle.com>
References: <799cc618-a3c6-2b8d-c03f-d154e5bf6107@oracle.com>
Message-ID:

>> If this is possible, maybe Valhalla's Java could have a user-model
>> like this:
>
> You should probably start with what problem you are trying to solve.

The poster offered the following clarifications:

> The problem it's trying to solve is to remove the .val and .ref
> operators/knobs/concepts from the user-model without any loss of
> performance or loss of control over nullability, zeroness or
> atomicity. In other words, the objective is to take Kevin's
> ref-by-default idea one step further.
>
> > In theory, we could construct the union type int|Null, but this
> > type doesn't have a practical representation in memory, ...
>
> Would it be possible to have a value-class give rise to these 3
> hidden/runtime-only companion-types on the heap:
>
>     RefType  - reference to a value-instance or no-reference (null)
>     ValType  - inlined [value-instance-fields]
>     ValType? - inlined [nullability-boolean + value-instance-fields]
>
> Then, the runtime could transparently choose between RefType|ValType
> for non-nullable variables or between RefType|ValType? for nullable
> variables, depending on hardware, bitSize, zeroness and atomicity
> constraints, as explained by the ternary expression in my previous
> email. Of course, since ValType? has a higher bitSize than ValType,
> nullable values will be less likely to be inlined. But still, the
> point is: could nullable values sometimes be inlined on the heap as
> opposed to never being inlined.
>
> > In theory, we could construct the union type int|Null, but this
> > (...) drags in all sorts of mismatches because union types would
> > then flow throughout the system.
>
> Is my 3-companion-types solution a real union type? Sure, I am
> suggesting two sort-of-unions:
>
>     RefType|ValType  - for non-nullable value-class variables
>     RefType|ValType? - for nullable value-class variables
>
> However, to the user, both types in each union represent the same
> exact value-set.

My comments:

Except the model proposed actually had _more_ knobs than where we are now: just as many declaration-site knobs (no identity, tearable, zeroable), and more use-site knobs (atomic, nullable.)

Reading between the lines, what I think is going on here is: ".val is ugly, can we find a way to spell it !, so we can pretend there aren't really two things." And I totally get the desire to do this! But I don't think these are the droids you are looking for. The overwhelming lesson of Valhalla has been: every time we try to associate something with nullity (identity, reference-ness, flattening, atomicity, etc), it turns out to be a mistake.

> Would it be possible to have a value-class give rise to these 3
> hidden/runtime-only companion-types on the heap:
>
>     RefType  - reference to a value-instance or no-reference (null)
>     ValType  - inlined [value-instance-fields]
>     ValType? - inlined [nullability-boolean + value-instance-fields]

I think there are at least two tails wagging this dog here -- the syntax tail (I want to say ?/!, not .ref/.val) and the performance tail. Note that the RefType and ValType? in this breakdown are semantically identical! The only difference is the assumed performance model -- "reference vs inlined." But we *already* can do significant flattening on the .ref types (calling convention and locals.) If we could achieve the kind of inlining you want for ValType? in the heap, we would just do it for RefType too, and we wouldn't need a third thing. So this breakdown is just making it needlessly more complicated for no benefit.
From brian.goetz at oracle.com Fri Jul 1 12:39:03 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 1 Jul 2022 08:39:03 -0400
Subject: Fwd: Explicit name for value companion type
In-Reply-To:
References:
Message-ID: <77365ac0-b4d7-c8b1-1aa2-b1f0fa443e2e@oracle.com>

Now that the model is settling, the messages in the suggestion box -- many of them syntax-driven -- are coming fast and furious. Here's another.

Summary: "Having to type .val will make users angry; can we please find a way to make it the default at least sometimes."

The author goes on to suggest a syntax for naming the value companion explicitly, observing that if the user can pick two type names instead of deriving the type names from the class name, then they can give a better name than X.val (perhaps even "complex") to the value companion (and perhaps a worse name to the ref, like ComplexRefDontUseMe, further discouraging it.)

Comment: Yes, we know. We have been trying very, very hard to not let the syntax tail wag the model dog. Getting to the right model is the much more important thing at this point. (But, I'll take the flood of syntax-driven comments we've gotten lately as an indication that the model is mostly there, so that's good.)

Whether or not to allow the user to choose two names (as we already have for int/Integer) has been hanging in the air for a long time, but there are more factors there than are immediately obvious. Let's continue to focus on the model for the time being; we'll return to syntax in due time.

-------- Forwarded Message --------
Subject: Explicit name for value companion type
Date: Fri, 1 Jul 2022 14:01:21 +0200
From: Victor Nazarov
To: valhalla-spec-comments at openjdk.org

Hello dear experts,

I've been following the valhalla list for a number of years already and I'm quite happy with the currently proposed model, but, like others, I'd like to raise a concern about making .val a use-site knob instead of a declaration-site one.
I don't think there are a lot of problems, but still I think the gist of why the use-site usage of .val introduces distress for people is the following.

It's easy to miss .val where it should be, and there is no immediate indication that something is wrong. When performance is not good enough, you need to profile and look for a place to insert .val. Profiling is costly; many people have software that can only be realistically profiled in production. So yes, it's easy to fix in a sense, but it's hard to find where the problem is.

The alternative that people see is that for some types you should aggressively go through the code and append .val wherever this type is available, just to never ever have a need to profile. I envision that we will get annotations, static-analysis and even best practices to encourage this. The worst example that I can invent is something like a naming convention, so that value classes should always be named with a Ref suffix:

    value class ComplexRef {
    }

so that all actual usages will become

    ComplexRef.val v = ComplexRef.of(1, 5);

or an annotation for a static-analysis tool

    @AlwaysUseVal
    non-atomic value class ComplexRef {
    }

The introduction of conventions and additional tooling is understandable: it's much cheaper to solve this problem like this than to profile software on a case by case basis.

I think this is the main issue: even though the performance model is adequate and the use-site knob is good to tweak performance, there are still cases that can be fixed "once and for all", and people will reach for this solution no matter what. And I think the language has to accommodate this and not create a new industry of static-analysis tools.

Another similar case is missing initialization. This is not so convincing, because we have already become good at fixing NPE and we already have good static analysis tools for nullability.
But still, fixing missing initialization for some Complex variable based on a caught NPE is much, much more costly than having some kind of static-analysis that says that Complex should always be initialized to zero. So here again there is a solution that can "fix NPE" for some areas without going case by case. And some types, like Complex, should probably be "just fixed once and for all".

So I think this is it, there are no more overlooked problems, but I think the problem stated above is serious enough to make people concerned.

If I go to a solution, then I don't think I have a perfect answer, but leaning on the "code like a class, works like an int" mantra, maybe we can reuse the concept of "permit" from sealed-interfaces and make the value companion explicitly named, like:

    non-atomic value record ComplexRef(int re, int im)
            __permits_value_companion Complex {
    }

    __value_companion Complex __unboxes ComplexRef {
        static Complex of(int re, int im) {
            return new ComplexRef(re, im);
        }
    }

There should be no explicit state and no constructor in the Complex value companion; all the state management is performed in the normal class ComplexRef.

Maybe "__value_companion" should be called "primitive", because it is not a class, but something that has a "wrapper" class. We can imagine that Integer and int are defined like this:

    value class Integer permits-primitive int {
    }

    primitive int {
    }

--
Victor Nazarov
URL: From kevinb at google.com Fri Jul 1 21:50:21 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 1 Jul 2022 14:50:21 -0700 Subject: User model stacking: current status In-Reply-To: <1750a350-4e83-cd0e-8a52-44d193b766af@oracle.com> References: <80ca8334-ebd7-1157-4081-188de9cb240e@oracle.com> <7ca63dd2-401d-4885-dd67-041fc0c17fae@oracle.com> <6fe472f7-0ba6-8038-6352-8006c56098f4@oracle.com> <1750a350-4e83-cd0e-8a52-44d193b766af@oracle.com> Message-ID: Apologies that my comments might inevitably be a bit to the side of what you're really hoping to talk about. If one doesn't go in a helpful direction feel free to just not lean into it. On Thu, Jun 23, 2022 at 12:01 PM Brian Goetz wrote: - Value classes have ref and val companion types with the obvious > properties. (Notably, refs are always atomic.) > - For `value class C`, C as a type is an alias for `C.ref`. > I'm happy about all this -- this comment is purely about mental model / framing. I really think we want to flip this around. For *any* class `One`, you always get a reference type and it is always called `One`. Then you might *also* get a value type which is called `One.val` or whatever we come up with. If we want the type `One.ref` to exist for some reason (I have not understood why yet, only why we need `T.ref` for *type variables*), then *that's* the alias. If I print out the names of One.val.class, One.ref.class, and One.class, I should get "One.val, One, One", not "One.val, One.ref, One.ref". I harp on this because this is how Java classes get to continue to feel unified. > _This is the second of three documents describing the current State of > Valhalla. The first is [The Road to Valhalla](01-background); the > third is [The JVM Model](03-vm-model)._ > > This document describes the directions for the Java _language_ charted by > Project Valhalla. (In this document, we use "currently" to describe the > language as it stands today, without value classes.) 
> > Valhalla started with the goal of providing user-programmable classes > which can > be flat and dense in memory. Numerics are one of the motivating use cases; > adding new primitive types directly to the language has a very high > barrier. As > we learned from [Growing a Language][growing] there are infinitely many > numeric > types we might want to add to Java, but the proper way to do that is via > libraries, not as a language feature. > > ## Primitive and reference types in Java today > > Java currently has eight built-in primitive types. Primitives represent > pure > _values_; any `int` value of "3" is equivalent to, and indistinguishable > from, > any other `int` value of "3". Primitives are monolithic (their bits > cannot be > addressed individually) and have no canonical location, and so are _freely > copyable_. With the exception of the unusual treatment of exotic floating > point > values such as `NaN`, the `==` operator performs a _substitutibility test_ > -- it > asks "are these two values the same value". > The last part needs to be much clearer imho; appealing to "sameness" raises as many questions as it answers. btw, Why do we say "substitutability" and not "(in)distinguishability"? It seems more readily obvious what the second means. Substitutability brings LSP to mind, which is a different and asymmetric kind. I feel like distinguishability is the concept you want here in place of "sameness". Rather, "strict distinguishability". What `==` does is more specifically a *strict* or absolute distinguishability test. I think this should be called out, because the kind of distinguishability that *matters more to users more often* is the other kind: *logical* distinguishability, the kind that `Object.equals` empowers them to control for themselves. The exact reason they override that method is to govern what they want to be considered distinguishable. But ofc. `==` ignores all that. 
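The two kinds of test discussed here can already be observed in current Java. The following is a minimal sketch, not part of the quoted document: the class name `Distinguishability` and its helper methods are made up for illustration. It contrasts `==` (the strict test) with `equals` (the logical test a class defines for itself), and shows the floating-point exceptions the document mentions.

```java
final class Distinguishability {
    // Strict test on references: true only for the very same object.
    static boolean strictlySame(Object a, Object b) {
        return a == b;
    }

    // Logical test: delegates to whatever equals() the class defines.
    static boolean logicallySame(Object a, Object b) {
        return a.equals(b);
    }

    // == on primitive doubles: the NaN and signed-zero exceptions show up here.
    static boolean primitiveEquals(double a, double b) {
        return a == b;
    }

    public static void main(String[] args) {
        String s1 = new String("valhalla");
        String s2 = new String("valhalla");
        System.out.println(strictlySame(s1, s2));   // false: two distinct objects
        System.out.println(logicallySame(s1, s2));  // true: same characters

        System.out.println(primitiveEquals(Double.NaN, Double.NaN)); // false
        System.out.println(logicallySame(Double.NaN, Double.NaN));   // true: boxed, uses Double.equals
        System.out.println(primitiveEquals(0.0, -0.0));              // true
        System.out.println(logicallySame(0.0, -0.0));                // false
    }
}
```

The last four lines are the "exotic floating point" cases: for `double`, `==` and `Double.equals` disagree on both `NaN` and signed zero, so neither is a plain "same value" test.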
I hope to lean on the twin concepts of "strict distinguishability" and "logical distinguishability" for these two kinds.

> Java also has _objects_, and each object has a unique _object identity_.

("currently")

> Because of identity, objects are not freely copyable; each object lives in
> exactly one place at any given time, and to access its state we have to go to
> that place. But we mostly don't notice this because objects are not
> manipulated or accessed directly, but instead through _object references_.
> Object references are also a kind of value -- they encode the identity of the
> object to which they refer, and the `==` operator on object references asks
> "do these two references refer to the same object." Accordingly, object
> _references_ (like other values) can be freely copied, but the objects they
> refer to cannot.
>
> Primitives and objects differ in almost every conceivable way:
>
> | Primitives                                 | Objects                            |
> | ------------------------------------------ | ---------------------------------- |
> | No identity (pure values)                  | Identity                           |
> | `==` compares values                       | `==` compares object identity      |
> | Built-in                                   | Declared in classes                |
> | No members (fields, methods, constructors) | Members (including mutable fields) |
> | No supertypes or subtypes                  | Class and interface inheritance    |
> | Accessed directly                          | Accessed via object references     |
> | Not nullable                               | Nullable                           |
> | Default value is zero                      | Default value is null              |
> | Arrays are monomorphic                     | Arrays are covariant               |
> | May tear under race                        | Initialization safety guarantees   |
> | Have reference companions (boxes)          | Don't need reference companions    |
>
> The design of primitives represents various tradeoffs aimed at maximizing
> performance and usability of the primitive types. Reference types default to
> `null`, meaning "referring to no object"; primitives default to a usable zero
> value (which for most primitives is the additive identity).
> Reference types provide initialization safety guarantees against a certain
> category of data races; primitives allow tearing under race for
> larger-than-32-bit values. We could characterize the design principles behind
> these tradeoffs as "make objects safer, make primitives faster."
>
> The following figure illustrates the current universe of Java's types. The
> upper left quadrant is the built-in primitives; the rest of the space is
> reference types. In the upper-right, we have the abstract reference types --
> abstract classes, interfaces, and `Object` (which, though concrete, acts more
> like an interface than a concrete class). The built-in primitives have
> wrappers or boxes, which are reference types.
>
> [Figure: Current universe of Java field types]
> > Valhalla aims to unify primitives and objects in that they can both be > declared with classes, but maintains the special runtime characteristics > primitives have. But while everyone likes the flatness and density that > user-definable value types promise, in some cases we want them to be more > like > classical objects (nullable, non-tearable), and in other cases we want > them to > be more like classical primitives (trading some safety for performance). > > ## Value classes: separating references from identity > > Many of the impediments to optimization that Valhalla seeks to remove > center > around _unwanted object identity_. The primitive wrapper classes have > identity, > but it is a purely accidental one. Not only is it not directly useful, it > can > be a source of bugs. For example, due to caching, `Integer` can be > accidentally > compared correctly with `==` just often enough that people keep doing it. > Similarly, [value-based classes][valuebased] such as `Optional` have no > need for > identity, but pay the costs of having identity anyway. > > Our first step is allowing class declarations to explicitly disavow > identity, by > declaring themselves as _value classes_. The instances of a value class > are > called _value objects_. > Obviously the fuller explanation will come below, but even just as a stepping stone, this statement already seems problematic to me. 
I feel like we want to end up here, do you disagree?: * Instances of the type `Foo.val` are values, not objects * Instances of the type `Foo` are objects, not values, which we call "value objects" as short for "value-like objects" or somesuch * When we say instances of the *class* `Foo` we might mean instances of either type the class declares (and really, maybe we should just downplay "instances of the class" and emphasize "instances of the type") We need students to be able to confidently articulate things like "value objects are objects in every way, but they aren't values in any way; the name really just means objects without identity, which makes them sort of 'adjacent to' being values...." It's going to be a bit of an awkward sell. > ``` > value class ArrayCursor { > } > ``` > > This says that an `ArrayCursor` is a class whose instances have no > identity -- > that instead they have _value semantics_. > Oof. I'm glad you used the ArrayCursor example, as it shines a spotlight on the precise meaning of the phrase "value semantics". I have been saying that the well-developed concept of "value semantics" out there in the world is generally assumed *recursive* -- we expect it to have value semantics all the way down. If I tell someone a thing has value semantics but (any # of levels deep) in it is some string that's getting compared by identity, their expectations will be violated. In our context, being a "value class" (thus birthing 2 types whose instances are "values" and "value objects", as above) is inherently about what's going on one-level deep and nothing else. How might we keep this terminology straight? Suggestion: leave "semantics" out of it? As a consequence, it must give up the > things that depend on identity; the class and its fields are implicitly > final. 
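To make the `ArrayCursor` discussion concrete, here is a sketch of what such a class might look like when written for a current JDK. The quoted document elides the class body, so everything here is an assumption: the two fields (`array`, `offset`), the factory `of`, and the wrapper class `CursorDemo` are hypothetical. On today's Java this is an ordinary identity class; it only mirrors the shape a value class would have (final class, final fields, no mutation).

```java
final class CursorDemo {
    // Hypothetical stand-in for the document's ArrayCursor: immutable, two fields.
    static final class ArrayCursor<T> {
        private final T[] array;
        private final int offset;

        private ArrayCursor(T[] array, int offset) {
            this.array = array;
            this.offset = offset;
        }

        static <T> ArrayCursor<T> of(T[] array) {
            return new ArrayCursor<>(array, 0);
        }

        boolean hasNext() {
            return offset < array.length;
        }

        T next() {
            return array[offset];
        }

        // "Advancing" never mutates; it returns a new cursor one position further.
        ArrayCursor<T> advance() {
            return new ArrayCursor<>(array, offset + 1);
        }
    }

    // Same loop shape as the one the document uses for its cursor example.
    static int sum(Integer[] values) {
        int total = 0;
        for (ArrayCursor<Integer> c = ArrayCursor.of(values); c.hasNext(); c = c.advance()) {
            total += c.next();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new Integer[] {1, 2, 3}));
    }
}
```

Because nothing in this class depends on identity (no locking, no mutation, no reliance on `==`), it is the kind of class that could plausibly be redeclared as a value class without changing how callers use it.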
> > But, value classes are still classes, and can have most of the things > classes > can have -- fields, methods, constructors, type parameters, superclasses > (with > some restrictions), nested classes, class literals, interfaces, etc. The > classes they can extend are restricted: `Object` or abstract classes with > no > instance fields, empty no-arg constructor bodies, no other constructors, > no instance > initializers, no synchronized methods, and whose superclasses all meet > this same > set of conditions. (`Number` meets these conditions.) > Classes in Java give rise to types; the class `ArrayCursor` gives rise to a > type > `ArrayCursor` (actually a parametric family of instantiations > `ArrayCursor`.) > `ArrayCursor` is still a reference type, just one whose references refer to > value objects rather than identity objects. For the types in the > upper-right > quadrant of the diagram (interfaces, abstract classes, and `Object`), > references > to these types might refer to either an identity object or a value object. > (Historically, JVMs were effectively forced to represent object references > with > pointers; for references to value objects, JVMs now have more flexibility.) > > Because `ArrayCursor` is a reference type, it is nullable (because > references > are nullable), its default value is null, and loads and stores of > references are > atomic with respect to each other even in the presence of data races, > providing > the initialization safety we are used to with classical objects. > > Because instances of `ArrayCursor` have value semantics, `==` compares by > state > rather than identity. > Would be good to always include the word "shallow" in that. Comparing "deeply" by state only emerges from being "deeply" value classes. > This means that value objects, like primitives, are > _freely copyable_; we can explode them into their fields and re-aggregate > them > into another value object, and we cannot tell the difference. 
(Because > they > have no identity, some identity-sensitive operations, such as > synchronization, > are disallowed.) > (Would always mention mutation together with synchronization, as it's the "biggie".) So far we've addressed the first two lines of the table of differences > above; > rather than identity being a property of all object instances, classes can > decide whether their instances have identity or not. By allowing classes > that > don't need identity to exclude it, we free the runtime to make better > layout and > compilation decisions -- and avoid a whole category of bugs. > > In looking at the code for `ArrayCursor`, we might mistakenly assume it > will be > inefficient, as each loop iteration appears to allocate a new cursor: > > ``` > for (ArrayCursor c = Arrays.cursor(array); > c.hasNext(); > c = c.advance()) { > // use c.next(); > } > ``` > > One should generally expect here that _no_ cursors are actually allocated. > Because an `ArrayCursor` is just its two fields, these fields will > routinely get > scalarized and hoisted into registers, and the constructor call in > `advance` > will typically compile down to incrementing one of these registers. > > ### Migration > > The JDK (as well as other libraries) has many [value-based > classes][valuebased] > such as `Optional` and `LocalDateTime`. Value-based classes adhere to the > semantic restrictions of value classes, but are still identity classes -- > even > though they don't want to be. Value-based classes can be migrated to true > value > classes simply by redeclaring them as value classes, which is both source- > and > binary-compatible. > I think this is confusing unless you point out that these classes are only "labeled" as value-based, and voluntarily police their own restrictions; there's no actual feature behind it. We plan to migrate many value-based classes in the JDK to value classes. 
> Additionally, the primitive wrappers can be migrated to value classes as > well, > making the conversion between `int` and `Integer` cheaper; see the section > "Legacy Primitives" below. (In some cases, this may be _behaviorally_ > incompatible for code that synchronizes on the primitive wrappers. [JEP > 390][jep390] has supported both compile-time and runtime warnings for > synchronizing on primitive wrappers since Java 16.) > The text here just sounds a little "problem all solved!"-ish, while only mentioning synchronization and not the rest of the list you go through below. >
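The "accidental identity" of the primitive wrappers is easy to observe today. The sketch below is illustrative only (the class `WrapperIdentity` and its methods are made-up names); it relies on the documented behavior that `Integer.valueOf` always caches values from -128 to 127 and may or may not cache others.

```java
final class WrapperIdentity {
    // Compares the two references, i.e. object identity for today's Integer.
    static boolean sameReference(Integer a, Integer b) {
        return a == b;
    }

    // Compares the numeric values.
    static boolean sameNumber(Integer a, Integer b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // Inside the guaranteed cache range, == happens to give the "right" answer.
        System.out.println(sameReference(Integer.valueOf(100), Integer.valueOf(100)));

        // Outside that range the result depends on the JVM's cache settings,
        // which is why relying on == for Integer is a latent bug.
        System.out.println(sameReference(Integer.valueOf(100_000), Integer.valueOf(100_000)));

        // equals() is reliable regardless of caching.
        System.out.println(sameNumber(Integer.valueOf(100_000), Integer.valueOf(100_000)));
    }
}
```

Only the first and third results are guaranteed; the second is deliberately left without an expected output because it varies with the runtime configuration.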
> [Figure: Java field types adding value classes]
> > ### Equality > > Earlier we said that `==` compares value objects by state rather than by > identity. More precisely, two value objects are `==` if they are of the > same > type, and each of their fields are pairwise equal, where equality is given > by > `==` for primitives (except `float` and `double`, which are compared with > `Float::equals` and `Double::equals` to avoid anomalies), `==` for > references to > identity objects, and recursively with `==` for references to value > objects. In > no case is a value object ever `==` to a reference to an identity object. > This seems like a good place to explain (a) *why* this is the best thing we can do for `==` if we must keep it working, and (b) why it is problematic as heck. > > ### Value records > > While records have a lot in common with value classes -- they are final and > their fields are final -- they are still identity classes. > (This sentence seems off, because as you go on to say, value records *aren't* identity classes) Records embody a > tradeoff: give up on decoupling the API from the representation, and in > return > get various syntactic and semantic benefits. Value classes embody another > tradeoff: give up identity, and get various semantic and performance > benefits. > If we are willing to give up both, we can get both sets of benefits. > > ``` > value record NameAndScore(String name, int score) { } > ``` > > Value records combine the data-carrier idiom of records with the improved > scalarization and flattening benefits of value classes. > Do we have a good use case for an identity record? You might want to mention that we'd expect most records to become value records (don't we?). > In theory, it would be possible to apply `value` to certain enums as well, > but > this is not currently possible because the `java.lang.Enum` base class that > enums extend do not meet the requirements for superclasses of value > classes (it > has fields and non-empty constructors). 
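The field-wise rule from the Equality section above can be emulated by hand on a current JDK, which also makes the "shallow" nature of the comparison visible. This is only a sketch under stated assumptions: `Reading` is a hypothetical stand-in for a value class with one reference field and one `double` field, `sameValue` is a made-up helper that plays the role `==` would play for value objects, and both arguments are assumed non-null.

```java
final class ShallowEquality {
    // Hypothetical stand-in for a value class: final class, final fields.
    static final class Reading {
        final String label;   // reference to an identity object
        final double value;   // floating-point field

        Reading(String label, double value) {
            this.label = label;
            this.value = value;
        }
    }

    // Emulates the described rule: references to identity objects are compared
    // with ==, doubles are compared with Double::equals.
    static boolean sameValue(Reading a, Reading b) {
        return a.label == b.label
            && Double.valueOf(a.value).equals(Double.valueOf(b.value));
    }

    public static void main(String[] args) {
        String shared = "cpu";
        System.out.println(sameValue(new Reading(shared, 1.5), new Reading(shared, 1.5)));

        // Equal characters but distinct String objects: the comparison is shallow.
        System.out.println(sameValue(new Reading(new String("cpu"), 1.5),
                                     new Reading(new String("cpu"), 1.5)));

        // Double::equals treats NaN as equal to itself and 0.0 as different from -0.0.
        System.out.println(sameValue(new Reading(shared, Double.NaN), new Reading(shared, Double.NaN)));
        System.out.println(sameValue(new Reading(shared, 0.0), new Reading(shared, -0.0)));
    }
}
```

The second call is the case behind the request to always say "shallow": two readings whose labels are distinct `String` objects with equal contents are not the same value under this rule, even though `equals`-style logical comparison would say they match.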
I would think that whether an enum is a value or not doesn't tend to matter much?

> ## Unboxing values for flatness and density
>
> Value classes shed object identity, gaining a host of performance and
> predictability benefits in the process. They are an ideal replacement for many
> of today's value-based classes, fully preserving their semantics (except for
> the accidental identity these classes never wanted). But identity-free
> reference types are only one point on a spectrum of tradeoffs between
> abstraction and performance, and other desired use cases -- such as numerics
> -- may want a different set of tradeoffs.
>
> Reference types are nullable, and therefore must account for null somehow in
> their representation, which may involve additional footprint. Similarly, they
> offer the initialization safety guarantees for final fields that we have come
> to expect from identity objects, which may entail limits on flatness. For
> certain use cases, it may be desirable to additionally give up something else
> to make further flatness and footprint gains -- and that something else is
> reference-ness.
>
> The built-in primitives are best understood as _pairs_ of types: a primitive
> type (e.g., `int`) and its reference companion or box (`Integer`), with
> conversions between the two (boxing and unboxing.) We have both types because
> the two have different characteristics. Primitives are optimized for efficient
> storage and access: they are not nullable, they tolerate uninitialized (zero)
> values, and larger primitive types (`long`, `double`) may tear under racy
> access. References err on the side of safety and flexibility; they support
> nullity, polymorphism, and offer initialization safety (freedom from tearing),
> but by comparison to primitives, they pay a footprint and indirection cost.
>
> For these reasons, value classes give rise to pairs of types as well: a
> reference type and a _value companion type_.
We've seen the reference > type so > far; for a value class `Point`, the reference type is called `Point`. > (The full > name for the reference type is `Point.ref`; `Point` is an alias for > that.) The > value companion type is called `Point.val`, and the two types have the same > conversions between them as primitives do today with their boxes. (If we > are > talking explicitly about the value companion type of a value class, we may > sometimes describe the corresponding reference type as its _reference > companion_.) > > ``` > value class Point implements Serializable { > int x; > int y; > > Point(int x, int y) { > this.x = x; > this.y = y; > } > > Point scale(int s) { > return new Point(s*x, s*y); > } > } > ``` > > The default value of the value companion type is the one for which all > fields > take on their default value; the default value of the reference type is, > like > all reference types, null. > > In our diagram, these new types show up as another entity that straddles > the > line between primitives and identity-free references, alongside the legacy > primitives: > > ** UPDATE DIAGRAM ** > >
> [Figure: Java field types with extended primitives]
>
> ### Member access
>
> Both the reference and value companion types are seen to have the same
> instance members. Unlike today's primitives, value companion types can be used
> as receivers to access fields and invoke methods, subject to accessibility
> constraints:
>
> ```
> Point.val p = new Point(1, 2);
> assert p.x == 1;
>
> p = p.scale(2);
> assert p.x == 2;
> ```

Maybe clarify that this isn't because p is getting boxed. I like to point out that "we might be used to thinking of `.` as a 'dereference operator', but it's always been just a member access expression; the runtime will dereference IF necessary to carry that out."

> ### Polymorphism
>
> When we declare a class today, we set up a subtyping (is-a) relationship
> between the declared class and its supertypes. When we declare a value class,
> we set up a subtyping relationship between the _reference type_ and the
> declared supertypes.

Beating dead horse, just, again it makes it sound like two different things are happening when it could emphasize that the same thing is happening in both cases.

> This means that if we declare:
>
> ```
> value class UnsignedShort extends Number
>                           implements Comparable {
>     ...
> }
> ```
>
> then `UnsignedShort` is a subtype of `Number` and `Comparable`, and we can ask
> questions about subtyping using `instanceof` or pattern matching. What happens
> if we ask such a question of the value companion type?
>
> ```
> UnsignedShort.val us = ...
> if (us instanceof Number) { ... }
> ```
>
> Since subtyping is defined only on reference types, the `instanceof` operator
> (and corresponding type patterns) will behave as if both sides were lifted to
> the appropriate reference type, and we can answer the question that way.

So ... this will yield `true`? I hope that is useful enough to pay for the deeper confusions it might sow.
Who knows, maybe I'll need to loosen up on this, but I have assumed that we do want/need users to understand that `UnsignedShort.val` and `short` are monomorphic, having no supertypes or subtypes. > (This > may trigger fears of expensive boxing conversions, but in reality no actual > allocation will happen.) > > We introduce a new relationship based on `extends` / `implements` clauses, > which > we'll call "extends"; we define `A extends B` as meaning `A <: B` when A > is a > reference type, and `A.ref <: B` when A is a value companion type. The > `instanceof` relation, reflection, and pattern matching are updated to use > "extends". > (This will make some readers want to hear your explanation of why it isn't easier to just say that `A <: A.ref` and be done with it) ### Arrays > > Arrays of reference types are _covariant_; this means that if `A <: B`, > then > `A[] <: B[]`. This allows `Object[]` to be the "top array type", at least > for > arrays of references. But arrays of primitives are currently left out of > this > story. We can unify the treatment of arrays by defining array covariance > over > the new "extends" relationship; if A extends B, then `A[] <: B[]`. For a > value > class P, `P.val[] <: P.ref[] <: Object[]`, finally making `Object[]` the > top > type for all arrays. > Isn't it really "value companion types" you want to talk about here -- then primitives get it for free when we cover that they are becoming just VCTs? ### Equality > > Just as with `instanceof`, we define `==` on values by appealing to the > reference companion (though no actual boxing need occur). Evaluating `a > == b`, > where one or both operands are of a value companion type, can be defined > as if > the operands are first converted to their corresponding reference type, > and then > comparing the results. 
This means that the following will succeed: > > ``` > Point.val p = new Point(3, 4); > Point pr = p; > assert p == pr; > ``` > > The base implementation of `Object::equals` delegates to `==`, which is a > suitable default for both reference and value classes. > > ### Serialization > > If a value class implements `Serializable`, this is also really a statement > about the reference type. Just as with other aspects described here, > serialization of value companions can be defined by converting to the > corresponding reference type and serializing that, and reversing the > process at > deserialization time. > > Serialization currently uses object identity to preserve the topology of an > object graph. This generalizes cleanly to objects without identity, > because > `==` on value objects treats two identical copies of a value object as > equal. > So any observations we make about graph topology prior to serialization > with > `==` are consistent with those after deserialization. > > ### Identity-sensitive operations > > Certain operations are currently defined in terms of object identity. As > we've > already seen, some of these, like equality, can be sensibly extended to > cover > all instances. > As you know, I object to calling this `==` behavior "sensible". It is forced by compatibility and isn't what users really want, but will be close enough to what they want often enough to get them into trouble. > Others, like synchronization, will become partial. > Identity-sensitive operations include: > > - **Equality.** We extend `==` on references to include references to > value > objects. Where it currently has a meaning, the new definition > coincides > with that meaning. > > - **System::identityHashCode.** The main use of `identityHashCode` is > in the > implementation of data structures such as `IdentityHashMap`. We can > extend > `identityHashCode` in the same way we extend equality -- deriving a > hash on > primitive objects from the hash of all the fields. 
> s/primitive/value/? > > - **Synchronization.** This becomes a partial operation. If we can > statically detect that a synchronization will fail at runtime > (including > declaring a `synchronized` method in a value class), we can issue a > compilation error; if not, attempts to lock on a value object results > in > `IllegalMonitorStateException`. This is justifiable because it is > intrinsically imprudent to lock on an object for which you do not have > a > clear understanding of its locking protocol; locking on an arbitrary > `Object` or interface instance is doing exactly that. > > - **Weak, soft, and phantom references.** Capturing an exotic reference > to a > value object becomes a partial operation, as these are intrinsically > tied to > reachability (and hence to identity). However, we will likely make > enhancements to `WeakHashMap` to support mixed identity and value > keys. > > ### What about Object? > > The root class `Object` poses an unusual problem, in that every class must > extend it directly or indirectly, but it is also instantiable > (non-abstract), > and its instances have identity -- it is common to use `new Object()` as a > way > to obtain a new object identity for purposes of locking. > ... left me hanging! > ## Why two types? > > It is sensible to ask: why do we need companion types at all? This is > analogous > to the need for boxes in 1995: we'd made one set of tradeoffs for > primitives, > favoring performance (non-nullable, zero-default, tolerant of > non-initialization, tolerant of tearing under race, unrelated to > `Object`), and > another for references, favoring flexibility and safety. Most of the > time, we > ignored the primitive wrapper classes, but sometimes we needed to > temporarily > suppress one of these properties, such as when interoperating with code > that > expects an `Object` or the ability to express "no value". 
The reasons we > needed > boxes in 1995 still apply today: sometimes we need the affordances of > references, and in those cases, we appeal to the reference companion. > > Reasons we might want to use the reference companion include: > > - **Interoperation with reference types.** Value classes can implement > interfaces and extend classes (including `Object` and some abstract > classes), > which means some class and interface types are going to be polymorphic > over > both identity and primitive objects. This polymorphism is achieved > through > object references; a reference to `Object` may be a reference to an > identity > object, or a reference to a value object. > > - **Nullability.** Nullability is an affordance of object _references_, > not > objects themselves. Most of the time, it makes sense that primitive > types > are non-nullable (as the primitives are today), but there may be > situations > where null is a semantically important value. Using the reference > companion > when nullability is required is semantically clear, and avoids the need > to > invent new sentinel values for "no value." > > This need comes up when migrating existing classes; the method > `Map::get` > uses `null` to signal that the requested key was not present in the > map. But, > if the `V` parameter to `Map` is a primitive class, `null` is not a > valid > value. We can capture the "`V` or null" requirement by changing the > descriptor of `Map::get` to: > Three more stale references to "primitive class" here? > > ``` > public V.ref get(K key); > ``` > > where, whatever type `V` is instantiated as, `Map::get` returns the > reference > companion. (For a type `V` that already is a reference type, this is > just `V` > itself.) This captures the notion that the return type of `Map::get` > will > either be a reference to a `V`, or the `null` reference. (This is a > compatible change, since both erase to the same thing.) 
> > > - **Self-referential types.** Some types may want to directly or > indirectly > refer to themselves, such as the "next" field in the node type of a > linked > list: > > ``` > class Node { > T theValue; > Node nextNode; > } > ``` > > We might want to represent this as a value class, but if the type of > `nextNode` were `Node.val`, the layout of `Node` would be > self-referential, since we would be trying to flatten a `Node` into its > own > layout. > > - **Protection from tearing.** For a value class with a non-atomic value > companion type, we may want to use the reference companion in cases > where we > are concerned about tearing; because loads and stores of references are > atomic, `P.ref` is immune to the tearing under race that `P.val` might > be > subject to. > > - **Compatibility with existing boxing.** Autoboxing is convenient, in > that it > lets us pass a primitive where a reference is required. But boxing > affects > far more than assignment conversion; it also affects method overload > selection. The rules are designed to prefer overloads that require no > conversions to those requiring boxing (or varargs) conversions. Having > both > a value and reference type for every value class means that these rules > can > be cleanly and intuitively extended to cover value classes. > > ## Refining the value companion > > Value classes have several options for refining the behavior of the value > companion type and how they are exposed to clients. > > ### Classes with no good default value > > For a value class `C`, the default value of `C.ref` is the same as any > other > reference type: `null`. For the value companion type `C.val`, the default > value > is the one where all of its fields are initialized to their default value. > > > The built-in primitives reflect the design assumption that zero is a > reasonable > default. 
> The choice to use a zero default for uninitialized variables was one
> of the central tradeoffs in the design of the built-in primitives. It gives
> us a usable initial value (most of the time), and requires less storage
> footprint than a representation that supports null (`int` uses all 2^32 of
> its bit patterns, so a nullable `int` would have to either make some 32 bit
> signed integers unrepresentable, or use a 33rd bit). This was a reasonable
> tradeoff for the built-in primitives, and is also a reasonable tradeoff for
> many (but not all) other potential value classes (such as complex numbers,
> 2D points, half-floats, etc).
>
> But for other potential value classes, such as `LocalDate`, there _is_ no
> reasonable default. If we choose to represent a date as the number of days
> since some epoch, there will invariably be bugs that stem from
> uninitialized dates; we've all been mistakenly told by computers that
> something will happen on or near 1 January 1970. Even if we could choose a
> default other than the zero representation, an uninitialized date is still
> likely to be an error -- there simply is no good default date value.
>
> For this reason, value classes have the choice of encapsulating or exposing
> their value companion type. If the class is willing to tolerate an
> uninitialized (zero) value, it can freely share its `.val` companion with
> the world; if uninitialized values are dangerous (such as for `LocalDate`),
> it can be encapsulated to the class or package.
>
> Encapsulation is accomplished using ordinary access control. By default,
> the value companion is `private`, and need not be declared explicitly; a
> class that wishes to share its value companion can make it public:
>
> ```
> public value record Complex(double real, double imag) {
>     public value companion Complex.val;
> }
> ```
>

Elephant in the room: so I can name it something else?
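On the quoted point that a date stored as days since an epoch has no good zero default: the effect is easy to see in today's Java. The sketch below is only an illustration. `EpochDate` and `zeroDefault()` are hypothetical names standing in for a value class whose single field is left at zero; they are not part of any Valhalla API.

```java
import java.time.LocalDate;

// Hypothetical stand-in for a date value class backed by one long field.
final class EpochDate {
    // Days since 1970-01-01; an all-zero instance therefore has epochDay == 0.
    private final long epochDay;

    EpochDate(long epochDay) { this.epochDay = epochDay; }

    // Interpret the stored day count as a calendar date.
    LocalDate toLocalDate() { return LocalDate.ofEpochDay(epochDay); }

    // Models what an uninitialized (all-zero) flattened value would contain.
    static EpochDate zeroDefault() { return new EpochDate(0L); }
}

class ZeroDefaultDemo {
    public static void main(String[] args) {
        // The zero value is a perfectly valid-looking date, which is the problem.
        System.out.println(EpochDate.zeroDefault().toLocalDate()); // 1970-01-01
    }
}
```

Nothing about the zero instance marks it as uninitialized, which seems to be the motivation for letting such a class keep its value companion private.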
> ### Atomicity and tearing > > For the primitive types longer than 32 bits (long and double), it is not > guaranteed that reads and writes from different threads (without suitable > coordination) are atomic with respect to each other. The result is that, > if > accessed under data race, a long or double field or array element can be > seen to > "tear", and a read might see the low 32 bits of one write and the high 32 > bits > of another. (Declaring the containing field `volatile` is sufficient to > restore > atomicity, as is properly coordinating with locks or other concurrency > control, > or not sharing across threads in the first place.) > > This was a pragmatic tradeoff given the hardware of the time; the cost of > 64-bit > atomicity on 1995 hardware would have been prohibitive, and problems only > arise > when the program already has data races -- and most numeric code deals with > thread-local data. Just like with the tradeoff of nulls vs zeros, the > design of > the built-in primitives permits tearing as part of a tradeoff between > performance and correctness, where primitives chose "as fast as possible" > and > reference types chose more safety. > > Today, most JVMs give us atomic loads and stores of 64-bit primitives, > because > the hardware makes them cheap enough. But value classes bring us back to > 1995; atomic loads and stores of larger-than-64-bit values are still > expensive > on many CPUs, leaving us with a choice of "make operations on primitives > slower" > or permitting tearing when accessed under race. > > It would not be wise for the language to select a one-size-fits-all policy > about > tearing; choosing "no tearing" means that types like `Complex` are slower > than > they need to be, even in a single-threaded program; choosing "tearing" > means > that classes like `Range` can be seen to not exhibit invariants asserted by > their constructor. 
Class authors have to choose, with full knowledge of > their > domain, whether their types can tolerate tearing. The default is no > tearing > (safe by default); a class can opt for greater flattening at the cost of > potential tearing by declaring the value companion as `non-atomic`: > > ``` > public value record Complex(double real, double imag) { > public non-atomic value companion Complex.val; > } > ``` > > For classes like `Complex`, all of whose bit patterns are valid, this is > very > much like the choice around `long` in 1995. For other classes that might > have > nontrivial representational invariants, they likely want to stick to the > default > of atomicity. > I just think many readers are going to think "well of course I need to be safe from this terrible tearable thing" without realizing that this only even comes up when someone uses it in a wrong or risky way. An extra reminder of this might be helpful. ## Migrating legacy primitives > > As part of generalizing primitives, we want to adjust the built-in > primitives to > behave as consistently with value classes as possible. While we can't > change > the fact that `int`'s reference companion is the oddly-named `Integer`, we > can give them > more uniform aliases (`int.ref` is an alias for `Integer`; `int` is an > alias for > `Integer.val`) -- so that we can use a consistent rule for naming > companions. > Similarly, we can extend member access to the legacy primitives, and allow > `int[]` to be a subtype of `Integer[]` (and therefore of `Object[]`.) > > We will redeclare `Integer` as a value class with a public value companion: > > ``` > value class Integer { > public value companion Integer.val; > > // existing methods > } > ``` > > where the type name `int` is an alias for `Integer.val`. > Many people will wonder how that is going to work since the class currently contains `int value` which becomes circular. Do you want to mention the 2 candidate solutions for that we've discussed? 
> The primitive array types will be retrofitted such that arrays of
> primitives are subtypes of arrays of their boxes (`int[] <: Integer[]`).
>
> ## Unifying primitives with classes
>
> Earlier, we had a chart of the differences between primitive and reference
> types:
>
> | Primitives                                 | Objects                            |
> | ------------------------------------------ | ---------------------------------- |
> | No identity (pure values)                  | Identity                           |
> | `==` compares values                       | `==` compares object identity      |
> | Built-in                                   | Declared in classes                |
> | No members (fields, methods, constructors) | Members (including mutable fields) |
> | No supertypes or subtypes                  | Class and interface inheritance    |
> | Accessed directly                          | Accessed via object references     |
> | Not nullable                               | Nullable                           |
> | Default value is zero                      | Default value is null              |
> | Arrays are monomorphic                     | Arrays are covariant               |
> | May tear under race                        | Initialization safety guarantees   |
> | Have reference companions (boxes)          | Don't need reference companions    |
>
> The addition of value classes addresses many of these directly. Rather than
> saying "classes have identity, primitives do not", we make identity an
> optional characteristic of classes (and derive equality semantics from
> that.) Rather than primitives being built in, we derive all types,
> including primitives, from classes, and endow value companion types with
> the members and supertypes declared with the value class. Rather than
> having primitive arrays be monomorphic, we make all arrays covariant under
> the `extends` relation.
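For reference, the array subtyping that exists today, and that the quoted text proposes to change, can be checked with plain reflection. This is a minimal sketch of current behavior only. The class name `ArrayCovarianceToday` and the helper `isSubtype` are made up for illustration, and nothing here exercises Valhalla features.

```java
// Checks today's array subtyping using Class.isAssignableFrom.
class ArrayCovarianceToday {
    // True when a value of type 'sub' can be assigned to a variable of type 'sup'.
    static boolean isSubtype(Class<?> sub, Class<?> sup) {
        return sup.isAssignableFrom(sub);
    }

    public static void main(String[] args) {
        // Reference arrays are covariant.
        System.out.println(isSubtype(Integer[].class, Object[].class)); // true
        // Primitive arrays are left out: int[] is neither an Object[] nor an Integer[].
        System.out.println(isSubtype(int[].class, Object[].class));     // false
        System.out.println(isSubtype(int[].class, Integer[].class));    // false
        // An int[] is still an Object, just not an array of Objects.
        System.out.println(isSubtype(int[].class, Object.class));       // true
    }
}
```

Under the proposal as quoted, the second and third checks would become true once `int[] <: Integer[] <: Object[]` holds.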
>
> The remaining differences now become differences between reference types
> and value types:
>
> | Value types                                   | Reference types                  |
> | --------------------------------------------- | -------------------------------- |
> | Accessed directly                             | Accessed via object references   |
> | Not nullable                                  | Nullable                         |
> | Default value is zero                         | Default value is null            |
> | May tear under race, if declared `non-atomic` | Initialization safety guarantees |
>
>
> ### Choosing which to use
>
> How would we choose between declaring an identity class or a value class,
> and the various options on value companions? Here are some quick rules of
> thumb:
>
> - If you need mutability, subclassing, or aliasing, choose an identity
>   class.
> - If uninitialized (zero) values are unacceptable, choose a value class
>   with the value companion encapsulated.
> - If you have no cross-field invariants and are willing to tolerate
>   tearing to enable more flattening, choose a value class with a
>   non-atomic value companion.
>
> ## Summary
>
> Valhalla unifies, to the extent possible, primitives and objects. The
> following table summarizes the transition from the current world to
> Valhalla.
>
> | Current World                               | Valhalla                                                  |
> | ------------------------------------------- | --------------------------------------------------------- |
> | All objects have identity                   | Some objects have identity                                |
> | Fixed, built-in set of primitives           | Open-ended set of primitives, declared via classes        |
>

Intentional use of "primitives" still? I would think it should say "value types no longer limited to just the built-in primitives".
> | Primitives don't have methods or supertypes | Primitives are classes, with methods and supertypes       |
> | Primitives have ad-hoc boxes                | Primitives have regularized reference companions          |
> | Boxes have accidental identity              | Reference companions have no identity                     |
> | Boxing and unboxing conversions             | Primitive reference and value conversions, but same rules |
> | Primitive arrays are monomorphic            | All arrays are covariant                                  |
>
>
> [valuebased]: https://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html
> [growing]: https://dl.acm.org/doi/abs/10.1145/1176617.1176621
> [jep390]: https://openjdk.java.net/jeps/390
>

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From john.r.rose at oracle.com Sun Jul 3 01:40:54 2022
From: john.r.rose at oracle.com (John Rose)
Date: Sat, 02 Jul 2022 18:40:54 -0700
Subject: User model stacking: current status
In-Reply-To:
References:
Message-ID: <7941CDDB-38CC-4813-871D-8C65D58FD186@oracle.com>

On 5 May 2022, at 12:21, Brian Goetz wrote:

>> There are lots of other things to discuss here, including a
>> discussion of what does non-atomic B2 really mean, and whether there
>> are additional risks that come from tearing _between the null and the
>> fields_.
>
> So, let's discuss non-atomic B2s. (First, note that atomicity is
> only relevant in the heap; on the stack, everything is
> thread-confined, so there will be no tearing.)
>
> If we have:
>
>     non-atomic __b2 class DateTime {
>         long date;
>         long time;
>     }
>
> then the layout of a B2 (or a B3.ref) is really (long, long, boolean),
> not just (long, long), because of the null channel. (We may be able
> to hide the null channel elsewhere, but that's an optimization.)

That goes straight to a desired optimization, but it leaves something valuable in the dust.
The valuable thing is one of the "affordances of references", which is that a reference to an immutable value can be safely published. This is a core feature of the JMM that applies to all value-based classes. The behavior you are citing is inconsistent with a reference to an object containing an immutable field (of type `DateTime.val`). It is consistent with a reference to a mutable field or to an array of type `DateTime.val[]`, but none of our current wrapper types work like that. (Arrays do, which is a problem with arrays.)

I see how you got there: You want to apply full flattening to `DateTime.ref`, simply adding a boolean. That's a nice data structure but it departs from how we expect boxing of values to work. There are extra races between the components of `DateTime`, as well as a race between null and non-null states. With today's value-based classes, a mutable `DateTime` reference will only show races between null and non-null, and between earlier and later pairs of field values. With this proposed feature, a mutable reference will act as if the wrapper object being referenced were no longer immutable, and not safely published. I think this is too much of a sharp edge, even for an opt-in feature.

What I would prefer here is a principle that boxes (including `DateTime.ref`) are always safely publishable. The possibility to race on individual object states should be confined to the value companion type.

That somewhat reduces the optimization for heap variables of reference type of non-atomics. I think that's a fine price to pay, in order to avoid putting new exceptions into the JMM's current assurances about safe publication. The optimizations on the val-companion are unaffected. This is good: Reasoning about strange race conditions can concentrate around uses of the val-companion, and all uses of the ref-companion would be race-safe.
Part of my discomfort here is that when we say that the fields of a value-based class are final, we are telling users their instances can be safely published. I don't want to claw that back, even for a corner case like explicitly non-atomic value classes.

I do see that there could be a workaround, if a class `Foo` allowed field-races even on its reference companion `Foo.ref`: Manually make a value-based wrapper class `AtomicFoo` which is (implicitly declared as) atomic and has a final `Foo` field as its sole payload. In that case I think the JMM will assure me (am I right?) that a variable of type `AtomicFoo` accesses a stable set of `Foo` fields, even if that `AtomicFoo` variable is updated by data races, because its nested field is not raced. And that should be true even if the JVM aggressively flattens `AtomicFoo` into the `Foo` fields plus two null channels. That's all consistent, but I think it will cause bugs as people fumble around with a mix of `Foo` and `AtomicFoo` values in containers like `ArrayList` or `Object[]`.

>
> If two threads racily write (d1, t1) and (d2, t2) to a shared mutable
> DateTime, it is possible for an observer to observe (d1, t2) or (d2,
> t1). Saying non-atomic says "this is the cost of data races".

(So of course that's OK for mutable copies of `DateTime.val`, but that's not how references behave now or should behave in the future.)

> But additionally, if we have a race between writing null and (d, t),
> there is another possible form of tearing.
>
> Let's write this out more explicitly. Suppose that T1 writes a
> non-null value (d, t, true), and T2 writes null as (0, 0, false). Then
> it would be possible to observe (0, 0, true), which means that we
> would be conceivably exposing the zero value to the user, even though
> a B2 class might want to hide its zero.

This is another reason to confine races to the value companion, because we are making a plan to protect value companions specially, for cases like this.
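The two tearing scenarios quoted above can be reproduced deterministically by enumerating interleavings instead of racing real threads. What follows is a sketch under simplifying assumptions, not a description of how any JVM lays out or writes values: the flattened layout is modeled as three `long` slots (date, time, null channel), each writer stores its slots one at a time in program order, and only the final memory state is recorded. `TearingModel`, `Store`, `wideWrite` and `finalStates` are hypothetical names introduced for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of a flattened (date, time, nonNull) layout written by two racing threads.
class TearingModel {
    // A single store of 'value' into slot 0 (date), 1 (time) or 2 (null channel).
    static final class Store {
        final int slot;
        final long value;
        Store(int slot, long value) { this.slot = slot; this.value = value; }
    }

    // A "wide" write: every slot is stored separately, in program order.
    static List<Store> wideWrite(long date, long time, boolean nonNull) {
        List<Store> stores = new ArrayList<>();
        stores.add(new Store(0, date));
        stores.add(new Store(1, time));
        stores.add(new Store(2, nonNull ? 1L : 0L));
        return stores;
    }

    // All final memory states reachable by interleaving the two writers' stores.
    static Set<String> finalStates(List<Store> a, List<Store> b) {
        Set<String> out = new TreeSet<>();
        explore(a, 0, b, 0, new long[3], out);
        return out;
    }

    // Depth-first enumeration: at each step either writer may perform its next store.
    private static void explore(List<Store> a, int i, List<Store> b, int j,
                                long[] mem, Set<String> out) {
        if (i == a.size() && j == b.size()) {
            out.add("(" + mem[0] + ", " + mem[1] + ", " + (mem[2] != 0) + ")");
            return;
        }
        if (i < a.size()) explore(a, i + 1, b, j, apply(mem, a.get(i)), out);
        if (j < b.size()) explore(a, i, b, j + 1, apply(mem, b.get(j)), out);
    }

    // Returns a copy of memory with one store applied, leaving the input untouched.
    private static long[] apply(long[] mem, Store s) {
        long[] next = mem.clone();
        next[s.slot] = s.value;
        return next;
    }

    public static void main(String[] args) {
        // Two non-null writes (d1, t1) = (1, 10) and (d2, t2) = (2, 20).
        System.out.println(finalStates(wideWrite(1, 10, true), wideWrite(2, 20, true)));
        // A non-null write (5, 7, true) racing with a wide null write (0, 0, false).
        System.out.println(finalStates(wideWrite(5, 7, true), wideWrite(0, 0, false)));
    }
}
```

In this model the first race can end in the mixed states (d1, t2) and (d2, t1), and the second can end in (0, 0, true), which is the exposed-zero case described in the quoted text. A real observer could also see intermediate states, which the sketch does not track.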
> So, suppose instead that we implemented writing a null as simply
> storing false to the synthetic boolean field. Then, in the event of
> a race between reader and writer, we could only see values for date
> and time that were previously put there by some thread. This
> satisfies the OOTA (out of thin air) safety requirements of the JMM.

I think the right approach here is starting with the semantics of value-based classes (which include safe publication) and working out the allowed implementation techniques.

The semantics of a flattened ref are (or should be) that it must behave *as if* it were a non-flattened ref. ("Should be": We are talking optimization here, not a changeable variation in the user model adopted randomly as the JIT comes and goes.) A ref, in fact, to a VBC. A non-flattened ref is a thing which you first query as to null-ness, and then if non-null you can load the VBC's field or fields. (Without races.)

So if there is a null channel sitting inside or next to some data fields, the read-access code has to first check for null, and if not null then to load a consistent view of the fields, in such a way that racing writes of null or other values do not impair the consistency.

The write-access code can write null by asserting the null flag and (as others have observed) it is an implementation puzzle whether to "clear out" the other storage. (My take is that the JVM could do this during GC at a safepoint, but it is hard to do so at other times.)

The write-access code can write non-null by (atomically) setting the field values and then (if that did not already de-assert the null channel) de-asserting the null channel. Again, the fields should be written as a group consistently, so as not to interfere with racing reads or writes. The null channel need not be written consistently.
All this would imply that the size of a flattened ref, perhaps including its null channel, should be no larger than a naturally atomic unit of memory, which is 64 or maybe 128 bits today. Your argument above, which I think I buy, is that it is also probably possible to place the null channel outside of the naturally atomic unit that contains the other fields; this would allow 9-byte and 17-byte refs.

Such a racing null channel, with non-racing payload fields, can be modeled in classic Java in the JMM like this:

```
class RacyNullable<V> {
  // Null channel: a plain (non-final, non-volatile) field, so it may race.
  private boolean isNull = true;
  private static final Object GARB = new Object();  // any value OK, even null
  // Payload: also a plain mutable field; null and GARB are never observed by callers.
  private Object v = GARB;

  @SuppressWarnings("unchecked")
  public V get() { return isNull ? null : (V) v; }

  public void set(V v) {
    if (v == null) { isNull = true; if (EAGER_CLEANUP) cleanup(); }
    else { this.v = v; /*race here!*/ isNull = false; }
  }

  private static final boolean EAGER_CLEANUP = false;
  private void cleanup() { if (isNull) /*race here?*/ v = GARB; }
}
```

I think really nice flattened refs can be built with "as if" semantics that follow that pattern. They won't flatten quite as well as some of the "no holds barred" cases discussed by the EG, but they would behave "as if" they follow the JMM without surprises. The one race (outside of the cleanup method) is innocuous if the cleanup method is used with restraint. How to do that is a puzzle.

> ...
> So we have a choice for how we implement writing nulls, with a
> pick-your-poison consequence:
>
> - If we do a wide write, and write all the fields to zero, we risk
> exposing a zero value even when the zero is a bad value;

Yes, that's like flipping `EAGER_CLEANUP` above. (After all, if we go to the trouble of making `C.val` access-controlled, let's not make racy refs let the cat back out of the bag!)

> - If we do a narrow write, and only write the null field, we risk
> pinning other OOPs in memory

That's the one I prefer.
I think it's actually a reasonable thing to try for. Basically, the GC would have to special-case those fields in a similar way that it special-cases weak-reference fields. For WR's the GC clears them under certain non-local conditions. In this case the GC would clear them under a very local condition, the setting of the null channel. GC folks growl about requests like this, but I think this one is reasonable.

-- John

P.S. Next up, a long-ish study on how to put access control on `C.val`!

From john.r.rose at oracle.com Sun Jul 3 01:53:53 2022
From: john.r.rose at oracle.com (John Rose)
Date: Sat, 02 Jul 2022 18:53:53 -0700
Subject: Explicit name for value companion type
In-Reply-To: <77365ac0-b4d7-c8b1-1aa2-b1f0fa443e2e@oracle.com>
References: <77365ac0-b4d7-c8b1-1aa2-b1f0fa443e2e@oracle.com>
Message-ID: <51B166F3-E866-4BCF-B3CE-7316817084D0@oracle.com>

On 1 Jul 2022, at 5:39, Brian Goetz wrote:

> Now that the model is settling, the messages in the suggestion box --
> many of them syntax-driven --
> are coming fast and furious now. Here's another.
>
> Summary: "Having to type .val will make users angry; can we please
> find a way to make it the default at least sometimes."

Having to guess the name of the value companion type will make a different set of users angry (including me). Also the spec-writers and teachers will lose a useful tool.

My take: Start with a standard (dumb but effective) name like `C.val` for "the value companion for class `C`". Live with its boring but useful predictability for a while. Later, add some sort of separable "type alias" mechanism that polishes not just this one flyspeck but cleans up a whole raft of naming issues. There are no shortages of examples Java might follow in this area.

Another take: There is a basic difference between use-site and def-site declarations, and type aliases can (and should?) go on both sides.
But it is natural and easy to put them on the use-site first, perhaps an "import ... as ..." thing. Then maybe something that feels like a type-member, but is also an alias. Remember that type-members are easy to import, if you want to use their unqualified names.

Yet one more take: Compiling alternative type names into class file linkage is a losing proposition. Let's not. This is another reason to embrace `C.val` and the JVM's Q-descriptors, instead of some complicated landscape of mapping tables that will be endlessly out of date. Type aliases need to be canonicalized at javac-time.

Final thought: Notice that we don't really need to do anything about this immediately; the type alias idea will keep. Let's table it; we have enough to worry about.

From john.r.rose at oracle.com Sun Jul 3 01:55:14 2022
From: john.r.rose at oracle.com (John Rose)
Date: Sat, 02 Jul 2022 18:55:14 -0700
Subject: User model stacking: current status
In-Reply-To: <7941CDDB-38CC-4813-871D-8C65D58FD186@oracle.com>
References: <7941CDDB-38CC-4813-871D-8C65D58FD186@oracle.com>
Message-ID:

P.S. Apologies for starting at the wrong end of the chain. Hope it was useful anyway!

On 2 Jul 2022, at 18:40, John Rose wrote:

> On 5 May 2022, at 12:21, Brian Goetz wrote:
>
>>> There are lots of other things to discuss here, including a discussion of what does non-atomic B2 really mean, and whether there are additional risks that come from tearing _between the null and the fields_.
>>

From forax at univ-mlv.fr Mon Jul 4 13:27:48 2022
From: forax at univ-mlv.fr (Remi Forax)
Date: Mon, 4 Jul 2022 15:27:48 +0200 (CEST)
Subject: I still do not understand how things are supposed to work :(
Message-ID: <1279597253.3703369.1656941268108.JavaMail.zimbra@u-pem.fr>

Hi all,
i still can not wrap my head on how the model is supposed to work from a user POV.
The guideline seems to be that the value class should be used in API and .val companion should be used in implementation.

I'm confused for mainly two reasons
1. this guideline is very similar to the idea of the container model, you only need .val for fields and arrays so it inherits the shortcomings of the container model.
2. a .val is a kind of non nullable value class but an instance of a value class non-null by construction is not a .val type.

1. Parameterized types are flattened depending on the type arguments, which are part of the public API

To be flattened, a List or a Stream should have a .val as type parameter but that means that the val companion is clearly part of the public API. This can not simply be fixed by casting a List to a List after creation because this is not a sound conversion, it introduces spurious NPEs. And because of inference, the type argument can be propagated from the type of a local variable/parameters, so the idea to confine .val to only the implementation does not seem to work.

2. There are a lot of syntax in Java, where we statically know that the result of an expression is not null but those expressions are not typed .val

- "this" should be typed as a .val
- new Point() should be typed as a .val (as Brian said)
- Point point ...; point.getClass() should be typed Class. Does it mean that Point.class should be a val, otherwise point.getClass() == Point.class will not work ?
- the binding of an instanceof with a type pattern (e.g. instanceof Point point) should be a .val
- after seeing a requireNonNull(), an if (value != null), assert value != null, etc the type should be a val

This leads to javac having a full null analysis to determine if a value class can be null or not, something we say we do not want. Here we are in a hard place, because from the user POV a .val is just a non-nullable value class but for the compiler .val is a full (synthetic) class.
To summarize, I'm sure I'm missing something, but from a user POV I do not see the appeal of this model: because of 1, the rules for where to put .val are messy, and because of 2, we will have either a lot of peculiar cases to memorize (cf. the list above) or a compiler that will be a hindrance because it will not understand what we are saying.

Rémi

From heidinga at redhat.com Mon Jul 4 13:45:53 2022
From: heidinga at redhat.com (Dan Heidinga)
Date: Mon, 4 Jul 2022 09:45:53 -0400
Subject: Value type companions, encapsulated
In-Reply-To:
References:
Message-ID:

Sorry for top-posting but it was easier to track a list of issues as I read through:

* Miscellaneous privatization checks --> MethodHandle.asType(MT) and MethodHandle.invoke() will also need to protect against the zero being introduced. For example:

jshell> public class T { public static void z() {}}
|  created class T

jshell> MethodHandles.lookup().findStatic(T.class, "z", MethodType.methodType(void.class))
$28 ==> MethodHandle()void

jshell> Object o = $28.invoke()
o ==> null

Here we see invoke() converting a void to a reference (null) and similarly for a primitive, to zero. Both of these APIs will need treatment similar to ::explicitCastArguments.

* Serialization --> There's a mention of serialization but if Lambda taught us anything, it's that serialization requires more thought than we expected, even if we take that into account =) We should spend some time on what serialization of a C.val actually means, any format concerns, and how it interacts with default reconstitution behaviours. Otherwise, we'll leave a hole here where unconstructed values can be deserialized.

* C.default & Reflection --> Is "default" a reflectively accessible field or compiler sugar? If a user does C.val.class.getDeclaredFields will it find "default"? Or maybe C.class.getDeclaredFields? I'm fine with it being a fiction but I wasn't clear how far we were pushing that into the reflective model as well.
I think the intent is to expose this with Class::defaultValue / Lookup::defaultValue APIs but clarification would be good.

* Accessing C.val.class --> Do we need restrictions here beyond those of accessing C.class? The mirror may be required to create MethodTypes for use in MethodHandle lookup().find* apis even by code that can't create a C.val. Given that it will leak already as shown in the doc, do we need the extra restrictions?

More thoughts and comments to follow after another read.

--Dan

On Sun, Jul 3, 2022 at 12:56 AM John Rose wrote:
>
> In this message Brian wrote out the major features
> of an emerging design for value classes:
>
> From: Brian Goetz brian.goetz at oracle.com
> To: valhalla-spec-experts at openjdk.java.net
> Subject: Re: User model stacking: current status
> Date: Thu, 23 Jun 2022 15:01:24 -0400
>
> I think controlling the complexity by having a separate
> nested declaration of the value companion type will
> work very well.
>
> So what exactly does a private value companion do?
> What is it you can and cannot do with this type?
> What problems are prevented by privatizing it?
> How and when is privatization enforced?
> What other problems are created by those new rules?
>
> I have been pulling on this thread for a few days
> now, and I think I have some answers.
>
> http://cr.openjdk.java.net/~jrose/values/encapsulating-val.md
> http://cr.openjdk.java.net/~jrose/values/encapsulating-val.html
>
> (The Hitchhiker's Guide suddenly comes to mind. Don't panic!)
>
> I expect I will be editing these files as we go.
> For reference here is a verbatim copy of the MD file
> as it stands right now (minus the header):
>
> Background
>
> (We will start with background information. The new stuff comes
> afterward. Impatient readers can find a very quick summary of
> restrictions at the end.)
> > Affordances of C.ref > > Every class or interface C comes with a companion type, the > reference type C.ref derived from C which describes any variable > (argument, return value, array element, etc.) whose values are either > null or of a concrete class derived from C. We are not in the habit > of distinguishing C.ref from C, but the distinction is there. For > example, if we call Object::getClass on a variable of type C.ref > we might not get C.class; we might even get a null pointer > exception! > > We are so very used to working with reference types (for short, > ref-types) that we sometimes forget all that they do for us > in addition to their linkage to specific classes: > > C.ref gives a starting point for accessing C's members. > C.ref provides abstraction: C or a subtype might not be loaded yet. > C.ref provides the standard uninitialized value null. > C.ref can link C objects into graphs, even circular ones. > C.ref has a known size, one "machine word", carefully tuned by the JVM. > C.ref allows a single large object to be shared from many locations. > C.ref with an identity class can centralize access to mutable state. > C.ref values uniformly convert to and from general types like Object. > C.ref variable types can be reflected using Class mirror objects. > C.ref is safe for publication if the fields of C are final. > > When I store a bunch of C objects into an object array or list, sort > it, and then share it with another thread, I am using several of the > above properties; if the other thread down-casts the items to C.ref > and works on them it relies on those properties. > > If I implement C as a doubly-linked list data structure or a > (alternatively) a value-based class with tree structure, I am using > yet more of the above properties of references. 
> > If my C object has a lot of state and I pass out many pointers to > it, and perhaps compute and cache interesting values in its mutable > fields, I am again relying on the special properties of references, > as well as of identity classes (if fields are mutable). > > By the way, in the JVM, variables of type C.ref (some of them at > least) are associated not with C simple, but with the so-called > L-descriptor spelled LC;. When we talk about C.ref we are > usually talking about those L-descriptors in the JVM, as well. > > I don't need to think much about this portfolio of properties as I go > about my work. But if they were to somehow fail, I would notice bugs > in my code sooner or later. > > One of the big consequences of this overall design is that I can write > a class C which has full control over its instance states. If it is > mutable, I can make its fields private and ensure that mutations occur > only under appropriate locking conditions. Or if I declare it as a > value-based class, I can ensure that its constructor only allows > legitimate instances to be constructed. Under those conditions, I > know that every single instance of my class will have been examined > and accepted by the class constructor, and/or whatever factory and > mutator methods I have created for it. If I did my job right, not > even a race condition can create an invalid state in one of my > objects. > > Any instance state of C which has been reached without being > produced from a constructor, factory, mutator, or constant of C can > be called non-constructed. Of course, inside a class any state > whatever can be constructed, subject to the types of fields and so on. > But the author of the class gets to decide which states are > legitimate, and the decisions are enforced by access control at the > boundaries of the encapsulation. 
> > So if I code my class right, using access control to keep bad states > away from my clients, my class's external API will have no > non-constructed states. > > Costs of C.ref > > In that case why have value types at all, if references are so > powerful? The answer is that reference-based abstraction pays for its > benefits with particular costs, costs that Java programmers do not > always wish to pay: > > A reference (usually) requires storage for a pointer to the object. > A reference (usually) requires storage for a header embedded inside the object. > Access to an object's fields (usually) requires extra cycles to chase the pointer. > The GC expends effort administering a singular "home location" for every object. > Cache line invalidation near that home location can cause useless memory traffic. > A reference must be able to represent null; tightly-packed types like int and long would need to add an extra bit somewhere to cover this. > > The major alternative to references, as provided by Valhalla, is flat > objects, where object fields are laid out immediately in their > containers, in place of a pointer which points to them stored > elsewhere. Neither alternative is always better than the other, which > is why Java has both int and Integer types and their arrays, and > why Valhalla will offer a corresponding choice for value classes. > > Alternative affordances of C.val > > Now, instances of a value class can be laid out flat in their > containing variables. But they can also be "boxed" in the heap, for > classic reference-based access. Therefore, a value class C has not > one but two companion types associated it, not only the reference > companion C.ref but also the value companion C.val. Only value > classes have value companions, naturally. The companion C.val is > called a value type (or val-type for short), by contrast with any > reference type, whether Object.ref or C.ref. 
> > The two companion types are closely related and perform some of the > same jobs: > > C.ref and C.val both give a starting point for accessing C's members. > C.ref and C.val can link C objects into acyclic graphs. > C.ref and C.val values uniformly convert to and from general types like Object. > C.ref and C.val variable types can be reflected using Class mirror objects. > > For these jobs, it usually doesn't matter which type companion does > the work. > > Despite the similarities, many properties of a value companion type > are subtly different from any reference type: > > C.val is non-abstract: You must load its class file before making a variable. > C.val cannot nest except by reference; C cannot declare a C.val field. > C.val does not represent the value null. > C.val is routinely flattenable, avoiding headers and indirection pointers > C.val has configurable size, depending on C's non-static fields. > C.val heap variables (fields, array elements) are initialized to all-zeroes. > C.val might not be safe for publication (even though its fields are final). > > The JVM distinguishes C.val by giving it a different descriptor, a > so-called Q-descriptor of the form QC;, and it also provides a > so-called secondary mirror C.val.class which is similar to the > built-in primitive mirrors like int.class. > > As the Valhalla performance model notes, flattening may be expected > but is not fully guaranteed. A C.val stored in an Object > container is likely to be boxed on the heap, for example. But C.val > objects created as bytecode temporaries, arguments, and return values > are likely to be flattened into machine registers, and C.val fields > and array elements (at least below certain size thresholds) are also > likely to be flattened into heap words. > > As a special feature, C.ref is potentially flattenable if C is a > value class. There are additional terms and conditions for flattening > C.ref, however. 
If C is not yet loaded, nothing can be done:
> Remember that reference types have full abstraction as one of their
> powers, and this means building data structures that can refer to them
> even before they are loaded. But a class file can request that the JVM
> "peek" at a class to see if it is a value class, and if this request
> is acted on early enough (at the JVM's discretion), then the JVM can
> choose to lay out some or all C.ref values as flattened C.val
> values plus a boolean or other sentinel value which indicates the
> null state.
>
> Pitfalls of C.val
>
> The advantages of value companion types imply some complementary
> disadvantages. Hopefully they are rarely significant, but they
> must sometimes be confronted.
>
> C.val might need to load a class file which is somehow unloadable
> C.val will fail to load if its instance layout directly or indirectly includes a C.val field or subfield
> C.val will throw an exception if you try to assign a null to it.
> C.val may have surprising costs for multi-word footprint and assignment (and so might C.ref if that is flattened)
> C.val is initialized to its all-zero value, which might be non-constructed
> C.val might allow data races on its components, creating values which are non-constructed
>
> The footprint issue shows up most strongly if you have many copies of
> the same C.val value; each copy will duplicate all the fields, as
> opposed to many copies of the same C.ref reference, which are likely to
> all point to a single heap location with one copy of all the fields.
>
> Flat value size can also affect methods like Arrays.sort, which
> perform many assignments of the base type, and must move all fields on
> each assignment. If a C.val array has many words per element, then
> the costs of moving those words around may dominate a sort request.
> For array sorting there are ways to reduce such costs transparently, > but it is still a "law of physics" that editing a whole data structure > will have costs proportional to the size of the edited portions of the > data structure, and C.ref arrays will often be somewhat more compact > than C.val arrays. Programmers and library authors will have to use > their heads when deciding between the new alternatives given by value > classes. > > But the last two pitfalls are hardest to deal with, because they both > have to do with non-constructed states. These states are the all-zero > state with the second-to-last pitfall, and (with the last pitfall) the > state obtained by mixing two previous states by means of a pair of > racing writes to the same mutable C.val variable in the heap. > Unlike reference types, value types can be manipulated to create these > non-constructed states even in well-designed classes. > > Now, it may be that a constructor (or factory) might be perfectly able > to create one of the above non-constructed states as well, no strings > attached. In that case, the class author is enforcing few or no > invariants on the states of the value class. Many numeric classes, > like complex numbers, are like this: Initialization to all-zeroes is > no problem, and races between components are acceptable, compared to > the costs of excluding races. > > (The reader may recall that early JVMs accepted races on the high > > and low halves of 64-bit integers as well; this is no longer a > widespread issue, but bigger value types like complex raise the same > issue again, and we need to provide class authors the same solution, > if it fits their class.) > > There are also some classes for which there are no good defaults, or > for which a good default is definitely not the all-zero bit pattern. > Authors of such types will often wish to make that bit pattern > inaccessible to their clients and provide some factory or constant > that gives the real default. 
We expect that such types will choose > the C.ref companion, and rely on the extra null checks to ensure > correct initialization. > > Other classes may need to avoid other non-constructed values that may > arise from data races, perhaps for reasons of reliability or security. > This is a subtle trade-off; very few class authors begin by asking > themselves about the consequences of data races on mutable members, > and even fewer will ask about races on whole instances of value > types, especially given that fields in value types are always > immutable. For this reason, we will set safety as the default, so > that a class (like complex numbers) which is willing to tolerate data > races must declare its tolerance explicitly. Only then will the JVM > drop the internal costs of race exclusion. > > Whether to tolerate the all-zero bit pattern is a simpler decision. > Still, it turns out to be useful to give a common single point of > declarative control to handle all non-constructed states, both > the default value of C.val and its mysterious data races. > > Privatization to the rescue > > (Here are the important details about the encapsulation of value > types. The impatient reader may enjoy the very quick summary of > restrictions at the end of this document.) > > In order to hide non-constructed states, the value companion C.val > may be privatized by the author of the class C. A privatized > value companion is effectively withdrawn from clients and kept private > to its own class (and to nestmates). Inside the class, the value > companion can be used freely, fully under control of the class author. > > But untrusted clients are prevented from building uninitialized fields > or arrays of type C.val. This prevents such clients from creating > (either accidentally or purposefully) non-constructed values of type > C.val. How privatization is declared and enforced is discussed in > the rest of this document. 
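To make the data-race pitfall described above concrete, here is a minimal Java sketch that simulates a torn write deterministically. The `Range` class, the two-slot `flatSlot` storage, and the hand-written interleaving are all hypothetical illustrations, not Valhalla APIs; plain Java today never tears object references, so the "race" is spelled out step by step rather than produced by real threads.

```java
// Hypothetical value-like class: its constructor only admits states with lo <= hi.
final class Range {
    final int lo;
    final int hi;

    Range(int lo, int hi) {
        if (lo > hi) throw new IllegalArgumentException("lo > hi");
        this.lo = lo;
        this.hi = hi;
    }
}

final class TearingDemo {
    // Stand-in for a flattened, non-atomic heap variable of type Range:
    // the two fields live in two independently written slots.
    static final int[] flatSlot = new int[2];

    // A non-atomic store copies field by field; 'steps' says how many
    // fields were written before the writer was interrupted.
    static void storeFields(Range r, int steps) {
        if (steps >= 1) flatSlot[0] = r.lo;
        if (steps >= 2) flatSlot[1] = r.hi;
    }

    // True if the slot holds a state that the Range constructor would accept.
    static boolean slotIsConstructible() {
        return flatSlot[0] <= flatSlot[1];
    }

    // Writer A stores (0, 1) completely; writer B stores only the first
    // field of (5, 6). The slot then holds the mixed state (5, 1).
    static boolean simulateTornWrite() {
        storeFields(new Range(0, 1), 2);
        storeFields(new Range(5, 6), 1);
        return slotIsConstructible();
    }

    public static void main(String[] args) {
        System.out.println("constructible after torn write: " + simulateTornWrite());
    }
}
```

Both written values passed the constructor, yet the mixed state would have been rejected by it; that is the sense in which a race on a non-atomic flat variable yields a non-constructed value.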
>
> (To review, for those who skipped ahead, non-constructed values are
> those not created under control of the class C by constructors or
> other accessible API points. A non-constructed value may be either an
> uninitialized variable of C.val, or the result of a data race on a
> shared mutable variable of type C.val. The class itself can work
> internally with such values all day long, but we exclude external
> access to them by default.)
>
> Atomicity as well
>
> As a second tactic, a value class C may select whether or not the
> JVM enforces atomicity of all occurrences of its value companion
> C.val. A non-atomic value companion is subject to data races, and
> if it is not privatized, external code may misuse C.val variables
> (in arrays or mutable fields) to create non-constructed values via
> data races.
>
> A value companion which is atomic is not subject to data races. This
> will be the default if the class C does not explicitly request
> non-atomicity. This gives safety by default and limits
> non-constructed states to only the all-zero initial value. The
> techniques to support this are similar to the techniques for
> implementing non-tearing of variables which are declared volatile;
> it is as if every variable of an atomic value type has some (not
> all) of the costs of volatility.
>
> The JVM is likely to flatten such an atomic value only up to the
> largest available atomically settable memory unit, usually 128 bits.
> Values larger than that are likely to be boxed, or perhaps treated
> with some other expensive transactional technique. Containers that
> are immutable can still be fully flattened, since they are not subject
> to data races.
>
> The behavior of an atomic C.val is aligned with that of C.ref. A
> reference to a value class C never admits data races on C's
> fields. The reason for this is simple: A C.ref value is a C.val
> instance boxed on the heap in a single immutable box-class field of
> type C.val.
(Actually, the JVM may partially or wholly flatten the > representation of C.ref if it can get away with it; full flattening > is likely for JVM locals and stack values, but any such secret > flattening is undetectable by the user.) Since it is final all the > way down (to C's fields) any C.ref value is safely published > without any possibility of data races. Therefore, an extra > declaration of non-atomicity in C affects only the value companion > C.val. > > It seems that there are use cases which justify all four combinations > of both choices (privatization and declared non-atomicity), although > it is natural to try to boil down the size of the matrix. > > C.val private & atomic is the default, and safest configuration > > hiding all non-constructed values outside of C and all data races > even inside of C. There are some runtime costs. > > C.val public & non-atomic is the opposite, with fewer runtime > > costs. It must be explicitly declared. It is desirable for > numerics like complex numbers, where all possible bitwise states are > meaningful. It is analogous to the situation of a naturally > non-atomic primitive like long. > > C.val public & atomic allows everybody to see the all-zero > > initial value but no other non-constructed states. This is > analogous to the situation of a naturally atomic primitive like > int. > > C.val private & non-atomic allows C complete control over the > > visibility of non-constructed states, but C also has the ability > to work internally on arrays of non-atomic elements. C should > take care not to leak internally-created flat arrays to untrusted > clients, lest they use data races to hammer non-constructed values > into those arrays. > > It is logically possible, but there does not seem to be a need, for > allowing a single class C to work with both kinds of arrays, atomic > and non-atomic. (In principle, the dynamic typing of Java arrays > would support this, as long as each array was configured at its > creation.) 
The effect of this can be simulated by wrapping a > non-atomic class C in another wrapper class WC which is atomic. > Then C.val[] arrays are non-atomic and WC.val[] arrays are atomic, > yet each kind of array can have the same "payload", a repeated > sequence of the fields of C. > > Privatization in code > > For source code and bytecode, privatization is enforced by performing > access checks on names. > > Privatization rules in the language > > We will stipulate that a value class C always has a value > companion type C.val, even if it is never declared or used. And we > give the author of C some control over how clients may use the type > C.val, in a manner roughly similar to nested member classes like > C.M. > > Specifically, the declaration of C always selects an access mode for > its value companion C.val from one of the following three choices: > > C.val is declared private > C.val is declared public > C.val is declared, but neither public nor private > > If C.val is declared private, then only nestmates of C may access > C.val. If it is neither public nor private, only classes in the > same runtime package as C may access it. If it is declared public, > then any class that can access C may also access C.val. > > As an independent choice, the declaration of C may select an atomicity for its value companion C.val` from one of the following two choices: > > C.val is explicitly declared non-atomic > C.val is not explicitly declared non-atomic, and is thus atomic > > If there is no explicit access declaration for C.val in the code of > C, then C.val is declared private and atomic. That is, we set the > default to the safest and most restrictive choice. > > In source code, these declarations are applied to explicit occurrences > of the type name C.val. 
The access modification of C.val is also > transferred to the implicitly declared name C.default > > The syntax looks like this: > > class C { > //only one of the following lines may be specified > //the first line is the default > private value companion C.val; //nestmates only > value companion C.val; //package-mates only > public value companion C.val; //all may access > // the non-atomic modifier may be present: > private non-atomic value companion C.val; > public non-atomic value companion C.val; > non-atomic value companion C.val; > } > > When a type name C.val or an expression C.default is > used by a class X, there are two access checks that occur. First, > access from X to the class C is checked according to the usual > rules of Java. If access to C is permitted, a second check is done > if the companion is not declared public. If the companion is > declared private, then X and C must be nestmates, or else access > will fail. If the companion is neither public nor private, then > X and C must be in the same package, or else access will fail. 
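The two-step access check just described can be sketched in ordinary Java. This is only a model of the stated rule, under the assumption that the three inputs (ordinary access to `C`, nestmate status, same-package status) have already been computed; the class and method names are hypothetical and are not part of any proposed API.

```java
final class CompanionAccessCheck {
    // The three access modes a class C can select for its value companion.
    enum Access { PRIVATE, PACKAGE, PUBLIC }

    // Models the check made when class X mentions C.val or C.default.
    static boolean mayUseCompanion(boolean canAccessC, Access companion,
                                   boolean nestmates, boolean samePackage) {
        // First check: the usual Java access rules from X to C itself.
        if (!canAccessC) return false;
        // Second check: only needed when the companion is not public.
        switch (companion) {
            case PUBLIC:
                return true;
            case PRIVATE:
                return nestmates;     // X and C must be nestmates
            default:
                return samePackage;   // X and C must share a package
        }
    }

    public static void main(String[] args) {
        // A package-mate that is not a nestmate, against a private companion.
        System.out.println(mayUseCompanion(true, Access.PRIVATE, false, true));
    }
}
```

Note that a public companion never widens access: if `X` cannot see `C` at all, the first check already fails.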
>
> Example privatized value companion
>
> Here is an example of a class which refuses to construct its default
> value, and which prevents clients from seeing that state:
>
>     class C {
>         int neverzero;
>         public C(int x) {
>             if (x == 0) throw new IllegalArgumentException();
>             neverzero = x;
>         }
>         public void print() { System.out.println(this); }
>
>         private value companion C.val; //privatized (also the default)
>
>         // some valid uses of C.val follow:
>         public C.val[] flatArray() { return new C.val[]{ this }; }
>         private static C.ref nonConstructedZero() {
>             return (new C.val[1])[0]; //OK: C.val private but available
>         }
>         public static C.ref box(C.val val) { return val; } //OK param type
>         public C.val unbox() { return this; } //OK return type
>
>         // valid use of private C.default, with Lookup negotiation
>         public static
>         C.ref defaultValue(java.lang.invoke.MethodHandles.Lookup lookup) {
>             if (!lookup.in(C.class).hasFullPrivilegeAccess())
>                 return null; //...or throw
>             return C.default; //OK: default for me and maybe also for thee
>         }
>     }
>
>     // non-nestmate client:
>     class D {
>         static void passByValue(C x) {
>             C.ref ref = C.box(x); //OK, although x is null-checked
>             if (false) C.box((C.ref) null); //would throw NPE
>             assert ref == x;
>         }
>
>         static Object useValue(C x) {
>             x.unbox().print(); //OK, invoke method on C.val expression
>             var xv = x.unbox(); //OK, although C.val is non-denotable
>             xv.print(); //OK
>             //> C.val xv = x.unbox(); //ERROR: C.val is private
>             return xv; //OK, originally from legitimate method of C
>         }
>
>         static Object arrays(C x) {
>             var a = x.flatArray();
>             //> C.val[] va = a; //ERROR: C.val is private
>             Arrays.toString(a); //OK
>             C.ref[] a2 = a; //covariant array assignment
>             C.ref[] na = new C.ref[1];
>             //> na = new C.val[1]; //ERROR: C.val is private
>             return a[0]; //constructed values only
>         }
>     }
>
> The above code shows how a privatized value companion can and cannot
> be used. The type name may never be mentioned.
Apart from that > restriction, client code can work with the value companion type as it > appears in parameters, return values, local variables, and array > elements. In this, a privatized companion behaves like other > non-denotable types in Java. > > Rationale: Note that a companion type is not a real class. > > Therefore it cannot appeal, precisely, to the existing provisions (in > JLS or JVMS) for enforcing class accessibility. But because it is a > type, and today nearly all types are classes (and interfaces), users > have a right to expect that encapsulation of companion types will > "feel like" encapsulation of type names. More precisely, users will > hope to re-use their knowledge about how type name access works when > reasoning about companion types. We aim to accommodate that hope. If > it works, users won't have to think very often about the class-vs-type > distinction. That is why the above design emulates pre-existing > usage patterns for non-denotable types. > > Privatization in translation > > When a value class is compiled to a class file, some metadata is > included to record the explicit declaration or implicit status of the > value companion. > > The access selection of C's value companion (public, package, > private) is encoded in the value_flags field of the ValueClass > attribute of the class information in the class file of C. > > The value_flags field (16 bits) has the following legitimate values: > > zero: C.val default access, non-atomic > ACC_PUBLIC: C.val public access, non-atomic > ACC_PRIVATE: C.val private access, non-atomic > ACC_VOLATILE: C.val default access, atomic > ACC_VOLATILE|ACC_PUBLIC: C.val public access, atomic > ACC_VOLATILE|ACC_PRIVATE: C.val private access, atomic > > Other values are rejected when the class file is loaded. > > (JVM ISSUE #0: Can we kill the ACC_VALUE modifier bit? Do we > really care that jlr.Modifiers kind-of wants to own the reflection > of the contextual modifier value? 
Who are the customers of this
> modifier bit, as a bit? John doesn't care about it personally, and
> thinks that if we are going to have an attribute we can get rid of the
> flag bit. One implementation issue with killing ACC_VALUE is that
> class attributes are processed very late during class loading, while
> class modifiers are processed very early. It may be easier to do some
> kinds of structural checks on the fly during class loading even before
> class attributes are processed. Yet this also seems like a poor
> reason to use a modifier bit.)
>
> (JVM ISSUE #1: What if the attribute is missing; do we reject the
> class file or do we infer value_flags=ACC_PRIVATE|ACC_VOLATILE?
> Let's just reject the file.)
>
> (JVM ISSUE #2: Is this ValueClass attribute really a good place
> to store the "atomic" bit as well? This attribute is a green-field
> for VM design, as opposed to the brown-field of modifier bits. The
> above language assumes the atomic bit belongs in there as well.)
>
> A use of a value companion C.val, in any source file, is generally
> translated to a use of a Q-descriptor QC;:
>
> a field declaration of C.val translates to a field-info with a Q-descriptor
> a method or constructor declaration that mentions C.val mentions a corresponding Q-descriptor in its method descriptor
> a use of a field resolves a CONSTANT_Fieldref with a Q-descriptor component
> a use of a method or constructor uses a CONSTANT_Methodref (or CONSTANT_InterfaceMethodref) with a Q-descriptor component
> a CONSTANT_Class entry may contain a Q-descriptor or an array type whose element type is a Q-descriptor
> a verifier type record may refer to CONSTANT_Class which contains a Q-descriptor
>
> Privatization is enforced for these uses only as much as is needed to
> ensure that classes cannot create uninitialized values, fields, and
> arrays.
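The value_flags encoding listed earlier in this section can be sketched as a small decoder. This is an illustrative model of the proposal only: the `ValueClass` attribute and its `value_flags` field are part of the draft design, the `ValueFlags` class and `describe` method are hypothetical names, and throwing `IllegalArgumentException` merely stands in for "the class file is rejected". The three ACC_ constants use their standard JVM specification values.

```java
final class ValueFlags {
    // Standard access-flag bit values from the JVM specification.
    static final int ACC_PUBLIC   = 0x0001;
    static final int ACC_PRIVATE  = 0x0002;
    static final int ACC_VOLATILE = 0x0040;

    // Decodes a value_flags word into "<access>, <atomicity>",
    // rejecting anything outside the six legitimate combinations.
    static String describe(int valueFlags) {
        int access = valueFlags & (ACC_PUBLIC | ACC_PRIVATE);
        int unknown = valueFlags & ~(ACC_PUBLIC | ACC_PRIVATE | ACC_VOLATILE);
        if (unknown != 0 || access == (ACC_PUBLIC | ACC_PRIVATE)) {
            throw new IllegalArgumentException(
                "illegal value_flags: 0x" + Integer.toHexString(valueFlags));
        }
        String accessName = access == ACC_PUBLIC ? "public"
                          : access == ACC_PRIVATE ? "private"
                          : "package";
        // In this encoding ACC_VOLATILE set means atomic; clear means non-atomic.
        String atomicity = (valueFlags & ACC_VOLATILE) != 0 ? "atomic" : "non-atomic";
        return accessName + ", " + atomicity;
    }

    public static void main(String[] args) {
        // The source-level default: private and atomic.
        System.out.println(describe(ACC_VOLATILE | ACC_PRIVATE));
    }
}
```

One detail this makes visible: the source-level default (private, atomic) is not the all-zero flag word, which decodes as package access and non-atomic.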
>
> If an access from bytecode to a privatized Q-descriptor fails, an
> exception is thrown; its type is IllegalAccessError, a subtype of
> IncompatibleClassChangeError. Generally speaking such an exception
> diagnoses an attempt by bytecode to make an access that would have
> been prevented by the static compiler, if the Java source program had
> been compiled together as a whole.
>
> When a field of Q-descriptor type is declared in a class file, the
> descriptor is resolved early, before the class is linked, and that
> resolution includes an access check which will fail unless the class
> being loaded has access to C.val, as determined by loading C and
> inspecting its ValueClass attribute. These checks prevent untrusted
> clients of C from creating non-constructed zero values in any of
> their fields.
>
> The timing of these checks, on fields, is aligned with the internal
> logic of the JVM which consults the class file of C to answer other
> related questions about field types: (a) whether C is in fact a
> value class, and (b) what is the layout of C.val, in case the JVM
> wishes to flatten the value in a containing field. The third check,
> (c) whether the C.val companion is accessible, happens at the same time. This is
> early during class loading for non-static fields, and during
> class preparation for static fields.
>
> Privatization is not enforced for non-field Q-descriptors that
> occur in method and constructor signatures, and in state descriptions
> for the verifier. This is because mere use of Q-descriptors to
> describe pre-existing values cannot (by itself) expose non-constructed
> values, when those values are on stack or in locals.
>
> This can happen invisibly at the source-code level as well. An API
> might be designed to return values of a privatized type from its
> methods or fields, and/or accept values of a privatized type into its
> methods, constructors, or fields.
In general, the bytecode for a > client of such an API will work with a mix of Q-descriptor and > L-descriptor values. > > The verifier's type system uses field descriptor types, and thus can > "see" both Q-descriptors and L-descriptors. Clients of a class with a > privatized companion are likely to work mostly with L-descriptor > values but may also have Q-descriptor values in locals and on stack. > > When feeding an L-descriptor value to an API point that accepts a > Q-descriptor, the verifier needs help to keep the types straight. In > such cases, the bytecode compiler issues checkcast instructions to > adjust types to keep the verifier happy, and in this case the operand > of the checkcast would be of the form CONSTANT_Class["QC;"]. > > (JVM ISSUE #3: The Q/L distinction in the verifier helps the > interpreter avoid extra dynamic null checks around putfield, > putstatic, and the invoke instructions. This distinction requires > an explicit bytecode to fix up Q/L mismatches; the checkcast > bytecode serves this purpose. That means checkcast requires the > ability to work with privatized types. It requires us to make the > dynamic permission check when other bytecodes try to use the > privatized type. All this seems acceptable, but we could try to make > a different design which CONSTANT_Class resolution fails immediately > if it contains an inaccessible Q-descriptor. That design might > require a new bytecode which does what checkcast does today on a > Q-descriptor.) > > Meanwhile, arrays are rich sources of non-constructed zero values. > They appear in bytecode as follows: > > A C.val[] array construction uses anewarray with a CONSTANT_Class type for the Q-descriptor; this is new to Valhalla. > Such an array construction may also use multianewarray with an appropriate array type. > An array element is read from heap to stack by aaload; the verifier type of the stacked value is copied from the verifier type of the array itself. 
> An array element is written from stack to heap by aastore; the verifier type of the stored value is merely constrained to the type Object. > > Note that there are no static type annotations on array access > instruction. The practical impact of this is that, if an array of a > privatized type C.val is passed outside of C, then any values in > that array become accessible outside of C. Moreover, if C.val is > non-atomic, clients may be able to inflict data races on the array. > > Thus, the best point of control over misuse of arrays is their > creation, not their access. Array creation is controlled by > CONSTANT_Class constant pool entries and their access checking. > When an anewarray or multianewarray tries to create an array, > the CONSTANT_Class constant pool entry it uses must be consulted > to see if the element type is privatized and inaccessible to the > current class, and IllegalAccessError thrown if that is the case. > > All this leads to special rules for resolving an entry of the form > CONSTANT_Class["QC;"]. When resolving such a constant, the class > file for C is loaded, and C is access checked against the current > class. (This is just what happens when CONSTANT_Class["C"] gets > resolved.) Next, the ValueClass attribute for C is examined; it > must exist, and if it indicates privatization of C.val, then access > is checked for C.val against the current class. > > If that access to a privatized companion would fail, no exception is > thrown, but the constant pool entry is resolved into a special > restricted state. Thus, a resolved constant pool entry of the form > CONSTANT_Class["QC;"] can have the following states: > > Error, because C is inaccessible or doesn't exist or is not a value class. > Full resolution, so C.val is ready for general use in the current class. > Restricted resolution, so C.val is ready for restricted use in the current class. > > That last state happens when C is accessible but C.val is not. 
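The three resolution outcomes just described can be sketched as a small decision function. This is only a toy model under stated assumptions, not JVM code: the class name `QDescriptorResolution`, the `State` enum, and the four boolean inputs are hypothetical stand-ins for facts the JVM would derive from loading C and inspecting its ValueClass attribute.

```java
public class QDescriptorResolution {
    /** The three outcomes described for resolving CONSTANT_Class["QC;"]. */
    enum State { ERROR, FULL, RESTRICTED }

    /**
     * Toy model of the resolution rules. All parameters are hypothetical
     * inputs; a real JVM would compute them from C's class file.
     */
    static State resolve(boolean classExistsAndIsAccessible,
                         boolean isValueClass,
                         boolean companionPrivatized,
                         boolean companionAccessible) {
        // C must load, pass the ordinary access check, and be a value class.
        if (!classExistsAndIsAccessible || !isValueClass) {
            return State.ERROR;
        }
        // A failed check on a privatized C.val does not throw here; the
        // failure is recorded as the restricted state instead.
        if (companionPrivatized && !companionAccessible) {
            return State.RESTRICTED;
        }
        return State.FULL;
    }

    public static void main(String[] args) {
        System.out.println(resolve(true, true, true, false));
        System.out.println(resolve(true, true, false, false));
        System.out.println(resolve(false, true, false, false));
    }
}
```

The point of recording the failure rather than throwing is that the decision to reject is deferred to whichever instruction later uses the constant.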
> Likewise, a constant pool entry of the form CONSTANT_Class["[QC;"] (or a similar form with more leading array brackets) can have three states: error, full resolution, and restricted resolution.
>
> Pre-Valhalla CONSTANT_Class entries which do not mention Q-descriptors have only two resolved states, error and full resolution.
>
> As required above, the checkcast bytecode treats full resolution and restricted resolution states the same.
>
> But when the anewarray or multianewarray instruction is executed, it throws an access error if its CONSTANT_Class is not fully resolved (either it is an error or is restricted). This is how the JVM prevents creation of arrays whose component type is an inaccessible value companion type, even if the class file does not correspond to correct Java source code.
>
> Here are all the classfile constructs that could refer to a CONSTANT_Class constant in the restricted state, and whether they respect it (throwing IllegalAccessError):
>
>   - checkcast ignores the restriction and proceeds
>   - instanceof ignores the restriction (consistent with checkcast)
>   - anewarray and multianewarray respect the restriction and throw
>   - ldc throws (consistent with C.val.class in source code)
>   - bootstrap arguments throw (consistent with ldc)
>   - verifier types ignore the restriction and continue checking
>   - (FIXME: There must be more than this.)
>
> Q-descriptors not in CONSTANT_Class constants are naturally immune to privatization restrictions. In particular, CONSTANT_MethodType constants can successfully refer to mirrors to privatized companions.
>
> Uses of CONSTANT_Class constants which forbid Q-descriptors and their arrays are also naturally immune, since they will never encounter a constant resolved in the restricted state.
> These include new, aconst_init, the class sub-operands of CONSTANT_Methodref and its friends, exception catch-types, and various attributes like NestHost and InnerClasses: All of the above are allowed to refer only to proper classes, and not to their value companions or arrays.
>
> Nevertheless, an aconst_init bytecode must throw an access error when applied to a class with an inaccessible privatized value companion. This is worth noting because the constant pool entry for aconst_init does not mention a Q-descriptor, unlike the array construction bytecodes.
>
> Perhaps regular class constants of the form CONSTANT_Class["C"] would also benefit slightly from a restricted state, which would be significant only to the aconst_init bytecode, and ignored by all the above "naturally immune" usages. If a JVM implementation takes this option, the same access check would be performed and recorded for both CONSTANT_Class["C"] and CONSTANT_Class["QC;"], but would be respected only by aconst_init (for the former) and anewarray and the other cases noted above (for the latter but not the former). On the other hand, the particular issue would become moot if aconst_init, like withfield, were restricted to the nest of its class, because then privatization would not matter.
>
> The net effect of these rules, so far, is that neither source code nor class files can directly make uninitialized variables of type C.val, if the code or class file was not granted access to C.val via C. Specifically, fields of type C.val cannot be declared nor can arrays of type C.val[] be constructed.
>
> This includes class files as correctly derived from valid source code or as "spun" by dodgy compilers or even as derived validly from old source code that has changed (and revoked some access).
>
> Remember that new nestmates can be injected at runtime via the Lookup API, which checks access and then loads new code that enjoys the same access.
The level of access depends in detail on the > selection of ClassOption.NESTMATE (for nestmate injection) or not > (for package-mate injection). The JVM uses common rules for these > injected nestmates or package-mates and for normally compiled ones. > > There are no restrictions on the use of C.ref, beyond the basic > access restrictions imposed by the language and JVM on the name C. > Access checks for regular references to classes and interfaces are > unchanged throughout all of the above. > > There are more holes to be plugged, however. It will turn out that > arrays are once again a problem. But first let's examine how > reflection interacts with companion types and access control. > > Privatization and APIs > > Beyond the language there are libraries that must take account of the > privatization of value companions. We start on the shared boundary > between language and libraries, with reflection. > > Reflecting privatization > > Every companion type is reflected by a Java class mirror of type > java.lang.Class. A Java class mirror also represents the class > underlying the type. The distinction between the concept of class and > companion type is relatively uninteresting, except for a value class > C, which has two companion types and thus two mirrors. > > In Java source code the expression C.class obtains the mirror for > both C and its companion C.ref. The expression C.val.class > obtains the mirror for the value companion, if C is a value class. > Both expressions check access to C as a whole, and C.val.class > also checks access to the value companion (if it was privatized). > > But it is a generally recognized fact that Java class mirrors are less > secure than the Java class types that the mirrors represent. It is > easy to write code that obtains a mirror on a class C without > directly mentioning the name C in source code. 
> One can use reflective lookup to get such mirrors, and without even trying one may also "stumble upon" mirrors to inaccessible classes and companion types. Here are some simple examples:
>
>     Class lookup() throws ClassNotFoundException {
>         var name = "java.util.Arrays$ArrayList";
>         //or name = "java.lang.AbstractStringBuilder";
>         //> java.lang.invoke.MethodHandles.lookup().findClass(name); //ERROR
>         return Class.forName(name); //OK!
>     }
>     Class stumble1() {
>         //> return java.util.Arrays.ArrayList.class; //ERROR
>         return java.util.Arrays.asList().getClass(); //OK!
>     }
>     Class stumble2() {
>         //> return java.lang.AbstractStringBuilder.class; //ERROR
>         return StringBuilder.class.getSuperclass(); //OK!
>     }
>     Class stumble3() {
>         //> return C.val.class; //ERROR if C.val is privatized
>         return C.ref.class.asValueType(); //OK!
>     }
>
> Therefore, access checking class names is not and cannot be the whole story for protecting classes and their companion types from reflective misuse. If a mirror is obtained that refers to an inaccessible non-public class or privatized companion, the mirror will "defend itself" against illegal access by checking whether the caller has appropriate permissions. The same goes for method, constructor, and field mirrors derived from the class mirror: You can reflect a method but when you try to call it all of the access checks (including the check against the class) are enforced against you, the caller of the reflective API.
>
> The checking of the caller has two possible shapes. Either a caller-sensitive method looks directly at its caller, or the call is delegated through an API that requires negotiation with a MethodHandles.Lookup object that was previously checked against a caller.
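The difference between stumbling upon a mirror and being granted access through a Lookup can be tried on a current JDK without any Valhalla features. The following is a minimal sketch that reuses the AbstractStringBuilder case from the examples above; the class name `MirrorDemo` and its method names are made up for illustration.

```java
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Modifier;

public class MirrorDemo {
    /** "Stumbles upon" the mirror of a non-public JDK class without naming it. */
    static Class<?> stumble() {
        return StringBuilder.class.getSuperclass();
    }

    /** Reports whether a Lookup created in this class is refused the named class. */
    static boolean lookupRefuses(String name) {
        try {
            MethodHandles.lookup().findClass(name);
            return false;
        } catch (IllegalAccessException e) {
            // The Lookup performed the access check on behalf of this caller.
            return true;
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> mirror = stumble();
        System.out.println("stumbled upon: " + mirror.getName());
        System.out.println("public? " + Modifier.isPublic(mirror.getModifiers()));
        System.out.println("Lookup refuses it? " + lookupRefuses(mirror.getName()));
    }
}
```

Holding the mirror is harmless by itself; the check happens when an access-sensitive operation is attempted, which is the same division of labor the text proposes for privatized companions.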
> > Now, if a class C is accessible but its value companion C.val is > privatized, all of C's public methods and other API points are > accessible (via both companion types), but access is limited to those > very specific operations that could create non-constructed instances > (via a variable of companion type C.val). And this boils down > to a limitation on array creation. If you cannot use either source > code or reflection to create an array of type C.val[], then you > cannot create the conditions necessary to build non-constructed > instances. > > Reflective APIs should be available to report the declared properties > of reference companions. It is enough to add the following two methods: > > Class::isNonAtomic is true only of mirrors of value companions > > which have been declared non-atomic. On some JVM implementations it > may additionally be true of long.class and/or double.class. > > Class::getModifiers, when applied to a mirror of a value > > companion, will return a modifier bit-mask that reflects the > declared access. (This is compatible with the current behavior of > HotSpot for primitive mirrors, which appear as if they were somehow > declared public, with abstract and final thrown in to boot.) > > (Note that most reflective access checking should take care to work > with the reference mirror, not the value mirror, as the modifier bits > of the two mirrors might differ.) > > Privatization and arrays > > There are a number of standard API points for creating Java array > objects. When they create arrays containing uninitialized elements, > then a non-constructed default value can appear. Even when they > create properly initialized arrays, if the type is declared > non-atomic, then non-constructed values can be created by races. > > java.lang.reflect.Array::newInstance takes an element mirror and length and builds an array. The elements of the returned array are initialized to the default value of the selected element type. 
>   - java.util.Arrays::copyOf and copyOfRange can extend the length of an existing array to include new uninitialized elements.
>   - A special overloading of java.util.Arrays::copyOf can request a different type of the new array copy.
>   - java.util.Collection::toArray (an interface method) may extend the length of an existing array, but does not add uninitialized elements.
>   - java.lang.invoke.MethodHandles::arrayConstructor creates a method handle that creates uninitialized arrays of a given type, as if by the anewarray bytecode.
>   - The serialization API contains an operator for materializing arrays of arbitrary type from the wire format.
>
> The basic policy for all these API points is to conservatively limit the creation of arrays of type C.val[] if C.val is not public.
>
>   - java.lang.reflect.Array::newInstance will throw IllegalArgumentException if the element type is privatized. (See below for a possible caller-sensitive enhancement.)
>
>   - java.util.Arrays::copyOf and copyOfRange will throw instead of creating uninitialized elements, if the element type is privatized. If only previously existing array elements are copied, there is no check, and this is a common use case (e.g., in ArrayList::toArray).
>
>   - The special overloading of java.util.Arrays::copyOf will refuse to create an array of any non-atomic privatized type. (This refusal protects against non-constructed values arising from data races.) It also incorporates the restrictions of its sibling methods, against creating uninitialized elements (even of an atomic type).
>
>   - java.lang.invoke.MethodHandles::arrayConstructor will refuse to create a factory method handle if the element type is privatized.
>
>   - java.util.Collection::toArray needs implementation review; as it is built on top of the previous API points, it may possibly fail if asked to lengthen an array of privatized type.
>     Note that many implementations of toArray use Arrays.copyOf in a safe manner, which does not create uninitialized elements.
>
>   - java.util.stream.Stream::toArray, the various List::toArray, and other clients of Arrays::copyOf or Array::newInstance need implementation review. Where a generic API is involved, the assumption is often that non-flat reference arrays are being created, and in that case no outage is possible, since reference companion arrays can always be freely created. For specialized generics with flat types, additional implementation work is required, in general, to ensure that flat arrays can be created by parties with the right to do so.
>
>   - The serialization API should restrict its array creation operator. Serialization methods should not attempt to serialize flat arrays either. It is enough to serialize arrays of the reference type.
>
> API ISSUE #1: Should we relax construction rules for zero-length arrays? This would add complexity but might be a friendly move for some use cases. A zero-length array cannot expose non-constructed values. It may, however, serve as a misleading "witness" that some code has gained permission to work with flat arrays. It's safer to disallow even zero-length arrays.
>
> API ISSUE #2: What about public value companions of non-public inaccessible classes? In source code, we do not allow arrays of private classes to be made, or of their public value companions. Should we be more permissive in this case? We could specify that where a value companion has to be checked against a client, its original class gets checked as well; this would exclude some use cases allowed by the above language, which only takes effect if the companion is privatized. An extra check for a public companion seems like busy-work and a source of unnecessary surprises, though. Let's not.
> There are probably legitimate use cases for arrays of privatized types, with which the new restrictions on the above API points would interfere. So as a backup, we will make API adjustments to work with privatized array types, with an extra handshake to perform the access check (via either caller sensitivity or negotiation with an instance of MethodHandles.Lookup).
>
>   - java.lang.reflect.Array::newInstance should probably be made caller sensitive, so it can refrain from throwing if a privatized element type is accessible to the caller. (Alternatively, a new caller-sensitive API point could be made, such as Array::newFlatInstance. But a new API point seems unnecessary in this case, and caller-sensitivity is common practice in this method's package.) Note that, as is typical of core reflection API points, many uses of newInstance will not benefit from the caller sensitivity.
>
>   - java.util.Arrays::copyOf and copyOfRange may be joined by additional "companion friendly" methods of a similar character which fill new array elements with some other specified fill value, and/or which cyclically replicate the contents of the original array, and/or which call a functional interface to provide missing elements. The details of this are a matter for library designers to decide. Adding caller sensitivity to these API points is probably the wrong move.
>
>   - java.lang.invoke.MethodHandles::arrayConstructor will be joined by a method of the same name on MethodHandles.Lookup which performs a companion check before allowing the array constructor method handle to be returned. It will not check the class, just the companion. Note that the use of caller sensitivity in the Lookup API is concentrated on the factory method Lookup::lookup, which is the starting point for Lookup-based negotiation.
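To see why these particular API points are singled out, it may help to run them on a current JDK, where null plays the role that C.default would play for a flat C.val[] array. The sketch below uses only existing methods (Array.newInstance, Arrays.copyOf, MethodHandles.arrayConstructor); the class name `DefaultElementsDemo` and its helper names are made up, and the proposed privatization checks are of course not present in today's implementations.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Array;
import java.util.Arrays;

public class DefaultElementsDemo {
    /** Fresh array from an element mirror; every element is the default. */
    static String[] viaNewInstance(int length) {
        return (String[]) Array.newInstance(String.class, length);
    }

    /** Lengthening copy; the added tail elements are the default. */
    static String[] viaCopyOf(String[] old, int newLength) {
        return Arrays.copyOf(old, newLength);
    }

    /** Array factory method handle, behaving as if by anewarray. */
    static String[] viaArrayConstructor(int length) {
        try {
            MethodHandle factory = MethodHandles.arrayConstructor(String[].class);
            return (String[]) factory.invokeExact(length);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(viaNewInstance(2)));
        System.out.println(Arrays.toString(viaCopyOf(new String[] {"a"}, 3)));
        System.out.println(Arrays.toString(viaArrayConstructor(2)));
    }
}
```

A copy that does not lengthen the array, such as `Arrays.copyOf(old, old.length)`, introduces no new default elements, which matches the case the text leaves unchecked.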
> > Miscellaneous privatization checks > > Besides newly-created or extended arrays, there are a few API points > in java.lang.invoke which expose default values of reflectively > determined types. Like the array creation methods, they must simply > refuse to expose default values of privatized value companions. > > MethodHandles::zero and MethodHandles::empty will simply > > refuse to produce a result of a privatized C.val type. Clients > with a legitimate need to produce such default values can use > MethodHandles::filterReturnValue and/or MethodHandles::constant > to create equivalent handles, assuming they already possess the > default value. > > MethodHandles::explicitCastArguments will refuse to convert from > > a nullable reference to a privatized C.val type. Clients with a > legitimate need to convert nulls to privatized values can use > conditional combinators to do this "the hard way". > > The method Lookup::accessCompanion will be defined analogously > > to Lookup::accessClass. If Lookup::accessClass is applied to a > companion, it will check both the class and the companion, whereas > Lookup::accessCompanion will look only at the possible > privatization of the companion. (Thus it can simply refer to > Reflection::verifyCompanionType.) > > To support reflective checks against array elements which may be > privatized companion types, an internal method of the form > jdk.internal.reflect.Reflection::verifyCompanionType may be defined. > It will pass any reference type (regardless of class accessibility) > and for a value companion it will check access of the companion (but > not the class itself). > > Building companion-safe APIs > > The method Lookup::arrayConstructor gives enough of a "hook" to > create all kinds of safe but friendly APIs in privileged JDK code. 
> The methods in java.util could make use of this privileged API to quickly adapt their internal code to create arrays in cases they are refused by the existing methods Array.newInstance and Arrays.copyOf.
>
> For example, a checked method MethodHandles.Lookup::defaultValue(C) may be added to provide the default value C.default if its companion C.val is accessible. It will operate as if it first creates a one-element array of the desired type, and then loads the element.
>
> Or, a caller-sensitive method Class::defaultValue or Class::newArray could be added which check the caller and return the requested result. All such methods can be built on top of MethodHandles.Lookup.
>
> In general, a library API may be designed to preserve some aspect of companion safety, as it allows untrusted code to work with arrays of privatized value type, while preventing non-constructed values of that type from being materialized. Each such safe and friendly API has to make a choice about how to prevent clients from creating non-constructed states, or perhaps how to allow clients to gain privilege to do so. Some points are worth remembering:
>
>   - An unprivileged client must not obtain C.default if C.val is privatized.
>   - An unprivileged client must not obtain a non-empty C.val[] array if C.val is privatized and non-atomic.
>   - It's safe to build new (non-empty, mutable) arrays from (non-empty, mutable) old arrays, if the default is not injected.
>   - If a new array is somehow frozen or wrapped so as to be effectively immutable, it is safe as long as it does not expose C.default values.
>   - If a value companion is public, there is no need for any restriction.
>   - Also, unrestricted use can be gated by a Lookup object or caller sensitivity.
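One way to read the "default is not injected" point is as a lengthening copy that asks the caller for every new element, so the client never observes an unfilled slot. The following is a rough sketch of that idea using today's reference arrays, where null stands in for C.default. The class `FilledArrays` and the method `copyOfFilled` are hypothetical and not part of any JDK API. Internally the sketch still allocates a default-filled array, which in a real design is the step that would need privileged access.

```java
import java.lang.reflect.Array;
import java.util.Objects;
import java.util.function.IntFunction;

public class FilledArrays {
    /**
     * Hypothetical helper: copies {@code original} to a new array of
     * {@code newLength}, asking {@code filler} for each index beyond the
     * old contents. The result is only returned once every slot is filled.
     */
    static <T> T[] copyOfFilled(T[] original, int newLength,
                                IntFunction<? extends T> filler) {
        @SuppressWarnings("unchecked")
        T[] copy = (T[]) Array.newInstance(
                original.getClass().getComponentType(), newLength);
        int kept = Math.min(original.length, newLength);
        System.arraycopy(original, 0, copy, 0, kept);
        for (int i = kept; i < newLength; i++) {
            // Reject a filler that tries to hand back the "default" (null here).
            copy[i] = Objects.requireNonNull(filler.apply(i), "filler returned null");
        }
        return copy;
    }

    public static void main(String[] args) {
        String[] grown = copyOfFilled(new String[] {"a"}, 3, i -> "x" + i);
        System.out.println(String.join(",", grown));
    }
}
```

This covers only the fill-by-function variant mentioned earlier; fixed fill values or cyclic replication would be small variations on the same loop.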
> > In the presence of a reconstruction capability, either in the > > language or in a library API or as provided by a single class, > avoiding non-constructable objects includes allowing legitimate > reconstruction requests; each legitimate reconstruction request must > somehow preserve the intentions of the class's designer. > Reconstruction should act as if field values had been legitimately > (from C's API) extracted, transformed, and then again legitimately > (to C's API) rebuilt into an instance of C. Serialization is an > example of reconstruction, since field values can be edited in the > wire format. Proposed with expressions for records are another > example of reconstruction. The withfield bytecode is the primitive > reconstruction operator, and must be restricted to nestmates of C > since it can perform all physically possible field updates. > Reconstruction operations defined outside of C must be designed with > great care if they use elevated privileges beyond what C provides > directly. > > Summary of user model > > A value class C has a value companion C.val which denotes the > null-hostile (zero-initialized) fully flattenable value type for C. > > Like other type members of C, C.val can be declared with an access > modifier (public or private or neither). It is therefore quite > possible that clients of C might be prevented from using the > companion type. > > The operations on C.val are almost the same as the operations on > plain C (C.ref), so a private C.val is usually not a burden. > > Operations which are unique to C.val, and which therefore may > be restricted to you, are: > > declaring a field of type C.val > making an array with element type C.val > getting the default flat value C.default > asking for the mirror C.val.class > > Library routines which create empty flattenable arrays of C.val > might not work as expected, when C.val is not public. 
> You'll have to find a workaround, such as:
>
>   - use a plain C reference array to hold your data
>   - use a different API point which is friendly to privatized C.val types
>   - ask C politely to build such an array for you
>   - crack into C with a reflective API and build your own
>
> If you look closely at the code for C, you might notice that it uses its private type C.val in its public API. This is allowed. Just be aware that null values will not flow through such API points. When you get a C.val value into your own code, you can work on it perfectly freely with the type C (which is C.ref).
>
> If a value companion C.val is declared public, the class has declared that it is willing to encounter its own default value C.default coming from untrusted code. If it is declared private, only the class's own nest can work with C.default. If the value companion is neither public nor private, the class has declared that it is willing to encounter its own default within its own package.
>
> If a class has declared its companion non-atomic, it is willing to encounter states arising from data races (across multiple fields) in the same places it is willing to encounter its default value.
>
> Summary of restrictions
>
> From the implementation point of view, the salient task is restricting clients from illegitimately obtaining non-constructed values of C, if the author of C has asked for such restrictions. (Recall that a non-constructed value of C is one obtained without using C's constructor or other public API.) Here are the generally enforced restrictions regarding a privatized type C.val:
>
>   - You cannot mention the name C.val or C.default in code.
>   - You cannot create and load bytecodes which would implement such a mention.
>   - You cannot obtain C.default from a mirror of C or C.val.
>   - You cannot create a new C.val[] array from a mirror of C or C.val.
>   - You cannot lengthen an existing C.val[] array to contain uninitialized elements.
> You cannot copy an existing array as a new C.val[] array, if C.val is declared non-atomic. > > Even so, let us suppose you are an accident-prone client of C. > Ignoring the above restrictions, you might go about obtaining a > non-constructed value of C in several ways, and there is an > answer from the system in each case that stops you: > > You can mention the C.val or C.default directly in code, in various ways. > After obtaining the mirror C.val.class (by one of several means), you can call Class::defaultValue, MethodHandles::zero, or a similar API point. > If you can declare a field of type C.val directly you can extract an initial value (or a data-race result, if C.val is non-atomic). > If you can indirectly create an array of type C.val, you can extract an initial value (or a data-race result, if C.val is non-atomic). > > And there are a number of ways you might attempt to indirectly create > an array of type C.val[]: > > Indirectly create it from a mirror using Array::newInstance or Arrays::copyOf or MethodHandles::arrayConstructor or another similar API point. > Create it from a pre-existing array of the same type using Object::clone or Arrays::copyOf or another similar API point. > Specify such an array on a serialization wire format and deserialize it. > > Using C.val or C.default directly is blocked if C privatizes its > value companion, unless you are coding a nestmate or package-mate of > C. These checks are applied both at compile time and when the JVM > resolves names, so they apply equally to source code and bytecodes > created by any means whatsoever. > > There are no realistic restrictions on obtaining a mirror to a > companion type C.val. (Accidental and casual direct use of > C.val.class is prevented by access restrictions on the type name > C.val. But there are many ways to get around this limitation.) 
> Therefore any method or API which could violate the above generally enforced restrictions must perform an appropriate dynamic access check on behalf of its mirror argument.
>
> Such a dynamic access check can be made negotiable by an appeal to caller sensitivity or a Lookup check, so a correctly configured call can avoid the restriction. For some simple methods (perhaps Arrays::copyOf or MethodHandles::zero) there is no negotiation. Depending on the use case, access failure can be worked around via a "negotiable" API point like Lookup::arrayConstructor.

From kevinb at google.com Fri Jul 8 20:59:37 2022
From: kevinb at google.com (Kevin Bourrillion)
Date: Fri, 8 Jul 2022 13:59:37 -0700
Subject: Please don't restrict access to the companion-type!
In-Reply-To: <88162d4d-2584-c4ee-af0a-2b89fce66ba5@oracle.com>
References: <88162d4d-2584-c4ee-af0a-2b89fce66ba5@oracle.com>
Message-ID:

I'm sympathetic to some of this. But I think I can accept the stake Brian is putting in the ground. He says: a concrete class should be empowered to provide for its own integrity, not merely count on its users to not hold it wrong. If we must preserve that, then okay, we must.

I worry about decision fatigue for common cases, but I'm optimistic we have a really good story here now:

1. First you will decide whether your class is a value class. I think this should be as simple as "do you need any of the features identity would provide?"
2. Then you will decide how much you want to expose its value companion type. Is this as simple as "does it feel worth the (nonzero) risk of bogus uninitialized instances floating around?"
3. And, if you know what you're doing, and you really need to optimize heavily, you can make it non-atomic.

So I disagree with the main request (always-public companion types), but I have a few comments/questions on the details.
On Thu, Jun 30, 2022 at 7:15 AM Brian Goetz wrote: Your Accumulator example is correct, but I think you are overestimating the > novelty of the problem. Arrays have always had a dynamic check; I can cast > String[] to Object[] and hand that to you, if you try to put an Integer in > it, you'll get an ASE. Handing out arrays for someone else to write to > should always specify what the bounds on those writes are; "don't write > nulls" is novel in degree but not in concept. > It is quite rare for public APIs to exchange arrays for purposes of writing to them. It's a code smell. The safe and conscientious practice since the beginning has been to only write to arrays you created yourself. I agree with Brian that there's not much new here. 3. Let the compiler treat fields of companion-types like final fields > today, i.e. enforce initialization. > > If this were possible to do reliably, we would have gone this way. But > initializing final fields today has holes where the null is visible to the > user, such as class initialization circularity and receiver escape. (And a > reliable protocol is even harder for arrays.) Exposing a null in this way > is bad, but exposing the zero in this way would be worse, because now every > user has a way to get the zero and inject it into unsuspecting (and > unguarded) implementation code. > I'd still like to understand what steps we can take to reduce the damage here, even if they're not 100% solutions. > There is simply no way we can reasonably expect everyone to write > perfectly defensive code against a threat they don't fully understand and > believe to be vanishingly rare -- and this is a perfect recipe for > tomorrow's security exploits. > > 4. Provide the users with a convenient API for creating arrays with all > elements initialized to a specific value. > > > We explored this as well; it is a good start but ultimately not flexible > enough to be "the solution". 
> If a class has no good default, what value should it initialize array elements to? There's still no good default. And the model of "here's a lambda to initialize each element" is similarly an 80% solution.

fwiw, I'll still keep pushing on offering these, if I can somehow. I think it's strange that they (create-and-fill, create-and-setAll) never got added to Arrays by now. It is very nice when the user can isolate themselves from the initialization gap.

> The goal here is to let people write classes that can be used safely. If non-initialization is a mistake then we can make that mistake impossible. That's much better than trying to detect and recover from that mistake.
>
> -------- Forwarded Message --------
> Subject: Please don't restrict access to the companion-type!
> Date: Thu, 30 Jun 2022 09:33:42 +0200
> From: Gernot Neppert
> To: valhalla-spec-comments at openjdk.java.net
>
> I've been following the valhalla-development for a very long time, and have also posted quite a few comments, some of them raising valid concerns, some of them proving my ignorance.
>
> This comment hopefully falls into the first category:
>
> My concern is that allowing access-restriction for a value-type's "companion-type" is a severe case of "throwing the baby out with the bathwater".
>
> Yes, I know what it is supposed to achieve: prevent users from accidentally creating zero-initialized values for value-types "with no reasonable default".
>
> However, the proposed solution of hiding the companion-type will force programmers to use the reference-type even if they do not want to. Please have a look at the following class "Accumulator". It assumes that "Sample" is a value-class in the same package with a non-public companion-type. The Javadoc must now explicitly mention some pitfalls that would not be there if "Sample.val" were accessible. Especially the necessary precaution about the returned array-type is rather ugly, right?!
>
> public class Accumulator {
>     private Sample.val[] samples;
>
>     /**
>      * Yields the samples that were taken.
>      * Note: the returned array is actually a "flat" array! No element can be
>      * null. While processing this array, do not try to set any of its elements to
>      * null, as that may trigger an ArrayStoreException!
>      */
>     public Sample[] samples() {
>         return samples.clone();
>     }
> }
>
> To sum it up, my proposal is:
>
> 1. Make the companion-type public always.
> 2. When introducing value-classes, document the risks of having
> "uninitialized" values under very specific circumstances (uninitialized
> fields, flat arrays).
> 3. Let the compiler treat fields of companion-types like final fields
> today, i.e. enforce initialization.
> 4. The risk of still encountering uninitialized fields is really really
> low, and is, btw, absolutely not new.
> 4. Provide the users with a convenient API for creating arrays with all
> elements initialized to a specific value.
> 5. In Java, one could possibly also use this currently disallowed syntax
> for creating initialized arrays: new Sample.val[20] { Sample.of("Hello") };

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com Sun Jul 10 17:23:58 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sun, 10 Jul 2022 17:23:58 +0000
Subject: Please don't restrict access to the companion-type!
In-Reply-To:
References: <88162d4d-2584-c4ee-af0a-2b89fce66ba5@oracle.com>
Message-ID:

On Jul 8, 2022, at 4:59 PM, Kevin Bourrillion wrote:

> I worry about decision fatigue for common cases, but I'm optimistic we have a really good story here now:
>
> 1. First you will decide whether your class is a value class. I think this should be as simple as "do you need any of the features identity would provide?"
> 2. Then you will decide how much you want to expose its value companion type.
> Is this as simple as "does it feel worth the (nonzero) risk of bogus uninitialized instances floating around?"
> 3. And, if you know what you're doing, and you really need to optimize heavily, you can make it non-atomic.
>
> So I disagree with the main request (always-public companion types), but I have a few comments/questions on the details.

Yes, and I think #2 is even simpler: is the zero value a valid value, and a useful default. If so, make it as accessible as the class. Yes for Complex; no for LocalDate. You could get more picky on things like Rational (it's not a valid value, but the DBZE might provide enough negative feedback, so you might be willing to skate on thinner ice), but that's for more advanced library developers to consider.

>> 3. Let the compiler treat fields of companion-types like final fields today, i.e. enforce initialization.
>>
>> If this were possible to do reliably, we would have gone this way. But initializing final fields today has holes where the null is visible to the user, such as class initialization circularity and receiver escape. (And a reliable protocol is even harder for arrays.) Exposing a null in this way is bad, but exposing the zero in this way would be worse, because now every user has a way to get the zero and inject it into unsuspecting (and unguarded) implementation code.
>
> I'd still like to understand what steps we can take to reduce the damage here, even if they're not 100% solutions.

Essentially, this seems to me to be asking: what can the language do to further reduce the risk that an uninitialized value will be used as input into a computation. Which is a useful (and orthogonal) question.

>> 4. Provide the users with a convenient API for creating arrays with all elements initialized to a specific value.
>>
>> We explored this as well; it is a good start but ultimately not flexible enough to be "the solution". If a class has no good default, what value should it initialize array elements to? There's still no good default.
>> And the model of "here's a lambda to initialize each element" is similarly an 80% solution.
>
> fwiw, I'll still keep pushing on offering these, if I can somehow. I think it's strange that they (create-and-fill, create-and-setAll) never got added to Arrays by now. It is very nice when the user can isolate themselves from the initialization gap.

Yes, much like the above, this is "what more can we do" about initialization. The good news is this is a "simple matter of library design."

From forax at univ-mlv.fr Wed Jul 13 15:47:50 2022
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 13 Jul 2022 17:47:50 +0200 (CEST)
Subject: Meeting today
Message-ID: <1349001234.10135212.1657727270119.JavaMail.zimbra@u-pem.fr>

I will not be available for today's meeting :(

Rémi

From john.r.rose at oracle.com Wed Jul 13 15:57:56 2022
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 13 Jul 2022 08:57:56 -0700
Subject: Value type companions, encapsulated
In-Reply-To:
References:
Message-ID:

On 2 Jul 2022, at 20:24, John Rose wrote:

> …
> http://cr.openjdk.java.net/~jrose/values/encapsulating-val.md
> http://cr.openjdk.java.net/~jrose/values/encapsulating-val.html

On 3 Jul 2022, at 5:25, Remi Forax wrote:

> …
> One kind of sad thing with CONSTANT_Class QC; is that we need it now
> but once we will have the new generics, we will not need it anymore
> because it can be expressed with a CONSTANT_Specialization_Linkage + a
> constant dynamic. So it's a kind of temporary design.

That's probably true. But (like you?) I think this "sad thing" is too hard to avoid. One consolation: The existing descriptor syntax for a CONSTANT_Class of an *array type* is a much older sad thing of the same type. The two sad things can keep each other company in a corner of the JVMS.
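
For readers unfamiliar with the older array-type wrinkle mentioned here: a `CONSTANT_Class` entry names an ordinary class by its internal binary name (for example `java/lang/String`), but names an array type using descriptor syntax (for example `[Ljava/lang/String;`). The sketch below is not from the thread; it only shows the same asymmetry surfacing through `Class::getName`. The class name `ArrayClassNames` and the helper `constantClassName` are hypothetical names introduced for this illustration.

```java
public class ArrayClassNames {
    /**
     * Approximates the name string a CONSTANT_Class entry would carry for the
     * given class or array type, by replacing '.' with '/' in Class::getName.
     * Hypothetical helper for illustration only; it is not meaningful for
     * primitive mirrors such as int.class.
     */
    static String constantClassName(Class<?> c) {
        return c.getName().replace('.', '/');
    }

    public static void main(String[] args) {
        // Ordinary class: a plain internal name, no descriptor punctuation.
        System.out.println(constantClassName(String.class));
        // Array types: descriptor syntax with '[' prefixes, and "L...;" around
        // a class-typed element.
        System.out.println(constantClassName(String[].class));
        System.out.println(constantClassName(int[][].class));
    }
}
```

The `QC;` form discussed in the thread would be a second case where a `CONSTANT_Class` name carries descriptor punctuation rather than a bare class name.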
>
> I wonder if it's not "better" to separate checkcast from unbox/box
> given that mixing them together results in different resolution for all
> checkcasts (compare to anewarray). From the language POV, those two
> kinds of checkcasts are different anyway.

But from the JVM POV lumping behaviors into bytecodes is a much better move than splitting them out into separate new bytecodes, when the lumping makes any sense at all. And in this case it does.

On 4 Jul 2022, at 6:45, Dan Heidinga wrote:

> Sorry for top-posting but it was easier to track a list of issues as I
> read through:

I actually prefer the top-posting, and the markdown is kind of a mess, so you don't need to apologize. :-)

>
> * Miscellaneous privatization checks
> --> MethodHandle.asType(MT) and MethodHandle.invoke() will also need
> to protect against the zero being introduced.
> Here we see invoke() converting a void to a reference (null) and
> similarly for a primitive, to zero. Both these apis will need similar
> treatment as ::explicitCastArguments.

Good catch. I see you still remember your JSR 292 details. Fixed.

>
> Serialization
> --> There's a mention of serialization but if Lambda taught us
> anything, it's that serialization requires more thought than we
> expected, even if we take that into account =) We should spend some
> time on what serialization of a C.val actually means, any format
> concerns, and how it interacts with default reconstitution behaviours.
> Otherwise, we'll leave a hole here where unconstructed values can be
> deserialized.

Added this:

>> Reconstruction operations defined outside of `C` must be designed with
>> great care if they use elevated privileges beyond what `C` provides
>> directly. Given the historically tricky nature of deserialization,
>> more work is needed to consider what serialization of a C.val actually
>> means and how it interacts with default reconstitution behaviours.
>> One likely possibility is that wire formats should only work with
>> `C.ref` types with proper construction paths (enforced by serialization),
>> and leave conversion to `C.val` types to deserialization code inside
>> the encapsulation of `C`.

>
> C.default & Reflection
> --> Is "default" a reflectively accessible field or compiler sugar?
> If a user does C.val.class.getDeclaredFields will it find "default"?
> Or maybe C.class.getDeclaredFields? I'm fine with it being a fiction
> but I wasn't clear how far we were pushing that into the reflective
> model as well. I think the intent is to expose this with
> Class::defaultValue / Lookup::defaultValue APIs but clarification
> would be good.
>
> Accessing C.val.class
> --> Do we need restrictions here beyond those of accessing C.class?

Those restrictions are a paper tiger, as I think I've proven. My recommendation is to have a second paper tiger on `C.val.class` as well. This consistency has a specific goal, to help users learn more quickly how access control of `C.val` works, giving them direct experience with it via the `C.val.class` syntax.

> The mirror may be required to create MethodTypes for use in
> MethodHandle lookup().find* apis even by code that can't create a
> C.val. Given that it will leak already as shown in the doc, do we
> need the extra restrictions?

It's a paper tiger, but an educational one.

From john.r.rose at oracle.com Wed Jul 13 19:13:33 2022
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 13 Jul 2022 12:13:33 -0700
Subject: Value type companions, encapsulated
In-Reply-To:
References:
Message-ID: <3030E223-F7F8-4268-843D-9A2DB7D32773@oracle.com>

I have updated the document online in response to various comments.

http://cr.openjdk.java.net/~jrose/values/encapsulating-val.md
http://cr.openjdk.java.net/~jrose/values/encapsulating-val.html

The Valhalla JVM team is starting to look at these also.
I expect they will want to weigh in on the various JVMS details and issues. So, thanks! We should start some separate threads on some of the issues.

-- John

P.S. For the record here are the diffs to the md file:

```
--- a/Users/jrose/Documents/Work/Valhalla/encapsulating-val.md.~6~ +++ b/Users/jrose/Documents/Work/Valhalla/encapsulating-val.md @@ -1,5 +1,5 @@ % Value type companions, encapsulated -% John Rose for Valhalla EG, June 2022 (ver 0.1) +% John Rose for Valhalla EG, July 2022 (ver 0.2) ## Background @@ -34,13 +34,23 @@ restrictions]** at the end.)_ ### Affordances of `C.ref` Every class or interface `C` comes with a companion type, the -reference type `C.ref` derived from `C` which describes any variable -(argument, return value, array element, etc.) whose values are either -null or of a concrete class derived from `C`. We are not in the habit +reference type `C.ref` derived from `C` which describes any expression +(variable, return value, array element, etc.) whose values are either +null or are instances of a concrete class derived from `C`. + +> We are not in the habit of distinguishing `C.ref` from `C`, but the distinction is there. For example, if we call `Object::getClass` on a variable of type `C.ref` we might not get `C.class`; we might even get a null pointer -exception! +exception! Put another way, `C` as a class means a particular class +declaration, while `C.ref` as a type means a variable which can refer +to instances of class `C` or any subclass. Also `C.ref` can be +`null`, which is of no class at all. One can view the result of +`Object::getClass` as a *type* rather than a mere *class*, since the +API of `Class` includes representation of types like `int` and `C.val` +as well as classes. In any case, the fact that a class can now have +two associated types requires a clearer distinction between classes +and types.
We are so very used to working with reference types (for short, _ref-types_) that we sometimes forget all that they do for us @@ -54,7 +64,7 @@ in addition to their linkage to specific classes: - `C.ref` allows a single large object to be shared from many locations. - `C.ref` with an identity class can centralize access to mutable state. - `C.ref` values uniformly convert to and from general types like `Object`. - - `C.ref` variable types can be reflected using `Class` mirror objects. + - `C.ref` values are polymorphic (for non-final `C`), with varying `Object::getClass` values. - `C.ref` is safe for publication if the fields of `C` are `final`. When I store a bunch of `C` objects into an object array or list, sort @@ -100,10 +110,22 @@ But the author of the class gets to decide which states are legitimate, and the decisions are enforced by access control at the boundaries of the encapsulation. +> The author of an encapsulation determines whether the constant +`C.default` is part of the public API or not. Therefore, the value of +`C.default` is non-constructed only if `C.val` is privatized. + So if I code my class right, using access control to keep bad states away from my clients, my class's external API will have no non-constructed states. +> Reflection and serialization provide additional modes of access to a +class's API. The author of an encapsulation must be given control +over these modes of access as well. (This is discussed further +below.) If the author of `C` allows deserialization of `C` values not +otherwise constructible via the public API, those values must be +regarded as constructed, not non-constructed, but the API may also +be regarded as poorly designed. + ### Costs of `C.ref` In that case why have value types at all, if references are so @@ -119,7 +141,7 @@ always wish to pay: - A reference must be able to represent `null`; tightly-packed types like `int` and `long` would need to add an extra bit somewhere to cover this. 
The major alternative to references, as provided by Valhalla, is flat -objects, where object fields are laid out immediately in their +class instances, where instance fields are laid out immediately in their containers, in place of a pointer which points to them stored elsewhere. Neither alternative is always better than the other, which is why Java has both `int` and `Integer` types and their arrays, and @@ -140,13 +162,30 @@ The two companion types are closely related and perform some of the same jobs: - `C.ref` and `C.val` both give a starting point for accessing `C`'s members. - - `C.ref` and `C.val` can link `C` objects into acyclic graphs. + - `C.ref` and `C.val` can link `C` instances into acyclic graphs. - `C.ref` and `C.val` values uniformly convert to and from general types like `Object`. - - `C.ref` and `C.val` variable types can be reflected using `Class` mirror objects. For these jobs, it usually doesn't matter which type companion does the work. +> Specifically, + +> - An expression of the form `myc.method()` cares about the class of + `myc` but not which companion type it is. The same point is true + (probably) of methods like `Class::getMethods` which ignore the + distinction between the mirrors `C.ref.class` and `C.val.class`. + +> - I can build a tree of `C` nodes using children lists of either + companion type. (If however my `C` node contains direct child + fields they cannot be of the `C.val` type.) + +> - Converting a variable `myc` to `Object` (or, respectively, casting + an `Object` to store in `myc`), does the same kind of thing + regardless of which companion type `myc` has. The only difference + that `null` cannot be a result if `myc` is `C.val` (or, + respectively, that `null` is rejected as a `C.val` value). 
+ + Despite the similarities, many properties of a value companion type are subtly different from any reference type: @@ -158,6 +197,10 @@ are subtly different from any reference type: - `C.val` heap variables (fields, array elements) are initialized to all-zeroes. - `C.val` might not be safe for publication (even though its fields are `final`). +The overall effect is that a `C.val` variable has a very specific +concrete format, a flattened set of application-defined fields, often +without added overhead from object headers and pointer chasing. + The JVM distinguishes `C.val` by giving it a different descriptor, a so-called _Q-descriptor_ of the form `QC;`, and it also provides a so-called _secondary mirror_ `C.val.class` which is similar to the @@ -166,7 +209,7 @@ built-in primitive mirrors like `int.class`. As the Valhalla performance model notes, flattening may be expected but is not fully guaranteed. A `C.val` stored in an `Object` container is likely to be boxed on the heap, for example. But `C.val` -objects created as bytecode temporaries, arguments, and return values +instances created as bytecode temporaries, arguments, and return values are likely to be flattened into machine registers, and `C.val` fields and array elements (at least below certain size thresholds) are also likely to be flattened into heap words. @@ -177,12 +220,35 @@ value class. There are additional terms and conditions for flattening Remember that reference types have full abstraction as one of their powers, and this means building data structures that can refer to them even before they are loaded. But a class file can request that the JVM -"peek" at a class to see if it is a value class, and if this request -is acted on early enough (at the JVM's discretion), then the JVM can +"peek" at a class to see if it is a value class. + +> This request is conveyed via the [`Preload` attribute] defined in +recent drafts of [JEP 8277163 (Value Objects)]. 
If this request is +acted on early enough (at the JVM's discretion), then the JVM can choose to lay out some or all `C.ref` values as flattened `C.val` values _plus_ a boolean or other sentinel value which indicates the `null` state. +> If the JVM succeeds in flattening a `C.ref` variable, the JMM still +requires that racing reads to such a variable will always return a +consistent, safely published state. The atomicity or non-atomicity of +the `C.val` companion type has no effect on the races possible to a +`C.ref` variable. Thus, flattening a `C.ref` variable with a +non-atomic value type is not simply a matter of adding a `null` +channel field to a struct, if races are possible on that variable. +Most machines today provide hardware atomicity only to 128 bits, so +racing updates must probably be accomplished within the limits of 64- +or 128-bit reads and writes, for a flattened `C.ref`. It seems likely +that the heap buffering enjoyed by today's value-based classes will +also be the technique of choice in the future, at least for larger +value classes, when their containers are in the heap. Since JVM stack +and locals can never race, adjoining a null state for a `C.ref` value +can be a simple matter of allocating another calling sequence register +or stack slot, for an argument or return value. + +[`Preload` attribute]: +[JEP 8277163 (Value Objects)]: + ### Pitfalls of `C.val` The advantages of value companion types imply some complementary @@ -221,13 +287,16 @@ racing writes to the same mutable `C.val` variable in the heap. Unlike reference types, value types can be manipulated to create these non-constructed states even in well-designed classes. -Now, it may be that a constructor (or factory) might be perfectly able -to create one of the above non-constructed states as well, no strings +Now, it may be that a public constructor (or factory) might be perfectly able +to create a zero state or an arbitrary field combination, no strings attached. 
In that case, the class author is enforcing few or no invariants on the states of the value class. Many numeric classes, like complex numbers, are like this: Initialization to all-zeroes is no problem, and races between components are acceptable, compared to -the costs of excluding races. +the costs of excluding races. The worst a race condition can ever do +is create a state that is legitimately constructed via the class API. +We can say that a class which is this permissive has no +non-constructed states at all. > (The reader may recall that early JVMs accepted races on the high and low halves of 64-bit integers as well; this is no longer a @@ -259,6 +328,10 @@ Still, it turns out to be useful to give a common single point of declarative control to handle _all_ non-constructed states, both the default value of `C.val` and its mysterious data races. +So different encapsulation authors will want to make different +choices. We will give them the means to make these choices. And +(spoiler alert) we will make the safest choice be the default choice. + ## Privatization to the rescue _(Here are the important details about the encapsulation of value @@ -273,13 +346,13 @@ companion can be used freely, fully under control of the class author. But untrusted clients are prevented from building uninitialized fields or arrays of type `C.val`. This prevents such clients from creating -(either accidentally or purposefully) non-constructed values of type +(either accidentally or purposefully) non-constructed states of type `C.val`. How privatization is declared and enforced is discussed in the rest of this document. -> (To review, for those who skipped ahead, non-constructed values are +> (To review, for those who skipped ahead, non-constructed states are those not created under control of the class `C` by constructors or -other accessible API points. A non-constructed value may be either an +other accessible API points. 
A non-constructed state may be either an uninitialized variable of `C.val`, or the result of a data race on a shared mutable variable of type `C.val`. The class itself can work internally with such values all day long, but we exclude external @@ -291,7 +364,7 @@ As a second tactic, a value class `C` may select whether or not the JVM enforces atomicity of all occurrences of its value companion `C.val`. A non-atomic value companion is subject to data races, and if it is not privatized, external code may misuse `C.val` variables -(in arrays or mutable fields) to create non-constructed values via +(in arrays or mutable fields) to create non-constructed states via data races. A value companion which is atomic is not subject to data races. This @@ -328,7 +401,7 @@ of both choices (privatization and declared non-atomicity), although it is natural to try to boil down the size of the matrix. - `C.val` private & atomic is the default, and safest configuration - hiding all non-constructed values outside of `C` and all data races + hiding the most non-constructed states outside of `C` and all data races even inside of `C`. There are some runtime costs. - `C.val` public & non-atomic is the opposite, with fewer runtime @@ -338,12 +411,12 @@ it is natural to try to boil down the size of the matrix. non-atomic primitive like `long`. - `C.val` public & atomic allows everybody to see the all-zero - initial value but no other non-constructed states. This is + initial value but no racing non-constructed states. This is analogous to the situation of a naturally atomic primitive like `int`. - - `C.val` private & non-atomic allows `C` complete control over the - visibility of non-constructed states, but `C` also has the ability + - `C.val` private & non-atomic allows `C` complete access to and + control over non-constructed states, but `C` also has the ability to work internally on arrays of non-atomic elements. 
`C` should take care not to leak internally-created flat arrays to untrusted clients, lest they use data races to hammer non-constructed values @@ -428,7 +501,7 @@ will fail. If the companion is neither `public` nor `private`, then Here is an example of a class which refuses to construct its default value, and which prevents clients from seeing that state: -``` +```{#class-C} class C { int neverzero; public C(int x) { @@ -500,7 +573,7 @@ have a right to expect that encapsulation of companion types will hope to re-use their knowledge about how type name access works when reasoning about companion types. We aim to accommodate that hope. If it works, users won't have to think very often about the class-vs-type -distinction. That is why the above design emulates pre-existing +distinction. That is also why the above design emulates pre-existing usage patterns for non-denotable types. ### Privatization in translation @@ -518,13 +591,36 @@ The `value_flags` field (16 bits) has the following legitimate values: - zero: `C.val` default access, non-atomic - `ACC_PUBLIC`: `C.val` public access, non-atomic - `ACC_PRIVATE`: `C.val` private access, non-atomic - - `ACC_VOLATILE`: `C.val` default access, atomic - - `ACC_VOLATILE|ACC_PUBLIC`: `C.val` public access, atomic - - `ACC_VOLATILE|ACC_PRIVATE`: `C.val` private access, atomic + - `ACC_FINAL`: `C.val` default access, atomic + - `ACC_FINAL|ACC_PUBLIC`: `C.val` public access, atomic + - `ACC_FINAL|ACC_PRIVATE`: `C.val` private access, atomic Other values are rejected when the class file is loaded. -(**JVM ISSUE #0:** Can we kill the `ACC_VALUE` modifier bit? Do we +The choice of `ACC_FINAL` for this job is arbitrary. It basically +means "please ensure safe publication of `final` fields of this class, +even for fields inside flattened instances." 
The race conditions of a +non-atomic variable of type `C.val` are about the same as (are +isomorphic to) the race conditions for the states reachable from a +non-varying non-null variable of type `MC.ref`, where `MC` is a +hypothetical identity class containing the same instance fields as +`C`, but whose fields are not declared `final`. (Remember that `C`, +being a value class, must have declared its fields `final`.) Omitting +`ACC_FINAL` above means about the same as using the non-final fields +of `MC` to store `C.val` states. Omitting `ACC_FINAL` is less safe +for programmers, but much easier to implement in the JVM, since it can +just peek and poke the fields retail, instead of updating the whole +instance value in a wholesale transaction. + +> That is, if you see what I mean? `ACC_VOLATILE` would be another +clever pun along the same lines, since a `volatile` variable of type +`long` is one which suppresses tearing race conditions. But +`volatile` means additional things as well. Other puns could be +attempted with `ACC_STATIC`, `ACC_STRICT`, `ACC_NATIVE`, and more. +John likes `ACC_FINAL` because of the JMM connection to `final` +fields. + +> (**JVM ISSUE #0:** Can we kill the `ACC_VALUE` modifier bit? Do we really care that `jlr.Modifiers` kind-of wants to own the reflection of the contextual modifier `value`? Who are the customers of this modifier bit, as a bit? John doesn't care about it personally, and @@ -536,11 +632,11 @@ kinds of structural checks on the fly during class loading even before class attributes are processed. Yet this also seems like a poor reason to use a modifier bit.) -(**JVM ISSUE #1:** What if the attribute is missing; do we reject the -class file or do we infer `value_flags=ACC_PRIVATE|ACC_VOLATILE`? +> (**JVM ISSUE #1:** What if the attribute is missing; do we reject the +class file or do we infer `value_flags=ACC_PRIVATE|ACC_FINAL`? Let's just reject the file.) 
-(**JVM ISSUE #2:** Is this `ValueClass` attribute really a good place +> (**JVM ISSUE #2:** Is this `ValueClass` attribute really a good place to store the "atomic" bit as well? This attribute is a green-field for VM design, as opposed to the brown-field of modifier bits. The above language assumes the atomic bit belongs in there as well.) @@ -673,7 +769,7 @@ As required above, the `checkcast` bytecode treats full resolution and restricted resolution states the same. But when the `anewarray` or `multianewarray` instruction is executed, -it consults throws an access error if its `CONSTANT_Class` is not +it must throw an access error if its `CONSTANT_Class` is not fully resolved (either it is an error or is restricted). This is how the JVM prevents creation of arrays whose component type is an inaccessible value companion type, even if the class file does @@ -846,7 +942,7 @@ There are a number of standard API points for creating Java array objects. When they create arrays containing uninitialized elements, then a non-constructed default value can appear. Even when they create properly initialized arrays, if the type is declared -non-atomic, then non-constructed values can be created by races. +non-atomic, then non-constructed states can be created by races. - `java.lang.reflect.Array::newInstance` takes an element mirror and length and builds an array. The elements of the returned array are initialized to the default value of the selected element type. - `java.util.Arrays::copyOf` and `copyOfRange` can extend the length of an existing array to include new uninitialized elements. @@ -870,7 +966,7 @@ the creation of arrays of type `C.val[]` if `C.val` is not public. - The special overloading of `java.util.Arrays::copyOf` will refuse to create an array of any non-atomic privatized type. (This - refusal protects against non-constructed values arising from data + refusal protects against non-constructed states arising from data races.) 
It also incorporates the restrictions of its sibling methods, against creating uninitialized elements (even of an atomic type). @@ -900,8 +996,8 @@ the creation of arrays of type `C.val[]` if `C.val` is not public. **API ISSUE #1:** Should we relax construction rules for zero-length arrays? This would add complexity but might be a friendly move for -some use cases. A zero-length array cannot expose non-constructed -values. It may, however, serve as a misleading "witness" that some +some use cases. A zero-length array can never expose non-constructed +states. It may, however, serve as a misleading "witness" that some code has gained permission to work with flat arrays. It's safer to disallow even zero-length arrays. @@ -969,6 +1065,9 @@ refuse to expose default values of privatized value companions. legitimate need to convert nulls to privatized values can use conditional combinators to do this "the hard way". + - `MethodHandle::asType` will refuse to convert from a `void` return + to a privatized `C.val` type, similarly to `explicitCastArguments`. + - The method `Lookup::accessCompanion` will be defined analogously to `Lookup::accessClass`. If `Lookup::accessClass` is applied to a companion, it will check both the class and the companion, whereas @@ -1003,7 +1102,7 @@ All such methods can be built on top of `MethodHandles.Lookup`. In general, a library API may be designed to preserve some aspect of companion safety, as it allows untrusted code to work with arrays of -privatized value type, while preventing non-constructed values of that +privatized value type, while preventing non-constructed states of that type from being materialized. Each such safe and friendly API has to make a choice about how to prevent clients from creating non-constructed states, or perhaps how to allow clients to gain @@ -1011,19 +1110,21 @@ privilege to do so. Some points are worth remembering: - An unprivileged client must not obtain `C.default` if `C.val` is privatized. 
- An unprivileged client must not obtain a non-empty `C.val[]` array if `C.val` is privatized and non-atomic. - - It's safe to build new (non-empty, mutable) arrays from (non-empty, mutable) old arrays, if the default is not injected. + - It's safe to build new (non-empty, mutable) arrays from (non-empty, mutable) old arrays, as long as new elements containing the `C.default` do not appear. - If a new array is somehow frozen or wrapped so as be effectively immutable, it is safe as long as it does not expose `C.default` values. - If a value companion is `public`, there is no need for any restriction. - Also, unrestricted use can be gated by a `Lookup` object or caller sensitivity. > In the presence of a reconstruction capability, either in the language or in a library API or as provided by a single class, -avoiding non-constructable objects includes allowing legitimate +avoiding non-constructed instances includes allowing legitimate reconstruction requests; each legitimate reconstruction request must somehow preserve the intentions of the class's designer. Reconstruction should act as if field values had been legitimately (from `C`'s API) extracted, transformed, and then again legitimately -(to `C`'s API) rebuilt into an instance of `C`. Serialization is an +(to `C`'s API) rebuilt into an instance of `C`. + +> Serialization is an example of reconstruction, since field values can be edited in the wire format. Proposed `with` expressions for records are another example of reconstruction. The `withfield` bytecode is the primitive @@ -1031,7 +1132,25 @@ reconstruction operator, and must be restricted to nestmates of `C` since it can perform all physically possible field updates. Reconstruction operations defined outside of `C` must be designed with great care if they use elevated privileges beyond what `C` provides -directly. +directly. 
Given the historically tricky nature of deserialization, +more work is needed to consider what serialization of a C.val actually +means and how it interacts with default reconstitution behaviours. +One likely possibility is that wire formats should only work with +`C.ref` types with proper construction paths (enforced by serialization), +and leave conversion to `C.val` types to deserialization code inside +the encapsulation of `C`. + +> JNI, like serialization, allows creation of arrays which is hard to +constrain with access checks. We have a choice of at least two +positions on this. We could allow JNI full permission to create any +kind of arrays, thus effectively allowing it "inside the nest" of any +value class, as far as array construction goes. Or, we could say that +JNI (like `Arrays::copyOf`) is absolutely forbidden to create +uninitialized arrays of privatized value type. The latter is probably +acceptable. As with other API points, programmers with a legitimate +need to create flat privatized arrays can work around the limitations +of the "nice" API points by using more complex ones that incorporate +the necessary access checks. ## Summary of user model @@ -1063,12 +1182,15 @@ to find a workaround, such as: - ask `C` politely to build such an array for you - crack into `C` with a reflective API and build your own -If you look closely at the code for `C`, you might noticed that it +If you looked closely at [the code for `C` above], +you might have noticed that it uses its private type `C.val` in its public API. This is allowed. Just be aware that null values will not flow through such API points. When you get a `C.val` value into your own code, you can work on it perfectly freely with the type `C` (which is `C.ref`). +[the code for `C` above]: <#class-C> + If a value companion `C.val` is declared `public`, the class has declared that it is willing to encounter its own default value `C.default` coming from untrusted code. 
If it is declared `private`,
```

From john.r.rose at oracle.com Wed Jul 13 19:43:18 2022
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 13 Jul 2022 12:43:18 -0700
Subject: one class, two types, many bikesheds
Message-ID: <1ACF189B-D346-4996-AF5B-3977FF02C1F3@oracle.com>

The latest iteration of the user model for value classes makes it crystal clear that one value class `C` defines two types. The second type is named `C.val` (at present, until that bikeshed is repainted). This is the "value companion" to `C`, or maybe its "companion value type". The term "companion" tries to capture the idea that the class doesn't come alone but travels with some friends, its types.

But this pulls in a long and difficult discussion about the exact relation between a class and a type. And then (inevitably) "what's a class *really*?" and "what's a type *really*?"

I think we want to make a distinction between a class and a type. A class is primarily a bunch of source code, later compiled into a classfile. A type is the primary static attribute of a variable in source code, determining its range of values and set of valid operations. As later compiled, the type determines a JVM-level type (usually what we call a field descriptor). The type probably also determines something of the eventual format of the variable in a real machine, although that's a secret the JVM keeps.

In some of our discussions we have called the other companion type of `C`, which is the (nullable) reference type, by the name `C.ref`, as if it were something you could write in source code. Perhaps I should be saying `C.__REF` to avoid giving that impression. But perhaps not.

(A generic class can engender many, many types. Is this a whole mob of companion types? Perhaps not, but it does call for a clear term for this other relation of classes to types.)

(Note also that the "raw type"
of a generic class is named by just the class name, sans type arguments: Plain `List` instead of `List`. And not `List.raw`, at least not today.)

To me it seems useful to treat the two types with a certain amount of symmetry. There is one class, and two companion types, not a class (which is also a type), plus its companion (value) type. If we do this, it makes some further sense to give them symmetrical names, `C.ref` and `C.val`. We then say that the class name `C`, used in a context that requires a type, is "just sugar" for the more exact `C.ref` (and certainly not `C.val`, or you would have used that name).

Are there other uses for `C.ref`? I can think of just two:

- For type variables (*which are not classes*) `T.ref` (or some other bikeshed color) means "recover the reference companion, even if the generic argument was a value type".
- For extreme stylized clarity in source code, where someone wants to emphasize that a variable is nullable. (Could this interact with null-inference schemes? Oh, certainly!)

The use `T.ref` lends weight to making the companions symmetric. You can go from `C` meaning `C.ref` to `C.val` in `List` and then inside `List` you can go back to `C.ref`. It's a two-way street.

There is a limit to treating the two companions symmetrically. Do we really want to allow inner declarations of the form `public companion type C.ref;`, on the grounds that we do so for the companion `C.val`? No, because reference types are "hardwired" by present JVM specifications, and presumably future ones.

Maybe we will turn, in the end, to a maximally asymmetric design, with no symmetrical treatment anywhere; no `C.ref` in particular. But the cost of that is never being able to refer unambiguously to `C` as class or as ref-type, except using informal notations or narrative prose.

At the moment, though, I like these rules, personally:

- For a value class name `C`, `C.val` names a type.
- For any class or interface name `C`, `C.ref` names a type, meaning the same thing as `C`.
- For any type variable `T` (in new generics), `T.ref` names a type.
- Maybe: For any type variable `T` (in specialized generics?), `T.val` also names a type.
- The ref and val suffixes cannot be applied elsewhere. (So no `C.ref.val`.)

Comments?

From john.r.rose at oracle.com Wed Jul 13 20:23:50 2022
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 13 Jul 2022 13:23:50 -0700
Subject: There are five buckets now
Message-ID:

FTR, I'm really happy with the user model as of this point. The things I like best about it are:

- just the right spacing between the two types and their class (no extra classfiles, in particular)
- safety by default everywhere
- high speed numerics are supported as a non-default less-safe decision
- no more discussion of non-atomic refs (that always gave me the willies)
- uniform meaning of `C` (a safely published ref-type)
- `Object::getClass` returns a predictable ref-mirror (I always wanted that)
- encapsulation authors have authority over uses of the new features
- the new levers "feel like" existing encapsulation decisions
- no more broad appeals like "now, everybody, just remember that B3s are *never atomic*"
- no need yet for new features like `require x`
- good correspondence with our existing JVM prototype (`C.val` ? `QC;`)

The surprising outcome of this is we now have (by my count) five buckets. And I don't think we mind, because they are all about encapsulation choices.

1. identity = B1
2. better VBC (by-default private companion, by-default atomic) = B2
3. full-flat primitive (explicitly public non-atomic companion) = B3n
4. atomic primitive (explicitly public companion, but no tearing) = B3a
5. internally-flat VBC (private companion, tricky full-flat private vals)

Yeah, the eagle-eyed designer might try to toss the last guy or two off the island.
But to what purpose? The encapsulation paradigm means we trust the author of the class to set things up. Class authors like that sort of trust, as long as we don't undermine their decisions with use-site overrides. (Which, in the Java world, we call "security bugs".)

One thing I'm a little sad to give up on is the word "primitive" for any of these types. It felt nice to try on the idea of user-definable primitives. As you can see from my list above, they do survive in some form. But not as a primary, bright-line distinction in the language, other than the existing legacy distinction.

I think that's fine. We always knew we were doubling down on class-like declarations, so it feels really good to be using them fully, including the encapsulation features.

Another result of backing away from "primitive" is that we have to engage with the question of "where are the objects?" Obviously every non-null value of a variable whose type is an identity class refers to "an object". But beyond that it gets dicey and we will have to adjust our agreements, I think. This is probably worth a separate thread, which I will start, and which I expect Kevin will be very interested in.

-- John

On 13 Jun 2022, at 16:04, Brian Goetz wrote:

> I've done a little more shaking of this tree. It involves keeping
> the notion that the non-identity buckets differ only in the treatment
> of their val projection, but makes a further normalization that
> enables the buckets to mostly collapse away.
>
> "value class X" means:
>
>  - Instances are identity-free
>  - There are two types, X.ref (reference, nullable) and X.val
> (direct, non-nullable)
>  - Reference types are atomic, as always
>  - X is an alias for X.ref
> ...

From john.r.rose at oracle.com Wed Jul 13 20:54:43 2022
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 13 Jul 2022 13:54:43 -0700
Subject: where are all the objects?
Message-ID: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com>

On 13 Jul 2022, at 13:23, John Rose wrote:

> ...Another result of backing away from "primitive" is that we have
> to engage with the question of "where are the objects?" Obviously
> every non-null value of a variable whose type is an identity class
> refers to "an object". But beyond that it gets dicey and we will
> have to adjust our agreements, I think. This is probably worth a
> separate thread, which I will start, and which I expect Kevin will be
> very interested in.

A class defines ways to "realize" the class in its types, that is, in variables of its companion types.

Examples: `C r = new C();` and `C.val v = C.default;` and `new C.val[1]`.

(For a class declared non-atomic, races can also "realize" values of the class.)

I'm using a more neutral term "realize" instead of "instantiate". You can think "instantiate" or "construct" if you like. A long-winded way to say "realize a class" without tripping over words I want to avoid is "make a non-`null` runtime configuration of a variable of a type of that class, using, directly or indirectly, an expression allowed by the declaration of that class".

These propositions seem to be all true ("at least" in part):

- The result of realizing at least some classes in some types is, in fact, an "object".
- The result of realizing at least some classes in some types is, in fact, an "instance" of that class.
- The result of realizing at least some classes in a value type is a "value" of that class.
- Every variable "has a value".
- Every reference, other than `null`, "refers to an object".
- Every non-reference variable "contains a value" (as well as having it).

So, do we think they are true? And of which classes, and for which of their companion types, are they true?

I think some of us would like to reserve the term "object" for something that has a header and a storage block in the heap.
Although talking about memory blocks and headers and machine pointers is probably illegitimate for a JVMS, it is semi-transparent enough to "see" that stuff like that is going on. (Except when it isn't: Both Valhalla flattened references and classic escape analysis break those intuitions.) Users will want *something* to visualize about objects, and maybe that just doesn't jibe with how we want them to think about values.

OTOH, it is really freeing to be able to say that "every class makes objects", and build from there into a world of "identity objects" and "value objects".

(I can also appeal to various C standards, in which everything is either an object or a function. Yes, a C `int` is a C object. Take that, Mr. Objects-Are-Everywhere Java. That might work for us too with minor adjustments.)

Even if we give up on making everything an object, I will still request that we cling to *some* word that can uniformly be applied to the result of realizing *any Java class*. If that word is not "object" I think it is "instance".

Also I think it is still useful to at least pretend (virtually) that a reference is always to an object. So, something like this:

- The result of realizing any class in its reference companion is an object of that class.
- The result of realizing any class into any of its companion types is an instance of that class.
- The result of realizing any value class into its value companion is a "value" of that class.
- Maybe also: The result of realizing any value class into its reference companion is a "value" of that class (as well as an "object" of that class).
- Every variable "has a value". (Same as above.)
- Every reference, other than `null`, "refers to an object". (Same as above.)
- Every non-reference variable "contains a primitive" or "contains an instance".
- Any value is (therefore) either a primitive, an instance, a `null` reference, or a reference to an object.
- An object (therefore) is an instance or an array. (Cribbing from Kevin here.)
- By an abuse of language, it is common to ignore the reference part of a reference variable, and say that it "has" or "contains" an instance if not `null`. (This makes opportunities for consistent reasoning about `C.ref` and `C.val` variables.)

Get those spray-paints shaking! Bonus points if we finish before Brian gets back from his vacation, he'll be so pleased!

From john.r.rose at oracle.com Wed Jul 13 21:02:06 2022
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 13 Jul 2022 14:02:06 -0700
Subject: There are five buckets now
In-Reply-To:
References:
Message-ID:

On 13 Jul 2022, at 13:23, John Rose wrote:

> ...The surprising outcome of this is we now have (by my count) five
> buckets. And I don't think we mind, because they are all about
> encapsulation choices.
>
> 1. identity = B1
> 2. better VBC (by-default private companion, by-default atomic) = B2
> 3. full-flat primitive (explicitly public non-atomic companion) = B3n
> 4. atomic primitive (explicitly public companion, but no tearing) =
> B3a
> 5. internally-flat VBC (private companion, tricky full-flat private
> vals)

P.S. At a full count there are seven buckets, as I said in the meeting, since a "privatized" companion type can be either package-private (aka. default access) or fully private (nestmate access only). So we have `B = id + val[atomic={yes,no}, access={public,package-private,private}]`. The list above splits sub-cases for package-private out of case 2 and case 5. At this rate, any bets on how long it takes to get into the double digits? :-)

From john.r.rose at oracle.com Thu Jul 14 03:31:46 2022
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 13 Jul 2022 20:31:46 -0700
Subject: races on flat values
Message-ID: <4A145916-740D-4C22-AC54-808280403753@oracle.com>

So, let's talk about the Java Memory Model and consequent JVM support for flattenable types.

(Racing on flats is either really cool, as on the Bonneville Salt Flats, or else really dumb, if the flats are your flat tires. Either image will do here. Races are a generally bad idea, but sometimes they are how the cool kids get the best performance, just barely avoiding application-defined failure modes.)

When a value is atomic, and/or when a reference type is used (not the `C.val` companion), I think there does not need to be any impact on the JMM. It is always enough to say that a load or store of the value behaves as if (two very important words: "as if") the value were separately buffered in the heap, and accessed only via a safely published pointer.

Consider a racing read of a composite (multi-field) value from a variable V of type `C.val`, that has also been set from two or more racing writes `V=X`, `V=Y`. The "as if" rule implies that the read will see either X with its fields `X.*F*` or Y with its fields `Y.*F*`, because X and Y behave "as if" they are references to separate buffered instances of class C. (Either of the writes could be a buffered snapshot of aboriginal value `C.default`.) This is all straight out of the JMM's repertoire of scenarios.

The JVM can try to pull off various shenanigans under the covers, as long as the user observes behavior "as if" the values were buffered in the heap. This is a good starting point for implementors, and it takes us quickly into special cases where the structure of the `.*F*` fields is simple enough.
If they all pack in an atomically readable/writable memory unit (64 or 128 bits typically), then the JVM implementor can choose to quietly maintain the illusion of heap buffering, while storing a composite like `X.*F*` by packing the `.*F*` in a single memory unit, but omitting any physical reference for X. And then of course it doesn't matter whether X existed in the first place or not, so don't allocate it either.

There's more where that came from. An example would be an optimistic technique which stores both X and some of the `.*F*` in 128 bits, plus the rest of the `.*F*` in more bits; some sort of pointer encoding would say whether X must be followed (to a different heap block) or else (as the optimist hopes) all the `.*F*` can be read, perhaps helped by a fast seq-lock or some other exclusion of races. I'm just getting started here, but it gets very very tricky and performance can stall out quickly.

As has been mentioned already, nullable reference types can potentially be flattened as well as regular value types, and this would require encoding an extra "null channel" in addition to the `.*F*` fields of the value being pointed to. This could be a byte, or a bit, or less than a bit if there is "slack" in any of the other `.*F*` fields. So a tri-state `OptionalBoolean` could still be just one byte, but an `OptionalLong` is cursed with the need to find that 65th bit.

There's a lot to say about potential implementations under that benign "as if" rule. With respect to JMM, the implementor has to ensure that if a default value `C.default` is never stored into a `C.ref` flat container, no possible race can observe that value. This is a special hazard, depending on the order in which the null channel is read and written, since an all zero flat `C.ref` might race to a `C.default` value if the null channel were set to the not-null state, before any `.*F*` components were stored.

The safest way to stay within bounds is to use what Brian calls "half-flat"
formats that fit in 128 bits (or whatever is the natural unit for a platform) and load and store those formats with atomic instructions. This was already true for the `C.val` flats discussed above, but it becomes that much harder when you have the null channel along as a hitchhiker.

Those were the preliminaries; now we come to non-atomic value types of the form `C.val`. When such a variable is flattened fully (this is Brian's "full flat" option), new kinds of races can create torn values. A full account of these races requires new rules within the JMM, describing what loads and stores (and other events) look like when they involve the new value types.

I think we can say that a variable (field or array element) of a non-atomic value type (but no other type) decomposes into independent *sub-variables*, one for each field. This has to happen recursively, for fields that are themselves Q-types. Each sub-variable is an independent variable of the JMM. Maybe that is enough to build up various useful and interesting events and relations (read, write, happens-before, etc.), or maybe not. This is regardless of whether the JVM actually flattens; again it is all "as if", but this time with more races.

So, suppose a read (`getfield` or `aaload`) grabs the `.*F*` field values from some variable V, into which was previously stored both X (including by reference all of `X.*F*`) and racing with X also Y (including by reference `Y.*F*`). Let's call these variables `V.*F*`. I think we should then break the single read and both writes into one read and two writes *per field* (of C). I think the JMM can ignore the references X and Y and just track the individual read and write events. Having null out of the picture helps us forget the pointers as well. From the POV of the write, a thread decided on a whole bundle of field values and stored them, one by one, into the separate `.*F*` variables.
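
A small model may make the per-field decomposition easier to picture. The sketch below is ordinary Java, not Valhalla syntax, and the names `SubVariableModel`, `Cursor`, and `possibleReads` are hypothetical, introduced only for illustration. It assumes each field is its own sub-variable and that a read racing with two writes X and Y may observe either writer's value for each field independently (the variable's earlier contents are ignored to keep the sketch short).

```java
import java.util.ArrayList;
import java.util.List;

public class SubVariableModel {
    // Hypothetical stand-in for a two-field value class; the array is
    // represented by a label so the output is easy to read.
    record Cursor(String array, int index) {}

    // Treats each field as an independent sub-variable: a racing read may
    // pair the array from one write with the index from the other.
    static List<Cursor> possibleReads(Cursor x, Cursor y) {
        List<Cursor> outcomes = new ArrayList<>();
        for (String array : List.of(x.array(), y.array())) {
            for (int index : List.of(x.index(), y.index())) {
                outcomes.add(new Cursor(array, index));
            }
        }
        return outcomes;
    }

    public static void main(String[] args) {
        Cursor x = new Cursor("A", 0);
        Cursor y = new Cursor("B", 3);
        for (Cursor seen : possibleReads(x, y)) {
            // A result is "torn" if it matches neither complete write.
            boolean torn = !seen.equals(x) && !seen.equals(y);
            System.out.println(seen + (torn ? "  (torn)" : ""));
        }
    }
}
```

Under the "as if buffered" rule for atomic classes and reference types, only the two untorn combinations would be possible; the two mixed pairs are what the non-atomic decomposition adds.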
The more exotic events of the JMM can, I think, simply be distributed from V to the sub-variables `V.*F*` in a regular way, just as we distributed the plain reads and writes. This seems workable.

Let's test it by considering a hybrid scenario where the class C includes a slowly-varying field and a rapidly-varying one. Maybe an array cursor:

```
value record C(Object[] array, int index) {
    public non-atomic companion type C.val;
    public boolean hasNext() { return index < array.length; }
    public Object get() { return array[index]; }
    public C next() { check(); return new C(array, index+1); }
    private void check() { if (!hasNext()) throw new Error(); }
}
```

(Such a type is reasonably safe to use even with a public value type: The failure modes are comparable to those you get if you race an iterator for an `ArrayList`. One might even expect its null-capable reference type to flatten nicely in 64 or 128 bits. But that's beside my point here.)

Suppose I have a full-flat container of `C.val` and I have two racing writes; the first set up the variable from scratch, and the second changes the index but not the array (say, by calling `next`).

```
static final Object[] ARR = {22, 33, 55, 77};
static C.val V = new C(ARR, 0);
void T1() { V = V.next(); }
void T2() { System.out.println(V); }
// T1 and T2 execute concurrently
```

The effect I would like is for a racing read to receive either index value, but always the same array value, as long as all racing writes have contributed the same array value. I think this is true for the example code above. What do you think?
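
The desired effect can be approximated in today's Java, as a rough sketch rather than a statement about any Valhalla implementation. The class below is hypothetical: it models the flattened variable as two plain static fields (one per sub-variable) and models `V = V.next()` as a bundled write that re-stores the same array. Because `ARR` is the only array value ever written, and it is stored before either thread starts, a racing reader should never observe anything else in the array sub-variable, whatever it sees in the index.

```java
public class SameArrayRace {
    static final Object[] ARR = {22, 33, 55, 77};

    // Hypothetical model of a fully flattened variable: one plain field per
    // sub-variable, with no synchronization between them.
    static Object[] array = ARR;
    static int index = 0;

    // Models V = V.next() as a bundled write. The bounds check from the
    // original next() is omitted; only the store pattern matters here.
    static void next() {
        Object[] a = array;
        int i = index;
        array = a;       // rewrites the value it just read
        index = i + 1;
    }

    // Runs one writer against one reader; returns true if every read of the
    // array sub-variable saw ARR.
    static boolean run(int iterations) {
        boolean[] sameArray = {true};
        Thread writer = new Thread(() -> {
            for (int n = 0; n < iterations; n++) next();
        });
        Thread reader = new Thread(() -> {
            for (int n = 0; n < iterations; n++) {
                if (array != ARR) sameArray[0] = false;
            }
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return sameArray[0];
    }

    public static void main(String[] args) {
        System.out.println("array always ARR: " + run(100_000));
    }
}
```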
The reason I want this effect is I want to enable an optimization like this:

```
void T1optimized() {
    if (false)  V = V.next();  //original code version
    if (false)  V = new C(V.array, V.index+1);  //inlining
    if (false)  V = V __WithSubVariables {  //inlining
        array = V.array; index = V.index+1; }
    boolean MAYBE = false;
    if (MAYBE) {  //unbundling sub-variables
        V.*array = V.*array;  //useless store, kill it if you can
        V.*index = V.*index+1;
    } else {
        V.*index += 1;  //32-bit memory update
    }
}
```

To get to the pleasing end result, I think the JIT has to work through the intermediate phases, and have permission to stop at any of them at any time. So I want the JIT to have the option (at its own whim) to either make a 32-bit memory update of just the `V.*index` sub-variable, or else a larger update to *both* sub-variables (the `if (MAYBE)` block in the example).

This will have no effect in the simple scenario described above. But in more complex ones, where the `V.*array` sub-variable is changing as well, the JMM will allow arbitrarily strange mismatches between fields, such as a really obsolete array and the latest index (into a different array). This could happen if a racing composite write stalled just before writing `V.*array`, waiting until just about now, wrote a really old value, and then stalled permanently before writing the associated `V.*index` value. Meanwhile more normal threads are writing more or less coherent array/index pairs, but suddenly a racing read can pick up a recent index and the very old array component.

I guess this is all true whether or not the JIT makes that final optimization step of nullifying a useless write. So (to finish with this laborious example) I guess the JIT has all it needs to optimize the processing of non-atomic full-flat values, without straying from the JMM (which is very permissive in the presence of races).
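
The stalled-writer mismatch can be replayed deterministically in a single thread. The sketch below is plain Java with hypothetical names (`StaleArrayScenario`, `Snapshot`, `replay`); it does not run real racing threads. It simply performs the stores in one order that the per-field model would permit, so the resulting state is easy to inspect.

```java
public class StaleArrayScenario {
    // What a read of both sub-variables observes, reduced to two numbers.
    record Snapshot(int arrayLength, int index) {}

    // Hypothetical model of a non-atomic, fully flattened variable V:
    // two independent sub-variables.
    static String[] array;
    static int index;

    // Replays one permitted ordering of stores and returns what a
    // subsequent read would see.
    static Snapshot replay() {
        String[] oldArray = {"old0", "old1"};
        String[] newArray = {"new0", "new1", "new2", "new3"};

        // A slow writer chose its bundle long ago (old array, some index),
        // then stalled before storing anything.
        String[] stalledArray = oldArray;

        // Meanwhile, ordinary writers store coherent array/index pairs.
        array = newArray; index = 2;
        array = newArray; index = 3;

        // The slow writer wakes, stores only its array sub-variable, and
        // stalls again before it can store its index.
        array = stalledArray;

        // A read now pairs a recent index with a very old array.
        return new Snapshot(array.length, index);
    }

    public static void main(String[] args) {
        Snapshot seen = replay();
        System.out.println(seen + ", index in bounds: "
                + (seen.index() < seen.arrayLength()));
    }
}
```

In this replay the observed index no longer fits the observed array, which is the kind of application-level failure mode a class author accepts by declaring the companion non-atomic.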
One thing to observe in passing is that when a method like `C::next` runs, it has a value `this` which is on the stack, not in the heap. This means that there can be no races on the fields of `this` during the execution of any method. So as the body of any C method executes, the fields `this.array` and `this.index` cannot change. This is as one would expect from final fields. But it's true even if the original copy of `this` is being concurrently trashed by racing writes. This means the JIT cannot treat race-prone heap containers as spilled copies of `this`, to be reloaded at leisure. It has to pick up any and all fields that it might need just once per field needed in an inlined method call. It might kill dead stores, of course. If the class of `this` is atomic, it must use an atomic to pick up `this.*F*` (or the parts needed) at one time; otherwise it can pick up the needed parts of `this.*F*` as needed, but at most once per part.

Does the partial write technique sketched here work for atomic flat values? I wish it would but I think it doesn't, in general. Suppose that C above (the cursor class) were atomic, as is surely more typical. If I update the `V.*index` sub-variable by itself, I have to make sure that the 32-bit update is atomic with respect to the neighboring `V.*array` variable. If the hardware allows me to mix 32-bit and 64-bit atomics on the same word, well and good; I can do the narrow update. But it probably won't work very often, and perhaps the hardware would have trouble sorting out the conflicting update sizes.

A special case of this is setting a half-flat `C.ref` variable to null. This should allow a narrow store (say just one byte to the null channel), leaving the other bytes as garbage to be dealt with later. (The GC can come along later and zero them out, kind of like with weak reference processing, but more certain and eager.) Doing this requires care in ordering the null checks. If the JIT sees the null channel set to the "null=yes"
state (probably a zero bit or byte) then the JIT needs to cover its eyes and ignore any other bits it picked up in the same atomic read, because they might be non-zero garbage marooned by a partial write to the null channel. Since writing a zero byte to memory is naturally atomic, the hardware might tolerate null-channel writes mixed in with full 64-bit and 128-bit reads.

An optimistic narrow-word null check might work on the read side. If I read the null channel using a single-byte read, and observe the "null=yes" state, I don't need to read anything else. But, if I observe "null=no" using the narrow read, and do the full-width atomic read of 64 or 128 bits, I need to check the null channel again, in the full read, since a null might have come into memory between my two memory operations. This is uncomfortably like the "test twice" anti-pattern, but I think it actually works. Whether it is profitable is anybody's guess. I put this in as another example of VM shenanigans behind the "as if" rules.

Partial reads from flat atomic variables are probably a good idea in general. (That is, as long as they don't interfere with hardware's graceful execution of the atomic write instructions that populate the variables in the first place.) If the cursor C is atomic, and I write `V.index()` (again V is of type C), the JIT doesn't need to load the `V.*array` sub-variable, just the `V.*index` sub-variable. No atomicity failures can be observed even if V has racing writes, since you can believe any `V.*array` value you like came with your sample of `V.*index`. But methods which work on two or more fields must pick up all of the sub-variable values (for those fields) in an atomic operation. So `V.hasNext()`, which looks at both `array.length` and `index`, needs to pick up the bundle `V.*{index,array}` in a coherent manner, using an atomic 64-bit or 128-bit load.

From kevinb at google.com Thu Jul 14 16:48:00 2022
From: kevinb at google.com (Kevin Bourrillion)
Date: Thu, 14 Jul 2022 09:48:00 -0700
Subject: where are all the objects?
In-Reply-To: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com>
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com>
Message-ID:

You knew you could count on me. :-)

On Wed, Jul 13, 2022 at 1:54 PM John Rose wrote:

> On 13 Jul 2022, at 13:23, John Rose wrote:
>
> ...Another result of backing away from "primitive" is that we have to engage
> with the question of "where are the objects?" Obviously every non-null
> value of a variable whose type is an identity class refers to "an object".
> But beyond that it gets dicey and we will have to adjust our agreements, I
> think. This is probably worth a separate thread, which I will start, and
> which I expect Kevin will be very interested in.
>
> A class defines ways to "realize" the class in its types, that is, in
> variables of its companion types.
>

Aside: I'd push back on the term "companion types of a class"; the class *defined* those types so they are just its "defined types". I do like calling the value type a "companion type to the reference type" (not to the class) well enough, though. For a couple reasons -- there is the parallel relationship of equals that I expect "companions" to have, and it's injecting a new term for a new concept, which is easier for folks to swallow than "we need a new term for an old concept while we are simultaneously injecting new concepts too".

> Examples: C r = new C(); and C.val v = C.default; and new C.val[1].
>
> (For a class declared non-atomic, races can also "realize" values of the
> class.)
>
> I'm using a more neutral term "realize" instead of "instantiate". You can
> think "instantiate" or "construct" if you like. A long-winded way to say
> "realize a class"
without tripping over words I want to avoid is "make a
> non-null runtime configuration of a variable of a type of that class,
> using, directly or indirectly, an expression allowed by the declaration of
> that class".
>

Fair enough; I'm viewing this roughly as "originating (summoning, conjuring) an instance of the type", as distinct from "being handed it from somewhere". I think we are more precisely talking about "realizing the *type*", and "realizing the class" is a shorthand for "realizing one of the types defined by the class". It seems too ambiguous to focus on "realizing the class".

> These propositions seem to be all true ("at least" in part):
>
> - The result of realizing at least some classes in some types is, in
> fact, an "object".
> - The result of realizing at least some classes in some types is, in
> fact, an "instance" of that class.
> - The result of realizing at least some classes in a value type is a
> "value" of that class.
>

My model sees this instead as: The instances of a value type are "values". The instances of a reference type are "references to objects". The result of realizing a type will always be an instance of that type by definition. The "instances of a class" is a slightly more problematic phrase I'd rather avoid.

(Of course, I say this to suggest that I think the differences are important.)

> - Every variable "has a value".
> - Every reference, other than null, "refers to an object".
>

(imho we would do well to shift to "null *isn't* a reference, but the lack of a reference where a reference could be". It lacks the most basic and important capability all references have: the ability to be dereferenced. But I understand that may be a big change, and not necessarily supported by theory.)

> - Every non-reference variable "contains a value" (as well as having
> it).
>

Fair enough. I won't go into the subtle reasons why I use "remembers a value" over "contains a value".

> So, do we think they are true?
> And of which classes, and for which of their companion types, are they true?
>
> I think some of us would like to reserve the term "object" for something that has a header and a storage block in the heap. Although talking about memory blocks and headers and machine pointers is probably illegitimate for a JVMS, it is semi-transparent enough to "see" that stuff like that is going on. (Except when it isn't: Both Valhalla flattened references and classic escape analysis break those intuitions.) Users will want *something* to visualize about objects, and maybe that just doesn't jibe with how we want them to think about values.

Heh, yes, I've been pretty determined about this. I see a beautifully clean distinction between objects and values, that I would feel grief at losing:

An object                      A value
-----------------------------  -----------------------------------------
has independent existence      is ephemeral (no existence of its *own*)
is self-describing             is context-described
is accessed via reference      is used directly
is eligible to have identity   can't have identity
is polymorphic                 is strictly monomorphic
has the type `Object`          does not have the type `Object`

If we let values of value types (i.e., all values except references and null) become "objects" too, then we will need a term for what used to mean "object" above. For me it is hard to see that ending up in a smooth and shiny place.

> OTOH, it is really freeing to be able to say that "every class makes objects", and build from there into a world of "identity objects" and "value objects".

But, in this lexicon, "value objects" becomes ambiguous between (what we'd been calling) bucket 2 and bucket 3. In the model I've been advocating we have: "the instances of identity classes are identity objects; the instances of value classes are either value objects or just (composite) values."
This brought me rare feelings of terminological victory, the kind of clarity you hope to be able to achieve but almost never can. Granted, it's still got to be carefully taught to everyone, against some amount of resistance, but that's true no matter what lexicon we adopt. Some old associations will *have* to be broken.

> (I can also appeal to various C standards, in which everything is either an object or a function. Yes, a C int is a C object. Take that, Mr. Objects-Are-Everywhere Java. That might work for us too with minor adjustments.)
>
> Even if we give up on making everything an object, I will still request that we cling to *some* word that can uniformly be applied to the result of realizing *any Java class*. If that word is not "object" I think it is "instance".

Yes, I think it is absolutely and usefully "instance". And all instances except arrays (ugh, let's dodge whether null is an instance) are more specifically *class instances*, and that's what gives us instance members. I believe *that* is the unification we've really been driving for when we've said words like "we want to make everything an object".

> Also I think it is still useful to at least pretend (virtually) that a reference is always to an object. So, something like this:
>
> - The result of realizing any class in its reference companion is an object of that class.
> - The result of realizing any class into any of its companion types is an instance of that class.
> - The result of realizing any value class into its value companion is a "value" of that class.
> - Maybe also: The result of realizing any value class into its reference companion is a "value" of that class (as well as an "object" of that class).

I would say no to this: a "value object" is not a "value". That takes explaining/apologetics, of course, so I don't love the term "value object" for that reason, but have no *other* complaint about it and can live with it. "identityless object" is just too unwieldy.

> - Every variable "has a value". (Same as above.)
> - Every reference, other than null, "refers to an object". (Same as above.)
> - Every non-reference variable "contains a primitive" or "contains an instance".

But primitive values are instances too. Of primitive types. I think that has always been true (though most of us aren't in the habit of saying it, because they were never *class instances* which is a very useful kind). Soon it will be "even more true" as they also even become *class* instances.

> - Any value is (therefore) either a primitive, an instance, a null reference, or a reference to an object.

I think "instance" is awkwardly standing in for something more specific, here. I think I'd say "a value is either an instance of a value type (such as a primitive type), a reference to an object, or the null reference".

> - An object (therefore) is an instance or an array. (Cribbing from Kevin here.)

I would say that arrays are also instances -- of array types. What they aren't is *class* instances. (So they don't get to have members; `length` and `clone` are at best half-heartedly-simulated members.)

A tough spot about my model (which I think is unavoidable/acceptable) is that I can't get away with saying "An object is any class instance or array" anymore. Because Long is a class, it defines two types, and both those types have instances, and all those instances deserve equally to be considered "instances of the class". So the term "class instance" becomes inclusive of `(int) 42` which is so not an object.

> - By an abuse of language, it is common to ignore the reference part of a reference variable, and say that it "has" or "contains" an instance if not null. (This makes opportunities for consistent reasoning about C.ref and C.val variables.)

Yeah, since the language and runtime abstract references away from us, traversing them for us when needed, we naturally also abstract them away from our speech and thoughts much of the time. I'm okay with "has an instance" but "contains an instance" is the kind of phrase I'd gently push back on, because it's tantamount to "the instance is contained by".

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kevinb at google.com Thu Jul 14 19:14:36 2022
From: kevinb at google.com (Kevin Bourrillion)
Date: Thu, 14 Jul 2022 12:14:36 -0700
Subject: There are five buckets now
In-Reply-To:
References:
Message-ID:

On Wed, Jul 13, 2022 at 1:24 PM John Rose wrote:

> - ...
> - Object::getClass returns a predictable ref-mirror (I always wanted that)

(aside: Yes, although in the case of `Foo.val value = ...; value.getClass()`, I'll still claim it shouldn't compile or should at least throw, so the user will write whichever of `Foo.class` or `Foo.val.class` they actually want.)

> The surprising outcome of this is we now have (by my count) five buckets.

Sounds alarming, but mostly we just have to ensure that the individual levers ("value", "non-atomic", visibility) are fully understandable individually and orthogonally, with their combinations acting as expected from that understanding. So long as we achieve that, the tallied up number (5 or 7 or...) doesn't matter as much.

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From forax at univ-mlv.fr Fri Jul 15 13:48:10 2022
From: forax at univ-mlv.fr (Remi Forax)
Date: Fri, 15 Jul 2022 15:48:10 +0200 (CEST)
Subject: There are five buckets now
In-Reply-To:
References:
Message-ID: <804956609.10925768.1657892890413.JavaMail.zimbra@u-pem.fr>

> From: "John Rose"
> To: "Brian Goetz"
> Cc: "daniel smith", "valhalla-spec-experts"
> Sent: Wednesday, July 13, 2022 10:23:50 PM
> Subject: There are five buckets now

> FTR, I'm really happy with the user model as of this point.

> The things I like best about it are:

[...]

> * Object::getClass returns a predictable ref-mirror (I always wanted that)

so we have to tweak the return type of getClass() so that it returns something like Class and also C.val.class has to be typed Class.

> -- John

Rémi

> On 13 Jun 2022, at 16:04, Brian Goetz wrote:

>> I've done a little more shaking of this tree. It involves keeping the notion that the non-identity buckets differ only in the treatment of their val projection, but makes a further normalization that enables the buckets to mostly collapse away.

>> "value class X" means:

>> - Instances are identity-free
>> - There are two types, X.ref (reference, nullable) and X.val (direct, non-nullable)
>> - Reference types are atomic, as always
>> - X is an alias for X.ref
>> ...

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From brian.goetz at oracle.com Fri Jul 15 14:49:05 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 15 Jul 2022 14:49:05 +0000
Subject: There are five buckets now
In-Reply-To: <804956609.10925768.1657892890413.JavaMail.zimbra@u-pem.fr>
References: <804956609.10925768.1657892890413.JavaMail.zimbra@u-pem.fr>
Message-ID: <58B85BED-E79B-4CBD-BB33-B0A052A7EA41@oracle.com>

You're getting ahead of the story ;) Dan has worked out much of the generics work, but we are trying to stay focused on the class and object model first. But the issues you raise are handled.
Sent from my MacBook Wheel

On Jul 15, 2022, at 9:48 AM, Remi Forax wrote:

>> From: "John Rose"
>> To: "Brian Goetz"
>> Cc: "daniel smith", "valhalla-spec-experts"
>> Sent: Wednesday, July 13, 2022 10:23:50 PM
>> Subject: There are five buckets now
>>
>> FTR, I'm really happy with the user model as of this point.
>>
>> The things I like best about it are:
>
> [...]
>
>> * Object::getClass returns a predictable ref-mirror (I always wanted that)
>
> so we have to tweak the return type of getClass() so that it returns something like Class and also C.val.class has to be typed Class.
>
>> -- John
>
> Rémi
>
>> On 13 Jun 2022, at 16:04, Brian Goetz wrote:
>>
>>> I've done a little more shaking of this tree. It involves keeping the notion that the non-identity buckets differ only in the treatment of their val projection, but makes a further normalization that enables the buckets to mostly collapse away.
>>>
>>> "value class X" means:
>>>
>>> - Instances are identity-free
>>> - There are two types, X.ref (reference, nullable) and X.val (direct, non-nullable)
>>> - Reference types are atomic, as always
>>> - X is an alias for X.ref
>>> ...

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From cay.horstmann at gmail.com Sat Jul 16 14:09:07 2022
From: cay.horstmann at gmail.com (Cay Horstmann)
Date: Sat, 16 Jul 2022 16:09:07 +0200
Subject: Five Buckets/JCrete
In-Reply-To:
References:
Message-ID:

As a possibly meaningless datapoint, I just presented the design from https://mail.openjdk.org/pipermail/valhalla-spec-observers/2022-June/002000.html at JCrete. Audience reaction:

1. "References have identity, values don't" is clean and easy to understand

2. T.ref, T.val were non-controversial.

3. Everyone expected/preferred values to be tearable and zero-initialized by default

4.
Everyone absolutely hated the concept of non-public .val, and much groaning occurred when I showed

    public value record Complex(double real, double imag) {
        public non-atomic value companion Complex.val;
    }

Of course people took potshots at package-private or protected .val types. People wondered why the five buckets couldn't just be identity + value[atomic={yes,no}, zero-default={yes,no}], with some reasonable choice of modifiers. I wasn't sure, and we all missed Rémi.

5. Some audience members forcefully demanded guarantees about flattening. "If I go through the trouble and make this type a non-atomic zero-default value and have an array with a million of them, I want to know for sure there are no object headers."

Cheers,

Cay

On 14/07/2022 21:14, Kevin Bourrillion wrote:
> On Wed, Jul 13, 2022 at 1:24 PM John Rose wrote:
>
>> * ...
>> * |Object::getClass| returns a predictable ref-mirror (I always wanted that)
>
> (aside: Yes, although in the case of `Foo.val value = ...; value.getClass()`, I'll still claim it shouldn't compile or should at least throw, so the user will write whichever of `Foo.class` or `Foo.val.class` they actually want.
>
>> The surprising outcome of this is we now have (by my count) five buckets.
>
> Sounds alarming, but mostly we just have to ensure that the individual levers ("value", "non-atomic", visibility) are fully understandable individually and orthogonally, with their combinations acting as expected from that understanding. So long as we achieve that, the tallied up number (5 or 7 or...) doesn't matter as much.
>
> --
> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

--
Cay S. Horstmann | http://horstmann.com | mailto:cay at horstmann.com

From brian.goetz at oracle.com Sun Jul 17 19:21:46 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sun, 17 Jul 2022 19:21:46 +0000
Subject: where are all the objects?
In-Reply-To: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com>
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com>
Message-ID:

Abstractly, my conception of "object" is that it is a bundle of type state, identity (optionally), and zero or more fields. I think this most closely corresponds to Kevin's notion of "compound value", though it might only have zero or one fields -- but since there is type state and potentially identity, it is still a composite.

Historically we touched objects only through references, and we can still do so for all objects, but we now have the ability for a variable to store some objects directly.

How we choose to see the formerly-primitive types is mostly a matter of choosing which fiction we prefer. At the VM level, we have I/J/F/D carriers, which are definitely some sort of primitive value. At the language level, we can tell ourselves that `int` is an identity-free direct instance of class Integer, if we like (though we still have to cut off the turtle-regress when declaring Integer.java).

> I think some of us would like to reserve the term "object" for something that has a header and a storage block in the heap.

I'm not in that camp. The header and storage block is how we reassociate type state with a (non-flattened) object *reference*. But a field of type C.val stores its typestate "statically", in the field descriptor. (Theoretically, a field of type C.ref could too, though we don't currently make that optimization.) But these are all just implementation options a JVM has.

In the most idealized version of this world, values are either "bare objects" (bag of type state and fields) or references to objects. In the slightly less idealized version, we would call out the legacy primitives as distinguished bare values.

On Jul 13, 2022, at 4:54 PM, John Rose wrote:

> On 13 Jul 2022, at 13:23, John Rose wrote:
>
> Another result of backing away from "primitive" is that we have to engage with the question of "where are the objects?"
> Obviously every non-null value of a variable whose type is an identity class refers to "an object". But beyond that it gets dicey and we will have to adjust our agreements, I think. This is probably worth a separate thread, which I will start, and which I expect Kevin will be very interested in.
>
> A class defines ways to "realize" the class in its types, that is, in variables of its companion types. Examples: C r = new C(); and C.val v = C.default; and new C.val[1]. (For class declared non-atomic, races can also "realize" values of the class.)
>
> I'm using a more neutral term "realize" instead of "instantiate". You can think "instantiate" or "construct" if you like. A long-winded way to say "realize a class" without tripping over words I want to avoid is "make a non-null runtime configuration of a variable of a type of that class, using, directly or indirectly, an expression allowed by the declaration of that class".
>
> These propositions seem to be all true ("at least" in part):
>
> * The result of realizing at least some classes in some types is, in fact, an "object".
> * The result of realizing at least some classes in some types is, in fact, an "instance" of that class.
> * The result of realizing at least some classes in a value type is a "value" of that class.
> * Every variable "has a value".
> * Every reference, other than null, "refers to an object".
> * Every non-reference variable "contains a value" (as well as having it).
>
> So, do we think they are true? And of which classes, and for which of their companion types, are they true?
>
> I think some of us would like to reserve the term "object" for something that has a header and a storage block in the heap. Although talking about memory blocks and headers and machine pointers is probably illegitimate for a JVMS, it is semi-transparent enough to "see" that stuff like that is going on. (Except when it isn't: Both Valhalla flattened references and classic escape analysis break those intuitions.)
> Users will want something to visualize about objects, and maybe that just doesn't jibe with how we want them to think about values.
>
> OTOH, it is really freeing to be able to say that "every class makes objects", and build from there into a world of "identity objects" and "value objects".
>
> (I can also appeal to various C standards, in which everything is either an object or a function. Yes, a C int is a C object. Take that, Mr. Objects-Are-Everywhere Java. That might work for us too with minor adjustments.)
>
> Even if we give up on making everything an object, I will still request that we cling to some word that can uniformly be applied to the result of realizing any Java class. If that word is not "object" I think it is "instance".
>
> Also I think it is still useful to at least pretend (virtually) that a reference is always to an object. So, something like this:
>
> * The result of realizing any class in its reference companion is an object of that class.
> * The result of realizing any class into any of its companion types is an instance of that class.
> * The result of realizing any value class into its value companion is a "value" of that class.
> * Maybe also: The result of realizing any value class into its reference companion is a "value" of that class (as well as an "object" of that class).
> * Every variable "has a value". (Same as above.)
> * Every reference, other than null, "refers to an object". (Same as above.)
> * Every non-reference variable "contains a primitive" or "contains an instance".
> * Any value is (therefore) either a primitive, an instance, a null reference, or a reference to an object.
> * An object (therefore) is an instance or an array. (Cribbing from Kevin here.)
> * By an abuse of language, it is common to ignore the reference part of a reference variable, and say that it "has" or "contains" an instance if not null. (This makes opportunities for consistent reasoning about C.ref and C.val variables.)
>
> Get those spray-paints shaking!
> Bonus points if we finish before Brian gets back from his vacation, he'll be so pleased!

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From forax at univ-mlv.fr Tue Jul 19 01:18:19 2022
From: forax at univ-mlv.fr (Remi Forax)
Date: Tue, 19 Jul 2022 03:18:19 +0200 (CEST)
Subject: Value type companions, encapsulated
In-Reply-To: <3030E223-F7F8-4268-843D-9A2DB7D32773@oracle.com>
References: <3030E223-F7F8-4268-843D-9A2DB7D32773@oracle.com>
Message-ID: <1172781059.12493451.1658193499669.JavaMail.zimbra@u-pem.fr>

I've just finished re-reading the document.

- There is an issue with Collection.toArray(T[]). This is not well known, but toArray(array) adds a null if the array is bigger than the collection size, so if array is an array of C.val, I suppose that a default value can be inserted instead.

  From the javadoc:
  "If this collection fits in the specified array with room to spare (i.e., the array has more elements than this collection), the element in the array immediately following the end of the collection is set to null. (This is useful in determining the length of this collection only if the caller knows that this collection does not contain any null elements.)"

- Do we agree that this document prohibits creating an ArrayList of C.val (or any collection of C.val) if C.val is declared private or package-private, once the generics are updated (anewarray/aconst_init of C.val will fail at runtime)?

  This seems too restrictive to me. It should be possible to create an array of T, with T a C.val, at runtime, but it should not be possible to create a C.val out of thin air.

Rémi

> From: "John Rose"
> To: "valhalla-spec-experts"
> Sent: Wednesday, July 13, 2022 9:13:33 PM
> Subject: Re: Value type companions, encapsulated

> I have updated the document online in response to various comments.
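The `toArray(T[])` behaviour that Rémi quotes from the javadoc above can be observed in today's Java with a small sketch. The class name `ToArrayNullDemo` and the helper `fillFrom` are made up for this illustration and do not come from the thread. The sketch only shows the existing null-marker behaviour for reference arrays; what a `C.val[]` would receive in that slot is exactly the open question and is not modelled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: ToArrayNullDemo and fillFrom are hypothetical names.
class ToArrayNullDemo {

    // Copies `source` into a fresh array of the given length that is
    // pre-filled with the marker "x", so the one slot that toArray
    // overwrites with null is easy to spot.
    static String[] fillFrom(List<String> source, int arrayLength) {
        String[] target = new String[arrayLength];
        Arrays.fill(target, "x");
        // Collection.toArray(T[]) reuses `target` when it is large enough
        // and sets the element right after the last copied one to null.
        return source.toArray(target);
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(List.of("a", "b"));
        // Two elements copied into a four-element array.
        System.out.println(Arrays.toString(fillFrom(list, 4)));
    }
}
```

With a two-element list and a four-element array, the third slot becomes null while the fourth keeps its marker, which is the slot a non-nullable element type would have to fill some other way.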
> [ http://cr.openjdk.java.net/~jrose/values/encapsulating-val.md | > http://cr.openjdk.java.net/~jrose/values/encapsulating-val.md ] > [ http://cr.openjdk.java.net/~jrose/values/encapsulating-val.html | > http://cr.openjdk.java.net/~jrose/values/encapsulating-val.html ] > The Valhalla JVM team is starting to look at these also. I expect they will want > to weigh in on the various JVMS details and issues. > So, thanks! We should start some separate threads on some of the issues. > ? John > P.S. For the record here are the diffs to the md file: > --- a/Users/jrose/Documents/Work/Valhalla/encapsulating-val.md.~6~ > +++ b/Users/jrose/Documents/Work/Valhalla/encapsulating-val.md > @@ -1,5 +1,5 @@ > % Value type companions, encapsulated > -% John Rose for Valhalla EG, June 2022 (ver 0.1) > +% John Rose for Valhalla EG, July 2022 (ver 0.2) > > > ## Background > @@ -34,13 +34,23 @@ restrictions]** at the end.)_ > ### Affordances of `C.ref` > Every class or interface `C` comes with a companion type, the > -reference type `C.ref` derived from `C` which describes any variable > -(argument, return value, array element, etc.) whose values are either > -null or of a concrete class derived from `C`. We are not in the habit > +reference type `C.ref` derived from `C` which describes any expression > +(variable, return value, array element, etc.) whose values are either > +null or are instances of a concrete class derived from `C`. > + > +> We are not in the habit > of distinguishing `C.ref` from `C`, but the distinction is there. For > example, if we call `Object::getClass` on a variable of type `C.ref` > we might not get `C.class`; we might even get a null pointer > -exception! > +exception! Put another way, `C` as a class means a particular class > +declaration, while `C.ref` as a type means a variable which can refer > +to instances of class `C` or any subclass. Also `C.ref` can be > +`null`, which is of no class at all. 
One can view the result of > +`Object::getClass` as a *type* rather than a mere *class*, since the > +API of `Class` includes representation of types like `int` and `C.val` > +as well as classes. In any case, the fact that a class can now have > +two associated types requires a clearer distinction between classes > +and types. > We are so very used to working with reference types (for short, > _ref-types_) that we sometimes forget all that they do for us > @@ -54,7 +64,7 @@ in addition to their linkage to specific classes: > - `C.ref` allows a single large object to be shared from many locations. > - `C.ref` with an identity class can centralize access to mutable state. > - `C.ref` values uniformly convert to and from general types like `Object`. > - - `C.ref` variable types can be reflected using `Class` mirror objects. > + - `C.ref` values are polymorphic (for non-final `C`), with varying > `Object::getClass` values. > - `C.ref` is safe for publication if the fields of `C` are `final`. > When I store a bunch of `C` objects into an object array or list, sort > @@ -100,10 +110,22 @@ But the author of the class gets to decide which states > are > legitimate, and the decisions are enforced by access control at the > boundaries of the encapsulation. > +> The author of an encapsulation determines whether the constant > +`C.default` is part of the public API or not. Therefore, the value of > +`C.default` is non-constructed only if `C.val` is privatized. > + > So if I code my class right, using access control to keep bad states > away from my clients, my class's external API will have no > non-constructed states. > +> Reflection and serialization provide additional modes of access to a > +class's API. The author of an encapsulation must be given control > +over these modes of access as well. (This is discussed further > +below.) 
If the author of `C` allows deserialization of `C` values not > +otherwise constructible via the public API, those values must be > +regarded as constructed, not non-constructed, but the API may also > +be regarded as poorly designed. > + > ### Costs of `C.ref` > In that case why have value types at all, if references are so > @@ -119,7 +141,7 @@ always wish to pay: > - A reference must be able to represent `null`; tightly-packed types like `int` > and `long` would need to add an extra bit somewhere to cover this. > The major alternative to references, as provided by Valhalla, is flat > -objects, where object fields are laid out immediately in their > +class instances, where instance fields are laid out immediately in their > containers, in place of a pointer which points to them stored > elsewhere. Neither alternative is always better than the other, which > is why Java has both `int` and `Integer` types and their arrays, and > @@ -140,13 +162,30 @@ The two companion types are closely related and perform > some of the > same jobs: > - `C.ref` and `C.val` both give a starting point for accessing `C`'s members. > - - `C.ref` and `C.val` can link `C` objects into acyclic graphs. > + - `C.ref` and `C.val` can link `C` instances into acyclic graphs. > - `C.ref` and `C.val` values uniformly convert to and from general types like > `Object`. > - - `C.ref` and `C.val` variable types can be reflected using `Class` mirror > objects. > For these jobs, it usually doesn't matter which type companion does > the work. > +> Specifically, > + > +> - An expression of the form `myc.method()` cares about the class of > + `myc` but not which companion type it is. The same point is true > + (probably) of methods like `Class::getMethods` which ignore the > + distinction between the mirrors `C.ref.class` and `C.val.class`. > + > +> - I can build a tree of `C` nodes using children lists of either > + companion type. 
(If however my `C` node contains direct child > + fields they cannot be of the `C.val` type.) > + > +> - Converting a variable `myc` to `Object` (or, respectively, casting > + an `Object` to store in `myc`), does the same kind of thing > + regardless of which companion type `myc` has. The only difference > + that `null` cannot be a result if `myc` is `C.val` (or, > + respectively, that `null` is rejected as a `C.val` value). > + > + > Despite the similarities, many properties of a value companion type > are subtly different from any reference type: > @@ -158,6 +197,10 @@ are subtly different from any reference type: > - `C.val` heap variables (fields, array elements) are initialized to all-zeroes. > - `C.val` might not be safe for publication (even though its fields are > `final`). > +The overall effect is that a `C.val` variable has a very specific > +concrete format, a flattened set of application-defined fields, often > +without added overhead from object headers and pointer chasing. > + > The JVM distinguishes `C.val` by giving it a different descriptor, a > so-called _Q-descriptor_ of the form `QC;`, and it also provides a > so-called _secondary mirror_ `C.val.class` which is similar to the > @@ -166,7 +209,7 @@ built-in primitive mirrors like `int.class`. > As the Valhalla performance model notes, flattening may be expected > but is not fully guaranteed. A `C.val` stored in an `Object` > container is likely to be boxed on the heap, for example. But `C.val` > -objects created as bytecode temporaries, arguments, and return values > +instances created as bytecode temporaries, arguments, and return values > are likely to be flattened into machine registers, and `C.val` fields > and array elements (at least below certain size thresholds) are also > likely to be flattened into heap words. > @@ -177,12 +220,35 @@ value class. 
There are additional terms and conditions > for flattening > Remember that reference types have full abstraction as one of their > powers, and this means building data structures that can refer to them > even before they are loaded. But a class file can request that the JVM > -"peek" at a class to see if it is a value class, and if this request > -is acted on early enough (at the JVM's discretion), then the JVM can > +"peek" at a class to see if it is a value class. > + > +> This request is conveyed via the [`Preload` attribute] defined in > +recent drafts of [JEP 8277163 (Value Objects)]. If this request is > +acted on early enough (at the JVM's discretion), then the JVM can > choose to lay out some or all `C.ref` values as flattened `C.val` > values _plus_ a boolean or other sentinel value which indicates the > `null` state. > +> If the JVM succeeds in flattening a `C.ref` variable, the JMM still > +requires that racing reads to such a variable will always return a > +consistent, safely published state. The atomicity or non-atomicity of > +the `C.val` companion type has no effect on the races possible to a > +`C.ref` variable. Thus, flattening a `C.ref` variable with a > +non-atomic value type is not simply a matter of adding a `null` > +channel field to a struct, if races are possible on that variable. > +Most machines today provide hardware atomicity only to 128 bits, so > +racing updates must probably be accomplished within the limits of 64- > +or 128-bit reads and writes, for a flattened `C.ref`. It seems likely > +that the heap buffering enjoyed by today's value-based classes will > +also be the technique of choice in the future, at least for larger > +value classes, when their containers are in the heap. Since JVM stack > +and locals can never race, adjoining a null state for a `C.ref` value > +can be a simple matter of allocating another calling sequence register > +or stack slot, for an argument or return value. 
> + > +[`Preload` attribute]: > > +[JEP 8277163 (Value Objects)]: > + > ### Pitfalls of `C.val` > The advantages of value companion types imply some complementary > @@ -221,13 +287,16 @@ racing writes to the same mutable `C.val` variable in the > heap. > Unlike reference types, value types can be manipulated to create these > non-constructed states even in well-designed classes. > -Now, it may be that a constructor (or factory) might be perfectly able > -to create one of the above non-constructed states as well, no strings > +Now, it may be that a public constructor (or factory) might be perfectly able > +to create a zero state or an arbitrary field combination, no strings > attached. In that case, the class author is enforcing few or no > invariants on the states of the value class. Many numeric classes, > like complex numbers, are like this: Initialization to all-zeroes is > no problem, and races between components are acceptable, compared to > -the costs of excluding races. > +the costs of excluding races. The worst a race condition can ever do > +is create a state that is legitimately constructed via the class API. > +We can say that a class which is this permissive has no > +non-constructed states at all. > > (The reader may recall that early JVMs accepted races on the high > and low halves of 64-bit integers as well; this is no longer a > @@ -259,6 +328,10 @@ Still, it turns out to be useful to give a common single > point of > declarative control to handle _all_ non-constructed states, both > the default value of `C.val` and its mysterious data races. > +So different encapsulation authors will want to make different > +choices. We will give them the means to make these choices. And > +(spoiler alert) we will make the safest choice be the default choice. > + > ## Privatization to the rescue > _(Here are the important details about the encapsulation of value > @@ -273,13 +346,13 @@ companion can be used freely, fully under control of the > class author. 
> But untrusted clients are prevented from building uninitialized fields > or arrays of type `C.val`. This prevents such clients from creating > -(either accidentally or purposefully) non-constructed values of type > +(either accidentally or purposefully) non-constructed states of type > `C.val`. How privatization is declared and enforced is discussed in > the rest of this document. > -> (To review, for those who skipped ahead, non-constructed values are > +> (To review, for those who skipped ahead, non-constructed states are > those not created under control of the class `C` by constructors or > -other accessible API points. A non-constructed value may be either an > +other accessible API points. A non-constructed state may be either an > uninitialized variable of `C.val`, or the result of a data race on a > shared mutable variable of type `C.val`. The class itself can work > internally with such values all day long, but we exclude external > @@ -291,7 +364,7 @@ As a second tactic, a value class `C` may select whether or > not the > JVM enforces atomicity of all occurrences of its value companion > `C.val`. A non-atomic value companion is subject to data races, and > if it is not privatized, external code may misuse `C.val` variables > -(in arrays or mutable fields) to create non-constructed values via > +(in arrays or mutable fields) to create non-constructed states via > data races. > A value companion which is atomic is not subject to data races. This > @@ -328,7 +401,7 @@ of both choices (privatization and declared non-atomicity), > although > it is natural to try to boil down the size of the matrix. > - `C.val` private & atomic is the default, and safest configuration > - hiding all non-constructed values outside of `C` and all data races > + hiding the most non-constructed states outside of `C` and all data races > even inside of `C`. There are some runtime costs. 
> - `C.val` public & non-atomic is the opposite, with fewer runtime > @@ -338,12 +411,12 @@ it is natural to try to boil down the size of the matrix. > non-atomic primitive like `long`. > - `C.val` public & atomic allows everybody to see the all-zero > - initial value but no other non-constructed states. This is > + initial value but no racing non-constructed states. This is > analogous to the situation of a naturally atomic primitive like > `int`. > - - `C.val` private & non-atomic allows `C` complete control over the > - visibility of non-constructed states, but `C` also has the ability > + - `C.val` private & non-atomic allows `C` complete access to and > + control over non-constructed states, but `C` also has the ability > to work internally on arrays of non-atomic elements. `C` should > take care not to leak internally-created flat arrays to untrusted > clients, lest they use data races to hammer non-constructed values > @@ -428,7 +501,7 @@ will fail. If the companion is neither `public` nor > `private`, then > Here is an example of a class which refuses to construct its default > value, and which prevents clients from seeing that state: > -``` > +```{#class-C} > class C { > int neverzero; > public C(int x) { > @@ -500,7 +573,7 @@ have a right to expect that encapsulation of companion types > will > hope to re-use their knowledge about how type name access works when > reasoning about companion types. We aim to accommodate that hope. If > it works, users won't have to think very often about the class-vs-type > -distinction. That is why the above design emulates pre-existing > +distinction. That is also why the above design emulates pre-existing > usage patterns for non-denotable types. 
> ### Privatization in translation > @@ -518,13 +591,36 @@ The `value_flags` field (16 bits) has the following > legitimate values: > - zero: `C.val` default access, non-atomic > - `ACC_PUBLIC`: `C.val` public access, non-atomic > - `ACC_PRIVATE`: `C.val` private access, non-atomic > - - `ACC_VOLATILE`: `C.val` default access, atomic > - - `ACC_VOLATILE|ACC_PUBLIC`: `C.val` public access, atomic > - - `ACC_VOLATILE|ACC_PRIVATE`: `C.val` private access, atomic > + - `ACC_FINAL`: `C.val` default access, atomic > + - `ACC_FINAL|ACC_PUBLIC`: `C.val` public access, atomic > + - `ACC_FINAL|ACC_PRIVATE`: `C.val` private access, atomic > Other values are rejected when the class file is loaded. > -(**JVM ISSUE #0:** Can we kill the `ACC_VALUE` modifier bit? Do we > +The choice of `ACC_FINAL` for this job is arbitrary. It basically > +means "please ensure safe publication of `final` fields of this class, > +even for fields inside flattened instances." The race conditions of a > +non-atomic variable of type `C.val` are about the same as (are > +isomorphic to) the race conditions for the states reachable from a > +non-varying non-null variable of type `MC.ref`, where `MC` is a > +hypothetical identity class containing the same instance fields as > +`C`, but whose fields are not declared `final`. (Remember that `C`, > +being a value class, must have declared its fields `final`.) Omitting > +`ACC_FINAL` above means about the same as using the non-final fields > +of `MC` to store `C.val` states. Omitting `ACC_FINAL` is less safe > +for programmers, but much easier to implement in the JVM, since it can > +just peek and poke the fields retail, instead of updating the whole > +instance value in a wholesale transaction. > + > +> That is, if you see what I mean? `ACC_VOLATILE` would be another > +clever pun along the same lines, since a `volatile` variable of type > +`long` is one which suppresses tearing race conditions. But > +`volatile` means additional things as well. 
Other puns could be > +attempted with `ACC_STATIC`, `ACC_STRICT`, `ACC_NATIVE`, and more. > +John likes `ACC_FINAL` because of the JMM connection to `final` > +fields. > + > +> (**JVM ISSUE #0:** Can we kill the `ACC_VALUE` modifier bit? Do we > really care that `jlr.Modifiers` kind-of wants to own the reflection > of the contextual modifier `value`? Who are the customers of this > modifier bit, as a bit? John doesn't care about it personally, and > @@ -536,11 +632,11 @@ kinds of structural checks on the fly during class loading > even before > class attributes are processed. Yet this also seems like a poor > reason to use a modifier bit.) > -(**JVM ISSUE #1:** What if the attribute is missing; do we reject the > -class file or do we infer `value_flags=ACC_PRIVATE|ACC_VOLATILE`? > +> (**JVM ISSUE #1:** What if the attribute is missing; do we reject the > +class file or do we infer `value_flags=ACC_PRIVATE|ACC_FINAL`? > Let's just reject the file.) > -(**JVM ISSUE #2:** Is this `ValueClass` attribute really a good place > +> (**JVM ISSUE #2:** Is this `ValueClass` attribute really a good place > to store the "atomic" bit as well? This attribute is a green-field > for VM design, as opposed to the brown-field of modifier bits. The > above language assumes the atomic bit belongs in there as well.) > @@ -673,7 +769,7 @@ As required above, the `checkcast` bytecode treats full > resolution and > restricted resolution states the same. > But when the `anewarray` or `multianewarray` instruction is executed, > -it consults throws an access error if its `CONSTANT_Class` is not > +it must throw an access error if its `CONSTANT_Class` is not > fully resolved (either it is an error or is restricted). This is how > the JVM prevents creation of arrays whose component type is an > inaccessible value companion type, even if the class file does > @@ -846,7 +942,7 @@ There are a number of standard API points for creating Java > array > objects. 
When they create arrays containing uninitialized elements, > then a non-constructed default value can appear. Even when they > create properly initialized arrays, if the type is declared > -non-atomic, then non-constructed values can be created by races. > +non-atomic, then non-constructed states can be created by races. > - `java.lang.reflect.Array::newInstance` takes an element mirror and length and > builds an array. The elements of the returned array are initialized to the > default value of the selected element type. > - `java.util.Arrays::copyOf` and `copyOfRange` can extend the length of an > existing array to include new uninitialized elements. > @@ -870,7 +966,7 @@ the creation of arrays of type `C.val[]` if `C.val` is not > public. > - The special overloading of `java.util.Arrays::copyOf` will refuse > to create an array of any non-atomic privatized type. (This > - refusal protects against non-constructed values arising from data > + refusal protects against non-constructed states arising from data > races.) It also incorporates the restrictions of its sibling > methods, against creating uninitialized elements (even of an > atomic type). > @@ -900,8 +996,8 @@ the creation of arrays of type `C.val[]` if `C.val` is not > public. > **API ISSUE #1:** Should we relax construction rules for zero-length > arrays? This would add complexity but might be a friendly move for > -some use cases. A zero-length array cannot expose non-constructed > -values. It may, however, serve as a misleading "witness" that some > +some use cases. A zero-length array can never expose non-constructed > +states. It may, however, serve as a misleading "witness" that some > code has gained permission to work with flat arrays. It's safer to > disallow even zero-length arrays. > @@ -969,6 +1065,9 @@ refuse to expose default values of privatized value > companions. > legitimate need to convert nulls to privatized values can use > conditional combinators to do this "the hard way". 
> + - `MethodHandle::asType` will refuse to convert from a `void` return > + to a privatized `C.val` type, similarly to `explicitCastArguments`. > + > - The method `Lookup::accessCompanion` will be defined analogously > to `Lookup::accessClass`. If `Lookup::accessClass` is applied to a > companion, it will check both the class and the companion, whereas > @@ -1003,7 +1102,7 @@ All such methods can be built on top of > `MethodHandles.Lookup`. > In general, a library API may be designed to preserve some aspect of > companion safety, as it allows untrusted code to work with arrays of > -privatized value type, while preventing non-constructed values of that > +privatized value type, while preventing non-constructed states of that > type from being materialized. Each such safe and friendly API has to > make a choice about how to prevent clients from creating > non-constructed states, or perhaps how to allow clients to gain > @@ -1011,19 +1110,21 @@ privilege to do so. Some points are worth remembering: > - An unprivileged client must not obtain `C.default` if `C.val` is privatized. > - An unprivileged client must not obtain a non-empty `C.val[]` array if `C.val` > is privatized and non-atomic. > - - It's safe to build new (non-empty, mutable) arrays from (non-empty, mutable) > old arrays, if the default is not injected. > + - It's safe to build new (non-empty, mutable) arrays from (non-empty, mutable) > old arrays, as long as new elements containing the `C.default` do not appear. > - If a new array is somehow frozen or wrapped so as be effectively immutable, it > is safe as long as it does not expose `C.default` values. > - If a value companion is `public`, there is no need for any restriction. > - Also, unrestricted use can be gated by a `Lookup` object or caller > sensitivity. 
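
The array rule in the list above can be illustrated with today's Java, where `java.util.Arrays.copyOf` pads a lengthened copy with default elements (`null` for references, zero for primitives). This is a minimal sketch, assuming a hypothetical helper `copyWithoutDefaults` (not a proposed API) that stands in for the restricted overloading by refusing any request that would materialize new default elements:

```java
import java.util.Arrays;

class SafeArrayCopy {
    /**
     * Hypothetical helper: copies a prefix of the array but refuses to
     * lengthen it, so no new default-valued elements can appear.
     */
    static <T> T[] copyWithoutDefaults(T[] original, int newLength) {
        if (newLength > original.length) {
            throw new IllegalArgumentException(
                "would create " + (newLength - original.length)
                + " uninitialized element(s)");
        }
        // Same length or shorter: every element comes from the old array.
        return Arrays.copyOf(original, newLength);
    }
}
```

With `String[] src = {"a", "b"}`, plain `Arrays.copyOf(src, 3)` yields an array whose last element is `null`, while `copyWithoutDefaults(src, 3)` throws and `copyWithoutDefaults(src, 2)` succeeds.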
> > In the presence of a reconstruction capability, either in the > language or in a library API or as provided by a single class, > -avoiding non-constructable objects includes allowing legitimate > +avoiding non-constructed instances includes allowing legitimate > reconstruction requests; each legitimate reconstruction request must > somehow preserve the intentions of the class's designer. > Reconstruction should act as if field values had been legitimately > (from `C`'s API) extracted, transformed, and then again legitimately > -(to `C`'s API) rebuilt into an instance of `C`. Serialization is an > +(to `C`'s API) rebuilt into an instance of `C`. > + > +> Serialization is an > example of reconstruction, since field values can be edited in the > wire format. Proposed `with` expressions for records are another > example of reconstruction. The `withfield` bytecode is the primitive > @@ -1031,7 +1132,25 @@ reconstruction operator, and must be restricted to > nestmates of `C` > since it can perform all physically possible field updates. > Reconstruction operations defined outside of `C` must be designed with > great care if they use elevated privileges beyond what `C` provides > -directly. > +directly. Given the historically tricky nature of deserialization, > +more work is needed to consider what serialization of a C.val actually > +means and how it interacts with default reconstitution behaviours. > +One likely possibility is that wire formats should only work with > +`C.ref` types with proper construction paths (enforced by serialization), > +and leave conversion to `C.val` types to deserialization code inside > +the encapsulation of `C`. > + > +> JNI, like serialization, allows creation of arrays which is hard to > +constrain with access checks. We have a choice of at least two > +positions on this. 
We could allow JNI full permission to create any
> +kind of arrays, thus effectively allowing it "inside the nest" of any
> +value class, as far as array construction goes. Or, we could say that
> +JNI (like `Arrays::copyOf`) is absolutely forbidden to create
> +uninitialized arrays of privatized value type. The latter is probably
> +acceptable. As with other API points, programmers with a legitimate
> +need to create flat privatized arrays can work around the limitations
> +of the "nice" API points by using more complex ones that incorporate
> +the necessary access checks.
> ## Summary of user model
> @@ -1063,12 +1182,15 @@ to find a workaround, such as:
> - ask `C` politely to build such an array for you
> - crack into `C` with a reflective API and build your own
> -If you look closely at the code for `C`, you might noticed that it
> +If you looked closely at [the code for `C` above],
> +you might have noticed that it
> uses its private type `C.val` in its public API. This is allowed.
> Just be aware that null values will not flow through such API points.
> When you get a `C.val` value into your own code, you can work on it
> perfectly freely with the type `C` (which is `C.ref`).
> +[the code for `C` above]: <#class-C>
> +
> If a value companion `C.val` is declared `public`, the class has
> declared that it is willing to encounter its own default value
> `C.default` coming from untrusted code. If it is declared `private`,

From robbepincket at live.be Tue Jul 19 15:28:27 2022
From: robbepincket at live.be (Robbe Pincket)
Date: Tue, 19 Jul 2022 15:28:27 +0000
Subject: where are all the objects?
Message-ID:

On Wed Jul 13 20:54:43 UTC 2022, John Rose wrote:
> I'm using a more neutral term "realize" instead of "instantiate".

I'm not really a fan of the term, I see why "instantiate" doesn't fit, but I hope stuff will be able to get reworded in such a way that "realize" isn't needed.
> These propositions seem to be all true ("at least" in part):
> - The result of realizing at least some classes in some types is, in fact, an "object".
> - The result of realizing at least some classes in some types is, in fact, an "instance" of that class.
> - The result of realizing at least some classes in a value type is a "value" of that class.
> - Every variable "has a value".
> - Every reference, other than `null`, "refers to an object".
> - Every non-reference variable "contains a value" (as well as having it).

I'm very confused about what you mean by your last point: "Every non-reference variable "contains a value" (as well as having it)". A variable is a typed storage location; a variable containing a value and a variable having a value are, in my mind, synonyms. But if I read your point, it sounds like you disagree, and a non-reference (primitive?) variable has and contains 2 (different?) values?

---

On Thu Jul 14 16:48:00 UTC 2022, Kevin Bourrillion wrote:
> The instances of a value type are "values".
> The instances of a reference type are "references to objects".
> [...]
> But primitive values are instances too. Of primitive types. I think that has always been true (though most of us aren't in the habit of saying it, because they were never *class instances* which is a very useful kind).

Oh god, no no no no no nooo. The instances of a reference type are the objects, not the references to those objects. The references are the values that a variable of a compatible type can hold. And no one calls primitive values instances, because they aren't. All mentions of "instances" in the Java spec refer to "class instances" or "instances of a/the class" (at least as far as I can see).
> An object                     | A value
> ----------------------------- | -----------------------------------------
> has independent existence     | is ephemeral (no existence of its *own*)
> is self-describing            | is context-described
> is accessed via reference     | is used directly
> is eligible to have identity  | can't have identity
> is polymorphic                | is strictly monomorphic
> has the type `Object`         | does not have the type `Object`

I agree with some of these but I have a few issues:

* I feel like independent existence and identity are the same thing, how would you be able to differentiate 2 equal objects that don't have identity?
* I think I understand what you mean with "self-describing", if I have an object, I expect to be able to call `.getClass`. As a result an object holds all the info to fully interpret the object. But I'm missing something with "context-described". It feels like you are saying a value needs a context to "exist", but values are ephemeral. They just exist.

In the end, I don't think "object" and "value" are mutually exclusive. Anything that extends Object is an "object", anything that is ephemeral is a "value" and anything that is both is a "value object" (identityless object)

> I would say that arrays are also instances -- of array types. What they aren't is *class* instances. (So they don't get to have members; `length` and `clone` are at best half-heartedly-simulated members.)

Arrays do have members, all the methods of `Object` are inherited by arrays, `clone` being one of them.

---

John:
> Even if we give up on making everything an object, I will still request that we cling to *some* word that can uniformly be applied to the result of realizing *any Java class*. If that word is not "object" I think it is "instance".

Kevin:
> Yes, I think it is absolutely and usefully "instance".
> A tough spot about my model (which I think is unavoidable/acceptable) is that I can't get away with saying "An object is any class instance or array" anymore.
Yeah, this is a bit of a nuisance. It would be nice to have a term that covers both "values" and "instances", because in my mind, an instance is something that gets instantiated. Ephemeral values don't get instantiated, because they exist. Which means that, in my view, objects of value classes aren't instances (this surprised me, but I can't convince myself otherwise anymore).

Regards
Robbe Pincket

From kevinb at google.com Tue Jul 19 17:54:20 2022
From: kevinb at google.com (Kevin Bourrillion)
Date: Tue, 19 Jul 2022 10:54:20 -0700
Subject: one class, two types, many bikesheds
In-Reply-To: <1ACF189B-D346-4996-AF5B-3977FF02C1F3@oracle.com>
References: <1ACF189B-D346-4996-AF5B-3977FF02C1F3@oracle.com>
Message-ID:

Sorry for the delay

On Wed, Jul 13, 2022 at 12:43 PM John Rose wrote:

> The latest iteration of the user model for value classes makes it crystal
> clear that one value class C defines two types.
>
> The second type is named C.val (at present, until that bikeshed is
> repainted). This is the "value companion" to C, or maybe its "companion
> value type".
>
> The term "companion" tries to capture the idea that the class doesn't come
> alone but travels with some friends, its types. But this pulls in a long
> and difficult discussion about the exact relation between a class and a
> type. And then (inevitably) "what's a class *really*?" and "what's a type
> *really*?"
>
> I think we want to make a distinction between a class and a type. A class
> is primarily a bunch of source code, later compiled into a classfile. A
> type is the primary static attribute of a variable in source code,
> determining its range of values and set of valid operations. As later
> compiled, the type determines a JVM-level type (usually what we call a
> field descriptor). The type probably also determines something of the
> eventual format of the variable in a real machine, although that's a secret
> the JVM keeps.
> (I know the 3 people regularly participating in these threads are already well-aware, sorry...) My own answer to the question has been Understanding classes and types in Java (comments welcome). I see a class as a way to "configure" and feed behavior into a type (apart from classes *also* serving as bundles of static members). It's a fairly subservient relationship, which feels right to me. In some of our discussions we have called the other companion type of C, > which is the (nullable) reference type, by the name C.ref, as if it were > something you could write in source code. Perhaps I should be saying > C.__REF to avoid giving that impression. But perhaps not. > > (A generic class can engender many, many types. Is this a whole mob of > companion types? Perhaps not, but it does call for a clear term for this > other relation of classes to types.) > (Sorry for digression: you could also say one class engenders many array types, though. I think it helps to fully distinguish predefined, user-defined, and composed types. Setting aside value classes temporarily: each class directly defines just one type, which is the type of `this` inside the class itself (the "implicit type", or the "this-type"). That's the all-important type whose member signatures are seen in the class and whose supertypes are seen in the class signature. Other types can be composed out of the defined types: array types, type variables, intersection types I guess, and relevant to us here, all *other* parameterized types beyond the implicit type. That is, imho it's most fruitful to understand those parameterized types as deriving from the implicit type/"this-type", with member signatures and supertypes being calculated from that implicit type via substitution, rather than to see them all as popping directly off of the generic class.) > (Note also that the ?raw type? of a generic class is named by just the > class name, sans type arguments: Plain List instead of List. 
And not List.raw, at least not today.)
>
> To me it seems useful to treat the two types with a certain amount of
> symmetry. There is one class, and two companion types, not a class (which
> is also a type), plus its companion (value) type.
>
> If we do this, it makes some further sense to give them symmetrical names,
> C.ref and C.val. We then say that the class name C, used in a context
> that requires a type, is "just sugar" for the more exact C.ref (and
> certainly not C.val, or you would have used that name).
> Are there other uses for C.ref? I can think of just two:
>
> - For type variables (*which are not classes*) T.ref (or some other
>   bikeshed color) means "recover the reference companion, even if the generic
>   argument was a value type".
> - For extreme stylized clarity in source code, where someone wants to
>   emphasize that a variable is nullable. (Could this interact with
>   null-inference schemes? Oh, certainly!)
>

I see some sense in your argument, but I still can't think of a reason I'd want to see `ClassName.ref` in source code. It seems like that can't add any information.

> - The use T.ref lends weight to making the companions symmetric. You
>   can go from C meaning C.ref to C.val in List and then inside
>   List you can go back to C.ref. It's a two-way street.
>
> There is a limit to treating the two companions symmetrically. Do we
> really want to allow inner declarations of the form public companion type
> C.ref;, on the grounds that we do so for the companion C.val? No, because
> reference types are "hardwired" by present JVM specifications, and
> presumably future ones.
>
> Maybe we will turn, in the end, to a maximally asymmetric design, with no
> symmetrical treatment anywhere; no C.ref in particular. But the cost of
> that is never being able to refer unambiguously to C as class or as
> ref-type, except using informal notations or narrative prose.
> > At the moment, though, I like these rules, personally: > > - For a value class name C, C.val names a type. > - For any class or interface name C, C.ref names a type, meaning the > same thing as C. > - For any type variable T (in new generics), T.ref names a type. > - Maybe: For any type variable T (in specialized generics?), T.val > also names a type. > > If I ever see `T.val` (except maybe the case of `T.val[]`??) I will assume some kind of templating must be going on, since we'll all have learned early on that there is no polymorphic interaction with values. Is that your expectation too? > - The ref and val suffixes cannot be applied elsewhere. (So no > C.ref.val.) > > Comments? > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From forax at univ-mlv.fr Wed Jul 20 16:44:00 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 20 Jul 2022 18:44:00 +0200 (CEST) Subject: The storage hint model Message-ID: <303592244.13560433.1658335440278.JavaMail.zimbra@u-pem.fr> Yes, i know, we have already discuss several models like that. But i think, it's a good idea to re-examine those because i believe they are more attractive today. My aim here is to try to simplify the .ref/.val model, not to fundamentally change it, .ref is still the default, there is no way to ask for a .val at definition site, etc so most of the design should be very familiar. The main issue with the .val model is that it presents two *types* to the user while we really want is mostly to flatten the storage and have a precise the method calling convention. Those two goals are not equals, the first is far more important than the second, to the point where the coding guideline proposed by Brian is to use .ref for the parameters and .val for the fields and arrays. We still need .val and .ref to be able to specialize generics, right ? 
No, I don't think so. We technically do not have to pass a .val as type argument to be able to specialize a generic class; we just need to pass a type argument that can be flattened if possible.

Let's run with this idea. Let's say that if we have a value class C, we do not need to pass C.val as argument to specialize a List: List<C> is enough, rather than List<C.val>.
Then for fields, arrays and parameters, we need to introduce a storage hint saying that the value class must be a Q-type. In the rest of the document, I will use .flat for that.

So instead of writing

  value class C {
    // ...
  }

  class Container<T> {
    private T value;

    public Container(T value) {
      this.value = value;
    }
  }
  ...
  Container<C.val> container = new Container<>(new C());

we can write instead

  value class C {
    // ...
  }

  class Container<T> {
    private T.flat value;

    public Container(T value) {
      this.value = value;   // may NPE
    }
  }
  ...
  Container<C> container = new Container<>(new C()); // there is no C.val anymore !

The idea is to align the way we declare a generic class with the way we declare a classical class, i.e. instead of specializing the T using a C.val as type argument, we can directly use C as type argument and ask for the flattened version of T using T.flat.

This is very similar to the way the equivalent non-generic class is currently declared in the .ref/.val model, if you replace .flat by .val.

  class Container {
    private C.flat value;

    public Container(C value) {
      this.value = value;   // may NPE
    }
  }

So it appears that if we do not allow users to specify whether a local variable is a .ref or a .val, but decide that it is always a .ref, and if we use container hints inside generic classes the same way we use them in non-generic classes, then we do not need .ref and .val to be types, but only storage hints.

And not having .ref and .val as types greatly simplifies the model, because there is no interaction between the type checking and the storage hints; those are two separate concerns.
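
The `// may NPE` comments above can be approximated in today's Java, which has no `.flat` hint. This is a minimal sketch, assuming a hypothetical `FlatContainer` whose constructor performs explicitly the null check that a flattened (Q-type) field would impose on a write. Nothing is actually flattened here, and the class and method names are illustrative only:

```java
import java.util.Objects;

class FlatContainer<T> {
    // Stands in for `private T.flat value;`: the storage may never hold null,
    // even though the type T of the constructor parameter admits null.
    private final T value;

    FlatContainer(T value) {
        // Emulates the implicit check at the write into flat storage.
        this.value = Objects.requireNonNull(value, "flat storage cannot hold null");
    }

    T value() {
        return value;   // a read can therefore never observe null
    }
}
```

The sketch shows where the check sits in this model: at the boundary between a nullable variable and the hinted storage, not in the type of the parameter.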
So following the current design, there are 3 different kinds of value class, using encapsulation to declare whether a default is available or not:

  value class C {
    private default {}
  }

  value class C {
    /* package-private */ default {}
  }

  value class C {
    public default {}
  }

I've used "default" instead of "companion type" here because those are not types anymore.

If flattened, the assignment can be non-atomic or atomic, with atomic being the default:

  non-atomic value class NC {
    [modifier] default {}
  }

Because there is only one type C, all the syntax C.class, c.getClass(), instanceof C, etc. works as usual.
All code that does not use C.flat works as usual, apart from ==, hashCode, synchronized and weak references.

The storage hint C.flat can be used only in a few selected places:
- as a hint on a field type: private C.flat c;
- as a hint on a field array type: private C.flat[] c;
- when creating an array: new C.flat[16]
- as a hint on a parameter type of a method: void foo(C.flat c) { ... }

In terms of code generation, .flat is equivalent to asking for a Q-type and adding a checkcast (or equivalent) from the Q-type to the L-type (or vice-versa) for each read/write of the field, the array cell or the parameter (for the parameter it can be done once).

I really think that the .flat model is far easier to use than the .ref/.val model because it untangles the notion of value type from the notion of container hints, and it seems to me that a JIT able to propagate/merge the Q-types should generate assembly code as efficient as with the .ref/.val model.

And obviously, I may have forgotten something invalidating the whole design, please shoot.
regards,
Rémi

From brian.goetz at oracle.com Wed Jul 20 17:34:04 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 20 Jul 2022 17:34:04 +0000
Subject: The storage hint model
In-Reply-To: <303592244.13560433.1658335440278.JavaMail.zimbra@u-pem.fr>
References: <303592244.13560433.1658335440278.JavaMail.zimbra@u-pem.fr>
Message-ID: <48EC4583-0E8B-4181-8B68-D0479B76E024@oracle.com>

> Yes, i know, we have already discuss several models like that. But i think, it's a good idea to re-examine those because i believe they are more attractive today.

Indeed, this has come up several times. It is attractive to think of flattening entirely as a "storage class", and fair to reexamine it (this also came up in an internal discussion recently), but I think in the end this still will be a choice that we regret.

> The main issue with the .val model is that it presents two *types* to the user while we really want is mostly to flatten the storage and have a precise the method calling convention.
> Those two goals are not equals, the first is far more important than the second, to the point where the coding guideline proposed by Brian is to use .ref for the parameters and .val for the fields and arrays.

FTR, the motivation for the guideline here is "use .val where it makes the most difference." There's nothing *wrong* with using val types on the stack, you just don't get the enormous payback you do with heap variables. But I can imagine (especially in a specialized-generics world) that there is value to using .val in APIs as well, because it carries the semantic "not null" information as well as the flattening hint.

> We still need .val and .ref to be able to specialize generics, right ? No, i don't think so, we technically do not have to pass a .val as type argument to be able to specialize a generic class, we just need to pass a type argument that can be flatten if it's possible.

Here's where I disagree.
If field declaration and array creation expressions were the only places you needed to say .val, I'd be much more sympathetic to the container-properties model. But in a world with specialized generics, we want to flow the types throughout, not only to field layout, but flowing the non-null constraint to the JIT, etc. The `T.flat` approach will feel like a hack, because it is, and as an unbonus, people will forget almost all the time because having to select a storage class for an abstractly typed variable will feel unnatural. When I say ArrayList<Foo.val>, I want the properties of Foo.val to flow to *all* the places where a T is being moved around.

(This scheme rests on a clever but implicit assumption: that `T.flat` really means "as flat as T can be", which for a ref, is "not at all." It's clever, but for this reason `T.flat` is kind of a misnomer.)

> we can write instead
>   value class C {
>     // ...
>   }
>
>   class Container<T> {
>     private T.flat value;

Yeah, this is where you lose me. When you're writing a generic class like ArrayList, you're abstracted from the details of heap layout, and it seems overwhelmingly likely you'd forget to say T.flat somewhere. It also feels very "nonparametric", because we've created a second, ad-hoc channel through which information flows, and that channel is "bumpier". But it's worse than that, because there's less type information in the program, and therefore the VM has to make more conservative assumptions about nullity.

I get what you are trying to accomplish; the ref/val distinction feels like it is almost something we can get rid of. But I think swapping it for a storage class model is worse, because it is asking users to think about low-level details in more places, rather than using types and having the information flow with the types. And as you point out, it means there are more possible ways nulls can get deeper into the system before NPEing.
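
A small illustration of the "nulls can get deeper into the system before NPEing" point, using only today's Java. This is an analogy under stated assumptions, not Valhalla code: `Integer` stands in for a nullable reference and `int` for a null-free flat value, and the class names `LateBox`, `EagerBox` and `NullDepthDemo` are made up for the example.

```java
import java.util.Objects;

// Hypothetical stand-ins: Integer models a nullable reference, int models a null-free flat value.
final class LateBox {
    private Integer value;                 // nullable storage: null is accepted silently

    void set(Integer v) { value = v; }     // no check here, so the null travels deeper

    int get() { return value; }            // the NPE surfaces only here, at unboxing
}

final class EagerBox {
    private int value;                     // null-free storage

    void set(Integer v) {
        // The null is rejected at the boundary, with a message chosen by the class writer.
        value = Objects.requireNonNull(v, "value must not be null");
    }

    int get() { return value; }
}

final class NullDepthDemo {
    /** Returns "set" or "get" depending on which call threw the NullPointerException. */
    static String whereLateBoxFails() {
        LateBox box = new LateBox();
        try { box.set(null); } catch (NullPointerException e) { return "set"; }
        try { box.get(); } catch (NullPointerException e) { return "get"; }
        return "none";
    }

    /** Returns "set" if storing null is rejected immediately, "none" otherwise. */
    static String whereEagerBoxFails() {
        EagerBox box = new EagerBox();
        try { box.set(null); } catch (NullPointerException e) { return "set"; }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println("LateBox fails in:  " + whereLateBoxFails());
        System.out.println("EagerBox fails in: " + whereEagerBoxFails());
    }
}
```

The less the declaration says about nullity, the later the failure can show up; whether a late, hand-written check is a feature or a hazard is exactly what the thread is arguing about.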
From forax at univ-mlv.fr Wed Jul 20 20:05:34 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Wed, 20 Jul 2022 22:05:34 +0200 (CEST) Subject: The storage hint model In-Reply-To: <48EC4583-0E8B-4181-8B68-D0479B76E024@oracle.com> References: <303592244.13560433.1658335440278.JavaMail.zimbra@u-pem.fr> <48EC4583-0E8B-4181-8B68-D0479B76E024@oracle.com> Message-ID: <1563054819.13596421.1658347534011.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "Brian Goetz" > To: "Remi Forax" > Cc: "valhalla-spec-experts" > Sent: Wednesday, July 20, 2022 7:34:04 PM > Subject: Re: The storage hint model >> Yes, i know, we have already discuss several models like that. But i think, it's >> a good idea to re-examine those because i believe they are more attractive >> today. > > Indeed, this has come up several times. It is attractive to think of flattening > entirely as a ?storage class?, and fair to reexamine it (this also came up in > an internal discussion recently) but I think in the end this still will be a > choice that we regret. > >> The main issue with the .val model is that it presents two *types* to the user >> while we really want is mostly to flatten the storage and have a precise the >> method calling convention. >> Those two goals are not equals, the first is far more important than the second, >> to the point where the coding guideline proposed by Brian is to use .ref for >> the parameters and .val for the fields and arrays. > > FTR, the motivation for the the guideline here is ?use .val where it makes the > most difference.? There?s nothing *wrong* with using val types on the stack, > you just don?t get the enormous payback you do with heap variables. But I can > imagine ? especially in a specialized-generics world ? that there is value to > using .val in APIs as well, because it carries the semantic ?not null? > information as well as the flattening hint. 
T.flat carries the same semantics, the difference is that you have to explicitly use T.flat where you want the flattening in the generic code.

  class Container<T> {
    private T.flat value; // here

    public void set(T.flat value) { // but also here
      this.value = value;
    }

    public T.flat get() { // and here too
      return value;
    }
  }

So yes, it makes the generic code more cumbersome to write, but it also makes generic classes easier to use because the writer of the generics decides what can be flattened (or not), and not the user of the generics.

>> We still need .val and .ref to be able to specialize generics, right ? No, i don't think so, we technically do not have to pass a .val as type argument to be able to specialize a generic class, we just need to pass a type argument that can be flatten if it's possible.
>
> Here's where I disagree. If field declaration and array creation expressions were the only places you needed to say .val, I'd be much more sympathetic to the container-properties model. But in a world with specialized generics, we want to flow the types throughout, not only to field layout, but flowing the non-null constraint to the JIT, etc. The `T.flat` approach will feel like a hack, because it is, and as an unbonus, people will forget almost all the time because having to select a storage class for an abstractly typed variable will feel unnatural.

People will forget T.flat as much as they will forget C.flat (C.val if you prefer), that's true, but that's the price to pay to be safe by default, in both cases.
If you want to "fix" the potential missing T.flat, it's the same fix as with a potential missing C.flat: have a way to declare a value class flat by default at declaration site. But that's a separate discussion.

> When I say ArrayList<Foo.val>, I want the properties of Foo.val to flow to *all* the places where a T is being moved around.
Maybe you want or maybe you don't, here is an interesting implementation of ArrayList

  public class ArrayList<E> {
    private E[] array;
    private int size;

    public ArrayList() {
      array = new E.flat[16]; // ahah, flat by default !
    }

    public boolean add(E element) { // E is not flat
      if (element == null && !array.getClass().isNullable()) {
        var newArray = new E[array.length]; // need to store null, use a nullable array
        System.arraycopy(array, 0, newArray, 0, array.length);
        array = newArray;
      }
      if (array.length == size) {
        array = Arrays.copyOf(array, size * 2);
      }
      array[size++] = element;
      return true;
    }
  }

It starts with a flat array and if a null element is added, it "unflats" itself.
This implementation is interesting because once recompiled with the new generics, a new ArrayList() will use a flattened array by default.

I've no idea about the performance of this kind of implementation, but using T.flat gives better control over what is flattenable or not in the implementation.

> (This scheme rests on a clever but implicit assumption: that `T.flat` really means "as flat as T can be", which for a ref, is "not at all." It's clever, but for this reason `T.flat` is kind of a misnomer.)

If it's a value class, T.flat can still flatten the value if the size is <= 128 bits, but yes, T.flat means as flat as T can be.

>> we can write instead
>>   value class C {
>>     // ...
>>   }
>>
>>   class Container<T> {
>>     private T.flat value;
>
> Yeah, this is where you lose me. When you're writing a generic class like ArrayList, you're abstracted from the details of heap layout, and it seems overwhelmingly likely you'd forget to say T.flat somewhere. It also feels very "nonparametric", because we've created a second, ad-hoc channel through which information flows, and that channel is "bumpier". But it's worse than that, because there's less type information in the program, and therefore the VM has to make more conservative assumptions about nullity.
This has been true with the previously proposed storage hint models, but unlike those, this model allows parameters to be declared as T.flat.
I think it is the missing piece: by propagating the T.flat, the VM has enough information, so it does not need to make conservative assumptions.

> I get what you are trying to accomplish; the ref/val distinction feels like it is almost something we can get rid of. But I think swapping it for a storage class model is worse, because it is asking users to think about low-level details in more places, rather than using types and having the information flow with the types.

In more places inside the generic code, in fewer places inside the user code. It's a trade I'm happy to make.

> And as you point out, it means there are more possible ways nulls can get deeper into the system before NPEing.

Yes, it can be as late as reaching a putField, but that's because as a class writer you have more control.
For example with List.of(), which never allows null, delaying the NPE may provide better error messages; a requireNonNull may be better than having an NPE at the callsite like List.of(null) will do.

Rémi

From john.r.rose at oracle.com Wed Jul 20 23:36:05 2022
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 20 Jul 2022 16:36:05 -0700
Subject: Value type companions, encapsulated
In-Reply-To: <1172781059.12493451.1658193499669.JavaMail.zimbra@u-pem.fr>
References: <3030E223-F7F8-4268-843D-9A2DB7D32773@oracle.com> <1172781059.12493451.1658193499669.JavaMail.zimbra@u-pem.fr>
Message-ID: <284B9E3D-EC23-45B0-ACA6-2BE7F51BE2D4@oracle.com>

On 18 Jul 2022, at 18:18, Remi Forax wrote:

> I've just finished to re-read the document.

Thank you!

> - There is an issue with Collections.toArray(), this is not well known
> but toArray(array) add a null if the array is bigger than the
> collection size, so if array is an array of C.val i suppose that a
> default value can be inserted.
> from the javadoc "If this collection fits in the specified array with > room to spare (i.e., the array has more elements than this > collection), the element in the array immediately following the end of > the collection is set to null . (This is useful in determining the > length of this collection only if the caller knows that this > collection does not contain any null elements.) " By my reading of that spec, adding `null` cannot materialize `C.default`, but rather must cause a NPE, which is the normal result of storing a null into a val-array. (Non-ref arrays never contain `null`!) This will be true regardless of privatization of `C.val`. > - Do we agree that this document prohibits to create an ArrayList of > C.val (or any collections of C.val) if C.val is declared private or > package private once the generics are updated (anewarray/aconst_init > of C.val will fail at runtime) ? That depends on translation strategy for updated generics. If updated generics are still erased, then yes. If we go to the PVM and specialized generics, then the TS will surely provide the specialization engine with a `Lookup` for the client asking for `ArrayList`, and can check if that client can really access `C.val`. Next, if `ArrayList.java` has been recoded appropriately (whatever that means, in the end), surely whatever stands for `new T[n]` in the code will be able to access the capability of building the array by delegating through the specialization anchor (which can hold a `Lookup` if the TS sets that up). Maybe the language will allow a real literal `new T[n]`, or maybe `ArrayList.java` has to use some hocus-pocus with method handles. I can?t predict details but I am confident that these issues can be handled above the level of the PVM in the bootstrap methods and translation strategy. > This seems too restrictive to me. It should be possible to create an > array of T with T a C.val at runtime but it should not be possible to > create a C.val out of thin air. I agree. 
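
The `toArray(array)` behavior quoted from the javadoc earlier in this message can be observed with today's reference arrays. A minimal sketch (the class name `ToArrayNullDemo` is made up for illustration): when the target array has room to spare, only the slot immediately after the last element is overwritten with `null`. The val-array case discussed above, where that store would have to throw instead, cannot be run on a current JDK and is deliberately not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ToArrayNullDemo {
    /** Copies a two-element list into a pre-filled four-element array. */
    static String[] copyIntoLargerArray() {
        List<String> list = new ArrayList<>(List.of("a", "b"));
        String[] target = {"x", "x", "x", "x"};
        // The same array is returned because it is large enough to hold the list.
        return list.toArray(target);
    }

    public static void main(String[] args) {
        // Slot 2 (right after the last element) is nulled, slot 3 is left untouched.
        System.out.println(Arrays.toString(copyIntoLargerArray()));
    }
}
```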
One of the challenges of the JLS support for specialized generics will be deciding how to permit delegation of the permission to do `new C.val[n]` in generic code under the guise `new T[n]` (or some equivalent expression), when `T` is `C.val` and `C.val` is privatized. It seems clear that some provision should be made. When I make a specialized generic `Foo` or `Foo::m` and `C.val` is privatized, (a) `C.val` is access-checked to me, but (b) I am entrusting its use to the generic as well. If (b) is not true I should be saying `Foo` instead. ?That raises the question, for specialized generics, whether the related but distinct access checks on the `C` name should be done (as well as for privatized `C.val`). I guess they should, but this is somewhat incompatible with erased generics. For erased generics, I can say `List` and it all works, because the erased generic code can never see `MyPrivateClass`. If we try that with specialized generics, then there are a number of choices: - Delegate access to `MyPrivateClass` to the specialized generic (via the specialization anchor and a `Lookup` if necessary). - Refuse to build the specialization at link-time unless everybody has access to `MyPrivateClass`. (The BSM for the specialization performs an access check to `MyPrivateClass` from both the specialized generic code and from the `Lookup` of the client.) - Build the specialization but throw an error when the specialized code tries to do something that would be an access failure in written-out code. (Yuck, late failures!) - In the BSM check access to `MyPrivateClass` to the caller, and give the specialized code carte blanche to do whatever with it. (This means I can write generics that can ?grab? secure capabilities from type args.) - In the BSM check access to `MyPrivateClass` to both caller and generic code (as before) and if either fails, fallback to use erasure. This is the sort of thing the PVM won?t care about? 
but the JLS and TS (with its BSM support) will have to figure it all out.

From john.r.rose at oracle.com Thu Jul 21 02:34:24 2022
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 20 Jul 2022 19:34:24 -0700
Subject: The storage hint model
In-Reply-To: <1563054819.13596421.1658347534011.JavaMail.zimbra@u-pem.fr>
References: <303592244.13560433.1658335440278.JavaMail.zimbra@u-pem.fr> <48EC4583-0E8B-4181-8B68-D0479B76E024@oracle.com> <1563054819.13596421.1658347534011.JavaMail.zimbra@u-pem.fr>
Message-ID:

I'm glad to see this idea worked out in detail, and for the record.

Given we want to give users control at several points over potential flatness, we must choose either a separate storage classes channel or a variation in variable types themselves. And my overall sense is that it's a "pick your poison" situation. But I think complexifying the types goes down a little smoother than adding a new channel to most (but not all) variables. Generic variables can benefit from a type-based solution, but not from a SC-based solution, as this analysis shows. The SC-based solution shrinks the surface of the flat-vs-not choice, but makes it correspondingly more irregular.

One irregularity you don't mention is signature alignment. In order to link to a method or field (or override a method) you need the signatures aligned. (We don't want bridges today please.) This means that the `.flat` annotations are part of the signature (since they compile to Q-types). This means signatures depend on something besides types, surely an add to the net complexity.

Also, type inference can carry flatness in the design of record, but cannot do so in a SC-based design, at least not without more irregular hacks. (Maybe that doesn't matter?)
On 20 Jul 2022, at 13:05, forax at univ-mlv.fr wrote:

> Maybe you want or maybe you don't, here is an interesting
> implementation of ArrayList
>
>   public class ArrayList<E> {
>     private E[] array;
>     private int size;
>
>     public ArrayList() {
>       array = new E.flat[16]; // ahah, flat by default !
>     }
>
>     public boolean add(E element) { // E is not flat
>       if (element == null && !array.getClass().isNullable()) {
>         var newArray = new E[array.length]; // need to store null, use a nullable array
>         System.arraycopy(array, 0, newArray, 0, array.length);
>         array = newArray;
>       }
>       if (array.length == size) {
>         array = Arrays.copyOf(array, size * 2);
>       }
>       array[size++] = element;
>       return true;
>     }
>   }
>
> It starts with a flat array and if an element null is added, it
> "unflat" itself.
> This implementation is interesting because once recompiled with the
> new generics, a new ArrayList() will use a flatten array by
> default.
>
> I've no idea about the performance of such kind of implementations,
> but using T.flat give better control on what is flattenable or not in
> the implementation.

If we choose, we can code this trick in the design of record as well. So it's not an advantage of the `.flat` proposal; it's just a trick that is forced on the programmer willy nilly.

Today a variable is a type, and an optional name. If it is a local or a field it has a name. If it is inside an array or a (possibly generic) container, there is no name. I can refactor my variables all day long through named and unnamed temps, fields, arrays, and other containers. If some but not all of those locations accept my chosen SC, my refactorings have greater friction. It seems like adding new types is less disruptive than adding a whole new classification for variables.
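
For readers who want to play with the "start flat, unflatten on the first null" trick without a Valhalla build, here is a rough analogy in today's Java. It is only a sketch under loud assumptions: `int[]` stands in for a flat, null-free array and `Integer[]` for the nullable one, and `UnflatteningIntList` is a hypothetical name, not anything from either proposal.

```java
import java.util.Arrays;

// Hypothetical analogy: int[] plays the flat array, Integer[] the nullable array.
final class UnflatteningIntList {
    private int[] flat = new int[16];   // flat by default
    private Integer[] boxed;            // allocated only once a null shows up
    private int size;

    boolean isFlat() { return boxed == null; }

    int size() { return size; }

    boolean add(Integer element) {
        if (element == null && isFlat()) {
            // Need to store null: migrate to a nullable array of the same capacity.
            boxed = new Integer[flat.length];
            for (int i = 0; i < size; i++) {
                boxed[i] = flat[i];
            }
            flat = null;
        }
        if (isFlat()) {
            if (size == flat.length) {
                flat = Arrays.copyOf(flat, size * 2);
            }
            flat[size++] = element;      // unboxing is safe: element is non-null on this path
        } else {
            if (size == boxed.length) {
                boxed = Arrays.copyOf(boxed, size * 2);
            }
            boxed[size++] = element;
        }
        return true;
    }

    Integer get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return isFlat() ? Integer.valueOf(flat[index]) : boxed[index];
    }
}
```

Note that the analogy needs two differently typed fields and two code paths, which is a cost the proposed `E[]` / `E.flat[]` version does not show; it says nothing about how either design would perform.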
From maurizio.cimadamore at oracle.com Thu Jul 21 11:29:47 2022
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Thu, 21 Jul 2022 12:29:47 +0100
Subject: The storage hint model
In-Reply-To: <1563054819.13596421.1658347534011.JavaMail.zimbra@u-pem.fr>
References: <303592244.13560433.1658335440278.JavaMail.zimbra@u-pem.fr> <48EC4583-0E8B-4181-8B68-D0479B76E024@oracle.com> <1563054819.13596421.1658347534011.JavaMail.zimbra@u-pem.fr>
Message-ID: <709caa37-49f7-e8c3-73e4-9a84a81c53ae@oracle.com>

Hi Remi,
I've been thinking along similar lines in the past few weeks.

I think that, as with every approach, there are pros and cons to what you propose. In a way the difference between the type-based and the storage-based approaches reminds me of the distinction between homogeneous and heterogeneous generic reification translation strategies (for more details, please refer to the good read in [1]).

In the storage-based model, the user of a generic class doesn't know if a type-variable will be used 20 levels down the stack; this calls, I think, for some sort of type-passing approach, where the generic type information is made available when the class is created, but not necessarily acted upon by the JVM. That is, an object whose static type is `Foo` is just an object whose type is `Foo` which has its "type-token" saved somewhere (in my thesis [2] I did that with an indirection in the oop - an approach that is sometimes referred to as near/far classes). You need to pass this info around everywhere because you don't know who's gonna use this information (e.g. in your strategy, which class is going to use some T.flat). Granted, in the model you propose it would be possible to see if a generic class uses T.flat at all, and, if it doesn't, maybe no type token is required - but that's an orthogonal optimization.
As there's only one Foo (albeit used w/ or w/o side type information), it is a bit easier to deal with pathologically polymorphic cases such as wildcards, or to deal with use cases where type information is either missing, or not fit for purpose (think javac inferring a grotesque non-denotable type in a generic method call). One last point: in the storage-based model, clients do not have to opt in to get a version of `Foo` that exhibits some flatness features. It's up to the owner of `Foo` to decide whether to use `.flat` inside it or not. This can be seen as a pro, or a cons: on the one hands, there's no need to rewrite client code to take advantage of specialization (good!) - on the other hand, it is impossible for a client to make sure that existing code keeps behaving like it did in the past (bad!). Conversely, in the type-driven approach, simply "uttering" a specialized type like `Foo` brings a new runtime type into existence, possibly with a different layout. In this world it's easier to see where the type information is flowing into (as Brian pointed out), as that's part of the type signature. Also, since `Foo` is its own little class (or species), you get a place where to store type-static metadata for free. For instance, the type parameter `Point.val` might be represented as a static field of type `Class` inside the `Foo` species. Overall, a type-driven approach seems to fit better with the physics of the VMs we have, given that different parameterization can be given different runtime types, thus avoiding some of the profile pollutions that are otherwise hard to address when using a storage-based approach (something similar has been discussed for Scala miniboxing, see [3]). That said, in this model, dealing with absence of type information can be tricky, as shown in [4]. As noted above, clients here need an explicit opt-in into specialization to take advantage of it. 
Creating `Foo` is one thing, creating `Foo` is another, and clients can decide if they are ok with the costs associated with specialization.

Finally, as Brian pointed out, under the storage-based translation, in order for things to work when type information is missing, you have to assume that T.flat doesn't really mean "flat all the time", but only "flat if you can". That is, if there's some side-channel available, then read T's true form from there, otherwise just take T's erasure and use that. That said, this problem is not entirely new in this approach. Consider:

```
class Foo<X> {
    X x;
}

class Sub<X> extends Foo<X> { ... }
```

Under the type-driven approach, if I create `Sub`, I'd expect that species to have a super-species `Foo` (which means `x` will have sharp type `Point.val`). But if I create `Sub`, then the super-species is just erased Foo, and the type of `x` is simply Object. So, the "flat if you can" behavior is there even in the type-driven approach (e.g. `extends Foo` doesn't mean the same thing in all cases), perhaps more in disguise.

Overall, I don't think either model is "clearly" better than the other - they have different trade-offs which might work better in some contexts and worse in others. What we pick depends primarily, I think, on whether we see specialization as a conscious, opt-in decision performed by the user, or if we see specialization more as something happening "under the hood" (or, put in better terms, under control of library developers). While the latter sounds attractive, some figments of the specialized generic type system unfortunately will result in seams (e.g. new NullPointerExceptions) which are _visible_ to clients. So encapsulating specialization choices is not something that can be achieved 100%, and I think that is what some of us might feel uncomfortable about.
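
The "only one Foo, plus a type-token saved somewhere" picture can be approximated in today's erased Java, purely as an analogy. In the sketch below the class names `TokenBox` and `TypeTokenDemo` are hypothetical, and the explicit `Class<T>` argument is an assumption standing in for whatever side channel a VM would use: every parameterization shares one runtime class, and the token handed over at creation time is what lets code further down build a sharply typed array.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;

// Hypothetical container that carries its type argument as an explicit side token.
final class TokenBox<T> {
    private final Class<T> token;   // the "type-token saved somewhere"
    private final T[] storage;

    @SuppressWarnings("unchecked")
    TokenBox(Class<T> token, int capacity) {
        this.token = token;
        // The token decides the array's runtime component type;
        // without it, generic code could only allocate an Object[].
        this.storage = (T[]) Array.newInstance(token, capacity);
    }

    Class<T> token() { return token; }

    Class<?> storageComponentType() { return storage.getClass().getComponentType(); }
}

final class TypeTokenDemo {
    /** With erasure, every parameterization shares a single runtime class. */
    static boolean sameRuntimeClass() {
        return new ArrayList<String>().getClass() == new ArrayList<Integer>().getClass();
    }

    public static void main(String[] args) {
        TokenBox<String> strings = new TokenBox<>(String.class, 4);
        TokenBox<Integer> ints = new TokenBox<>(Integer.class, 4);
        System.out.println(sameRuntimeClass());
        System.out.println(strings.getClass() == ints.getClass());
        System.out.println(strings.storageComponentType());
        System.out.println(ints.storageComponentType());
    }
}
```

In the type-driven alternative there would be no token field at all: each parameterization would be its own species, and the information would live in the runtime type itself.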
Maurizio

[1] - https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.53.8658&rep=rep1&type=pdf
[2] - http://amsdottorato.unibo.it/2476/
[3] - https://www.semanticscholar.org/paper/Compile-Time-Type-Driven-Data-Representation-in-Ureche/df5831814318ff11d189c4de0485745603fb7afe
[4] - http://cr.openjdk.java.net/~jrose/values/parametric-vm.html

From forax at univ-mlv.fr Thu Jul 21 13:29:03 2022
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Thu, 21 Jul 2022 15:29:03 +0200 (CEST)
Subject: The storage hint model
In-Reply-To: <709caa37-49f7-e8c3-73e4-9a84a81c53ae@oracle.com>
References: <303592244.13560433.1658335440278.JavaMail.zimbra@u-pem.fr> <48EC4583-0E8B-4181-8B68-D0479B76E024@oracle.com> <1563054819.13596421.1658347534011.JavaMail.zimbra@u-pem.fr> <709caa37-49f7-e8c3-73e4-9a84a81c53ae@oracle.com>
Message-ID: <1285174980.13858432.1658410143102.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Maurizio Cimadamore"
> To: "Remi Forax" , "Brian Goetz"
> Cc: "valhalla-spec-experts"
> Sent: Thursday, July 21, 2022 1:29:47 PM
> Subject: Re: The storage hint model

> Hi Remi,
> I've been thinking along similar lines in the past few weeks.

I think we are all hunting on similar grounds :)

I think you are mixing two different ideas one is .val propagation and the other is the flat vs box model for generics.
It's my fault because in my previous mail i mix both things too.

The .val propagation.

My aha! moment was discovering that we do not need .val propagation. In the parametric VM, the instantiation of specialized generics is done at runtime inside bootstrap methods (triggered by opcodes linkage resolution or method descriptor type restriction). Combined with the idea that at runtime there is only one class corresponding to a value class, then the runtime can check if a type argument is a value class or not.
So with C a value class, an ArrayList is enough to trigger the specialization.

The type propagation we all want is done by the parametric VM. With your example,
  class Sub extends Foo { ... }
If X is C at runtime, then the X of Foo is also a C because the VM propagates the type arguments.

The .flat model for value class

We know since quite some time that we can describe the behavior of value classes using a storage hints model.
The appeal of such models is that they are independent of the type system: such hints are not propagated with the types.
Unlike the previous models, I propose to add such hints not only on field types and array types but also on parameter types, so the VM has enough information at class preparation time and at JIT time to generate specialized assembly code.

The flat vs box model for generics

My mistake is to not have discussed whether we want a .flat or a .box model for generics. Like with value classes, for a type parameter doing specialization can be opt-in (using .flat) or opt-out (using .box).
Brian, John and you seem to prefer the box model, where a storage hint has to be used only when an L-type is required.

As an example, here is the ArrayList example, using the .box model.

  public class ArrayList<E> {
    private E.box[] array; // can be flat or box, so declared as box
    private int size;

    public ArrayList() {
      array = new E[16]; // flat by default !
    }

    public boolean add(E.box element) { // E is not flat
      if (element == null && !array.getClass().isNullable()) {
        var newArray = new E.box[array.length]; // need to store null, use a nullable array
        System.arraycopy(array, 0, newArray, 0, array.length);
        array = newArray;
      }
      if (array.length == size) {
        array = Arrays.copyOf(array, size * 2);
      }
      array[size++] = element;
      return true;
    }
  }

For me, using .flat or .box is a separate decision from using a storage hint model vs a type-based model.

> I think that, as with every approach, there are pros and cons to what
> you propose. In a way the difference between type-based and the
> storage-based approaches remind me of the distinction between
> homogeneous and heterogeneous generic reification translation strategies
> (for more details, please refer to the good read in [1]).

Not sure to follow you on this one, homogeneous vs heterogeneous is more a parametric VM discussion.
The current plan is homogeneous bytecode, and heterogeneous when
specialized if asked by bootstrap methods, whatever the model we choose.

>
> In the storage-based model, the user of a generic class doesn't know if a
> type-variable will be used 20 levels down the stack; this calls, I
> think, for some sort of type-passing approach, where the generic type
> information is made available when the class is created, but not
> necessarily acted upon by the JVM. That is, an object whose static type
> is `Foo` is just an object whose type is `Foo` which has its
> "type-token" saved somewhere (in my thesis [2] I did that with an
> indirection in the oop - an approach that is sometimes referred to as
> near/far classes). You need to pass this info around everywhere because
> you don't know who's gonna use this information (e.g. in your strategy,
> which class is going to use some T.flat). Granted, in the model you
> propose it would be possible to see if a generic class uses T.flat at
> all, and, if it doesn't, maybe no type token is required - but that's an
> orthogonal optimization. As there's only one Foo (albeit used w/ or w/o
> side type information), it is a bit easier to deal with pathologically
> polymorphic cases such as wildcards, or to deal with use cases where
> type information is either missing, or not fit for purpose (think javac
> inferring a grotesque non-denotable type in a generic method call). One
> last point: in the storage-based model, clients do not have to opt in to
> get a version of `Foo` that exhibits some flatness features. It's
> up to the owner of `Foo` to decide whether to use `.flat` inside it or
> not. This can be seen as a pro, or a con: on the one hand, there's no
> need to rewrite client code to take advantage of specialization (good!)
> - on the other hand, it is impossible for a client to make sure that
> existing code keeps behaving like it did in the past (bad!).
Yes, that's the main difference: as a client you have less control. You
are only free not to upgrade when a generic class is recompiled to use
specialized generics (apart from using a raw type, which is a special
kind of ugly). But at the same time, if you are using a library, you
are trusting the maintainers to do a proper job when upgrading.

>
> Conversely, in the type-driven approach, simply "uttering" a specialized
> type like `Foo` brings a new runtime type into existence,
> possibly with a different layout. In this world it's easier to see where
> the type information is flowing into (as Brian pointed out), as that's
> part of the type signature. Also, since `Foo` is its own
> little class (or species), you get a place where to store type-static
> metadata for free. For instance, the type parameter `Point.val` might be
> represented as a static field of type `Class` inside the
> `Foo` species. Overall, a type-driven approach seems to fit
> better with the physics of the VMs we have, given that different
> parameterization can be given different runtime types, thus avoiding
> some of the profile pollutions that are otherwise hard to address when
> using a storage-based approach (something similar has been discussed for
> Scala miniboxing, see [3]). That said, in this model, dealing with
> absence of type information can be tricky, as shown in [4]. As noted
> above, clients here need an explicit opt-in into specialization to take
> advantage of it. Creating `Foo` is one thing, creating
> `Foo` is another, and clients can decide if they are ok with
> the costs associated with specialization.

I believe both models propagate enough information because the
specialization occurs at runtime.

>
> Finally, as Brian pointed out, under the storage-based translation, in
> order for things to work when type information is missing, you have to
> assume that T.flat doesn't really mean "flat all the time", but only
> "flat if you can".
> That is, if there's some side-channel available, then
> read T's true form from there, otherwise just take T's erasure and use
> that. That said, this problem is not entirely new in this approach.
> Consider:
>
> ```
> class Foo<X> {
>    X x;
> }
>
> class Sub<X> extends Foo<X> { ... }
> ```
>
> Under the type-driven approach, if I create `Sub`, I'd expect
> that species to have a super-species `Foo` (which means `x`
> will have sharp type `Point.val`). But if I create `Sub`, then
> the super-species is just erased Foo, and the type of `x` is simply
> Object. So, the "flat if you can behavior" is there even in the
> type-driven approach (e.g. `extends Foo` doesn't mean the same thing
> in all cases), perhaps more in disguise.

Yes, the propagation is the same, but the client has less control over
the specialization.

>
> Overall, I don't think either model is "clearly" better than the other -
> they have different trade-offs which might work better in some contexts
> and worse in others. What we pick depends primarily, I think, on whether
> we see specialization as a conscious, opt-in decision performed by the
> user, or if we see specialization more as something happening "under the
> hood" (or, put in better terms, under control of library developers).

yes !

> While the latter sounds attractive, some figments of the specialized
> generic type system unfortunately will result in seams (e.g. new
> NullPointerExceptions) which are _visible_ to clients. So encapsulating
> specialization choices is not something that can be achieved 100%, and I
> think that is where some of us might feel uncomfortable about.

I think this one is more an artifact of .flat vs .box.
>
> Maurizio

Rémi

>
> [1] - https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.53.8658&rep=rep1&type=pdf
> [2] - http://amsdottorato.unibo.it/2476/
> [3] - https://www.semanticscholar.org/paper/Compile-Time-Type-Driven-Data-Representation-in-Ureche/df5831814318ff11d189c4de0485745603fb7afe
> [4] - http://cr.openjdk.java.net/~jrose/values/parametric-vm.html
>
>
> On 20/07/2022 21:05, forax at univ-mlv.fr wrote:
>> ----- Original Message -----
>>> From: "Brian Goetz"
>>> To: "Remi Forax"
>>> Cc: "valhalla-spec-experts"
>>> Sent: Wednesday, July 20, 2022 7:34:04 PM
>>> Subject: Re: The storage hint model
>>>> Yes, i know, we have already discuss several models like that. But i think, it's
>>>> a good idea to re-examine those because i believe they are more attractive
>>>> today.
>>> Indeed, this has come up several times. It is attractive to think of flattening
>>> entirely as a "storage class", and fair to reexamine it (this also came up in
>>> an internal discussion recently) but I think in the end this still will be a
>>> choice that we regret.
>>>
>>>> The main issue with the .val model is that it presents two *types* to the user
>>>> while we really want is mostly to flatten the storage and have a precise the
>>>> method calling convention.
>>>> Those two goals are not equals, the first is far more important than the second,
>>>> to the point where the coding guideline proposed by Brian is to use .ref for
>>>> the parameters and .val for the fields and arrays.
>>> FTR, the motivation for the guideline here is "use .val where it makes the
>>> most difference." There's nothing *wrong* with using val types on the stack,
>>> you just don't get the enormous payback you do with heap variables. But I can
>>> imagine - especially in a specialized-generics world - that there is value to
>>> using .val in APIs as well, because it carries the semantic "not null"
>>> information as well as the flattening hint.
>> T.flat carries the same semantics, the difference is that you have to explicitly
>> use T.flat where you want the flattening in the generic code.
>>
>> class Container<T> {
>>   private T.flat value;  // here
>>
>>   public void set(T.flat value) {  // but also here
>>     this.value = value;
>>   }
>>
>>   public T.flat get() {  // and here too
>>     return value;
>>   }
>> }
>>
>> so yes it makes the generic code more cumbersome to write but it also makes
>> generic classes easier to use because the writer of the generics decide what
>> can be flattened (or not) and not the user of the generics.
>>
>>>> We still need .val and .ref to be able to specialize generics, right ? No, i
>>>> don't think so, we technically do not have to pass a .val as type argument to
>>>> be able to specialize a generic class, we just need to pass a type argument
>>>> that can be flatten if it's possible.
>>> Here's where I disagree. If field declaration and array creation expressions
>>> were the only places you needed to say .val, I'd be much more sympathetic to
>>> the container-properties model. But in a world with specialized generics, we
>>> want to flow the types throughout, not only to field layout, but flowing the
>>> non-null constraint to the JIT, etc. The `T.flat` approach will feel like a
>>> hack, because it is, and as an unbonus, people will forget almost all the time
>>> because having to select a storage class for an abstractly typed variable will
>>> feel unnatural.
>> People will forget T.flat as much as they will forget C.flat (C.val if you
>> prefer), that's true, but that's the price to pay to be safe by default, in both
>> cases.
>> If you want to "fix" the potential missing T.flat, it's the same fix as with a
>> potential missing C.flat, have a way to declare a value class flat by default
>> at declaration site. But that's a separate discussion.
>>
>>> When I say ArrayList<Foo.val>, I want the properties of
>>> Foo.val to flow to *all* the places where a T is being moved around.
>> Maybe you want or maybe you don't, here is an interesting implementation of
>> ArrayList
>>
>> public class ArrayList<E> {
>>   private E[] array;
>>   private int size;
>>
>>   public ArrayList() {
>>     array = new E.flat[16];   // ahah, flat by default !
>>   }
>>
>>   public boolean add(E element) {  // E is not flat
>>     if (element == null && !array.getClass().isNullable()) {
>>       var newArray = new E[array.length];  // need to store null, use a nullable array
>>       System.arraycopy(array, 0, newArray, 0, array.length);
>>       array = newArray;
>>     }
>>     if (array.length == size) {
>>       array = Arrays.copyOf(array, size * 2);
>>     }
>>     array[size++] = element;
>>     return true;
>>   }
>> }
>>
>> It starts with a flat array and if an element null is added, it "unflat" itself.
>> This implementation is interesting because once recompiled with the new
>> generics, a new ArrayList() will use a flatten array by default.
>>
>> I've no idea about the performance of such kind of implementations, but using
>> T.flat give better control on what is flattenable or not in the implementation.
>>
>>> (This scheme rests on a clever but implicit assumption: that `T.flat` really
>>> means "as flat as T can be", which for a ref, is "not at all." It's clever, but
>>> for this reason `T.flat` is kind of a misnomer.)
>> If it's a value class, T.flat can still flatten the value if the size is <= 128
>> bits but yes, T.flat means as flat as T can be.
>>
>>>> we can write instead
>>>> value class C {
>>>>   // ...
>>>> }
>>>>
>>>> class Container<T> {
>>>>   private T.flat value;
>>> Yeah, this is where you lose me. When you're writing a generic class like
>>> ArrayList, you're abstracted from the details of heap layout, and it seems
>>> overwhelmingly likely you'd forget to say T.flat somewhere. It also feels very
>>> "nonparametric", because we've created a second, ad-hoc channel through which
>>> information flows, and that channel is "bumpier".
>>> But it's worse than that,
>>> because there's less type information in the program, and therefore the VM has
>>> to make more conservative assumptions about nullity.
>> This has been true with the previously proposed storage hint models, but unlike
>> those, this model allows parameters to be declared as T.flat.
>> I think it is the missing piece so the VM has enough information by propagating
>> the T.flat so it does not need to make conservative assumptions.
>>
>>> I get what you are trying to accomplish; the ref/val distinction feels like it
>>> is almost something we can get rid of. But I think swapping it for a storage
>>> class model is worse, because it is asking users to think about low-level
>>> details in more places, rather than using types and having the information flow
>>> with the types.
>> In more places inside the generic code, in fewer places inside the user code.
>> It's a trade I'm happy to make.
>>
>>> And as you point out, it means there are more possible ways
>>> nulls can get deeper into the system before NPEing.
>> yes, it can be as late as reaching a putField but it's because as a class writer
>> you have more control.
>> For example with List.of() which never allows null, delaying the NPE may provide
>> better error messages, a requireNonNull may be better than having a NPE at the
>> callsite like List.of(null) will do.
>>
>
>
Rémi

From forax at univ-mlv.fr Thu Jul 21 13:54:36 2022
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Thu, 21 Jul 2022 15:54:36 +0200 (CEST)
Subject: The storage hint model
In-Reply-To:
References: <303592244.13560433.1658335440278.JavaMail.zimbra@u-pem.fr> <48EC4583-0E8B-4181-8B68-D0479B76E024@oracle.com> <1563054819.13596421.1658347534011.JavaMail.zimbra@u-pem.fr>
Message-ID: <720984774.13872281.1658411676531.JavaMail.zimbra@u-pem.fr>

> From: "John Rose"
> To: "Remi Forax"
> Cc: "Brian Goetz" , "valhalla-spec-experts"
> Sent: Thursday, July 21, 2022 4:34:24 AM
> Subject: Re: The storage hint model

> I'm glad to see this idea worked out in detail, and for the record.

> Given we want to give users control at several points over potential flatness,
> we must choose either a separate storage classes channel or a variation in
> variable types themselves.

> And my overall sense is that it's a "pick your poison" situation. But I think
> complexifying the types goes down a little smoother than adding a new channel
> to most (but not all) variables. Generic variables can benefit from a
> type-based solution, but not from a SC-based solution, as this analysis shows.
> The SC-based solution shrinks the surface of the flat-vs-not choice, but makes
> it correspondingly more irregular.

> One irregularity you don't mention is signature alignment. In order to link to a
> method or field (or override a method) you need the signatures aligned. (We
> don't want bridges today please.) This means that the .flat annotations are
> part of the signature (since they compile to Q-types). This means signatures
> depend on something besides types, surely an add to the net complexity.

Yes, it's not only storage hints: you also need parameter hints, and
those have an impact on binary compatibility, unless we go with the
TypeRestriction option (not quite a bridge if you squint).
> Also, type inference can carry flatness in the design of record, but cannot do
> so in a SC-based design, at least not without more irregular hacks. (Maybe that
> doesn't matter?)

For me it's a big plus: it means less fighting with the type inference
for my students.

> On 20 Jul 2022, at 13:05, forax at univ-mlv.fr wrote:

>> Maybe you want or maybe you don't, here is an interesting implementation of
>> ArrayList

>> public class ArrayList<E> {
>>   private E[] array;
>>   private int size;
>>   public ArrayList() {
>>     array = new E.flat[16];   // ahah, flat by default !
>>   }
>>   public boolean add(E element) {  // E is not flat
>>     if (element == null && !array.getClass().isNullable()) {
>>       var newArray = new E[array.length];  // need to store null, use a nullable array
>>       System.arraycopy(array, 0, newArray, 0, array.length);
>>       array = newArray;
>>     }
>>     if (array.length == size) {
>>       array = Arrays.copyOf(array, size * 2);
>>     }
>>     array[size++] = element;
>>     return true;
>>   }
>> }

>> It starts with a flat array and if an element null is added, it "unflat" itself.
>> This implementation is interesting because once recompiled with the new
>> generics, a new ArrayList() will use a flatten array by default.

>> I've no idea about the performance of such kind of implementations, but using
>> T.flat give better control on what is flattenable or not in the implementation.

> If we choose, we can code this trick in the design of record as well.

You mean in the design of array ? I don't think it's wise given that
the shape of the array changes, but perhaps GCs can be modified to add
a relocate-and-unflat operation for arrays.

> So it's not an advantage of the .flat proposal; it's just a trick that is forced
> on the programmer willy nilly.

> Today a variable is a type, and an optional name. If it is a local or a field it
> has a name. If it is inside an array or a (possibly generic) container, there
> is no name.
> I can refactor my variables all day long through named and unnamed
> temps, fields, arrays, and other containers. If some but not all of those
> locations accept my chosen SC, my refactorings have greater friction.

If the variable is typed by a type variable, yes. But we have decided
to have the exact same kind of friction
- with value classes in practice, because the VM is able to unmask the
  value class on stack but not on heap.
- with wildcards (you have to add those pesky ? extends/super).

> It seems like adding new types is less disruptive than adding a whole new
> classification for variables.

Rémi

From maurizio.cimadamore at oracle.com Thu Jul 21 16:06:36 2022
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Thu, 21 Jul 2022 17:06:36 +0100
Subject: The storage hint model
In-Reply-To: <1285174980.13858432.1658410143102.JavaMail.zimbra@u-pem.fr>
References: <303592244.13560433.1658335440278.JavaMail.zimbra@u-pem.fr> <48EC4583-0E8B-4181-8B68-D0479B76E024@oracle.com> <1563054819.13596421.1658347534011.JavaMail.zimbra@u-pem.fr> <709caa37-49f7-e8c3-73e4-9a84a81c53ae@oracle.com> <1285174980.13858432.1658410143102.JavaMail.zimbra@u-pem.fr>
Message-ID: <2eb7a404-4438-4968-845a-4d4d10688ab3@oracle.com>

On 21/07/2022 14:29, forax at univ-mlv.fr wrote:
> for me, using .flat or .box is a separate decision than using a storage hint model vs a type based model.

I'm not sure I'm sold.

Consider this:

```
class Box<X> {
   X x;
}
```

A question users will ask is: under which condition can I expect
`Box` to use flat representation for its `x` field ?

Now, in both cases the answer is - it depends, on both the type
parameter T and what the declaration of `Box::x` looks like.
But if we adopt a .box approach, I think it will be hard for users not
to read this as "T flowing into the Box implementation", because
effectively that's what happens 99% of the time, except if you use the
.box (or .ref) escape hatch. In other words, in a world where .flat is
the default, and .box is the opt in, how is that world different from
what you describe as ".val propagation" ?

I think I know intuitively what you are reaching for - one thing is to
treat .ref/.box as a type modifier (similar to a wildcard), another
thing is to apply only to fields, and maybe array creation. But your
example on ArrayList already veers into method parameters as well. At
what point does it stop becoming a property of the container and starts
being a property of the type? Not saying I have a bullet proof answer,
but this all seems rather fluid to me.

Maurizio

From john.r.rose at oracle.com Thu Jul 21 19:31:00 2022
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 21 Jul 2022 12:31:00 -0700
Subject: The storage hint model
In-Reply-To: <720984774.13872281.1658411676531.JavaMail.zimbra@u-pem.fr>
References: <303592244.13560433.1658335440278.JavaMail.zimbra@u-pem.fr> <48EC4583-0E8B-4181-8B68-D0479B76E024@oracle.com> <1563054819.13596421.1658347534011.JavaMail.zimbra@u-pem.fr> <720984774.13872281.1658411676531.JavaMail.zimbra@u-pem.fr>
Message-ID: <033F52C6-CC69-41E7-89EA-DF2FCA15E73A@oracle.com>

On 21 Jul 2022, at 6:54, forax at univ-mlv.fr wrote:

> ...
>>> I've no idea about the performance of such kind of implementations,
>>> but using T.flat give better control on what is flattenable or not
>>> in the implementation.
>> If we choose, we can code this trick in the design of record as well.
>
> You mean in the design of array ? I don't think it's wise given that
> the shape of the array change, but perhaps GCs can be modified to add
> a relocate and unflat operation to array.

No, I mean in the design of ArrayList (or similar generics) in the way
you sketch.
If we choose we can create shape-shifting List/Map/etc. containers
which ignore the `.val` channel in their type args (or there's none at
all, as you are proposing here), accept nulls, but have a better
internal organization if no nulls are encountered dynamically.

I say "if we choose" because there are reasons not to choose something
that tricky and dynamic. Library designers have more options if `.val`
is in the type channel.

>
>> ...If some but not all of those
>> locations accept my chosen SC, my refactorings have greater friction.
>
> If the variable is typed by a type variable, yes. But we have decided
> to have the exact same kind of friction
> - with value class in practice because the VM is able to unmask the
> value class on stack but not on heap.

Here by "friction" I think you mean hidden costs when heap placement of
a value requires separate buffering of the payload also on heap. Alert
users sometimes care about such subtleties. But (here's the key point)
users who just want the types to connect things up properly can ignore
hidden costs and have a frictionless experience refactoring between
heap and stack. That's a win you don't have when every user (alert or
otherwise) is confronted with SC choices to align with type choices,
during refactoring.

> - with wildcards (you have to add those pesky ? extends/super).

I have a separate proposal to patch that. It's irregular but (again)
visible only to the very alert. A plain value class name `C`, when it
appears as a type argument, should be treated as sugar for
`? extends C.ref`. This is consistent, I claim, with the treatment of
`C` as sugar for `C.ref` in other contexts. It's a little piece of the
magic from Dan Smith's thesis, applied specifically to the needs of
type companions.

From forax at univ-mlv.fr Thu Jul 21 21:15:23 2022
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Thu, 21 Jul 2022 23:15:23 +0200 (CEST)
Subject: The storage hint model
In-Reply-To: <2eb7a404-4438-4968-845a-4d4d10688ab3@oracle.com>
References: <303592244.13560433.1658335440278.JavaMail.zimbra@u-pem.fr> <48EC4583-0E8B-4181-8B68-D0479B76E024@oracle.com> <1563054819.13596421.1658347534011.JavaMail.zimbra@u-pem.fr> <709caa37-49f7-e8c3-73e4-9a84a81c53ae@oracle.com> <1285174980.13858432.1658410143102.JavaMail.zimbra@u-pem.fr> <2eb7a404-4438-4968-845a-4d4d10688ab3@oracle.com>
Message-ID: <1214741175.14087746.1658438123287.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Maurizio Cimadamore"
> To: "Remi Forax"
> Cc: "Brian Goetz" , "valhalla-spec-experts"
> Sent: Thursday, July 21, 2022 6:06:36 PM
> Subject: Re: The storage hint model

> On 21/07/2022 14:29, forax at univ-mlv.fr wrote:
>> for me, using .flat or .box is a separate decision than using a storage hint
>> model vs a type based model.
>
> I'm not sure I'm sold.
>
> Consider this:
>
> ```
> class Box<X> {
>    X x;
> }
> ```
>
> A question users will ask is: under which condition can I expect
> `Box` to use flat representation for its `x` field ?
>
> Now, in both cases the answer is - it depends, on both the type
> parameter T and what the declaration of `Box::x` looks like.
>
> But if we adopt a .box approach, I think it will be hard for users not
> to read this as "T flowing into the Box implementation", because
> effectively that's what happens 99% of the times, except if you use the
> .box (or .ref) escape hatch. I other words, in a world where .flat is
> the default, and .box is the opt in, how is that world different from
> what you describe ".val propagation" ?

The main difference is that ".val" requires two types for one value
class that are similar but not quite the same.
The differences between C and C.val are really hard to understand
because those types are really similar but obey different rules:
- C and C.val have no subtyping relationship, but if you see a type as
  a set, C accepts null while C.val doesn't.
- you have auto-boxing between a C and a C.val
- overloading rules should prefer C.val to C (but maybe not)
- the inference with C.val does not work exactly like the inference
  with C
- an array of C.val is a subtype of an array of C
- List<C> and List<C.val> are incompatible types
- instanceof C.val is not valid
- c.getClass() with c a C.val is typed Class (not C.val).

This is not an exhaustive list, and I'm sure some of the items are
wrong, because even we, the experts, have trouble defining the correct
set of rules.

Basically, it's a mess because we are creating a new kind of type,
C.val, that sometimes works like a primitive (hence boxing,
overloading, type inference) but sometimes works like an object (has
Object as super class, has an ad hoc way to work with wildcards, etc).
And then, as a user, there is the looming question of where to use C vs
C.val, the answer to which will be like a long essay full of particular
cases (like the Angelika Langer FAQ for generics).

Now, the .box world is simpler because .box is a storage hint, not a
type, but it has one nasty property: T means different things depending
on the context. As the type of a parameter or as the type of a field,
it means "may not accept null", but as a local variable it always
accepts null, so it's quite easy to get a weird NPE.

Here is an example

  class Foo<T> {
    T t;  // can be flatten

    void foo(T t) {  // can be flatten
      T other = whatever() ? t : null;  // here T allows null
      this.t = other;  // oops, potential NPE !
    }
  }

This can be mitigated in a language like Kotlin that does null
analysis, or by IDEs (both Eclipse and IntelliJ have null analysis),
but it's still ugly.
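
To make the late-NPE hazard above concrete, here is a minimal sketch in
today's Java, with no Valhalla syntax. It is only an analogy: the class
name LateNullDemo, the keep flag, and the store() helper are made up for
illustration, and the Objects.requireNonNull call stands in (as an
assumption) for whatever null check a flattened field would perform
under the .box model.

```java
import java.util.Objects;

public class LateNullDemo {

    // Hypothetical stand-in for a generic class whose field is "flat"
    // and therefore cannot hold null.
    public static final class Foo<T> {
        private T t;

        public void foo(T t, boolean keep) {
            T other = keep ? t : null;  // the local variable accepts null
            store(other);               // the failure only shows up at the store
        }

        private void store(T value) {
            // Assumption: models the null check done when writing a flat field.
            this.t = Objects.requireNonNull(value, "flat field cannot hold null");
        }

        public T get() {
            return t;
        }
    }

    public static void main(String[] args) {
        Foo<String> foo = new Foo<>();
        foo.foo("hello", true);
        System.out.println(foo.get());
        try {
            foo.foo("hello", false);
        } catch (NullPointerException e) {
            System.out.println("NPE at the field store: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only where the exception is raised: the
assignment to the local succeeds, and the NPE appears later, at the
write to the field, which is the behavior described above.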
>
> I think I know intuitively what you are reaching for - one thing is to
> treat .ref/.box as a type modifier (similar to a wildcard), another
> thing is to apply only to fields, and maybe array creation. But your
> example on ArrayList already veers into method parameters as well. At
> what point does it stop becoming a property of the container and starts
> being a property of the type? Not saying I have a bullet proof answer,
> but this all seems rather fluid to me.

You can define storage hints at only 4 different locations:
- on a field type
- on an array type at creation
- on a field array type
- on a parameter type

The first two are really for storage. The third one is nice because it
saves the VM from asking at runtime whether an array allows null or
not, so it's an optimization. The last one is needed to avoid boxing
when crossing an inlining "domain"; again, it can be seen as an
optimization.

>
> Maurizio

Rémi

From robbepincket at live.be Thu Jul 21 21:35:13 2022
From: robbepincket at live.be (Robbe Pincket)
Date: Thu, 21 Jul 2022 21:35:13 +0000
Subject: The storage hint model
In-Reply-To: <1214741175.14087746.1658438123287.JavaMail.zimbra@u-pem.fr>
References: <303592244.13560433.1658335440278.JavaMail.zimbra@u-pem.fr> <48EC4583-0E8B-4181-8B68-D0479B76E024@oracle.com> <1563054819.13596421.1658347534011.JavaMail.zimbra@u-pem.fr> <709caa37-49f7-e8c3-73e4-9a84a81c53ae@oracle.com> <1285174980.13858432.1658410143102.JavaMail.zimbra@u-pem.fr> <2eb7a404-4438-4968-845a-4d4d10688ab3@oracle.com> <1214741175.14087746.1658438123287.JavaMail.zimbra@u-pem.fr>
Message-ID:

On Thu Jul 21 21:15:23 UTC 2022, Rémi wrote:
> Differences between C and C.val is really hard to understand because those types are really similar but obey to different rules
> - C and C.val have no subtyping relationship but if you see a type as a set, C accept null while C.val don't.
> - you have auto-boxing between a C and a C.val
> - overloading rules should prefer C.val to C (but maybe not)
> - the inference with C.val doe not work exactly like the inference with C
> - an array of C.val is a subtype of an array of C
> - List<C> and List<C.val> are incompatible types
> - instanceof C.val is not valid
> - c.getClass() with c a C.val is typed Class (not C.val).

I thought `c.getClass()` wasn't gonna be valid, just like you can't do
`i.getClass()` where `i` is an int.

> and this is not an exhaustive list and I'm sure some of them are wrong because even us, experts, have trouble to define the correct set of rules.

> Basically, it's a mess because we are creating a new kind of types, C.val, that sometimes works like a primitive (hence boxing, overloading, type inference) but sometimes works like an objet (has Object as super class, have an adhoc way to work with wildcards, etc). And then as a user, there is there is the looming question about where to use C vs C.val, which is will be like a long words essay full of particular cases (like the Angelika Langer FAQ for generics).

I'm a bit confused here. `C.val` doesn't extend `Object` afaik. Just
like primitives, `C.val` has a boxed version `C` that does extend
`Object`, much like `int` to `Integer` now.

I'm gonna be honest, I'm very confused with the `.box`/`.flat`
discussions.

Greetings
Robbe

From daniel.smith at oracle.com Thu Jul 21 21:37:20 2022
From: daniel.smith at oracle.com (Dan Smith)
Date: Thu, 21 Jul 2022 21:37:20 +0000
Subject: where are all the objects?
In-Reply-To:
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com>
Message-ID: <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com>

On Jul 17, 2022, at 12:21 PM, Brian Goetz wrote:

Abstractly, my conception of "object" is that it is a bundle of type
state, identity (optionally), and zero or more fields.
I think this most closely corresponds to Kevin's notion of "compound
value", though it might only have zero or one fields, but since there
is type state and potentially identity, it is still a composite.
Historically we touched objects only through references, and we can
still do so for all objects, but we now have the ability that a
variable can store some objects directly.

How we choose to see the formerly-primitive types is mostly a matter of
choosing which fiction we prefer. At the VM level, we have I/J/F/D
carriers, which are definitely some sort of primitive value. At the
language level, we can tell ourselves that `int` is an identity-free
direct instance of class Integer, if we like (though we still have to
cut off the turtle-regress when declaring Integer.java).

I think some of us would like to reserve the term "object" for
something that has a header and a storage block in the heap. I'm not in
that camp. The header and storage block is how we reassociate type
state with a (non-flattened) object *reference*. But a field of type
C.val stores its typestate "statically", in the field descriptor.
(Theoretically, a field of type C.ref could too, though we don't
currently make that optimization.) But these are all just
implementation options a JVM has.

In the most idealized version of this world, values are either "bare
objects" (bag of type state and fields) or references to objects. In
the slightly less idealized version, we would call out the legacy
primitives as distinguished bare values.

Relevant to this: we talked about the "object" term in a January EG
meeting; my summary at the time (subject "Terminology bikeshedding
summary"):

"primitive value vs. object

We're trying to make a distinction between primitive values being
"class instances" and calling them "objects", but for many developers,
especially beginners, that sounds like meaningless pedantry.
We might be over-rotating on the subtle differences that make these
entities distinct, rather than acknowledging that, with their fields
and methods, they will be commonly understood to be a kind of object."

From forax at univ-mlv.fr Thu Jul 21 21:45:39 2022
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Thu, 21 Jul 2022 23:45:39 +0200 (CEST)
Subject: The storage hint model
In-Reply-To: <033F52C6-CC69-41E7-89EA-DF2FCA15E73A@oracle.com>
References: <303592244.13560433.1658335440278.JavaMail.zimbra@u-pem.fr> <48EC4583-0E8B-4181-8B68-D0479B76E024@oracle.com> <1563054819.13596421.1658347534011.JavaMail.zimbra@u-pem.fr> <720984774.13872281.1658411676531.JavaMail.zimbra@u-pem.fr> <033F52C6-CC69-41E7-89EA-DF2FCA15E73A@oracle.com>
Message-ID: <461138106.14102476.1658439939141.JavaMail.zimbra@u-pem.fr>

> From: "John Rose"
> To: "Remi Forax"
> Cc: "Brian Goetz" , "valhalla-spec-experts"
> Sent: Thursday, July 21, 2022 9:31:00 PM
> Subject: Re: The storage hint model

> On 21 Jul 2022, at 6:54, forax at univ-mlv.fr wrote:

>> ...
>>>> I've no idea about the performance of such kind of implementations, but
>>>> using T.flat give better control on what is flattenable or not in the
>>>> implementation.
>>> If we choose, we can code this trick in the design of record as well.
>> You mean in the design of array ? I don't think it's wise given that the shape
>> of the array change, but perhaps GCs can be modified to add a relocate and
>> unflat operation to array.

> No, I mean in the design of ArrayList (or similar generics) in the way you
> sketch. If we choose we can create shape-shifting List/Map/etc. containers
> which ignore the .val channel in their type args (or there's none at all, as
> you are proposing here), accept nulls, but have a better internal organization
> if no nulls are encountered dynamically.
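
For readers following along, the "shape-shifting container" idea quoted
above can be approximated in today's Java, without any Valhalla
feature. The sketch below is an illustration under stated assumptions,
not anyone's proposed implementation: the class name
ShapeShiftingIntList and the isFlat() probe are hypothetical, an int[]
plays the role of the flat array, and an Integer[] plays the role of
the nullable (boxed) array that the list falls back to the first time a
null is added.

```java
import java.util.Arrays;

public class ShapeShiftingIntList {
    private int[] flat = new int[16];  // compact storage while no null has been seen
    private Integer[] boxed;           // nullable storage, allocated on the first null
    private int size;

    public boolean add(Integer element) {
        if (element == null && boxed == null) {
            // "Unflat": copy the compact storage into a nullable array.
            boxed = new Integer[flat.length];
            for (int i = 0; i < size; i++) {
                boxed[i] = flat[i];
            }
            flat = null;
        }
        if (boxed != null) {
            if (size == boxed.length) {
                boxed = Arrays.copyOf(boxed, size * 2);
            }
            boxed[size++] = element;
        } else {
            if (size == flat.length) {
                flat = Arrays.copyOf(flat, size * 2);
            }
            flat[size++] = element;  // element is non-null on this path
        }
        return true;
    }

    public Integer get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return boxed != null ? boxed[index] : Integer.valueOf(flat[index]);
    }

    public int size() {
        return size;
    }

    // Hypothetical probe, only here so the representation change is observable.
    public boolean isFlat() {
        return boxed == null;
    }
}
```

The list never goes back to the compact representation once a null has
been stored, which mirrors the one-way "unflat" step in the ArrayList
sketch discussed in this thread.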
I believe you need a T.val which forces to use the flat representation when the type argument of T is a value class but not a .val.

> I say "if we choose" because there are reasons not to choose something that tricky and dynamic. Library designers have more options if .val is in the type channel.

I believe that V8 has this kind of arrays for JavaScript.

>>> …If some but not all of those locations accept my chosen SC, my refactorings have greater friction.

>> If the variable is typed by a type variable, yes. But we have decided to have the exact same kind of friction
>> - with value class in practice because the VM is able to unmask the value class on stack but not on heap.

> Here by "friction" I think you mean hidden costs when heap placement of a value requires separate buffering of the payload also on heap. Alert users sometimes care about such subtleties. But (here's the key point) users who just want the types to connect things up properly can ignore hidden costs and have a frictionless experience refactoring between heap and stack. That's a win you don't have when every user (alert or otherwise) is confronted with SC choices to align with type choices, during refactoring.

>> - with wildcards (you have to add those pesky ? extends/super).

> I have a separate proposal to patch that. It's irregular but (again) visible only to the very alert. A plain value class name C, when it appears as a type argument, should be treated as sugar for ? extends C.ref. This is consistent, I claim, with the treatment of C as sugar for C.ref in other contexts. It's a little piece of the magic from Dan Smith's thesis, applied specifically to the needs of type companions.

Only when testing for subtyping relationship. Otherwise it means you can not add a C into a List of C.
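
The objection about adding elements can be seen with today's wildcards. The following is a minimal sketch that uses `Number` and `Integer` as stand-ins for `C.ref` and `C`; this is an analogy chosen for illustration, not Valhalla syntax, and the names `WildcardDemo` and `sum` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class WildcardDemo {
    /** Reading through `? extends Number` is fine: every element is a Number. */
    public static double sum(List<? extends Number> values) {
        double total = 0;
        for (Number n : values) {
            total += n.doubleValue();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<>(List.of(1, 2, 3));

        // Subtyping works: a List<Integer> is accepted as List<? extends Number>.
        List<? extends Number> view = ints;
        System.out.println(sum(view)); // prints 6.0

        // Writing does not: the element type behind the wildcard is unknown,
        // so the compiler rejects the call below.
        // view.add(Integer.valueOf(4));   // compile-time error
    }
}
```

If a bare `C` type argument were read as `? extends C.ref` in every context, a declaration such as a list of `C` would behave like `view` above, which is why the sugar would have to be limited to subtyping checks.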
For me, this kind of trick is a symptom of the problem of using .val as a type channel: having two types for one value class means that most of the features that use types in Java need to be massaged (inference, overloading, subtyping, etc). Having a feature that requires re-interpretation of the spec everywhere should be a red flag.

I'm not sure that my proposal of using storage hints is better but the cost of using .val as a type channel is scary.

Rémi

From forax at univ-mlv.fr Thu Jul 21 21:55:42 2022
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Thu, 21 Jul 2022 23:55:42 +0200 (CEST)
Subject: The storage hint model
In-Reply-To:
References: <303592244.13560433.1658335440278.JavaMail.zimbra@u-pem.fr>
 <48EC4583-0E8B-4181-8B68-D0479B76E024@oracle.com>
 <1563054819.13596421.1658347534011.JavaMail.zimbra@u-pem.fr>
 <709caa37-49f7-e8c3-73e4-9a84a81c53ae@oracle.com>
 <1285174980.13858432.1658410143102.JavaMail.zimbra@u-pem.fr>
 <2eb7a404-4438-4968-845a-4d4d10688ab3@oracle.com>
 <1214741175.14087746.1658438123287.JavaMail.zimbra@u-pem.fr>
Message-ID: <293912736.14103063.1658440542485.JavaMail.zimbra@u-pem.fr>

> From: "Robbe Pincket"
> To: "Valhalla Expert Group Observers", "Remi Forax"
> Sent: Thursday, July 21, 2022 11:35:13 PM
> Subject: RE: The storage hint model

> On Thu Jul 21 21:15:23 UTC 2022, Rémi wrote:

>> Differences between C and C.val is really hard to understand because those types are really similar but obey to different rules
>> - C and C.val have no subtyping relationship but if you see a type as a set, C accept null while C.val don't.
>> - you have auto-boxing between a C and a C.val
>> - overloading rules should prefer C.val to C (but maybe not)
>> - the inference with C.val does not work exactly like the inference with C
>> - an array of C.val is a subtype of an array of C
>> - List<C> and List<C.val> are incompatible types
>> - instanceof C.val is not valid
>> - c.getClass() with c a C.val is typed Class<C> (not C.val).

> I thought `c.getClass()` wasn't gonna be valid, just like you can't do `i.getClass()` where `i` is an int.

You may be right, I think Kevin is a proponent of that idea.

>> and this is not an exhaustive list and I'm sure some of them are wrong because even us, experts, have trouble to define the correct set of rules.

>> Basically, it's a mess because we are creating a new kind of types, C.val, that sometimes works like a primitive (hence boxing, overloading, type inference) but sometimes works like an object (has Object as super class, have an adhoc way to work with wildcards, etc). And then as a user, there is the looming question about where to use C vs C.val, which will be like a long words essay full of particular cases (like the Angelika Langer FAQ for generics).

> I'm a bit confused here. `C.val` doesn't extend `Object` afaik. Just like primitives `C.val` has a boxed version `C` that does extend `Object`, much like `int` to `Integer` now.

You can call methods on a C.val, so it's an object; it may not directly extend java.lang.Object, like an interface does not extend java.lang.Object, but you can call toString() on it.

> I'm gonna be honest, I'm very confused with the `.box`/`.flat` discussions.

Given that we are in a phase of throwing things and seeing what sticks, don't worry :)

> Greetings
> Robbe

Rémi
From robbepincket at live.be Thu Jul 21 22:00:18 2022
From: robbepincket at live.be (Robbe Pincket)
Date: Thu, 21 Jul 2022 22:00:18 +0000
Subject: Iterable and valhalla
Message-ID:

Hello all

I was recently thinking about cases where the new "value classes"/"primitive classes" (or whatever they'll be called) can be used. One of the common places where I learned C# devs use their structs, back when I used to code in C#, is in iterators. However, this is not a case we can mirror in Java, as our version of these "structs" are immutable.

However there are variants that could still allow these, and I was wondering whether any thought has been given to those (or other variants) yet, and/or whether they are deemed not useful.

One such variation:

```java
class ArrayList<T> implements Iterable2<T> {
    @Override
    public Iterator2<T> iterator2() {
        return new ArrayListCursor<>(this, 0);
    }

    value record ArrayListCursor<T>(ArrayList<T> list, int index) implements Iterator2<T> {
        @Override
        public boolean hasNext() {
            return index < list.size();
        }

        // Variant 1: return the element and the advanced cursor together.
        @Override
        public Tupple2<T, Iterator2<T>> moveNext() {
            if (!this.hasNext()) {
                throw new NoSuchElementException();
            }
            return new Tupple2<>(list.get(index), new ArrayListCursor<>(list, index + 1));
        }

        // or

        // Variant 2: read the element and advance the cursor separately.
        @Override
        public T next() {
            if (!this.hasNext()) {
                throw new NoSuchElementException();
            }
            return list.get(index);
        }

        @Override
        public Iterator2<T> moveNext() {
            if (!this.hasNext()) {
                throw new NoSuchElementException();
            }
            return new ArrayListCursor<>(list, index + 1);
        }
    }
}
```

Greetings
Robbe Pincket
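
For readers who want to try the idea on a current JDK, here is a minimal sketch of the second variant using an ordinary `record` (Java 16+) in place of the proposed `value record`. The names `CursorDemo`, `Cursor`, `ListCursor`, and `sum` are hypothetical, and a plain record is not flattened the way a value class might be; the sketch only shows the shape of an immutable iterator whose "advance" step returns a new cursor.

```java
import java.util.List;
import java.util.NoSuchElementException;

public class CursorDemo {
    /** Hypothetical immutable iterator: advancing returns a new cursor. */
    interface Cursor<T> {
        boolean hasNext();
        T next();             // element at the current position, no mutation
        Cursor<T> moveNext(); // cursor for the following position
    }

    /** Plain record standing in for the proposed `value record`. */
    record ListCursor<T>(List<T> list, int index) implements Cursor<T> {
        @Override
        public boolean hasNext() {
            return index < list.size();
        }

        @Override
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return list.get(index);
        }

        @Override
        public Cursor<T> moveNext() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return new ListCursor<>(list, index + 1);
        }
    }

    /** Sums a list by threading the cursor through the loop variable. */
    public static int sum(List<Integer> values) {
        int total = 0;
        for (Cursor<Integer> c = new ListCursor<>(values, 0); c.hasNext(); c = c.moveNext()) {
            total += c.next();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(List.of(1, 2, 3))); // prints 6
    }
}
```

The loop reassigns `c` on every step, which is the cost of immutability here: each iteration creates a new cursor instead of updating a field in place.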
From john.r.rose at oracle.com Thu Jul 21 23:00:49 2022
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 21 Jul 2022 16:00:49 -0700
Subject: one class, two types, many bikesheds
In-Reply-To:
References: <1ACF189B-D346-4996-AF5B-3977FF02C1F3@oracle.com>
Message-ID: <1A6131AC-A038-4F4A-B47D-BCBC9C607A12@oracle.com>

On 19 Jul 2022, at 10:54, Kevin Bourrillion wrote:

> …My own answer to the question has been
> Understanding classes and types in Java
> (comments welcome).
> I see a class as a way to "configure" and feed behavior into a type (apart from classes *also* serving as bundles of static members). It's a fairly subservient relationship, which feels right to me.

Yes, I like your document (at about a 90% level). It's helpful to mention it here.

I think it's a good observation that static members are properties only of classes while non-static members are primarily properties of types. (All members are in classes to start with but in your document non-statics quickly lift to types in practical usage. There are surely times when members stay in their class. Reflection comes to mind.)

I also like your observation that the principal type of a class can be obtained by asking for the type of `this`. It's clean.

It might be too clean for us, in fact, because the observation is likely to clash with some very different but also reasonable expectations, which would ask that the principal type of a class be the meaning of `C` (unadorned by `.ref` or `.val`), or that the principal type of a class be represented by the mirror which you get when you call `Object::getClass` on an instance.

I think I would like all of the above questions to be answered by `C.ref`, even though there are plenty of times when someone will propose or expect that `C.val`, or something else, is the answer.

(Oh, and please please do not have `Object::getClass` return different values for variables of type `C.ref` and `C.val`; I think I see that suggested from time to time!)
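
As a point of comparison with today's language, the following sketch shows that the mirror returned by `Object::getClass` depends only on the class of the instance, not on the static type of the variable that holds it. It uses `int`/`Integer` and ordinary generics rather than Valhalla syntax, and the names `MirrorDemo` and `sameMirror` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class MirrorDemo {
    /** True when both objects report the same class mirror. */
    public static boolean sameMirror(Object a, Object b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        int primitive = 42;
        Integer boxed = 42;
        // The int is boxed when passed as Object, so both report Integer.class.
        System.out.println(sameMirror(primitive, boxed)); // prints true

        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        // Two different parameterized types, one class mirror: ArrayList.class.
        System.out.println(sameMirror(strings, numbers)); // prints true
    }
}
```

The second comparison is the existing precedent for one class giving rise to many types while sharing a single mirror.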
Your appeal to `this` has a special benefit when `C` is generic: It captures the most general type possible, of the form `C<T>`.

(Except for conditional methods if we ever do those; then `this` might have a conditional bound on a parameter. The root problem is that `T` will probably mean something subtly different in a conditional method, so `C<T>` doesn't mean just one type everywhere in `C`.)

> (Sorry for digression: you could also say one class engenders many array types, though. I think it helps to fully distinguish predefined, user-defined, and composed types. Setting aside value classes temporarily: each class directly defines just one type, which is the type of `this` inside the class itself (the "implicit type", or the "this-type"). That's the all-important type whose member signatures are seen in the class and whose supertypes are seen in the class signature. Other types can be composed out of the defined types: array types, type variables, intersection types I guess, and relevant to us here, all *other* parameterized types beyond the implicit type. That is, imho it's most fruitful to understand those parameterized types as deriving from the implicit type/"this-type", with member signatures and supertypes being calculated from that implicit type via substitution, rather than to see them all as popping directly off of the generic class.)

Yup, when is a related type a true companion, and when is it just a projection? We get to define this, and then we have to live with it.

It's an interesting outcome (of your `this` position) that `C<T>`, out of all the generic instances of `C`, is elevated to principal position, and all other `C<X>` are mere projections of `C<T>`.

(Surely you already considered and rejected the following alternative choice of narrative in your document, which I will state here FTR: The principal type of `C`, when `C` is generic, is its *raw type*.
That is much less useful for speaking about the type of expressions derived from `this`, but it aligns much more closely with the other "questions" I alluded to above: "What is the type denoted by merely the class name `C`?" And "What is the mirror returned from `Object::getClass` when invoked on an instance of class `C`?")

> I see some sense in your argument, but I still can't think of a reason I'd want to see `ClassName.ref` in source code. It seems like that can't add any information.

I mean it can add a certain connotation ("stylized clarity" as I said) to the code. Have you ever written a fully-qualified name where it wasn't necessary? (I have, when I wanted to emphasize where the symbol came from: Such emphasis is connotation not denotation.) Have you ever written `public` on an interface member where you didn't need to? Again, I'd call that choice a matter of stylized clarity.

Depending on how type inference works, `ClassName.ref` vs `ClassName` might affect TI, as `List<ClassName.ref>` vs. `List<ClassName>`. This is, I think, the case with certain drafts of Valhalla-related generics.

I made a sly reference to null-inference. As with type inference, I could imagine designs of NI where `ClassName.ref` vs `ClassName` produces a different inference about null. Suppose there's some way of saying, for `ju.Optional`, that only a dope would make null values of that reference type. Then `Optional.ref` could possibly be a way of saying, "I'm that dope, bear with me."

>> …
>> - Maybe: For any type variable T (in specialized generics?), T.val also names a type.

> If I ever see `T.val` (except maybe the case of `T.val[]`??) I will assume some kind of templating must be going on, since we'll all have learned early on that there is no polymorphic interaction with values. Is that your expectation too?

I'm imagining, at least, some sort of additional "leakage" of ref/val distinctions into the scope of `T`.
We have such leakage already otherwise `T.ref` wouldn't be useful; it happens when a generic API is bound to type arguments and `T` looks like `C.val`. I think the consensus is that the use cases don't support doing the reverse, of allowing `T.val` to mean `C.val` when `T` is `C.ref`, but it's logically possible isn't it? And if so a use case may show up.

Independently, something like what Remi discusses, of flattening to val-type inside a generic bound to a ref-type, could be a use of `T.val`. I think you surmised that: `new T.val[n]` could be an optimistic dynamic buffer, if the actual type `C.val[]` were somehow available at that point. That would require even more "leakage" of information about type arguments beyond the API of a generic and into its method bodies and maybe even field types. Eventually you would use such information to "fill in templates", including flattening fields to `C.val`.

From maurizio.cimadamore at oracle.com Fri Jul 22 10:30:45 2022
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Fri, 22 Jul 2022 11:30:45 +0100
Subject: The storage hint model
In-Reply-To: <1214741175.14087746.1658438123287.JavaMail.zimbra@u-pem.fr>
References: <303592244.13560433.1658335440278.JavaMail.zimbra@u-pem.fr>
 <48EC4583-0E8B-4181-8B68-D0479B76E024@oracle.com>
 <1563054819.13596421.1658347534011.JavaMail.zimbra@u-pem.fr>
 <709caa37-49f7-e8c3-73e4-9a84a81c53ae@oracle.com>
 <1285174980.13858432.1658410143102.JavaMail.zimbra@u-pem.fr>
 <2eb7a404-4438-4968-845a-4d4d10688ab3@oracle.com>
 <1214741175.14087746.1658438123287.JavaMail.zimbra@u-pem.fr>
Message-ID:

> The main difference is that ".val" requires two types for one value class that are similar but not quite the same.
>
> Differences between C and C.val is really hard to understand because those types are really similar but obey to different rules
> - C and C.val have no subtyping relationship but if you see a type as a set, C accept null while C.val don't.
> - you have auto-boxing between a C and a C.val
> - overloading rules should prefer C.val to C (but maybe not)
> - the inference with C.val does not work exactly like the inference with C
> - an array of C.val is a subtype of an array of C
> - List<C> and List<C.val> are incompatible types
> - instanceof C.val is not valid
> - c.getClass() with c a C.val is typed Class<C> (not C.val).
> and this is not an exhaustive list and i'm sure some of them are wrong because even us, experts, have trouble to define the correct set of rules.

Some of these differences are covered in this JEP draft:

https://openjdk.org/jeps/8261529

In particular, search for "parameterized type conversions".

Of course the document talks about conversions between C and C.ref, but the same would apply, I think, to C and C.val.

I get what you are saying: adding a new type (or a new pair of types) is a big thing in terms of user model, especially if they come with complicated rules as to how they should be used, and how they propagate through overload resolution and inference rules.

But, the idea of having pairs of types with an autobox relationship between them is not exactly new (int -> Integer), so we'd also be playing on a tune that developers are familiar with.

In any case, I don't think we can rule out this approach w/o first pulling the string in full and see where it leads. It might be (as you say) that pairs of related types are hard to reason about in user code. Or it might be that we come up with the right mix of type system enhancements that is enough for people not to care 99% of the time.
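
To make the existing `int`/`Integer` precedent concrete, here is a small sketch of how that pair behaves today for several of the items in the quoted list. The class name `BoxingPairDemo` and the `pick` overloads are hypothetical. The rules under discussion for `C` and `C.val` differ in places; for instance, the quoted list has an array of `C.val` as a subtype of an array of `C`, which is not how `int[]` and `Integer[]` relate.

```java
import java.util.ArrayList;
import java.util.List;

public class BoxingPairDemo {
    public static String pick(int x)     { return "int"; }
    public static String pick(Integer x) { return "Integer"; }
    public static String pick(Object x)  { return "Object"; }

    public static void main(String[] args) {
        int i = 1;
        Integer boxed = 1;

        // Overload resolution first looks for a match without boxing,
        // so each argument selects the overload for its own static type.
        System.out.println(pick(i));     // prints int
        System.out.println(pick(boxed)); // prints Integer

        // Assignment converts in both directions.
        Integer up = i;   // boxing
        int down = boxed; // unboxing

        // Only the reference type admits null; unboxing null fails at run time.
        Integer nothing = null;
        boolean threw = false;
        try {
            int bad = nothing;
        } catch (NullPointerException e) {
            threw = true;
        }
        System.out.println(threw); // prints true

        // Type arguments must be reference types: List<int> does not compile.
        List<Integer> list = new ArrayList<>();
        list.add(i); // boxed on the way in

        // Arrays are not converted: an int[] is not an Integer[].
        Object ints = new int[] {1, 2};
        System.out.println(ints instanceof Integer[]); // prints false
    }
}
```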
In other words, it seems to me that we're jumping ahead a little here: yes, we should make sure that whatever we come up with makes sense with universal generics, but it doesn't seem to me that either (a) the universal generic draft dismisses any of your concerns or that (b) we are far enough down that route to draw a conclusion one way or another.

Maurizio

From forax at univ-mlv.fr Fri Jul 22 11:53:12 2022
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Fri, 22 Jul 2022 13:53:12 +0200 (CEST)
Subject: The storage hint model
In-Reply-To:
References: <303592244.13560433.1658335440278.JavaMail.zimbra@u-pem.fr>
 <48EC4583-0E8B-4181-8B68-D0479B76E024@oracle.com>
 <1563054819.13596421.1658347534011.JavaMail.zimbra@u-pem.fr>
 <709caa37-49f7-e8c3-73e4-9a84a81c53ae@oracle.com>
 <1285174980.13858432.1658410143102.JavaMail.zimbra@u-pem.fr>
 <2eb7a404-4438-4968-845a-4d4d10688ab3@oracle.com>
 <1214741175.14087746.1658438123287.JavaMail.zimbra@u-pem.fr>
Message-ID: <344549169.14316654.1658490792729.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Maurizio Cimadamore"
> To: "Remi Forax"
> Cc: "Brian Goetz", "valhalla-spec-experts"
> Sent: Friday, July 22, 2022 12:30:45 PM
> Subject: Re: The storage hint model

>> The main difference is that ".val" requires two types for one value class that are similar but not quite the same.
>>
>> Differences between C and C.val is really hard to understand because those types are really similar but obey to different rules
>> - C and C.val have no subtyping relationship but if you see a type as a set, C accept null while C.val don't.
>> - you have auto-boxing between a C and a C.val
>> - overloading rules should prefer C.val to C (but maybe not)
>> - the inference with C.val does not work exactly like the inference with C
>> - an array of C.val is a subtype of an array of C
>> - List<C> and List<C.val> are incompatible types
>> - instanceof C.val is not valid
>> - c.getClass() with c a C.val is typed Class<C> (not C.val).
>> and this is not an exhaustive list and i'm sure some of them are wrong because even us, experts, have trouble to define the correct set of rules.
>
> Some of these differences are covered in this JEP draft:
>
> https://openjdk.org/jeps/8261529
>
> In particular, search for "parameterized type conversions".
>
> Of course the document talks about conversions between C and C.ref, but the same would apply, I think, to C and C.val.
>
> I get what you are saying: adding a new type (or a new pair of types) is a big thing in terms of user model, especially if they come with complicated rules as to how they should be used, and how they propagate through overload resolution and inference rules.
>
> But, the idea of having pairs of types with an autobox relationship between them is not exactly new (int -> Integer), so we'd also be playing on a tune that developers are familiar with.
>
> In any case, I don't think we can rule out this approach w/o first pulling the string in full and see where it leads. It might be (as you say) that pairs of related types are hard to reason about in user code. Or it might be that we come up with the right mix of type system enhancements that is enough for people not to care 99% of the time.
>
> In other words, it seems to me that we're jumping ahead a little here: yes, we should make sure that whatever we come up with makes sense with universal generics, but it doesn't seem to me that either (a) the universal generic draft dismisses any of your concerns or that (b) we are far enough down that route to draw a conclusion one way or another.

Yes, i don't disagree with your last paragraph but adding autoboxing between C and C.val has a strong starwars (prequels) vibe to me, Anakin (value class) you were the one supposed to rebalance the force, healing the rift between objects and primitive types, not adding more auto-boxing.
I believe we are adding complexity to the spec/compiler in order to manage the relationship between C and C.val where users will be totally fine declaring both of them as separate types and doing the conversions by hand, and in a safer way, not bypassing the constructor when doing a conversion from C.val to C.

>
> Maurizio

Rémi

From kevinb at google.com Fri Jul 22 15:07:39 2022
From: kevinb at google.com (Kevin Bourrillion)
Date: Fri, 22 Jul 2022 08:07:39 -0700
Subject: one class, two types, many bikesheds
In-Reply-To: <1A6131AC-A038-4F4A-B47D-BCBC9C607A12@oracle.com>
References: <1ACF189B-D346-4996-AF5B-3977FF02C1F3@oracle.com>
 <1A6131AC-A038-4F4A-B47D-BCBC9C607A12@oracle.com>
Message-ID:

On Thu, Jul 21, 2022 at 4:01 PM John Rose wrote:

> (Oh, and please please do not have Object::getClass return different values for variables of type C.ref and C.val; I think I see that suggested from time to time!)

(plug: imo there are only 3 Object methods that have any business even *compiling* when called on val receivers.)

> Your appeal to this has a special benefit when C is generic: It captures the most general type possible, of the form C<T>.
>
> (Except for conditional methods if we ever do those; then this might have a conditional bound on a parameter. The root problem is that T will probably mean something subtly different in a conditional method, so C<T> doesn't mean just one type everywhere in C.)

Agreed. I'd guess it would resemble that method introducing its own type parameters, bounded by the class's, and hiding them.

>> (Sorry for digression: you could also say one class engenders many array types, though. I think it helps to fully distinguish predefined, user-defined, and composed types. Setting aside value classes temporarily: each class directly defines just one type, which is the type of `this` inside the class itself (the "implicit type", or the "this-type").
>> That's the all-important type whose member signatures are seen in the class and whose supertypes are seen in the class signature. Other types can be composed out of the defined types: array types, type variables, intersection types I guess, and relevant to us here, all *other* parameterized types beyond the implicit type. That is, imho it's most fruitful to understand those parameterized types as deriving from the implicit type/"this-type", with member signatures and supertypes being calculated from that implicit type via substitution, rather than to see them all as popping directly off of the generic class.)
>
> Yup, when is a related type a true companion, and when is it just a projection? We get to define this, and then we have to live with it.
>
> It's an interesting outcome (of your this position) that C<T>, out of all the generic instances of C, is elevated to principal position, and all other C<X> are mere projections of C<T>.

Yes, in my mind, it is already elevated: it's the type whose supertypes and method signatures are actually being specified directly by the class. That's such an essential role it is playing.

> (Surely you already considered and rejected the following alternative choice of narrative in your document, which I will state here FTR: The principal type of C, when C is generic, is its *raw type*. That is much less useful for speaking about the type of expressions derived from this, but it aligns much more closely with the other "questions" I alluded to above: "What is the type denoted by merely the class name C?" And "What is the mirror returned from Object::getClass when invoked on an instance of class C?")

Gross! :-) I think pretending raw types don't exist (as much as possible) leads to more virtue than vice.

>> I see some sense in your argument, but I still can't think of a reason I'd want to see `ClassName.ref` in source code. It seems like that can't add any information.
>
> I mean it can add a certain connotation ("stylized clarity" as I said) to the code. Have you ever written a fully-qualified name where it wasn't necessary? (I have, when I wanted to emphasize where the symbol came from: Such emphasis is connotation not denotation.) Have you ever written public on an interface member where you didn't need to? Again, I'd call that choice a matter of stylized clarity.

If I do those things, someone else comes along and cleans them up. :-)

> Depending on how type inference works, ClassName.ref vs ClassName might affect TI, as List<ClassName.ref> vs. List<ClassName>. This is, I think, the case with certain drafts of Valhalla-related generics.
>
> I made a sly reference to null-inference. As with type inference, I could imagine designs of NI where ClassName.ref vs ClassName produces a different inference about null. Suppose there's some way of saying, for ju.Optional, that only a dope would make null values of that reference type. Then Optional.ref could possibly be a way of saying, "I'm that dope, bear with me."

fwiw (which isn't much), this all makes me feel uneasy.

>>> - Maybe: For any type variable T (in specialized generics?), T.val also names a type.
>>
>> If I ever see `T.val` (except maybe the case of `T.val[]`??) I will assume some kind of templating must be going on, since we'll all have learned early on that there is no polymorphic interaction with values. Is that your expectation too?
>
> I'm imagining, at least, some sort of additional "leakage" of ref/val distinctions into the scope of T. We have such leakage already otherwise T.ref wouldn't be useful; it happens when a generic API is bound to type arguments and T looks like C.val. I think the consensus is that the use cases don't support doing the reverse, of allowing T.val to mean C.val when T is C.ref, but it's logically possible isn't it? And if so a use case may show up.
Again it just puzzles me, since I expect that part of the whole deal with value types vs. reference types is that you always need to know exactly what type you're working with. So how could I interact generically with `T.val`? Unless templating. The array case seems sensible to me though, because I can figure that the array's header must be keeping track of the precise value type.

> Independently, something like what Remi discusses, of flattening to val-type inside a generic bound to a ref-type, could be a use of T.val. I think you surmised that: new T.val[n] could be an optimistic dynamic buffer, if the actual type C.val[] were somehow available at that point. That would require even more "leakage" of information about type arguments beyond the API of a generic and into its method bodies and maybe even field types. Eventually you would use such information to "fill in templates", including flattening fields to C.val.

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com Fri Jul 22 15:18:23 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 22 Jul 2022 15:18:23 +0000
Subject: where are all the objects?
In-Reply-To: <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com>
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com>
 <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com>
Message-ID: <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com>

FWIW, this was the intuition I was leaning into:

> We're trying to make a distinction between primitive values being "class instances" and calling them "objects", but for many developers, especially beginners, that sounds like meaningless pedantry. We might be over-rotating on the subtle differences that make these entities distinct, rather than acknowledging that, with their fields and methods, they will be commonly understood to be a kind of object."
Making a distinction between object and instance is likely to be confusing and irritating to users. So I was leaning into preserving these common intuitions:

- users declare classes
- classes have instances
- instances are objects
- objects/instances know what class they are

And extending them to "and now, the primitives are objects/instances too" (which I think is an evolution people will like, because it moves us towards "everything is an object").

The new bit, which is the cost of all this, is that for at least id-free objects, there are TWO ways to put them in a variable: put the object itself in a variable, or put a reference to the object in a variable. This is a tricky new concept, but it is consistent with the companion-types model: the value set of C.val are the instances of C; the value set of C.ref are *references to* the instances of C.

From kevinb at google.com Fri Jul 22 16:04:52 2022
From: kevinb at google.com (Kevin Bourrillion)
Date: Fri, 22 Jul 2022 09:04:52 -0700
Subject: where are all the objects?
In-Reply-To: <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com>
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com>
 <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com>
 <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com>
Message-ID:

I do understand where you're coming from, but I feel strongly that we cannot make a judgment about which way is more "meaningless pedantry" than the other without a *complete* picture of all the terms we will use for necessary concepts, and how our explanations come out using those terms.

I also feel strongly that we need to accept that *no matter which* way we go, there will be some terminology clarifications we need to make that will rub users the wrong way, where they used to think "X always means Y", and we start telling them "not anymore". All my work has been toward trying to understand *which* of these breaks are more or less damaging than others and why.
It doesn't work to just cite the disadvantages of one particular shift as self-sufficient argument that we shouldn't do it. It's about pain A vs. pains B/C.

Note that *some* decisions which produce strong initial antipathy in the minds of users will actually become good teachable moments! "Here's why the reaction you had was tied to old assumptions that we are intentionally leaving behind for these good reasons." Even a user who doesn't *agree* with the decision can still hang their *learning* onto this. In fact I think some of the *best* teachable moments will be like that. But not all such breaks we could make will be so teachable; some will yield persistent confusion. (This is part of what I meant by "more/less damaging" just above.)

"My way" (i.e. that I've been advocating thus far) might not be the right way in the end, but I've not gotten as clear of an understanding of the alternative. Brian, I think you declined *twice* on this thread to answer my question of what you would call the thing that I've precisely called "object" (once being off-list (my bad)). John finally volunteered (also off-list (my bad)) "heap object"; okay then, that at least lets us put together a more complete picture that we can evaluate the pros and cons of.

So then, would we call an instance of `Complex.val` a "non-heap object" or an "inlined object" or what? We need to flesh out a whole lexicon. The phrase "value object" becomes useless for this particular distinction as it will apply to both.

Users don't always have a stable and precisely developed vocabulary (I sure didn't until I had to work all this stuff through for this group), but what we do have, deeply, is our years of experience understanding *the difference between int and Integer* and how to make that choice in our code.
(There's a principle in here somewhere that oughtta be stated in some better way than this: users will know the distinctions between concepts if those are the distinctions that have actively driven the *decisions* those users have had to make, and not as much otherwise.)

I think the best thing we can do for users is make it clear and natural for them *what that carries over to*, exactly, in the new world.

I've been advocating for saying: That `int/Integer` decision you've been making has always been between (1) value and (2) (reference-to) object, and that decision is *still* exactly between (1) value and (2) (reference-to) object now, and btw the definitions of 'reference' and 'object' remain precisely wedded to each other as always.

The "heap object" alternative strikes me (and I am *trying* to be fair, here) as: Now, that's an object either way, and you're going to apply that old thought process toward which *kind* of object you mean, either a (1) "inline object" or a (2) "(reference-to) heap object". It's now just heap objects and references that are paired together.

Starting to prefer the first way (as I did) did not feel like going rogue: after all, did we not gravitate toward ".ref" and ".val" as our placeholder syntaxes, not ".inline" and ".heap" or anything else?

On Fri, Jul 22, 2022 at 8:18 AM Brian Goetz wrote:

> FWIW, this was the intuition I was leaning into:
>
> > We're trying to make a distinction between primitive values being "class instances" and calling them "objects", but for many developers, especially beginners, that sounds like meaningless pedantry. We might be over-rotating on the subtle differences that make these entities distinct, rather than acknowledging that, with their fields and methods, they will be commonly understood to be a kind of object."
>
> Making a distinction between object and instance is likely to be confusing and irritating to users.
> So I was leaning into preserving these common intuitions:
>
> - users declare classes
> - classes have instances
> - instances are objects
> - objects/instances know what class they are
>
> And extending them to "and now, the primitives are objects/instances too" (which I think is an evolution people will like, because it moves us towards "everything is an object.").
>
> The new bit, which is the cost of all this, is that for at least id-free objects, there are TWO ways to put them in a variable: put the object itself in a variable, or put a reference to the object in a variable. This is a tricky new concept, but it is consistent with the companion-types model: the value set of C.val are the instances of C; the value set of C.ref are *references to* the instances of C.

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From forax at univ-mlv.fr Fri Jul 22 16:16:51 2022
From: forax at univ-mlv.fr (Remi Forax)
Date: Fri, 22 Jul 2022 18:16:51 +0200 (CEST)
Subject: End of the storage hints model ?
Message-ID: <1405918535.14447689.1658506611832.JavaMail.zimbra@u-pem.fr>

I've found where the storage hints model does not work well.

The storage hints model as its own name says works on implementations, but it does not work on functional interface types.

For example, we have functional interfaces in java.util.function where the same interface, say j.u.f.Function, is sometimes used with the return/parameter values being nullable and sometimes with return/parameter values that we hope to be non-nullable. For example, the function in stream.map(function) wants to be non-nullable but the function in map.compute(function) allows nullable values.

Thus when declaring j.u.f.Function we can not decide if the parameter type / return type should be annotated by the .flat storage hint or not.
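A minimal sketch of the two use sites in today's Java may help here. The class and method names are hypothetical. Note one substitution: `Map.compute` actually takes a `BiFunction`, so this sketch uses `Map.computeIfAbsent`, which does take a `Function` and documents a null result as "record no mapping".

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class FunctionUseSites {
    // Use site 1: results flow into a stream, where we hope they are non-null.
    static List<Integer> lengths(List<String> words, Function<String, Integer> f) {
        return words.stream().map(f).collect(Collectors.toList());
    }

    // Use site 2: Map.computeIfAbsent treats a null result as "record no mapping".
    static Map<String, Integer> index(List<String> words, Function<String, Integer> f) {
        Map<String, Integer> result = new HashMap<>();
        for (String w : words) {
            result.computeIfAbsent(w, f);
        }
        return result;
    }

    public static void main(String[] args) {
        // Both functions have exactly the same static type.
        Function<String, Integer> length = String::length;                  // never returns null
        Function<String, Integer> lengthOrNull =
                s -> s.isEmpty() ? null : s.length();                        // may return null

        System.out.println(lengths(List.of("a", "bb"), length));
        System.out.println(index(List.of("", "abc"), lengthOrNull));
    }
}
```

Both parameters are declared as `Function<String, Integer>`, so nothing at the declaration of `Function` itself can say whether the result should be treated as flat or nullable; only the use site knows.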
Which means that the storage hints model will not provide enough information for the VM to compute the precise calling convention.

So the storage hints model is not dead dead but a model based on the type propagation is more efficient.

Rémi

From kevinb at google.com Fri Jul 22 17:17:44 2022
From: kevinb at google.com (Kevin Bourrillion)
Date: Fri, 22 Jul 2022 10:17:44 -0700
Subject: where are all the objects?
In-Reply-To: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com>
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com>
Message-ID:

Here's a worthy reply that appeared on -observers -- I'll quote it in full and intersperse my own responses (hopefully that's an okay protocol).

On Tue Jul 19 15:28:27 UTC 2022 Robbe Pincket wrote:

> On Wed Jul 13 20:54:43 UTC 2022, John Rose wrote:
>
> > I'm using a more neutral term "realize" instead of "instantiate".
>
> I'm not really a fan of the term, I see why "instantiate" doesn't fit, but I hope stuff will be able to get reworded in such a way that "realize" isn't needed.
>
> > These propositions seem to be all true ("at least" in part):
> > - The result of realizing at least some classes in some types is, in fact, an "object".
> > - The result of realizing at least some classes in some types is, in fact, an "instance" of that class.
> > - The result of realizing at least some classes in a value type is a "value" of that class.
> > - Every variable "has a value".
> > - Every reference, other than `null`, "refers to an object".
> > - Every non-reference variable "contains a value" (as well as having it).
>
> I'm very confused what you mean with your last point. "Every non-reference variable "contains a value" (as well as having it)". A variable is a typed storage location, a variable containing a value or a variable having a value are in my mind synonyms. But if I read your point, it sounds like you disagree and a non-reference (primitive?) variable has and contains 2 (different?) values?
>
> ---
>
> On Thu Jul 14 16:48:00 UTC 2022, Kevin Bourrillion wrote:
>
> > The instances of a value type are "values". The instances of a reference type are "references to objects". [...] But primitive values are instances too. Of primitive types. I think that has always been true (though most of us aren't in the habit of saying it, because they were never *class instances* which is a very useful kind).
>
> Oh god, no no no no no nooo. The instances of a reference type are the objects, not the references to those objects. The references are the values that a variable of a compatible type can hold.

I love your reaction. :-) You're right, I fat-fingered this. What I *actually* think is written out in https://docs.google.com/document/d/1J-a_K87P-R3TscD4uW2Qsbt5BlBR_7uX_BekwJ5BLSE/edit and I believe it agrees with you completely.

(I do worry I'm being insufferable by plugging these documents I wrote over and over on this list, but this experience here provides a good example of why I do it: they are more likely to represent what I actually think than whatever email I clack out one day to the next. Still, the docs have bugs too, which I'd love to hear about as well.)

> And no one calls primitive values instances, is cause they aren't. All mentions of "instances" in Java spec refer to "class instances" or "instances of a/the class" (at least as far I can see).

Because "instance" is usually assumed to mean "instance of a class", then yes, primitive values never have been instances, but interestingly we are going to make them instances.

As "instances of a type", I need to think more about what I am trying to accomplish with that term, as I have myself just confused it with "values of the type".
> > An object                    | A value
> > ---------------------------- | ----------------------------------------
> > has independent existence    | is ephemeral (no existence of its *own*)
> > is self-describing           | is context-described
> > is accessed via reference    | is used directly
> > is eligible to have identity | can't have identity
> > is polymorphic               | is strictly monomorphic
> > has the type `Object`        | does not have the type `Object`
>
> I agree with some of these but I have a few issues:
> * I feel like independent existence and identity are the same thing, how would you be able to differentiate 2 equal objects that don't have identity?

Perfect question. I think both schools of thought are viable:

* Independent existence requires addressability and addressability IS identity. So, all objects (or "heap objects") have identity, but some *hide* it from the user
* It is just cleaner to treat "hiding identity" as not having it (but still having addressability)

I can go into the reasons I like the second at length if necessary. Weird analogy alert, but I can grab and use a "tablespoon of milk" when I need to, even though it clearly has no identity.

> * I think I understand what you mean with "self-describing", if I have an object, I expect to be able to call `.getClass`. As a result an object holds all the info to fully interpret the object. But I'm missing something with "context-described". It feels like you are saying a value needs a context to "exist", but values are ephemeral. They just exist.

Yes, objects are self-describing because they have a dynamic type and, in the array case, a length. The thing itself contains sufficient information to find out how it is laid out in memory.

"Value" does have a Platonic meaning, like "the value 42" which transcends time and space. I'm not invoking that particular meaning. I'm talking about one "occurrence" of that value inside a running program (but still at a conceptual-model level, above things like registers).
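The "self-describing" versus "context-described" contrast can be seen in today's Java with a small sketch. The class and method names here are hypothetical; `Object.getClass`, `Class.isArray`, and `java.lang.reflect.Array.getLength` are the standard APIs.

```java
import java.lang.reflect.Array;

final class SelfDescribing {
    // An object carries enough information to describe itself at run time:
    // its dynamic class and, for arrays, its length.
    static String describe(Object o) {
        Class<?> type = o.getClass();
        if (type.isArray()) {
            return type.getSimpleName() + " of length " + Array.getLength(o);
        }
        return type.getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(describe(new int[3]));
        System.out.println(describe("hello"));

        int n = 42;
        // The int itself has nothing to ask. The compiler uses the static type
        // of n (the context) to box it into an Integer before the call.
        System.out.println(describe(n));
    }
}
```

The last call only works because the surrounding code already knows that `n` is an `int`; that knowledge comes from the variable's declared type, not from the value.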
In carrying out "a = b + 5", something is "told" to the + expression by (a) a variable read and (b) a literal expression, and then it "tells" something else to the assignment expression. Those somethings are values. I would use the word "message" if it wasn't so overloaded in computing already with other meanings. (Some of my words may make values sound like countable "things" but it's unproductive to think that way.)

For every occurrence of a value in a Java program there is always exogenous awareness of its static type.

> In the end, I don't think "object" and "value" are mutually exclusive. Anything that extends Object is an "object", anything that is ephemeral is a "value" and anything that is both is a "value object" (identityless object)

I have also accepted the term "value object" for things like `Complex.ref`, grudgingly, but I still insist that *a value object is not a value*. It's an object that is value-like or value-based. The viewpoint in the linked document does maintain that "value" and "object" are disjoint, and I think that is a splendid thing.

> > I would say that arrays are also instances -- of array types. What they aren't is *class* instances. (So they don't get to have members; `length` and `clone` are at best half-heartedly-simulated members.)
>
> Arrays do have members, all the methods of `Object` are inherited by arrays, `clone` being one of them.

Yes, "arrays don't get to have members" is a simplified description of a weird reality. An array type gets the Object type as its supertype, and inherits the type members thereof. That doesn't explain the *public* `clone()` method or `length` pseudo-field, which both arise out of some invisible magic that reflection won't even cop to.

> ---
>
> John:
> > Even if we give up on making everything an object, I will still request that we cling to *some* word that can uniformly be applied to the result of realizing *any Java class*.
> > If that word is not "object" I think it is "instance".
>
> Kevin:
> > Yes, I think it is absolutely and usefully "instance".
>
> > A tough spot about my model (which I think is unavoidable/acceptable) is that I can't get away with saying "An object is any class instance or array" anymore.
>
> Yeah this a bit of a nuisance. It would be nice to have a term that covers both "values" and "instances", because in my mind, and instance is something that gets instantiated. Ephemeral values don't get instantiated, cause they exist. Which means with my view objects of value classes aren't instances (?? this surprised me, but I can't convince myself otherwise anymore).

I do think "instantiate" often carries a connotation of bringing something into existence that never existed otherwise, but I'm not sure it *should* have that connotation. For example `List` is often called a "type instantiation" of `List`" but nothing was birthed into the world by writing that; it was just sort of obtained from the ether.

From brian.goetz at oracle.com Fri Jul 22 17:35:02 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 22 Jul 2022 17:35:02 +0000
Subject: End of the storage hints model ?
In-Reply-To: <1405918535.14447689.1658506611832.JavaMail.zimbra@u-pem.fr>
References: <1405918535.14447689.1658506611832.JavaMail.zimbra@u-pem.fr>
Message-ID: <656C6779-2001-4319-B57C-94967CFEC3A3@oracle.com>

Yes, and this is true not only for functional interfaces, but for ArrayList as well; we don't know if the user really meant List or List. So the implementation will guess wrong some of the time.

> On Jul 22, 2022, at 12:16 PM, Remi Forax wrote:
>
> I've found where the storage hints model does not work well.
>
> The storage hints model as its own name says works on implementations, but it does not work on functional interface types.
>
> For example, we have functional interfaces in java.util.function where the same interface, say j.u.f.Function, is sometimes used with the return/parameter values being nullable and sometimes with return/parameter values that we hope to be non-nullable. For example, the function in stream.map(function) wants to be non-nullable but the function in map.compute(function) allows nullable values.
>
> Thus when declaring j.u.f.Function we can not decide if the parameter type / return type should be annotated by the .flat storage hint or not.
> Which means that the storage hints model will not provide enough information for the VM to compute the precise calling convention.
>
> So the storage hints model is not dead dead but a model based on the type propagation is more efficient.
>
> Rémi

From brian.goetz at oracle.com Fri Jul 22 17:55:55 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 22 Jul 2022 17:55:55 +0000
Subject: where are all the objects?
In-Reply-To:
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com> <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com> <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com>
Message-ID: <73702885-441E-4552-9100-2806257405CD@oracle.com>

> I also feel strongly that we need to accept that *no matter which* way we go, there will be some terminology clarifications we need to make that will rub users the wrong way, where they used to think "X always means Y", and we start telling them "not anymore". All my work has been toward trying to understand *which* of these breaks are more or less damaging than others and why. It doesn't work to just cite the disadvantages of one particular shift as self-sufficient argument that we shouldn't do it. It's about pain A vs. pains B/C.

Agreed.

> Note that *some* decisions which produce strong initial antipathy in the minds of users will actually become good teachable moments!
> "Here's why the reaction you had was tied to old assumptions that we are intentionally leaving behind for these good reasons." Even a user who doesn't agree with the decision can still hang their learning onto this. In fact I think some of the *best* teachable moments will be like that. But not all such breaks we could make will be so teachable; some will yield persistent confusion. (This is part of what I meant by "more/less damaging" just above.)

Yes, sometimes! And sometimes you are just swimming against the tide. And sometimes it's hard to know which is which :(

> "My way" (i.e. that I've been advocating thus far) might not be the right way in the end, but I've not gotten as clear of an understanding of the alternative. Brian, I think you declined *twice* on this thread to answer my question of what you would call the thing that I've precisely called "object" (once being off-list (my bad)). John finally volunteered (also off-list (my bad)) "heap object"; okay then, that at least lets us put together a more complete picture that we can evaluate the pros and cons of.

Sorry, I think I misunderstood, and answered "what do you call an object". So let's circle back: give me your object definition again so I don't mis-answer again, and I'll try to come up with a name for it?

> So then, would we call an instance of `Complex.val` a "non-heap object" or an "inlined object" or what? We need to flesh out a whole lexicon. The phrase "value object" becomes useless for this particular distinction as it will apply to both.

Yes, in the taxonomy I'm pushing, a "value object" is one without identity, and is the kind of object you can store directly in variables without going through a reference. But I don't think that there are instances of Complex.val and instances of Complex.ref; I think there are instances of *Complex*, and multiple ways to describe/store/access them.
> Users don't always have a stable and precisely developed vocabulary (I sure didn't until I had to work all this stuff through for this group), but what we do have, deeply, is our years of experience understanding the difference between int and Integer and how to make that choice in our code.

Right. Though, the difference we historically have (Integer is an identity class that happens to contain a sole int field) is full of accidental detail (accidental identity, ad-hoc boxing and unboxing conversions with magic methods like valueOf, etc) and what we might want to describe this difference as. The mental model I've had for some time, and which is where .val and .ref come from, is that there are abstract things called Complex, and you can pass/store a Complex by value, or by reference. The value/reference distinction has meaning from before the days of Java. We papered over the difference historically by not letting you touch the objects, but always requiring an intermediary (a reference), and *that* is what is changing now.

> (There's a principle in here somewhere that oughtta be stated in some better way than this: users will know the distinctions between concepts if those are the distinctions that have actively driven the decisions those users have had to make, and not as much otherwise.)

See previous!

> That `int/Integer` decision you've been making has always been between (1) value and (2) (reference-to) object, and that decision is still exactly between (1) value and (2) (reference-to) object now, and btw the definitions of 'reference' and 'object' remain precisely wedded to each other as always.
>
> The "heap object" alternative strikes me (and I am trying to be fair, here) as:
>
> Now, that's an object either way, and you're going to apply that old thought process toward which *kind* of object you mean, either a (1) "inline object" or a (2) "(reference-to) heap object". It's now just heap objects and references that are paired together.
> Starting to prefer the first way (as I did) did not feel like going rogue: after all, did we not gravitate toward ".ref" and ".val" as our placeholder syntaxes, not ".inline" and ".heap" or anything else?

With you on this. I think asking users to reason about "heap objects" vs "inline objects" is pushing them towards the implementation, not the concepts. They may have to reason about this to understand the performance model, but that's already advanced material.

> On Fri, Jul 22, 2022 at 8:18 AM Brian Goetz wrote:
>
> FWIW, this was the intuition I was leaning into:
>
> > We're trying to make a distinction between primitive values being "class instances" and calling them "objects", but for many developers, especially beginners, that sounds like meaningless pedantry. We might be over-rotating on the subtle differences that make these entities distinct, rather than acknowledging that, with their fields and methods, they will be commonly understood to be a kind of object.
>
> Making a distinction between object and instance is likely to be confusing and irritating to users. So I was leaning into preserving these common intuitions:
>
> - users declare classes
> - classes have instances
> - instances are objects
> - objects/instances know what class they are
>
> And extending them to "and now, the primitives are objects/instances too" (which I think is an evolution people will like, because it moves us towards "everything is an object.").
>
> The new bit, which is the cost of all this, is that for at least id-free objects, there are TWO ways to put them in a variable: put the object itself in a variable, or put a reference to the object in a variable. This is a tricky new concept, but it is consistent with the companion-types model: the value set of C.val are the instances of C; the value set of C.ref are *references to* the instances of C.
>
> --
> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From john.r.rose at oracle.com Fri Jul 22 19:02:23 2022
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 22 Jul 2022 12:02:23 -0700
Subject: where are all the objects?
In-Reply-To: <73702885-441E-4552-9100-2806257405CD@oracle.com>
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com> <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com> <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com> <73702885-441E-4552-9100-2806257405CD@oracle.com>
Message-ID: <8F9B793E-5574-4E95-AAA2-0E9B18A70769@oracle.com>

On 22 Jul 2022, at 10:55, Brian Goetz wrote:

> ...
>> So then, would we call an instance of `Complex.val` a "non-heap object" or an "inlined object" or what? We need to flesh out a whole lexicon. The phrase "value object" becomes useless for this particular distinction as it will apply to both.
>
> Yes, in the taxonomy I'm pushing, a "value object" is one without identity, and is the kind of object you can store directly in variables without going through a reference. But I don't think that there are instances of Complex.val and instances of Complex.ref; I think there are instances of *Complex*, and multiple ways to describe/store/access them.

FTR, I enthusiastically agree with this viewpoint, even though I am also probing for weaknesses and alternatives. (FTR I feel the same about Brian's summary in his previous short message.)

And under this viewpoint, the terms "instance" and "object" have the same denotation, though different connotations. (When I say "instance" you may well think, "instance of what?" But you don't ask that question so much if I say "object".)

>>> That `int/Integer` decision you've been making has always been between (1) value and (2) (reference-to) object, and that decision is still exactly between (1) value and (2) (reference-to) object now, and btw the definitions of 'reference' and 'object' remain precisely wedded to each other as always.
>>
>> The "heap object" alternative strikes me (and I am trying to be fair, here) as:
>>
>>> Now, that's an object either way, and you're going to apply that old thought process toward which *kind* of object you mean, either a (1) "inline object" or a (2) "(reference-to) heap object". It's now just heap objects and references that are paired together.

I think, Kevin, you are going wrong at this point: It's not a *kind* of object, it is a *placement* of an object. What "kind" of person am I when I am driving to the office? Surely the same "kind" as when I am at home. But when I am driving, I am equipped with a car and a road, much like a heap-placed object is equipped with a header and references.

Likewise, an int/Integer is (in Valhalla) the same "kind" of object (if we go all the way to making primitives be honorary objects) whether it is placed in heap or on stack or inside another object. The distinction that comes from the choice of equipping an int with a header in heap storage is a distinction of placement (and corresponding representation). So an int/Integer does not intrinsically have a header because it is an object (because of its "kind"). It *may* have a header if the JVM needs to give it one, because it is stuck in the heap.

(My points about int/Integer could partly fail if we fail to align int and Integer in the end. So transfer the argument to C.val/C.ref if you prefer. It is the same argument.)

And I would say the *placement* of an object is in three broad cases which are worth teaching even to beginners:

- "in the heap": therefore referred to by a machine word address, and presumably equipped with a header and maybe surrounded by some alignment waste; a JVM might have multiple heaps but at this level of discourse we say "the heap"
- "on the stack": therefore manipulated directly by its components, which are effectively separated into scalars (it is "scalarized", we sometimes say); we might sometimes wish to say "JVM stack or locals"
instead of "stack", or, with increasing detail, "on stack, in locals, and/or in registers, and/or as immediates in the machine code"
- "contained in another object": in a field or array element, therefore piggy-backing on the other object's placement; and note that even arrays are scalarized sometimes, lifting their elements into registers etc.

To summarize: `Placement = Heap | Stack | Contained[Placement]`. One might use the term "inline" somewhere in there, either to mean `Contained` or `Stack|Contained[*]`.

Static field values are a special case, but they can be classified in one of the above ways. HotSpot places static fields inside a special per-class object (the mirror, in fact), so their values are either contained or separate in the heap (JVM's choice again). One might be pedantic and say that an instance can be contained "in static memory" (neither heap nor stack) if the JVM implements storage for static fields outside of the heap. But in that case I'd rather say that they are in a funny corner of the heap, where perhaps headers are not needed, because some static metadata somewhere dictates what is stored. (Hence I like to be cagey about whether a heap-object actually has a physical header. It might not in some JVM implementations.)

>>
>> Starting to prefer the first way (as I did) did not feel like going rogue: after all, did we not gravitate toward ".ref" and ".val" as our placeholder syntaxes, not ".inline" and ".heap" or anything else?
>
> With you on this. I think asking users to reason about "heap objects" vs "inline objects" is pushing them towards the implementation, not the concepts. They may have to reason about this to understand the performance model, but that's already advanced material.

Yes. And even more specifically in the implementation, users who think about "heap objects" are really (IMO) trying to predict the *placement* of the objects, *where* the JVM will choose to place their bits in physical memory.
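The summary `Placement = Heap | Stack | Contained[Placement]` reads like an algebraic data type, and one way to sketch it is as a sealed hierarchy in Java 17 or later. This is purely a vocabulary aid: the names `PlacementSketch`, `root`, and `describe` are hypothetical, and nothing here corresponds to an actual JVM or language API.

```java
import java.util.Objects;

final class PlacementSketch {
    // Placement = Heap | Stack | Contained[Placement]
    sealed interface Placement permits Heap, Stack, Contained {}

    record Heap() implements Placement {}

    record Stack() implements Placement {}

    // A contained value piggy-backs on the placement of its container.
    record Contained(Placement container) implements Placement {
        Contained {
            Objects.requireNonNull(container);
        }
    }

    // Follows Contained links to the heap or stack placement underneath.
    static Placement root(Placement p) {
        while (p instanceof Contained c) {
            p = c.container();
        }
        return p;
    }

    static String describe(Placement p) {
        if (p instanceof Heap) return "in the heap";
        if (p instanceof Stack) return "on the stack";
        Contained c = (Contained) p;
        return "contained in an object that is " + describe(c.container());
    }

    public static void main(String[] args) {
        Placement field = new Contained(new Heap());   // a field of a heap object
        Placement nested = new Contained(field);       // a field of that field
        System.out.println(describe(nested));
        System.out.println(root(nested));
    }
}
```

Note that the class of the placed object appears nowhere in this type, which matches the point that class and placement are independent properties.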
This question of placement is very interesting to the "alert" performance-minded programmer. Not every programmer is in that state; for me I try to practice "first make it work then make it fast". I get "alert" to performance only in the "make it fast" phase, a phase which many of my codes never reach.

As a sort of "siren song" the question of placement is *also* interesting to the beginning student who is struggling to build a mental image of Java data, and is reaching for visualizations in terms of memory and addresses, or (what is about the same) boxes and arrows. But the JVM will make a hash of all that, if it is doing a good job. So the student must be told to hold those mental models lightly.

Kevin is insisting (for his own good reasons) on his answer to "where are the objects": They are always "in the heap" and thus "with headers, accessed by pointers". I suspect (but haven't seen from Kevin himself yet) that this is in part due to a desire to work with, rather than work against, the student's desire to make simple visual models of Java data.

Crucially, in a literal "boxes and arrows" model, an arrow (perhaps a `C.ref` reference to an instance) looks very different from a nested box (perhaps a `C.val` instance), and the naive user might insist that such differences are part of the contract between the user and the JVM. But they are not. The JVM might introduce invisible "arrows" (because of heap buffering) and it might remove arrows (because of scalarization for a number of possible reasons). So if the student is told that the arrows and boxes are "what's really going on" the student using that assurance to predict performance and footprint will feel cheated in the end.

To summarize: Any given instance/object has logically independent properties of class and placement. And thus: The choice of companion type does not affect class but may (may!) affect placement.
Circling back to the language design, it might seem odd that there are three ways to place an object but just two companion types. But this oddness goes away if you realize that `C.val` and `C.ref` are not placement directives. The choice between the two is a net-binary selection from a sizeable menu of "affordances" that the user might be expecting or disavowing at any given point in the code. (See my lists of "affordances" and "alternative affordances" in [encapsulating-val].) The user is given this simplified switch to influence the JVM's decisions about placement (and therefore representation). It is useful because the JVM can employ different implementation tactics depending on the differences between the user-visible contracts of `C.ref` and of `C.val`. In the choice of implementation tactics, the JVM has the final say.

[encapsulating-val]:

From john.r.rose at oracle.com Fri Jul 22 21:22:44 2022
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 22 Jul 2022 14:22:44 -0700
Subject: where are all the objects?
In-Reply-To:
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com>
Message-ID: <1F569918-897D-4383-B623-73BC45120165@oracle.com>

FWIW I too am glad you are backing away from that use of "instance"!

> As "instances of a type", I need to think more about what I am trying to accomplish with that term, as I have myself just confused it with "values of the type".

We don't want to use already-overloaded terms like "value of a type", "instance of a type", "member of a type", "object of a type" when we are just trying to generically discuss what happens when a "variable of a type" gets filled.

In the interest of painting this teeny bikeshed a bit more, may I suggest, for informal but precise language, more neutral terms like "point within a type" or "individual of a type" or "realization of a type" or something (point, element, member, item) in the "domain of a type".
Also, FTR, I do see some danger in saying a variable "contains" a value, if we are also talking about other kinds of containment relations, such as in my earlier note about placement. Language is hard.

From john.r.rose at oracle.com Fri Jul 22 21:41:26 2022
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 22 Jul 2022 14:41:26 -0700
Subject: where are all the objects?
In-Reply-To:
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com>
Message-ID: <94D2AC6F-C102-4783-B058-D2A741AFC184@oracle.com>

On 22 Jul 2022, at 10:17, Kevin Bourrillion wrote:

>> Yeah this a bit of a nuisance. It would be nice to have a term that covers both "values" and "instances", because in my mind, and instance is something that gets instantiated. Ephemeral values don't get instantiated, cause they exist. Which means with my view objects of value classes aren't instances (?? this surprised me, but I can't convince myself otherwise anymore).

Instantiation is overrated IMO. One instance of the concept "natural number" is 42. Either it was never instantiated, or it was instantiated in some platonic pre-temporal epoch from all eternity (which some optimizers use to emulate some kinds of immutable data!), or it was instantiated multiple times, every time someone reached for the number 42. In the end it doesn't matter when, just what, 42 is.

Now, take a somewhat larger natural number which is the godelization of a very large file that you just created for the first time today. Is that number any older or newer than the *contents* of that file? I would say they are logically equivalent (one being derivable from the other) and therefore whatever age-story we make up for one of them applies quite well to the other (the file content, or the big number that encodes its information content).
Thus we get the oddity that any kind of file contents (such as that nice JPG selfie you just took of yourself and your cat) can be interchangeably viewed as being platonically existent from eternity (but you just discovered it) or else a new thing created by you (and theoretically recreated independently by others with similar faces and cats).

If you hate that idea of discovering rather than making a JPG file, but accept the idea that 42 is obviously a platonic value, then tell me: How big does a number have to be before it loses its eternal character and becomes just another bit of temporary bit-flux in the infosphere? Because then with your help I can understand why 42 is eternal and your selfie is temporal. :-)

This is not idle speculation. In fact, when I talk about an identity-free object in Valhalla, I am talking about something very much like the contents of your JPG file. Yes, there might be 5 copies of it in my JVM, but is it a real event when the JIT makes a 6th copy of it for some reason? Or when the GC collapses them into a single copy (that clever GC)? It is a real event for the hardware, but should we teach Java programmers to care about it, or is it just useless distraction?

The godelization argument shows that this puzzle extends smoothly from tiny things like 8-bit bytes up to large value objects that join together many smaller values.

> I do think "instantiate" often carries a connotation of bringing something into existence that never existed otherwise, but I'm not sure it *should* have that connotation. For example `List` is often called a "type instantiation" of `List`" but nothing was birthed into the world by writing that; it was just sort of obtained from the ether.

That "ether" being a pre-temporal epoch such as I mentioned. And the godelization argument means that, wherever you get (or rediscover) things like `List` from is very much related to the source of natural numbers, whatever *that* is.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kevinb at google.com Fri Jul 22 21:45:14 2022
From: kevinb at google.com (Kevin Bourrillion)
Date: Fri, 22 Jul 2022 14:45:14 -0700
Subject: where are all the objects?
In-Reply-To: <8F9B793E-5574-4E95-AAA2-0E9B18A70769@oracle.com>
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com> <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com> <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com> <73702885-441E-4552-9100-2806257405CD@oracle.com> <8F9B793E-5574-4E95-AAA2-0E9B18A70769@oracle.com>
Message-ID:

The contours of our discussions maybe 9 months ago are finally coming back to me, and I'm suddenly realizing they haven't even changed much. I hadn't properly swapped that context back in and so I guess I've been wasting y'all's time just repeating my "camp's" position louder and more slowly. Sorry!

Now I wonder if these points, at least, might be uncontroversial:

1. There exist useful well-defined concepts of "value" and "object" that are disjoint and that *have been* valid up to now. (I'll hazard a claim that my paper still defends at least *this* much well enough.)
2. Also, you've had to treat the two quite differently from each other in your programs.
3. We *are* changing (improving) #2 through this project.
4. But users may still need #1's disjoint concepts when they are trying to reason about the *performance* model (tho they'll also need to understand that the VM is empowered to "fake" one as the other when the spirit so moves it).
5. The questions at hand in this thread are not foremost about the performance model but about the basic "start-here" user model.
6. These miiight be fair descriptions of the 2 camps?

   A. Because you'll get to program mostly the same way in both cases, we can and should de-emphasize the distinction. There might be a reference sitting in between you and the data/"object" or there might not. It's mostly in the VM's hands.
   If you ever think you care about the distinction, you probably are dipping down into the performance model. There is a "just don't worry about it!" flavor to this option.

   B. It's still helpful to have a solid sense of the distinction, even as we benefit from getting to code the same way to each. Even though the VM might really fake one as the other; again, that's performance model.

Anything controversial about the above?

(If I had to explain why I've been so dogged about B, maybe it's the sense that we simply won't "get away with" A. It feels hard (to me) to tell users simultaneously that they should stop caring about a distinction AND that we're changing up how all kinds of stuff works across that distinction. It feels more solid to firm up the distinction so that we can talk about how things are changing, and then let that distinction just slowly matter less and less over time.)

On Fri, Jul 22, 2022 at 12:02 PM John Rose wrote:

> On 22 Jul 2022, at 10:55, Brian Goetz wrote:
>
> So then, would we call an instance of `Complex.val` a "non-heap object" or an "inlined object" or what? We need to flesh out a whole lexicon. The phrase "value object" becomes useless for this particular distinction as it will apply to both.
>
> Yes, in the taxonomy I'm pushing, a "value object" is one without identity, and is the kind of object you can store directly in variables without going through a reference. But I don't think that there are instances of Complex.val and instances of Complex.ref; I think there are instances of *Complex*, and multiple ways to describe/store/access them.
>
> FTR, I enthusiastically agree with this viewpoint, even though I am also probing for weaknesses and alternatives. (FTR I feel the same about Brian's summary in his previous short message.)
>
> And under this viewpoint, the terms "instance" and "object" have the same denotation, though different connotations. (When I say "instance" you may well think, "instance of what?"
> But you don't ask that question so much if I say "object".)
>
> That `int/Integer` decision you've been making has always been between (1) value and (2) (reference-to) object, and that decision is still exactly between (1) value and (2) (reference-to) object now, and btw the definitions of 'reference' and 'object' remain precisely wedded to each other as always.
>
> The "heap object" alternative strikes me (and I am trying to be fair, here) as:
>
> Now, that's an object either way, and you're going to apply that old thought process toward which *kind* of object you mean, either a (1) "inline object" or a (2) "(reference-to) heap object". It's now just heap objects and references that are paired together.
>
> I think, Kevin, you are going wrong at this point: It's not a *kind* of object, it is a *placement* of an object. What "kind" of person am I when I am driving to the office? Surely the same "kind" as when I am at home. But when I am driving, I am equipped with a car and a road, much like a heap-placed object is equipped with a header and references.
>
> Likewise, an int/Integer is (in Valhalla) the same "kind" of object (if we go all the way to making primitives be honorary objects) whether it is placed in heap or on stack or inside another object.
>
> The distinction that comes from the choice of equipping an int with a header in heap storage is a distinction of placement (and corresponding representation). So an int/Integer does not intrinsically have a header because it is an object (because of its "kind"). It *may* have a header if the JVM needs to give it one, because it is stuck in the heap.
>
> (My points about int/Integer could partly fail if we fail to align int and Integer in the end. So transfer the argument to C.val/C.ref if you prefer. It is the same argument.)
>
> And I would say the *placement* of an object is in three broad cases which are worth teaching even to beginners:
>
> - "in the heap": therefore referred to by a machine word address, and presumably equipped with a header and maybe surrounded by some alignment waste; a JVM might have multiple heaps but at this level of discourse we say "the heap"
> - "on the stack": therefore manipulated directly by its components, which are effectively separated into scalars (it is "scalarized", we sometimes say); we might sometimes wish to say "JVM stack or locals" instead of "stack", or, with increasing detail, "on stack, in locals, and/or in registers, and/or as immediates in the machine code"
> - "contained in another object": in a field or array element, therefore piggy-backing on the other object's placement; and note that even arrays are scalarized sometimes, lifting their elements into registers etc.
>
> To summarize: Placement = Heap | Stack | Contained[Placement].
>
> One might use the term "inline" somewhere in there, either to mean Contained or Stack|Contained[*].
>
> Static field values are a special case, but they can be classified in one of the above ways. HotSpot places static fields inside a special per-class object (the mirror, in fact), so their values are either contained or separate in the heap (JVM's choice again).
>
> One might be pedantic and say that an instance can be contained "in static memory" (neither heap nor stack) if the JVM implements storage for static fields outside of the heap. But in that case I'd rather say that they are in a funny corner of the heap, where perhaps headers are not needed, because some static metadata somewhere dictates what is stored.
>
> (Hence I like to be cagey about whether a heap-object actually has a physical header. It might not in some JVM implementations.)
>
> Starting to prefer the first way (as I did) did not feel like going rogue: after all, did we not gravitate toward ".ref" and ".val" as our placeholder syntaxes, not ".inline" and ".heap" or anything else?
>
> With you on this. I think asking users to reason about "heap objects" vs "inline objects" is pushing them towards the implementation, not the concepts. They may have to reason about this to understand the performance model, but that's already advanced material.
>
> Yes. And even more specifically in the implementation, users who think about "heap objects" are really (IMO) trying to predict the *placement* of the objects, *where* the JVM will choose to place their bits in physical memory.
>
> This question of placement is very interesting to the "alert" performance-minded programmer. Not every programmer is in that state; for me I try to practice "first make it work then make it fast". I get "alert" to performance only in the "make it fast phase", a phase which many of my codes never reach.
>
> As a sort of "siren song" the question of placement is *also* interesting to the beginning student who is struggling to build a mental image of Java data, and is reaching for visualizations in terms of memory and addresses, or (what is about the same) boxes and arrows. But the JVM will make a hash of all that, if it is doing a good job. So the student must be told to hold those mental models lightly.
>
> Kevin is insisting (for his own good reasons) on his answer to "where are the objects": They are always "in the heap" and thus "with headers, accessed by pointers". I suspect (but haven't seen from Kevin himself yet) that this is in part due to a desire to work with, rather than work against, the student's desire to make simple visual models of Java data.
>
> Crucially, in a literal "boxes and arrows"
> model, an arrow (perhaps a C.ref reference to an instance) looks very different from a nested box (perhaps a C.val instance), and the naive user might insist that such differences are part of the contract between the user and the JVM. But they are not. The JVM might introduce invisible "arrows" (because of heap buffering) and it might remove arrows (because of scalarization for a number of possible reasons).
>
> So if the student is told that the arrows and boxes are "what's really going on" the student using that assurance to predict performance and footprint will feel cheated in the end.
>
> To summarize: Any given instance/object has logically independent properties of class and placement.
>
> And thus: The choice of companion type does not affect class but may (may!) affect placement.
>
> Circling back to the language design, it might seem odd that there are three ways to place an object but just two companion types. But this oddness goes away if you realize that C.val and C.ref are not placement directives. The choice between the two is a net-binary selection from a sizeable menu of "affordances" that the user might be expecting or disavowing at any given point in the code. (See my lists of "affordances" and "alternative affordances" in encapsulating-val.)
>
> The user is given this simplified switch to influence the JVM's decisions about placement (and therefore representation). It is useful because the JVM can employ different implementation tactics depending on the differences between the user-visible contracts of C.ref and of C.val. In the choice of implementation tactics, the JVM has the final say.
>

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From brian.goetz at oracle.com Fri Jul 22 23:11:50 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 22 Jul 2022 23:11:50 +0000
Subject: where are all the objects?
In-Reply-To: <94D2AC6F-C102-4783-B058-D2A741AFC184@oracle.com>
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com> <94D2AC6F-C102-4783-B058-D2A741AFC184@oracle.com>
Message-ID:

> Instantiation is overrated IMO.

I like the notion of "summoning". You can summon Foo.default or 42 or "foo" without appeal to instantiation; for an identity class, new Foo() summons a guaranteed-unique entity; for a value class, new Foo() summons a value that you may or may not have seen before.

> I do think "instantiate" often carries a connotation of bringing something into existence that never existed otherwise, but I'm not sure it *should* have that connotation.

Because classes have been identity classes forever, we easily conflate "summon" with "summon new and forever unique instance", which is largely equivalent to "allocate". So I think we have to adjust our notion of "instantiation" or whatever we call it to be more like "get me a value; depending on the classes involved, it might be new, it might be old."

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From brian.goetz at oracle.com Fri Jul 22 23:16:28 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 22 Jul 2022 23:16:28 +0000
Subject: where are all the objects?
In-Reply-To:
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com> <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com> <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com> <73702885-441E-4552-9100-2806257405CD@oracle.com> <8F9B793E-5574-4E95-AAA2-0E9B18A70769@oracle.com>
Message-ID:

> Now I wonder if these points, at least, might be uncontroversial:
>
> 1. There exist useful well-defined concepts of "value" and "object" that are disjoint and that *have been* valid up to now.
> (I'll hazard a claim that my paper still defends at least this much well enough.)
> 2. Also, you've had to treat the two quite differently from each other in your programs.
> 3. We *are* changing (improving) #2 through this project.

I claim we are changing #1 as well, though to a lesser degree. #2 should "mostly go away"; #1 should transform into other terms, such as e.g. "object stored directly" vs "reference to object". It is those other terms that I think we are searching for consensus on, but #1 is moving.

> 4. But users may still need #1's disjoint concepts when they are trying to reason about the *performance* model (tho they'll also need to understand that the VM is empowered to "fake" one as the other when the spirit so moves it).

Yes, though I think these are concepts that are more _derived from_ the distinction in #1. John's notion of "placement" is good here; the choice of ref vs val constrains the placement, and placement informs the performance model. I think part of what has been missing until today is a good attempt to name the intermediate actors, like placement. I hope that if we refine those terms a bit, things will get clearer.

> 5. The questions at hand in this thread are not foremost about the performance model but about the basic "start-here" user model.
> 6. These miiight be fair descriptions of the 2 camps?
>
>    A. Because you'll get to program mostly the same way in both cases, we can and should de-emphasize the distinction. There might be a reference sitting in between you and the data/"object" or there might not. It's mostly in the VM's hands. If you ever think you care about the distinction, you probably are dipping down into the performance model. There is a "just don't worry about it!" flavor to this option.
>
>    B. It's still helpful to have a solid sense of the distinction, even as we benefit from getting to code the same way to each. Even though the VM might really fake one as the other; again, that's performance model.
> Anything controversial about the above?

No, and I want to choose both A and B! I don't think they are opposed, I think they are different angles on the elephant.

> (If I had to explain why I've been so dogged about B, maybe it's the sense that we simply won't "get away with" A. It feels hard (to me) to tell users simultaneously that they should stop caring about a distinction AND that we're changing up how all kinds of stuff works across that distinction. It feels more solid to firm up the distinction so that we can talk about how things are changing, and then let that distinction just slowly matter less and less over time.)

Agree that we need a good "start here" story, but I think a good one will have aspects of A and B. I think we're making progress...

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From robbepincket at live.be Sat Jul 23 11:17:46 2022
From: robbepincket at live.be (Robbe Pincket)
Date: Sat, 23 Jul 2022 11:17:46 +0000
Subject: where are all the objects?
Message-ID:

Hi everyone

I've been watching (with a bit of confusion) what all the discussions were trying to accomplish. Now that I start to grasp some of the issues, I want to bring my thoughts.

My mind was still very much thinking with respect to the 3 buckets system, where stuff like "atomicity" was just a cool flag for the 3rd bucket, but I've come to realize that I might need to leave that behind.

My confusion started when people seemed to combine C.ref and C.val often, where in my mind they were quite different: one is "primitive like" while the other is a wrapper.

On 22 Jul 2022, at 17:55 UTC, Brian Goetz wrote (I think this isn't on the list):

> [...] But I don't think that there are instances of Complex.val and instances of Complex.ref; I think there are instances of *Complex*, and multiple ways to describe/store/access them.

On Fri Jul 22 19:02:23 UTC 2022, John Rose wrote:

> Circling back to the language design, it might seem odd that there are three ways to place an object but just two companion types. But this oddness goes away if you realize that C.val and C.ref are not placement directives. The choice between the two is a net-binary selection from a sizeable menu of "affordances" that the user might be expecting or disavowing at any given point in the code. (See my lists of "affordances" and "alternative affordances" in encapsulating-val.)
>
> The user is given this simplified switch to influence the JVM's decisions about placement (and therefore representation). It is useful because the JVM can employ different implementation tactics depending on the differences between the user-visible contracts of C.ref and of C.val. In the choice of implementation tactics, the JVM has the final say.

These comments and a few others made me realize I shouldn't keep these 2 variants so separate (at least with universal generics in mind).
Things like "by value" and "by reference" come from languages like C, and their use in current Java to explain the difference between "objects" and "primitives" has always rubbed me the wrong way, mostly because there isn't really a difference between "by ref" and "by val" for immutable data after the compiler has done its magic.

Let's go back and think about `long`/`Long` and what their differences are now, and what they will be after universal generics are here:

Now:

1. `Long` implements stuff like `Number`, aka `Long` can extend classes and implement interfaces, and (references to) instances of `Long` can be assigned to variables of those supertypes. Values of `long` don't extend/implement anything.
2. Variables of the type `Long` are nullable, variables of `long` aren't.
3. Assigning stuff to a variable of `Long` is "atomic", meaning there are no unexpected values that appear in the variable. Variables of type `long` (currently) don't make this guarantee.
4. If you want to use generics, you need to use `Long`. You can't have an `ArrayList<long>`.

Let's see how these change:

1. Once I can create an `ArrayList<long>` and I have a method of `<T extends Comparable<T>> void inPlaceQuickSort(List<T> list)` I expect to be able to pass my list into this method, meaning `long` must therefore implement `Comparable`.
2. This doesn't change; note however that now `Optional<long>` is a thing too, and could easily replace most if not all remaining usages of `Long`.
3. Idk what the plans are about making `long` atomic?
4. `ArrayList<long>` is now legal and has a neat property that it can no longer contain `null`. We can even replace usages of `ArrayList<Long>` with `ArrayList<Optional<long>>` to indicate we want the nullability/ability to be absent.

Given that atomicity is the default for `.val`, the main difference between the `C.val` and `C.ref` is nullability, and people will use it as such.
If you give them `T.flat`, people are gonna use it to mark fields that shouldn't be null, maybe even forgetting that they might now create new values when races occur.

While whether a variable of a value type can be null is quite important for the JVM to improve performance, it feels weird that it is indicated by `.val` and `.ref` (although the latter is implicit) or by `.flat` and `.box`.

Greetings
Robbe Pincket

From john.r.rose at oracle.com Sun Jul 3 04:55:58 2022
From: john.r.rose at oracle.com (John Rose)
Date: Sun, 03 Jul 2022 04:55:58 -0000
Subject: Value type companions, encapsulated
Message-ID:

In this message Brian wrote out the major features of an emerging design for value classes:

> From: Brian Goetz
> To: ?
> Subject: Re: User model stacking: current status
> Date: Thu, 23 Jun 2022 15:01:24 -0400

I think controlling the complexity by having a separate nested declaration of the value companion type will work very well.

So what exactly does a private value companion do? What is it you can and cannot do with this type? What problems are prevented by privatizing it? How and when is privatization enforced? What other problems are created by those new rules?

I have been pulling on this thread for a few days now, and I think I have some answers.

http://cr.openjdk.java.net/~jrose/values/encapsulating-val.md
http://cr.openjdk.java.net/~jrose/values/encapsulating-val.html

(The Hitchhiker's Guide suddenly comes to mind. Don't panic!)

I expect I will be editing these files as we go. For reference here is a verbatim copy of the MD file as it stands right now (minus the header):

## Background

_(We will start with background information. The **[new stuff comes afterward]**.
Impatient readers can find a very quick **[summary of restrictions]** at the end.)_

[new stuff comes afterward]: <#privatization-to-the-rescue>
[summary of restrictions]: <#summary-of-restrictions>

### Affordances of `C.ref`

Every class or interface `C` comes with a companion type, the reference type `C.ref` derived from `C` which describes any variable (argument, return value, array element, etc.) whose values are either null or of a concrete class derived from `C`. We are not in the habit of distinguishing `C.ref` from `C`, but the distinction is there. For example, if we call `Object::getClass` on a variable of type `C.ref` we might not get `C.class`; we might even get a null pointer exception!

We are so very used to working with reference types (for short, _ref-types_) that we sometimes forget all that they do for us in addition to their linkage to specific classes:

- `C.ref` gives a starting point for accessing `C`'s members.
- `C.ref` provides abstraction: `C` or a subtype might not be loaded yet.
- `C.ref` provides the standard uninitialized value `null`.
- `C.ref` can link `C` objects into graphs, even circular ones.
- `C.ref` has a known size, one "machine word", carefully tuned by the JVM.
- `C.ref` allows a single large object to be shared from many locations.
- `C.ref` with an identity class can centralize access to mutable state.
- `C.ref` values uniformly convert to and from general types like `Object`.
- `C.ref` variable types can be reflected using `Class` mirror objects.
- `C.ref` is safe for publication if the fields of `C` are `final`.

When I store a bunch of `C` objects into an object array or list, sort it, and then share it with another thread, I am using several of the above properties; if the other thread down-casts the items to `C.ref` and works on them it relies on those properties.
If I implement `C` as a doubly-linked list data structure or (alternatively) a value-based class with tree structure, I am using yet more of the above properties of references. If my `C` object has a lot of state and I pass out many pointers to it, and perhaps compute and cache interesting values in its mutable fields, I am again relying on the special properties of references, as well as of identity classes (if fields are mutable).

By the way, in the JVM, variables of type `C.ref` (some of them at least) are associated not with `C` simply, but with the so-called _L-descriptor_ spelled `LC;`. When we talk about `C.ref` we are usually talking about those L-descriptors in the JVM, as well.

I don't need to think much about this portfolio of properties as I go about my work. But if they were to somehow fail, I would notice bugs in my code sooner or later.

One of the big consequences of this overall design is that I can write a class `C` which has full control over its instance states. If it is mutable, I can make its fields private and ensure that mutations occur only under appropriate locking conditions. Or if I declare it as a value-based class, I can ensure that its constructor only allows legitimate instances to be constructed. Under those conditions, I know that every single instance of my class will have been examined and accepted by the class constructor, and/or whatever factory and mutator methods I have created for it. If I did my job right, not even a race condition can create an invalid state in one of my objects.

Any instance state of `C` which has been reached without being produced from a constructor, factory, mutator, or constant of `C` can be called _non-constructed_. Of course, inside a class any state whatever can be constructed, subject to the types of fields and so on. But the author of the class gets to decide which states are legitimate, and the decisions are enforced by access control at the boundaries of the encapsulation.
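
A minimal sketch of that kind of encapsulation in today's Java may help. The class name `Range` and its invariant (`lo <= hi`) are hypothetical choices made only for this illustration; nothing here is Valhalla-specific. Because the fields are private and final and the only way in is the validating constructor, clients can observe only states that the constructor accepted.

```java
// Hypothetical value-based class: every externally visible state
// has been examined and accepted by the constructor.
final class Range {
    private final int lo;
    private final int hi;

    Range(int lo, int hi) {
        // The class author decides which states are legitimate.
        if (lo > hi) {
            throw new IllegalArgumentException("lo > hi: " + lo + " > " + hi);
        }
        this.lo = lo;
        this.hi = hi;
    }

    int lo() { return lo; }

    int hi() { return hi; }

    /** Never negative, because the invariant lo <= hi always holds. */
    long length() { return (long) hi - lo; }

    public static void main(String[] args) {
        System.out.println(new Range(1, 5).length()); // prints "4"
        try {
            new Range(5, 1); // rejected at the boundary of the encapsulation
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that under this particular invariant the all-zero state `(0, 0)` happens to be legitimate; a class whose invariant excluded it would have more to worry about once instances can appear without passing through the constructor.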
So if I code my class right, using access control to keep bad states away from my clients, my class's external API will have no non-constructed states.

### Costs of `C.ref`

In that case why have value types at all, if references are so powerful? The answer is that reference-based abstraction pays for its benefits with particular costs, costs that Java programmers do not always wish to pay:

- A reference (usually) requires storage for a pointer to the object.
- A reference (usually) requires storage for a header embedded inside the object.
- Access to an object's fields (usually) requires extra cycles to chase the pointer.
- The GC expends effort administering a singular "home location" for every object.
- Cache line invalidation near that home location can cause useless memory traffic.
- A reference must be able to represent `null`; tightly-packed types like `int` and `long` would need to add an extra bit somewhere to cover this.

The major alternative to references, as provided by Valhalla, is flat objects, where object fields are laid out immediately in their containers, in place of a pointer which points to them stored elsewhere. Neither alternative is always better than the other, which is why Java has both `int` and `Integer` types and their arrays, and why Valhalla will offer a corresponding choice for value classes.

### Alternative affordances of `C.val`

Now, instances of a value class can be laid out flat in their containing variables. But they can also be "boxed" in the heap, for classic reference-based access. Therefore, a value class `C` has not one but _two_ companion types associated with it, not only the reference companion `C.ref` but also the value companion `C.val`. Only value classes have value companions, naturally. The companion `C.val` is called a value type (or _val-type_ for short), by contrast with any reference type, whether `Object.ref` or `C.ref`.
The two companion types are closely related and perform some of the same jobs:

- `C.ref` and `C.val` both give a starting point for accessing `C`'s members.
- `C.ref` and `C.val` can link `C` objects into acyclic graphs.
- `C.ref` and `C.val` values uniformly convert to and from general types like `Object`.
- `C.ref` and `C.val` variable types can be reflected using `Class` mirror objects.

For these jobs, it usually doesn't matter which type companion does the work.

Despite the similarities, many properties of a value companion type are subtly different from any reference type:

- `C.val` is non-abstract: You must load its class file before making a variable.
- `C.val` cannot nest except by reference; `C` cannot declare a `C.val` field.
- `C.val` does not represent the value `null`.
- `C.val` is routinely flattenable, avoiding headers and indirection pointers.
- `C.val` has configurable size, depending on `C`'s non-static fields.
- `C.val` heap variables (fields, array elements) are initialized to all-zeroes.
- `C.val` might not be safe for publication (even though its fields are `final`).

The JVM distinguishes `C.val` by giving it a different descriptor, a so-called _Q-descriptor_ of the form `QC;`, and it also provides a so-called _secondary mirror_ `C.val.class` which is similar to the built-in primitive mirrors like `int.class`.

As the Valhalla performance model notes, flattening may be expected but is not fully guaranteed. A `C.val` stored in an `Object` container is likely to be boxed on the heap, for example. But `C.val` objects created as bytecode temporaries, arguments, and return values are likely to be flattened into machine registers, and `C.val` fields and array elements (at least below certain size thresholds) are also likely to be flattened into heap words.

As a special feature, `C.ref` is potentially flattenable if `C` is a value class. There are additional terms and conditions for flattening `C.ref`, however.
If `C` is not yet loaded, nothing can be done: Remember that reference types have full abstraction as one of their powers, and this means building data structures that can refer to them even before they are loaded. But a class file can request that the JVM "peek" at a class to see if it is a value class, and if this request is acted on early enough (at the JVM's discretion), then the JVM can choose to lay out some or all `C.ref` values as flattened `C.val` values _plus_ a boolean or other sentinel value which indicates the `null` state.

### Pitfalls of `C.val`

The advantages of value companion types imply some complementary disadvantages. Hopefully they are rarely significant, but they must sometimes be confronted.

- `C.val` might need to load a class file which is somehow unloadable.
- `C.val` will fail to load if its instance layout directly _or indirectly_ includes a `C.val` field _or subfield_.
- `C.val` will throw an exception if you try to assign a `null` to it.
- `C.val` may have surprising costs for multi-word footprint and assignment (and so might `C.ref` if that is flattened).
- `C.val` is initialized to its all-zero value, which might be non-constructed.
- `C.val` might allow data races on its components, creating values which are non-constructed.

The footprint issue shows up most strongly if you have many copies of the same `C.val` value; each copy will duplicate all the fields, as opposed to many copies of the same `C.ref` reference, which are likely to all point to a single heap location with one copy of all the fields.

Flat value size can also affect methods like `Arrays.sort`, which perform many assignments of the base type, and must move all fields on each assignment. If a `C.val` array has many words per element, then the costs of moving those words around may dominate a sort request.
For array sorting there are ways to reduce such costs transparently, but it is still a "law of physics" that editing a whole data structure will have costs proportional to the size of the edited portions of the data structure, and `C.ref` arrays will often be somewhat more compact than `C.val` arrays. Programmers and library authors will have to use their heads when deciding between the new alternatives given by value classes.

But the last two pitfalls are hardest to deal with, because they both have to do with non-constructed states. These states are the all-zero state with the second-to-last pitfall, and (with the last pitfall) the state obtained by mixing two previous states by means of a pair of racing writes to the same mutable `C.val` variable in the heap. Unlike reference types, value types can be manipulated to create these non-constructed states even in well-designed classes.

Now, it may be that a constructor (or factory) might be perfectly able to create one of the above non-constructed states as well, no strings attached. In that case, the class author is enforcing few or no invariants on the states of the value class. Many numeric classes, like complex numbers, are like this: Initialization to all-zeroes is no problem, and races between components are acceptable, compared to the costs of excluding races.

> (The reader may recall that early JVMs accepted races on the high and low halves of 64-bit integers as well; this is no longer a widespread issue, but bigger value types like complex raise the same issue again, and we need to provide class authors the same solution, if it fits their class.)

There are also some classes for which there are no good defaults, or for which a good default is definitely not the all-zero bit pattern. Authors of such types will often wish to make that bit pattern inaccessible to their clients and provide some factory or constant that gives the real default.
We expect that such types will choose the `C.ref` companion, and rely on the extra null checks to ensure correct initialization.

Other classes may need to avoid other non-constructed values that may arise from data races, perhaps for reasons of reliability or security. This is a subtle trade-off; very few class authors begin by asking themselves about the consequences of data races on mutable members, and even fewer will ask about _races on whole instances_ of value types, especially given that fields in value types are always immutable. For this reason, we will set safety as the default, so that a class (like complex numbers) which is willing to tolerate data races must declare its tolerance explicitly. Only then will the JVM drop the internal costs of race exclusion.

Whether to tolerate the all-zero bit pattern is a simpler decision. Still, it turns out to be useful to give a common single point of declarative control to handle _all_ non-constructed states, both the default value of `C.val` and its mysterious data races.

## Privatization to the rescue

_(Here are the important details about the encapsulation of value types. The impatient reader may enjoy the very quick **[summary of restrictions]** at the end of this document.)_

In order to hide non-constructed states, the value companion `C.val` may be _privatized_ by the author of the class `C`. A privatized value companion is effectively withdrawn from clients and kept private to its own class (and to nestmates). Inside the class, the value companion can be used freely, fully under control of the class author.

But untrusted clients are prevented from building uninitialized fields or arrays of type `C.val`. This prevents such clients from creating (either accidentally or purposefully) non-constructed values of type `C.val`. How privatization is declared and enforced is discussed in the rest of this document.
> (To review, for those who skipped ahead, non-constructed values are those not created under control of the class `C` by constructors or other accessible API points. A non-constructed value may be either an uninitialized variable of `C.val`, or the result of a data race on a shared mutable variable of type `C.val`. The class itself can work internally with such values all day long, but we exclude external access to them by default.)

### Atomicity as well

As a second tactic, a value class `C` may select whether or not the JVM enforces atomicity of all occurrences of its value companion `C.val`.

A non-atomic value companion is subject to data races, and if it is not privatized, external code may misuse `C.val` variables (in arrays or mutable fields) to create non-constructed values via data races.

A value companion which is atomic is not subject to data races. This will be the default if the class `C` does not explicitly request non-atomicity. This gives safety by default and limits non-constructed states to only the all-zero initial value.

The techniques to support this are similar to the techniques for implementing non-tearing of variables which are declared `volatile`; it is as if every variable of an atomic value type has some (not all) of the costs of volatility.

The JVM is likely to flatten such an atomic value only up to the largest available atomically settable memory unit, usually 128 bits. Values larger than that are likely to be boxed, or perhaps treated with some other expensive transactional technique. Containers that are immutable can still be fully flattened, since they are not subject to data races.

The behavior of an atomic `C.val` is aligned with that of `C.ref`. A reference to a value class `C` _never_ admits data races on `C`'s fields. The reason for this is simple: A `C.ref` value is a `C.val` instance boxed on the heap in a single immutable box-class field of type `C.val`.
(Actually, the JVM may partially or wholly flatten the representation of `C.ref` if it can get away with it; full flattening is likely for JVM locals and stack values, but any such secret flattening is undetectable by the user.) Since it is `final` all the way down (to `C`'s fields) any `C.ref` value is safely published without any possibility of data races. Therefore, an extra declaration of non-atomicity in `C` affects only the value companion `C.val`.

It seems that there are use cases which justify all four combinations of both choices (privatization and declared non-atomicity), although it is natural to try to boil down the size of the matrix.

- `C.val` private & atomic is the default and safest configuration, hiding all non-constructed values outside of `C` and all data races even inside of `C`. There are some runtime costs.
- `C.val` public & non-atomic is the opposite, with fewer runtime costs. It must be explicitly declared. It is desirable for numerics like complex numbers, where all possible bitwise states are meaningful. It is analogous to the situation of a naturally non-atomic primitive like `long`.
- `C.val` public & atomic allows everybody to see the all-zero initial value but no other non-constructed states. This is analogous to the situation of a naturally atomic primitive like `int`.
- `C.val` private & non-atomic allows `C` complete control over the visibility of non-constructed states, but `C` also has the ability to work internally on arrays of non-atomic elements. `C` should take care not to leak internally-created flat arrays to untrusted clients, lest they use data races to hammer non-constructed values into those arrays.

It is logically possible, but there does not seem to be a need, for allowing a single class `C` to work with both kinds of arrays, atomic and non-atomic. (In principle, the dynamic typing of Java arrays would support this, as long as each array was configured at its creation.)
The effect of this can be simulated by wrapping a non-atomic class `C` in another wrapper class `WC` which is atomic. Then `C.val[]` arrays are non-atomic and `WC.val[]` arrays are atomic, yet each kind of array can have the same "payload", a repeated sequence of the fields of `C`.

## Privatization in code

For source code and bytecode, privatization is enforced by performing access checks on names.

### Privatization rules in the language

We will stipulate that a value class `C` _always_ has a value companion type `C.val`, even if it is never declared or used. And we give the author of `C` some control over how clients may use the type `C.val`, in a manner roughly similar to nested member classes like `C.M`.

Specifically, the declaration of `C` always selects an access mode for its value companion `C.val` from one of the following three choices:

- `C.val` is declared private
- `C.val` is declared public
- `C.val` is declared, but neither public nor private

If `C.val` is declared private, then only nestmates of `C` may access `C.val`. If it is neither public nor private, only classes in the same runtime package as `C` may access it. If it is declared public, then any class that can access `C` may also access `C.val`.

As an independent choice, the declaration of `C` may select an atomicity for its value companion `C.val` from one of the following two choices:

- `C.val` is explicitly declared non-atomic
- `C.val` is not explicitly declared non-atomic, and is thus atomic

If there is no explicit access declaration for `C.val` in the code of `C`, then `C.val` is declared private and atomic. That is, we set the default to the safest and most restrictive choice.

In source code, these declarations are applied to explicit occurrences of the type name `C.val`.
The access modification of `C.val` is also transferred to the implicitly declared name `C.default`.

The syntax looks like this:

```
class C {
  //only one of the following lines may be specified
  //the first line is the default
  private value companion C.val;  //nestmates only
  value companion C.val;  //package-mates only
  public value companion C.val;  //all may access

  // the non-atomic modifier may be present:
  private non-atomic value companion C.val;
  public non-atomic value companion C.val;
  non-atomic value companion C.val;
}
```

When a type name `C.val` or an expression `C.default` is used by a class `X`, there are two access checks that occur. First, access from `X` to the class `C` is checked according to the usual rules of Java. If access to `C` is permitted, a second check is done if the companion is not declared `public`. If the companion is declared `private`, then `X` and `C` must be nestmates, or else access will fail. If the companion is neither `public` nor `private`, then `X` and `C` must be in the same package, or else access will fail.
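
The two-step check described above can be modeled in ordinary Java as a rough sketch. Everything named here (`CompanionAccessSketch`, `ClassInfo`, `CompanionAccess`, and the two methods) is a hypothetical illustration and not part of any Valhalla or JDK API, and the first check is simplified to "public or same package" rather than the full Java access rules.

```java
// Hypothetical model of the two access checks; none of these names are
// Valhalla or JDK APIs, and class access is deliberately simplified.
final class CompanionAccessSketch {
    enum CompanionAccess { PUBLIC, PACKAGE, PRIVATE }

    /** Minimal stand-in for a loaded class. */
    static final class ClassInfo {
        final String packageName;
        final String nestHost;            // name of the nest host class
        final boolean isPublic;
        final CompanionAccess companion;  // declared access of C.val

        ClassInfo(String packageName, String nestHost,
                  boolean isPublic, CompanionAccess companion) {
            this.packageName = packageName;
            this.nestHost = nestHost;
            this.isPublic = isPublic;
            this.companion = companion;
        }
    }

    /** First check: access from X to C (simplified: public or same package). */
    static boolean canAccessClass(ClassInfo x, ClassInfo c) {
        return c.isPublic || x.packageName.equals(c.packageName);
    }

    /** Second check: applied only if the first check passes. */
    static boolean canAccessCompanion(ClassInfo x, ClassInfo c) {
        if (!canAccessClass(x, c)) return false;
        switch (c.companion) {
            case PUBLIC:  return true;
            case PACKAGE: return x.packageName.equals(c.packageName);
            case PRIVATE: return x.nestHost.equals(c.nestHost);
            default:      throw new AssertionError(c.companion);
        }
    }

    public static void main(String[] args) {
        ClassInfo c = new ClassInfo("p", "p.C", true, CompanionAccess.PRIVATE);
        ClassInfo nestmate = new ClassInfo("p", "p.C", false, CompanionAccess.PRIVATE);
        ClassInfo packageMate = new ClassInfo("p", "p.D", true, CompanionAccess.PRIVATE);
        System.out.println(canAccessCompanion(nestmate, c));     // true
        System.out.println(canAccessCompanion(packageMate, c));  // false
    }
}
```

With the default (private) companion, a package-mate that is not a nestmate can still name `C` but fails the second check, which is the behavior the client class `D` runs into in the next example.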
### Example privatized value companion

Here is an example of a class which refuses to construct its default value, and which prevents clients from seeing that state:

```
class C {
  int neverzero;
  public C(int x) {
    if (x == 0)  throw new IllegalArgumentException();
    neverzero = x;
  }
  public void print() { System.out.println(this); }

  private value companion C.val;  //privatized (also the default)

  // some valid uses of C.val follow:
  public C.val[] flatArray() { return new C.val[]{ this }; }
  private static C.ref nonConstructedZero() {
    return (new C.val[1])[0];  //OK: C.val private but available
  }
  public static C.ref box(C.val val) { return val; }  //OK param type
  public C.val unbox() { return this; }  //OK return type

  // valid use of private C.default, with Lookup negotiation
  public static C.ref defaultValue(java.lang.invoke.MethodHandles.Lookup lookup) {
    if (!lookup.in(C.class).hasFullPrivilegeAccess())
      return null;  //?or throw
    return C.default;  //OK: default for me and maybe also for thee
  }
}

// non-nestmate client:
class D {
  static void passByValue(C x) {
    C.ref ref = C.box(x);  //OK, although x is null-checked
    if (false)  C.box((C.ref) null);  //would throw NPE
    assert ref == x;
  }

  static Object useValue(C x) {
    x.unbox().print();  //OK, invoke method on C.val expression
    var xv = x.unbox();  //OK, although C.val is non-denotable
    xv.print();  //OK
    //> C.val xv = x.unbox();  //ERROR: C.val is private
    return xv;  //OK, originally from legitimate method of C
  }

  static Object arrays(C x) {
    var a = x.flatArray();
    //> C.val[] va = a;  //ERROR: C.val is private
    Arrays.toString(a);  //OK
    C.ref[] a2 = a;  //covariant array assignment
    C.ref[] na = new C.ref[1];
    //> na = new C.val[1];  //ERROR: C.val is private
    return a[0];  //constructed values only
  }
}
```

The above code shows how a privatized value companion can and cannot be used. The type name may never be mentioned.
Apart from that restriction, client code can work with the value companion type as it appears in parameters, return values, local variables, and array elements. In this, a privatized companion behaves like other non-denotable types in Java.

> **Rationale:** Note that a companion type is _not_ a real class. Therefore it cannot appeal, precisely, to the existing provisions (in JLS or JVMS) for enforcing class accessibility. But because it is a type, and today _nearly all types are classes_ (and interfaces), users have a right to expect that encapsulation of companion types will "feel like" encapsulation of type names. More precisely, users will hope to re-use their knowledge about how type name access works when reasoning about companion types. We aim to accommodate that hope. If it works, users won't have to think very often about the class-vs-type distinction. That is why the above design emulates pre-existing usage patterns for non-denotable types.

### Privatization in translation

When a value class is compiled to a class file, some metadata is included to record the explicit declaration or implicit status of the value companion.

The access selection of `C`'s value companion (public, package, private) is encoded in the `value_flags` field of the `ValueClass` attribute of the class information in the class file of `C`.

The `value_flags` field (16 bits) has the following legitimate values:

- zero: `C.val` default access, non-atomic
- `ACC_PUBLIC`: `C.val` public access, non-atomic
- `ACC_PRIVATE`: `C.val` private access, non-atomic
- `ACC_VOLATILE`: `C.val` default access, atomic
- `ACC_VOLATILE|ACC_PUBLIC`: `C.val` public access, atomic
- `ACC_VOLATILE|ACC_PRIVATE`: `C.val` private access, atomic

Other values are rejected when the class file is loaded.

(**JVM ISSUE #0:** Can we kill the `ACC_VALUE` modifier bit? Do we really care that `jlr.Modifiers` kind-of wants to own the reflection of the contextual modifier `value`?
Who are the customers of this modifier bit, as a bit? John doesn't care about it personally, and thinks that if we are going to have an attribute we can get rid of the flag bit. One implementation issue with killing `ACC_VALUE` is that class attributes are processed very late during class loading, while class modifiers are processed very early. It may be easier to do some kinds of structural checks on the fly during class loading even before class attributes are processed. Yet this also seems like a poor reason to use a modifier bit.)

(**JVM ISSUE #1:** What if the attribute is missing; do we reject the class file or do we infer `value_flags=ACC_PRIVATE|ACC_VOLATILE`? Let's just reject the file.)

(**JVM ISSUE #2:** Is this `ValueClass` attribute really a good place to store the "atomic" bit as well? This attribute is a green-field for VM design, as opposed to the brown-field of modifier bits. The above language assumes the atomic bit belongs in there as well.)

A use of a value companion `C.val`, in any source file, is generally translated to a use of a Q-descriptor `QC;`:

- a field declaration of `C.val` translates to a field-info with a Q-descriptor
- a method or constructor declaration that mentions `C.val` mentions a corresponding Q-descriptor in its method descriptor
- a use of a field resolves a `CONSTANT_Fieldref` with a Q-descriptor component
- a use of a method or constructor uses a `CONSTANT_Methodref` (or `CONSTANT_InterfaceMethodref`) with a Q-descriptor component
- a `CONSTANT_Class` entry may contain a Q-descriptor or an array type whose element type is a Q-descriptor
- a verifier type record may refer to `CONSTANT_Class` which contains a Q-descriptor

Privatization is enforced for these uses only as much as is needed to ensure that classes cannot create uninitialized values, fields, and arrays.
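
As a rough illustration of the `value_flags` encoding listed earlier in this section, the six legitimate values can be decoded as follows. The numeric constants are the standard JVMS access-flag values; the class `ValueFlagsSketch`, its method names, and the use of `IllegalArgumentException` as a stand-in for class-file rejection are assumptions made for this sketch, since the `ValueClass` attribute is only a proposal here.

```java
// Hypothetical decoder for the proposed value_flags field; only the three
// flag constants come from the JVMS, everything else is illustrative.
final class ValueFlagsSketch {
    static final int ACC_PUBLIC   = 0x0001;
    static final int ACC_PRIVATE  = 0x0002;
    static final int ACC_VOLATILE = 0x0040;  // reused here to mean "atomic"

    /** True only for the six combinations listed as legitimate. */
    static boolean isLegal(int flags) {
        int known = ACC_PUBLIC | ACC_PRIVATE | ACC_VOLATILE;
        boolean onlyKnownBits = (flags & ~known) == 0;
        boolean publicAndPrivate =
            (flags & (ACC_PUBLIC | ACC_PRIVATE)) == (ACC_PUBLIC | ACC_PRIVATE);
        return onlyKnownBits && !publicAndPrivate;
    }

    /** Describes the access and atomicity of C.val encoded by the flags. */
    static String describe(int flags) {
        if (!isLegal(flags)) {
            // Stand-in for rejecting the class file at load time.
            throw new IllegalArgumentException("illegal value_flags: " + flags);
        }
        String access = (flags & ACC_PUBLIC) != 0 ? "public"
                      : (flags & ACC_PRIVATE) != 0 ? "private"
                      : "default";
        String atomicity = (flags & ACC_VOLATILE) != 0 ? "atomic" : "non-atomic";
        return access + " access, " + atomicity;
    }

    public static void main(String[] args) {
        System.out.println(describe(0));                           // default access, non-atomic
        System.out.println(describe(ACC_VOLATILE | ACC_PRIVATE));  // private access, atomic
    }
}
```

Note that the source-level default (private and atomic) corresponds to `ACC_VOLATILE|ACC_PRIVATE`, not to zero, which is why JVM ISSUE #1 asks what a missing attribute should mean.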
If an access from bytecode to a privatized Q-descriptor fails, an exception is thrown; its type is `IllegalAccessError`, a subtype of `IncompatibleClassChangeError`. Generally speaking such an exception diagnoses an attempt by bytecode to make an access that would have been prevented by the static compiler, if the Java source program had been compiled together as a whole.

When a field of Q-descriptor type is declared in a class file, the descriptor is resolved early, before the class is linked, and that resolution includes an access check which will fail unless the class being loaded has access to `C.val`, as determined by loading `C` and inspecting its `ValueClass` attribute. These checks prevent untrusted clients of `C` from creating non-constructed zero values, in any of their fields.

The timing of these checks, on fields, is aligned with the internal logic of the JVM which consults the class file of `C` to answer other related questions about field types: (a) whether `C` is in fact a value class, and (b) what is the layout of `C.val`, in case the JVM wishes to flatten the value in a containing field. The third check, (c) whether the `C.val` companion is accessible, happens at the same time. This is early during class loading for non-static fields, and during class preparation for static fields.

Privatization is _not enforced_ for non-field Q-descriptors that occur in method and constructor signatures, and in state descriptions for the verifier. This is because mere use of Q-descriptors to describe pre-existing values cannot (by itself) expose non-constructed values, when those values are on stack or in locals.

> This can happen invisibly at the source-code level as well. An API might be designed to return values of a privatized type from its methods or fields, and/or accept values of a privatized type into its methods, constructors, or fields.

In general, the bytecode for a client of such an API will work with a mix of Q-descriptor and L-descriptor values.
The verifier's type system uses field descriptor types, and thus can "see" both Q-descriptors and L-descriptors. Clients of a class with a privatized companion are likely to work mostly with L-descriptor values but may also have Q-descriptor values in locals and on stack.

When feeding an L-descriptor value to an API point that accepts a Q-descriptor, the verifier needs help to keep the types straight. In such cases, the bytecode compiler issues `checkcast` instructions to adjust types to keep the verifier happy, and in this case the operand of the checkcast would be of the form `CONSTANT_Class["QC;"]`.

(**JVM ISSUE #3:** The Q/L distinction in the verifier helps the interpreter avoid extra dynamic null checks around `putfield`, `putstatic`, and the `invoke` instructions. This distinction requires an explicit bytecode to fix up Q/L mismatches; the `checkcast` bytecode serves this purpose. That means checkcast requires the ability to work with privatized types. It requires us to make the dynamic permission check when other bytecodes try to use the privatized type. All this seems acceptable, but we could try to make a different design in which `CONSTANT_Class` resolution fails immediately if it contains an inaccessible Q-descriptor. That design might require a new bytecode which does what `checkcast` does today on a Q-descriptor.)

Meanwhile, arrays are rich sources of non-constructed zero values. They appear in bytecode as follows:

- A `C.val[]` array construction uses `anewarray` with a `CONSTANT_Class` type for the Q-descriptor; this is new to Valhalla.
- Such an array construction may also use `multianewarray` with an appropriate array type.
- An array element is read from heap to stack by `aaload`; the verifier type of the stacked value is copied from the verifier type of the array itself.
- An array element is written from stack to heap by `aastore`; the verifier type of the stored value is merely constrained to the type `Object`.

Note that there are no static type annotations on array access instructions. The practical impact of this is that, if an array of a privatized type `C.val` is passed outside of `C`, then any values in that array become accessible outside of `C`. Moreover, if `C.val` is non-atomic, clients may be able to inflict data races on the array.

Thus, the best point of control over misuse of arrays is their _creation_, not their _access_. Array creation is controlled by `CONSTANT_Class` constant pool entries and their access checking. When an `anewarray` or `multianewarray` tries to create an array, the `CONSTANT_Class` constant pool entry it uses must be consulted to see if the element type is privatized and inaccessible to the current class, and `IllegalAccessError` thrown if that is the case.

All this leads to special rules for resolving an entry of the form `CONSTANT_Class["QC;"]`. When resolving such a constant, the class file for `C` is loaded, and `C` is access checked against the current class. (This is just what happens when `CONSTANT_Class["C"]` gets resolved.) Next, the `ValueClass` attribute for `C` is examined; it must exist, and if it indicates privatization of `C.val`, then access is checked for `C.val` against the current class. If that access to a privatized companion would fail, no exception is thrown, but the constant pool entry is resolved into a special restricted state.

Thus, a resolved constant pool entry of the form `CONSTANT_Class["QC;"]` can have the following states:

- Error, because `C` is inaccessible or doesn't exist or is not a value class.
- Full resolution, so `C.val` is ready for general use in the current class.
- Restricted resolution, so `C.val` is ready for restricted use in the current class.

That last state happens when `C` is accessible but `C.val` is not.
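
The three resolution states, and the way array creation and `checkcast` consult them, can be sketched as a small state model. This is only an illustration under stated assumptions: `QClassResolutionSketch`, `State`, and the method names are hypothetical, and a real JVM would record the state in its constant pool cache rather than compute it from three booleans.

```java
// Hypothetical model of resolving CONSTANT_Class["QC;"]; the names below
// are illustrative and do not correspond to any JVM-internal API.
final class QClassResolutionSketch {
    enum State { ERROR, FULL, RESTRICTED }

    /**
     * @param classAccessible     C exists and is accessible to the current class
     * @param isValueClass        C carries a ValueClass attribute
     * @param companionAccessible the current class may access C.val
     */
    static State resolve(boolean classAccessible, boolean isValueClass,
                         boolean companionAccessible) {
        if (!classAccessible || !isValueClass) return State.ERROR;
        return companionAccessible ? State.FULL : State.RESTRICTED;
    }

    /** Array creation (anewarray, multianewarray) demands full resolution. */
    static void checkArrayCreation(State s) {
        if (s != State.FULL) {
            throw new IllegalAccessError("C.val[] cannot be created here: " + s);
        }
    }

    /** checkcast proceeds in both the full and the restricted state. */
    static boolean checkcastMayProceed(State s) {
        return s != State.ERROR;
    }

    public static void main(String[] args) {
        State s = resolve(true, true, false);  // C accessible, C.val privatized
        System.out.println(s);                       // RESTRICTED
        System.out.println(checkcastMayProceed(s));  // true
        try {
            checkArrayCreation(s);
        } catch (IllegalAccessError e) {
            System.out.println("array creation rejected");
        }
    }
}
```

The point of the restricted state is visible in `main`: the same constant can serve a `checkcast` while still refusing to back an array creation.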
Likewise, a constant pool entry of the form `CONSTANT_Class["[QC;"]` (or a similar form with more leading array brackets) can have three states, error, full resolution, and restricted resolution.

Pre-Valhalla `CONSTANT_Class` entries which do not mention Q-descriptors have only two resolved states, error and full resolution.

As required above, the `checkcast` bytecode treats full resolution and restricted resolution states the same.

But when the `anewarray` or `multianewarray` instruction is executed, it throws an access error if its `CONSTANT_Class` is not fully resolved (either it is an error or is restricted). This is how the JVM prevents creation of arrays whose component type is an inaccessible value companion type, even if the class file does not correspond to correct Java source code.

Here are all the classfile constructs that could refer to a `CONSTANT_Class` constant in the restricted state, and whether they respect it (throwing `IllegalAccessError`):

- `checkcast` ignores the restriction and proceeds
- `instanceof` ignores the restriction (consistent with `checkcast`)
- `anewarray` and `multianewarray` respect the restriction and throw
- `ldc` throws (consistent with `C.val.class` in source code)
- bootstrap arguments throw (consistent with `ldc`)
- verifier types ignore the restriction and continue checking
- **(FIXME: There must be more than this.)**

Q-descriptors not in `CONSTANT_Class` constants are naturally immune to privatization restrictions. In particular, `CONSTANT_MethodType` constants can successfully refer to mirrors to privatized companions.

Uses of `CONSTANT_Class` constants which forbid Q-descriptors and their arrays are also naturally immune, since they will never encounter a constant resolved in the restricted state.
These include `new`, `aconst_init`, the class sub-operands of `CONSTANT_Methodref` and its friends, exception catch-types, and various attributes like `NestHost` and `InnerClasses`: All of the above are allowed to refer only to proper classes, and not to their value companions or arrays.

Nevertheless, an `aconst_init` bytecode must throw an access error when applied to a class with an inaccessible privatized value companion. This is worth noting because the constant pool entry for `aconst_init` does _not_ mention a Q-descriptor, unlike the array construction bytecodes.

> Perhaps regular class constants of the form `CONSTANT_Class["C"]` would also benefit slightly from a restricted state, which would be significant _only_ to the `aconst_init` bytecode, and ignored by all the above "naturally immune" usages. If a JVM implementation takes this option, the same access check would be performed and recorded for both `CONSTANT_Class["C"]` and `CONSTANT_Class["QC;"]`, but would be respected only by `withvalue` (for the former) and `anewarray` and the other cases noted above (for the latter but _not_ the former). On the other hand, the particular issue would become moot if `aconst_init`, like `withfield`, were restricted to the nest of its class, because then privatization would not matter.

The net effect of these rules, so far, is that neither source code nor class files can directly make uninitialized variables of type `C.val`, if the code or class file was not granted access to `C.val` via `C`. Specifically, fields of type `C.val` cannot be declared nor can arrays of type `C.val[]` be constructed.

This includes class files as correctly derived from valid source code or as "spun" by dodgy compilers or even as derived validly from old source code that has changed (and revoked some access).

> Remember that new nestmates can be injected at runtime via the `Lookup` API, which checks access and then loads new code that enjoys the same access.
> The level of access depends in detail on the selection of `ClassOption.NESTMATE` (for nestmate injection) or not (for package-mate injection). The JVM uses common rules for these injected nestmates or package-mates and for normally compiled ones.

There are no restrictions on the use of `C.ref`, beyond the basic access restrictions imposed by the language and JVM on the name `C`. Access checks for regular references to classes and interfaces are unchanged throughout all of the above.

There are more holes to be plugged, however. It will turn out that arrays are once again a problem. But first let's examine how reflection interacts with companion types and access control.

## Privatization and APIs

Beyond the language there are libraries that must take account of the privatization of value companions. We start on the shared boundary between language and libraries, with reflection.

### Reflecting privatization

Every companion type is reflected by a Java class mirror of type `java.lang.Class`. A Java class mirror _also_ represents the class underlying the type. The distinction between the concept of class and companion type is relatively uninteresting, except for a value class `C`, which has two companion types and thus two mirrors.

In Java source code the expression `C.class` obtains the mirror for both `C` and its companion `C.ref`. The expression `C.val.class` obtains the mirror for the value companion, if `C` is a value class. Both expressions check access to `C` as a whole, and `C.val.class` _also_ checks access to the value companion (if it was privatized).

But it is a generally recognized fact that Java class mirrors are less secure than the Java class types that the mirrors represent. It is easy to write code that obtains a mirror on a class `C` without directly mentioning the name `C` in source code. One can use reflective lookup to get such mirrors, and without even trying one may also "stumble upon" mirrors to inaccessible classes and companion types.
Here are some simple examples:

```
Class<?> lookup() throws ClassNotFoundException {
  var name = "java.util.Arrays$ArrayList";
  //or name = "java.lang.AbstractStringBuilder";
  //> java.lang.invoke.MethodHandles.lookup().findClass(name);  //ERROR
  return Class.forName(name);  //OK!
}
Class<?> stumble1() {
  //> return java.util.Arrays.ArrayList.class;  //ERROR
  return java.util.Arrays.asList().getClass();  //OK!
}
Class<?> stumble2() {
  //> return java.lang.AbstractStringBuilder.class;  //ERROR
  return StringBuilder.class.getSuperclass();  //OK!
}
Class<?> stumble3() {
  //> return C.val.class;  //ERROR if C.val is privatized
  return C.ref.class.asValueType();  //OK!
}
```

Therefore, access checking class names is not and cannot be the whole story for protecting classes and their companion types from reflective misuse.

If a mirror is obtained that refers to an inaccessible non-public class or privatized companion, the mirror will "defend itself" against illegal access by checking whether the caller has appropriate permissions. The same goes for method, constructor, and field mirrors derived from the class mirror: You can reflect a method but when you try to call it all of the access checks (including the check against the class) are enforced against you, the caller of the reflective API.

> The checking of the caller has two possible shapes. Either a caller sensitive method looks directly at its caller, or the call is delegated through an API that requires negotiation with a `MethodHandles.Lookup` object that was previously checked against a caller.

Now, if a class `C` is accessible but its value companion `C.val` is privatized, all of `C`'s public methods and other API points are accessible (via both companion types), but access is limited to those very specific operations that could create non-constructed instances (via a variable of companion type `C.val`). And this boils down to a limitation on array creation.
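The way a mirror "defends itself", as described above, can already be observed in today's Java, without any Valhalla features. The following is a minimal sketch, not part of the proposal: the class name `MirrorSelfDefense` and its two helper methods are made up for illustration. It stumbles upon the mirror of the private class `java.util.Arrays$ArrayList` and then tries to call that class's constructor reflectively.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;

// Hypothetical demo class; the names here are illustrative only.
final class MirrorSelfDefense {

    /** Obtains the mirror of a private JDK class without ever naming it. */
    static Class<?> stumble() {
        return Arrays.asList().getClass();
    }

    /**
     * Tries to construct an instance through the mirror.
     * Returns the simple name of the reflective exception, or "none".
     */
    static String tryToConstruct(Class<?> mirror) {
        try {
            // Looking the constructor up is allowed; it takes an Object[] after erasure.
            Constructor<?> ctor = mirror.getDeclaredConstructor(Object[].class);
            // Invoking it is where the access check against the caller happens.
            ctor.newInstance((Object) new Object[0]);
            return "none";
        } catch (ReflectiveOperationException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        Class<?> mirror = stumble();
        System.out.println(mirror.getName());
        System.out.println(tryToConstruct(mirror));
    }
}
```

Obtaining the mirror succeeds, but using it to construct an instance is refused, because the check is made against the caller of the reflective API rather than against whoever obtained the mirror. A privatized `C.val` mirror would presumably be protected by checks of the same kind.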
If you cannot use either source code or reflection to create an array of type `C.val[]`, then you cannot create the conditions necessary to build non-constructed instances. Reflective APIs should be available to report the declared properties of reference companions. It is enough to add the following two methods: - `Class::isNonAtomic` is true only of mirrors of value companions which have been declared non-atomic. On some JVM implementations it *may* additionally be true of `long.class` and/or `double.class`. - `Class::getModifiers`, when applied to a mirror of a value companion, will return a modifier bit-mask that reflects the declared access. (This is compatible with the current behavior of HotSpot for primitive mirrors, which appear as if they were somehow declared `public`, with `abstract` and `final` thrown in to boot.) (Note that most reflective access checking should take care to work with the reference mirror, not the value mirror, as the modifier bits of the two mirrors might differ.) ### Privatization and arrays There are a number of standard API points for creating Java array objects. When they create arrays containing uninitialized elements, then a non-constructed default value can appear. Even when they create properly initialized arrays, if the type is declared non-atomic, then non-constructed values can be created by races. - `java.lang.reflect.Array::newInstance` takes an element mirror and length and builds an array. The elements of the returned array are initialized to the default value of the selected element type. - `java.util.Arrays::copyOf` and `copyOfRange` can extend the length of an existing array to include new uninitialized elements. - A special overloading of `java.util.Arrays::copyOf` can request a different type of the new array copy. - `java.util.Collection::toArray` (an interface method) may extend the length of an existing array, but does not add uninitialized elements. 
- `java.lang.invoke.MethodHandles.arrayConstructor` creates a method handle that creates uninitialized arrays of a given type, as if by the `anewarray` bytecode.
- The serialization API contains an operator for materializing arrays of arbitrary type from the wire format.

The basic policy for all these API points is to conservatively limit the creation of arrays of type `C.val[]` if `C.val` is not public.

- `java.lang.reflect.Array::newInstance` will throw `IllegalArgumentException` if the element type is privatized. (See below for a possible caller-sensitive enhancement.)
- `java.util.Arrays::copyOf` and `copyOfRange` will throw instead of creating uninitialized elements, if the element type is privatized. If only previously existing array elements are copied, there is no check, and this is a common use case (e.g., in `ArrayList::toArray`).
- The special overloading of `java.util.Arrays::copyOf` will refuse to create an array of any non-atomic privatized type. (This refusal protects against non-constructed values arising from data races.) It also incorporates the restrictions of its sibling methods, against creating uninitialized elements (even of an atomic type).
- `java.lang.invoke.MethodHandles.arrayConstructor` will refuse to create a factory method handle if the element type is privatized.
- `java.util.Collection::toArray` needs implementation review; as it is built on top of the previous API points, it may possibly fail if asked to lengthen an array of privatized type. Note that many methods of `toArray` use `Arrays.copyOf` in a safe manner, which does _not_ create uninitialized elements.
- `java.util.stream.Stream::toArray`, the various `List::toArray`, and other clients of `Arrays::copyOf` or `Array::newInstance` need implementation review. Where a generic API is involved, the assumption is often that non-flat reference arrays are being created, and in that case no outage is possible, since reference companion arrays can always be freely created.
For specialized generics with flat types, additional implementation work is required, in general, to ensure that flat arrays can be created by parties with the right to do so.
- The serialization API should restrict its array creation operator. Serialization methods should not attempt to serialize flat arrays either. It is enough to serialize arrays of the reference type.

**API ISSUE #1:** Should we relax construction rules for zero-length arrays? This would add complexity but might be a friendly move for some use cases. A zero-length array cannot expose non-constructed values. It may, however, serve as a misleading "witness" that some code has gained permission to work with flat arrays. It's safer to disallow even zero-length arrays.

**API ISSUE #2:** What about public value companions of non-public inaccessible classes? In source code, we do not allow arrays of private classes to be made, or of their public value companions. Should we be more permissive in this case? We could specify that where a value companion has to be checked against a client, its original class gets checked as well; this would exclude some use cases allowed by the above language, which only takes effect if the companion is privatized. An extra check for a public companion seems like busy-work and a source of unnecessary surprises, though. Let's not.

There are probably legitimate use cases for arrays of privatized types, with which the new restrictions on the above API points would interfere. So as a backup, we will make API adjustments to work with privatized array types, with an extra handshake to perform the access check (via either caller sensitivity or negotiation with an instance of `MethodHandles.Lookup`).

- `java.lang.reflect.Array::newInstance` should probably be made caller sensitive, so it can refrain from throwing if a privatized element type is accessible to the caller. (Alternatively, a new caller-sensitive API point could be made, such as `Array::newFlatInstance`.
But a new API point seems unnecessary in this case, and caller-sensitivity is common practice in this method's package.) Note that, as is typical of core reflection API points, _many uses_ of `newInstance` will not benefit from the caller sensitivity. - `java.util.Arrays::copyOf` and `copyOfRange` may be joined by additional "companion friendly" methods of a similar character which fill new array elements with some other specified fill value, and/or which cyclically replicate the contents of the original array, and/or which call a functional interface to provide missing elements. The details of this are a matter for library designers to decide. Adding caller sensitivity to these API points is probably the wrong move. - `java.lang.invoke.MethodHandles::arrayConstructor` will be joined by a method of the same name on `MethodHandles.Lookup` which performs a companion check before allowing the array constructor method handle to be returned. It will _not check the class_, just the companion. Note that the use of caller sensitivity in the `Lookup` API is concentrated on the factory method `Lookup::lookup`, which is the starting point for `Lookup`-based negotiation. ### Miscellaneous privatization checks Besides newly-created or extended arrays, there are a few API points in `java.lang.invoke` which expose default values of reflectively determined types. Like the array creation methods, they must simply refuse to expose default values of privatized value companions. - `MethodHandles::zero` and `MethodHandles::empty` will simply refuse to produce a result of a privatized `C.val` type. Clients with a legitimate need to produce such default values can use `MethodHandles::filterReturnValue` and/or `MethodHandles::constant` to create equivalent handles, assuming they already possess the default value. - `MethodHandles::explicitCastArguments` will refuse to convert from a nullable reference to a privatized `C.val` type. 
Clients with a legitimate need to convert nulls to privatized values can use conditional combinators to do this "the hard way". - The method `Lookup::accessCompanion` will be defined analogously to `Lookup::accessClass`. If `Lookup::accessClass` is applied to a companion, it will check both the class and the companion, whereas `Lookup::accessCompanion` will look only at the possible privatization of the companion. (Thus it can simply refer to `Reflection::verifyCompanionType`.) To support reflective checks against array elements which may be privatized companion types, an internal method of the form `jdk.internal.reflect.Reflection::verifyCompanionType` may be defined. It will pass any reference type (regardless of class accessibility) and for a value companion it will check access of the companion (but not the class itself). ### Building companion-safe APIs The method `Lookup::arrayConstructor` gives enough of a "hook" to create all kinds of safe but friendly APIs in privileged JDK code. The methods in `java.util` could make use of this privileged API to quickly adapt their internal code to create arrays in cases they are refused by the existing methods `Array.newInstance` and `Arrays.copyOf`. For example, a checked method `MethodHandles.Lookup::defaultValue(C)` may be added to provide the default value `C.default` if its companion `C.val` is accessible. It will operate as if it first creates a one-element array of the desired type, and then loads the element. Or, a caller-sensitive method `Class::defaultValue` or `Class::newArray` could be added which check the caller and return the requested result. All such methods can be built on top of `MethodHandles.Lookup`. In general, a library API may be designed to preserve some aspect of companion safety, as it allows untrusted code to work with arrays of privatized value type, while preventing non-constructed values of that type from being materialized. 
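The "one-element array" formulation of a default-value query, mentioned above for `Lookup::defaultValue`, can be tried out today on ordinary mirrors. The sketch below is only an illustration under assumptions: the class `DefaultViaArray` and its `defaultValue` helper are hypothetical names, and the helper performs no access check at all, which is exactly the gap a checked `Lookup` method would have to close for a privatized `C.val`.

```java
import java.lang.reflect.Array;

// Hypothetical demo class; not the proposed Lookup API.
final class DefaultViaArray {

    /**
     * Returns the default value of the type denoted by the mirror
     * (boxed, if the type is primitive) by creating a one-element
     * array of that type and loading its only element.
     */
    static Object defaultValue(Class<?> type) {
        Object oneElementArray = Array.newInstance(type, 1);
        return Array.get(oneElementArray, 0);
    }

    public static void main(String[] args) {
        System.out.println(defaultValue(int.class));     // the zero of int
        System.out.println(defaultValue(boolean.class)); // the zero of boolean
        System.out.println(defaultValue(String.class));  // null for a reference type
    }
}
```

With today's mirrors the result is a primitive zero or `null`. With a value companion mirror the same two steps would hand out `C.default`, which is why the array creation step is the natural place to enforce privatization.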
Each such safe and friendly API has to make a choice about how to prevent clients from creating non-constructed states, or perhaps how to allow clients to gain privilege to do so. Some points are worth remembering:

- An unprivileged client must not obtain `C.default` if `C.val` is privatized.
- An unprivileged client must not obtain a non-empty `C.val[]` array if `C.val` is privatized and non-atomic.
- It's safe to build new (non-empty, mutable) arrays from (non-empty, mutable) old arrays, if the default is not injected.
- If a new array is somehow frozen or wrapped so as to be effectively immutable, it is safe as long as it does not expose `C.default` values.
- If a value companion is `public`, there is no need for any restriction.
- Also, unrestricted use can be gated by a `Lookup` object or caller sensitivity.

> In the presence of a reconstruction capability, either in the language or in a library API or as provided by a single class, avoiding non-constructable objects includes allowing legitimate reconstruction requests; each legitimate reconstruction request must somehow preserve the intentions of the class's designer. Reconstruction should act as if field values had been legitimately (from `C`'s API) extracted, transformed, and then again legitimately (to `C`'s API) rebuilt into an instance of `C`. Serialization is an example of reconstruction, since field values can be edited in the wire format. Proposed `with` expressions for records are another example of reconstruction. The `withfield` bytecode is the primitive reconstruction operator, and must be restricted to nestmates of `C` since it can perform all physically possible field updates. Reconstruction operations defined outside of `C` must be designed with great care if they use elevated privileges beyond what `C` provides directly.

## Summary of user model

A value class `C` has a value companion `C.val` which denotes the null-hostile (zero-initialized) fully flattenable value type for `C`.
Like other type members of `C`, `C.val` can be declared with an access modifier (`public` or `private` or neither). It is therefore quite possible that clients of `C` might be prevented from using the companion type.

The operations on `C.val` are almost the same as the operations on plain `C` (`C.ref`), so a private `C.val` is usually not a burden. Operations which are unique to `C.val`, and which therefore may be restricted to you, are:

- declaring a field of type `C.val`
- making an array with element type `C.val`
- getting the default flat value `C.default`
- asking for the mirror `C.val.class`

Library routines which create empty flattenable arrays of `C.val` might not work as expected, when `C.val` is not public. You'll have to find a workaround, such as:

- use a plain `C` reference array to hold your data
- use a different API point which is friendly to private `C.val` types
- ask `C` politely to build such an array for you
- crack into `C` with a reflective API and build your own

If you look closely at the code for `C`, you might notice that it uses its private type `C.val` in its public API. This is allowed. Just be aware that null values will not flow through such API points. When you get a `C.val` value into your own code, you can work on it perfectly freely with the type `C` (which is `C.ref`).

If a value companion `C.val` is declared `public`, the class has declared that it is willing to encounter its own default value `C.default` coming from untrusted code. If it is declared `private`, only the class's own nest can work with `C.default`. If the value companion is neither public nor private, the class has declared that it is willing to encounter its own default within its own package.

If a class has declared its companion non-atomic, it is willing to encounter states arising from data races (across multiple fields) in the same places it is willing to encounter its default value.
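One of the workarounds listed above, an API point that is friendly to private `C.val` types, could take the form of a copy routine that fills new elements with a caller-supplied value instead of the default. The following is a rough sketch in today's Java with reference arrays. The class `FillingCopy` and the method `copyOfWithFill` are hypothetical names, not existing JDK methods, and a real flat-array version would need privileged array creation internally.

```java
import java.util.Arrays;

// Hypothetical helper; not an existing JDK API.
final class FillingCopy {

    /**
     * Like Arrays.copyOf, except that any new trailing elements are set to
     * the given fill value, so the element type's default never reaches
     * the caller when the fill value is itself a constructed instance.
     */
    static <T> T[] copyOfWithFill(T[] original, int newLength, T fill) {
        T[] copy = Arrays.copyOf(original, newLength);
        if (newLength > original.length) {
            // Overwrite the freshly added slots before the array escapes.
            Arrays.fill(copy, original.length, newLength, fill);
        }
        return copy;
    }

    public static void main(String[] args) {
        String[] grown = copyOfWithFill(new String[] {"a"}, 3, "z");
        System.out.println(Arrays.toString(grown));
    }
}
```

The design point is that the caller must already possess a legitimately constructed value to use as the fill, so the routine never has to reveal `C.default` and needs no caller-sensitive check.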
### Summary of restrictions From the implementation point of view, the salient task is restricting clients from illegitimately obtaining non-constructed values of `C`, if the author of `C` has asked for such restrictions. (Recall that a _non-constructed value_ of `C` is one obtained without using `C`'s constructor or other public API.) Here are the generally enforced restrictions regarding a privatized type `C.val`: - You cannot mention the name `C.val` or `C.default` in code. - You cannot create and load bytecodes which would implement such a mention. - You cannot obtain `C.default` from a mirror of `C` or `C.val`. - You cannot create a new `C.val[]` array from a mirror of `C` or `C.val`. - You cannot lengthen an existing `C.val[]` array to contain uninitialized elements. - You cannot copy an existing array as a new `C.val[]` array, if `C.val` is declared non-atomic. Even so, let us suppose you are an accident-prone client of `C`. Ignoring the above restrictions, you might go about obtaining a non-constructed value of `C` in several ways, and there is an answer from the system in each case that stops you: - You can mention the `C.val` or `C.default` directly in code, in various ways. - After obtaining the mirror `C.val.class` (by one of several means), you can call `Class::defaultValue`, `MethodHandles::zero`, or a similar API point. - If you can declare a field of type `C.val` directly you can extract an initial value (or a data-race result, if `C.val` is non-atomic). - If you can indirectly create an array of type `C.val`, you can extract an initial value (or a data-race result, if `C.val` is non-atomic). And there are a number of ways you might attempt to indirectly create an array of type `C.val[]`: - Indirectly create it from a mirror using `Array::newInstance` or `Arrays::copyOf` or `MethodHandles::arrayConstructor` or another similar API point. 
- Create it from a pre-existing array of the same type using `Object::clone` or `Arrays::copyOf` or another similar API point.
- Specify such an array on a serialization wire format and deserialize it.

Using `C.val` or `C.default` directly is blocked if `C` privatizes its value companion, unless you are coding a nestmate or package-mate of `C`. These checks are applied both at compile time and when the JVM resolves names, so they apply equally to source code and bytecodes created by any means whatsoever.

There are no realistic restrictions on obtaining a mirror to a companion type `C.val`. (Accidental and casual direct use of `C.val.class` is prevented by access restrictions on the type name `C.val`. But there are many ways to get around this limitation.) Therefore any method or API which could violate the above generally enforced restrictions must perform an appropriate dynamic access check on behalf of its mirror argument.

Such a dynamic access check can be made negotiable by an appeal to caller sensitivity or a `Lookup` check, so a correctly configured call can avoid the restriction. For some simple methods (perhaps `Arrays::copyOf` or `MethodHandles::zero`) there is no negotiation. Depending on the use case, access failure can be worked around via a "negotiable" API point like `Lookup::arrayConstructor`.

From forax at univ-mlv.fr Sun Jul 3 12:25:45 2022
From: forax at univ-mlv.fr (Remi Forax)
Date: Sun, 03 Jul 2022 12:25:45 -0000
Subject: Value type companions, encapsulated
In-Reply-To:
References:
Message-ID: <437188389.2994108.1656851122550.JavaMail.zimbra@u-pem.fr>

I fully agree on
- having the restriction on array creation, not array access,
- providing access to companion class/default value through Lookup, and
- building the reflection API on top of the Lookup API.
One kind of sad thing with CONSTANT_Class QC; is that we need it now, but once we have the new generics we will not need it anymore, because it can be expressed with a CONSTANT_Specialization_Linkage + a constant dynamic. So it's a kind of temporary design.

I wonder if it's not "better" to separate checkcast from unbox/box, given that mixing them together results in different resolution for all checkcasts (compared to anewarray). From the language POV, those two kinds of checkcast are different anyway.

Rémi

> From: "John Rose"
> To: "valhalla-spec-experts"
> Sent: Sunday, July 3, 2022 5:24:19 AM
> Subject: Value type companions, encapsulated
>
> In this message Brian wrote out the major features of an emerging design for value classes:
>
>> From: Brian Goetz brian.goetz at oracle.com
>> To: valhalla-spec-experts at openjdk.java.net
>> Subject: Re: User model stacking: current status
>> Date: Thu, 23 Jun 2022 15:01:24 -0400
>
> I think controlling the complexity by having a separate nested declaration of the value companion type will work very well.
>
> So what exactly does a private value companion do? What is it you can and cannot do with this type? What problems are prevented by privatizing it? How and when is privatization enforced? What other problems are created by those new rules?
>
> I have been pulling on this thread for a few days now, and I think I have some answers.
>
> http://cr.openjdk.java.net/~jrose/values/encapsulating-val.md
> http://cr.openjdk.java.net/~jrose/values/encapsulating-val.html
>
> (The Hitchhiker's Guide suddenly comes to mind. Don't panic!)
>
> I expect I will be editing these files as we go.
> For reference here is a verbatim copy of the MD file > as it stands right now (minus the header): Background > (We will start with background information. The [ > https://partage.u-pem.fr/mail#privatization-to-the-rescue | new stuff comes > afterward ] . Impatient readers can find a very quick [ > https://partage.u-pem.fr/mail#summary-of-restrictions | summary of > restrictions ] at the end.) Affordances of C.ref > Every class or interface C comes with a companion type, the > reference type C.ref derived from C which describes any variable > (argument, return value, array element, etc.) whose values are either > null or of a concrete class derived from C . We are not in the habit > of distinguishing C.ref from C , but the distinction is there. For > example, if we call Object::getClass on a variable of type C.ref > we might not get C.class ; we might even get a null pointer > exception! > We are so very used to working with reference types (for short, > ref-types ) that we sometimes forget all that they do for us > in addition to their linkage to specific classes: > * C.ref gives a starting point for accessing C 's members. > * C.ref provides abstraction: C or a subtype might not be loaded yet. > * C.ref provides the standard uninitialized value null . > * C.ref can link C objects into graphs, even circular ones. > * C.ref has a known size, one "machine word", carefully tuned by the JVM. > * C.ref allows a single large object to be shared from many locations. > * C.ref with an identity class can centralize access to mutable state. > * C.ref values uniformly convert to and from general types like Object . > * C.ref variable types can be reflected using Class mirror objects. > * C.ref is safe for publication if the fields of C are final . 
> When I store a bunch of C objects into an object array or list, sort > it, and then share it with another thread, I am using several of the > above properties; if the other thread down-casts the items to C.ref > and works on them it relies on those properties. > If I implement C as a doubly-linked list data structure or a > (alternatively) a value-based class with tree structure, I am using > yet more of the above properties of references. > If my C object has a lot of state and I pass out many pointers to > it, and perhaps compute and cache interesting values in its mutable > fields, I am again relying on the special properties of references, > as well as of identity classes (if fields are mutable). > By the way, in the JVM, variables of type C.ref (some of them at > least) are associated not with C simple, but with the so-called > L-descriptor spelled LC; . When we talk about C.ref we are > usually talking about those L-descriptors in the JVM, as well. > I don't need to think much about this portfolio of properties as I go > about my work. But if they were to somehow fail, I would notice bugs > in my code sooner or later. > One of the big consequences of this overall design is that I can write > a class C which has full control over its instance states. If it is > mutable, I can make its fields private and ensure that mutations occur > only under appropriate locking conditions. Or if I declare it as a > value-based class, I can ensure that its constructor only allows > legitimate instances to be constructed. Under those conditions, I > know that every single instance of my class will have been examined > and accepted by the class constructor, and/or whatever factory and > mutator methods I have created for it. If I did my job right, not > even a race condition can create an invalid state in one of my > objects. 
> Any instance state of C which has been reached without being > produced from a constructor, factory, mutator, or constant of C can > be called non-constructed . Of course, inside a class any state > whatever can be constructed, subject to the types of fields and so on. > But the author of the class gets to decide which states are > legitimate, and the decisions are enforced by access control at the > boundaries of the encapsulation. > So if I code my class right, using access control to keep bad states > away from my clients, my class's external API will have no > non-constructed states. Costs of C.ref > In that case why have value types at all, if references are so > powerful? The answer is that reference-based abstraction pays for its > benefits with particular costs, costs that Java programmers do not > always wish to pay: > * A reference (usually) requires storage for a pointer to the object. > * A reference (usually) requires storage for a header embedded inside the > object. > * Access to an object's fields (usually) requires extra cycles to chase the > pointer. > * The GC expends effort administering a singular "home location" for every > object. > * Cache line invalidation near that home location can cause useless memory > traffic. > * A reference must be able to represent null ; tightly-packed types like int and > long would need to add an extra bit somewhere to cover this. > The major alternative to references, as provided by Valhalla, is flat > objects, where object fields are laid out immediately in their > containers, in place of a pointer which points to them stored > elsewhere. Neither alternative is always better than the other, which > is why Java has both int and Integer types and their arrays, and > why Valhalla will offer a corresponding choice for value classes. Alternative > affordances of C.val > Now, instances of a value class can be laid out flat in their > containing variables. 
> But they can also be "boxed" in the heap, for classic reference-based access. Therefore, a value class C has not one but two companion types associated with it, not only the reference companion C.ref but also the value companion C.val. Only value classes have value companions, naturally. The companion C.val is called a value type (or val-type for short), by contrast with any reference type, whether Object.ref or C.ref.
>
> The two companion types are closely related and perform some of the same jobs:
>
> * C.ref and C.val both give a starting point for accessing C's members.
> * C.ref and C.val can link C objects into acyclic graphs.
> * C.ref and C.val values uniformly convert to and from general types like Object.
> * C.ref and C.val variable types can be reflected using Class mirror objects.
>
> For these jobs, it usually doesn't matter which type companion does the work.
>
> Despite the similarities, many properties of a value companion type are subtly different from any reference type:
>
> * C.val is non-abstract: You must load its class file before making a variable.
> * C.val cannot nest except by reference; C cannot declare a C.val field.
> * C.val does not represent the value null.
> * C.val is routinely flattenable, avoiding headers and indirection pointers
> * C.val has configurable size, depending on C's non-static fields.
> * C.val heap variables (fields, array elements) are initialized to all-zeroes.
> * C.val might not be safe for publication (even though its fields are final).
>
> The JVM distinguishes C.val by giving it a different descriptor, a so-called Q-descriptor of the form QC;, and it also provides a so-called secondary mirror C.val.class which is similar to the built-in primitive mirrors like int.class.
>
> As the Valhalla performance model notes, flattening may be expected but is not fully guaranteed. A C.val stored in an Object container is likely to be boxed on the heap, for example.
> But C.val objects created as bytecode temporaries, arguments, and return values are likely to be flattened into machine registers, and C.val fields and array elements (at least below certain size thresholds) are also likely to be flattened into heap words.
>
> As a special feature, C.ref is potentially flattenable if C is a value class. There are additional terms and conditions for flattening C.ref, however. If C is not yet loaded, nothing can be done: Remember that reference types have full abstraction as one of their powers, and this means building data structures that can refer to them even before they are loaded. But a class file can request that the JVM "peek" at a class to see if it is a value class, and if this request is acted on early enough (at the JVM's discretion), then the JVM can choose to lay out some or all C.ref values as flattened C.val values plus a boolean or other sentinel value which indicates the null state.
>
> Pitfalls of C.val
>
> The advantages of value companion types imply some complementary disadvantages. Hopefully they are rarely significant, but they must sometimes be confronted.
>
> * C.val might need to load a class file which is somehow unloadable
> * C.val will fail to load if its instance layout directly or indirectly includes a C.val field or subfield
> * C.val will throw an exception if you try to assign a null to it.
> * C.val may have surprising costs for multi-word footprint and assignment (and so might C.ref if that is flattened)
> * C.val is initialized to its all-zero value, which might be non-constructed
> * C.val might allow data races on its components, creating values which are non-constructed
>
> The footprint issue shows up most strongly if you have many copies of the same C.val value; each copy will duplicate all the fields, as opposed to many copies of the same C.ref reference, which are likely to all point to a single heap location with one copy of all the fields.
> Flat value size can also affect methods like Arrays.sort , which > perform many assignments of the base type, and must move all fields on > each assignment. If a C.val array has many words per element, then > the costs of moving those words around may dominate a sort request. > For array sorting there are ways to reduce such costs transparently, > but it is still a "law of physics" that editing a whole data structure > will have costs proportional to the size of the edited portions of the > data structure, and C.ref arrays will often be somewhat more compact > than C.val arrays. Programmers and library authors will have to use > their heads when deciding between the new alternatives given by value > classes. > But the last two pitfalls are hardest to deal with, because they both > have to do with non-constructed states. These states are the all-zero > state with the second-to-last pitfall, and (with the last pitfall) the > state obtained by mixing two previous states by means of a pair of > racing writes to the same mutable C.val variable in the heap. > Unlike reference types, value types can be manipulated to create these > non-constructed states even in well-designed classes. > Now, it may be that a constructor (or factory) might be perfectly able > to create one of the above non-constructed states as well, no strings > attached. In that case, the class author is enforcing few or no > invariants on the states of the value class. Many numeric classes, > like complex numbers, are like this: Initialization to all-zeroes is > no problem, and races between components are acceptable, compared to > the costs of excluding races. >> (The reader may recall that early JVMs accepted races on the high > and low halves of 64-bit integers as well; this is no longer a > widespread issue, but bigger value types like complex raise the same > issue again, and we need to provide class authors the same solution, > if it fits their class.) 
> There are also some classes for which there are no good defaults, or > for which a good default is definitely not the all-zero bit pattern. > Authors of such types will often wish to make that bit pattern > inaccessible to their clients and provide some factory or constant > that gives the real default. We expect that such types will choose > the C.ref companion, and rely on the extra null checks to ensure > correct initialization. > Other classes may need to avoid other non-constructed values that may > arise from data races, perhaps for reasons of reliability or security. > This is a subtle trade-off; very few class authors begin by asking > themselves about the consequences of data races on mutable members, > and even fewer will ask about races on whole instances of value > types, especially given that fields in value types are always > immutable. For this reason, we will set safety as the default, so > that a class (like complex numbers) which is willing to tolerate data > races must declare its tolerance explicitly. Only then will the JVM > drop the internal costs of race exclusion. > Whether to tolerate the all-zero bit pattern is a simpler decision. > Still, it turns out to be useful to give a common single point of > declarative control to handle all non-constructed states, both > the default value of C.val and its mysterious data races. Privatization to the > rescue > (Here are the important details about the encapsulation of value > types. The impatient reader may enjoy the very quick [ > https://partage.u-pem.fr/mail#summary-of-restrictions | summary of > restrictions ] at the end of this document.) > In order to hide non-constructed states, the value companion C.val > may be privatized by the author of the class C . A privatized > value companion is effectively withdrawn from clients and kept private > to its own class (and to nestmates). Inside the class, the value > companion can be used freely, fully under control of the class author. 
> But untrusted clients are prevented from building uninitialized fields > or arrays of type C.val . This prevents such clients from creating > (either accidentally or purposefully) non-constructed values of type > C.val . How privatization is declared and enforced is discussed in > the rest of this document. >> (To review, for those who skipped ahead, non-constructed values are > those not created under control of the class C by constructors or > other accessible API points. A non-constructed value may be either an > uninitialized variable of C.val , or the result of a data race on a > shared mutable variable of type C.val . The class itself can work > internally with such values all day long, but we exclude external > access to them by default.) Atomicity as well > As a second tactic, a value class C may select whether or not the > JVM enforces atomicity of all occurrences of its value companion > C.val . A non-atomic value companion is subject to data races, and > if it is not privatized, external code may misuse C.val variables > (in arrays or mutable fields) to create non-constructed values via > data races. > A value companion which is atomic is not subject to data races. This > will be the default if the class C does not explicitly request > non-atomicity. This gives safety by default and limits > non-constructed states to only the all-zero initial value. The > techniques to support this are similar to the techniques for > implementing non-tearing of variables which are declared volatile ; > it is as if every variable of an atomic value type has some (not > all) of the costs of volatility. > The JVM is likely to flatten such an atomic value only up to the > largest available atomically settable memory unit, usually 128 bits. > Values larger than that are likely to be boxed, or perhaps treated > with some other expensive transactional technique. Containers that > are immutable can still be fully flattened, since they are not subject > to data races.
> The behavior of an atomic C.val is aligned with that of C.ref . A > reference to a value class C never admits data races on C 's > fields. The reason for this is simple: A C.ref value is a C.val > instance boxed on the heap in a single immutable box-class field of > type C.val . (Actually, the JVM may partially or wholly flatten the > representation of C.ref if it can get away with it; full flattening > is likely for JVM locals and stack values, but any such secret > flattening is undetectable by the user.) Since it is final all the > way down (to C 's fields) any C.ref value is safely published > without any possibility of data races. Therefore, an extra > declaration of non-atomicity in C affects only the value companion > C.val . > It seems that there are use cases which justify all four combinations > of both choices (privatization and declared non-atomicity), although > it is natural to try to boil down the size of the matrix. > * C.val private & atomic is the default, and safest configuration > hiding all non-constructed values outside of C and all data races > even inside of C . There are some runtime costs. > * C.val public & non-atomic is the opposite, with fewer runtime > costs. It must be explicitly declared. It is desirable for > numerics like complex numbers, where all possible bitwise states are > meaningful. It is analogous to the situation of a naturally > non-atomic primitive like long . > * C.val public & atomic allows everybody to see the all-zero > initial value but no other non-constructed states. This is > analogous to the situation of a naturally atomic primitive like > int . > * C.val private & non-atomic allows C complete control over the > visibility of non-constructed states, but C also has the ability > to work internally on arrays of non-atomic elements. C should > take care not to leak internally-created flat arrays to untrusted > clients, lest they use data races to hammer non-constructed values > into those arrays. 
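The claim that a reference to an all-final box never exposes a mixed state can be observed in today's Java. The following is a minimal sketch, assuming a hypothetical `Pair` class as a stand-in for the box behind `C.ref`; it relies only on the existing final-field guarantees of the Java memory model, not on any Valhalla feature.

```java
// Hypothetical stand-in for the "immutable box" behind C.ref.
final class ImmutableBoxDemo {
    static final class Pair {
        final long a, b;                  // final all the way down
        Pair(long v) { a = v; b = v; }    // invariant: a == b
    }

    // A plain (non-volatile) shared variable: writes to it race with reads.
    static Pair shared = new Pair(0);

    // Races a writer against a reader and counts invariant violations seen.
    static int countTornReads(int iterations) {
        int[] torn = {0};
        Thread writer = new Thread(() -> {
            for (int i = 1; i <= iterations; i++) shared = new Pair(i);
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                Pair p = shared;              // a single reference read
                if (p.a != p.b) torn[0]++;    // would indicate a mixed state
            }
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return torn[0];
    }

    public static void main(String[] args) {
        System.out.println("torn reads: " + countTornReads(100_000));
    }
}
```

The reader may observe an older or a newer `Pair`, but each one it observes is a fully constructed instance, which is the behavior an atomic `C.val` is meant to match.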
> It is logically possible, but there does not seem to be a need, for > allowing a single class C to work with both kinds of arrays, atomic > and non-atomic. (In principle, the dynamic typing of Java arrays > would support this, as long as each array was configured at its > creation.) The effect of this can be simulated by wrapping a > non-atomic class C in another wrapper class WC which is atomic. > Then C.val[] arrays are non-atomic and WC.val[] arrays are atomic, > yet each kind of array can have the same "payload", a repeated > sequence of the fields of C . Privatization in code > For source code and bytecode, privatization is enforced by performing > access checks on names. Privatization rules in the language > We will stipulate that a value class C always has a value > companion type C.val , even if it is never declared or used. And we > give the author of C some control over how clients may use the type > C.val , in a manner roughly similar to nested member classes like > C.M . > Specifically, the declaration of C always selects an access mode for > its value companion C.val from one of the following three choices: > * C.val is declared private > * C.val is declared public > * C.val is declared, but neither public nor private > If C.val is declared private, then only nestmates of C may access > C.val . If it is neither public nor private, only classes in the > same runtime package as C may access it. If it is declared public, > then any class that can access C may also access C.val . > As an independent choice, the declaration of C may select an atomicity for its > value companion C.val from one of the following two choices: > * C.val is explicitly declared non-atomic > * C.val is not explicitly declared non-atomic, and is thus atomic > If there is no explicit access declaration for C.val in the code of > C , then C.val is declared private and atomic. That is, we set the > default to the safest and most restrictive choice.
> In source code, these declarations are applied to explicit occurrences > of the type name C.val . The access modification of C.val is also > transferred to the implicitly declared name C.default > The syntax looks like this: > class C { > //only one of the following lines may be specified > //the first line is the default > private value companion C.val; //nestmates only > value companion C.val; //package-mates only > public value companion C.val; //all may access > // the non-atomic modifier may be present: > private non-atomic value companion C.val; > public non-atomic value companion C.val; > non-atomic value companion C.val; > } > When a type name C.val or an expression C.default is > used by a class X , there are two access checks that occur. First, > access from X to the class C is checked according to the usual > rules of Java. If access to C is permitted, a second check is done > if the companion is not declared public . If the companion is > declared private , then X and C must be nestmates, or else access > will fail. If the companion is neither public nor private , then > X and C must be in the same package, or else access will fail. 
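The two-step access check described above can be modeled in a few lines of ordinary Java. This is only a sketch of the stated rule: the enum, the method name, and the boolean parameters are hypothetical, and the real checks would be performed by the compiler and the JVM rather than by library code.

```java
// Hypothetical model of the rule; not an actual compiler or JVM API.
final class CompanionAccessModel {
    enum Access { PRIVATE, PACKAGE, PUBLIC }

    /**
     * Decides whether a client class X may use C.val or C.default.
     *
     * @param declared       access mode declared for the value companion
     * @param canAccessClass whether X may access C under the usual Java rules
     * @param sameNest       whether X and C are nestmates
     * @param samePackage    whether X and C are in the same runtime package
     */
    static boolean mayUseCompanion(Access declared, boolean canAccessClass,
                                   boolean sameNest, boolean samePackage) {
        if (!canAccessClass) return false;      // first check: the class C itself
        switch (declared) {                     // second check: the companion
            case PUBLIC:  return true;
            case PRIVATE: return sameNest;
            default:      return samePackage;   // neither public nor private
        }
    }

    public static void main(String[] args) {
        // A package-mate that is not a nestmate, against a private companion.
        System.out.println(mayUseCompanion(Access.PRIVATE, true, false, true));
    }
}
```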
Example privatized value companion

> Here is an example of a class which refuses to construct its default
> value, and which prevents clients from seeing that state:
>
> class C {
>   int neverzero;
>   public C(int x) {
>     if (x == 0) throw new IllegalArgumentException();
>     neverzero = x;
>   }
>   public void print() { System.out.println(this); }
>
>   private value companion C.val;  //privatized (also the default)
>
>   // some valid uses of C.val follow:
>   public C.val[] flatArray() { return new C.val[]{ this }; }
>   private static C.ref nonConstructedZero() {
>     return (new C.val[1])[0];  //OK: C.val private but available
>   }
>   public static C.ref box(C.val val) { return val; }  //OK param type
>   public C.val unbox() { return this; }  //OK return type
>
>   // valid use of private C.default, with Lookup negotiation
>   public static
>   C.ref defaultValue(java.lang.invoke.MethodHandles.Lookup lookup) {
>     if (!lookup.in(C.class).hasFullPrivilegeAccess())
>       return null;  //or throw
>     return C.default;  //OK: default for me and maybe also for thee
>   }
> }
>
> // non-nestmate client:
> class D {
>   static void passByValue(C x) {
>     C.ref ref = C.box(x);  //OK, although x is null-checked
>     if (false) C.box((C.ref) null);  //would throw NPE
>     assert ref == x;
>   }
>   static Object useValue(C x) {
>     x.unbox().print();  //OK, invoke method on C.val expression
>     var xv = x.unbox();  //OK, although C.val is non-denotable
>     xv.print();  //OK
>     //> C.val xv = x.unbox();  //ERROR: C.val is private
>     return xv;  //OK, originally from legitimate method of C
>   }
>   static Object arrays(C x) {
>     var a = x.flatArray();
>     //> C.val[] va = a;  //ERROR: C.val is private
>     Arrays.toString(a);  //OK
>     C.ref[] a2 = a;  //covariant array assignment
>     C.ref[] na = new C.ref[1];
>     //> na = new C.val[1];  //ERROR: C.val is private
>     return a[0];  //constructed values only
>   }
> }
>
> The above code shows how a privatized value companion can and cannot
> be used. The type name may never be mentioned.
Apart from that > restriction, client code can work with the value companion type as it > appears in parameters, return values, local variables, and array > elements. In this, a privatized companion behaves like other > non-denotable types in Java. >> Rationale: Note that a companion type is not a real class. > Therefore it cannot appeal, precisely, to the existing provisions (in > JLS or JVMS) for enforcing class accessibility. But because it is a > type, and today nearly all types are classes (and interfaces), users > have a right to expect that encapsulation of companion types will > "feel like" encapsulation of type names. More precisely, users will > hope to re-use their knowledge about how type name access works when > reasoning about companion types. We aim to accommodate that hope. If > it works, users won't have to think very often about the class-vs-type > distinction. That is why the above design emulates pre-existing > usage patterns for non-denotable types. Privatization in translation > When a value class is compiled to a class file, some metadata is > included to record the explicit declaration or implicit status of the > value companion. > The access selection of C 's value companion (public, package, > private) is encoded in the value_flags field of the ValueClass > attribute of the class information in the class file of C . > The value_flags field (16 bits) has the following legitimate values: > * zero: C.val default access, non-atomic > * ACC_PUBLIC : C.val public access, non-atomic > * ACC_PRIVATE : C.val private access, non-atomic > * ACC_VOLATILE : C.val default access, atomic > * ACC_VOLATILE|ACC_PUBLIC : C.val public access, atomic > * ACC_VOLATILE|ACC_PRIVATE : C.val private access, atomic > Other values are rejected when the class file is loaded. > ( JVM ISSUE #0: Can we kill the ACC_VALUE modifier bit? Do we > really care that jlr.Modifiers kind-of wants to own the reflection > of the contextual modifier value ? 
Who are the customers of this > modifier bit, as a bit? John doesn't care about it personally, and > thinks that if we are going to have an attribute we can get rid of the > flag bit. One implementation issue with killing ACC_VALUE is that > class attributes are processed very late during class loading, while > class modifiers are processed very early. It may be easier to do some > kinds of structural checks on the fly during class loading even before > class attributes are processed. Yet this also seems like a poor > reason to use a modifier bit.) > ( JVM ISSUE #1: What if the attribute is missing; do we reject the > class file or do we infer value_flags=ACC_PRIVATE|ACC_VOLATILE ? > Let's just reject the file.) > ( JVM ISSUE #2: Is this ValueClass attribute really a good place > to store the "atomic" bit as well? This attribute is a green-field > for VM design, as opposed to the brown-field of modifier bits. The > above language assumes the atomic bit belongs in there as well.) > A use of a value companion C.val , in any source file, is generally > translated to a use of a Q-descriptor QC; : > * a field declaration of C.val translates to a field-info with a Q-descriptor > * a method or constructor declaration that mentions C.val mentions a > corresponding Q-descriptor in its method descriptor > * a use of a field resolves a CONSTANT_Fieldref with a Q-descriptor component > * a use of a method or constructor uses a CONSTANT_Methodref (or > CONSTANT_InterfaceMethodref ) with a Q-descriptor component > * a CONSTANT_Class entry may contain a Q-descriptor or an array type whose > element type is a Q-descriptor > * a verifier type record may refer to CONSTANT_Class which contains a > Q-descriptor > Privatization is enforced for these uses only as much as is needed to > ensure that classes cannot create uninitialized values, fields, and > arrays.
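The six legitimate value_flags values listed earlier for the ValueClass attribute can be decoded with a small helper. The sketch below is hypothetical (the class `ValueFlags` and its method are not part of any JDK); it assumes the proposed field reuses the existing numeric values of ACC_PUBLIC, ACC_PRIVATE, and ACC_VOLATILE, which are available today as constants on `java.lang.reflect.Modifier`.

```java
import java.lang.reflect.Modifier;

// Hypothetical decoder for the proposed value_flags field; not a JDK API.
final class ValueFlags {
    private static final int ACCESS_MASK = Modifier.PUBLIC | Modifier.PRIVATE;
    private static final int LEGAL_MASK = ACCESS_MASK | Modifier.VOLATILE;

    /** Describes a value_flags value, rejecting anything outside the six legal values. */
    static String describe(int valueFlags) {
        int access = valueFlags & ACCESS_MASK;
        boolean strayBits = (valueFlags & ~LEGAL_MASK) != 0;
        if (strayBits || access == ACCESS_MASK) {
            // Mirrors "other values are rejected when the class file is loaded".
            throw new IllegalArgumentException(
                "illegal value_flags: 0x" + Integer.toHexString(valueFlags));
        }
        String accessName = access == Modifier.PUBLIC ? "public"
                          : access == Modifier.PRIVATE ? "private"
                          : "default";
        String atomicity = (valueFlags & Modifier.VOLATILE) != 0 ? "atomic" : "non-atomic";
        return accessName + " access, " + atomicity;
    }

    public static void main(String[] args) {
        System.out.println(describe(Modifier.VOLATILE | Modifier.PRIVATE));
    }
}
```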
> If an access from bytecode to a privatized Q-descriptor fails, an > exception is thrown; its type is IllegalAccessError , a subtype of > IncompatibleClassChangeError . Generally speaking such an exception > diagnoses an attempt by bytecode to make an access that would have > been prevented by the static compiler, if the Java source program had > been compiled together as a whole. > When a field of Q-descriptor type is declared in a class file, the > descriptor is resolved early, before the class is linked, and that > resolution includes an access check which will fail unless the class > being loaded has access to C.val , as determined by loading C and > inspecting its ValueClass attribute. These checks prevent untrusted > clients of C from creating non-constructed zero values, in any of > their fields. > The timing of these checks, on fields, is aligned with the internal > logic of the JVM which consults the class file of C to answer other > related questions about field types: (a) whether C is in fact a > value class, and (b) what is the layout of C.val , in case the JVM > wishes to flatten the value in a containing field. The third check, > (c) whether the C.val companion is accessible, happens at the same time. This is > early during class loading for non-static fields, and during > class preparation for static fields. > Privatization is not enforced for non-field Q-descriptors, that > occur in method and constructor signatures, and in state descriptions > for the verifier. This is because mere use of Q-descriptors to > describe pre-existing values cannot (by itself) expose non-constructed > values, when those values are on stack or in locals. >> This can happen invisibly at the source-code level as well. An API > might be designed to return values of a privatized type from its > methods or fields, and/or accept values of a privatized type into its > methods, constructors, or fields.
In general, the bytecode for a > client of such an API will work with a mix of Q-descriptor and > L-descriptor values. > The verifier's type system uses field descriptor types, and thus can > "see" both Q-descriptors and L-descriptors. Clients of a class with a > privatized companion are likely to work mostly with L-descriptor > values but may also have Q-descriptor values in locals and on stack. > When feeding an L-descriptor value to an API point that accepts a > Q-descriptor, the verifier needs help to keep the types straight. In > such cases, the bytecode compiler issues checkcast instructions to > adjust types to keep the verifier happy, and in this case the operand > of the checkcast would be of the form CONSTANT_Class["QC;"] . > ( JVM ISSUE #3: The Q/L distinction in the verifier helps the > interpreter avoid extra dynamic null checks around putfield , > putstatic , and the invoke instructions. This distinction requires > an explicit bytecode to fix up Q/L mismatches; the checkcast > bytecode serves this purpose. That means checkcast requires the > ability to work with privatized types. It requires us to make the > dynamic permission check when other bytecodes try to use the > privatized type. All this seems acceptable, but we could try to make > a different design in which CONSTANT_Class resolution fails immediately > if it contains an inaccessible Q-descriptor. That design might > require a new bytecode which does what checkcast does today on a > Q-descriptor.) > Meanwhile, arrays are rich sources of non-constructed zero values. > They appear in bytecode as follows: > * A C.val[] array construction uses anewarray with a CONSTANT_Class type for the > Q-descriptor; this is new to Valhalla. > * Such an array construction may also use multianewarray with an appropriate > array type. > * An array element is read from heap to stack by aaload ; the verifier type of > the stacked value is copied from the verifier type of the array itself.
> * An array element is written from stack to heap by aastore ; the verifier type > of the stored value is merely constrained to the type Object . > Note that there are no static type annotations on array access > instruction. The practical impact of this is that, if an array of a > privatized type C.val is passed outside of C , then any values in > that array become accessible outside of C . Moreover, if C.val is > non-atomic, clients may be able to inflict data races on the array. > Thus, the best point of control over misuse of arrays is their > creation , not their access . Array creation is controlled by > CONSTANT_Class constant pool entries and their access checking. > When an anewarray or multianewarray tries to create an array, > the CONSTANT_Class constant pool entry it uses must be consulted > to see if the element type is privatized and inaccessible to the > current class, and IllegalAccessError thrown if that is the case. > All this leads to special rules for resolving an entry of the form > CONSTANT_Class["QC;"] . When resolving such a constant, the class > file for C is loaded, and C is access checked against the current > class. (This is just what happens when CONSTANT_Class["C"] gets > resolved.) Next, the ValueClass attribute for C is examined; it > must exist, and if it indicates privatization of C.val , then access > is checked for C.val against the current class. > If that access to a privatized companion would fail, no exception is > thrown, but the constant pool entry is resolved into a special > restricted state. Thus, a resolved constant pool entry of the form > CONSTANT_Class["QC;"] can have the following states: > * Error, because C is inaccessible or doesn't exist or is not a value class. > * Full resolution, so C.val is ready for general use in the current class. > * Restricted resolution, so C.val is ready for restricted use in the current > class. > That last state happens when C is accessible but C.val is not. 
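The three resolved states of a `CONSTANT_Class["QC;"]` entry can be summarized as a small decision function. This is a hypothetical model in plain Java (the names `QDescriptorResolution`, `State`, and `resolve` are invented for illustration); an actual JVM would record the state in the constant pool cache rather than compute it from booleans.

```java
// Hypothetical model of the resolution rule; not JVM code.
final class QDescriptorResolution {
    enum State { ERROR, FULL, RESTRICTED }

    /**
     * @param classExists         C can be loaded
     * @param classAccessible     C passes the ordinary access check from the current class
     * @param isValueClass        C carries a ValueClass attribute
     * @param companionAccessible C.val passes the companion access check
     */
    static State resolve(boolean classExists, boolean classAccessible,
                         boolean isValueClass, boolean companionAccessible) {
        if (!classExists || !classAccessible || !isValueClass) {
            return State.ERROR;
        }
        // A failed companion check does not throw; it yields the restricted state.
        return companionAccessible ? State.FULL : State.RESTRICTED;
    }

    public static void main(String[] args) {
        // C is accessible, but its privatized companion is not.
        System.out.println(resolve(true, true, true, false));
    }
}
```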
> Likewise, a constant pool entry of the form CONSTANT_Class["[QC;"] > (or a similar form with more leading array brackets) can have three > states, error, full resolution, and restricted resolution. > Pre-Valhalla CONSTANT_Class entries which do not mention > Q-descriptors have only two resolved states, error and full > resolution. > As required above, the checkcast bytecode treats full resolution and > restricted resolution states the same. > But when the anewarray or multianewarray instruction is executed, > it throws an access error if its CONSTANT_Class is not > fully resolved (either it is an error or is restricted). This is how > the JVM prevents creation of arrays whose component type is an > inaccessible value companion type, even if the class file does > not correspond to correct Java source code. > Here are all the classfile constructs that could refer to a > CONSTANT_Class constant in the restricted state, and whether they > respect it (throwing IllegalAccessError ): > * checkcast ignores the restriction and proceeds > * instanceof ignores the restriction (consistent with checkcast ) > * anewarray and multianewarray respect the restriction and throw > * ldc throws (consistent with C.val.class in source code) > * bootstrap arguments throw (consistent with ldc ) > * verifier types ignore the restriction and continue checking > * (FIXME: There must be more than this.) > Q-descriptors not in CONSTANT_Class constants are naturally immune > to privatization restrictions. In particular, CONSTANT_Methodtype > constants can successfully refer to mirrors to privatized companions. > Uses of CONSTANT_Class constants which forbid Q-descriptors and > their arrays are also naturally immune, since they will never > encounter a constant resolved in the restricted state.
These include > new , aconst_init , the class sub-operands of CONSTANT_Methodref > and its friends, exception catch-types, and various attributes like > NestHost and InnerClasses : All of the above are allowed to refer > only to proper classes, and not to their value companions or arrays. > Nevertheless, an aconst_init bytecode must throw an access error when > applied to a class with an inaccessible privatized value companion. > This is worth noting because the constant pool entry for aconst_init > does not mention a Q-descriptor, unlike the array construction > bytecodes. >> Perhaps regular class constants of the form CONSTANT_Class["C"] would > also benefit slightly from a restricted state, which would be > significant only to the aconst_init bytecode, and ignored by all > the above "naturally immune" usages. If a JVM implementation takes > this option, the same access check would be performed and recorded for > both CONSTANT_Class["C"] and CONSTANT_Class["QC;"] , but would be respected > only by aconst_init (for the former) and anewarray and the other > cases noted above (for the latter but not the former). On the other > hand, the particular issue would become moot if aconst_init , like > withfield , were restricted to the nest of its class, because then > privatization would not matter. > The net effect of these rules, so far, is that neither source code nor > class files can directly make uninitialized variables of type C.val , > if the code or class file was not granted access to C.val via C . > Specifically, fields of type C.val cannot be declared nor can arrays > of type C.val[] be constructed. > This includes class files as correctly derived from valid source code > or as "spun" by dodgy compilers or even as derived validly from old > source code that has changed (and revoked some access). >> Remember that new nestmates can be injected at runtime via the > Lookup API, which checks access and then loads new code that enjoys > the same access.
The level of access depends in detail on the > selection of ClassOption.NESTMATE (for nestmate injection) or not > (for package-mate injection). The JVM uses common rules for these > injected nestmates or package-mates and for normally compiled ones. > There are no restrictions on the use of C.ref , beyond the basic > access restrictions imposed by the language and JVM on the name C . > Access checks for regular references to classes and interfaces are > unchanged throughout all of the above. > There are more holes to be plugged, however. It will turn out that > arrays are once again a problem. But first let's examine how > reflection interacts with companion types and access control. Privatization and > APIs > Beyond the language there are libraries that must take account of the > privatization of value companions. We start on the shared boundary > between language and libraries, with reflection. Reflecting privatization > Every companion type is reflected by a Java class mirror of type > java.lang.Class . A Java class mirror also represents the class > underlying the type. The distinction between the concept of class and > companion type is relatively uninteresting, except for a value class > C , which has two companion types and thus two mirrors. > In Java source code the expression C.class obtains the mirror for > both C and its companion C.ref . The expression C.val.class > obtains the mirror for the value companion, if C is a value class. > Both expressions check access to C as a whole, and C.val.class > also checks access to the value companion (if it was privatized). > But it is a generally recognized fact that Java class mirrors are less > secure than the Java class types that the mirrors represent. It is > easy to write code that obtains a mirror on a class C without > directly mentioning the name C in source code. 
One can use > reflective lookup to get such mirrors, and without even trying one may > also "stumble upon" mirrors to inaccessible classes and companion > types. Here are some simple examples: > Class lookup() { > var name = "java.util.Arrays$ArrayList"; > //or name = "java.lang.AbstractStringBuilder"; > //> java.lang.invoke.MethodHandles.lookup().findClass(name); //ERROR > return Class.forName(name); //OK! > } > Class stumble1() { > //> return java.util.Arrays.ArrayList.class; //ERROR > return java.util.Arrays.asList().getClass(); //OK! > } > Class stumble2() { > //> return java.lang.AbstractStringBuilder.class; //ERROR > return StringBuilder.class.getSuperclass(); //OK! > } > Class stumble3() { > //> return C.val.class; //ERROR if C.val is privatized > return C.ref.class.asValueType(); //OK! > } > Therefore, access checking class names is not and cannot be the whole > story for protecting classes and their companion types from reflective > misuse. If a mirror is obtained that refers to an inaccessible > non-public class or privatized companion, the mirror will "defend > itself" against illegal access by checking whether the caller has > appropriate permissions. The same goes for method, constructor, and > field mirrors derived from the class mirror: You can reflect a method > but when you try to call it all of the access checks (including the > check against the class) are enforced against you, the caller of the > reflective API. >> The checking of the caller has two possible shapes. Either a caller > sensitive method looks directly at its caller, or the call is > delegated through an API that requires negotiation with a > MethodHandles.Lookup object that was previously checked against a > caller. 
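The first three "stumble" routes, and the way a mirror then defends itself, can be tried on a current JDK without any Valhalla features. The sketch below is an illustration under that assumption; the class name `MirrorStumbleDemo` and its helper methods are hypothetical, while `java.util.Arrays$ArrayList` and `java.lang.AbstractStringBuilder` are the same inaccessible JDK classes used in the examples above.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;

// Hypothetical demo class; uses only existing core reflection calls.
final class MirrorStumbleDemo {
    // Route 1: reflective lookup by name, with no access check on the name.
    static Class<?> byName() {
        try {
            return Class.forName("java.util.Arrays$ArrayList");
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Route 2: ask an instance for its class.
    static Class<?> byInstance() {
        return Arrays.asList().getClass();
    }

    // Route 3: walk up from an accessible subclass.
    static Class<?> bySuperclass() {
        return StringBuilder.class.getSuperclass();
    }

    // The mirror "defends itself": using the inaccessible class's constructor
    // reflectively is checked against the caller and refused.
    static boolean constructorIsBlocked() {
        Constructor<?> ctor = byInstance().getDeclaredConstructors()[0];
        try {
            ctor.newInstance((Object) new Object[0]);
            return false;
        } catch (IllegalAccessException e) {
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(byName() == byInstance());
        System.out.println(bySuperclass().getName());
        System.out.println(constructorIsBlocked());
    }
}
```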
> Now, if a class C is accessible but its value companion C.val is > privatized, all of C 's public methods and other API points are > accessible (via both companion types), but access is limited to those > very specific operations that could create non-constructed instances > (via a variable of companion type C.val ). And this boils down > to a limitation on array creation. If you cannot use either source > code or reflection to create an array of type C.val[] , then you > cannot create the conditions necessary to build non-constructed > instances. > Reflective APIs should be available to report the declared properties > of value companions. It is enough to add the following two methods: > * Class::isNonAtomic is true only of mirrors of value companions > which have been declared non-atomic. On some JVM implementations it > may additionally be true of long.class and/or double.class . > * Class::getModifiers , when applied to a mirror of a value > companion, will return a modifier bit-mask that reflects the > declared access. (This is compatible with the current behavior of > HotSpot for primitive mirrors, which appear as if they were somehow > declared public , with abstract and final thrown in to boot.) > (Note that most reflective access checking should take care to work > with the reference mirror, not the value mirror, as the modifier bits > of the two mirrors might differ.) Privatization and arrays > There are a number of standard API points for creating Java array > objects. When they create arrays containing uninitialized elements, > then a non-constructed default value can appear. Even when they > create properly initialized arrays, if the type is declared > non-atomic, then non-constructed values can be created by races. > * java.lang.reflect.Array::newInstance takes an element mirror and length and > builds an array. The elements of the returned array are initialized to the > default value of the selected element type.
> * java.util.Arrays::copyOf and copyOfRange can extend the length of an existing > array to include new uninitialized elements. > * A special overloading of java.util.Arrays::copyOf can request a different type > of the new array copy. > * java.util.Collection::toArray (an interface method) may extend the length of > an existing array, but does not add uninitialized elements. > * java.lang.invoke.MethodHandles.arrayConstructor creates a method handle that > creates uninitialized arrays of a given type, as if by the anewarray bytecode. > * The serialization API contains an operator for materializing arrays of > arbitrary type from the wire format. > The basic policy for all these API points is to conservatively limit > the creation of arrays of type C.val[] if C.val is not public. > * > java.lang.reflect.Array::newInstance will throw IllegalArgumentException if the > element type is privatized. (See below for a possible caller-sensitive > enhancement.) > * > java.util.Arrays::copyOf and copyOfRange will throw instead of creating > uninitialized elements, if the element type is privatized. If only previously > existing array elements are copied, there is no check, and this is a common > use case (e.g., in ArrayList::toArray ). > * > The special overloading of java.util.Arrays::copyOf will refuse to create an > array of any non-atomic privatized type. (This refusal protects against > non-constructed values arising from data races.) It also incorporates the > restrictions of its sibling methods, against creating uninitialized elements > (even of an atomic type). > * > java.lang.invoke.MethodHandles.arrayConstructor will refuse to create a factory > method handle if the element type is privatized. > * > java.util.Collection::toArray needs implementation review; as it is built on top > of the previous API points, it may possibly fail if asked to lengthen an array > of privatized type.
Note that many implementations of toArray use Arrays.copyOf in a > safe manner, which does not create uninitialized elements. > * > java.util.stream.Stream::toArray , the various List::toArray , and other clients > of Arrays::copyOf or Array::newInstance need implementation review. Where a > generic API is involved, the assumption is often that non-flat reference arrays > are being created, and in that case no outage is possible, since reference > companion arrays can always be freely created. For specialized generics with > flat types, additional implementation work is required, in general, to ensure > that flat arrays can be created by parties with the right to do so. > * > The serialization API should restrict its array creation operator. Serialization > methods should not attempt to serialize flat arrays either. It is enough to > serialize arrays of the reference type. > API ISSUE #1: Should we relax construction rules for zero-length > arrays? This would add complexity but might be a friendly move for > some use cases. A zero-length array cannot expose non-constructed > values. It may, however, serve as a misleading "witness" that some > code has gained permission to work with flat arrays. It's safer to > disallow even zero-length arrays. > API ISSUE #2: What about public value companions of non-public > inaccessible classes? In source code, we do not allow arrays of > private classes to be made, or of their public value companions. > Should we be more permissive in this case? We could specify that > where a value companion has to be checked against a client, its > original class gets checked as well; this would exclude some use cases > allowed by the above language, which only takes effect if the > companion is privatized. An extra check for a public companion seems > like busy-work and a source of unnecessary surprises, though. Let's > not.
> There are probably legitimate use cases for arrays of privatized > types, with which the new restrictions on the above API points would > interfere. So as a backup, we will make API adjustments to work with > privatized array types, with an extra handshake to perform the access > check (via either caller sensitivity or negotiation with an instance > of MethodHandles.Lookup ). > * > java.lang.reflect.Array::newInstance should probably be made caller sensitive, > so it can refrain from throwing if a privatized element type is accessible to > the caller. (Alternatively, a new caller-sensitive API point could be made, such > as Array::newFlatInstance . But a new API point seems unnecessary in this case, > and caller-sensitivity is common practice in this method's package.) Note that, > as is typical of core reflection API points, many uses of newInstance will not > benefit from the caller sensitivity. > * > java.util.Arrays::copyOf and copyOfRange may be joined by additional "companion > friendly" methods of a similar character which fill new array elements with > some other specified fill value, and/or which cyclically replicate the contents > of the original array, and/or which call a functional interface to provide > missing elements. The details of this are a matter for library designers to > decide. Adding caller sensitivity to these API points is probably the wrong > move. > * > java.lang.invoke.MethodHandles::arrayConstructor will be joined by a method of > the same name on MethodHandles.Lookup which performs a companion check before > allowing the array constructor method handle to be returned. It will not check > the class , just the companion. Note that the use of caller sensitivity in the > Lookup API is concentrated on the factory method MethodHandles::lookup , which is the > starting point for Lookup -based negotiation.
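The concern behind the array restrictions discussed above can be observed with today's API, without any Valhalla features. The following is a minimal sketch under stated assumptions: `int` stands in for a flat value type (its default `0` appears without any constructor running), and the class name `DefaultExposure` and its method names are made up for illustration. Only existing JDK calls are used.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Array;
import java.util.Arrays;

// Hypothetical illustration class; not part of any proposal.
public class DefaultExposure {

    // Array::newInstance: every element starts as the element type's default.
    public static int viaNewInstance() {
        int[] a = (int[]) Array.newInstance(int.class, 2);
        return a[1];
    }

    // Arrays::copyOf: lengthening pads the tail with default values.
    public static int viaCopyOf() {
        int[] source = {7};
        int[] longer = Arrays.copyOf(source, 3);
        return longer[2];
    }

    // MethodHandles::arrayConstructor: a handle of type (int)int[] that
    // allocates a default-filled array of the requested length.
    public static int viaArrayConstructor() {
        try {
            MethodHandle factory = MethodHandles.arrayConstructor(int[].class);
            int[] a = (int[]) factory.invoke(2);
            return a[0];
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaNewInstance());
        System.out.println(viaCopyOf());
        System.out.println(viaArrayConstructor());
    }
}
```

In each case the caller reads a value that no constructor produced, which is exactly what a privatized C.val is meant to prevent for untrusted clients.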
> Miscellaneous privatization checks > Besides newly-created or extended arrays, there are a few API points > in java.lang.invoke which expose default values of reflectively > determined types. Like the array creation methods, they must simply > refuse to expose default values of privatized value companions. > * MethodHandles::zero and MethodHandles::empty will simply > refuse to produce a result of a privatized C.val type. Clients > with a legitimate need to produce such default values can use > MethodHandles::filterReturnValue and/or MethodHandles::constant > to create equivalent handles, assuming they already possess the > default value. > * MethodHandles::explicitCastArguments will refuse to convert from > a nullable reference to a privatized C.val type. Clients with a > legitimate need to convert nulls to privatized values can use > conditional combinators to do this "the hard way". > * The method Lookup::accessCompanion will be defined analogously > to Lookup::accessClass . If Lookup::accessClass is applied to a > companion, it will check both the class and the companion, whereas > Lookup::accessCompanion will look only at the possible > privatization of the companion. (Thus it can simply refer to > Reflection::verifyCompanionType .) > To support reflective checks against array elements which may be > privatized companion types, an internal method of the form > jdk.internal.reflect.Reflection::verifyCompanionType may be defined. > It will pass any reference type (regardless of class accessibility) > and for a value companion it will check access of the companion (but > not the class itself). Building companion-safe APIs > The method Lookup::arrayConstructor gives enough of a "hook" to > create all kinds of safe but friendly APIs in privileged JDK code. 
> The methods in java.util could make use of this privileged API to > quickly adapt their internal code to create arrays in cases where they are > refused by the existing methods Array.newInstance and > Arrays.copyOf . > For example, a checked method MethodHandles.Lookup::defaultValue(C) > may be added to provide the default value C.default if its companion > C.val is accessible. It will operate as if it first creates a > one-element array of the desired type, and then loads the element. > Or, a caller-sensitive method Class::defaultValue or Class::newArray > could be added which checks the caller and returns the requested result. > All such methods can be built on top of MethodHandles.Lookup . > In general, a library API may be designed to preserve some aspect of > companion safety, as it allows untrusted code to work with arrays of > privatized value type, while preventing non-constructed values of that > type from being materialized. Each such safe and friendly API has to > make a choice about how to prevent clients from creating > non-constructed states, or perhaps how to allow clients to gain > privilege to do so. Some points are worth remembering: > * An unprivileged client must not obtain C.default if C.val is privatized. > * An unprivileged client must not obtain a non-empty C.val[] array if C.val is > privatized and non-atomic. > * It's safe to build new (non-empty, mutable) arrays from (non-empty, mutable) > old arrays, if the default is not injected. > * If a new array is somehow frozen or wrapped so as to be effectively immutable, it > is safe as long as it does not expose C.default values. > * If a value companion is public , there is no need for any restriction. > * Also, unrestricted use can be gated by a Lookup object or caller sensitivity.
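The point about building new arrays from old ones without injecting the default could be sketched as follows, using today's reference arrays where `null` plays the role of the default. This is only a sketch under assumptions: `SafeArrays` and `copyOfWithFill` are hypothetical names, not proposed JDK API. A real companion-safe version would also need a privileged way to allocate the new array (for example the Lookup-based arrayConstructor described in the text), since the text says Arrays.copyOf itself would throw when lengthening an array of privatized type.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical helper class; the names are illustrative assumptions.
public class SafeArrays {

    // Copies 'original' to a new array of 'newLength'. Any slots beyond the
    // old length receive the caller-supplied 'fill' value, so the caller
    // never observes the element type's default in the result.
    public static <T> T[] copyOfWithFill(T[] original, int newLength, T fill) {
        // For reference arrays the default is null, so a null fill would
        // defeat the purpose of the sketch.
        Objects.requireNonNull(fill, "fill");
        T[] copy = Arrays.copyOf(original, newLength);
        if (newLength > original.length) {
            Arrays.fill(copy, original.length, newLength, fill);
        }
        return copy;
    }

    public static void main(String[] args) {
        String[] grown = copyOfWithFill(new String[] {"a", "b"}, 4, "z");
        System.out.println(Arrays.toString(grown));
    }
}
```

The design choice here is that the client must already possess a legitimately constructed value to use as the fill, which matches the rule that the default must not be handed out by the library.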
>> In the presence of a reconstruction capability, either in the > language or in a library API or as provided by a single class, > avoiding non-constructable objects includes allowing legitimate > reconstruction requests; each legitimate reconstruction request must > somehow preserve the intentions of the class's designer. > Reconstruction should act as if field values had been legitimately > (from C 's API) extracted, transformed, and then again legitimately > (to C 's API) rebuilt into an instance of C . Serialization is an > example of reconstruction, since field values can be edited in the > wire format. Proposed with expressions for records are another > example of reconstruction. The withfield bytecode is the primitive > reconstruction operator, and must be restricted to nestmates of C > since it can perform all physically possible field updates. > Reconstruction operations defined outside of C must be designed with > great care if they use elevated privileges beyond what C provides > directly. Summary of user model > A value class C has a value companion C.val which denotes the > null-hostile (zero-initialized) fully flattenable value type for C . > Like other type members of C , C.val can be declared with an access > modifier ( public or private or neither). It is therefore quite > possible that clients of C might be prevented from using the > companion type. > The operations on C.val are almost the same as the operations on > plain C ( C.ref ), so a private C.val is usually not a burden. > Operations which are unique to C.val , and which therefore may > be restricted to you, are: > * declaring a field of type C.val > * making an array with element type C.val > * getting the default flat value C.default > * asking for the mirror C.val.class > Library routines which create empty flattenable arrays of C.val > might not work as expected, when C.val is not public. 
You'll have > to find a workaround, such as: > * use a plain C reference array to hold your data > * use a different API point which is friendly to privatized C.val types > * ask C politely to build such an array for you > * crack into C with a reflective API and build your own > If you look closely at the code for C , you might notice that it > uses its private type C.val in its public API. This is allowed. > Just be aware that null values will not flow through such API points. > When you get a C.val value into your own code, you can work on it > perfectly freely with the type C (which is C.ref ). > If a value companion C.val is declared public , the class has > declared that it is willing to encounter its own default value > C.default coming from untrusted code. If it is declared private , > only the class's own nest can work with C.default . If the value > companion is neither public nor private, the class has declared that > it is willing to encounter its own default within its own package. > If a class has declared its companion non-atomic, it is willing to > encounter states arising from data races (across multiple fields) in > the same places it is willing to encounter its default value. Summary of > restrictions > From the implementation point of view, the salient task is restricting > clients from illegitimately obtaining non-constructed values of C , > if the author of C has asked for such restrictions. (Recall that a > non-constructed value of C is one obtained without using C 's > constructor or other public API.) Here are the generally enforced > restrictions regarding a privatized type C.val : > * You cannot mention the name C.val or C.default in code. > * You cannot create and load bytecodes which would implement such a mention. > * You cannot obtain C.default from a mirror of C or C.val . > * You cannot create a new C.val[] array from a mirror of C or C.val . > * You cannot lengthen an existing C.val[] array to contain uninitialized > elements.
> * You cannot copy an existing array as a new C.val[] array, if C.val is declared > non-atomic. > Even so, let us suppose you are an accident-prone client of C . > Ignoring the above restrictions, you might go about obtaining a > non-constructed value of C in several ways, and there is an > answer from the system in each case that stops you: > * You can mention the C.val or C.default directly in code, in various ways. > * After obtaining the mirror C.val.class (by one of several means), you can call > Class::defaultValue , MethodHandles::zero , or a similar API point. > * If you can declare a field of type C.val directly you can extract an initial > value (or a data-race result, if C.val is non-atomic). > * If you can indirectly create an array of type C.val , you can extract an > initial value (or a data-race result, if C.val is non-atomic). > And there are a number of ways you might attempt to indirectly create > an array of type C.val[] : > * Indirectly create it from a mirror using Array::newInstance or Arrays::copyOf > or MethodHandles::arrayConstructor or another similar API point. > * Create it from a pre-existing array of the same type using Object::clone or > Arrays::copyOf or another similar API point. > * Specify such an array on a serialization wire format and deserialize it. > Using C.val or C.default directly is blocked if C privatizes its > value companion, unless you are coding a nestmate or package-mate of > C . These checks are applied both at compile time and when the JVM > resolves names, so they apply equally to source code and bytecodes > created by any means whatsoever. > There are no realistic restrictions on obtaining a mirror to a > companion type C.val . (Accidental and casual direct use of > C.val.class is prevented by access restrictions on the type name > C.val . But there are many ways to get around this limitation.) 
> Therefore any method or API which could violate the above generally > enforced restrictions must perform an appropriate dynamic access check > on behalf of its mirror argument. > Such a dynamic access check can be made negotiable by an appeal to > caller sensitivity or a Lookup check, so a correctly configured call > can avoid the restriction. For some simple methods (perhaps > Arrays::copyOf or MethodHandles::zero ) there is no negotiation. > Depending on the use case, access failure can be worked around via a > "negotiable" API point like Lookup::arrayConstructor . -------------- next part -------------- An HTML attachment was scrubbed... URL: From jens.lidestrom at fripost.org Sat Jul 23 14:51:13 2022 From: jens.lidestrom at fripost.org (Jens Lidström) Date: Sat, 23 Jul 2022 16:51:13 +0200 Subject: Iterable and valhalla In-Reply-To: References: Message-ID: This kind of cursor class is an interesting application of value classes. They have been discussed on the mailing lists previously and are also used as an example in the State of Valhalla document: https://github.com/openjdk/valhalla-docs/blob/100f007ceba1beae11d9bdb7eef017b0f7d980e9/site/design-notes/state-of-valhalla/02-object-model.md#value-classes-separating-references-from-identity Mutable iterators in C# are an interesting application of mutable value classes. It would be interesting to dig up the initial design discussion about mutable versus immutable value classes in the mailing list archives. It's too bad that Pipermail archives are so hard to search... BR, Jens Lidström On 2022-07-22 00:00, Robbe Pincket wrote: > Hello all > > I was recently thinking about cases where the new "value classes"/"primitive classes" (or whatever they'll be called) can be used. One of the common places where I learned C# devs use their structs, back when I used to code in C#, is in iterators. However, this is not a case we can mirror in Java, as our version of these "structs" is immutable.
However, there are variants that could still allow these, and I was wondering whether any thought has been given to those (or other variants) yet, and/or whether they are deemed not useful.
>
> One such variation:
>
> ```java
> class ArrayList<T> implements Iterable2<T> {
>     @Override
>     public Iterator2<T> iterator2() {
>         return new ArrayListCursor<>(this, 0);
>     }
>
>     value record ArrayListCursor<T>(ArrayList<T> list, int index) implements Iterator2<T> {
>         @Override
>         public boolean hasNext() {
>             return index < list.size();
>         }
>
>         @Override
>         public Tupple2<T, Iterator2<T>> moveNext() {
>             if (!this.hasNext()) {
>                 throw new NoSuchElementException();
>             }
>             return new Tupple2(list.get(index), new ArrayListCursor<>(list, index + 1));
>         }
>
>         // or
>
>         @Override
>         public T next() {
>             if (!this.hasNext()) {
>                 throw new NoSuchElementException();
>             }
>             return list.get(index);
>         }
>
>         @Override
>         Iterator2<T> moveNext() {
>             if (!this.hasNext()) {
>                 throw new NoSuchElementException();
>             }
>             return new ArrayListCursor<>(list, index + 1);
>         }
>     }
> }
> ```
>
> Greetings
>
> Robbe Pincket
>
From brian.goetz at oracle.com Mon Jul 25 14:05:32 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 25 Jul 2022 10:05:32 -0400 Subject: where are all the objects?
In-Reply-To: References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com> <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com> <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com> <73702885-441E-4552-9100-2806257405CD@oracle.com> <8F9B793E-5574-4E95-AAA2-0E9B18A70769@oracle.com> Message-ID:

I had another read through your Values document (https://docs.google.com/document/d/1J-a_K87P-R3TscD4uW2Qsbt5BlBR_7uX_BekwJ5BLSE/edit#). Let me try to summarize.

Values. You want to use Values to describe "free floating pieces of data." They don't live any place specific, they have no identity, they are immutable. Every value has a type, but values do not necessarily incorporate their own typestate; this may live elsewhere (e.g., field descriptors.)

Variables. Variables can hold values. Variables have types, which determine which values may be written to them and what we can assume about values read from them.

Containers. Variables live in containers (classes, instances, arrays, stack frames.)

Kinds of values. Values are primitives, references to objects, or the special reference null.

Objects. Objects have an independent existence, are self-describing (e.g., Object::getClass), may have identity, and can only be interacted with through references.

I think this is a valid model of where things are today, though I think that some of the "essential characteristics" of Objects in your model may be more accidental than you give them credit for. That is, some of these characteristics are of "objects that are the target of an object *reference*", which happens to be all the objects today. Similarly, "has its own independent existence" may feel more accidental once references are optional.

Of course something will have to change, and we want that change to feel natural and not like pulling the rug out from under users' mental models.
The change I'm proposing in this model is: Instead of values being "primitives and object references", values become "value objects and object references". A Complex.val is a value. 3+2i still meets all the requirements of values: free floating, no canonical location, no identity, immutable. It's just a "bigger" value than we could have before. Primitives become value objects. I think people can understand (and will like) this story.

Variables and containers are unchanged.

Objects are instances of classes. Instances of identity classes remain dependent on references to interact with them; instances of value classes can also be the target of references, *and* are values on their own. (This is not excessively weird, since "Complex" and "reference to Complex" is like `int` and `int *` in C.)

Let's take a look at your essential characteristics of objects again.

- Objects are entities, they have their own independent existence. I think this one is a consequence of "objects only can be interacted with through references." That is, there is a kind of value called "reference to object", and the reference refers to ... an object, which is a thing separate from the reference. So, *if* an object is the target of a reference, then yes, it must be an entity that is somewhere else, with its independent existence. "Thing that is the target of an object reference" is one reasonable definition for "Object", but I don't think it is the only one. What I'm saying is that I think it's fair to say an instance of Complex is an object (and further, that saying "it's an instance, but not an object", is likely to be more confusing than beneficial.) I think the term for what you are describing is *referent*, and not all objects are referents.

- Objects are self-describing. By this I'll assume you mean Object::getClass.
Here, I say that objects remain self-describing under the "instances are objects" model, but something interesting happens under the hood about *where* the description lives. If I have a `Complex` in a variable of type `Complex.val`, there is sufficient information *in the container* to know the class of the instance, so the instance doesn't have to carry it with it. There is an operation for "take a reference of" that can be applied to value objects. This operation (logically, though this is frequently optimized away) takes that information out of the container and puts it in an object header. But regardless of whether the typestate is in the container or the object itself, objects are self-describing.

- Only an object can have identity. Remains true; new thing is that not all objects have identity.

- An object is always accessed by reference. This is what I'm saying changes; value objects are values.

So I think that what you are describing as essential characteristics of objects, are really essential characteristics of *referents of object references*. And I would argue that while this is a well-defined concept, it's not the most important distinction we want to put in the user's face. Instead, we can say that an instance of Complex can be a value, or it can be a referent, but it's the same Complex either way, and the user gets to decide what packaging it wants to put it in.

On 7/22/2022 7:16 PM, Brian Goetz wrote: >> Now I wonder if these points, at least, might be uncontroversial: >> >> 1. There exist useful well-defined concepts of "value" and "object" >> that are disjoint and that *have been* valid up to now. (I'll hazard >> a claim that my paper still defends at least /this/ much well enough.) >> 2. Also, you've had to treat the two quite differently from each >> other in your programs. >> 3. We *are* changing (improving) #2 through this project. > > I claim we are changing #1 as well, though to a lesser degree.
#2 > should "mostly go away"; #1 should transform into other terms, such as > e.g. "object stored directly" vs "reference to object". It is those > other terms that I think we are searching for consensus on, but #1 is > moving. > >> 4. But users may still need #1's disjoint concepts when they are >> trying to reason about the *performance* model (tho they'll also need >> to understand that the VM is empowered to "fake" one as the other >> when the spirit so moves it). > > Yes, though I think these are concepts that are more _derived from_ > the distinction in #1. John's notion of "placement" is good here; the > choice of ref vs val constrains the placement, and placement informs > the performance model. I think part of what has been missing until > today is a good attempt to name the intermediate actors, like > placement. I hope that if we refine those terms a bit, things will > get clearer. > >> 5. The questions at hand in this thread are not foremost about the >> performance model but about the basic "start-here" user model. >> 6. These miiight be fair descriptions of the 2 camps? >> >> A. Because you'll get to program mostly the same way in both >> cases, we can and should de-emphasize the distinction. There >> might be a reference sitting in between you and the data/"object" >> or there might not. It's mostly in the VM's hands. If you ever >> think you care about the distinction, you probably are dipping >> down into the performance model. There is a "just don't worry >> about it!" flavor to this option. >> B. It's still helpful to have a solid sense of the distinction, >> even as we benefit from getting to code the same way to each. >> Even though the VM might really fake one as the other; again, >> that's performance model. >> >> >> Anything controversial about the above? > > No, and I want to choose both A and B! I don't think they are > opposed, I think they are different angles on the elephant.
> >> (If I had to explain why I've been so dogged about B, maybe it's the >> sense that we simply won't "get away with" A. It feels hard (to me) >> to tell users simultaneously that they should stop caring about a >> distinction AND that we're changing up how all kinds of stuff works >> across that distinction. It feels more solid to firm up the >> distinction so that we can talk about how things are changing, and >> then let that distinction just slowly matter less and less over time.) > > Agree that we need a good "start here" story, but I think a good one > will have aspects of A and B. I think we're making progress... > >> >> >> On Fri, Jul 22, 2022 at 12:02 PM John Rose >> wrote: >> >> On 22 Jul 2022, at 10:55, Brian Goetz wrote: >> >> ... >> >> So then, would we call an instance of `Complex.val` a >> "non-heap object" or an "inlined object" or what? We need >> to flesh out a whole lexicon. The phrase "value object" >> becomes useless for this particular distinction as it >> will apply to both. >> >> Yes, in the taxonomy I'm pushing, a "value object" is one >> without identity, and is the kind of object you can store >> directly in variables without going through a reference. But >> I don't think that there are instances of Complex.val and >> instances of Complex.ref; I think there are instances of >> *Complex*, and multiple ways to describe/store/access them. >> >> FTR, I enthusiastically agree with this viewpoint, even though I >> am also probing for weaknesses and alternatives. (FTR I feel the >> same about Brian's summary in his previous short message.) >> >> And under this viewpoint, the terms "instance" and "object" have >> the same denotation, though different connotations. (When I say >> "instance" you may well think, "instance of what?" But you don't >> ask that question so much if I say "object".)
>> >> That `int/Integer` decision you've been making has >> always been between (1) value and (2) (reference-to) >> object, and that decision is still exactly between >> (1) value and (2) (reference-to) object now, and btw >> the definitions of 'reference' and 'object' remain >> precisely wedded to each other as always. >> >> The "heap object" alternative strikes me (and I am trying >> to be fair, here) as: >> >> Now, that's an object either way, and you're going to >> apply that old thought process toward which *kind* of >> object you mean, either a (1) "inline object" or a >> (2) "(reference-to) heap object". It's now just heap >> objects and references that are paired together. >> >> I think, Kevin, you are going wrong at this point: It's not a >> /kind/ of object, it is a /placement/ of an object. What "kind" >> of person am I when I am driving to the office? Surely the same >> "kind" as when I am at home. But when I am driving, I am equipped >> with a car and a road, much like a heap-placed object is equipped >> with a header and references. >> >> Likewise, an int/Integer is (in Valhalla) the same "kind" of >> object (if we go all the way to making primitives be honorary >> objects) whether it is placed in heap or on stack or inside >> another object. >> >> The distinction that comes from the choice of equipping an int >> with a header in heap storage is a distinction of placement (and >> corresponding representation). So an int/Integer does not >> intrinsically have a header because it is an object (because of >> its "kind"). It /may/ have a header if the JVM needs to give it >> one, because it is stuck in the heap. >> >> (My points about int/Integer could partly fail if we fail to >> align int and Integer in the end. So transfer the argument to >> C.val/C.ref if you prefer. It is the same argument.)
>> >> And I would say the /placement/ of an object is in three broad >> cases which are worth teaching even to beginners: >> >> * >> >> "in the heap": therefore referred to by a machine word >> address, and presumably equipped with a header and maybe >> surrounded by some alignment waste; a JVM might have multiple >> heaps but at this level of discourse we say "the heap" >> >> * >> >> "on the stack": therefore manipulated directly by its >> components, which are effectively separated into scalars (it >> is "scalarized", we sometimes say); we might sometimes wish >> to say "JVM stack or locals" instead of "stack", or, with >> increasing detail, "on stack, in locals, and/or in registers, >> and/or as immediates in the machine code" >> >> * >> >> "contained in another object": in a field or array element, >> therefore piggy-backing on the other object's placement; and >> note that even arrays are scalarized sometimes, lifting their >> elements into registers etc. >> >> To summarize: |Placement = Heap | Stack | Contained[Placement]|. >> >> One might use the term "inline" somewhere in there, either to >> mean |Contained| or |Stack|Contained[*]|. >> >> Static field values are a special case, but they can be >> classified in one of the above ways. HotSpot places static fields >> inside a special per-class object (the mirror, in fact), so their >> values are either contained or separate in the heap (JVM's choice >> again). >> >> One might be pedantic and say that an instance can be contained >> "in static memory" (neither heap nor stack) if the JVM implements >> storage for static fields outside of the heap. But in that case >> I'd rather say that they are in a funny corner of the heap, where >> perhaps headers are not needed, because some static metadata >> somewhere dictates what is stored. >> >> (Hence I like to be cagey about whether a heap-object actually >> has a physical header. It might not in some JVM implementations.)
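The one-line summary |Placement = Heap | Stack | Contained[Placement]| reads like a recursive algebraic data type. A minimal sketch of that reading, assuming Java 17 or later for sealed interfaces and records, might look like the following. All names here (`PlacementSketch`, `root`) are hypothetical and exist only to illustrate the taxonomy; nothing like this is exposed by the JVM, which keeps the final say on placement.

```java
// Hypothetical model of the placement taxonomy; illustration only.
public class PlacementSketch {

    // Placement = Heap | Stack | Contained[Placement]
    public sealed interface Placement permits Heap, Stack, Contained {}

    // Placed in the heap, reached through a reference.
    public record Heap() implements Placement {}

    // Placed on the stack (or in locals/registers), possibly scalarized.
    public record Stack() implements Placement {}

    // Placed inside a field or array element of another object, and so
    // piggy-backing on that object's own placement.
    public record Contained(Placement container) implements Placement {}

    // Follows containment outward until a non-contained placement is found.
    public static Placement root(Placement placement) {
        Placement current = placement;
        while (current instanceof Contained contained) {
            current = contained.container();
        }
        return current;
    }

    public static void main(String[] args) {
        // A value stored in a field of an object that is itself an
        // element of a heap-allocated array.
        Placement nested = new Contained(new Contained(new Heap()));
        System.out.println(root(nested));
    }
}
```

The recursion in `Contained` captures the remark that a contained instance inherits its placement from its container, however many levels deep that goes.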
>> >> Starting to prefer the first way (as I did) did not feel >> like going rogue: after all, did we not gravitate toward >> ".ref" and ".val" as our placeholder syntaxes, not >> ".inline" and ".heap" or anything else? >> >> With you on this. I think asking users to reason about "heap >> objects" vs "inline objects" is pushing them towards the >> implementation, not the concepts. They may have to reason >> about this to understand the performance model, but that's >> already advanced material. >> >> Yes. And even more specifically in the implementation, users who >> think about "heap objects" are really (IMO) trying to predict the >> /placement/ of the objects, /where/ the JVM will choose to place >> their bits in physical memory. >> >> This question of placement is very interesting to the "alert" >> performance-minded programmer. Not every programmer is in that >> state; for me I try to practice "first make it work then make it >> fast". I get "alert" to performance only in the "make it fast >> phase", a phase which many of my codes never reach. >> >> As a sort of "siren song" the question of placement is /also/ >> interesting to the beginning student who is struggling to build a >> mental image of Java data, and is reaching for visualizations in >> terms of memory and addresses, or (what is about the same) boxes >> and arrows. But the JVM will make a hash of all that, if it is >> doing a good job. So the student must be told to hold those >> mental models lightly. >> >> Kevin is insisting (for his own good reasons) on his answer to >> "where are the objects": They are always "in the heap" and thus >> "with headers, accessed by pointers". I suspect (but haven't seen >> from Kevin himself yet) that this is in part due to a desire to >> work with, rather than work against, the student's desire to make >> simple visual models of Java data. >> >> Crucially, in a literal "boxes and arrows"
model, an arrow >> (perhaps a |C.ref| reference to an instance) looks very different >> from a nested box (perhaps a |C.val| instance), and the naive >> user might insist that such differences are part of the contract >> between the user and the JVM. But they are not. The JVM might >> introduce invisible "arrows" (because of heap buffering) and it >> might remove arrows (because of scalarization for a number of >> possible reasons). >> >> So if the student is told that the arrows and boxes are "what's >> really going on" the student using that assurance to predict >> performance and footprint will feel cheated in the end. >> >> To summarize: Any given instance/object has logically independent >> properties of class and placement. >> >> And thus: The choice of companion type does not affect class but >> may (may!) affect placement. >> >> Circling back to the language design, it might seem odd that >> there are three ways to place an object but just two companion >> types. But this oddness goes away if you realize that |C.val| and >> |C.ref| are not placement directives. The choice between the two >> is a net-binary selection from a sizeable menu of "affordances" >> that the user might be expecting or disavowing at any given point >> in the code. (See my lists of "affordances" and "alternative >> affordances" in encapsulating-val >> .) >> >> The user is given this simplified switch to influence the JVM's >> decisions about placement (and therefore representation). It is >> useful because the JVM can employ different implementation >> tactics depending on the differences between the user-visible >> contracts of |C.ref| and of |C.val|. In the choice of >> implementation tactics, the JVM has the final say. >> >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > -------------- next part -------------- An HTML attachment was scrubbed...
From kevinb at google.com Mon Jul 25 16:33:48 2022
From: kevinb at google.com (Kevin Bourrillion)
Date: Mon, 25 Jul 2022 09:33:48 -0700
Subject: where are all the objects?
In-Reply-To:
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com> <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com> <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com> <73702885-441E-4552-9100-2806257405CD@oracle.com> <8F9B793E-5574-4E95-AAA2-0E9B18A70769@oracle.com>
Message-ID:

On Fri, Jul 22, 2022 at 4:16 PM Brian Goetz wrote:

> Now I wonder if these points, at least, might be uncontroversial:
>
> 1. There exist useful well-defined concepts of "value" and "object" that are disjoint and that *have been* valid up to now. (I'll hazard a claim that my paper still defends at least *this* much well enough.)
> 2. Also, you've had to treat the two quite differently from each other in your programs.
> 3. We *are* changing (improving) #2 through this project.
>
> I claim we are changing #1 as well, though to a lesser degree. #2 should "mostly go away"; #1 should transform into other terms, such as e.g. "object stored directly" vs "reference to object". It is those other terms that I think we are searching for consensus on, but #1 is moving.

#1 is only addressing the fact that something *has been* true thus far, which is not something we can change. It was supposed to be easy common ground. :-) It sounds to me like you are really alluding here to the A vs B distinction below?

> 4. But users may still need #1's disjoint concepts when they are trying to reason about the *performance* model (tho they'll also need to understand that the VM is empowered to "fake" one as the other when the spirit so moves it).
>
> Yes, though I think these are concepts that are more _derived from_ the distinction in #1. John's notion of "placement" is good here; the choice of ref vs val constrains the placement, and placement informs the performance model.
> I think part of what has been missing until today is a good attempt to name the intermediate actors, like placement. I hope that if we refine those terms a bit, things will get clearer.

In my attempts to flesh out and understand "your model" (which I attempted to describe as "A" below and which I've called "VAO" in the past for "values are objects", vs. my preferred model "VANO"), I will indeed adopt this helpful term "placement".

A reason why I resist this model so much: to me this is like saying that the difference between a giraffe and a drawing of a giraffe is just where the giraffe is "placed". It feels like more than that. One has its own independent existence, possibly even changing over time, and the other is just this freely copyable snapshot of information, each copy wholly dependent on the medium it's drawn on. That feels like a very fundamental distinction to me, and it has a good physicality to it.

> 5. The questions at hand in this thread are not foremost about the performance model but about the basic "start-here" user model.
> 6. These miiight be fair descriptions of the 2 camps?
>
> A. Because you'll get to program mostly the same way in both cases, we can and should de-emphasize the distinction. There might be a reference sitting in between you and the data/"object" or there might not. It's mostly in the VM's hands. If you ever think you care about the distinction, you probably are dipping down into the performance model. There is a "just don't worry about it!" flavor to this option.
> B. It's still helpful to have a solid sense of the distinction, even as we benefit from getting to code the same way to each. Even though the VM might really fake one as the other; again, that's performance model.
>
> Anything controversial about the above?
>
> No, and I want to choose both A and B! I don't think they are opposed, I think they are different angles on the elephant.

I strongly suspect that trying to have a thing like this both ways may be exactly where we inject the most confusion.

> (If I had to explain why I've been so dogged about B, maybe it's the sense that we simply won't "get away with" A. It feels hard (to me) to tell users simultaneously that they should stop caring about a distinction AND that we're changing up how all kinds of stuff works across that distinction. It feels more solid to firm up the distinction so that we can talk about how things are changing, and then let that distinction just slowly matter less and less over time.)
>
> Agree that we need a good "start here" story, but I think a good one will have aspects of A and B. I think we're making progress.

Yes, progress.

> On Fri, Jul 22, 2022 at 12:02 PM John Rose wrote:
>
>> On 22 Jul 2022, at 10:55, Brian Goetz wrote:
>>
>> ...
>>
>> So then, would we call an instance of `Complex.val` a "non-heap object" or an "inlined object" or what? We need to flesh out a whole lexicon. The phrase "value object" becomes useless for this particular distinction as it will apply to both.
>>
>> Yes, in the taxonomy I'm pushing, a "value object" is one without identity, and is the kind of object you can store directly in variables without going through a reference. But I don't think that there are instances of Complex.val and instances of Complex.ref; I think there are instances of *Complex*, and multiple ways to describe/store/access them.
>>
>> FTR, I enthusiastically agree with this viewpoint, even though I am also probing for weaknesses and alternatives. (FTR I feel the same about Brian's summary in his previous short message.)
>>
>> And under this viewpoint, the terms "instance" and "object" have the same denotation, though different connotations. (When I say "instance" you may well think, "instance of what?" But you don't ask that question so much if I say "object".)
>>
>> That `int/Integer` decision you've been making has always been between (1) value and (2) (reference-to) object, and that decision is still exactly between (1) value and (2) (reference-to) object now, and btw the definitions of 'reference' and 'object' remain precisely wedded to each other as always.
>>
>> The "heap object" alternative strikes me (and I am trying to be fair, here) as:
>>
>> Now, that's an object either way, and you're going to apply that old thought process toward which *kind* of object you mean, either a (1) "inline object" or a (2) "(reference-to) heap object". It's now just heap objects and references that are paired together.
>>
>> I think, Kevin, you are going wrong at this point: It's not a *kind* of object, it is a *placement* of an object. What "kind" of person am I when I am driving to the office? Surely the same "kind" as when I am at home. But when I am driving, I am equipped with a car and a road, much like a heap-placed object is equipped with a header and references.
>>
>> Likewise, an int/Integer is (in Valhalla) the same "kind" of object (if we go all the way to making primitives be honorary objects) whether it is placed in heap or on stack or inside another object.
>>
>> The distinction that comes from the choice of equipping an int with a header in heap storage is a distinction of placement (and corresponding representation). So an int/Integer does not intrinsically have a header because it is an object (because of its "kind"). It *may* have a header if the JVM needs to give it one, because it is stuck in the heap.
>>
>> (My points about int/Integer could partly fail if we fail to align int and Integer in the end. So transfer the argument to C.val/C.ref if you prefer. It is the same argument.)
>>
>> And I would say the *placement* of an object is in three broad cases which are worth teaching even to beginners:
>>
>> - "in the heap": therefore referred to by a machine word address, and presumably equipped with a header and maybe surrounded by some alignment waste; a JVM might have multiple heaps but at this level of discourse we say "the heap"
>> - "on the stack": therefore manipulated directly by its components, which are effectively separated into scalars (it is "scalarized", we sometimes say); we might sometimes wish to say "JVM stack or locals" instead of "stack", or, with increasing detail, "on stack, in locals, and/or in registers, and/or as immediates in the machine code"
>> - "contained in another object": in a field or array element, therefore piggy-backing on the other object's placement; and note that even arrays are scalarized sometimes, lifting their elements into registers etc.
>>
>> To summarize: Placement = Heap | Stack | Contained[Placement].
>>
>> One might use the term "inline" somewhere in there, either to mean Contained or Stack|Contained[*].
>>
>> Static field values are a special case, but they can be classified in one of the above ways. HotSpot places static fields inside a special per-class object (the mirror, in fact), so their values are either contained or separate in the heap (JVM's choice again).
>>
>> One might be pedantic and say that an instance can be contained "in static memory" (neither heap nor stack) if the JVM implements storage for static fields outside of the heap. But in that case I'd rather say that they are in a funny corner of the heap, where perhaps headers are not needed, because some static metadata somewhere dictates what is stored.
>>
>> (Hence I like to be cagey about whether a heap-object actually has a physical header. It might not in some JVM implementations.)
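
The summary "Placement = Heap | Stack | Contained[Placement]" quoted above reads like a small recursive sum type, and it can be sketched as one. The following is only an illustration of the vocabulary, assuming Java 17 or later; the names `PlacementSketch`, `root`, and `describe` are hypothetical and appear nowhere in the thread. It does not observe real JVM placement, which (as the quoted text stresses) is the JVM's choice and is not visible to Java code.

```java
public class PlacementSketch {
    // Placement = Heap | Stack | Contained[Placement], as a sealed hierarchy.
    sealed interface Placement permits Heap, Stack, Contained {}

    // "in the heap": reached through an address, presumably with a header.
    record Heap() implements Placement {}

    // "on the stack": scalarized into locals, registers, and so on.
    record Stack() implements Placement {}

    // "contained in another object": piggy-backs on the container's placement.
    record Contained(Placement container) implements Placement {}

    // Follow containers outward until reaching a Heap or Stack placement.
    static Placement root(Placement p) {
        while (p instanceof Contained c) {
            p = c.container();
        }
        return p;
    }

    // Render a placement as text, innermost container first.
    static String describe(Placement p) {
        if (p instanceof Heap) {
            return "heap";
        }
        if (p instanceof Stack) {
            return "stack";
        }
        Contained c = (Contained) p;
        return "contained in (" + describe(c.container()) + ")";
    }

    public static void main(String[] args) {
        // A field of a heap-placed object.
        Placement field = new Contained(new Heap());
        // An element of an array that is itself held in a scalarized object.
        Placement element = new Contained(new Contained(new Stack()));

        System.out.println(describe(field));          // contained in (heap)
        System.out.println(describe(element));        // contained in (contained in (stack))
        System.out.println(describe(root(element)));  // stack
    }
}
```

The recursion in `Contained` is the point of the sketch: a contained instance has no placement of its own and inherits whatever its outermost container has, which is why `root` always ends at `Heap` or `Stack`.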
>>
>> Starting to prefer the first way (as I did) did not feel like going rogue: after all, did we not gravitate toward ".ref" and ".val" as our placeholder syntaxes, not ".inline" and ".heap" or anything else?
>>
>> With you on this. I think asking users to reason about "heap objects" vs "inline objects" is pushing them towards the implementation, not the concepts. They may have to reason about this to understand the performance model, but that's already advanced material.
>>
>> Yes. And even more specifically in the implementation, users who think about "heap objects" are really (IMO) trying to predict the *placement* of the objects, *where* the JVM will choose to place their bits in physical memory.
>>
>> This question of placement is very interesting to the "alert" performance-minded programmer. Not every programmer is in that state; for me I try to practice "first make it work then make it fast". I get "alert" to performance only in the "make it fast phase", a phase which many of my codes never reach.
>>
>> As a sort of "siren song" the question of placement is *also* interesting to the beginning student who is struggling to build a mental image of Java data, and is reaching for visualizations in terms of memory and addresses, or (what is about the same) boxes and arrows. But the JVM will make a hash of all that, if it is doing a good job. So the student must be told to hold those mental models lightly.
>>
>> Kevin is insisting (for his own good reasons) on his answer to "where are the objects": They are always "in the heap" and thus "with headers, accessed by pointers". I suspect (but haven't seen from Kevin himself yet) that this is in part due to a desire to work with, rather than work against, the student's desire to make simple visual models of Java data.
>>
>> Crucially, in a literal "boxes and arrows" model, an arrow (perhaps a C.ref reference to an instance) looks very different from a nested box (perhaps a C.val instance), and the naive user might insist that such differences are part of the contract between the user and the JVM. But they are not. The JVM might introduce invisible "arrows" (because of heap buffering) and it might remove arrows (because of scalarization for a number of possible reasons).
>>
>> So if the student is told that the arrows and boxes are "what's really going on", the student using that assurance to predict performance and footprint will feel cheated in the end.
>>
>> To summarize: Any given instance/object has logically independent properties of class and placement.
>>
>> And thus: The choice of companion type does not affect class but may (may!) affect placement.
>>
>> Circling back to the language design, it might seem odd that there are three ways to place an object but just two companion types. But this oddness goes away if you realize that C.val and C.ref are not placement directives. The choice between the two is a net-binary selection from a sizeable menu of "affordances" that the user might be expecting or disavowing at any given point in the code. (See my lists of "affordances" and "alternative affordances" in encapsulating-val.)
>>
>> The user is given this simplified switch to influence the JVM's decisions about placement (and therefore representation). It is useful because the JVM can employ different implementation tactics depending on the differences between the user-visible contracts of C.ref and of C.val. In the choice of implementation tactics, the JVM has the final say.
>
> --
> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From kevinb at google.com Mon Jul 25 18:14:20 2022
From: kevinb at google.com (Kevin Bourrillion)
Date: Mon, 25 Jul 2022 11:14:20 -0700
Subject: where are all the objects?
In-Reply-To:
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com> <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com> <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com> <73702885-441E-4552-9100-2806257405CD@oracle.com> <8F9B793E-5574-4E95-AAA2-0E9B18A70769@oracle.com>
Message-ID:

On Mon, Jul 25, 2022 at 7:05 AM Brian Goetz wrote:

> I had another read through your Values document (https://docs.google.com/document/d/1J-a_K87P-R3TscD4uW2Qsbt5BlBR_7uX_BekwJ5BLSE/edit#). Let me try to summarize.
>
> Values. You want to use Values to describe "free floating pieces of data." They don't live any place specific, they have no identity, they are immutable. Every value has a type, but values do not necessarily incorporate their own typestate; this may live elsewhere (e.g., field descriptors.)
>
> Variables. Variables can hold values. Variables have types, which determine which values may be written to them and what we can assume about values read from them.
>
> Containers. Variables live in containers (classes, instances, arrays, stack frames.)
>
> Kinds of values. Values are primitives, references to objects, or the special reference null.
>
> Objects. Objects have an independent existence, are self-describing (e.g., Object::getClass), may have identity, and can only be interacted with through references.

All correct. Maybe I could have made the doc shorter. :-) Thank you for rereading it.

> I think this is a valid model of where things are today, though I think that some of the "essential characteristics" of Objects in your model may be more accidental than you give them credit for. That is, some of these characteristics are of "objects that are the target of an object *reference*", which happens to be all the objects today.
> Similarly, "has its own independent existence" may feel more accidental once references are optional.

Perfect: this is indeed meant to be the part of the document that we haggle over. The doc doesn't do a good job of portraying it as such, but if I can get back to working on it at some point I'll try to address that.

> Of course something will have to change, and we want that change to feel natural and not pulling the rug out from under user's mental models.
>
> The change I'm proposing in this model is:
>
> Instead of values being "primitives and object references", values become "value objects and object references". A Complex.val is a value. 3+2i still meets all the requirements of values: free floating, no canonical location, no identity, immutable. It's just a "bigger" value than we could have before. Primitives become value objects. I think people can understand (and will like) this story.
>
> Variables and containers are unchanged.
>
> Objects are instances of classes. Instances of identity classes remain dependent on references to interact with them; instances of value classes can also be the target of references, *and* are values on their own. (This is not excessively weird, since "Complex" and "reference to Complex" is like `int` and `int *` in C.)

It's a familiar aspect of C that Java quite distinctly distanced itself from!

> Let's take a look at your essential characteristics of objects again.
>
> - Objects are entities, they have their own independent existence. I think this one is a consequence of "objects only can be interacted with through references." That is, there is a kind of value called "reference to object", and the reference refers to ... an object, which is a thing separate from the reference.
>
> So, *if* an object is the target of a reference, then yes, it must be an entity that is somewhere else, with its independent existence.
> "Thing that is the target of an object reference" is one reasonable definition for "Object", but I don't think it is the only one. What I'm saying is that I think it's fair to say an instance of Complex is an object (and further, that saying "it's an instance, but not an object" is likely to be more confusing than beneficial.)

Two possible directions of that confusion:

1. But why is it not an object? The big problem here, raised by John I think, is "Java is an object-oriented language". Uh oh. That certainly does demand the expansive notion of "object"; I have to concede that point. On the other hand, maybe, Java did then put that theory into practice using terms like "instance member", not "object member". It also created a class called "Object" and attached a bunch of specific ideas to that, only 3.5 of which are really general to all instances.

2. Okay but then why is it still an instance? Here we'd be asking users to shed some of the baggage they've (incidentally) attached to "instance" in the past, realizing that that baggage actually belonged with "object". What remains with "instance" is the essential stuff: instance members, instance state.

> I think the term for what you are describing is *referent*, and not all objects are referents.

I agree that "referent" is another reasonable choice of term for what the doc calls "object". I think it beats "heap object". But I think it has serious problems (below).

> - Objects are self-describing. By this I'll assume you mean Object::getClass.

Yes, as well as arrays knowing their own length. I mean that if you have the data itself you have everything you need to know about the layout of that data in memory.

> Here, I say that objects remain self-describing under the "instances are objects" model, but something interesting happens under the hood about *where* the description lives.
> If I have a `Complex` in a variable of type `Complex.val`, there is sufficient information *in the container* to know the class of the instance, so the instance doesn't have to carry it with it. There is an operation for "take a reference of" that can be applied to value objects. This operation (logically, though this is frequently optimized away) takes that information out of the container and puts it in an object header. But regardless of whether the typestate is in the container or the object itself, objects are self-describing.

I think this paragraph is merely asking for a different term/definition from the "self-describing" term I'm using and doesn't say anything deeper than that. Maybe there is a better term. I used "self-describing" partly because I thought it has a healthy existing connotation that the data itself is bloated with all that description. e.g. Java serialized forms are self-describing (setting aside all the ways that description can fail).

> - Only an object can have identity. Remains true; new thing is that not all objects have identity.

I was making a slightly stronger point. There is still a difference here. In the values-are-not-objects model (VANO or "B" in this email), every object is *eligible* to have identity -- it only doesn't if the user declines it. Values are inherently ineligible. That's an intrinsic difference between the two.

> - An object is always accessed by reference. This is what I'm saying changes; value objects are values. So I think that what you are describing as essential characteristics of objects, are really essential characteristics of *referents of object references*. And I would argue that while this is a well-defined concept, it's not the most important distinction we want to put in the user's face.
> Instead, we can say that an instance of Complex can be a value, or it can be a referent, but it's the same Complex either way, and the user gets to decide what packaging it wants to put it in.

I think the key thing to notice about "referent" is that it's a role noun (I don't know what grammarians would call it), like "pedestrian" or "projectile". That is, it doesn't seem to be saying anything at all about the thing itself, only about the role that thing is currently playing in some broader relationship or activity. To my VANO mindset, that just doesn't feel sufficient or appropriate, because the things feel inherently different.

(I also think "referent" will fail to be a workable term for other reasons, including but not limited to its very unfortunate plural form. And I literally just spotted that my fingers had typed "reference" when I'd meant to say "referent" 2 paragraphs up. I also cannot come up with another term for this, and that's part of the problem I'm having with VAO.)

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com Mon Jul 25 19:15:11 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 25 Jul 2022 15:15:11 -0400
Subject: where are all the objects?
In-Reply-To: References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com> <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com> <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com> <73702885-441E-4552-9100-2806257405CD@oracle.com> <8F9B793E-5574-4E95-AAA2-0E9B18A70769@oracle.com> Message-ID: <011b724b-e713-a843-aa8b-3a5e48c1c4e4@oracle.com> > > > Of course something will have to change, and we want that change > to feel natural and not pulling the rug out from under user's > mental models. > > The change I'm proposing in this model is: > > Instead of values being "primitives and object references", values > become "value objects and object references".? A Complex.val is a > value.? 3+2i still meets all the requirements of values: free > floating, no canonical location, no identity, immutable.? It's > just a "bigger" value that we could have before. Primitives become > value objects.? I think people can understand (and will like) this > story. > > Variables and containers are unchanged. > > Objects are instances of classes.? Instances of identity classes > remain dependent on references to interact with them; instances of > value classes can also be the target of references, *and* are > values on their own.? (This is not excessively weird, since > "Complex" and "reference to Complex" is like `int` and `int *` in C.) > > > It's a familiar aspect of C that Java quite distinctly distanced > itself from! Agreed, and I do feel a little dirty for making the comparison, as we have gotten so far without having to understand the distinction between `int` and `int *`.? But only a little dirty, because if you want to ignore the difference, you can -- and if you want to understand the difference there is a pretty clear foundation for it. Further, in my defense, we have been playing a kind of interesting trick since the beginning.? Arguably, Java has *never* had objects -- only object references!? 
You can't denote "reference to X" in the type system; some times are just reference types.? And there is no expression whose type is any kind of object; only references to objects.? The dot operator does some sort of quantum tunneling to access the object state, and bring it back into the program, as if by magic.? The Java 1.0 version of your question is: where are *any* of the objects? So you could describe my "VAO" position as "finally, you can touch some of the objects" (we always knew they were in there somewhere!) The identity objects are still as hidden away as ever, but the value objects will come out and play, and in fact the legacy primitives are revealed to have been "bare" objects all along!?? And the box types were a clunky way to say "reference to value object", which we've now declunkified. Where I see your discomfort is the "a value object can be a value *or* a referent, dealer's choice."? That is indeed new and confusing.? My claim is that this (a) really describes what is going on, (b) is a clean generalization of the primitives-and-references we have now, (c) allows the primitives to become real objects, and (d) once you internalize this "value or referent" duality, the rest comes along easily.? So maybe I've just made my peace with this, because I like the properties it gives me. > > Let's take a look at your essential characteristics of objects again. > > ?- Objects are entities, they have their own independent > existence.? I think this one is a consequence of "objects only can > be interacted with through references."? That is, there is a kind > of value called "reference to object", and the reference refers to > ... an object, which is a thing separate from the reference. > > So, *if* an object is the target of a reference, then yes, it must > be an entity that is somewhere else, with its independent > existence.? 
"Thing that is the target of an object reference" is > one reasonable definition for "Object", but I don't think it is > the only one. What I'm saying is that I think its fair to say an > instance of Complex is an object (and further, that saying "its an > instance, but not an object", is likely to be more confusing that > beneficial.) > > > > Two possible directions of that confusion: > > 1. But why is it not an object? The big problem here, raised by John I > think, is "Java is an object-oriented language". Uh oh. That certainly > does demand the expansive notion of "object"; I have to concede that > point. On the other hand, maybe, Java did then put that theory into > practice using terms like "instance member", not "object member". It > also created a class called "Object" and attached a bunch of specific > ideas to that, only 3.5 of which are really general to all instances. > > 2. Okay but then why is it still an instance? Here we'd be asking > users to shed some of the baggage they've (incidentally) attached to > "instance" in the past, realizing that that baggage actually belonged > with "object". What remains with "instance" is the essential stuff: > instance members, instance state. The intuition I think users already have, and which we can double down on, is that classes are like templates for stamping out objects/instances.? The user declares the class with its name, supertypes, members, etc, and then can make many instances from the template.? So anything that says "instance, but not object" seems needlessly confusing, because it breaks an existing intuition that is working fine, for little benefit (that I can see.) The use of object and instance to mean the same thing has an understandable historical context.? In the late 80s/early 90s, object orientation was an abstract concept; objects are stateful, independent entities that communicate by message passing.? (Kay's objects were more like Erlang's actors than Java/C++ classes.)? 
Java (and other languages) interpreted OO through the lens of classes, which have instances.? So this is really like "class-oriented-language implements object-oriented-language", where we say "class instances are what we mean by objects."? But, this is historical path-weaving; objects == instances seems like a sleeping dog that should be let lie, because the cost of changing perception seems way higher than the value of doing so. > > ? I think the term for what you are describing is *referent*, and > not all objects are referents. > > > > I agree that "referent" is another reasonable choice of term for what > the doc calls "object". I think it beats "heap object". But I think it > has serious problems (below). > > > ? Here, I say that objects remain self-describing under the > "instances are objects" model, but something interesting happens > under the hood about *where* the description lives.? If I have a > `Complex` in a variable of type `Complex.val`, there is sufficient > information *in the container* to know the class of the instance, > so the instance doesn't have to carry it with it.? There is an > operation for "take a reference of" that can be applied to value > objects.? This operation (logically, though this is frequently > optimized away) takes that information out of the container and > puts it in an object header.? But regardless of whether the > typestate is in the container or the object itself, objects are > self-describing. > > > > I think this paragraph is merely asking for a different > term/definition from the "self-describing" term I'm using and doesn't > say anything deeper than that. Maybe there is a better term. I used > "self-describing" partly because I thought it has a healthy existing > connotation that the data itself is bloated with all that description. > e.g. Java serialized forms are self-describing (setting aside all the > ways that description can fail). OK, I think I see what you're getting at.? 
I'm focusing on "self-describing" as "can it answer the question".? (Today, `int` is arguably not self-describing by this description, because you can't ask it any questions, though that will change.)? You're taking self-describing to mean the cost of stapling an object header to the payload.? (Which is implicitly one of those "box and pointer" perspectives that John talked about, which on the one hand is useful for building intuitions, but on the other can build wrong intuitions.) > > ?- An object is always accessed by reference.? This is what I'm > saying changes; value objects are values. So I think that what you > are describing as essential characteristics of objects, are really > essential characteristics of *referents of object references*. And > I would argue that while this is a well-defined concept, it's not > the most important distinction we want to put in the user's face.? > Instead, we can say that an instance of Complex can be a value, or > it can be a referent, but its the same Complex either way, and the > user gets to decide what packaging it wants to put it in. > > > > I think the key thing to notice about "referent" is that it's a role > noun (I don't know what grammarians would call it), like "pedestrian" > or "projectile". That is, it doesn't seem to be saying anything at all > about the thing itself, only about the role that thing is currently > playing in some broader relationship or activity. Yes!? Which is in line with the fact that we blur the distinction between reference and referent today, and we will continue to blur it tomorrow.? And whether a value is a value object or an object reference is steered by types -- whether the types are value types or reference types.? But one reason to push this "its the same instance either way", is that it goes a ways towards busting the existing concept of boxing.? 
The existing concept of boxing appeals to "construct a new box" and
"unpack the value from the box", which (a) sounds like a lot of motion
(it is) and (b) says that the primitive and the box are totally
different kinds of things. Whereas the "value object vs referent"
framing encourages a model of "it's the same thing all along, we are
just holding it differently." Bare hands vs tongs, if you will.

Are we getting any closer, or just digging in?

From forax at univ-mlv.fr  Mon Jul 25 20:33:29 2022
From: forax at univ-mlv.fr (Remi Forax)
Date: Mon, 25 Jul 2022 22:33:29 +0200 (CEST)
Subject: The problem with encapsulating C.val + autoboxing
Message-ID: <1254790090.15376108.1658781209702.JavaMail.zimbra@u-pem.fr>

One of the ideas behind encapsulating C.val is that even for a value
class C with no good default, access to C.val is useful.
That's true, but an unfortunate consequence is that it makes leaking
T.default in generic code a security issue.

It is in fact more secure to use two classes C and CFlat, one with no
good default and the other which allows the default value, when working
with generics.

Here is an example: let's say we have a value class Month (1 to 12) and
an identity class Agenda that contains several months.
We can declare Month like this, with a package-private value companion.

  value record Month(int value) {
    /*package-private*/ companion val;

    public Month {
      if (value < 1 || value > 12) throw new IllegalArgumentException();
    }
  }

So we can flatten the Month when stored in a list

  class Agenda {
    private final MyList<Month.val> months = new MyList<>();

    public void add(Month month) {
      Objects.requireNonNull(month);
      months.add(month);
    }

    public Month getFirst() {
      return months.isEmpty()? null: months.getFirst();
    }
  }

Is this code safe?
The trouble is that it depends on the implementation of MyList; for
example, with

  class MyList<E> {
    private E[] array = new E[16];
    private int size;

    public boolean isEmpty() { return size == 0; }

    public void add(E element) {
      array[size++] = element;
    }

    public E getFirst() {
      // Objects.checkIndex(0, size);
      return array[0];
    }
  }

MyList.getFirst() leaks E.default, so the implementation of Agenda is
not safe.
Using encapsulation to hide C.val is only safe if the generic code never
leaks E.default.

Weirdly, if I use different classes to represent C and C.val, I do not
have that issue.

  class Agenda {
    value record MonthFlat(int value) {
      public companion val;
    }

    private final MyList<MonthFlat.val> months = new MyList<>();

    public void add(Month month) {
      Objects.requireNonNull(month);
      months.add(new MonthFlat(month.value()));
    }

    public Month getFirst() {
      return months.isEmpty()? null: new Month(months.getFirst().value);
    }
  }

because unlike the autoboxing conversion from Month.val to Month, the
conversion from MonthFlat to Month does not bypass the constructor.

Rémi

From john.r.rose at oracle.com  Tue Jul 26 03:00:09 2022
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 25 Jul 2022 20:00:09 -0700
Subject: The problem with encapsulating C.val + autoboxing
In-Reply-To: <1254790090.15376108.1658781209702.JavaMail.zimbra@u-pem.fr>
References: <1254790090.15376108.1658781209702.JavaMail.zimbra@u-pem.fr>
Message-ID: <52A2D9C6-C757-4C5E-86FA-1E095B760B08@oracle.com>

This is not a general security problem, but rather an example of how
code that is inside an encapsulation can violate that encapsulation.

Because `Agenda` is allowed to mention the type `MyList<Month.val>`,
which includes the (package-private) type `Month.val`, it must be the
case that `Agenda` is inside the same package as `Month`. Since `Month`
has granted access rights to all package-mates, if there is a "leakage"
of `Month.default` somewhere, it is only because someone in the same
package as `Month` has given away that value.
In other words, it's how the encapsulation is written, for better or
worse. If the author of `Month` wants to distrust package-mates, that
author should declare the companion type `private`, not
package-private. You can't grant access to code you distrust, and then
complain about security, unless you are pointing at yourself!

This particular example does not stress autoboxing in any interesting
way that I see. Clearly if somebody has access to `Month.val` they can
then grab `Month.default` in one of several ways, and then it's up to
them to keep the secret safe, if it is in fact a secret to be kept safe.

BTW, this example assumes specialized generics. Since they don't exist
fully yet, except on paper, we are assuming properties of specialized
generics that may not in fact turn out to be true. But I assume that:

 - Any non-erased type argument, which is set to a privatized class,
   must be accessible to the code which mentions the type argument (as
   part of the generic type application).
 - Any generic that uses non-erased type parameters which may be bound
   to privatized types should document how it materializes
   externally-visible default values and/or arrays of that type, if it
   in fact does this. (Most won't need to.)

Perhaps you are pointing to the fact that bugs in generic containers
might leak non-constructed values from specialized generics? (The
missing call to `checkIndex` allows an empty value to leak from
`getFirst`.) We should keep this in mind, I guess. But note that the
fault is not solely in the buggy generic, in your example: There is also
some fault in the client which passed a privatized type to a generic of
unknown quality.

If necessary, as part of the "opt in" for specialization we could add
another layer of "opt in" to handle privatized type arguments. There is
always a reasonable fallback for `MyList<Month.val>` if `MyList<T>` is
not expecting to handle privatized types. The fallback is to partially
erase `MyList<Month.val>` to `MyList<Month.ref>`.
That would be a third, intermediate form of erasure: Lift privatized types to their reference types. In fact, as I?ve noted before, this is a question for non-flat types as well, in the setting of specialized generics: Perhaps specialized generics should *never* specialize on inaccessible type arguments of any sort, neither refs to non-public classes nor privatized vals. It?s a fair question to ponder? If we *do* allow specialization on non-public type arguments, it?s probably on the grounds that the client ?knows what he?s doing? and is consciously sharing access to the non-public type by mentioning it in a type argument. The generic who gets a non-public type shared with it must handle it with due care, not immediately leaking it to the world. It?s part of the user contract for specialized generics: What are the rules for non-public types? ? John On 25 Jul 2022, at 13:33, Remi Forax wrote: > One of the idea of encapsulating C.val is that even with a value class > C with no good default, accessing C.val is useful. > That's true but a unfortunate consequence is that it makes leaking T > in a generic code a security issue. > > It is in fact more secure to use two classes C and CFlat, one with no > good default and the other which allow the default value, when using > with generics. > > Here is an example, let say we have a value class Month (1 to 12) and > an identity class Agenda that contains several months. > WE can declare Month like this, with a package-private value > companion. > > value record Month(int value) { > /*package-private*/ companion val; > > public Month { > if (value < 1 || value > 12) throw new IAE(); > } > } > > So we can flatten the Month when stored in a list > > class Agenda { > private final MyList months = new MyList<>(); > > public void add(Hour hour) { > Objects.requireNonNull(hour); > months.add(hour); > } > > public Hour getFirst() { > return months.isEmpty()? null: months.getFirst(); > } > } > > Is this code safe ? 
The trouble is that it depends on the > implementation of MyList, by example with > > class MyList { > private E[] array = new E[16]; > private int size; > > public boolean isEmpty() { return size == 0; } > > public void add(E element) { > array[size++] = element; > } > > public E getFirst() { > // Objects.checkIndex(0, size); > return array[0]; > } > } > > MyList.getFirst() leaks E.default, so the implementation of Agenda is > not safe. > Using the encapsulation to hide C.val is only safe if the generics > code never leak E.default. > > Weirdly, if i use different classes to represent C and C.val, i do not > have that issue. > > class Agenda { > value record MonthFlat(int value) { > public companion val; > } > > private final MyList months = new MyList<>(); > > public void add(Hour hour) { > Objects.requireNonNull(hour); > months.add(new MonthFlat(hour.value())); > } > > public Hour getFirst() { > return months.isEmpty()? null: new Month(months.getFirst().value); > } > } > > because unlike the autoboxing conversion between Month.val to Month, > the conversion from MonthFlat to Month does not bypass the > constructor. > > R?mi -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian.goetz at oracle.com Tue Jul 26 18:18:03 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 26 Jul 2022 14:18:03 -0400 Subject: Updated SoV, take 3 Message-ID: Yet another attempt at updating SoV to reflect the current thinking.? Please review. # State of Valhalla ## Part 2: The Language Model {.subtitle} #### Brian Goetz {.author} #### July 2022 {.date} > _This is the second of three documents describing the current State of ? Valhalla.? The first is [The Road to Valhalla](01-background); the ? third is [The JVM Model](03-vm-model)._ This document describes the directions for the Java _language_ charted by Project Valhalla.? (In this document, we use "currently" to describe the language as it stands today, without value classes.) 
Valhalla started with the goal of providing user-programmable classes which can
be flat and dense in memory. Numerics are one of the motivating use cases;
adding new primitive types directly to the language has a very high barrier. As
we learned from [Growing a Language][growing] there are infinitely many numeric
types we might want to add to Java, but the proper way to do that is via
libraries, not as a language feature.

## Primitives and objects today

Java currently has eight built-in primitive types. Primitives represent pure
_values_; any `int` value of "3" is equivalent to, and indistinguishable from,
any other `int` value of "3". Because primitives are "just their bits" with no
ancillary state such as object identity, they are _freely copyable_; whether
there is one copy of the `int` value "3", or millions, doesn't matter to the
execution of the program. With the exception of the unusual treatment of exotic
floating point values such as `NaN`, the `==` operator on primitives performs a
_substitutability test_ -- it asks "are these two values the same value".

Java also has _objects_, and each object has a unique _object identity_. This
means that each object must live in exactly one place (at any given time), and
this has consequences for how the JVM lays out objects in memory. Objects in
Java are not manipulated or accessed directly, but instead through _object
references_. Object references are also a kind of value -- they encode the
identity of the object to which they refer, and the `==` operator on object
references also performs a substitutability test, asking "do these two
references refer to the same object." Accordingly, object _references_ (like
other values) can be freely copied, but the objects they refer to cannot.

This dichotomy -- that the universe of values consists of primitives and object
references -- has long been at the core of Java's design.
JVMS 2.2 (Data Types) opens with:

> There are two kinds of values that can be stored in variables, passed as
> arguments, returned by methods, and operated upon: primitive values and
> reference values.

Primitives and objects currently differ in almost every conceivable way:

| Primitives                                 | Objects                            |
| ------------------------------------------ | ---------------------------------- |
| No identity (pure values)                  | Identity                           |
| `==` compares values                       | `==` compares object identity      |
| Built-in                                   | Declared in classes                |
| No members (fields, methods, constructors) | Members (including mutable fields) |
| No supertypes or subtypes                  | Class and interface inheritance    |
| Accessed directly                          | Accessed via object references     |
| Not nullable                               | Nullable                           |
| Default value is zero                      | Default value is null              |
| Arrays are monomorphic                     | Arrays are covariant               |
| May tear under race                        | Initialization safety guarantees   |
| Have reference companions (boxes)          | Don't need reference companions    |

Primitives embody a number of tradeoffs aimed at maximizing the performance and
usability of the primitive types. Reference types default to `null`, meaning
"referring to no object", and must be initialized before use; primitives default
to a usable zero value (which for most primitives is the additive identity) and
therefore may be used without initialization. (If primitives were nullable like
references, not only would this be less convenient in many situations, but they
would likely consume additional memory footprint to accommodate the possibility
of nullity, as most primitives already use all their bit patterns.)
Similarly, reference types provide initialization safety guarantees for final
fields even under a certain category of data races (this is where we get the
"immutable objects are always thread-safe" rule from); primitives allow tearing
under race for larger-than-32-bit values. We could characterize the design
principles behind these tradeoffs as "make objects safer, make primitives
faster."

The following figure illustrates the current universe of Java's types. The
upper left quadrant is the built-in primitives; the rest of the space is
reference types. In the upper-right, we have the abstract reference types --
abstract classes, interfaces, and `Object` (which, though concrete, acts more
like an interface than a concrete class). The built-in primitives have wrappers
or boxes, which are reference types.

[Figure: Current universe of Java field types]

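Several rows of the primitives-versus-objects table can be observed directly in
today's Java. The following is a minimal sketch using only standard library
types; the class name `PrimitivesVsBoxes` and its helper methods are
illustrative assumptions, not part of the Valhalla design.

```java
public class PrimitivesVsBoxes {
    // "Default value is zero": elements of a fresh int[] start at 0.
    static int defaultInt() {
        int[] ints = new int[1];
        return ints[0];
    }

    // "Default value is null": elements of a fresh Integer[] start at null.
    static Integer defaultInteger() {
        Integer[] boxes = new Integer[1];
        return boxes[0];
    }

    // "Arrays are covariant": an Integer[] may be viewed as an Object[],
    // so an ill-typed store is only rejected at run time.
    static boolean illTypedStoreIsRejected() {
        Object[] objects = new Integer[1];
        try {
            objects[0] = "not an Integer";
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    // "Not nullable": unboxing a null reference fails, because int has
    // no bit pattern left over to represent "no value".
    static boolean unboxingNullFails() {
        Integer boxed = null;
        try {
            int unboxed = boxed;
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(defaultInt());
        System.out.println(defaultInteger());
        System.out.println(illTypedStoreIsRejected());
        System.out.println(unboxingNullFails());
    }
}
```

An `int[]` has no covariant counterpart: it cannot be assigned to an `Object[]`
at all, which is the "arrays are monomorphic" row of the table.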
Valhalla aims to unify primitives and objects such that both are declared with
classes, but maintains the special runtime characteristics -- flatness and
density -- that primitives currently enjoy.

### Primitives and boxes today

The built-in primitives are best understood as _pairs_ of types: the primitive
type (`int`) and its reference companion type (`Integer`), with built-in
conversions between the two. The two types have different characteristics that
make each more or less appropriate for a given situation. Primitives are
optimized for efficient storage and access: they are monomorphic, not nullable,
tolerate uninitialized (zero) values, and larger primitive types (`long`,
`double`) may tear under racy access. The box types add back the affordances of
references -- nullity, polymorphism, interoperation with generics, and
initialization safety -- but at a cost.

Valhalla generalizes this primitive-box relationship, in a way that is more
regular and extensible and reduces the "boxing tax".

## Eliminating unwanted object identity

Many impediments to optimization stem from _unwanted object identity_. For many
classes, not only is identity not directly useful, it can be a source of bugs.
For example, due to caching, `Integer` can be accidentally compared correctly
with `==` just often enough that people keep doing it. Similarly, [value-based
classes][valuebased] such as `Optional` have no need for identity, but pay the
costs of having identity anyway.

Valhalla allows classes to explicitly disavow identity by declaring them as
_value classes_. The instances of a value class are called _value objects_.

```
value class Point implements Serializable {
    int x;
    int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    Point scale(int s) {
        return new Point(s*x, s*y);
    }
}
```

This says that a `Point` is a class whose instances have no identity.
As a consequence, it must give up the things that depend on identity; the class
and its fields are implicitly final. Additionally, operations that depend on
identity must either be adjusted (`==` on value objects compares state, not
identity) or disallowed (it is illegal to lock on a value object.)

Value classes can still have most of the affordances of classes -- fields,
methods, constructors, type parameters, superclasses (with some restrictions),
nested classes, class literals, interfaces, etc. The classes they can extend
are restricted: `Object` or abstract classes with no instance fields, empty
no-arg constructor bodies, no other constructors, no instance initializers, no
synchronized methods, and whose superclasses all meet this same set of
conditions. (`Number` is an example of such an abstract class.)

Because `Point` has value semantics, `==` compares by state rather than
identity. This means that value objects, like primitives, are _freely
copyable_; we can explode them into their fields and re-aggregate them into
another value object, and we cannot tell the difference.

So far we've addressed the first two lines in our table of differences; rather
than all objects having identity, classes can opt into, or out of, object
identity for their instances. By allowing classes to exclude unwanted identity,
we free the runtime to make better layout and compilation decisions.

### Example: immutable cursors

Collections today use `Iterator` to facilitate traversal through the
collection, which stores iteration state in mutable fields. While heroic
optimizations such as _escape analysis_ can sometimes eliminate the cost
associated with iterators, such optimizations are fragile and hard to rely on.
Value objects offer an iteration approach that is more reliably optimized:
immutable cursors. (Without value objects, immutable cursors would be
prohibitively expensive for iteration.)

```
value class ArrayCursor<T> {
    T[] array;
    int offset;

    public ArrayCursor(T[] array, int offset) {
        this.array = array;
        this.offset = offset;
    }

    public ArrayCursor(T[] array) {
        this(array, 0);
    }

    public boolean hasNext() {
        return offset < array.length;
    }

    public T next() {
        return array[offset];
    }

    public ArrayCursor<T> advance() {
        return new ArrayCursor<>(array, offset+1);
    }
}
```

In looking at this code, we might mistakenly assume it will be inefficient, as
each loop iteration appears to allocate a new cursor:

```
for (ArrayCursor<T> c = new ArrayCursor<>(array);
     c.hasNext();
     c = c.advance()) {
    // use c.next();
}
```

In reality, we should expect that _no_ cursors are actually allocated here. An
`ArrayCursor` is just its two fields, and the runtime is free to scalarize the
object into its fields and hoist them into registers. The calling convention
for `advance` is optimized so that both receiver and return value are
scalarized. Even without inlining `advance`, no allocation will take place,
just some shuffling of the values in registers. And if `advance` is inlined,
the client code will compile down to having a single register increment and
compare in the loop header.

### Migration

The JDK (as well as other libraries) has many [value-based classes][valuebased]
such as `Optional` and `LocalDateTime`. Value-based classes adhere to the
semantic restrictions of value classes, but are still identity classes -- even
though they don't want to be. Value-based classes can be migrated to true value
classes simply by redeclaring them as value classes, which is both source- and
binary-compatible.

We plan to migrate many value-based classes in the JDK to value classes.
Additionally, the primitive wrappers can be migrated to value classes as well,
making the conversion between `int` and `Integer` cheaper; see "Migrating the
legacy primitives" below.
(In some cases, this may be _behaviorally_ incompatible for code that
synchronizes on the primitive wrappers. [JEP 390][jep390] has supported both
compile-time and runtime warnings for synchronizing on primitive wrappers since
Java 16.)

[Figure: Java field types adding value classes]

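The `Integer` caching hazard mentioned under "Eliminating unwanted object
identity" is easy to reproduce with today's identity-bearing wrappers. The
sketch below uses only `Integer.valueOf`, which is documented to cache values
from -128 to 127 and permitted (but not required) to cache others; the class
and method names are illustrative assumptions.

```java
public class BoxIdentity {
    // equals() compares state, so it is reliable for every value.
    static boolean sameState(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a.equals(b);
    }

    // == on references compares identity, so the answer depends on
    // whether the two boxes happen to be the same cached object.
    static boolean sameIdentity(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b;
    }

    public static void main(String[] args) {
        // Inside the guaranteed cache range, == "works" by accident.
        System.out.println("127 identical:  " + sameIdentity(127));
        System.out.println("1000 equal:     " + sameState(1000));
        // Outside the range the result is implementation-dependent, which
        // is why no particular output is claimed for this line.
        System.out.println("1000 identical: " + sameIdentity(1000));
    }
}
```

If the wrappers are migrated to value classes as described, `==` would compare
state for every value, and this distinction would no longer be observable.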
### Identity-sensitive operations

Certain operations are currently defined in terms of object identity. As we've
already seen, some of these, like equality, can be sensibly extended to cover
all instances. Others, like synchronization, will become partial.
Identity-sensitive operations include:

  - **Equality.** We extend `==` on references to include references to value
    objects. Where it currently has a meaning, the new definition coincides
    with that meaning.

  - **System::identityHashCode.** The main use of `identityHashCode` is in the
    implementation of data structures such as `IdentityHashMap`. We can extend
    `identityHashCode` in the same way we extend equality -- deriving a hash on
    value objects from the hash of all the fields.

  - **Synchronization.** This becomes a partial operation. If we can
    statically detect that a synchronization will fail at runtime (including
    declaring a `synchronized` method in a value class), we can issue a
    compilation error; if not, attempts to lock on a value object result in
    `IllegalMonitorStateException`. This is justifiable because it is
    intrinsically imprudent to lock on an object for which you do not have a
    clear understanding of its locking protocol; locking on an arbitrary
    `Object` or interface instance is doing exactly that.

  - **Weak, soft, and phantom references.** Capturing an exotic reference to a
    value object becomes a partial operation, as these are intrinsically tied to
    reachability (and hence to identity). However, we will likely make
    enhancements to `WeakHashMap` to support mixed identity and value keys.

### Value classes and records

While records have a lot in common with value classes -- they are final and
their fields are final -- they are still identity classes. Records embody a
tradeoff: give up on decoupling the API from the representation, and in return
get various syntactic and semantic benefits.
Value classes embody another tradeoff: give up identity, and get various
semantic and performance benefits. If we are willing to give up both, we can
get both sets of benefits, by declaring a _value record_.

```
value record NameAndScore(String name, int score) { }
```

Value records combine the data-carrier idiom of records with the improved
scalarization and flattening benefits of value classes.

In theory, it would be possible to apply `value` to certain enums as well, but
this is not currently possible because the `java.lang.Enum` base class that
enums extend does not meet the requirements for superclasses of value classes
(it has fields and non-empty constructors).

### Value and reference companion types

Value classes are generalizations of primitives. Since primitives have a
reference companion type, value classes actually give rise to _pairs_ of types:
a value type and a reference type. We've seen the reference type already; for
the value class `ArrayCursor`, the reference type is called `ArrayCursor`, just
as with identity classes. The full name for the reference type is
`ArrayCursor.ref`; `ArrayCursor` is just a convenient alias for that. (This
aliasing is what allows value-based classes to be compatibly migrated to value
classes.)

The value type is called `ArrayCursor.val`, and the two types have the same
conversions between them as primitives do today with their boxes. The default
value of the value type is the one for which all fields take on their default
value; the default value of the reference type is, like all reference types,
null. We will refer to the value type of a value class as the _value companion
type_.

Just as with today's primitives and their boxes, the reference and value
companion types of a value class differ in their support for nullity,
polymorphism, treatment of uninitialized variables, and safety guarantees under
race.
Value companion types, like primitive types, are monomorphic, non-nullable,
tolerate uninitialized (zero) values, and (under some circumstances) may tear
under racy access. Reference types are polymorphic, nullable, and offer the
initialization safety guarantees for final fields that we have come to expect
from identity objects.

Unlike with today's primitives, the "boxing" and "unboxing" conversions between
the reference and value companion types are not nearly as heavy or wasteful,
because of the lack of identity. A variable of type `Point.val` holds a "bare"
value object; a variable of type `Point.ref` holds a _reference to_ a value
object. For many use cases, the reference type will offer good enough
performance; in some cases, it may be desirable to additionally give up the
affordances of reference-ness to make further flatness and footprint gains. See
[Performance Model](05-performance-model) for more details on the specific
tradeoffs.

In our diagram, these new types show up as another entity that straddles the
line between primitives and identity-free references, alongside the legacy
primitives:

** UPDATE DIAGRAM **
[Figure: Java field types with extended primitives]
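
The analogy this section draws with today's primitives and their boxes can be observed in current Java. The following is only a minimal sketch using `int` and `Integer`, not Valhalla syntax; the class and method names (`CompanionAnalogy`, `defaultOfInt`, and so on) are hypothetical and chosen purely for illustration. It shows the zero default of the primitive, the null default of the box, and the polymorphism that only the reference side offers.

```java
public class CompanionAnalogy {
    // A freshly allocated primitive array is filled with the zero default.
    static int defaultOfInt() {
        int[] values = new int[1];
        return values[0];
    }

    // A freshly allocated array of boxes is filled with null.
    static Integer defaultOfInteger() {
        Integer[] refs = new Integer[1];
        return refs[0];
    }

    // Only the reference side is polymorphic: after a boxing conversion,
    // the value can be viewed as an Object and tested against Number.
    static boolean boxIsNumber(int value) {
        Object boxed = value; // boxing conversion int -> Integer
        return boxed instanceof Number;
    }

    public static void main(String[] args) {
        System.out.println(defaultOfInt());     // prints 0
        System.out.println(defaultOfInteger()); // prints null
        System.out.println(boxIsNumber(42));    // prints true
    }
}
```

Under the model described here, `C.val` would presumably play the role of `int` and `C.ref` the role of `Integer` in each of these three observations.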
### Member access

Both the reference and value companion types have the same members. Unlike
today's primitives, value companion types can be used as receivers to access
fields and invoke methods (subject to the usual accessibility constraints):

```
Point.val p = new Point(1, 2);
assert p.x == 1;

p = p.scale(2);
assert p.x == 2;
```

### Polymorphism

An identity class `C` that extends `D` sets up a subtyping (is-a) relationship
between `C` and `D`. For value classes, the same thing happens between its
_reference type_ and the declared supertypes. (Reference types are
polymorphic; value types are not.) This means that if we declare:

```
value class UnsignedShort extends Number
                          implements Comparable<UnsignedShort> {
   ...
}
```

then `UnsignedShort` is a subtype of `Number` and `Comparable<UnsignedShort>`,
and we can ask questions about subtyping using `instanceof` or pattern matching.
What happens if we ask such a question of the value companion type?

```
UnsignedShort.val us = ...
if (us instanceof Number) { ... }
```

Since subtyping is defined only on reference types, the `instanceof` operator
(and corresponding type patterns) will behave as if both sides were lifted to
the appropriate reference type (boxed), and then we can appeal to subtyping.
(This may trigger fears of expensive boxing conversions, but in reality no
actual allocation will happen.)

We introduce a new relationship between types based on `extends` / `implements`
clauses, which we'll call "extends": we define `A extends B` as meaning `A <: B`
when A is a reference type, and `A.ref <: B` when A is a value companion type.
The `instanceof` relation, reflection, and pattern matching are updated to use
"extends".

### Array covariance

Arrays of reference types are _covariant_; this means that if `A <: B`, then
`A[] <: B[]`. This allows `Object[]` to be the "top array type" -- but only for
arrays of references. Arrays of primitives are currently left out of this
story.
We unify the treatment of arrays by defining array covariance over the new
"extends" relationship; if A _extends_ B, then `A[] <: B[]`. This means that
for a value class P, `P.val[] <: P.ref[] <: Object[]`; when we migrate the
primitive types to be value classes, then `Object[]` is finally the top type for
all arrays. (When the built-in primitives are migrated to value classes, this
means `int[] <: Integer[] <: Object[]` too.)

### Equality

For values, as with primitives, `==` compares by state rather than by identity.
Two value objects are `==` if they are of the same type and their fields are
pairwise equal, where equality is defined by `==` for primitives (except `float`
and `double`, which are compared with `Float::equals` and `Double::equals` to
avoid anomalies), `==` for references to identity objects, and recursively with
`==` for references to value objects. In no case is a value object ever `==` to
an identity object.

When comparing two object _references_ with `==`, they are equal if they are
both null, or if they are both references to the same identity object, or they
are both references to value objects that are `==`. (When comparing a value
type with a reference type, we treat this as if we convert the value to a
reference, and proceed as per comparing references.) This means that the
following will succeed:

```
Point.val p = new Point(3, 4);
Point pr = p;
assert p == pr;
```

The base implementation of `Object::equals` delegates to `==`, which is a
suitable default for both reference and value classes.

### Serialization

If a value class implements `Serializable`, this is also really a statement
about the reference type. Just as with other aspects described here,
serialization of value companions can be defined by converting to the
corresponding reference type and serializing that, and reversing the process at
deserialization time.

Serialization currently uses object identity to preserve the topology of an
object graph.
This generalizes cleanly to objects without identity, because `==` on value
objects treats two identical copies of a value object as equal. So any
observations we make about graph topology prior to serialization with `==` are
consistent with those after deserialization.

## Refining the value companion

Value classes have several options for refining the behavior of the value
companion type and how they are exposed to clients.

### Classes with no good default value

For a value class `C`, the default value of `C.ref` is the same as any other
reference type: `null`. For the value companion type `C.val`, the default value
is the one where all of its fields are initialized to their default value (0 for
numbers, false for boolean, null for references).

The built-in primitives reflect the design assumption that zero is a reasonable
default. The choice to use a zero default for uninitialized variables was one
of the central tradeoffs in the design of the built-in primitives. It gives us
a usable initial value (most of the time), and requires less storage footprint
than a representation that supports null (`int` uses all 2^32 of its bit
patterns, so a nullable `int` would have to either make some 32 bit signed
integers unrepresentable, or use a 33rd bit). This was a reasonable tradeoff
for the built-in primitives, and is also a reasonable tradeoff for many other
potential value classes (such as complex numbers, 2D points, half-floats, etc).

But for other potential value classes, such as `LocalDate`, there simply _is_ no
reasonable default. If we choose to represent a date as the number of days
since some epoch, there will invariably be bugs that stem from uninitialized
dates; we've all been mistakenly told by computers that something that never
happened actually happened on or near 1 January 1970.
Even if we could choose a default other than the zero representation, an
uninitialized date is still likely to be an error -- there simply is no good
default date value.

For this reason, value classes have the choice of _encapsulating_ their value
companion type. If the class is willing to tolerate an uninitialized (zero)
value, it can freely share its `.val` companion with the world; if uninitialized
values are dangerous (such as for `LocalDate`), the value companion can be
encapsulated to the class or package, and clients can use the reference
companion. Encapsulation is accomplished using ordinary access control. By
default, the value companion is `private` to the value class (it need not be
declared explicitly); a class that wishes to share its value companion more
broadly can do so by declaring it explicitly:

```
public value record Complex(double real, double imag) {
    public value companion Complex.val;
}
```

### Atomicity and tearing

For the primitive types longer than 32 bits (long and double), it was always
possible that reads and writes from different threads (without suitable
coordination) were not atomic with respect to each other. This means that, if
accessed under data race, a long or double field or array element could be seen
to "tear", where a read sees the low 32 bits of one write and the high 32 bits
of another. (Declaring the containing field `volatile` is sufficient to restore
atomicity, as is properly coordinating with locks or other concurrency control,
or not sharing across threads in the first place.)

This was a pragmatic tradeoff given the hardware of the time; the cost of 64-bit
atomicity on 1995 hardware would have been prohibitive, and problems only arise
when the program already has data races -- and most numeric code deals entirely
with thread-local data.
Just like with the tradeoff of nulls vs zeros, the design of the built-in
primitives permits tearing as part of a tradeoff between performance and
correctness, where we chose "as fast as possible" for primitives, and more
safety for reference types.

Today, most JVMs give us atomic loads and stores of 64-bit primitives, because
the hardware already makes them cheap enough. But value classes bring us back
to 1995; atomic loads and stores of larger-than-64-bit values are still
expensive on many CPUs, leaving us with a choice of "make operations on value
types slower" or permitting tearing when accessed under race.

It would not be wise for the language to select a one-size-fits-all policy about
tearing; choosing "no tearing" means that types like `Complex` are slower than
they need to be, even in a single-threaded program; choosing "tearing" means
that classes like `Range` can be seen to not exhibit invariants asserted by
their constructor. Class authors can choose, with full knowledge of their
domain, whether their types can tolerate tearing. The default is no tearing
(following the principle of "safe by default"); a class can opt for greater
flattening (at the cost of potential tearing) by declaring the value companion
as `non-atomic`:

```
public value record Complex(double real, double imag) {
    public non-atomic value companion Complex.val;
}
```

For classes like `Complex`, all of whose bit patterns are valid, this is very
much like the choice around `long` in 1995. For other classes that might have
nontrivial representational invariants -- specifically, invariants that relate
multiple fields, such as ensuring that a range goes from low to high -- they
likely want to stick to the default of atomicity.

## Do we really need two types?

It is sensible to ask: why do we need companion types at all?
This is analogous to the need for boxes in 1995: we'd made one set of tradeoffs
for primitives favoring performance (monomorphic, non-nullable, zero-default,
tolerant of non-initialization, tolerant of tearing under race, unrelated to
`Object`), and another for references, favoring flexibility and safety. Most of
the time, we ignored the primitive wrapper classes, but sometimes we needed to
temporarily suppress one of these properties, such as when interoperating with
code that expects an `Object` or the ability to express "no value". The reasons
we needed boxes in 1995 still apply today: sometimes we need the affordances of
references, and in those cases, we appeal to the reference companion.

Reasons we might want to use the reference companion include:

 - **Interoperation with reference types.** Value classes can implement
   interfaces and extend classes (including `Object` and some abstract classes),
   which means some class and interface types are going to be polymorphic over
   both identity and value objects. This polymorphism is achieved through
   object references; a reference to `Object` may be a reference to an identity
   object, or a reference to a value object.

 - **Nullability.** Nullability is an affordance of object _references_, not
   objects themselves. Most of the time, it makes sense that value types are
   non-nullable (as the primitives are today), but there may be situations where
   null is a semantically important value. Using the reference companion when
   nullability is required is semantically clear, and avoids the need to invent
   new sentinel values for "no value."

   This need comes up when migrating existing classes; the method `Map::get`
   uses `null` to signal that the requested key was not present in the map. But,
   if the `V` parameter to `Map` is a value type, `null` is not a valid value.
   We can capture the "`V` or null" requirement by changing the descriptor of
   `Map::get` to:

   ```
   public V.ref get(K key);
   ```

   where, whatever type `V` is instantiated as, `Map::get` returns the reference
   companion. (For a type `V` that already is a reference type, this is just `V`
   itself.) This captures the notion that the return type of `Map::get` will
   either be a reference to a `V`, or the `null` reference. (This is a
   compatible change, since both erase to the same thing.)

 - **Self-referential types.** Some types may want to directly or indirectly
   refer to themselves, such as the "next" field in the node type of a linked
   list:

   ```
   class Node<T> {
       T theValue;
       Node<T> nextNode;
   }
   ```

   We might want to represent this as a value class, but if the type of
   `nextNode` were `Node.val`, the layout of `Node` would be
   self-referential, since we would be trying to flatten a `Node` into its own
   layout.

 - **Protection from tearing.** For a value class with a non-atomic value
   companion type, we may want to use the reference companion in cases where we
   are concerned about tearing; because loads and stores of references are
   atomic, `P.ref` is immune to the tearing under race that `P.val` might be
   subject to.

 - **Compatibility with existing boxing.** Autoboxing is convenient, in that it
   lets us pass a primitive where a reference is required. But boxing affects
   far more than assignment conversion; it also affects method overload
   selection. The rules are designed to prefer overloads that require no
   conversions to those requiring boxing (or varargs) conversions. Having both
   a value and reference type for every value class means that these rules can
   be cleanly and intuitively extended to cover value classes.

### Choosing which to use

How would we choose between declaring an identity class or a value class, and
the various options on value companions?
Here are some quick rules of thumb for declaring classes:

 - If you need mutability, subclassing, locking, or aliasing, choose an identity
   class.
 - Otherwise, choose a value class. If uninitialized (zero) values are
   unacceptable, leave the value companion encapsulated; if zero is a reasonable
   default value, make the value companion `public`.
 - If there are no cross-field invariants and you are willing to tolerate
   possible tearing to enable more flattening, make the value companion
   `non-atomic`.

## Migrating the legacy primitives

As part of generalizing primitives, we want to adjust the built-in primitives to
behave as consistently with value classes as possible. While we can't change
the fact that `int`'s reference companion is the oddly-named `Integer`, we can
give them more uniform aliases (`int.ref` is an alias for `Integer`; `int` is an
alias for `Integer.val`) -- so that we can use a consistent rule for naming
companions. Similarly, we can extend member access to the legacy primitives
(`3.getClass()`) and adjust `int[]` to be a subtype of `Integer[]` (and
therefore of `Object[]`).

We will redeclare `Integer` as a value class with a public value companion:

```
value class Integer {
    public value companion Integer.val;

    // existing methods
}
```

where the type name `int` is an alias for `Integer.val`.

## Unifying primitives with classes

Earlier, we had a chart of the differences between primitive and reference
types:

| Primitives                                 | Objects                            |
| ------------------------------------------ | ---------------------------------- |
| No identity (pure values)                  | Identity                           |
| `==` compares values                       | `==` compares object identity      |
| Built-in                                   | Declared in classes                |
| No members (fields, methods, constructors) | Members (including mutable fields) |
| No supertypes or subtypes                  | Class and interface inheritance    |
| Accessed directly                          | Accessed via object references     |
| Not nullable                               | Nullable                           |
| Default value is zero                      | Default value is null              |
| Arrays are monomorphic                     | Arrays are covariant               |
| May tear under race                        | Initialization safety guarantees   |
| Have reference companions (boxes)          | Don't need reference companions    |

The addition of value classes addresses many of these directly. Rather than
saying "classes have identity, primitives do not", we make identity an optional
characteristic of classes (and derive equality semantics from that). Rather
than primitives being built in, we derive all types, including primitives, from
classes, and endow value companion types with the members and supertypes
declared with the value class. Rather than having primitive arrays be
monomorphic, we make all arrays covariant under the `extends` relation.

The remaining differences now become differences between reference types and
value types:

| Value types                                   | Reference types                  |
| --------------------------------------------- | -------------------------------- |
| Accessed directly                             | Accessed via object references   |
| Not nullable                                  | Nullable                         |
| Default value is zero                         | Default value is null            |
| May tear under race, if declared `non-atomic` | Initialization safety guarantees |

The current dichotomy between primitives and references morphs to one between
value objects and references, where the legacy primitives become (slightly
special) value objects, and, finally, "everything is an object".

## Summary

Valhalla unifies, to the extent possible, primitives and objects. The
following table summarizes the transition from the current world to Valhalla.

| Current World                               | Valhalla                                                  |
| ------------------------------------------- | --------------------------------------------------------- |
| All objects have identity                   | Some objects have identity                                |
| Fixed, built-in set of primitives           | Open-ended set of primitives, declared via classes        |
| Primitives don't have methods or supertypes | Primitives are classes, with methods and supertypes       |
| Primitives have ad-hoc boxes                | Primitives have regularized reference companions          |
| Boxes have accidental identity              | Reference companions have no identity                     |
| Boxing and unboxing conversions             | Primitive reference and value conversions, but same rules |
| Primitive arrays are monomorphic            | All arrays are covariant                                  |

[valuebased]: https://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html
[growing]: https://dl.acm.org/doi/abs/10.1145/1176617.1176621
[jep390]: https://openjdk.java.net/jeps/390
From daniel.smith at oracle.com Tue Jul 26 22:20:01 2022
From: daniel.smith at oracle.com (Dan Smith)
Date: Tue, 26 Jul 2022 22:20:01 +0000
Subject: The storage hint model
In-Reply-To: <303592244.13560433.1658335440278.JavaMail.zimbra@u-pem.fr>
References: <303592244.13560433.1658335440278.JavaMail.zimbra@u-pem.fr>
Message-ID:

On Jul 20, 2022, at 9:44 AM, Remi Forax wrote:

> And not having .ref and .val to be types greatly simplify the model, because
> they is no interaction between the type checking and the storage hints, those
> are two separated concerns.

To emphasize this point: I think we're talking about years of development time
that could be cut from this project if we could live with storage modifiers
rather than needing to thread the flattening information through the type
system. Not to mention the downstream simplification for all the developers
that don't have to learn what a value type is.

(Think about it: no Q types, no .val/.ref, no boxing, no universal generics, no
new unchecked warnings, no secondary reflection mirrors, no new descriptors, no
bridges. Just a storage modifier akin to 'volatile', support for that modifier
in arrays, and value classes able to opt in to allow that modifier.)

But there are details to sort out, and those details will determine whether this
simplifying move is a fantasy or a viable alternative design.

From daniel.smith at oracle.com Tue Jul 26 22:42:26 2022
From: daniel.smith at oracle.com (Dan Smith)
Date: Tue, 26 Jul 2022 22:42:26 +0000
Subject: where are all the objects?
In-Reply-To:
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com> <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com> <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com>
Message-ID:

On Jul 22, 2022, at 9:04 AM, Kevin Bourrillion wrote:

> Note that *some* decisions which produce strong initial antipathy in the minds
> of users will actually become good teachable moments! "Here's why the reaction
> you had was tied to old assumptions that we are intentionally leaving behind
> for these good reasons." Even a user who doesn't agree with the decision can
> still hang their learning onto this. In fact I think some of the *best*
> teachable moments will be like that.

This seems like a really good piece of wisdom to hold on to in all of our
terminology/model discussions. Unintuitive != bad. Thanks!

From forax at univ-mlv.fr Wed Jul 27 00:02:34 2022
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Wed, 27 Jul 2022 02:02:34 +0200 (CEST)
Subject: The storage hint model
In-Reply-To:
References: <303592244.13560433.1658335440278.JavaMail.zimbra@u-pem.fr>
Message-ID: <1319017271.15730754.1658880154315.JavaMail.zimbra@u-pem.fr>

> From: "daniel smith"
> To: "Remi Forax"
> Cc: "valhalla-spec-experts"
> Sent: Wednesday, July 27, 2022 12:20:01 AM
> Subject: Re: The storage hint model

>> On Jul 20, 2022, at 9:44 AM, Remi Forax <forax at univ-mlv.fr> wrote:

>> And not having .ref and .val to be types greatly simplify the model, because
>> they is no interaction between the type checking and the storage hints, those
>> are two separated concerns.

> To emphasize this point: I think we're talking about years of development time
> that could be cut from this project if we could live with storage modifiers
> rather than needing to thread the flattening information through the type
> system.
> Not to mention the downstream simplification for all the developers
> that don't have to learn what a value type is.

> (Think about it: no Q types, no .val/.ref, no boxing, no universal generics, no
> new unchecked warnings, no secondary reflection mirrors, no new descriptors, no
> bridges. Just a storage modifier akin to 'volatile', support for that modifier
> in arrays, and value classes able to opt in to allow that modifier.)

> But there are details to sort out, and those details will determine whether this
> simplifying move is a fantasy or a viable alternative design.

No bridges, but you still need the Preload attribute.

The main drawback is that the storage hints are not available when you have an
abstract method, so you have to fall back to the idea that any type (type
variables included) is nullable when calling a virtual method (unless the type
has already been loaded).

Maybe the cost can be simulated by patching javac to never generate a Q-type
(apart from anewarray and fields) and generate a Preload attribute instead.

Rémi

From forax at univ-mlv.fr Wed Jul 27 00:28:54 2022
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Wed, 27 Jul 2022 02:28:54 +0200 (CEST)
Subject: The problem with encapsulating C.val + autoboxing
In-Reply-To: <52A2D9C6-C757-4C5E-86FA-1E095B760B08@oracle.com>
References: <1254790090.15376108.1658781209702.JavaMail.zimbra@u-pem.fr> <52A2D9C6-C757-4C5E-86FA-1E095B760B08@oracle.com>
Message-ID: <1314860028.15732122.1658881734727.JavaMail.zimbra@u-pem.fr>

> From: "John Rose"
> To: "Remi Forax"
> Cc: "valhalla-spec-experts"
> Sent: Tuesday, July 26, 2022 5:00:09 AM
> Subject: Re: The problem with encapsulating C.val + autoboxing

> This is not a general security problem, but rather an example of how code that
> is inside an encapsulation can violate that encapsulation.

> Because Agenda is allowed to mention the type MyList<Month.val> which includes
> the (package-private) type Month.val, it must be the case that Agenda is
> inside the same package as Month. Since Month has granted access rights to all
> package-mates, if there is a "leakage" of Month.default somewhere, it is only
> because someone in the same package as Month has given away that value. In
> other words, it's how the encapsulation is written, for better or worse.

> If the author of Month wants to distrust package-mates, that author should
> declare the companion type private, not package-private. You can't grant
> access to code you distrust, and then complain about security, unless you are
> pointing at yourself!

> This particular example does not stress autoboxing in any interesting way that I
> see. Clearly if somebody has access to Month.val they can then grab
> Month.default by one of several ways, and then it's up to them to keep the
> secret safe, if it is in fact a secret to be kept safe.

> BTW, this example assumes specialized generics. Since they don't exist fully
> yet, except on paper, we are assuming properties of specialized generics that
> may not in fact turn out to be true. But I assume that:

> * Any non-erased type argument, which is set to a privatized class, must be
>   accessible to the code which mentions the type argument (as part of the
>   generic type application).
> * Any generic that uses non-erased type parameters which may be bound to
>   privatized types should document how it materializes externally-visible
>   default values and/or arrays of that type, if it in fact does this. (Most
>   won't need to.)

> Perhaps you are pointing to the fact that bugs in generic containers might leak
> non-constructed values from specialized generics? (The missing call to
> checkIndex allows an empty value to leak from getFirst.) We should keep this
> in mind, I guess.
> But note that the fault is not solely in the buggy generic,
> in your example: There is also some fault in the client which passed a
> privatized type to a generic of unknown quality.

I see two issues with that idea in practice:
- Leaking null is perfectly valid for non-specialized generics, so it means that
  some generics cannot be upgraded to specialized generics because of their
  semantics. Given that a missing checkIndex makes the code buggy or not, it is
  also undecidable whether a specialized generic is buggy or not.
- Or sending a privatized type to a generic is considered too dangerous, and
  nobody will use C.val if val is defined as private or package-private, because
  you cannot use it except in toy code.

> If necessary, as part of the "opt in" for specialization we could add another
> layer of "opt in" to handle privatized type arguments. There is always a
> reasonable fallback for MyList if MyList is not expecting to handle
> privatized types. The fallback is to partially erase MyList to
> MyList. That would be a third, intermediate form of erasure: Lift
> privatized types to their reference types.

> In fact, as I've noted before, this is a question for non-flat types as well, in
> the setting of specialized generics: Perhaps specialized generics should never
> specialize on inaccessible type arguments of any sort, neither refs to
> non-public classes nor privatized vals. It's a fair question to ponder...

You are proposing that T inside a generic and T outside a generic behave
differently. It does not seem realistic to me because it does not work well with
the idea that .val is propagated by the T, so it has to be the same T.

For example, with

  class Foo<T> {
    T getDefault() { return T.default; }
  }

if privatized types are not specialized, new Foo<Month.val>().getDefault() will
return null if Month.val is a privatized type.

> If we do allow specialization on non-public type arguments, it's probably on the
> grounds that the client "knows what he's doing"
> and is consciously sharing
> access to the non-public type by mentioning it in a type argument. The generic
> who gets a non-public type shared with it must handle it with due care, not
> immediately leaking it to the world. It's part of the user contract for
> specialized generics: What are the rules for non-public types?

> -- John

If we take a step back, privatized types have introduced a coupling between
those types and specialized generics. So now, our plan to add specialized
generics after value classes is kind of dangerous.

I think a simpler plan is to not introduce privatized types now, to avoid the
coupling, and to introduce them later when we introduce specialized generics.

Rémi

> On 25 Jul 2022, at 13:33, Remi Forax wrote:

>> One of the idea of encapsulating C.val is that even with a value class C with no
>> good default, accessing C.val is useful.
>> That's true but a unfortunate consequence is that it makes leaking T in a
>> generic code a security issue.

>> It is in fact more secure to use two classes C and CFlat, one with no good
>> default and the other which allow the default value, when using with generics.

>> Here is an example, let say we have a value class Month (1 to 12) and an
>> identity class Agenda that contains several months.
>> We can declare Month like this, with a package-private value companion.

>> value record Month(int value) {
>>   /*package-private*/ companion val;

>>   public Month {
>>     if (value < 1 || value > 12) throw new IAE();
>>   }
>> }

>> So we can flatten the Month when stored in a list

>> class Agenda {
>>   private final MyList<Month.val> months = new MyList<>();

>>   public void add(Hour hour) {
>>     Objects.requireNonNull(hour);
>>     months.add(hour);
>>   }

>>   public Hour getFirst() {
>>     return months.isEmpty()? null: months.getFirst();
>>   }
>> }

>> Is this code safe ?
The trouble is that it depends on the implementation of >> MyList, by example with >> class MyList { >> private E[] array = new E[16]; >> private int size; >> public boolean isEmpty() { return size == 0; } >> public void add(E element) { >> array[size++] = element; >> } >> public E getFirst() { >> // Objects.checkIndex(0, size); >> return array[0]; >> } >> } >> MyList.getFirst() leaks E.default, so the implementation of Agenda is not safe. >> Using the encapsulation to hide C.val is only safe if the generics code never >> leak E.default. >> Weirdly, if i use different classes to represent C and C.val, i do not have that >> issue. >> class Agenda { >> value record MonthFlat(int value) { >> public companion val; >> } >> private final MyList months = new MyList<>(); >> public void add(Hour hour) { >> Objects.requireNonNull(hour); >> months.add(new MonthFlat(hour.value())); >> } >> public Hour getFirst() { >> return months.isEmpty()? null: new Month(months.getFirst().value); >> } >> } >> because unlike the autoboxing conversion between Month.val to Month, the >> conversion from MonthFlat to Month does not bypass the constructor. >> R?mi -------------- next part -------------- An HTML attachment was scrubbed... URL: From maurizio.cimadamore at oracle.com Wed Jul 27 11:57:06 2022 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Wed, 27 Jul 2022 12:57:06 +0100 Subject: The storage hint model In-Reply-To: <1319017271.15730754.1658880154315.JavaMail.zimbra@u-pem.fr> References: <303592244.13560433.1658335440278.JavaMail.zimbra@u-pem.fr> <1319017271.15730754.1658880154315.JavaMail.zimbra@u-pem.fr> Message-ID: <5560f817-1dc5-38c9-158e-cc6ede29b657@oracle.com> Remi, IMHO, if you go down the storage modifier model, you have to let go of the idea of annotating local parameters, and stick to fields and arrays (e.g. "true" containers). This means that some of the "sharp" info would be lost on the way, and we would always use nullable representation on the stack. 
It's a trade off, of course - the programming model has less types, we lose ability for type information to flow from generic clients to containers and we also lose some ability to fully flatten on the stack (in the sense that a null side-channel will always need to be there, e.g. in the form of another synthetic parameter). Maurizio On 27/07/2022 01:02, forax at univ-mlv.fr wrote: > > > ------------------------------------------------------------------------ > > *From: *"daniel smith" > *To: *"Remi Forax" > *Cc: *"valhalla-spec-experts" > *Sent: *Wednesday, July 27, 2022 12:20:01 AM > *Subject: *Re: The storage hint model > > On Jul 20, 2022, at 9:44 AM, Remi Forax wrote: > > And not having .ref and .val to be types greatly simplify the > model, because they is no interaction between the type > checking and the storage hints, those are two separated concerns. > > > To emphasize this point: I think we're talking about years of > development time that could be cut from this project if we could > live with storage modifiers rather than needing to thread the > flattening information through the type system. Not to mention the > downstream simplification for all the developers that don't have > to learn what a value type is. > > (Think about it: no Q types, no .val/.ref, no boxing, no universal > generics, no new unchecked warnings, no secondary reflection > mirrors, no new descriptors, no bridges. Just a storage modifier > akin to 'volatile', support for that modifier in arrays, and value > classes able to opt in to allow that modifier.) > > But there are details to sort out, and those details will > determine whether this simplifying move is a fantasy or a viable > alternative design. > > > No bridges but you still need the Preload attribute. 
> > The main drawback is that the storage hints are not available when you > have an abstract method, so you have to fallback to the idea that any > type (type variables included) is nullable when calling a virtual > method (apart if the type has already been loaded). Maybe the cost can > be simulated by patching javac to never generates a Q-type (apart from > anewarray and fields) and generate a Preload attribute instead. > > R?mi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From forax at univ-mlv.fr Wed Jul 27 13:13:33 2022 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Wed, 27 Jul 2022 15:13:33 +0200 (CEST) Subject: The storage hint model In-Reply-To: <5560f817-1dc5-38c9-158e-cc6ede29b657@oracle.com> References: <303592244.13560433.1658335440278.JavaMail.zimbra@u-pem.fr> <1319017271.15730754.1658880154315.JavaMail.zimbra@u-pem.fr> <5560f817-1dc5-38c9-158e-cc6ede29b657@oracle.com> Message-ID: <425139199.15901064.1658927613210.JavaMail.zimbra@u-pem.fr> > From: "Maurizio Cimadamore" > To: "Remi Forax" , "daniel smith" > Cc: "valhalla-spec-experts" > Sent: Wednesday, July 27, 2022 1:57:06 PM > Subject: Re: The storage hint model > Remi, > IMHO, if you go down the storage modifier model, you have to let go of the idea > of annotating local parameters, and stick to fields and arrays (e.g. "true" > containers). > This means that some of the "sharp" info would be lost on the way, and we would > always use nullable representation on the stack. > It's a trade off, of course - the programming model has less types, we lose > ability for type information to flow from generic clients to containers and we > also lose some ability to fully flatten on the stack (in the sense that a null > side-channel will always need to be there, e.g. in the form of another > synthetic parameter). I think you're right, it's not a core part of the model, more something on the side. 
The VM can always speculate that the side-channel will be non-null by default (using an uncommon trap) especially if the type is know to be a value type (from the Preload attribute) so the ability to full flatten on the stack is not lost, only the calling convention need to take care of the side-channel. Also null tracking is not something Java the language does but a language like Kotlin should be able to generate a slightly better code using the TypeRestriction attribute. > Maurizio R?mi > On 27/07/2022 01:02, [ mailto:forax at univ-mlv.fr | forax at univ-mlv.fr ] wrote: >>> From: "daniel smith" [ mailto:daniel.smith at oracle.com | >>> ] >>> To: "Remi Forax" [ mailto:forax at univ-mlv.fr | ] >>> Cc: "valhalla-spec-experts" [ mailto:valhalla-spec-experts at openjdk.java.net | >>> ] >>> Sent: Wednesday, July 27, 2022 12:20:01 AM >>> Subject: Re: The storage hint model >>>> On Jul 20, 2022, at 9:44 AM, Remi Forax < [ mailto:forax at univ-mlv.fr | >>>> forax at univ-mlv.fr ] > wrote: >>>> And not having .ref and .val to be types greatly simplify the model, because >>>> they is no interaction between the type checking and the storage hints, those >>>> are two separated concerns. >>> To emphasize this point: I think we're talking about years of development time >>> that could be cut from this project if we could live with storage modifiers >>> rather than needing to thread the flattening information through the type >>> system. Not to mention the downstream simplification for all the developers >>> that don't have to learn what a value type is. >>> (Think about it: no Q types, no .val/.ref, no boxing, no universal generics, no >>> new unchecked warnings, no secondary reflection mirrors, no new descriptors, no >>> bridges. Just a storage modifier akin to 'volatile', support for that modifier >>> in arrays, and value classes able to opt in to allow that modifier.) 
>>> But there are details to sort out, and those details will determine whether this >>> simplifying move is a fantasy or a viable alternative design. >> No bridges but you still need the Preload attribute. >> The main drawback is that the storage hints are not available when you have an >> abstract method, so you have to fallback to the idea that any type (type >> variables included) is nullable when calling a virtual method (apart if the >> type has already been loaded). Maybe the cost can be simulated by patching >> javac to never generates a Q-type (apart from anewarray and fields) and >> generate a Preload attribute instead. >> R?mi -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.smith at oracle.com Wed Jul 27 16:14:14 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 27 Jul 2022 16:14:14 +0000 Subject: EG meeting schedule Message-ID: <6E5317F7-8844-43CD-9E5D-6D25C71AAF0F@oracle.com> Looks like I our EG meeting event expired (don't know if all enterprise systems do this, but Oracle only lets you schedule a repeated event for like a year). I threw together a new one last night, but I don't think the info propagated to most people. So, no meeting today, we'll pick back up on August 10. If you don't see a repeating calendar event/notification now, let me know and I'll see what we can do. From kevinb at google.com Wed Jul 27 18:48:22 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 27 Jul 2022 11:48:22 -0700 Subject: Question about universal type variables Message-ID: Okay, I have labored for way too long to try to discuss "are values objects?" without ever facing universal type variables head-on. Sorry! Here goes. My (limited) understanding is: a) The first goal is just to enable a type parameter to accept both reftypes and valtypes as type arguments (making it a "universal" type parameter or UTP, and making the type variable it defines for use inside that scope a UTV). 
b) Goals to follow would do progressively more efficient things when the type arguments are valtypes. I'd expect that optimal performance demands dynamically generating customized versions of the class in some manner (which I'd further expect is fair to call "templating"?). For starters, does the above seem accurate and well-stated? ~~ The main question of this email is: if T is a universal type variable, then *what kind of type* is that? Is it a reftype, a valtype, or something else? I can see two main options for how to answer that, which I think follow naturally from the two *already*-existing models for how developers might conceptualize type variables. These existing models, first: Model 1: A type variable is a mere placeholder that "will be" some other type later. When you interact with it, you're "really" interacting with the future type argument. If asked a question like "is an `E` inside `class ArrayList` a `Number`?" this model would say "well, it might be or it might not be". Model 2: A type variable is a bona fide local type in its own right. It is or becomes no other type but itself. Its simple job is just to enforce whatever restrictions it needs to in order to *preserve its substitutability* for any type argument in-bounds. If asked the same question as above, "is an `E` inside `class ArrayList` a `Number`?" this model would say "no, it is certainly not, but it does guarantee to be substitutable for `Number`, among other types." I would describe Model 2 as being close to the JLS view of the world, but in a way, Model 1 is the very illusion that Model 2 is working to create. I certainly expect the majority of developers think like Model 1 most of the time, and most of the time it's okay. 
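A minimal sketch of the two models in today's Java (erased generics, no Valhalla features). The class `Box<E>` and the method names are hypothetical, chosen only for illustration. Inside `Box`, `E` behaves as a type of its own (Model 2): it cannot be treated as `Number`, yet the same code remains valid for every type argument, which is the illusion Model 1 relies on.

```java
public class TypeVariableModels {
    // Hypothetical generic class, used only for illustration.
    static final class Box<E> {
        private E item;

        void set(E item) { this.item = item; }

        E get() { return item; }

        // Would not compile: inside Box, E is its own type, not Number,
        // even though a client may later choose Integer as the type argument.
        // Number asNumber() { return item; }
    }

    static String describe() {
        Box<Integer> ints = new Box<>();
        Box<String> strings = new Box<>();
        ints.set(42);
        strings.set("hello");
        // With erasure, both parameterizations share a single runtime class.
        boolean sameClass = ints.getClass() == strings.getClass();
        return ints.get() + " " + strings.get() + " " + sameClass;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

The commented-out method is the interesting part: the compiler rejects it because `E` only promises to be substitutable by any in-bounds type argument, not to be any one of them.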
~~ If we follow these models where they lead they seem to suggest two different answers for my question (i.e.,"if T is a UTV, what kind of type is that?"): Model 1: Since the UTV type represents or "stands in for" future type arguments which might be of either kind -- and note that it does behave differently from any regular reftype or valtype, being sort of quasi-nullable -- we are forced to conclude that it will be a third kind of type, a "universal" type. So the Java type system ends up tripartite after all. And, what is a value of this type? * The values-are-not-objects model stammers, "you've got a class instance for sure, but it might ambiguously be a value or a reference to an object, and often that doesn't matter". * The values-are-objects model stammers, "you've got an object for sure, all you don't know is the 'placement' of the object, direct or indirected, and often that doesn't matter". Model 2: All type variables are quite simply reftypes, because reftypes are the kind that are polymorphic. As always, a type variable will enforce whatever restrictions it must toward the goal of preserving its substitutability. Those restrictions are what make the type variable "universal" and produce the behaviors (quasi-nullness) we observe. Here, a value of type T is always a reference. Keep in mind that a generic class might eventually be used as a template to stamp out customized classes, and in *those* classes, your usages of T might be replaced with usages of some actual valtype. If asked "is an `E` inside `class ArrayList` an `int`?" this model would answer, "not *per se*, but this same code might be used to stamp out a copy of ArrayList where `E` has been changed to `int`, so code with that in mind." ~~ I think Model 1 is close to how most developers think day-to-day, but maybe gets us into a bit of trouble sometimes. 
I think Model 2 is already the more accurate way to understand type variables, when an accurate understanding is needed, and we will probably base several of our JSpecify/nullness explanations on it (with care). People *will* still lean on Model 1 day-to-day, as now, but they'd do *best* to view it as a crutch when they do. More to the point, maybe, I'd very, very much like to not wind up with three kinds of types, so, my money's on Model 2 at this point. Interestingly, Model 2 drags the notion of a "non-nullable reference type" into the mix. And we'd like to keep a path open toward other kinds of NNRTs in the future, that act harmoniously with these. Uh oh! But... I think that sounds better than a future where NNRTs and universal-types both exist and are so similar. I have no idea how any of this aligns or doesn't with the conversations that have already happened about UTPs. Thoughts? -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian.goetz at oracle.com Wed Jul 27 19:22:51 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 27 Jul 2022 15:22:51 -0400 Subject: Question about universal type variables In-Reply-To: References: Message-ID: > My (limited) understanding is: > > a) The first goal is just to enable a type parameter to accept both > reftypes and valtypes as type?arguments (making it a "universal" type > parameter or UTP, and making the type variable it defines for use > inside that scope a UTV). > > b) Goals to follow would do progressively more efficient things when > the type arguments are valtypes. I'd expect that optimal performance > demands dynamically generating customized versions of the class in > some manner (which I'd further expect is fair to call "templating"?). > > For starters, does the above seem accurate and well-stated? Correct.? 
We envision two phases, like your (a) and (b), where the first is an erasure-based, language-based system which conforms to the behavior that the VM-based specialization would offer, allowing specialization to be viewed as a pure optimization, rather than an actual language feature.

> The main question of this email is: if T is a universal type variable,
> then /what kind of type/ is that? Is it a reftype, a valtype, or
> something else?

It is a type of indeterminate ref-ness or val-ness. This will both restrict some old behavior and provide some new behavior. For example:

    T t = null;

will generate an unchecked warning, since we can't be sure we're not polluting the heap with nulls. (Null pollution is a straightforward special case of heap pollution.) On the other hand, you'll be able to use the new derived type `T.ref`, which would mean "T if it's already a ref, or the ref companion if T is a val." This allows Map::get to be rescued:

    V.ref get(K k);

> If we follow these models where they lead they seem to suggest two
> different answers for my question (i.e., "if T is a UTV, what kind of
> type is that?"):
>
> Model 1: Since the UTV type represents or "stands in for" future type
> arguments which might be of either kind -- and note that it does
> behave differently from any regular reftype or valtype, being sort of
> quasi-nullable -- we are forced to conclude that it will be a third
> kind of type, a "universal" type. So the Java type system ends up
> tripartite after all. And, what is a value of this type?
>    * The values-are-not-objects model stammers, "you've got a class
> instance for sure, but it might ambiguously be a value or a reference
> to an object, and often that doesn't matter".
>    * The values-are-objects model stammers, "you've got an object for
> sure, all you don't know is the 'placement' of the object, direct or
> indirected, and often that doesn't matter".
So, note that in a generic class today, there's no way to "summon" any value of T *except null*. You can't say `new T()` or `new T[3]` or `T.class.newInstance()`. The values of T (except null) always come from *outside the house*, and their T-ness is backed up by synthetic casts (modulo heap pollution). An ArrayList starts out empty, then someone puts T's in it, at which point the ArrayList can turn around and hand those Ts back. But it can't make any new Ts. All it needs to know is that T is layout-compatible with the layout of the bound.

From robbepincket at live.be Wed Jul 27 19:37:21 2022
From: robbepincket at live.be (Robbe Pincket)
Date: Wed, 27 Jul 2022 19:37:21 +0000
Subject: Question about universal type variables
In-Reply-To:
References:
Message-ID:

On Wed Jul 27 19:22:51 UTC 2022, Brian Goetz wrote:
> It is a type of indeterminate ref-ness or val-ness. This will both restrict some old behavior and provide some new behavior. For example:
>
> T t = null;
>
> will generate an unchecked warning, since we can't be sure we're not polluting the heap with nulls.

I seem to remember there being talks about `IdentityObject` and `ValueObject` interfaces or something similar; is this still planned? If so, would the warning go away if `T` has a type bound of `IdentityObject`, and become an error if it has a type bound of `ValueObject`?

Regards
Robbe Pincket

From kevinb at google.com Wed Jul 27 20:22:09 2022
From: kevinb at google.com (Kevin Bourrillion)
Date: Wed, 27 Jul 2022 13:22:09 -0700
Subject: Updated SoV, take 3
In-Reply-To:
References:
Message-ID:

I got through half of it, maybe more, so far.
Several of my suggestions are of a similar form, "I would also point out X here and now", because in those places I suspect a nontrivial number of readers may have "but wait a minute" reactions that will be distracting. Of course, I am happy if this is the end of "primitive classes". :-) On Tue, Jul 26, 2022 at 11:18 AM Brian Goetz wrote: > Yet another attempt at updating SoV to reflect the current thinking. > Please review. > > > # State of Valhalla > ## Part 2: The Language Model {.subtitle} > > #### Brian Goetz {.author} > #### July 2022 {.date} > > > _This is the second of three documents describing the current State of > Valhalla. The first is [The Road to Valhalla](01-background); the > third is [The JVM Model](03-vm-model)._ > > This document describes the directions for the Java _language_ charted by > Project Valhalla. (In this document, we use "currently" to describe the > language as it stands today, without value classes.) > > Valhalla started with the goal of providing user-programmable classes > which can > be flat and dense in memory. Numerics are one of the motivating use cases; > adding new primitive types directly to the language has a very high > barrier. As > we learned from [Growing a Language][growing] there are infinitely many > numeric > types we might want to add to Java, but the proper way to do that is via > libraries, not as a language feature. > > ## Primitive and objects today > > Java currently has eight built-in primitive types. Primitives represent > pure > _values_; any `int` value of "3" is equivalent to, and indistinguishable > from, > any other `int` value of "3". Because primitives are "just their bits" > with no > ancillarly state such as object identity, they are _freely copyable_; > whether > there is one copy of the `int` value "3", or millions, doesn't matter to > the > execution of the program. 
With the exception of the unusual treatment of > exotic > floating point values such as `NaN`, the `==` operator on primitives > performs a > _substitutibility test_ -- it asks "are these two values the same value". > I've said this before, but I think both "substitutability" and "sameness" just lead to more questions, and I'm not sure why we don't appeal to distinguishability instead. > Java also has _objects_, and each object has a unique _object identity_. > This > means that each object must live in exactly one place (at any given time), > and > this has consequences for how the JVM lays out objects in memory. Objects > in > Java are not manipulated or accessed directly, but instead through _object > references_. Object references are also a kind of value -- they encode the > identity of the object to which they refer, > Do we really want to invoke identity here? That surprises me. That suggests that a `ValueClass.ref` instance will have identity too. Isn't it really only about the object being addressable or locatable (some term like that)? > and the `==` operator on object > references also performs a substitutibility test, asking "do these two > references refer to the same object." Accordingly, object _references_ > (like > other values) can be freely copied, but the objects they refer to cannot. > > This dichotomy -- that the universe of values consists of primitives and > object > references -- has long been at the core of Java's design. JVMS 2.2 (Data > Types) > opens with: > > > There are two kinds of values that can be stored in variables, passed as > > arguments, returned by methods, and operated upon: primitive values and > > reference values. 
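The `==` behavior described in the quoted passage can be observed in current Java. The following is only an illustrative sketch; the class and method names are made up for this note and do not come from the draft.

```java
public class EqualsToday {
    // Two copies of the primitive value 3 are indistinguishable.
    static boolean sameInts() {
        int a = 3;
        int b = 3;
        return a == b;
    }

    // The floating-point exception: NaN is not == to itself.
    static boolean nanEqualsItself() {
        double nan = Double.NaN;
        return nan == nan;
    }

    // Two separately created objects have different identities.
    static boolean distinctObjects() {
        Object a = new Object();
        Object b = new Object();
        return a == b;
    }

    // Copying a reference copies the identity it encodes, not the object.
    static boolean copiedReference() {
        Object a = new Object();
        Object b = a;
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(sameInts());
        System.out.println(nanEqualsItself());
        System.out.println(distinctObjects());
        System.out.println(copiedReference());
    }
}
```

Only the second and third methods return false: `NaN` is the stated exception for primitives, and distinct objects are distinguishable by identity even when their state is identical.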
>
> Primitives and objects currently differ in almost every conceivable way:
>
> | Primitives                                 | Objects                            |
> | ------------------------------------------ | ---------------------------------- |
> | No identity (pure values)                  | Identity                           |
> | `==` compares values                       | `==` compares object identity      |
> | Built-in                                   | Declared in classes                |
> | No members (fields, methods, constructors) | Members (including mutable fields) |
> | No supertypes or subtypes                  | Class and interface inheritance    |
> | Accessed directly                          | Accessed via object references     |
> | Not nullable                               | Nullable                           |
> | Default value is zero                      | Default value is null              |
> | Arrays are monomorphic                     | Arrays are covariant               |
> | May tear under race                        | Initialization safety guarantees   |
> | Have reference companions (boxes)          | Don't need reference companions    |
>
> Primitives embody a number of tradeoffs aimed at maximizing the performance and
> usability of the primitive types. Reference types default to `null`, meaning
> "referring to no object", and must be initialized before use; primitives default
> to a usable zero value (which for most primitives is the additive identity) and
> therefore may be used without initialization. (If primitives were nullable like
> references, not only would this be less convenient in many situations, but they
> would likely consume additional memory footprint to accommodate the possibility
> of nullity, as most primitives already use all their bit patterns.) Similarly,
> reference types provide initialization safety guarantees for final fields even
> under a certain category of data races (this is where we get the "immutable
> objects are always thread-safe" rule from); primitives allow tearing under race
> for larger-than-32-bit values. We could characterize the design principles
> behind these tradeoffs as "make objects safer, make primitives faster."
>
> The following figure illustrates the current universe of Java's types.
The > upper left quadrant is the built-in primitives; the rest of the space is > reference types. In the upper-right, we have the abstract reference types > -- > abstract classes, interfaces, and `Object` (which, though concrete, acts > more > like an interface than a concrete class). The built-in primitives have > wrappers > or boxes, which are reference types. > >
>
> [Figure: Current universe of Java field types]
>
>
> Valhalla aims to unify primitives and objects such that both are declared with
> classes, but maintains the special runtime characteristics -- flatness and
> density -- that primitives currently enjoy.
>
> ### Primitives and boxes today
>
> The built-in primitives are best understood as _pairs_ of types: the primitive
> type (`int`) and its reference companion type (`Integer`), with built-in
> conversions between the two. The two types have different characteristics that
> make each more or less appropriate for a given situation. Primitives are
> optimized for efficient storage and access: they are monomorphic, not nullable,
> tolerate uninitialized (zero) values, and larger primitive types (`long`,
> `double`) may tear under racy access. The box types add back the affordances of
> references -- nullity, polymorphism, interoperation with generics, and
> initialization safety -- but at a cost.
>
> Valhalla generalizes this primitive-box relationship, in a way that is more
> regular and extensible and reduces the "boxing tax".
>
> ## Eliminating unwanted object identity
>
> Many impediments to optimization stem from _unwanted object identity_. For many
> classes, not only is identity not directly useful, it can be a source of bugs.
> For example, due to caching, `Integer` can be accidentally compared correctly
> with `==` just often enough that people keep doing it. Similarly, [value-based
> classes][valuebased] such as `Optional` have no need for identity, but pay the
> costs of having identity anyway.
>
> Valhalla allows classes to explicitly disavow identity by declaring them as
> _value classes_. The instances of a value class are called _value objects_.
>
> ```
> value class Point implements Serializable {
>     int x;
>     int y;
>
>     Point(int x, int y) {
>         this.x = x;
>         this.y = y;
>     }
>
>     Point scale(int s) {
>         return new Point(s*x, s*y);
>     }
> }
> ```
>
> This says that a `Point` is a class whose instances have no identity.
As > a > consequence, it must give up the things that depend on identity; the class > and > its fields are implicitly final. Additionally, operations that depended on > identity must either be adjusted (`==` on value objects compares state, not > identity) or disallowed (it is illegal to lock on a value object.) > Just for broad understandability, you might want to address here "but then how could a reference 'identify' what object it's pointing to?" Value classes can still have most of the affordances of classes -- fields, > methods, constructors, type parameters, superclasses (with some > restrictions), > nested classes, class literals, interfaces, etc. The classes they can > extend > are restricted: `Object` or abstract classes with no instance fields, empty > no-arg constructor bodies, no other constructors, no instance > initializers, no > synchronized methods, and whose superclasses all meet this same set of > conditions. (`Number` is an example of such an abstract class.) > > Because `Point` has value semantics, `==` compares by state rather than > identity. This means that value objects, like primitives, are _freely > copyable_; we can explode them into their fields and re-aggregate them into > another value object, and we cannot tell the difference. > It feels like if this wants to rest some stuff on "comparing by state" it ought to explain here what that means? Or, I guess at least a forward reference. It seems pretty important to understand that it means shallow fieldwise delegation back to `==` again, meaning that fields of identity types are still identity-compared. In many contexts "value semantics" and "comparing by state" tend to only make sense if done recursively/deeply. > So far we've addressed the first two lines in our table of differences; > rather > than all objects having identity, classes can opt into, or out of, object > identity for their instances. 
By allowing classes to exclude unwanted > identity, > we free the runtime to make better layout and compilation decisions. > > ### Example: immutable cursors > > Collections today use `Iterator` to facilitate traversal through the > collection, > which store iteration state in mutable fields. While heroic optimizations > such > as _escape analysis_ can sometimes eliminate the cost associated with > iterators, > such optimizations are fragile and hard to rely on. Value objects offer an > iteration approach that is more reliably optimized: immutable cursors. > (Without > value objects, immutable cursors would be prohibitively expensive for > iteration.) > > ``` > value class ArrayCursor { > T[] array; > int offset; > > public ArrayCursor(T[] array, int offset) { > this.array = array; > this.offset = offset; > } > > public ArrayCursor(T[] array) { > this(array, 0); > } > > public boolean hasNext() { > return offset < array.length; > } > > public T next() { > return array[offset]; > } > > public ArrayCursor advance() { > return new ArrayCursor(array, offset+1); > } > } > ``` > > In looking at this code, we might mistakenly assume it will be > inefficient, as > each loop iteration appears to allocate a new cursor: > > ``` > for (ArrayCursor c = new ArrayCursor<>(array); > c.hasNext(); > c = c.advance()) { > // use c.next(); > } > ``` > > In reality, we should expect that _no_ cursors are actually allocated > here. An > `ArrayCursor` is just its two fields, and the runtime is free to scalarize > the > object into its fields and hoist them into registers. The calling > convention > for `advance` is optimized so that both receiver and return value are > scalarized. Even without inlining `advance`, no allocation will take > place, > just some shuffling of the values in registers. And if `advance` is > inlined, > the client code will compile down to having a single register increment and > compare in the loop header. 
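For readers who want to run the cursor loop on a current JDK, here is a sketch that substitutes an ordinary `record` for the `value class` in the quoted draft. That substitution is an assumption of this note, not part of the proposal: a record still has identity, so the no-allocation behavior described above is not guaranteed and depends on escape analysis. The wrapper class `CursorDemo` and the `sum` helper are hypothetical names.

```java
public class CursorDemo {
    // A record standing in for the draft's value class (records keep identity today).
    record ArrayCursor<T>(T[] array, int offset) {
        // Convenience constructor: start at the first element.
        ArrayCursor(T[] array) {
            this(array, 0);
        }

        boolean hasNext() {
            return offset < array.length;
        }

        T next() {
            return array[offset];
        }

        // Returns a new cursor instead of mutating this one.
        ArrayCursor<T> advance() {
            return new ArrayCursor<>(array, offset + 1);
        }
    }

    // Same loop shape as in the quoted text, summing the elements.
    static int sum(Integer[] values) {
        int total = 0;
        for (ArrayCursor<Integer> c = new ArrayCursor<>(values);
             c.hasNext();
             c = c.advance()) {
            total += c.next();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new Integer[] {1, 2, 3, 4}));
    }
}
```

The loop is source-identical to the draft's; only the declaration keyword differs, which is what lets the runtime treat the cursor as "just its two fields" once it is a value class.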
> > ### Migration > > The JDK (as well as other libraries) has many [value-based > classes][valuebased] > such as `Optional` and `LocalDateTime`. Value-based classes adhere to the > semantic restrictions of value classes, but are still identity classes -- > even > though they don't want to be. Value-based classes can be migrated to true > value > classes simply by redeclaring them as value classes, which is both source- > and > binary-compatible. > This gave me a slight "huh, then what's the catch?" reaction. It might make more sense by adding the fact right away that any errant usages (that don't adhere to the VBC requirements) will start failing at runtime, and might cause compilation warnings? We plan to migrate many value-based classes in the JDK to value classes. > Additionally, the primitive wrappers can be migrated to value classes as > well, > making the conversion between `int` and `Integer` cheaper; see "Migrating > the > legacy primitives" below. (In some cases, this may be _behaviorally_ > incompatible for code that synchronizes on the primitive wrappers. [JEP > 390][jep390] has supported both compile-time and runtime warnings for > synchronizing on primitive wrappers since Java 16.) > Putting this in parens under the topic of the primitive wrappers feels like "pulling a fast one". Like it's pretending that this incompatibility problem is somehow unique to those 8 classes, hoping people won't notice "wait a minute, *any* class hopeful of future migration would have the same desire to opt into such warnings in advance." (And for more than just synchronization.) I get that there is no current plan to solve that problem, but we could be more up-front about that? (Cross-reference my earlier agitations about this in a thread called "we need help migrating from bucket 1 to 2...", maybe a couple months ago.)
> [Figure: Java field types adding value classes]
> > ### Identity-sensitive operations > > Certain operations are currently defined in terms of object identity. As > we've > already seen, some of these, like equality, can be sensibly extended to > cover > all instances. Others, like synchronization, will become partial. > Identity-sensitive operations include: > > - **Equality.** We extend `==` on references to include references to > value > objects. Where it currently has a meaning, the new definition > coincides > with that meaning. > > - **System::identityHashCode.** The main use of `identityHashCode` is > in the > implementation of data structures such as `IdentityHashMap`. We can > extend > `identityHashCode` in the same way we extend equality -- deriving a > hash on > value objects from the hash of all the fields. > > - **Synchronization.** This becomes a partial operation. If we can > statically detect that a synchronization will fail at runtime > (including > declaring a `synchronized` method in a value class), we can issue a > compilation error; if not, attempts to lock on a value object results > in > `IllegalMonitorStateException`. This is justifiable because it is > intrinsically imprudent to lock on an object for which you do not have > a > clear understanding of its locking protocol; locking on an arbitrary > `Object` or interface instance is doing exactly that. > > - **Weak, soft, and phantom references.** Capturing an exotic reference > to a > value object becomes a partial operation, as these are intrinsically > tied to > reachability (and hence to identity). However, we will likely make > enhancements to `WeakHashMap` to support mixed identity and value > keys. > > ### Value classes and records > > While records have a lot in common with value classes -- they are final and > their fields are final -- they are still identity classes. Records embody > a > tradeoff: give up on decoupling the API from the representation, and in > return > get various syntactic and semantic benefits. 
Value classes embody another > tradeoff: give up identity, and get various semantic and performance > benefits. > If we are willing to give up both, we can get both sets of benefits, by > declaring a _value record_. > > ``` > value record NameAndScore(String name, int score) { } > ``` > > Value records combine the data-carrier idiom of records with the improved > scalarization and flattening benefits of value classes. > > In theory, it would be possible to apply `value` to certain enums as well, > but > this is not currently possible because the `java.lang.Enum` base class that > enums extend do not meet the requirements for superclasses of value > classes (it > has fields and non-empty constructors). > > ### Value and reference companion types > > Value classes are generalizations of primitives. Since primitives have a > reference companion type, value classes actually give rise to _pairs_ of > types: > a value type and a reference type. We've seen the reference type already; > for > the value class `ArrayCursor`, the reference type is called `ArrayCursor`, > just > as with identity classes. The full name for the reference type is > `ArrayCursor.ref`; `ArrayCursor` is just a convenient alias for that. > (This > aliasing is what allows value-based classes to be compatibly migrated to > value > classes.) > It's more than just that: it's what unifies all classes together! They all define a reference type, always with the same name as the class. That's nice, unchanging solid ground under our feet while all the Valhalla shifts are going on. It would make more sense to me if `ArrayCursor.ref` were the alias to `ArrayCursor`, and it would be appropriate for the reader to wonder "why do we even need that alias?". > The value type is called `ArrayCursor.val`, and the two types have the > same conversions between them as primitives do today with their boxes. 
The > default value of the value type is the one for which all fields take on > their > default value; the default value of the reference type is, like all > reference > types, null. We will refer to the value type of a value class as the > _value > companion type_. > ... because it acts as a companion to the reference type you've always known. (At least, *I* still really don't want people to think that both the value type and the reference types are "companions" to the class that defined them.) Just as with today's primitives and their boxes, the reference and value > companion types of a value class differ in their support for nullity, > polymorphism, treatment of uninitialized variables, and safety guarantees > under > race. Value companion types, like primitive types, are monomorphic, > non-nullable, tolerate uninitialized (zero) values, and (under some > circumstances) may tear under racy access. Reference types are > polymorphic, > nullable, and offer the initialization safety guarantees for final fields > that > we have come to expect from identity objects. > > Unlike with today's primitives, the "boxing" and "unboxing" conversions > between > the reference and value companion types are not nearly as heavy or > wasteful, > because of the lack of identity. A variable of type `Point.val` holds a > "bare" > value object; a variable of type `Point.ref` holds a _reference to_ a value > object. For many use cases, the reference type will offer good enough > performance; in some cases, it may be desire to additionally give up the > affordances of reference-ness to make further flatness and footprint > gains. See > [Performance Model](05-performance-model) for more details on the specific > tradeoffs. > > In our diagram, these new types show up as another entity that straddles > the > line between primitives and identity-free references, alongside the legacy > primitives: > > ** UPDATE DIAGRAM ** > >
> [Figure: Java field types with extended primitives]
> > ### Member access > > Both the reference and value companion types have the same members. > Maybe worth acknowledging "(even those, like `wait()` inherited from `Object`, that don't make sense and will fail at runtime, for simplicity's sake)". > Unlike > today's primitives, value companion types can be used as receivers to > access > fields and invoke methods (subject to the usual accessibility > constraints): > > ``` > Point.val p = new Point(1, 2); > assert p.x == 1; > > p = p.scale(2); > assert p.x == 2; > ``` > I think it is worth acknowledging that this does lead to `5.toString()` becoming valid and functioning code, which happens just for consistency and not because it was a goal in itself. > ### Polymorphism > > An identity class `C` that extends `D` sets up a subtyping (is-a) > relationship > between `C` and `D`. For value classes, the same thing happens between its > _reference type_ and the declared supertypes. (Reference types are > polymorphic; value types are not.) This means that if we declare: > > ``` > value class UnsignedShort extends Number > implements Comparable { > ... > } > ``` > > then `UnsignedShort` is a subtype of `Number` and > `Comparable`, > and we can ask questions about subtyping using `instanceof` or pattern > matching. > What happens if we ask such a question of the value companion type? > > ``` > UnsignedShort.val us = ... > if (us instanceof Number) { ... } > ``` > > Since subtyping is defined only on reference types, the `instanceof` > operator > (and corresponding type patterns) will behave as if both sides were lifted > to > the appropriate reference type (unboxed), and then we can appeal to > subtyping. > (This may trigger fears of expensive boxing conversions, but in reality no > actual allocation will happen.) 
> > We introduce a new relationship between types based on `extends` / > `implements` > clauses, which we'll call "extends": we define `A extends B` as meaning `A > <: B` > when A is a reference type, and `A.ref <: B` when A is a value companion > type. > The `instanceof` relation, reflection, and pattern matching are updated to > use > "extends". > > ### Array covariance > > Arrays of reference types are _covariant_; this means that if `A <: B`, > then > `A[] <: B[]`. This allows `Object[]` to be the "top array type" -- but > only for > arrays of references. Arrays of primitives are currently left out of this > story. We unify the treatment of arrays by defining array covariance > over the > new "extends" relationship; if A _extends_ B, then `A[] <: B[]`. This > means > that for a value class P, `P.val[] <: P.ref[] <: Object[]`; when we > migrate the > primitive types to be value classes, then `Object[]` is finally the top > type for > all arrays. (When the built-in primitives are migrated to value classes, > this > means `int[] <: Integer[] <: Object[]` too.) > I think it's worth addressing that this does mean there will be `Integer[]` and `Object[]` instances that can't store null, failing at runtime, but that this is consistent with the existing quirks of array covariance. ### Equality > > For values, as with primitives, `==` compares by state rather than by > identity. > Two value objects are `==` if they are of the same type and their fields > are > pairwise equal, where equality is defined by `==` for primitives (except > `float` > and `double`, which are compared with `Float::equals` and `Double::equals` > to > avoid anomalies), `==` for references to identity objects, and recursively > with > `==` for references to value objects. In no case is a value object ever > `==` to > an identity object. 
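The draft does not spell out the "anomalies" in question. Presumably they are the two well-known cases where primitive `==` on floating point is not a substitutability test: `NaN` and signed zero. The sketch below shows both in current Java; `FloatEqualityDemo` and its method names are illustrative only.

```java
class FloatEqualityDemo {
    // Primitive comparison, as defined by IEEE 754.
    static boolean primitiveEquals(double a, double b) {
        return a == b;
    }

    // Comparison via Double::equals, which compares bit patterns
    // (after collapsing all NaN values into one canonical NaN).
    static boolean wrapperEquals(double a, double b) {
        return Double.valueOf(a).equals(b);
    }

    public static void main(String[] args) {
        System.out.println(primitiveEquals(Double.NaN, Double.NaN)); // false
        System.out.println(wrapperEquals(Double.NaN, Double.NaN));   // true
        System.out.println(primitiveEquals(0.0, -0.0));              // true
        System.out.println(wrapperEquals(0.0, -0.0));                // false
    }
}
```

Under primitive `==`, a value object holding `NaN` would not be equal to a copy of itself, and two objects holding `0.0` and `-0.0` would compare equal despite being distinguishable (for example via `1/x`), which is why the fieldwise comparison uses the `equals` variants.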
> > When comparing two object _references_ with `==`, they are equal if they > are > both null, or if they are both references to the same identity object, or > they > are both references to value objects that are `==`. (When comparing a > value > type with a reference type, we treat this as if we convert the value to a > reference, and proceed as per comparing references.) This means that the > following will succeed: > > ``` > Point.val p = new Point(3, 4); > Point pr = p; > assert p == pr; > ``` > > The base implementation of `Object::equals` delegates to `==`, which is a > suitable default for both reference and value classes. > This is where you could appeal to the idea that `==` has always meant "strictly indistinguishable by any means" and this preserves that meaning (modulo float/double weirdness). ### Serialization > > If a value class implements `Serializable`, this is also really a statement > about the reference type. Just as with other aspects described here, > serialization of value companions can be defined by converting to the > corresponding reference type and serializing that, and reversing the > process at > deserialization time. > It's nonobvious to me why the reference type is being elevated as the primary one here, except that of course a method like `writeObject` is only going to be fed the reference type. I would have expected just that serializability applies equally to both types in the same way, much like invoking some method on both types. Serialization currently uses object identity to preserve the topology of an > object graph. This generalizes cleanly to objects without identity, > because > `==` on value objects treats two identical copies of a value object as > equal. > So any observations we make about graph topology prior to serialization > with > `==` are consistent with those after deserialization. 
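The topology-preservation behavior mentioned above can be observed with today's identity objects. The following sketch serializes a list that contains the same object twice and checks that the deserialized copy still has one shared element. `SerializationTopologyDemo`, `roundTrip`, and `sharingPreserved` are illustrative names, and `StringBuilder` is used only because it is a convenient serializable identity class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

class SerializationTopologyDemo {
    // Serializes a value to a byte array and reads it back.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if two references to one object are still references to one
    // (new) object after a serialization round trip.
    static boolean sharingPreserved() {
        StringBuilder shared = new StringBuilder("x");
        List<StringBuilder> graph = new ArrayList<>();
        graph.add(shared);
        graph.add(shared);

        List<StringBuilder> copy = roundTrip(graph);
        return copy.get(0) == copy.get(1) && copy.get(0) != shared;
    }

    public static void main(String[] args) {
        System.out.println(sharingPreserved()); // true
    }
}
```

For value objects the draft's argument is that the same `==` observation holds before and after the round trip, whether or not the implementation happens to share a single copy.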
> > ## Refining the value companion > > Value classes have several options for refining the behavior of the value > companion type and how they are exposed to clients. > > ### Classes with no good default value > > For a value class `C`, the default value of `C.ref` is the same as any > other > reference type: `null`. For the value companion type `C.val`, the default > value > is the one where all of its fields are initialized to their default value > (0 for > numbers, false for boolean, null for references.) > > The built-in primitives reflect the design assumption that zero is a > reasonable > default. The choice to use a zero default for uninitialized variables was > one > of the central tradeoffs in the design of the built-in primitives. It > gives us > a usable initial value (most of the time), and requires less storage > footprint > than a representation that supports null (`int` uses all 2^32 of its bit > patterns, so a nullable `int` would have to either make some 32 bit signed > integers unrepresentable, or use a 33rd bit). This was a reasonable > tradeoff > for the built-in primitives, and is also a reasonable tradeoff for many > other > potential value classes (such as complex numbers, 2D points, half-floats, > etc). > You might not want to go into the following. But I hope that users will understand that the numeric types really do clear a pretty high bar here. They are fortunate that for the *two* most popular reduction operations over those types, zero happens to be the correct identity for one of them, and absolutely destructive to the other (i.e., making it at least easy to detect the bug). If not for *both* of those facts we would have more and worse bugs in the world. But for other potential value classes, such as `LocalDate`, there simply > _is_ no > reasonable default. 
If we choose to represent a date as the number of days > since some some epoch, there will invariably be bugs that stem from > uninitialized dates; we've all been mistakenly told by computers that > something > that never happened actually happened on or near 1 January 1970. Even if > we > could choose a default other than the zero representation as a default, an > uninitialized date is still likely to be an error -- there simply is no > good > default date value. > > For this reason, value classes have the choice of _encapsulating_ their > value > companion type. If the class is willing to tolerate an uninitialized > (zero) > value, it can freely share its `.val` companion with the world; if > uninitialized > values are dangerous (such as for `LocalDate`), the value companion can be > encapsulated to the class or package, and clients can use the reference > companion. Encapsulation is accomplished using ordinary access control. > By > default, the value companion is `private` to the value class (it need not > be > declared explicitly); a class that wishes to share its value companion more > broadly can do so by declaring it explicitly: > > ``` > public value record Complex(double real, double imag) { > public value companion Complex.val; > } > ``` > I think you should add that the name `Complex.val` can't be changed here, much like you can't change the name of a constructor even though it *looks* like you could. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevinb at google.com Wed Jul 27 21:09:54 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 27 Jul 2022 14:09:54 -0700 Subject: Question about universal type variables In-Reply-To: References: Message-ID: On Wed, Jul 27, 2022 at 12:22 PM Brian Goetz wrote: > The main question of this email is: if T is a universal type variable, > then *what kind of type* is that? 
Is it a reftype, a valtype, or > something else? > > It is a type of indeterminate ref-ness or val-ness. > This is to merely assert that Model 1 is correct. But I was asking for a fair consideration of both models and a discussion of *why* one is better than the other. It's not clear whether that was understood. I think this is worth some serious consideration, because having to say that there are three kinds of types now in Java would be quite disappointing. Your message continues with what purports to be justification for Model 1 over 2, I assume?, but it's only describing behavior (that is already understood from the JEP draft) that would behave the same way under either model. So I don't see what argument it's making. The behavior isn't in question, just the conceptual model. So, note that in a generic class today, there's no way to "summon" any > value of T *except null*. You can't say `new T()` or `new T[3]` or > `T.class.newInstance()`. The values of T (except null) always come from > *outside the house*, and their T-ness is backed up by synthetic casts > (modulo heap pollution). An ArrayList starts out empty, then someone > puts T's in it, at which point the ArrayList can turn around and hand those > Ts back. But it can't make any new Ts. All it needs to know is that T is > layout-compatible with the layout of the bound. > Yes, of course. I don't even *expect* it to be an inherent property of a type that it necessarily knows how to conjure up instances of itself. After all, most of your examples don't work for a plain old abstract-class type either. Overall I'm not grasping what this is trying to argue about the question I've raised. Thanks! -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From forax at univ-mlv.fr Wed Jul 27 21:59:43 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 27 Jul 2022 23:59:43 +0200 (CEST) Subject: Question about universal type variables In-Reply-To: References: Message-ID: <47625260.16070671.1658959183339.JavaMail.zimbra@u-pem.fr> > From: "Robbe Pincket" > To: "Brian Goetz" > Cc: "Valhalla Expert Group Observers" , > "Kevin Bourrillion" > Sent: Wednesday, July 27, 2022 9:37:21 PM > Subject: RE: Question about universal type variables > On Wed Jul 27 19:22:51 UTC 2022, Brian Goetz wrote: > > It is a type of indeterminate ref-ness or val-ness. This will have both > restrict some old behavior and provide some new behavior. For example: > > T t = null; > > will generate an unchecked warning, since we can't be sure we're not > polluting the heap with nulls. > I seem to remember there being talks about `IdentityObject` and `ValueObject` > interfaces or something similar, is this still planned. No, it's not anymore. > If so, would the warning go away if `T` has a typebound of `IdentityObject` and > become an error if it has a typebound of `ValueObject` > Regards > Robbe Pincket There are still cases where because of the bound of the type variable, a value class is not possible - if the bound is a class - if the bound is an abstract class with at least a constructor. but i'm not sure it's a good idea to change the type checking depending on those property. regards, R?mi -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.smith at oracle.com Wed Jul 27 22:25:15 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 27 Jul 2022 22:25:15 +0000 Subject: Question about universal type variables In-Reply-To: References: Message-ID: <327279B6-E6E7-4E95-93F0-E33FEAFAD9B3@oracle.com> > On Jul 27, 2022, at 11:48 AM, Kevin Bourrillion wrote: > > The main question of this email is: if T is a universal type variable, then what kind of type is that? 
Is it a reftype, a valtype, or something else? > > I can see two main options for how to answer that, which I think follow naturally from the two already-existing models for how developers might conceptualize type variables. > > These existing models, first: > > Model 1: A type variable is a mere placeholder that "will be" some other type later. When you interact with it, you're "really" interacting with the future type argument. If asked a question like "is an `E` inside `class ArrayList` a `Number`?" this model would say "well, it might be or it might not be". > > Model 2: A type variable is a bona fide local type in its own right. It is or becomes no other type but itself. Its simple job is just to enforce whatever restrictions it needs to in order to preserve its substitutability for any type argument in-bounds. If asked the same question as above, "is an `E` inside `class ArrayList` a `Number`?" this model would say "no, it is certainly not, but it does guarantee to be substitutable for `Number`, among other types." > > I would describe Model 2 as being close to the JLS view of the world, but in a way, Model 1 is the very illusion that Model 2 is working to create. I certainly expect the majority of developers think like Model 1 most of the time, and most of the time it's okay. I'm not *totally* sure I grasp all the differences, but here are a couple of observations that seem to support Model 2: - At compile time, type checking, overload resolution, etc., is performed with respect to a single type variable type, not some sort of "for all possible types..." analysis. If you invoke a method on an interface bound, you're really invoking that method of the interface, not (directly) whatever method gets implemented by an instantiation of the type variable. - At run time, the code is executed in terms of a single runtime type (the erased bound). 
Specialization, a JVM concept, may allow different species to carry different "type restrictions" attached to their parameter types, but those type restrictions *add* type information, they don't *contradict* type information. If the descriptor says LObject, a type restriction might say "this is an LObject that is known to be convertible to QPoint", but it won't say "instead of an LObject, this is actually a QPoint". (I'm muddling language and JVM models here, but you get the idea: generic code is executed generically.) From forax at univ-mlv.fr Wed Jul 27 22:48:35 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 28 Jul 2022 00:48:35 +0200 (CEST) Subject: Question about universal type variables In-Reply-To: <327279B6-E6E7-4E95-93F0-E33FEAFAD9B3@oracle.com> References: <327279B6-E6E7-4E95-93F0-E33FEAFAD9B3@oracle.com> Message-ID: <1358310800.16072693.1658962115813.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "daniel smith" > To: "Kevin Bourrillion" > Cc: "valhalla-spec-experts" > Sent: Thursday, July 28, 2022 12:25:15 AM > Subject: Re: Question about universal type variables >> On Jul 27, 2022, at 11:48 AM, Kevin Bourrillion wrote: >> >> The main question of this email is: if T is a universal type variable, then what >> kind of type is that? Is it a reftype, a valtype, or something else? >> >> I can see two main options for how to answer that, which I think follow >> naturally from the two already-existing models for how developers might >> conceptualize type variables. >> >> These existing models, first: >> >> Model 1: A type variable is a mere placeholder that "will be" some other type >> later. When you interact with it, you're "really" interacting with the future >> type argument. If asked a question like "is an `E` inside `class ArrayList` >> a `Number`?" this model would say "well, it might be or it might not be". >> >> Model 2: A type variable is a bona fide local type in its own right. It is or >> becomes no other type but itself. 
Its simple job is just to enforce whatever >> restrictions it needs to in order to preserve its substitutability for any type >> argument in-bounds. If asked the same question as above, "is an `E` inside >> `class ArrayList` a `Number`?" this model would say "no, it is certainly >> not, but it does guarantee to be substitutable for `Number`, among other >> types." >> >> I would describe Model 2 as being close to the JLS view of the world, but in a >> way, Model 1 is the very illusion that Model 2 is working to create. I >> certainly expect the majority of developers think like Model 1 most of the >> time, and most of the time it's okay. > > I'm not *totally* sure I grasp all the differences, but here are a couple of > observations that seem to support Model 2: > > - At compile time, type checking, overload resolution, etc., is performed with > respect to a single type variable type, not some sort of "for all possible > types..." analysis. If you invoke a method on an interface bound, you're really > invoking that method of the interface, not (directly) whatever method gets > implemented by an instantiation of the type variable. > > - At run time, the code is executed in terms of a single runtime type (the > erased bound). Specialization, a JVM concept, may allow different species to > carry different "type restrictions" attached to their parameter types, but > those type restrictions *add* type information, they don't *contradict* type > information. If the descriptor says LObject, a type restriction might say "this > is an LObject that is known to be convertible to QPoint", but it won't say > "instead of an LObject, this is actually a QPoint". (I'm muddling language and > JVM models here, but you get the idea: generic code is executed generically.) 
For the runtime part, there are also operations (opcodes) that do something different depending on the type argument (or a type derived from the type argument). For example, creating an array, instantiating a generic, or asking for a default value returns different results depending on the type argument at runtime.

Type restrictions are one of those operations. They are expressed as an attribute instead of as an opcode because that also helps the calling convention (monomorphization), but at runtime a type restriction will do a checkcast (or an asType (or an explicitCast)) depending on the type argument.

Rémi

From brian.goetz at oracle.com Thu Jul 28 14:35:08 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 28 Jul 2022 10:35:08 -0400
Subject: Question about universal type variables
In-Reply-To:
References:
Message-ID:

On 7/27/2022 5:09 PM, Kevin Bourrillion wrote:
> On Wed, Jul 27, 2022 at 12:22 PM Brian Goetz wrote:
>
>> The main question of this email is: if T is a universal type
>> variable, then /what kind of type/ is that? Is it a reftype, a
>> valtype, or something else?
>
> It is a type of indeterminate ref-ness or val-ness.
>
> This is to merely assert that Model 1 is correct. But I was asking for
> a fair consideration of both models and a discussion of *why* one is
> better than the other. It's not clear whether that was understood.

I wanted to recap the decisions that we've already made about *how* it works, before stepping onto the philosophical playing field. It's not something we've discussed a lot, and I wanted to make sure there were no misconceptions about how it works. (For example, it's easy to assume that "of course" things like `new T[3]` and `new T(foo)` might work under specialization, though these are fairly presumptuous assumptions.)

> I think this is worth some serious consideration, because having to
> say that there are three kinds of types now in Java would be quite
> disappointing.
I don't think that type variables are actually a "kind" of type at all, in the way you are thinking. In type theory, generics are modeled as quantification. The standard model for this is "System F", which extends the simply typed lambda calculus to support abstraction over types as well as terms. (See https://en.wikipedia.org/wiki/System_F, though as usual the Wikipedia presentation is maximally offputting; I find the TAPL presentation (23.3) to be clearer.)

Recall that in lambda calculus, we have abstraction terms:

    \lambda arg . body

and application terms:

    function argument

which is defined by substitution:

    (\lambda a . y) z -> [a/z]y

which means "substitute z for a in y", being disciplined about not having multiple variables of the same name at different levels.

Abstraction over types is the same thing, except that we can define lambdas that take *types*, and produce new types. So System F adds "type abstraction" and "type application" to live alongside term abstraction (lambdas) and term application (lambda application).

We would model the generic method identity:

    <T> T identity(T x) { return x; }

as

    \lambda t . ( \lambda x : t . x )

This says: identity abstracts over types; to apply it, first we have to apply a type t, which yields a function t -> t, which we can then apply to terms of type t as usual. A common presentation is to use a separate operator (capital lambda, or \forall) to distinguish abstracted types from abstracted terms.

We would write this in Haskell as

    identity :: forall t . t -> t
    identity x = x

This interpretation says that type variables are not types; they are placeholders for types, in the same way that arguments are placeholders for terms, and before we can do anything with something that has placeholders, we have to fill in the placeholders with actual types. (Systems that permit both subtyping and quantification will allow for _bounded_ quantification, which we can use to model Foo.)
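As a rough Java rendering of the "type application, then term application" reading, the sketch below supplies the type argument explicitly before supplying the term. The class `TypeApplicationDemo` and the `max` helper (included only to show a bounded type variable) are illustrative assumptions, not anything proposed in this thread.

```java
import java.util.function.Function;

class TypeApplicationDemo {
    // Abstraction over a type T, then over a term x of type T.
    static <T> T identity(T x) {
        return x;
    }

    // Bounded quantification: T ranges only over types comparable to themselves.
    static <T extends Comparable<T>> T max(T a, T b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    static String demo() {
        // Type application (<String>) followed by term application ("hello").
        String s = TypeApplicationDemo.<String>identity("hello");

        // Type application alone yields a function String -> String.
        Function<String, String> f = TypeApplicationDemo::<String>identity;

        return s + f.apply(" world");
    }

    public static void main(String[] args) {
        System.out.println(demo());                                 // hello world
        System.out.println(TypeApplicationDemo.<Integer>max(3, 7)); // 7
    }
}
```

In everyday code the explicit `<String>` is inferred, which is Java's analogue of Haskell quietly inserting the `forall`.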
Haskell helps / confuses us by not making us say the forall explicitly; if we say

    identity :: t -> t

it figures out that `t` is free in this type descriptor and sticks the `forall` in for us:

    ghci> :t identity
    identity :: forall {p}. p -> p

OK, that was long-winded. But my point is that a type variable is a placeholder for types, and we're already familiar with refining the set of types that can go in the box (e.g., bounds). We may add additional kinds of bounds (e.g., `Foo`) to constrain the range of T in ways that are useful to the compiler (e.g., a `ref` type variable could use null freely without warning).

Alternately, if you want to give them a kind, you could give it a union kind (ref | val).

From brian.goetz at oracle.com Thu Jul 28 18:24:21 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 28 Jul 2022 14:24:21 -0400
Subject: Updated SoV, take 3
In-Reply-To:
References:
Message-ID: <39563834-80a0-5fec-5e9a-603eae4d1f5e@oracle.com>

> > Java currently has eight built-in primitive types. Primitives represent pure
> > _values_; any `int` value of "3" is equivalent to, and indistinguishable from,
> > any other `int` value of "3". Because primitives are "just their bits" with no
> > ancillary state such as object identity, they are _freely copyable_; whether
> > there is one copy of the `int` value "3", or millions, doesn't matter to the
> > execution of the program. With the exception of the unusual treatment of exotic
> > floating point values such as `NaN`, the `==` operator on primitives performs a
> > _substitutability test_ -- it asks "are these two values the same value".
>
> I've said this before, but I think both "substitutability" and
> "sameness" just lead to more questions, and I'm not sure why we don't
> appeal to distinguishability instead.

Fair.
Substitutability is neither a commonly understood concept, nor is it an official term in the spec, so happy to change this to something more intuitive. That said, I'm not sure why you're down on "sameness"?

> > Java also has _objects_, and each object has a unique _object identity_. This
> > means that each object must live in exactly one place (at any given time), and
> > this has consequences for how the JVM lays out objects in memory. Objects in
> > Java are not manipulated or accessed directly, but instead through _object
> > references_. Object references are also a kind of value -- they encode the
> > identity of the object to which they refer,
>
> Do we really want to invoke identity here? That surprises me. That
> suggests that a `ValueClass.ref` instance will have identity too.
> Isn't it really only about the object being addressable or locatable
> (some term like that)?

Will adjust; this is more of an implementation detail anyway.

> > This says that a `Point` is a class whose instances have no identity. As a
> > consequence, it must give up the things that depend on identity; the class and
> > its fields are implicitly final. Additionally, operations that depended on
> > identity must either be adjusted (`==` on value objects compares state, not
> > identity) or disallowed (it is illegal to lock on a value object.)
>
> Just for broad understandability, you might want to address here "but
> then how could a reference 'identify' what object it's pointing to?"

Indeed, this is a tricky new concept; a reference to a thing that is not necessarily unique, but for which we can't distinguish between copies.

> > Value classes can still have most of the affordances of classes -- fields,
> > methods, constructors, type parameters, superclasses (with some restrictions),
> > nested classes, class literals, interfaces, etc.
>> The classes they can extend are restricted: `Object` or abstract
>> classes with no instance fields, empty no-arg constructor bodies, no
>> other constructors, no instance initializers, no synchronized methods,
>> and whose superclasses all meet this same set of conditions.
>> (`Number` is an example of such an abstract class.)
>>
>> Because `Point` has value semantics, `==` compares by state rather
>> than identity. This means that value objects, like primitives, are
>> _freely copyable_; we can explode them into their fields and
>> re-aggregate them into another value object, and we cannot tell the
>> difference.
>
> It feels like if this wants to rest some stuff on "comparing by state"
> it ought to explain here what that means? Or, I guess at least a
> forward reference.
> It seems pretty important to understand that it means shallow
> fieldwise delegation back to `==` again, meaning that fields of
> identity types are still identity-compared.
> In many contexts "value semantics" and "comparing by state" tend to
> only make sense if done recursively/deeply.

It's worse than that, because references to value objects get a deeper
comparison than refs to identity objects. I'll stay away from
shallow/deep, but talk about fieldwise equivalence.

>> ### Migration
>>
>> The JDK (as well as other libraries) has many [value-based
>> classes][valuebased] such as `Optional` and `LocalDateTime`.
>> Value-based classes adhere to the semantic restrictions of value
>> classes, but are still identity classes -- even though they don't want
>> to be. Value-based classes can be migrated to true value classes
>> simply by redeclaring them as value classes, which is both source- and
>> binary-compatible.
>
> This gave me a slight "huh, then what's the catch?" reaction. It might
> make more sense by adding the fact right away that any errant usages
> (that don't adhere to the VBC requirements) will start failing at
> runtime, and might cause compilation warnings?
The catch is twofold:

 - Clients that depend on that accidental identity despite the warning
   signs are in for a surprise (hello, Integer);
 - The ref companion gets the good name, which will surely annoy people.

The former should be viewed as an anti-catch, but not everyone will see
it that way. The latter will surely be spun as "why do you guys hate
your users." For which we'll tell them it was Kevin's idea.

>> We plan to migrate many value-based classes in the JDK to value
>> classes. Additionally, the primitive wrappers can be migrated to value
>> classes as well, making the conversion between `int` and `Integer`
>> cheaper; see "Migrating the legacy primitives" below. (In some cases,
>> this may be _behaviorally_ incompatible for code that synchronizes on
>> the primitive wrappers. [JEP 390][jep390] has supported both
>> compile-time and runtime warnings for synchronizing on primitive
>> wrappers since Java 16.)
>
> Putting this in parens under the topic of the primitive wrappers feels
> like "pulling a fast one". Like it's pretending that this
> incompatibility problem is somehow unique to those 8 classes, hoping
> people won't notice "wait a minute, *any* class hopeful of future
> migration would have the same desire to opt into such warnings in
> advance." (And for more than just synchronization.) I get that there
> is no current plan to solve that problem, but we could be more
> up-front about that?

I think it is just these eight classes, since in Java 8, we wrote this
into the definition of value-based class (but couldn't back-apply that
definition to these eight.) But I can drop the parens if that helps :)

>> Value classes are generalizations of primitives. Since primitives
>> have a reference companion type, value classes actually give rise to
>> _pairs_ of types: a value type and a reference type.
>> We've seen the reference type already; for the value class
>> `ArrayCursor`, the reference type is called `ArrayCursor`, just as
>> with identity classes. The full name for the reference type is
>> `ArrayCursor.ref`; `ArrayCursor` is just a convenient alias for that.
>> (This aliasing is what allows value-based classes to be compatibly
>> migrated to value classes.)
>
> It's more than just that: it's what unifies all classes together! They
> all define a reference type, always with the same name as the class.
> That's nice, unchanging solid ground under our feet while all the
> Valhalla shifts are going on.
>
> It would make more sense to me if `ArrayCursor.ref` were the alias to
> `ArrayCursor`, and it would be appropriate for the reader to wonder
> "why do we even need that alias?".

Yes, and the answer is "we almost don't", except for type variables
(T.ref).

>> The value type is called `ArrayCursor.val`, and the two types have the
>> same conversions between them as primitives do today with their
>> boxes. The default value of the value type is the one for which all
>> fields take on their default value; the default value of the
>> reference type is, like all reference types, null. We will refer to
>> the value type of a value class as the _value companion type_.
>
> ... because it acts as a companion to the reference type you've always
> known.
> (At least, *I* still really don't want people to think that both the
> value type and the reference types are "companions" to the class that
> defined them.)

I am thinking they are companions to each other; we can be more explicit
about this.

>> Both the reference and value companion types have the same members.
>
> Maybe worth acknowledging "(even those, like `wait()` inherited from
> `Object`, that don't make sense and will fail at runtime, for
> simplicity's sake)".

It is not clear how pedantic to be here.
Do they have the same members, or are the members all on the ref type,
and we just provide a convenient syntax / fast implementations for vals
as receivers? The latter is closer to reality, but does that explanation
help?

> I think it is worth acknowledging that this does lead to
> `5.toString()` becoming valid and functioning code, which happens just
> for consistency and not because it was a goal in itself.

OK. Another good thing that happens here is that we can write equals()
methods uniformly:

    return o instanceof Foo f &&
        i.equals(f.i) && name.equals(f.name);

and not have to worry about "is this a ref or a primitive". Just use
equals everywhere.

>> Arrays of reference types are _covariant_; this means that if
>> `A <: B`, then `A[] <: B[]`. This allows `Object[]` to be the "top
>> array type" -- but only for arrays of references. Arrays of primitives
>> are currently left out of this story. We unify the treatment of
>> arrays by defining array covariance over the new "extends"
>> relationship; if A _extends_ B, then `A[] <: B[]`. This means that for
>> a value class P, `P.val[] <: P.ref[] <: Object[]`; when we migrate the
>> primitive types to be value classes, then `Object[]` is finally the
>> top type for all arrays. (When the built-in primitives are migrated to
>> value classes, this means `int[] <: Integer[] <: Object[]` too.)
>
> I think it's worth addressing that this does mean there will be
> `Integer[]` and `Object[]` instances that can't store null, failing at
> runtime, but that this is consistent with the existing quirks of array
> covariance.

Yep, same ASE (ArrayStoreException).

>> The base implementation of `Object::equals` delegates to `==`, which
>> is a suitable default for both reference and value classes.
>
> This is where you could appeal to the idea that `==` has always meant
> "strictly indistinguishable by any means" and this preserves that
> meaning (modulo float/double weirdness).
Yep

>> ### Serialization
>>
>> If a value class implements `Serializable`, this is also really a
>> statement about the reference type. Just as with other aspects
>> described here, serialization of value companions can be defined by
>> converting to the corresponding reference type and serializing that,
>> and reversing the process at deserialization time.
>
> It's nonobvious to me why the reference type is being elevated as the
> primary one here, except that of course a method like `writeObject` is
> only going to be fed the reference type. I would have expected just
> that serializability applies equally to both types in the same way,
> much like invoking some method on both types.

It's a lot like members; we can define them to be the same on both, or
we can define them to live on the ref. A lot of things are simpler with
the latter, but it's not clear readers of this doc need to understand
all that.

>> The built-in primitives reflect the design assumption that zero is a
>> reasonable default. The choice to use a zero default for uninitialized
>> variables was one of the central tradeoffs in the design of the
>> built-in primitives. It gives us a usable initial value (most of the
>> time), and requires less storage footprint than a representation that
>> supports null (`int` uses all 2^32 of its bit patterns, so a nullable
>> `int` would have to either make some 32 bit signed integers
>> unrepresentable, or use a 33rd bit). This was a reasonable tradeoff
>> for the built-in primitives, and is also a reasonable tradeoff for
>> many other potential value classes (such as complex numbers, 2D
>> points, half-floats, etc).
>
> You might not want to go into the following. But I hope that users
> will understand that the numeric types really do clear a pretty high
> bar here.
> They are fortunate that for the *two* most popular reduction
> operations over those types, zero happens to be the correct identity
> for one of them, and absolutely destructive to the other (i.e., making
> it at least easy to detect the bug). If not for *both* of those facts
> we would have more and worse bugs in the world.

Yeah, it's not obvious how much algebra is helpful here. I mostly want
to make the point that zero wasn't chosen at random; it's the default
you actually want, and if you got null, you probably wouldn't like it as
much. Agree about the high bar; Jan 1 1970 doesn't clear that bar.

>> But for other potential value classes, such as `LocalDate`, there
>> simply _is_ no reasonable default. If we choose to represent a date as
>> the number of days since some epoch, there will invariably be bugs
>> that stem from uninitialized dates; we've all been mistakenly told by
>> computers that something that never happened actually happened on or
>> near 1 January 1970. Even if we could choose a default other than the
>> zero representation as a default, an uninitialized date is still
>> likely to be an error -- there simply is no good default date value.
>>
>> For this reason, value classes have the choice of _encapsulating_
>> their value companion type. If the class is willing to tolerate an
>> uninitialized (zero) value, it can freely share its `.val` companion
>> with the world; if uninitialized values are dangerous (such as for
>> `LocalDate`), the value companion can be encapsulated to the class or
>> package, and clients can use the reference companion. Encapsulation is
>> accomplished using ordinary access control. By default, the value
>> companion is `private` to the value class (it need not be declared
>> explicitly); a class that wishes to share its value companion more
>> broadly can do so by declaring it explicitly:
>>
>> ```
>> public value record Complex(double real, double imag) {
>>     public value companion Complex.val;
>> }
>> ```
>
> I think you should add that the name `Complex.val` can't be changed
> here, much like you can't change the name of a constructor even though
> it *looks* like you could.

I keep hoping that we'll come up with a brilliant replacement for X.val
before that....
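
P.S. For anyone who wants to poke at the array covariance exchange
above, here is a small sketch in today's Java (no Valhalla syntax). The
class and method names are made up for illustration. It only shows the
existing behavior that the proposal would extend: reference arrays are
covariant, primitive arrays are left out, and a bad store through the
wider static type fails at runtime with ArrayStoreException. How a null
store into a value-companion array would fail is not shown; the thread
only says it would be consistent with this.

```java
public class ArrayCovarianceSketch {

    // Reference arrays are covariant today: Integer[] is a subtype of Object[].
    static boolean referenceArraysAreCovariant() {
        return Object[].class.isAssignableFrom(Integer[].class);
    }

    // Primitive arrays are left out of the story today: int[] is not an Object[].
    static boolean primitiveArraysAreCovariant() {
        return Object[].class.isAssignableFrom(int[].class);
    }

    // A store through the wider static type compiles, but the runtime
    // element type is still Integer, so storing a String throws
    // ArrayStoreException.
    static boolean storeThroughObjectArrayFails() {
        Object[] objects = new Integer[1];
        try {
            objects[0] = "not an Integer";
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("Integer[] <: Object[]: " + referenceArraysAreCovariant());
        System.out.println("int[] <: Object[]: " + primitiveArraysAreCovariant());
        System.out.println("bad store throws ASE: " + storeThroughObjectArrayFails());
    }
}
```

The runtime check in the last method is the "existing quirk" being
referred to: the array remembers its element type, so the failure shows
up at the store rather than at compile time.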