From john.r.rose at oracle.com Tue Aug 2 21:26:46 2022
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 02 Aug 2022 14:26:46 -0700
Subject: Updated SoV, take 3
In-Reply-To:
References:
Message-ID: <927C2482-28EC-4D25-ACBF-C0795D2FFC7D@oracle.com>

Kevin, I generally agree with your comments; you are narrating the
thought processes of that "Alert Reader" we love to write for.

I will start a separate thread on the issue of "substitutability",
which needs some more discussion FTR.

-- John

From john.r.rose at oracle.com Tue Aug 2 21:48:50 2022
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 02 Aug 2022 14:48:50 -0700
Subject: object sameness, Leibniz's Law, substitutability, indistinguishability
Message-ID:

On 27 Jul 2022, at 13:22, Kevin Bourrillion wrote:

> I've said this before, but I think both "substitutability" and
> "sameness" just lead to more questions, and I'm not sure why we don't
> appeal to distinguishability instead.

The short answer is there is no one-magic-word-no-more-questions
explanation for what we need to convey. More words will surely help,
but at some point you have to stop talking. Except on Email threads.
So, here's a lot more talk from me on the subject!

There are a number of words we could use here. "Substitutability" is
(AFAICR) a bow to Liskov, who uses it to talk about
correctness-preserving type changes at compile time, specifically
substituting an argument type S for a parameter type C where S<:C,
when binding a method argument. From that POV we are bending it (just a
little) to talk about result-preserving reference changes at runtime.
But this term also occurs (along with the related term
"substitutivity") in logic, law, and economics, with related meanings.
In the context of software (and sometimes math and logic), "x is
substitutable for y" means "x can replace y", and in an absolute sense
it captures the intuition that "x is just as good as y (for any purpose
we are now contemplating)".

We could certainly pick other words out of the dictionary, and/or
negate the proposed predicate (defining x!=y instead of x==y) to allow
other related concepts like "[in-]distinguishability", or "sameness",
or "indiscernibility", or perhaps simply "unconditional equality". But
choosing another word will probably not refine our understanding of
what's going on; it will just shuffle around the connotations and
emphases, making some folks happier and other folks more confused.
Conversely, given a clearer shared understanding, we can then proceed
to choose any of a range of reasonable words to refer to that
understanding, including either or both of "substitutable" and
"indistinguishable", or some other word.

But I'd like to back up first. Our problem with object equality, as I
understand it, begins with the word "same". That little word "same" is
hardwired in the JLS and JVMS to talk about identity of many kinds of
things: tokens, source locations, classes, interfaces, types, values,
exceptions, packages, and more -- in both static and dynamic contexts
(as appropriate). The word shows up several hundred times, often with
strongly normative force. That is, the specification would be seriously
broken without each such statement of "sameness".

Occasionally sameness is defined explicitly, as in JLS 4.3.4, "When
Reference Types Are the Same". But often the meaning (including
normative meanings) is left to the reader's common sense, as in the
many places the spec talks about "same name" and "same descriptor"
(both referring to character sequences). But for object references the
meaning of "same" is left to the reader in JLS 15.21.3:

> At run time, the result of == is true if the operand values are both
> null or both refer to the same object or array; otherwise, the result
> is false.

It seems to me that, in order to sharpen up the meaning of that use of
"same" (as well as others), a very good tool is something called
"Leibniz's Law", which simply says that two things are the same if and
only if there is no predicate that can distinguish them. That is,

    x=y ⇔ ∀F(Fx ⇔ Fy)

The application to software is immediate, since (in a software system)
any predicate can be represented as a procedure which accepts an input
and returns a boolean. (Obviously the procedure cannot just look at a
random number or changing global variable, or throw something; it has
to be immunized somehow against indeterminacy, lest our conceptual test
harness fail to produce the kind of result we want.) And the law's
force is preserved if formulated in terms of many kinds of functions,
rather than predicates:

    x=y ⇔ ∀f(fx = fy)

as long as the second equality check is somehow more primitive; we Java
folks could say it must return an identity class or primitive.

So, for us, Leibniz's Law defines that two software objects (or values)
are "the same" (identical in a purely logical sense) if and only if no
reasonable software function can tell them apart, by returning
different values for each of the two objects.

From there, it is a short step (a) to recognize that pre-Valhalla
reference equality satisfies Leibniz's Law, and thus (b) to recognize
that this state of affairs can (and should) be preserved by choosing a
design much like we have today.

The law breaks into two parts, each part being its own separate
principle:

    x=y ⇒ ∀f(fx = fy)

(If x equals y then applying any reasonable f to x and y gives the same
answer.)

This means that if two object references are evaluated (by the JLS and
JVMS) as referring to "the same object", there must be no subsequent
computation which somehow extracts two different results from the two
references. Putting on our engineer's hat, that means that if `acmp`
reports equality, the two inputs must have the same class (f =
`Object::getClass`) and pairwise "same" fields (f = various `getfield`
operations).
Specifically, the JVM is not allowed to get away with comparing some
pointers and returning "false" if the pointers refer to structurally
"same" objects which happen to be duplicated in the heap. It must do
the work of chasing the pointers. (This principle breaks when applied
to `==` on floating point numbers, since +0.0 == -0.0 but
f=`doubleToLongBits` returns distinct values.)

    x≠y ⇒ ∃f(fx ≠ fy)

(If x doesn't equal y then you can find a reasonable f to apply to x
and y, which gives different answers.)

This converse means that if two object references are evaluated as
*not* referring to the same object, then there must exist a computation
which can derive distinct values from them. Specifically, we don't just
hardwire the `==` operator, much less the "sameness" condition, to a
user-defined `Foo::equal` method. Doing this would require an infinite
regress in the body of the method, if (and when) it tries to ask if its
operands are "the same" using `==`. (This principle breaks when applied
to `!=` on floating point numbers, since 0/0. != 0/0. but there is no
function which can distinguish 0/0. from itself.)

Moving back to the problem at hand: Valhalla forces the specs to define
`acmp` and `==` in terms of same-class-and-fields for value objects. It
also (IMO) raises the question of whether we should try to tighten the
existing spec for identity objects (since there's now a clear
contrast). I think it works to appeal to an object's "identity", as a
property of the object assigned at the first instant of the execution
of `new` (the operator or bytecode), and different from ("not the same
as") any other "identity" from any other execution.

The specs do not *need* to refer to Leibniz's Law, nor even appeal to
concepts of "substitutability" or "indistinguishability", in order to
be operationally complete and sound. Email archives will hold the
present discussion basically forever, in case someone wants to inquire
why it's the way it is.

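As an aside on the two floating-point caveats in the parentheticals
above: both can be checked directly in today's Java. The following is a
minimal sketch, not part of the original message. The class name
`FloatSameness` and its helper methods are made up for illustration;
`Double.doubleToLongBits` stands in for the distinguishing function f.

```java
public class FloatSameness {
    // Primitive `==` on doubles: the equality being held up against Leibniz's Law.
    public static boolean primitiveEquals(double x, double y) {
        return x == y;
    }

    // One candidate "reasonable f": the canonical bit pattern of the double.
    public static long f(double x) {
        return Double.doubleToLongBits(x);
    }

    public static void main(String[] args) {
        double posZero = +0.0;
        double negZero = -0.0;
        // First principle fails: == reports equality, yet f tells them apart.
        System.out.println(primitiveEquals(posZero, negZero)); // true
        System.out.println(f(posZero) == f(negZero));          // false

        double nan = 0.0 / 0.0;
        // Second principle fails: == reports inequality for a value and itself,
        // yet no deterministic f can return two answers for the same input.
        System.out.println(primitiveEquals(nan, nan));         // false
        System.out.println(f(nan) == f(nan));                  // true
    }
}
```

Reference equality on identity objects has neither anomaly, which is
the sense in which it already satisfies both halves of the law.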
But, the JVMS and JLS *should* (IMO) include some non-normative prose
that encourages the reader to think about intuitions like "once we know
x==y, then x is as good as y everywhere", and also "if x==y in one
place, it will be impossible to tell them apart somewhere else".

From john.r.rose at oracle.com Tue Aug 2 22:36:41 2022
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 02 Aug 2022 15:36:41 -0700
Subject: Updated SoV, take 3
In-Reply-To:
References:
Message-ID:

On 26 Jul 2022, at 11:18, Brian Goetz wrote:

> Yet another attempt at updating SoV to reflect the current thinking.
> Please review.
>
>
> # State of Valhalla
> ## Part 2: The Language Model {.subtitle}
>
> #### Brian Goetz {.author}
> #### July 2022 {.date}

Here's a big diff on the MD file. (I scraped the MD out of my mailer,
which is an iffy proposition.)

```
> --- > a/Users/jrose/Projects/openjdk/valhalla-docs/site/design-notes/state-of-valhalla/02-object-model-take-3.md.~1~ > +++ > b/Users/jrose/Projects/openjdk/valhalla-docs/site/design-notes/state-of-valhalla/02-object-model-take-3.md > @@ -24,7 +24,7 @@ libraries, not as a language feature. > Java currently has eight built-in primitive types. Primitives > represent pure > _values_; any `int` value of "3" is equivalent to, and > indistinguishable from, > any other `int` value of "3". Because primitives are "just their > bits" with no > -ancillarly state such as object identity, they are _freely copyable_; > whether > +ancillary state such as object identity, they are _freely copyable_; > whether > there is one copy of the `int` value "3", or millions, doesn't matter > to the > execution of the program. 
With the exception of the unusual > treatment of exotic > floating point values such as `NaN`, the `==` operator on primitives > performs a > @@ -53,10 +53,10 @@ Primitives and objects currently differ in almost > every conceivable way: > | Primitives | Objects > | > | ------------------------------------------ | > ---------------------------------- | > | No identity (pure values) | Identity > | > -| `==` compares values | `==` compares object > identity | > +| Operator `==` compares values | Operator `==` compares > object identity | > | Built-in | Declared in classes > | > | No members (fields, methods, constructors) | Members (including > mutable fields) | > -| No supertypes or subtypes | Class and interface > inheritance | > +| No inherited supertypes or subtypes | Class and interface > inheritance | > | Accessed directly | Accessed via object > references | > | Not nullable | Nullable > | > | Default value is zero | Default value is null > | > @@ -64,7 +64,7 @@ Primitives and objects currently differ in almost > every conceivable way: > | May tear under race | Initialization safety > guarantees | > | Have reference companions (boxes) | Don't need reference > companions | > -Primitives embody a number tradeoffs aimed at maximizing the > performance and > +Primitives embody a number of tradeoffs aimed at maximizing the > performance and > usability of the primitive types. Reference types default to `null`, > meaning > "referring to no object", and must be initialized before use; > primitives default > to a usable zero value (which for most primitives is the additive > identity) and > @@ -77,6 +77,7 @@ under a certain category of data races (this is > where we get the "immutable > objects are always thread-safe" rule from); primitives allow tearing > under race > for larger-than-32-bit values. We could characterize the design > principles > behind these tradeoffs are "make objects safer, make primitives > faster." 
> + > The following figure illustrates the current universe of Java's > types. The > upper left quadrant is the built-in primitives; the rest of the space > is > @@ -140,9 +141,10 @@ value class Point implements Serializable { > This says that an `Point` is a class whose instances have no > identity. As a > consequence, it must give up the things that depend on identity; the > class and > -its fields are implicitly final. Additionally, operations that > depended on > -identity must either be adjusted (`==` on value objects compares > state, not > -identity) or disallowed (it is illegal to lock on a value object.) > +its fields are implicitly final. Additionally, operations that > depend on > +identity are adjusted as necessary for value objects. (For example, > operator `==` on compares state not > +identity, and it is illegal to lock on a value object.) > + > Value classes can still have most of the affordances of classes -- > fields, > methods, constructors, type parameters, superclasses (with some > restrictions), > @@ -190,7 +192,7 @@ value class ArrayCursor { > return offset < array.length; > } > - public T next() { > + public T get() { > return array[offset]; > } > @@ -199,6 +201,12 @@ value class ArrayCursor { > } > } > ``` > + > In looking at this code, we might mistakenly assume it will be > inefficient, as > each loop iteration appears to allocate a new cursor: > @@ -224,8 +232,8 @@ compare in the loop header. > The JDK (as well as other libraries) has many [value-based > classes][valuebased] > such as `Optional` and `LocalDateTime`. Value-based classes adhere > to the > -semantic restrictions of value classes, but are still identity > classes -- even > -though they don't want to be. Value-based classes can be migrated to > true value > +semantic restrictions of value classes, but they still possess > identity -- even > +though they don't want it. 
Value-based classes can be migrated to > true value > classes simply by redeclaring them as value classes, which is both > source- and > binary-compatible. @@ -325,7 +333,7 @@ the reference and value > companion types are not nearly as heavy or wasteful, > because of the lack of identity. A variable of type `Point.val` > holds a "bare" > value object; a variable of type `Point.ref` holds a _reference to_ a > value > object. For many use cases, the reference type will offer good > enough > -performance; in some cases, it may be desire to additionally give up > the > +performance; in some cases, the discerning user may choose to give up > the > affordances of reference-ness to make further flatness and footprint > gains. See > [Performance Model](05-performance-model) for more details on the > specific > tradeoffs. > @@ -336,6 +344,7 @@ primitives: > ** UPDATE DIAGRAM ** > + >
> > Java field types with 
> extended primitives > @@ -381,15 +390,15 @@ if (us instanceof Number) { ... } > Since subtyping is defined only on reference types, the `instanceof` > operator > (and corresponding type patterns) will behave as if both sides were > lifted to > -the appropriate reference type (unboxed), and then we can appeal to > subtyping. > +the appropriate reference type (boxing any bare value), and then we > can appeal to subtyping. > (This may trigger fears of expensive boxing conversions, but in > reality no > actual allocation will happen.) > We introduce a new relationship between types based on `extends` / > `implements` > -clauses, which we'll call "extends": we define `A extends B` as > meaning `A <: B` > +clauses, which we'll call "`extends`": we define `A extends B` as > meaning `A <: B` > when A is a reference type, and `A.ref <: B` when A is a value > companion type. > The `instanceof` relation, reflection, and pattern matching are > updated to use > -"extends". > +`extends`. > ### Array covariance > @@ -397,24 +406,28 @@ Arrays of reference types are _covariant_; this > means that if `A <: B`, then > `A[] <: B[]`. This allows `Object[]` to be the "top array type" -- > but only for > arrays of references. Arrays of primitives are currently left out of > this > story. We unify the treatment of arrays by defining array > covariance over the > -new "extends" relationship; if A _extends_ B, then `A[] <: B[]`. > This means > +new `extends` relationship; if A `extends` B, then `A[] <: B[]`. > This means > that for a value class P, `P.val[] <: P.ref[] <: Object[]`; when we > migrate the > primitive types to be value classes, then `Object[]` is finally the > top type for > all arrays. (When the built-in primitives are migrated to value > classes, this > means `int[] <: Integer[] <: Object[]` too.) > + > ### Equality > -For values, as with primitives, `==` compares by state rather than by > identity. 
> +For values, as with primitives, operator `==` compares by state > rather than by identity. > Two value objects are `==` if they are of the same type and their > fields are > -pairwise equal, where equality is defined by `==` for primitives > (except `float` > -and `double`, which are compared with `Float::equals` and > `Double::equals` to > -avoid anomalies), `==` for references to identity objects, and > recursively with > -`==` for references to value objects. In no case is a value object > ever `==` to > +pairwise the same, where sameness is defined by bitwise equality > (operator `==` for primitives except `float` > +and `double`, which are compared as if by `Float::equals` and > `Double::equals` to > +avoid anomalies), reference equality (operator `==`) for references > to identity objects (and for `null`), and recursively with > +operaetor `==` for references to value objects. In no case is a > value object ever `==` to > an identity object. > When comparing two object _references_ with `==`, they are equal if > they are > -both null, or if they are both references to the same identity > object, or they > +both `null`, or if they are both references to the same identity > object, or they > are both references to value objects that are `==`. (When comparing > a value > type with a reference type, we treat this as if we convert the value > to a > reference, and proceed as per comparing references.) This means that > the > @@ -489,6 +502,10 @@ public value record Complex(double real, double > imag) { > public value companion Complex.val; > } > ``` > + > ### Atomicity and tearing > @@ -534,8 +551,9 @@ public value record Complex(double real, double > imag) { > For classes like `Complex`, all of whose bit patterns are valid, this > is very > much like the choice around `long` in 1995. 
For other classes that > might have > nontrivial representational invariants -- specifically, invariants > that relate > -multiple fields, such as ensuring that a range goes from low to high > -- they > -likely want to stick to the default of atomicity. +multiple fields, > such as ensuring that a range goes from low to high -- > +the default of atomicity is likely to be a better choice. > + > ## Do we really need two types? > @@ -658,7 +676,7 @@ types: > | Primitives | Objects > | > | ------------------------------------------ | > ---------------------------------- | > | No identity (pure values) | Identity > | > -| `==` compares values | `==` compares object > identity | > +| Operator `==` compares values | Operator `==` compares > object identity | > | Built-in | Declared in classes > | > | No members (fields, methods, constructors) | Members (including > mutable fields) | > | No supertypes or subtypes | Class and interface > inheritance | > @@ -672,10 +690,21 @@ types: > The addition of value classes addresses many of these directly. > Rather than > saying "classes have identity, primitives do not", we make identity > an optional > characteristic of classes (and derive equality semantics from that.) > Rather > -than primitives being built in, we derive all types, including > primitives, from > +than primitives being built in, we derive all types, including > existing primitives and new primitive-like types, from > classes, and endow value companion types with the members and > supertypes > declared with the value class. Rather than having primitive arrays > be > monomorphic, we make all arrays covariant under the `extends` > relation. 
+ > The remaining differences now become differences between reference > types and > value types: > @@ -687,21 +716,22 @@ value types: > | Default value is zero | Default value is > null | > | May tear under race, if declared `non-atomic` | Initialization > safety guarantees | > -The current dichotomy between primitives and references morphs to one > between > +The current dichotomy between primitive-like types and references > morphs to one between > value objects and references, where the legacy primitives become > (slightly > special) value objects, and, finally, "everything is an object". > ## Summary > -Valhalla unifies, to the extent possible, primitives and objects. > The > +Valhalla unifies, to the extent possible, primitives and objects and > introduces > +primitive-like types as optional companions to classes. The > following table summarizes the transition from the current world to > Valhalla. > | Current World | Valhalla > | > | ------------------------------------------- | > --------------------------------------------------------- | > | All objects have identity | Some objects have > identity | > -| Fixed, built-in set of primitives | Open-ended set of > primitives, declared via classes | > -| Primitives don't have methods or supertypes | Primitives are > classes, with methods and supertypes | > -| Primitives have ad-hoc boxes | Primitives have > regularized reference companions | > +| Fixed, built-in set of primitives | Open-ended set of > primitive-like types, declared via classes | > +| Primitives don't have methods or supertypes | Primitive-like types > are classes, with methods and supertypes | > +| Primitives have ad-hoc boxes | Primitive-like types > have regularized reference companions | > | Boxes have accidental identity | Reference companions > have no identity | > | Boxing and unboxing conversions | Primitive reference > and value conversions, but same rules | > | Primitive arrays are monomorphic | All arrays are > covariant | ``` 
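
The Equality hunk in the diff above says that `float` and `double`
fields of a value object are compared "as if by `Float::equals` and
`Double::equals`" rather than by primitive `==`. The following is a
minimal sketch of what that rule implies, not the actual Valhalla
implementation. The names `ValueSamenessSketch`, `sameDouble`, and
`sameState` are hypothetical, and an ordinary record stands in for a
value class because `value` classes are not part of released Java.

```java
public class ValueSamenessSketch {
    // Stand-in for `value record Complex`; today this is still an identity class.
    public record Complex(double real, double imag) {}

    // Field sameness for doubles "as if by Double::equals" (bitwise, NaN canonicalized).
    public static boolean sameDouble(double x, double y) {
        return Double.valueOf(x).equals(y);
    }

    // Hypothetical model of `==` on two Complex value objects: same class
    // (guaranteed here by the parameter types) and pairwise-same fields.
    public static boolean sameState(Complex a, Complex b) {
        return sameDouble(a.real(), b.real()) && sameDouble(a.imag(), b.imag());
    }

    public static void main(String[] args) {
        Complex n1 = new Complex(Double.NaN, 1.0);
        Complex n2 = new Complex(Double.NaN, 1.0);
        System.out.println(sameState(n1, n2));        // true
        System.out.println(n1.real() == n2.real());   // false: primitive == on NaN

        Complex z1 = new Complex(+0.0, 1.0);
        Complex z2 = new Complex(-0.0, 1.0);
        System.out.println(sameState(z1, z2));        // false
        System.out.println(z1.real() == z2.real());   // true: primitive == on zeros
    }
}
```

Under this rule, two value objects that compare as the same cannot be
told apart by reading a field and passing it to `doubleToLongBits`,
which primitive `==` on the fields would not guarantee.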

From kevinb at google.com Wed Aug 3 16:44:00 2022
From: kevinb at google.com (Kevin Bourrillion)
Date: Wed, 3 Aug 2022 09:44:00 -0700
Subject: Question about universal type variables
In-Reply-To:
References:
Message-ID:

On Thu, Jul 28, 2022 at 7:35 AM Brian Goetz wrote:

On 7/27/2022 5:09 PM, Kevin Bourrillion wrote:
>
> On Wed, Jul 27, 2022 at 12:22 PM Brian Goetz
> wrote:
>
>> The main question of this email is: if T is a universal type variable,
>> then *what kind of type* is that? Is it a reftype, a valtype, or
>> something else?
>>
>> It is a type of indeterminate ref-ness or val-ness.
>>
>
> This is to merely assert that Model 1 is correct.
But I was asking for a > fair consideration of both models and a discussion of *why* one is better > than the other. It's not clear whether that was understood. > > I wanted to recap the decisions that we've already made about *how* it > works, > Fine starting point, I was just trying to prompt the second step. before stepping onto the philosophical playing field. Its not something > we've discussed a lot, and wanted to make sure there were no misconceptions > about how works. (For example, it's easy to assume that "of course" things > like `new T[3]` and `new T(foo)` might work under specialization, though > these are fairly presumptuous assumptions.) > > I think this is worth some serious consideration, because having to say > that there are three kinds of types now in Java would be quite > disappointing. > > > I don't think that type variables are actually a "kind" of type at all, in > the way you are thinking. In type theory, > I'm sure the theoretic argument is fine as far as it goes, but it's not much help for the end user. My issue is with the user model we present to the world; what "useful fictions" are we securing for them, that enable them to read and write code with confidence? I'm sure the notion that T is always a reference type would be initially surprising to many; maybe enough so that that makes it the wrong model. But I wanted to (re)state some advantages I see in it. (If some are built on misunderstandings, I'm hoping to shake those out.) *Some "T always a reference type" advantages:* * With subtype polymorphism, the user enjoys a solid understanding that "reference types are polymorphic, value types are monomorphic". As I'd put it: you can never have a value (say as a field) without statically knowing its exact type, because its exact type governs the shape and interpretation of the bits actually making up the value. Don't know the exact type --> you need a reference. 
But parametric polymorphism (thanks for laying out these terms in the JEP draft, Dan) feels very similar! I'd expect the user to consult the same intuitions we just drilled into them about subtype polymorphism. It would be nice if the same simple rule held there too. * When my class gets used as `MyClass<int>`, I would get to reason like so: * When that code runs on some JVM that doesn't do specialization yet, then my class gets used directly, so those `int`s are really `Integer`s; of course they are, because T is a reference type. (I expect I can't tear a value this way.) * When that code runs on some JVM that has specialization, then different "species" of my class are being forked off from my template, each one physically *replacing* T with some value type. So *those* are value types, but once again T is still a reference type. (And here I do expect tearing risk, for non-atomic types.) * If Java were ever to have non-nullable reference types, I suspect it might immediately expose this whole type variable issue as having been, at its essence, never really about ref-vs-val in the first place. What it's really about is that there used to be one value in the intersection of every Object type's value set, and now there isn't anymore. * The best way a user can prepare their generic class for becoming "universal" in the future is to adopt aftermarket nullness analysis (such as I'm working on standardizing the semantics for in JSpecify). They'll mark type parameters like `V extends @Nullable Object`, and methods like `Map.get` will return `@Nullable V`. That will shake out any obstacles up front. Then once V becomes a UTP, they'd just change that `V` to `V.ref`, and they could presumably drop the `@Nullable` too because `.ref` implies it (why else would it be used?). So the language feature you're introducing for ref-vs-val universality is immediately doing double duty, capturing nullness information for reference types too.
This would probably mean rethinking the `T.ref` syntax to something that more closely evokes "T or null" (the fact this would, for an species, have to box to `Integer` in the process seems intuitive enough). -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From forax at univ-mlv.fr Wed Aug 3 17:37:41 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 3 Aug 2022 19:37:41 +0200 (CEST) Subject: Question about universal type variables In-Reply-To: References: Message-ID: <1734475855.18322004.1659548261151.JavaMail.zimbra@u-pem.fr> > From: "Kevin Bourrillion" > To: "Brian Goetz" > Cc: "valhalla-spec-experts" > Sent: Wednesday, August 3, 2022 6:44:00 PM > Subject: Re: Question about universal type variables > On Thu, Jul 28, 2022 at 7:35 AM Brian Goetz < [ mailto:brian.goetz at oracle.com | > brian.goetz at oracle.com ] > wrote: >> On 7/27/2022 5:09 PM, Kevin Bourrillion wrote: >>> On Wed, Jul 27, 2022 at 12:22 PM Brian Goetz < [ mailto:brian.goetz at oracle.com | >>> brian.goetz at oracle.com ] > wrote: >>>>> The main question of this email is: if T is a universal type variable, then what >>>>> kind of type is that? Is it a reftype, a valtype, or something else? >>>> It is a type of indeterminate ref-ness or val-ness. >>> This is to merely assert that Model 1 is correct. But I was asking for a fair >>> consideration of both models and a discussion of *why* one is better than the >>> other. It's not clear whether that was understood. >> I wanted to recap the decisions that we've already made about *how* it works, > Fine starting point, I was just trying to prompt the second step. >> before stepping onto the philosophical playing field. Its not something we've >> discussed a lot, and wanted to make sure there were no misconceptions about how >> works. 
(For example, it's easy to assume that "of course" things like `new >> T[3]` and `new T(foo)` might work under specialization, though these are fairly >> presumptuous assumptions.) >>> I think this is worth some serious consideration, because having to say that >>> there are three kinds of types now in Java would be quite disappointing. >> I don't think that type variables are actually a "kind" of type at all, in the >> way you are thinking. In type theory, > I'm sure the theoretic argument is fine as far as it goes, but it's not much > help for the end user. My issue is with the user model we present to the world; > what "useful fictions" are we securing for them, that enable them to read and > write code with confidence? > I'm sure the notion that T is always a reference type would be initially > surprising to many; maybe enough so that that makes it the wrong model. But I > wanted to (re)state some advantages I see in it. (If some are built on > misunderstandings, I'm hoping to shake those out.) > Some "T always a reference type" advantages: > * With subtype polymorphism, the user enjoys a solid understanding that > "reference types are polymorphic, value types are monomorphic". As I'd put it: > you can never have a value (say as a field) without statically knowing its > exact type, because its exact type governs the shape and interpretation of the > bits actually making up the value. Don't know the exact type --> you need a > reference. But parametric polymorphism (thanks for laying out these terms in > the JEP draft, Dan) feels very similar! I'd expect the user to consult the same > intuitions we just drilled into them about subtype polymorphism. It would be > nice if the same simple rule held there too. Here is an example where it's easy to see the difference between subtype polymorphism vs parametric polymorphism. 
At some point we will want T = void (or a unit type, whatever it is exactly) so we can use the same functional interface for a function, a consumer or a producer. It only works if T is bound by something that has no polymorphic methods, because obviously void has none. It can be Object but it cannot be a reference, otherwise null will be a possible value. Rémi From maurizio.cimadamore at oracle.com Wed Aug 3 17:53:41 2022 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Wed, 3 Aug 2022 18:53:41 +0100 Subject: Question about universal type variables In-Reply-To: <327279B6-E6E7-4E95-93F0-E33FEAFAD9B3@oracle.com> References: <327279B6-E6E7-4E95-93F0-E33FEAFAD9B3@oracle.com> Message-ID: <2dd1bcbf-cac3-2c2c-8d18-db868baf578b@oracle.com> On 27/07/2022 23:25, Dan Smith wrote: > I'm not *totally* sure I grasp all the differences, but here are a couple of observations that seem to support Model 2: I'm not sure I grasp the differences between model 1 and 2 either. But if by Model 1 you simply mean that a type-variable is a placeholder that the compiler knows nothing about, then I believe that model to be not an accurate description of what happens. For instance, a type variable still has _members_ and, because of that, it _has_ to be a type (at least in JLS-land). Note that, even in System-F, type-variables have _bounds_, so they are not mere placeholders and you can reason about (some of) their properties. That said, a type-variable, even today, does not expose all the properties of regular types. For instance, members of type-variables are defined in a different way (e.g. by filtering out public members), and a type-variable cannot be the target of a cast (modulo warnings), or an instance test (modulo errors) expression.
In the same way, universal type-variables cannot answer the question of "are you a ref or a val" (in the same way in which today's type variable cannot answer the question of: are you an Integer or a Double). Maurizio From maurizio.cimadamore at oracle.com Wed Aug 3 17:55:43 2022 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Wed, 3 Aug 2022 18:55:43 +0100 Subject: Question about universal type variables In-Reply-To: <2dd1bcbf-cac3-2c2c-8d18-db868baf578b@oracle.com> References: <327279B6-E6E7-4E95-93F0-E33FEAFAD9B3@oracle.com> <2dd1bcbf-cac3-2c2c-8d18-db868baf578b@oracle.com> Message-ID: <85c808d5-4b17-853d-992a-06b60bb43e1b@oracle.com> On 03/08/2022 18:53, Maurizio Cimadamore wrote: > (e.g. by filtering out public members) I obviously meant "non-public" members Maurizio From kevinb at google.com Wed Aug 3 19:00:54 2022 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 3 Aug 2022 12:00:54 -0700 Subject: Question about universal type variables In-Reply-To: <2dd1bcbf-cac3-2c2c-8d18-db868baf578b@oracle.com> References: <327279B6-E6E7-4E95-93F0-E33FEAFAD9B3@oracle.com> <2dd1bcbf-cac3-2c2c-8d18-db868baf578b@oracle.com> Message-ID: On Wed, Aug 3, 2022 at 10:53 AM Maurizio Cimadamore < maurizio.cimadamore at oracle.com> wrote: On 27/07/2022 23:25, Dan Smith wrote: > > I'm not*totally* sure I grasp all the differences, but here are a > couple of observations that seem to support Model 2: > > I'm not sure I grasp the differences between model 1 and 2 either. > This is probably because they are conceptual-model differences only -- differences in framing, influencing how we talk about things but usually leading to the same outcomes (because we rarely weigh "fits a better conceptual model" as a sufficient reason to *choose* that behavior; the model is usually just playing catch-up). 
For example: In the same way, universal type-variables cannot answer the question of > "are you a ref or a val" (in the same way in which today's type variable > cannot answer the question of: are you an Integer or a Double). > This would be Model 1 framing, whereas Model 2 might say "it's neither; it is itself, a type variable; the relevant question is what types it is *substitutable* to, or perhaps what *other* types its instances might have". It might be cleaner to think "T is preserving its substitutability for either a ref or val type" than either "T might *be* either a ref or val type" or "T is a special 'ref-or-val' type". -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ccherlin at gmail.com Thu Aug 4 21:37:26 2022 From: ccherlin at gmail.com (Clement Cherlin) Date: Thu, 4 Aug 2022 16:37:26 -0500 Subject: Value type companions, encapsulated In-Reply-To: References: Message-ID: On Mon, Jul 25, 2022 at 7:29 AM John Rose wrote: > > In this message Brian wrote out the major features > of an emerging design for value classes: > > From: Brian Goetz brian.goetz at oracle.com > To: ? valhalla-spec-experts at openjdk.java.net > Subject: Re: User model stacking: current status > Date: Thu, 23 Jun 2022 15:01:24 -0400 > > I think controlling the complexity by having a separate > nested declaration of the value companion type will > work very well. > > So what exactly does a private value companion do? > What is it you can and cannot do with this type? > What problems are prevented by privatizing it? > How and when is privatization enforced? > What other problems are created by those new rules? > > I have been pulling on this thread for a few days > now, and I think I have some answers. 
> > http://cr.openjdk.java.net/~jrose/values/encapsulating-val.md > http://cr.openjdk.java.net/~jrose/values/encapsulating-val.html > > (The Hitchhiker's Guide suddenly comes to mind. Don't panic!) > > I expect I will be editing these files as we go. > For reference here is a verbatim copy of the MD file > as it stands right now (minus the header): > [snipped] After reading most of this proposal, I have a few thoughts. The "encapsulated .val" design requires a tremendous amount of JVM machinery; non-denotable types; and exacting care from class authors to prevent zero values from escaping. I propose a simpler, more robust design. Every `non-zero flattened` class has a default value, chosen by the class author in one of the following ways: 1. A constant expression which is stamped in the class file and copied bit for bit by the JVM. 2. A non-constant static method which runs once at class loading time, the result of which is thereafter copied bit for bit by the JVM. There are far fewer problems safely constructing value objects than identity objects because value objects can't have super constructors, all fields are final, and if necessary, default value constructors can be constrained (similar to canonical record constructors) to have limited access to `this`. Rule: When creating an array or heap-allocated object that contains one or more `non-zero flattened` elements/fields, all `non-zero flattened` elements/fields must be pre-initialized by the JVM. If an explicit value is not provided, the class's default value is used. For heap-allocated objects, this pre-initialization must be performed before calling `<init>`. For arrays, this pre-initialization must complete before the array is published. If a heap-allocated object's `<init>` wants to initialize a `non-zero flattened` field with a dynamically computed value, it will need to overwrite the constant, pre-initialized value.
This is no different from having to overwrite the pre-zeroed primitive/null values in an identity class constructor. While this proposal requires a modest amount of JVM and verification machinery to ensure zero-safety, that machinery is localized to the nuts and bolts of creating heap-allocated instances and arrays. There is no need to perform extensive visibility or permissions checks, it avoids a new "companion" keyword, and all the class author needs to do is write a simple default declaration. Note that the machinery to ensure heap-allocated instances and arrays of `non-zero flattened` classes is *still required* in the "encapsulated .val" world, but must be repeated by the authors of every individual `non-zero flattened` class in perpetuity, instead of being implemented correctly, once, in the JVM. Cheers, Clement Cherlin From brian.goetz at oracle.com Mon Aug 8 17:25:10 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 8 Aug 2022 13:25:10 -0400 Subject: Question about universal type variables In-Reply-To: References: Message-ID: <37f91c4f-764f-d7dd-544d-1469e891f5c4@oracle.com> Let's try and separate the various things going on, and then we can see if there are attractive fictions we want to spin. First, let's talk about kinds of polymorphism. Cardelli and Wegner's "On understanding types, data abstraction, and polymorphism" (1985) divides polymorphism into a hierarchy (though these distinctions predate this paper):

Polymorphism
    Universal
        Parametric
        Inclusion
    Ad Hoc
        Overloading
        Coercion

Inclusion polymorphism is subtyping; the value set of String is included in the value set of Object. Coercion polymorphism is conversions; we can use a `short` where an `int` is called for, because we can summon an `int` with the same value as the `short` at will.
Overloading refers to the fact that we can declare `f(int)` and `f(long)` so that at the use site, `f` appears to take multiple types. (Pattern matching fits into ad-hoc polymorphism, but in Java it gets funneled through the other forms first. Union types are another form of ad-hoc polymorphism.) The special behavior of `null` could be explained by multiple paths: - Subtyping, with the Null type as a bottom type for all reference types, or - Ad-hoc, where it is understood that `null` is in the value set of every reference type, and treating an unbounded T as an (infinite) union type. I think the latter path for explaining null is more useful in general, and it is probably closer to what the JLS actually says - this is how interfaces "inherit" members like Object::equals. (I think it also offers a more useful way to talk about non-nullable reference types, but I'll come back to that.) Java exhibits all of these forms of polymorphism. Parametric and inclusion are on prominent display, but the others are there too, and coercion is actually quite relevant, both in general and to the point you are making, which is about how the user thinks about what it means to instantiate a generic class with `int`. I think we will need all of these tools to get to "everything is an Object" (which I think we agree is a desirable unification.) A `String` is an Object through inclusion. An `int` is an Object through coercion; if we have an `int` and we need an `Object`, we can box the `int` to `Integer`. Today we do this only for assignment, but going forward we will do this in other contexts, such as member access (e.g., `1.toString()`), equality, array covariance, and serialization. We heal the multiple rifts through a combination of subtyping and coercion. So, in the world of universal type variables, what is a T? I claim it is a union over the set of all types that conform to T's bound.
Today this includes only reference types, but once we extend bounds conformance to admit types that are convertible to T's bound, this includes value types as well. This union offers a rational explanation for why we can say `t.toString()` -- because `toString()` is a member of every type in the union (when you enumerate the members of a union type, you take the _intersection_ of the members in all the types in the union). We leave it to the implementation as to how to actually dispatch `toString()`, which will be different depending on whether we specialize `Foo` or not. It also offers a rational explanation of why `T` has `null` in its value set today -- and why we are going to adjust this to generate unchecked warnings tomorrow -- because now we'll be intersecting in some types that don't have null. The same is true for `synchronized` -- which has nothing to do with reference vs value, but with identity -- and again, we're now adding new types to the union that don't have a previously universal property. The union model is based on the "stand in" model -- T can stand for some unknown type, so you can at most do things on a T that you can do on *all* the unknown types. (Even when we get to specialized generics, we might still not allow all such operations, such as `new T[n]`; the union offers an upper bound on what is sensible, but languages can be more restrictive.) The best way I've found to think about types like `String!` in Java is as _refinement types_. (See Liquid Haskell (https://ucsd-progsys.github.io/liquidhaskell-tutorial/), or Clojure Spec (https://clojure.org/guides/spec)). A refinement type takes a type and a _predicate_ which refines its value set, such as "even integer", and can contain arbitrary predicative logic. The compiler then attempts to prove the desired properties (easier in functional languages). In other words, the type `String!` takes as its base the reference type `String`, along with a predicate `s -> s != null`.
Taking away the null doesn't change the reference-ness of it, it just restricts the value set. Interestingly, the languages that have the most direct claim to modifiers like `!` and `?` treat them as _cardinalities_, such as X# and to a lesser degree XSL. In X#, where "everything is a sequence", cardinality modifiers are: refinement types! They constrain the length of the sequence (imagine a refinement type on List which said "size() > 3".) We're clearly never going to plunk for arbitrary predicative logic in our type system and the theorem provers that come with them, but ad-hoc predicates like "not null", "has identity" and "is reference" are already swimming under the surface of the type system we have, and we'll see more like this when we get to specialization (where we will model specialized instantiations as refinements rather than substitution.) OK, with that as background, let's dive into your mail. > I'm sure the theoretic argument is fine as far as it goes, but it's > not much help for the end user. My issue is with the user model we > present to the world; what "useful fictions" are we securing for them, > that enable them to read and write code with confidence? One locus of potential fiction is what we mean by "is" in "everything is an Object". If a T is an Object, do we really just mean "things that are subtypes of Object", or do we mean "things that can be bounded by Object" (which includes value types via conversion/coercion, rather than via subtyping.)? I think ultimately the latter is more helpful, because when someone says `ArrayList<long>`, what they really want is an ArrayList that is backed by a long[], with all the non-nullability, flatness, and tearability that long already has. `ArrayList<T>` can be thought of as something that "has Ts" in it; if we are substituting in T=long, we will want all the properties of long because that allows for greater compositionality of semantics.
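The two routes to "is an Object" described above (inclusion for `String`, coercion for `long`) can be seen in today's Java. What follows is only a sketch in current, pre-Valhalla syntax: the class name `IsAnObjectSketch` and its methods are hypothetical names chosen for illustration, and `ArrayList<long>` itself cannot be written yet, so `List<Long>` and `long[]` stand in for the erased and the flat view respectively.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of any Valhalla API.
final class IsAnObjectSketch {

    // Accepts anything that "is an Object", whether by subtyping or by boxing.
    static String describe(Object o) {
        return o.getClass().getSimpleName();
    }

    // Inclusion: a String reference already is an Object, no conversion needed.
    static String viaInclusion() {
        return describe("hello");
    }

    // Coercion: the long is boxed to a Long before the call is made.
    static String viaCoercion() {
        long value = 42L;
        return describe(value);
    }

    // The boxed (erased) view picks up null in its value set.
    static boolean boxedListAdmitsNull() {
        List<Long> boxed = new ArrayList<>();
        boxed.add(null);
        return boxed.get(0) == null;
    }

    // The flat view has no null; a fresh element holds the zero default.
    static long defaultArrayElement() {
        long[] flat = new long[1];
        return flat[0];
    }

    public static void main(String[] args) {
        System.out.println(viaInclusion());
        System.out.println(viaCoercion());
        System.out.println(boxedListAdmitsNull());
        System.out.println(defaultArrayElement());
    }
}
```

The contrast between the last two methods is the point of the `ArrayList<long>` remark: the boxed list admits a value (`null`) that a `long[]` never can.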
> *Some "T always a reference type" advantages:* > > * With subtype polymorphism, the user enjoys a solid understanding > that "reference types are polymorphic, value types are monomorphic". > As I'd put it: you can never have a value (say as a field) without > statically knowing its exact type, because its exact type governs the > shape and interpretation of the bits actually making up the value. > Don't know the exact type --> you need a reference. But > parametric polymorphism (thanks for laying out these terms in the JEP > draft, Dan) feels very similar! I'd expect the user to consult the > same intuitions we just drilled into them about subtype polymorphism. > It would be nice if the same simple rule held there too. I think this tries to flip around "reference types are polymorphic" into "polymorphic types are references." T is polymorphic, users will get that without trouble. But does it have to be inclusion polymorphism? I think it is an ad-hoc union between coercion (value types) and inclusion (reference types). If we push towards the fiction of "they're all reference types", then `Foo<long>` really means `Foo<Long>`, with all the nullability and tearability differences between long and Long. > * When my class gets used as `MyClass<int>`, I would get to reason > like so: > * When that code runs on some JVM that doesn't do specialization > yet, then my class gets used directly, so those `int`s are really > `Integer`s; of course they are, because T is a reference type. (I > expect I can't tear a value this way.) I would say it differently: in this world, `long` *erases to* `Object`, just as `String` does. Which means it will inherit some of the properties of Object that String doesn't have, such as the chance for heap pollution. Similarly, when we erase `long` to `Object`, we pick up some of these properties too, including the additional chance of null pollution, as well as some atomicity we didn't ask for.
But that's because of the erasure, not for any intrinsic property of type variables. And the compiler will try to claw back some of that nullability with unchecked warnings anyway, just as we try to claw back some of the vectors for heap pollution. The nullity of T is the same erasure-driven pollution we already know and tolerate. > * When that code runs on some JVM that has specialization, then > different "species" of my class are being forked off from my template, > each one physically /replacing/ T with some value type. So /those/ are > value types, but once again T is still a reference type. (And here I > do expect tearing risk, for non-atomic types.) When I specialize `Foo<long>`, any T-valued fields or arrays or method parameters really are long, with all the characteristics of long. Treating them as references (which have properties long doesn't have) seems more confusing. "Placeholder, which collapses to its instantiation" feels more natural here? > * If Java were ever to have non-nullable reference types, I suspect > it might immediately expose this whole type variable issue as having > been, at its essence, never really about ref-vs-val in the first > place. What it's really about is that there used to be one value in > the union of every Object type's value set, and now there isn't anymore. Agree -- it was always about the union of types / intersection of properties of those types. Null used to be in that intersection, but now things got more complicated -- but doesn't this argue against the reference interpretation, and towards the placeholder/union interpretation? > * The best way a user can prepare their generic class for becoming > "universal" in the future is to adopt aftermarket nullness analysis > (such as I'm working on standardizing the semantics for in JSpecify). > They'll mark type parameters like `V extends @Nullable Object`, and > methods like `Map.get` will return `@Nullable V`. That will shake out > any obstacles up front.
Then once V becomes a UTP, they'd just change > that `V` to `V.ref`, and they could presumably drop the `@Nullable` > too because `.ref` implies it (why else would it be used?). So the > language feature you're introducing for ref-vs-val universality is > immediately doing double duty, capturing nullness information for > reference types too. > > This would probably mean rethinking the `T.ref` syntax to something > that more closely evokes "T or null" (the fact this would, for an > `int` species, have to box to `Integer` in the process seems intuitive > enough). Open to finding a better way to spell "T or null"; I think the path to this involves having this conversation converge :) From daniel.smith at oracle.com Wed Aug 10 03:46:05 2022 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 10 Aug 2022 03:46:05 +0000 Subject: EG meeting, 2022-08-10 Message-ID: <421CB3EA-D194-44BC-8597-8EDBC4017876@oracle.com> EG Zoom meeting August 10 at 4pm UTC (9am PDT, 12pm EDT). Lots of recent threads that could be further discussed: - "Question about universal type variables": Kevin started a discussion about how type variable types should be modeled, and what changes when they become universal - "Updated SoV, take 3": Brian revised the State of Valhalla document to reflect recent design ideas - "object sameness, Leibniz's Law, ...": John elaborated on SoV review comments regarding value object equality/substitutability - "The storage hint model": Remi shared thoughts about using a storage attribute, rather than a value type, to encode flatness - "The problem with encapsulating C.val + autoboxing": Remi discussed the treatment of access-restricted value types in generics - "where are all the objects?": John and Kevin discussed usages of the terms "object" and "instance" - "one class, two types, many bikesheds": John discussed how we model classes vs.
types, the relationship of ref and val types, and how syntax like .ref and .val might be used - "Value type companions, encapsulated": John shared a document describing how access restrictions could be enforced on value types - "races on flat values": John discussed how the memory model needs to be updated to describe concurrent accesses of flat variables From forax at univ-mlv.fr Wed Aug 10 12:28:22 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 10 Aug 2022 14:28:22 +0200 (CEST) Subject: EG meeting, 2022-08-10 In-Reply-To: <421CB3EA-D194-44BC-8597-8EDBC4017876@oracle.com> References: <421CB3EA-D194-44BC-8597-8EDBC4017876@oracle.com> Message-ID: <1413578036.20357848.1660134502462.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "daniel smith" > To: "valhalla-spec-experts" > Sent: Wednesday, August 10, 2022 5:46:05 AM > Subject: EG meeting, 2022-08-10 > EG Zoom meeting August 10 at 4pm UTC (9am PDT, 12pm EDT). > > Lots of recent threads that could be further discussed: > > - "Question about universal type variables": Kevin started a discussion about > how type variable types should be modeled, and what changes when they become > universal > > - "Updated SoV, take 3": Brian revised the State of Valhalla document to reflect > recent design ideas > > - "object sameness, Lebniz's Law, ...": John elaborated on SoV review comments > regarding value object equality/substitutability > > - "The storage hint model": Remi shared thoughts about using a storage > attribute, rather than a value type, to encode flatness > > - "The problem with encapsulating C.val + autoboxing": Remi discussed the > treatment of access-restricted value types in generics > > - "where are all the objects?": John and Kevin discussed usages of the terms > "object" and "instance" > > - "one class, two types, many bikesheds": John discussed how we model classes > vs. 
types, the relationship of ref and val types, and how syntax like .ref and
> .val might be used
>
> - "Value type companions, encapsulated": John shared a document describing how
> access restrictions could be enforced on value types
>
> - "races on flat values": John discussed how the memory model needs to be
> updated to describe concurrent accesses of flat variables

Sadly, I will not be available.

I think that most of Brian's arguments about T not being nullable by default could be applied to value types too (as an exercise, replace T by value type when re-reading Brian's email). For me, it seems that either we go with both value types and T being nullable by default, or we do so for neither.

If we choose that both value types and type variables are nullable by default, then we are very close to the storage hint model (it's a List of Integer but at runtime it uses an array of ints).

I think the storage hint model is far simpler than the Q-type model; it means less change for the VM, at the price of paying for a lightweight box each time we cross an inlining horizon (sorry John).

I really do not like the idea of a private companion type: it introduces a strong coupling between value types and universal generics, and I do not like any of the options proposed by John to implement it.

regards,
Rémi

From heidinga at redhat.com  Wed Aug 10 14:38:09 2022
From: heidinga at redhat.com (Dan Heidinga)
Date: Wed, 10 Aug 2022 10:38:09 -0400
Subject: races on flat values
In-Reply-To: <4A145916-740D-4C22-AC54-808280403753@oracle.com>
References: <4A145916-740D-4C22-AC54-808280403753@oracle.com>
Message-ID:

Thanks for writing this up, John.

This matches my thinking overall. Looking for confirmation on one point below related to synchronized blocks & non-atomic values.

On Wed, Jul 13, 2022 at 11:32 PM John Rose wrote:
>
> So, let's talk about the Java Memory Model and consequent JVM support for flattenable types.
>
> (Racing on flats is either really cool, as on the Bonneville Salt Flats, or else really dumb, if the flats are your flat tires. Either image will do here. Races are a generally bad idea, but sometimes they are how the cool kids get the best performance, just barely avoiding application-defined failure modes.)
>
> When a value is atomic, and/or when a reference type is used (not the C.val companion), I think there does not need to be any impact on the JMM. It is always enough to say that a load or store of the value behaves as if (two very important words: "as if") the value were separately buffered in the heap, and accessed only via a safely published pointer.
>
> Consider a racing read of a composite (multi-field) value from a variable V of type C.val, that has also been set from two or more racing writes V=X, V=Y. The "as if" rule implies that the read will see either X with its fields X.*F* or Y with its fields Y.*F*, because X and Y behave "as if" they are references to separate buffered instances of class C. (Either of the writes could be a buffered snapshot of aboriginal value C.default.) This is all straight out of the JMM's repertoire of scenarios.
>
> The JVM can try to pull off various shenanigans under the covers, as long as the user observes behavior "as if" the values were buffered in the heap. This is a good starting point for implementors, and it takes us quickly into special cases where the structure of the .*F* fields is simple enough. If they all pack in an atomically readable/writable memory unit (64 or 128 bits typically), then the JVM implementor can choose to quietly maintain the illusion of heap buffering, while storing a composite like X.*F* by packing the .*F* in a single memory unit, but omitting any physical reference for X. And then of course it doesn't matter whether X existed in the first place or not, so don't allocate it either.
>
> There's more where that came from.
An example would be an optimistic technique which stores both X and some of the .*F* in 128 bits, plus the rest of the .*F* in more bits; some sort of pointer encoding would say whether X must be followed (to a different heap block) or else (as the optimist hopes) all the .*F* can be read, perhaps helped by a fast seq-lock or some other exclusion of races. I'm just getting started here, but it gets very very tricky and performance can stall out quickly.
>
> As has been mentioned already, nullable reference types can potentially be flattened as well as regular value types, and this would require encoding an extra "null channel" in addition to the .*F* fields of the value being pointed to. This could be a byte, or a bit, or less than a bit if there is "slack" in any of the other .*F* fields. So a tri-state OptionalBoolean could still be just one byte, but an OptionalLong is cursed with the need to find that 65th bit. There's a lot to say about potential implementations under that benign "as if" rule. With respect to JMM, the implementor has to ensure that if a default value C.default is never stored into a C.ref flat container, no possible race can observe that value. This is a special hazard, depending on the order in which the null channel is read and written, since an all zero flat C.ref might race to a C.default value if the null channel were set to the not-null state, before any .*F* components were stored. The safest way to stay within bounds is to use what Brian calls "half-flat" formats that fit in 128 bits (or whatever is the natural unit for a platform) and load and store those formats with atomic instructions. This was already true for the C.val flats discussed above, but it becomes that much harder when you have the null channel along as a hitchhiker.
>
> Those were the preliminaries; now we come to non-atomic value types of the form C.val. When such a variable is flattened fully (this is Brian's "full flat"
option), new kinds of races can create torn values. A full account of these races requires new rules within the JMM, describing what loads and stores (and other events) look like when they involve the new value types.
>
> I think we can say that a variable (field or array element) of a non-atomic value type (but no other type) decomposes into independent sub-variables, one for each field. This has to happen recursively, for fields that are themselves Q-types. Each sub-variable is an independent variable of the JMM.
>
> Maybe that is enough to build up various useful and interesting events and relations (read, write, happens-before, etc.), or maybe not. This is regardless of whether the JVM actually flattens; again it is all "as if", but this time with more races.
>
> So, suppose a read (getfield or aaload) grabs the .*F* field values from some variable V, into which was previously stored both X (including by reference all of X.*F*) and racing with X also Y (including by reference Y.*F*). Let's call these variables V.*F*. I think we should then break the single read and both writes into one read and two writes per field (of C).
>
> I think the JMM can ignore the references X and Y and just track the individual read and write events. Having null out of the picture helps us forget the pointers as well.
>
> From the POV of the write, a thread decided on a whole bundle of field values and stored them, one by one, into the separate .*F* variables.
>
> The more exotic events of the JMM can, I think, simply be distributed from V to the sub-variables V.*F* in a regular way, just as we distributed the plain reads and writes.
>
> This seems workable. Let's test it by considering a hybrid scenario where the class C includes a slowly-varying field and a rapidly-varying one.
Maybe an array cursor:
>
>     value record C(Object[] array; int index) {
>         public non-atomic companion type C.val;
>         public boolean hasNext() { return index < array.length; }
>         public Object get() { return array[index]; }
>         public C next() { check(); return new C(array, index+1); }
>         private void check() { if (!hasNext()) throw new Error(); }
>     }
>
> (Such a type is reasonably safe to use even with a public value type: The failure modes are comparable to those you get if you race an iterator for an ArrayList. One might even expect its null-capable reference type to flatten nicely in 64 or 128 bits. But that's beside my point here.)
>
> Suppose I have a full-flat container of C.val and I have two racing writes; the first set up the variable from scratch, and the second changes the index but not the array (say, by calling next).
>
>     static final Object[] ARR = {22, 33, 55, 77};
>     static C.val V = new C(ARR, 0);
>     void T1() { V = V.next(); }
>     void T2() { System.out.println(V); }
>     // T1 and T2 execute concurrently
>
> The effect I would like is for a racing read to receive either index value, but always the same array value, as long as all racing writes have contributed the same array value. I think this is true for the example code above. What do you think?
>
> The reason I want this effect is I want to enable an optimization like this:
>
>     void T1optimized() {
>         if (false)  V = V.next();  //original code version
>         if (false)  V = new C(V.array, V.index+1);  //inlining
>         if (false)  V = V __WithSubVariables {  //inlining
>             array = V.array;
>             index = V.index+1;
>         }
>         boolean MAYBE = false;
>         if (MAYBE) {  //unbundling sub-variables
>             V.*array = V.*array;  //useless store, kill it if you can
>             V.*index = V.*index+1;
>         } else {
>             V.*index += 1;  //32-bit memory update
>         }
>     }
>
> To get to the pleasing end result, I think the JIT has to work through the intermediate phases, and have permission to stop at any of them at any time.
So I want the JIT to have the option (at its own whim) to either make a 32-bit memory update of just the V.*index sub-variable, or else a larger update to both sub-variables (the if (MAYBE) block in the example).
>
> This will have no effect in the simple scenario described above. But in more complex ones, where the V.*array sub-variable is changing as well, the JMM will allow arbitrarily strange mismatches between fields, such as a really obsolete array and the latest index (into a different array). This could happen if a racing composite write stalled just before writing V.*array, waiting until just about now, wrote a really old value, and then stalled permanently before writing the associated V.*index value. Meanwhile more normal threads are writing more or less coherent array/index pairs, but suddenly a racing read can pick up a recent index and the very old array component.
>
> I guess this is all true whether or not the JIT makes that final optimization step of nullifying a useless write. So (to finish with this laborious example) I guess the JIT has all it needs to optimize the processing of non-atomic full-flat values, without straying from the JMM (which is very permissive in the presence of races).
>
> One thing to observe in passing is that when a method like C::next runs, it has a value this which is on the stack, not in heap. This means that there can be no races on the fields of this during the execution of any method. So as the body of any C method executes, the fields this.array and this.index cannot change. This is as one would expect from final fields. But it's true even if the original copy of this is being concurrently trashed by racing writes. This means the JIT cannot treat race-prone heap containers as spilled copies of this, to be reloaded at leisure.

Agreed. A field (V.*F) must be "sticky" after having been read and so we can't re-read from the heap containers.
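
As a rough illustration of the "sticky" read discussed above, here is a minimal sketch in today's Java. It is not Valhalla code and not how any JVM implements flattening: `Cursor` is an ordinary record standing in for the cursor value class C, and `FlatCursorSlot` is a hypothetical helper that models a non-atomic full-flat container as one plain field per sub-variable. The only point it tries to show is that a load reads each sub-variable once and afterwards works from the private snapshot, so later stores to the container cannot change what the snapshot's methods see.

```java
public class StickyReadSketch {
    // Ordinary record standing in for the value class C from the thread.
    record Cursor(Object[] array, int index) {
        boolean hasNext() { return index < array.length; }
        Object get() { return array[index]; }
        Cursor next() {
            if (!hasNext()) throw new IllegalStateException("no next element");
            return new Cursor(array, index + 1);
        }
    }

    // Hypothetical model of a non-atomic "full-flat" container:
    // one plain (racy) field per sub-variable.
    static final class FlatCursorSlot {
        Object[] array; // models V.*array
        int index;      // models V.*index

        // A composite write is two independent field writes.
        void store(Cursor c) {
            array = c.array();
            index = c.index();
        }

        // A composite read picks up each sub-variable exactly once
        // and returns a private snapshot; it never re-reads the slot.
        Cursor load() {
            Object[] a = array;
            int i = index;
            return new Cursor(a, i);
        }
    }

    public static void main(String[] args) {
        Object[] arr = {22, 33, 55, 77};
        FlatCursorSlot slot = new FlatCursorSlot();
        slot.store(new Cursor(arr, 0));

        Cursor snapshot = slot.load();   // fields are now "sticky"
        slot.store(snapshot.next());     // container changes underneath

        System.out.println("snapshot index = " + snapshot.index());
        System.out.println("slot index     = " + slot.load().index());
    }
}
```

Under real races the two field reads in `load()` could still observe values from different writes, which is the tearing the thread describes; the sketch only shows that whatever was read stays fixed afterwards.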
> It has to pick up any and all fields that it might need just once per field needed in an inlined method call. It might kill dead stores, of course. If the class of this is atomic, it must use an atomic to pick up this.*F* (or the parts needed) at one time; otherwise it can pick up the needed parts of this.*F* as needed, but at most once per part.

I think this is right but I'm still slightly uncomfortable with picking up the "needed parts of this.*F* as needed" as it can greatly expand the race window in user code in ways that aren't obvious from the source code. Since this only applies to racy programs, the correct answer for the programmer - regardless of the size of the window - is to properly synchronize the access.

In this model, when do the reads of fields actually occur for synchronized blocks? In the following code, all C.val fields used after the block must be read during the synchronized block, correct?

    static C.val sharedVal = ....;

    C.val myVal;
    synchronized(someLock) {
      myVal = sharedVal;
    }
    ... myVal.array ...
    ... myVal.index

Assuming all the subsequent field reads through myVal are guaranteed to have been privatized by the end of the synchronized block, then I think the model makes sense and sounds right to me.

@Tobi, have the rest of the OpenJ9 JIT developers weighed in on this model yet?

>
> Does the partial write technique sketched here work for atomic flat values? I wish it would but I think it doesn't, in general. Suppose that C above (the cursor class) were atomic, as is surely more typical. If I update the V.*index sub-variable by itself, I have to make sure that the 32-bit update is atomic with respect to the neighboring V.*array variable. If the hardware allows me to mix 32-bit and 64-bit atomics on the same word, well and good; I can do the narrow update. But it probably won't work very often, and perhaps the hardware would have trouble sorting out the conflicting update sizes.
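
To make the atomic case a little more concrete, here is a small sketch in today's Java of a two-field bundle packed into one 64-bit word. Everything in it is an assumption made for illustration: the class name, the `pack` encoding, and especially the use of a 32-bit "array id" in place of a real array reference (a reference would not fit next to an `int` in a `long`). It shows the conservative option: even when only the index changes, the update is done as a full-width atomic read-modify-write, so a reader never sees an index paired with a different array id.

```java
import java.util.concurrent.atomic.AtomicLong;

public class PackedCursorSketch {
    // Hypothetical encoding: high 32 bits = array id, low 32 bits = index.
    static long pack(int arrayId, int index) {
        return ((long) arrayId << 32) | (index & 0xFFFFFFFFL);
    }

    static int arrayId(long word) { return (int) (word >>> 32); }

    static int index(long word) { return (int) word; }

    // The whole bundle lives in one atomically accessed 64-bit word.
    private final AtomicLong word = new AtomicLong();

    void set(int arrayId, int index) { word.set(pack(arrayId, index)); }

    // One atomic 64-bit load yields a coherent (arrayId, index) pair.
    long get() { return word.get(); }

    // Advance only the index, but publish it with a full-width atomic update
    // so it stays paired with the array id it was computed from.
    long advance() {
        return word.updateAndGet(w -> pack(arrayId(w), index(w) + 1));
    }

    public static void main(String[] args) {
        PackedCursorSketch v = new PackedCursorSketch();
        v.set(7, 0);
        long w = v.advance();
        System.out.println("arrayId=" + arrayId(w) + " index=" + index(w));
    }
}
```

A narrow 32-bit store to just the index half, as discussed in the quoted text, has no portable equivalent at this level; whether mixing access widths on one word is safe is a hardware and JVM implementation question that this sketch does not attempt to answer.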
>
> A special case of this is setting a half-flat C.ref variable to null. This should allow a narrow store (say just one byte to the null channel), leaving the other bytes as garbage to be dealt with later. (The GC can come along later and zero them out, kind of like with weak reference processing, but more certain and eager.) Doing this requires care in ordering the null checks. If the JIT sees the null channel set to the "null=yes" state (probably a zero bit or byte) then the JIT needs to cover its eyes and ignore any other bits it picked up in the same atomic read, because they might be non-zero garbage marooned by a partial write to the null channel. Since writing a zero byte to memory is naturally atomic, the hardware might tolerate null-channel writes mixed in with full 64-bit and 128-bit reads.
>
> An optimistic narrow-word null check might work on the read side. If I read the null channel using a single-byte read, and observe the "null=yes" state, I don't need to read anything else. But, if I observe "null=no" using the narrow read, and do the full-width atomic read of 64 or 128 bits, I need to check the null channel again, in the full read, since a null might have come into memory between my two memory operations. This is uncomfortably like the "test twice" anti-pattern, but I think it actually works. Whether it is profitable is anybody's guess. I put this in as another example of VM shenanigans behind the "as if" rules.
>
> Partial reads from flat atomic variables are probably a good idea in general. (That is, as long as they don't interfere with hardware's graceful execution of the atomic write instructions that populate the variables in the first place.) If the cursor C is atomic, and I write V.index() (again V is of type C), the JIT doesn't need to load the V.*array sub-variable, just the V.*index sub-variable.
No atomicity failures can be observed even if V has racing writes, since you can believe any V.*array value you like came with your sample of V.*index. But methods which work on two or more fields must pick up all of the sub-variable values (for those fields) in an atomic operation. So V.hasNext(), which looks at both array.length and index, needs to pick up the bundle V.*{index,array} in a coherent manner, using an atomic 64-bit or 128-bit load.

From john.r.rose at oracle.com  Wed Aug 10 15:29:29 2022
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 10 Aug 2022 08:29:29 -0700
Subject: races on flat values
In-Reply-To:
References: <4A145916-740D-4C22-AC54-808280403753@oracle.com>
Message-ID: <2BF2D4B5-CC4D-4E4D-9937-6FDCA7FECF7C@oracle.com>

On 10 Aug 2022, at 7:38, Dan Heidinga wrote:

> Thanks for writing this up, John.
>
> This matches my thinking overall. Looking for confirmation on one
> point below related to synchronized blocks & non-atomic values.
>
> On Wed, Jul 13, 2022 at 11:32 PM John Rose
> wrote:
>>
>> So, let's talk about the Java Memory Model and consequent JVM
>> support for flattenable types.
>> ...
>> One thing to observe in passing is that when a method like C::next
>> runs, it has a value this which is on the stack, not in heap. This
>> means that there can be no races on the fields of this during the
>> execution of any method. So as the body of any C method executes, the
>> fields this.array and this.index cannot change. This is as one would
>> expect from final fields. But it's true even if the original copy
>> of this is being concurrently trashed by racing writes. This means
>> the JIT cannot treat race-prone heap containers as spilled copies of
>> this, to be reloaded at leisure.
>
> Agreed. A field (V.*F) must be "sticky" after having been read and so
> we can't re-read from the heap containers.
Yet another way of saying this is that the JMM Read event that picked up some field value from the heap has a single, unambiguous value (as a bundle of scalar values, of course). Races come from consulting multiple JMM Reads, not from any possible variation in a single JMM Read. The value of a single JMM Read event can and must be modeled (by the JIT) as a bit pattern to be stored in registers or in spill slots, not a heap location subject to any extra JMM Read events. This is not new; the JIT never re-reads a field value that was already read by a `getfield` instruction, unless there is a second `getfield` instruction.

>> It has to pick up any and all fields that it might need just once per
>> field needed in an inlined method call. It might kill dead stores, of
>> course. If the class of this is atomic, it must use an atomic to pick
>> up this.*F* (or the parts needed) at one time; otherwise it can pick
>> up the needed parts of this.*F* as needed, but at most once per part.
>
> I think this is right but I'm still slightly uncomfortable with
> picking up the "needed parts of this.*F* as needed" as it can greatly
> expand the race window in user code in ways that aren't obvious from
> the source code. Since this only applies to racy programs, the
> correct answer for the programmer - regardless of the size of the
> window - is to properly synchronize the access.

Yes. I guess my point is that a race from two back-to-back reads (of adjacent locations, say) is not qualitatively different from a race from the same two reads separated by other stuff. The two scenarios will show the same kinds of races, just at different rates (probabilities).

> In this model, when do the reads of fields actually occur for
> synchronized blocks? In the following code, all C.val fields used
> after the block must be read during the synchronized block, correct?

Correct. The JMM sees just one Read event that is trapped in the block by bracketing synchronization events.
It doesn't matter (for this point) that the value of the Read consists of two scalars (array/index). The value must be sampled between the block brackets.

Looking closer, the Read event decomposes to two scalar sub-Reads. Those might happen in separate hardware instructions, or a single non-atomic hardware instruction. In such a case, a non-synchronized Write event in another thread, using a non-atomic sequence of instructions, could cause a race. (Same as with `long` tearing in today's Java.)

> static C.val sharedVal = ....;
>
> C.val myVal;
> synchronized(someLock) {
>   myVal = sharedVal;
> }
> ... myVal.array ...
> ... myVal.index
>
> Assuming all the subsequent field reads through myVal are guaranteed
> to have been privatized by the end of the synchronized block, then I
> think the model makes sense and sounds right to me.

Thanks for giving it a close look!

> @Tobi, have the rest of the OpenJ9 JIT developers weighed in on this
> model yet?
>> ...

From kevinb at google.com  Wed Aug 10 17:38:24 2022
From: kevinb at google.com (Kevin Bourrillion)
Date: Wed, 10 Aug 2022 10:38:24 -0700
Subject: where are all the objects?
In-Reply-To:
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com> <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com> <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com>
Message-ID:

I thought we had a good discussion this morning. Some random followups:

I like the concept of identity as akin to an extra field that only identity classes have, very much. It should feel like an "extra" feature, and purely behavioral in nature. That means I like to untie identity from addressability; push back on the idea that a reference encodes the object's identity in any way. It's instead some opaque "location" and we have no further expectations about it.

The VANO model does a lot of "doubling down" on the distinctions between objects and values, which does have a certain "stuck in the past" feeling to it. A benefit is "concreteness" and a feeling that "there is plenty of solid ground under your feet that Valhalla is not pulling out from under you". That feeling is important to me, particularly because developers in the real world will regularly find themselves going back and forth between pre-valhalla and post-valhalla code for a very long time. A hazard is that it equips developers to care more about that distinction than they really should *moving forward*, particularly since the VM is a cheating cheater. Essentially we're saying "for understanding, you can lean on what you know about int-vs-Integer, but now understand why the distinction will matter less and less to you."

(John associates VANO with "boxes and arrows". I think that's right and good? It seems called for because an *arrow* is the thing that could be null instead.)

The VAO model does seem more forward-thinking. Why should we invest in the question of when exactly objects "exist" or not when those objects can't be programmatically distinguished from each other anyway, nor from their corresponding values?

The eventual "final" presentation of value classes to users (in the permanent documentation and the definitive seminal slide presentations, and to *some* degree in the JLS itself) should anchor on one model or the other. But these docs/presentations might also want to say "and here's why you get to think of it this other way when you want to". That may sound like trying to have it both ways, but... I want to believe that as long as one is subjugated to the other it would be fine. "Here's how little is *really* changing; here's how your day-to-day model gets to evolve because of that".

On Tue, Jul 26, 2022 at 3:42 PM Dan Smith wrote:

> On Jul 22, 2022, at 9:04 AM, Kevin Bourrillion wrote:
>
> Note that *some* decisions which produce strong initial antipathy in the
> minds of users will actually become good teachable moments! "Here's why the
> reaction you had was tied to old assumptions that we are intentionally
> leaving behind for these good reasons." Even a user who doesn't *agree* with
> the decision can still hang their *learning* onto this. In fact I think
> some of the *best* teachable moments will be like that.
>
>
> This seems like a really good piece of wisdom to hold on to in all of our
> terminology/model discussions. Unintuitive != bad.

Thanks!

>

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com  Wed Aug 10 18:11:00 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 10 Aug 2022 14:11:00 -0400
Subject: where are all the objects?
In-Reply-To:
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com> <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com> <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com>
Message-ID: <2885518b-ba0f-55ca-bf55-75c6edce6e3d@oracle.com>

> I like the concept of identity as akin to an extra field that only
> identity classes have, very much. It should feel like an "extra"
> feature, and purely behavioral in nature. That means I like to untie
> identity from addressability; push back on the idea that a reference
> encodes the object's identity in any way. It's instead some opaque
> "location" and we have no further expectations about it.

Let's run with this for a bit and see where it leads. The following sketch is more of a "how we could describe it works" rather than a concrete proposal for refactoring the object model, but one can imagine an alternate universe where it actually was coded like this.

Imagine we have a magic class java.lang.Identity:

    final class Identity {
        /* Guaranteed to produce an Identity that is not equal() to any other known Identity */
        static Identity newIdentity();

        /* Special instance whose methods throw IMSE */
        static Identity NO_IDENTITY;

        void withLock(Runnable r) { ... }
        void wait() { ... }
        void notify() { ... }
    }

And every class (until now) has an invisible field:

    final Identity identity = Identity.newIdentity();

And == works by comparing all the fields (as per the Valhalla description), _including_ the identity field. Since two "different" objects have different identities, they are never == (and the implementation can short-circuit.) System::identityHashCode(o) is just `o.identity.hashCode()`, and Object::hashCode delegates to that just as Object::equals delegates to `==`.

Object methods can be redefined as:

    class Object {
        final Identity identity = Identity.newIdentity();

        void wait() { identity.wait(); }
        void notify() { identity.notify(); }
    }

and `synchronized (o) { block }` really means `o.identity.withLock(block)` (modulo exception transparency.)

Now, the main change for Valhalla is that instances of value classes, instead of having

    final Identity identity = Identity.newIdentity();

we have

    final Identity identity = Identity.NO_IDENTITY;

Can you think of any ways in which this is not isomorphic to the reality we have, other than assumptions about cost model?

From brian.goetz at oracle.com  Wed Aug 10 18:30:34 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 10 Aug 2022 14:30:34 -0400
Subject: where are all the objects?
In-Reply-To:
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com> <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com> <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com>
Message-ID: <573a27ec-a6b8-5e7e-b648-9f8595a7a9d2@oracle.com>

And now to your other thread:

On 8/10/2022 1:38 PM, Kevin Bourrillion wrote:
>
> The VANO model does a lot of "doubling down" on the distinctions
> between objects and values, which does have a certain "stuck in the
> past" feeling to it. A benefit is "concreteness" and a feeling that
> "there is plenty of solid ground under your feet that Valhalla is not
> pulling out from under you". That feeling is important to me,
> particularly because developers in the real world will regularly find
> themselves going back and forth between pre-valhalla and post-valhalla
> code for a very long time. A hazard is that it equips developers to
> care more about that distinction that they really should *moving
> forward*, particularly since the VM is a cheating cheater. Essentially
> we're saying "for understanding, you can lean on what you know about
> int-vs-Integer, but now understand why the distinction will matter
> less and less to you."
>
> (John associates VANO with "boxes and arrows". I think that's right
> and good? It seems called for because an /arrow/ is the thing that
> could be null instead.)
>
> The VAO model does seem more forward-thinking. Why should we invest in
> the question of when exactly objects "exist" or not when those objects
> can't be programmatically distinguished from each other anyway, nor
> from their corresponding values?
>
> The eventual "final" presentation of value classes to users (in the
> permanent documentation and the definitive seminal slide
> presentations, and to *some* degree in the JLS itself) should anchor
> on one model or the other. But these docs/presentations might also
> want to say "and here's why you get to think of it this other way when
> you want to".
That may sound like trying to have it both ways, but...
> I want to believe that as long as one is subjugated to the other it
> would be fine. "Here's how little is *really* changing; here's how
> your day-to-day modell gets to evolve because of that".
>

The VAO model says:

 - classes have instances, which are objects
 - An object _reference_ can refer to any object, identity or value
 - A value object can also be represented directly, as primitives are today   // (*)
 - We've unified under "everything is an object", but added: not all objects require references

The uncomfortable part of this model is that while we are familiar with the notion marked (*), because of primitives, we don't really have a good name for it. So part of this is that there is much bumbling and fumfering around phrases like "represented directly" and "bare" and other made-up words, such that we have to say "but it's just like what you know about primitives."

The VANO model says:

 - classes have instances, but not all instances are objects. some are values instead
 - Each value class V gets a special box class V.ref, which has a single field of type V.val, like the boxes we know
 - The boxing and unboxing conversions, though, are super fast! Because they're not burdened by identity preservation
 - We've unified under "everything is a class instance" and kept "all objects require references", and the spirit (but not performance) of boxes

My intuition is that if we can come up with a better term for "represented directly" that doesn't feel forced, the things we dislike about VAO will be lessened and its unification under "everything is an object" will win the day.

From heidinga at redhat.com  Wed Aug 10 19:17:05 2022
From: heidinga at redhat.com (Dan Heidinga)
Date: Wed, 10 Aug 2022 15:17:05 -0400
Subject: where are all the objects?
In-Reply-To: <573a27ec-a6b8-5e7e-b648-9f8595a7a9d2@oracle.com> References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com> <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com> <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com> <573a27ec-a6b8-5e7e-b648-9f8595a7a9d2@oracle.com> Message-ID: I found today's discussion of giraffes and leashes that hold onto giraffes really useful. Sometimes we have the (flat papery) giraffe in hand, sometimes we have the leash. Whether we're directly holding it or holding the leash doesn't change the giraffe. Which fits with the VAO model. Object references are clearly our giraffe leashes but that doesn't give us a good word for the flattened papery giraffe itself. The challenge here is we're not describing a property of the thing, we're describing the lack of something (the leash or reference). Add to this the VM doesn't provide guarantees about the storage for the non-reference case - it's all VM discretion on whether to flatten or not - so accurate terms feel like weasel words. Does the VAO model feel less forced if we're explicit about references being the difference? - A value object can also be represented without an object reference, as primitives are today // (*) --Dan (And I don't think I've ever typed "giraffe" this many times before) On Wed, Aug 10, 2022 at 2:30 PM Brian Goetz wrote: > And now to your other thread: > > On 8/10/2022 1:38 PM, Kevin Bourrillion wrote: > > > The VANO model does a lot of "doubling down" on the distinctions between > objects and values, which does have a certain "stuck in the past" feeling > to it. A benefit is "concreteness" and a feeling that "there is plenty of > solid ground under your feet that Valhalla is not pulling out from under > you". That feeling is important to me, particularly because developers in > the real world will regularly find themselves going back and forth between > pre-valhalla and post-valhalla code for a very long time. 
> > A hazard is that it equips developers to care more about that distinction than they really should *moving forward*, particularly since the VM is a cheating cheater. Essentially we're saying "for understanding, you can lean on what you know about int-vs-Integer, but now understand why the distinction will matter less and less to you."
> >
> > (John associates VANO with "boxes and arrows". I think that's right and good? It seems called for because an *arrow* is the thing that could be null instead.)
> >
> > The VAO model does seem more forward-thinking. Why should we invest in the question of when exactly objects "exist" or not when those objects can't be programmatically distinguished from each other anyway, nor from their corresponding values?
> >
> > The eventual "final" presentation of value classes to users (in the permanent documentation and the definitive seminal slide presentations, and to *some* degree in the JLS itself) should anchor on one model or the other. But these docs/presentations might also want to say "and here's why you get to think of it this other way when you want to". That may sound like trying to have it both ways, but... I want to believe that as long as one is subjugated to the other it would be fine. "Here's how little is *really* changing; here's how your day-to-day model gets to evolve because of that".
>
> The VAO model says:
>
>  - classes have instances, which are objects
>  - An object _reference_ can refer to any object, identity or value
>  - A value object can also be represented directly, as primitives are today   // (*)
>  - We've unified under "everything is an object", but added: not all objects require references
>
> The uncomfortable part of this model is that while we are familiar with the notion marked (*), because of primitives, we don't really have a good name for it.
> So part of this is that there is much bumbling and fumfering around phrases like "represented directly" and "bare" and other made-up words, so that we have to say "but but it's just like what you know about primitives."
>
> The VANO model says:
>
>  - classes have instances, but not all instances are objects; some are values instead
>  - Each value class V gets a special box class V.ref, which has a single field of type V.val, like the boxes we know
>  - The boxing and unboxing conversions, though, are super fast! Because they're not burdened by identity preservation
>  - We've unified under "everything is a class instance" and kept "all objects require references", and the spirit (but not performance) of boxes
>
> My intuition is that if we can come up with a better term for "represented directly" that doesn't feel forced, the things we dislike about VAO will be lessened and its unification under "everything is an object" will win the day.

From john.r.rose at oracle.com Wed Aug 10 19:30:06 2022
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 10 Aug 2022 12:30:06 -0700
Subject: visualizing objects on two heaps
Message-ID:

Are values like 42 objects in the same way that some `new Object()` (call it X) is an object? We are (mostly) saying "yes" because that lets us make new kinds of primitives which code like objects (classes) but act like values (primitives). Also, unifying concepts (where they *can* be unified) will probably produce a user model that is easier to use.

But this viewpoint (everything is an object) has a downside, because objects like X seem "thick and chunky" while 42 seems to be different, maybe "light and airy". The following is a rambling exploration of why those seemings might differ, and how perhaps to realign them.

I tend to think of Valhalla value objects and primitives in traditional mathematical terms: The entity 42 exists independently of any context, and can be summoned as needed, many times if needed, in a given Java application. (As Brian points out, that's what `CONSTANT_Integer` is for, up to 32 bits.) And the same goes for the 2D point value `new Point(42,42)`. There are surely integers (bigger than 32 bits) which nobody is currently thinking of and no application is currently processing, but any one of them can be summoned at a moment's notice when an application needs it. The same reasoning applies to structures built up from integers like 2D points.

(Summoning an integer or a 2D point in software requires a mechanical procedure to derive its bits. You can't say in a real program, "the least crypto-key not yet used anywhere on this planet", even though in some sense that bit pattern is well-defined. For fun discussions of numbers which are at the far edge of the thinkable, see sites like http://jdh.hamkins.org/largest-number-contest/ and https://medium.com/@joshkerr/who-can-name-the-biggest-number-contest-a2211d21be09 . Today I was reminded that any given volume of physical space can only represent a limited range of integers, in any scheme. As an amusing corollary, if I were able briefly to wrap my brain around the bits of something really big like Graham's number, the region of space containing that brain would require cosmic inflation, and/or would collapse into a black hole.)

In some sense, the entity X which my program is about to call into existence by executing `new Object()` could be said to exist independently of any context as well. This entity certainly possesses a hidden identity property that is (presumably) different in all space and time from any other similarly created object. Both the object X and its identity can be viewed as outside of time, eternally pre-existent. And yet, it is hardly ever useful to think of X in these terms.
X is clearly something inside a physical box somewhere, and pretending it was eternally pre-existent feels like a mind game.

I will say, however, that when writing formal semantics in the language of math, you *do* play that mind game. And when reasoning about compiler optimizations, it is sometimes useful to play such games. The C2 JIT models immutable fields (like `oop._klass` and `arrayOop._length`) as indefinitely pre-existent in memory, whether the containing object's allocation was recent or not. This model is chosen not because the authors of C2 are platonists, but because it is the simplest model to use inside the limited horizon of a compilation task.

(If X had mutable fields, then a timeless mathematical model of it would require some representation of the varying memory states that X might have. You need to say things like "X has these field values in this overall memory state." It can be done, and in fact optimizers *must* do this. But today I'm not dealing with mutable fields, nor with synchronization state, which is also mutable.)

So, it's possible to think of both 42 and X as existing outside of time in some platonic mathematical universe. But most people will find it tolerable to think of a platonic 42 but not X. After all, the way most of us learn about X is by being shown a diagram of X as a data structure, probably a box with a header field. There might be an arrow from header to a type metadata entity (maybe labeled `Object.class`). There is certainly an arrow from any other place that is referring to X. I call this the "boxes and arrows" presentation of data structures. Clearly if something is a box, it's sitting on a whiteboard or page, or in a warehouse where such boxes live. We are taught (early on) that such boxes sit in "memory" or in "the heap".

In search of consistency between 42 and X, we can go to the other extreme, and require that all entities (in software) are confined to real, existing, physical boxes of computer stuff.
This is easy for X, and not hard for 42 either. You simply say that 42 exists wherever its bits have been computed and stored (in a variable or a `CONSTANT_Integer` structure). Then, you agree that there is a way to detect that any two summonings of 42 are in fact both 42 (or to tell that they differ). And you should also agree to talk of "some 42 somewhere", not "the value 42". At most you can say "some 42 somewhere which will be detectably equal to the 42 I am working with right now". I think most people will find this to be a mind game as well; they are platonists for 42 but not for our friend X.

As a historical note, there are schools of mathematical thought called "constructivist", predating the era of computing, that bravely reject the taint of platonism. Such viewpoints take the view that every mathematical (and hence computational) object and reasoning is a matter of construction, with a finite sequence of steps. Without being an expert, I would guess that constructivists might prefer the extreme account of 42: It's a formal pattern which doesn't exist until someone constructs it; it is constructed many times; constructed versions are distinct but can be proven equal; such equality proofs are again constructive in nature. How many versions of 42 can dance on the head of a constructivist? Maybe many, but most people would say no more than one.

If we decide that 42, like X, is merely a bit pattern in the computer (perhaps replicated many times), then we get a nice, concrete model of boxes and arrows everywhere. We will need to sidestep an embarrassing infinite regress when we try to draw an arrow to 42 coming from the field `Integer.value`; it's doable without arrows thankfully, when you just write the label "42" in the box. So in a graph of boxes and arrows, such labels are usually necessary also. This is why Java has primitives, and a distinction between `Integer` (which is a box) and `int` (which is the label in one of the fields of that box).
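
The box/label split described above can be observed in today's Java, with no Valhalla features involved. What follows is only a minimal sketch; the class and method names (`BoxVersusLabel`, `sameLabel`, `sameBox`, `sameContents`) are made up for illustration. It leans on one documented fact: `Integer.valueOf` always caches boxes for values in the range -128 to 127, so two boxings of 42 yield arrows to the same box.

```java
public class BoxVersusLabel {
    // Two ints are compared by their bits, i.e. by the "label" itself.
    public static boolean sameLabel(int a, int b) {
        return a == b;
    }

    // Two Integer references are compared as arrows:
    // == asks whether both arrows reach the same box.
    public static boolean sameBox(Integer a, Integer b) {
        return a == b;
    }

    // equals() opens both boxes and compares the labels inside.
    public static boolean sameContents(Integer a, Integer b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        Integer p = Integer.valueOf(42);
        Integer q = Integer.valueOf(42);

        System.out.println(sameLabel(42, 42));   // true
        System.out.println(sameContents(p, q));  // true
        // True only because 42 lies in the cached range -128..127.
        System.out.println(sameBox(p, q));       // true
        // For values outside that range (say 1000) the library may or may
        // not hand back a shared box, so code should not depend on == there.
    }
}
```

The last comment is the interesting part: whether two boxes of a large value are "the same box" is left to the implementation, which is one reason the `int`/`Integer` distinction is visible to programs at all.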

Given such a distinction between labels and arrows, one might think that the Valhalla goal of unifying `int` and `Integer` is impossible. But this leads me to a new thought, which is (a) put everything on the heap, and (b) distinguish sharply between the value heap and the identity heap.

So, although I prefer a platonic viewpoint in most cases, here's a non-platonic, constructive, boxes-and-arrows viewpoint that one might prefer for teaching and visualization.

There is a value heap and an identity heap in the Valhalla VM. Every identity object lives as a box on the identity heap, and conversely every value object lives as a box on the value heap. All non-reference values are visualized as "value arrows" into the value heap. All non-null reference values are visualized as "identity arrows" into the identity heap. For every non-abstract class, instances are allocated on one heap or the other; no class is allocated on both heaps. Instances of `String` and `Integer` are allocated on the identity and value heaps, respectively.

Our friend X is visualized as an identity arrow into the identity heap. He has a header and no other fields. (We might give him a secret field to hold his identity, but this model does not require it, unlike the platonic model.)

The value 42 is visualized as a value arrow into the value heap, to a box labeled 42. The box also has a header, which says "Integer/int". Its backward-compatible `value` field points to itself. The label is not the same as the `value` field; it is part of the header I guess.

As a clever optimization, compiled code might dispense with the arrow and just hoist the bit pattern of the label into a register. This is what we mean when we say that value types are monomorphic: They can be manipulated in terms of the characteristic labels, instead of their arrows. Nevertheless, the concept of value arrow comes first, and only the performance model or JIT-writer's manual mentions the possibility of unboxed labels.
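
To make the picture a little more concrete, here is a toy sketch in ordinary Java. It is not Valhalla code and says nothing about how a real VM stores anything; every name in it (`TwoHeapsSketch`, `IdentityBox`, `ValueBox`, `allocate`, `summon`, `same`) is invented for illustration. It assumes only what the model has said so far: identity arrows reach boxes with a header and nothing else, value arrows reach boxes carrying a header and a label, and two summonings of 42 can be detected as equal even when they are separate copies.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoHeapsSketch {
    /** A box on the identity heap: a header and no other fields, like X. */
    public static final class IdentityBox {
        final String header;
        IdentityBox(String header) { this.header = header; }
    }

    /** A box on the value heap: a header plus the label naming the value. */
    public static final class ValueBox {
        final String header;
        final long label;
        ValueBox(String header, long label) {
            this.header = header;
            this.label = label;
        }
    }

    public final List<IdentityBox> identityHeap = new ArrayList<>();
    public final List<ValueBox> valueHeap = new ArrayList<>();

    /** Returns an identity arrow to a freshly allocated box. */
    public IdentityBox allocate(String header) {
        IdentityBox box = new IdentityBox(header);
        identityHeap.add(box);
        return box;
    }

    /** Returns a value arrow; this toy simply makes another copy each time. */
    public ValueBox summon(String header, long label) {
        ValueBox box = new ValueBox(header, label);
        valueHeap.add(box);
        return box;
    }

    /** Identity arrows are the same only if they reach the same box. */
    public static boolean same(IdentityBox a, IdentityBox b) {
        return a == b;
    }

    /** Value arrows are the same if header and label agree, whichever copy they reach. */
    public static boolean same(ValueBox a, ValueBox b) {
        return a.header.equals(b.header) && a.label == b.label;
    }

    public static void main(String[] args) {
        TwoHeapsSketch vm = new TwoHeapsSketch();
        IdentityBox x = vm.allocate("Object");
        IdentityBox y = vm.allocate("Object");
        ValueBox a = vm.summon("Integer/int", 42);
        ValueBox b = vm.summon("Integer/int", 42);

        System.out.println(same(x, x));          // true
        System.out.println(same(x, y));          // false: two distinct boxes
        System.out.println(same(a, b));          // true: two copies, one label
        System.out.println(vm.valueHeap.size()); // 2
    }
}
```

The point of the toy is that the value heap holds two boxes labeled 42, yet nothing a caller can do through `same` reveals that there are two.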

A variable of type `Object` is visualized as the root of an arrow (into either heap), or else the special label or pseudo-arrow `null` which is not an arrow into either heap. Other than `null`, primitive value labels (like 42) can exist only inside the value heap. In fact, they properly exist only inside the primitive objects themselves. Everything else is arrows (or `null`).

Normally identity and value arrows look and feel the same. Field access works the same for both. (That is why `Integer.value` loops back to `this`.) But they differ when the `==` (`acmp`) operator looks at them. Value arrows are proven equal or unequal using a field-wise recursive descent. This descent bottoms out at primitives (consulting their labels) or at `null` or an identity arrow. Identity arrows appeal solely to the identity of the object in the identity heap. Of course `null` is equal only to itself.

As noted above, new identity objects are created only by the `new` bytecode. New value objects are created by `aconst_init` or `withfield` or by any bytecode which produces a primitive value! For example, incrementing 42 (maybe with `iinc`) produces a new value arrow to the value heap, which just so happens to contain an object labeled 43. How did we get so lucky? Perhaps Plato is smiling on us, and it has always been there. (This is basically what Java mandates today for auto-boxing, for small-enough values!) More constructively, the VM ensures that, if 42 must be incremented, it either finds a previously created copy of 43, or makes a new copy on the fly. The VM can flip a coin in real-time and do either, because it is allowed to make many copies of 43 in the value heap. This is OK because there is nothing the user can do to distinguish such multiple copies, just as there is nothing the user can do to tell if the GC has moved an object in the identity heap. What goes for `iinc` goes for all the other value-producing bytecodes, whether primitive or not.
The VM is always free to recycle a previously existing object in the value heap, if it can find one, or to make a new box.

All of this is a visualization exercise. One might prefer, after all, to visualize a single heap with two kinds of objects in it. In any case, a less platonic, more constructive visualization can be obtained by insisting that all objects, including all values, even primitives, are uniformly accessed via arrows. I guess my point is that, if we are willing to pretend that arrows are everywhere, we need not worry whether something is "really" an `Integer` or "really" an `int`.

I haven't said anything yet about flattening. That requires additional work to visualize the container property of the arrows in the heap. There are at least two ways to do it: Color the value arrows that are to be flattened, or just nest one box inside the other. I think I would start with colored arrows, explaining that the VM is being invited to flatten, and then show nested boxes. But the nested boxes violate the "everything is an arrow" symmetry of the visualization, so they are a sort of commentary.

Again, as commentary on the VM's likely optimization, you might "hoist" 42 onto the stack or into a field by erasing the arrow (to a copy of 42 in the value heap) and writing the label "42" in its place. And you might do this in heap fields as well. Turning back to 2D points, you might "hoist" `Point(42,42)` into a stack variable or heap field by erasing the arrow and replacing it with a nested box (for the point). In either case (unboxed 42 or `Point(42,42)`), there is a caveat that when you use that value, you are "really" using an arrow into the value heap.

So, FWIW, and in the spirit of brainstorming, that is one way to (a) make values and objects more like each other, while (b) staying relentlessly constructive, and (c) avoiding the question of whether an object is an `Integer` or an `int`.

The distinction of `Integer` vs. `int`, and also `Point` vs.
`Point.val`, is therefore a matter of viewpoint and not essence. Everything is an object, such as `Integer` or `Point`. (Except `null`.) The `.val`/`.ref` distinction, like other non-value-set distinctions (`final` vs. non-`final`, or `Object` vs. a narrower type), is a way for the programmer to annotate the program to express a richer view of the programmer's reasonings about the program, and to unlock optimizations.

From brian.goetz at oracle.com Wed Aug 10 19:34:01 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 10 Aug 2022 15:34:01 -0400
Subject: where are all the objects?
In-Reply-To:
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com>
 <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com>
 <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com>
 <573a27ec-a6b8-5e7e-b648-9f8595a7a9d2@oracle.com>
Message-ID: <16431a3f-e3dd-b102-47be-dd41b1b1db44@oracle.com>

On 8/10/2022 3:17 PM, Dan Heidinga wrote:
> Which fits with the VAO model. Object references are clearly our giraffe leashes but that doesn't give us a good word for the flattened papery giraffe itself. The challenge here is we're not describing a property of the thing, we're describing the lack of something (the leash or reference). Add to this the VM doesn't provide guarantees about the storage for the non-reference case - it's all VM discretion on whether to flatten or not - so accurate terms feel like weasel words.
>
> Does the VAO model feel less forced if we're explicit about references being the difference?
>  - A value object can also be represented without an object reference, as primitives are today   // (*)
>

This is actually the new weirdness -- that you have a choice about how to handle a value object.
Previously we had two kinds of things: primitives (direct values) and object references (which can be null, or can have an object at the end of them, but you can't touch the object, only the reference.) Now we still have two kinds of values: value objects, and object references -- but object references can also refer to value objects! This is the weird part; that value objects have two separate placements / representations / interaction modes. We got away before without a notion of "bare object" because the only way you could touch a primitive is directly, and the only way you could touch an object is through a reference. But now some objects can be touched directly, *or* through a reference.

C programmers would say "Duh, that's just X vs X*", but we want to blur the difference, so we don't distinguish between `x.a` and `x->a` as we do in C.

Brainstorming on terms....we haven't tried "unboxed".

 - A Point.ref is a reference to a Point object
 - A Point.val is an unboxed Point object

This fits into the "reference as default" model, but it suggests that Point.val are *not* objects, they are the "scooped out filling" of an object. So I guess this is more of a VANO term. Oh well.

Is there any mileage in splitting "reference" instead of object? Like, a "lightweight reference" (a short leash), which doesn't support polymorphism or null? Then a Point.val is a "lightweight reference" to a Point?

The VAO term we are looking for might be more like "grab the giraffe by the collar, rather than by the leash" :)

From john.r.rose at oracle.com Thu Aug 11 01:35:29 2022
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 10 Aug 2022 18:35:29 -0700
Subject: where are all the objects?
In-Reply-To:
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com>
 <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com>
 <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com>
Message-ID: <549B9099-F83F-46F1-BE57-2695D11F88D1@oracle.com>

On 10 Aug 2022, at 10:38, Kevin Bourrillion wrote:

> I thought we had a good discussion this morning. Some random followups:

Likewise. Good followups; one comment here:

> ...
> The eventual "final" presentation of value classes to users (in the permanent documentation and the definitive seminal slide presentations, and to *some* degree in the JLS itself) should anchor on one model or the other. But these docs/presentations might also want to say "and here's why you get to think of it this other way when you want to". That may sound like trying to have it both ways, but... I want to believe that as long as one is subjugated to the other it would be fine. "Here's how little is *really* changing; here's how your day-to-day model gets to evolve because of that".

In terms of my previous note today, I guess VANO makes a hard distinction between platonic (pure logic) and constructive (box/arrow) entities, with values taking their preferred platonic form and objects the preferred constructive form. Meanwhile, VAO asks that both be viewed under a common mindset, both being "the same kind of stuff".

My goal in that note was to suggest that either values or objects can be viewed as either kind of "stuff", platonic ideal or box/arrow gadget. (And with a conceptual bridge of some sort between the two kinds of "stuff" that preserves our ideas of how values and objects should work.) This supports both VAO and VANO options. In fact, VAO comes in either pure-platonic or pure-constructive form. I suppose there could also be a "mind games" version of VANO where values are constructive and objects are platonic; yuck.

Anyway, I feel there is a continuum (not a hard divide) from simple/ideal to complex/ad-hoc, for both values and objects, and that even if the opposite ends of the continuum feel different to teachers and students, there are useful ways to unify the whole continuum. Simpler kinds of values, and simpler objects (e.g., immutable with no fields or one field), are natural to regard as timeless platonic ideals. Or, with different metaphysical color, as algebraic objects, or as formal label-like expressions. Meanwhile complex many-field objects (with or without state or identity, usually with methods) surely seem less like algebraic expressions and more like ad hoc box/arrow configurations, gadgets to be built procedurally or displayed in diagrams.

So I suppose we can point out multiple viewpoints and invite teachers and students to pick the useful ones on a case-by-case basis. And then the specification can choose one normative viewpoint (perhaps for technical reasons), subjugating other viewpoints as mere non-normative commentary.

From john.r.rose at oracle.com Thu Aug 11 01:49:05 2022
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 10 Aug 2022 18:49:05 -0700
Subject: where are all the objects?
In-Reply-To: <16431a3f-e3dd-b102-47be-dd41b1b1db44@oracle.com>
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com>
 <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com>
 <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com>
 <573a27ec-a6b8-5e7e-b648-9f8595a7a9d2@oracle.com>
 <16431a3f-e3dd-b102-47be-dd41b1b1db44@oracle.com>
Message-ID: <30ADA251-2CF1-4139-9267-568F957F28AB@oracle.com>

On 10 Aug 2022, at 12:34, Brian Goetz wrote:

> On 8/10/2022 3:17 PM, Dan Heidinga wrote:
>> Which fits with the VAO model. Object references are clearly our giraffe leashes but that doesn't give us a good word for the flattened papery giraffe itself.
>> The challenge here is we're not describing a property of the thing, we're describing the lack of something (the leash or reference). Add to this the VM doesn't provide guarantees about the storage for the non-reference case - it's all VM discretion on whether to flatten or not - so accurate terms feel like weasel words.
>>
>> Does the VAO model feel less forced if we're explicit about references being the difference?
>>  - A value object can also be represented without an object reference, as primitives are today   // (*)
>>
>
> This is actually the new weirdness -- that you have a choice about how to handle a value object. Previously we had two kinds of things: primitives (direct values) and object references (which can be null, or can have an object at the end of them, but you can't touch the object, only the reference.) Now we still have two kinds of values: value objects, and object references -- but object references can also refer to value objects! This is the weird part; that value objects have two separate placements / representations / interaction modes. We got away before without a notion of "bare object" because the only way you could touch a primitive is directly, and the only way you could touch an object is through a reference. But now some objects can be touched directly, *or* through a reference.
>
> C programmers would say "Duh, that's just X vs X*", but we want to blur the difference, so we don't distinguish between `x.a` and `x->a` as we do in C.
>
> Brainstorming on terms....we haven't tried "unboxed".
>
>  - A Point.ref is a reference to a Point object
>  - A Point.val is an unboxed Point object

The issue with many of these formulations (like "unboxed Point") is that they make it sound like the Point object has a funny decoration on it that makes it be unboxed. But what we want to say (I think) is that it is unburdened by any such decoration. That's why I like a "just the facts"
Point: "pure Point" or "bare Point" or "simple Point" or "unadorned Point". Or maybe, suggesting the lack of an intermediary, "immediate Point" or "direct Point".

> This fits into the "reference as default" model, but it suggests that Point.val are *not* objects, they are the "scooped out filling" of an object. So I guess this is more of a VANO term. Oh well.

Yeah; it seems to damage the integrity (simplicity, purity) of the Point to say, "oh that's a spilled point". (Spilled = unboxed.)

>
> Is there any mileage in splitting "reference" instead of object? Like, a "lightweight reference" (a short leash), which doesn't support polymorphism or null? Then a Point.val is a "lightweight reference" to a Point?

I think there is some mileage there for specifications, if not for teaching and learning. That's why I suggested "value arrows" in my previous. Those are the same as your "short leash".

> The VAO term we are looking for might be more like "grab the giraffe by the collar, rather than by the leash" :)

Yes, the two-heap model suggests we could pretend that all objects require leashes of some sort (chopsticks), but there are short ones and long ones with somewhat differing operational semantics (especially for `==`/`acmp`). And then, *as an optimization* but not in the specification, a short-leash giraffe can climb right up into your lap (if the VM lets it). You know you'll like it, since you always preferred your ints and bytes close at hand.

From brian.goetz at oracle.com Thu Aug 11 20:08:39 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 11 Aug 2022 16:08:39 -0400
Subject: where are all the objects?
In-Reply-To: <16431a3f-e3dd-b102-47be-dd41b1b1db44@oracle.com>
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com>
 <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com>
 <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com>
 <573a27ec-a6b8-5e7e-b648-9f8595a7a9d2@oracle.com>
 <16431a3f-e3dd-b102-47be-dd41b1b1db44@oracle.com>
Message-ID: <2efcf1d8-2432-0fed-7108-37750cc6f2f2@oracle.com>

We may be circling around the terminology block, but let's try on not calling an int or a Point "a value" without some sort of modifier.

Let's try "value object" rather than just "value"; a variable of type int or Point.val holds a value object, and Point.ref holds a *reference to a* value object. Object holds references to either value or identity objects. Primitives are revealed to be value objects. Everything is an object.

(Alternately, we could lean on more placement-centric terminology. Point.val and int are _direct values_ (or immediate values). But what do we say about references then? References are references to objects. This one feels like it recedes more into the mental models we don't want to encourage.)

So:

 - classes have instances, which are objects
 - some classes are identity classes and some are value classes
   - instances of identity classes have identity, are called identity objects
   - instances of value classes have no identity, are called value objects
 - any objects can be the target of an object reference
   - Polymorphic types like Object or Runnable may refer to identity or value objects
 - value objects can be represented/stored/manipulated directly as well, like our old friend int
 - legacy primitives are value objects now!
   - everything is an object
 - The type P.ref is a reference type, it consists of references to instances of P
 - The type P.val is a value type, it consists of instances of P, which are value objects
 - Integer is a reference type, int is a value type

This isn't much different from the previous "VAO" presentation, other than being more explicit about saying "value objects" rather than just values -- does that help at all?

On 8/10/2022 3:34 PM, Brian Goetz wrote:
>>
>> Does the VAO model feel less forced if we're explicit about references being the difference?
>>  - A value object can also be represented without an object reference, as primitives are today   // (*)
>>
>
> This is actually the new weirdness -- that you have a choice about how to handle a value object. Previously we had two kinds of things: primitives (direct values) and object references (which can be null, or can have an object at the end of them, but you can't touch the object, only the reference.) Now we still have two kinds of values: value objects, and object references -- but object references can also refer to value objects! This is the weird part; that value objects have two separate placements / representations / interaction modes. We got away before without a notion of "bare object" because the only way you could touch a primitive is directly, and the only way you could touch an object is through a reference. But now some objects can be touched directly, *or* through a reference.

From john.r.rose at oracle.com Thu Aug 11 21:54:47 2022
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 11 Aug 2022 14:54:47 -0700
Subject: where are all the objects?
In-Reply-To: <2efcf1d8-2432-0fed-7108-37750cc6f2f2@oracle.com>
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com>
 <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com>
 <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com>
 <573a27ec-a6b8-5e7e-b648-9f8595a7a9d2@oracle.com>
 <16431a3f-e3dd-b102-47be-dd41b1b1db44@oracle.com>
 <2efcf1d8-2432-0fed-7108-37750cc6f2f2@oracle.com>
Message-ID:

I could live with this. It doubles down on "everything is an object", and something that might be modeled separately as a non-object, a "value", is just a sub-kind of object, hence "value object".

So the real big divide is not between objects and non-objects, but between value-objects and identity-objects. And, as befits a world where everybody is in the object family, it's a gentle divide, since values and identities have lots in common: references, methods, `extends Object`. Meanwhile, the symmetric difference of value and identity contains, on the one side, the option for "direct" storage (whatever that means; it's vague but real), and on the other side mutability, etc.

A couple more comments below:

On 11 Aug 2022, at 13:08, Brian Goetz wrote:

> So:
>
>  - classes have instances, which are objects
>  - some classes are identity classes and some are value classes
>    - instances of identity classes have identity, are called identity objects
>    - instances of value classes have no identity, are called value objects
>  - any objects can be the target of an object reference
>    - Polymorphic types like Object or Runnable may refer to identity or value objects
>  - value objects can be represented/stored/manipulated directly as well, like our old friend int
>  - legacy primitive are value objects now!
>    - everything is an object
>  - The type P.ref is a reference type, it consists of references to instances of P

Tedious pedant is tediously pedantic: or `null`.

>  - The type P.val is a value type, it consists of instances of P, which are value objects
>  - Integer is a reference type, int is a value type

Maybe emphasize the linkage: `Integer.val`, `int.ref` are aliases for `int` and `Integer`

>
> This isn't much different from the previous "VAO" presentation, other than being more explicit about saying "value objects" rather than just values -- does that help at all?

I think it does. Saying "values" unqualified suggests (misleadingly for VAO) that not everything is an object.

From brian.goetz at oracle.com Thu Aug 11 22:02:04 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 11 Aug 2022 18:02:04 -0400
Subject: where are all the objects?
In-Reply-To:
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com>
 <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com>
 <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com>
 <573a27ec-a6b8-5e7e-b648-9f8595a7a9d2@oracle.com>
 <16431a3f-e3dd-b102-47be-dd41b1b1db44@oracle.com>
 <2efcf1d8-2432-0fed-7108-37750cc6f2f2@oracle.com>
Message-ID: <551ff94c-fd8c-8834-d406-bc0b4f1d0d62@oracle.com>

> I think it does. Saying "values" unqualified suggests (misleadingly for VAO) that not everything is an object.
>

Well, some things are still not objects -- object /references/.

(I see your tedious pedantry, and raise you.)

From robbepincket at live.be Thu Aug 11 22:24:49 2022
From: robbepincket at live.be (Robbe Pincket)
Date: Thu, 11 Aug 2022 22:24:49 +0000
Subject: where are all the objects?
In-Reply-To:
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com> <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com> <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com> <573a27ec-a6b8-5e7e-b648-9f8595a7a9d2@oracle.com> <16431a3f-e3dd-b102-47be-dd41b1b1db44@oracle.com> <2efcf1d8-2432-0fed-7108-37750cc6f2f2@oracle.com>
Message-ID:

On Thu Aug 11 21:54:47 UTC 2022, John Rose wrote:

> >  - The type P.ref is a reference type, it consists of references to instances of P
>
> Tedious pedant is tediously pedantic: or `null`.

Err, to be pedantic, that would be different from the current Java spec, where it's not the reference types that are nullable, but the variables of a reference type that can store the null reference instead.

Greetings
Robbe Pincket

From kevinb at google.com Thu Aug 11 23:06:54 2022
From: kevinb at google.com (Kevin Bourrillion)
Date: Thu, 11 Aug 2022 16:06:54 -0700
Subject: where are all the objects?
In-Reply-To: <2efcf1d8-2432-0fed-7108-37750cc6f2f2@oracle.com>
References: <95D1BB4B-A75F-4493-B126-69FFB3548409@oracle.com> <0E535893-C045-4A9F-AD57-94EAFFA6844B@oracle.com> <6F6A9ACA-FBDE-4DE6-B450-B7C8E794D1E6@oracle.com> <573a27ec-a6b8-5e7e-b648-9f8595a7a9d2@oracle.com> <16431a3f-e3dd-b102-47be-dd41b1b1db44@oracle.com> <2efcf1d8-2432-0fed-7108-37750cc6f2f2@oracle.com>
Message-ID:

On Thu, Aug 11, 2022 at 1:08 PM Brian Goetz wrote:

> We may be circling around the terminology block, but let's try on not calling an int or a Point "a value" without some sort of modifier.
>
> Let's try "value object" rather than just "value";

Just to be clear, this is about VAO, and under VANO we would still have distinct meanings for "value" and "value object":

An instance of...    Is a...
String               identity object
int, Foo.val         value
Integer, Foo         value object

I agree that what you said is the right way to present VAO.
And I do see the appeal of the story that goes, "when you used to decide between int and Integer, that was a weighty decision -- is it going to be an object or not? But now the distance is much closer; it's an object either way, and the question is only about whether you want a *reference* in the middle. If that sounds expensive, well, realize it's a much lighter weight object, a value object."

VANO's story there is more like, "you're still deciding between an object or value; specifically a *value object* or a value. But now the distance is much closer, because value objects are so much lighter weight than identity objects."

I'll repeat that I really don't know yet which of VAO/VANO will prove best, and also I'm aware of being the only real VANO cheerleader at this point. It would be nice to widen the net, but it's hard to get an easy off-the-street opinion from developers about this, given that each model has its own long tail of weirdnesses/surprises in store that won't be represented in the (predictably strong) gut reactions.

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From daniel.smith at oracle.com Wed Aug 24 14:11:43 2022
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 24 Aug 2022 14:11:43 +0000
Subject: EG meeting *canceled*, 2022-08-24
Message-ID: <5E109DCC-390C-4E63-9EB9-65375808AF45@oracle.com>

No EG meeting today.

There were a few followup emails to last meeting's discussion about terms like "object" and "instance", but I think we've covered that topic pretty well and there's not much more to say.

We might benefit from further discussion of the other topics I listed last week, but what I think would be best is if there are open questions or ideas to pursue, post something in the thread, and we'll touch on it next time.
For reference, here was the list from last meeting:

> Lots of recent threads that could be further discussed:
>
> - "Question about universal type variables": Kevin started a discussion about how type variable types should be modeled, and what changes when they become universal
>
> - "Updated SoV, take 3": Brian revised the State of Valhalla document to reflect recent design ideas
>
> - "object sameness, Leibniz's Law, ...": John elaborated on SoV review comments regarding value object equality/substitutability
>
> - "The storage hint model": Remi shared thoughts about using a storage attribute, rather than a value type, to encode flatness
>
> - "The problem with encapsulating C.val + autoboxing": Remi discussed the treatment of access-restricted value types in generics
>
> - "where are all the objects?": John and Kevin discussed usages of the terms "object" and "instance"
>
> - "one class, two types, many bikesheds": John discussed how we model classes vs. types, the relationship of ref and val types, and how syntax like .ref and .val might be used
>
> - "Value type companions, encapsulated": John shared a document describing how access restrictions could be enforced on value types
>
> - "races on flat values": John discussed how the memory model needs to be updated to describe concurrent accesses of flat variables

From daniel.smith at oracle.com Tue Aug 30 19:18:36 2022
From: daniel.smith at oracle.com (Dan Smith)
Date: Tue, 30 Aug 2022 19:18:36 +0000
Subject: JLS updates
Message-ID:

FYI, I've made some minor fixes to the JLS change document supporting Value Objects, published here:

https://wiki.se.oracle.com/display/JPG/Spec+Change+Documents

(Look for "revision history" in the introduction for details.)
From daniel.smith at oracle.com Tue Aug 30 19:42:53 2022
From: daniel.smith at oracle.com (Dan Smith)
Date: Tue, 30 Aug 2022 19:42:53 +0000
Subject: JLS updates
In-Reply-To:
References:
Message-ID: <0342FD93-3506-41D3-A7D4-D7C53727D316@oracle.com>

On Aug 30, 2022, at 12:18 PM, Dan Smith wrote:

> FYI, I've made some minor fixes to the JLS change document supporting Value Objects, published here:
>
> https://wiki.se.oracle.com/display/JPG/Spec+Change+Documents
>
> (Look for "revision history" in the introduction for details.)

Ugh, copy-paste error. Here's the link:

http://cr.openjdk.java.net/~dlsmith/jep8277163/latest

From john.r.rose at oracle.com Wed Aug 31 15:49:03 2022
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 31 Aug 2022 11:49:03 -0400
Subject: JVMS updates as well
In-Reply-To: <0342FD93-3506-41D3-A7D4-D7C53727D316@oracle.com>
References: <0342FD93-3506-41D3-A7D4-D7C53727D316@oracle.com>
Message-ID: <6C3EAD97-8B4E-47CC-ACD1-25B52987C94A@oracle.com>

> http://cr.openjdk.java.net/~dlsmith/jep8277163/latest

I edited the URL to find the corresponding JVMS updates:

http://cr.openjdk.java.net/~dlsmith/jep8277163/jep8277163-20220830/specs/value-objects-jvms.html

I have a few unsystematic comments on the special subject of special methods.

In that draft JVMS, we say in 2.9.1

> An instance initialization method may not be declared in a value class or an interface (4.6).

and in 2.9.4

> A value class instance creation method may not be declared in an identity class, an abstract class, or an interface (4.6).

Then in 4.6 we reaffirm these restrictions on `<init>` and `<vnew>`, along with many similar pre-existing restrictions.

These are restrictions on *declaration* of special methods. Then in 4.9.1 we have similar static constraints on *uses* of special methods, by bytecodes. Such constraints also appear on constants in the constant pool, in 4.4.8.
Eventually in 5.3.5 we discover when and what we must throw if any of these conditions fail:

> If the purported representation is not a ClassFile structure (§4.1, §4.8), loading throws an instance of ClassFormatError.

IIRC somewhere there is a place in chapter 4 that tells us that all of the many requirements will be enforced. I don't remember where that promise is made. (Remind me?)

The phrases "instance initialization method" and "value class instance creation method" are accompanied by cross references to 2.9.1 and 2.9.4. I'm starting to think that the cross references should *also* mention the special method names `<init>` and `<vnew>`. That would be a new thing for the JVMS but makes more and more sense as we add more and more special method names. Now we are at three such names; maybe it's time to mention the names along with the cross references.

I think I should be able to grep the JVMS for `<vnew>` for mentions of this feature, not just the phrase "value class instance creation method" or the section number 2.9.4. For example:

> Only the invokestatic instruction is allowed to invoke a value class instance creation method (2.9.4 `<vnew>`).

At some time we will want to change JVMS language that says things like "not `<init>`, `<clinit>`, or `<vnew>`" to "not a special method name". Is that time now yet? Places that allow some but not all would have to say "not a special method name other than ``" or something of that nature. I think that would expose some irregularities in our treatment of "surprising" names, but that's all to the good, even if embarrassing.

You have this commentary:

> As an alternative naming scheme, we could re-use `<init>` but give the method a non-void return type.

You could also say something like:

> An alternative implementation technique could use regular static methods (presumably marked with a `Synthetic` attribute) of a standard name to implement constructors of value classes.
> This would be convenient for some tools in that they could treat these factory methods like all other factory methods, without learning the new rules for `<vnew>` symbols, but would be inconvenient for other tools, including the JVM, which need to make a clear distinction between constructors and other methods.

From daniel.smith at oracle.com Wed Aug 31 18:54:43 2022
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 31 Aug 2022 18:54:43 +0000
Subject: JLS updates
In-Reply-To: <0342FD93-3506-41D3-A7D4-D7C53727D316@oracle.com>
References: <0342FD93-3506-41D3-A7D4-D7C53727D316@oracle.com>
Message-ID:

On Aug 30, 2022, at 12:42 PM, Dan Smith wrote:

> On Aug 30, 2022, at 12:18 PM, Dan Smith wrote:
>
>> FYI, I've made some minor fixes to the JLS change document supporting Value Objects, published here:
>>
>> http://cr.openjdk.java.net/~dlsmith/jep8277163/latest

One change I'll need to make: an interface is not a functional interface if it is `identity` or `value`; but this should also be the case if any *superinterface* is `identity` or `value`.

(Ultimately, the implementation should be free to generate identity objects or value objects to encode lambdas and method references.)
From daniel.smith at oracle.com Wed Aug 31 22:15:20 2022
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 31 Aug 2022 22:15:20 +0000
Subject: JLS updates
In-Reply-To:
References: <0342FD93-3506-41D3-A7D4-D7C53727D316@oracle.com>
Message-ID:

On Aug 31, 2022, at 11:54 AM, Dan Smith wrote:

> On Aug 30, 2022, at 12:42 PM, Dan Smith wrote:
>
>> On Aug 30, 2022, at 12:18 PM, Dan Smith wrote:
>>
>>> FYI, I've made some minor fixes to the JLS change document supporting Value Objects, published here:
>>>
>>> http://cr.openjdk.java.net/~dlsmith/jep8277163/latest
>
> One change I'll need to make: an interface is not a functional interface if it is `identity` or `value`; but this should also be the case if any *superinterface* is `identity` or `value`.
>
> (Ultimately, the implementation should be free to generate identity objects or value objects to encode lambdas and method references.)

And another (just want to memorialize these so I can fix them later): the rule in 8.1.1 about 'final' conflicting with 'sealed' also applies to the implicit 'final' of a concrete value class.

(More generally, anywhere 'final' is mentioned as a class modifier, we need to be sure it's clear that this includes the implicit 'final'.)
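
A rough illustration of the "implicit 'final'" point in the last message, using only today's Java (17 or later) rather than the draft value-class syntax. Records are used here purely as a stand-in, on the assumption that they are a fair analogy: like the concrete value classes described above, they are final without the modifier being written. The names `ImplicitFinalDemo`, `Shape`, `Circle`, and `Square` are hypothetical and do not come from the spec drafts.

```java
import java.lang.reflect.Modifier;

// Hypothetical demo class; every name in this sketch is illustrative only.
class ImplicitFinalDemo {
    // A sealed hierarchy: each permitted subtype must be final, sealed, or non-sealed.
    sealed interface Shape permits Circle, Square {}

    // Records are implicitly final; no 'final' modifier is written here,
    // yet they satisfy the sealed hierarchy's requirement.
    record Circle(double radius) implements Shape {}
    record Square(double side) implements Shape {}

    // True if the class is final, whether the modifier was written or implied.
    static boolean isFinal(Class<?> c) {
        return Modifier.isFinal(c.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println("Circle final? " + isFinal(Circle.class));
        System.out.println("Shape sealed? " + Shape.class.isSealed());
        // Declaring 'sealed record Circle(...)' would be a compile-time error:
        // the implicit 'final' of a record conflicts with 'sealed'.
    }
}
```

The design point this is meant to echo: tooling and spec text that test for 'final' need to see the implied modifier the same way they see an explicit one, which is what the reflective check above does for records.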