<div dir="ltr"><div>Apologies that my comments might inevitably be a bit to the side of what you're really hoping to talk about. If one doesn't go in a helpful direction feel free to just not lean into it.</div><div><br></div><div><br></div><div dir="ltr">On Thu, Jun 23, 2022 at 12:01 PM Brian Goetz <<a href="mailto:brian.goetz@oracle.com">brian.goetz@oracle.com</a>> wrote:<br></div><div dir="ltr"><br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div><div><span style="font-family:monospace;font-size:large"> - Value classes have ref and val companion types with the
obvious properties. (Notably, refs are always atomic.)</span><br></div><font size="4"><font face="monospace">
- For `value class C`, C as a type is an alias for `C.ref`.</font></font></div></blockquote><div><br></div><div>I'm happy about all this -- this comment is purely about mental model / framing. I really think we want to flip this around. For <i>any</i> class `One`, you always get a reference type and it is always called `One`. Then you might <i>also</i> get a value type which is called `One.val` or whatever we come up with. If we want the type `One.ref` to exist for some reason (I have not understood why yet, only why we need `T.ref` for <i>type variables</i>), then <i>that's</i> the alias. If I print out the names of One.val.class, One.ref.class, and One.class, I should get "One.val, One, One", not "One.val, One.ref, One.ref". I harp on this because this is how Java classes get to continue to feel unified.</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><span style="font-family:monospace;font-size:large">> _This is the second of three documents describing the
current State of</span><br></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace">
Valhalla. The first is [The Road to Valhalla](01-background);
the<br>
third is [The JVM Model](03-vm-model)._<br>
<br>
This document describes the directions for the Java _language_
charted by<br>
Project Valhalla. (In this document, we use "currently" to
describe the<br>
language as it stands today, without value classes.)<br>
<br>
Valhalla started with the goal of providing user-programmable
classes which can<br>
be flat and dense in memory. Numerics are one of the motivating
use cases;<br>
adding new primitive types directly to the language has a very
high barrier. As<br>
we learned from [Growing a Language][growing] there are
infinitely many numeric<br>
types we might want to add to Java, but the proper way to do
that is via<br>
libraries, not as a language feature.<br>
<br>
## Primitive and reference types in Java today<br>
<br>
Java currently has eight built-in primitive types. Primitives
represent pure<br>
_values_; any `int` value of "3" is equivalent to, and
indistinguishable from,<br>
any other `int` value of "3". Primitives are monolithic (their
bits cannot be<br>
addressed individually) and have no canonical location, and so
are _freely<br>
copyable_. With the exception of the unusual treatment of exotic
floating point<br>
values such as `NaN`, the `==` operator performs a
_substitutability test_ -- it<br>
asks "are these two values the same value".<br></font></font></div></blockquote><div><br></div><div>The last part needs to be much clearer imho; appealing to "sameness" raises as many questions as it answers. btw, Why do we say "substitutability" and not "(in)distinguishability"? It seems more readily obvious what the second means. Substitutability brings LSP to mind, which is a different and asymmetric kind. I feel like distinguishability is the concept you want here in place of "sameness".</div><div><br></div><div>Rather, "strict distinguishability". What `==` does is more specifically a <i>strict</i> or absolute distinguishability test. I think this should be called out, because the kind of distinguishability that <i>matters more to users more often</i> is the other kind: <i>logical</i> distinguishability, the kind that `Object.equals` empowers them to control for themselves. The exact reason they override that method is to govern what they want to be considered distinguishable. But ofc. `==` ignores all that. I hope to lean on the twin concepts of "strict distinguishability" and "logical distinguishability" for these two kinds.</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace">Java also has _objects_, and each object has a unique _object
identity_.</font></font></div></blockquote><div><br></div><div>("currently")</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace"> Because<br>
of identity, objects are not freely copyable; each object lives
in exactly one<br>
place at any given time, and to access its state we have to go
to that place.<br>
But we mostly don't notice this because objects are not
manipulated or accessed<br>
directly, but instead through _object references_. Object
references are also a<br>
kind of value -- they encode the identity of the object to which
they refer, and<br>
the `==` operator on object references asks "do these two
references refer to<br>
the same object." Accordingly, object _references_ (like other
values) can be<br>
freely copied, but the objects they refer to cannot. <br>
<br>
Primitives and objects differ in almost every conceivable way:<br>
<br>
| Primitives | Objects |<br>
| ------------------------------------------ | ---------------------------------- |<br>
| No identity (pure values) | Identity |<br>
| `==` compares values | `==` compares object identity |<br>
| Built-in | Declared in classes |<br>
| No members (fields, methods, constructors) | Members (including mutable fields) |<br>
| No supertypes or subtypes | Class and interface inheritance |<br>
| Accessed directly | Accessed via object references |<br>
| Not nullable | Nullable |<br>
| Default value is zero | Default value is null |<br>
| Arrays are monomorphic | Arrays are covariant |<br>
| May tear under race | Initialization safety guarantees |<br>
| Have reference companions (boxes) | Don't need reference companions |<br>
<br>
The design of primitives represents various tradeoffs aimed at
maximizing<br>
performance and usability of the primitive types. Reference
types default to<br>
`null`, meaning "referring to no object"; primitives default to
a usable zero<br>
value (which for most primitives is the additive identity).
Reference types<br>
provide initialization safety guarantees against a certain
category of data<br>
races; primitives allow tearing under race for
larger-than-32-bit values. <br>
We could characterize the design principles behind these
tradeoffs as "make<br>
objects safer, make primitives faster."<br>
<br>
The following figure illustrates the current universe of Java's
types. The<br>
upper left quadrant is the built-in primitives; the rest of the
space is<br>
reference types. In the upper-right, we have the abstract
reference types --<br>
abstract classes, interfaces, and `Object` (which, though
concrete, acts more<br>
like an interface than a concrete class). The built-in
primitives have wrappers<br>
or boxes, which are reference types.<br>
<br>
<figure><br>
<a href="field-type-zoo.pdf" title="Click for PDF"><br>
<img src="field-type-zoo-old.png" alt="Current universe
of Java field types"/><br>
</a><br>
</figure><br>
<br>
Valhalla aims to unify primitives and objects in that they can
both be<br>
declared with classes, but maintains the special runtime
characteristics<br>
primitives have. But while everyone likes the flatness and
density that<br>
user-definable value types promise, in some cases we want them
to be more like<br>
classical objects (nullable, non-tearable), and in other cases
we want them to<br>
be more like classical primitives (trading some safety for
performance). <br>
<br>
## Value classes: separating references from identity<br>
<br>
Many of the impediments to optimization that Valhalla seeks to
remove center<br>
around _unwanted object identity_. The primitive wrapper
classes have identity,<br>
but it is a purely accidental one. Not only is it not directly
useful, it can<br>
be a source of bugs. For example, due to caching, `Integer` can
be accidentally<br>
compared correctly with `==` just often enough that people keep
doing it.<br>
Similarly, [value-based classes][valuebased] such as `Optional`
have no need for<br>
identity, but pay the costs of having identity anyway. <br>
<br>
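A minimal sketch of that caching pitfall in today's Java (the class and method names are made up for illustration). Boxing an `int` in the range -128 to 127 is specified to reuse a cached `Integer`, so `==` happens to work there; outside that range the result depends on the JVM's cache settings, so the sketch prints it rather than promising it.<br>
<br>
<pre>
```java
public class IntegerCacheDemo {
    // Reference comparison: true only if both operands are the same object.
    static boolean sameObject(Integer a, Integer b) {
        return a == b;
    }

    // Value comparison: true if both boxes hold the same int.
    static boolean sameValue(Integer a, Integer b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        Integer small1 = 100, small2 = 100;         // inside the cached range
        Integer large1 = 10_000, large2 = 10_000;   // usually outside the cache

        System.out.println(sameObject(small1, small2)); // true: same cached object
        System.out.println(sameObject(large1, large2)); // usually false; depends on cache size
        System.out.println(sameValue(large1, large2));  // true
    }
}
```
</pre>
Comparing with `equals` (or unboxing first) gives the intended answer in both cases.<br>
<br>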
Our first step is allowing class declarations to explicitly
disavow identity, by<br>
declaring themselves as _value classes_. The instances of a
value class are<br>
called _value objects_.</font></font></div></blockquote><div><br></div><div>Obviously the fuller explanation will come below, but even just as a stepping stone, this statement already seems problematic to me.</div><div><br></div><div>I feel like we want to end up here, do you disagree?:</div><div><br></div><div>* Instances of the type `Foo.val` are values, not objects</div><div>* Instances of the type `Foo` are objects, not values, which we call "value objects" as short for "value-like objects" or somesuch</div><div>* When we say instances of the <i>class</i> `Foo` we might mean instances of either type the class declares (and really, maybe we should just downplay "instances of the class" and emphasize "instances of the type")</div><div><br></div><div>We need students to be able to confidently articulate things like "value objects are objects in every way, but they aren't values in any way; the name really just means objects without identity, which makes them sort of 'adjacent to' being values...." It's going to be a bit of an awkward sell.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace">```<br>
value class ArrayCursor<T> { <br>}<br>
```<br>
<br>
This says that an `ArrayCursor` is a class whose instances have
no identity --<br>
that instead they have _value semantics_.</font></font></div></blockquote><div><br></div><div>Oof. I'm glad you used the ArrayCursor example, as it shines a spotlight on the precise meaning of the phrase "value semantics". I have been saying that the well-developed concept of "value semantics" out there in the world is generally assumed <i>recursive</i> -- we expect it to have value semantics all the way down. If I tell someone a thing has value semantics but (any # of levels deep) in it is some string that's getting compared by identity, their expectations will be violated.</div><div><br></div><div>In our context, being a "value class" (thus birthing 2 types whose instances are "values" and "value objects", as above) is inherently about what's going on one-level deep and nothing else. How might we keep this terminology straight?</div><div><br></div><div>Suggestion: leave "semantics" out of it?</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace">As a consequence, it
must give up the<br>
things that depend on identity; the class and its fields are
implicitly final. <br>
<br>
But, value classes are still classes, and can have most of the
things classes<br>
can have -- fields, methods, constructors, type parameters,
superclasses (with<br>
some restrictions), nested classes, class literals, interfaces,
etc. The<br>
classes they can extend are restricted: `Object` or abstract
classes with no<br>
instance fields, empty no-arg constructor bodies, no other
constructors, no instance<br>
initializers, no synchronized methods, and whose superclasses
all meet this same<br>
set of conditions. (`Number` meets these conditions.)<br></font></font></div></blockquote><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace">Classes in Java give rise to types; the class `ArrayCursor`
gives rise to a type<br>
`ArrayCursor` (actually a parametric family of instantiations
`ArrayCursor<T>`.)<br>
`ArrayCursor` is still a reference type, just one whose
references refer to<br>
value objects rather than identity objects. For the types in the
upper-right<br>
quadrant of the diagram (interfaces, abstract classes, and
`Object`), references<br>
to these types might refer to either an identity object or a
value object.<br>
(Historically, JVMs were effectively forced to represent object
references with<br>
pointers; for references to value objects, JVMs now have more
flexibility.)<br>
<br>
Because `ArrayCursor` is a reference type, it is nullable
(because references<br>
are nullable), its default value is null, and loads and stores
of references are<br>
atomic with respect to each other even in the presence of data
races, providing<br>
the initialization safety we are used to with classical objects.<br>
<br>
Because instances of `ArrayCursor` have value semantics, `==`
compares by state<br>
rather than identity.</font></font></div></blockquote><div><br></div><div>Would be good to always include the word "shallow" in that.</div><div>Comparing "deeply" by state only emerges from being "deeply" value classes.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace">This means that value objects, like
primitives, are<br>
_freely copyable_; we can explode them into their fields and
re-aggregate them<br>
into another value object, and we cannot tell the difference.
(Because they<br>
have no identity, some identity-sensitive operations, such as
synchronization,<br>
are disallowed.)<br></font></font></div></blockquote><div><br></div><div>(Would always mention mutation together with synchronization, as it's the "biggie".)</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace">
So far we've addressed the first two lines of the table of
differences above;<br>
rather than identity being a property of all object instances,
classes can<br>
decide whether their instances have identity or not. By
allowing classes that<br>
don't need identity to exclude it, we free the runtime to make
better layout and<br>
compilation decisions -- and avoid a whole category of bugs.<br>
<br>
In looking at the code for `ArrayCursor`, we might mistakenly
assume it will be<br>
inefficient, as each loop iteration appears to allocate a new
cursor:<br>
<br>
```<br>
for (ArrayCursor<T> c = Arrays.cursor(array); <br>
c.hasNext(); <br>
c = c.advance()) {<br>
// use c.next();<br>
}<br>
```<br>
<br>
One should generally expect here that _no_ cursors are actually
allocated.<br>
Because an `ArrayCursor` is just its two fields, these fields
will routinely get<br>
scalarized and hoisted into registers, and the constructor call
in `advance`<br>
will typically compile down to incrementing one of these
registers.<br>
<br>
### Migration<br>
<br>
The JDK (as well as other libraries) has many [value-based
classes][valuebased]<br>
such as `Optional` and `LocalDateTime`. Value-based classes
adhere to the<br>
semantic restrictions of value classes, but are still identity
classes -- even<br>
though they don't want to be. Value-based classes can be
migrated to true value<br>
classes simply by redeclaring them as value classes, which is
both source- and<br>
binary-compatible.<br></font></font></div></blockquote><div><br></div><div>I think this is confusing unless you point out that these classes are only "labeled" as value-based, and voluntarily police their own restrictions; there's no actual feature behind it. </div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace">
We plan to migrate many value-based classes in the JDK to value
classes.<br>
Additionally, the primitive wrappers can be migrated to value
classes as well,<br>
making the conversion between `int` and `Integer` cheaper; see
the section<br>
"Legacy Primitives" below. (In some cases, this may be
_behaviorally_<br>
incompatible for code that synchronizes on the primitive
wrappers. [JEP<br>
390][jep390] has supported both compile-time and runtime
warnings for<br>
synchronizing on primitive wrappers since Java 16.)</font></font></div></blockquote><div><br></div><div>The text here just sounds a little "problem all solved!"-ish, while only mentioning synchronization and not the rest of the list you go through below.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace"><figure><br>
<a href="field-type-zoo.pdf" title="Click for PDF"><br>
<img src="field-type-zoo-mid.png" alt="Java field types
adding value classes"/><br>
</a><br>
</figure><br>
<br>
### Equality<br>
<br>
Earlier we said that `==` compares value objects by state rather
than by<br>
identity. More precisely, two value objects are `==` if they
are of the same<br>
type, and their corresponding fields are pairwise equal, where
equality is given by<br>
`==` for primitives (except `float` and `double`, which are
compared with<br>
`Float::equals` and `Double::equals` to avoid anomalies), `==`
for references to<br>
identity objects, and recursively with `==` for references to
value objects. In<br>
no case is a value object ever `==` to a reference to an
identity object.<br></font></font></div></blockquote><div><br></div><div>This seems like a good place to explain (a) *why* this is the best thing we can do for `==` if we must keep it working, and (b) why it is problematic as heck.<br></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace">
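To make the `float` and `double` anomalies concrete, here is a minimal sketch in today's Java (the class and method names are made up for illustration). It only contrasts primitive `==` with `Double::equals`; it does not model value objects themselves.<br>
<br>
<pre>
```java
public class FloatEqualityDemo {
    // Primitive comparison, as `==` works on double today.
    static boolean primitiveEq(double a, double b) {
        return a == b;
    }

    // Comparison in the style of Double::equals, which compares bit patterns
    // (with all NaN values treated as one canonical NaN).
    static boolean boxedEquals(double a, double b) {
        return Double.valueOf(a).equals(b);
    }

    public static void main(String[] args) {
        System.out.println(primitiveEq(Double.NaN, Double.NaN)); // false: NaN is never == to itself
        System.out.println(boxedEquals(Double.NaN, Double.NaN)); // true
        System.out.println(primitiveEq(0.0, -0.0));              // true: the two zeros compare equal
        System.out.println(boxedEquals(0.0, -0.0));              // false: the two zeros are distinguishable
    }
}
```
</pre>
With primitive `==` on the fields, a value object holding a NaN would not be `==` to an exact copy of itself, and two objects holding `0.0` and `-0.0` would be `==` despite being distinguishable; comparing the fields with `equals` avoids both.<br>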
<br>
### Value records<br>
<br>
While records have a lot in common with value classes -- they
are final and<br>
their fields are final -- they are still identity classes.</font></font></div></blockquote><div><br></div><div>(This sentence seems off, because as you go on to say, value records <i>aren't</i> identity classes)</div><div> </div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace">
Records embody a<br>
tradeoff: give up on decoupling the API from the representation,
and in return<br>
get various syntactic and semantic benefits. Value classes
embody another<br>
tradeoff: give up identity, and get various semantic and
performance benefits.<br>
If we are willing to give up both, we can get both sets of
benefits. <br>
<br>
```<br>
value record NameAndScore(String name, int score) { }<br>
```<br>
<br>
Value records combine the data-carrier idiom of records with the
improved <br>
scalarization and flattening benefits of value classes. <br></font></font></div></blockquote><div><br></div><div>Do we have a good use case for an identity record?</div><div>You might want to mention that we'd expect most records to become value records (don't we?).</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace">In theory, it would be possible to apply `value` to certain
enums as well, but<br>
this is not currently possible because the `java.lang.Enum` base
class that<br>
enums extend does not meet the requirements for superclasses of
value classes (it<br>
has fields and non-empty constructors).<br></font></font></div></blockquote><div><br></div><div>I would think that whether an enum is a value or not doesn't tend to matter much?</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace">## Unboxing values for flatness and density<br>
<br>
Value classes shed object identity, gaining a host of
performance and<br>
predictability benefits in the process. They are an ideal
replacement for many<br>
of today's value-based classes, fully preserving their semantics
(except for the<br>
accidental identity these classes never wanted). But
identity-free reference<br>
types are only one point on a spectrum of tradeoffs between
abstraction and<br>
performance, and other desired use cases -- such as numerics --
may want a<br>
different set of tradeoffs.<br>
<br>
Reference types are nullable, and therefore must account for
null somehow in<br>
their representation, which may involve additional footprint.
Similarly, they<br>
offer the initialization safety guarantees for final fields that
we come to<br>
expect from identity objects, which may entail limits on
flatness. For certain<br>
use cases, it may be desirable to additionally give up something
else to make<br>
further flatness and footprint gains -- and that something else
is<br>
reference-ness.<br>
<br>
The built-in primitives are best understood as _pairs_ of types:
a primitive<br>
type (e.g., `int`) and its reference companion or box
(`Integer`), with<br>
conversions between the two (boxing and unboxing.) We have both
types because<br>
the two have different characteristics. Primitives are
optimized for efficient<br>
storage and access: they are not nullable, they tolerate
uninitialized (zero)<br>
values, and larger primitive types (`long`, `double`) may tear
under racy<br>
access. References err on the side of safety and flexibility;
they support<br>
nullity, polymorphism, and offer initialization safety (freedom
from tearing),<br>
but by comparison to primitives, they pay a footprint and
indirection cost. <br>
<br>
For these reasons, value classes give rise to pairs of types as
well: a<br>
reference type and a _value companion type_. We've seen the
reference type so<br>
far; for a value class `Point`, the reference type is called
`Point`. (The full<br>
name for the reference type is `Point.ref`; `Point` is an alias
for that.) The<br>
value companion type is called `Point.val`, and the two types
have the same<br>
conversions between them as primitives do today with their
boxes. (If we are<br>
talking explicitly about the value companion type of a value
class, we may<br>
sometimes describe the corresponding reference type as its
_reference<br>
companion_.)<br>
<br>
```<br>
value class Point implements Serializable {<br>
int x;<br>
int y;<br>
<br>
Point(int x, int y) { <br>
this.x = x;<br>
this.y = y;<br>
}<br>
<br>
Point scale(int s) { <br>
return new Point(s*x, s*y);<br>
}<br>
}<br>
```<br>
<br>
The default value of the value companion type is the one for
which all fields<br>
take on their default value; the default value of the reference
type is, like<br>
all reference types, null. <br>
<br>
In our diagram, these new types show up as another entity that
straddles the<br>
line between primitives and identity-free references, alongside
the legacy<br>
primitives: <br>
<br>
** UPDATE DIAGRAM **<br>
<br>
<figure><br>
<a href="field-type-zoo.pdf" title="Click for PDF"><br>
<img src="field-type-zoo-new.png" alt="Java field types
with extended primitives"/><br>
</a><br>
</figure><br>
<br>
### Member access<br>
<br>
Both the reference and value companion types are seen to have
the same instance<br>
members. Unlike today's primitives, value companion types can
be used as<br>
receivers to access fields and invoke methods, subject to
accessibility<br>
constraints: <br>
<br>
```<br>
Point.val p = new Point(1, 2);<br>
assert p.x == 1;<br>
<br>
p = p.scale(2);<br>
assert p.x == 2;<br>
```<br></font></font></div></blockquote><div><br></div><div>Maybe clarify that this isn't because p is getting boxed.</div><div>I like to point out that "we might be used to thinking of `.` as a 'dereference operator', but it's always been just a member access expression; the runtime will dereference IF necessary to carry that out."</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace">
### Polymorphism<br>
<br>
When we declare a class today, we set up a subtyping (is-a)
relationship between<br>
the declared class and its supertypes. When we declare a value
class, we set up<br>
a subtyping relationship between the _reference type_ and the
declared<br>
supertypes.</font></font></div></blockquote><div><br></div><div>Beating dead horse, just, again it makes it sound like two different things are happening when it could emphasize that the same thing is happening in both cases.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace"> This means that if we declare:<br>
<br>
```<br>
value class UnsignedShort extends Number <br>
implements
Comparable<UnsignedShort> { <br>
...<br>
}<br>
```<br>
<br>
then `UnsignedShort` is a subtype of `Number` and
`Comparable<UnsignedShort>`,<br>
and we can ask questions about subtyping using `instanceof` or
pattern matching.<br>
What happens if we ask such a question of the value companion
type?<br>
<br>
```<br>
UnsignedShort.val us = ...<br>
if (us instanceof Number) { ... }<br>
```<br>
<br>
Since subtyping is defined only on reference types, the
`instanceof` operator<br>
(and corresponding type patterns) will behave as if both sides
were lifted to<br>
the appropriate reference type, and we can answer the question
that way.</font></font></div></blockquote><div><br></div><div>So ... this will yield `true`? I hope that is useful enough to pay for the deeper confusions it might sow. Who knows, maybe I'll need to loosen up on this, but I have assumed that we do want/need users to understand that `UnsignedShort.val` and `short` are monomorphic, having no supertypes or subtypes.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace">(This<br>
may trigger fears of expensive boxing conversions, but in
reality no actual<br>
allocation will happen.)<br>
<br>
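The closest analogue in today's Java is to do the lifting by hand. The sketch below is an analogy only, assuming `short` and its box `Short` as stand-ins for a value companion type and its reference type; the class and method names are made up, and the manual boxing here is exactly the step that the proposed `instanceof` would perform implicitly.<br>
<br>
<pre>
```java
public class LiftedInstanceofDemo {
    // A short has no supertypes, so we lift it to its box and ask the box.
    static boolean isNumber(short s) {
        Object lifted = s;                 // boxing conversion: short -> Short
        return lifted instanceof Number;   // Short extends Number
    }

    static boolean isComparable(short s) {
        Object lifted = s;
        return lifted instanceof Comparable;   // Short implements Comparable
    }

    static boolean isCharSequence(short s) {
        Object lifted = s;
        return lifted instanceof CharSequence; // Short does not implement CharSequence
    }

    public static void main(String[] args) {
        short s = 42;
        System.out.println(isNumber(s));       // true
        System.out.println(isComparable(s));   // true
        System.out.println(isCharSequence(s)); // false
    }
}
```
</pre>
<br>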
We introduce a new relationship based on `extends` /
`implements` clauses, which<br>
we'll call "extends"; we define `A extends B` as meaning `A
<: B` when A is a<br>
reference type, and `A.ref <: B` when A is a value companion
type. The<br>
`instanceof` relation, reflection, and pattern matching are
updated to use<br>
"extends".<br></font></font></div></blockquote><div><br></div><div>(This will make some readers want to hear your explanation of why it isn't easier to just say that `A <: A.ref` and be done with it)</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace">
### Arrays<br>
<br>
Arrays of reference types are _covariant_; this means that if `A
<: B`, then<br>
`A[] <: B[]`. This allows `Object[]` to be the "top array
type", at least for<br>
arrays of references. But arrays of primitives are currently
left out of this<br>
story. We can unify the treatment of arrays by defining array
covariance over<br>
the new "extends" relationship; if A extends B, then `A[] <:
B[]`. For a value<br>
class P, `P.val[] <: P.ref[] <: Object[]`, finally making
`Object[]` the top<br>
type for all arrays.<br></font></font></div></blockquote><div><br></div><div>Isn't it really "value companion types" you want to talk about here -- then primitives get it for free when we cover that they are becoming just VCTs?</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace">### Equality<br>
<br>
Just as with `instanceof`, we define `==` on values by appealing
to the<br>
reference companion (though no actual boxing need occur).
Evaluating `a == b`,<br>
where one or both operands are of a value companion type, can be
defined as if<br>
the operands are first converted to their corresponding
reference type, and then<br>
comparing the results. This means that the following will
succeed: <br>
<br>
```<br>
Point.val p = new Point(3, 4);<br>
Point pr = p;<br>
assert p == pr;<br>
```<br>
<br>
The base implementation of `Object::equals` delegates to `==`,
which is a<br>
suitable default for both reference and value classes. <br>
<br>
### Serialization<br>
<br>
If a value class implements `Serializable`, this is also really
a statement<br>
about the reference type. Just as with other aspects described
here,<br>
serialization of value companions can be defined by converting
to the<br>
corresponding reference type and serializing that, and reversing
the process at<br>
deserialization time.<br>
<br>
Serialization currently uses object identity to preserve the
topology of an<br>
object graph. This generalizes cleanly to objects without
identity, because<br>
`==` on value objects treats two identical copies of a value
object as equal. <br>
So any observations we make about graph topology prior to
serialization with<br>
`==` are consistent with those after deserialization.<br>
<br>
### Identity-sensitive operations<br>
<br>
Certain operations are currently defined in terms of object
identity. As we've<br>
already seen, some of these, like equality, can be sensibly
extended to cover<br>
all instances.</font></font></div></blockquote><div><br></div><div>As you know, I object to calling this `==` behavior "sensible". It is forced by compatibility and isn't what users really want, but will be close enough to what they want often enough to get them into trouble.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace"> Others, like synchronization, will become
partial. <br>
Identity-sensitive operations include:<br>
<br>
- **Equality.** We extend `==` on references to include
references to value<br>
objects. Where it currently has a meaning, the new
definition coincides<br>
with that meaning.<br>
<br>
- **System::identityHashCode.** The main use of
`identityHashCode` is in the<br>
implementation of data structures such as
`IdentityHashMap`. We can extend<br>
`identityHashCode` in the same way we extend equality --
deriving a hash on<br>
primitive objects from the hash of all the fields.<br></font></font></div></blockquote><div><br></div><div>s/primitive/value/?</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace">
<br>
- **Synchronization.** This becomes a partial operation. If
we can<br>
statically detect that a synchronization will fail at
runtime (including<br>
declaring a `synchronized` method in a value class), we can
issue a<br>
compilation error; if not, attempts to lock on a value
object result in<br>
`IllegalMonitorStateException`. This is justifiable because
it is<br>
intrinsically imprudent to lock on an object for which you
do not have a<br>
clear understanding of its locking protocol; locking on an
arbitrary<br>
`Object` or interface instance is doing exactly that.<br>
<br>
- **Weak, soft, and phantom references.** Capturing an exotic
reference to a<br>
value object becomes a partial operation, as these are
intrinsically tied to<br>
reachability (and hence to identity). However, we will
likely make<br>
enhancements to `WeakHashMap` to support mixed identity and
value keys. <br>
<br>
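To make the equality and hashing bullets concrete, here is a minimal sketch in<br>
today's Java (not Valhalla syntax). The names `IdentityDemo`, `Point`, and<br>
`sameIdentity` are hypothetical. A record is still an identity class today, so<br>
`==` compares identity, while `equals` and `hashCode` already show the<br>
state-based behavior that `==` and `identityHashCode` would take on for value<br>
objects.<br>
<br>
<pre>
```java
class IdentityDemo {
    // Hypothetical example type. Records are identity classes today, but their
    // equals() and hashCode() are already derived from the component values.
    record Point(int x, int y) {}

    // Identity comparison: true only if both references denote the same object.
    static boolean sameIdentity(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);

        System.out.println(sameIdentity(a, b));            // false: two distinct allocations
        System.out.println(a.equals(b));                   // true: same state
        System.out.println(a.hashCode() == b.hashCode());  // true: hash derived from state

        // Identity-sensitive operations that work today because a has identity.
        System.out.println(System.identityHashCode(a) == System.identityHashCode(a)); // true
        synchronized (a) {
            System.out.println("locked on an identity object");
        }
    }
}
```
</pre>
<br>
If `Point` were instead declared as a value class, the first comparison would be<br>
expected to yield `true` as well, and the `synchronized` block would be rejected<br>
rather than acquire a lock.<br>
<br>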
### What about Object?<br>
<br>
The root class `Object` poses an unusual problem, in that every
class must<br>
extend it directly or indirectly, but it is also instantiable
(non-abstract),<br>
and its instances have identity -- it is common to use `new
Object()` as a way<br>
to obtain a new object identity for purposes of locking.</font></font></div></blockquote><div><br></div><div>... left me hanging!</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace">## Why two types?<br>
<br>
It is sensible to ask: why do we need companion types at all?
This is analogous<br>
to the need for boxes in 1995: we'd made one set of tradeoffs
for primitives,<br>
favoring performance (non-nullable, zero-default, tolerant of<br>
non-initialization, tolerant of tearing under race, unrelated to
`Object`), and<br>
another for references, favoring flexibility and safety. Most
of the time, we<br>
ignored the primitive wrapper classes, but sometimes we needed
to temporarily<br>
suppress one of these properties, such as when interoperating
with code that<br>
expects an `Object` or the ability to express "no value". The
reasons we needed<br>
boxes in 1995 still apply today: sometimes we need the
affordances of<br>
references, and in those cases, we appeal to the reference
companion. <br>
<br>
Reasons we might want to use the reference companion include: <br>
<br>
- **Interoperation with reference types.** Value classes can
implement<br>
interfaces and extend classes (including `Object` and some
abstract classes),<br>
which means some class and interface types are going to be
polymorphic over<br>
both identity and primitive objects. This polymorphism is
achieved through<br>
object references; a reference to `Object` may be a reference
to an identity<br>
object, or a reference to a value object. <br>
<br>
- **Nullability.** Nullability is an affordance of object
_references_, not<br>
objects themselves. Most of the time, it makes sense that
primitive types<br>
are non-nullable (as the primitives are today), but there may
be situations<br>
where null is a semantically important value. Using the
reference companion<br>
when nullability is required is semantically clear, and
avoids the need to<br>
invent new sentinel values for "no value."<br>
<br>
This need comes up when migrating existing classes; the
method `Map::get`<br>
uses `null` to signal that the requested key was not present
in the map. But,<br>
if the `V` parameter to `Map` is a primitive class, `null` is
not a valid<br>
value. We can capture the "`V` or null" requirement by
changing the<br>
descriptor of `Map::get` to:<br></font></font></div></blockquote><div><br></div><div>Three more stale references to "primitive class" here?</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace">
<br>
```<br>
public V.ref get(K key);<br>
```<br>
<br>
where, whatever type `V` is instantiated as, `Map::get`
returns the reference<br>
companion. (For a type `V` that already is a reference type,
this is just `V`<br>
itself.) This captures the notion that the return type of
`Map::get` will<br>
either be a reference to a `V`, or the `null` reference.
(This is a<br>
compatible change, since both erase to the same thing.)<br>
<br>
<br>
- **Self-referential types.** Some types may want to directly
or indirectly<br>
refer to themselves, such as the "next" field in the node
type of a linked<br>
list:<br>
<br>
```<br>
class Node<T> {<br>
T theValue;<br>
Node<T> nextNode;<br>
}<br>
```<br>
<br>
We might want to represent this as a value class, but if the
type of<br>
`nextNode` were `Node.val<T>`, the layout of `Node`
would be<br>
self-referential, since we would be trying to flatten a
`Node` into its own<br>
layout. <br>
<br>
- **Protection from tearing.** For a value class with a
non-atomic value<br>
companion type, we may want to use the reference companion in
cases where we<br>
are concerned about tearing; because loads and stores of
references are<br>
atomic, `P.ref` is immune to the tearing under race that
`P.val` might be<br>
subject to.<br>
<br>
- **Compatibility with existing boxing.** Autoboxing is
convenient, in that it<br>
lets us pass a primitive where a reference is required. But
boxing affects<br>
far more than assignment conversion; it also affects method
overload<br>
selection. The rules are designed to prefer overloads that
require no<br>
conversions to those requiring boxing (or varargs)
conversions. Having both<br>
a value and reference type for every value class means that
these rules can<br>
be cleanly and intuitively extended to cover value classes.<br>
<br>
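The overload preference described in the last bullet can be observed with the<br>
built-in primitives and their boxes today. This is only a sketch; the names<br>
`OverloadDemo`, `f`, and `g` are hypothetical. The intent is that the same<br>
ordering would carry over to `C.val` and `C.ref`.<br>
<br>
<pre>
```java
class OverloadDemo {
    static String f(long x)    { return "f(long)"; }
    static String f(Integer x) { return "f(Integer)"; }

    static String g(Object x)  { return "g(Object)"; }
    static String g(int... xs) { return "g(int...)"; }

    public static void main(String[] args) {
        // An int argument widens to long without boxing, so f(long) is chosen.
        System.out.println(f(1));                    // f(long)
        // An Integer argument matches f(Integer) with no conversion at all.
        System.out.println(f(Integer.valueOf(1)));   // f(Integer)
        // Nothing applies without boxing here; boxing is tried before varargs.
        System.out.println(g(1));                    // g(Object)
    }
}
```
</pre>
<br>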
## Refining the value companion<br>
<br>
Value classes have several options for refining the behavior of
the value<br>
companion type and how it is exposed to clients.<br>
<br>
### Classes with no good default value<br>
<br>
For a value class `C`, the default value of `C.ref` is the same
as any other<br>
reference type: `null`. For the value companion type `C.val`,
the default value<br>
is the one where all of its fields are initialized to their
default value. <br>
<br>
The built-in primitives reflect the design assumption that zero
is a reasonable<br>
default. The choice to use a zero default for uninitialized
variables was one<br>
of the central tradeoffs in the design of the built-in
primitives. It gives us<br>
a usable initial value (most of the time), and requires less
storage footprint<br>
than a representation that supports null (`int` uses all 2^32 of
its bit<br>
patterns, so a nullable `int` would have to either make some 32
bit signed<br>
integers unrepresentable, or use a 33rd bit). This was a
reasonable tradeoff<br>
for the built-in primitives, and is also a reasonable tradeoff
for many (but not<br>
all) other potential value classes (such as complex numbers, 2D
points,<br>
half-floats, etc).<br>
<br>
But for other potential value classes, such as `LocalDate`,
there _is_ no<br>
reasonable default. If we choose to represent a date as the
number of days<br>
since some epoch, there will invariably be bugs that stem
from<br>
uninitialized dates; we've all been mistakenly told by computers
that something<br>
will happen on or near 1 January 1970. Even if we could choose
a default other<br>
than the zero representation, an uninitialized date is still
likely to be an<br>
error -- there simply is no good default date value. <br>
<br>
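As a small illustration in today's Java, assume a hypothetical date<br>
representation that stores days since the 1970 epoch in a `long` (the names<br>
`DefaultValueDemo` and `fromEpochDays` are made up for this sketch). An<br>
uninitialized array element silently becomes a plausible-looking date, whereas<br>
the reference-typed array at least defaults to `null`.<br>
<br>
<pre>
```java
import java.time.LocalDate;

class DefaultValueDemo {
    // Hypothetical representation: a date stored as days since 1970-01-01.
    static LocalDate fromEpochDays(long days) {
        return LocalDate.ofEpochDay(days);
    }

    public static void main(String[] args) {
        long[] epochDays = new long[1];        // primitive elements default to zero
        LocalDate[] dates = new LocalDate[1];  // reference elements default to null

        // The all-zero default decodes to a valid but almost certainly wrong date.
        System.out.println(fromEpochDays(epochDays[0]));  // 1970-01-01
        // The reference default makes the missing initialization visible.
        System.out.println(dates[0]);                     // null
    }
}
```
</pre>
<br>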
For this reason, value classes have the choice of encapsulating
or exposing<br>
their value companion type. If the class is willing to tolerate
an<br>
uninitialized (zero) value, it can freely share its `.val`
companion with the<br>
world; if uninitialized values are dangerous (such as for
`LocalDate`), it can<br>
be encapsulated to the class or package. <br>
<br>
Encapsulation is accomplished using ordinary access control. By
default, the<br>
value companion is `private`, and need not be declared
explicitly; a class that<br>
wishes to share its value companion can make it public:<br>
<br>
```<br>
public value record Complex(double real, double imag) { <br>
public value companion Complex.val;<br>
}<br>
```<br></font></font></div></blockquote><div><br></div><div>Elephant in the room: so I can name it something else?</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace">### Atomicity and tearing<br>
<br>
For the primitive types longer than 32 bits (long and double),
it is not<br>
guaranteed that reads and writes from different threads (without
suitable<br>
coordination) are atomic with respect to each other. The result
is that, if<br>
accessed under data race, a long or double field or array
element can be seen to<br>
"tear", and a read might see the low 32 bits of one write and
the high 32 bits<br>
of another. (Declaring the containing field `volatile` is
sufficient to restore<br>
atomicity, as is properly coordinating with locks or other
concurrency control,<br>
or not sharing across threads in the first place.)<br>
<br>
This was a pragmatic tradeoff given the hardware of the time;
the cost of 64-bit<br>
atomicity on 1995 hardware would have been prohibitive, and
problems only arise<br>
when the program already has data races -- and most numeric code
deals with<br>
thread-local data. Just like with the tradeoff of nulls vs
zeros, the design of<br>
the built-in primitives permits tearing as part of a tradeoff
between<br>
performance and correctness, where primitives chose "as fast as
possible" and<br>
reference types chose more safety.<br>
<br>
Today, most JVMs give us atomic loads and stores of 64-bit
primitives, because<br>
the hardware makes them cheap enough. But value classes bring
us back to<br>
1995; atomic loads and stores of larger-than-64-bit values are
still expensive<br>
on many CPUs, leaving us with a choice of "make operations on
primitives slower"<br>
or permitting tearing when accessed under race. <br>
<br>
It would not be wise for the language to select a
one-size-fits-all policy about<br>
tearing; choosing "no tearing" means that types like `Complex`
are slower than<br>
they need to be, even in a single-threaded program; choosing
"tearing" means<br>
that classes like `Range` can be seen to not exhibit invariants
asserted by<br>
their constructor. Class authors have to choose, with full
knowledge of their<br>
domain, whether their types can tolerate tearing. The default
is no tearing<br>
(safe by default); a class can opt for greater flattening at the
cost of<br>
potential tearing by declaring the value companion as
`non-atomic`:<br>
<br>
```<br>
public value record Complex(double real, double imag) { <br>
public non-atomic value companion Complex.val;<br>
}<br>
```<br>
<br>
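To make the `Range` concern concrete, here is a sketch that replays one unlucky<br>
interleaving by hand. It is single-threaded and deterministic, so it is a<br>
simulation rather than a real race; the names `TearingSketch`, `slotLo`,<br>
`slotHi`, and `tornRead` are hypothetical, and the two static fields merely<br>
stand in for a flattened, non-atomic variable. Note that a torn read can only<br>
arise when the variable is already being accessed under a data race.<br>
<br>
<pre>
```java
class TearingSketch {
    // Hypothetical class with a cross-field invariant: lo must not exceed hi.
    record Range(long lo, long hi) {
        Range {
            if (lo > hi) throw new IllegalArgumentException("lo > hi");
        }
    }

    // Stand-in for a flattened, non-atomic Range variable: two separate slots
    // that are stored one after the other rather than as a single unit.
    static long slotLo;
    static long slotHi;

    // Replays one unlucky interleaving by hand and returns what the reader saw.
    static long[] tornRead() {
        Range first = new Range(0, 10);
        Range second = new Range(20, 30);

        // The variable initially holds the first range.
        slotLo = first.lo();
        slotHi = first.hi();

        // A writer starts storing the second range and writes only its first field...
        slotLo = second.lo();
        // ...when a racing reader loads both fields.
        long seenLo = slotLo;
        long seenHi = slotHi;
        // The writer then completes its store.
        slotHi = second.hi();

        return new long[] { seenLo, seenHi };
    }

    public static void main(String[] args) {
        long[] seen = tornRead();
        System.out.println(seen[0] + ".." + seen[1]);  // 20..10
        System.out.println(seen[0] > seen[1]);         // true: the invariant is violated
    }
}
```
</pre>
<br>
The reader observes a combination that no constructor call ever produced. For<br>
a class like `Complex` the same mixing would just yield a different, still<br>
valid, complex number.<br>
<br>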
For classes like `Complex`, all of whose bit patterns are valid,
this is very<br>
much like the choice around `long` in 1995. For other classes
that might have<br>
nontrivial representational invariants, they likely want to
stick to the default<br>
of atomicity.</font></font></div></blockquote><div><br></div><div>I just think many readers are going to think "well of course I need to be safe from this terrible tearable thing" without realizing that this only even comes up when someone uses it in a wrong or risky way. An extra reminder of this might be helpful.</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace">## Migrating legacy primitives<br>
<br>
As part of generalizing primitives, we want to adjust the
built-in primitives to<br>
behave as consistently with value classes as possible. While we
can't change<br>
the fact that `int`'s reference companion is the oddly-named
`Integer`, we can give them<br>
more uniform aliases (`int.ref` is an alias for `Integer`; `int`
is an alias for<br>
`Integer.val`) -- so that we can use a consistent rule for
naming companions.<br>
Similarly, we can extend member access to the legacy primitives,
and allow<br>
`int[]` to be a subtype of `Integer[]` (and therefore of
`Object[]`).<br>
<br>
We will redeclare `Integer` as a value class with a public value
companion:<br>
<br>
```<br>
value class Integer { <br>
public value companion Integer.val;<br>
<br>
// existing methods<br>
}<br>
```<br>
<br>
where the type name `int` is an alias for `Integer.val`.</font></font></div></blockquote><div><br></div><div>Many people will wonder how that is going to work since the class currently contains `int value` which becomes circular. Do you want to mention the 2 candidate solutions for that we've discussed?</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace"> The
primitive array<br>
types will be retrofitted such that arrays of primitives are
subtypes of arrays<br>
of their boxes (`int[] <: Integer[]`). <br>
<br>
## Unifying primitives with classes<br>
<br>
Earlier, we had a chart of the differences between primitive and
reference<br>
types: <br>
<br>
| Primitives |
Objects |<br>
| ------------------------------------------ |
---------------------------------- |<br>
| No identity (pure values) |
Identity |<br>
| `==` compares values | `==` compares
object identity |<br>
| Built-in | Declared in
classes |<br>
| No members (fields, methods, constructors) | Members
(including mutable fields) |<br>
| No supertypes or subtypes | Class and
interface inheritance |<br>
| Accessed directly | Accessed via
object references |<br>
| Not nullable |
Nullable |<br>
| Default value is zero | Default value is
null |<br>
| Arrays are monomorphic | Arrays are
covariant |<br>
| May tear under race | Initialization
safety guarantees |<br>
| Have reference companions (boxes) | Don't need
reference companions |<br>
<br>
The addition of value classes addresses many of these directly.
Rather than<br>
saying "classes have identity, primitives do not", we make
identity an optional<br>
characteristic of classes (and derive equality semantics from
that.) Rather<br>
than primitives being built in, we derive all types, including
primitives, from<br>
classes, and endow value companion types with the members and
supertypes<br>
declared with the value class. Rather than having primitive
arrays be<br>
monomorphic, we make all arrays covariant under the `extends`
relation. <br>
<br>
The remaining differences now become differences between
reference types and<br>
value types:<br>
<br>
| Value types | Reference
types |<br>
| --------------------------------------------- |
-------------------------------- |<br>
| Accessed directly | Accessed via
object references |<br>
| Not nullable |
Nullable |<br>
| Default value is zero | Default value
is null |<br>
| May tear under race, if declared `non-atomic` | Initialization
safety guarantees |<br>
<br>
<br>
### Choosing which to use<br>
<br>
How would we choose between declaring an identity class or a
value class, and<br>
the various options on value companions? Here are some quick
rules of thumb: <br>
<br>
- If you need mutability, subclassing, or aliasing, choose an
identity class. <br>
- If uninitialized (zero) values are unacceptable, choose a
value class with <br>
the value companion encapsulated. <br>
- If you have no cross-field invariants and are willing to
tolerate tearing to<br>
enable more flattening, choose a value class with a
non-atomic value<br>
companion.<br>
<br>
## Summary<br>
<br>
Valhalla unifies, to the extent possible, primitives and
objects. The<br>
following table summarizes the transition from the current world
to Valhalla.<br>
<br>
| Current World |
Valhalla |<br>
| ------------------------------------------- |
--------------------------------------------------------- |<br>
| All objects have identity | Some objects
have identity |<br>
| Fixed, built-in set of primitives | Open-ended set
of primitives, declared via classes |<br></font></font></div></blockquote><div><br></div><div>Intentional use of "primitives" still? I would think it should say "value types no longer limited to just the built-in primitives".</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4"><font face="monospace">
| Primitives don't have methods or supertypes | Primitives are
classes, with methods and supertypes |<br>
| Primitives have ad-hoc boxes | Primitives have
regularized reference companions |<br>
| Boxes have accidental identity | Reference
companions have no identity |<br>
| Boxing and unboxing conversions | Primitive
reference and value conversions, but same rules |<br>
| Primitive arrays are monomorphic | All arrays are
covariant |<br>
<br>
<br>
[valuebased]:
<a href="https://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html" target="_blank">https://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html</a><br>
[growing]: <a href="https://dl.acm.org/doi/abs/10.1145/1176617.1176621" target="_blank">https://dl.acm.org/doi/abs/10.1145/1176617.1176621</a><br>
[jep390]: <a href="https://openjdk.java.net/jeps/390" target="_blank">https://openjdk.java.net/jeps/390</a><br>
<br>
</font></font>
</div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div style="line-height:1.5em;padding-top:10px;margin-top:10px;color:rgb(85,85,85);font-family:sans-serif"><span style="border-width:2px 0px 0px;border-style:solid;border-color:rgb(213,15,37);padding-top:2px;margin-top:2px">Kevin Bourrillion |</span><span style="border-width:2px 0px 0px;border-style:solid;border-color:rgb(51,105,232);padding-top:2px;margin-top:2px"> Java Librarian |</span><span style="border-width:2px 0px 0px;border-style:solid;border-color:rgb(0,153,57);padding-top:2px;margin-top:2px"> Google, Inc. |</span><span style="border-width:2px 0px 0px;border-style:solid;border-color:rgb(238,178,17);padding-top:2px;margin-top:2px"> <a href="mailto:kevinb@google.com" target="_blank">kevinb@google.com</a></span></div></div></div></div></div></div></div></div>