<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<font size="4"><font face="monospace">Yet another attempt at
updating SoV to reflect the current thinking. Please review.<br>
<br>
<br>
# State of Valhalla<br>
## Part 2: The Language Model {.subtitle}<br>
<br>
#### Brian Goetz {.author}<br>
#### July 2022 {.date}<br>
<br>
> _This is the second of three documents describing the
current State of<br>
Valhalla. The first is [The Road to Valhalla](01-background);
the<br>
third is [The JVM Model](03-vm-model)._<br>
<br>
This document describes the directions for the Java _language_
charted by<br>
Project Valhalla. (In this document, we use "currently" to
describe the<br>
language as it stands today, without value classes.)<br>
<br>
Valhalla started with the goal of providing user-programmable
classes which can<br>
be flat and dense in memory. Numerics are one of the motivating
use cases;<br>
adding new primitive types directly to the language has a very
high barrier. As<br>
we learned from [Growing a Language][growing] there are
infinitely many numeric<br>
types we might want to add to Java, but the proper way to do
that is via<br>
libraries, not as a language feature.<br>
<br>
## Primitives and objects today<br>
<br>
Java currently has eight built-in primitive types. Primitives
represent pure<br>
_values_; any `int` value of "3" is equivalent to, and
indistinguishable from,<br>
any other `int` value of "3". Because primitives are "just
their bits" with no<br>
ancillary state such as object identity, they are _freely
copyable_; whether<br>
there is one copy of the `int` value "3", or millions, doesn't
matter to the<br>
execution of the program. With the exception of the unusual
treatment of exotic<br>
floating point values such as `NaN`, the `==` operator on
primitives performs a<br>
_substitutability test_ -- it asks "are these two values the
same value".<br>
<br>
Java also has _objects_, and each object has a unique _object
identity_. This<br>
means that each object must live in exactly one place (at any
given time), and<br>
this has consequences for how the JVM lays out objects in
memory. Objects in<br>
Java are not manipulated or accessed directly, but instead
through _object<br>
references_. Object references are also a kind of value -- they
encode the<br>
identity of the object to which they refer, and the `==`
operator on object<br>
references also performs a substitutability test, asking "do
these two<br>
references refer to the same object." Accordingly, object
_references_ (like<br>
other values) can be freely copied, but the objects they refer
to cannot. <br>
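<br>
As a rough illustration of these two substitutability tests in today's Java, the<br>
following sketch compares primitive values and object references with `==`. The<br>
class and method names are hypothetical, chosen only for this example.<br>
<br>
```java
// Sketch only: EqualityToday and its methods are illustrative names.
class EqualityToday {
    // Primitives: == asks "are these the same value?"
    static boolean sameIntValue() {
        int a = 3;
        int b = 3;
        return a == b;
    }

    // References: == asks "do these refer to the same object?"
    static boolean distinctObjects() {
        Object x = new Object();
        Object y = new Object();
        return x == y;            // two objects, two identities
    }

    // Copying a reference copies the reference, not the object it refers to.
    static boolean copiedReference() {
        Object x = new Object();
        Object copy = x;
        return x == copy;
    }
}
```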
<br>
This dichotomy -- that the universe of values consists of
primitives and object<br>
references -- has long been at the core of Java's design. JVMS
2.2 (Data Types)<br>
opens with:<br>
<br>
> There are two kinds of values that can be stored in
variables, passed as<br>
> arguments, returned by methods, and operated upon:
primitive values and<br>
> reference values.<br>
<br>
Primitives and objects currently differ in almost every
conceivable way:<br>
<br>
| Primitives                                 | Objects                            |<br>
| ------------------------------------------ | ---------------------------------- |<br>
| No identity (pure values)                  | Identity                           |<br>
| `==` compares values                       | `==` compares object identity      |<br>
| Built-in                                   | Declared in classes                |<br>
| No members (fields, methods, constructors) | Members (including mutable fields) |<br>
| No supertypes or subtypes                  | Class and interface inheritance    |<br>
| Accessed directly                          | Accessed via object references     |<br>
| Not nullable                               | Nullable                           |<br>
| Default value is zero                      | Default value is null              |<br>
| Arrays are monomorphic                     | Arrays are covariant               |<br>
| May tear under race                        | Initialization safety guarantees   |<br>
| Have reference companions (boxes)          | Don't need reference companions    |<br>
<br>
Primitives embody a number of tradeoffs aimed at maximizing the
performance and<br>
usability of the primitive types. Reference types default to
`null`, meaning<br>
"referring to no object", and must be initialized before use;
primitives default<br>
to a usable zero value (which for most primitives is the
additive identity) and<br>
therefore may be used without initialization. (If primitives
were nullable like<br>
references, not only would this be less convenient in many
situations, but they<br>
would likely consume additional memory footprint to accommodate
the possibility<br>
of nullity, as most primitives already use all their bit
patterns.) Similarly,<br>
reference types provide initialization safety guarantees for
final fields even<br>
under a certain category of data races (this is where we get the
"immutable<br>
objects are always thread-safe" rule from); primitives allow
tearing under race<br>
for larger-than-32-bit values. We could characterize the design
principles<br>
behind these tradeoffs as "make objects safer, make primitives
faster."<br>
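<br>
A minimal sketch of these defaults in today's Java, using a hypothetical class<br>
`DefaultsToday`. It also counts the distinct `int` values, to show that no bit<br>
pattern is left over that could encode null.<br>
<br>
```java
// Sketch only: DefaultsToday is an illustrative name.
class DefaultsToday {
    int primitiveField;       // never assigned: defaults to 0
    Integer referenceField;   // never assigned: defaults to null

    // Number of distinct int values; all 2^32 bit patterns are in use.
    static long intValueCount() {
        return (long) Integer.MAX_VALUE - (long) Integer.MIN_VALUE + 1;
    }
}
```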
<br>
The following figure illustrates the current universe of Java's
types. The<br>
upper left quadrant is the built-in primitives; the rest of the
space is<br>
reference types. In the upper-right, we have the abstract
reference types --<br>
abstract classes, interfaces, and `Object` (which, though
concrete, acts more<br>
like an interface than a concrete class). The built-in
primitives have wrappers<br>
or boxes, which are reference types.<br>
<br>
<figure><br>
<a href="field-type-zoo.pdf" title="Click for PDF"><br>
<img src="field-type-zoo-old.png" alt="Current universe
of Java field types"/><br>
</a><br>
</figure><br>
<br>
Valhalla aims to unify primitives and objects such that both are
declared with<br>
classes, but maintains the special runtime characteristics --
flatness and<br>
density -- that primitives currently enjoy. <br>
<br>
### Primitives and boxes today<br>
<br>
The built-in primitives are best understood as _pairs_ of types:
the primitive<br>
type (`int`) and its reference companion type (`Integer`), with
built-in<br>
conversions between the two. The two types have different
characteristics that<br>
make each more or less appropriate for a given situation.
Primitives are<br>
optimized for efficient storage and access: they are
monomorphic, not nullable,<br>
tolerate uninitialized (zero) values, and larger primitive types
(`long`,<br>
`double`) may tear under racy access. The box types add back
the affordances of<br>
references -- nullity, polymorphism, interoperation with
generics, and<br>
initialization safety -- but at a cost. <br>
<br>
Valhalla generalizes this primitive-box relationship, in a way
that is more<br>
regular and extensible and reduces the "boxing tax".<br>
<br>
## Eliminating unwanted object identity<br>
<br>
Many impediments to optimization stem from _unwanted object
identity_. For many<br>
classes, not only is identity not directly useful, it can be a
source of bugs.<br>
For example, due to caching, `Integer` can be accidentally
compared correctly<br>
with `==` just often enough that people keep doing it.
Similarly, [value-based<br>
classes][valuebased] such as `Optional` have no need for
identity, but pay the<br>
costs of having identity anyway. <br>
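<br>
The caching pitfall can be sketched as follows (hypothetical names). The language<br>
guarantees shared boxes only for small values (-128 to 127); outside that range the<br>
result of `==` depends on the JVM's cache configuration, so the sketch makes no<br>
claim about it.<br>
<br>
```java
// Sketch only: IntegerCachePitfall is an illustrative name.
class IntegerCachePitfall {
    // Boxing a constant in -128..127 always yields the same object,
    // so == happens to give the "right" answer here.
    static boolean smallBoxesIdentical() {
        Integer a = 127;
        Integer b = 127;
        return a == b;
    }

    // Outside the guaranteed range, == compares identities; the result
    // is implementation-dependent and should not be relied upon.
    static boolean largeBoxesIdentical() {
        Integer a = 1000;
        Integer b = 1000;
        return a == b;
    }

    // equals compares the numeric value and is the reliable test today.
    static boolean largeBoxesEqual() {
        Integer a = 1000;
        Integer b = 1000;
        return a.equals(b);
    }
}
```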
<br>
Valhalla allows classes to explicitly disavow identity by
declaring them as<br>
_value classes_. The instances of a value class are called
_value objects_. <br>
<br>
```<br>
value class Point implements Serializable {<br>
int x;<br>
int y;<br>
<br>
Point(int x, int y) { <br>
this.x = x;<br>
this.y = y;<br>
}<br>
<br>
Point scale(int s) { <br>
return new Point(s*x, s*y);<br>
}<br>
}<br>
```<br>
<br>
This says that a `Point` is a class whose instances have no
identity. As a<br>
consequence, it must give up the things that depend on identity;
the class and<br>
its fields are implicitly final. Additionally, operations that
depended on<br>
identity must either be adjusted (`==` on value objects compares
state, not<br>
identity) or disallowed (it is illegal to lock on a value
object.)<br>
<br>
Value classes can still have most of the affordances of classes
-- fields,<br>
methods, constructors, type parameters, superclasses (with some
restrictions),<br>
nested classes, class literals, interfaces, etc. The classes
they can extend<br>
are restricted: `Object` or abstract classes with no instance
fields, empty<br>
no-arg constructor bodies, no other constructors, no instance
initializers, no<br>
synchronized methods, and whose superclasses all meet this same
set of<br>
conditions. (`Number` is an example of such an abstract class.)<br>
<br>
Because `Point` has value semantics, `==` compares by state
rather than<br>
identity. This means that value objects, like primitives, are
_freely<br>
copyable_; we can explode them into their fields and
re-aggregate them into<br>
another value object, and we cannot tell the difference. <br>
<br>
So far we've addressed the first two lines in our table of
differences; rather<br>
than all objects having identity, classes can opt into, or out
of, object<br>
identity for their instances. By allowing classes to exclude
unwanted identity,<br>
we free the runtime to make better layout and compilation
decisions.<br>
<br>
### Example: immutable cursors<br>
<br>
Collections today use `Iterator` to facilitate traversal through
the collection;<br>
iterators store iteration state in mutable fields. While heroic
optimizations such<br>
as _escape analysis_ can sometimes eliminate the cost associated
with iterators,<br>
such optimizations are fragile and hard to rely on. Value
objects offer an<br>
iteration approach that is more reliably optimized: immutable
cursors. (Without<br>
value objects, immutable cursors would be prohibitively
expensive for<br>
iteration.)<br>
<br>
```<br>
value class ArrayCursor<T> { <br>
T[] array;<br>
int offset;<br>
<br>
public ArrayCursor(T[] array, int offset) { <br>
this.array = array;<br>
this.offset = offset;<br>
}<br>
<br>
public ArrayCursor(T[] array) { <br>
this(array, 0);<br>
}<br>
<br>
public boolean hasNext() { <br>
return offset < array.length;<br>
}<br>
<br>
public T next() { <br>
return array[offset];<br>
}<br>
<br>
public ArrayCursor<T> advance() { <br>
return new ArrayCursor<>(array, offset+1);<br>
}<br>
}<br>
```<br>
<br>
In looking at this code, we might mistakenly assume it will be
inefficient, as<br>
each loop iteration appears to allocate a new cursor:<br>
<br>
```<br>
for (ArrayCursor<T> c = new ArrayCursor<>(array); <br>
c.hasNext(); <br>
c = c.advance()) {<br>
// use c.next();<br>
}<br>
```<br>
<br>
In reality, we should expect that _no_ cursors are actually
allocated here. An<br>
`ArrayCursor` is just its two fields, and the runtime is free to
scalarize the<br>
object into its fields and hoist them into registers. The
calling convention<br>
for `advance` is optimized so that both receiver and return
value are<br>
scalarized. Even without inlining `advance`, no allocation will
take place,<br>
just some shuffling of the values in registers. And if
`advance` is inlined,<br>
the client code will compile down to having a single register
increment and<br>
compare in the loop header. <br>
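<br>
For readers who want to run the loop on a current JDK, here is a sketch that<br>
approximates the cursor with an ordinary final class specialized to `int[]` (the<br>
names `IntArrayCursor` and `CursorDemo` are ours). Because this is an identity<br>
class, the no-allocation expectation described above does not apply to it; avoiding<br>
allocation here would depend on escape analysis.<br>
<br>
```java
// Sketch only: an identity-class approximation of the cursor idiom.
final class IntArrayCursor {
    private final int[] array;
    private final int offset;

    IntArrayCursor(int[] array, int offset) {
        this.array = array;
        this.offset = offset;
    }

    IntArrayCursor(int[] array) {
        this(array, 0);
    }

    boolean hasNext() { return offset < array.length; }

    int next() { return array[offset]; }

    // Returns a new cursor rather than mutating this one.
    IntArrayCursor advance() { return new IntArrayCursor(array, offset + 1); }
}

class CursorDemo {
    static int sum(int[] values) {
        int total = 0;
        for (IntArrayCursor c = new IntArrayCursor(values); c.hasNext(); c = c.advance()) {
            total += c.next();
        }
        return total;
    }
}
```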
<br>
### Migration<br>
<br>
The JDK (as well as other libraries) has many [value-based
classes][valuebased]<br>
such as `Optional` and `LocalDateTime`. Value-based classes
adhere to the<br>
semantic restrictions of value classes, but are still identity
classes -- even<br>
though they don't want to be. Value-based classes can be
migrated to true value<br>
classes simply by redeclaring them as value classes, which is
both source- and<br>
binary-compatible. <br>
<br>
We plan to migrate many value-based classes in the JDK to value
classes.<br>
Additionally, the primitive wrappers can be migrated to value
classes as well,<br>
making the conversion between `int` and `Integer` cheaper; see
"Migrating the<br>
legacy primitives" below. (In some cases, this may be
_behaviorally_<br>
incompatible for code that synchronizes on the primitive
wrappers. [JEP<br>
390][jep390] has supported both compile-time and runtime
warnings for<br>
synchronizing on primitive wrappers since Java 16.) <br>
<br>
<figure><br>
<a href="field-type-zoo.pdf" title="Click for PDF"><br>
<img src="field-type-zoo-mid.png" alt="Java field types
adding value classes"/><br>
</a><br>
</figure><br>
<br>
### Identity-sensitive operations<br>
<br>
Certain operations are currently defined in terms of object
identity. As we've<br>
already seen, some of these, like equality, can be sensibly
extended to cover<br>
all instances. Others, like synchronization, will become
partial. <br>
Identity-sensitive operations include:<br>
<br>
- **Equality.** We extend `==` on references to include
references to value<br>
objects. Where it currently has a meaning, the new
definition coincides<br>
with that meaning.<br>
<br>
- **System::identityHashCode.** The main use of
`identityHashCode` is in the<br>
implementation of data structures such as
`IdentityHashMap`. We can extend<br>
`identityHashCode` in the same way we extend equality --
deriving a hash on<br>
value objects from the hash of all the fields.<br>
<br>
- **Synchronization.** This becomes a partial operation. If
we can<br>
statically detect that a synchronization will fail at
runtime (including<br>
declaring a `synchronized` method in a value class), we can
issue a<br>
compilation error; if not, attempts to lock on a value
object result in<br>
`IllegalMonitorStateException`. This is justifiable because
it is<br>
intrinsically imprudent to lock on an object for which you
do not have a<br>
clear understanding of its locking protocol; locking on an
arbitrary<br>
`Object` or interface instance is doing exactly that.<br>
<br>
- **Weak, soft, and phantom references.** Capturing an exotic
reference to a<br>
value object becomes a partial operation, as these are
intrinsically tied to<br>
reachability (and hence to identity). However, we will
likely make<br>
enhancements to `WeakHashMap` to support mixed identity and
value keys. <br>
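<br>
To make identity-sensitivity concrete, this sketch (hypothetical names) shows how<br>
`IdentityHashMap` and `HashMap` treat two distinct but equal keys today. `String`<br>
is used because `new String(...)` is certain to produce distinct identity objects.<br>
<br>
```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Sketch only: IdentityVersusState is an illustrative name.
class IdentityVersusState {
    // Returns { HashMap size, IdentityHashMap size } after inserting two equal keys.
    static int[] mapSizes() {
        String k1 = new String("key");
        String k2 = new String("key");   // equal contents, different identity

        Map<String, Integer> byEquals = new HashMap<>();
        byEquals.put(k1, 1);
        byEquals.put(k2, 2);             // same key by equals(): replaces the entry

        Map<String, Integer> byIdentity = new IdentityHashMap<>();
        byIdentity.put(k1, 1);
        byIdentity.put(k2, 2);           // different key by ==: adds a second entry

        return new int[] { byEquals.size(), byIdentity.size() };
    }
}
```
<br>
For keys that are value objects, `==` and `identityHashCode` would be derived from<br>
state as described above, so the two maps would be expected to agree.<br>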
<br>
### Value classes and records<br>
<br>
While records have a lot in common with value classes -- they
are final and<br>
their fields are final -- they are still identity classes.
Records embody a<br>
tradeoff: give up on decoupling the API from the representation,
and in return<br>
get various syntactic and semantic benefits. Value classes
embody another<br>
tradeoff: give up identity, and get various semantic and
performance benefits.<br>
If we are willing to give up both, we can get both sets of
benefits, by<br>
declaring a _value record_. <br>
<br>
```<br>
value record NameAndScore(String name, int score) { }<br>
```<br>
<br>
Value records combine the data-carrier idiom of records with the
improved <br>
scalarization and flattening benefits of value classes. <br>
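<br>
For comparison, here is how an ordinary (identity) record behaves on a current JDK.<br>
The record `PlainNameAndScore` is a hypothetical stand-in declared without `value`,<br>
so that the sketch compiles today.<br>
<br>
```java
// Sketch only: a plain record, which is still an identity class.
record PlainNameAndScore(String name, int score) { }

class RecordIdentityDemo {
    // Records already compare by state in equals().
    static boolean sameState() {
        return new PlainNameAndScore("a", 1).equals(new PlainNameAndScore("a", 1));
    }

    // But each `new` creates a distinct identity, so == is false.
    static boolean sameIdentity() {
        PlainNameAndScore x = new PlainNameAndScore("a", 1);
        PlainNameAndScore y = new PlainNameAndScore("a", 1);
        return x == y;
    }
}
```
<br>
Declared as a value record, the two instances would have no identity to tell them<br>
apart, and `==` would compare their state instead.<br>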
<br>
In theory, it would be possible to apply `value` to certain
enums as well, but<br>
this is not currently possible because the `java.lang.Enum` base
class that<br>
enums extend does not meet the requirements for superclasses of
value classes (it<br>
has fields and non-empty constructors).<br>
<br>
### Value and reference companion types<br>
<br>
Value classes are generalizations of primitives. Since
primitives have a<br>
reference companion type, value classes actually give rise to
_pairs_ of types:<br>
a value type and a reference type. We've seen the reference
type already; for<br>
the value class `ArrayCursor`, the reference type is called
`ArrayCursor`, just<br>
as with identity classes. The full name for the reference type
is<br>
`ArrayCursor.ref`; `ArrayCursor` is just a convenient alias for
that. (This<br>
aliasing is what allows value-based classes to be compatibly
migrated to value<br>
classes.) The value type is called `ArrayCursor.val`, and the
two types have the<br>
same conversions between them as primitives do today with their
boxes. The<br>
default value of the value type is the one for which all fields
take on their<br>
default value; the default value of the reference type is, like
all reference<br>
types, null. We will refer to the value type of a value class
as the _value<br>
companion type_.<br>
<br>
Just as with today's primitives and their boxes, the reference
and value<br>
companion types of a value class differ in their support for
nullity,<br>
polymorphism, treatment of uninitialized variables, and safety
guarantees under<br>
race. Value companion types, like primitive types, are
monomorphic,<br>
non-nullable, tolerate uninitialized (zero) values, and (under
some<br>
circumstances) may tear under racy access. Reference types are
polymorphic,<br>
nullable, and offer the initialization safety guarantees for
final fields that<br>
we have come to expect from identity objects. <br>
<br>
Unlike with today's primitives, the "boxing" and "unboxing"
conversions between<br>
the reference and value companion types are not nearly as heavy
or wasteful,<br>
because of the lack of identity. A variable of type `Point.val`
holds a "bare"<br>
value object; a variable of type `Point.ref` holds a _reference
to_ a value<br>
object. For many use cases, the reference type will offer good
enough<br>
performance; in some cases, it may be desirable to additionally
give up the<br>
affordances of reference-ness to make further flatness and
footprint gains. See<br>
[Performance Model](05-performance-model) for more details on
the specific<br>
tradeoffs.<br>
<br>
In our diagram, these new types show up as another entity that
straddles the<br>
line between primitives and identity-free references, alongside
the legacy<br>
primitives: <br>
<br>
** UPDATE DIAGRAM **<br>
<br>
<figure><br>
<a href="field-type-zoo.pdf" title="Click for PDF"><br>
<img src="field-type-zoo-new.png" alt="Java field types
with extended primitives"/><br>
</a><br>
</figure><br>
<br>
### Member access<br>
<br>
Both the reference and value companion types have the same
members. Unlike<br>
today's primitives, value companion types can be used as
receivers to access<br>
fields and invoke methods (subject to the usual accessibility
constraints): <br>
<br>
```<br>
Point.val p = new Point(1, 2);<br>
assert p.x == 1;<br>
<br>
p = p.scale(2);<br>
assert p.x == 2;<br>
```<br>
<br>
### Polymorphism<br>
<br>
An identity class `C` that extends `D` sets up a subtyping
(is-a) relationship<br>
between `C` and `D`. For value classes, the same thing happens
between its<br>
_reference type_ and the declared supertypes. (Reference types
are<br>
polymorphic; value types are not.) This means that if we
declare:<br>
<br>
```<br>
value class UnsignedShort extends Number <br>
implements
Comparable<UnsignedShort> { <br>
...<br>
}<br>
```<br>
<br>
then `UnsignedShort` is a subtype of `Number` and
`Comparable<UnsignedShort>`,<br>
and we can ask questions about subtyping using `instanceof` or
pattern matching.<br>
What happens if we ask such a question of the value companion
type?<br>
<br>
```<br>
UnsignedShort.val us = ...<br>
if (us instanceof Number) { ... }<br>
```<br>
<br>
Since subtyping is defined only on reference types, the
`instanceof` operator<br>
(and corresponding type patterns) will behave as if both sides
were lifted to<br>
the appropriate reference type (boxed), and then we can appeal
to subtyping.<br>
(This may trigger fears of expensive boxing conversions, but in
reality no<br>
actual allocation will happen.)<br>
<br>
We introduce a new relationship between types based on `extends`
/ `implements`<br>
clauses, which we'll call "extends": we define `A extends B` as
meaning `A <: B`<br>
when A is a reference type, and `A.ref <: B` when A is a
value companion type.<br>
The `instanceof` relation, reflection, and pattern matching are
updated to use<br>
"extends".<br>
<br>
### Array covariance<br>
<br>
Arrays of reference types are _covariant_; this means that if `A
<: B`, then<br>
`A[] <: B[]`. This allows `Object[]` to be the "top array
type" -- but only for<br>
arrays of references. Arrays of primitives are currently left
out of this<br>
story. We unify the treatment of arrays by defining array
covariance over the<br>
new "extends" relationship; if A _extends_ B, then `A[] <:
B[]`. This means<br>
that for a value class P, `P.val[] <: P.ref[] <:
Object[]`; when we migrate the<br>
primitive types to be value classes, then `Object[]` is finally
the top type for<br>
all arrays. (When the built-in primitives are migrated to value
classes, this<br>
means `int[] <: Integer[] <: Object[]` too.)<br>
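<br>
The current state of affairs, where primitive arrays are left out of covariance,<br>
can be observed with a short sketch (hypothetical names). It also shows the runtime<br>
store check that covariant reference arrays carry.<br>
<br>
```java
// Sketch only: ArrayCovarianceToday is an illustrative name.
class ArrayCovarianceToday {
    // Integer[] <: Object[] holds today.
    static boolean integerArrayIsObjectArray() {
        Object arr = new Integer[1];
        return arr instanceof Object[];
    }

    // int[] is not an Object[] today.
    static boolean intArrayIsObjectArray() {
        Object arr = new int[1];
        return arr instanceof Object[];
    }

    // Covariance is paid for with a runtime check on stores.
    static boolean storeCheckFails() {
        Object[] objects = new Integer[1];
        try {
            objects[0] = "not an Integer";
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }
}
```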
<br>
### Equality<br>
<br>
For values, as with primitives, `==` compares by state rather
than by identity.<br>
Two value objects are `==` if they are of the same type and
their fields are<br>
pairwise equal, where equality is defined by `==` for primitives
(except `float`<br>
and `double`, which are compared with `Float::equals` and
`Double::equals` to<br>
avoid anomalies), `==` for references to identity objects, and
recursively with<br>
`==` for references to value objects. In no case is a value
object ever `==` to<br>
an identity object.<br>
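<br>
The anomalies in question can be observed today by contrasting primitive `==` with<br>
the wrapper `equals` methods. The sketch below (hypothetical names) also shows that<br>
ordinary records already compare `double` components in the wrapper style.<br>
<br>
```java
// Sketch only: Sample and FloatingPointEquality are illustrative names.
record Sample(double value) { }

class FloatingPointEquality {
    // Primitive ==: NaN is not equal to itself.
    static boolean nanPrimitive() {
        double n = Double.NaN;
        return n == n;
    }

    // Double::equals: NaN equals NaN.
    static boolean nanBoxed() {
        return Double.valueOf(Double.NaN).equals(Double.NaN);
    }

    // Primitive ==: positive and negative zero are equal.
    static boolean zerosPrimitive() {
        return 0.0 == -0.0;
    }

    // Double::equals: the two zeros have different bits and are distinguished.
    static boolean zerosBoxed() {
        return Double.valueOf(0.0).equals(-0.0);
    }

    // Record equality on a double component follows the wrapper semantics.
    static boolean nanRecords() {
        return new Sample(Double.NaN).equals(new Sample(Double.NaN));
    }
}
```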
<br>
When comparing two object _references_ with `==`, they are equal
if they are<br>
both null, or if they are both references to the same identity
object, or they<br>
are both references to value objects that are `==`. (When
comparing a value<br>
type with a reference type, we treat this as if we convert the
value to a<br>
reference, and proceed as per comparing references.) This means
that the<br>
following will succeed: <br>
<br>
```<br>
Point.val p = new Point(3, 4);<br>
Point pr = p;<br>
assert p == pr;<br>
```<br>
<br>
The base implementation of `Object::equals` delegates to `==`,
which is a<br>
suitable default for both identity and value classes. <br>
<br>
### Serialization<br>
<br>
If a value class implements `Serializable`, this is also really
a statement<br>
about the reference type. Just as with other aspects described
here,<br>
serialization of value companions can be defined by converting
to the<br>
corresponding reference type and serializing that, and reversing
the process at<br>
deserialization time.<br>
<br>
Serialization currently uses object identity to preserve the
topology of an<br>
object graph. This generalizes cleanly to objects without
identity, because<br>
`==` on value objects treats two identical copies of a value
object as equal. <br>
So any observations we make about graph topology prior to
serialization with<br>
`==` are consistent with those after deserialization.<br>
<br>
## Refining the value companion<br>
<br>
Value classes have several options for refining the behavior of
the value<br>
companion type and how it is exposed to clients.<br>
<br>
### Classes with no good default value<br>
<br>
For a value class `C`, the default value of `C.ref` is the same
as any other<br>
reference type: `null`. For the value companion type `C.val`,
the default value<br>
is the one where all of its fields are initialized to their
default value (0 for<br>
numbers, false for boolean, null for references.)<br>
<br>
The built-in primitives reflect the design assumption that zero
is a reasonable<br>
default. The choice to use a zero default for uninitialized
variables was one<br>
of the central tradeoffs in the design of the built-in
primitives. It gives us<br>
a usable initial value (most of the time), and requires less
storage footprint<br>
than a representation that supports null (`int` uses all 2^32 of
its bit<br>
patterns, so a nullable `int` would have to either make some 32
bit signed<br>
integers unrepresentable, or use a 33rd bit). This was a
reasonable tradeoff<br>
for the built-in primitives, and is also a reasonable tradeoff
for many other<br>
potential value classes (such as complex numbers, 2D points,
half-floats, etc).<br>
<br>
But for other potential value classes, such as `LocalDate`,
there simply _is_ no<br>
reasonable default. If we choose to represent a date as the
number of days<br>
since some epoch, there will invariably be bugs that stem
from<br>
uninitialized dates; we've all been mistakenly told by computers
that something<br>
that never happened actually happened on or near 1 January
1970. Even if we<br>
could choose a default other than the zero representation, an<br>
uninitialized date is still likely to be an error -- there
simply is no good<br>
default date value. <br>
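<br>
As a small illustration, assuming the hypothetical days-since-epoch representation<br>
mentioned above, this is the date that an all-zero value would decode to. The class<br>
name `ZeroDate` is ours.<br>
<br>
```java
import java.time.LocalDate;

// Sketch only: shows what a zero "days since epoch" count means as a date.
class ZeroDate {
    static LocalDate zeroDate() {
        return LocalDate.ofEpochDay(0);
    }
}
```
<br>
`ZeroDate.zeroDate()` is 1 January 1970, a perfectly valid-looking date that almost<br>
certainly is not what an uninitialized variable was meant to hold.<br>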
<br>
For this reason, value classes have the choice of
_encapsulating_ their value<br>
companion type. If the class is willing to tolerate an
uninitialized (zero)<br>
value, it can freely share its `.val` companion with the world;
if uninitialized<br>
values are dangerous (such as for `LocalDate`), the value
companion can be<br>
encapsulated to the class or package, and clients can use the
reference<br>
companion. Encapsulation is accomplished using ordinary access
control. By<br>
default, the value companion is `private` to the value class (it
need not be<br>
declared explicitly); a class that wishes to share its value
companion more<br>
broadly can do so by declaring it explicitly:<br>
<br>
```<br>
public value record Complex(double real, double imag) { <br>
public value companion Complex.val;<br>
}<br>
```<br>
<br>
### Atomicity and tearing<br>
<br>
For the primitive types longer than 32 bits (long and double),
it was always<br>
possible that reads and writes from different threads (without
suitable<br>
coordination) were not atomic with respect to each other. This
means that, if<br>
accessed under data race, a long or double field or array
element could be seen<br>
to "tear", where a read sees the low 32 bits of one write and
the high 32 bits<br>
of another. (Declaring the containing field `volatile` is
sufficient to restore<br>
atomicity, as is properly coordinating with locks or other
concurrency control,<br>
or not sharing across threads in the first place.)<br>
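<br>
A real tear only shows up under an actual data race and cannot be reproduced on<br>
demand, so the following sketch merely _simulates_ one with bit arithmetic: it<br>
combines the high half of one written value with the low half of another. The<br>
class and method names are hypothetical.<br>
<br>
```java
// Sketch only: a deterministic simulation, not an actual data race.
class TornLongSimulation {
    // The value a torn read could observe: high 32 bits from one write,
    // low 32 bits from another.
    static long tornRead(long highFrom, long lowFrom) {
        return (highFrom & 0xFFFFFFFF00000000L) | (lowFrom & 0x00000000FFFFFFFFL);
    }
}
```
<br>
If one thread writes `0L` and another writes `-1L`, then `tornRead(0L, -1L)` yields<br>
`0x00000000FFFFFFFFL`, a value that neither thread ever wrote.<br>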
<br>
This was a pragmatic tradeoff given the hardware of the time;
the cost of 64-bit<br>
atomicity on 1995 hardware would have been prohibitive, and
problems only arise<br>
when the program already has data races -- and most numeric code
deals entirely<br>
with thread-local data. Just like with the tradeoff of nulls vs
zeros, the<br>
design of the built-in primitives permits tearing as part of a
tradeoff between<br>
performance and correctness, where we chose "as fast as
possible" for<br>
primitives, and more safety for reference types.<br>
<br>
Today, most JVMs give us atomic loads and stores of 64-bit
primitives, because<br>
the hardware already makes them cheap enough. But value classes
bring us back<br>
to 1995; atomic loads and stores of larger-than-64-bit values
are still<br>
expensive on many CPUs, leaving us with a choice of "make
operations on value<br>
types slower" or permitting tearing when accessed under race. <br>
<br>
It would not be wise for the language to select a
one-size-fits-all policy about<br>
tearing; choosing "no tearing" means that types like `Complex`
are slower than<br>
they need to be, even in a single-threaded program; choosing
"tearing" means<br>
that classes like `Range` can be seen to not exhibit invariants
asserted by<br>
their constructor. Class authors can choose, with full
knowledge of their<br>
domain, whether their types can tolerate tearing. The default
is no tearing<br>
(following the principle of "safe by default"); a class can opt
for greater<br>
flattening (at the cost of potential tearing) by declaring the
value companion<br>
as `non-atomic`:<br>
<br>
```<br>
public value record Complex(double real, double imag) { <br>
public non-atomic value companion Complex.val;<br>
}<br>
```<br>
<br>
For classes like `Complex`, all of whose bit patterns are valid,
this is very<br>
much like the choice around `long` in 1995. For other classes
that might have<br>
nontrivial representational invariants -- specifically,
invariants that relate<br>
multiple fields, such as ensuring that a range goes from low to
high -- they<br>
likely want to stick to the default of atomicity. <br>
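<br>
To see why cross-field invariants matter, consider this simulation of a torn read<br>
of a range. `SimpleRange` is a hypothetical stand-in for the `Range` class mentioned<br>
above, and the "tear" is staged by hand rather than produced by a real race.<br>
<br>
```java
// Sketch only: SimpleRange enforces lo <= hi in its constructor.
final class SimpleRange {
    final int lo;
    final int hi;

    SimpleRange(int lo, int hi) {
        if (lo > hi) throw new IllegalArgumentException("lo > hi");
        this.lo = lo;
        this.hi = hi;
    }
}

class TornRangeSimulation {
    // Simulates a reader that sees `lo` from the later write and `hi`
    // from the earlier one, and reports whether the invariant is violated.
    static boolean tornValueBreaksInvariant(SimpleRange earlier, SimpleRange later) {
        int observedLo = later.lo;
        int observedHi = earlier.hi;
        return observedLo > observedHi;
    }
}
```
<br>
Both (1, 2) and (5, 6) are valid ranges, yet the mixed observation (5, 2) is not,<br>
even though no constructor ever produced it.<br>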
<br>
## Do we really need two types?<br>
<br>
It is sensible to ask: why do we need companion types at all?
This is analogous<br>
to the need for boxes in 1995: we'd made one set of tradeoffs
for primitives<br>
favoring performance (monomorphic, non-nullable, zero-default,
tolerant of<br>
non-initialization, tolerant of tearing under race, unrelated to
`Object`), and<br>
another for references, favoring flexibility and safety. Most
of the time, we<br>
ignored the primitive wrapper classes, but sometimes we needed
to temporarily<br>
suppress one of these properties, such as when interoperating
with code that<br>
expects an `Object` or the ability to express "no value". The
reasons we needed<br>
boxes in 1995 still apply today: sometimes we need the
affordances of<br>
references, and in those cases, we appeal to the reference
companion. <br>
<br>
Reasons we might want to use the reference companion include: <br>
<br>
- **Interoperation with reference types.** Value classes can
implement<br>
interfaces and extend classes (including `Object` and some
abstract classes),<br>
which means some class and interface types are going to be
polymorphic over<br>
both identity and value objects. This polymorphism is
achieved through<br>
object references; a reference to `Object` may be a reference
to an identity<br>
object, or a reference to a value object. <br>
<br>
- **Nullability.** Nullability is an affordance of object
_references_, not<br>
objects themselves. Most of the time, it makes sense that
value types are<br>
non-nullable (as the primitives are today), but there may be
situations where<br>
null is a semantically important value. Using the reference
companion when<br>
nullability is required is semantically clear, and avoids the
need to invent<br>
new sentinel values for "no value."<br>
<br>
This need comes up when migrating existing classes; the
method `Map::get`<br>
uses `null` to signal that the requested key was not present
in the map. But,<br>
if the `V` parameter to `Map` is a value type, `null` is not
a valid value.<br>
We can capture the "`V` or null" requirement by changing the
descriptor of<br>
`Map::get` to:<br>
<br>
```<br>
public V.ref get(Object key);<br>
```<br>
<br>
where, whatever type `V` is instantiated as, `Map::get`
returns the reference<br>
companion. (For a type `V` that already is a reference type,
this is just `V`<br>
itself.) This captures the notion that the return type of
`Map::get` will<br>
either be a reference to a `V`, or the `null` reference.
(This is a<br>
compatible change, since both erase to the same thing.)<br>
<br>
- **Self-referential types.** Some types may want to directly
or indirectly<br>
refer to themselves, such as the "next" field in the node
type of a linked<br>
list:<br>
<br>
```<br>
class Node<T> {<br>
T theValue;<br>
Node<T> nextNode;<br>
}<br>
```<br>
<br>
We might want to represent this as a value class, but if the
type of<br>
`nextNode` were `Node.val<T>`, the layout of `Node`
would be<br>
self-referential, since we would be trying to flatten a
`Node` into its own<br>
layout. <br>
<br>
- **Protection from tearing.** For a value class with a
non-atomic value<br>
companion type, we may want to use the reference companion in
cases where we<br>
are concerned about tearing; because loads and stores of
references are<br>
atomic, `P.ref` is immune to the tearing under race that
`P.val` might be<br>
subject to.<br>
<br>
- **Compatibility with existing boxing.** Autoboxing is
convenient, in that it<br>
lets us pass a primitive where a reference is required. But
boxing affects<br>
far more than assignment conversion; it also affects method
overload<br>
selection. The rules are designed to prefer overloads that
require no<br>
conversions to those requiring boxing (or varargs)
conversions. Having both<br>
a value and reference type for every value class means that
these rules can<br>
be cleanly and intuitively extended to cover value classes.<br>
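<br>
Two of the points above, nullability and overload selection, can be observed with<br>
today's primitives and boxes. The following sketch uses hypothetical names<br>
throughout.<br>
<br>
```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: BoxingAndOverloads is an illustrative name.
class BoxingAndOverloads {
    static String pick(long x)    { return "long"; }
    static String pick(Integer x) { return "Integer"; }

    // An int argument prefers widening to long over boxing to Integer.
    static String callWithInt() {
        int i = 1;
        return pick(i);
    }

    // An Integer argument matches the reference overload without conversion.
    static String callWithBox() {
        Integer boxed = 1;
        return pick(boxed);
    }

    // Map::get signals "no mapping" with null, which only a reference type can hold.
    static Integer lookupMissing() {
        Map<String, Integer> scores = new HashMap<>();
        return scores.get("absent");
    }
}
```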
<br>
### Choosing which to use<br>
<br>
How would we choose between declaring an identity class and a value class, and<br>
among the various options for value companions? Here are some quick rules of<br>
thumb for declaring classes:<br>
<br>
- If you need mutability, subclassing, locking, or aliasing,
choose an identity<br>
class. <br>
- Otherwise, choose a value class. If uninitialized (zero)
values are<br>
unacceptable, leave the value companion encapsulated; if zero
is a reasonable<br>
default value, make the value companion `public`.<br>
- If there are no cross-field invariants and you are willing to
tolerate<br>
possible tearing to enable more flattening, make the value
companion<br>
`non-atomic`.<br>
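<br>
To make "cross-field invariant" concrete, here is a minimal sketch in current<br>
Java. Records are used only as a stand-in for value classes (this is not<br>
Valhalla syntax), and `Range` and `Point` are hypothetical names. `Range`<br>
requires `lo` to be no greater than `hi`, so a torn value mixing fields from<br>
two different writes could violate a rule its constructor enforces; `Point`<br>
has no such rule, so any combination of its fields is still a valid value.<br>
<br>
<pre>
```java
// Stand-in for a value class whose fields are tied together by an invariant.
record Range(int lo, int hi) {
    // Cross-field invariant: lo must never exceed hi.
    Range {
        if (lo > hi) {
            throw new IllegalArgumentException("lo > hi: " + lo + " > " + hi);
        }
    }

    int length() { return hi - lo; }
}

// Stand-in for a value class with no cross-field invariant.
record Point(double x, double y) { }
```
</pre>
<br>
Under the rules of thumb above, something like `Range` would keep the default<br>
atomic behavior, while something like `Point` is the kind of class for which<br>
`non-atomic` might be considered.<br>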
<br>
## Migrating the legacy primitives<br>
<br>
As part of generalizing primitives, we want to adjust the built-in primitives to<br>
behave as consistently with value classes as possible. While we can't change<br>
the fact that `int`'s reference companion is the oddly-named `Integer`, we can<br>
give them more uniform aliases (`int.ref` is an alias for `Integer`; `int` is an<br>
alias for `Integer.val`), so that we can use a consistent rule for naming<br>
companions. Similarly, we can extend member access to the legacy primitives<br>
(`3.getClass()`) and adjust `int[]` to be a subtype of `Integer[]` (and therefore<br>
of `Object[]`).<br>
<br>
We will redeclare `Integer` as a value class with a public value
companion:<br>
<br>
```<br>
value class Integer { <br>
public value companion Integer.val;<br>
<br>
// existing methods<br>
}<br>
```<br>
<br>
where the type name `int` is an alias for `Integer.val`. <br>
<br>
## Unifying primitives with classes<br>
<br>
Earlier, we had a chart of the differences between primitive and
reference<br>
types: <br>
<br>
| Primitives                                 | Objects                            |<br>
| ------------------------------------------ | ---------------------------------- |<br>
| No identity (pure values)                  | Identity                           |<br>
| `==` compares values                       | `==` compares object identity      |<br>
| Built-in                                   | Declared in classes                |<br>
| No members (fields, methods, constructors) | Members (including mutable fields) |<br>
| No supertypes or subtypes                  | Class and interface inheritance    |<br>
| Accessed directly                          | Accessed via object references     |<br>
| Not nullable                               | Nullable                           |<br>
| Default value is zero                      | Default value is null              |<br>
| Arrays are monomorphic                     | Arrays are covariant               |<br>
| May tear under race                        | Initialization safety guarantees   |<br>
| Have reference companions (boxes)          | Don't need reference companions    |<br>
<br>
The addition of value classes addresses many of these directly. Rather than<br>
saying "classes have identity, primitives do not", we make identity an optional<br>
characteristic of classes (and derive equality semantics from that). Rather<br>
than primitives being built in, we derive all types, including primitives, from<br>
classes, and endow value companion types with the members and supertypes<br>
declared with the value class. Rather than having primitive arrays be<br>
monomorphic, we make all arrays covariant under the `extends` relation.<br>
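<br>
For reference, the monomorphic behavior of primitive arrays in current Java can<br>
be seen in the sketch below. `ArrayCovarianceDemo` and `isObjectArray` are<br>
hypothetical names used for illustration, and the results shown are those of<br>
Java as it stands today, not of Valhalla.<br>
<br>
<pre>
```java
class ArrayCovarianceDemo {
    // True when the argument is an array whose static supertype includes Object[].
    static boolean isObjectArray(Object o) {
        return o instanceof Object[];
    }

    public static void main(String[] args) {
        Object boxed = new Integer[] { 1, 2, 3 };
        Object prim  = new int[] { 1, 2, 3 };

        System.out.println(isObjectArray(boxed)); // prints "true"
        System.out.println(isObjectArray(prim));  // prints "false" in current Java
    }
}
```
</pre>
<br>
Today `Integer[]` is an `Object[]` but `int[]` is not; making all arrays<br>
covariant is what would remove that asymmetry.<br>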
<br>
The remaining differences now become differences between
reference types and<br>
value types:<br>
<br>
| Value types                                   | Reference types                  |<br>
| --------------------------------------------- | -------------------------------- |<br>
| Accessed directly                             | Accessed via object references   |<br>
| Not nullable                                  | Nullable                         |<br>
| Default value is zero                         | Default value is null            |<br>
| May tear under race, if declared `non-atomic` | Initialization safety guarantees |<br>
<br>
The current dichotomy between primitives and references morphs
to one between<br>
value objects and references, where the legacy primitives become
(slightly<br>
special) value objects, and, finally, "everything is an object".<br>
<br>
## Summary<br>
<br>
Valhalla unifies, to the extent possible, primitives and
objects. The<br>
following table summarizes the transition from the current world
to Valhalla.<br>
<br>
| Current World                               | Valhalla                                                  |<br>
| ------------------------------------------- | --------------------------------------------------------- |<br>
| All objects have identity                   | Some objects have identity                                |<br>
| Fixed, built-in set of primitives           | Open-ended set of primitives, declared via classes        |<br>
| Primitives don't have methods or supertypes | Primitives are classes, with methods and supertypes       |<br>
| Primitives have ad-hoc boxes                | Primitives have regularized reference companions          |<br>
| Boxes have accidental identity              | Reference companions have no identity                     |<br>
| Boxing and unboxing conversions             | Primitive reference and value conversions, but same rules |<br>
| Primitive arrays are monomorphic            | All arrays are covariant                                  |<br>
<br>
<br>
[valuebased]:
<a class="moz-txt-link-freetext" href="https://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html">https://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html</a><br>
[growing]: <a class="moz-txt-link-freetext" href="https://dl.acm.org/doi/abs/10.1145/1176617.1176621">https://dl.acm.org/doi/abs/10.1145/1176617.1176621</a><br>
[jep390]: <a class="moz-txt-link-freetext" href="https://openjdk.java.net/jeps/390">https://openjdk.java.net/jeps/390</a><br>
<br>
<br>
<br>
</font></font>
</body>
</html>