Updated SoV, take 3

Brian Goetz brian.goetz at oracle.com
Tue Jul 26 18:18:03 UTC 2022


Yet another attempt at updating SoV to reflect the current thinking.  
Please review.


# State of Valhalla
## Part 2: The Language Model {.subtitle}

#### Brian Goetz {.author}
#### July 2022 {.date}

 > _This is the second of three documents describing the current State of
   Valhalla.  The first is [The Road to Valhalla](01-background); the
   third is [The JVM Model](03-vm-model)._

This document describes the directions for the Java _language_ charted by
Project Valhalla.  (In this document, we use "currently" to describe the
language as it stands today, without value classes.)

Valhalla started with the goal of providing user-programmable classes 
which can
be flat and dense in memory.  Numerics are one of the motivating use cases;
adding new primitive types directly to the language has a very high 
barrier.  As
we learned from [Growing a Language][growing] there are infinitely many 
numeric
types we might want to add to Java, but the proper way to do that is via
libraries, not as a language feature.

## Primitives and objects today

Java currently has eight built-in primitive types.  Primitives represent pure
_values_; any `int` value of "3" is equivalent to, and indistinguishable from,
any other `int` value of "3".  Because primitives are "just their bits" with no
ancillary state such as object identity, they are _freely copyable_; whether
there is one copy of the `int` value "3", or millions, doesn't matter to the
execution of the program.  With the exception of the unusual treatment of exotic
floating point values such as `NaN`, the `==` operator on primitives performs a
_substitutability test_ -- it asks "are these two values the same value".

Java also has _objects_, and each object has a unique _object identity_.  This
means that each object must live in exactly one place (at any given time), and
this has consequences for how the JVM lays out objects in memory.  Objects in
Java are not manipulated or accessed directly, but instead through _object
references_.  Object references are also a kind of value -- they encode the
identity of the object to which they refer, and the `==` operator on object
references also performs a substitutability test, asking "do these two
references refer to the same object."  Accordingly, object _references_ (like
other values) can be freely copied, but the objects they refer to cannot.

This dichotomy -- that the universe of values consists of primitives and 
object
references -- has long been at the core of Java's design.  JVMS 2.2 
(Data Types)
opens with:

 > There are two kinds of values that can be stored in variables, passed as
 > arguments, returned by methods, and operated upon: primitive values and
 > reference values.

Primitives and objects currently differ in almost every conceivable way:

| Primitives                                 | Objects                            |
| ------------------------------------------ | ---------------------------------- |
| No identity (pure values)                  | Identity                           |
| `==` compares values                       | `==` compares object identity      |
| Built-in                                   | Declared in classes                |
| No members (fields, methods, constructors) | Members (including mutable fields) |
| No supertypes or subtypes                  | Class and interface inheritance    |
| Accessed directly                          | Accessed via object references     |
| Not nullable                               | Nullable                           |
| Default value is zero                      | Default value is null              |
| Arrays are monomorphic                     | Arrays are covariant               |
| May tear under race                        | Initialization safety guarantees   |
| Have reference companions (boxes)          | Don't need reference companions    |

Primitives embody a number of tradeoffs aimed at maximizing the performance and
usability of the primitive types.  Reference types default to `null`, meaning
"referring to no object", and must be initialized before use; primitives default
to a usable zero value (which for most primitives is the additive identity) and
therefore may be used without initialization.  (If primitives were nullable like
references, not only would this be less convenient in many situations, but they
would likely consume additional memory footprint to accommodate the possibility
of nullity, as most primitives already use all their bit patterns.)  Similarly,
reference types provide initialization safety guarantees for final fields even
under a certain category of data races (this is where we get the "immutable
objects are always thread-safe" rule from); primitives allow tearing under race
for larger-than-32-bit values.  We could characterize the design principles
behind these tradeoffs as "make objects safer, make primitives faster."

The following figure illustrates the current universe of Java's types.  The
upper left quadrant is the built-in primitives; the rest of the space is
reference types.  In the upper-right, we have the abstract reference 
types --
abstract classes, interfaces, and `Object` (which, though concrete, acts 
more
like an interface than a concrete class).  The built-in primitives have 
wrappers
or boxes, which are reference types.

<figure>
   <a href="field-type-zoo.pdf" title="Click for PDF">
     <img src="field-type-zoo-old.png" alt="Current universe of Java 
field types"/>
   </a>
</figure>

Valhalla aims to unify primitives and objects such that both are 
declared with
classes, but maintains the special runtime characteristics -- flatness and
density -- that primitives currently enjoy.

### Primitives and boxes today

The built-in primitives are best understood as _pairs_ of types: the primitive
type (`int`) and its reference companion type (`Integer`), with built-in
conversions between the two.  The two types have different characteristics that
make each more or less appropriate for a given situation.  Primitives are
optimized for efficient storage and access: they are monomorphic, not nullable,
tolerate uninitialized (zero) values, and larger primitive types (`long`,
`double`) may tear under racy access.  The box types add back the affordances of
references -- nullity, polymorphism, interoperation with generics, and
initialization safety -- but at a cost.

Valhalla generalizes this primitive-box relationship, in a way that is more
regular and extensible and reduces the "boxing tax".

## Eliminating unwanted object identity

Many impediments to optimization stem from _unwanted object identity_. 
For many
classes, not only is identity not directly useful, it can be a source of 
bugs.
For example, due to caching, `Integer` can be accidentally compared 
correctly
with `==` just often enough that people keep doing it. Similarly, 
[value-based
classes][valuebased] such as `Optional` have no need for identity, but 
pay the
costs of having identity anyway.
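
The `Integer` pitfall can be seen in current Java with a small sketch.  The class
and method names below are illustrative only.  `Integer.valueOf` is documented to
cache values in the range -128 to 127, so `==` happens to succeed there; outside
that range the result of `==` is unspecified, which is why the sketch relies on
`equals` for the second comparison.

```java
// Sketch (current Java): why comparing boxes with == is a trap.
final class BoxIdentityDemo {
    // Compares two boxes by identity, as == does on references today.
    static boolean sameReference(Integer a, Integer b) {
        return a == b;
    }

    // Compares two boxes by the value they carry.
    static boolean sameValue(Integer a, Integer b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // Values in -128..127 are cached by Integer.valueOf, so identity matches.
        System.out.println(sameReference(Integer.valueOf(127), Integer.valueOf(127)));
        // Outside the cached range only equals() gives a dependable answer.
        System.out.println(sameValue(Integer.valueOf(1000), Integer.valueOf(1000)));
    }
}
```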

Valhalla allows classes to explicitly disavow identity by declaring them as
_value classes_.  The instances of a value class are called _value 
objects_.

```
value class Point implements Serializable {
     int x;
     int y;

     Point(int x, int y) {
         this.x = x;
         this.y = y;
     }

     Point scale(int s) {
         return new Point(s*x, s*y);
     }
}
```

This says that a `Point` is a class whose instances have no identity.  As a
consequence, it must give up the things that depend on identity; the class and
its fields are implicitly final.  Additionally, operations that depended on
identity must either be adjusted (`==` on value objects compares state, not
identity) or disallowed (it is illegal to lock on a value object).

Value classes can still have most of the affordances of classes -- fields,
methods, constructors, type parameters, superclasses (with some 
restrictions),
nested classes, class literals, interfaces, etc.  The classes they can 
extend
are restricted: `Object` or abstract classes with no instance fields, empty
no-arg constructor bodies, no other constructors, no instance 
initializers, no
synchronized methods, and whose superclasses all meet this same set of
conditions.  (`Number` is an example of such an abstract class.)

Because `Point` has value semantics, `==` compares by state rather than
identity.  This means that value objects, like primitives, are _freely
copyable_; we can explode them into their fields and re-aggregate them into
another value object, and we cannot tell the difference.
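
For contrast, here is a sketch of what "explode and re-aggregate" looks like in
current Java, using a record as a stand-in for the value class `Point` (the
`FreelyCopyableDemo` wrapper and `rebuild` helper are hypothetical names).  Today
the rebuilt object is equal by state but distinguishable by `==`; under the model
described above, `==` would no longer be able to tell the two apart.

```java
// Sketch (current Java): a record is still an identity class.
final class FreelyCopyableDemo {
    // Stand-in for the value class Point from the text.
    record Point(int x, int y) { }

    // Explode a point into its fields and re-aggregate them into a new object.
    static Point rebuild(Point p) {
        return new Point(p.x(), p.y());
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        Point q = rebuild(p);
        System.out.println(p.equals(q)); // state comparison
        System.out.println(p == q);      // identity comparison (today)
    }
}
```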

So far we've addressed the first two lines in our table of differences; 
rather
than all objects having identity, classes can opt into, or out of, object
identity for their instances.  By allowing classes to exclude unwanted 
identity,
we free the runtime to make better layout and compilation decisions.

### Example: immutable cursors

Collections today use `Iterator` to facilitate traversal through the collection;
iterators store iteration state in mutable fields.  While heroic optimizations
such as _escape analysis_ can sometimes eliminate the cost associated with
iterators, such optimizations are fragile and hard to rely on.  Value objects
offer an iteration approach that is more reliably optimized: immutable cursors.
(Without value objects, immutable cursors would be prohibitively expensive for
iteration.)

```
value class ArrayCursor<T> {
     T[] array;
     int offset;

     public ArrayCursor(T[] array, int offset) {
         this.array = array;
         this.offset = offset;
     }

     public ArrayCursor(T[] array) {
         this(array, 0);
     }

     public boolean hasNext() {
         return offset < array.length;
     }

     public T next() {
         return array[offset];
     }

     public ArrayCursor<T> advance() {
         return new ArrayCursor<>(array, offset+1);
     }
}
```

In looking at this code, we might mistakenly assume it will be 
inefficient, as
each loop iteration appears to allocate a new cursor:

```
for (ArrayCursor<T> c = new ArrayCursor<>(array);
      c.hasNext();
      c = c.advance()) {
     // use c.next();
}
```

In reality, we should expect that _no_ cursors are actually allocated 
here.  An
`ArrayCursor` is just its two fields, and the runtime is free to 
scalarize the
object into its fields and hoist them into registers.  The calling 
convention
for `advance` is optimized so that both receiver and return value are
scalarized.  Even without inlining `advance`, no allocation will take place,
just some shuffling of the values in registers.  And if `advance` is 
inlined,
the client code will compile down to having a single register increment and
compare in the loop header.
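
To experiment with the cursor idiom on a current JDK, the same shape can be
written as a record.  This is only a sketch: the record is an identity class
today, so whether the per-iteration allocation disappears depends on escape
analysis rather than on the guarantees described above.  `CursorDemo` and `sum`
are hypothetical names.

```java
// Sketch (current Java): the immutable-cursor loop, with a record standing in
// for the value class ArrayCursor.
final class CursorDemo {
    record ArrayCursor<T>(T[] array, int offset) {
        // Start a traversal at the first element.
        ArrayCursor(T[] array) {
            this(array, 0);
        }

        boolean hasNext() {
            return offset < array.length;
        }

        // Returns the element at the cursor; it does not move the cursor.
        T next() {
            return array[offset];
        }

        // Returns a new cursor one position further along.
        ArrayCursor<T> advance() {
            return new ArrayCursor<>(array, offset + 1);
        }
    }

    // Sums the elements by walking the array with an immutable cursor.
    static int sum(Integer[] values) {
        int total = 0;
        for (ArrayCursor<Integer> c = new ArrayCursor<>(values);
             c.hasNext();
             c = c.advance()) {
            total += c.next();
        }
        return total;
    }
}
```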

### Migration

The JDK (as well as other libraries) has many [value-based 
classes][valuebased]
such as `Optional` and `LocalDateTime`.  Value-based classes adhere to the
semantic restrictions of value classes, but are still identity classes 
-- even
though they don't want to be.  Value-based classes can be migrated to 
true value
classes simply by redeclaring them as value classes, which is both 
source- and
binary-compatible.

We plan to migrate many value-based classes in the JDK to value classes.
Additionally, the primitive wrappers can be migrated to value classes as 
well,
making the conversion between `int` and `Integer` cheaper; see 
"Migrating the
legacy primitives" below.  (In some cases, this may be _behaviorally_
incompatible for code that synchronizes on the primitive wrappers.  [JEP
390][jep390] has supported both compile-time and runtime warnings for
synchronizing on primitive wrappers since Java 16.)

<figure>
   <a href="field-type-zoo.pdf" title="Click for PDF">
     <img src="field-type-zoo-mid.png" alt="Java field types adding 
value classes"/>
   </a>
</figure>

### Identity-sensitive operations

Certain operations are currently defined in terms of object identity.  
As we've
already seen, some of these, like equality, can be sensibly extended to 
cover
all instances.  Others, like synchronization, will become partial.
Identity-sensitive operations include:

   - **Equality.**  We extend `==` on references to include references to value
     objects.  Where it currently has a meaning, the new definition coincides
     with that meaning.

   - **System::identityHashCode.**  The main use of `identityHashCode` is in the
     implementation of data structures such as `IdentityHashMap`.  We can extend
     `identityHashCode` in the same way we extend equality -- deriving a hash on
     value objects from the hash of all the fields.

   - **Synchronization.**  This becomes a partial operation.  If we can
     statically detect that a synchronization will fail at runtime (including
     declaring a `synchronized` method in a value class), we can issue a
     compilation error; if not, attempts to lock on a value object result in
     `IllegalMonitorStateException`.  This is justifiable because it is
     intrinsically imprudent to lock on an object for which you do not have a
     clear understanding of its locking protocol; locking on an arbitrary
     `Object` or interface instance is doing exactly that.

   - **Weak, soft, and phantom references.**  Capturing an exotic reference to a
     value object becomes a partial operation, as these are intrinsically tied to
     reachability (and hence to identity).  However, we will likely make
     enhancements to `WeakHashMap` to support mixed identity and value keys.
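
As a point of reference for the `identityHashCode` item, the sketch below shows
how an identity-keyed structure behaves in current Java when given two objects
with the same state.  A record stands in for a would-be value class; the names
are hypothetical.  With the extension described above, two value objects with
equal fields would be expected to collapse to a single key in both maps.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Sketch (current Java): identity-keyed vs. state-keyed maps.
final class IdentityKeysDemo {
    record Point(int x, int y) { }

    // Inserts two points with equal state and reports how many keys the map holds.
    static int distinctKeys(Map<Point, String> map) {
        map.put(new Point(1, 2), "first");
        map.put(new Point(1, 2), "second");
        return map.size();
    }

    // IdentityHashMap compares keys with ==, so today it keeps both.
    static int identityMapSize() {
        return distinctKeys(new IdentityHashMap<>());
    }

    // HashMap compares keys with equals/hashCode, so it keeps one.
    static int hashMapSize() {
        return distinctKeys(new HashMap<>());
    }
}
```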

### Value classes and records

While records have a lot in common with value classes -- they are final and
their fields are final -- they are still identity classes. Records embody a
tradeoff: give up on decoupling the API from the representation, and in 
return
get various syntactic and semantic benefits.  Value classes embody another
tradeoff: give up identity, and get various semantic and performance 
benefits.
If we are willing to give up both, we can get both sets of benefits, by
declaring a _value record_.

```
value record NameAndScore(String name, int score) { }
```

Value records combine the data-carrier idiom of records with the improved
scalarization and flattening benefits of value classes.

In theory, it would be possible to apply `value` to certain enums as well, but
this is not currently possible because the `java.lang.Enum` base class that
enums extend does not meet the requirements for superclasses of value classes
(it has fields and non-empty constructors).

### Value and reference companion types

Value classes are generalizations of primitives.  Since primitives have a
reference companion type, value classes actually give rise to _pairs_ of 
types:
a value type and a reference type.  We've seen the reference type 
already; for
the value class `ArrayCursor`, the reference type is called 
`ArrayCursor`, just
as with identity classes.  The full name for the reference type is
`ArrayCursor.ref`; `ArrayCursor` is just a convenient alias for that.  (This
aliasing is what allows value-based classes to be compatibly migrated to 
value
classes.) The value type is called `ArrayCursor.val`, and the two types 
have the
same conversions between them as primitives do today with their boxes.  The
default value of the value type is the one for which all fields take on 
their
default value; the default value of the reference type is, like all 
reference
types, null.  We will refer to the value type of a value class as the _value
companion type_.

Just as with today's primitives and their boxes, the reference and value
companion types of a value class differ in their support for nullity,
polymorphism, treatment of uninitialized variables, and safety 
guarantees under
race.  Value companion types, like primitive types, are monomorphic,
non-nullable, tolerate uninitialized (zero) values, and (under some
circumstances) may tear under racy access.  Reference types are polymorphic,
nullable, and offer the initialization safety guarantees for final 
fields that
we have come to expect from identity objects.

Unlike with today's primitives, the "boxing" and "unboxing" conversions between
the reference and value companion types are not nearly as heavy or wasteful,
because of the lack of identity.  A variable of type `Point.val` holds a "bare"
value object; a variable of type `Point.ref` holds a _reference to_ a value
object.  For many use cases, the reference type will offer good enough
performance; in some cases, it may be desirable to additionally give up the
affordances of reference-ness to make further flatness and footprint gains.  See
[Performance Model](05-performance-model) for more details on the specific
tradeoffs.

In our diagram, these new types show up as another entity that straddles the
line between primitives and identity-free references, alongside the legacy
primitives:

** UPDATE DIAGRAM **

<figure>
   <a href="field-type-zoo.pdf" title="Click for PDF">
     <img src="field-type-zoo-new.png" alt="Java field types with 
extended primitives"/>
   </a>
</figure>

### Member access

Both the reference and value companion types have the same members. Unlike
today's primitives, value companion types can be used as receivers to access
fields and invoke methods (subject to the usual accessibility constraints):

```
Point.val p = new Point(1, 2);
assert p.x == 1;

p = p.scale(2);
assert p.x == 2;
```

### Polymorphism

An identity class `C` that extends `D` sets up a subtyping (is-a) relationship
between `C` and `D`.  For a value class, the same relationship is set up between
its _reference type_ and the declared supertypes.  (Reference types are
polymorphic; value types are not.)  This means that if we declare:

```
value class UnsignedShort extends Number
                           implements Comparable<UnsignedShort> {
    ...
}
```

then `UnsignedShort` is a subtype of `Number` and 
`Comparable<UnsignedShort>`,
and we can ask questions about subtyping using `instanceof` or pattern 
matching.
What happens if we ask such a question of the value companion type?

```
UnsignedShort.val us = ...
if (us instanceof Number) { ... }
```

Since subtyping is defined only on reference types, the `instanceof` operator
(and corresponding type patterns) will behave as if both sides were lifted to
the appropriate reference type (boxed), and then we can appeal to subtyping.
(This may trigger fears of expensive boxing conversions, but in reality no
actual allocation will happen.)
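
The "lift, then appeal to subtyping" step has a rough analogue in current Java,
where a subtyping question about an `int` has to go through its box.  The sketch
below spells that out by hand; `LiftingDemo` and its methods are hypothetical
names, and the explicit boxing is what the proposed `instanceof` behavior would
do implicitly for value companion types.

```java
// Sketch (current Java): lifting a primitive to its reference companion
// before asking a subtyping question.
final class LiftingDemo {
    static boolean isNumber(int value) {
        Object lifted = value;              // boxing conversion: int -> Integer
        return lifted instanceof Number;    // subtyping is asked of the reference type
    }

    static boolean isComparable(int value) {
        Object lifted = value;
        return lifted instanceof Comparable<?>;
    }

    static boolean isCharSequence(int value) {
        Object lifted = value;
        return lifted instanceof CharSequence;
    }
}
```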

We introduce a new relationship between types based on `extends` / 
`implements`
clauses, which we'll call "extends": we define `A extends B` as meaning 
`A <: B`
when A is a reference type, and `A.ref <: B` when A is a value companion 
type.
The `instanceof` relation, reflection, and pattern matching are updated 
to use
"extends".

### Array covariance

Arrays of reference types are _covariant_; this means that if `A <: B`, then
`A[] <: B[]`.  This allows `Object[]` to be the "top array type" -- but 
only for
arrays of references.  Arrays of primitives are currently left out of this
story.   We unify the treatment of arrays by defining array covariance 
over the
new "extends" relationship; if A _extends_ B, then `A[] <: B[]`.  This means
that for a value class P, `P.val[] <: P.ref[] <: Object[]`; when we 
migrate the
primitive types to be value classes, then `Object[]` is finally the top 
type for
all arrays.  (When the built-in primitives are migrated to value 
classes, this
means `int[] <: Integer[] <: Object[]` too.)
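
The gap being closed here is easy to observe on a current JDK.  In the sketch
below (hypothetical names), an `Integer[]` is an `Object[]` but an `int[]` is
not, and covariance of reference arrays is enforced by a run-time store check.

```java
// Sketch (current Java): reference arrays are covariant, primitive arrays are not.
final class ArrayCovarianceDemo {
    // True if the argument can be viewed as an Object[].
    static boolean isObjectArray(Object candidate) {
        return candidate instanceof Object[];
    }

    // Covariance is paid for with a run-time check on every store.
    static boolean storeRejected() {
        Object[] objects = new Integer[1];   // allowed: Integer[] <: Object[]
        try {
            objects[0] = "not an Integer";   // compiles, fails at run time
            return false;
        } catch (ArrayStoreException expected) {
            return true;
        }
    }
}
```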

### Equality

For values, as with primitives, `==` compares by state rather than by 
identity.
Two value objects are `==` if they are of the same type and their fields are
pairwise equal, where equality is defined by `==` for primitives (except 
`float`
and `double`, which are compared with `Float::equals` and 
`Double::equals` to
avoid anomalies), `==` for references to identity objects, and 
recursively with
`==` for references to value objects.  In no case is a value object ever 
`==` to
an identity object.
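
The anomalies that motivate using `Float::equals` and `Double::equals` for
floating-point fields can be seen directly.  In this sketch (hypothetical
names), primitive `==` says `NaN` is not equal to itself and that `0.0` equals
`-0.0`, while `Double::equals` gives the opposite answer in both cases, which is
the behavior a substitutability test wants.

```java
// Sketch (current Java): primitive == vs. Double::equals on exotic values.
final class FloatingEqualityDemo {
    // IEEE 754 comparison, as performed by == on double.
    static boolean primitiveEquals(double a, double b) {
        return a == b;
    }

    // Representation-based comparison, as performed by Double::equals.
    static boolean representationEquals(double a, double b) {
        return Double.valueOf(a).equals(Double.valueOf(b));
    }
}
```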

When comparing two object _references_ with `==`, they are equal if they are
both null, or if they are both references to the same identity object, 
or they
are both references to value objects that are `==`.  (When comparing a value
type with a reference type, we treat this as if we convert the value to a
reference, and proceed as per comparing references.)  This means that the
following will succeed:

```
Point.val p = new Point(3, 4);
Point pr = p;
assert p == pr;
```

The base implementation of `Object::equals` delegates to `==`, which is a
suitable default for both identity and value classes.

### Serialization

If a value class implements `Serializable`, this is also really a statement
about the reference type.  Just as with other aspects described here,
serialization of value companions can be defined by converting to the
corresponding reference type and serializing that, and reversing the 
process at
deserialization time.

Serialization currently uses object identity to preserve the topology of an
object graph.  This generalizes cleanly to objects without identity, because
`==` on value objects treats two identical copies of a value object as 
equal.
So any observations we make about graph topology prior to serialization with
`==` are consistent with those after deserialization.
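
The topology-preserving behavior that serialization has today can be sketched as
follows, using a serializable record as a stand-in (all names here are
hypothetical).  A list that holds the same object twice, plus a separate object
with equal state, comes back with the same sharing pattern as observed by `==`.
For value objects, all three elements would be `==` both before and after,
so the observations remain consistent.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Sketch (current Java): serialization preserves sharing between identity objects.
final class TopologyDemo {
    record Point(int x, int y) implements Serializable { }

    // Element 0 and 1 are the same object; element 2 has equal state only.
    static List<Point> sharedGraph() {
        Point shared = new Point(1, 2);
        List<Point> graph = new ArrayList<>();
        graph.add(shared);
        graph.add(shared);
        graph.add(new Point(1, 2));
        return graph;
    }

    // Serializes the list to a byte array and reads it back.
    @SuppressWarnings("unchecked")
    static List<Point> roundTrip(List<Point> graph) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(graph);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (List<Point>) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```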

## Refining the value companion

Value classes have several options for refining the behavior of the value
companion type and how it is exposed to clients.

### Classes with no good default value

For a value class `C`, the default value of `C.ref` is the same as any other
reference type: `null`.  For the value companion type `C.val`, the 
default value
is the one where all of its fields are initialized to their default 
value (0 for
numbers, false for boolean, null for references.)

The built-in primitives reflect the design assumption that zero is a 
reasonable
default.  The choice to use a zero default for uninitialized variables 
was one
of the central tradeoffs in the design of the built-in primitives.  It 
gives us
a usable initial value (most of the time), and requires less storage 
footprint
than a representation that supports null (`int` uses all 2^32 of its bit
patterns, so a nullable `int` would have to either make some 32 bit signed
integers unrepresentable, or use a 33rd bit).  This was a reasonable 
tradeoff
for the built-in primitives, and is also a reasonable tradeoff for many 
other
potential value classes (such as complex numbers, 2D points, 
half-floats, etc).

But for other potential value classes, such as `LocalDate`, there simply _is_ no
reasonable default.  If we choose to represent a date as the number of days
since some epoch, there will invariably be bugs that stem from uninitialized
dates; we've all been mistakenly told by computers that something that never
happened actually happened on or near 1 January 1970.  Even if we could choose a
default other than the zero representation, an uninitialized date is still
likely to be an error -- there simply is no good default date value.
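
The "1 January 1970" failure mode is easy to reproduce.  The sketch below
assumes, purely for illustration, a date stored as a `long` count of days since
the epoch; an array element that was never assigned holds zero, and zero decodes
to a perfectly plausible-looking date.  `ZeroDefaultDemo` and its methods are
hypothetical names.

```java
import java.time.LocalDate;

// Sketch (current Java): a zero default silently becoming a real-looking date.
final class ZeroDefaultDemo {
    // Assumed representation: days since 1970-01-01.
    static LocalDate fromEpochDays(long days) {
        return LocalDate.ofEpochDay(days);
    }

    // An array element that was never written holds the zero default.
    static long uninitializedDays() {
        long[] storage = new long[1];
        return storage[0];
    }

    public static void main(String[] args) {
        System.out.println(fromEpochDays(uninitializedDays()));
    }
}
```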

For this reason, value classes have the choice of _encapsulating_ their 
value
companion type.  If the class is willing to tolerate an uninitialized (zero)
value, it can freely share its `.val` companion with the world; if 
uninitialized
values are dangerous (such as for `LocalDate`), the value companion can be
encapsulated to the class or package, and clients can use the reference
companion.  Encapsulation is accomplished using ordinary access control.  By
default, the value companion is `private` to the value class (it need not be
declared explicitly); a class that wishes to share its value companion more
broadly can do so by declaring it explicitly:

```
public value record Complex(double real, double imag) {
     public value companion Complex.val;
}
```

### Atomicity and tearing

For the primitive types longer than 32 bits (`long` and `double`), it was always
possible that reads and writes from different threads (without suitable
coordination) were not atomic with respect to each other.  This means that, if
accessed under data race, a `long` or `double` field or array element could be
seen to "tear", where a read sees the low 32 bits of one write and the high 32
bits of another.  (Declaring the containing field `volatile` is sufficient to
restore atomicity, as is properly coordinating with locks or other concurrency
control, or not sharing across threads in the first place.)

This was a pragmatic tradeoff given the hardware of the time; the cost 
of 64-bit
atomicity on 1995 hardware would have been prohibitive, and problems 
only arise
when the program already has data races -- and most numeric code deals 
entirely
with thread-local data.  Just like with the tradeoff of nulls vs zeros, the
design of the built-in primitives permits tearing as part of a tradeoff 
between
performance and correctness, where we chose "as fast as possible" for
primitives, and more safety for reference types.

Today, most JVMs give us atomic loads and stores of 64-bit primitives, 
because
the hardware already makes them cheap enough.  But value classes bring 
us back
to 1995; atomic loads and stores of larger-than-64-bit values are still
expensive on many CPUs, leaving us with a choice of "make operations on 
value
types slower" or permitting tearing when accessed under race.

It would not be wise for the language to select a one-size-fits-all 
policy about
tearing; choosing "no tearing" means that types like `Complex` are 
slower than
they need to be, even in a single-threaded program; choosing "tearing" means
that classes like `Range` can be seen to not exhibit invariants asserted by
their constructor.  Class authors can choose, with full knowledge of their
domain, whether their types can tolerate tearing.  The default is no tearing
(following the principle of "safe by default"); a class can opt for greater
flattening (at the cost of potential tearing) by declaring the value 
companion
as `non-atomic`:

```
public value record Complex(double real, double imag) {
     public non-atomic value companion Complex.val;
}
```

For classes like `Complex`, all of whose bit patterns are valid, this is 
very
much like the choice around `long` in 1995.  For other classes that 
might have
nontrivial representational invariants -- specifically, invariants that 
relate
multiple fields, such as ensuring that a range goes from low to high -- they
likely want to stick to the default of atomicity.
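
Real tearing needs a data race and cannot be reproduced deterministically, so
the following is only a single-threaded simulation of its effect on a
cross-field invariant.  It models a hypothetical two-field range as
`{low, high}` with the invariant `low <= high`, and builds the value a racing
reader could observe if it saw one field from each of two writes.

```java
// Sketch: simulating the outcome of a torn read on a two-field value.
final class TearingSketch {
    // The invariant a constructor would have checked: low <= high.
    static boolean invariantHolds(int[] range) {
        return range[0] <= range[1];
    }

    // Models a torn read: low comes from the later write, high from the earlier one.
    static int[] tornRead(int[] earlierWrite, int[] laterWrite) {
        return new int[] { laterWrite[0], earlierWrite[1] };
    }

    public static void main(String[] args) {
        int[] earlier = {0, 1};
        int[] later = {5, 10};
        int[] observed = tornRead(earlier, later);   // {5, 1}
        System.out.println(invariantHolds(observed));
    }
}
```

Both writes satisfy the invariant on their own; only the mixture does not, which
is why classes with cross-field invariants are better served by the atomic
default.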

## Do we really need two types?

It is sensible to ask: why do we need companion types at all? This is 
analogous
to the need for boxes in 1995: we'd made one set of tradeoffs for primitives
favoring performance (monomorphic, non-nullable, zero-default, tolerant of
non-initialization, tolerant of tearing under race, unrelated to 
`Object`), and
another for references, favoring flexibility and safety.  Most of the 
time, we
ignored the primitive wrapper classes, but sometimes we needed to 
temporarily
suppress one of these properties, such as when interoperating with code that
expects an `Object` or the ability to express "no value".  The reasons 
we needed
boxes in 1995 still apply today: sometimes we need the affordances of
references, and in those cases, we appeal to the reference companion.

Reasons we might want to use the reference companion include:

  - **Interoperation with reference types.**  Value classes can implement
    interfaces and extend classes (including `Object` and some abstract classes),
    which means some class and interface types are going to be polymorphic over
    both identity and value objects.  This polymorphism is achieved through
    object references; a reference to `Object` may be a reference to an identity
    object, or a reference to a value object.

  - **Nullability.**  Nullability is an affordance of object 
_references_, not
    objects themselves.  Most of the time, it makes sense that value 
types are
    non-nullable (as the primitives are today), but there may be 
situations where
    null is a semantically important value.  Using the reference 
companion when
    nullability is required is semantically clear, and avoids the need 
to invent
    new sentinel values for "no value."

    This need comes up when migrating existing classes; the method 
`Map::get`
    uses `null` to signal that the requested key was not present in the 
map. But,
    if the `V` parameter to `Map` is a value type, `null` is not a valid 
value.
    We can capture the "`V` or null" requirement by changing the 
descriptor of
    `Map::get` to:

    ```
    public V.ref get(K key);
    ```

    where, whatever type `V` is instantiated as, `Map::get` returns the 
reference
    companion. (For a type `V` that already is a reference type, this is 
just `V`
    itself.) This captures the notion that the return type of `Map::get` 
will
    either be a reference to a `V`, or the `null` reference. (This is a
    compatible change, since both erase to the same thing.)

  - **Self-referential types.**  Some types may want to directly or 
indirectly
    refer to themselves, such as the "next" field in the node type of a 
linked
    list:

    ```
    class Node<T> {
        T theValue;
        Node<T> nextNode;
    }
    ```

    We might want to represent this as a value class, but if the type of
    `nextNode` were `Node.val<T>`, the layout of `Node` would be
    self-referential, since we would be trying to flatten a `Node` into 
its own
    layout.

  - **Protection from tearing.**  For a value class with a non-atomic value
    companion type, we may want to use the reference companion in cases 
where we
    are concerned about tearing; because loads and stores of references are
    atomic, `P.ref` is immune to the tearing under race that `P.val` 
might be
    subject to.

  - **Compatibility with existing boxing.**  Autoboxing is convenient, 
in that it
    lets us pass a primitive where a reference is required.  But boxing 
affects
    far more than assignment conversion; it also affects method overload
    selection.  The rules are designed to prefer overloads that require no
    conversions to those requiring boxing (or varargs) conversions.  
Having both
    a value and reference type for every value class means that these 
rules can
    be cleanly and intuitively extended to cover value classes.
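
Two of the points above, nullability and overload selection, can be observed
with today's primitives and boxes.  In this sketch (hypothetical names),
`Map::get` reports a missing key as `null`, which only a reference type can
carry, and an `int` argument selects the overload that needs no boxing.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch (current Java): where the reference companion matters today.
final class ReferenceCompanionDemo {
    static String pick(long widened) {
        return "long";
    }

    static String pick(Integer boxed) {
        return "Integer";
    }

    // Overload selection tries conversions without boxing first, so the int
    // argument is widened to long rather than boxed to Integer.
    static String pickForInt() {
        return pick(42);
    }

    // Map::get signals an absent key with null, so its result must be a reference.
    static Integer lookupMissing() {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("alice", 1);
        return scores.get("bob");
    }
}
```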

### Choosing which to use

How would we choose between declaring an identity class or a value 
class, and
the various options on value companions?  Here are some quick rules of 
thumb for
declaring classes:

  - If you need mutability, subclassing, locking, or aliasing, choose an 
identity
    class.
  - Otherwise, choose a value class.  If uninitialized (zero) values are
    unacceptable, leave the value companion encapsulated; if zero is a 
reasonable
    default value, make the value companion `public`.
  - If there are no cross-field invariants and you are willing to tolerate
    possible tearing to enable more flattening, make the value companion
    `non-atomic`.

## Migrating the legacy primitives

As part of generalizing primitives, we want to adjust the built-in primitives to
behave as consistently with value classes as possible.  While we can't change
the fact that `int`'s reference companion is the oddly-named `Integer`, we can
give them more uniform aliases (`int.ref` is an alias for `Integer`; `int` is an
alias for `Integer.val`) -- so that we can use a consistent rule for naming
companions.  Similarly, we can extend member access to the legacy primitives
(`3.getClass()`) and adjust `int[]` to be a subtype of `Integer[]` (and
therefore of `Object[]`).

We will redeclare `Integer` as a value class with a public value companion:

```
value class Integer {
     public value companion Integer.val;

     // existing methods
}
```

where the type name `int` is an alias for `Integer.val`.

## Unifying primitives with classes

Earlier, we had a chart of the differences between primitive and reference
types:

| Primitives                                 | Objects                            |
| ------------------------------------------ | ---------------------------------- |
| No identity (pure values)                  | Identity                           |
| `==` compares values                       | `==` compares object identity      |
| Built-in                                   | Declared in classes                |
| No members (fields, methods, constructors) | Members (including mutable fields) |
| No supertypes or subtypes                  | Class and interface inheritance    |
| Accessed directly                          | Accessed via object references     |
| Not nullable                               | Nullable                           |
| Default value is zero                      | Default value is null              |
| Arrays are monomorphic                     | Arrays are covariant               |
| May tear under race                        | Initialization safety guarantees   |
| Have reference companions (boxes)          | Don't need reference companions    |

The addition of value classes addresses many of these directly.  Rather than
saying "classes have identity, primitives do not", we make identity an optional
characteristic of classes (and derive equality semantics from that).  Rather
than primitives being built in, we derive all types, including primitives, from
classes, and endow value companion types with the members and supertypes
declared with the value class.  Rather than having primitive arrays be
monomorphic, we make all arrays covariant under the `extends` relation.
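The `==` row of the chart, and the identity that boxes carry today, can be seen in a short sketch of current Java.  The names `EqualityDemo`, `sameValue`, `sameObject`, and `equalContents` are hypothetical helpers introduced only for this illustration.

```java
// Illustrative only: current-Java behavior of == on primitives and on boxes.
final class EqualityDemo {
    // On primitives, == compares values.
    static boolean sameValue(int a, int b) { return a == b; }

    // On references, == compares object identity, even for boxes.
    static boolean sameObject(Integer a, Integer b) { return a == b; }

    // equals() compares the boxed values.
    static boolean equalContents(Integer a, Integer b) { return a.equals(b); }

    public static void main(String[] args) {
        Integer x = Integer.valueOf(1000);
        Integer y = Integer.valueOf(1000);

        System.out.println(sameValue(1000, 1000));  // true
        System.out.println(equalContents(x, y));    // true
        // Result depends on whether the runtime happens to cache this value,
        // so no particular output is assumed here.
        System.out.println(sameObject(x, y));
        // Boxing an int constant in -128..127 is guaranteed to yield the same object.
        System.out.println(sameObject(127, 127));   // true
    }
}
```

The third call is the case the document refers to when it says identity is incidental for boxes: whether two boxes of the same value are `==` is an artifact of caching rather than something the program asked for.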

The remaining differences now become differences between reference types and
value types:

| Value types                                   | Reference types                  |
| --------------------------------------------- | -------------------------------- |
| Accessed directly                             | Accessed via object references   |
| Not nullable                                  | Nullable                         |
| Default value is zero                         | Default value is null            |
| May tear under race, if declared `non-atomic` | Initialization safety guarantees |

The current dichotomy between primitives and references morphs to one between
value objects and references, where the legacy primitives become (slightly
special) value objects, and, finally, "everything is an object".

## Summary

Valhalla unifies, to the extent possible, primitives and objects.  The
following table summarizes the transition from the current world to Valhalla.

| Current World                               | Valhalla                                                  |
| ------------------------------------------- | --------------------------------------------------------- |
| All objects have identity                   | Some objects have identity                                |
| Fixed, built-in set of primitives           | Open-ended set of primitives, declared via classes        |
| Primitives don't have methods or supertypes | Primitives are classes, with methods and supertypes       |
| Primitives have ad-hoc boxes                | Primitives have regularized reference companions          |
| Boxes have accidental identity              | Reference companions have no identity                     |
| Boxing and unboxing conversions             | Primitive reference and value conversions, but same rules |
| Primitive arrays are monomorphic            | All arrays are covariant                                  |


[valuebased]: https://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html
[growing]: https://dl.acm.org/doi/abs/10.1145/1176617.1176621
[jep390]: https://openjdk.java.net/jeps/390

