User model stacking: current status

Brian Goetz brian.goetz at oracle.com
Thu Jun 23 19:01:24 UTC 2022


On 6/15/2022 12:41 PM, Kevin Bourrillion wrote:
> All else being equal, the idea to use "inaccessible value type" over 
> "value type doesn't exist" feels very good and simplifying, with the 
> main problem that the syntax can't help but be gross.

A few weeks in, and this latest stacking is still feeling pretty good:

  - There are no coarse buckets any more; there are just identity 
classes and value classes.
  - Value classes have ref and val companion types with the obvious 
properties.  (Notably, refs are always atomic.)
  - For `value class C`, C as a type is an alias for `C.ref`.
  - The bucket formerly known as B2 becomes "value class, whose .val 
type is private."  This is the default for a value class.
  - The bucket formerly known as B3a is denoted by explicitly making the 
val companion public, with a public modifier on a "member" of the class.
  - The bucket formerly known as B3n is denoted by explicitly making the 
val companion public and non-atomic, again using modifiers.

I went and updated the State of the Values document to use the new 
terminology, test-driving some new syntax.  (Usual rules: syntax 
comments are premature at this time.)  I was very pleased with the 
result, because almost all the changes were small changes in terminology 
(e.g., "value companion type"), and eliminating the clumsy distinction 
between value classes and primitive classes.  Overall the structure 
remains the same, but feels more compact and clean.  MD source is below, 
for review.

Kevin's two questions remain, but I don't think they get in the way of 
refining the model in this way:

  - Have we made the right choices around == ?
  - Are we missing a big opportunity by not spelling Complex.val with a 
bang?



# State of Valhalla
## Part 2: The Language Model {.subtitle}

#### Brian Goetz {.author}
#### June 2022 {.date}

 > _This is the second of three documents describing the current State of
   Valhalla.  The first is [The Road to Valhalla](01-background); the
   third is [The JVM Model](03-vm-model)._

This document describes the directions for the Java _language_ charted by
Project Valhalla.  (In this document, we use "currently" to describe the
language as it stands today, without value classes.)

Valhalla started with the goal of providing user-programmable classes 
which can
be flat and dense in memory.  Numerics are one of the motivating use cases;
adding new primitive types directly to the language has a very high 
barrier.  As
we learned from [Growing a Language][growing] there are infinitely many 
numeric
types we might want to add to Java, but the proper way to do that is via
libraries, not as a language feature.

## Primitive and reference types in Java today

Java currently has eight built-in primitive types.  Primitives represent pure
_values_; any `int` value of "3" is equivalent to, and indistinguishable from,
any other `int` value of "3".  Primitives are monolithic (their bits cannot be
addressed individually) and have no canonical location, and so are _freely
copyable_.  With the exception of the unusual treatment of exotic floating
point values such as `NaN`, the `==` operator performs a _substitutability
test_ -- it asks "are these two values the same value?"

Java also has _objects_, and each object has a unique _object identity_. 
Because
of identity, objects are not freely copyable; each object lives in 
exactly one
place at any given time, and to access its state we have to go to that 
place.
But we mostly don't notice this because objects are not manipulated or 
accessed
directly, but instead through _object references_.  Object references 
are also a
kind of value -- they encode the identity of the object to which they 
refer, and
the `==` operator on object references asks "do these two references 
refer to
the same object."  Accordingly, object _references_ (like other values) 
can be
freely copied, but the objects they refer to cannot.

Primitives and objects differ in almost every conceivable way:

| Primitives                                 | Objects                            |
| ------------------------------------------ | ---------------------------------- |
| No identity (pure values)                  | Identity                           |
| `==` compares values                       | `==` compares object identity      |
| Built-in                                   | Declared in classes                |
| No members (fields, methods, constructors) | Members (including mutable fields) |
| No supertypes or subtypes                  | Class and interface inheritance    |
| Accessed directly                          | Accessed via object references     |
| Not nullable                               | Nullable                           |
| Default value is zero                      | Default value is null              |
| Arrays are monomorphic                     | Arrays are covariant               |
| May tear under race                        | Initialization safety guarantees   |
| Have reference companions (boxes)          | Don't need reference companions    |

The design of primitives represents various tradeoffs aimed at maximizing
performance and usability of the primitive types.  Reference types default to
`null`, meaning "referring to no object"; primitives default to a usable zero
value (which for most primitives is the additive identity).  Reference types
provide initialization safety guarantees against a certain category of data
races; primitives allow tearing under race for larger-than-32-bit values.
We could characterize the design principles behind these tradeoffs as "make
objects safer, make primitives faster."

The following figure illustrates the current universe of Java's types.  The
upper left quadrant is the built-in primitives; the rest of the space is
reference types.  In the upper-right, we have the abstract reference 
types --
abstract classes, interfaces, and `Object` (which, though concrete, acts 
more
like an interface than a concrete class).  The built-in primitives have 
wrappers
or boxes, which are reference types.

<figure>
   <a href="field-type-zoo.pdf" title="Click for PDF">
     <img src="field-type-zoo-old.png" alt="Current universe of Java 
field types"/>
   </a>
</figure>

Valhalla aims to unify primitives and objects in that they can both be
declared with classes, but maintains the special runtime characteristics
primitives have.  But while everyone likes the flatness and density that
user-definable value types promise, in some cases we want them to be 
more like
classical objects (nullable, non-tearable), and in other cases we want 
them to
be more like classical primitives (trading some safety for performance).

## Value classes: separating references from identity

Many of the impediments to optimization that Valhalla seeks to remove center
around _unwanted object identity_.  The primitive wrapper classes have 
identity,
but it is a purely accidental one.  Not only is it not directly useful, 
it can
be a source of bugs.  For example, due to caching, `Integer` can be 
accidentally
compared correctly with `==` just often enough that people keep doing it.
Similarly, [value-based classes][valuebased] such as `Optional` have no 
need for
identity, but pay the costs of having identity anyway.

Our first step is allowing class declarations to explicitly disavow 
identity, by
declaring themselves as _value classes_.  The instances of a value class are
called _value objects_.

```
value class ArrayCursor<T> {
     T[] array;
     int offset;

     public ArrayCursor(T[] array, int offset) {
         this.array = array;
         this.offset = offset;
     }

     public boolean hasNext() {
         return offset < array.length;
     }

     public T next() {
         return array[offset];
     }

     public ArrayCursor<T> advance() {
         return new ArrayCursor<>(array, offset+1);
     }
}
```

This says that an `ArrayCursor` is a class whose instances have no 
identity --
that instead they have _value semantics_.  As a consequence, it must 
give up the
things that depend on identity; the class and its fields are implicitly 
final.

But, value classes are still classes, and can have most of the things 
classes
can have -- fields, methods, constructors, type parameters, superclasses 
(with
some restrictions), nested classes, class literals, interfaces, etc.  The
classes they can extend are restricted: `Object` or abstract classes with no
instance fields, empty no-arg constructor bodies, no other constructors, 
no instance
initializers, no synchronized methods, and whose superclasses all meet 
this same
set of conditions.  (`Number` meets these conditions.)
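
For illustration, here is a minimal sketch (with hypothetical class names) of
an abstract class meeting these conditions, and a value class extending it:

```
// An abstract class with no instance fields, an empty no-arg constructor,
// no other constructors, no instance initializers, and no synchronized
// methods -- so value classes may extend it.
abstract class Shape {
    public Shape() { }
    public abstract double area();
}

value class Circle extends Shape {
    double radius;

    Circle(double radius) { this.radius = radius; }

    public double area() { return Math.PI * radius * radius; }
}
```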

Classes in Java give rise to types; the class `ArrayCursor` gives rise 
to a type
`ArrayCursor` (actually a parametric family of instantiations 
`ArrayCursor<T>`.)
`ArrayCursor` is still a reference type, just one whose references refer to
value objects rather than identity objects. For the types in the upper-right
quadrant of the diagram (interfaces, abstract classes, and `Object`), 
references
to these types might refer to either an identity object or a value object.
(Historically, JVMs were effectively forced to represent object 
references with
pointers; for references to value objects, JVMs now have more flexibility.)

Because `ArrayCursor` is a reference type, it is nullable (because 
references
are nullable), its default value is null, and loads and stores of 
references are
atomic with respect to each other even in the presence of data races, 
providing
the initialization safety we are used to with classical objects.

Because instances of `ArrayCursor` have value semantics, `==` compares 
by state
rather than identity.  This means that value objects, like primitives, are
_freely copyable_; we can explode them into their fields and 
re-aggregate them
into another value object, and we cannot tell the difference. (Because they
have no identity, some identity-sensitive operations, such as 
synchronization,
are disallowed.)
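
As a concrete sketch of these semantics, using the `ArrayCursor` class above
(illustrative only):

```
String[] words = { "a", "b", "c" };
ArrayCursor<String> c1 = new ArrayCursor<>(words, 1);
ArrayCursor<String> c2 = new ArrayCursor<>(words, 1);

// Two separately constructed value objects with the same field values are
// indistinguishable; == compares state, not identity.
assert c1 == c2;
```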

So far we've addressed the first two lines of the table of differences 
above;
rather than identity being a property of all object instances, classes can
decide whether their instances have identity or not.  By allowing 
classes that
don't need identity to exclude it, we free the runtime to make better 
layout and
compilation decisions -- and avoid a whole category of bugs.

In looking at the code for `ArrayCursor`, we might mistakenly assume it 
will be
inefficient, as each loop iteration appears to allocate a new cursor:

```
for (ArrayCursor<T> c = Arrays.cursor(array);
      c.hasNext();
      c = c.advance()) {
     // use c.next();
}
```

One should generally expect here that _no_ cursors are actually allocated.
Because an `ArrayCursor` is just its two fields, these fields will 
routinely get
scalarized and hoisted into registers, and the constructor call in `advance`
will typically compile down to incrementing one of these registers.

### Migration

The JDK (as well as other libraries) has many [value-based 
classes][valuebased]
such as `Optional` and `LocalDateTime`.  Value-based classes adhere to the
semantic restrictions of value classes, but are still identity classes 
-- even
though they don't want to be.  Value-based classes can be migrated to 
true value
classes simply by redeclaring them as value classes, which is both 
source- and
binary-compatible.

We plan to migrate many value-based classes in the JDK to value classes.
Additionally, the primitive wrappers can be migrated to value classes as 
well,
making the conversion between `int` and `Integer` cheaper; see the section
"Legacy Primitives" below.  (In some cases, this may be _behaviorally_
incompatible for code that synchronizes on the primitive wrappers.  [JEP
390][jep390] has supported both compile-time and runtime warnings for
synchronizing on primitive wrappers since Java 16.)

<figure>
   <a href="field-type-zoo.pdf" title="Click for PDF">
     <img src="field-type-zoo-mid.png" alt="Java field types adding 
value classes"/>
   </a>
</figure>

### Equality

Earlier we said that `==` compares value objects by state rather than by
identity.  More precisely, two value objects are `==` if they are of the same
type and their fields are pairwise equal, where equality is given by `==` for
primitives (except `float` and `double`, which are compared with
`Float::equals` and `Double::equals` to avoid anomalies), `==` for references
to identity objects, and recursively by `==` for references to value objects.
In no case is a value object ever `==` to a reference to an identity object.
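
A small sketch (with a hypothetical class) of how the `double` rule above
avoids the usual `NaN` anomaly:

```
value class Measurement {
    double reading;

    Measurement(double reading) { this.reading = reading; }
}

// Double.NaN == Double.NaN is false for the primitive, but value-object ==
// compares double fields with Double::equals, so these two are ==.
assert new Measurement(Double.NaN) == new Measurement(Double.NaN);
```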

### Value records

While records have a lot in common with value classes -- they are final and
their fields are final -- they are still identity classes. Records embody a
tradeoff: give up on decoupling the API from the representation, and in 
return
get various syntactic and semantic benefits.  Value classes embody another
tradeoff: give up identity, and get various semantic and performance 
benefits.
If we are willing to give up both, we can get both sets of benefits.

```
value record NameAndScore(String name, int score) { }
```

Value records combine the data-carrier idiom of records with the improved
scalarization and flattening benefits of value classes.
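
For example (a sketch of the semantics described above):

```
NameAndScore a = new NameAndScore("alice", 10);
NameAndScore b = new NameAndScore("alice", 10);

// As value objects, equal-state records are ==; record-style equals()
// and hashCode() behave as for any record.
assert a == b;
assert a.equals(b) && a.hashCode() == b.hashCode();
```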

In theory, it would be possible to apply `value` to certain enums as well, but
this is not currently possible because the `java.lang.Enum` base class that
enums extend does not meet the requirements for superclasses of value classes
(it has fields and non-empty constructors).

## Unboxing values for flatness and density

Value classes shed object identity, gaining a host of performance and
predictability benefits in the process.  They are an ideal replacement 
for many
of today's value-based classes, fully preserving their semantics (except 
for the
accidental identity these classes never wanted).  But identity-free 
reference
types are only one point on a spectrum of tradeoffs between abstraction and
performance, and other desired use cases -- such as numerics -- may want a
different set of tradeoffs.

Reference types are nullable, and therefore must account for null somehow in
their representation, which may involve additional footprint.  Similarly, they
offer the initialization safety guarantees for final fields that we have come
to expect from identity objects, which may entail limits on flatness.  For
certain use cases, it may be desirable to additionally give up something else
to make further flatness and footprint gains -- and that something else is
reference-ness.

The built-in primitives are best understood as _pairs_ of types: a primitive
type (e.g., `int`) and its reference companion or box (`Integer`), with
conversions between the two (boxing and unboxing.)  We have both types 
because
the two have different characteristics.  Primitives are optimized for 
efficient
storage and access: they are not nullable, they tolerate uninitialized 
(zero)
values, and larger primitive types (`long`, `double`) may tear under racy
access.  References err on the side of safety and flexibility; they support
nullity, polymorphism, and offer initialization safety (freedom from 
tearing),
but by comparison to primitives, they pay a footprint and indirection cost.

For these reasons, value classes give rise to pairs of types as well: a
reference type and a _value companion type_.  We've seen the reference 
type so
far; for a value class `Point`, the reference type is called `Point`.  
(The full
name for the reference type is `Point.ref`; `Point` is an alias for 
that.)  The
value companion type is called `Point.val`, and the two types have the same
conversions between them as primitives do today with their boxes.  (If 
we are
talking explicitly about the value companion type of a value class, we may
sometimes describe the corresponding reference type as its _reference
companion_.)

```
value class Point implements Serializable {
     int x;
     int y;

     Point(int x, int y) {
         this.x = x;
         this.y = y;
     }

     Point scale(int s) {
         return new Point(s*x, s*y);
     }
}
```

The default value of the value companion type is the one for which all 
fields
take on their default value; the default value of the reference type is, 
like
all reference types, null.
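
A sketch of the difference, assuming `Point`'s value companion is accessible
(syntax illustrative only):

```
Point.val[] vals = new Point.val[1];   // elements get the all-zero default
Point[]     refs = new Point[1];       // elements default to null

assert vals[0].x == 0 && vals[0].y == 0;
assert refs[0] == null;
```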

In our diagram, these new types show up as another entity that straddles the
line between primitives and identity-free references, alongside the legacy
primitives:

** UPDATE DIAGRAM **

<figure>
   <a href="field-type-zoo.pdf" title="Click for PDF">
     <img src="field-type-zoo-new.png" alt="Java field types with 
extended primitives"/>
   </a>
</figure>

### Member access

Both the reference and value companion types have the same instance members.
Unlike today's primitives, value companion types can be used as receivers to
access fields and invoke methods, subject to accessibility constraints:

```
Point.val p = new Point(1, 2);
assert p.x == 1;

p = p.scale(2);
assert p.x == 2;
```

### Polymorphism

When we declare a class today, we set up a subtyping (is-a) relationship 
between
the declared class and its supertypes.  When we declare a value class, 
we set up
a subtyping relationship between the _reference type_ and the declared
supertypes. This means that if we declare:

```
value class UnsignedShort extends Number
                           implements Comparable<UnsignedShort> {
    ...
}
```

then `UnsignedShort` is a subtype of `Number` and 
`Comparable<UnsignedShort>`,
and we can ask questions about subtyping using `instanceof` or pattern 
matching.
What happens if we ask such a question of the value companion type?

```
UnsignedShort.val us = ...
if (us instanceof Number) { ... }
```

Since subtyping is defined only on reference types, the `instanceof` operator
(and corresponding type patterns) will behave as if both sides were lifted to
the appropriate reference type, and we can answer the question that way.
(This may trigger fears of expensive boxing conversions, but in reality no
actual allocation will happen.)

We introduce a new relationship based on `extends` / `implements` 
clauses, which
we'll call "extends"; we define `A extends B` as meaning `A <: B` when A 
is a
reference type, and `A.ref <: B` when A is a value companion type.  The
`instanceof` relation, reflection, and pattern matching are updated to use
"extends".

### Arrays

Arrays of reference types are _covariant_; this means that if `A <: B`, then
`A[] <: B[]`.  This allows `Object[]` to be the "top array type", at 
least for
arrays of references.  But arrays of primitives are currently left out 
of this
story.   We can unify the treatment of arrays by defining array 
covariance over
the new "extends" relationship; if A extends B, then `A[] <: B[]`.  For 
a value
class P, `P.val[] <: P.ref[] <: Object[]`, finally making `Object[]` the top
type for all arrays.
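
Concretely, for the `Point` class from earlier (a sketch of the subtyping
described above):

```
Point.val[] vals = { new Point(1, 2), new Point(3, 4) };
Point[]     refs = vals;      // Point.val[] <: Point.ref[]
Object[]    objs = refs;      // Point.ref[] <: Object[]

assert objs[0] instanceof Point;
```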

### Equality

Just as with `instanceof`, we define `==` on values by appealing to the
reference companion (though no actual boxing need occur).  Evaluating `a == b`,
where one or both operands are of a value companion type, can be defined as if
the operands were first converted to their corresponding reference types and
the resulting references then compared.  This means that the following will
succeed:

```
Point.val p = new Point(3, 4);
Point pr = p;
assert p == pr;
```

The base implementation of `Object::equals` delegates to `==`, which is a
suitable default for both reference and value classes.

### Serialization

If a value class implements `Serializable`, this is also really a statement
about the reference type.  Just as with other aspects described here,
serialization of value companions can be defined by converting to the
corresponding reference type and serializing that, and reversing the 
process at
deserialization time.

Serialization currently uses object identity to preserve the topology of an
object graph.  This generalizes cleanly to objects without identity, because
`==` on value objects treats two identical copies of a value object as 
equal.
So any observations we make about graph topology prior to serialization with
`==` are consistent with those after deserialization.

### Identity-sensitive operations

Certain operations are currently defined in terms of object identity.  
As we've
already seen, some of these, like equality, can be sensibly extended to 
cover
all instances.  Others, like synchronization, will become partial.
Identity-sensitive operations include:

   - **Equality.**  We extend `==` on references to include references 
to value
     objects.  Where it currently has a meaning, the new definition 
coincides
     with that meaning.

   - **System::identityHashCode.**  The main use of `identityHashCode` is in
     the implementation of data structures such as `IdentityHashMap`.  We can
     extend `identityHashCode` in the same way we extend equality -- deriving
     a hash on value objects from the hash of all the fields.

   - **Synchronization.**  This becomes a partial operation.  If we can
     statically detect that a synchronization will fail at runtime (including
     declaring a `synchronized` method in a value class), we can issue a
     compilation error; if not, attempts to lock on a value object result in
     `IllegalMonitorStateException` (see the sketch after this list).  This is
     justifiable because it is intrinsically imprudent to lock on an object
     for which you do not have a clear understanding of its locking protocol;
     locking on an arbitrary `Object` or interface instance is doing exactly
     that.

   - **Weak, soft, and phantom references.**  Capturing an exotic 
reference to a
     value object becomes a partial operation, as these are 
intrinsically tied to
     reachability (and hence to identity).  However, we will likely make
     enhancements to `WeakHashMap` to support mixed identity and value 
keys.

### What about Object?

The root class `Object` poses an unusual problem, in that every class must
extend it directly or indirectly, but it is also instantiable 
(non-abstract),
and its instances have identity -- it is common to use `new Object()` as 
a way
to obtain a new object identity for purposes of locking.

## Why two types?

It is sensible to ask: why do we need companion types at all? This is 
analogous
to the need for boxes in 1995: we'd made one set of tradeoffs for 
primitives,
favoring performance (non-nullable, zero-default, tolerant of
non-initialization, tolerant of tearing under race, unrelated to 
`Object`), and
another for references, favoring flexibility and safety.  Most of the 
time, we
ignored the primitive wrapper classes, but sometimes we needed to 
temporarily
suppress one of these properties, such as when interoperating with code that
expects an `Object` or the ability to express "no value".  The reasons 
we needed
boxes in 1995 still apply today: sometimes we need the affordances of
references, and in those cases, we appeal to the reference companion.

Reasons we might want to use the reference companion include:

  - **Interoperation with reference types.**  Value classes can implement
    interfaces and extend classes (including `Object` and some abstract
    classes), which means some class and interface types are going to be
    polymorphic over both identity objects and value objects.  This
    polymorphism is achieved through object references; a reference to
    `Object` may be a reference to an identity object, or a reference to a
    value object.

  - **Nullability.**  Nullability is an affordance of object 
_references_, not
    objects themselves.  Most of the time, it makes sense that primitive 
types
    are non-nullable (as the primitives are today), but there may be 
situations
    where null is a semantically important value.  Using the reference 
companion
    when nullability is required is semantically clear, and avoids the 
need to
    invent new sentinel values for "no value."

    This need comes up when migrating existing classes; the method `Map::get`
    uses `null` to signal that the requested key was not present in the map.
    But, if the `V` parameter to `Map` is instantiated with a value companion
    type, `null` is not a valid value.  We can capture the "`V` or null"
    requirement by changing the descriptor of `Map::get` to:

    ```
    public V.ref get(K key);
    ```

    where, whatever type `V` is instantiated as, `Map::get` returns the 
reference
    companion. (For a type `V` that already is a reference type, this is 
just `V`
    itself.) This captures the notion that the return type of `Map::get` 
will
    either be a reference to a `V`, or the `null` reference. (This is a
    compatible change, since both erase to the same thing.)


  - **Self-referential types.**  Some types may want to directly or 
indirectly
    refer to themselves, such as the "next" field in the node type of a 
linked
    list:

    ```
    class Node<T> {
        T theValue;
        Node<T> nextNode;
    }
    ```

    We might want to represent this as a value class, but if the type of
    `nextNode` were `Node.val<T>`, the layout of `Node` would be
    self-referential, since we would be trying to flatten a `Node` into 
its own
    layout.

  - **Protection from tearing.**  For a value class with a non-atomic value
    companion type, we may want to use the reference companion in cases 
where we
    are concerned about tearing; because loads and stores of references are
    atomic, `P.ref` is immune to the tearing under race that `P.val` 
might be
    subject to.

  - **Compatibility with existing boxing.**  Autoboxing is convenient, 
in that it
    lets us pass a primitive where a reference is required.  But boxing 
affects
    far more than assignment conversion; it also affects method overload
    selection.  The rules are designed to prefer overloads that require no
    conversions to those requiring boxing (or varargs) conversions.  
Having both
    a value and reference type for every value class means that these 
rules can
    be cleanly and intuitively extended to cover value classes.

## Refining the value companion

Value classes have several options for refining the behavior of the value
companion type and how it is exposed to clients.

### Classes with no good default value

For a value class `C`, the default value of `C.ref` is the same as any other
reference type: `null`.  For the value companion type `C.val`, the 
default value
is the one where all of its fields are initialized to their default value.

The built-in primitives reflect the design assumption that zero is a 
reasonable
default.  The choice to use a zero default for uninitialized variables 
was one
of the central tradeoffs in the design of the built-in primitives.  It 
gives us
a usable initial value (most of the time), and requires less storage 
footprint
than a representation that supports null (`int` uses all 2^32 of its bit
patterns, so a nullable `int` would have to either make some 32 bit signed
integers unrepresentable, or use a 33rd bit).  This was a reasonable 
tradeoff
for the built-in primitives, and is also a reasonable tradeoff for many 
(but not
all) other potential value classes (such as complex numbers, 2D points,
half-floats, etc).

But for other potential value classes, such as `LocalDate`, there _is_ no
reasonable default.  If we choose to represent a date as the number of days
since some epoch, there will invariably be bugs that stem from uninitialized
dates; we've all been mistakenly told by computers that something will happen
on or near 1 January 1970.  Even if we could choose a default other than the
zero representation, an uninitialized date is still likely to be an error --
there simply is no good default date value.

For this reason, value classes have the choice of encapsulating or exposing
their value companion type.  If the class is willing to tolerate an
uninitialized (zero) value, it can freely share its `.val` companion 
with the
world; if uninitialized values are dangerous (such as for `LocalDate`), 
it can
be encapsulated to the class or package.

Encapsulation is accomplished using ordinary access control.  By 
default, the
value companion is `private`, and need not be declared explicitly; a 
class that
wishes to share its value companion can make it public:

```
public value record Complex(double real, double imag) {
     public value companion Complex.val;
}
```

### Atomicity and tearing

For the primitive types longer than 32 bits (long and double), it is not
guaranteed that reads and writes from different threads (without suitable
coordination) are atomic with respect to each other.  The result is that, if
accessed under data race, a long or double field or array element can be 
seen to
"tear", and a read might see the low 32 bits of one write and the high 
32 bits
of another.  (Declaring the containing field `volatile` is sufficient to 
restore
atomicity, as is properly coordinating with locks or other concurrency 
control,
or not sharing across threads in the first place.)

This was a pragmatic tradeoff given the hardware of the time; the cost 
of 64-bit
atomicity on 1995 hardware would have been prohibitive, and problems 
only arise
when the program already has data races -- and most numeric code deals with
thread-local data.  Just like with the tradeoff of nulls vs zeros, the 
design of
the built-in primitives permits tearing as part of a tradeoff between
performance and correctness, where primitives chose "as fast as 
possible" and
reference types chose more safety.

Today, most JVMs give us atomic loads and stores of 64-bit primitives, 
because
the hardware makes them cheap enough.  But value classes bring us back to
1995; atomic loads and stores of larger-than-64-bit values are still 
expensive
on many CPUs, leaving us with a choice of "make operations on primitives 
slower"
or permitting tearing when accessed under race.

It would not be wise for the language to select a one-size-fits-all 
policy about
tearing; choosing "no tearing" means that types like `Complex` are 
slower than
they need to be, even in a single-threaded program; choosing "tearing" means
that classes like `Range` can be seen to not exhibit invariants asserted by
their constructor.  Class authors have to choose, with full knowledge of 
their
domain, whether their types can tolerate tearing.  The default is no tearing
(safe by default); a class can opt for greater flattening at the cost of
potential tearing by declaring the value companion as `non-atomic`:

```
public value record Complex(double real, double imag) {
     public non-atomic value companion Complex.val;
}
```

For classes like `Complex`, all of whose bit patterns are valid, this is very
much like the choice around `long` in 1995.  Other classes that might have
nontrivial representational invariants likely want to stick to the default of
atomicity.

## Migrating legacy primitives

As part of generalizing primitives, we want to adjust the built-in 
primitives to
behave as consistently with value classes as possible.  While we can't 
change
the fact that `int`'s reference companion is the oddly-named `Integer`, 
we can give them
more uniform aliases (`int.ref` is an alias for `Integer`; `int` is an 
alias for
`Integer.val`) -- so that we can use a consistent rule for naming 
companions.
Similarly, we can extend member access to the legacy primitives, and allow
`int[]` to be a subtype of `Integer[]` (and therefore of `Object[]`.)

We will redeclare `Integer` as a value class with a public value companion:

```
value class Integer {
     public value companion Integer.val;

     // existing methods
}
```

where the type name `int` is an alias for `Integer.val`.  The primitive 
array
types will be retrofitted such that arrays of primitives are subtypes of 
arrays
of their boxes (`int[] <: Integer[]`).
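
A sketch of what the retrofit described above would permit (not valid in
current Java):

```
int[]     ints  = { 1, 2, 3 };
Integer[] boxes = ints;       // int[] <: Integer[] after migration
Object[]  objs  = ints;       // ... and therefore <: Object[]

assert boxes[0] == 1;         // element access through the reference type
```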

## Unifying primitives with classes

Earlier, we had a chart of the differences between primitive and reference
types:

| Primitives                                 | Objects                            |
| ------------------------------------------ | ---------------------------------- |
| No identity (pure values)                  | Identity                           |
| `==` compares values                       | `==` compares object identity      |
| Built-in                                   | Declared in classes                |
| No members (fields, methods, constructors) | Members (including mutable fields) |
| No supertypes or subtypes                  | Class and interface inheritance    |
| Accessed directly                          | Accessed via object references     |
| Not nullable                               | Nullable                           |
| Default value is zero                      | Default value is null              |
| Arrays are monomorphic                     | Arrays are covariant               |
| May tear under race                        | Initialization safety guarantees   |
| Have reference companions (boxes)          | Don't need reference companions    |

The addition of value classes addresses many of these directly. Rather than
saying "classes have identity, primitives do not", we make identity an 
optional
characteristic of classes (and derive equality semantics from that.)  Rather
than primitives being built in, we derive all types, including 
primitives, from
classes, and endow value companion types with the members and supertypes
declared with the value class.  Rather than having primitive arrays be
monomorphic, we make all arrays covariant under the `extends` relation.

The remaining differences now become differences between reference types and
value types:

| Value types                                   | Reference types                  |
| --------------------------------------------- | -------------------------------- |
| Accessed directly                             | Accessed via object references   |
| Not nullable                                  | Nullable                         |
| Default value is zero                         | Default value is null            |
| May tear under race, if declared `non-atomic` | Initialization safety guarantees |


### Choosing which to use

How would we choose between declaring an identity class or a value class, and
among the various options on value companions?  Here are some quick rules of
thumb (the sketch after this list shows the three shapes):

  - If you need mutability, subclassing, or aliasing, choose an identity 
class.
  - If uninitialized (zero) values are unacceptable, choose a value 
class with
    the value companion encapsulated.
  - If you have no cross-field invariants and are willing to tolerate 
tearing to
    enable more flattening, choose a value class with a non-atomic value
    companion.
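
A compact sketch of the three shapes (class names and bodies are
hypothetical):

```
class Counter {                        // identity class: mutable, aliased state
    int count;
    void increment() { count++; }
}

value class EpochDay {                 // value class, companion encapsulated:
    int daysSinceEpoch;                // the all-zero default is not meaningful,
                                       // so .val stays private (the default)
    EpochDay(int days) { daysSinceEpoch = days; }
}

value class Complex {                  // value class with a public, non-atomic
    double re, im;                     // companion: every bit pattern is valid
    public non-atomic value companion Complex.val;

    Complex(double re, double im) { this.re = re; this.im = im; }
}
```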

## Summary

Valhalla unifies, to the extent possible, primitives and objects.   The
following table summarizes the transition from the current world to 
Valhalla.

| Current World                               | Valhalla                                                   |
| ------------------------------------------- | ---------------------------------------------------------- |
| All objects have identity                   | Some objects have identity                                 |
| Fixed, built-in set of primitives           | Open-ended set of primitives, declared via classes         |
| Primitives don't have methods or supertypes | Primitives are classes, with methods and supertypes        |
| Primitives have ad-hoc boxes                | Primitives have regularized reference companions           |
| Boxes have accidental identity              | Reference companions have no identity                      |
| Boxing and unboxing conversions             | Primitive reference and value conversions, but same rules  |
| Primitive arrays are monomorphic            | All arrays are covariant                                   |


[valuebased]: 
https://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html
[growing]: https://dl.acm.org/doi/abs/10.1145/1176617.1176621
[jep390]: https://openjdk.java.net/jeps/390