JEP update: Primitive Classes

Fri Dec 17 00:08:02 UTC 2021

First, I've made some minor revisions to the Value Objects JEP in the last couple of weeks. You can see it here:
https://openjdk.java.net/jeps/8277163

Second, I've put together a draft of a revised JEP 401, Primitive Classes. This removes content that became part of the Value Objects feature, and refines how we talk about the relationship between primitive types and reference types. Working outside of JBS for now, because I don't want to disrupt the already-Candidate JEP 401 artifact until we're at least ready to Submit the Value Objects piece.

A key idea is that primitive values and value objects are distinct entities, with different types, but they're both instances of the same class (thanks for the good ideas here, Kevin!).

(I'll acknowledge the ongoing discussion about whether "primitive" is the right term to use here. But for now, sticking with the status quo.)

Happy to hear your thoughts!

---

Summary
-------

Support new, developer-declared primitive types in Java. This is a
[preview language and VM feature](http://openjdk.java.net/jeps/12).

Goals
-----

This JEP introduces primitive classes, special kinds of
[value classes][jep-values] that define new primitive types.

The Java programming language will be enhanced to recognize primitive class
declarations and support new primitive types in its type system.

The Java Virtual Machine will be enhanced with a new `Q` carrier type to encode
declared primitive types.

Non-Goals
---------

This JEP is concerned with the core treatment of developer-declared primitives.
Additional features to improve integration with the Java programming language
are not covered here, but are expected to be developed in parallel.
Specifically:

-   [JEP 402][jep402] will enhance the basic primitives (`int`, `boolean`, etc.)
    by giving them primitive class declarations.

-   [A separate JEP][jep-generics] will update Java's generics so that primitive
    types can be used as type arguments.

Other followup efforts may enhance existing APIs to take advantage of primitive
classes, or introduce new language features and APIs built on top of primitive
classes.

Motivation
----------

Java developers work with two kinds of values: primitives and objects.

Primitives offer better performance, because they are typically *inlined*—stored
directly (without headers or pointers) in variables, on the computation stack,
and, ultimately, in CPU registers. Hence, memory reads do not have additional
indirections, primitive arrays are stored densely and contiguously in memory,
primitive-typed fields can be similarly compact, primitive values do not require
garbage collection, and primitive operations are performed within the CPU.

Objects offer better abstractions, including fields, methods, constructors,
access control, and nominal subtyping. But objects traditionally perform poorly
in comparison to primitives, because they are primarily stored in heap-allocated
memory and accessed by reference.

*Value objects*, introduced by [another JEP][jep-values], significantly improve
object performance in many contexts, providing a good fusion of the better
abstractions of objects with the better performance of primitives.

However, certain invariant properties of objects limit how much they can be
optimized—particularly when stored in fields and arrays. Specifically:

-   A variable of a reference type may be `null`, so the inlined layout of a
    value object typically requires some additional bits to encode `null`.
    For example, a variable storing an `int` can fit in 32 bits, but for a value
    class with a single `int` field, a variable of that class type could
    use up to 64 bits.

-   A variable of a reference type must be modified atomically. This often makes
    it impractical to inline a value object, because its layout would be too
    large for efficient atomic modification. Large primitive types (currently,
    `double` and `long`) make no such atomicity guarantees, so variables of
    these types can be modified efficiently without indirect representations
    (concurrency is instead managed at a higher level).

Primitive classes give developers the capability to define new primitive types
that aren't subject to these limitations. Programs can make use of class
features without giving up any of the performance benefits of primitives.

Applications of developer-declared primitives include:

-   Numbers of varieties not supported by the basic primitives, such as
    unsigned bytes, 128-bit integers, and half-precision floats;

-   Points, complex numbers, colors, vectors, and other multi-dimensional
    numerics;

-   Numbers with units—sizes, rates of change, currency, etc.;

-   Bitmasks and other compressed encodings of data;

-   Map entries and other data structure internals;

-   Data-carrying tuples and multiple returns;

-   Aggregations of other primitive types, potentially multiple layers deep.

Description
-----------

The features described below are preview features, enabled with the
`--enable-preview` compile-time and runtime flags.

### Primitive classes

A *primitive class* is a special kind of value class that introduces a new
primitive type.

As value classes, primitive classes have no identity. This allows their
instances to be freely converted between value objects and simpler *primitive
values*. A primitive value can be thought of as a bare sequence of field values,
without any headers or extra pointers.

A primitive class is declared with the `primitive` contextual keyword.

```
primitive class Point implements Shape {
    private double x;
    private double y;

    public Point(double x, double y) {
        this.x = x;
        this.y = y;
    }

    public double x() { return x; }
    public double y() { return y; }

    public Point translate(double dx, double dy) {
        return new Point(x+dx, y+dy);
    }

    public boolean contains(Point p) {
        return equals(p);
    }
}

interface Shape {
    boolean contains(Point p);
}
```

(Alternatively, we might prefer the class to be declared as `primitive Point`.)

Primitive class declarations are subject to the [same restrictions][jep-values]
as other value class declarations. For example, the instance fields of a
primitive class are implicitly `final`, so they cannot be assigned outside of a
constructor or initializer.

In addition, no instance field of a primitive class declaration may have a
primitive type that depends—directly or indirectly—on the declaring class. In
other words, with the exception of reference-typed fields, the class must allow
for flat, fixed-size layouts without cycles.
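
The circularity restriction can be pictured as a cycle check over a graph whose edges are primitive-typed instance fields. The following is a minimal sketch in ordinary Java, not the check that `javac` or the JVM would actually perform. The `PrimitiveLayoutCheck` class and its map from class name to primitive-typed field types are hypothetical and exist only for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PrimitiveLayoutCheck {
    // Hypothetical model: class name -> primitive classes used as instance field types.
    // Reference-typed fields are left out because they do not affect a flat layout.
    static boolean hasCycle(Map<String, List<String>> primitiveFields) {
        Set<String> done = new HashSet<>();
        for (String start : primitiveFields.keySet()) {
            if (visit(start, primitiveFields, new HashSet<>(), done)) {
                return true;
            }
        }
        return false;
    }

    // Depth-first search; a class reached again while still on the current path is a cycle.
    private static boolean visit(String cls, Map<String, List<String>> graph,
                                 Set<String> onPath, Set<String> done) {
        if (onPath.contains(cls)) return true;
        if (done.contains(cls)) return false;
        onPath.add(cls);
        for (String fieldType : graph.getOrDefault(cls, List.of())) {
            if (visit(fieldType, graph, onPath, done)) return true;
        }
        onPath.remove(cls);
        done.add(cls);
        return false;
    }

    public static void main(String[] args) {
        // A Line made of two Points has a fixed-size flat layout.
        System.out.println(hasCycle(Map.of(
                "Line", List.of("Point", "Point"),
                "Point", List.<String>of())));
        // A Node that directly contains a Node could never be laid out flat.
        System.out.println(hasCycle(Map.of("Node", List.of("Node"))));
    }
}
```

In this model, a class that needs to refer to itself would do so through a reference-typed field, which simply does not appear in the map.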

In most other ways, a primitive class declaration is just like any other class
declaration. It can have superinterfaces, type parameters, enclosing instances,
inner classes, overloaded constructors, `static` members, and the full range of
access restrictions on its members.

### Primitive types

The name of a primitive class denotes that class's primitive type. Primitive
types store instances of the named class as primitive values. Instances can be
created with normal class instance creation expressions.

```
Point p1 = new Point(1.0, -0.5);
```

Field access and method invocation are supported by primitive types. The members
of a primitive type are the same as the members of the class.

```
assert p1.x() == 1.0;
Point p2 = p1.translate(0.0, 1.0);
System.out.println(p2.toString());
```

Primitive types support the `==` and `!=` operators when comparing two values of
the same type. As is the case for value objects, the `==` comparison recursively
compares the values' fields.

```
Point p3 = new Point(1.8, 3.6);
Point p4 = p3.translate(0.0, 0.0);
assert p3 == p4;
```
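
For readers who want to picture the recursive comparison in today's Java, here is a rough emulation using ordinary identity classes. The names `Pt`, `Segment`, and `same` are hypothetical, and `int` fields are used on purpose so the sketch does not have to take a position on how floating-point fields are compared (the Value Objects JEP is the authority on that).

```java
final class FieldwiseEquality {
    // Identity classes standing in for primitive classes (illustration only).
    static final class Pt {
        final int x, y;
        Pt(int x, int y) { this.x = x; this.y = y; }
    }

    static final class Segment {
        final Pt from, to;
        Segment(Pt from, Pt to) { this.from = from; this.to = to; }
    }

    // Field-by-field comparison of two Pt instances.
    static boolean same(Pt a, Pt b) {
        return a.x == b.x && a.y == b.y;
    }

    // Recurses into the nested Pt fields rather than comparing identities.
    static boolean same(Segment a, Segment b) {
        return same(a.from, b.from) && same(a.to, b.to);
    }

    public static void main(String[] args) {
        Segment s1 = new Segment(new Pt(1, 2), new Pt(3, 4));
        Segment s2 = new Segment(new Pt(1, 2), new Pt(3, 4));
        System.out.println(s1 == s2);     // identity comparison in today's Java
        System.out.println(same(s1, s2)); // field-wise comparison
    }
}
```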

Like a value class reference type, an expression of a primitive type cannot be
used as the operand of a `synchronized` statement.

*Unlike* other value classes, a `this` expression in the body of a primitive
class has a primitive type.

### Default values and `null`

Like the basic primitive types (`int`, `boolean`, etc.), declared primitive
types do not allow `null`.

Whenever a field or array component is created, the longstanding behavior is to
set its initial value to the *default value* of its type. For reference types,
this value is `null`, and for the basic primitive types, this value is 0 or
`false`.

For a declared primitive type, the default value is the *initial instance* of
the class: an instance whose fields are all set to their own default values.

```
Object[] os = new Object[5];
assert os[0] == null;
Point[] ps = new Point[5];
assert ps[0].x() == 0.0 && ps[0].y() == 0.0;
```

As shorthand, the default value of a primitive type can be expressed with the
class name followed by the `default` keyword.

```
assert Point.default.x() == 0.0 &&
       Point.default.y() == 0.0;
```

Note that the initial instance of a primitive class is created without invoking
any constructors or instance initializers, and is available to anyone with
access to the class (or its reflective `Class` object). Primitive classes are
not able to specify an initial instance that sets fields to something other than
their default values.

Methods of primitive classes should be designed to work on the initial instance.
If this isn't feasible (for example, a reference-typed field is expected to be
non-null), it may not be appropriate for the class to have a primitive type.
Instead, it can be declared as a normal value class.
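
As an illustration of designing methods around the initial instance, consider a hypothetical fraction type whose all-zero instance has a zero denominator. The sketch below is ordinary Java that emulates the situation with an identity class. `DefaultSafeFraction` and its `INITIAL` constant are assumptions made for this example, and the private constructor stands in for the JVM creating the initial instance without running any constructor.

```java
final class DefaultSafeFraction {
    // Stand-in for the initial instance: every field holds its default value (0).
    static final DefaultSafeFraction INITIAL = new DefaultSafeFraction();

    final long numerator;
    final long denominator;

    // Emulates creation without validation, as happens for an initial instance.
    private DefaultSafeFraction() {
        this.numerator = 0;
        this.denominator = 0;
    }

    // The public constructor enforces the invariant, but cannot prevent INITIAL.
    DefaultSafeFraction(long numerator, long denominator) {
        if (denominator == 0) {
            throw new IllegalArgumentException("zero denominator");
        }
        this.numerator = numerator;
        this.denominator = denominator;
    }

    // Treats the all-zero instance as the value 0 instead of dividing by zero.
    double toDouble() {
        return denominator == 0 ? 0.0 : (double) numerator / denominator;
    }

    public static void main(String[] args) {
        System.out.println(INITIAL.toDouble());
        System.out.println(new DefaultSafeFraction(1, 2).toDouble());
    }
}
```

If no sensible meaning can be given to the all-zero instance, that is a hint that a normal value class is the better fit.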

### Multi-threaded reads and writes

As for the basic primitive types `double` and `long`, when a field or array
component has a declared primitive type, reads and writes might not be atomic.
As a result, in a multi-threaded program, unexpected instances may be
encountered.

```
Point[] ps = new Point[]{ new Point(0.0, 1.0) };
new Thread(() -> ps[0] = new Point(1.0, 0.0)).start();
Point p = ps[0]; // may be (1.0, 1.0), among other possibilities
```

Like initial instances, primitive class instances produced by non-atomic reads
and writes are created without invoking any constructors or instance
initializers. There is no opportunity for the class to ensure that the field
values of the new object are compatible with each other (for example, a `start`
index may end up being greater than an `end` index).
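
To see how a mixed instance such as (1.0, 1.0) can arise, the sketch below replays one possible interleaving deterministically on a single thread. It assumes a hypothetical flattened layout in which a one-element `Point[]` is stored as two adjacent `double` slots. Whether and when a real JVM tears a write is implementation-specific.

```java
final class TornReadDemo {
    // Simulated flattened storage for a Point[1]: slot 0 holds x, slot 1 holds y.
    static double[] simulate() {
        double[] flat = {0.0, 1.0};          // array initially holds Point(0.0, 1.0)

        // The writer starts storing Point(1.0, 0.0), one field at a time.
        flat[0] = 1.0;

        // The reader happens to observe the array between the two field writes.
        double[] seen = {flat[0], flat[1]};

        // The writer finishes.
        flat[1] = 0.0;
        return seen;
    }

    public static void main(String[] args) {
        double[] seen = simulate();
        // The observed pair was never passed to any Point constructor.
        System.out.println("(" + seen[0] + ", " + seen[1] + ")");
    }
}
```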

To ensure that a particular primitive-typed field is always read from and
written to atomically, the field can be declared `volatile`. But there is no
mechanism for a primitive class to ensure that *all* fields and array components
of its type are considered volatile.

A class with a complex integrity constraint in its constructor may not be a good
candidate to be a primitive class. Instead, it can be declared as a normal value
class.

### Reference types

Primitive values are *monomorphic*—they belong to a single type with a specific
set of fields known at compile time and runtime. Values of different primitive
types can't be mixed.

To participate in the *polymorphic* reference type hierarchy, primitive values
are converted to value objects with a *value object conversion*. This occurs
implicitly when assigning from a primitive type to a reference type. The result
is an instance of the same class, just in a different form.

```
Shape s = p1; // value object conversion
assert s.getClass() == Point.class;
```

When invoking an inherited method of a primitive type, the receiver value
undergoes value object conversion to have the type expected by the method
declaration.

```
Point p = new Point(0.3, 7.2);
// toString is declared by Object
p.toString(); // value object conversion
```

It is sometimes useful to talk about the reference type of a primitive class.
This type is expressed with the class name followed by the `ref` contextual
keyword. A variable with a primitive class reference type stores either a value
object belonging to the named class or `null`. 

```
Point.ref[] prs = new Point.ref[10];
prs[1] = new Point(1.0, 1.0);
prs[4] = new Point(4.0, 4.0);
for (Point.ref pr : prs) {
    if (pr != null)
        System.out.println(pr);
}
```

The `ref` type is useful when `null` is needed or when the runtime
characteristics of reference types are preferred (for example, a large sparse
array might be more efficiently encoded with references).

The relationship between the types `Point` and `Point.ref` is similar to the
traditional relationship between the types `int` and `Integer`. However, `Point`
and `Point.ref` both correspond to the same class declaration; the values of
both types are instances of a single `Point` class. At run time, the conversion
between a primitive value and a value object is more lightweight than
traditional boxing conversion.

Value objects can be converted back to primitive values with a *primitive value
conversion*. `null` cannot be converted to a primitive value, so attempts to
convert it cause an exception.

```
Point p = prs[1]; // primitive value conversion
prs[1] = null;
p = prs[1]; // NullPointerException
```
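
The closest behavior in today's Java is unboxing a `null` wrapper, which also fails with a `NullPointerException`. The sketch below shows that existing behavior as an analogy only. The class and method names are made up for the example.

```java
final class NullUnboxingAnalogy {
    // Returning an Integer as an int forces an unboxing conversion.
    static int unbox(Integer boxed) {
        return boxed;
    }

    // Reports whether unboxing null fails the way the text describes for Point.ref.
    static boolean throwsOnNull() {
        try {
            unbox(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(unbox(7));
        System.out.println(throwsOnNull());
    }
}
```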

When invoking a method overridden by a primitive class, the receiver object
undergoes primitive value conversion to have the type expected by the method
declaration.

```
Shape s = new Point(0.7, 3.2);
// 'contains' is declared by Point
s.contains(Point.default); // primitive value conversion
```

#### Overload resolution and type arguments

Value object conversion and primitive value conversion are allowed in *loose*,
but not *strict*, invocation contexts. This follows the pattern of boxing and
unboxing: a method overload that is applicable without applying the conversions
takes priority over one that requires them.

```
void m(Point p, int i) { ... }
void m(Point.ref pr, Integer i) { ... }

void test(Point.ref pr, Integer i) {
    m(pr, i); // prefers the second declaration
    m(pr, 0); // ambiguous
}
```
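
The same two-phase resolution can be observed today with `int` and `Integer`, which may help build intuition. This is an analogy in current Java, with hypothetical names, rather than a demonstration of the new conversions.

```java
final class OverloadPhases {
    static String m(int p, int i) { return "primitive"; }
    static String m(Integer pr, Integer i) { return "reference"; }

    // Both arguments are already references, so the second overload is
    // applicable in the strict phase and wins without any conversion.
    static String callWithBoxes() {
        Integer pr = 1;
        Integer i = 2;
        return m(pr, i);
    }

    // Both arguments are primitives, so the first overload wins in the strict phase.
    static String callWithPrimitives() {
        return m(1, 2);
    }

    // A mixed call such as m(pr, 0) is rejected by javac as ambiguous:
    // each overload needs one conversion, and neither is more specific.

    public static void main(String[] args) {
        System.out.println(callWithBoxes());
        System.out.println(callWithPrimitives());
    }
}
```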

For now, Java's generics only work with reference types.
[Another JEP][jep-generics] will enhance generics to interoperate with primitive
types.

Thus, provisionally, type arguments must be inferred to be reference types. Type
inference treats value object and primitive value conversions the same as boxing
and unboxing—for example, a primitive value passed where an inferred type is
expected will lead to a reference-typed inference constraint.

```
var list = List.of(new Point(1.0, 5.0));
// infers List<Point.ref>
```
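
Today's inference already behaves this way for the basic primitives, which gives a runnable point of comparison. In the sketch below (names are illustrative), an `int` argument leads to an inferred `Integer` type argument.

```java
import java.util.List;

final class InferenceAnalogy {
    static List<Integer> boxedList() {
        var list = List.of(42);   // the int argument is boxed; infers List<Integer>
        return list;              // compiles only because list is a List<Integer>
    }

    public static void main(String[] args) {
        System.out.println(boxedList());
    }
}
```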

#### Array subtyping

Traditionally, primitive array types are not related to reference array
types—an `int[]` cannot be assigned to an `Object[]` variable.

Arrays of declared primitive types are more flexible: the type `Point[]` is a
subtype of `Point.ref[]`, which is a subtype of `Object[]`.

(Basic primitive array types like `int[]` will also gain this capability with
[JEP 402][jep402].)

When a reference is stored in an array of static type `Object[]`, if the array's
runtime component type is `Point` then the operation will perform both an array
store check (checking that the object is an instance of class `Point`) and a
primitive value conversion (converting the object to a primitive value).

Similarly, reading from an array of static type `Object[]` will cause a
value object conversion if the array stores primitive values.

```
Object replace(Object[] objs, int i, Object val) {
    Object result = objs[i]; // may perform value object conversion
    objs[i] = val; // may perform primitive value conversion
    return result;
}

Point[] ps = new Point[]{ new Point(3.0, -2.1) };
replace(ps, 0, new Point(-2.1, 3.0));
replace(ps, 0, null); // NPE from primitive value conversion
```
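
The array store check mentioned above already exists for reference arrays, so its behavior can be tried in current Java. The sketch below uses a `String[]` viewed as `Object[]`, with hypothetical class and helper names. Note one difference from the proposal: a `String[]` accepts `null`, whereas a `Point[]` as described here would not.

```java
final class ArrayStoreAnalogy {
    static Object replace(Object[] objs, int i, Object val) {
        Object result = objs[i];
        objs[i] = val;            // the array store check happens on this write
        return result;
    }

    // Reports whether storing val into element 0 fails the store check.
    static boolean storeRejected(Object[] objs, Object val) {
        try {
            replace(objs, 0, val);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String[] strs = { "a" };
        System.out.println(replace(strs, 0, "b"));   // compatible store succeeds
        System.out.println(storeRejected(strs, 42)); // Integer into String[] fails
    }
}
```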

### `class` file representation & interpretation

A primitive class is declared in a `class` file using the `ACC_PRIMITIVE`
modifier (`0x0800`). At class load time, an error occurs if a primitive class is
not a value class (via `ACC_VALUE`, `0x0100`). At preparation time, an error
occurs if a primitive class has a primitive type circularity in its instance
fields.

A declared primitive type is represented with a new `Q` descriptor prefix
(`QPoint;`). The class's reference type is represented using the usual `L`
descriptor (`LPoint;`).
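
As a small illustration of the two spellings, the sketch below builds descriptor strings by hand and compares the `L` form with what `java.lang.invoke.MethodType` produces today. The helper methods are hypothetical. No released API emits `Q` descriptors, so that form is produced here purely by string concatenation.

```java
import java.lang.invoke.MethodType;

final class DescriptorSketch {
    // Usual reference-type descriptor, e.g. "LPoint;".
    static String referenceDescriptor(String binaryName) {
        return "L" + binaryName.replace('.', '/') + ";";
    }

    // Proposed primitive-type descriptor, e.g. "QPoint;" (hypothetical helper).
    static String primitiveDescriptor(String binaryName) {
        return "Q" + binaryName.replace('.', '/') + ";";
    }

    public static void main(String[] args) {
        // Existing API: a method taking a String and returning void.
        System.out.println(
                MethodType.methodType(void.class, String.class).toMethodDescriptorString());
        // Hand-built descriptors for a method taking a Point in each form.
        System.out.println("(" + referenceDescriptor("Point") + ")V");
        System.out.println("(" + primitiveDescriptor("Point") + ")V");
    }
}
```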

Primitive values with `Q` types are one-slot stack values, even though they may
represent aggregates of much more than 32 or 64 bits. No particular encoding of
primitive values is mandated.

Verification treats a `Q` type as a subtype of the corresponding `L` type—e.g.,
`QPoint;` is a subtype of `LPoint;`. Conversions from primitive values to value
objects occur implicitly, as needed.

The `this` parameter of a primitive class's instance method has a primitive
type.

Classes mentioned by primitive types in field and method descriptors are loaded
during linkage, before the first access of that field or method.

A `CONSTANT_Class` constant pool entry may refer to a primitive type using a `Q`
descriptor as a "class name". A `CONSTANT_Class` using the plain name of a
primitive class represents the class's reference type.

The `aconst_init` instruction may refer to either a primitive type or a
reference type. This determines whether a primitive value or a value object is
produced.

Similarly, a `CONSTANT_Fieldref` or `CONSTANT_Methodref` may refer to a field or
method as a member of a primitive type or a reference type. In the case of
`withfield`, this determines the result type of the operation.

The `anewarray` and `multianewarray` instructions can be used to create arrays
of declared primitive types. Array subtyping allows these arrays to be viewed as
instances of reference array types.

The `checkcast`, `instanceof`, and `aastore` opcodes support primitive value
types, performing primitive value conversions (including `null` checks) when
necessary.

Primitive classes may be initialized for the same reasons as other classes (for
example, before a static method is invoked). In addition, primitive class
initialization is triggered by the `aconst_init` instruction, by each of the
`anewarray` and `multianewarray` instructions when used with a primitive type,
and (recursively) by initialization of another class that declares a
primitive-typed field mentioning the primitive class.

### Core reflection

Every primitive class has a `java.lang.Class` object representing the class.
For both primitive values and value objects, the `getClass` method of the
class's instances returns this object. A class literal—`Point.class`—can also
be used to express this object.

Tentatively: this `Class` object returns `true` from the `isPrimitive` method,
and `getModifiers` shows its `Modifier.PRIMITIVE` flag set.

For uses that need to model *types*, there is one `Class` object representing
the primitive type, and another representing the reference type. Each of these
has the same behavior as the `Class` object representing the class in most
respects, except for methods to explicitly tell them apart and map from one to
the other.

Tentatively: the `Class` object representing the class doubles as a
representation of the primitive type. A separate `Class` object exists for the
purpose of representing the reference type.
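
For comparison, this is how core reflection distinguishes `int` from `Integer` today. It is only an analogy: those two `Class` objects belong to different classes, whereas the two objects described above would both correspond to the single `Point` class. The method names in the sketch are made up.

```java
final class ReflectionAnalogy {
    // The Class object for int reports itself as primitive.
    static boolean intIsPrimitive() {
        return int.class.isPrimitive();
    }

    // The Class object for the wrapper does not.
    static boolean wrapperIsPrimitive() {
        return Integer.class.isPrimitive();
    }

    // Integer.TYPE is today's way to map from the wrapper to the primitive mirror.
    static boolean typeFieldMapsToPrimitive() {
        return Integer.TYPE == int.class;
    }

    public static void main(String[] args) {
        System.out.println(intIsPrimitive());
        System.out.println(wrapperIsPrimitive());
        System.out.println(typeFieldMapsToPrimitive());
    }
}
```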

### Other APIs

The following APIs also gain new behaviors:

-   `java.lang.constant` encodes `Q` types in `CONSTANT_Class` structures and
    field and method descriptors

-   `java.lang.invoke` recognizes `Q` types and supports `L`-to-`Q` conversions

-   `javax.lang.model` recognizes primitive class declarations

### Performance model

In typical usage, in heap storage and during fully-optimized code execution,
declared primitive types should have a footprint and execution overhead
comparable to the basic primitive types. For example, a `Point`, as declared
above, can be expected to directly occupy 128 bits in local variables,
parameters, fields, and array components. A field access simply extracts the
first or second 64 bits. There are no additional pointers or metadata fields.
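
The arithmetic behind that expectation can be sketched with a `ByteBuffer` standing in for a flattened array of points. `FlatPointArray` is a hypothetical emulation in ordinary Java. It says nothing about how a JVM actually lays out memory, and only shows what "16 bytes per element, no headers" amounts to.

```java
import java.nio.ByteBuffer;

final class FlatPointArray {
    // Two doubles per point: 2 * 8 bytes = 16 bytes = 128 bits.
    static final int POINT_BYTES = 2 * Double.BYTES;

    private final ByteBuffer storage;

    FlatPointArray(int length) {
        // allocate() zero-fills, so every element starts as (0.0, 0.0).
        this.storage = ByteBuffer.allocate(length * POINT_BYTES);
    }

    void set(int index, double x, double y) {
        storage.putDouble(index * POINT_BYTES, x);
        storage.putDouble(index * POINT_BYTES + Double.BYTES, y);
    }

    // Field access extracts the first 64 bits of the element.
    double x(int index) {
        return storage.getDouble(index * POINT_BYTES);
    }

    // Field access extracts the second 64 bits of the element.
    double y(int index) {
        return storage.getDouble(index * POINT_BYTES + Double.BYTES);
    }

    public static void main(String[] args) {
        FlatPointArray ps = new FlatPointArray(3);
        ps.set(1, 3.0, -2.1);
        System.out.println(POINT_BYTES * 8 + " bits per element");
        System.out.println(ps.x(1) + ", " + ps.y(1));
    }
}
```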

Notably, a primitive class with a single instance field can be expected to have
minimal overhead compared to operating on a value of the field's type directly.

However, JVMs are ultimately free to encode primitive values however they see
fit. Some classes may be considered too large to represent inline. Certain
JVM components, in particular those that are less performance-tuned, may prefer
to interact with primitive values as objects. A primitive value might carry with
it a cached value object pointer to reduce the overhead of future conversions.
Other encodings are possible as well.

Value objects that are instances of primitive classes can be expected to behave
much like instances of [other value classes][jep-values].

### HotSpot implementation

This section describes implementation details of this release of the HotSpot
virtual machine, for the information of OpenJDK engineers. These details are
subject to change in future releases and should not be assumed by users of
HotSpot or other JVMs.

Values of `Q` types in HotSpot are encoded as follows:

-   Primitive classes whose field layouts exceed a size threshold are always
    encoded as regular heap objects. Fields marked `volatile` always store
    regular heap objects.

-   Otherwise, primitive values are encoded in fields and arrays as a flattened
    sequence of field values. Array components may be padded to achieve good
    alignment.

-   In the interpreter and C1, primitive values on the stack are represented as
    value objects. Each read of a primitive-typed field or array allocates a
    heap object.

-   In C2, primitive values on the stack are scalarized, effectively encoding
    each field as a separate variable. Methods with Q-typed parameters support
    both a pointer-based entry point (for interpreter and C1 calls) and a
    scalarized entry point (for C2-to-C2 calls). Value objects are also
    scalarized when working with the primitive class's reference type. Heap
    allocations occur where any other supertype is used.

Default values are generally encoded as sequences of zeros, simplifying the task
of field and array creation. However, in cases where a field or array encodes
primitive values as heap pointers, the default value is a non-zero pointer.
(Circularities may require this value to be `null` temporarily, but the `null`
must be hidden from program code.)

Some array types, like `[Ljava/lang/Object;` and `[LPoint;`, allow for both
pointer-based and flattened arrays. Reads and writes for these types dynamically
check a flag and perform the necessary conversions when operating on flattened
arrays.

Alternatives
------------

Making use of the basic primitive types, rather than declaring new primitives,
will often produce a program with equivalent or slightly better performance.
However, this approach gives up the valuable abstractions provided by classes.
It's easy to, say, interpret a `double` with the wrong units, pass an
out-of-range `int` to a library method, or fail to keep two `boolean` flags
together in the right order.

Normal value classes provide many of the benefits of primitive classes, without
the substantial disruptions to the language and JVM type systems. With
additional innovation in JVM implementation techniques and hardware
capabilities, the gap may close further. However, the limitations outlined in
the "Motivation" section are pretty fundamental. For example, a value class type
wrapping a single `long` field and supporting the full range of `long` values
for that field can never be encoded in fewer than 65 bits. Primitive classes
give programmers who need fine-grained control a more reliable performance
model.
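
The 65-bit figure follows from counting states: a `long` field has 2^64 possible values, and a nullable variable must also distinguish `null`, giving 2^64 + 1 states. A small worked check is shown below, with a hypothetical helper `bitsFor` that computes the number of bits needed to tell a given number of states apart.

```java
import java.math.BigInteger;

final class NullableLongBits {
    // Bits needed to distinguish `states` distinct values: ceil(log2(states)).
    static int bitsFor(BigInteger states) {
        return states.subtract(BigInteger.ONE).bitLength();
    }

    public static void main(String[] args) {
        BigInteger longValues = BigInteger.ONE.shiftLeft(64);      // 2^64
        BigInteger withNull = longValues.add(BigInteger.ONE);      // 2^64 + 1

        System.out.println(bitsFor(longValues)); // the bare long
        System.out.println(bitsFor(withNull));   // the long plus a null state
    }
}
```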

We considered many different approaches to boxing and polymorphism before
settling on a model in which primitive values and value objects are two
different representations, with two different types, of the same class
instances. This strategy balances the traditional understanding of primitive
types, with familiar semantics, performance expectations, and conversions to
objects, with the simplicity of a single named class declaration for modeling
data in both the primitive and reference spaces. Strategies in which a primitive
value *is an* object obscure some important differences between the types.
Strategies in which conversions occur between two different class-like entities
introduce distracting complexity.

Risks and Assumptions
---------------------

There are security risks involved in allowing instance creation outside of
constructors, via default instances and non-atomic reads and writes. Developers
will need to understand the implications, and recognize when it would be unsafe
to declare a class `primitive`.

This JEP does not address the interaction of primitive classes with the basic
primitives or generics; these features will be addressed by other JEPs (see
below). But, ultimately, all three JEPs will need to be completed to deliver a
cohesive language design.

Dependencies
------------

This JEP depends on [Value Objects][jep-values], which establishes the semantics
of primitives when treated as objects. Primitive classes are a special case of
value classes.

In support of this JEP, there are separate efforts to improve the JVM
Specification (in particular its treatment of `class` file validation) and the
Java Language Specification (in particular its treatment of types). These
changes address technical debt and facilitate the specification of these new
features.

In [JEP 402][jep402] we propose to update the basic primitive types (`int`,
`boolean`, etc.) to be represented by primitive classes, unifying the two kinds
of primitive types. The existing wrapper classes will be repurposed to represent
the corresponding types' primitive classes.

In another JEP we will propose modifying the generics model in Java to make type
parameters *universal*—instantiable by all types, both reference and primitive.

In the future, JVM class and method specialization ([JEP 218][jep218], with
revisions) will allow generic classes and methods to specialize field, array,
and local variable layouts when parameterized by primitive types.

[jep402]: https://openjdk.java.net/jeps/402
[jep218]: https://openjdk.java.net/jeps/218
[jep-values]: https://openjdk.java.net/jeps/8277163
[jep-generics]: https://openjdk.java.net/jeps/8261529