We don't need no stinkin' Q descriptors
Brian Goetz
brian.goetz at oracle.com
Fri Jun 30 20:52:33 UTC 2023
In case the HTML got mangled by the mailer, I enclose the markdown
original here.
# We don't need no stinkin' Q types
In the last six months, we made a significant breakthrough at the
language/user
level -- to decompose B3 with its value and reference companions, into two
simpler concepts: implicit constructibility (a declaration-site
property) and
null restriction (a use-site property.) The .ref/.val distinction, and
all its
excess complexity, stemmed from the mistaken desire to model the int/Integer
divide directly. By breaking B3-ness down into more "primitive" properties
(some of which are shared with non-B3 classes), we arrived at a simpler
model;
no more ref/val projections, and more uniform treatment of X! (including
for B1
and B2 classes).
As we worked through the language and translation details, we continued
to seek
a lower energy state. We concluded that we can erase `X!` to `LX;` in a
number
of places (locals, method descriptors, verifier type system) while still
meeting
our performance objectives. Doing so eliminates a number of issues with
method
resolution and distinguishing overloads from overrides. In fact, we found
ourselves using Q for fewer and fewer things, at which point we started
to ask
ourselves: do we need Q descriptors at all?
In our VM, there is a (mostly) 1-1-1 correspondence between runtime types,
descriptors, and class mirrors. In a world where QFoo and LFoo are separate
runtime types, it makes sense for them to have their own descriptors and
mirrors. But as `Foo!` and `Foo?` have come together in the language, mapping
to a VM which sees them as separate runtime types starts to show gaps.
The role of Q has historically been one of "other", rather than
something on its
own; any class which had a Q type, also had an L type, and Q was the "other
flavor." The "two flavors" orientation made sense when we were modeling the
int/Integer split; we needed two flavors for that in both language and
VM. The
language has since discovered that we can break down the int/Integer divide
into two
more primitive notions -- implicit constructibility (an int can be used
without
calling a constructor, an Integer cannot) and non-nullity (non-identity plus
default constructibility plus non-nullity unlocks flattening.)
If Q is a valid descriptor and there is always a Q mirror, we are in a
stable
place with respect to runtime types. But if we intend to allow `m(Foo!)` to
override `m(Foo?)`, to be tolerant of bang-mismatches in method
resolution, and
give Q fewer jobs, then we are moving to an unstable place. We've
explored a
number of "only use Q for certain things" positions, and have found many
of them
to be unstable in various ways. The other stable point is that there
are no Q
types, and no Q mirrors -- but then we need some new channel to encode the
request to exclude null, and so give the VM the flattening hint that is
needed.
As it turns out, there are surprisingly few places that truly need such
a new
channel. We basically need the VM to take "Q-ness" into account in three
places:
- Field layout -- a field of type `Foo!` (where Foo is implicitly
constructible) needs a hint that this field is null-restricted, so
we can lay
it out flat.
- Array layout -- at the point of `anewarray` and friends, we need a
hint when
the component type is an implicitly-constructible, null-restricted type.
- Casting -- casts need to be able to express a value-set check for the
restricted value set of `Foo!` as well as the unrestricted value set of
`Foo`.
We are convinced that these three are all that is truly required to get the
flattening we want. So rather than invent new runtime types / mirrors /
descriptors that are going to flow everywhere (into reflection, method
handles,
verification, etc), let's invent the minimal additional classfile
surface and VM
model to model that. At the same time, let's make sure that the new thing
aligns with the new language model, where the star of the show is
null-restricted types.
#### What about species?
In separate investigations, we have had a notion of "species" for a long
time, which
we know we're going to need when we get to specialization. Species form a
partition of a class's instances; every instance of a class belongs to
exactly
one species, and different species may have different layouts and value set
restrictions. And we struggled with species for a long time over the same
runtime type affordances (mirrors and descriptors) -- what does a field
descriptor for a field of type `ArrayList<int>` look like? What does
`getClass`
return?
In both cases, the constraints of compatibility have been pushing us towards
more erasure in descriptors and reflection, with side channels to
reconstruct
information necessary for optimized heap layout, and with separate API
points
for `getClass` vs `getSpecies`. While specialization is considerably more
complicated, nearly all the same considerations (descriptors, mirrors,
reflection) are present for null-restriction types. We took an earlier
swing at
unifying the two under the rubric of "type restrictions", but I think
our model
wasn't quite clean enough at the time to admit this unification. But I
think we
are now (almost) there, and the payoff is big.
What we concluded around species and specialization is that we would have to
continue to erase descriptors (`ArrayList<int>` as a method or field
descriptor
continues to erase to `LArrayList;`), that `getClass` returns the
primary mirror
(`ArrayList`), and that species information is pushed into a side channel.
These are pretty much the exact same considerations as for null-restriction
types.
#### Species and bang types are _refinement types_
A _refinement type_ is a type whose value set is that of another type,
plus a
predicate restricting the value set. A "bang" type `Point!` is a
refinement of
Point, where we eliminate the value `null`. (Other well-known
refinement types
from PL history include C enums and Pascal ranges.) Refinement types
are often
erased to their base type, but some refinements enable better layout.
Indeed,
our interest in Q types is flattening, and for an implicitly constructible
class, a variable holding a null-excluding type can be flattened. Similarly,
for a sufficiently constrained generic type (e.g., `Point[int,int]`),
the layout
of such a variable can be flattened as well.
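As a rough illustration of the definition above, a refinement can be modeled in plain Java as a base class plus a predicate. The `Refinement` class and its methods are hypothetical names invented for this sketch. They are not part of the proposal, and nothing here affects layout.

```java
import java.util.function.Predicate;

// Hypothetical illustration only: a refinement modeled as a base class plus a predicate.
final class Refinement<T> {
    private final Class<T> base;
    private final Predicate<Object> restriction;

    Refinement(Class<T> base, Predicate<Object> restriction) {
        this.base = base;
        this.restriction = restriction;
    }

    // The erased view of the refinement: its base type.
    Class<T> base() {
        return base;
    }

    // Membership in the refined value set: in the base value set AND accepted by the predicate.
    boolean contains(Object o) {
        // null belongs to the value set of every (unrestricted) reference type
        boolean inBase = (o == null) || base.isInstance(o);
        return inBase && restriction.test(o);
    }

    // The "bang" refinement: the base value set with null removed.
    static <T> Refinement<T> nonNull(Class<T> base) {
        return new Refinement<>(base, o -> o != null);
    }
}
```

Under this model, `Refinement.nonNull(String.class)` accepts `"a"` and rejects both `null` and `42`, while `base()` still reports `String.class`.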
What we previously called "type restrictions" in the [Parametric
VM](https://github.com/openjdk/valhalla-docs/blob/main/site/design-notes/parametric-vm/parametric-vm.md#type-restricted-methods-and-fields-and-the-typerestriction-attribute)
document is in fact a refinement type. We claim that we can design the
null-restriction channel in such a way that it can be extended, in some
reasonable way, to support more general specialization.
Both specialization, and null-restriction, are forms of refinement
types. Given
that we've already discovered that we need to erase these to their
primary (L)
type in a lot of places, let's stake out some general principles for
representing refinements in the VM:
- Refinement types are erased to their base type in method and field
descriptors.
- Refinement types do not have _class_ mirrors.
- `Object::getClass` returns a class mirror.
- Reflection deals in class mirrors, so refinements are erased from base
reflection.
- Method handles deal in class mirrors, so refinements are erased from
method
handles.
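Java's existing generics already behave this way, which may make the principles above feel less exotic. The sketch below is an analogy using today's APIs, not the proposed mechanism: the parameterization is erased from the class mirror and from the field's erased type, and survives only in a side channel (the `Signature` attribute) that reflection exposes through a separate API point. The class name `ErasureToday` is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureToday {
    // The declared type is List<String>, but the field descriptor is erased to List.
    static List<String> names = new ArrayList<>();

    // Both lists report the same class mirror, regardless of type arguments.
    static boolean sameMirror() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass();
    }

    // The erased view, as seen through the ordinary reflective channel.
    static String erasedFieldType() {
        try {
            return ErasureToday.class.getDeclaredField("names").getType().getName();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // The side channel: generic type information recovered from the Signature attribute.
    static String genericFieldType() {
        try {
            return ErasureToday.class.getDeclaredField("names").getGenericType().getTypeName();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Most reflective code only ever calls `getType`; code that needs the richer information asks for it explicitly, which is the shape the refinement channel is aiming for.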
That's a lot of erasure, so we have to bake refinement back in where it
matters,
but we want to be careful to limit the "blast radius" of the refinement
information to where it does actually matter. The new channel that
encodes a
refinement type will appear only when needed to carry out the tasks listed
above: field declaration, array creation, and casting.
- Fields are enhanced with some sort of "refinement" attribute, which (a)
guards against stores of bad values (the field equivalent of
`ArrayStoreException`) and (b) enables flatter layouts when the
refinement
permits.
- Array creation (`anewarray` / `multianewarray`) is enhanced to support
creating arrays with refined component types, enabling the same benefits
(storage safety / layout flattening.)
- Casting is enhanced to support refinements. This is needed mostly
because of
erasure -- we are erasing away refinement information and sometimes
need to
reassert it.
- When we get to specialization, `new` is enhanced to support
refinements, and
possibly method declarations (to enable calling convention
optimization in
the presence of highly specialized types like `Point[int,int]`.)
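The store guard mentioned in the first bullet has a precedent in today's arrays, where the static type is broader than what the storage accepts. The sketch below shows that existing behavior, followed by a hand-written stand-in for what a null-restricted field might enforce. `StoreCheckToday` and `Holder` are hypothetical names, and the explicit `requireNonNull` only imitates a check that, in the proposal, the VM would perform on `putfield`.

```java
import java.util.Objects;

class StoreCheckToday {
    // Returns the simple name of the exception thrown by a bad array store, or "none".
    static String badStore() {
        Object[] slots = new String[1];         // static type Object[], runtime component type String
        try {
            slots[0] = Integer.valueOf(42);     // compiles, but fails the runtime store check
            return "none";
        } catch (ArrayStoreException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Assumption: a manual imitation of a null-restricted field; no flattening happens here.
    static final class Holder {
        private String value = "";

        void set(String v) {
            this.value = Objects.requireNonNull(v, "null-restricted field");
        }

        String get() {
            return value;
        }
    }
}
```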
We had previously been assuming that `[QPoint` is somehow more of a
"real" type
than (specialized) `Point[int,int]`, but I think we are better served seeing
them both as refinements, where we continue to report a broad type but
sort-of-secretly use refinement information to optimize layout.
## A strawman
What follows is a strawman that eliminates Qs completely, replacing the
few jobs
Q has (field layout, array layout, and casts) with a single mechanism for
refinement types which stays in the background until explicitly summoned. We
believe the model outlined here can extend cleanly to species, as well as to `B1!`
types like `String!`. Call this No-Q world. This should not be
taken
as a concrete proposal, as much as a sketch of the concepts and the
players.
We have come to believe that adding Q descriptors to the JVM specification,
while perhaps the right move in a from-scratch VM design, would be
overreach as
an evolutionary step. For old APIs to adopt new descriptors will
require many
bridge methods with complex properties. To avoid such bridges, old APIs
would
be forbidden from mentioning the new types. For these reasons, new
descriptors,
and the mirrors that would accompany them, are quite literally a bridge
too far.
Accordingly, in No-Q world, descriptors reclaim their former role:
describing
primitives and classes. Field and method descriptors will use `L`
descriptors,
even when carrying a null-restricted value (or a species.) Similarly, class
mirrors return to their former role: describing classfiles and non-refined
VM-derived types (such as array types.)
As a self-imposed rule of this essay, we will not appeal to runtime support,
condy or indy. Everything will be done with bytecodes, descriptors, constant
pool entries, and other classfile structures, and not via specially-known
methods. As this is a strawman, we may indulge in some "wasteful"
design, which
can be transformed or lumped in later iterations. The new elements of the
design are:
- A new reflective concept for `RefinementType`, which represents a
refinement
of an existing (class) type.
- A new reflective concept for `RepresentableType`, which is the common
supertype between `Class` and `RefinementType`.
- New constant pool forms representing null-restriction of classes and of
arrays.
- A new field attribute called `FieldRefinement`.
- Adjustments to various bytecodes to interact with the new constant pool
forms.
- Additions to reflective APIs.
## Refined types
A refined type is a combination of a type (called the base type) and a
value set
restriction for that type which excludes some values in the value set of the
base type. Null-restricted types, arrays of null-restricted types, and
eventually, species of generics are refined types.
Refined types can be represented by a reflective object
```
sealed interface RefinementType<T> extends RepresentableType<T> {
    RepresentableType<T> baseType();
}
```
The type parameter `T` represents the base type.
There are initially two implementations of `RefinementType`, which may
be private,
and are known to the VM:
```
private record NullRestrictedClass<T>(Class<T> baseType)
implements RefinementType<T> { }
private record NullRestrictedArray<T extends Object[]>(Class<T> baseType)
implements RefinementType<T> { }
```
#### Constant pool entries
The two jobs for null restriction must be representable in the constant
pool: a
null-restricted B3, and an array of a null-restricted B3. (These
correspond to
`CONSTANT_Class_info` with a descriptor of `QFoo;` and `[QFoo;` in the
traditional design.) In addition to being referenced by bytecodes and
attributes, such constants should ideally be loadable, evaluating to a
`RefinementType` or `RepresentableType`.
The exact form of the constant pool entry (whether new bespoke constant pool
entries, ad-hoc extensions to `CONSTANT_Class_info`, or condy) can be
bikeshod at
the appropriate time; there are clearly tradeoffs here.
Initially, null-restricted types must be implicitly constructible (B3),
which
would be checked when the constant is resolved. Eventually, we can relax
null-restriction to support all class types. Similarly, we may initially
restrict to one-dimensional flat arrays, and leave `multianewarray` to
its old
job.
#### Representable types
The new common superinterface between `Class` and `RefinementType`
exists so that
both classes and class refinements can be used as array components, type
parameters for specializations, etc. Some operations from `Class`, such as
casting, may be pulled up into this interface.
```
sealed interface RepresentableType<T> {
T cast(Object o) throws ClassCastException;
...
}
```
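To get a feel for how these interfaces fit together, here is a sketch that compiles on a current JDK (17 or later, for sealed types and records). Since `java.lang.Class` cannot be retrofitted from user code, a hypothetical `ClassType` wrapper stands in for it. Throwing `ClassCastException` when a null is cast to the refined type is an assumption of this sketch; the note above does not fix the exception type.

```java
// Hypothetical stand-ins for the strawman's reflective types; none of this is a real JDK API.
sealed interface RepresentableType<T> permits ClassType, RefinementType {
    T cast(Object o) throws ClassCastException;
}

// Wrapper playing the role that Class<T> itself would play in the strawman.
record ClassType<T>(Class<T> mirror) implements RepresentableType<T> {
    public T cast(Object o) {
        return mirror.cast(o);   // null passes, exactly as with Class::cast today
    }
}

sealed interface RefinementType<T> extends RepresentableType<T> permits NullRestrictedClass {
    RepresentableType<T> baseType();
}

record NullRestrictedClass<T>(ClassType<T> baseType) implements RefinementType<T> {
    public T cast(Object o) {
        if (o == null) {
            // Assumption: a failed value-set check surfaces as ClassCastException.
            throw new ClassCastException(
                "null is outside the value set of " + baseType.mirror().getName() + "!");
        }
        return baseType.cast(o);
    }
}
```

With these definitions, `new ClassType<>(String.class).cast(null)` returns `null`, while the same cast through `new NullRestrictedClass<>(...)` is rejected.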
#### Refined fields
Any field whose type is a null-restricted implicitly constructible class
may be
considered by the VM as a candidate for flattening. Rather than using
`field_info.descriptor_index` to encode a null-restricted type, we
continue to
erase to the traditional `L` descriptor, but add a `FieldRefinement`
attribute
on the field. Similarly, `CONSTANT_Fieldref_info` continues to link fields
using the `L` descriptor.
```
FieldRefinement {
u2 name_index; // "FieldRefinement"
u4 length;
u2 refinement_index; // symbolic reference to a RefinementType
}
```
The symbolic reference must be to a null-restricted, implicitly
constructible
class type, not an array type. We may relax this restriction later.
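For concreteness, the strawman layout could be read from classfile bytes roughly as follows. `FieldRefinementAttr` is a hypothetical name, not a real classfile API. The sketch assumes the usual attribute convention that `length` counts only the bytes after the six-byte header, so it must be 2 here.

```java
import java.nio.ByteBuffer;

// Hypothetical reader for the strawman FieldRefinement layout.
record FieldRefinementAttr(int nameIndex, long length, int refinementIndex) {

    // Reads one attribute from the buffer's current position (big-endian, as in classfiles).
    static FieldRefinementAttr parse(ByteBuffer in) {
        int nameIndex = Short.toUnsignedInt(in.getShort());         // u2 name_index
        long length = Integer.toUnsignedLong(in.getInt());          // u4 length
        if (length != 2) {
            // Assumption: the body is exactly one u2 constant pool index.
            throw new IllegalArgumentException(
                "FieldRefinement body must be 2 bytes, got " + length);
        }
        int refinementIndex = Short.toUnsignedInt(in.getShort());   // u2 refinement_index
        return new FieldRefinementAttr(nameIndex, length, refinementIndex);
    }
}
```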
Additionally, a field refinement may affect the behavior of `putfield`.
For a
null-restricted class, attempts to `putfield` a null will result in
`NullPointerException` (or perhaps a more general `FieldStoreException`.)
Looking ahead, for the null-restriction of a B1 or B2 class, there is no
change
to the layout but we could enforce the storage restriction on
`putfield.` When
we get to species, the refinement for a species may affect the layout, and
attempting to store a value of the wrong species may result in an
exception or
in an automatic conversion.
It is a free choice as to whether we want to translate a field of type
`Point![]` using an array refinement or fully erase it to `Point[]`.
#### Refined casts
The operand of a `checkcast` or `instanceof` may be a symbolic reference
to a
class or refinement. (Since `instanceof` is null-hostile, changing
`instanceof`
is not necessary now, but when we get to species, we will need to be able to
test for species membership.) The `cast` operation may be pulled up from
`Class` to `RepresentableType` so that casts can be done reflectively with
either a `Class` or a refinement.
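The asymmetry between the two bytecodes is visible in today's Java, as the sketch below shows (the class name `NullAndCasts` is made up). `instanceof` already excludes null, so it effectively tests the restricted value set, while `checkcast` and `Class::cast` let null through, which is why a refined cast is needed to reassert null-restriction.

```java
class NullAndCasts {
    // instanceof never matches null, whatever the target type.
    static boolean instanceofAcceptsNull() {
        Object o = null;
        return o instanceof String;
    }

    // Both the checkcast bytecode and Class::cast succeed on null.
    static boolean castAcceptsNull() {
        Object o = null;
        String viaBytecode = (String) o;
        String viaReflection = String.class.cast(o);
        return viaBytecode == null && viaReflection == null;
    }
}
```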
#### Refined array creation
An `anewarray` may make a symbolic reference to a class refinement type,
as well
as to a class, array, or interface type.
For a refined array, `a.getClass()` continues to return the primary
mirror for
the array type, and `Class::getComponentType` on that array continues to
return
the primary mirror for the component type, but we may provide an
additional API
point akin to `getComponentType` that returns a `RepresentableType`
which may be
a `RefinementType`.
Arrays of null-restricted values can be created reflectively; the existing
`Array::newInstance` method will get an overload that takes
`RepresentableType`.
`Arrays::copyOf` when presented with a refined array type will create a
refined
array.
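For reference, the existing API points that this section proposes to extend behave as follows today. This sketch uses only current JDK methods; the `RepresentableType` overloads do not exist yet, and `ArrayMirrorsToday` is a made-up class name.

```java
import java.lang.reflect.Array;
import java.util.Arrays;

class ArrayMirrorsToday {
    // Reflective creation with the existing Class-taking method, then the component mirror.
    static Class<?> componentOfReflectiveArray() {
        Object arr = Array.newInstance(String.class, 3);
        return arr.getClass().getComponentType();
    }

    // Arrays.copyOf produces an array of the same runtime class as its argument.
    static Class<?> classAfterCopy() {
        Object[] source = new String[] {"a", "b"};   // static type Object[], runtime type String[]
        Object[] copy = Arrays.copyOf(source, 4);
        return copy.getClass();
    }
}
```

`Arrays.copyOf` already consults the runtime type of the source array rather than its static type; in No-Q world it would additionally have to consult the refined component type.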
#### Refinement information stays in the background until summoned
The place where we need discipline is avoiding the temptation of "but
someone
might profitably use the information that this field holds a flat
array." Yes,
they might -- but supporting that as a general-purpose runtime type (with
descriptor and mirror) has costs.
The model proposed here resists the temptation to redefine mirrors,
descriptors,
symbolic resolution, and reflection, instead leaning on erasure here for
both
null-restriction and specialization, and providing a secondary reflective
channel (which almost no users will actually need) to get refinement
information. (An example of code that needs to summon refinement
information is
`Arrays::copyOf`, which would need to fetch the refined component type and
instantiate an array using the refined type; most other reflective code
would
not need to even be aware of it.)
#### Bonus round: specialization
The framework so far seems to accommodate specialization fairly well.
There'll
be a new subtype of `RefinementType` to represent a specialization, a
reflective
method for creating such a specialization, such as:
    static <T> SpecializedType<T> specialization(Class<T> baseClass,
                                                 RepresentableType<?>... arguments)
and a new way to get such a type refinement in the constant pool
(possibly just
a condy whose bootstrap is the above method.) The `new` bytecode is
extended to
accept a specialization refinement. Field refinements would then be able to
refer to specialization refinements.
## Conclusions
In the current world we have a (mostly) 1:1:1 relationship between runtime
types, descriptors, and mirrors; a model where species/refinements are
not full
runtime types preserves this. The surface area where refinement information
leaks to users who are not prepared for it is dramatically smaller.
Refinements
are not full runtime types, and they don't have full `Class` mirrors. We
erase down
to real runtime types in descriptors and in reflective API points like
`Object::getClass`. This seems a powerful simplification, and one that
aligns
with the previous language simplification. To summarize:
- Yes, we should get rid of Q descriptors, but should do so in a more
principled way by getting rid of Q as a runtime type entirely,
replacing it
with a refinement type which stays in the background until it is
actually
needed.
- We should erase Q from method and field descriptors and from the obvious
mirrors, because refinement information is on a need-to-know basis.
- Refinement information primarily flows from source -> classfile ->
VM, and
mostly does not flow in the other direction. Specialized reflection
might
expose it, but we should do so not on general principles, but based
on where
it is actually needed by the programming model.
- Null restriction is more like specialization than not; they are both
value
set refinements that possibly enable layout optimization, and we
should seek
to treat them the same.
- While leaving the door open for additional kinds of species and type
migration, we use our new powers, at first, only to define
flattenable fields
and flattenable one-dimensional arrays.