We don't need no stinkin' Q descriptors

Fri Jun 30 20:52:33 UTC 2023

In case the HTML got mangled by the mailer, I enclose the markdown original here.

# We don't need no stinkin' Q types

In the last six months, we made a significant breakthrough at the language/user
level -- decomposing B3, with its value and reference companions, into two
simpler concepts: implicit constructibility (a declaration-site property) and
null restriction (a use-site property).  The .ref/.val distinction, and all its
excess complexity, stemmed from the mistaken desire to model the int/Integer
divide directly.  By breaking B3-ness down into more "primitive" properties
(some of which are shared with non-B3 classes), we arrived at a simpler model:
no more ref/val projections, and more uniform treatment of X! (including for B1
and B2 classes).

As we worked through the language and translation details, we continued to seek
a lower energy state.  We concluded that we can erase `X!` to `LX;` in a number
of places (locals, method descriptors, verifier type system) while still meeting
our performance objectives.  Doing so eliminates a number of issues with method
resolution and with distinguishing overloads from overrides.  In fact, we found
ourselves using Q for fewer and fewer things, at which point we started to ask
ourselves: do we need Q descriptors at all?

In our VM, there is a (mostly) 1-1-1 correspondence between runtime types,
descriptors, and class mirrors.  In a world where QFoo and LFoo are separate
runtime types, it makes sense for them to have their own descriptors and
mirrors.  But as `Foo!` and `Foo?` have come together in the language, mapping
them to a VM which sees them as separate runtime types starts to show gaps.

The role of Q has historically been one of "other", rather than something in its
own right; any class which had a Q type also had an L type, and Q was the "other
flavor."  The "two flavors" orientation made sense when we were modeling the
int/Integer split; we needed two flavors for that in both language and VM.  The
language has since discovered that we can break down the int/Integer divide into
two more primitive notions -- implicit constructibility (an int can be used
without calling a constructor, an Integer cannot) and non-nullity (non-identity
plus implicit constructibility plus non-nullity unlocks flattening).

If Q is a valid descriptor and there is always a Q mirror, we are in a stable
place with respect to runtime types.  But if we intend to allow `m(Foo!)` to
override `m(Foo?)`, to be tolerant of bang-mismatches in method resolution, and
to give Q fewer jobs, then we are moving to an unstable place.  We've explored a
number of "only use Q for certain things" positions, and have found many of them
to be unstable in various ways.  The other stable point is that there are no Q
types and no Q mirrors -- but then we need some new channel to encode the
request to exclude null, and thereby give the VM the flattening hint it needs.

As it turns out, there are surprisingly few places that truly need such a new
channel.  We basically need the VM to take "Q-ness" into account in three
places:

  - Field layout -- a field of type `Foo!` (where Foo is implicitly
    constructible) needs a hint that this field is null-restricted, so we can
    lay it out flat.
  - Array layout -- at the point of `anewarray` and friends, we need a hint when
    the component type is an implicitly-constructible, null-restricted type.
  - Casting -- casts need to be able to express a value-set check for the
    restricted value set of `Foo!` as well as the unrestricted value set of
    `Foo`.

We are convinced that these three are all that is truly required to get the
flattening we want.  So rather than invent new runtime types / mirrors /
descriptors that are going to flow everywhere (into reflection, method handles,
verification, etc.), let's invent the minimal additional classfile surface and
VM model needed to express just these three things.  At the same time, let's
make sure that the new thing aligns with the new language model, where the star
of the show is null-restricted types.

#### What about species?

In separate investigations, we have had a notion of "species" for a long time,
which we know we're going to need when we get to specialization.  Species form a
partition of a class's instances; every instance of a class belongs to exactly
one species, and different species may have different layouts and value set
restrictions.  And we struggled with species for a long time over the same
runtime type affordances (mirrors and descriptors) -- what does a field
descriptor for a field of type `ArrayList<int>` look like?  What does `getClass`
return?

In both cases, the constraints of compatibility have been pushing us towards
more erasure in descriptors and reflection, with side channels to reconstruct
information necessary for optimized heap layout, and with separate API points
for `getClass` vs `getSpecies`.  While specialization is considerably more
complicated, nearly all the same considerations (descriptors, mirrors,
reflection) are present for null-restricted types.  We took an earlier swing at
unifying the two under the rubric of "type restrictions", but I think our model
wasn't quite clean enough at the time to admit this unification.  I think we
are now (almost) there, and the payoff is big.

What we concluded around species and specialization is that we would have to
continue to erase descriptors (`ArrayList<int>` as a method or field descriptor
continues to erase to `LArrayList;`), that `getClass` returns the primary mirror
(`ArrayList`), and that species information is pushed into a side channel.
These are pretty much the exact same considerations as for null-restricted
types.
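Today's erased generics already behave this way, which gives a concrete feel for what "`getClass` returns the primary mirror" means.  The snippet below uses only existing JDK APIs; `ArrayList<String>` and `ArrayList<Integer>` stand in for future species such as `ArrayList<int>`, as an analogy rather than a claim about how specialization will be implemented.

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    public static void main(String[] args) throws NoSuchMethodException {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();

        // Both parameterizations share a single class mirror: the type
        // argument is erased, and getClass() reports the primary class.
        System.out.println(strings.getClass() == ints.getClass()); // true
        System.out.println(strings.getClass().getName());          // java.util.ArrayList

        // Member descriptors are erased too: ArrayList.add(E) is looked up
        // (and described in the classfile) in terms of Object.
        System.out.println(ArrayList.class.getMethod("add", Object.class));
    }
}
```

The side channel the strawman proposes would sit next to this erased view, rather than replacing it.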

#### Species and bang types are _refinement types_

A _refinement type_ is a type whose value set is that of another type, plus a
predicate restricting the value set.  A "bang" type `Point!` is a refinement of
`Point`, where we eliminate the value `null`.  (Other well-known refinement
types from PL history include C enums and Pascal ranges.)  Refinement types are
often erased to their base type, but some refinements enable better layout.
Indeed, our interest in Q types is flattening, and for an implicitly
constructible class, a variable holding a null-excluding type can be flattened.
Similarly, for a sufficiently constrained generic type (e.g.,
`Point[int,int]`), the layout of such a variable can be flattened as well.
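One way to make the definition concrete is to model a refinement as a base class plus a predicate over the base type's value set.  The `Refinement` record below is hypothetical and purely illustrative: it is not how the VM would represent refinements, and the Pascal-style range `1..10` is approximated with boxed `Integer`.

```java
import java.util.function.Predicate;

// Hypothetical model: a refinement type is a base type plus a predicate
// that restricts the base type's value set.
record Refinement<T>(Class<T> base, Predicate<? super T> predicate) {

    // A value is in the refinement if it is in the base type's value set
    // (null included, as for any reference type) and passes the predicate.
    boolean contains(Object o) {
        if (o != null && !base.isInstance(o)) {
            return false;
        }
        return predicate.test(base.cast(o));
    }

    // Point! as a refinement of Point: the predicate excludes null.
    static <T> Refinement<T> nullRestricted(Class<T> base) {
        return new Refinement<>(base, x -> x != null);
    }
}

public class RefinementDemo {
    record Point(int x, int y) { }

    public static void main(String[] args) {
        Refinement<Point> pointBang = Refinement.nullRestricted(Point.class);
        System.out.println(pointBang.contains(new Point(1, 2))); // true
        System.out.println(pointBang.contains(null));            // false
        System.out.println(pointBang.contains("not a point"));   // false

        // A Pascal-style range 1..10 as a refinement of Integer.
        Refinement<Integer> oneToTen =
            new Refinement<>(Integer.class, i -> i != null && i >= 1 && i <= 10);
        System.out.println(oneToTen.contains(7));  // true
        System.out.println(oneToTen.contains(11)); // false
    }
}
```

Note that the base type alone (`Point`) would accept `null`; only the predicate distinguishes `Point!`, which is exactly the information that erasure drops.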

What we previously called "type restrictions" in the [Parametric
VM](https://github.com/openjdk/valhalla-docs/blob/main/site/design-notes/parametric-vm/parametric-vm.md#type-restricted-methods-and-fields-and-the-typerestriction-attribute)
document is in fact a refinement type.  We claim that we can design the
null-restriction channel in such a way that it can be extended, in some
reasonable way, to support more general specialization.

Both specialization and null-restriction are forms of refinement types.  Given
that we've already discovered that we need to erase these to their primary (L)
type in a lot of places, let's stake out some general principles for
representing refinements in the VM:

  - Refinement types are erased to their base type in method and field
    descriptors.
  - Refinement types do not have _class_ mirrors.
  - `Object::getClass` returns a class mirror.
  - Reflection deals in class mirrors, so refinements are erased from base
    reflection.
  - Method handles deal in class mirrors, so refinements are erased from method
    handles.
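Method handles illustrate the last two principles concretely: a `MethodType` is built from class mirrors only and maps one-to-one onto an `L` descriptor, so under these principles a `Point!` parameter would simply show up as `Point`.  The snippet uses only existing `java.lang.invoke` APIs; nothing about refinements is modeled.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class MethodTypeDemo {
    public static void main(String[] args) throws Throwable {
        // A method type is built purely from class mirrors...
        MethodType mt = MethodType.methodType(int.class, String.class);
        // ...and corresponds one-to-one with a classfile method descriptor.
        System.out.println(mt.toMethodDescriptorString()); // (Ljava/lang/String;)I

        // The handle for String::length has type (String)int, expressed
        // entirely in class mirrors; there is no slot for refinements.
        MethodHandle length = MethodHandles.lookup()
            .findVirtual(String.class, "length", MethodType.methodType(int.class));
        System.out.println(length.type());                     // (String)int
        System.out.println((int) length.invokeExact("hello")); // 5
    }
}
```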

That's a lot of erasure, so we have to bake refinement back in where it
matters, but we want to be careful to limit the "blast radius" of the
refinement information to where it actually matters.  The new channel that
encodes a refinement type will appear only when needed to carry out the tasks
listed above: field declaration, array creation, and casting.

  - Fields are enhanced with some sort of "refinement" attribute, which (a)
    guards against stores of bad values (the field equivalent of
    `ArrayStoreException`) and (b) enables flatter layouts when the refinement
    permits.
  - Array creation (`anewarray` / `multianewarray`) is enhanced to support
    creating arrays with refined component types, enabling the same benefits
    (storage safety / layout flattening).
  - Casting is enhanced to support refinements.  This is needed mostly because
    of erasure -- we are erasing away refinement information and sometimes need
    to reassert it.
  - When we get to specialization, `new` is enhanced to support refinements, and
    possibly method declarations (to enable calling convention optimization in
    the presence of highly specialized types like `Point[int,int]`).
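The "field equivalent of `ArrayStoreException`" refers to a check that arrays already perform today.  The standard-Java snippet below shows that existing store check; a refined field would perform an analogous check on `putfield` (for a null-restricted field, rejecting `null`), with the exact exception left open by the strawman.  The helper `tryStore` is just for illustration.

```java
public class StoreGuardDemo {
    // Attempts a store through an Object[] view; returns false if the
    // array's runtime store check rejects the value.
    static boolean tryStore(Object[] array, Object value) {
        try {
            array[0] = value;
            return true;
        } catch (ArrayStoreException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A String[] seen through its Object[] supertype: the static type
        // allows any Object, but the array knows its component type.
        Object[] strings = new String[1];
        System.out.println(tryStore(strings, "fine"));              // true
        System.out.println(tryStore(strings, Integer.valueOf(42))); // false
    }
}
```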

We had previously been assuming that `[QPoint;` is somehow more of a "real" type
than (specialized) `Point[int,int]`, but I think we are better served by seeing
them both as refinements, where we continue to report a broad type but
sort-of-secretly use refinement information to optimize layout.

## A strawman

What follows is a strawman that eliminates Qs completely, replacing the few jobs
Q has (field layout, array layout, and casts) with a single mechanism for
refinement types which stays in the background until explicitly summoned.  We
believe the model outlined here can extend cleanly to species, as well as to
`B1!` types like `String!`.  Call this No-Q world.  This should not be taken as
a concrete proposal so much as a sketch of the concepts and the players.

We have come to believe that adding Q descriptors to the JVM specification,
while perhaps the right move in a from-scratch VM design, would be overreach as
an evolutionary step.  For old APIs to adopt new descriptors would require many
bridge methods with complex properties.  To avoid such bridges, old APIs would
be forbidden from mentioning the new types.  For these reasons, new descriptors,
and the mirrors that would accompany them, are quite literally a bridge too far.
Accordingly, in No-Q world, descriptors reclaim their former role: describing
primitives and classes.  Field and method descriptors will use `L` descriptors,
even when carrying a null-restricted value (or a species).  Similarly, class
mirrors return to their former role: describing classfiles and non-refined
VM-derived types (such as array types).

As a self-imposed rule of this essay, we will not appeal to runtime support,
condy, or indy.  Everything will be done with bytecodes, descriptors, constant
pool entries, and other classfile structures, and not via specially-known
methods.  As this is a strawman, we may indulge in some "wasteful" design, which
can be transformed or consolidated in later iterations.  The new elements of the
design are:

  - A new reflective concept for `RefinementType`, which represents a refinement
    of an existing (class) type.
  - A new reflective concept for `RepresentableType`, which is the common
    supertype between `Class` and `RefinementType`.
  - New constant pool forms representing null-restriction of classes and of
    arrays.
  - A new field attribute called `FieldRefinement`.
  - Adjustments to various bytecodes to interact with the new constant pool
    forms.
  - Additions to reflective APIs.

## Refined types

A refined type is a combination of a type (called the base type) and a value set
restriction for that type which excludes some values in the value set of the
base type.  Null-restricted types, arrays of null-restricted types, and
eventually, species of generics are refined types.

Refined types can be represented by a reflective object:

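```
// An interface extends (rather than implements) its superinterfaces.
sealed interface RefinementType<T> extends RepresentableType<T> {
    // The type being refined, e.g. Point for Point!
    RepresentableType<T> baseType();
}
```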

The type parameter `T` represents the base type.

There are initially two implementations of `RefinementType`, which may be
private, and are known to the VM:

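```
// The record accessor Class<T> baseType() satisfies RefinementType::baseType
// only because Class<T> itself implements RepresentableType<T>, as proposed.

// Represents Point! (base type Point)
private record NullRestrictedClass<T>(Class<T> baseType)
         implements RefinementType<T> { }

// Represents Point![] (base type Point[])
private record NullRestrictedArray<T extends Object[]>(Class<T> baseType)
         implements RefinementType<T> { }
```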

#### Constant pool entries

The two jobs for null restriction must be representable in the constant pool: a
null-restricted B3, and an array of a null-restricted B3.  (These correspond to
`Constant_Class_info` with a descriptor of `QFoo;` and `[QFoo;` in the
traditional design.)  In addition to being referenced by bytecodes and
attributes, such constants should ideally be loadable, evaluating to a
`RefinementType` or `RepresentableType`.

The exact form of the constant pool entry (whether new bespoke constant pool
entries, ad-hoc extensions to `Constant_Class_info`, or condy) can be
bikeshedded at the appropriate time; there are clearly tradeoffs here.

Initially, null-restricted types must be implicitly constructible (B3), which
would be checked when the constant is resolved.  Eventually, we can relax
null-restriction to support all class types.  Similarly, we may initially
restrict to one-dimensional flat arrays, and leave `multianewarray` to its old
job.

#### Representable types

The new common superinterface between `Class` and `RefinementType` exists so
that both classes and class refinements can be used as array components, type
parameters for specializations, etc.  Some operations from `Class`, such as
casting, may be pulled up into this interface.

```
sealed interface RepresentableType<T> {
     T cast(Object o) throws ClassCastException;
     ...
}
```
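To see how these pieces fit together, here is a self-contained model of the hierarchy with `cast` pulled up.  Since user code cannot make `java.lang.Class` implement a new interface, the hypothetical `ClassType` record wraps a `Class` mirror as a stand-in; throwing `NullPointerException` for a null-restricted cast is also an assumption, since the strawman leaves the exact failure open.

```java
import java.util.Objects;

// Hypothetical, self-contained model of the strawman's reflective types.
sealed interface RepresentableType<T> permits ClassType, RefinementType {
    T cast(Object o) throws ClassCastException;
}

// Stand-in for Class<T>: admits null, as Class::cast does today.
record ClassType<T>(Class<T> mirror) implements RepresentableType<T> {
    public T cast(Object o) {
        return mirror.cast(o);
    }
}

sealed interface RefinementType<T> extends RepresentableType<T>
        permits NullRestrictedClass {
    RepresentableType<T> baseType();
}

// Foo! as a refinement of Foo: the cast additionally rejects null
// (NullPointerException is an assumption of this sketch).
record NullRestrictedClass<T>(ClassType<T> baseType) implements RefinementType<T> {
    public T cast(Object o) {
        Objects.requireNonNull(o, "null is not in the value set of a null-restricted type");
        return baseType.cast(o);
    }
}

public class RepresentableDemo {
    public static void main(String[] args) {
        ClassType<String> str = new ClassType<>(String.class);
        RepresentableType<String> strBang = new NullRestrictedClass<>(str);

        System.out.println(str.cast(null));     // null: the base type admits null
        System.out.println(strBang.cast("ok")); // ok
        try {
            strBang.cast(null);
        } catch (NullPointerException e) {
            System.out.println("rejected null");
        }
    }
}
```

Code that only knows about `RepresentableType` can cast through either kind of type without caring whether it holds a plain class or a refinement.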

#### Refined fields

Any field whose type is a null-restricted implicitly constructible class may be
considered by the VM as a candidate for flattening.  Rather than using
`field_info.descriptor_index` to encode a null-restricted type, we continue to
erase to the traditional `L` descriptor, but add a `FieldRefinement` attribute
on the field.  Similarly, `Constant_FieldRef_info` continues to link fields
using the `L` descriptor.

```
FieldRefinement {
     u2 name_index;        // "FieldRefinement"
     u4 length;
     u2 refinement_index;  // symbolic reference to a RefinementType
}
```

The symbolic reference must be to a null-restricted, implicitly constructible
class type, not an array type.  We may relax this restriction later.
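To make the attribute layout concrete, here is a sketch that writes and reads the `FieldRefinement` attribute exactly as drawn above, using big-endian byte order as class files do.  `FieldRefinementCodec` and `FieldRefinementAttr` are hypothetical names, the constant pool indices are made-up values, and nothing here is an existing classfile API.

```java
import java.nio.ByteBuffer;

public class FieldRefinementCodec {
    // Hypothetical in-memory form of the strawman FieldRefinement attribute.
    record FieldRefinementAttr(int nameIndex, int refinementIndex) { }

    // attribute length counts only the bytes after the 6-byte header
    // (u2 name_index + u4 length), i.e. the single u2 refinement_index.
    static final int BODY_LENGTH = 2;

    static byte[] write(FieldRefinementAttr attr) {
        ByteBuffer buf = ByteBuffer.allocate(6 + BODY_LENGTH); // big-endian by default
        buf.putShort((short) attr.nameIndex());       // u2 name_index -> "FieldRefinement"
        buf.putInt(BODY_LENGTH);                      // u4 length
        buf.putShort((short) attr.refinementIndex()); // u2 refinement_index
        return buf.array();
    }

    static FieldRefinementAttr read(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int nameIndex = Short.toUnsignedInt(buf.getShort());
        int length = buf.getInt();
        if (length != BODY_LENGTH) {
            throw new IllegalArgumentException("bad FieldRefinement length: " + length);
        }
        int refinementIndex = Short.toUnsignedInt(buf.getShort());
        return new FieldRefinementAttr(nameIndex, refinementIndex);
    }

    public static void main(String[] args) {
        // Illustrative constant pool indices, not taken from a real classfile.
        FieldRefinementAttr attr = new FieldRefinementAttr(12, 34);
        byte[] bytes = write(attr);
        System.out.println(bytes.length);             // 8
        System.out.println(read(bytes).equals(attr)); // true
    }
}
```

Because the field's descriptor stays `L`-typed, a reader that does not understand the attribute can skip it by its length, as with any unrecognized attribute.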

Additionally, a field refinement may affect the behavior of `putfield`.  For a
null-restricted class, attempts to `putfield` a null will result in
`NullPointerException` (or perhaps a more general `FieldStoreException`).

Looking ahead, for the null-restriction of a B1 or B2 class, there is no change
to the layout, but we could enforce the storage restriction on `putfield`.  When
we get to species, the refinement for a species may affect the layout, and
attempting to store a value of the wrong species may result in an exception or
in an automatic conversion.

It is a free choice as to whether we want to translate a field of type
`Point![]` using an array refinement or fully erase it to `Point[]`.

#### Refined casts

The operand of a `checkcast` or `instanceof` may be a symbolic reference to a
class or refinement.  (Since `instanceof` is already null-hostile, changing
`instanceof` is not necessary now, but when we get to species, we will need to
be able to test for species membership.)  The `cast` operation may be pulled up
from `Class` to `RepresentableType` so that casts can be done reflectively with
either a `Class` or a refinement.
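The asymmetry between `instanceof` and `checkcast` is easy to see in today's Java: `instanceof` already rejects `null`, while reference casts (both the `checkcast` bytecode and `Class::cast`) let `null` through.  This is why only the cast side needs a refined operand for null restriction.  The snippet uses only standard Java.

```java
public class NullCastDemo {
    public static void main(String[] args) {
        Object o = null;

        // instanceof is null-hostile: null is never an instance of anything,
        // so an instanceof test already agrees with a null-restricted type.
        System.out.println(o instanceof String);  // false

        // checkcast and Class::cast are null-permissive: null passes any
        // reference cast, so a cast to Foo! needs a refined operand.
        String s = (String) o;                    // no exception
        System.out.println(s == null);            // true
        System.out.println(String.class.cast(o)); // null
    }
}
```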

#### Refined array creation

An `anewarray` may make a symbolic reference to a class refinement type, as well
as to a class, array, or interface type.

For a refined array, `a.getClass()` continues to return the primary mirror for
the array type, and `Class::getComponentType` on that array continues to return
the primary mirror for the component type, but we may provide an additional API
point akin to `getComponentType` that returns a `RepresentableType` which may be
a `RefinementType`.

Arrays of null-restricted values can be created reflectively; the existing
`Array::newInstance` method will get an overload that takes `RepresentableType`.
`Arrays::copyOf`, when presented with a refined array type, will create a
refined array.
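Today, `Arrays::copyOf` already preserves the runtime array type by consulting the original array's class, which is the behavior a refinement-aware version would extend by also consulting the refined component type.  The snippet uses only existing JDK methods; the `RepresentableType` overload of `Array::newInstance` described above does not exist yet.

```java
import java.lang.reflect.Array;
import java.util.Arrays;

public class ArrayCopyDemo {
    public static void main(String[] args) {
        // Array::newInstance creates an array from a component class mirror.
        String[] created = (String[]) Array.newInstance(String.class, 2);
        System.out.println(created.getClass().getComponentType()); // class java.lang.String

        // Arrays::copyOf derives the new array's type from the original's
        // runtime class, so a String[] viewed as Object[] copies to a String[].
        Object[] original = new String[] { "a", "b" };
        Object[] copy = Arrays.copyOf(original, 3);
        System.out.println(copy.getClass() == String[].class); // true
        System.out.println(copy[2]);                            // null (padding)
    }
}
```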

#### Refinement information stays in the background until summoned

The place where we need discipline is avoiding the temptation of "but someone
might profitably use the information that this field holds a flat array."  Yes,
they might -- but supporting that as a general-purpose runtime type (with
descriptor and mirror) has costs.

The model proposed here resists the temptation to redefine mirrors,
descriptors, symbolic resolution, and reflection, instead leaning on erasure for
both null-restriction and specialization, and providing a secondary reflective
channel (which almost no users will actually need) to get refinement
information.  (An example of code that needs to summon refinement information is
`Arrays::copyOf`, which would need to fetch the refined component type and
instantiate an array using the refined type; most other reflective code would
not need to even be aware of it.)

#### Bonus round: specialization

The framework so far seems to accommodate specialization fairly well.  There'll
be a new subtype of `RefinementType` to represent a specialization, a reflective
method for creating such specializations, such as:

    static <T> SpecializedType<T> specialization(Class<T> baseClass,
                                                 RepresentableType<?>... arguments)

and a new way to get such a type refinement in the constant pool (possibly just
a condy whose bootstrap is the above method).  The `new` bytecode is extended to
accept a specialization refinement.  Field refinements would then be able to
refer to specialization refinements.
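As a rough illustration of the shape of such a refinement, here is a hypothetical `SpecializedType` modeled as a record of a base class plus type arguments, with a static factory mirroring the signature above.  It is a sketch under stated assumptions: `Class<?>` stands in for `RepresentableType<?>`, `int.class` stands in for a primitive type argument as in `ArrayList<int>`, and the arity check is an added assumption, not part of the proposal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a specialization refinement: a base class plus
// type arguments. Class<?> stands in for RepresentableType<?>.
record SpecializedType<T>(Class<T> baseClass, List<Class<?>> arguments) {
    SpecializedType {
        // Assumed validation: one argument per declared type parameter.
        int arity = baseClass.getTypeParameters().length;
        if (arguments.size() != arity) {
            throw new IllegalArgumentException(
                baseClass.getName() + " expects " + arity + " type argument(s)");
        }
        arguments = List.copyOf(arguments);
    }

    static <T> SpecializedType<T> specialization(Class<T> baseClass, Class<?>... arguments) {
        return new SpecializedType<>(baseClass, List.of(arguments));
    }
}

public class SpecializationDemo {
    public static void main(String[] args) {
        var a = SpecializedType.specialization(ArrayList.class, int.class);
        var b = SpecializedType.specialization(ArrayList.class, int.class);
        // Equal components give equal species descriptions...
        System.out.println(a.equals(b)); // true
        // ...while the erased class, as getClass() would report, is still ArrayList.
        System.out.println(a.baseClass().getSimpleName()); // ArrayList
    }
}
```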

## Conclusions

In the current world we have a (mostly) 1:1:1 relationship between runtime
types, descriptors, and mirrors; a model where species/refinements are not full
runtime types preserves this.  The surface area where refinement information
leaks to users who are not prepared for it is dramatically smaller.  Refinements
are not full runtime types, and they don't have full `Class` mirrors.  We erase
down to real runtime types in descriptors and in reflective API points like
`Object::getClass`.  This seems a powerful simplification, and one that aligns
with the previous language simplification.  To summarize:

  - Yes, we should get rid of Q descriptors, but should do so in a more
    principled way by getting rid of Q as a runtime type entirely, replacing it
    with a refinement type which stays in the background until it is actually
    needed.
  - We should erase Q from method and field descriptors and from the obvious
    mirrors, because refinement information is on a need-to-know basis.
  - Refinement information primarily flows from source -> classfile -> VM, and
    mostly does not flow in the other direction.  Specialized reflection might
    expose it, but we should do so not on general principles, but based on where
    it is actually needed by the programming model.
  - Null restriction is more like specialization than not; they are both value
    set refinements that possibly enable layout optimization, and we should seek
    to treat them the same.
  - While leaving the door open for additional kinds of species and type
    migration, we use our new powers, at first, only to define flattenable
    fields and flattenable one-dimensional arrays.