We don't need no stinkin' Q descriptors

Fri Jun 30 20:51:43 UTC 2023

This mail summarizes some discussions we’ve been having about 
eliminating Q descriptors from the VM design. Over time, we’ve been 
giving Q fewer and fewer jobs to do, to the point where (perhaps 
surprisingly) we can replace the remaining jobs with less intrusive 
mechanisms. Additionally, as the language model has simplified, the gap 
between the language and VM has increased, and the proposal herein 
offers a path to narrowing that gap.

I’ll be on vacation for a while, but Dan and John will be able to carry 
forward this discussion.

Please bear in mind that this is a very rough draft of direction; we
don’t need to bikeshed anything right now so much as agree that there
is a better, simpler, more aligned direction than we had previously.

  We don’t need no stinkin’ Q types

In the last six months, we made a significant breakthrough at the 
language/user
level — to decompose B3 with its value and reference companions, into two
simpler concepts: implicit constructibility (a declaration-site 
property) and
null restriction (a use-site property.) The .ref/.val distinction, and 
all its
excess complexity, stemmed from the mistaken desire to model the int/Integer
divide directly. By breaking B3-ness down into more “primitive” properties
(some of which are shared with non-B3 classes), we arrived at a simpler 
model;
no more ref/val projections, and more uniform treatment of X! (including 
for B1
and B2 classes).

As we worked through the language and translation details, we continued 
to seek
a lower energy state. We concluded that we can erase |X!| to |LX;| in a 
number
of places (locals, method descriptors, verifier type system) while still 
meeting
our performance objectives. Doing so eliminates a number of issues with 
method
resolution and distinguishing overloads from overrides. In fact, we found
ourselves using Q for fewer and fewer things, at which point we started 
to ask
ourselves: do we need Q descriptors at all?

In our VM, there is a (mostly) 1-1-1 correspondence between runtime types,
descriptors, and class mirrors. In a world where QFoo and LFoo are separate
runtime types, it makes sense for them to have their own descriptors and
mirrors. But as |Foo!| and |Foo?| have come together in the language, mapping
to a VM which sees them as separate runtime types starts to show gaps.

The role of Q has historically been one of “other”, rather than 
something on its
own; any class which had a Q type, also had an L type, and Q was the “other
flavor.” The “two flavors” orientation made sense when we were modeling the
int/Integer split; we needed two flavors for that in both language and 
VM. The language has since discovered that we can break down the
int/Integer divide into two
more primitive notions — implicit constructibility (an int can be used 
without
calling a constructor, an Integer cannot) and non-nullity (non-identity plus
default constructibility plus non-nullity unlocks flattening.)

If Q is a valid descriptor and there is always a Q mirror, we are in a 
stable
place with respect to runtime types. But if we intend to allow |m(Foo!)| to
override |m(Foo?)|, to be tolerant of bang-mismatches in method 
resolution, and
give Q fewer jobs, then we are moving to an unstable place. We’ve explored a
number of “only use Q for certain things” positions, and have found many 
of them
to be unstable in various ways. The other stable point is that there are 
no Q
types, and no Q mirrors — but then we need some new channel to encode the
request to exclude null, and so give the VM the flattening hint that is 
needed.

As it turns out, there are surprisingly few places that truly need such 
a new
channel. We basically need the VM to take “Q-ness” into account in three
places:

  * Field layout — a field of type |Foo!| (where Foo is implicitly
    constructible) needs a hint that this field is null-restricted, so
    we can lay
    it out flat.
  * Array layout — at the point of |anewarray| and friends, we need a
    hint when
    the component type is an implicitly-constructible, null-restricted type.
  * Casting — casts need to be able to express a value-set check for the
    restricted value set of |Foo!| as well as the unrestricted value set of
    |Foo|.

We are convinced that these three are all that is truly required to get the
flattening we want. So rather than invent new runtime types / mirrors /
descriptors that are going to flow everywhere (into reflection, method 
handles,
verification, etc), let’s invent the minimal additional classfile 
surface and VM
model to model that. At the same time, let’s make sure that the new thing
aligns with the new language model, where the star of the show is
null-restricted types.

        What about species?

In separate investigations, we have had a notion of “species” for a long time,
which we know we’re going to need when we get to specialization. Species form a
partition of a class’s instances; every instance of a class belongs to exactly
one species, and different species may have different layouts and value set
restrictions. And we struggled with species for a long time over the same
runtime type affordances (mirrors and descriptors) — what does a field
descriptor for a field of type |ArrayList<int>| look like? What does 
|getClass|
return?

In both cases, the constraints of compatibility have been pushing us towards
more erasure in descriptors and reflection, with side channels to 
reconstruct
information necessary for optimized heap layout, and with separate API 
points
for |getClass| vs |getSpecies|. While specialization is considerably more
complicated, nearly all the same considerations (descriptors, mirrors,
reflection) are present for null-restriction types. We took an earlier 
swing at
unifying the two under the rubric of “type restrictions”, but I think 
our model
wasn’t quite clean enough at the time to admit this unification. But I 
think we
are now (almost) there, and the payoff is big.

What we concluded around species and specialization is that we would have to
continue to erase descriptors (|ArrayList<int>| as a method or field 
descriptor
continues to erase to |LArrayList;|), that |getClass| returns the 
primary mirror
(|ArrayList|), and that species information is pushed into a side channel.
These are pretty much the exact same considerations as for null-restriction
types.

        Species and bang types are /refinement types/

A /refinement type/ is a type whose value set is that of another type, 
plus a
predicate restricting the value set. A “bang” type |Point!| is a 
refinement of
Point, where we eliminate the value |null|. (Other well-known refinement 
types
from PL history include C enums and Pascal ranges.) Refinement types are 
often
erased to their base type, but some refinements enable better layout. 
Indeed,
our interest in Q types is flattening, and for an implicitly constructible
class, a variable holding a null-excluding type can be flattened. Similarly,
for a sufficiently constrained generic type (e.g., |Point[int,int]|), 
the layout
of such a variable can be flattened as well.
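
To make the definition concrete, here is a minimal sketch in today's Java (16 or later, for records) of a refinement as "base type plus predicate". The names |Refinement|, |contains|, |nonNull|, and |range| are made up for illustration and are not part of any proposal; the sketch models only value-set membership and says nothing about layout.

```java
import java.util.Objects;
import java.util.function.Predicate;

// Hypothetical illustration only: a refinement is a base type plus a
// predicate that narrows the base type's value set.
record Refinement<T>(Class<T> base, Predicate<? super T> restriction) {

    // True if the value belongs to the refined value set.
    boolean contains(Object o) {
        // null is not an instance of any class, so hand it to the predicate
        // directly; any other value must first belong to the base type.
        if (o == null) {
            return restriction.test(null);
        }
        return base.isInstance(o) && restriction.test(base.cast(o));
    }

    // The "bang" refinement: every value of the base type except null.
    static <T> Refinement<T> nonNull(Class<T> base) {
        return new Refinement<>(base, Objects::nonNull);
    }

    // A Pascal-style range refinement over Integer (bounds inclusive).
    static Refinement<Integer> range(int lo, int hi) {
        return new Refinement<>(Integer.class, i -> i != null && lo <= i && i <= hi);
    }
}
```

With this, |Refinement.nonNull(String.class).contains(null)| is false while the same call with a non-null string is true, which is all a "bang" type claims about its value set.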

What we previously called “type restrictions” in the Parametric
VM 
<https://github.com/openjdk/valhalla-docs/blob/main/site/design-notes/parametric-vm/parametric-vm.md#type-restricted-methods-and-fields-and-the-typerestriction-attribute>
document is in fact a refinement type. We claim that we can design the
null-restriction channel in such a way that it can be extended, in some
reasonable way, to support more general specialization.

Both specialization and null-restriction are forms of refinement types. Given
that we’ve already discovered that we need to erase these to their 
primary (L)
type in a lot of places, let’s stake out some general principles for
representing refinements in the VM:

  * Refinement types are erased to their base type in method and field
    descriptors.
  * Refinement types do not have /class/ mirrors.
  * |Object::getClass| returns a class mirror.
  * Reflection deals in class mirrors, so refinements are erased from base
    reflection.
  * Method handles deal in class mirrors, so refinements are erased from
    method
    handles.
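
As an analogy only (this is existing generics erasure, not the proposed mechanism), today's Java already behaves this way for parameterized types: the descriptor and the class mirror report the base type, and the type argument is invisible to base reflection. The class and method names below are illustrative.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

final class ErasureAnalogy {
    // The source-level parameter type is List<String>; the descriptor cannot say so.
    static void takesStrings(List<String> xs) { }

    // Both parameterizations report the same class mirror from getClass().
    static boolean sameMirror() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass();
    }

    // Rebuilds the method descriptor from the mirrors that reflection reports.
    static String erasedDescriptor() {
        try {
            Method m = ErasureAnalogy.class.getDeclaredMethod("takesStrings", List.class);
            return MethodType.methodType(m.getReturnType(), m.getParameterTypes())
                    .toMethodDescriptorString();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Here |erasedDescriptor()| yields |(Ljava/util/List;)V|. Under the principles above, a parameter of type |Point!| would similarly show up as plain |LPoint;|.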

That’s a lot of erasure, so we have to bake refinement back in where it 
matters,
but we want to be careful to limit the “blast radius” of the refinement
information to where it does actually matter. The new channel that encodes a
refinement type will appear only when needed to carry out the tasks listed
above: field declaration, array creation, and casting.

  * Fields are enhanced with some sort of “refinement” attribute, which (a)
    guards against stores of bad values (the field equivalent of
    |ArrayStoreException|) and (b) enables flatter layouts when the
    refinement
    permits.
  * Array creation (|anewarray| / |multianewarray|) is enhanced to support
    creating arrays with refined component types, enabling the same benefits
    (storage safety / layout flattening.)
  * Casting is enhanced to support refinements. This is needed mostly
    because of
    erasure — we are erasing away refinement information and sometimes
    need to
    reassert it.
  * When we get to specialization, |new| is enhanced to support
    refinements, and
    possibly method declarations (to enable calling convention
    optimization in
    the presence of highly specialized types like |Point[int,int]|.)
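
The store guard in the first bullet can be sketched in plain Java. The array half of the example is existing behavior; the field half is a model only. |NullRestrictedSlot| and |FieldStoreException| are invented names (this text does not say which exception a guarded |putfield| would throw), and a real implementation would live in the VM, not in a wrapper class.

```java
final class StoreGuards {
    // Invented for illustration; no exception type is named for field stores.
    static final class FieldStoreException extends RuntimeException {
        FieldStoreException(String message) { super(message); }
    }

    // Models a field that carries a null-restriction refinement.
    static final class NullRestrictedSlot<T> {
        private T value;

        NullRestrictedSlot(T initial) { set(initial); }

        // Every store is checked against the refinement before it lands.
        void set(T newValue) {
            if (newValue == null) {
                throw new FieldStoreException("null stored into null-restricted slot");
            }
            value = newValue;
        }

        T get() { return value; }
    }

    // Existing behavior: arrays check stores against their runtime component type.
    static boolean arrayStoreIsChecked() {
        Object[] objects = new String[1];
        try {
            objects[0] = Integer.valueOf(1); // wrong component type
            return false;
        } catch (ArrayStoreException expected) {
            return true;
        }
    }
}
```

A rejected store leaves the previous value in place, which is the same guarantee |ArrayStoreException| gives for arrays today.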

We had previously been assuming that |[QPoint| is somehow more of a 
“real” type
than (specialized) |Point[int,int]|, but I think we are better served seeing
them both as refinements, where we continue to report a broad type but
sort-of-secretly use refinement information to optimize layout.

    A strawman

What follows is a strawman that eliminates Qs completely, replacing the 
few jobs
Q has (field layout, array layout, and casts) with a single mechanism for
refinement types which stays in the background until explicitly summoned. We
believe the model outlined here can extend cleanly to species, as well as to
|B1!| types like |String!|. Call this No-Q world. This should not be taken
as a concrete proposal so much as a sketch of the concepts and the players.

We have come to believe that adding Q descriptors to the JVM specification,
while perhaps the right move in a from-scratch VM design, would be 
overreach as
an evolutionary step. For old APIs to adopt new descriptors will require 
many
bridge methods with complex properties. To avoid such bridges, old APIs 
would
be forbidden from mentioning the new types. For these reasons, new 
descriptors,
and the mirrors that would accompany them, are quite literally a bridge 
too far.
Accordingly, in No-Q world, descriptors reclaim their former role: 
describing
primitives and classes. Field and method descriptors will use |L| 
descriptors,
even when carrying a null-restricted value (or a species.) Similarly, class
mirrors return to their former role: describing classfiles and non-refined
VM-derived types (such as array types.)

As a self-imposed rule of this essay, we will not appeal to runtime support,
condy or indy. Everything will be done with bytecodes, descriptors, constant
pool entries, and other classfile structures, and not via specially-known
methods. As this is a strawman, we may indulge in some “wasteful” 
design, which
can be transformed or lumped in later iterations. The new elements of the
design are:

  * A new reflective concept for |RefinementType|, which represents a
    refinement
    of an existing (class) type.
  * A new reflective concept for |RepresentableType|, which is the common
    supertype between |Class| and |RefinementType|.
  * New constant pool forms representing null-restriction of classes and of
    arrays.
  * A new field attribute called |FieldRefinement|.
  * Adjustments to various bytecodes to interact with the new constant pool
    forms.
  * Additions to reflective APIs.

    Refined types

A refined type is a combination of a type (called the base type) and a 
value set
restriction for that type which excludes some values in the value set of the
base type. Null-restricted types, arrays of null-restricted types, and
eventually, species of generics are refined types.

Refined types can be represented by a reflective object

|sealed interface RefinementType<T> extends RepresentableType<T> {
    RepresentableType<T> baseType();
}|

The type parameter |T| represents the base type.

There are initially two implementations of |RefinementType|, which may 
be private,
and are known to the VM:

|private record NullRestrictedClass<T>(Class<T> baseType)
        implements RefinementType<T> { }

private record NullRestrictedArray<T extends Object[]>(Class<T> baseType)
        implements RefinementType<T> { }|

        Constant pool entries

The exact form of the constant pool entry (whether new bespoke constant pool
entries, ad-hoc extensions to Constant_Class_info, or condy) can be 
bikeshod at
the appropriate time; there are clearly tradeoffs here.

Initially, null-restricted types must be implicitly constructible (B3), 
which
would be checked when the constant is resolved. Eventually, we can relax
null-restriction to support all class types. Similarly, we may initially
restrict to one-dimensional flat arrays, and leave |multianewarray| to 
its old
job.

        Representable types

|sealed interface RepresentableType<T> {
    T cast(Object o) throws ClassCastException;
    ...
}|
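
A compilable stand-in for this hierarchy (Java 17 or later) might look like the sketch below. Since |java.lang.Class| cannot be retrofitted outside the JDK, the hypothetical wrapper |ClassMirror| plays its role; only the null-restricted class case is modeled, and throwing |ClassCastException| for |null| is an assumption read off the |cast| signature, not a settled choice.

```java
// Stand-in hierarchy. ClassMirror is hypothetical; the other names follow the
// strawman, but the bodies are guesses for illustration.
sealed interface RepresentableType<T> permits ClassMirror, RefinementType {
    T cast(Object o) throws ClassCastException;
}

// Plays the role of Class<T>, which cannot be made to implement a new interface here.
record ClassMirror<T>(Class<T> mirror) implements RepresentableType<T> {
    public T cast(Object o) {
        return mirror.cast(o); // the unrestricted value set admits null
    }
}

sealed interface RefinementType<T> extends RepresentableType<T>
        permits NullRestrictedClass {
    RepresentableType<T> baseType();
}

record NullRestrictedClass<T>(ClassMirror<T> baseType) implements RefinementType<T> {
    public T cast(Object o) {
        // Value-set check for the restricted set: reject null, then defer to the base.
        if (o == null) {
            throw new ClassCastException(
                    "null is outside the value set of " + baseType.mirror().getName() + "!");
        }
        return baseType.cast(o);
    }
}
```

The point of the shape is that a cast against the base type and a cast against the refinement go through the same |cast| entry point, and only the refinement adds the extra value-set check.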

        Refined fields

|FieldRefinement {
    u2 name_index;        // "FieldRefinement"
    u4 length;
    u2 refinement_index;  // symbolic reference to a RefinementType
}|

The symbolic reference must be to a null-restricted, implicitly 
constructible
class type, not an array type. We may relax this restriction later.
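
For tooling authors, reading such an attribute would be straightforward. The sketch below decodes the three fields in the order given above, assuming (as for existing classfile attributes) big-endian encoding and a |length| that counts only the bytes following it; the record name and the |parse| method are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical reader for the strawman FieldRefinement layout.
record FieldRefinementAttribute(int nameIndex, long length, int refinementIndex) {

    static FieldRefinementAttribute parse(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int nameIndex = in.readUnsignedShort();              // u2 name_index
            long length = Integer.toUnsignedLong(in.readInt());  // u4 length
            // Assumption: the body is exactly the single u2 that follows.
            if (length != 2) {
                throw new IllegalArgumentException("expected a 2-byte body, got " + length);
            }
            int refinementIndex = in.readUnsignedShort();        // u2 refinement_index
            return new FieldRefinementAttribute(nameIndex, length, refinementIndex);
        } catch (IOException e) {
            // Includes EOFException when the input is truncated.
            throw new UncheckedIOException(e);
        }
    }
}
```

Resolving |refinement_index| into an actual refinement depends on the constant pool form, which is still open, so the sketch stops at the raw index.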

Looking ahead, for the null-restriction of a B1 or B2 class, there is no 
change
to the layout but we could enforce the storage restriction on 
|putfield.| When
we get to species, the refinement for a species may affect the layout, and
attempting to store a value of the wrong species may result in an 
exception or
in an automatic conversion.

It is a free choice as to whether we want to translate a field of type
|Point![]| using an array refinement or fully erase it to |Point[]|.

        Refined casts

        Refined array creation

An |anewarray| may make a symbolic reference to a class refinement type, 
as well
as to a class, array, or interface type.

        Refinement information stays in the background until summoned

The place where we need discipline is avoiding the temptation of “but 
someone
might profitably use the information that this field holds a flat 
array.” Yes,
they might — but supporting that as a general-purpose runtime type (with
descriptor and mirror) has costs.

The model proposed here resists the temptation to redefine mirrors, 
descriptors,
symbolic resolution, and reflection, instead leaning on erasure here for 
both
null-restriction and specialization, and providing a secondary reflective
channel (which almost no users will actually need) to get refinement
information. (An example of code that needs to summon refinement information is
|Arrays::copyOf|, which would need to fetch the refined component type and
instantiate an array using the refined type; most other reflective code would
not need to even be aware of it.)
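
For comparison, this is roughly how a component-preserving array copy works with class mirrors today; the class and method names are illustrative. In No-Q world the only change such code would need is to ask for the refined component type (through whatever reflective channel ends up existing, which is not specified here) instead of the plain component mirror.

```java
import java.lang.reflect.Array;

final class ComponentPreservingCopy {
    // Copies an array into a new array with the same runtime component type,
    // truncating or padding with null to reach newLength.
    @SuppressWarnings("unchecked")
    static <T> T[] copy(T[] original, int newLength) {
        // Today this is a class mirror; a refinement-aware version would
        // fetch the refined component type at this point instead.
        Class<?> component = original.getClass().getComponentType();
        T[] copy = (T[]) Array.newInstance(component, newLength);
        System.arraycopy(original, 0, copy, 0, Math.min(original.length, newLength));
        return copy;
    }
}
```

Note that padding with |null| is exactly what a null-restricted component type would forbid, which is one reason this kind of code has to be aware of the refinement.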

        Bonus round: specialization

The framework so far seems to accommodate specialization fairly well. There’ll
be a new subtype of |RefinementType| to represent a specialization, a reflective
method for creating such a specialization, such as:

|static <T> SpecializedType<T> specialization(Class<T> baseClass,
                                             RepresentableType<?>... arguments)|

and a new way to get such a type refinement in the constant pool 
(possibly just
a condy whose bootstrap is the above method.) The |new| bytecode is 
extended to
accept a specialization refinement. Field refinements would then be able to
refer to specialization refinements.

    Conclusions

In the current world we have a (mostly) 1:1:1 relationship between runtime
types, descriptors, and mirrors; a model where species/refinements are 
not full
runtime types preserves this. The surface area where refinement information
leaks to users who are not prepared for it is dramatically smaller. Refinements
are not full runtime types, and they don’t have full Class mirrors. We erase
down to real runtime types in descriptors and in reflective API points like
|Object::getClass|. This seems a powerful simplification, and one that 
aligns
with the previous language simplification. To summarize:

  * Yes, we should get rid of Q descriptors, but should do so in a more
    principled way by getting rid of Q as a runtime type entirely,
    replacing it
    with a refinement type which stays in the background until it is
    actually
    needed.
  * We should erase Q from method and field descriptors and from the obvious
    mirrors, because refinement information is on a need-to-know basis.
  * Refinement information primarily flows from source -> classfile ->
    VM, and
    mostly does not flow in the other direction. Specialized reflection
    might
    expose it, but we should do so not on general principles, but based
    on where
    it is actually needed by the programming model.
  * Null restriction is more like specialization than not; they are both
    value
    set refinements that possibly enable layout optimization, and we
    should seek
    to treat them the same.
  * While leaving the door open for additional kinds of species and type
    migration, we use our new powers, at first, only to define
    flattenable fields
    and flattenable one-dimensional arrays.
