We don't need no stinkin' Q descriptors

Thu Jul 27 20:29:46 UTC 2023

Overall, this approach sounds reasonable to me and aligns with the
direction we've been discussing during EG meetings.  I do have some
concerns about how it will be lowered to the classfile.

If I read this correctly, only two bytecodes will be impacted: checkcast
and anewarray.  Is that correct?

Both bytecodes currently take a constant pool index to a CONSTANT_Class.
One of the benefits of "Q"s was that they allowed us to "smuggle" the extra
information we needed into the existing CONSTANT_Class without needing to
introduce new constant pool forms.  With the change to use RefinementTypes,
I'm not clear on how we'll express the information in the constant pool
without needing to make the bytecodes accept two different CP entry types
which is unfortunate (mostly for the interpreter).

One option is to introduce a new CONSTANT_RefinementType entry that
expresses both the RefinementType ("rfType") and the base type ("baseType")
being refined:

CONSTANT_RefinementType_Info {
  u1 tag;           // CONSTANT_RefinementType 21
  u2 rfType_idx;    // CONSTANT_Class for the RefinementType
                    // {NullRestrictedClass, NullRestrictedArray}
  u2 baseType_idx;  // CONSTANT_Class for the baseType
}

with some resolution rules that require "rfType" to be a subclass of
RefinementType and "baseType" to agree with the constraints required by the
rfType {implicitly constructible B3, or array of implicitly constructible
B3}.
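
To make the layout concrete, here is a minimal Java sketch of reading one
such entry out of a class file buffer.  It assumes the hypothetical tag 21
and the field order above (u1 tag, u2 rfType_idx, u2 baseType_idx,
big-endian like every other class file structure).  The Info record and
parse method are illustrative names only, and the real resolution rules
(rfType is a RefinementType, baseType is implicitly constructible) are left
out because they need a loaded class hierarchy.

```java
import java.nio.ByteBuffer;

// Sketch only: CONSTANT_RefinementType (tag 21) is the hypothetical entry
// proposed above, not part of any JVMS release.
final class RefinementCpEntry {
    static final int CONSTANT_RefinementType = 21; // hypothetical tag

    // Both indices point at CONSTANT_Class entries in the same pool.
    record Info(int rfTypeIdx, int baseTypeIdx) { }

    // Reads one entry from a big-endian buffer positioned at its tag byte.
    static Info parse(ByteBuffer buf) {
        int tag = Byte.toUnsignedInt(buf.get());                // u1
        if (tag != CONSTANT_RefinementType) {
            throw new IllegalArgumentException("unexpected tag " + tag);
        }
        int rfTypeIdx = Short.toUnsignedInt(buf.getShort());   // u2
        int baseTypeIdx = Short.toUnsignedInt(buf.getShort()); // u2
        // Constant pool index 0 is never a valid entry.
        if (rfTypeIdx == 0 || baseTypeIdx == 0) {
            throw new IllegalArgumentException("zero constant pool index");
        }
        return new Info(rfTypeIdx, baseTypeIdx);
    }
}
```
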

A naive interpreter implementation would need to check the "tag" in
checkcast/anewarray before reading the CP data but clever encodings should
be able to avoid that overhead.  And the JIT of course can figure it out
cheaply at compile time so no extra cost for the JIT.
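
Roughly, the naive interpreter path looks like the Java sketch below.
CONSTANT_Class really is tag 7; tag 21 and the two entry records are
hypothetical, and reporting a null operand of a refined checkcast as a
ClassCastException is an assumption, since nothing above pins down the
failure mode.

```java
// Models the "check the tag first" step a naive interpreter would perform
// for checkcast when the operand may be either kind of entry.
final class CheckcastDispatch {
    static final int CONSTANT_Class = 7;            // real JVMS tag
    static final int CONSTANT_RefinementType = 21;  // hypothetical tag

    // Minimal model of already-resolved constant pool entries.
    sealed interface CpEntry permits ClassEntry, RefinementEntry { int tag(); }

    record ClassEntry(Class<?> type) implements CpEntry {
        public int tag() { return CONSTANT_Class; }
    }

    record RefinementEntry(Class<?> baseType) implements CpEntry {
        public int tag() { return CONSTANT_RefinementType; }
    }

    static Object checkcast(Object value, CpEntry[] pool, int index) {
        CpEntry e = pool[index];
        switch (e.tag()) { // the extra branch the naive interpreter pays for
            case CONSTANT_Class:
                return ((ClassEntry) e).type().cast(value); // null passes, as today
            case CONSTANT_RefinementType: {
                Class<?> base = ((RefinementEntry) e).baseType();
                if (value == null) {
                    // Assumption: a refined cast reports null as a failed cast.
                    throw new ClassCastException("null is not a " + base.getName() + "!");
                }
                return base.cast(value);
            }
            default:
                throw new IllegalStateException("bad checkcast operand tag " + e.tag());
        }
    }
}
```
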

This encoding won't necessarily create an instance of the RefinementType
while resolving the CP entry, but it would have all the data necessary for
the VM to enforce the requirements.  It does push the knowledge of the
restrictions for the refinement types into the VM (unfortunate) but there
may be ways to pull those back up into the class library by using an upcall
during resolution.

This is a rough sketch of a possible CP encoding to ensure there's a
reasonable path forward here.  Anyone have a better encoding or other
thoughts on how this could be lowered to the classfile?

I was also wondering about how this will be expressed in MethodHandles if
we no longer differentiate Q from L.  It means we don't have a
java.lang.Class for the "Q" flavour, so we can't use MethodTypes to
differentiate them any more (and that includes losing asType /
castArguments to do casts).  Would we need to add a new MH combinator to
handle RefinementType casts?
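
For what it's worth, part of this can be approximated with today's
java.lang.invoke API by filtering through Objects.requireNonNull and letting
asType insert the ordinary reference cast.  nullRestrictedCast is a
hypothetical helper, not a proposed combinator, and it illustrates the
problem as much as it works around it: the resulting MethodType is still
(Object)String, so the null restriction shows up only in the handle's
behaviour, never in its type.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Objects;

final class RefinedCastHandles {
    // Hypothetical helper: a handle of type (Object)base that rejects null
    // (via requireNonNull) and then performs the ordinary class cast.
    static MethodHandle nullRestrictedCast(Class<?> base) throws ReflectiveOperationException {
        MethodHandle requireNonNull = MethodHandles.lookup().findStatic(
                Objects.class, "requireNonNull",
                MethodType.methodType(Object.class, Object.class));
        // asType narrows the Object return to base with a reference cast,
        // which is all a plain MethodType can express.
        return requireNonNull.asType(MethodType.methodType(base, Object.class));
    }
}
```
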

Sorry if this is jumping too far into the weeds; my thought process was to
poke at challenging areas and ensure the bits hold together.

--Dan

On Fri, Jun 30, 2023 at 4:52 PM Brian Goetz <brian.goetz at oracle.com> wrote:

> This mail summarizes some discussions we’ve been having about eliminating
> Q descriptors from the VM design. Over time, we’ve been giving Q fewer and
> fewer jobs to do, to the point where (perhaps surprisingly) we can replace
> the remaining jobs with less intrusive mechanisms. Additionally, as the
> language model has simplified, the gap between the language and VM has
> increased, and the proposal herein offers a path to narrowing that gap.
>
> I’ll be on vacation for a while, but Dan and John will be able to carry
> forward this discussion.
>
> Please bear in mind that this is a very rough draft of direction; we don’t
> need to bikeshed anything right now, as much as agree that there is a
> better, simpler, more aligned direction than we had previously.
>
> We don’t need no stinkin’ Q types
>
> In the last six months, we made a significant breakthrough at the
> language/user level: we decomposed B3, with its value and reference
> companions, into two simpler concepts, implicit constructibility (a
> declaration-site property) and null restriction (a use-site property.)
> The .ref/.val distinction, and all its excess complexity, stemmed from
> the mistaken desire to model the int/Integer divide directly. By
> breaking B3-ness down into more “primitive” properties (some of which
> are shared with non-B3 classes), we arrived at a simpler model; no more
> ref/val projections, and more uniform treatment of X! (including for B1
> and B2 classes).
>
> As we worked through the language and translation details, we continued
> to seek a lower energy state. We concluded that we can erase X! to LX; in
> a number of places (locals, method descriptors, verifier type system)
> while still meeting our performance objectives. Doing so eliminates a
> number of issues with method resolution and distinguishing overloads
> from overrides. In fact, we found ourselves using Q for fewer and fewer
> things, at which point we started to ask ourselves: do we need Q
> descriptors at all?
>
> In our VM, there is a (mostly) 1-1-1 correspondence between runtime types,
> descriptors, and class mirrors. In a world where QFoo and LFoo are separate
> runtime types, it makes sense for them to have their own descriptors and
> mirrors. But as Foo! and Foo? have come together in the language, mapping
> to a VM which sees them as separate runtime types starts to show gaps.
>
> The role of Q has historically been one of “other”, rather than
> something on its own; any class which had a Q type also had an L type,
> and Q was the “other flavor.” The “two flavors” orientation made sense
> when we were modeling the int/Integer split; we needed two flavors for
> that in both language and VM. The language has since discovered that we
> can break down the int/Integer divide into two more primitive notions:
> implicit constructibility (an int can be used without calling a
> constructor, an Integer cannot) and non-nullity (non-identity plus
> default constructibility plus non-nullity unlocks flattening.)
>
> If Q is a valid descriptor and there is always a Q mirror, we are in a
> stable place with respect to runtime types. But if we intend to allow
> m(Foo!) to override m(Foo?), to be tolerant of bang-mismatches in method
> resolution, and to give Q fewer jobs, then we are moving to an unstable
> place. We’ve explored a number of “only use Q for certain things”
> positions, and have found many of them to be unstable in various ways.
> The other stable point is that there are no Q types and no Q mirrors,
> but then we need some new channel to encode the request to exclude
> null, and so give the VM the flattening hint that is needed.
>
> As it turns out, there are surprisingly few places that truly need such
> a new channel. We basically need the VM to take “Q-ness” into account in
> three places:
>
>    - Field layout: a field of type Foo! (where Foo is implicitly
>    constructible) needs a hint that this field is null-restricted, so we
>    can lay it out flat.
>    - Array layout: at the point of anewarray and friends, we need a hint
>    when the component type is an implicitly-constructible,
>    null-restricted type.
>    - Casting: casts need to be able to express a value-set check for the
>    restricted value set of Foo! as well as the unrestricted value set of
>    Foo.
> We are convinced that these three are all that is truly required to get
> the flattening we want. So rather than invent new runtime types /
> mirrors / descriptors that are going to flow everywhere (into
> reflection, method handles, verification, etc.), let’s invent the
> minimal additional classfile surface and VM model to support that. At
> the same time, let’s make sure that the new thing aligns with the new
> language model, where the star of the show is null-restricted types.
>
> What about species?
>
> In separate investigations, we have had a notion of “species” for a long
> time, which we know we’re going to need when we get to specialization.
> Species form a partition of a class's instances; every instance of a
> class belongs to exactly one species, and different species may have
> different layouts and value set restrictions. And we struggled with
> species for a long time over the same runtime type affordances (mirrors
> and descriptors): what does a field descriptor for a field of type
> ArrayList<int> look like? What does getClass return?
>
> In both cases, the constraints of compatibility have been pushing us
> towards more erasure in descriptors and reflection, with side channels
> to reconstruct information necessary for optimized heap layout, and
> with separate API points for getClass vs getSpecies. While
> specialization is considerably more complicated, nearly all the same
> considerations (descriptors, mirrors, reflection) are present for
> null-restriction types. We took an earlier swing at unifying the two
> under the rubric of “type restrictions”, but I think our model wasn’t
> quite clean enough at the time to admit this unification. But I think
> we are now (almost) there, and the payoff is big.
>
> What we concluded around species and specialization is that we would
> have to continue to erase descriptors (ArrayList<int> as a method or
> field descriptor continues to erase to LArrayList;), that getClass
> returns the primary mirror (ArrayList), and that species information is
> pushed into a side channel. These are pretty much the exact same
> considerations as for null-restriction types.
>
> Species and bang types are *refinement types*
>
> A *refinement type* is a type whose value set is that of another type,
> plus a predicate restricting the value set. A “bang” type Point! is a
> refinement of Point, where we eliminate the value null. (Other
> well-known refinement types from PL history include C enums and Pascal
> ranges.) Refinement types are often erased to their base type, but some
> refinements enable better layout. Indeed, our interest in Q types is
> flattening, and for an implicitly constructible class, a variable
> holding a null-excluding type can be flattened. Similarly, for a
> sufficiently constrained generic type (e.g., Point[int,int]), the
> layout of such a variable can be flattened as well.
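
As a minimal illustration of the definition above (a base value set plus a
predicate), the following Java record models a refinement over an ordinary
class.  Refinement, nonNull and range are hypothetical names; the non-null
refinement and the Pascal-style range correspond to the examples in the
paragraph, with String standing in for Point.

```java
import java.util.Objects;
import java.util.function.Predicate;

// Hypothetical model: a refinement is a base type plus a predicate that
// narrows the base type's value set. Not a proposed API.
record Refinement<T>(Class<T> base, Predicate<? super T> predicate) {

    // "Bang" refinement: the base value set minus null.
    static <T> Refinement<T> nonNull(Class<T> base) {
        return new Refinement<>(base, Objects::nonNull);
    }

    // Pascal-style subrange refinement of Integer, e.g. 1..10.
    static Refinement<Integer> range(int lo, int hi) {
        return new Refinement<>(Integer.class, i -> i != null && i >= lo && i <= hi);
    }

    // A value belongs to the refinement if it is in the base value set
    // (null is in every reference type's value set) and passes the predicate.
    boolean contains(Object value) {
        if (value != null && !base.isInstance(value)) {
            return false;
        }
        return predicate.test(base.cast(value));
    }
}
```
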
>
> What we previously called “type restrictions” in the Parametric VM
> <https://github.com/openjdk/valhalla-docs/blob/main/site/design-notes/parametric-vm/parametric-vm.md#type-restricted-methods-and-fields-and-the-typerestriction-attribute>
> document is in fact a refinement type. We claim that we can design the
> null-restriction channel in such a way that it can be extended, in some
> reasonable way, to support more general specialization.
>
> Both specialization and null-restriction are forms of refinement types.
> Given that we’ve already discovered that we need to erase these to their
> primary (L) type in a lot of places, let’s stake out some general
> principles for representing refinements in the VM:
>
>    - Refinement types are erased to their base type in method and field
>    descriptors.
>    - Refinement types do not have *class* mirrors.
>    - Object::getClass returns a class mirror.
>    - Reflection deals in class mirrors, so refinements are erased from
>    base reflection.
>    - Method handles deal in class mirrors, so refinements are erased from
>    method handles.
>
> That’s a lot of erasure, so we have to bake refinement back in where it
> matters, but we want to be careful to limit the “blast radius” of the
> refinement information to where it does actually matter. The new
> channel that encodes a refinement type will appear only when needed to
> carry out the tasks listed above: field declaration, array creation, and
> casting.
>
>    - Fields are enhanced with some sort of “refinement” attribute, which
>    (a) guards against stores of bad values (the field equivalent of
>    ArrayStoreException) and (b) enables flatter layouts when the
>    refinement permits.
>    - Array creation (anewarray / multianewarray) is enhanced to support
>    creating arrays with refined component types, enabling the same
>    benefits (storage safety / layout flattening.)
>    - Casting is enhanced to support refinements. This is needed mostly
>    because of erasure: we are erasing away refinement information and
>    sometimes need to reassert it.
>    - When we get to specialization, new is enhanced to support
>    refinements, and possibly method declarations (to enable calling
>    convention optimization in the presence of highly specialized types
>    like Point[int,int].)
>
> We had previously been assuming that QPoint is somehow more of a “real”
> type than (specialized) Point[int,int], but I think we are better served
> seeing them both as refinements, where we continue to report a broad
> type but sort-of-secretly use refinement information to optimize
> layout.
>
> A strawman
>
> What follows is a strawman that eliminates Qs completely, replacing the
> few jobs Q has (field layout, array layout, and casts) with a single
> mechanism for refinement types which stays in the background until
> explicitly summoned. We believe the model outlined here can extend
> cleanly to species, as well as to B1! types like String!. Call this
> No-Q world. This should not be taken as a concrete proposal, as much as
> a sketch of the concepts and the players.
>
> We have come to believe that adding Q descriptors to the JVM
> specification, while perhaps the right move in a from-scratch VM design,
> would be overreach as an evolutionary step. For old APIs to adopt new
> descriptors will require many bridge methods with complex properties. To
> avoid such bridges, old APIs would be forbidden from mentioning the new
> types. For these reasons, new descriptors, and the mirrors that would
> accompany them, are quite literally a bridge too far.
>
> Accordingly, in No-Q world, descriptors reclaim their former role:
> describing primitives and classes. Field and method descriptors will use
> L descriptors, even when carrying a null-restricted value (or a
> species.) Similarly, class mirrors return to their former role:
> describing classfiles and non-refined VM-derived types (such as array
> types.)
>
> As a self-imposed rule of this essay, we will not appeal to runtime
> support, condy or indy. Everything will be done with bytecodes,
> descriptors, constant pool entries, and other classfile structures, and
> not via specially-known methods. As this is a strawman, we may indulge
> in some “wasteful” design, which can be transformed or lumped in later
> iterations. The new elements of the design are:
>
>    - A new reflective concept for RefinementType, which represents a
>    refinement of an existing (class) type.
>    - A new reflective concept for RepresentableType, which is the common
>    supertype between Class and RefinementType.
>    - New constant pool forms representing null-restriction of classes and
>    of arrays.
>    - A new field attribute called FieldRefinement.
>    - Adjustments to various bytecodes to interact with the new constant
>    pool forms.
>    - Additions to reflective APIs.
>
> Refined types
>
> A refined type is a combination of a type (called the base type) and a
> value set restriction for that type which excludes some values in the
> value set of the base type. Null-restricted types, arrays of
> null-restricted types, and eventually, species of generics are refined
> types.
>
> Refined types can be represented by a reflective object:
>
> sealed interface RefinementType<T> extends RepresentableType<T> {
>     RepresentableType<T> baseType();
> }
>
> The type parameter T represents the base type.
>
> There are initially two implementations of RefinementType, which may be
> private, and are known to the VM:
>
> private record NullRestrictedClass<T>(Class<T> baseType)
>         implements RefinementType<T> { }
>
> private record NullRestrictedArray<T extends Object[]>(Class<T> baseType)
>         implements RefinementType<T> { }
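
Because java.lang.Class is final and cannot be made to implement a new
interface outside the JDK, the sketch below wraps it in a hypothetical
ClassType record so the hierarchy compiles as plain Java 17.  The covariant
baseType() accessors mirror the records above; rejecting array and
primitive bases in NullRestrictedClass is an assumed stand-in for the
implicit-constructibility check, which today's language cannot express.

```java
// Runnable approximation of the RepresentableType / RefinementType hierarchy.
// ClassType is a hypothetical stand-in for Class implementing the interface.
final class RefinementModel {
    sealed interface RepresentableType<T> permits ClassType, RefinementType { }

    record ClassType<T>(Class<T> type) implements RepresentableType<T> { }

    sealed interface RefinementType<T> extends RepresentableType<T>
            permits NullRestrictedClass, NullRestrictedArray {
        RepresentableType<T> baseType();
    }

    // The accessor returns ClassType<T>, a covariant override of baseType().
    record NullRestrictedClass<T>(ClassType<T> baseType) implements RefinementType<T> {
        NullRestrictedClass {
            Class<T> c = baseType.type();
            // Assumed stand-in for "must be implicitly constructible (B3)".
            if (c.isArray() || c.isPrimitive()) {
                throw new IllegalArgumentException("not a class type: " + c.getName());
            }
        }
    }

    // The base type of an array refinement is the array class itself.
    record NullRestrictedArray<T extends Object[]>(ClassType<T> baseType)
            implements RefinementType<T> { }
}
```
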
>
> Constant pool entries
>
> The two jobs for null restriction must be representable in the constant
> pool: a null-restricted B3, and an array of a null-restricted B3. (These
> correspond to CONSTANT_Class_info with a descriptor of QFoo; and [QFoo;
> in the traditional design.) In addition to being referenced by bytecodes
> and attributes, such constants should ideally be loadable, evaluating to
> a RefinementType or RepresentableType.
>
> The exact form of the constant pool entry (whether new bespoke constant
> pool entries, ad-hoc extensions to CONSTANT_Class_info, or condy) can be
> bikeshod at the appropriate time; there are clearly tradeoffs here.
>
> Initially, null-restricted types must be implicitly constructible (B3),
> which would be checked when the constant is resolved. Eventually, we can
> relax null-restriction to support all class types. Similarly, we may
> initially restrict to one-dimensional flat arrays, and leave
> multianewarray to its old job.
>
> Representable types
>
> The new common superinterface between Class and RefinementType exists so
> that both classes and class refinements can be used as array components,
> type parameters for specializations, etc. Some operations from Class,
> such as casting, may be pulled up into this interface.
>
> sealed interface RepresentableType<T> {
>     T cast(Object o) throws ClassCastException;
>     ...
> }
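
A hedged sketch of what pulling cast up could mean: the class case keeps
Class::cast semantics (null passes), while a null-restricted refinement
layers its predicate on top of its base type's cast.  ClassType and
NullRestricted are hypothetical stand-ins, and choosing ClassCastException
for a rejected null is an assumption.

```java
// Hypothetical stand-ins showing cast() pulled up into a common interface.
final class RepresentableCasts {
    interface RepresentableType<T> {
        T cast(Object o) throws ClassCastException;
    }

    // Plain class: behaves exactly like Class::cast, so null passes.
    record ClassType<T>(Class<T> type) implements RepresentableType<T> {
        public T cast(Object o) {
            return type.cast(o);
        }
    }

    // Null-restricted refinement: base cast plus the "not null" predicate.
    record NullRestricted<T>(RepresentableType<T> baseType) implements RepresentableType<T> {
        public T cast(Object o) {
            if (o == null) {
                // Assumption: null is reported as a failed cast, not an NPE.
                throw new ClassCastException("null is outside the value set of the refinement");
            }
            return baseType.cast(o);
        }
    }
}
```
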
>
> Refined fields
>
> Any field whose type is a null-restricted implicitly constructible class
> may be considered by the VM as a candidate for flattening. Rather than
> using field_info.descriptor_index to encode a null-restricted type, we
> continue to erase to the traditional L descriptor, but add a
> FieldRefinement attribute on the field. Similarly,
> CONSTANT_Fieldref_info continues to link fields using the L descriptor.
>
> FieldRefinement {
>     u2 name_index;        // "FieldRefinement"
>     u4 length;
>     u2 refinement_index;  // symbolic reference to a RefinementType
> }
>
> The symbolic reference must be to a null-restricted, implicitly
> constructible class type, not an array type. We may relax this
> restriction later.
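
For illustration, reading the attribute as laid out above needs only
DataInputStream, since class file structures are big-endian.
FieldRefinementReader is a hypothetical name, and insisting that the u4
length be exactly 2 is an assumption that follows from the single u2
payload.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Sketch of parsing the strawman FieldRefinement attribute bytes.
final class FieldRefinementReader {
    record FieldRefinement(int nameIndex, int refinementIndex) { }

    static FieldRefinement read(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        int nameIndex = in.readUnsignedShort();              // u2 -> "FieldRefinement"
        long length = Integer.toUnsignedLong(in.readInt());  // u4
        if (length != 2) {
            throw new IOException("FieldRefinement length must be 2, was " + length);
        }
        int refinementIndex = in.readUnsignedShort();        // u2 -> refinement constant
        return new FieldRefinement(nameIndex, refinementIndex);
    }
}
```
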
>
> Additionally, a field refinement may affect the behavior of putfield.
> For a null-restricted class, attempts to putfield a null will result in
> NullPointerException (or perhaps a more general FieldStoreException.)
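
The putfield rule can be modelled in a few lines.  RefinedSlot is a
hypothetical stand-in for one field slot, not VM code, and it uses
NullPointerException rather than the speculative FieldStoreException.

```java
import java.util.Objects;

// Hypothetical model of a field carrying (or not carrying) a null-restricting
// FieldRefinement; putfield/getfield mimic the bytecodes' effect on one slot.
final class RefinedSlot<T> {
    private final boolean nullRestricted;
    private T value;

    RefinedSlot(boolean nullRestricted, T initial) {
        this.nullRestricted = nullRestricted;
        // A flat B3 field would start as the default instance, never null.
        if (nullRestricted) {
            Objects.requireNonNull(initial, "null-restricted field needs a non-null initial value");
        }
        this.value = initial;
    }

    void putfield(T v) {
        if (nullRestricted) {
            Objects.requireNonNull(v, "null stored into null-restricted field");
        }
        value = v;
    }

    T getfield() {
        return value;
    }
}
```
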
>
> Looking ahead, for the null-restriction of a B1 or B2 class, there is
> no change to the layout but we could enforce the storage restriction on
> putfield. When we get to species, the refinement for a species may
> affect the layout, and attempting to store a value of the wrong species
> may result in an exception or in an automatic conversion.
>
> It is a free choice as to whether we want to translate a field of type
> Point![] using an array refinement or fully erase it to Point[].
>
> Refined casts
>
> The operand of a checkcast or instanceof may be a symbolic reference to
> a class or refinement. (Since instanceof is null-hostile, changing
> instanceof is not necessary now, but when we get to species, we will
> need to be able to test for species membership.) The cast operation may
> be pulled up from Class to RepresentableType so that casts can be done
> reflectively with either a Class or a refinement.
>
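
The asymmetry that makes instanceof safe to leave alone for now can be
checked with today's Java, no new features involved: instanceof already
answers false for null, while checkcast (and Class::cast) lets null
through, which is exactly the gap a refined cast has to close.
NullHostility is just an illustrative class name.

```java
// Existing semantics only: instanceof rejects null, checkcast admits it.
final class NullHostility {
    static boolean instanceofString(Object o) {
        return o instanceof String;   // compiles to instanceof
    }

    static Object checkcastString(Object o) {
        return (String) o;            // compiles to checkcast; null passes
    }
}
```

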
> Refined array creation
>
> An anewarray may make a symbolic reference to a class refinement type,
> as well as to a class, array, or interface type.
>
> For a refined array, a.getClass() continues to return the primary
> mirror for the array type, and Class::getComponentType on that array
> continues to return the primary mirror for the component type, but we
> may provide an additional API point akin to getComponentType that
> returns a RepresentableType which may be a RefinementType.
>
> Arrays of null-restricted values can be created reflectively; the
> existing Array::newInstance method will get an overload that takes
> RepresentableType. Arrays::copyOf, when presented with a refined array
> type, will create a refined array.
>
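
A rough sketch of the proposed Array::newInstance overload, written as a
standalone static method so it runs on current Java.  Since implicit
construction does not exist yet, a Supplier hypothetically stands in for
the VM-provided default instance; RefinedArrays, ClassType and the
Supplier-carrying NullRestrictedClass are illustrative names, and only
reference component types are handled.  The returned array's getClass() is
still the ordinary array mirror, consistent with the erasure principle.

```java
import java.lang.reflect.Array;
import java.util.function.Supplier;

final class RefinedArrays {
    sealed interface RepresentableType<T> permits ClassType, NullRestrictedClass { }

    record ClassType<T>(Class<T> type) implements RepresentableType<T> { }

    // Hypothetical: defaultValue models the implicit default instance of a B3.
    record NullRestrictedClass<T>(Class<T> baseType, Supplier<T> defaultValue)
            implements RepresentableType<T> { }

    static Object newInstance(RepresentableType<?> component, int length) {
        if (component instanceof ClassType<?> c) {
            return Array.newInstance(c.type(), length);  // elements start as null
        }
        NullRestrictedClass<?> n = (NullRestrictedClass<?>) component;
        // Reference component types only; a primitive base would fail this cast.
        Object[] array = (Object[]) Array.newInstance(n.baseType(), length);
        for (int i = 0; i < length; i++) {
            array[i] = n.defaultValue().get();  // no null element is observable
        }
        return array;
    }
}
```

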
> Refinement information stays in the background until summoned
>
> The place where we need discipline is avoiding the temptation of “but
> someone might profitably use the information that this field holds a
> flat array.” Yes, they might, but supporting that as a general-purpose
> runtime type (with descriptor and mirror) has costs.
>
> The model proposed here resists the temptation to redefine mirrors,
> descriptors, symbolic resolution, and reflection, instead leaning on
> erasure here for both null-restriction and specialization, and
> providing a secondary reflective channel (which almost no users will
> actually need) to get refinement information. (An example of code that
> needs to summon refinement information is Arrays::copyOf, which would
> need to fetch the refined component type and instantiate an array using
> the refined type; most other reflective code would not need to even be
> aware of it.)
>
> Bonus round: specialization
>
> The framework so far seems to accommodate specialization fairly well.
> There’ll be a new subtype of RefinementType to represent a
> specialization, a reflective method for creating such a specialization,
> such as:
>
> static <T> SpecializedType<T> specialization(Class<T> baseClass,
>                                              RepresentableType<?>... arguments)
>
> and a new way to get such a type refinement in the constant pool
> (possibly just a condy whose bootstrap is the above method.) The bytecode
> new is extended to accept a specialization refinement. Field refinements
> would then be able to refer to specialization refinements.
>
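
To show how little new machinery this needs, here is a hypothetical Java
sketch of the factory above: SpecializedType just records the base class
and its arguments, and its erasure is the plain class mirror that
descriptors and getClass would report.  Apart from java.util types, every
name here is illustrative.

```java
import java.util.List;

final class Specializations {
    interface RepresentableType<T> { }

    record ClassType<T>(Class<T> type) implements RepresentableType<T> { }

    // Hypothetical refinement for a specialization such as ArrayList<int>.
    record SpecializedType<T>(Class<T> baseClass, List<RepresentableType<?>> arguments)
            implements RepresentableType<T> {
        SpecializedType {
            arguments = List.copyOf(arguments); // immutable, rejects nulls
        }

        // What descriptors and Object::getClass would still report.
        Class<T> erasure() {
            return baseClass;
        }
    }

    static <T> SpecializedType<T> specialization(Class<T> baseClass,
                                                 RepresentableType<?>... arguments) {
        return new SpecializedType<>(baseClass, List.of(arguments));
    }
}
```

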
> Conclusions
>
> In the current world we have a (mostly) 1:1:1 relationship between
> runtime types, descriptors, and mirrors; a model where
> species/refinements are not full runtime types preserves this. The
> surface area where refinement information leaks to users who are not
> prepared for it is dramatically smaller. Refinements are not full
> runtime types, and they don’t have full Class mirrors. We erase down to
> real runtime types in descriptors and in reflective API points like
> Object::getClass. This seems a powerful simplification, and one that
> aligns with the previous language simplification. To summarize:
>
>    - Yes, we should get rid of Q descriptors, but should do so in a more
>    principled way by getting rid of Q as a runtime type entirely,
>    replacing it with a refinement type which stays in the background
>    until it is actually needed.
>    - We should erase Q from method and field descriptors and from the
>    obvious mirrors, because refinement information is on a need-to-know
>    basis.
>    - Refinement information primarily flows from source -> classfile ->
>    VM, and mostly does not flow in the other direction. Specialized
>    reflection might expose it, but we should do so not on general
>    principles, but based on where it is actually needed by the
>    programming model.
>    - Null restriction is more like specialization than not; they are
>    both value set refinements that possibly enable layout optimization,
>    and we should seek to treat them the same.
>    - While leaving the door open for additional kinds of species and
>    type migration, we use our new powers, at first, only to define
>    flattenable fields and flattenable one-dimensional arrays.
>