Evolving CONSTANT_Class

Thu Jun 4 19:16:59 UTC 2020

Some thoughts, working backwards from species, that may inform this decision.

A *species* is a specialization of a (generic) class or interface, where by "specialization" we mean the class/interface declaration interpreted in the context of a constant pool that has been modified by inserting certain resolved constants.

At the use site (think 'new'), we informally talk about a species like 'List[Val]'. What this means is "the species produced by resolving 'List', resolving 'Val', and modifying the constant pool of 'List' with the resolved 'Val'".

It will also be common to talk about species like 'List[T]', where 'T' is represented by a constant pool entry that will be filled in with a live constant.

This suggests that our representation of a species should combine 1) a pointer to a Class constant, and 2) pointers to other resolvable constants (typically, but maybe not exclusively, representing types).
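To make the shape concrete, here is a minimal sketch of that representation, modeling the constant pool as an indexed list. The names PoolEntry, ClassEntry, SpeciesEntry, and SpeciesPoolSketch are all hypothetical, and nothing here reflects an actual class file layout; it only shows a species as "pointer to a Class entry plus pointers to other entries".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical marker for anything that can sit in the sketched constant pool.
interface PoolEntry {}

// Stand-in for an existing Class constant: resolved (and cached) on its own.
record ClassEntry(String name) implements PoolEntry {}

// Hypothetical species entry: (1) index of a Class entry, (2) indexes of the
// other resolvable entries that act as type arguments.
record SpeciesEntry(int classIndex, List<Integer> argIndexes) implements PoolEntry {}

final class SpeciesPoolSketch {
    // Pool for List[Val]; slot 0 is left empty, as in a real class file.
    static List<PoolEntry> listOfVal() {
        List<PoolEntry> pool = new ArrayList<>();
        pool.add(null);                              // #0 unused
        pool.add(new ClassEntry("List"));            // #1
        pool.add(new ClassEntry("Val"));             // #2
        pool.add(new SpeciesEntry(1, List.of(2)));   // #3 = List[Val]
        return pool;
    }

    // Renders an entry in the informal 'List[Val]' notation by following pointers.
    static String describe(List<PoolEntry> pool, int index) {
        PoolEntry e = pool.get(index);
        if (e instanceof ClassEntry c) {
            return c.name();
        }
        if (e instanceof SpeciesEntry s) {
            String args = s.argIndexes().stream()
                    .map(i -> describe(pool, i))
                    .collect(Collectors.joining(","));
            return describe(pool, s.classIndex()) + "[" + args + "]";
        }
        throw new IllegalArgumentException("not a class or species entry: #" + index);
    }
}
```

Because 'List' and 'Val' each have their own slot, each resolution has a place to be cached, and a second species such as List[T] could reuse slot #1.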

I think we intuitively want to encode a species with something like 'Class("LList[QVal;];")', but this encoding is flawed:
- There's no constant pool entry to cache the resolution of List
- There's no constant pool entry to cache the resolution of Val
- There's no way to encode a live type argument (List[T]), so we'd need a separate encoding for that
- Depending on the domain of type arguments (can I use an integer?), there's no descriptor string encoding for many other type arguments; again, we'd need a separate encoding

I'm appealing here to a design principle that seems to have driven the original constant pool design: Class constants are for things that get resolved (and can be cached); descriptor strings are little more than fancy names. This principle doesn't always get followed: the verifier sometimes loads classes named by descriptors; array type class constants resolve their element types without a separate entry; more recently, StackMapTables use Class constants to represent types, and MethodTypes resolve method descriptors "as if" there were class constants for all of the parameter types. But I think these, especially the recent ones, are mistakes, and I still think the original notion is a useful separation of concerns that we should try to follow in our design.

Implications, if you buy this argument:

- There's got to be some sort of new CONSTANT_Species entry consisting of pointers to the generic class and the type arguments.

- For class-flavored references that allow species (super_class, interfaces, new, maybe this_class), either a Class can point to a Species, or a Species can appear as an alternative to a Class.

- For type-flavored references (Methodref, instanceof, anewarray), again we need either a Class/Type that can point to the Species, or we allow the Species as an alternative to be referenced directly. A distinct problem here is that we need a way to express whether the species type is an L type or a Q type. Maybe that's an extra layer, or maybe it's built into CONSTANT_Species. (This is really the same problem as what we do about L vs. Q class types, but without the legacy constraints.)

- For bare descriptors (type of a field), it's fine to use something like "LList[QVal;];". Or maybe it's useful to describe descriptors in terms of Class/Species constants. In any case, there's still a need to figure out how to parameterize a descriptor with live constants ("LList[$T];"), but I think this can be set aside as a separate problem.
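As a rough illustration of the "built into the species" option for the L vs. Q question, the sketch below carries a kind flag on each type and renders the bare-descriptor form. The string shape is taken from the "LList[QVal;];" example above; the names RefKind, TypeSketch, ClassTypeSketch, and SpeciesTypeSketch are hypothetical, and the "extra layer" option would instead move the flag into a separate wrapper.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical flag: whether a type is referenced as an L type or a Q type.
enum RefKind { L, Q }

// Hypothetical common view of anything that can be written as a descriptor.
interface TypeSketch {
    String descriptor();
}

// A plain class type with its kind, e.g. QVal;
record ClassTypeSketch(RefKind kind, String className) implements TypeSketch {
    public String descriptor() {
        return kind + className + ";";
    }
}

// A species type with the kind built in, e.g. LList[QVal;];
record SpeciesTypeSketch(RefKind kind, String className, List<TypeSketch> args)
        implements TypeSketch {
    public String descriptor() {
        // Type arguments are concatenated inside the brackets, each in descriptor form.
        String inner = args.stream()
                .map(TypeSketch::descriptor)
                .collect(Collectors.joining());
        return kind + className + "[" + inner + "];";
    }
}
```

For example, an L-kind species of List with a single Q-kind argument Val renders as "LList[QVal;];". A live argument like $T has no rendering here, which is the separate problem mentioned above.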

-----

Bonus round: generic methods.

Generic methods work a lot like species—at the use site, we need to be able to refer to a method in the context of a constant pool that has been modified by inserting certain resolved constants. (We might even want to use the term "species" here, too. Or maybe it's "specialized method", where "specialized class" = "species".)

The existing representation of a method to be invoked is a Methodref, which has pointers to a Class constant, a name string, and a descriptor string.

So I think we need CONSTANT_SpecializedMethodref, which has 1) a pointer to a Methodref constant, and 2) pointers to some resolvable constants (typically, but maybe not exclusively, representing types). (Caveat: there are some details about the interaction between type arguments, overriding, and method resolution that I'm hand-waving about. Maybe the encoding will be stacked a little differently.)

Again, we can either somehow wrap the SpecializedMethodref in a Methodref (this seems a lot more awkward than it does when wrapping a Species in a Class), or we can allow the use sites (invoke instructions, mostly) to point to either Methodrefs or SpecializedMethodrefs.
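A small sketch of the second alternative, where an invoke operand may point at either kind of entry. MethodrefEntry mirrors the existing Methodref shape described above; SpecializedMethodrefEntry, InvokeSketch, and the sample class, method name, and descriptor are hypothetical, and the sketch ignores the overriding and resolution details I'm hand-waving about.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in for an existing Methodref: Class pointer, name string, descriptor string.
record MethodrefEntry(int classIndex, String name, String descriptor) {}

// Hypothetical entry: (1) index of a Methodref, (2) indexes of resolvable type arguments.
record SpecializedMethodrefEntry(int methodrefIndex, List<Integer> argIndexes) {}

final class InvokeSketch {
    // Sample pool; strings stand in for Class entries to keep the sketch short.
    static List<Object> samplePool() {
        return Arrays.asList(
                null,                                                     // #0 unused
                "Util",                                                   // #1 class
                "Val",                                                    // #2 type argument
                new MethodrefEntry(1, "id",
                        "(Ljava/lang/Object;)Ljava/lang/Object;"),        // #3
                new SpecializedMethodrefEntry(3, List.of(2)));            // #4 = #3 with [Val]
    }

    // The underlying Methodref, whichever kind of entry the operand points at.
    static MethodrefEntry targetOf(List<Object> pool, int operand) {
        Object e = pool.get(operand);
        if (e instanceof MethodrefEntry m) {
            return m;
        }
        if (e instanceof SpecializedMethodrefEntry s) {
            return (MethodrefEntry) pool.get(s.methodrefIndex());
        }
        throw new IllegalArgumentException("not a method reference: #" + operand);
    }

    // Type-argument pointers for the call; empty for a plain Methodref.
    static List<Integer> typeArgIndexes(List<Object> pool, int operand) {
        Object e = pool.get(operand);
        if (e instanceof SpecializedMethodrefEntry s) {
            return s.argIndexes();
        }
        targetOf(pool, operand);  // rejects anything that is not a method reference
        return List.of();
    }
}
```

In this shape the plain Methodref stays untouched and shareable; the cost is that every use site has to accept two entry kinds.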

-----

Where this leaves me (acknowledging that I've made some leaps that some people might be more skeptical of) is pretty down on options (1) and (2). If we do (4), CONSTANT_Type is going to be heavily overloaded: it can refer to a descriptor, a SpeciesType, an ArrayType (for arrays of species types), a type variable, etc. Basically, the distinction between (3) and (4) amounts to whether outside references can point to one of many alternatives, or whether they're all routed through a CONSTANT_Type, which then points to one of the alternatives. I can imagine good arguments for both of those alternatives.
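The difference between the two shapes can be sketched as two lookup routines. This is based only on the one-sentence characterization above, not on the full definitions of the options; DescriptorAlt, SpeciesAlt, TypeWrapper, and RoutingSketch are hypothetical names, and TypeWrapper merely stands in for the idea of a CONSTANT_Type.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical alternatives an outside reference might ultimately denote.
record DescriptorAlt(String descriptor) {}
record SpeciesAlt(String label) {}

// Hypothetical stand-in for a CONSTANT_Type: a single pointer to one alternative.
record TypeWrapper(int targetIndex) {}

final class RoutingSketch {
    static List<Object> samplePool() {
        return Arrays.asList(
                null,                          // #0 unused
                new DescriptorAlt("LVal;"),    // #1
                new SpeciesAlt("List[Val]"),   // #2
                new TypeWrapper(2));           // #3 routes to #2
    }

    // "One of many alternatives": the operand names an alternative directly.
    static Object resolveDirect(List<Object> pool, int operand) {
        Object e = pool.get(operand);
        if (e instanceof DescriptorAlt || e instanceof SpeciesAlt) {
            return e;
        }
        throw new IllegalArgumentException("not a direct alternative: #" + operand);
    }

    // "Routed": the operand must be a wrapper, which then points at an alternative.
    static Object resolveRouted(List<Object> pool, int operand) {
        if (pool.get(operand) instanceof TypeWrapper t) {
            return resolveDirect(pool, t.targetIndex());
        }
        throw new IllegalArgumentException("not a type wrapper: #" + operand);
    }
}
```

Both routines reach the same entry; what changes is whether each use site has to know the full list of alternatives or only the one wrapper kind.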