What's in a CONSTANT_Class?

Dan Smith daniel.smith at oracle.com
Fri Jun 9 00:00:34 UTC 2017


Some initial notes below attempting to flesh out what our two long-term options look like.

> On Jun 7, 2017, at 1:53 PM, John Rose <john.r.rose at oracle.com> wrote:

> Comparing these options in detail makes me comfortable with
> declaring that a CONSTANT_Class is *mainly* a file reference,
> and *also* an L-mode type.

Let me highlight this as the source of all these problems. Trying to make a single constant pool entry represent two different things is painful. It leads to confusion about the model, tortured language explaining basic things like what gets "returned" from resolution, attempts to explain away cases that don't follow the rules, bugs, etc.

That said, we must live with a decision made years ago and make the best of it. Looking at the two viable strategies:

> 1. Wrap a new CP node (a "mode node") around the file-oriented C_Class node - Q[Class["Foo"]]

Here's the syntax I would use, more or less:

CONSTANT_Class_info {
    u1 tag; // 7
    u2 name_index; // Utf8
}

CONSTANT_PrimitiveType_info {
    u1 tag; // 19
    u1 type_code; // 'Z'=90 or 4, 'C'=67 or 5, 'B'=66 or 8, 'S'=83 or 9
                  // 'I'=73 or 10, 'J'=74 or 11, 'F'=70 or 6, 'D'=68 or 7
}

CONSTANT_ClassType_info {
    u1 tag; // 20
    u1 mode_code; // 'L'=76 or 12, 'Q'=81 or 13
    u2 class_index; // Class
}

CONSTANT_ArrayType_info {
    u1 tag; // 21
    u2 component_index; // PrimitiveType, ClassType, ArrayType, or SpeciesType
}

CONSTANT_SpeciesType_info {
    u1 tag; // 22
    u1 mode_code; // 'L'=76 or 12, 'Q'=81 or 13
    u2 class_index; // Class
    u2 enclosing_index; // ClassType or SpeciesType
    u2 typearg_count;
    u2 typeargs[typearg_count]; // PrimitiveType, ClassType, ArrayType, or SpeciesType
}

CONSTANT_MethodDescriptor_info {
    u1 tag; // 23
    u2 parameter_count;
    u2 parameter_descriptors[parameter_count]; // PrimitiveType, ClassType, ArrayType, or SpeciesType
    u2 return_descriptor; // PrimitiveType, ClassType, ArrayType, SpeciesType, or 0 (void)
}

CONSTANT_FieldDescriptor_info { // is this wrapper useful?
    u1 tag; // 24
    u2 type_index; // PrimitiveType, ClassType, ArrayType, or SpeciesType
}

(I thought about a single CONSTANT_Type_info union rather than all these flavors of type constants, but it's not a great fit, because 1) constant pool entries already form a tagged union, so we don't need another union layer, and 2) CONSTANT_Class_info can also be used to represent types; once you've got 2 flavors, you might as well have 5+.)
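To make the "mode node" shape of option (1) concrete, here is a minimal Java sketch that writes the constant pool entries for the type `Q Foo[]` in the draft layout: a Utf8, the file-oriented CONSTANT_Class, a CONSTANT_ClassType wrapping it with mode 'Q', and a CONSTANT_ArrayType on top. The class and method names are hypothetical, the tag values are the draft numbers above (not anything a JVM accepts today), and `DataOutputStream.writeUTF` is used because its u2-length-plus-modified-UTF-8 output matches the CONSTANT_Utf8 payload.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical writer for a few option (1) constant pool entries.
class OptionOneEncoding {
    static final int CONSTANT_Utf8 = 1;
    static final int CONSTANT_Class = 7;
    static final int CONSTANT_ClassType = 20;  // draft tag from the message
    static final int CONSTANT_ArrayType = 21;  // draft tag from the message

    // Pool entries for the type "Q Foo[]":
    //   #1 Utf8 "Foo"
    //   #2 Class      -> #1       (the file reference)
    //   #3 ClassType  'Q' -> #2   (the mode node wrapped around the Class)
    //   #4 ArrayType  -> #3
    static byte[] encodeQFooArray() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(CONSTANT_Utf8);
            out.writeUTF("Foo");            // u2 length + modified UTF-8 bytes

            out.writeByte(CONSTANT_Class);
            out.writeShort(1);              // name_index -> #1

            out.writeByte(CONSTANT_ClassType);
            out.writeByte('Q');             // mode_code: the draft allows 81 ('Q') or 13; this picks 81
            out.writeShort(2);              // class_index -> #2

            out.writeByte(CONSTANT_ArrayType);
            out.writeShort(3);              // component_index -> #3
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }
}
```

For "Foo" this produces 16 bytes; the nesting Q[Class["Foo"]] shows up as entry #3 pointing at entry #2, leaving the CONSTANT_Class itself a plain file reference.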


> 2. Insert a new CP node inside the type-oriented C_Class node - Class[Q["Foo"]] or Class[Q[File["Foo"]]]

Possible syntax for this:

CONSTANT_Class_info {
    u1 tag; // 7
    u2 name_index; // Utf8, PrimitiveDescriptor, ClassDescriptor, ArrayDescriptor, SpeciesDescriptor
}

CONSTANT_PrimitiveDescriptor_info {
    u1 tag; // 19
    u1 type_code; // 'Z'=90 or 4, 'C'=67 or 5, 'B'=66 or 8, 'S'=83 or 9
                  // 'I'=73 or 10, 'J'=74 or 11, 'F'=70 or 6, 'D'=68 or 7
}

CONSTANT_ClassDescriptor_info {
    u1 tag; // 20
    u1 mode_code; // 'L'=76 or 12, 'Q'=81 or 13
    u2 class_index; // ClassFile
}

CONSTANT_ClassFile_info {
    u1 tag; // 25
    u2 name_index; // Utf8
}

CONSTANT_ArrayDescriptor_info {
    u1 tag; // 21
    u2 component_index; // PrimitiveDescriptor, ClassDescriptor, ArrayDescriptor, or SpeciesDescriptor
}

CONSTANT_SpeciesDescriptor_info {
    u1 tag; // 22
    u1 mode_code; // 'L'=76 or 12, 'Q'=81 or 13
    u2 class_index; // ClassFile
    u2 enclosing_index; // ClassDescriptor or SpeciesDescriptor
    u2 typearg_count;
    u2 typeargs[typearg_count]; // PrimitiveDescriptor, ClassDescriptor, ArrayDescriptor, or SpeciesDescriptor
}

CONSTANT_MethodDescriptor_info {
    u1 tag; // 23
    u2 parameter_count;
    u2 parameter_descriptors[parameter_count]; // PrimitiveDescriptor, ClassDescriptor, ArrayDescriptor, or SpeciesDescriptor
    u2 return_descriptor; // PrimitiveDescriptor, ClassDescriptor, ArrayDescriptor, SpeciesDescriptor, or 0 (void)
}

CONSTANT_FieldDescriptor_info { // is this wrapper useful?
    u1 tag; // 24
    u2 type_index; // PrimitiveDescriptor, ClassDescriptor, ArrayDescriptor, or SpeciesDescriptor
}
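Option (2) inverts the nesting. Here is a minimal Java sketch (records, so Java 16 or later) of its in-memory shape, with hypothetical names: a CONSTANT_Class points either at a legacy Utf8 name or at a descriptor, and only a ClassDescriptor reaches a CONSTANT_ClassFile. The `declaredClassName` helper illustrates the this_class cost discussed below: the class has to be dug back out of a type.

```java
// Hypothetical model of option (2): CONSTANT_Class always denotes a type, and the
// file reference lives one level down, in a CONSTANT_ClassFile.
class OptionTwoModel {
    interface Descriptor {}

    record ClassFile(String name) {
        @Override public String toString() { return "File[\"" + name + "\"]"; }
    }
    record PrimitiveDescriptor(char code) implements Descriptor {
        @Override public String toString() { return String.valueOf(code); }
    }
    record ClassDescriptor(char mode, ClassFile file) implements Descriptor {  // mode 'L' or 'Q'
        @Override public String toString() { return mode + "[" + file + "]"; }
    }
    record ArrayDescriptor(Descriptor component) implements Descriptor {
        @Override public String toString() { return "Array[" + component + "]"; }
    }

    // name_index points at either a legacy Utf8 name (a String here) or a descriptor.
    record ClassConstant(Object target) {
        ClassConstant {
            if (!(target instanceof String) && !(target instanceof Descriptor)) {
                throw new IllegalArgumentException("not a Utf8 name or descriptor: " + target);
            }
        }
        @Override public String toString() {
            return target instanceof String s ? "Class[\"" + s + "\"]" : "Class[" + target + "]";
        }
    }

    // this_class wants a class, but under option (2) it holds a type.
    static String declaredClassName(ClassConstant thisClass) {
        Object t = thisClass.target();
        if (t instanceof String s && !s.startsWith("[")) {
            return s;                    // legacy form: Class["Foo"]
        }
        if (t instanceof ClassDescriptor cd) {
            return cd.file().name();     // Class[L[File["Foo"]]] or Class[Q[File["Foo"]]]
        }
        throw new IllegalArgumentException("this_class must name a class: " + thisClass);
    }
}
```

Rendering `new ClassConstant(new ClassDescriptor('Q', new ClassFile("Foo")))` gives `Class[Q[File["Foo"]]]`, matching the notation in the quoted option.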

--------

Here's an overview of spec changes, assuming one of the sets of syntactic changes above. As I look at this, both approaches seem mostly fine. Option (1) has messier rules for resolution, because it has to deal with the duality of CONSTANT_Class. Option (2) has messier treatment of this_class, in exchange for eliminating the duality of CONSTANT_Class.

The rules about where types can appear can be additive (new constants allowed in certain places) or negative (certain kinds of CONSTANT_Class disallowed in certain places), but either way, you've *mostly* got to touch all of the same places.


Syntax

We need to describe where each kind of type or class reference can appear. In option (1), some of this can be enforced by limiting the kinds of constants allowed in each place. But, generally, both option (1) and option (2) will need informal format checks (4.8) or static constraints (4.9.1) that disallow structures encoding the wrong kinds of types.

Descriptors of fields/methods can be expressed as strings or as MethodDescriptor/FieldDescriptor structures. CONSTANT_NameAndType, CONSTANT_MethodType, LocalVariableTable, and annotations allow their descriptor fields (descriptor_index; type_index in annotations) to point to any of these (prohibiting method or field descriptors as appropriate). "The same descriptor" is defined as a recursive comparison of the parts; it does not involve resolution or loading, so a string descriptor can match a structured MethodDescriptor/FieldDescriptor. (This definition applies, among other things, to the prohibition of duplicate field/method declarations.)
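The "same descriptor" rule can be sketched by normalizing any string descriptor into the structured form and then comparing part by part. This Java sketch (hypothetical names, Java 16 or later) covers field descriptors only, and it accepts a 'Q' prefix on the assumption that value types are spelled `QName;` the way references are spelled `LName;`. Because records implement structural `equals`, the recursive comparison comes for free, and no class is ever loaded.

```java
import java.util.Objects;

// Hypothetical structured field descriptors; record equals() is a part-by-part comparison.
class SameDescriptor {
    interface FieldType {}
    record Primitive(char code) implements FieldType {}
    record ClassType(char mode, String name) implements FieldType {}  // mode 'L' or (assumed) 'Q'
    record ArrayType(FieldType component) implements FieldType {}

    // Parses a string field descriptor into the structured form; nothing is loaded.
    static FieldType parse(String d) {
        int[] pos = {0};
        FieldType t = parseAt(d, pos);
        if (pos[0] != d.length()) throw new IllegalArgumentException("trailing characters: " + d);
        return t;
    }

    private static FieldType parseAt(String d, int[] pos) {
        if (pos[0] >= d.length()) throw new IllegalArgumentException("truncated: " + d);
        char c = d.charAt(pos[0]++);
        switch (c) {
            case 'Z': case 'C': case 'B': case 'S':
            case 'I': case 'J': case 'F': case 'D':
                return new Primitive(c);
            case 'L': case 'Q': {
                int semi = d.indexOf(';', pos[0]);
                if (semi <= pos[0]) throw new IllegalArgumentException("bad class name: " + d);
                String name = d.substring(pos[0], semi);
                pos[0] = semi + 1;
                return new ClassType(c, name);
            }
            case '[':
                return new ArrayType(parseAt(d, pos));
            default:
                throw new IllegalArgumentException("bad descriptor: " + d);
        }
    }

    // "The same descriptor": normalize any string form, then compare structurally.
    static boolean sameDescriptor(Object a, Object b) {
        return Objects.equals(normalize(a), normalize(b));
    }

    private static FieldType normalize(Object o) {
        if (o instanceof String s) return parse(s);
        if (o instanceof FieldType t) return t;
        throw new IllegalArgumentException("not a descriptor: " + o);
    }
}
```

Method descriptors would extend this by comparing parameter lists element by element and then the return types.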

A (maybe) comprehensive list of where classes/types can appear:

- Simple class references (CONSTANT_Class with a simple class name for (1), CONSTANT_Class representing a class type for (2)):
ClassFile.this_class
InnerClasses
EnclosingMethod

(All we want is the class, but for compatibility a CONSTANT_Class must be allowed here, so (2) takes the position that these are encoded as types.)

- Any class type (CONSTANT_Class with a simple class name or CONSTANT_ClassType/CONSTANT_SpeciesType for (1), CONSTANT_Class representing a class type for (2)):
ClassFile.super_class
Fieldref.class_index
Methodref.class_index
InterfaceMethodref.class_index

- Reference class type (CONSTANT_Class or CONSTANT_ClassType/CONSTANT_SpeciesType representing a reference class type for (1), CONSTANT_Class representing a reference class type for (2)):
new
Code.exception_table.catch_type
Exceptions.exception_index_table

- Array type (CONSTANT_Class representing an array type or CONSTANT_ArrayType for (1), CONSTANT_Class representing an array type for (2)):
multianewarray

- Reference type (CONSTANT_Class, CONSTANT_ArrayType, or CONSTANT_ClassType/CONSTANT_SpeciesType representing a reference class type for (1), CONSTANT_Class representing a reference type for (2)):
instanceof
checkcast

- Any type (CONSTANT_Class, CONSTANT_ArrayType, CONSTANT_ClassType, CONSTANT_SpeciesType, or CONSTANT_PrimitiveType for (1), CONSTANT_Class for (2)):
anewarray
ldc
verification_type_info.Object_variable_info
BootstrapMethods.bootstrap_arguments
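One way to read the list above as a format-check table for option (1) is sketched below in Java. The kind and category names are hypothetical; a SpeciesType is classified by its mode_code like a ClassType, and "reference class type" is read as "simple class name or 'L' mode", which is an interpretation rather than settled text.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical format-check table for option (1), transcribing the list of contexts.
class OptionOneContexts {
    // Kinds of constants that can encode a class or type.
    enum Kind {
        CLASS_SIMPLE_NAME,   // CONSTANT_Class naming a class, e.g. "java/lang/String"
        CLASS_ARRAY_NAME,    // CONSTANT_Class holding an array descriptor, e.g. "[I"
        CLASS_TYPE_L,        // CONSTANT_ClassType or SpeciesType with mode 'L'
        CLASS_TYPE_Q,        // CONSTANT_ClassType or SpeciesType with mode 'Q'
        ARRAY_TYPE,          // CONSTANT_ArrayType
        PRIMITIVE_TYPE       // CONSTANT_PrimitiveType
    }

    enum Category {
        SIMPLE_CLASS(EnumSet.of(Kind.CLASS_SIMPLE_NAME)),
        ANY_CLASS(EnumSet.of(Kind.CLASS_SIMPLE_NAME, Kind.CLASS_TYPE_L, Kind.CLASS_TYPE_Q)),
        REFERENCE_CLASS(EnumSet.of(Kind.CLASS_SIMPLE_NAME, Kind.CLASS_TYPE_L)),
        ARRAY(EnumSet.of(Kind.CLASS_ARRAY_NAME, Kind.ARRAY_TYPE)),
        REFERENCE(EnumSet.of(Kind.CLASS_SIMPLE_NAME, Kind.CLASS_ARRAY_NAME,
                             Kind.CLASS_TYPE_L, Kind.ARRAY_TYPE)),
        ANY(EnumSet.allOf(Kind.class));

        final Set<Kind> allowed;

        Category(Set<Kind> allowed) { this.allowed = allowed; }
    }

    static final Map<String, Category> CONTEXTS = Map.ofEntries(
        Map.entry("ClassFile.this_class", Category.SIMPLE_CLASS),
        Map.entry("InnerClasses", Category.SIMPLE_CLASS),
        Map.entry("EnclosingMethod", Category.SIMPLE_CLASS),
        Map.entry("ClassFile.super_class", Category.ANY_CLASS),
        Map.entry("Fieldref.class_index", Category.ANY_CLASS),
        Map.entry("Methodref.class_index", Category.ANY_CLASS),
        Map.entry("InterfaceMethodref.class_index", Category.ANY_CLASS),
        Map.entry("new", Category.REFERENCE_CLASS),
        Map.entry("Code.exception_table.catch_type", Category.REFERENCE_CLASS),
        Map.entry("Exceptions.exception_index_table", Category.REFERENCE_CLASS),
        Map.entry("multianewarray", Category.ARRAY),
        Map.entry("instanceof", Category.REFERENCE),
        Map.entry("checkcast", Category.REFERENCE),
        Map.entry("anewarray", Category.ANY),
        Map.entry("ldc", Category.ANY),
        Map.entry("verification_type_info.Object_variable_info", Category.ANY),
        Map.entry("BootstrapMethods.bootstrap_arguments", Category.ANY));

    // True if a constant of the given kind may appear in the given context.
    static boolean isAllowed(String context, Kind kind) {
        Category category = CONTEXTS.get(context);
        if (category == null) throw new IllegalArgumentException("unknown context: " + context);
        return category.allowed.contains(kind);
    }
}
```

For example, `isAllowed("checkcast", Kind.CLASS_TYPE_Q)` is false while `isAllowed("ldc", Kind.PRIMITIVE_TYPE)` is true. Option (2) would need essentially the same table, keyed on what a CONSTANT_Class encodes rather than on its tag.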


Verification

- Types and descriptors of all forms can be parsed to verification types without any resolution or loading. (Many of the changes in the current value classes spec are there to support this.)
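To illustrate that no loading is needed, here is a minimal Java sketch that maps today's string field descriptors to verification types by inspecting characters only: boolean, byte, char, and short widen to int, class types are named by their internal name, and array types by their descriptor. The class name is hypothetical, the "reference ..." strings are a simplified stand-in for the spec's verification type notation, and 'Q' and structured descriptors are not modeled because the draft does not pin down their verification types.

```java
// Hypothetical sketch: descriptor string -> verification type name, with no class loading.
class VerificationTypes {
    static String verificationTypeOf(String d) {
        if (d.isEmpty()) throw new IllegalArgumentException("empty descriptor");
        char c = d.charAt(0);
        if (d.length() == 1) {
            switch (c) {
                case 'Z': case 'B': case 'C': case 'S': case 'I': return "int";
                case 'F': return "float";
                case 'J': return "long";
                case 'D': return "double";
                default: break;
            }
        }
        if (c == 'L' && d.length() > 2 && d.indexOf(';') == d.length() - 1) {
            // Class type: named by its internal name; the class is never looked up.
            return "reference " + d.substring(1, d.length() - 1);
        }
        if (c == '[' && d.length() > 1) {
            verificationTypeOf(d.substring(1));  // validate the component descriptor
            // Array type: named by the descriptor itself.
            return "reference " + d;
        }
        throw new IllegalArgumentException("malformed field descriptor: " + d);
    }
}
```

Note that `verificationTypeOf("Lno/such/Clazz;")` succeeds even though no such class exists.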


Resolution

For (1), a CONSTANT_Class can be "resolved" or "resolved as a type". Plain resolution is only allowed where we've asserted that the name is not an array type descriptor; it produces a loaded class. In contexts where type structures can appear and a CONSTANT_Class is also allowed, resolving the type implicitly means the CONSTANT_Class is "resolved as a type", which treats it as a ClassType with mode 'L'. Resolution of a type produces a java.lang.Class (or some equivalent internal representation).
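The two resolution paths for option (1) can be approximated in Java, using `java.lang.Class` to stand for both a loaded class and a resolved type. This is a sketch with hypothetical method names: `Class.forName(name, false, loader)` loads without initializing, plain resolution rejects array descriptors, and resolution as a type accepts either a simple name (treated as an 'L' ClassType) or an array descriptor. Failure is reported as `NoClassDefFoundError`, mirroring how the JVM reports a class that cannot be loaded during resolution.

```java
// Hypothetical approximation of option (1) resolution on top of Java reflection.
class OptionOneResolution {
    // Loads, without initializing, the class or array class named by an internal name.
    private static Class<?> load(String internalName) {
        try {
            return Class.forName(internalName.replace('/', '.'), false,
                                 OptionOneResolution.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            NoClassDefFoundError err = new NoClassDefFoundError(internalName);
            err.initCause(e);
            throw err;
        }
    }

    // Plain resolution: only allowed where the name is known not to be an array descriptor.
    static Class<?> resolve(String internalName) {
        if (internalName.startsWith("[")) {
            throw new IllegalArgumentException("array descriptor not allowed here: " + internalName);
        }
        return load(internalName);
    }

    // "Resolved as a type": a simple name acts as an 'L' ClassType; an array
    // descriptor resolves to the array type.
    static Class<?> resolveAsType(String internalName) {
        return load(internalName);
    }
}
```

Because Java reflection already conflates the two ideas, the sketch distinguishes the paths only by which names they accept, a small-scale picture of the duality described at the start of this message.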

For (2), a CONSTANT_ClassFile is always resolved to a loaded class. A CONSTANT_Class is always resolved to a type.
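PLACEHOLDER_REMOVED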

In either case, descriptors are not resolved. (This includes all the type-related structures called "descriptors" in (2), though an implementation might choose to lazily cache some resolved types with them.)
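The caching remark can be made concrete: a descriptor stays unresolved in the spec's sense, but an implementation could memoize a type computed from it, provided the cache never changes what callers observe. A minimal Java sketch, with a hypothetical class and a caller-supplied `typeLoader` that is assumed to be deterministic:

```java
import java.util.function.Function;

// Hypothetical: a descriptor that is never "resolved", but can memoize a derived type.
class CachedDescriptor {
    private final String descriptor;
    private volatile Class<?> cached;  // null until first computed

    CachedDescriptor(String descriptor) { this.descriptor = descriptor; }

    String descriptor() { return descriptor; }

    // Computes the type on first use. Assuming typeLoader is deterministic, a race
    // between two threads at worst computes the same Class twice.
    Class<?> type(Function<String, Class<?>> typeLoader) {
        Class<?> t = cached;
        if (t == null) {
            t = typeLoader.apply(descriptor);
            cached = t;
        }
        return t;
    }
}
```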


Semantics

- Various cleanups to ensure that, downstream from resolution, we're talking about "types" rather than "classes and interfaces". (Again, much of this is already in the value classes spec.)


—Dan


More information about the valhalla-spec-observers mailing list