What's in a CONSTANT_Class?

John Rose john.r.rose at oracle.com
Fri Jun 9 02:44:25 UTC 2017


(more comments)

On Jun 8, 2017, at 5:00 PM, Dan Smith <daniel.smith at oracle.com> wrote:
> 
> 
> CONSTANT_Class_info {
>    u1 tag; // 7
>    u2 name_index; // Utf8
> }

If we decide to sideline the previous guy as a False Friend,
then this is the place where resolution really happens:

CONSTANT_ClassFile_info {
   u1 tag; // 25
   u2 name_index; // Utf8
}

> CONSTANT_PrimitiveType_info {
>   u1 tag; // 19
>   u1 type_code; // 'Z'=90 or 4, 'C'=67 or 5, 'B'=66 or 8, 'S'=83 or 9
>                 // 'I'=73 or 10, 'J'=74 or 11, 'F'=70 or 6, 'D'=68 or 7
> }

Alternative encoding:  Assign a compact range of tags 32..39,
one per primitive.  Another alternative:  Hardwire the top 8 CP
indexes (starting at 2^16-9).  But these alternatives just remove
a minor eyesore from class files; instead of lots of Utf8 encodings
there will be a little dance at the beginning of every CP that
recalls to mind the perennial favorites 'int', 'boolean', etc.
For the CP type system, one type for primitives is better,
I guess.

I slightly prefer the smaller code points, because they are
easier to decode with a short array.  But a perfect hash code
would be a clever alternative for either encoding.
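Here is a minimal Java sketch of both decoders. It assumes the small code points are the newarray atype values 4..11 listed in Dan's struct, and the class and method names are hypothetical. The perfect hash uses `c % 14`, which happens to map the eight ASCII descriptor letters to distinct slots. The static initializer checks that this holds.

```java
/** Hypothetical decoders for a CONSTANT_PrimitiveType type_code (names are illustrative). */
final class PrimitiveCodes {
    // Small code points: the newarray atype values 4..11, so a
    // short array indexed by (code - 4) decodes them directly.
    private static final char[] BY_ATYPE = {'Z', 'C', 'F', 'D', 'B', 'S', 'I', 'J'};

    // ASCII code points: (c % 14) is collision-free for these eight
    // letters, so it serves as a perfect hash into a 14-slot table.
    private static final char[] BY_ASCII_HASH = new char[14];

    static {
        for (char c : "ZCBSIJFD".toCharArray()) {
            int slot = c % 14;
            if (BY_ASCII_HASH[slot] != 0) {
                throw new AssertionError("hash collision at slot " + slot);
            }
            BY_ASCII_HASH[slot] = c;
        }
    }

    static char decodeAtype(int code) {
        if (code < 4 || code > 11) {
            throw new IllegalArgumentException("not a primitive atype: " + code);
        }
        return BY_ATYPE[code - 4];
    }

    static char decodeAscii(int code) {
        // One table probe, then a check that the slot really holds this code.
        char c = code < 0 ? 0 : BY_ASCII_HASH[code % 14];
        if (c != code) {
            throw new IllegalArgumentException("not a primitive type code: " + code);
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(decodeAtype(10));  // I
        System.out.println(decodeAscii('J')); // J
    }
}
```

Each decoder costs one array probe. The ASCII variant leaves six of its 14 slots unused and needs one extra compare. In exchange, it keeps the familiar descriptor letters.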

If we use Utf8 strings for types (in non-legacy CP structure)
then the actual ASCII code points would be more appealing.

> 
> CONSTANT_ClassType_info {
>   u1 tag; // 20
>   u1 mode_code; // 'L'=76 or 12, 'Q'=81 or 13
>   u2 class_index; // Class
s/Class/ClassFile/
> }
> 
> CONSTANT_ArrayType_info {
>   u1 tag; // 21
>   u2 component_index; // PrimitiveType, ClassType, ArrayType, or SpeciesType
> }
> 
> CONSTANT_SpeciesType_info {
>    u1 tag; //22
>    u1 mode_code; // 'L'=76 or 12, 'Q'=81 or 13
>    u2 class_index; // Class
>    u2 enclosing_index; // ClassType or SpeciesType
>    u2 typearg_count;
>    u2 typeargs[typearg_count]; // PrimitiveType, ClassType, ArrayType, or SpeciesType
> }
s/Class/ClassFile/
…which raises the question of whether the species is type-like or file-like.
The mode_code also raises this question.
Why must a mode also be assigned when a template is expanded?
When a class file is loaded, a mode is not assigned.
Perhaps both class files and species are "pre-types", things with
names and typed members, but which are not yet themselves types.
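The following Java sketch shows one way to read the "pre-type" suggestion. It assumes a ClassFile entry carries only a name, and that a type is formed either by applying a mode ('L' or 'Q') or by wrapping in an array. The record names are hypothetical. SpeciesType is left out because its flat descriptor form is not settled here, and the 'Q' rendering simply assumes the same shape as 'L'.

```java
/** Hypothetical model: a ClassFile is a "pre-type"; a mode or array wrapper makes a type. */
final class CpTypes {
    /** File-like entry: only a binary class name, no mode attached. */
    record ClassFile(String name) {}

    /** Type-like entries, rendered here as flat descriptor strings. */
    sealed interface Type permits PrimitiveType, ClassType, ArrayType {
        String descriptor();
    }

    record PrimitiveType(char code) implements Type {
        public String descriptor() { return String.valueOf(code); }
    }

    /** A mode code ('L' or 'Q') applied to a ClassFile yields a type. */
    record ClassType(char mode, ClassFile file) implements Type {
        ClassType {
            if (mode != 'L' && mode != 'Q') {
                throw new IllegalArgumentException("unknown mode: " + mode);
            }
        }

        public String descriptor() { return mode + file.name() + ";"; }
    }

    record ArrayType(Type component) implements Type {
        public String descriptor() { return "[" + component.descriptor(); }
    }

    public static void main(String[] args) {
        ClassFile point = new ClassFile("Point");
        Type[] types = {
            new ClassType('L', point),
            new ClassType('Q', point),
            new ArrayType(new ClassType('Q', point)),
        };
        for (Type t : types) {
            System.out.println(t.descriptor());
        }
    }
}
```

The same ClassFile appears under two modes, which is the sense in which the file itself is not yet a type.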

> CONSTANT_MethodDescriptor_info {
>   u1 tag; // 23
>   u2 parameter_count;
>   u2 parameter_descriptors[parameter_count]; // PrimitiveType, ClassType, ArrayType, or SpeciesType
>   u2 return_descriptor; // PrimitiveType, ClassType, ArrayType, SpeciesType, or 0 (void)
> }

The void quasi-type should be lumped into PrimitiveType, for the sake
of ldc (void.class).
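As an illustration, the Java mirrors that an ldc of such an entry might push can be listed with class literals. Treating 'V' as just another primitive code keeps void.class in the same table. The method name is hypothetical. Today, javac reaches these mirrors through the wrapper classes' TYPE fields (for example Void.TYPE) rather than through an ldc.

```java
/** Hypothetical: the mirror an ldc of a PrimitiveType entry could push, 'V' included. */
final class PrimitiveMirrors {
    static Class<?> mirror(char code) {
        return switch (code) {
            case 'Z' -> boolean.class;
            case 'C' -> char.class;
            case 'B' -> byte.class;
            case 'S' -> short.class;
            case 'I' -> int.class;
            case 'J' -> long.class;
            case 'F' -> float.class;
            case 'D' -> double.class;
            case 'V' -> void.class; // the void quasi-type, lumped in with the primitives
            default -> throw new IllegalArgumentException("not a primitive type code: " + code);
        };
    }

    public static void main(String[] args) {
        System.out.println(mirror('V'));               // void
        System.out.println(mirror('V').isPrimitive()); // true
    }
}
```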

> CONSTANT_FieldDescriptor_info { // is this wrapper useful?
>    u1 tag; // 24
>    u2 type_index; // PrimitiveType, ClassType, ArrayType, or SpeciesType
> }

I don't think this wrapper is useful.  Instead we have the lopsided
distinction between the star in FieldRef[,NameAndType[,*]] and the star
in MethodRef[,NameAndType[,*]].  In the case of FieldRef, it is any
of the types (but not PT-void), and in the case of MethodRef, it is
a MethodDescriptor.

MethodDescriptor is an extra tricky nut to crack here, I think, because
it has an unlimited arity.  That makes logical sense, but major JVMs
(IBM, ours) have baked in an assumption that CP entries are fixed
in size except for Utf8 strings.  In JSR 292 we pushed the BSM
specifiers into a side table for this reason.  We could put method
descriptor lists into a similar side table.  I don't have a good suggestion
here.  For method types the flat Utf8 strings are seductive, at least until
you have 100 repetitions of the substring "Ljava/lang/Object;".
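Below is a minimal sketch of the side-table idea. It assumes a fixed-size MethodDescriptor entry that holds a u2 index into a side table of parameter lists, with identical lists stored once. All names are hypothetical, and no such attribute exists today. The design borrows the spirit of the BootstrapMethods side table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical side table keeping MethodDescriptor CP entries fixed-size. */
final class DescriptorSideTable {
    /** Fixed-size CP entry: u2 index into the side table, u2 return type index. */
    record MethodDescriptorEntry(int paramListIndex, int returnIndex) {}

    private final List<List<Integer>> paramLists = new ArrayList<>();
    private final Map<List<Integer>, Integer> interned = new HashMap<>();

    /** Returns the side-table index for a list of CP type indexes, sharing duplicates. */
    int intern(List<Integer> cpTypeIndexes) {
        List<Integer> key = List.copyOf(cpTypeIndexes);
        return interned.computeIfAbsent(key, k -> {
            paramLists.add(k);
            return paramLists.size() - 1;
        });
    }

    MethodDescriptorEntry descriptor(List<Integer> params, int returnIndex) {
        return new MethodDescriptorEntry(intern(params), returnIndex);
    }

    List<Integer> params(MethodDescriptorEntry e) {
        return paramLists.get(e.paramListIndex());
    }

    int size() {
        return paramLists.size();
    }

    public static void main(String[] args) {
        DescriptorSideTable table = new DescriptorSideTable();
        // Pretend CP index 5 is a ClassType for java/lang/Object and 7 is PrimitiveType 'V'.
        MethodDescriptorEntry a = table.descriptor(List.of(5, 5), 7);
        MethodDescriptorEntry b = table.descriptor(List.of(5, 5), 5);
        System.out.println(a.paramListIndex() == b.paramListIndex()); // true
        System.out.println(table.size());                             // 1
    }
}
```

As rough arithmetic, 100 flat copies of "Ljava/lang/Object;" cost 100 × 18 = 1800 bytes of Utf8. By contrast, 100 u2 references to one shared ClassType entry cost 200 bytes.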

If we break the arity limit of 2, then we should also consider merging
NameAndType into FieldRef and MethodRef, at which point the
genericity of NameAndType becomes moot.  The three components
of a FieldRef would be (holder:ClassType, name:Utf8, type:XType)
and the components of a MethodRef could be (holder:ClassType,
name:Utf8, descr:MethodDescriptor).  At that point the MethodDescriptor
could be unfolded into the MethodRef, right?  Then the only
high-arity node would be MethodRef.  (Except for C_MethodType.
But that could be made a legacy guy also, since he is built on
top of flat strings, and condy can materialize him easily enough.)
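The merged shapes can be written as Java records to check the claim that only MethodRef stays variable-length. The byte layout (u1 tag, u2 per index, and a u2 count before the parameters) is an assumption for illustration, not a proposal from the thread. The names are hypothetical.

```java
import java.util.List;

/** Hypothetical merged layouts with NameAndType folded into each ref. */
final class MergedRefs {
    /** FieldRef(holder:ClassType, name:Utf8, type:XType): fixed arity 3. */
    record FieldRef(int holder, int name, int type) {
        int encodedSize() { return 1 + 3 * 2; } // u1 tag + three u2 indexes
    }

    /** MethodRef with the MethodDescriptor unfolded into it: variable arity. */
    record MethodRef(int holder, int name, int returnType, List<Integer> params) {
        MethodRef {
            params = List.copyOf(params); // immutable defensive copy
        }

        // u1 tag + u2 holder, name, return + u2 count + one u2 per parameter
        int encodedSize() { return 1 + 3 * 2 + 2 + 2 * params.size(); }
    }

    public static void main(String[] args) {
        System.out.println(new FieldRef(1, 2, 3).encodedSize());                 // 7
        System.out.println(new MethodRef(1, 2, 3, List.of()).encodedSize());     // 9
        System.out.println(new MethodRef(1, 2, 3, List.of(4, 4)).encodedSize()); // 13
    }
}
```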

> (I thought about a CONSTANT_Type_info union rather than all these flavors of type constants, but it's not great because 1) constant pool entries already form a tagged union, so we don't need another union layer, and 2) CONSTANT_Class_info can also be used to represent types—once you've got 2 flavors, might as well have 5+.)

Yep.  And you could push that a little farther by giving each
PrimitiveType its own tag.  The PTs are the odd thing here.
No other constants have a payload
of less than a byte.  Just as constants seem to have a
maximum size (arity 2), they also seem to have a minimum
size (32 bits or so).  Note that very small integer constants
(which would correspond to PT sub-tags) are *not* usually
stored in the CP; they are loaded with short instructions
like "bipush", not "ldc".
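As a rough illustration of that last point, the following sketch picks a load instruction for an int constant the way javac typically does. It uses iconst_<n> for -1..5, bipush for the byte range, sipush for the short range, and only otherwise an ldc of a CONSTANT_Integer entry. The method name is hypothetical. The sketch also ignores the ldc versus ldc_w choice, which depends on the CP index.

```java
/** Sketch of choosing a load instruction for an int constant. */
final class IntConstantLoad {
    static String instructionFor(int v) {
        if (v >= -1 && v <= 5) {
            return v == -1 ? "iconst_m1" : "iconst_" + v; // value encoded in the opcode
        }
        if (v >= Byte.MIN_VALUE && v <= Byte.MAX_VALUE) {
            return "bipush " + v; // one-byte immediate operand
        }
        if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) {
            return "sipush " + v; // two-byte immediate operand
        }
        return "ldc " + v; // only here does the value need a CP entry
    }

    public static void main(String[] args) {
        for (int v : new int[] {-1, 3, 100, 1000, 100000}) {
            System.out.println(instructionFor(v));
        }
    }
}
```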

— John



More information about the valhalla-spec-observers mailing list