Minimal Parametric VM ?
Remi Forax
forax at univ-mlv.fr
Tue Jan 31 10:25:42 UTC 2023
Hi all,
I've started to implement a prototype (far from finished) of the parametric VM based on John's position paper.
https://github.com/forax/civilizer
Most of the design is great (even really great), but I think it goes too deep and that there is a minimal parametric VM hidden inside.
By Minimal Parametric VM, or MPVM, I mean a bare-bones design which is just enough to be able to specialize parametric classes and parametric methods, so that a List<Complex> uses an array of Complex instead of an array of Object, which seems a nice intermediary goal.
So I propose to simplify the design as an intermediary step, with the explicit goal that the MPVM should be able to specialize generics over value types, not more.
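To make the goal concrete, here is a small sketch in today's Java of the difference between erased and specialized backing storage. Complex is a plain record standing in for a value type, and SpecializedStorage and its methods are hypothetical names used only for illustration; the MPVM would do this at the VM level rather than through reflection.

```java
import java.lang.reflect.Array;

// Plain record standing in for a value type (assumption: no Valhalla build is needed).
record Complex(double re, double im) {}

final class SpecializedStorage {
  // Erased generics: the backing array is always an Object[].
  static Object[] erasedStorage(int capacity) {
    return new Object[capacity];
  }

  // Specialized generics: the backing array has the element type as component type.
  // Only meaningful for reference element types in this sketch.
  @SuppressWarnings("unchecked")
  static <T> T[] specializedStorage(Class<T> elementType, int capacity) {
    return (T[]) Array.newInstance(elementType, capacity);
  }
}
```

With this sketch, SpecializedStorage.specializedStorage(Complex.class, 4) yields a Complex[] whereas erasedStorage(4) yields an Object[].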
The main difference is that the MPVM does not need to deal with subtyping of parametrized classes, so the opcodes checkcast and instanceof do not need to be specialized, and calling methods on a parametric class does not require the owner + type parameters to be reified in the bytecode.
- The Parametric attribute:
A class or a method is declared parametric if it has a Parametric attribute (a class attribute for a class, a method attribute for a method).
There cannot be more than one Parametric attribute per class/method.
Parametric_attribute {
u2 attribute_name_index; // Parametric
u4 attribute_length;
u2 anchor_index;
}
A Parametric attribute references a CONSTANT_Anchor_info that, after resolution, stores a couple of Objects: the first one is the class parameter, the second one is the method parameter.
CONSTANT_Anchor_info {
u1 tag; // CONSTANT_Anchor = 21
u2 bootstrap_method_attr_index; // at runtime, CallSite.target: MH (Anchor)Anchor
}
When a parametric class/parametric method is instantiated with a parameter, the VM creates an Anchor object containing the parameter. The bootstrap method of the CONSTANT_Anchor_info is called to get a method handle (that takes an Anchor and returns an Anchor). The target returned by the BSM is called with the anchor created by the VM, and here the JDK code can erase the parameter or do whatever should be done. The resulting Anchor is stored as the result in the constant pool (it becomes a loadable constant that can be referenced by ldc or by bootstrap method constants).
The Anchor object is a value record:
value record Anchor(Object parameter) {}
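The resolution protocol above can be sketched with the existing java.lang.invoke API. This is only an illustration under stated assumptions: Anchor is a plain record because "value record" does not compile on a standard JDK, and AnchorResolutionSketch, erase, bootstrap and resolve are hypothetical names. The erasure policy (anything that is not a Class becomes Object.class) is invented for the example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Plain record standing in for the proposed "value record Anchor".
record Anchor(Object parameter) {}

final class AnchorResolutionSketch {
  // Hypothetical JDK-side policy: keep Class parameters, erase everything else.
  static Anchor erase(Anchor anchor) {
    return anchor.parameter() instanceof Class<?> ? anchor : new Anchor(Object.class);
  }

  // Hypothetical bootstrap method: returns a method handle of type (Anchor)Anchor.
  static MethodHandle bootstrap(MethodHandles.Lookup lookup) throws ReflectiveOperationException {
    return lookup.findStatic(AnchorResolutionSketch.class, "erase",
        MethodType.methodType(Anchor.class, Anchor.class));
  }

  // What the VM would do: wrap the parameter in an Anchor, call the BSM,
  // then call the returned target with that Anchor and keep the result.
  static Anchor resolve(Object parameter) {
    try {
      MethodHandle target = bootstrap(MethodHandles.lookup());
      return (Anchor) target.invokeExact(new Anchor(parameter));
    } catch (Throwable t) {
      throw new IllegalStateException("anchor resolution failed", t);
    }
  }
}
```

In the real design the resulting Anchor would be stored in the constant pool; here resolve simply returns it.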
- Parametrized opcodes
The opcodes new, aconst_init, anewarray, invokestatic, invokevirtual, invokeinterface and invokespecial can specify a parameter.
For that, instead of referencing a CONSTANT_Class_info or an XMethodref, they reference a CONSTANT_Linkage_info that itself references the right constant.
CONSTANT_Linkage_info {
u1 tag; // JVM_CONSTANT_Linkage = 22
u2 parameter_index;
u2 reference_index; // CONSTANT_Class_info or XMethodref
}
The parameter_index references a loadable constant (the usual ones + CONSTANT_Anchor_info). The reference_index references either a CONSTANT_Class_info or an XMethodref depending on the opcode.
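As a sketch of the layout described above, the entry can be written and read with the standard data streams, which use the same big-endian encoding as class files. LinkageInfo and its methods are hypothetical names; the tag value 22 is the one given in the structure above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical in-memory view of a CONSTANT_Linkage_info entry.
record LinkageInfo(int parameterIndex, int referenceIndex) {
  static final int TAG = 22; // tag value taken from the structure above

  // Serializes the entry as u1 tag, u2 parameter_index, u2 reference_index.
  byte[] encode() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeByte(TAG);
      out.writeShort(parameterIndex);
      out.writeShort(referenceIndex);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Parses one entry, rejecting any other tag.
  static LinkageInfo decode(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      int tag = in.readUnsignedByte();
      if (tag != TAG) {
        throw new IllegalArgumentException("not a CONSTANT_Linkage_info, tag = " + tag);
      }
      int parameterIndex = in.readUnsignedShort();
      int referenceIndex = in.readUnsignedShort();
      return new LinkageInfo(parameterIndex, referenceIndex);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```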
At runtime, the constant referenced by the parameter_index is a Species object for new, aconst_init and anewarray and a Linkage object for the invoke* opcodes.
value record Species(Class<?> raw, Object parameters) {}
value record Linkage(Object parameters) {}
A Species object is defined by a runtime class (so it can represent classes that are only available at runtime, like the secondary type of a zero-default value class) and a parameter. A Linkage object only stores a parameter.
Chain of constants and runtime representation depending on the opcode:
new (CONSTANT_Linkage_info -> CONSTANT_Class_info), at runtime Species
aconst_init (CONSTANT_Linkage_info -> CONSTANT_Class_info), at runtime Species
anewarray (CONSTANT_Linkage_info -> CONSTANT_Class_info), at runtime Species
invokestatic, invokevirtual, invokeinterface, invokespecial (CONSTANT_Linkage_info -> XMethodref) at runtime Linkage(parameters)
At runtime, when one of the opcodes new, aconst_init and anewarray is first executed, the VM checks that the raw class of the species is parametric, then parameter_index is resolved, then the VM calls the BSM of the anchor and creates a parametric version of the class with the parameter of the Anchor if it does not already exist. This parametric class is stored as the class of the instance created.
At runtime, when one of the invoke* opcodes is first executed, the parameter_index is resolved, the VM checks that the referenced method is parametric, then the VM calls the BSM of the anchor and creates a parametric version of the method with the parameter of the Anchor if it does not already exist.
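The "create it if it does not already exist" part can be pictured as a cache keyed by the species. The sketch below is an assumption about the shape of that bookkeeping, not the prototype's code: Species is a plain record instead of a value record, SpecializationCache is a hypothetical name, the specializer function stands in for whatever the VM does to build the parametric class or method, and the "is parametric" check is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Plain record standing in for the proposed "value record Species".
record Species(Class<?> raw, Object parameters) {}

// Hypothetical cache: at most one specialization per (raw class, parameters) pair.
final class SpecializationCache<S> {
  private final Map<Species, S> cache = new ConcurrentHashMap<>();
  private final Function<Species, S> specializer;

  SpecializationCache(Function<Species, S> specializer) {
    this.specializer = specializer;
  }

  // Creates the specialization on first use and reuses it afterwards.
  S specializationOf(Species species) {
    return cache.computeIfAbsent(species, specializer);
  }
}
```

Because records compare by component values, two Species built from the same raw class and an equal parameter share one entry.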
- Class Pool segregation
Because the Anchors are the roots of the constant dynamic trees, the VM can segregate the constant pool items as described in John's paper.
- Class that inherits/implements parametric class
A class (parametric or not) can reference parametric classes/interfaces, so the super name and interfaces of the class header may reference a CONSTANT_Linkage_info (that itself references a CONSTANT_Class_info), resolved as a Species at runtime.
- Type Restriction
In order to prevent type pollution from propagating, fields and methods can define the attribute TypeRestriction, which defines restrictions (Class at runtime) on the method parameters and on the field.
TypeRestriction_attribute {
u2 attribute_name_index; // TypeRestriction
u4 attribute_length;
u2 restrictions_count;
u2 restrictions[restrictions_count]; // at runtime Class
}
(Note: there is no need to validate the return value for the MPVM, but the class corresponding to the return type can be present.)
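A possible reading of the check implied by TypeRestriction, sketched in plain Java. TypeRestrictionSketch and satisfies are hypothetical names, and two points are assumptions not stated above: a null entry means "no restriction for this slot", and a null argument never satisfies a restriction (Class.isInstance returns false for null).

```java
// Hypothetical helper mirroring the restrictions[] table of the attribute.
final class TypeRestrictionSketch {
  // Returns true if every argument is an instance of its restriction class.
  static boolean satisfies(Class<?>[] restrictions, Object... arguments) {
    if (restrictions.length != arguments.length) {
      return false; // one restriction slot per argument is expected
    }
    for (int i = 0; i < restrictions.length; i++) {
      Class<?> restriction = restrictions[i];
      // Assumption: a null slot means the argument is unrestricted.
      if (restriction != null && !restriction.isInstance(arguments[i])) {
        return false;
      }
    }
    return true;
  }
}
```

What the VM does when a restriction is violated (which exception, at which point) is not specified here, so the sketch only reports a boolean.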
- Comparison with John's vision
It's the cheap version; it still requires a lot of work, but it has the advantage of being simpler: fewer opcodes to change, subtyping is not changed, the callee side does not do more validation, and it is in my opinion a good first step.
Rémi