cleaning up the CP in JDK 999 (and thinking clearly about the CP in JDK 10/11)
John Rose
john.r.rose at oracle.com
Wed May 24 19:16:52 UTC 2017
This message needs an "Impractical Content" warning, but I want to log an interesting line of thought raised today in our Valhalla meeting with IBM. It actually is practical to think about, as a mental model.
We could, if we chose to do it at some far-future date, clarify the roles of CP entries by splitting them as thoroughly as possible into functionally distinct types.
Here's the idea:
Revamp CP to clearly distinguish (a) artifact references from (b) usage requests.
Artifact references are named class-files and named parts thereof: Class[ref], Fieldref, Methodref.
Usage requests are specific operations which may be performed on those named entities: InvokeStatic, InvokeSpecial, InvokeVirtual, InvokeInterface, InvokeOnValue, GetField, GetStatic, PutField, PutStatic, WithField.
Linking an artifact reference loads the artifact, which implements some usage requests but not all.
Linking a usage request verifies that the usage is well-formed and caches (on the usage CP entry, not the artifact CP entry) whatever bits help the instruction go fast after linkage.
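A toy Python model of the split, for concreteness. `MemberRef`, `LoadedMember`, `UsageRequest` and the `fast_bits` field are hypothetical names made up for this sketch. The point is only where state lives: the artifact reference carries no link state, and each usage request checks itself against the loaded member and caches its own fast-path bits.

```python
from dataclasses import dataclass
from enum import Enum


class Usage(Enum):
    """Usage-request kinds named in the note (sketch only)."""
    INVOKE_STATIC = "InvokeStatic"
    INVOKE_SPECIAL = "InvokeSpecial"
    INVOKE_VIRTUAL = "InvokeVirtual"
    INVOKE_INTERFACE = "InvokeInterface"
    INVOKE_ON_VALUE = "InvokeOnValue"
    GET_FIELD = "GetField"
    GET_STATIC = "GetStatic"
    PUT_FIELD = "PutField"
    PUT_STATIC = "PutStatic"
    WITH_FIELD = "WithField"


@dataclass(frozen=True)
class MemberRef:
    """Artifact reference: a named member of a named class file. No link state."""
    owner: str
    name: str
    descriptor: str


@dataclass
class LoadedMember:
    """Hypothetical result of linking an artifact reference."""
    supported: frozenset  # the usage kinds this member actually implements
    fast_bits: int        # stand-in for a field offset, vtable index, etc.


@dataclass
class UsageRequest:
    """Usage request: one operation on one artifact, with its own link cache."""
    kind: Usage
    target: MemberRef
    cache: object = None

    def link(self, load_member):
        if self.cache is None:
            member = load_member(self.target)      # link the artifact reference
            if self.kind not in member.supported:  # verify the usage is well-formed
                raise TypeError(f"{self.kind.value} not supported by {self.target.name}")
            self.cache = member.fast_bits          # cached on the usage entry only
        return self.cache


if __name__ == "__main__":
    bar = MemberRef("Foo", "bar", "I")
    members = {bar: LoadedMember(frozenset({Usage.GET_FIELD, Usage.PUT_FIELD}), 12)}
    print(UsageRequest(Usage.GET_FIELD, bar).link(members.__getitem__))  # 12
```

Two requests against the same `MemberRef` (say GetField and PutField) each get their own cache, so neither has to make room for the other's bits.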
(Indy/Condy don't work directly on named artifacts, so they are off to the side.)
It's safe to say we will never do all of this, but it helps, I think, as a mental framework when considering all the funky overloading inside today's CP.
(But, those request types do look a lot like MethodHandle constants. Funny…)
The worst overloading in the CP is the need to store double resolution information on an [Interface]Methodref in case it has to handle both invokespecial and invoke[virtual,interface]. But there are also lots of little status bits to support dynamic checks of {get,put}{static,field}. The J9 guys commented that the double-resolution thing is familiar to them.
Lots of that implementation noise would drop away if there were enough CP entries to go around for each distinct type of reference.
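A toy Python contrast between a lumped entry and a split one. Everything here is hypothetical: the class names, the opcode strings used as cache keys, and `toy_resolver`, which just returns made-up tuples. It is not how HotSpot or J9 actually lay out their CP caches, only the shape of the problem: the lumped entry accumulates one resolution per invoke kind it meets, while the split entry never needs more than one slot.

```python
from dataclasses import dataclass, field


def toy_resolver(ref, opcode):
    """Hypothetical resolver: invokespecial binds one method body directly,
    while invokevirtual/invokeinterface need a dispatch index."""
    if opcode == "invokespecial":
        return ("direct", ref.owner, ref.name)
    return ("dispatch", 3)  # made-up vtable/itable index


@dataclass
class LumpedMethodref:
    """Today's style: one Methodref shared by several invoke kinds,
    so it carries one resolution slot per kind it has been used with."""
    owner: str
    name: str
    descriptor: str
    resolved: dict = field(default_factory=dict)

    def resolve_for(self, opcode, resolver):
        if opcode not in self.resolved:
            self.resolved[opcode] = resolver(self, opcode)
        return self.resolved[opcode]


@dataclass
class SplitInvokeRef:
    """Split style: the invoke kind is part of the constant, so one slot suffices."""
    opcode: str
    owner: str
    name: str
    descriptor: str
    resolved: object = None

    def resolve(self, resolver):
        if self.resolved is None:
            self.resolved = resolver(self, self.opcode)
        return self.resolved
```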
Why would we even consider such a change? Because right now we have to add more usage request types to cover value types, and eventually templates/generics. In today's EG meeting I was advocating pushing harder on the current model of fewer, more overloaded constants for Valhalla, because that's what we do today. Then we collectively realized that constant overloading has always been such a royal pain that nobody wants to keep doing it. So, for the purpose of argument, we pivoted toward the other extreme, in a brief discussion of a hyper-split CP with basically one constant per instruction type.
(I think, as a matter of design esthetic, Java tends to lump more than it splits. The original decision in CP design, to lump more functions onto fewer CP entries, made a superficially simpler constant pool. It has been a burden on JVM implementors, who agonize even to this day over how to make CP a random access data structure with element types of widely varying size.)
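To make the "random access with widely varying element sizes" point concrete, here is a minimal Python sketch of one common workaround: a single linear pass over the raw pool that records each entry's byte offset, after which lookup by index is one step. Tag values and payload sizes follow the JVMS class-file format (including the later CONSTANT_Dynamic, Module and Package kinds); the function names and the use of a Python list as the side index are illustrative, not how any particular JVM stores its pool.

```python
import struct

# Payload size in bytes (after the 1-byte tag) for each fixed-size constant kind,
# per the JVMS class-file format. Tag 1 (CONSTANT_Utf8) is variable-length.
FIXED_PAYLOAD = {
    3: 4,   # Integer
    4: 4,   # Float
    5: 8,   # Long: also takes up the next CP index
    6: 8,   # Double: also takes up the next CP index
    7: 2,   # Class
    8: 2,   # String
    9: 4,   # Fieldref
    10: 4,  # Methodref
    11: 4,  # InterfaceMethodref
    12: 4,  # NameAndType
    15: 3,  # MethodHandle
    16: 2,  # MethodType
    17: 4,  # Dynamic
    18: 4,  # InvokeDynamic
    19: 2,  # Module
    20: 2,  # Package
}


def index_constant_pool(data, count):
    """One linear pass that records the byte offset of every CP entry.

    `data` begins at the first entry (just past constant_pool_count) and
    `count` is constant_pool_count, so valid indices are 1..count-1.
    Index 0 and the unusable index after each Long/Double stay None.
    Returns (offsets, end_offset).
    """
    offsets = [None] * count
    pos, i = 0, 1
    while i < count:
        offsets[i] = pos
        tag = data[pos]
        if tag == 1:  # Utf8: u2 length followed by that many bytes
            (length,) = struct.unpack_from(">H", data, pos + 1)
            pos += 3 + length
        elif tag in FIXED_PAYLOAD:
            pos += 1 + FIXED_PAYLOAD[tag]
        else:
            raise ValueError(f"unknown constant tag {tag} at index {i}")
        i += 2 if tag in (5, 6) else 1
    return offsets, pos


def utf8_at(data, offsets, i):
    """With the index built, any entry is one lookup away."""
    pos = offsets[i]
    if pos is None or data[pos] != 1:
        raise ValueError(f"entry {i} is not a Utf8 constant")
    (length,) = struct.unpack_from(">H", data, pos + 1)
    # Class files use modified UTF-8; plain UTF-8 agrees for ASCII names.
    return data[pos + 3 : pos + 3 + length].decode("utf-8")
```

The extra pass and the side table are exactly the cost a homogeneous, fixed-size CP layout would avoid.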
This idea of CP splitting is food for thought which (I think) can help us settle more confidently on a fair compromise in a real release.
This design approach prompts us to consider, in the nearer term, a few new CP types, incrementally added to the current design. For example, a QType[…] constant, which derives the Q-mode version of a class artifact:
ldc[ CONSTANT_QType[Class["Foo"]] ]
getfield[ CONSTANT_Fieldref[QType[Class["Foo"]], NameAndType["bar", "I"]] ]
invokespecial[ CONSTANT_Methodref[QType[Class["Foo"]], NameAndType["baz", "()F"]] ]
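A small Python sketch of the QType[…] nesting above. It assumes the `QFoo;` descriptor spelling for Q-types, and the Python class names simply mirror the pseudo-notation; both are assumptions for illustration. It shows that the Q/L mode can be read straight off the symbolic constant, without resolving anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Class:
    """CONSTANT_Class: names a class artifact; always read as L-mode here."""
    name: str


@dataclass(frozen=True)
class QType:
    """Hypothetical CONSTANT_QType: derives the Q-mode view of a class artifact."""
    cls: Class


@dataclass(frozen=True)
class NameAndType:
    name: str
    descriptor: str


@dataclass(frozen=True)
class Fieldref:
    owner: object  # Class (L-mode) or QType (Q-mode)
    nat: NameAndType


def type_descriptor(const):
    """Descriptor implied by a symbolic type constant, with no resolution needed."""
    if isinstance(const, QType):
        return "Q" + const.cls.name + ";"  # assumed Q-descriptor spelling
    if isinstance(const, Class):
        return "L" + const.name + ";"
    raise TypeError(f"not a type constant: {const!r}")


# getfield[ CONSTANT_Fieldref[QType[Class["Foo"]], NameAndType["bar", "I"]] ]
bar_ref = Fieldref(QType(Class("Foo")), NameAndType("bar", "I"))
```

Note that QType only wraps the existing Class constant, so the class artifact itself is still named (and loaded) once.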
Or maybe the Q-mode-ness goes into the field or method reference:
ldc[ CONSTANT_Dynamic[[get the Q-type from the L-type], Class["Foo"]] ]
getfield[ CONSTANT_VFieldref[Class["Foo"], NameAndType["bar", "I"]] ]
invokespecial[ CONSTANT_VMethodref[Class["Foo"], NameAndType["baz", "()F"]] ]
In any case, mode information (Q vs. L vs. …) is incompressible. What I mean is that the Q/L distinction has to go somewhere, either the instruction or the symbolic reference stored in the constant pool. (Symbolic, not resolved, is an important distinction here. The instruction can always resolve the CP reference and dip into the runtime bits, but the verifier greatly prefers to operate on the pre-resolved symbolic references.)
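A Python sketch of the "incompressible" point: given only an opcode and the symbolic (unresolved) constant, a verifier-like check can find the mode exactly when at least one of the two carries it. The v-opcodes and the QType/VFieldref/VMethodref/LType tags are the hypothetical spellings floated in this note; constants are modeled as nested tuples; and falling back to L-mode when nothing says otherwise is an assumption that matches how CONSTANT_Class behaves today.

```python
from enum import Enum


class Mode(Enum):
    L = "L"
    Q = "Q"


# Hypothetical spellings taken from the alternatives sketched in this note.
MODAL_OPCODES = {"vldc": Mode.Q, "vgetfield": Mode.Q, "vinvoke": Mode.Q}
MODAL_TAGS = {"QType": Mode.Q, "VFieldref": Mode.Q, "VMethodref": Mode.Q,
              "LType": Mode.L}


def constant_mode(const):
    """Walk the owner chain of a symbolic constant modeled as (tag, arg, ...),
    e.g. ("Fieldref", ("QType", ("Class", "Foo")), ...), looking for a modal tag."""
    while isinstance(const, tuple):
        if const[0] in MODAL_TAGS:
            return MODAL_TAGS[const[0]]
        const = const[1] if len(const) > 1 else None
    return None


def symbolic_mode(opcode, const):
    """Mode of an access, from the opcode and the unresolved constant only."""
    from_insn = MODAL_OPCODES.get(opcode)
    from_cp = constant_mode(const)
    if from_insn is not None and from_cp is not None and from_insn is not from_cp:
        raise ValueError(f"{opcode} disagrees with its constant about the mode")
    if from_insn is not None:
        return from_insn
    if from_cp is not None:
        return from_cp
    return Mode.L  # nothing says otherwise: legacy L-mode
```

Whichever placement is chosen (constant only, instruction only, or both), the function has something to find; delete the mode from both places and Q-mode accesses become indistinguishable from L-mode ones.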
To avoid the extra CP types, we could squeeze all the mode-ish bits up into the instructions as follows:
vldc[ Class["Foo"] ]
vgetfield[ CONSTANT_Fieldref[Class["Foo"], NameAndType["bar", "I"]] ]
vinvoke[ CONSTANT_Methodref[Class["Foo"], NameAndType["baz", "()F"]] ]
But the problem with modal-instructions-plus-nonmodal-constants is that each constant has to be prepared to be resolved in several modes. (Lumping constants means more resolution information per constant.) Splitting the constants allows (though does not require) nonmodal instructions.
As noted in the meeting, a possible simplification for Minimal Value Types is that we don't need to overload Class, since we could easily have different names running around: Class["Foo"] vs. Class["Foo$DVT"] or the like. That means that we could, for the moment, continue to overload Fieldref and Methodref, as long as each is only used in exactly one mode (to be hashed out at link time). But that's only a short-term help, not a long-term design.
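A Python sketch of that short-term trick: if the derived value type has its own class name, a plain Fieldref's mode follows from its owner's name, so the link-time check reduces to "does the instruction ask for the mode the name implies?" The `$DVT` suffix test, `mode_of_class_name`, and the Python `LinkageError` stand-in are all hypothetical.

```python
DVT_SUFFIX = "$DVT"  # hypothetical derived-value-type naming convention


class LinkageError(Exception):
    """Stand-in for java.lang.LinkageError in this sketch."""


def mode_of_class_name(name):
    """With distinct names, the class name alone tells Q-mode from L-mode."""
    return "Q" if name.endswith(DVT_SUFFIX) else "L"


class Fieldref:
    """An ordinary, overloaded Fieldref whose mode is fixed by its owner's name."""

    def __init__(self, owner, name, descriptor):
        self.owner = owner
        self.name = name
        self.descriptor = descriptor
        self.linked_mode = None

    def link(self, requested_mode):
        implied = mode_of_class_name(self.owner)
        if requested_mode != implied:
            raise LinkageError(
                f"{self.owner}.{self.name} used in {requested_mode}-mode, "
                f"but its owner is {implied}-mode"
            )
        self.linked_mode = implied  # one mode per entry, so one resolution slot
        return implied
```

The entry never needs per-mode resolution state, which is why this works as a stopgap; it just pushes the mode into the class name instead of the CP type.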
— John
P.S. We could also duplicate the mode information in both CP and instruction:
vgetfield[ CONSTANT_VFieldref[Class["Foo"], NameAndType["bar", "I"]] ]
vinvoke[ CONSTANT_VMethodref[Class["Foo"], NameAndType["baz", "()F"]] ]
…Thus starting down Overkill Road, which leads to Crazytown:
vgetfield[ CONSTANT_VFieldref[QType[Class[";QFoo;"]], VNameAndType["bar", "I"]] ]
P.P.S. If we go with modey CP constants, I think we need to admit, as a concession to the legacies of history, that the CONSTANT_Class guy will forever denote an L-mode type (unless we do LType[Class["Foo"]]?) and we will need a different CP constant (and maybe even condy for ldc) to refer to its Q-type or U-type.
P.P.P.S. If we tried to do all of the above CP splitting for real we'd be breaking so much glass that we'd feel compelled to address other design points, such as heterogeneous CPs (another not-so-good legacy), a limit of two components per CP entry, and of course the 16-bit limit. Dealing with all of that at once will be a tarpit, and we're already too busy doing important stuff. So file this note also under "Hard decisions to make when our grandchildren revamp the whole class-file format."
More information about the valhalla-spec-observers mailing list