RFR: JDK-8222373 Improve CDS performance for custom class loaders

Mon Jun 24 17:13:04 UTC 2019

On Jun 20, 2019, at 12:12 AM, Ioi Lam <ioi.lam at oracle.com> wrote:
> ...
> I have a rough idea -- let's have a higher-level representation of the bytecode stream than byte[] or ByteBuffer, to make optimization possible.

Template classes will need something similar, so maybe there is a common design point in here.

The key feature of template classes is that their loading sequence is split into parts:

1. Load the template, defining an abstract API and preprocessing incomplete parts.
2. Specialize the loaded template, completing all parts, loading a specialized class ("species").

Step 1 happens once per template.  Step 2 can happen any number of times, with varying template parameters.

I think that, at least in the internals, there will be a BSM-like (now, that's a surprise) template specialization function (TSF) declared in step 1 and executed in step 2.

The TSF will need to consult the results of step 1 and ask the JVM to combine them into the desired species.  Ad hoc logic may be involved, such as detecting whether T in List<T> is Comparable and, if so, mixing Comparable into the resulting List, with a standard comparison method (e.g. a lexicographic compare lifted from the elemental compares).
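To make the two-step sequence concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical: the names TemplateSketch, Template, Species, and SpecializationFunction are invented for illustration, and ordinary objects stand in for what would really be JVM metadata. It shows only the shape of the idea: step 1 produces a template that carries its TSF, step 2 runs that TSF per set of type arguments, and the TSF applies the Comparable mix-in rule mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TemplateSketch {
    /** Hypothetical stand-in for a TSF: declared in step 1, executed in step 2. */
    interface SpecializationFunction {
        Species specialize(Template template, List<Class<?>> typeArgs);
    }

    /** Step-1 result: the loaded template together with the TSF it declares. */
    static final class Template {
        final String name;
        final SpecializationFunction tsf;

        Template(String name, SpecializationFunction tsf) {
            this.name = name;
            this.tsf = tsf;
        }
    }

    /** Step-2 result: one species, which refers back to its template instead of copying it. */
    static final class Species {
        final Template template;
        final List<Class<?>> typeArgs;
        final List<Class<?>> mixins;

        Species(Template template, List<Class<?>> typeArgs, List<Class<?>> mixins) {
            this.template = template;
            this.typeArgs = typeArgs;
            this.mixins = mixins;
        }
    }

    /** Step 1: happens once per template. */
    static Template loadListTemplate() {
        return new Template("List", (template, typeArgs) -> {
            List<Class<?>> mixins = new ArrayList<>();
            // Ad hoc logic: if T is Comparable, mix Comparable into the species.
            if (Comparable.class.isAssignableFrom(typeArgs.get(0))) {
                mixins.add(Comparable.class);
            }
            return new Species(template, typeArgs, mixins);
        });
    }

    /** Step 2: may happen any number of times, with varying template parameters. */
    static Species specialize(Template template, Class<?>... typeArgs) {
        return template.tsf.specialize(template, Arrays.asList(typeArgs));
    }

    public static void main(String[] args) {
        Template list = loadListTemplate();
        Species ofString = specialize(list, String.class);
        Species ofObject = specialize(list, Object.class);
        System.out.println("List<String> mixes in Comparable: "
                + ofString.mixins.contains(Comparable.class));
        System.out.println("List<Object> mixes in Comparable: "
                + ofObject.mixins.contains(Comparable.class));
        System.out.println("Both species share one template: "
                + (ofString.template == ofObject.template));
    }
}
```

In this sketch each species holds a reference to its template, so the step-1 results are shared between species rather than duplicated.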

Key questions:

A. What API reflects the template chunks loaded in step 1?  Probably not just byte streams, more like what you are reaching for here, Ioi.

B. What API loads the customized chunks?  Probably (again) not today's ClassLoader which requires byte streams.

I'm slowly working on an answer to A, in the form of a "class excavator" which pulls out structural information from live JVM metadata, without "deadening" it into a byte stream.  Not ready for prime time, but the basic idea is to present the logical schema of the loaded class file (in terms of the original classfile structure) in a style similar to the existing constant pool reflection API (which is JDK internal).

Once we get a fuller answer for A we can try to invert it to get an answer for B.  That is, if we know what we'd like to see in the JVM (via the excavator) we want to tell the JVM to establish some new species definition, related to the excavated data.

For most JDK work we can wrap low-level excavator/establisher mechanisms (which don't need to be very O-O) in a thin layer of value types and interfaces.

I'm using new terms to help me think of this as a new kind of API, not just a small variation on class loading or reflection.  The new feature is that the reflective part comes first, and is followed (as a sort of reversed operation) by the loading part.  Also new is that we don't want to abstract everything through serialized byte streams, since they make it very hard to share features, and (I think) species need to share template metadata, not just make copies of it.

— John

> So we could have a new API in ClassLoader
> 
>     protected final Class<?> defineClass(String name, String location, ProtectionDomain protectionDomain)
> 
> and its behavior is equivalent to the following
> 
>    {
>         byte[] b = read_buffer_from(location);
>         return defineClass(name, b, 0, b.length, protectionDomain);
>    }
> 
> 
> examples would be:
> 
>      defineClass("java/lang/Object", "jrt:/java.base/java/lang/Object.class", null);
>      defineClass("com/foo/Bar", "file://path/com.foo.jar!com/foo/Bar.class", myProtectionDomain);
> 
> Note that the type and value of <location> is just for illustrative purposes. We might use a different type (URI??). It's just a way to name a location that you can read a byte buffer from.
> 
> The protectionDomain will need to be created by the caller. The use of the protectionDomain will be no different than the existing defineClass() APIs. Specifically, it will not be used in any way to fetch the buffer.
> 
> When CDS is enabled, the VM can check if the name+location matches a class in the CDS archive. If so, the class is loaded without fetching the buffer.
> 
> The caller doesn't need to know if CDS is enabled or not.
> 
> 
> (We probably don't want a String type but a more abstract type. That way we can evolve it to allow more complex representations, such as "read the bytecode stream from here, but rename the method 'Foo' to 'Bar', and add a new integer field 'X'", and so on.
> 
> If John Rose were to design this, I am sure he would call it something like BytecodeStreamDynamic :-)
> 
> This may actually reduce the use of ASM. E.g., today people are forced to write their own bytecodes, even if they just want some simple transformation on template classes).
> 
> 
> Thanks
> - Ioi
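
The equivalence Ioi describes for the proposed defineClass(name, location, protectionDomain) can be sketched in today's Java as a ClassLoader subclass. This is only an illustration under stated assumptions: the method is named defineClassFrom here (a hypothetical name, chosen to keep it apart from the real defineClass overloads), the location is modeled as a java.net.URI, and the class name is given in the dotted binary form that the existing defineClass expects. The CDS check on name+location would live inside the VM and is not modeled; this shows only the fallback path that fetches the buffer. So that the demo has something to read, it writes a hand-built minimal class file named Empty to a temporary file.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.ProtectionDomain;

public class LocationDefineSketch extends ClassLoader {

    /** Hypothetical overload: read the class bytes from 'location', then define the class. */
    protected final Class<?> defineClassFrom(String name, URI location, ProtectionDomain pd) {
        try (InputStream in = location.toURL().openStream()) {
            byte[] b = in.readAllBytes();
            // The protection domain is passed through unchanged; it plays no part in the fetch.
            return defineClass(name, b, 0, b.length, pd);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a minimal class file for "public class Empty" (no fields, no methods). */
    static byte[] emptyClassBytes() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(0xCAFEBABE);                            // magic
        out.writeShort(0);                                   // minor_version
        out.writeShort(52);                                  // major_version (Java 8)
        out.writeShort(5);                                   // constant_pool_count (entries 1..4)
        out.writeByte(1); out.writeUTF("Empty");             // #1 Utf8
        out.writeByte(7); out.writeShort(1);                 // #2 Class -> #1
        out.writeByte(1); out.writeUTF("java/lang/Object");  // #3 Utf8
        out.writeByte(7); out.writeShort(3);                 // #4 Class -> #3
        out.writeShort(0x0021);                              // access_flags: ACC_PUBLIC | ACC_SUPER
        out.writeShort(2);                                   // this_class
        out.writeShort(4);                                   // super_class
        out.writeShort(0);                                   // interfaces_count
        out.writeShort(0);                                   // fields_count
        out.writeShort(0);                                   // methods_count
        out.writeShort(0);                                   // attributes_count
        return buf.toByteArray();
    }

    /** Writes the class file to a temporary location and defines it from there. */
    static Class<?> demo() {
        try {
            Path file = Files.createTempFile("Empty", ".class");
            Files.write(file, emptyClassBytes());
            Class<?> c = new LocationDefineSketch().defineClassFrom("Empty", file.toUri(), null);
            Files.delete(file);
            return c;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> c = demo();
        System.out.println(c.getName() + " defined by " + c.getClassLoader().getClass().getName());
    }
}
```

Because the caller hands over only a name and a location, an implementation inside the VM would be free to skip the read entirely when the pair matches an archived class, which is the point of the proposal.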