<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>TL;DR<br>
</p>
<p>I want to discuss how Java classes are initialized as a part of
the implementation of the upcoming JEP 483 [1].</p>
<p>This is by no means a clean design. It's a compromise due to the
complexity of the Java classes, such as java.lang.invoke.*, that
JEP 483 wants to optimize.<br>
</p>
<p>In the future, we can make things much better by, perhaps,
refactoring the Java classes.</p>
<p>Here, I want to reach consensus on whether the current implementation in
JEP 483 is "good enough" to be integrated into the mainline. <br>
</p>
<p><br>
</p>
<p><b>(A) Background</b><br>
</p>
<p>JEP 483 includes the AOT-linking of invokedynamic call sites. As
a result, it becomes necessary for the AOT cache to store objects
such as MethodTypes, as well as generated LambdaForm classes.</p>
<p>A basic requirement for Java objects is -- when an object of type
X becomes accessible by Java bytecodes, we must have at least
started the initialization of X [2]. We want to preserve this
behavior for cached objects, so we won't run into
incompatibilities when using cached objects.</p>
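<p>As a small illustration of this requirement (a sketch with made-up
names InitOrderDemo, X and LOG, not code from the JDK), the following
shows that the static initializer of X runs before the first instance
of X is handed to the program:</p>
<pre>
```java
public class InitOrderDemo {
    // Records events in the order they happen.
    static final StringBuilder LOG = new StringBuilder();

    static class X {
        static { LOG.append("X.clinit;"); }   // X's static initializer
        X() { LOG.append("X.ctor;"); }
    }

    // Creates an X and returns everything logged so far.
    static String firstEvents() {
        new X();
        return LOG.toString();
    }

    public static void main(String[] args) {
        System.out.println(firstEvents());   // prints X.clinit;X.ctor;
    }
}
```
</pre>
<p>A cached X object that shows up without this initialization step
having at least started would be something that ordinary Java code can
never observe.</p>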
<p>So how do we meet this requirement?</p>
<p><br>
</p>
<p><b>(B) Existing Solution in JDK Mainline</b><br>
</p>
<p>In the current JDK mainline, cached objects are "loaded" into a
Java program via explicit CDS.initializeFromArchive() calls. See
[3] for an example in java/lang/Integer$IntegerCache:</p>
<p><font face="monospace">class IntegerCache {<br>
...<br>
// at this point, IntegerCache::archivedCache is null<br>
CDS.initializeFromArchive(IntegerCache.class);<br>
// at this point, IntegerCache::archivedCache is loaded from
the AOT cache<br>
</font></p>
<p>As part of the CDS.initializeFromArchive() call, the classes of
all objects that are reachable from archivedCache are initialized.
You can see this list of classes with:<br>
</p>
<p><font face="monospace">$ java -Xshare:dump -Xlog:cds+heap | grep
IntegerCache<br>
[...]<br>
[1.063s][info][cds,heap] Archived object klass
java.lang.Integer$IntegerCache ( 0) => [Ljava.lang.Integer;<br>
[1.063s][info][cds,heap] Archived object klass
java.lang.Integer$IntegerCache ( 1) => java.lang.Integer</font></p>
<p>All classes listed above will be initialized before archivedCache
becomes non-null.<br>
</p>
<p><br>
</p>
<p><b>(C) New Challenges with java.lang.invoke</b><br>
</p>
<p>In JEP 483, in order to support AOT-linking of invokedynamic, we
run into a few issues.</p>
<p>#1. We can no longer use the CDS.initializeFromArchive() design
due to a circular dependency between MethodType and
DirectMethodHandle [4] [5].<br>
</p>
<p>#2. Some cached objects in java.lang.invoke may point to static
fields whose identity is significant. A simplified example looks
like this:</p>
<p><font face="monospace"> class A {<br>
static final Object x = new Object();<br>
}<br>
class B {<br>
Object y = A.x;<br>
boolean isValid() { return y == A.x; }<br>
}<br>
</font><br>
B cachedB = new B(); /* executed during assembly phase */<br>
</p>
<p>If we store cachedB into the AOT cache, we will recursively store
the cachedB.y field, which points to the version of A.x during the
assembly phase. During the production run, if we allow
A.&lt;clinit&gt; to be executed again, cachedB.isValid() will
return false because cachedB.y is now a different instance than
A.x.</p>
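<p>The effect can be reproduced in a single JVM with a rough simulation.
This is only a sketch: A.x is made non-final and a hypothetical
reinitialize() method stands in for running A's static initializer a
second time in the production run.</p>
<pre>
```java
public class IdentityDemo {
    static class A {
        // Non-final only so that reinitialize() can model a second
        // execution of A's static initializer.
        static Object x = new Object();
        static void reinitialize() { x = new Object(); }
    }

    static class B {
        final Object y = A.x;
        boolean isValid() { return y == A.x; }
    }

    // Returns { valid before reinitialization, valid after reinitialization }.
    static boolean[] simulate() {
        B cachedB = new B();                  // "assembly phase"
        boolean before = cachedB.isValid();
        A.reinitialize();                     // "production run" re-creates A.x
        boolean after = cachedB.isValid();
        return new boolean[] { before, after };
    }

    public static void main(String[] args) {
        boolean[] r = simulate();
        System.out.println(r[0] + " " + r[1]);   // prints true false
    }
}
```
</pre>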
<p>In fact, #2 is not a new problem. There's already a debugging
class, CDSHeapVerifier, that checks for potential errors [6].<br>
</p>
<p><br>
</p>
<p><b>(D) Solution in JEP 483</b></p>
<p>In JEP 483, the solution is "AOT-initialized classes". For the
above example, we store class A in the AOT cache in the
"initialized" state. During the production run, A.&lt;clinit&gt;
is no longer executed, and A.x keeps the value it had in the
assembly phase.</p>
<p>The process for finding all the "AOT-initialized classes" needed
by JEP 483 is as follows:</p>
<p>- Perform a training run with an application that uses many
lambda expressions.<br>
</p>
<p>- Create an AOT cache using data from this training run. Look for
warnings produced by CDSHeapVerifier:<br>
</p>
<p><font face="monospace">Archive heap points to a static field that
may be reinitialized at runtime:<br>
Field: java/lang/invoke/SimpleMethodHandle::BMH_SPECIES<br>
Value: java.lang.invoke.BoundMethodHandle$SpeciesData <br>
{0x000000060e8962f0} - klass:
'java/lang/invoke/BoundMethodHandle$SpeciesData' - flags: <br>
[...]<br>
--- trace begin ---<br>
[ 0] {0x000000060f410170} [Ljava.lang.Object; @[238]<br>
[ 1] {0x000000060f430520}
java.lang.invoke.BoundMethodHandle$Species_L::type (offset = 16)<br>
[ 2] {0x000000060e8ab130} java.lang.invoke.MethodType::form
(offset = 20)<br>
[ 3] {0x000000060e883b70}
java.lang.invoke.MethodTypeForm::lambdaForms (offset = 28)<br>
[ 4] {0x000000060e883b90} [Ljava.lang.Object; @[7]<br>
[ 5] {0x000000060e89a428} java.lang.invoke.LambdaForm::names
(offset = 32)<br>
[ 6] {0x000000060e89a3b0} [Ljava.lang.invoke.LambdaForm$Name;
@[0]<br>
[ 7] {0x000000060e897fb8}
java.lang.invoke.LambdaForm$Name::constraint (offset = 24)<br>
[ 8] {0x000000060e896ab8}
java.lang.invoke.BoundMethodHandle$SpeciesData::this$0 (offset =
40)<br>
[ 9] {0x000000060e895aa0}
java.lang.invoke.BoundMethodHandle$Specializer::topSpecies
(offset = 44)<br>
[10] {0x000000060e8962f0}
java.lang.invoke.BoundMethodHandle$SpeciesData<br>
--- trace end ---</font><br>
</p>
<p>In this example, we see that a cached object ({0x000000060e895aa0})
points to {0x000000060e8962f0}, which is also the value of the static field
SimpleMethodHandle::BMH_SPECIES.</p>
<p>To get rid of this warning, we add SimpleMethodHandle to the list
in AOTClassInitializer::can_archive_initialized_mirror [7].<br>
</p>
<p>- Create the AOT cache again. You may see new warnings because
the mirror of SimpleMethodHandle may point to the static fields of
other classes.</p>
<p>- Keep doing this until you no longer see these warnings.<br>
</p>
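<p>The steps above amount to a fixed-point iteration. Below is a
minimal sketch of that loop over a toy model. Everything in it is an
assumption made for illustration: the class names Alpha, Beta, Gamma
and Unrelated, the reference tables, and firstWarning(), which stands in
for "create the AOT cache and read the first CDSHeapVerifier warning".
It is not how HotSpot implements this.</p>
<pre>
```java
public class AotInitListSketch {
    // Made-up classes, identified by their index in this array.
    static final String[] NAMES = { "Alpha", "Beta", "Gamma", "Unrelated" };
    // STATIC_REFS[i]: classes whose static fields are reachable from the
    // static fields (mirror) of class i. Made-up data.
    static final int[][] STATIC_REFS = { { 1 }, { 2 }, {}, {} };
    // Classes whose static fields are referenced by ordinary cached objects.
    static final int[] HEAP_REFS = { 0 };

    // Models one cache-creation step: returns the index of a class that
    // would be reported in a warning, or -1 if there are no warnings.
    static int firstWarning(boolean[] aotInitialized) {
        for (int c : HEAP_REFS) {
            if (!aotInitialized[c]) return c;
        }
        for (int i = 0; i != NAMES.length; i++) {
            if (!aotInitialized[i]) continue;   // mirror of i is not archived
            for (int c : STATIC_REFS[i]) {
                if (!aotInitialized[c]) return c;
            }
        }
        return -1;
    }

    // Repeats "create cache, add the reported class" until no warnings remain.
    static boolean[] computeList() {
        boolean[] aotInitialized = new boolean[NAMES.length];
        int c;
        while ((c = firstWarning(aotInitialized)) != -1) {
            aotInitialized[c] = true;
        }
        return aotInitialized;
    }

    public static void main(String[] args) {
        boolean[] list = computeList();
        for (int i = 0; i != NAMES.length; i++) {
            if (list[i]) System.out.println(NAMES[i]);   // Alpha, Beta, Gamma
        }
    }
}
```
</pre>
<p>The loop terminates because every round marks one more class and
the number of classes is finite. In the toy data, adding Alpha exposes
Beta, which in turn exposes Gamma, mirroring how adding
SimpleMethodHandle can produce new warnings.</p>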
<p><br>
</p>
<p><b>(E) List of AOT-initialized classes</b></p>
<p>For a training run like "javac HelloWorld.java", we can produce an
AOT cache that contains the following 22 AOT-initialized classes:</p>
<p><font face="monospace">java.lang.constant.ConstantDescs<br>
java.lang.constant.DynamicConstantDesc<br>
java.lang.Enum<br>
java.lang.invoke.BoundMethodHandle<br>
java.lang.invoke.BoundMethodHandle$Specializer<br>
java.lang.invoke.BoundMethodHandle$Species_L<br>
java.lang.invoke.BoundMethodHandle$Species_LL<br>
java.lang.invoke.ClassSpecializer<br>
java.lang.invoke.ClassSpecializer$1<br>
java.lang.invoke.ClassSpecializer$Factory<br>
java.lang.invoke.ClassSpecializer$SpeciesData<br>
java.lang.invoke.DelegatingMethodHandle<br>
java.lang.invoke.DirectMethodHandle<br>
java.lang.invoke.DirectMethodHandle$Holder<br>
java.lang.invoke.LambdaForm<br>
java.lang.invoke.LambdaForm$NamedFunction<br>
java.lang.invoke.MethodHandle<br>
java.lang.invoke.MethodType$AOTHolder<br>
java.lang.invoke.SimpleMethodHandle<br>
java.lang.Object<br>
jdk.internal.constant.PrimitiveClassDescImpl<br>
jdk.internal.constant.ReferenceClassDescImpl<br>
<br>
<br>
</font>Plus the following 7 enum types that have customized
&lt;clinit&gt; code:<br>
</p>
<p><font face="monospace">java.lang.constant.DirectMethodHandleDesc$Kind<br>
java.lang.invoke.LambdaForm$BasicType<br>
java.lang.invoke.VarHandle$AccessMode<br>
java.lang.invoke.VarHandle$AccessType<br>
java.lang.reflect.AccessFlag$Location<br>
java.util.stream.StreamOpFlag<br>
sun.invoke.util.Wrapper</font></p>
<p><font face="monospace"><br>
</font></p>
<p><b>(F) List of Init-at-JVM-start classes</b><br>
</p>
<p>During the production run, the AOT-initialized classes in (E) are
loaded into the VM in the "initialized" state. As a result, the static
fields of these classes become reachable. We must initialize the classes of
all objects that are reachable from these static fields. There are
24 such classes.<br>
</p>
<p><font face="monospace">java.lang.ArithmeticException<br>
java.lang.ArrayIndexOutOfBoundsException<br>
java.lang.ArrayStoreException<br>
java.lang.Class<br>
java.lang.ClassCastException<br>
java.lang.Double<br>
java.lang.Float<br>
java.lang.Integer<br>
java.lang.InternalError<br>
java.lang.NullPointerException<br>
java.lang.invoke.BoundMethodHandle$Specializer$Factory<br>
java.lang.invoke.BoundMethodHandle$SpeciesData<br>
java.lang.invoke.DirectMethodHandle$Accessor<br>
java.lang.invoke.DirectMethodHandle$Constructor<br>
java.lang.invoke.Invokers<br>
java.lang.invoke.LambdaForm$Name<br>
java.lang.invoke.LambdaFormEditor$Transform<br>
java.lang.invoke.MemberName<br>
java.lang.invoke.MemberName$Factory<br>
java.lang.invoke.MethodHandleImpl$IntrinsicMethodHandle<br>
java.lang.invoke.MethodType<br>
java.lang.invoke.MethodTypeForm<br>
java.util.EnumMap<br>
java.util.concurrent.ConcurrentHashMap</font><br>
<br>
</p>
<p>(E.g., MethodType$AOTHolder contains a HashMap that stores many
MethodTypes, so we must initialize the MethodType class. Note that
HashMap is not in the list because it doesn't have a
&lt;clinit&gt;. For clarity, I have omitted all classes that can
be trivially initialized.)<br>
<br>
</p>
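<p>To make "classes of objects reachable from static fields" concrete,
here is a rough sketch using reflection on a made-up Holder class. It
only looks at objects referenced directly by the static fields and does
not follow references any further. The real analysis happens inside the
VM at dump time, so treat this purely as an illustration.</p>
<pre>
```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class StaticReachabilitySketch {
    // Hypothetical stand-in for an AOT-initialized class.
    static class Holder {
        static final Object boxed = Integer.valueOf(42);
        static final Object text = new StringBuilder("x");
        static final Object empty = null;
        final Object instanceField = "not a static root";
    }

    // Returns the names of the classes of all objects directly referenced
    // by Holder's static fields, each followed by ';'.
    static String classesOfStaticValues() {
        StringBuilder names = new StringBuilder();
        for (Field f : Holder.class.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            try {
                f.setAccessible(true);
                Object value = f.get(null);   // null receiver: static field
                if (value != null) {
                    names.append(value.getClass().getName()).append(';');
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return names.toString();
    }

    public static void main(String[] args) {
        System.out.println(classesOfStaticValues());
    }
}
```
</pre>
<p>For Holder, the result names java.lang.Integer and
java.lang.StringBuilder (in no guaranteed order). In the terms used in
(F), these are the classes that would have to be initialized before
Holder's static fields become visible to Java code.</p>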
<p><b>(G) Testing and Validation</b><br>
</p>
<p>The output "Archive heap points to a static field that may be
reinitialized" is checked by more than 300 CDS test cases [8].
This ensures that we have a correct list of AOT-initialized
classes. This will also catch any future changes in the Java
classes in java.lang.invoke that may be incompatible with JEP 483.<br>
</p>
<p>Also, since the (E) and (F) lists are small (about 50 classes),
it's possible for a human to examine those classes to find
potential problems.</p>
<p>- For example, an AOT-initialized class shouldn't depend on
the environment, such as by storing the current time of day.</p>
<p><br>
</p>
<p><b>(H) Extensibility, or lack thereof</b></p>
<p>This design limits future Leyden optimizations, as any significant
increase in the lists of classes in (E) and (F) will make human
validation much more difficult. We should consider using automated
validation tools, as well as refactoring the Java classes to make
it easier to decide which classes can be AOT-initialized.<br>
</p>
<p><br>
</p>
<p>=======</p>
<p>[1] <a class="moz-txt-link-freetext" href="https://openjdk.org/jeps/483">https://openjdk.org/jeps/483</a><br>
</p>
<p>[2] Due to &lt;clinit&gt; circularity, it may be possible to
create an instance of X before X::&lt;clinit&gt; finishes.</p>
<p>[3]
<a class="moz-txt-link-freetext" href="https://github.com/openjdk/jdk/blob/602408e4f3848b30299ea94264e88ead5361a310/src/java.base/share/classes/java/lang/Integer.java#L957-L969">https://github.com/openjdk/jdk/blob/602408e4f3848b30299ea94264e88ead5361a310/src/java.base/share/classes/java/lang/Integer.java#L957-L969</a></p>
<p>[4]
<a class="moz-txt-link-freetext" href="https://mail.openjdk.org/pipermail/leyden-dev/2024-August/000911.html">https://mail.openjdk.org/pipermail/leyden-dev/2024-August/000911.html</a></p>
<p>[5]
<a class="moz-txt-link-freetext" href="https://github.com/openjdk/leyden/blob/3a84df9d9860e743684e335282e3910b14cc982b/src/hotspot/share/cds/heapShared.cpp#L1415-L1466">https://github.com/openjdk/leyden/blob/3a84df9d9860e743684e335282e3910b14cc982b/src/hotspot/share/cds/heapShared.cpp#L1415-L1466</a></p>
<p>[6]
<a class="moz-txt-link-freetext" href="https://github.com/openjdk/jdk/blob/master/src/hotspot/share/cds/cdsHeapVerifier.cpp">https://github.com/openjdk/jdk/blob/master/src/hotspot/share/cds/cdsHeapVerifier.cpp</a></p>
<p>[7]
<a class="moz-txt-link-freetext" href="https://github.com/iklam/jdk/blob/49eb47b6a9889625dc8bac6922cec6a5625a26b2/src/hotspot/share/cds/aotClassInitializer.cpp#L178-L192">https://github.com/iklam/jdk/blob/49eb47b6a9889625dc8bac6922cec6a5625a26b2/src/hotspot/share/cds/aotClassInitializer.cpp#L178-L192</a></p>
<p>[8]
<a class="moz-txt-link-freetext" href="https://github.com/iklam/jdk/blob/49eb47b6a9889625dc8bac6922cec6a5625a26b2/test/lib/jdk/test/lib/cds/CDSTestUtils.java#L289">https://github.com/iklam/jdk/blob/49eb47b6a9889625dc8bac6922cec6a5625a26b2/test/lib/jdk/test/lib/cds/CDSTestUtils.java#L289</a><br>
</p>
</body>
</html>