Class initialization in JEP 483
ioi.lam at oracle.com
Thu Oct 3 05:13:15 UTC 2024
TL;DR
I want to discuss how Java classes are initialized as a part of the
implementation of the upcoming JEP 483 [1].
This is by no means a clean design. It's a compromise due to the
complexity of the Java classes, such as java.lang.invoke.*, that JEP 483
wants to optimize.
In the future, we can make things much better by, perhaps, refactoring
the Java classes.
Here, I want to reach consensus on whether the current implementation in
JEP 483 is "good enough" to be integrated into the mainline.
*(A) Background*
JEP 483 includes the AOT-linking of invokedynamic call sites. As a
result, it becomes necessary for the AOT cache to store objects such as
MethodTypes, as well as generated LambdaForm classes.
A basic requirement for Java objects is -- when an object of type X
becomes accessible by Java bytecodes, we must have at least started the
initialization of X [2]. We want to preserve this behavior for cached
objects, so we won't run into incompatibilities when using cached objects.
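To make the requirement concrete, here is a minimal sketch of the ordering that ordinary Java code can rely on. The names ClinitOrderDemo, X and EVENTS are made up for illustration and do not come from the JDK sources.

```java
import java.util.ArrayList;
import java.util.List;

class ClinitOrderDemo {
    // Records the order in which things happen (illustrative only).
    static final List<String> EVENTS = new ArrayList<>();

    static class X {
        // Runs once, when X is initialized.
        static { EVENTS.add("X.<clinit>"); }

        // Runs every time an instance is created.
        X() { EVENTS.add("X.<init>"); }
    }

    static Object run() {
        EVENTS.add("before new X()");
        Object o = new X();   // first active use of X: triggers X.<clinit>
        EVENTS.add("after new X()");
        return o;
    }

    public static void main(String[] args) {
        run();
        System.out.println(EVENTS);
        // prints [before new X(), X.<clinit>, X.<init>, after new X()]
    }
}
```

Any code that can see the X instance runs after X.<clinit> has at least started. An object materialized from the AOT cache must not be allowed to bypass this ordering.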
So how do we meet this requirement?
*(B) Existing Solution in JDK Mainline*
In the current JDK mainline, cached objects are "loaded" into a Java
program via explicit CDS.initializeFromArchive() calls. See [3] for an
example in java/lang/Integer$IntegerCache:
class IntegerCache {
...
// at this point, IntegerCache::archivedCache is null
CDS.initializeFromArchive(IntegerCache.class);
// at this point, IntegerCache::archivedCache is loaded from the AOT cache
As part of the CDS.initializeFromArchive() call, the classes of all
objects that are reachable from archivedCache are initialized. You can
see this list of classes by running:
$ java -Xshare:dump -Xlog:cds+heap | grep IntegerCache
[...]
[1.063s][info][cds,heap] Archived object klass java.lang.Integer$IntegerCache ( 0) => [Ljava.lang.Integer;
[1.063s][info][cds,heap] Archived object klass java.lang.Integer$IntegerCache ( 1) => java.lang.Integer
All classes listed above will be initialized before archivedCache
becomes non-null.
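The shape of this pattern can be sketched outside the JDK as follows. This is only a model: CDS.initializeFromArchive() is internal to the JDK and fills in the static field inside the VM, whereas the initializeFromArchive() below is a hypothetical stand-in that returns a value from a plain map. FAKE_ARCHIVE and SmallIntCache are likewise invented names.

```java
import java.util.HashMap;
import java.util.Map;

class ArchiveInitSketch {
    // Hypothetical stand-in for the AOT cache: holder class -> archived value.
    static final Map<Class<?>, Object> FAKE_ARCHIVE = new HashMap<>();

    // Hypothetical stand-in for CDS.initializeFromArchive(). The real method
    // is JDK-internal and sets the holder's static fields directly.
    static Object initializeFromArchive(Class<?> holder) {
        return FAKE_ARCHIVE.get(holder);
    }

    static class SmallIntCache {
        static Integer[] archivedCache;   // null until loaded from the "archive"
        static final Integer[] cache;

        static {
            archivedCache = (Integer[]) initializeFromArchive(SmallIntCache.class);
            if (archivedCache == null) {
                // Nothing archived: build the cache from scratch.
                Integer[] fresh = new Integer[256];
                for (int i = 0; i < fresh.length; i++) {
                    fresh[i] = i - 128;
                }
                archivedCache = fresh;
            }
            cache = archivedCache;
        }
    }

    public static void main(String[] args) {
        Integer[] preloaded = { 1, 2, 3 };
        // A class literal does not trigger initialization of SmallIntCache.
        FAKE_ARCHIVE.put(SmallIntCache.class, preloaded);
        System.out.println(SmallIntCache.cache == preloaded);
    }
}
```

The point of the sketch is that the holder class asks for its archived state explicitly from inside its own <clinit>, and falls back to normal construction if nothing was archived.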
*(C) New Challenges with java.lang.invoke*
In JEP 483, in order to support AOT-linking of invokedynamic, we run
into a few issues.
#1. We can no longer use the CDS.initializeFromArchive() design due to a
circular dependency between MethodType and DirectMethodHandle [4] [5].
#2. Some cached objects in java.lang.invoke may point to static fields
whose identity is significant. A simplified example looks like this:
class A {
static final Object x = new Object();
}
class B {
Object y = A.x;
boolean isValid() { return y == A.x; }
}
B cachedB = new B(); /* executed during assembly phase */
If we store cachedB into the AOT cache, we will recursively store the
cachedB.y field, which points to the version of A.x during the assembly
phase. During the production run, if we allow A.<clinit> to be executed
again, cachedB.isValid() will return false because cachedB.y is now a
different instance than A.x.
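The failure can be reproduced in a single JVM with a small sketch. Since a class cannot really be initialized twice, the sketch makes A.x non-final and adds a hypothetical simulateReinitialization() method that plays the role of A.<clinit> running again in the production run.

```java
class IdentityDemo {
    static class A {
        // Non-final only so that the sketch can simulate a second <clinit>.
        static Object x = new Object();

        // Hypothetical helper: stands in for A.<clinit> running again
        // during the production run.
        static void simulateReinitialization() {
            x = new Object();
        }
    }

    static class B {
        Object y = A.x;
        boolean isValid() { return y == A.x; }
    }

    public static void main(String[] args) {
        B cachedB = new B();                    // "assembly phase"
        System.out.println(cachedB.isValid());  // prints true

        A.simulateReinitialization();           // "production run" re-runs A.<clinit>
        System.out.println(cachedB.isValid());  // prints false
    }
}
```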
In fact, #2 is not a new problem. There's already a debugging class,
CDSHeapVerifier, that checks for potential errors [6].
*(D) Solution in JEP 483*
In JEP 483, the solution is "AOT-initialized classes". For the above
example, we store class A in the AOT cache in the "initialized" state.
During the production run, A.<clinit> is no longer executed, and A.x
maintains its value as in the assembly phase.
The process of finding all the "AOT-initialized classes" needed by JEP 483:
- Perform a training run with an application that uses many lambda
expressions
- Create an AOT cache using data from this training run. Look for
warnings produced by CDSHeapVerifier:
Archive heap points to a static field that may be reinitialized at runtime:
Field: java/lang/invoke/SimpleMethodHandle::BMH_SPECIES
Value: java.lang.invoke.BoundMethodHandle$SpeciesData
{0x000000060e8962f0} - klass: 'java/lang/invoke/BoundMethodHandle$SpeciesData' - flags:
[...]
--- trace begin ---
[ 0] {0x000000060f410170} [Ljava.lang.Object; @[238]
[ 1] {0x000000060f430520} java.lang.invoke.BoundMethodHandle$Species_L::type (offset = 16)
[ 2] {0x000000060e8ab130} java.lang.invoke.MethodType::form (offset = 20)
[ 3] {0x000000060e883b70} java.lang.invoke.MethodTypeForm::lambdaForms (offset = 28)
[ 4] {0x000000060e883b90} [Ljava.lang.Object; @[7]
[ 5] {0x000000060e89a428} java.lang.invoke.LambdaForm::names (offset = 32)
[ 6] {0x000000060e89a3b0} [Ljava.lang.invoke.LambdaForm$Name; @[0]
[ 7] {0x000000060e897fb8} java.lang.invoke.LambdaForm$Name::constraint (offset = 24)
[ 8] {0x000000060e896ab8} java.lang.invoke.BoundMethodHandle$SpeciesData::this$0 (offset = 40)
[ 9] {0x000000060e895aa0} java.lang.invoke.BoundMethodHandle$Specializer::topSpecies (offset = 44)
[10] {0x000000060e8962f0} java.lang.invoke.BoundMethodHandle$SpeciesData
--- trace end ---
In this example, we see that a cached object (0x000000060e895aa0) points to
{0x000000060e8962f0}, which is stored in the static field
SimpleMethodHandle::BMH_SPECIES.
To get rid of this warning, we add SimpleMethodHandle to the list in
AOTClassInitializer::can_archive_initialized_mirror [7].
- Create the AOT cache again. You may see new warnings because the
mirror of SimpleMethodHandle may point to the static fields of other classes.
- Keep doing this until you no longer see these warnings.
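The steps above amount to a fixed-point iteration. Here is a rough model of it, under the assumption that each round of warnings can be summarized as "the mirror of class K points to static fields of classes K1, K2, ...". The class name AotInitFixpoint, the method closure() and the map staticRefs are all hypothetical. In practice the warnings come from CDSHeapVerifier and the list is edited by hand.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class AotInitFixpoint {
    /**
     * initiallyWarned: classes named by warnings in the first dump.
     * staticRefs: hypothetical model of "once K is AOT-initialized, its
     * mirror points to static fields of these other classes".
     * Returns the set of classes that end up AOT-initialized.
     */
    static Set<String> closure(Set<String> initiallyWarned,
                               Map<String, Set<String>> staticRefs) {
        Set<String> aotInitialized = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>(initiallyWarned);
        while (!pending.isEmpty()) {
            String k = pending.pop();
            // Adding k to the list may surface new warnings about the
            // classes whose static fields k's mirror points to.
            if (aotInitialized.add(k)) {
                pending.addAll(staticRefs.getOrDefault(k, Set.of()));
            }
        }
        return aotInitialized;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> refs = Map.of(
            "A", Set.of("B"),
            "B", Set.of("A", "C"));
        System.out.println(closure(Set.of("A"), refs));  // prints [A, B, C]
    }
}
```

The loop terminates because each class is added at most once, which mirrors the manual process: the warnings stop once the list is closed under these references.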
*(E) List of aot-initialized classes*
For a training run like "javac HelloWorld.java", we can produce an AOT
cache that contains the following 22 aot-initialized classes:
java.lang.constant.ConstantDescs
java.lang.constant.DynamicConstantDesc
java.lang.Enum
java.lang.invoke.BoundMethodHandle
java.lang.invoke.BoundMethodHandle$Specializer
java.lang.invoke.BoundMethodHandle$Species_L
java.lang.invoke.BoundMethodHandle$Species_LL
java.lang.invoke.ClassSpecializer
java.lang.invoke.ClassSpecializer$1
java.lang.invoke.ClassSpecializer$Factory
java.lang.invoke.ClassSpecializer$SpeciesData
java.lang.invoke.DelegatingMethodHandle
java.lang.invoke.DirectMethodHandle
java.lang.invoke.DirectMethodHandle$Holder
java.lang.invoke.LambdaForm
java.lang.invoke.LambdaForm$NamedFunction
java.lang.invoke.MethodHandle
java.lang.invoke.MethodType$AOTHolder
java.lang.invoke.SimpleMethodHandle
java.lang.Object
jdk.internal.constant.PrimitiveClassDescImpl
jdk.internal.constant.ReferenceClassDescImpl
Plus the following 7 enum types that have customized <clinit> code:
java.lang.constant.DirectMethodHandleDesc$Kind
java.lang.invoke.LambdaForm$BasicType
java.lang.invoke.VarHandle$AccessMode
java.lang.invoke.VarHandle$AccessType
java.lang.reflect.AccessFlag$Location
java.util.stream.StreamOpFlag
sun.invoke.util.Wrapper
*(F) List of Init-at-JVM-start classes*
During the production run, these classes are loaded into the VM in the
"initialized" state. As a result, the static fields of these classes
become reachable. We must initialize the classes of all objects that are
reachable from these static fields. There are 24 such classes:
java.lang.ArithmeticException
java.lang.ArrayIndexOutOfBoundsException
java.lang.ArrayStoreException
java.lang.Class
java.lang.ClassCastException
java.lang.Double
java.lang.Float
java.lang.Integer
java.lang.InternalError
java.lang.NullPointerException
java.lang.invoke.BoundMethodHandle$Specializer$Factory
java.lang.invoke.BoundMethodHandle$SpeciesData
java.lang.invoke.DirectMethodHandle$Accessor
java.lang.invoke.DirectMethodHandle$Constructor
java.lang.invoke.Invokers
java.lang.invoke.LambdaForm$Name
java.lang.invoke.LambdaFormEditor$Transform
java.lang.invoke.MemberName
java.lang.invoke.MemberName$Factory
java.lang.invoke.MethodHandleImpl$IntrinsicMethodHandle
java.lang.invoke.MethodType
java.lang.invoke.MethodTypeForm
java.util.EnumMap
java.util.concurrent.ConcurrentHashMap
(E.g., MethodType$AOTHolder contains a HashMap that stores many
MethodTypes, so we must initialize the MethodType class. Note that
HashMap is not in the list because it doesn't have a <clinit>. For
clarity, I have omitted all classes that can be trivially initialized.)
*(G) Testing and Validation*
The output "Archive heap points to a static field that may be
reinitialized" is checked by more than 300 CDS test cases [8]. This
ensures that we have a correct list of AOT-initialized classes. This
will also catch any future changes in the Java classes in
java.lang.invoke that may be incompatible with JEP 483.
Also, since the (E) and (F) lists are small (about 50 classes), it's
possible for a human to examine those classes to find potential problems.
For example, an AOT-initialized class shouldn't be dependent on the
environment, such as by storing the current time of day.
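As an illustration of what a reviewer would look for, here is a sketch contrasting two hypothetical classes (neither is from the JDK). Powers depends only on its own code, so its initialized state would be the same in the assembly phase and in any production run. StartupInfo captures facts about the process that happened to run its <clinit>.

```java
class EnvDependenceSketch {
    // State is a pure function of the code: the same in every run.
    static class Powers {
        static final int[] POW10 = { 1, 10, 100, 1000 };
    }

    // State depends on when and where <clinit> ran. If this class were
    // AOT-initialized, production runs would see the assembly phase's
    // clock and working directory.
    static class StartupInfo {
        static final long START_MILLIS = System.currentTimeMillis();
        static final String USER_DIR = System.getProperty("user.dir");
    }

    public static void main(String[] args) {
        System.out.println(Powers.POW10[3]);
        System.out.println(StartupInfo.START_MILLIS + " " + StartupInfo.USER_DIR);
    }
}
```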
*(H) Extensibility, or lack thereof*
This design limits future Leyden optimizations, as any significant
increase of the list of classes in (E) and (F) will make human
validation much more difficult. We should consider using automated
validation tools, as well as refactoring the Java classes to make it
easier to decide what classes can be AOT-initialized.
=======
[1] https://openjdk.org/jeps/483
[2] Due to <clinit> circularity, it may be possible to create an
instance of X before X::<clinit> finishes.
[3]
https://github.com/openjdk/jdk/blob/602408e4f3848b30299ea94264e88ead5361a310/src/java.base/share/classes/java/lang/Integer.java#L957-L969
[4] https://mail.openjdk.org/pipermail/leyden-dev/2024-August/000911.html
[5]
https://github.com/openjdk/leyden/blob/3a84df9d9860e743684e335282e3910b14cc982b/src/hotspot/share/cds/heapShared.cpp#L1415-L1466
[6]
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/cds/cdsHeapVerifier.cpp
[7]
https://github.com/iklam/jdk/blob/49eb47b6a9889625dc8bac6922cec6a5625a26b2/src/hotspot/share/cds/aotClassInitializer.cpp#L178-L192
[8]
https://github.com/iklam/jdk/blob/49eb47b6a9889625dc8bac6922cec6a5625a26b2/test/lib/jdk/test/lib/cds/CDSTestUtils.java#L289