Class initialization in JEP 483
ioi.lam at oracle.com
Mon Oct 7 21:39:23 UTC 2024
I’ve done a (short) exercise to see what happens if all classes are
AOT-inited. I quickly ran into problems because the first classes have
intertwined states between the Java and native sides. For example, a large
part of the module graph is implemented natively.
So I tried something less drastic -- AOT-init all of java.lang.invoke.
This seems doable, as java.lang.invoke carries few native states. So
perhaps we can use this as an intermediate solution before we can solve
the larger problem.
I have a prototype here:
https://github.com/iklam/jdk/pull/1
https://bugs.openjdk.org/browse/JDK-8341600
Essentially, I am trying to preserve the states of the classes used by
java.lang.invoke (in terms of the values of their static variables). As
a result, when an aot-resolved invokedynamic call site is invoked during
the production run, its "view of the world" is the same as during the
assembly phase.
Again, this is far from ideal, but I think it's better than my previous
solution, which is less predictable as it allows some of the
java.lang.invoke classes to be initialized at application start-up.
Any thoughts?
Thanks
- Ioi
Implementation
==========
To be precise, I am not AOT-initing every class in java.lang.invoke.
Rather, I am marking every class that has at least one instance in the
object graph of the aot-resolved lambda call sites. Such classes are
marked as AOT-inited.
The algorithm automatically discovers all classes that need to be
AOT-inited:
- Scan all constant pools of all classes in the AOT cache. This will
discover all the Java objects that represent the resolved invokedynamic
call sites.
- Find the reachable classes [see John's definition of CLASS
REACHABILITY below] from these objects
- Mark the reachable classes as AOT-inited. Scan the static fields of
these classes. You will find more reachable classes.
- Repeat the above step until you can find no more reachable classes.
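The fixed-point iteration above can be sketched as a worklist over a toy
graph. This is only an illustration under simplifying assumptions: classes
are modeled as plain strings, and the two maps (instanceRefs, staticRefs)
stand in for what the VM would learn by walking real heap objects and
class mirrors. The names AotInitDiscoverySketch, discover, Lambda, Handle,
Type and TypeTable are hypothetical and do not appear in the prototype.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: not the HotSpot implementation.
final class AotInitDiscoverySketch {

    // roots:        classes of the objects found via the resolved call sites
    // instanceRefs: class -> classes reached through instance fields / supertypes
    // staticRefs:   class -> classes of objects referenced by its static fields
    static Set<String> discover(List<String> roots,
                                Map<String, List<String>> instanceRefs,
                                Map<String, List<String>> staticRefs) {
        Set<String> aotInited = new LinkedHashSet<>();
        Deque<String> worklist = new ArrayDeque<>(roots);
        while (!worklist.isEmpty()) {
            String c = worklist.removeFirst();
            if (!aotInited.add(c)) {
                continue; // already marked as AOT-inited
            }
            // Marking a class exposes more classes through its instances
            // and through its static fields.
            worklist.addAll(instanceRefs.getOrDefault(c, List.of()));
            worklist.addAll(staticRefs.getOrDefault(c, List.of()));
        }
        return aotInited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> instanceRefs = Map.of(
                "Lambda", List.of("Handle"),
                "Handle", List.of("Type"));
        Map<String, List<String>> staticRefs = Map.of(
                "Type", List.of("TypeTable"),
                "Unrelated", List.of("Other"));
        // prints [Lambda, Handle, Type, TypeTable]
        System.out.println(discover(List.of("Lambda"), instanceRefs, staticRefs));
    }
}
```

Note that "Unrelated" is never marked: a class with no instance reachable
from the roots is invisible to this kind of discovery.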
I had to make two changes in the core library. See NoAOT in
java.lang.Class and java.lang.invoke.MethodType in the above PR.
E.g., in Class.java
- private static ReflectionFactory reflectionFactory;
+ private static class NonAOT {
+ private static ReflectionFactory reflectionFactory;
+ }
I did this because reflectionFactory holds onto many states that the AOT
cache cannot handle yet (e.g., it transitively refers to
Unsafe.theUnsafe), so I just move this static field out of Class. My
assumption (not validated yet) is that if ReflectionFactory.<clinit> is
run during app start-up, it won't affect the validity of the
AOT-resolved indy call sites.
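The effect of moving a static field into a nested holder class can be seen
in a small standalone sketch. It relies only on the standard Java rule
that a class is initialized on first active use, so the holder's
initializer does not run when the outer class is initialized. All names
here (DeferredStaticSketch, Outer, Heavy, LOG) are hypothetical, and the
holder field is eagerly assigned for brevity; this is not the code in the
PR.

```java
import java.util.ArrayList;
import java.util.List;

final class DeferredStaticSketch {
    // Records the order in which static initializers run.
    static final List<String> LOG = new ArrayList<>();

    // Stand-in for a class whose static state cannot be cached.
    static final class Heavy {
        static { LOG.add("Heavy.<clinit>"); }
    }

    static final class Outer {
        static { LOG.add("Outer.<clinit>"); }

        // The field lives in a nested holder, so initializing Outer
        // does not initialize Heavy.
        private static final class NonAOT {
            static { LOG.add("NonAOT.<clinit>"); }
            static final Heavy heavy = new Heavy();
        }

        static void touch() { }                       // forces Outer init only
        static Heavy heavy() { return NonAOT.heavy; } // forces NonAOT, then Heavy
    }

    public static void main(String[] args) {
        Outer.touch();
        System.out.println(LOG); // only Outer has been initialized so far
        Outer.heavy();
        System.out.println(LOG); // the holder and Heavy are initialized on demand
    }
}
```

In this toy model Outer could be stored in its initialized state while
Heavy is still initialized normally at first use.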
There was only one class that I had to explicitly remove from the manual
AOT-inited list -- java/util/concurrent/ConcurrentHashMap. The reason is
this field:
private static Unsafe U = Unsafe.getUnsafe();
I could probably move this field into a "NoAOT" inner class, but I
worry that this might cause performance issues as the "U" field is
very frequently used. Hopefully ConcurrentHashMap is unrelated to
java.lang.invoke so running its <clinit> during app start-up won't be an
issue.
Manual Configuration & Validation
=====================
The algorithm can't discover all classes whose static fields need to be
preserved. For example, DirectMethodHandle::IMPL_NAMES and
MethodHandles::IMPL_NAMES are the same object, but the algorithm finds
only DirectMethodHandle (as MethodHandles is a "static-only" class, so
there are no instances of it).
See comments in aotClassInitializer.cpp for more details. This file also
has a list of classes (including MethodHandles) that are manually marked
as "need to be AOT-inited".
There's a limited amount of validation (CDSHeapVerifier) that finds
potential problems where you missed classes that need to be manually
marked. This is also explained in aotClassInitializer.cpp.
Caveats
=====
CDSHeapVerifier is incomplete:
(a) It only checks the objects that are directly pointed to by static
fields. In the DirectMethodHandle::IMPL_NAMES example, if MethodHandles
were coded like this, then it would not be discovered by CDSHeapVerifier:
- static final MemberName.Factory IMPL_NAMES = MemberName.getFactory();
+ static final Object[] IMPL_NAMES = new Object[] {
MemberName.getFactory() };
(b) CDSHeapVerifier tracks only object fields. It doesn't check for
dependencies that may be exposed through non-object fields.
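Caveat (a) can be illustrated with a toy check written with plain
reflection. It is only a model of "look at what static fields point to
directly", not CDSHeapVerifier's actual logic, and every name in it
(DirectStaticCheckSketch, SHARED, Direct, Wrapped, pointsDirectlyTo) is
hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

final class DirectStaticCheckSketch {
    // Stand-in for an object whose identity matters.
    static final Object SHARED = new Object();

    // Holds the object directly in a static field.
    static final class Direct {
        static final Object NAMES = SHARED;
    }

    // Holds the same object one level down, inside an array.
    static final class Wrapped {
        static final Object[] NAMES = new Object[] { SHARED };
    }

    // Returns true if some static reference field of c is identical to target.
    // It deliberately does not follow references any further.
    static boolean pointsDirectlyTo(Class<?> c, Object target) {
        try {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) && !f.getType().isPrimitive()) {
                    f.setAccessible(true);
                    if (f.get(null) == target) {
                        return true;
                    }
                }
            }
            return false;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pointsDirectlyTo(Direct.class, SHARED));  // prints true
        System.out.println(pointsDirectlyTo(Wrapped.class, SHARED)); // prints false
    }
}
```

The second call returns false even though Wrapped still keeps SHARED
alive, which is the kind of dependency a direct-only check would miss.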
(end)
On 10/3/24 10:29 PM, ioi.lam at oracle.com wrote:
> Hi John,
>
> Thanks for your comments. I have a few responses in-line.
>
> Also, to recap what we discussed in today's Leyden meeting, I am going
> to see if it's possible to make all relevant classes AOT-initialized,
> and thus avoid potential issues of startup-initialization.
>
> More below ...
>
> On 10/3/24 2:50 AM, John Rose wrote:
>> On 2 Oct 2024, at 22:13, ioi.lam at oracle.com wrote:
>>
>>> TL;DR
>>>
>>> I want to discuss how Java classes are initialized as a part of the
>>> implementation of the up-coming JEP 483 [1].
>>>
>>> This is by no means a clean design. It's a compromise due to the
>>> complexity of the Java classes, such as java.lang.invoke.*, that JEP
>>> 483 wants to optimize.
>>>
>>> In the future, we can make things much better by, perhaps,
>>> refactoring the Java classes.
>>>
>>> Here, I want to get a consensus if the current implementation in JEP
>>> 483 is "good enough" to be integrated into the mainline.
>> This is a useful analysis. Related to it, I have various ideas about
>> “shaping” the JDK classes to cooperate better with the AOT cache, and
>> also adding new checks to the JVM to help detect classes which are
>> “not in good shape”. For now I hope we can determine correctness by
>> inspection, and quickly follow up with work on (simple!) tools for
>> validating the necessary assumptions. (Your email goes a long way
>> towards clarifying what those necessary assumptions are.) That way
>> maintainers won’t break the subtle init-order dependencies between
>> java.base classes, with routine changes.
>>
>>> *(A) Background*
>>>
>>> JEP 483 includes the AOT-linking of invokedynamic call sites. As a
>>> result, it becomes necessary for the AOT cache to store objects such
>>> as MethodTypes, as well as generated LambdaForm classes.
>> I have a “stretch goal” in mind here: I think if we sort out the
>> initialization of lambdas, MHs, MTs, LFs, and the rest of the indy
>> infrastructure, we may understand the problem well enough to allow
>> very early initialization of those system components. (And also, how
>> to use the AOT cache to optimize their startup.) That in turn will
>> enable us to use indy (and condy) with few or no restrictions on
>> their usage. Recent JDK changes to convert lambdas to inner classes
>> (or similar indy-avoidance tactics) will not be needed if indy is
>> robustly initialized at an early point. But this requires mastering
>> the bootstrap logic. Hence the stretch goal, since we are talking
>> here about getting the bootstrap logic (specifically java.base
>> init-order) under control.
>>
>>> A basic requirement for Java objects is --
>>> when an object of type X becomes accessible by Java bytecodes, we
>>> must have at least started the initialization of X [2]. We want to
>>> preserve this behavior for cached objects, so we won't run into
>>> incompatibilities when using cached objects.
>> This is an interesting principle. It is normally true apart from the
>> AOT cache, but it puts an extra burden on access to the AOT heap.
>> Specifically, you can’t open up any (sub-)graph in the AOT heap for
>> access until all classes mentioned in that heap (sub-)graph are at
>> least starting their initialization.
>>
>> There’s a little more, in fact: If some class C is starting to init,
>> but has not finished its init, then the access must be initially
>> confined to the thread that is running C::<clinit>. This is the
>> normal JVM rule (other threads block until C::<clinit> returns) and
>> must be emulated by the AOT cache as well.
>>
>> (Background: The AOT cache has a heap segment defining one or more
>> graphs of objects. An object graph has one or more roots. Such a
>> graph is an “asset” in the AOT cache, that can be “adopted” into the
>> running JVM as an interconnected set of Java objects. Multiple AOT
>> graphs might appear if we have separately defined decisions to adopt
>> them. Or, we might just say “take the whole graph or leave it”. For
>> future work, it is somewhat clear we wish to allow Leyden-aware
>> applications to layer AOT object graphs of their own on top of the
>> java.base object graph or graphs. But we are not there yet; all of
>> this is happening inside java.base. We must learn to walk before we
>> can think about running.)
>>
>> Definition of “CLASS REACHABILITY”: If an object X is in the AOT
>> heap, then a class C is said to be “reachable from X” if C is the
>> class of X, or is a superclass or an implemented interface of X’s
>> class. In addition, we say that C is “reachable from X” (in a
>> recursive sense) if C is reachable from another object Y which is
>> referenced by a non-static field of X, or another object Y which is
>> referenced by a static field of a class S (already) reachable from
>> X. This definition is simply a transitive closure of an
>> appropriately defined graph, where instances X point to their classes
>> C and their non-static field referents, and classes C point to their
>> super-types D and their static field referents Y.
>>
>> With that definition under our belt, we can say that when an object
>> X, originally in the AOT heap, becomes reachable to java.base code
>> (in the normal GC sense, during JDK startup), then every class
>> reachable from X must have an appropriate initialization state. The
>> definition of “appropriate” here is tricky.
>>
>> Note that we can calculate this class-reachability statically. So,
>> when it helps, we can make lists of classes C reachable from some AOT
>> heap root X. Or we can take the union of all such lists, reachable
>> from the whole AOT heap.
>>
>> So what is the most useful definition of an “appropriate
>> initialization state” of classes which become reachable as a result
>> of adopting AOT heap objects?
>> For maximum compatibility with existing Java practice, we would say
>> that the initialization state must be either DONE or else it is
>> STARTED, and the access to the AOT heap object is in the same thread
>> that is running the clinit method.
>>
>> I suggest that we might want to simplify this rule. It would be
>> nicer (for us, not for JDK programmers) if an AOT heap object X was
>> adopted only after all reachable classes C (reachable from X) were in
>> the DONE state. Maybe with one exception: If some reachable class C
>> (from X) is in the STARTED state, then C is that unique class whose
>> <clinit> method is the youngest <clinit> method on the current
>> thread. This exception allows the <clinit> method for MethodType to
>> adopt some AOT asset in the AOT heap which contains a zillion
>> MethodType objects, but nothing else that is not fully initialized.
>>
>> There are immediate objections to such a simplification, but I think
>> they can be overcome, and I suspect the end state will be better for
>> all of us. The MethodType class recursively initializes the
>> MethodHandle class, so that it can build an invoker cache (via some
>> intermediate non-public class). So you need MH.class initialized to
>> make MHs for MT, and that happens during MT’s clinit, and there is a
>> dependency loop, aka “chicken and egg” problem. If we want to
>> experiment with simplified initialization conditions, we will have to
>> break such loops. It is possible, and may even be profitable for
>> us. In the end, simpler initialization sequences will be easier to
>> optimize and cache.
>>
>>> So how do we meet this requirement?
>>>
>>>
>>> *(B) Existing Solution in JDK Mainline*
>>>
>>> In the current JDK mainline, cached objects are "loaded" into a Java
>>> program via explicit CDS.initializeFromArchive() calls. See [3] for
>>> an example in java/lang/Integer$IntegerCache
>>>
>>> class IntegerCache {
>>> ...
>>> // at this point, IntegerCache::archivedCache is null
>>> CDS.initializeFromArchive(IntegerCache.class);
>>> // at this point, IntegerCache::archivedCache is loaded from the
>>> AOT cache
>> The field IntegerCache.archivedCache (N.B. the :: syntax is only for
>> methods, not fields) is a root which points into the AOT heap graph.
>> Very cool! This does not violate the Java language rules, as long as
>> the strange object graph is indistinguishable from an object graph
>> which might have been created according to the usual rules of
>> computation, from the context of the clinit method of the relevant
>> class (IntegerCache). In fact, when booting an exploded build there
>> is no AOT cache at all, and the necessary integer cache is built “the
>> hard way” and stored into a variable that CDS knows about. This
>> handshake (by which the cache built the hard way is converted into an
>> AOT asset, and then adopted into a production run) will get more and
>> more regular over time, we think.
>>
>>> As part of the CDS.initializeFromArchive() call, the classes of all
>>> objects that are reachable from archivedCache are initialized.
>> (I am assuming that “classes of all objects reachable” is closely
>> aligned with my definition above, of class reachability!)
>
> Yes
>
>>
>> Ioi: Please explain in a bit more detail why this is true. What part
>> of the JVM (or JDK) grabs all the reachable classes and forcibly
>> initializes them? Or, more subtly, why are we so very lucky that, by
>> the time we can grab a pointer to that archived heap subgraph,
>> somehow magically the classes are already initialized?
>>
>> (And are they really in the DONE init state, or might they be in the
>> STARTED state in the current thread?)
>
> In the AOT cache, there's a data structure that remembers the list of
> classes that are printed by the "IntegerCache ( 1) =>
> java.lang.Integer" logs below. When CDS.initializeFromArchive() is
> called to fetch IntegerCache::archivedCache, it will finish
> initializing that list of classes before returning control back to the
> Java code.
>
>
>>> You can see this list of classes by:
>>>
>>> $ java -Xshare:dump -Xlog:cds+heap | grep IntegerCache
>>> [...]
>>> [1.063s][info][cds,heap] Archived object klass
>>> java.lang.Integer$IntegerCache ( 0) => [Ljava.lang.Integer;
>>> [1.063s][info][cds,heap] Archived object klass
>>> java.lang.Integer$IntegerCache ( 1) => java.lang.Integer
>>>
>>> All classes listed above will be initialized before archivedCache
>>> becomes non-null.
>> This is the crucial property to engineer into our system!
>>
>>>
>>> *(C) New Challenges with java.lang.invoke*
>>>
>>> In JEP 483, in order to support AOT-linking of invokedynamic, we run
>>> into a few issues.
>>>
>>> #1. We can no longer use the CDS.initializeFromArchive() design due
>>> to a circular dependency between MethodType and DirectMethodHandle
>>> [4] [5].
>> Maybe we can find a solution here which embraces the circularity, by
>> making the whole bundle of mutually-recursive classes be
>> AOT-initialized. So it doesn’t matter what order they are
>> initialized, for the very special reason that the initialization
>> takes place “as if instantaneously”.
>>
>> Maybe in this case, or in future similar cases, we will need to break
>> the circularity. If that is the case, then something like this will
>> happen: First MethodType gets fully initialized, and during that
>> process NO INSTANTIATION of MethodHandles is performed. (How
>> draconian!) Later, MethodHandle is fully initialized and all its
>> code enjoys the full initialization of MethodType. For the class
>> MethodHandle, there might be hundreds or thousands of AOT heap
>> objects that are method handles, containing pointers to AOT method
>> type objects.
>>
>> The present problem is that a MethodType has a cached struct (or
>> array) of invokers, and each invoker is a MH, and each MH has a MT.
>> That is a circularity. Can it be broken? Yes, by changes to JDK
>> code. I suspect this is in our future, although at present I hope we
>> can get away with the trick (mentioned above) of simultaneous
>> initialization (of MT and MH at the same time, also entangled classes
>> like DMH).
>>
>>> #2. Some cached objects in java.lang.invoke may point to static
>>> fields whose identity is significant. A simplified example looks
>>> like this:
>>>
>>> class A {
>>> static final Object x = new Object();
>>> }
>>> class B {
>>> Object y = A.x;
>>> boolean isValid() { return y == A.x; }
>>> }
>>>
>>> B cachedB = new B(); /* executed during assembly phase */
>>>
>>> If we store cachedB into the AOT cache, we will recursively store
>>> the cachedB.y field, which points to the version of A.x during the
>>> assembly phase. During the production run, if we allow A.<clinit> to
>>> be executed again, cachedB.isValid() will return false because
>>> cachedB.y is now a different instance than A.x.
>>>
>>> In fact, #2 is not a new problem. There's already a debugging class
>>> - CDSHeapVerifier - that checks for potential errors [6]
>> We need rules for object identities as well. I see two of them.
>> First, if an object X is known to be private, and all the private
>> code that works on it does not inspect its identity (via X==Y or
>> System::identityHashCode(X)) we can certify that there is no danger.
>> Second, if X is public or escapes in some way to “random” user code
>> (that might do X==Y etc.) then we must ensure that, IF the AOT heap
>> graph including X is adopted, THEN all environmental queries have
>> been respected, and there will never be a reason to un-adopt X.
>>
>>> *(D) Solution in JEP 483*
>>>
>>> In JEP 483, the solution is "AOT-initialized classes". For the above
>>> example, we store class A in the AOT cache in the "initialized"
>>> state. During the production run, A.<clinit> is no longer executed,
>>> and A.x maintains its value as in the assembly phase.
>>
>> Definition: An “AOT initialized class” is one whose Class object
>> exists in the AOT heap graph, and that object is populated with the
>> initial values of all static fields of that class.
>>
>> Definition: A post-AOT initialized class is any class that is not
>> AOT-initialized (in some Leyden AOT cache configuration). Post-AOT
>> initialized classes can be further split into “startup initialized”
>> classes (whose clinit methods the JVM runs directly in the premain
>> phase) and “JIT initialized” classes (whose clinit methods are run at
>> the last possible moment).
>>
>> Definition: A “premain initialized” class is either AOT initialized,
>> or else startup initialized, but not JIT initialized. Recall that
>> “premain” is the bootstrapping phase before the application main is
>> invoked. The init actions that happen during premain might be AOT
>> actions (which seem to happen immediately) or startup actions (slower
>> but hopefully still very fast).
>>
>> When we can do this AOT initialization optimization for some class C,
>> it is very powerful. It allows us to include AOT graph objects at
>> will, from which C is reachable, and not worry about the timing of
>> their adoption. It is as if all such C were initialized very, very
>> quickly, at JVM startup, before any other action happens.
>>
>>> The process of finding all the "AOT-initialized classes" needed by
>>> JEP 483:
>>>
>>> - Perform a training run with an application that uses many lambda
>>> expressions
>>>
>>> - Create an AOT cache using data from this training run. Look for
>>> warnings produced by CDSHeapVerifier:
>>>
>>> Archive heap points to a static field that may be reinitialized at
>>> runtime:
>> I have a problem with this formulation: We never set out to
>> “reinitialize” any class, so “may be reinitialized” is a
>> counter-factual statement. We only set out to AOT-initialize classes
>> if the user’s observation of them sees the classes to be initialized
>> exactly once, but very, very quickly.
>>
>> I would prefer to rephrase the error message (and underlying concept)
>> as:
>>
>> “Archive heap points to a static field that requires post-AOT
>> initialization”
>>
>> (or “startup initialization” or some such). Situations which tempt
>> us to think of reinitialization are situations where we are creating
>> bootstrapping bugs.
>>
>>> Field: java/lang/invoke/SimpleMethodHandle::BMH_SPECIES
>>> Value: java.lang.invoke.BoundMethodHandle$SpeciesData
>>> {0x000000060e8962f0} - klass:
>>> 'java/lang/invoke/BoundMethodHandle$SpeciesData' - flags:
>>> [...]
>>> --- trace begin ---
>>> [ 0] {0x000000060f410170} [Ljava.lang.Object; @[238]
>>> [ 1] {0x000000060f430520}
>>> java.lang.invoke.BoundMethodHandle$Species_L::type (offset = 16)
>>> [ 2] {0x000000060e8ab130} java.lang.invoke.MethodType::form (offset
>>> = 20)
>>> [ 3] {0x000000060e883b70}
>>> java.lang.invoke.MethodTypeForm::lambdaForms (offset = 28)
>>> [ 4] {0x000000060e883b90} [Ljava.lang.Object; @[7]
>>> [ 5] {0x000000060e89a428} java.lang.invoke.LambdaForm::names (offset
>>> = 32)
>>> [ 6] {0x000000060e89a3b0} [Ljava.lang.invoke.LambdaForm$Name; @[0]
>>> [ 7] {0x000000060e897fb8}
>>> java.lang.invoke.LambdaForm$Name::constraint (offset = 24)
>>> [ 8] {0x000000060e896ab8}
>>> java.lang.invoke.BoundMethodHandle$SpeciesData::this$0 (offset = 40)
>>> [ 9] {0x000000060e895aa0}
>>> java.lang.invoke.BoundMethodHandle$Specializer::topSpecies (offset =
>>> 44)
>>> [10] {0x000000060e8962f0}
>>> java.lang.invoke.BoundMethodHandle$SpeciesData
>>> --- trace end ---
>>>
>>> In this example, we see a cached object (0x000000060e895aa0) points
>>> to {0x000000060e8962f0}, which is in the static field
>>> SimpleMethodHandle::BMH_SPECIES.
>>>
>>> To get rid of this warning, we add SimpleMethodHandle to the list in
>>> AOTClassInitializer::can_archive_initialized_mirror [7]
>>>
>>> - Create the AOT cache again. You may see new warnings because the
>>> mirror of SimpleMethodHandle may point to the static fields of other
>>> classes.
>>>
>>> - Keep doing this until you can no longer see these warnings
>> This is a good process. I think we want to build in enough bootstrap
>> checking logic into the JVM so that a maintainer who breaks this
>> discipline will hear about it quickly, without having to enable
>> warnings and then sift through them.
>>
>> I am beginning to think that we should experiment with annotations on
>> java.base classes that express our intention to AOT-initialize them,
>> with VM checks (at least in debug builds) that ensure this is safe to
>> do.
>>
>>> *(E) List of aot-initialized classes*
>>>
>>> For a training run like "javac HelloWorld.java", we can produce an
>>> AOT cache that contains the following 22 aot-initialized classes
>>>
>>> java.lang.constant.ConstantDescs
>>> java.lang.constant.DynamicConstantDesc
>>> java.lang.Enum
>>> java.lang.invoke.BoundMethodHandle
>>> java.lang.invoke.BoundMethodHandle$Specializer
>>> java.lang.invoke.BoundMethodHandle$Species_L
>>> java.lang.invoke.BoundMethodHandle$Species_LL
>>> java.lang.invoke.ClassSpecializer
>>> java.lang.invoke.ClassSpecializer$1
>>> java.lang.invoke.ClassSpecializer$Factory
>>> java.lang.invoke.ClassSpecializer$SpeciesData
>>> java.lang.invoke.DelegatingMethodHandle
>>> java.lang.invoke.DirectMethodHandle
>>> java.lang.invoke.DirectMethodHandle$Holder
>>> java.lang.invoke.LambdaForm
>>> java.lang.invoke.LambdaForm$NamedFunction
>>> java.lang.invoke.MethodHandle
>>> java.lang.invoke.MethodType$AOTHolder
>>> java.lang.invoke.SimpleMethodHandle
>>> java.lang.Object
>>> jdk.internal.constant.PrimitiveClassDescImpl
>>> jdk.internal.constant.ReferenceClassDescImpl
>>>
>>>
>>> Plus the following 7 enum types that have customized <clinit> code
>>>
>>> java.lang.constant.DirectMethodHandleDesc$Kind
>>> java.lang.invoke.LambdaForm$BasicType
>>> java.lang.invoke.VarHandle$AccessMode
>>> java.lang.invoke.VarHandle$AccessType
>>> java.lang.reflect.AccessFlag$Location
>>> java.util.stream.StreamOpFlag
>>> sun.invoke.util.Wrapper
>>>
>>>
>>> *(F) List of Init-at-JVM-start classes*
>>>
>>> During the production run, these classes are loaded into the VM in
>>> the "initialized" state. As a result, the static fields of these
>>> classes become reachable. We must initialize the classes of all
>>> objects that are reachable from these static fields. There are 24
>>> such classes.
>> I am not following here. If the class is in an initialized state, is
>> there some process that happens later which runs the clinit again?
>> (Is this the dreaded “reinitialization”?) If we allow this, it is
>> technical debt, and we have to have a plan to eliminate it, lest it
>> eventually become visible. (We can and must perform some kinds of
>> non-standard actions to boot up the first few classes, but they need
>> to be minimized.)
>
>
> For an example like this:
>
> /* @aot-initialized */ class A {
> static final B b = new B();
> }
>
> A is AOT-initialized, but B is not. When class A is loaded into the
> JVM, A.b becomes reachable from Java code. Therefore, we must
> initialize B, which is reachable from A's mirror.
>
> We are not "re-initializing B" in the sense that B's mirror already
> contains some non-default values before B.<clinit> is executed.
> Instead, when B.<clinit> is executed, B's static fields are all zeros
> and nulls.
>>
>>> java.lang.ArithmeticException
>>> java.lang.ArrayIndexOutOfBoundsException
>>> java.lang.ArrayStoreException
>>> java.lang.Class
>>> java.lang.ClassCastException
>>> java.lang.Double
>>> java.lang.Float
>>> java.lang.Integer
>>> java.lang.InternalError
>>> java.lang.NullPointerException
>>> java.lang.invoke.BoundMethodHandle$Specializer$Factory
>>> java.lang.invoke.BoundMethodHandle$SpeciesData
>>> java.lang.invoke.DirectMethodHandle$Accessor
>>> java.lang.invoke.DirectMethodHandle$Constructor
>>> java.lang.invoke.Invokers
>>> java.lang.invoke.LambdaForm$Name
>>> java.lang.invoke.LambdaFormEditor$Transform
>>> java.lang.invoke.MemberName
>>> java.lang.invoke.MemberName$Factory
>>> java.lang.invoke.MethodHandleImpl$IntrinsicMethodHandle
>>> java.lang.invoke.MethodType
>>> java.lang.invoke.MethodTypeForm
>>> java.util.EnumMap
>>> java.util.concurrent.ConcurrentHashMap
>>>
>>> (E.g., MethodType$AOTHolder contains a HashMap that stores many
>>> MethodTypes, so we must initialize the MethodType class. Note that
>>> HashMap is not in the list because it doesn't have a <clinit>. For
>>> clarity, I have omitted all classes that can be trivially initialized).
>> The only thing that might stop us from AOT-initializing a class with
>> no clinit would be a superclass (or interface) that has a clinit
>> which is not AOT initializable.
>>
>> Connecting back to the reachability concept: As soon as some X in
>> the AOT heap becomes reachable (in the normal GC sense), then all
>> classes C reachable from X must be initialized (or perhaps “started”
>> in the current thread). It would be best, of course, if those
>> classes were all AOT initialized. If not, we need a story to ensure
>> they are startup initialized (if not AOT initialized).
>>
>> I think this means that if some class C is AOT initialized, then its
>> super S must also be AOT initialized, or else any access to an
>> instance X which can reach C must be organized so that the startup
>> initialization of S gets run first.
>>
>> I think our current state of the art is that we run such startup
>> initializers ASAP, and test that there are no bugs. It would be good
>> to have more accurate checks as well, if we can define them suitably.
>>
>>> *(G) Testing and Validation*
>>>
>>> The output "Archive heap points to a static field that may be
>>> reinitialized" is checked by more than 300 CDS test cases [8]. This
>>> ensures that we have a correct list of AOT-initialized classes. This
>>> will also catch any future changes in the Java classes in
>>> java.lang.invoke that may be incompatible with JEP 483.
>>>
>>> Also, since the (E) and (F) lists are small (about 50 classes), it's
>>> possible for a human to examine those classes to find potential
>>> problems
>>>
>>> - For example, an AOT-initialized class shouldn't be dependent on
>>> the environment, such as storing the current time of day, etc.
>> I agree that human examination is sufficient to start with, even to
>> detect environmental dependencies. This is what we should ship
>> with. But we should build better detection tools soon, so that
>> routine enhancements don’t run afoul of those checks (the dreaded
>> “reinitialization”).
>>
>>> *(H) Extensibility, or lack thereof*
>>>
>>> This design limits future Leyden optimization as any significant
>>> increase of the list of classes in (E) and (F) will make human
>>> validation much more difficult. We should consider using automated
>>> validation tools, as well as refactoring the Java classes to make it
>>> easier to decide what classes can be AOT-initialized.
>> Yes! Preach it!
>>
>> — John
>>
>>>
>>> =======
>>>
>>> [1] https://openjdk.org/jeps/483
>>>
>>> [2] Due to <clinit> circularity, it may be possible to create an
>>> instance of X before X::<clinit> finishes.
>>>
>>> [3]
>>> https://github.com/openjdk/jdk/blob/602408e4f3848b30299ea94264e88ead5361a310/src/java.base/share/classes/java/lang/Integer.java#L957-L969
>>>
>>> [4]
>>> https://mail.openjdk.org/pipermail/leyden-dev/2024-August/000911.html
>>>
>>> [5]
>>> https://github.com/openjdk/leyden/blob/3a84df9d9860e743684e335282e3910b14cc982b/src/hotspot/share/cds/heapShared.cpp#L1415-L1466
>>>
>>> [6]
>>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/cds/cdsHeapVerifier.cpp
>>>
>>> [7]
>>> https://github.com/iklam/jdk/blob/49eb47b6a9889625dc8bac6922cec6a5625a26b2/src/hotspot/share/cds/aotClassInitializer.cpp#L178-L192
>>>
>>> [8]
>>> https://github.com/iklam/jdk/blob/49eb47b6a9889625dc8bac6922cec6a5625a26b2/test/lib/jdk/test/lib/cds/CDSTestUtils.java#L289