Class initialization in JEP 483

ioi.lam at oracle.com
Mon Oct 7 21:39:23 UTC 2024


I’ve done a (short) exercise to see what happens if all classes are 
AOT-inited. I quickly ran into problems because the earliest classes have 
state that is intertwined between Java and native code. For example, a 
large part of the module graph is implemented natively.

So I tried something less drastic -- AOT-init all of java.lang.invoke. 
This seems doable, as java.lang.invoke carries little native state. So 
perhaps we can use this as an intermediate solution until we can solve 
the larger problem.

I have a prototype here:

     https://github.com/iklam/jdk/pull/1
     https://bugs.openjdk.org/browse/JDK-8341600

Essentially, I am trying to preserve the states of the classes used by 
java.lang.invoke (in terms of the values of their static variables). As 
a result, when an aot-resolved invokedynamic call site is invoked during 
the production run, its "view of the world" is the same as during the 
assembly phase.

Again, this is far from ideal, but I think it's better than my previous 
solution, which is less predictable as it allows some of the 
java.lang.invoke classes to be initialized at application start-up.

Any thoughts?

Thanks

- Ioi



Implementation
==============

To be precise, I am not AOT-initing every class in java.lang.invoke. 
Rather, I mark as AOT-inited every class that has at least one instance 
in the object graph of the AOT-resolved lambda call sites.

The algorithm automatically discovers all classes that need to be 
AOT-inited:

- Scan all constant pools of all classes in the AOT cache. This will 
discover all the Java objects that represent the resolved invokedynamic 
call sites.
- Find the reachable classes [see John's definition of CLASS 
REACHABILITY below] from these objects.
- Mark the reachable classes as AOT-inited. Scan the static fields of 
these classes. You will find more reachable classes.
- Repeat the above step until you can find no more reachable classes.
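
The discovery loop above is a worklist fixpoint. Below is a minimal Java sketch of it, assuming a toy heap model: `Obj`, the `superOf` and `staticFieldsOf` maps, and `Discovery.aotInitedClasses` are hypothetical names for illustration only, not the HotSpot code in the PR (which is C++ inside the JVM). Interfaces are ignored for brevity.

```java
import java.util.*;

// Hypothetical toy heap model for illustration; not HotSpot's data structures.
final class Obj {
    final String klass;                         // name of this object's class
    final List<Obj> fields = new ArrayList<>(); // referents of its instance fields

    Obj(String klass, Obj... refs) {
        this.klass = klass;
        fields.addAll(Arrays.asList(refs));
    }
}

final class Discovery {
    // Worklist fixpoint: marks the classes reachable from the call-site roots,
    // following instance fields, superclasses, and the static fields of newly
    // marked classes. Interfaces are omitted for brevity.
    static Set<String> aotInitedClasses(List<Obj> roots,
                                        Map<String, String> superOf,
                                        Map<String, List<Obj>> staticFieldsOf) {
        Set<String> marked = new LinkedHashSet<>();
        Set<Obj> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Obj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (!visited.add(o)) continue;      // scan each object only once
            work.addAll(o.fields);
            for (String c = o.klass; c != null; c = superOf.get(c)) {
                if (!marked.add(c)) break;      // c and its supers already handled
                // A newly marked class exposes the objects in its static fields.
                work.addAll(staticFieldsOf.getOrDefault(c, List.of()));
            }
        }
        return marked;
    }
}
```

Because a class's static fields are scanned only when the class is first marked, the loop terminates exactly when no new reachable classes appear, which corresponds to the last bullet.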

I had to make two changes in the core library. See NoAOT in 
java.lang.Class and java.lang.invoke.MethodType in the above PR.

E.g., in Class.java:

- private static ReflectionFactory reflectionFactory;
+ private static class NonAOT {
+     private static ReflectionFactory reflectionFactory;
+ }

I did this because reflectionFactory holds onto a lot of state that the 
AOT cache cannot handle yet (e.g., it transitively refers to 
Unsafe.theUnsafe), so I just moved this static field out of Class. My 
assumption (not validated yet) is that if ReflectionFactory.<clinit> is 
run during app start-up, it won't affect the validity of the 
AOT-resolved indy call sites.
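
The holder-class change relies on ordinary JVM initialization order: a nested class is initialized only when one of its members is first used, not when its enclosing class is initialized. A small sketch of that behavior follows; `HolderDemo` and its `log` field are hypothetical names, and the sketch does not model the AOT cache itself.

```java
// Hypothetical names; this shows plain JVM initialization order only.
final class HolderDemo {
    static final StringBuilder log = new StringBuilder(); // trace of <clinit> runs
    static { log.append("HolderDemo;"); }

    // Same shape as the NonAOT holder: the field lives in a nested class.
    private static final class NonAOT {
        static { log.append("NonAOT;"); }
        static final Object heavyState = new Object(); // stand-in for reflectionFactory
    }

    // The first call initializes NonAOT; initializing HolderDemo alone does not.
    static Object heavyState() { return NonAOT.heavyState; }
}
```

Under the assumption stated in the text, this is what lets the outer class be archived in the initialized state while the holder's <clinit> is left to run at start-up.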

There was only one class that I had to explicitly remove from the manual 
AOT-inited list -- java/util/concurrent/ConcurrentHashMap. The reason is 
this field:

     private static Unsafe U = Unsafe.getUnsafe();

I could probably move this field into a "NoAOT" inner class, but I 
worry that this might cause performance issues, as the "U" field is 
used very frequently. Hopefully ConcurrentHashMap is unrelated to 
java.lang.invoke, so running its <clinit> during app start-up won't be 
an issue.


Manual Configuration & Validation
=================================

The algorithm can't discover all classes whose static fields need to be 
preserved. For example, DirectMethodHandle::IMPL_NAMES and 
MethodHandles::IMPL_NAMES are the same object, but the algorithm finds 
only DirectMethodHandle (as MethodHandles is a "static-only" class, so 
there are no instances of it).

See comments in aotClassInitializer.cpp for more details. This file also 
has a list of classes (including MethodHandles) that are manually marked 
as "need to be AOT-inited".

There's a limited amount of validation (CDSHeapVerifier) that finds 
potential problems where you have missed classes that need to be 
manually marked. This is also explained in aotClassInitializer.cpp.


Caveats
=======

CDSHeapVerifier is incomplete:

(a) It only checks the objects that are directly pointed to by static 
fields. In the DirectMethodHandle::IMPL_NAMES example, if MethodHandles 
were coded like this, the problem would not be discovered by 
CDSHeapVerifier:

- static final MemberName.Factory IMPL_NAMES = MemberName.getFactory();
+ static final Object[] IMPL_NAMES = new Object[] { MemberName.getFactory() };

(b) CDSHeapVerifier tracks only object fields. It doesn't check for 
dependencies that may be exposed through non-object (primitive) fields.
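
Limitation (a) can be seen in a toy model of the check, in which a warning is raised only when an archived object is itself the value of a static field of a class that is not AOT-initialized. `ToyVerifier` and the shape of its inputs are hypothetical simplifications for illustration, not the logic of cdsHeapVerifier.cpp.

```java
import java.util.*;

final class ToyVerifier {
    // archived:     identity set of objects stored in the AOT heap
    // staticFields: "Class::field" -> current value of that static field
    // aotInited:    classes whose static fields are preserved in the archive
    static List<String> check(Set<Object> archived,
                              Map<String, Object> staticFields,
                              Set<String> aotInited) {
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<String, Object> e : staticFields.entrySet()) {
            String owner = e.getKey().substring(0, e.getKey().indexOf("::"));
            // Only the field's direct value is compared; objects reachable
            // through that value (e.g. array elements) are not examined.
            if (!aotInited.contains(owner) && archived.contains(e.getValue())) {
                warnings.add(e.getKey());
            }
        }
        return warnings;
    }
}
```

With the original `IMPL_NAMES` declaration the shared factory object is flagged; once it is wrapped in an `Object[]`, the field's direct value is the array, which is not archived, so nothing is reported.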



(end)

On 10/3/24 10:29 PM, ioi.lam at oracle.com wrote:
> Hi John,
>
> Thanks for your comments. I have a few responses in-line.
>
> Also, to recap what we discussed in today's Leyden meeting, I am going 
> to see if it's possible to make all relevant classes AOT-initialized, 
> and thus avoid potential issues of startup-initialization.
>
> More below ...
>
> On 10/3/24 2:50 AM, John Rose wrote:
>> On 2 Oct 2024, at 22:13, ioi.lam at oracle.com wrote:
>>
>>> TL;DR
>>>
>>> I want to discuss how Java classes are initialized as a part of the 
>>> implementation of the up-coming JEP 483 [1].
>>>
>>> This is by no means a clean design. It's a compromise due to the 
>>> complexity of the Java classes, such as java.lang.invoke.*, that JEP 
>>> 483 wants to optimize.
>>>
>>> In the future, we can make things much better by, perhaps, 
>>> refactoring the Java classes.
>>>
>>> Here, I want to get a consensus on whether the current implementation 
>>> in JEP 483 is "good enough" to be integrated into the mainline.
>> This is a useful analysis.  Related to it, I have various ideas about 
>> “shaping” the JDK classes to cooperate better with the AOT cache, and 
>> also adding new checks to the JVM to help detect classes which are 
>> “not in good shape”.   For now I hope we can determine correctness by 
>> inspection, and quickly follow up with work on (simple!) tools for 
>> validating the necessary assumptions.  (Your email goes a long way 
>> towards clarifying what those necessary assumptions are.)  That way 
>> maintainers won’t break the subtle init-order dependencies between 
>> java.base classes, with routine changes.
>>
>>> *(A) Background*
>>>
>>> JEP 483 includes the AOT-linking of invokedynamic call sites. As a 
>>> result, it becomes necessary for the AOT cache to store objects such 
>>> as MethodTypes, as well as generated LambdaForm classes.
>> I have a “stretch goal” in mind here:  I think if we sort out the 
>> initialization of lambdas, MHs, MTs, LFs, and the rest of the indy 
>> infrastructure, we may understand the problem well enough to allow 
>> very early initialization of those system components.  (And also, how 
>> to use the AOT cache to optimize their startup.)  That in turn will 
>> enable us to use indy (and condy) with few or no restrictions on 
>> their usage.  Recent JDK changes to convert lambdas to inner classes 
>> (or similar indy-avoidance tactics) will not be needed if indy is 
>> robustly initialized at an early point.  But this requires mastering 
>> the bootstrap logic.  Hence the stretch goal, since we are talking 
>> here about getting the bootstrap logic (specifically java.base 
>> init-order) under control.
>>
>>> A basic requirement for Java objects is --
>>> when an object of type X becomes accessible by Java bytecodes, we 
>>> must have at least started the initialization of X [2]. We want to 
>>> preserve this behavior for cached objects, so we won't run into 
>>> incompatibilities when using cached objects.
>> This is an interesting principle.  It is normally true apart from the 
>> AOT cache, but it puts an extra burden on access to the AOT heap.  
>> Specifically, you can’t open up any (sub-)graph in the AOT heap for 
>> access until all classes mentioned in that heap (sub-)graph are at 
>> least starting their initialization.
>>
>> There’s a little more, in fact:  If some class C is starting to init, 
>> but has not finished its init, then the access must be initially 
>> confined to the thread that is running C::<clinit>.  This is the 
>> normal JVM rule (other threads block until C::<clinit> returns) and 
>> must be emulated by the AOT cache as well.
>>
>> (Background:  The AOT cache has a heap segment defining one or more 
>> graphs of objects.  An object graph has one or more roots. Such a 
>> graph is an “asset” in the AOT cache, that can be “adopted” into the 
>> running JVM as an interconnected set of Java objects.  Multiple AOT 
>> graphs might appear if we have separately defined decisions to adopt 
>> them.  Or, we might just say “take the whole graph or leave it”.  For 
>> future work, it is somewhat clear we wish to allow Leyden-aware 
>> applications to layer AOT object graphs of their own on top of the 
>> java.base object graph or graphs.  But we are not there yet; all of 
>> this is happening inside java.base.  We must learn to walk before we 
>> can think about running.)
>>
>> Definition of “CLASS REACHABILITY”:  If an object X is in the AOT 
>> heap, then a class C is said to be “reachable from X” if C is the 
>> class of X, or is a superclass or an implemented interface of X’s 
>> class.  In addition, we say that C is “reachable from X” (in a 
>> recursive sense) if C is reachable from another object Y which is 
>> referenced by a non-static field of X, or another object Y which is 
>> referenced by a static field of a class S (already) reachable from 
>> X.  This definition is simply a transitive closure of an 
>> appropriately defined graph, where instances X point to their classes 
>> C and their non-static field referents, and classes C point to their 
>> super-types D and their static field referents Y.
>>
>> With that definition under our belt, we can say that when an object 
>> X, originally in the AOT heap, becomes reachable to java.base code 
>> (in the normal GC sense, during JDK startup), then every class 
>> reachable from X must have an appropriate initialization state.  The 
>> definition of “appropriate” here is tricky.
>>
>> Note that we can calculate this class-reachability statically. So, 
>> when it helps, we can make lists of classes C reachable from some AOT 
>> heap root X.  Or we can take the union of all such lists, reachable 
>> from the whole AOT heap.
>>
>> So what is the most useful definition of an “appropriate 
>> initialization state” of classes which become reachable as a result 
>> of adopting AOT heap objects?
>> For maximum compatibility with existing Java practice, we would say 
>> that the initialization state must be either DONE or else it is 
>> STARTED, and the access to the AOT heap object is in the same thread 
>> that is running the clinit method.
>>
>> I suggest that we might want to simplify this rule.  It would be 
>> nicer (for us, not for JDK programmers) if an AOT heap object X was 
>> adopted only after all reachable classes C (reachable from X) were in 
>> the DONE state.  Maybe with one exception:  If some reachable class C 
>> (from X) is in the STARTED state, then C is that unique class whose 
>> <clinit> method is the youngest <clinit> method on the current 
>> thread.  This exception allows the <clinit> method for MethodType to  
>> adopt some AOT asset in the AOT heap which contains a zillion 
>> MethodType objects, but nothing else that is not fully initialized.
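
The two candidate rules above can be stated as predicates over the classes reachable from an AOT heap object. The sketch below assumes a hypothetical `ClassStatus` record (a class's init state and initializing thread) and a per-thread stack of running <clinit> methods; neither is a real JVM API.

```java
import java.util.*;

enum InitState { NOT_STARTED, STARTED, DONE }

// Hypothetical snapshot of one class's initialization status.
record ClassStatus(String name, InitState state, Thread initThread) {}

final class AdoptionRule {
    // Compatible rule: each reachable class is DONE, or STARTED by the
    // thread that is about to access the AOT object.
    static boolean mayAdoptCompatible(Collection<ClassStatus> reachable, Thread current) {
        for (ClassStatus c : reachable) {
            boolean ok = c.state() == InitState.DONE
                    || (c.state() == InitState.STARTED && c.initThread() == current);
            if (!ok) return false;
        }
        return true;
    }

    // Simplified rule: everything is DONE, except possibly the class whose
    // <clinit> is the youngest one on the current thread's clinit stack.
    static boolean mayAdoptSimplified(Collection<ClassStatus> reachable, Deque<String> clinitStack) {
        String youngest = clinitStack.peek(); // null if no <clinit> is running
        for (ClassStatus c : reachable) {
            if (c.state() == InitState.DONE) continue;
            if (c.state() == InitState.STARTED && c.name().equals(youngest)) continue;
            return false;
        }
        return true;
    }
}
```

In the MethodType/MethodHandle loop, the compatible rule admits a graph that reaches both half-initialized classes, while the simplified rule rejects it unless only the youngest one (here MethodHandle) is still STARTED.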
>>
>> There are immediate objections to such a simplification, but I think 
>> they can be overcome, and I suspect the end state will be better for 
>> all of us.  The MethodType class recursively initializes the 
>> MethodHandle class, so that it can build an invoker cache (via some 
>> intermediate non-public class).  So you need MH.class initialized to 
>> make MHs for MT, and that happens during MT’s clinit, and there is a 
>> dependency loop, aka “chicken and egg” problem.  If we want to 
>> experiment with simplified initialization conditions, we will have to 
>> break such loops.  It is possible, and may even be profitable for 
>> us.  In the end, simpler initialization sequences will be easier to 
>> optimize and cache.
>>
>>> So how do we meet this requirement?
>>>
>>>
>>> *(B) Existing Solution in JDK Mainline*
>>>
>>> In the current JDK mainline, cached objects are "loaded" into a Java 
>>> program via explicit CDS.initializeFromArchive() calls. See [3] for 
>>> an example in java/lang/Integer$IntegerCache:
>>>
>>> class IntegerCache {
>>>    ...
>>>    // at this point, IntegerCache::archivedCache is null
>>>    CDS.initializeFromArchive(IntegerCache.class);
>>>    // at this point, IntegerCache::archivedCache is loaded from the AOT cache
>> The field IntegerCache.archivedCache (N.B. the :: syntax is only for 
>> methods, not fields) is a root which points into the AOT heap graph.  
>> Very cool!  This does not violate the Java language rules, as long as 
>> the strange object graph is indistinguishable from an object graph 
>> which might have been created according to the usual rules of 
>> computation, from the context of the clinit method of the relevant 
>> class (IntegerCache).  In fact, when booting an exploded build there 
>> is no AOT cache at all, and the necessary integer cache is built “the 
>> hard way” and stored into a variable that CDS knows about.  This 
>> handshake (by which the cache built the hard way is converted into an 
>> AOT asset, and then adopted into a production run) will get more and 
>> more regular over time, we think.
>>
>>> As part of the CDS.initializeFromArchive() call, the classes of all 
>>> objects that are reachable from archivedCache are initialized.
>> (I am assuming that “classes of all objects reachable” is closely 
>> aligned with my definition above, of class reachability!)
>
> Yes
>
>>
>> Ioi:  Please explain in a bit more detail why this is true. What part 
>> of the JVM (or JDK) grabs all the reachable classes and forcibly 
>> initializes them?  Or, more subtly, why are we so very lucky that, by 
>> the time we can grab a pointer to that archived heap subgraph, 
>> somehow magically the classes are already initialized?
>>
>> (And are they really in the DONE init state, or might they be in the 
>> STARTED state in the current thread?)
>
> In the AOT cache, there's a data structure that remembers the list of 
> classes that are printed by the "IntegerCache ( 1) => 
> java.lang.Integer" logs below. When CDS.initializeFromArchive() is 
> called to fetch IntegerCache::archivedCache, it will finish 
> initializing that list of classes before returning control to the 
> Java code.
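
A minimal sketch of that handshake, under stated assumptions: `ArchivedSubgraph`, `ToyArchive.fetchRoot`, `InitLog` and `Witness` are hypothetical names for illustration. Unlike the real `CDS.initializeFromArchive()`, which fills in the archived static fields of the given class, this toy simply returns the root; it forces initialization with the standard `Class.forName(name, true, loader)` call.

```java
import java.util.*;

// Hypothetical model of one archived subgraph: a root object plus the list of
// classes that must be initialized before the root is handed to Java code.
record ArchivedSubgraph(Object root, List<Class<?>> classesToInit) {}

final class ToyArchive {
    private static final Map<Class<?>, ArchivedSubgraph> SUBGRAPHS = new HashMap<>();

    static void register(Class<?> owner, ArchivedSubgraph g) { SUBGRAPHS.put(owner, g); }

    // The listed classes are initialized first; only then is the root released.
    static Object fetchRoot(Class<?> owner) {
        ArchivedSubgraph g = SUBGRAPHS.get(owner);
        if (g == null) {
            return null; // nothing archived: the caller builds its data "the hard way"
        }
        for (Class<?> c : g.classesToInit()) {
            try {
                // initialize=true runs c's <clinit> if it has not run yet
                Class.forName(c.getName(), true, c.getClassLoader());
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        }
        return g.root();
    }
}

// Witness records when its <clinit> runs; the log lives in a separate class so
// that reading it does not itself initialize Witness.
final class InitLog { static final List<String> events = new ArrayList<>(); }
final class Witness { static { InitLog.events.add("Witness.<clinit>"); } }
```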
>
>
>>> You can see this list of classes by:
>>>
>>> $ java -Xshare:dump -Xlog:cds+heap | grep IntegerCache
>>> [...]
>>> [1.063s][info][cds,heap] Archived object klass java.lang.Integer$IntegerCache ( 0) => [Ljava.lang.Integer;
>>> [1.063s][info][cds,heap] Archived object klass java.lang.Integer$IntegerCache ( 1) => java.lang.Integer
>>>
>>> All classes listed above will be initialized before archivedCache 
>>> becomes non-null.
>> This is the crucial property to engineer into our system!
>>
>>>
>>> *(C) New Challenges with java.lang.invoke*
>>>
>>> In JEP 483, in order to support AOT-linking of invokedynamic, we run 
>>> into a few issues.
>>>
>>> #1. We can no longer use the CDS.initializeFromArchive() design due 
>>> to a circular dependency between MethodType and DirectMethodHandle 
>>> [4] [5].
>> Maybe we can find a solution here which embraces the circularity, by 
>> making the whole bundle of mutually-recursive classes be 
>> AOT-initialized.  So it doesn’t matter what order they are 
>> initialized, for the very special reason that the initialization 
>> takes place “as if instantaneously”.
>>
>> Maybe in this case, or in future similar cases, we will need to break 
>> the circularity.  If that is the case, then something like this will 
>> happen:  First MethodType gets fully initialized, and during that 
>> process NO INSTANTIATION of MethodHandles is performed.  (How 
>> draconian!)  Later, MethodHandle is fully initialized and all its 
>> code enjoys the full initialization of MethodType.  For the class 
>> MethodHandle, there might be hundreds or thousands of AOT heap 
>> objects that are method handles, containing pointers to AOT method 
>> type objects.
>>
>> The present problem is that a MethodType has a cached struct (or 
>> array) of invokers, and each invoker is a MH, and each MH has a MT.  
>> That is a circularity.  Can it be broken?  Yes, by changes to JDK 
>> code.  I suspect this is in our future, although at present I hope we 
>> can get away with the trick (mentioned above) of simultaneous 
>> initialization (of MT and MH at the same time, also entangled classes 
>> like DMH).
>>
>>> #2. Some cached objects in java.lang.invoke may point to static 
>>> fields whose identity is significant. A simplified example looks 
>>> like this:
>>>
>>>      class A {
>>>          static final Object x = new Object();
>>>      }
>>>      class B {
>>>          Object y = A.x;
>>>          boolean isValid() { return y == A.x; }
>>>      }
>>>
>>>      B cachedB = new B();     /* executed during assembly phase */
>>>
>>> If we store cachedB into the AOT cache, we will recursively store 
>>> the cachedB.y field, which points to the version of A.x during the 
>>> assembly phase. During the production run, if we allow A.<clinit> to 
>>> be executed again, cachedB.isValid() will return false because 
>>> cachedB.y is now a different instance than A.x.
>>>
>>> In fact, #2 is not a new problem. There's already a debugging class, 
>>> CDSHeapVerifier, that checks for potential errors [6].
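
The identity hazard in #2 can be reproduced in plain Java. Since a real <clinit> cannot be re-run, this simulation replaces A.<clinit> with an ordinary, hypothetical `clinit()` method; calling it a second time plays the role of executing A's initializer again in the production run.

```java
// Simulation only: A.clinit() stands in for A.<clinit>.
final class A {
    static Object x;
    static void clinit() { x = new Object(); } // creates a fresh identity each run
}

final class B {
    final Object y = A.x;                        // captured during the "assembly phase"
    boolean isValid() { return y == A.x; }       // identity comparison, as in the example
}
```

If A is AOT-initialized, the second call never happens, A.x keeps its assembly-phase identity, and `cachedB.isValid()` stays true.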
>> We need rules for object identities as well.  I see two of them.  
>> First, if an object X is known to be private, and all the private 
>> code that works on it does not inspect its identity (via X==Y or 
>> System::identityHashCode(X)) we can certify that there is no danger.  
>> Second, if X is public or escapes in some way to “random” user code 
>> (that might do X==Y etc.) then we must ensure that, IF the AOT heap 
>> graph including X is adopted, THEN all environmental queries have 
>> been respected, and there will never be a reason to un-adopt X.
>>
>>> *(D) Solution in JEP 483*
>>>
>>> In JEP 483, the solution is "AOT-initialized classes". For the above 
>>> example, we store class A in the AOT cache in the "initialized" 
>>> state. During the production run, A.<clinit> is no longer executed, 
>>> and A.x maintains its value as in the assembly phase.
>>
>> Definition:  An “AOT initialized class” is one whose Class object 
>> exists in the AOT heap graph, and that object is populated with the 
>> initial values of all static fields of that class.
>>
>> Definition:  A post-AOT initialized class is any class that is not 
>> AOT-initialized (in some Leyden AOT cache configuration). Post-AOT 
>> initialized classes can be further split into “startup initialized” 
>> classes (whose clinit methods the JVM runs directly in the premain 
>> phase) and “JIT initialized” classes (whose clinit methods are run at 
>> the last possible moment).
>>
>> Definition:  A “premain initialized” class is either AOT initialized, 
>> or else startup initialized, but not JIT initialized.  Recall that 
>> “premain” is the bootstrapping phase before the application main is 
>> invoked.  The init actions that happen during premain might be AOT 
>> actions (which seem to happen immediately) or startup actions (slower 
>> but hopefully still very fast).
>>
>> When we can do this AOT initialization optimization for some class C, 
>> it is very powerful.  It allows us to include AOT graph objects at 
>> will, from which C is reachable, and not worry about the timing of 
>> their adoption.  It is as if all such C were initialized very, very 
>> quickly, at JVM startup, before any other action happens.
>>
>>> The process of finding all the "AOT-initialized classes" needed by 
>>> JEP 483:
>>>
>>> - Perform a training run with an application that uses many lambda 
>>> expressions
>>>
>>> - Create an AOT cache using data from this training run. Look for 
>>> warnings produced by CDSHeapVerifier:
>>>
>>> Archive heap points to a static field that may be reinitialized at runtime:
>> I have a problem with this formulation:  We never set out to 
>> “reinitialize” any class, so “may be reinitialized” is a 
>> counter-factual statement.  We only set out to AOT-initialize classes 
>> if the user’s observation of them sees the classes to be initialized 
>> exactly once, but very, very quickly.
>>
>> I would prefer to rephrase the error message (and underlying concept) 
>> as:
>>
>> “Archive heap points to a static field that requires post-AOT 
>> initialization”
>>
>> (or “startup initialization” or some such).  Situations which tempt 
>> us to think of reinitialization are situations where we are creating 
>> bootstrapping bugs.
>>
>>> Field: java/lang/invoke/SimpleMethodHandle::BMH_SPECIES
>>> Value: java.lang.invoke.BoundMethodHandle$SpeciesData {0x000000060e8962f0} - klass: 'java/lang/invoke/BoundMethodHandle$SpeciesData' - flags:
>>> [...]
>>> --- trace begin ---
>>> [ 0] {0x000000060f410170} [Ljava.lang.Object; @[238]
>>> [ 1] {0x000000060f430520} java.lang.invoke.BoundMethodHandle$Species_L::type (offset = 16)
>>> [ 2] {0x000000060e8ab130} java.lang.invoke.MethodType::form (offset = 20)
>>> [ 3] {0x000000060e883b70} java.lang.invoke.MethodTypeForm::lambdaForms (offset = 28)
>>> [ 4] {0x000000060e883b90} [Ljava.lang.Object; @[7]
>>> [ 5] {0x000000060e89a428} java.lang.invoke.LambdaForm::names (offset = 32)
>>> [ 6] {0x000000060e89a3b0} [Ljava.lang.invoke.LambdaForm$Name; @[0]
>>> [ 7] {0x000000060e897fb8} java.lang.invoke.LambdaForm$Name::constraint (offset = 24)
>>> [ 8] {0x000000060e896ab8} java.lang.invoke.BoundMethodHandle$SpeciesData::this$0 (offset = 40)
>>> [ 9] {0x000000060e895aa0} java.lang.invoke.BoundMethodHandle$Specializer::topSpecies (offset = 44)
>>> [10] {0x000000060e8962f0} java.lang.invoke.BoundMethodHandle$SpeciesData
>>> --- trace end ---
>>>
>>> In this example, we see that a cached object (0x000000060e895aa0) 
>>> points to {0x000000060e8962f0}, which is in the static field 
>>> SimpleMethodHandle::BMH_SPECIES.
>>>
>>> To get rid of this warning, we add SimpleMethodHandle to the list in 
>>> AOTClassInitializer::can_archive_initialized_mirror [7].
>>>
>>> - Create the AOT cache again. You may see new warnings because the 
>>> mirror of SimpleMethodHandle may point to the static fields of other 
>>> classes.
>>>
>>> - Keep doing this until you can no longer see these warnings.
>> This is a good process.  I think we want to build in enough bootstrap 
>> checking logic into the JVM so that a maintainer who breaks this 
>> discipline will hear about it quickly, without having to enable 
>> warnings and then sift through them.
>>
>> I am beginning to think that we should experiment with annotations on 
>> java.base classes that express our intention to AOT-initialize them, 
>> with VM checks (at least in debug builds) that ensure this is safe to 
>> do.
>>
>>> *(E) List of aot-initialized classes*
>>>
>>> For a training run like "javac HelloWorld.java", we can produce an 
>>> AOT cache that contains the following 22 aot-initialized classes:
>>>
>>> java.lang.constant.ConstantDescs
>>> java.lang.constant.DynamicConstantDesc
>>> java.lang.Enum
>>> java.lang.invoke.BoundMethodHandle
>>> java.lang.invoke.BoundMethodHandle$Specializer
>>> java.lang.invoke.BoundMethodHandle$Species_L
>>> java.lang.invoke.BoundMethodHandle$Species_LL
>>> java.lang.invoke.ClassSpecializer
>>> java.lang.invoke.ClassSpecializer$1
>>> java.lang.invoke.ClassSpecializer$Factory
>>> java.lang.invoke.ClassSpecializer$SpeciesData
>>> java.lang.invoke.DelegatingMethodHandle
>>> java.lang.invoke.DirectMethodHandle
>>> java.lang.invoke.DirectMethodHandle$Holder
>>> java.lang.invoke.LambdaForm
>>> java.lang.invoke.LambdaForm$NamedFunction
>>> java.lang.invoke.MethodHandle
>>> java.lang.invoke.MethodType$AOTHolder
>>> java.lang.invoke.SimpleMethodHandle
>>> java.lang.Object
>>> jdk.internal.constant.PrimitiveClassDescImpl
>>> jdk.internal.constant.ReferenceClassDescImpl
>>>
>>>
>>> Plus the following 7 enum types that have customized <clinit> code:
>>>
>>> java.lang.constant.DirectMethodHandleDesc$Kind
>>> java.lang.invoke.LambdaForm$BasicType
>>> java.lang.invoke.VarHandle$AccessMode
>>> java.lang.invoke.VarHandle$AccessType
>>> java.lang.reflect.AccessFlag$Location
>>> java.util.stream.StreamOpFlag
>>> sun.invoke.util.Wrapper
>>>
>>>
>>> *(F) List of Init-at-JVM-start classes*
>>>
>>> During the production run, these classes are loaded into the VM in 
>>> the "initialized" state. As a result, the static fields of these 
>>> classes become reachable. We must initialize the classes of all 
>>> objects that are reachable from these static fields. There are 24 
>>> such classes.
>> I am not following here.  If the class is in an initialized state, is 
>> there some process that happens later which runs the clinit again?  
>> (Is this the dreaded “reinitialization”?)  If we allow this, it is 
>> technical debt, and we have to have a plan to eliminate it, lest it 
>> eventually become visible.  (We can and must perform some kinds of 
>> non-standard actions to boot up the first few classes, but they need 
>> to be minimized.)
>
>
> For an example like this:
>
> /* @aot-initialized */ class A {
>     static final B b = new B();
> }
>
> A is AOT-initialized, but B is not. When class A is loaded into the 
> JVM, A.b becomes reachable from Java code. Therefore, we must 
> initialize B, which is reachable from A's mirror.
>
> We are not "re-initializing B" in the sense of B's mirror already 
> containing some non-default values before B.<clinit> is executed. 
> Instead, when B.<clinit> is executed, B's static fields are all zeros 
> and nulls.
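
That B.<clinit> starts from all-default statics is ordinary JVM behavior, and it can be observed by reading a static field from inside the initializer before that field's own initializer has run. `Bee` is a hypothetical class for illustration.

```java
// Hypothetical class: the first static initializer reads `value` before its
// own initializer has assigned it, so it observes the default value (null).
final class Bee {
    static final String seenAtStart = String.valueOf(peek());
    static Object value = new Object();

    private static Object peek() { return value; }
}
```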
>>
>>> java.lang.ArithmeticException
>>> java.lang.ArrayIndexOutOfBoundsException
>>> java.lang.ArrayStoreException
>>> java.lang.Class
>>> java.lang.ClassCastException
>>> java.lang.Double
>>> java.lang.Float
>>> java.lang.Integer
>>> java.lang.InternalError
>>> java.lang.NullPointerException
>>> java.lang.invoke.BoundMethodHandle$Specializer$Factory
>>> java.lang.invoke.BoundMethodHandle$SpeciesData
>>> java.lang.invoke.DirectMethodHandle$Accessor
>>> java.lang.invoke.DirectMethodHandle$Constructor
>>> java.lang.invoke.Invokers
>>> java.lang.invoke.LambdaForm$Name
>>> java.lang.invoke.LambdaFormEditor$Transform
>>> java.lang.invoke.MemberName
>>> java.lang.invoke.MemberName$Factory
>>> java.lang.invoke.MethodHandleImpl$IntrinsicMethodHandle
>>> java.lang.invoke.MethodType
>>> java.lang.invoke.MethodTypeForm
>>> java.util.EnumMap
>>> java.util.concurrent.ConcurrentHashMap
>>>
>>> (E.g., MethodType$AOTHolder contains a HashMap that stores many 
>>> MethodTypes, so we must initialize the MethodType class. Note that 
>>> HashMap is not in the list because it doesn't have a <clinit>. For 
>>> clarity, I have omitted all classes that can be trivially initialized.)
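
Given the AOT-initialized set, a list like (F) can be derived mechanically. The sketch below assumes the class-level reachability from each class's static fields has already been computed (`staticRefs`) and that the set of classes with a non-trivial <clinit> is known (`hasClinit`); all names are hypothetical.

```java
import java.util.*;

final class StartupList {
    // aotInited:  classes archived in the "initialized" state
    // staticRefs: for each class, the classes reachable from its static fields
    // hasClinit:  classes with a non-trivial <clinit>
    static Set<String> startupInitialized(Set<String> aotInited,
                                          Map<String, Set<String>> staticRefs,
                                          Set<String> hasClinit) {
        Set<String> result = new TreeSet<>();
        for (String c : aotInited) {
            for (String r : staticRefs.getOrDefault(c, Set.of())) {
                // AOT-inited and trivially initializable classes need no entry.
                if (!aotInited.contains(r) && hasClinit.contains(r)) {
                    result.add(r);
                }
            }
        }
        return result;
    }
}
```

For the MethodType$AOTHolder example, HashMap drops out because it has no <clinit>, leaving MethodType on the start-up list.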
>> The only thing that might stop us from AOT-initializing a class with 
>> no clinit would be a superclass (or interface) that has a clinit 
>> which is not AOT initializable.
>>
>> Connecting back to the reachability concept:  As soon as some X in 
>> the AOT heap becomes reachable (in the normal GC sense), then all 
>> classes C reachable from X must be initialized (or perhaps “started” 
>> in the current thread).  It would be best, of course, if those 
>> classes were all AOT initialized.  If not, we need a story to ensure 
>> they are startup initialized (if not AOT initialized).
>>
>> I think this means that if some class C is AOT initialized, then its 
>> super S must also be AOT initialized, or else any access to an 
>> instance X which can reach C must be organized so that the startup 
>> initialization of S gets run first.
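
A coarse version of this supertype condition is easy to check statically. The sketch below (hypothetical names; interfaces ignored) reports AOT-initialized classes whose direct superclass is neither AOT-initialized nor in the startup-initialized set. It does not model the finer alternative of ordering accesses so that S's startup initialization runs first.

```java
import java.util.*;

final class SuperCheck {
    // superOf:       direct superclass of each class (absent for java.lang.Object)
    // aotInited:     classes archived in the "initialized" state
    // startupInited: classes whose <clinit> is run at JVM start-up
    static List<String> violations(Map<String, String> superOf,
                                   Set<String> aotInited,
                                   Set<String> startupInited) {
        List<String> bad = new ArrayList<>();
        for (String c : aotInited) {
            String s = superOf.get(c);
            if (s != null && !aotInited.contains(s) && !startupInited.contains(s)) {
                bad.add(c); // C could be observed before its super S is initialized
            }
        }
        return bad;
    }
}
```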
>>
>> I think our current state of the art is that we run such startup 
>> initializers ASAP, and test that there are no bugs.  It would be good 
>> to have more accurate checks as well, if we can define them suitably.
>>
>>> (G) Testing and Validation
>>>
>>> The output "Archive heap points to a static field that may be 
>>> reinitialized" is checked by more than 300 CDS test cases [8]. This 
>>> ensures that we have a correct list of AOT-initialized classes. This 
>>> will also catch any future changes in the Java classes in 
>>> java.lang.invoke that may be incompatible with JEP 483.
>>>
>>> Also, since the (E) and (F) lists are small (about 50 classes), it's 
>>> possible for a human to examine those classes to find potential 
>>> problems.
>>>
>>> - For example, an AOT-initialized class shouldn't be dependent on 
>>> the environment, such as storing the current time of day, etc.
>> I agree that human examination is sufficient to start with, even to 
>> detect environmental dependencies.  This is what we should ship 
>> with.  But we should build better detection tools soon, so that 
>> routine enhancements don’t run afoul of those checks (the dreaded 
>> “reinitialization”).
>>
>>> *(H) Extensibility, or lack thereof*
>>>
>>> This design limits future Leyden optimization as any significant 
>>> increase in the list of classes in (E) and (F) will make human 
>>> validation much more difficult. We should consider using automated 
>>> validation tools, as well as refactoring the Java classes to make it 
>>> easier to decide what classes can be AOT-initialized.
>> Yes!  Preach it!
>>
>> — John
>>
>>>
>>> =======
>>>
>>> [1] https://openjdk.org/jeps/483
>>>
>>> [2] Due to <clinit> circularity, it may be possible to create an 
>>> instance of X before X::<clinit> finishes.
>>>
>>> [3] 
>>> https://github.com/openjdk/jdk/blob/602408e4f3848b30299ea94264e88ead5361a310/src/java.base/share/classes/java/lang/Integer.java#L957-L969
>>>
>>> [4] 
>>> https://mail.openjdk.org/pipermail/leyden-dev/2024-August/000911.html
>>>
>>> [5] 
>>> https://github.com/openjdk/leyden/blob/3a84df9d9860e743684e335282e3910b14cc982b/src/hotspot/share/cds/heapShared.cpp#L1415-L1466
>>>
>>> [6] 
>>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/cds/cdsHeapVerifier.cpp
>>>
>>> [7] 
>>> https://github.com/iklam/jdk/blob/49eb47b6a9889625dc8bac6922cec6a5625a26b2/src/hotspot/share/cds/aotClassInitializer.cpp#L178-L192
>>>
>>> [8] 
>>> https://github.com/iklam/jdk/blob/49eb47b6a9889625dc8bac6922cec6a5625a26b2/test/lib/jdk/test/lib/cds/CDSTestUtils.java#L289

