RFR: 8365932: Implementation of JEP 516: Ahead-of-Time Object Caching with Any GC
Erik Österlund
eosterlund at openjdk.org
Thu Oct 9 15:45:21 UTC 2025
This is the implementation of JEP 516: Ahead-of-Time Object Caching with Any GC.
The AOT cache's current mechanism for caching heap objects is to mmap bytes from a file directly into the GC-managed heap. This poses compatibility challenges: all GCs must use bit-by-bit identical object and reference formats, because the layout decisions are made offline. So far this has meant that AOT cache optimizations requiring heap objects are not available when using ZGC. This work ensures that all GCs, including ZGC, are able to use the more advanced AOT cache functionality going forward.
This JEP introduces a new mechanism for archiving a primordial heap, without such compatibility problems. It embraces online layouts and allocates objects one by one, linking them using the Access API, like normal objects. This way, archived objects quack like any other object to the GC, and the GC implementations are decoupled from the archiving mechanism.
The key to this GC-agnostic object loading is to represent references between objects as object indices (e.g. 1, 2, 3) instead of raw pointers that we hope all GCs will recognise identically. These object indices become the primary way of identifying objects. One table maps object indices to archived objects, and another table maps object indices to heap objects that have been allocated at runtime. This allows online linking of the materialized heap objects.
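To make the idea concrete, here is a small standalone C++ sketch of the two tables and the index-based linking. The names (archived_at, materialized_at, link_fields, and so on) are made up for illustration; the real code links references through the Access API on oops rather than plain pointers.

    // Minimal standalone sketch (not HotSpot code) of index-based linking.
    #include <cstdint>
    #include <vector>

    struct ArchivedObject;               // payload as stored in the AOT cache
    struct HeapObject;                   // object allocated in the GC heap at runtime

    // Table 1: object index -> archived object (read-only, from the cache).
    static std::vector<ArchivedObject*> archived_at;

    // Table 2: object index -> heap object materialized at runtime
    // (nullptr until the object has been allocated and linked).
    static std::vector<HeapObject*> materialized_at;

    struct HeapObject {
      std::vector<HeapObject*> fields;       // reference fields, linked online
    };

    struct ArchivedObject {
      std::vector<uint32_t> field_indices;   // references stored as object indices
    };

    // Link a materialized object: translate each stored object index into a
    // reference to the corresponding runtime heap object.
    void link_fields(uint32_t idx) {
      ArchivedObject* a = archived_at[idx];
      HeapObject* h = materialized_at[idx];
      h->fields.clear();
      for (uint32_t ref_idx : a->field_indices) {
        h->fields.push_back(materialized_at[ref_idx]);
      }
    }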
The main interface to the cached heap is roots. Different components can register object roots at dump time, and each root is assigned a root index. At runtime, requests can be made to get a reference to the object at a given root index. The new implementation uses lazy materialization and concurrency. When a thread asks for a root object, it must ensure that the root object and all objects transitively reachable from it have been materialized. A new background thread, the AOTThread, tries to perform the bulk of the work, so that the cost of processing the objects one by one does not impact the bootstrapping thread's startup.
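The shape of the root interface is roughly the following; this is a sketch with illustrative names, not the actual HotSpot API.

    #include <cstdint>
    #include <vector>

    struct HeapObject;

    class RootRegistry {
      std::vector<HeapObject*> _roots;   // root index -> materialized root, or nullptr
    public:
      // Dump time: register an object graph root and receive its root index.
      uint32_t register_root(HeapObject* obj) {
        _roots.push_back(obj);
        return (uint32_t)(_roots.size() - 1);
      }

      // Runtime: a component asks for the object at a root index. If the
      // background thread has not reached it yet, the caller materializes the
      // root and everything transitively reachable from it before returning.
      HeapObject* get_root(uint32_t root_index) {
        if (_roots[root_index] == nullptr) {
          materialize_transitively(root_index);   // traversing materialization
        }
        return _roots[root_index];
      }

    private:
      void materialize_transitively(uint32_t root_index);
    };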
Since the background thread performs the bulk of the work, the archive is laid out to ensure it can run as fast as possible.
Objects are laid out in DFS pre-order over the roots in the archive, such that the object indices and the DFS traversal order are the same. This way, the DFS traversal that the background thread performs is the same as linearly materializing the objects one by one in the order they are laid out in the archive. I call this iterative materialization, as opposed to the traversing materialization used by the application threads when necessary.
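Iterative materialization can then be pictured as a plain loop over the object indices, with no traversal stack. This is only a sketch under that assumption; allocate_and_copy and link_fields are hypothetical helpers, and the sketch allocates everything first and links afterwards for simplicity.

    // Because objects are laid out in DFS pre-order over the roots, a linear
    // pass over the object indices visits exactly the objects a DFS traversal
    // from the roots would visit, in the same order.
    #include <cstdint>

    extern uint32_t num_archived_objects();
    extern void allocate_and_copy(uint32_t obj_index);  // allocate the heap object, copy its payload
    extern void link_fields(uint32_t obj_index);        // translate stored object indices into references

    void materialize_iteratively() {
      uint32_t n = num_archived_objects();
      for (uint32_t i = 0; i < n; i++) {
        allocate_and_copy(i);                           // iterative: no stack or recursion needed
      }
      for (uint32_t i = 0; i < n; i++) {
        link_fields(i);
      }
    }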
Background materialization is performed in batches. Each batch covers N roots, and when processing a batch the background thread ensures that these roots are transitively materialized. To help with this, there is a table that records, for each root index, the maximum DFS index transitively reachable from that root. This allows each batch being processed by the background thread to define a range of objects currently being materialized: the first object and the last object of the batch.
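With such a table, computing the object range of a batch is a simple min/max over the roots in the batch. Sketch only; the table names are made up.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct BatchRange {
      uint32_t first_obj;   // lowest object index materialized by this batch
      uint32_t last_obj;    // highest object index materialized by this batch
    };

    // root index -> object index of the root itself
    extern std::vector<uint32_t> root_obj_index;
    // root index -> max DFS object index transitively reachable from that root
    extern std::vector<uint32_t> max_reachable_index;

    BatchRange compute_batch_range(uint32_t first_root, uint32_t num_roots) {
      BatchRange r{UINT32_MAX, 0};
      for (uint32_t i = first_root; i < first_root + num_roots; i++) {
        r.first_obj = std::min(r.first_obj, root_obj_index[i]);
        r.last_obj  = std::max(r.last_obj,  max_reachable_index[i]);
      }
      return r;
    }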
This allows cheaper synchronization across threads. Each batch is claimed under a lock, but the actual processing of the batch may be performed without the lock, allowing the application thread(s) to perform traversals concurrently without stepping on each other's toes. When an application thread traverses from a root, it can easily determine, for each visited object, whether the object has already been transitively materialized by the background thread (it is before the current batch range), implying no further traversal is needed; whether it is ahead of the current batch, implying traversal is needed; or whether it falls within the batch range currently being materialized, implying we should wait for that batch to finish. Intersections are rare, and for the most part the threads can run independently.
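The classification an application thread performs for a visited object then boils down to comparing its object index against the claimed batch range. Illustrative sketch, not the actual code.

    #include <cstdint>

    enum class Action {
      Skip,         // already transitively materialized by the background thread
      Materialize,  // ahead of the current batch; the application thread handles it
      WaitForBatch  // inside the batch currently being materialized; wait for it
    };

    Action classify(uint32_t obj_index, uint32_t batch_first, uint32_t batch_last) {
      if (obj_index < batch_first) {
        return Action::Skip;           // behind the batch: already done
      }
      if (obj_index > batch_last) {
        return Action::Materialize;    // ahead of the batch: traverse it ourselves
      }
      return Action::WaitForBatch;     // intersects the in-flight batch: rare case
    }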
Unlike the mapping solution, the new solution does not dump the entire string table. Instead, interned strings in the archive are marked as interned, and the loader knows when unpacking them that the string should be interned. An application thread might call intern before or after, but it does not matter; in both cases all references will resolve to the interned string, whichever copy that turns out to be.
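Conceptually, unpacking an interned string looks like this; string_table_intern and is_marked_interned are made-up names standing in for the real string table interface.

    #include <cstdint>

    struct HeapObject;

    extern HeapObject* string_table_intern(HeapObject* s);  // returns the canonical copy
    extern bool        is_marked_interned(uint32_t idx);

    HeapObject* materialize_string(uint32_t idx, HeapObject* unpacked) {
      if (is_marked_interned(idx)) {
        // If the application already interned an equal string, all references
        // link to that one; otherwise the unpacked copy becomes the canonical
        // interned string. Either way, every link resolves to the same object.
        return string_table_intern(unpacked);
      }
      return unpacked;
    }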
In order to speed up materialization, before the GC is allowed to run, the materialization code maps object indices to objects using raw oops. It also copies the archived object payload to the heap object (including the object indices in the reference slots) and then fixes up the references in place. This is only okay before the GC is allowed to run. After that point in the bootstrapping, if materialization has not yet completed, a more careful strategy is used where primitive ranges are copied in bulk but object references are copied one by one, so that the heap object never has reference fields transiently containing garbage object indices that could trip up a concurrent GC.
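Conceptually, the two copy strategies look like this. All helper names are hypothetical; the real code operates on oops and uses GC-safe stores for the reference slots.

    #include <cstdint>
    #include <cstring>

    extern bool   gc_may_run();                       // has the GC been allowed to start?
    extern void*  archived_payload(uint32_t idx);
    extern void*  heap_payload(uint32_t idx);
    extern size_t payload_size(uint32_t idx);
    // For each reference slot: offset within the payload and the stored object index.
    struct RefSlot { size_t offset; uint32_t target_index; };
    extern RefSlot* ref_slots(uint32_t idx, size_t* count);
    extern void     store_reference(void* heap_obj, size_t offset, uint32_t target_index);
    extern void     copy_primitive_ranges(uint32_t idx);  // copies everything except reference slots

    void copy_object(uint32_t idx) {
      size_t n;
      RefSlot* slots = ref_slots(idx, &n);
      if (!gc_may_run()) {
        // Early bootstrap: bulk-copy the payload, object indices and all, then
        // fix up the reference slots in place. No GC can observe the transient
        // garbage values.
        memcpy(heap_payload(idx), archived_payload(idx), payload_size(idx));
        for (size_t i = 0; i < n; i++) {
          store_reference(heap_payload(idx), slots[i].offset, slots[i].target_index);
        }
      } else {
        // A GC may run concurrently: copy primitive ranges in bulk, but write
        // each reference slot individually so a reference field never
        // transiently contains a raw object index.
        copy_primitive_ranges(idx);
        for (size_t i = 0; i < n; i++) {
          store_reference(heap_payload(idx), slots[i].offset, slots[i].target_index);
        }
      }
    }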
It is not currently intended to remove the previous heap archiving mechanism. Heuristically, we will pick the new mechanism when compressed oops is off. The JDK caches without compressed oops will hence use this new object streaming mechanism, while the other caches will use the old mapping mechanism. The HeapShared class has been used as a common interface between the two heap archive writers and loaders.
A lot of testing has been done with all combinations of GCs, AOTClassLinking, COH, COOPs, streaming/mapping, etc. Existing AOT/CDS tests have been adapted to work with both heap archiving solutions.
-------------
Commit messages:
- whitespace fixes
- 8365932: Implementation of JEP 516: Ahead-of-Time Object Caching with Any GC
Changes: https://git.openjdk.org/jdk/pull/27732/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27732&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8365932
Stats: 8639 lines in 109 files changed: 5870 ins; 2312 del; 457 mod
Patch: https://git.openjdk.org/jdk/pull/27732.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/27732/head:pull/27732
PR: https://git.openjdk.org/jdk/pull/27732