[crac] RFR: [CRaC] Support checkpointing with --patch-module by treating patch JARs as persistent

mazhen duke at openjdk.org
Mon Aug 4 18:42:15 UTC 2025


On Wed, 16 Jul 2025 06:31:34 GMT, Timofei Pushkin <tpushkin at openjdk.org> wrote:

>> #### Summary
>> 
>> This change enhances CRaC's checkpointing capabilities to correctly handle applications launched with the `--patch-module` option. Previously, using a JAR file with `--patch-module` would cause checkpointing to fail due to unmanaged file descriptors. This patch resolves the issue by treating these JARs as persistent resources, similar to how classpath JARs are handled.
>> 
>> #### Problem
>> 
>> When a JAR file is specified via `--patch-module`, it is opened by two distinct mechanisms within the JVM:
>> 
>> 1.  **Native `ClassLoader`**: The C++ `ClassLoader` implementation opens the JAR file to build its internal class path structures. This results in a file descriptor that is not managed by CRaC's Java-level resource tracking, leading to it being flagged as a "BAD: opened by application" file during `check_fds`.
>> 2.  **Java `ModulePatcher`**: The Java-level `jdk.internal.module.ModulePatcher` also opens the same JAR to scan for packages and resources. While this could be made CRaC-aware, the native-level file descriptor remains an issue.
>> 
>> This dual-opening mechanism leads to `CheckpointOpenFileException` being thrown for both file descriptors during a checkpoint, preventing a successful image creation. The log output clearly illustrates this problem:
>> 
>> 
>> $ jcmd 3403046 JDK.checkpoint
>> 3403046:
>> ...
>> JVM: FD fd=4 type=regular path="/home/mazhen/works/patch-demo/legacy-patch.jar" BAD: opened by application
>> JVM: FD fd=5 type=regular path="/home/mazhen/works/patch-demo/legacy-patch.jar" OK: claimed by java code
>> ...
>> An exception during a checkpoint operation:
>> jdk.internal.crac.mirror.CheckpointException
>>     Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenFileException: legacy-patch.jar
>>         at java.base/jdk.internal.crac.JDKFileResource.lambda$beforeCheckpoint$0(JDKFileResource.java:90)
>>         ...
>>     Caused by: java.lang.Exception: This file descriptor was created by main at epoch:1752027938791 here
>>         at java.base/jdk.internal.crac.JDKFdResource.<init>(JDKFdResource.java:40)
>>         ...
>>         at java.base/jdk.internal.module.ModulePatcher$JarResourceFinder.<init>(ModulePatcher.java:433)
>>         ...
>>     Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenFileException: FD fd=4 type=regular path=/home/mazhen/works/patch-demo/legacy-patch.jar
>>         at java.base/jdk.internal.crac.mirror.Core.translateJVMExceptions(Core.java:115)
>>         ...
>> 
>> 
>> #### Solution
>> 
>> The fundamental...
>
> src/hotspot/share/classfile/classLoader.cpp line 763:
> 
>> 761:   GrowableArray<int> fds;
>> 762:   assert(Thread::current()->is_VM_thread(), "should be called from VM op");
>> 763:   if (_patch_mod_entries == NULL) {
> 
> Suggestion:
> 
>   if (_patch_mod_entries == nullptr) {

Thanks! Good catch. Using nullptr is definitely better practice. I'll make the change.

> src/hotspot/share/classfile/classLoader.cpp line 766:
> 
>> 764:     return fds;
>> 765:   }
>> 766:   // we don't use mutexes here because it is called from VM op
> 
> As I see it, all patch module entries are opened once at VM initialization (in `ClassLoader::setup_patch_mod_entries()`) and checkpoint cannot be called before then. If that is so, there should be no VM thread / mutex requirements to read it.

That's a great point. Your analysis of the `_patch_mod_entries` lifecycle makes perfect sense. Since it's effectively read-only after initialization, the VM thread assertion isn't necessary. I'll remove it and the related comment. Thanks for spotting this!

> test/jdk/jdk/crac/fileDescriptors/PatchModuleTest.java line 140:
> 
>> 138:         return (String) patcherClass.getMethod("getInfo").invoke(null);
>> 139:     }
>> 140: }
> 
> I haven't looked through the test but if you will be applying Radim's suggestions please also add a newline at the end of file

Good catch, thanks! I've pushed a commit with the added newline.

I've also left a comment on Radim's suggestion explaining why I'm sticking with the criu engine for this test to ensure a full physical restore is validated.

Appreciate the review!

-------------

PR Review Comment: https://git.openjdk.org/crac/pull/241#discussion_r2215129538
PR Review Comment: https://git.openjdk.org/crac/pull/241#discussion_r2215132619
PR Review Comment: https://git.openjdk.org/crac/pull/241#discussion_r2218096036


More information about the crac-dev mailing list