CRaC: CheckpointException with file descriptors from JVM internals and native calls

ma zhen mz1999 at gmail.com
Wed Nov 12 09:29:27 UTC 2025


Hi everyone,

I'm encountering a CheckpointException when creating a checkpoint image
with CRaC. The root cause is that the application holds open file
descriptors for files or directories at the moment of the checkpoint.

Our application is quite complex, and after some investigation, I've found
that these files/directories are being opened by third-party libraries.
The challenge is that they are not opened through regular file I/O APIs,
which makes it impossible to handle them using File Descriptor Policies.

I've identified two specific scenarios:

1. A third-party library periodically fetches system resource information,
   which includes calling `OperatingSystemMXBean.getAvailableProcessors`.

   When the JVM determines the number of available CPU cores, if it detects
   that cgroups are available, it will read the resource limit file
   `cpu.cfs_quota_us`, even if the process is not in a container.
   The specific implementation logic can be found in
   cgroupV1Subsystem_linux.cpp:
   (https://github.com/openjdk/crac/blob/crac/src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp)

   If a checkpoint is triggered at this exact moment, an exception
   similar to the following occurs:

    Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenFileException: FD fd=57 type=regular path=/sys/fs/cgroup/cpu,cpuacct/user.slice/cpu.cfs_quota_us
        at java.base/jdk.internal.crac.mirror.Core.translateJVMExceptions(Core.java:115)
        at java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(Core.java:189)
        at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:315)
        at java.base/jdk.internal.crac.mirror.Core.checkpointRestoreInternal(Core.java:328)

2. For some reason, a third-party library periodically calls `File.list`
   to get the list of files in a specific directory.

   On Linux, the `list` method eventually calls the JNI method
   `Java_java_io_UnixFileSystem_list`, which holds a directory file
   descriptor open for the duration of its execution. This is defined in
   UnixFileSystem_md.c:
   (https://github.com/openjdk/crac/blob/crac/src/java.base/unix/native/libjava/UnixFileSystem_md.c)

   Similarly, if a checkpoint is triggered at this moment, an exception
   like the one below is thrown:

    jdk.internal.crac.mirror.CheckpointException
    Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenFileException: FD fd=46 type=directory path=.../WEB-INF/classes/WEB-INF/services
        at java.base/jdk.internal.crac.mirror.Core.translateJVMExceptions(Core.java:115)
        at java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(Core.java:189)
        at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:315)
        at java.base/jdk.internal.crac.mirror.Core.checkpointRestoreInternal(Core.java:328)


In both situations, if a checkpoint coincides with the execution of these
periodic tasks, the checkpoint is likely to fail.
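
The kind of periodic task described in the two scenarios can be sketched roughly as follows. This is only an illustration, not the actual third-party code: the class name `PeriodicProbe`, the `poll` helper, the polling period, and the directory are all hypothetical. It simply calls `OperatingSystemMXBean.getAvailableProcessors` and `File.list` on a timer, which are the two calls that briefly hold a file descriptor as described above.

```java
import java.io.File;
import java.lang.management.ManagementFactory;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class PeriodicProbe {

    /**
     * Polls system information and a directory listing at a fixed rate
     * for roughly durationMillis, then returns how many polls completed.
     * (Hypothetical helper, for illustration only.)
     */
    public static long poll(File dir, long periodMillis, long durationMillis)
            throws InterruptedException {
        AtomicLong runs = new AtomicLong();
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        scheduler.scheduleAtFixedRate(() -> {
            // Scenario 1: per the analysis above, this may read a cgroup file.
            ManagementFactory.getOperatingSystemMXBean().getAvailableProcessors();
            // Scenario 2: holds a directory descriptor while listing.
            dir.list();
            runs.incrementAndGet();
        }, 0, periodMillis, TimeUnit.MILLISECONDS);

        Thread.sleep(durationMillis);
        scheduler.shutdownNow();
        scheduler.awaitTermination(1, TimeUnit.SECONDS);
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        long runs = poll(new File("."), 50, 1000);
        System.out.println("completed polls: " + runs);
    }
}
```

A checkpoint requested from another thread while one of these polls is in progress would be expected to hit the window in which the descriptor is open.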

My current workaround is to attempt the checkpoint multiple times, as it
will eventually succeed. While this lets me work around the issue, I would
like to know if there is a better solution.
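
A retry loop of this kind might look like the following sketch. The class name `CheckpointRetry`, the `runWithRetry` helper, the attempt limit, and the delay are assumptions made for illustration. The checkpoint call is passed in as a `Callable` so that the example compiles without CRaC on the classpath; in a real application the action would wrap `org.crac.Core.checkpointRestore()` and the catch clause would be narrowed to `org.crac.CheckpointException`.

```java
import java.util.concurrent.Callable;

public class CheckpointRetry {

    /**
     * Runs the action until it succeeds or maxAttempts is reached,
     * sleeping delayMillis between failed attempts.
     * Returns the 1-based attempt number that succeeded.
     */
    public static int runWithRetry(Callable<Void> action,
                                   int maxAttempts,
                                   long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.call();
                return attempt;
            } catch (Exception e) {
                // In real use, catch only org.crac.CheckpointException here.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the checkpoint call: fails twice, then succeeds.
        int[] calls = {0};
        int attempts = runWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated open file descriptor");
            }
            return null;
        }, 5, 10);
        System.out.println("succeeded on attempt " + attempts);
    }
}
```

The delay between attempts gives a short-lived periodic task time to finish and close its descriptor before the next try.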

Thank you.

Best regards,
mazhen