<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>Hello ma zhen,</p>
<p>Apologies for the late response.</p>
<p>In general, both FD policies and CRaCAllowedOpenFilePrefixes are
really workarounds for apps that don't adhere to CRaC
requirements, rather than proper solutions. But let's talk about
the problems individually:</p>
<p>1) When it comes to getAvailableProcessors(), I think that opening
the cgroups info is an implementation detail, and the CRaC JVM should
handle that transparently. There should be a hook (either in Java
code or in native code, whichever is less intrusive) that makes the
file access and C/R mutually exclusive. We will gladly accept a PR
(with a test case, please).</p>
<p>2) Listing files is an interaction with the environment, and the
application should stop doing that during C/R. Your observation about
FD policies makes sense; in fact, in this case there is no resource
that could be linked into the FD policies. We would have to
explicitly synchronize with C/R, and that would be expensive on
such a common function. From a practical point of view, I understand
that you can't easily modify the 3rd party library, and I am glad that
it works for you. Note, though, that CRaCAllowedOpenFilePrefixes
basically relies on the C/R engine to handle that FD correctly. If
you attempt to restore on a system that does not host this
directory, the restore will fail.</p>
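<p>As a minimal sketch of what stopping the listing during C/R could
look like at the application level, assuming you control the code
path that triggers the listing (which is not the case for an
unmodified 3rd party library): the class ListingGate and its method
names are hypothetical, and the two hook methods are meant to be
called from the beforeCheckpoint and afterRestore callbacks of a
registered org.crac.Resource.</p>
<pre>
```java
import java.io.File;
import java.util.concurrent.Semaphore;

/**
 * Hypothetical helper: keeps directory listing and checkpoint mutually
 * exclusive by sharing a single permit between them.
 */
final class ListingGate {
    // One permit: held either by a listing in progress or by the checkpoint.
    private final Semaphore permit = new Semaphore(1);

    /** Lists a directory while holding the permit, so no checkpoint starts mid-call. */
    String[] list(File dir) {
        permit.acquireUninterruptibly();
        try {
            return dir.list();
        } finally {
            permit.release();
        }
    }

    /** Call from Resource.beforeCheckpoint: waits for an in-flight listing to finish. */
    void beforeCheckpoint() {
        permit.acquireUninterruptibly();
    }

    /** Call from Resource.afterRestore: lets listings proceed again. */
    void afterRestore() {
        permit.release();
    }

    /** True while a listing or a checkpoint currently holds the permit. */
    boolean isClosed() {
        return permit.availablePermits() == 0;
    }
}
```
</pre>
<p>A Semaphore is used here instead of a lock because it has no owner
thread, so the sketch does not depend on both callbacks running on
the same thread.</p>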
<p>Technically, getAvailableProcessors() is also an interaction
with the 'environment', that is, with the machine it is currently
running on, but the world is not black and white, and my opinion is
that this should be transparent.</p>
<p>Radim</p>
<div class="moz-cite-prefix">On 11/14/25 09:01, ma zhen wrote:<br>
</div>
<blockquote type="cite" cite="mid:CA+U33_Nx=Wjxn+Vx99hwhKaowFM_p4KSPvf9CY0KQtV=dNeVDw@mail.gmail.com">
<div>
<div dir="ltr">
<div dir="ltr">
<div>Hi everyone,</div>
<div><br>
</div>
<div>Following up on my own question, I believe I've found a
suitable solution and wanted to share it for the archives.</div>
<div><br>
</div>
<div>The issue was resolved using the VM option
`-XX:CRaCAllowedOpenFilePrefixes`. This option lets you
specify a comma-separated list of path prefixes that CRaC
should ignore if they are found open during a checkpoint.</div>
<div><br>
</div>
<div>(Reference: <a href="https://docs.azul.com/crac/usage/vm-options" moz-do-not-send="true" class="moz-txt-link-freetext">https://docs.azul.com/crac/usage/vm-options</a>)</div>
<div><br>
</div>
<div>Crucially, this option works for files opened by native
code (e.g., via JNI or internal JVM functions), which is
what makes it a perfect solution for my original problem.
This is why it can handle the file descriptors that were
not manageable through standard CRaC resource
policies.</div>
<div><br>
</div>
<div>This directly addresses the two scenarios I described:</div>
<div><br>
</div>
<div>1. For the cgroup file opened by
`OperatingSystemMXBean`, I can now add</div>
<div> `/sys/fs/cgroup/` to the allowed prefixes.</div>
<div><br>
</div>
<div>2. For the directory descriptor held open by the native
implementation of</div>
<div> `File.list`, adding the application's base path
works perfectly.</div>
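<div><br>
</div>
<div>To illustrate the idea of a comma-separated prefix list, here
is a minimal sketch of how such a list could be matched against the
path of an open file. It is only an illustration and not the JVM's
actual implementation; the class AllowedPrefixes is hypothetical,
and `/opt/app/` in the comment stands in for the application's base
path.</div>
<pre>
```java
/**
 * Hypothetical illustration of prefix matching; not taken from the JVM sources.
 */
final class AllowedPrefixes {
    /**
     * Returns true if the path starts with any non-empty entry of the
     * comma-separated prefix list, e.g. "/sys/fs/cgroup/,/opt/app/".
     */
    static boolean isAllowed(String prefixList, String path) {
        for (String prefix : prefixList.split(",")) {
            String trimmed = prefix.trim();
            if (!trimmed.isEmpty()) {
                if (path.startsWith(trimmed)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```
</pre>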
<div><br>
</div>
<div>This provides a much more robust solution than retrying
the checkpoint. I hope this is helpful for anyone else
running into similar issues.</div>
<div><br>
</div>
<div>Best regards,</div>
<div>mazhen</div>
</div>
</div>
<br>
<div class="gmail_quote gmail_quote_container">
<div dir="ltr" class="gmail_attr">On Wed, Nov 12, 2025 at 17:29, ma zhen &lt;<a href="mailto:mz1999@gmail.com" moz-do-not-send="true" class="moz-txt-link-freetext">mz1999@gmail.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div>Hi everyone,</div>
<div><br>
</div>
<div>I'm encountering a CheckpointException when
creating a checkpoint image</div>
<div>with CRaC. The root cause is that the
application holds file descriptors</div>
<div>for files or directories.</div>
<div><br>
</div>
<div>
<div>Our application is quite complex, and
after some investigation, I've found </div>
<div>that these files/directories are being
opened by third-party libraries. </div>
<div>The challenge is that they are not opened
through regular file I/O APIs, </div>
<div>which makes it impossible to handle them
using File Descriptor Policies.</div>
</div>
<div><br>
</div>
<div>I've identified two specific scenarios:</div>
<div><br>
</div>
<div>1. A third-party library periodically
fetches system resource information,</div>
<div> which includes calling
`OperatingSystemMXBean.getAvailableProcessors`.</div>
<div><br>
</div>
<div> When the JVM determines the number of
available CPU cores, if it detects</div>
<div> that cgroups are available, it will read
the resource limit file</div>
<div> `cpu.cfs_quota_us`, even if the process
is not in a container.</div>
<div> The specific implementation logic can be
found in cgroupV1Subsystem_linux.cpp:</div>
<div> (<a href="https://github.com/openjdk/crac/blob/crac/src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://github.com/openjdk/crac/blob/crac/src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp</a>)</div>
<div><br>
</div>
<div> If a checkpoint is triggered at this
exact moment, an exception</div>
<div> similar to the following occurs:</div>
<div><br>
</div>
<div> Suppressed:
jdk.internal.crac.mirror.impl.CheckpointOpenFileException:
FD fd=57 type=regular
path=/sys/fs/cgroup/cpu,cpuacct/user.slice/cpu.cfs_quota_us</div>
<div> at
java.base/jdk.internal.crac.mirror.Core.translateJVMExceptions(Core.java:115)</div>
<div> at
java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(Core.java:189)</div>
<div> at
java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:315)</div>
<div> at
java.base/jdk.internal.crac.mirror.Core.checkpointRestoreInternal(Core.java:328)</div>
<div><br>
</div>
<div>2. For some reason, a third-party library
periodically calls `File.list`</div>
<div> to get the list of files in a specific
directory.</div>
<div><br>
</div>
<div> On Linux, the `list` method eventually
calls the JNI method</div>
<div> `Java_java_io_UnixFileSystem_list` which
holds a directory file</div>
<div> descriptor during its execution. This is
defined in UnixFileSystem_md.c:</div>
<div> (<a href="https://github.com/openjdk/crac/blob/crac/src/java.base/unix/native/libjava/UnixFileSystem_md.c" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://github.com/openjdk/crac/blob/crac/src/java.base/unix/native/libjava/UnixFileSystem_md.c</a>)</div>
<div><br>
</div>
<div> Similarly, if a checkpoint is triggered
at this moment, an exception</div>
<div> like the one below is thrown:</div>
<div><br>
</div>
<div>
jdk.internal.crac.mirror.CheckpointException</div>
<div> Suppressed:
jdk.internal.crac.mirror.impl.CheckpointOpenFileException:
FD fd=46 type=directory
path=.../WEB-INF/classes/WEB-INF/services</div>
<div> at
java.base/jdk.internal.crac.mirror.Core.translateJVMExceptions(Core.java:115)</div>
<div> at
java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(Core.java:189)</div>
<div> at
java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:315)</div>
<div> at
java.base/jdk.internal.crac.mirror.Core.checkpointRestoreInternal(Core.java:328)</div>
<div><br>
</div>
<div><br>
</div>
<div>In both situations, if a checkpoint
coincides with the execution of these</div>
<div>periodic tasks, the checkpoint is likely to
fail.</div>
<div><br>
</div>
<div>My current workaround is to attempt the
checkpoint multiple times, as it</div>
<div>will eventually succeed. While this allows
me to bypass the issue, I would</div>
<div>like to know if there is a better
solution.</div>
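<div><br>
</div>
<div>For reference, the retry workaround can be sketched roughly as
follows. This is a simplified illustration rather than our actual
code: the class CheckpointRetry and the Attempt interface are
hypothetical, with Attempt standing in for the real checkpoint call
(for example a wrapper around org.crac.Core.checkpointRestore()).</div>
<pre>
```java
/**
 * Hypothetical sketch of retrying a checkpoint that fails transiently,
 * e.g. because a file descriptor happened to be open at that moment.
 */
final class CheckpointRetry {
    /** Stand-in for the real checkpoint call. */
    interface Attempt {
        void run() throws Exception;
    }

    /**
     * Runs the attempt until it succeeds or the limit is reached.
     * Returns the number of attempts used; throws if every attempt failed.
     */
    static int retry(Attempt attempt, int maxAttempts, long pauseMillis) {
        int limit = Math.max(1, maxAttempts);
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                attempt.run();
                return attempts;
            } catch (Exception e) {
                if (attempts == limit) {
                    throw new IllegalStateException(
                            "checkpoint failed after " + attempts + " attempts", e);
                }
                pause(pauseMillis);
            }
        }
    }

    /** Waits before the next attempt so the periodic task can finish. */
    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
    }
}
```
</pre>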
<div><br>
</div>
<div>Thank you.</div>
<div><br>
</div>
<div>
<div>Best regards,</div>
<div>mazhen</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
</body>
</html>