<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p>That's great. Although similar hooks currently don't use RAII, I
      think it is a reliable way to implement this.</p>
    <p>Please make sure that your implementation uses RW locking, so
      that it does not force mutual exclusion and unintended
      synchronization between threads while no checkpoint is in
      progress. Alternatively, it might be possible to mark the entry
      to this section as critical and prevent the VM thread from
      executing the C/R; I am not sure which alternative is more
      lightweight.</p>
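    <p>A minimal sketch of the RW-locking idea, written in Java for
      brevity rather than as HotSpot native code. The names
      CheckpointGate, Guard, enterFileAccess, checkpoint and
      tryCheckpoint are hypothetical and do not exist in CRaC. Code
      that reads the cgroup files takes the shared lock through a scope
      guard, so concurrent readers do not block each other, and only
      the checkpoint takes the exclusive lock.</p>
    <pre>
```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical illustration only; this is not an existing CRaC class. */
final class CheckpointGate {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Scope guard: holds the shared (read) lock until close() is called. */
    final class Guard implements AutoCloseable {
        private boolean released;

        private Guard() {
            lock.readLock().lock();
        }

        @Override
        public void close() {
            // Idempotent, so an explicit close plus try-with-resources is safe.
            if (!released) {
                released = true;
                lock.readLock().unlock();
            }
        }
    }

    /** Entered by code that briefly opens a file; readers never block each other. */
    Guard enterFileAccess() {
        return new Guard();
    }

    /** Runs the checkpoint action exclusively, waiting for active guards to close. */
    void checkpoint(Runnable action) {
        lock.writeLock().lock();
        try {
            action.run();
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Non-blocking variant: returns false if any file access is in progress. */
    boolean tryCheckpoint(Runnable action) {
        if (!lock.writeLock().tryLock()) {
            return false;
        }
        try {
            action.run();
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Number of guards currently open (for diagnostics and tests). */
    int activeFileAccesses() {
        return lock.getReadLockCount();
    }
}
```
    </pre>
    <p>A reader would wrap the file access in try-with-resources, for
      example <code>try (CheckpointGate.Guard g = gate.enterFileAccess()) { ... }</code>.
      Readers only wait while a checkpoint actually holds the write
      lock; the cost of taking the read lock on the uncontended path is
      what would have to be compared against the critical-section
      alternative.</p>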
    <p>Thanks in advance for the contribution!</p>
    <p>Radim</p>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 11/21/25 08:12, ma zhen wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:CA+U33_P5TzLbpH18J62925vDVYqo7_LS=M1oj7DuMagUwdnjEg@mail.gmail.com">
      <div>
        <div dir="ltr">Hi Radim,<br>
          <br>
          Thank you for your detailed and candid feedback.<br>
          <br>
          I fully agree with your assessment regarding both scenarios.
          You've clearly articulated why FD policies and
          CRaCAllowedOpenFilePrefixes are workarounds, and that a more
          transparent solution for JVM internals like
          getAvailableProcessors() is indeed the proper way forward.<br>
          <br>
          Regarding the getAvailableProcessors() issue and your
          suggestion for a PR, my current thinking is to introduce a
          lightweight synchronization mechanism in the native CRaC code.
          This would involve an RAII-style guard to mark the critical
          section during cgroup file access, ensuring mutual exclusion
          with checkpoint operations.<br>
          <br>
          I would be glad to attempt implementing this and contributing
          a PR with a test case.
          <br>
          <br>
          Best regards,<br>
          mazhen</div>
        <br>
        <div class="gmail_quote gmail_quote_container">
          <div dir="ltr" class="gmail_attr">Radim Vansa &lt;<a href="mailto:rvansa@azul.com" moz-do-not-send="true" class="moz-txt-link-freetext">rvansa@azul.com</a>&gt;
            wrote on Wed, Nov 19, 2025 at 05:13:<br>
          </div>
          <blockquote class="gmail_quote">
            <div>
              <p>Hello ma zhen,</p>
              <p>Apologies for the late response.</p>
              <p>In general, both FD policies and
                CRaCAllowedOpenFilePrefixes are really workarounds for
                apps that don't adhere to CRaC requirements, rather than
                proper solutions. But let's talk about the problems
                individually:</p>
              <p>1) When it comes to getAvailableProcessors(), I think
                that opening the cgroups info is an implementation
                detail, and the CRaC JVM should handle that
                transparently. There should be a hook (either in Java
                code or in native, whichever is less intrusive) that
                will make the file access and C/R mutually exclusive.
                We will gladly accept a PR (with a test case, please).</p>
              <p>2) Listing files is an interaction with the
                environment, and the application should stop doing that
                during C/R. Your observation about FD policies makes
                sense; in fact, in this case there is no resource that
                could be linked into the FD policies. We would have to
                explicitly synchronize with C/R, and that would be
                expensive on such a common function. From a practical
                point of view, I understand that you can't easily modify
                the third-party library, and I am glad that it works for
                you. Note, though, that CRaCAllowedOpenFilePrefixes
                basically relies on the C/R engine to handle that FD
                correctly. And if you attempt to restore on a system
                that does not host this directory, the restore will
                fail.</p>
              <p>Technically, getAvailableProcessors() is also an
                interaction with the 'environment', with the machine it
                is currently running on, but the world is not black and
                white and my opinion is that this should be transparent.</p>
              <p>Radim</p>
              <div>On 11/14/25 09:01, ma zhen wrote:<br>
              </div>
              <blockquote type="cite">
                <div>
                  <div dir="ltr">
                    <div dir="ltr">
                      <div>Hi everyone,</div>
                      <div><br>
                      </div>
                      <div>Following up on my own question, I believe
                        I've found a suitable solution and wanted to
                        share it for the archives.</div>
                      <div><br>
                      </div>
                      <div>The issue was resolved using the VM option
                        `-XX:CRaCAllowedOpenFilePrefixes`. This option
                        lets you specify a comma-separated list of path
                        prefixes that CRaC should ignore if they are
                        found open during a checkpoint.</div>
                      <div><br>
                      </div>
                      <div>(Reference: <a href="https://docs.azul.com/crac/usage/vm-options" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">
                          https://docs.azul.com/crac/usage/vm-options</a>)</div>
                      <div><br>
                      </div>
                      <div>Crucially, what makes it a good solution for
                        my original problem is that this option also
                        works for files opened by native code (e.g., via
                        JNI or internal JVM functions). This is why it
                        can handle the file descriptors that were not
                        manageable through standard CRaC resource
                        policies.</div>
                      <div><br>
                      </div>
                      <div>This directly addresses the two scenarios I
                        described:</div>
                      <div><br>
                      </div>
                      <div>1. For the cgroup file opened by
                        `OperatingSystemMXBean`, I can now add</div>
                      <div>   `/sys/fs/cgroup/` to the allowed prefixes.</div>
                      <div><br>
                      </div>
                      <div>2. For the directory descriptor held open by
                        the native implementation of</div>
                      <div>   `File.list`, adding the application's base
                        path works perfectly.</div>
                      <div><br>
                      </div>
                      <div>This provides a much more robust solution
                        than retrying the checkpoint. I hope this is
                        helpful for anyone else running into similar
                        issues.</div>
                      <div><br>
                      </div>
                      <div>Best regards,</div>
                      <div>mazhen</div>
                    </div>
                  </div>
                  <br>
                  <div class="gmail_quote">
                    <div dir="ltr" class="gmail_attr">ma zhen &lt;<a href="mailto:mz1999@gmail.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">mz1999@gmail.com</a>&gt;
                      wrote on Wed, Nov 12, 2025 at 17:29:<br>
                    </div>
                    <blockquote class="gmail_quote">
                      <div dir="ltr">
                        <div dir="ltr">
                          <div dir="ltr">
                            <div dir="ltr">
                              <div dir="ltr">
                                <div dir="ltr">
                                  <div>Hi everyone,</div>
                                  <div><br>
                                  </div>
                                  <div>I'm encountering a
                                    CheckpointException when creating a
                                    checkpoint image</div>
                                  <div>with CRaC. The root cause is that
                                    the application holds file
                                    descriptors</div>
                                  <div>for files or directories.</div>
                                  <div><br>
                                  </div>
                                  <div>
                                    <div>Our application is quite
                                      complex, and after some
                                      investigation, I've found </div>
                                    <div>that these files/directories
                                      are being opened by third-party
                                      libraries. </div>
                                    <div>The challenge is that they are
                                      not opened through regular file
                                      I/O APIs, </div>
                                    <div>which makes it impossible to
                                      handle them using File Descriptor
                                      Policies.</div>
                                  </div>
                                  <div><br>
                                  </div>
                                  <div>I've identified two specific
                                    scenarios:</div>
                                  <div><br>
                                  </div>
                                  <div>1. A third-party library
                                    periodically fetches system resource
                                    information,</div>
                                  <div>   which includes calling
                                    `OperatingSystemMXBean.getAvailableProcessors`.</div>
                                  <div><br>
                                  </div>
                                  <div>   When the JVM determines the
                                    number of available CPU cores, if it
                                    detects</div>
                                  <div>   that cgroups are available, it
                                    will read the resource limit file</div>
                                  <div>   `cpu.cfs_quota_us`, even if
                                    the process is not in a container.</div>
                                  <div>   The specific implementation
                                    logic can be found in
                                    cgroupV1Subsystem_linux.cpp:</div>
                                  <div>   (<a href="https://github.com/openjdk/crac/blob/crac/src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://github.com/openjdk/crac/blob/crac/src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp</a>)</div>
                                  <div><br>
                                  </div>
                                  <div>   If a checkpoint is triggered
                                    at this exact moment, an exception</div>
                                  <div>   similar to the following
                                    occurs:</div>
                                  <div><br>
                                  </div>
                                  <div>    Suppressed:
                                    jdk.internal.crac.mirror.impl.CheckpointOpenFileException:
                                    FD fd=57 type=regular
                                    path=/sys/fs/cgroup/cpu,cpuacct/user.slice/cpu.cfs_quota_us</div>
                                  <div>        at
                                    java.base/jdk.internal.crac.mirror.Core.translateJVMExceptions(<a moz-do-not-send="true">Core.java:115</a>)</div>
                                  <div>        at
                                    java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(<a moz-do-not-send="true">Core.java:189</a>)</div>
                                  <div>        at
                                    java.base/jdk.internal.crac.mirror.Core.checkpointRestore(<a moz-do-not-send="true">Core.java:315</a>)</div>
                                  <div>        at
                                    java.base/jdk.internal.crac.mirror.Core.checkpointRestoreInternal(<a moz-do-not-send="true">Core.java:328</a>)</div>
                                  <div><br>
                                  </div>
                                  <div>2. For some reason, a third-party
                                    library periodically calls
                                    `File.list`</div>
                                  <div>   to get the list of files in a
                                    specific directory.</div>
                                  <div><br>
                                  </div>
                                  <div>   On Linux, the `list` method
                                    eventually calls the JNI method</div>
                                  <div> 
                                     `Java_java_io_UnixFileSystem_list`
                                    which holds a directory file</div>
                                  <div>   descriptor during its
                                    execution. This is defined in
                                    UnixFileSystem_md.c:</div>
                                  <div>   (<a href="https://github.com/openjdk/crac/blob/crac/src/java.base/unix/native/libjava/UnixFileSystem_md.c" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://github.com/openjdk/crac/blob/crac/src/java.base/unix/native/libjava/UnixFileSystem_md.c</a>)</div>
                                  <div><br>
                                  </div>
                                  <div>   Similarly, if a checkpoint is
                                    triggered at this moment, an
                                    exception</div>
                                  <div>   like the one below is thrown:</div>
                                  <div><br>
                                  </div>
                                  <div>   
                                    jdk.internal.crac.mirror.CheckpointException</div>
                                  <div>    Suppressed:
                                    jdk.internal.crac.mirror.impl.CheckpointOpenFileException:
                                    FD fd=46 type=directory
                                    path=.../WEB-INF/classes/WEB-INF/services</div>
                                  <div>        at
                                    java.base/jdk.internal.crac.mirror.Core.translateJVMExceptions(<a moz-do-not-send="true">Core.java:115</a>)</div>
                                  <div>        at
                                    java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(<a moz-do-not-send="true">Core.java:189</a>)</div>
                                  <div>        at
                                    java.base/jdk.internal.crac.mirror.Core.checkpointRestore(<a moz-do-not-send="true">Core.java:315</a>)</div>
                                  <div>        at
                                    java.base/jdk.internal.crac.mirror.Core.checkpointRestoreInternal(<a moz-do-not-send="true">Core.java:328</a>)</div>
                                  <div><br>
                                  </div>
                                  <div><br>
                                  </div>
                                  <div>In both situations, if a
                                    checkpoint coincides with the
                                    execution of these</div>
                                  <div>periodic tasks, the checkpoint is
                                    likely to fail.</div>
                                  <div><br>
                                  </div>
                                  <div>My current workaround is to
                                    attempt the checkpoint multiple
                                    times, as it</div>
                                  <div>will eventually succeed. While
                                    this allows me to bypass the issue,
                                    I would</div>
                                  <div>like to know if there is a better
                                    solution.</div>
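                                  <div><br>
                                  </div>
                                  <div>The retry workaround described
                                    above can be sketched as a bounded
                                    loop. This is only an illustration:
                                    CheckpointRetry, Attempt and
                                    runWithRetries are hypothetical
                                    names, and Attempt stands in for
                                    whatever call actually requests the
                                    checkpoint in the application.</div>
                                  <pre>
```java
/** Hypothetical helper that retries a checkpoint attempt a bounded number of times. */
final class CheckpointRetry {

    /** Stand-in for the real checkpoint request; it may throw on failure. */
    interface Attempt {
        void run() throws Exception;
    }

    /**
     * Runs the attempt until it succeeds or maxAttempts is reached.
     * Returns the number of attempts used; rethrows the last failure.
     */
    static int runWithRetries(Attempt attempt, int maxAttempts, long pauseMillis)
            throws Exception {
        if (!(maxAttempts > 0)) {
            throw new IllegalArgumentException("maxAttempts must be positive");
        }
        int tries = 0;
        while (true) {
            tries++;
            try {
                attempt.run();
                return tries;
            } catch (Exception e) {
                if (tries == maxAttempts) {
                    throw e; // give up and surface the last failure
                }
                // Pause so the periodic task can finish and close its descriptor.
                Thread.sleep(pauseMillis);
            }
        }
    }
}
```
                                  </pre>
                                  <div>Bounding the number of attempts
                                    keeps a persistent failure, such as
                                    a descriptor that is never closed,
                                    from turning into an endless loop.</div>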
                                  <div><br>
                                  </div>
                                  <div>Thank you.</div>
                                  <div><br>
                                  </div>
                                  <div>
                                    <div>Best regards,</div>
                                    <div>mazhen</div>
                                  </div>
                                </div>
                              </div>
                            </div>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                  </div>
                </div>
              </blockquote>
            </div>
          </blockquote>
        </div>
      </div>
    </blockquote>
  </body>
</html>