<div dir="ltr"><div dir="ltr"><div>Hi Radim,</div><div><br></div><div>Thanks a lot for the detailed explanation! That completely cleared up my confusion about the design philosophy behind CRaC.</div><div><br></div><div>It makes perfect sense now that the goal isn't purely transparent restoration, but rather preserving the valuable internal JVM/application state while enabling robust adaptation to the new environment after restore – sacrificing some transparency for resilience by consciously managing external resources.</div><div><br></div><div>Great project, and I appreciate the insight. Hope to be able to contribute down the line!</div><div><br></div><div>Cheers,</div><div>Ma Zhen</div></div></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Fri, Apr 11, 2025 at 15:17, Radim Vansa &lt;<a href="mailto:rvansa@azul.com">rvansa@azul.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  <div>

    <p>Hi Ma Zhen,</p>

    <p>You have correctly observed that closing file descriptors is

      an architectural choice rather than a purely technical need. CRIU

      is really capable of restoring the process as-is, as its main

      motivation is migration of running containers. Containers already

      define the filesystem, and the runtime is in control of external

      connections - e.g. CRIU can checkpoint and later restore an open

      socket connection, and the container runtime restores the 'second

      half' of the socket so that the pause is transparent to the

      running process.</p>

    <p>If this is what you want, there's nothing preventing you from

      using CRIU on a Java process manually - at the risk of breaking

      the internal logic of the application. However, the point of CRaC

      is not such a transparent restore: we want to preserve the

      valuable state of the JVM and the application but adapt it to the new

      environment. We want to make a conscious decision about any resource

      external to the process. Being forced to gracefully adapt to the

      restore is a feature.</p>

    <p>Yes, we have File Descriptor policies, but that's not a solution

      - it provides a workaround for proof-of-concepts, until some code

      that you can't easily fix gets updated to support CRaC properly.

      Ideas meet practicality, and you are responsible for deciding

      what should be done with each particular external resource.</p>

    <p>You're right that at the moment we don't handle JDK Platform Logging

      (nor JUL) configured to write to a file, and since that is JDK

      code outside the user's control, it is a bug. We are trying to fix those one

      by one (PRs are welcome!).<br>

    </p>

    <p>I hope I have provided some insight into these choices - and yes, I

      understand the pain as we still have many places to fix.</p>

    <p>Cheers, </p>

    <p>Radim<br>

    </p>

    <div>On 10. 04. 25 11:30, ma zhen wrote:<br>

    </div>

    <blockquote type="cite">


      <div>

        <div dir="ltr">

          <div dir="ltr">

            <div dir="ltr">

              <p>

                <span>Hi CRaC developers,</span></p>

              <p>

                <span><span>I'm currently

                    exploring the integration of CRaC support into our

                    company's middleware products. I'm also very

                    interested in the underlying implementation details

                    of CRaC and have been doing some research into its

                    mechanics.</span></span></p>

              <p>

                <span><span>As I

                    understand it, CRaC leverages CRIU under the hood

                    for checkpointing and restoring running processes.

                    My research indicates that CRIU itself is capable of

                    handling open file descriptors and established

                    network connections during the checkpoint/restore

                    cycle.</span></span></p>

              <p>

                <span><span>However, the

                    CRaC API requires developers to explicitly manage

                    these resources, typically by closing them in </span><span>beforeCheckpoint()</span><span> and re-establishing

                    them in </span><span>afterRestore()</span><span>.</span></span></p>
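              <p>As a rough illustration of that close-and-reopen pattern, here is a minimal sketch. It is not taken from the CRaC sources: <code>CheckpointAware</code>, <code>ReopenableLog</code> and <code>Demo</code> are hypothetical names, and the interface is only a stand-in so the example runs without the CRaC library. In the real API the callbacks live on <code>org.crac.Resource</code>, take a <code>Context</code> argument, and the resource is registered with <code>Core.getGlobalContext().register(...)</code>.</p>

<pre>
```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical stand-in for org.crac.Resource so the sketch compiles without
// the CRaC library; the real callbacks also receive a Context argument.
interface CheckpointAware {
    void beforeCheckpoint() throws Exception;
    void afterRestore() throws Exception;
}

// Hypothetical file-backed log that gives up its descriptor around a checkpoint.
class ReopenableLog implements CheckpointAware {
    private final Path path;
    private BufferedWriter writer; // null while no file descriptor is held

    ReopenableLog(Path path) throws IOException {
        this.path = path;
        open();
    }

    private void open() throws IOException {
        // CREATE + APPEND: keep earlier entries, create the file if it is missing
        writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    boolean isOpen() {
        return writer != null;
    }

    void log(String message) throws IOException {
        if (writer == null) {
            throw new IllegalStateException("log is closed");
        }
        writer.write(message);
        writer.newLine();
        writer.flush();
    }

    @Override
    public void beforeCheckpoint() throws IOException {
        // Release the descriptor so no open file remains at checkpoint time
        if (writer != null) {
            writer.close();
            writer = null;
        }
    }

    @Override
    public void afterRestore() throws IOException {
        // Reopen against whatever the path resolves to in the new environment
        open();
    }
}

class Demo {
    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("crac-sketch", ".log");
        ReopenableLog log = new ReopenableLog(file);
        log.log("before checkpoint");
        log.beforeCheckpoint(); // a real checkpoint would be taken here
        log.afterRestore();
        log.log("after restore");
        System.out.println(Files.readAllLines(file));
    }
}
```
</pre>

              <p>Running <code>Demo</code> only simulates the cycle by calling the two callbacks by hand. It writes one line before the simulated checkpoint and one after, both into the same file.</p>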

              <p>

                <span><span>To understand

                    the rationale behind this design choice, I looked

                    into the initial CRaC prototype, specifically the

                    first PR (<a href="https://github.com/openjdk/crac/pull/1" target="_blank">https://github.com/openjdk/crac/pull/1</a></span><span>). It appears that

                    even in this early version, the implementation

                    iterated through all process file descriptors during

                    checkpoint. It ignored certain FDs (like those

                    related to classpath files, </span><span>/dev/random</span><span>, </span><span>/dev/urandom</span><span>, and files marked </span><span>M_PERSISTENT</span><span> - though I'm unclear

                    on the exact meaning of </span><span>M_PERSISTENT</span><span> in this context). If

                    any other application-opened files remained, the

                    checkpoint process would fail. This suggests the

                    requirement for manual resource management was

                    present from the outset.</span></span></p>

              <p>

                <span><span>As I'm not

                    deeply familiar with JVM internals, I'm struggling

                    to fully grasp the reasoning. Was this restriction

                    primarily introduced to simplify the initial design

                    and implementation of CRaC within the JVM?</span></span></p>

              <p>

                <span><span>I also

                    noticed that current versions of CRaC include File

                    Descriptor Policies. These allow configuring an </span><span>action:

                    ignore</span><span> for

                    specific file descriptors, effectively delegating

                    their handling to CRIU. This seems to demonstrate

                    that letting CRIU manage certain open files </span><span>is</span><span> feasible within the

                    CRaC framework.</span></span></p>

              <p>

                <span><span>This leads me

                    to wonder: if delegation to CRIU is possible and

                    works (at least for some cases via policies), why

                    isn't relying on CRIU for resource handling the

                    default or more broadly encouraged approach? Why the

                    strict requirement for manual closure and reopening

                    in the general case?</span></span></p>

              <p>

                <span><span>For instance,

                    consider using </span><span>System.getLogger()</span><span> from the JDK

                    Platform Logging API. As application developers, we

                    don't typically manage the underlying file

                    descriptor for the log file directly. To make this

                    work with CRaC, we currently need to identify and

                    configure a File Descriptor Policy for it, which can

                    feel somewhat cumbersome. Wouldn't a smoother

                    experience involve CRaC (perhaps optionally)

                    defaulting to letting CRIU handle such internally

                    managed resources, like those opened by standard JDK

                    libraries?</span></span></p>

              <p>

                <span><span>I would

                    appreciate any insights or clarification you could

                    offer on the design philosophy behind CRaC's

                    approach to managing external resources like files

                    and sockets, especially in contrast to CRIU's

                    capabilities.</span></span></p>

              <p>

                <span><span>Thanks for

                    your time and any insights you can share.</span></span></p>

              <p>

                <span><span>Best regards,</span></span></p>

              <p>

                mazhen</p>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

  </div>

</blockquote></div>