<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>Hi Ma Zhen,</p>
<p>You have correctly observed that closing file descriptors is
an architectural choice rather than a purely technical necessity.
CRIU is fully capable of restoring a process as-is, since its main
motivation is the migration of running containers. Containers already
define the filesystem, and the runtime is in control of external
connections: for example, CRIU can checkpoint and later restore an open
socket connection, and the container runtime restores the 'second
half' of the socket so that the pause is transparent to the
running process.</p>
<p>If this is what you want, nothing prevents you from
using CRIU on a Java process manually, at the risk of breaking
the internal logic of the application. However, the point of CRaC
is not such a transparent restore: we want to preserve the
valuable state of the JVM and the application but adapt it to the new
environment. We want to make a conscious decision about every resource
external to the process. Being forced to adapt gracefully to the
restore is a feature.</p>
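<p>As a minimal sketch of what such a conscious decision can look like, the class below owns a log file, releases its descriptor before checkpoint and reopens it after restore, resolving the location again so it can adapt to the new environment. To keep it self-contained, <code>CheckpointAware</code> is a hypothetical stand-in for <code>org.crac.Resource</code> (the real interface passes a <code>Context</code> to both methods, and a resource is registered with <code>Core.getGlobalContext().register(...)</code>); <code>AuditLog</code> is likewise only illustrative.</p>

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Supplier;

// Hypothetical stand-in for org.crac.Resource. The real interface takes a
// Context argument in both methods; it is simplified here to stay self-contained.
interface CheckpointAware {
    void beforeCheckpoint() throws Exception;
    void afterRestore() throws Exception;
}

// Illustrative application-owned log file: its descriptor is released before
// checkpoint and reopened after restore, so no open FD ends up in the image.
class AuditLog implements CheckpointAware {
    private final Supplier<Path> location; // re-evaluated on every (re)open
    private BufferedWriter out;

    AuditLog(Supplier<Path> location) throws IOException {
        this.location = location;
        open();
    }

    private void open() throws IOException {
        out = Files.newBufferedWriter(location.get(), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    synchronized void log(String line) throws IOException {
        if (out == null) {
            throw new IllegalStateException("log is closed for checkpoint");
        }
        out.write(line);
        out.newLine();
    }

    @Override
    public synchronized void beforeCheckpoint() throws IOException {
        if (out != null) {
            out.close(); // close() also flushes buffered data
            out = null;
        }
    }

    @Override
    public synchronized void afterRestore() throws IOException {
        open(); // the supplier may resolve to a different path after restore
    }

    synchronized boolean isOpen() {
        return out != null;
    }
}
```

<p>Taking a <code>Supplier&lt;Path&gt;</code> rather than a fixed path is the "adapt to the new environment" part: the location is decided again at restore time instead of being frozen into the image.</p>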
<p>Yes, we have File Descriptor Policies, but they are not a solution;
they provide a workaround for proofs of concept until code
that you cannot easily fix is updated to support CRaC properly.
This is where ideals meet practicality, and you are responsible for deciding
what should be done with each particular external resource.</p>
<p>You're right that at the moment we don't handle JDK Platform Logging (nor
JUL) configured to write to a file, and since that is JDK
code outside the user's control, it is a bug. We are fixing such cases one
by one (PRs are welcome!).
</p>
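<p>Until that is fixed in the JDK, an application that configures JUL programmatically could work around it by detaching its <code>FileHandler</code> before checkpoint and attaching a fresh one after restore. The sketch below assumes the application knows the handler's file pattern (a <code>FileHandler</code> does not expose it after construction); <code>JulFileHandlerSwap</code> is a hypothetical helper whose two methods would be called from a CRaC resource's <code>beforeCheckpoint</code> and <code>afterRestore</code>.</p>

```java
import java.io.IOException;
import java.util.logging.FileHandler;
import java.util.logging.Handler;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

// Hypothetical application-side workaround, not part of any JDK or CRaC API.
// The pattern is kept here because FileHandler does not expose it.
class JulFileHandlerSwap {
    private final Logger logger;
    private final String pattern;

    JulFileHandlerSwap(Logger logger, String pattern) {
        this.logger = logger;
        this.pattern = pattern;
    }

    // Call from beforeCheckpoint(): detach and close every FileHandler so the
    // log file and its lock file are no longer held open.
    void beforeCheckpoint() {
        for (Handler h : logger.getHandlers()) {
            if (h instanceof FileHandler) {
                logger.removeHandler(h);
                h.close();
            }
        }
    }

    // Call from afterRestore(): attach a fresh handler that appends to the file.
    void afterRestore() throws IOException {
        FileHandler fh = new FileHandler(pattern, true);
        fh.setFormatter(new SimpleFormatter());
        logger.addHandler(fh);
    }
}
```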
<p>I hope I have provided some insight into these choices. And yes, I
understand the pain, as we still have many places to fix.</p>
<p>Cheers, </p>
<p>Radim<br>
</p>
<div class="moz-cite-prefix">On 10. 04. 25 11:30, ma zhen wrote:<br>
</div>
<blockquote type="cite" cite="mid:CA+U33_P+7i9X3d31Vfx9AkiYeeuROFBaWcDFPrXN37BY_y2Y9g@mail.gmail.com">
<div>
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<p class="gmail-ng-star-inserted">
<span class="gmail-ng-star-inserted">Hi CRaC developers,</span></p>
<p class="gmail-ng-star-inserted">
<span><span class="gmail-ng-star-inserted">I'm currently
exploring the integration of CRaC support into our
company's middleware products. I'm also very
interested in the underlying implementation details
of CRaC and have been doing some research into its
mechanics.</span></span></p>
<p class="gmail-ng-star-inserted">
<span><span class="gmail-ng-star-inserted">As I
understand it, CRaC leverages CRIU under the hood
for checkpointing and restoring running processes.
My research indicates that CRIU itself is capable of
handling open file descriptors and established
network connections during the checkpoint/restore
cycle.</span></span></p>
<p class="gmail-ng-star-inserted">
<span><span class="gmail-ng-star-inserted">However, the
CRaC API requires developers to explicitly manage
these resources, typically by closing them in </span><span class="gmail-inline-code gmail-ng-star-inserted">beforeCheckpoint()</span><span class="gmail-ng-star-inserted"> and re-establishing
them in </span><span class="gmail-inline-code gmail-ng-star-inserted">afterRestore()</span><span class="gmail-ng-star-inserted">.</span></span></p>
<p class="gmail-ng-star-inserted">
<span><span class="gmail-ng-star-inserted">To understand
the rationale behind this design choice, I looked
into the initial CRaC prototype, specifically the
first PR (<a href="https://github.com/openjdk/crac/pull/1" moz-do-not-send="true" class="moz-txt-link-freetext">https://github.com/openjdk/crac/pull/1</a></span><span class="gmail-ng-star-inserted">). It appears that
even in this early version, the implementation
iterated through all of the process's file descriptors during a
checkpoint. It ignored certain FDs (such as those
related to classpath files, </span><span class="gmail-inline-code gmail-ng-star-inserted">/dev/random</span><span class="gmail-ng-star-inserted">, </span><span class="gmail-inline-code gmail-ng-star-inserted">/dev/urandom</span><span class="gmail-ng-star-inserted">, and files marked </span><span class="gmail-inline-code gmail-ng-star-inserted">M_PERSISTENT</span><span class="gmail-ng-star-inserted">, although I'm unclear
on the exact meaning of </span><span class="gmail-inline-code gmail-ng-star-inserted">M_PERSISTENT</span><span class="gmail-ng-star-inserted"> in this context). If
any other application-opened files remained, the
checkpoint process would fail. This suggests the
requirement for manual resource management was
present from the outset.</span></span></p>
<p class="gmail-ng-star-inserted">
<span><span class="gmail-ng-star-inserted">As I'm not
deeply familiar with JVM internals, I'm struggling
to fully grasp the reasoning. Was this restriction
primarily introduced to simplify the initial design
and implementation of CRaC within the JVM?</span></span></p>
<p class="gmail-ng-star-inserted">
<span><span class="gmail-ng-star-inserted">I also
noticed that current versions of CRaC include File
Descriptor Policies. These allow configuring an </span><span class="gmail-inline-code gmail-ng-star-inserted">action:
ignore</span><span class="gmail-ng-star-inserted"> for
specific file descriptors, effectively delegating
their handling to CRIU. This seems to demonstrate
that letting CRIU manage certain open files </span><span class="gmail-ng-star-inserted">is</span><span class="gmail-ng-star-inserted"> feasible within the
CRaC framework.</span></span></p>
<p class="gmail-ng-star-inserted">
<span><span class="gmail-ng-star-inserted">This leads me
to wonder: if delegation to CRIU is possible and
works (at least for some cases via policies), why
isn't relying on CRIU for resource handling the
default or more broadly encouraged approach? Why the
strict requirement for manual closure and reopening
in the general case?</span></span></p>
<p class="gmail-ng-star-inserted">
<span><span class="gmail-ng-star-inserted">For instance,
consider using </span><span class="gmail-inline-code gmail-ng-star-inserted">System.getLogger()</span><span class="gmail-ng-star-inserted"> from the JDK
Platform Logging API. As application developers, we
don't typically manage the underlying file
descriptor for the log file directly. To make this
work with CRaC, we currently need to identify and
configure a File Descriptor Policy for it, which can
feel somewhat cumbersome. Wouldn't a smoother
experience involve CRaC (perhaps optionally)
defaulting to letting CRIU handle such internally
managed resources, like those opened by standard JDK
libraries?</span></span></p>
<p class="gmail-ng-star-inserted">
<span><span class="gmail-ng-star-inserted">I would
appreciate any insights or clarification you could
offer on the design philosophy behind CRaC's
approach to managing external resources like files
and sockets, especially in contrast to CRIU's
capabilities.</span></span></p>
<p class="gmail-ng-star-inserted">
<span><span class="gmail-ng-star-inserted">Thanks for
your time and any insights you can share.</span></span></p>
<p class="gmail-ng-star-inserted">
<span><span class="gmail-ng-star-inserted">Best regards,</span></span></p>
<p class="gmail-ng-star-inserted">
mazhen</p>
</div>
</div>
</div>
</div>
</blockquote>
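<p>The check from the first prototype described in the quoted message (walk every open descriptor at checkpoint and fail if an unexpected file is still open) can be illustrated on Linux with the simplified sketch below. It is not the code from that PR: <code>FdAudit</code> is hypothetical, and the <code>allowed</code> set merely stands in for the exceptions the prototype made for classpath files, <code>/dev/random</code>, <code>/dev/urandom</code> and files marked <code>M_PERSISTENT</code>.</p>

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified checkpoint-time FD audit (Linux only).
// Not the JDK's implementation; it only mirrors the idea of walking
// /proc/self/fd and rejecting any open file that is not explicitly allowed.
final class FdAudit {
    private FdAudit() {}

    // Returns "fd -> path" entries for open files outside the allow-list.
    static List<String> unexpectedFiles(Set<String> allowed) throws IOException {
        List<String> offenders = new ArrayList<>();
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(Paths.get("/proc/self/fd"))) {
            for (Path fd : fds) {
                String target;
                try {
                    target = Files.readSymbolicLink(fd).toString();
                } catch (IOException e) {
                    continue; // descriptor was closed while iterating
                }
                // This sketch only looks at filesystem paths: sockets and pipes
                // ("socket:[...]", "pipe:[...]") are skipped, as is the descriptor
                // this directory stream itself holds on /proc.
                if (!target.startsWith("/") || target.startsWith("/proc/")) {
                    continue;
                }
                if (!allowed.contains(target)) {
                    offenders.add(fd.getFileName() + " -> " + target);
                }
            }
        }
        return offenders;
    }

    // Refuses the (simulated) checkpoint if any unexpected file is still open.
    static void checkBeforeCheckpoint(Set<String> allowed) throws IOException {
        List<String> offenders = unexpectedFiles(allowed);
        if (!offenders.isEmpty()) {
            throw new IllegalStateException("open files prevent checkpoint: " + offenders);
        }
    }
}
```

<p>Failing loudly instead of silently carrying the descriptor into the image is what pushes the decision about each external resource back to the application, which is the design choice discussed in the reply above.</p>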
</body>
</html>