CFV: New Project: CRaC

Volker Simonis volker.simonis at gmail.com
Thu Aug 5 15:21:13 UTC 2021


On Thu, Aug 5, 2021 at 1:58 PM Alan Bateman <Alan.Bateman at oracle.com> wrote:
>
> On 05/08/2021 10:27, Andrew Dinn wrote:
> > Vote: abstain
> >
> > I don't want to stop this project being created by posting a veto. I
> > think it is only right and proper that members should be allowed to
> > investigate any important area for innovation under the aegis of a
> > dedicated project. However, I am unhappy with this vote being called
> > without a more clear conclusion to the prior discussion.
> >
> > Specifically, I believe the summary below fails to highlight the
> > critical need for a significant amount of code in the JDK runtime and
> > JVM to be involved in receiving and handling checkpoint and restore
> > events and, in direct consequence, being party to the process that
> > saves or constructs a restartable runtime state (obvious areas include
> > network and file system i/o management, memory management, security
> > management, providers of randomness -- i.e. the same areas where
> > GraalVM has found it needs to enforce runtime initialization or
> > runtime repair of build-time inited state). My concern is that without
> > a close involvement of many existing JDK and JVM engineers in the
> > project and a strong commitment from them to supporting it the project
> > is very likely to fail. I don't believe we have met either of those
> > two requirements yet.
>
> I didn't get time to reply to the initial discussion but I share the
> concern that the changes are potentially invasive and will require
> auditing and work in many areas. There was discussion about this at
> FOSDEM and at least one OCW where concerns about security and other
> areas came up. For example, I remember at FOSDEM (I think after
> Christine Flood presented on CRIU) there was discussion about session
> keys and needing to have those invalidated in the checkpoint on
> disk and a complete re-initialization at restore. There was also
> discussion about the implications of adjusting the clock and the impact
> of re-connecting or invalidating file descriptors. I don't doubt that
> all challenges are solvable with effort but it does require a lot of
> components and areas to cooperate. So I think your comment on trying to
> get a wider set of contributors (those working on core and security
> libraries for example) is important.
>

I totally agree, but aren't these good arguments for having a
project to investigate all these questions?

Establishing this project doesn't mean that we will deliver something
within the next month but rather that we'll have a common
infrastructure for experimenting and a central place for discussions.
Obviously, the larger the set of contributors grows, the better
for the project. Needless to say, everybody is highly welcome to
share their experience and thoughts.

Best regards,
Volker

> -Alan

