CFV: New Project: CRaC

Volker Simonis volker.simonis at gmail.com
Thu Aug 5 15:28:32 UTC 2021


On Thu, Aug 5, 2021 at 2:19 PM Jesper Wilhelmsson
<jesper.wilhelmsson at oracle.com> wrote:
>
> > 5 aug. 2021 kl. 13:56 skrev Alan Bateman <Alan.Bateman at oracle.com>:
> > On 05/08/2021 10:27, Andrew Dinn wrote:
> >> Vote: abstain
> >>
> >> I don't want to stop this project being created by posting a veto. I think it is only right and proper that members should be allowed to investigate any important area for innovation under the aegis of a dedicated project. However, I am unhappy with this vote being called without a clearer conclusion to the prior discussion.
> >>
> >> Specifically, I believe the summary below fails to highlight the critical need for a significant amount of code in the JDK runtime and JVM to be involved in receiving and handling checkpoint and restore events and, in direct consequence, being party to the process that saves or constructs a restartable runtime state (obvious areas include network and file system i/o management, memory management, security management, providers of randomness -- i.e. the same areas where GraalVM has found it needs to enforce runtime initialization or runtime repair of build-time initialized state). My concern is that without a close involvement of many existing JDK and JVM engineers in the project and a strong commitment from them to supporting it, the project is very likely to fail. I don't believe we have met either of those two requirements yet.
> >
> > I didn't get time to reply to the initial discussion but I share the concern that the changes are potentially invasive and will require auditing and work in many areas. There was discussion about this at FOSDEM and at least one OCW where concerns about security and other areas came up. For example, I remember at FOSDEM (I think after Christine Flood presented on CRIU) there was discussion about session keys and the need to have those invalidated in the checkpoint on disk and completely re-initialized at restore. There was also discussion about the implications of adjusting the clock and the impact of re-connecting or invalidating file descriptors. I don't doubt that all challenges are solvable with effort but it does require a lot of components and areas to cooperate. So I think your comment on trying to get a wider set of contributors (those working on core and security libraries for example) is important.
>
> I agree that this project could have seen a longer time for discussion to collect more contributors - especially given the time of the year. We have an ongoing research project together with two local universities in Sweden in which we have been exploring this area. I would expect that some of the researchers involved would have been interested in joining the discussion and maybe even the project, but they have been on vacation in July and are just starting to get back to work next week and have therefore missed the entire discussion.

Well, from my point of view, the fact that apparently many others are
already working on similar topics only confirms the need for a project
to collect all the various ideas and efforts in a central place. No
irrevocable decisions have been taken so far and everybody is still
very welcome to join the project.

Looking forward to hearing more details about the mentioned research projects,
Volker

>
> /Jesper
>

