From gil at azul.com Fri Jun 14 03:13:48 2019
From: gil at azul.com (Gil Tene)
Date: Fri, 14 Jun 2019 03:13:48 +0000
Subject: Workshop topic: Java on CRaC: coordinated instant start
Message-ID:

I'd like to propose a session topic for the August 1-2, 2019 workshop
(it was suggested by the JVMLS committee that this would be a good
topic for the workshop, rather than the summit...)

Subject: Java on CRaC: coordinated instant-start

Abstract:
We propose adding a new Checkpoint / Restore-at-Checkpoint (CRaC)
capability to OpenJDK, supported by a simple and robust API that
would ensure applications and (most importantly) application
frameworks are able to safely coordinate the checkpointing process
and the state restoration activities needed to achieve near-instant-start
of fully warmed application instances. Real code, examples, and
actual numbers will be discussed.

Possible questions for session participants to address:

Q: What resources need coordination via this API?

Q: What frameworks constitute a good critical set for
common use cases (e.g. tomcat + JDBC connection pool, and?)

Q: What would it take for frameworks to start writing
to such an API?

-- Gil.

From aph at redhat.com Fri Jun 14 08:27:02 2019
From: aph at redhat.com (Andrew Haley)
Date: Fri, 14 Jun 2019 09:27:02 +0100
Subject: Workshop topic: Java on CRaC: coordinated instant start
In-Reply-To:
References:
Message-ID:

On 6/14/19 4:13 AM, Gil Tene wrote:
> Abstract:
> We propose adding a new Checkpoint / Restore-at-Checkpoint (CRaC)
> capability to OpenJDK, supported by a simple and robust API that
> would ensure applications and (most importantly) application
> frameworks are able to safely coordinate the checkpointing process
> and the state restoration activities needed to achieve near-instant-start
> of fully warmed application instances. Real code, examples, and
> actual numbers will be discussed.
>
> Possible questions for session participants to address:
>
> Q: What resources need coordination via this API?
>
> Q: What frameworks constitute a good critical set for
> common use cases (e.g. tomcat + JDBC connection pool, and?)
>
> Q: What would it take for frameworks to start writing
> to such an API?

Christine Flood has been working on this for some time, and will be
presenting at conferences. I think you should co-ordinate with her.

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From gil at azul.com Sat Jun 15 07:32:42 2019
From: gil at azul.com (Gil Tene)
Date: Sat, 15 Jun 2019 07:32:42 +0000
Subject: Workshop topic: Java on CRaC: coordinated instant start
In-Reply-To:
References: ,
Message-ID: <9FC71597-E5EE-47C1-B0BB-937AB0CE4006@azul.com>

We are likely talking about very different things here. I spoke with
Christine at length at jFokus about her work with checkpointing, and
shared some details of our work as well. I don't believe this overlaps
much with what she has been presenting on, and the field is pretty
wide, with room for lots of work and ideas from lots of people. This
is one.

I plan to talk about a new API and behavior in Java (e.g. to be added
to a future OpenJDK version via a JEP and to be included in a future
Java spec via a JSR) that would allow applications and application
frameworks to coordinate with a checkpointing mechanism in the
underlying platform. The coordination would focus on things like
getting rid of "problematic" external state going into a checkpoint
and recreating un-captured state coming out of a checkpoint. The way
the checkpoint state itself is captured and managed is orthogonal to
the subject at hand: it could be captured by the runtime itself, by
CRIU or an equivalent on e.g. Windows, by a container system like
Docker or k8s performing container checkpoints, etc. etc...
It is the application semantics and the APIs needed for coordination
that this talk will focus on, as well as on what successful use of
such coordination APIs in e.g. tomcat/etc. can achieve.

We've been working on variant forms of partial and complete
checkpointing for years now, including multiple different use modes,
and have built up a taxonomy for some of them. Some modes are
transparent (e.g. CRaM for Checkpoint / Resume at Main) while others
are not. Some deal with checkpointing specific state (e.g. profiles,
class data and metadata, compiled code and code cache) while others
deal with wider state (e.g. all or nearly all process memory contents,
and some may even include runtime-external state like file handles to
files that reside within an immutable image). The specific CRaC use
mode is an intentionally non-transparent (but rather coordinated) mode
aimed at addressing a specific (and we think very common) use case of
rolling out new code in e.g. DevOps workflows.

Sent from Gil's iPhone

> On Jun 14, 2019, at 1:27 AM, Andrew Haley wrote:
>
>> On 6/14/19 4:13 AM, Gil Tene wrote:
>> Abstract:
>> We propose adding a new Checkpoint / Restore-at-Checkpoint (CRaC)
>> capability to OpenJDK, supported by a simple and robust API that
>> would ensure applications and (most importantly) application
>> frameworks are able to safely coordinate the checkpointing process
>> and the state restoration activities needed to achieve near-instant-start
>> of fully warmed application instances. Real code, examples, and
>> actual numbers will be discussed.
>>
>> Possible questions for session participants to address:
>>
>> Q: What resources need coordination via this API?
>>
>> Q: What frameworks constitute a good critical set for
>> common use cases (e.g. tomcat + JDBC connection pool, and?)
>>
>> Q: What would it take for frameworks to start writing
>> to such an API?
>
> Christine Flood has been working on this for some time, and will be
> presenting at conferences.
> I think you should co-ordinate with her.
>
> --
> Andrew Haley
> Java Platform Lead Engineer
> Red Hat UK Ltd.
> https://keybase.io/andrewhaley
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
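
[Editorial sketch, not part of the thread.] The 15 June message
describes coordination that gets rid of "problematic" external state
going into a checkpoint and recreates un-captured state coming out of
one. The Java below is one way that shape could look, as a minimal
sketch only. Every name in it (CheckpointParticipant, Coordinator,
FakeConnectionPool, beforeCheckpoint, afterRestore) is hypothetical and
invented for illustration; none is taken from the proposal. The
checkpoint mechanism is passed in as a plain Runnable to reflect the
point that how the state is captured is orthogonal to the API, and the
notification order (reverse registration order before, registration
order after) is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: all names here are illustrative assumptions,
// not the API proposed in the thread.
final class CheckpointCoordinationSketch {

    /** Callback a framework would implement to take part in a checkpoint. */
    interface CheckpointParticipant {
        /** Drop external state that must not be captured (sockets, handles). */
        void beforeCheckpoint();

        /** Recreate the state that was dropped before the checkpoint. */
        void afterRestore();
    }

    /** Notifies registered participants around an opaque checkpoint step. */
    static final class Coordinator {
        private final List<CheckpointParticipant> participants = new ArrayList<>();

        void register(CheckpointParticipant participant) {
            participants.add(participant);
        }

        void checkpointAndRestore(Runnable checkpointMechanism) {
            // Assumed ordering: release in reverse registration order.
            for (int i = participants.size() - 1; i >= 0; i--) {
                participants.get(i).beforeCheckpoint();
            }
            // How state is captured is deliberately left to the caller.
            checkpointMechanism.run();
            // Assumed ordering: re-acquire in registration order.
            for (CheckpointParticipant participant : participants) {
                participant.afterRestore();
            }
        }
    }

    /** Stand-in for a connection pool; strings play the role of connections. */
    static final class FakeConnectionPool implements CheckpointParticipant {
        private final List<String> connections = new ArrayList<>();
        private final int size;

        FakeConnectionPool(int size) {
            this.size = size;
            openAll();
        }

        private void openAll() {
            for (int i = 0; i < size; i++) {
                connections.add("conn-" + i);
            }
        }

        int openConnections() {
            return connections.size();
        }

        @Override
        public void beforeCheckpoint() {
            connections.clear(); // nothing external survives into the image
        }

        @Override
        public void afterRestore() {
            openAll(); // rebuild the pool in the restored instance
        }
    }

    public static void main(String[] args) {
        FakeConnectionPool pool = new FakeConnectionPool(3);
        Coordinator coordinator = new Coordinator();
        coordinator.register(pool);

        System.out.println("before checkpoint: " + pool.openConnections());
        coordinator.checkpointAndRestore(() ->
                System.out.println("during checkpoint: " + pool.openConnections()));
        System.out.println("after restore: " + pool.openConnections());
    }
}
```

Running main reports 3 open connections before the checkpoint, 0 while
the checkpoint step runs, and 3 again after restore. A real design
would also need to settle error handling and ordering between dependent
resources, which this sketch leaves out.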