From christian.tzolov at gmail.com Mon Apr 3 20:30:16 2023 From: christian.tzolov at gmail.com (Christian Tzolov) Date: Mon, 3 Apr 2023 22:30:16 +0200 Subject: On restore the "main" thread is started before the Resource's afterRestore has completed Message-ID: Hi, I'm testing CRaC in the context of long-running applications (e.g. streaming, continuous processing ...) and I've stumbled on an issue related to the coordination of the resolved threads. For example, let's have a *Processor* that performs continuous computations. This processor depends on a *ProcessorContext* and later must be fully initialized before the processor can process any data. When the application is first started (e.g. not from checkpoints) it ensures that the *ProcessorContext* is initialized before starting the *Processor* loop. To leverage CRaC I've implemented a *ProcessorContextResource* gracefully stops the context on *beforeCheckpoint* and then re-initialized it on *afterRestore*. When the checkpoint is performed, CRaC calls the *ProcessorContextResource.* *beforeCheckpoint* and also preserves the current *Processor* call stack. On Restore processor's call stack is expectedly restored at the point it was stopped but unfortunately it doesn't wait for the *ProcessorContextResource.**afterRestore* complete. This expectedly crashes the processor. The https://github.com/tzolov/crac-demo illustreates this issue. The README explains how to reproduce the issue. The OUTPUT.md ( https://github.com/tzolov/crac-demo/blob/main/OUTPUT.md ) offers terminal snapshots of the observed behavior. I've used latest JDK CRaC release: openjdk 17-crac 2021-09-14 OpenJDK Runtime Environment (build 17-crac+5-19) OpenJDK 64-Bit Server VM (build 17-crac+5-19, mixed mode, sharing) As I'm new to CRaC, I'd appreciate your thoughts on this issue. Cheers, Christian -------------- next part -------------- An HTML attachment was scrubbed... URL: From rvansa at azul.com Tue Apr 4 06:48:44 2023 From: rvansa at azul.com (Radim Vansa) Date: Tue, 4 Apr 2023 08:48:44 +0200 Subject: On restore the "main" thread is started before the Resource's afterRestore has completed In-Reply-To: References: Message-ID: Hi Christian, I believe this is a common problem when porting existing architecture under CRaC; the obvious solution is to guard access to the resource (ProcessorContext in this case) with a RW lock that'd be read-acquired by 'regular' access and acquired for write in beforeCheckpoint/released in afterRestore. However this introduces extra synchronization (at least in form of volatile writes) even in case that C/R is not used at all, especially if the support is added into libraries. Anton Kozlov proposed techniques like RCU [1] but at this point there's no support for this in Java. Even the Linux implementation might require some additional properties from the code in critical (read) section like not calling any blocking code; this might be too limiting. The situation is simpler if the application uses a single threaded event-loop; beforeCheckpoint can enqueue a task that would, upon its execution, block on a primitive and notify the C/R notification thread that it may now deinit the resource; in afterRestore the resource is initialized and the eventloop is unblocked. This way we don't impose any extra overhead when C/R is happening. To avoid extra synchronization it could be technically possible to modify CRaC implementation to keep all other threads frozen during restore. There's a risk of some form of deadlock if the thread performing C/R would require other threads to progress, though, so any such solution would require extra thoughts. Besides, this does not guarantee exclusivity so the afterRestore would need to restore the resource to the *exactly* same state (as some of its before-checkpoint state might have leaked to the thread in Processor). In my opinion this is not the best way. The problem with RCU is tracking which threads are in the critical section. I've found RCU-like implementations for Java that avoid excessive overhead using a spread out array - each thread marks entering/leaving the critical section by writes to its own counter, preventing cache ping-pong (assuming no false sharing). Synchronizer thread uses another flag to request synchronization; reading this by each thread is not totally without cost but reasonably cheap, and in that case worker threads can enter a blocking slow path. The simple implementation assumes a fixed number of threads; if the list of threads is dynamic the solution would be probably more complicated. It might also make sense to implement this in native code with a per-CPU counters, rather than per-thread. A downside, besides some overhead in terms of both cycles and memory usage, is that we'd need to modify the code and explicitly mark the critical sections. Another solution could try to leverage existing JVM mechanics for code deoptimization, replacing the critical sections with a slower, blocking stub, and reverting back after restore. Or even independently requesting a safe-point and inspecting stack of threads until the synchronization is possible. So I probably can't offer a ready-to-use performant solution; pick your poison. The future, though, offers a few possibilities and I'd love to hear others' opinions about which one would look the most feasible. Because unless we offer something that does not harm a no-CRaC use-case I am afraid that the adoption will be quite limited. Cheers, Radim [1] https://en.wikipedia.org/wiki/Read-copy-update On 03. 04. 23 22:30, Christian Tzolov wrote: > Hi, I'm testing CRaC in the context of long-running applications (e.g. streaming, continuous processing ...) and I've stumbled on an issue related to the coordination of the resolved threads. > > For example, let's have a Processor that performs continuous computations. This processor depends on a ProcessorContext and later must be fully initialized before the processor can process any data. > > When the application is first started (e.g. not from checkpoints) it ensures that the ProcessorContext is initialized before starting the Processor loop. > > To leverage CRaC I've implemented a ProcessorContextResource gracefully stops the context on beforeCheckpoint and then re-initialized it on afterRestore. > > When the checkpoint is performed, CRaC calls the ProcessorContextResource.beforeCheckpoint and also preserves the current Processor call stack. On Restore processor's call stack is expectedly restored at the point it was stopped but unfortunately it doesn't wait for the ProcessorContextResource.afterRestore complete. This expectedly crashes the processor. > > The https://github.com/tzolov/crac-demo illustreates this issue. The README explains how to reproduce the issue. The OUTPUT.md (https://github.com/tzolov/crac-demo/blob/main/OUTPUT.md ) offers terminal snapshots of the observed behavior. > > I've used latest JDK CRaC release: > openjdk 17-crac 2021-09-14 > OpenJDK Runtime Environment (build 17-crac+5-19) > OpenJDK 64-Bit Server VM (build 17-crac+5-19, mixed mode, sharing) > > As I'm new to CRaC, I'd appreciate your thoughts on this issue. > > Cheers, > Christian > > > > From christian.tzolov at gmail.com Wed Apr 5 12:05:50 2023 From: christian.tzolov at gmail.com (Christian Tzolov) Date: Wed, 5 Apr 2023 14:05:50 +0200 Subject: On restore the "main" thread is started before the Resource's afterRestore has completed Message-ID: Hi Radim, (Unfortunately, the mailing list didn't deliver the original message nor the replay. So i'm reposting with Re instead) The distributed process synchronization is a hard topic and, IMO, CRaC should at least offer some help with it. For example if the primary goal is to provide a functional warmed up clone of the application (as opposed to data replication), then perhaps we can relax the data consistency as long as the application and its components are up and running in the right order. Then approaches such as: > To avoid extra synchronization it could be technically possible to > modify CRaC implementation to keep all other threads frozen during > restore. or > Another solution could try to leverage existing JVM mechanics for code > deoptimization, replacing the critical sections with a slower, blocking > stub, and reverting back after restore. Or even independently requesting > a safe-point and inspecting stack of threads until the synchronization > is possible. would be a workable solution, despite the inconsistencies they may introduce. I agree with your observation that: > unless we offer something that does not harm a no-CRaC use-case > I am afraid that the adoption will be quite limited. I wrongly assumed that CRaC provides some process synchronization mechanism, while in reality it imposes a new programming model that leaves the synchronization tasks to the end developers. Then the complexity of using CRaC with existing applications would be an order of magnitude higher compared to making the same applications GraalVM compliant. Cheers, Christian -------------- next part -------------- An HTML attachment was scrubbed... URL: From rvansa at azul.com Wed Apr 5 12:38:31 2023 From: rvansa at azul.com (Radim Vansa) Date: Wed, 5 Apr 2023 14:38:31 +0200 Subject: On restore the "main" thread is started before the Resource's afterRestore has completed In-Reply-To: References: Message-ID: Hi Christian, comments inline... On 05. 04. 23 14:05, Christian Tzolov wrote: > Hi Radim, > > (Unfortunately, the mailing list didn't deliver the original message nor the replay. So i'm reposting with Re instead) > > The distributed process synchronization is a hard topic and, IMO, CRaC should at least offer some help with it. I agree; what might be discussed is what portion of this 'help' should be a part of the API and its contracts, and what should be a part of the reference implementation. > For example if the primary goal is to provide a functional warmed up clone of the application (as opposed to data replication), then perhaps we can relax the data consistency as long as the application and its components are up and running in the right order. > > Then approaches such as: > >> To avoid extra synchronization it could be technically possible to >> modify CRaC implementation to keep all other threads frozen during >> restore. > or >> Another solution could try to leverage existing JVM mechanics for code >> deoptimization, replacing the critical sections with a slower, blocking >> stub, and reverting back after restore. Or even independently requesting >> a safe-point and inspecting stack of threads until the synchronization >> is possible. > would be a workable solution, despite the inconsistencies they may introduce. > > I agree with your observation that: >> unless we offer something that does not harm a no-CRaC use-case >> I am afraid that the adoption will be quite limited. > I wrongly assumed that CRaC provides some process synchronization mechanism, while in reality it imposes a new programming model that leaves the synchronization tasks to the end developers. > Then the complexity of using CRaC with existing applications would be an order of magnitude higher compared to making the same applications GraalVM compliant. Personally I find the term 'new programming model' rather exaggerated. CRaC does not pretend to offer a silver bullet without any effort made on the user side. However when you speak about 'relaxing consistency' I hear obscure bugs. I think that CRaC tries to be more conservative than GraalVM and not introduce any limitations in the runtime. Hence GraalVM-compliant and 100%-proof on GraalVM might be a different thing. (Please take my opinion with a grain of salt since my experience with Graal is rather limited.) Both JDK and Graal are driven by very smart developers; it's about compromises that each project takes. While we don't offer a perfect solution yet I hope that the number of 'friction points' in adoption is quite limited. Wish you the best luck with that! Cheers, Radim Vansa From heidinga at redhat.com Wed Apr 5 14:28:08 2023 From: heidinga at redhat.com (Dan Heidinga) Date: Wed, 5 Apr 2023 10:28:08 -0400 Subject: On restore the "main" thread is started before the Resource's afterRestore has completed In-Reply-To: References: Message-ID: Hi Radim, Thanks for the write up of the various options in this space. On Tue, Apr 4, 2023 at 2:49?AM Radim Vansa wrote: > Hi Christian, > > I believe this is a common problem when porting existing architecture > under CRaC; the obvious solution is to guard access to the resource > (ProcessorContext in this case) with a RW lock that'd be read-acquired > by 'regular' access and acquired for write in beforeCheckpoint/released > in afterRestore. However this introduces extra synchronization (at least > in form of volatile writes) even in case that C/R is not used at all, > especially if the support is added into libraries. > I've seen variations of this approach go by in code reviews but have we written up a good example of how to do this well? Having a canonical pattern would help to highlight the best way to do it today and make the tradeoffs explicit. > > Anton Kozlov proposed techniques like RCU [1] but at this point there's > no support for this in Java. Even the Linux implementation might require > some additional properties from the code in critical (read) section like > not calling any blocking code; this might be too limiting. > > The situation is simpler if the application uses a single threaded > event-loop; beforeCheckpoint can enqueue a task that would, upon its > execution, block on a primitive and notify the C/R notification thread > that it may now deinit the resource; in afterRestore the resource is > initialized and the eventloop is unblocked. This way we don't impose any > extra overhead when C/R is happening. > That's a nice idea! > > To avoid extra synchronization it could be technically possible to > modify CRaC implementation to keep all other threads frozen during > restore. There's a risk of some form of deadlock if the thread > performing C/R would require other threads to progress, though, so any > such solution would require extra thoughts. Besides, this does not > guarantee exclusivity so the afterRestore would need to restore the > resource to the *exactly* same state (as some of its before-checkpoint > state might have leaked to the thread in Processor). In my opinion this > is not the best way. > This is the approach that OpenJ9 took to solve the consistency problems introduced by updating resources before / after checkpoints. OpenJ9 enters "single threaded mode" when creating the checkpoint and executing the before checkkpoint fixups. On restore, it continues in single-threaded mode while executing the after checkpoint fixups. This makes it easier to avoid additional runtime costs related to per-resource locking for checkpoints, but complicates locking and wait/notify in general. This means a checkpoint hook operation can't wait on another thread (would block indefinitely as other threads are paused), can't wait on a lock being held by another thread (again, would deadlock), and sending notify may result in inconsistent behaviour (wrong number of notifies received by other threads). See "The checkpointJVM() API" section of their blog post on CRIU for more details [0]. > > The problem with RCU is tracking which threads are in the critical > section. I've found RCU-like implementations for Java that avoid > excessive overhead using a spread out array - each thread marks > entering/leaving the critical section by writes to its own counter, > preventing cache ping-pong (assuming no false sharing). Synchronizer > thread uses another flag to request synchronization; reading this by > each thread is not totally without cost but reasonably cheap, and in > that case worker threads can enter a blocking slow path. The simple > implementation assumes a fixed number of threads; if the list of threads > is dynamic the solution would be probably more complicated. It might > also make sense to implement this in native code with a per-CPU > counters, rather than per-thread. A downside, besides some overhead in > terms of both cycles and memory usage, is that we'd need to modify the > code and explicitly mark the critical sections. > > Another solution could try to leverage existing JVM mechanics for code > deoptimization, replacing the critical sections with a slower, blocking > stub, and reverting back after restore. Or even independently requesting > a safe-point and inspecting stack of threads until the synchronization > is possible. > This will have a high risk of livelock. The OpenJ9 experience implementing single-threaded mode for CRIU indicates there are a lot of strange locking patterns in the world. > > So I probably can't offer a ready-to-use performant solution; pick your > poison. The future, though, offers a few possibilities and I'd love to > hear others' opinions about which one would look the most feasible. > Because unless we offer something that does not harm a no-CRaC use-case > I am afraid that the adoption will be quite limited. > Successful solutions will push the costs into the checkpoint / restore paths as much as possible. Going back to the explicit lock mechanism you first mentioned, I wonder if there's a role for java.lang.invoke.Switchpoint [1] here? Switchpoint was added as a tool for language implementers that wanted to be able speculate on a particular condition (ie: CHA assumptions) and get the same kind of low cost state change that existing JITTED code gets. I'm not sure how well that vision worked in practice or how well Hotspot optimizes it yet, but this might be a reason to push on its performance. Roughly the idea would be to add a couple of Switchpoints to jdk.crac.Core: public SwitchPoint getBeforeSwitchpoint(); public SwitchPoint getAfterSwitchpoint(); and users could then write their code using MethodHandles to implementing the branching logic: MethodHandle normalPath = ...... // existing code MethodHandle fallbackPath = ..... // before Checkpoint extra work MethodHandle guardWithTest = getBeforeSwitchPoint.guardWithTest(normalPath, fallbackPath); and the jdk.crac.Core class would invalidate the "before" SwitchPoint prior to the checkpoint and "after" one after the restore. Aside from the painful programming model, this might give us the tools we need to make it performant. Needs more exploration and prototyping but would provide a potential path to reasonable performance by burying the extra locking in the fallback paths. And it would be a single pattern to optimize, rather than all the variations users could produce. --Dan [0] https://blog.openj9.org/2022/10/14/openj9-criu-support-a-look-under-the-hood/ [1] https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/invoke/SwitchPoint.html > > Cheers, > > Radim > > [1] https://en.wikipedia.org/wiki/Read-copy-update > > On 03. 04. 23 22:30, Christian Tzolov wrote: > > Hi, I'm testing CRaC in the context of long-running applications (e.g. > streaming, continuous processing ...) and I've stumbled on an issue related > to the coordination of the resolved threads. > > > > For example, let's have a Processor that performs continuous > computations. This processor depends on a ProcessorContext and later must > be fully initialized before the processor can process any data. > > > > When the application is first started (e.g. not from checkpoints) it > ensures that the ProcessorContext is initialized before starting the > Processor loop. > > > > To leverage CRaC I've implemented a ProcessorContextResource gracefully > stops the context on beforeCheckpoint and then re-initialized it on > afterRestore. > > > > When the checkpoint is performed, CRaC calls the > ProcessorContextResource.beforeCheckpoint and also preserves the current > Processor call stack. On Restore processor's call stack is expectedly > restored at the point it was stopped but unfortunately it doesn't wait for > the ProcessorContextResource.afterRestore complete. This expectedly crashes > the processor. > > > > The https://github.com/tzolov/crac-demo illustreates this issue. The > README explains how to reproduce the issue. The OUTPUT.md ( > https://github.com/tzolov/crac-demo/blob/main/OUTPUT.md ) offers terminal > snapshots of the observed behavior. > > > > I've used latest JDK CRaC release: > > openjdk 17-crac 2021-09-14 > > OpenJDK Runtime Environment (build 17-crac+5-19) > > OpenJDK 64-Bit Server VM (build 17-crac+5-19, mixed mode, sharing) > > > > As I'm new to CRaC, I'd appreciate your thoughts on this issue. > > > > Cheers, > > Christian > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rvansa at azul.com Wed Apr 5 16:00:11 2023 From: rvansa at azul.com (Radim Vansa) Date: Wed, 5 Apr 2023 18:00:11 +0200 Subject: On restore the "main" thread is started before the Resource's afterRestore has completed In-Reply-To: References: Message-ID: Lot of interesting ideas, comments inline.... On 05. 04. 23 16:28, Dan Heidinga wrote: > Hi Radim, > > Thanks for the write up of the various options in this space. > > On Tue, Apr 4, 2023 at 2:49?AM Radim Vansa > wrote: > Hi Christian, > > I believe this is a common problem when porting existing architecture > under CRaC; the obvious solution is to guard access to the resource > (ProcessorContext in this case) with a RW lock that'd be read-acquired > by 'regular' access and acquired for write in beforeCheckpoint/released > in afterRestore. However this introduces extra synchronization (at least > in form of volatile writes) even in case that C/R is not used at all, > especially if the support is added into libraries. > > I've seen variations of this approach go by in code reviews but have we written up a good example of how to do this well? Having a canonical pattern would help to highlight the best way to do it today and make the tradeoffs explicit. TBH I am rather shy to demo a solution that's quite imperfect unless we find that there's no way around that. > > > Anton Kozlov proposed techniques like RCU [1] but at this point there's > no support for this in Java. Even the Linux implementation might require > some additional properties from the code in critical (read) section like > not calling any blocking code; this might be too limiting. > > The situation is simpler if the application uses a single threaded > event-loop; beforeCheckpoint can enqueue a task that would, upon its > execution, block on a primitive and notify the C/R notification thread > that it may now deinit the resource; in afterRestore the resource is > initialized and the eventloop is unblocked. This way we don't impose any > extra overhead when C/R is happening. > > That's a nice idea! > > > To avoid extra synchronization it could be technically possible to > modify CRaC implementation to keep all other threads frozen during > restore. There's a risk of some form of deadlock if the thread > performing C/R would require other threads to progress, though, so any > such solution would require extra thoughts. Besides, this does not > guarantee exclusivity so the afterRestore would need to restore the > resource to the *exactly* same state (as some of its before-checkpoint > state might have leaked to the thread in Processor). In my opinion this > is not the best way. > > This is the approach that OpenJ9 took to solve the consistency problems introduced by updating resources before / after checkpoints. OpenJ9 enters "single threaded mode" when creating the checkpoint and executing the before checkkpoint fixups. On restore, it continues in single-threaded mode while executing the after checkpoint fixups. This makes it easier to avoid additional runtime costs related to per-resource locking for checkpoints, but complicates locking and wait/notify in general. > > This means a checkpoint hook operation can't wait on another thread (would block indefinitely as other threads are paused), can't wait on a lock being held by another thread (again, would deadlock), and sending notify may result in inconsistent behaviour (wrong number of notifies received by other threads). See "The checkpointJVM() API" section of their blog post on CRIU for more details [0]. Great post, I should probably go through the whole blog. I think that the single-threaded mode is conceptually simple to think about and with the @NotCheckpointSafe annotation deals well with the issue. Have you run into any edge cases where this doesn't work well? For example I've seen a deadlock because in beforeCheckpoint something was supposed to run in the reference cleaner thread. Also, did you run any tests to see the performance impact of your changes to wait/notify? Do you also have to tweak higher-level synchronization such as java.util.concurrent.*? > > > The problem with RCU is tracking which threads are in the critical > section. I've found RCU-like implementations for Java that avoid > excessive overhead using a spread out array - each thread marks > entering/leaving the critical section by writes to its own counter, > preventing cache ping-pong (assuming no false sharing). Synchronizer > thread uses another flag to request synchronization; reading this by > each thread is not totally without cost but reasonably cheap, and in > that case worker threads can enter a blocking slow path. The simple > implementation assumes a fixed number of threads; if the list of threads > is dynamic the solution would be probably more complicated. It might > also make sense to implement this in native code with a per-CPU > counters, rather than per-thread. A downside, besides some overhead in > terms of both cycles and memory usage, is that we'd need to modify the > code and explicitly mark the critical sections. > > Another solution could try to leverage existing JVM mechanics for code > deoptimization, replacing the critical sections with a slower, blocking > stub, and reverting back after restore. Or even independently requesting > a safe-point and inspecting stack of threads until the synchronization > is possible. > > This will have a high risk of livelock. The OpenJ9 experience implementing single-threaded mode for CRIU indicates there are a lot of strange locking patterns in the world. There are weird patterns but a more fine-grained exclusivity should be more permissive than single-threaded mode which works as an implicit Big Fat Lock. Yes, any problems are probably much better reproducible with that one. > > > So I probably can't offer a ready-to-use performant solution; pick your > poison. The future, though, offers a few possibilities and I'd love to > hear others' opinions about which one would look the most feasible. > Because unless we offer something that does not harm a no-CRaC use-case > I am afraid that the adoption will be quite limited. > > Successful solutions will push the costs into the checkpoint / restore paths as much as possible. Going back to the explicit lock mechanism you first mentioned, I wonder if there's a role for java.lang.invoke.Switchpoint [1] here? Switchpoint was added as a tool for language implementers that wanted to be able speculate on a particular condition (ie: CHA assumptions) and get the same kind of low cost state change that existing JITTED code gets. I'm not sure how well that vision worked in practice or how well Hotspot optimizes it yet, but this might be a reason to push on its performance. > > Roughly the idea would be to add a couple of Switchpoints to jdk.crac.Core: > > public SwitchPoint getBeforeSwitchpoint(); > public SwitchPoint getAfterSwitchpoint(); > > and users could then write their code using MethodHandles to implementing the branching logic: > > MethodHandle normalPath = ...... // existing code > MethodHandle fallbackPath = ..... // before Checkpoint extra work > MethodHandle guardWithTest = getBeforeSwitchPoint.guardWithTest(normalPath, fallbackPath); > > and the jdk.crac.Core class would invalidate the "before" SwitchPoint prior to the checkpoint and "after" one after the restore. Aside from the painful programming model, this might give us the tools we need to make it performant. > > Needs more exploration and prototyping but would provide a potential path to reasonable performance by burying the extra locking in the fallback paths. And it would be a single pattern to optimize, rather than all the variations users could produce. Well noted, I admit that I haven't heard about SwitchPoint before, I'll need some time to ingest it and maybe write a JMH tests to see. However from the first look it is not something that would be too convenient for users, I would consider some form of interception introduced through annotations (yes, I come from the EE world). Thanks! Radim > > --Dan > > > [0] https://blog.openj9.org/2022/10/14/openj9-criu-support-a-look-under-the-hood/ > [1] https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/invoke/SwitchPoint.html > > Cheers, > > Radim > > [1] https://en.wikipedia.org/wiki/Read-copy-update > > On 03. 04. 23 22:30, Christian Tzolov wrote: >> Hi, I'm testing CRaC in the context of long-running applications (e.g. streaming, continuous processing ...) and I've stumbled on an issue related to the coordination of the resolved threads. >> >> For example, let's have a Processor that performs continuous computations. This processor depends on a ProcessorContext and later must be fully initialized before the processor can process any data. >> >> When the application is first started (e.g. not from checkpoints) it ensures that the ProcessorContext is initialized before starting the Processor loop. >> >> To leverage CRaC I've implemented a ProcessorContextResource gracefully stops the context on beforeCheckpoint and then re-initialized it on afterRestore. >> >> When the checkpoint is performed, CRaC calls the ProcessorContextResource.beforeCheckpoint and also preserves the current Processor call stack. On Restore processor's call stack is expectedly restored at the point it was stopped but unfortunately it doesn't wait for the ProcessorContextResource.afterRestore complete. This expectedly crashes the processor. >> >> The https://github.com/tzolov/crac-demo illustreates this issue. The README explains how to reproduce the issue. The OUTPUT.md (https://github.com/tzolov/crac-demo/blob/main/OUTPUT.md ) offers terminal snapshots of the observed behavior. >> >> I've used latest JDK CRaC release: >> openjdk 17-crac 2021-09-14 >> OpenJDK Runtime Environment (build 17-crac+5-19) >> OpenJDK 64-Bit Server VM (build 17-crac+5-19, mixed mode, sharing) >> >> As I'm new to CRaC, I'd appreciate your thoughts on this issue. >> >> Cheers, >> Christian >> >> >> >> > From heidinga at redhat.com Wed Apr 5 20:40:29 2023 From: heidinga at redhat.com (Dan Heidinga) Date: Wed, 5 Apr 2023 16:40:29 -0400 Subject: On restore the "main" thread is started before the Resource's afterRestore has completed In-Reply-To: References: Message-ID: On Wed, Apr 5, 2023 at 12:01?PM Radim Vansa wrote: > Lot of interesting ideas, comments inline.... > > On 05. 04. 23 16:28, Dan Heidinga wrote: > > Hi Radim, > > > > Thanks for the write up of the various options in this space. > > > > On Tue, Apr 4, 2023 at 2:49?AM Radim Vansa rvansa at azul.com>> wrote: > > Hi Christian, > > > > I believe this is a common problem when porting existing architecture > > under CRaC; the obvious solution is to guard access to the resource > > (ProcessorContext in this case) with a RW lock that'd be read-acquired > > by 'regular' access and acquired for write in beforeCheckpoint/released > > in afterRestore. However this introduces extra synchronization (at least > > in form of volatile writes) even in case that C/R is not used at all, > > especially if the support is added into libraries. > > > > I've seen variations of this approach go by in code reviews but have we > written up a good example of how to do this well? Having a canonical > pattern would help to highlight the best way to do it today and make the > tradeoffs explicit. > > TBH I am rather shy to demo a solution that's quite imperfect unless we > find that there's no way around that. > That's fair. Sometimes it's easier for people to start from something and modify it to be a better pattern than to have to generate it from scratch. > > > > > > > > Anton Kozlov proposed techniques like RCU [1] but at this point there's > > no support for this in Java. Even the Linux implementation might require > > some additional properties from the code in critical (read) section like > > not calling any blocking code; this might be too limiting. > > > > The situation is simpler if the application uses a single threaded > > event-loop; beforeCheckpoint can enqueue a task that would, upon its > > execution, block on a primitive and notify the C/R notification thread > > that it may now deinit the resource; in afterRestore the resource is > > initialized and the eventloop is unblocked. This way we don't impose any > > extra overhead when C/R is happening. > > > > That's a nice idea! > > > > > > To avoid extra synchronization it could be technically possible to > > modify CRaC implementation to keep all other threads frozen during > > restore. There's a risk of some form of deadlock if the thread > > performing C/R would require other threads to progress, though, so any > > such solution would require extra thoughts. Besides, this does not > > guarantee exclusivity so the afterRestore would need to restore the > > resource to the *exactly* same state (as some of its before-checkpoint > > state might have leaked to the thread in Processor). In my opinion this > > is not the best way. > > > > This is the approach that OpenJ9 took to solve the consistency problems > introduced by updating resources before / after checkpoints. OpenJ9 enters > "single threaded mode" when creating the checkpoint and executing the > before checkkpoint fixups. On restore, it continues in single-threaded > mode while executing the after checkpoint fixups. This makes it easier to > avoid additional runtime costs related to per-resource locking for > checkpoints, but complicates locking and wait/notify in general. > > > > This means a checkpoint hook operation can't wait on another thread > (would block indefinitely as other threads are paused), can't wait on a > lock being held by another thread (again, would deadlock), and sending > notify may result in inconsistent behaviour (wrong number of notifies > received by other threads). See "The checkpointJVM() API" section of their > blog post on CRIU for more details [0]. > > Great post, I should probably go through the whole blog. I think that > the single-threaded mode is conceptually simple to think about and with > the @NotCheckpointSafe annotation deals well with the issue. Have you > run into any edge cases where this doesn't work well? For example I've > seen a deadlock because in beforeCheckpoint something was supposed to > run in the reference cleaner thread. > I think we're still playing whack-a-mole with the places that may need @NotCheckpointSafe. There's no good static analysis that will indicate where the annotation is needed so we're finding places to add it as we go. Examples include: MethodType interning - if a halted thread was interning a MethodType, and the checkpoint thread needs to resolve a MethodType, we deadlock. Also, ClassValue::initializeMap(), ClassSpecializer::findSpecies, ConcurrenthashMap::computeIfAbsent. The more places we add the annotation though, the harder it will be to find a "safe" point to take the checkpoint without a livelock. It's a mostly working bandaid rather than a general solution but it still might be good enough to make most applications work. > > Also, did you run any tests to see the performance impact of your > changes to wait/notify? Do you also have to tweak higher-level > synchronization such as java.util.concurrent.*? > The OpenJ9 locking code is quite different so results may vary. I don't recall any measurable impact due to the wait/notify changes as they already tend to be slow paths. I'm not aware of any changes to the j.u.c synchronization classes. As I said, whack-a-mole and I don't think this "mole" has poked its head up yet. > > > > > > > > The problem with RCU is tracking which threads are in the critical > > section. I've found RCU-like implementations for Java that avoid > > excessive overhead using a spread out array - each thread marks > > entering/leaving the critical section by writes to its own counter, > > preventing cache ping-pong (assuming no false sharing). Synchronizer > > thread uses another flag to request synchronization; reading this by > > each thread is not totally without cost but reasonably cheap, and in > > that case worker threads can enter a blocking slow path. The simple > > implementation assumes a fixed number of threads; if the list of threads > > is dynamic the solution would be probably more complicated. It might > > also make sense to implement this in native code with a per-CPU > > counters, rather than per-thread. A downside, besides some overhead in > > terms of both cycles and memory usage, is that we'd need to modify the > > code and explicitly mark the critical sections. > > > > Another solution could try to leverage existing JVM mechanics for code > > deoptimization, replacing the critical sections with a slower, blocking > > stub, and reverting back after restore. Or even independently requesting > > a safe-point and inspecting stack of threads until the synchronization > > is possible. > > > > This will have a high risk of livelock. The OpenJ9 experience > implementing single-threaded mode for CRIU indicates there are a lot of > strange locking patterns in the world. > > There are weird patterns but a more fine-grained exclusivity should be > more permissive than single-threaded mode which works as an implicit Big > Fat Lock. Yes, any problems are probably much better reproducible with > that one. > > > > > > > So I probably can't offer a ready-to-use performant solution; pick your > > poison. The future, though, offers a few possibilities and I'd love to > > hear others' opinions about which one would look the most feasible. > > Because unless we offer something that does not harm a no-CRaC use-case > > I am afraid that the adoption will be quite limited. > > > > Successful solutions will push the costs into the checkpoint / restore > paths as much as possible. Going back to the explicit lock mechanism you > first mentioned, I wonder if there's a role for > java.lang.invoke.Switchpoint [1] here? Switchpoint was added as a tool for > language implementers that wanted to be able speculate on a particular > condition (ie: CHA assumptions) and get the same kind of low cost state > change that existing JITTED code gets. I'm not sure how well that vision > worked in practice or how well Hotspot optimizes it yet, but this might be > a reason to push on its performance. > > > > Roughly the idea would be to add a couple of Switchpoints to > jdk.crac.Core: > > > > public SwitchPoint getBeforeSwitchpoint(); > > public SwitchPoint getAfterSwitchpoint(); > > > > and users could then write their code using MethodHandles to > implementing the branching logic: > > > > MethodHandle normalPath = ...... // existing code > > MethodHandle fallbackPath = ..... // before Checkpoint extra work > > MethodHandle guardWithTest = > getBeforeSwitchPoint.guardWithTest(normalPath, fallbackPath); > > > > and the jdk.crac.Core class would invalidate the "before" SwitchPoint > prior to the checkpoint and "after" one after the restore. Aside from the > painful programming model, this might give us the tools we need to make it > performant. > > > > Needs more exploration and prototyping but would provide a potential > path to reasonable performance by burying the extra locking in the fallback > paths. And it would be a single pattern to optimize, rather than all the > variations users could produce. > > Well noted, I admit that I haven't heard about SwitchPoint before, I'll > need some time to ingest it and maybe write a JMH tests to see. However > from the first look it is not something that would be too convenient for > users, I would consider some form of interception introduced through > annotations (yes, I come from the EE world). > Not a fan of annotations in the long term. They are hard to maintain as the underlying implementation changes / is refactored. Even something as straightforward as the @CallerSensitive annotation is tough to maintain. These would be a lot harder as they would apply in many more circumstances. --Dan > > Thanks! > > Radim > > > > > > --Dan > > > > > > [0] > https://blog.openj9.org/2022/10/14/openj9-criu-support-a-look-under-the-hood/ > > [1] > https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/invoke/SwitchPoint.html > > > > Cheers, > > > > Radim > > > > [1] https://en.wikipedia.org/wiki/Read-copy-update > > > > On 03. 04. 23 22:30, Christian Tzolov wrote: > >> Hi, I'm testing CRaC in the context of long-running applications (e.g. > streaming, continuous processing ...) and I've stumbled on an issue related > to the coordination of the resolved threads. > >> > >> For example, let's have a Processor that performs continuous > computations. This processor depends on a ProcessorContext and later must > be fully initialized before the processor can process any data. > >> > >> When the application is first started (e.g. not from checkpoints) it > ensures that the ProcessorContext is initialized before starting the > Processor loop. > >> > >> To leverage CRaC I've implemented a ProcessorContextResource gracefully > stops the context on beforeCheckpoint and then re-initialized it on > afterRestore. > >> > >> When the checkpoint is performed, CRaC calls the > ProcessorContextResource.beforeCheckpoint and also preserves the current > Processor call stack. On Restore processor's call stack is expectedly > restored at the point it was stopped but unfortunately it doesn't wait for > the ProcessorContextResource.afterRestore complete. This expectedly crashes > the processor. > >> > >> The https://github.com/tzolov/crac-demo illustreates this issue. The > README explains how to reproduce the issue. The OUTPUT.md ( > https://github.com/tzolov/crac-demo/blob/main/OUTPUT.md ) offers terminal > snapshots of the observed behavior. > >> > >> I've used latest JDK CRaC release: > >> openjdk 17-crac 2021-09-14 > >> OpenJDK Runtime Environment (build 17-crac+5-19) > >> OpenJDK 64-Bit Server VM (build 17-crac+5-19, mixed mode, sharing) > >> > >> As I'm new to CRaC, I'd appreciate your thoughts on this issue. > >> > >> Cheers, > >> Christian > >> > >> > >> > >> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From duke at openjdk.org Thu Apr 6 12:11:03 2023 From: duke at openjdk.org (Radim Vansa) Date: Thu, 6 Apr 2023 12:11:03 GMT Subject: [crac] RFR: Support repeated checkpoint and restore operations Message-ID: * VM option CRaCCheckpointTo is recognized when restoring the application (destination can be changed) * The main problem for checkpoint after restore was old checkpoint image mmapped to files (CRaC-specific CRIU optimization for faster boot). Before performing checkpoint we transparently swap this with memory using anonymous mapping. ------------- Commit messages: - Support repeated checkpoint and restore operations Changes: https://git.openjdk.org/crac/pull/57/files Webrev: https://webrevs.openjdk.org/?repo=crac&pr=57&range=00 Stats: 391 lines in 9 files changed: 349 ins; 15 del; 27 mod Patch: https://git.openjdk.org/crac/pull/57.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/57/head:pull/57 PR: https://git.openjdk.org/crac/pull/57 From duke at openjdk.org Thu Apr 6 13:31:42 2023 From: duke at openjdk.org (Radim Vansa) Date: Thu, 6 Apr 2023 13:31:42 GMT Subject: [crac] RFR: Correct System.nanotime() value after restore [v3] In-Reply-To: References: Message-ID: > There are various places both inside JDK and in libraries that rely on monotonicity of `System.nanotime()`. When the process is restored on a different machine the value will likely differ as the implementation provides time since machine boot. This PR records wall clock time before checkpoint and after restore and tries to adjust the value provided by nanotime() to reasonably correct value. Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Reset nanotime offset before calculating it again ------------- Changes: - all: https://git.openjdk.org/crac/pull/53/files - new: https://git.openjdk.org/crac/pull/53/files/35a9b128..b59d738a Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=53&range=02 - incr: https://webrevs.openjdk.org/?repo=crac&pr=53&range=01-02 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/crac/pull/53.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/53/head:pull/53 PR: https://git.openjdk.org/crac/pull/53 From duke at openjdk.org Thu Apr 6 13:31:42 2023 From: duke at openjdk.org (Radim Vansa) Date: Thu, 6 Apr 2023 13:31:42 GMT Subject: [crac] RFR: Correct System.nanotime() value after restore [v3] In-Reply-To: References: <-i0uB8ZW7r54hoKQJ_wODUXNKVkOI5rH7SJTEhSHiDw=.75ebe53a-9081-40c6-911f-048b17e8850e@github.com> Message-ID: On Thu, 30 Mar 2023 14:26:15 GMT, Radim Vansa wrote: >> It estabilishes relation between real time and monotonic time, and it's sufficient to do that just once >> >> // 1st checkpoint >> checkpoint_millis = 1 >> checkpoint_nanos = 1_000_000 >> // almost immediate restore >> javaTimeMillis() -> 2 >> monotonic nanos -> 2_000_000 >> diff_millis = 1 >> javaTimeNanos_offset = 0 >> // second checkpoint does not record anything >> // 2nd restore >> javaTimeMills() -> 3 >> monotonic nanos -> 100_000_000 >> diff_millis = 2 >> javaTimeNanos_offset = 1_000_000 - 100_000_000 + 2 * 1_000_000 = -97_000_000 >> javaTimeNanos() -> system monotonic nanos + offset = 3_000_000 >> >> In the last step we've read a value that makes sense to compare to any nanoTime in any previous phase. > > Oh wait a sec, you're partially right - since we always use javaTimeNanos() if the offset calculated after the first restore wouldn't be zero, we wouldn't have this right. I should zero the offset before calculating it again. Too bad I can't create a test for that yet. Fixed the problem above; I did a (unpublished) merge with #57 and wrote a test to validate the behaviour. Can contribute it after #57 gets merged (possibly in another PR). ------------- PR Review Comment: https://git.openjdk.org/crac/pull/53#discussion_r1159798768 From christian.tzolov at gmail.com Thu Apr 6 13:59:26 2023 From: christian.tzolov at gmail.com (Christian Tzolov) Date: Thu, 6 Apr 2023 15:59:26 +0200 Subject: On restore the "main" thread is started before the Resource's afterRestore has completed In-Reply-To: References: Message-ID: Hi Dan and Radim, Thanks for the feedback and suggestions! It is the first time I?m facing the java.lang.invoke.* API and it might take some time to wrap my head around it. So be prepared plese for lame questions, as those inlined below. On Wed, Apr 5, 2023 at 4:28?PM Dan Heidinga wrote: > Hi Radim, > > Thanks for the write up of the various options in this space. > > On Tue, Apr 4, 2023 at 2:49?AM Radim Vansa wrote: > >> Hi Christian, >> >> I believe this is a common problem when porting existing architecture >> under CRaC; the obvious solution is to guard access to the resource >> (ProcessorContext in this case) with a RW lock that'd be read-acquired >> by 'regular' access and acquired for write in beforeCheckpoint/released >> in afterRestore. However this introduces extra synchronization (at least >> in form of volatile writes) even in case that C/R is not used at all, >> especially if the support is added into libraries. >> > > I've seen variations of this approach go by in code reviews but have we > written up a good example of how to do this well? Having a > canonical pattern would help to highlight the best way to do it today and > make the tradeoffs explicit. > @Radim, your ?guard access? suggestion made me realise that perhaps I?ve oversimplified my sample. So I?ve modified it a bit: https://github.com/tzolov/crac-demo/blob/main/src/main/java/com/example/crac/CrackDemoExt.java by introducing a new ProcessorState used by the Processor for its computation. At the same time I?ve removed the direct Processor dependency on the ProcessorContext. Instead the ProcessorContext is responsible for managing the lifecycle of the ProcessorState before the Processor can use it. Then given your original suggestion is it right to assume that the ?guard access to the resource? now should guard the ProcessorState not the ProcessorContext? And if this is true then how one would be able to identify all possible ?resources? to be guarded? > Anton Kozlov proposed techniques like RCU [1] but at this point there's >> no support for this in Java. Even the Linux implementation might require >> some additional properties from the code in critical (read) section like >> not calling any blocking code; this might be too limiting. >> >> The situation is simpler if the application uses a single threaded >> event-loop; beforeCheckpoint can enqueue a task that would, upon its >> execution, block on a primitive and notify the C/R notification thread >> that it may now deinit the resource; in afterRestore the resource is >> initialized and the eventloop is unblocked. This way we don't impose any >> extra overhead when C/R is happening. >> > > That's a nice idea! > > >> >> To avoid extra synchronization it could be technically possible to >> modify CRaC implementation to keep all other threads frozen during >> restore. There's a risk of some form of deadlock if the thread >> performing C/R would require other threads to progress, though, so any >> such solution would require extra thoughts. Besides, this does not >> guarantee exclusivity so the afterRestore would need to restore the >> resource to the *exactly* same state (as some of its before-checkpoint >> state might have leaked to the thread in Processor). In my opinion this >> is not the best way. >> > > This is the approach that OpenJ9 took to solve the consistency problems > introduced by updating resources before / after checkpoints. OpenJ9 enters > "single threaded mode" when creating the checkpoint and executing the > before checkkpoint fixups. On restore, it continues in single-threaded > mode while executing the after checkpoint fixups. This makes it easier to > avoid additional runtime costs related to per-resource locking for > checkpoints, but complicates locking and wait/notify in general. > > This means a checkpoint hook operation can't wait on another thread (would > block indefinitely as other threads are paused), can't wait on a lock being > held by another thread (again, would deadlock), and sending notify may > result in inconsistent behaviour (wrong number of notifies received by > other threads). See "The checkpointJVM() API" section of their blog post > on CRIU for more details [0]. > The "single thread mode", imo, corresponds to the "serializable isolation" approach in data processing and DB transactions. The OpenJ9 blogs are very informative and like the jdk invoke API would need time to digest. But I have one conceptual question. What part of this should/cloud be implemented by the CRaC inself and what abstractions should be exposed to the CRaC users? >> The problem with RCU is tracking which threads are in the critical >> section. I've found RCU-like implementations for Java that avoid >> excessive overhead using a spread out array - each thread marks >> entering/leaving the critical section by writes to its own counter, >> preventing cache ping-pong (assuming no false sharing). Synchronizer >> thread uses another flag to request synchronization; reading this by >> each thread is not totally without cost but reasonably cheap, and in >> that case worker threads can enter a blocking slow path. The simple >> implementation assumes a fixed number of threads; if the list of threads >> is dynamic the solution would be probably more complicated. It might >> also make sense to implement this in native code with a per-CPU >> counters, rather than per-thread. A downside, besides some overhead in >> terms of both cycles and memory usage, is that we'd need to modify the >> code and explicitly mark the critical sections. >> >> Another solution could try to leverage existing JVM mechanics for code >> deoptimization, replacing the critical sections with a slower, blocking >> stub, and reverting back after restore. Or even independently requesting >> a safe-point and inspecting stack of threads until the synchronization >> is possible. >> > > This will have a high risk of livelock. The OpenJ9 experience > implementing single-threaded mode for CRIU indicates there are a lot of > strange locking patterns in the world. > > >> >> So I probably can't offer a ready-to-use performant solution; pick your >> poison. The future, though, offers a few possibilities and I'd love to >> hear others' opinions about which one would look the most feasible. >> Because unless we offer something that does not harm a no-CRaC use-case >> I am afraid that the adoption will be quite limited. >> > > Successful solutions will push the costs into the checkpoint / restore > paths as much as possible. Going back to the explicit lock mechanism you > first mentioned, I wonder if there's a role for > java.lang.invoke.Switchpoint [1] here? Switchpoint was added as a tool for > language implementers that wanted to be able speculate on a particular > condition (ie: CHA assumptions) and get the same kind of low cost state > change that existing JITTED code gets. I'm not sure how well that vision > worked in practice or how well Hotspot optimizes it yet, but this might be > a reason to push on its performance. > > Roughly the idea would be to add a couple of Switchpoints to jdk.crac.Core: > > public SwitchPoint getBeforeSwitchpoint(); > public SwitchPoint getAfterSwitchpoint(); > > and users could then write their code using MethodHandles to implementing > the branching logic: > > MethodHandle normalPath = ...... // existing code > MethodHandle fallbackPath = ..... // before Checkpoint extra work > MethodHandle guardWithTest = > getBeforeSwitchPoint.guardWithTest(normalPath, fallbackPath); > > and the jdk.crac.Core class would invalidate the "before" SwitchPoint > prior to the checkpoint and "after" one after the restore. Aside from the > painful programming model, this might give us the tools we need to make it > performant. > @Dan, this is very interesting! Could you please elaborate a bit further. Perhaps in the context of the CrackDemoExt.java sample? > > Needs more exploration and prototyping but would provide a potential path > to reasonable performance by burying the extra locking in the fallback > paths. And it would be a single pattern to optimize, rather than all the > variations users could produce. > --Dan > [0] > https://blog.openj9.org/2022/10/14/openj9-criu-support-a-look-under-the-hood/ > [1] > https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/invoke/SwitchPoint.html > Thank you, - Christian > >> Cheers, >> >> Radim >> >> [1] https://en.wikipedia.org/wiki/Read-copy-update >> >> On 03. 04. 23 22:30, Christian Tzolov wrote: >> > Hi, I'm testing CRaC in the context of long-running applications (e.g. >> streaming, continuous processing ...) and I've stumbled on an issue related >> to the coordination of the resolved threads. >> > >> > For example, let's have a Processor that performs continuous >> computations. This processor depends on a ProcessorContext and later must >> be fully initialized before the processor can process any data. >> > >> > When the application is first started (e.g. not from checkpoints) it >> ensures that the ProcessorContext is initialized before starting the >> Processor loop. >> > >> > To leverage CRaC I've implemented a ProcessorContextResource gracefully >> stops the context on beforeCheckpoint and then re-initialized it on >> afterRestore. >> > >> > When the checkpoint is performed, CRaC calls the >> ProcessorContextResource.beforeCheckpoint and also preserves the current >> Processor call stack. On Restore processor's call stack is expectedly >> restored at the point it was stopped but unfortunately it doesn't wait for >> the ProcessorContextResource.afterRestore complete. This expectedly crashes >> the processor. >> > >> > The https://github.com/tzolov/crac-demo illustreates this issue. The >> README explains how to reproduce the issue. The OUTPUT.md ( >> https://github.com/tzolov/crac-demo/blob/main/OUTPUT.md ) offers >> terminal snapshots of the observed behavior. >> > >> > I've used latest JDK CRaC release: >> > openjdk 17-crac 2021-09-14 >> > OpenJDK Runtime Environment (build 17-crac+5-19) >> > OpenJDK 64-Bit Server VM (build 17-crac+5-19, mixed mode, sharing) >> > >> > As I'm new to CRaC, I'd appreciate your thoughts on this issue. >> > >> > Cheers, >> > Christian >> > >> > >> > >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From rvansa at azul.com Thu Apr 6 14:29:02 2023 From: rvansa at azul.com (Radim Vansa) Date: Thu, 6 Apr 2023 16:29:02 +0200 Subject: On restore the "main" thread is started before the Resource's afterRestore has completed In-Reply-To: References: Message-ID: <705afe1f-622e-98f6-a476-55e09d82e48f@azul.com> Hi, comments inline... On 06. 04. 23 15:59, Christian Tzolov wrote: > > > Caution: This email originated from outside of the organization. Do > not click links or open attachments unless you recognize the sender > and know the content is safe. > > > Hi Dan and Radim, > > > Thanks for the feedback and suggestions! > > It is the first time I?m facing the java.lang.invoke.* API and it > might take some time to wrap my head around it. > > So be prepared plese for lame questions, as those inlined below. > > > On Wed, Apr 5, 2023 at 4:28?PM Dan Heidinga wrote: > > Hi Radim, > > Thanks for the write up of the various options in this space. > > On Tue, Apr 4, 2023 at 2:49?AM Radim Vansa wrote: > > Hi Christian, > > I believe this is a common problem when porting existing > architecture > under CRaC; the obvious solution is to guard access to the > resource > (ProcessorContext in this case) with a RW lock that'd be > read-acquired > by 'regular' access and acquired for write in > beforeCheckpoint/released > in afterRestore. However this introduces extra synchronization > (at least > in form of volatile writes) even in case that C/R is not used > at all, > especially if the support is added into libraries. > > > I've seen variations of this approach go by in code reviews but > have we written up a good example of how to do this well?? Having > a canonical?pattern would help to highlight the best way to do it > today and make the tradeoffs explicit. > > > @Radim, your ?guard access? suggestion made me realise that perhaps > I?ve oversimplified my ?sample. > > So I?ve modified it a bit: > https://github.com/tzolov/crac-demo/blob/main/src/main/java/com/example/crac/CrackDemoExt.java > by introducing a new ProcessorState used by the Processor for its > computation. > At the same time I?ve removed the direct Processor dependency on the > ProcessorContext. Instead the ProcessorContext is responsible for > managing the lifecycle of the ProcessorState before the Processor can > use it. > Then given your original suggestion is it right to assume that the > ?guard access to the resource? now should guard the ProcessorState not > the ProcessorContext? > And if this is true then how one would be able to identify all > possible ?resources? to be guarded? It seems that the separation between Context and State is a bit artificial, but anyway... Context here would hold a RW lock, write-locked in constructor. At the end of start() method it would unlock it, and at the beginning of stop() it would lock it. In your case the Processor uses that state directly, rather than through Context - that gives you no place to put the read lock. Instead, it should be delegated through Context that would read-lock it before useState() and unlock afterwards. > Anton Kozlov proposed techniques like RCU [1] but at this > point there's > no support for this in Java. Even the Linux implementation > might require > some additional properties from the code in critical (read) > section like > not calling any blocking code; this might be too limiting. > > The situation is simpler if the application uses a single > threaded > event-loop; beforeCheckpoint can enqueue a task that would, > upon its > execution, block on a primitive and notify the C/R > notification thread > that it may now deinit the resource; in afterRestore the > resource is > initialized and the eventloop is unblocked. This way we don't > impose any > extra overhead when C/R is happening. > > > That's a nice idea! > > > To avoid extra synchronization it could be technically > possible to > modify CRaC implementation to keep all other threads frozen > during > restore. There's a risk of some form of deadlock if the thread > performing C/R would require other threads to progress, > though, so any > such solution would require extra thoughts. Besides, this does > not > guarantee exclusivity so the afterRestore would need to > restore the > resource to the *exactly* same state (as some of its > before-checkpoint > state might have leaked to the thread in Processor). In my > opinion this > is not the best way. > > > This is the approach that OpenJ9 took to solve the consistency > problems introduced by updating resources before / after > checkpoints.? OpenJ9 enters "single threaded mode" when creating > the checkpoint and executing the before checkkpoint?fixups.? On > restore, it continues in single-threaded mode while executing the > after checkpoint fixups.? This makes it easier to avoid additional > runtime costs related to per-resource locking for checkpoints, but > complicates locking and wait/notify in general. > > This means a checkpoint hook operation can't wait on another > thread (would block indefinitely?as other threads are paused), > can't wait on a lock being held by another thread (again, would > deadlock), and sending notify may result in inconsistent behaviour > (wrong number of notifies received by other threads).? See "The > checkpointJVM() API" section of their blog post on CRIU for more > details [0]. > > > The "single thread mode", imo, corresponds to the > "serializable?isolation" approach in data processing and DB transactions. Serialization into one thread is one way to achieve serializable isolation, but there are different strategies too. Though beware that no major database nowadays supports strict serializable isolation (even if it calls some mode serializable) - can't find a proper link to show, and I am digressing anyway. > The OpenJ9 blogs are very informative and like the jdk invoke API > would need time to digest. > But I have one conceptual question. What part of this should/cloud be > implemented by the CRaC inself and what abstractions should be exposed > to the CRaC users? CRaC users need to be aware that their task is to clean up before checkpoint. Ideally if they use a library this should do it transparently to any practical extent. Anything beyond is just utilities provided, you need to dig it so here's (hopefully appropriate) spade. Radim > > > The problem with RCU is tracking which threads are in the > critical > section. I've found RCU-like implementations for Java that avoid > excessive overhead using a spread out array - each thread marks > entering/leaving the critical section by writes to its own > counter, > preventing cache ping-pong (assuming no false sharing). > Synchronizer > thread uses another flag to request synchronization; reading > this by > each thread is not totally without cost but reasonably cheap, > and in > that case worker threads can enter a blocking slow path. The > simple > implementation assumes a fixed number of threads; if the list > of threads > is dynamic the solution would be probably more complicated. It > might > also make sense to implement this in native code with a per-CPU > counters, rather than per-thread. A downside, besides some > overhead in > terms of both cycles and memory usage, is that we'd need to > modify the > code and explicitly mark the critical sections. > > Another solution could try to leverage existing JVM mechanics > for code > deoptimization, replacing the critical sections with a slower, > blocking > stub, and reverting back after restore. Or even independently > requesting > a safe-point and inspecting stack of threads until the > synchronization > is possible. > > > This will have a high risk of livelock.? The OpenJ9 experience > implementing single-threaded mode for CRIU indicates there are a > lot of strange locking patterns in the world. > > > So I probably can't offer a ready-to-use performant solution; > pick your > poison. The future, though, offers a few possibilities and I'd > love to > hear others' opinions about which one would look the most > feasible. > Because unless we offer something that does not harm a no-CRaC > use-case > I am afraid that the adoption will be quite limited. > > > Successful solutions will push the costs into the checkpoint / > restore paths as much as possible. Going back to the explicit lock > mechanism you first mentioned, I wonder if there's a role for > java.lang.invoke.Switchpoint [1] here?? Switchpoint was added as a > tool for language implementers that wanted to be able speculate on > a particular condition (ie: CHA assumptions) and get the same kind > of low cost state change that existing JITTED code gets.? I'm not > sure how well that vision worked in practice or how well Hotspot > optimizes it yet, but this might be a reason to push on its > performance. > > Roughly the idea would be to add a couple of Switchpoints to > jdk.crac.Core: > > ? ?public SwitchPoint getBeforeSwitchpoint(); > ? ?public SwitchPoint getAfterSwitchpoint(); > > and users could then write their code using MethodHandles to > implementing the branching logic: > > ? ? MethodHandle normalPath = ...... // existing code > ? ? MethodHandle fallbackPath = ..... // before Checkpoint extra work > ? ? MethodHandle guardWithTest = > getBeforeSwitchPoint.guardWithTest(normalPath, fallbackPath); > > and the jdk.crac.Core class would invalidate the "before" > SwitchPoint prior to the checkpoint and "after" one after the > restore.? Aside from the painful programming model, this might > give us the tools we need to make it performant. > > > @Dan, this is very interesting! > Could you please elaborate a bit further. Perhaps in the context of > the CrackDemoExt.java sample? > > > Needs more exploration and prototyping but would provide a > potential path to reasonable performance by burying the extra > locking in the fallback paths. And it would be a single pattern to > optimize, rather than all the variations users could produce. > --Dan > [0] > https://blog.openj9.org/2022/10/14/openj9-criu-support-a-look-under-the-hood/ > [1] > https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/invoke/SwitchPoint.html > > > Thank you, > ?- Christian > > > Cheers, > > Radim > > [1] https://en.wikipedia.org/wiki/Read-copy-update > > On 03. 04. 23 22:30, Christian Tzolov wrote: > > Hi, I'm testing CRaC in the context of long-running > applications (e.g. streaming, continuous processing ...) and > I've stumbled on an issue related to the coordination of the > resolved threads. > > > > For example, let's have a Processor that performs continuous > computations. This processor depends on a ProcessorContext and > later must be fully initialized before the processor can > process any data. > > > > When the application is first started (e.g. not from > checkpoints) it ensures that the ProcessorContext is > initialized before starting the Processor loop. > > > > To leverage CRaC I've implemented a ProcessorContextResource > gracefully stops the context on beforeCheckpoint and then > re-initialized it on afterRestore. > > > > When the checkpoint is performed, CRaC calls the > ProcessorContextResource.beforeCheckpoint and also preserves > the current Processor call stack. On Restore processor's call > stack is expectedly restored at the point it was stopped but > unfortunately it doesn't wait for the > ProcessorContextResource.afterRestore complete. This > expectedly crashes the processor. > > > > The https://github.com/tzolov/crac-demo illustreates this > issue. The README explains how to reproduce the issue. The > OUTPUT.md > (https://github.com/tzolov/crac-demo/blob/main/OUTPUT.md ) > offers terminal snapshots of the observed behavior. > > > > I've used latest JDK CRaC release: > >? ? openjdk 17-crac 2021-09-14 > >? ? OpenJDK Runtime Environment (build 17-crac+5-19) > >? ? OpenJDK 64-Bit Server VM (build 17-crac+5-19, mixed mode, > sharing) > > > > As I'm new to CRaC, I'd appreciate your thoughts on this issue. > > > > Cheers, > > Christian > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From heidinga at redhat.com Thu Apr 6 14:30:25 2023 From: heidinga at redhat.com (Dan Heidinga) Date: Thu, 6 Apr 2023 10:30:25 -0400 Subject: On restore the "main" thread is started before the Resource's afterRestore has completed In-Reply-To: References: Message-ID: On Thu, Apr 6, 2023 at 9:59?AM Christian Tzolov wrote: > Hi Dan and Radim, > > > Thanks for the feedback and suggestions! > > It is the first time I?m facing the java.lang.invoke.* API and it might > take some time to wrap my head around it. > > So be prepared plese for lame questions, as those inlined below. > > > On Wed, Apr 5, 2023 at 4:28?PM Dan Heidinga wrote: > >> Hi Radim, >> >> Thanks for the write up of the various options in this space. >> >> On Tue, Apr 4, 2023 at 2:49?AM Radim Vansa wrote: >> >>> Hi Christian, >>> >>> I believe this is a common problem when porting existing architecture >>> under CRaC; the obvious solution is to guard access to the resource >>> (ProcessorContext in this case) with a RW lock that'd be read-acquired >>> by 'regular' access and acquired for write in beforeCheckpoint/released >>> in afterRestore. However this introduces extra synchronization (at least >>> in form of volatile writes) even in case that C/R is not used at all, >>> especially if the support is added into libraries. >>> >> >> I've seen variations of this approach go by in code reviews but have we >> written up a good example of how to do this well? Having a >> canonical pattern would help to highlight the best way to do it today and >> make the tradeoffs explicit. >> > > @Radim, your ?guard access? suggestion made me realise that perhaps I?ve > oversimplified my sample. > > So I?ve modified it a bit: > https://github.com/tzolov/crac-demo/blob/main/src/main/java/com/example/crac/CrackDemoExt.java > by introducing a new ProcessorState used by the Processor for its > computation. > At the same time I?ve removed the direct Processor dependency on the > ProcessorContext. Instead the ProcessorContext is responsible for managing > the lifecycle of the ProcessorState before the Processor can use it. > Then given your original suggestion is it right to assume that the ?guard > access to the resource? now should guard the ProcessorState not the > ProcessorContext? > I think the example is still too simple as there is no state being protected. Typically, the beforeCheckpoint/afterRestore methods are used to modify the state of a Class so that the class's invariants continue to hold across the restore. This often (though not always) has to do with the external environment - if the application has captured a view of the environment (particular ports, # of cpus, env vars, etc) and made decisions based on that view, then after restore, that view needs to be updated. The lifecycle works by giving the application an opportunity prior to checkpoint to stop using the old state. It also gives an application an opportunity to update that state after restore. Those are the {beforeChecpoint, afterRestore} apis on Resource. This produces an indeterminate length of time from the start of the checkpoint (and first call to beforeCheckpoint) thru to the completion of the last afterRestore call. During this period, threads may see the original value, the beforeCheckpoint updated value, the afterRestore updated value, or some combination of all three depending on timing and thread scheduling. > And if this is true then how one would be able to identify all possible > ?resources? to be guarded? > That's the million dollar question. The answer so far has been code inspection or trial-and-error. And the answer of which "resources" depends a bit on the use case - the set of resources for a desktop application that will be checkpointed/restored on the same machine may be very different than a server application that will be spread across a K8 cluster or a different set of Lambda endpoints. > > > >> Anton Kozlov proposed techniques like RCU [1] but at this point there's >>> no support for this in Java. Even the Linux implementation might require >>> some additional properties from the code in critical (read) section like >>> not calling any blocking code; this might be too limiting. >>> >>> The situation is simpler if the application uses a single threaded >>> event-loop; beforeCheckpoint can enqueue a task that would, upon its >>> execution, block on a primitive and notify the C/R notification thread >>> that it may now deinit the resource; in afterRestore the resource is >>> initialized and the eventloop is unblocked. This way we don't impose any >>> extra overhead when C/R is happening. >>> >> >> That's a nice idea! >> >> >>> >>> To avoid extra synchronization it could be technically possible to >>> modify CRaC implementation to keep all other threads frozen during >>> restore. There's a risk of some form of deadlock if the thread >>> performing C/R would require other threads to progress, though, so any >>> such solution would require extra thoughts. Besides, this does not >>> guarantee exclusivity so the afterRestore would need to restore the >>> resource to the *exactly* same state (as some of its before-checkpoint >>> state might have leaked to the thread in Processor). In my opinion this >>> is not the best way. >>> >> >> This is the approach that OpenJ9 took to solve the consistency problems >> introduced by updating resources before / after checkpoints. OpenJ9 enters >> "single threaded mode" when creating the checkpoint and executing the >> before checkkpoint fixups. On restore, it continues in single-threaded >> mode while executing the after checkpoint fixups. This makes it easier to >> avoid additional runtime costs related to per-resource locking for >> checkpoints, but complicates locking and wait/notify in general. >> >> This means a checkpoint hook operation can't wait on another thread >> (would block indefinitely as other threads are paused), can't wait on a >> lock being held by another thread (again, would deadlock), and sending >> notify may result in inconsistent behaviour (wrong number of notifies >> received by other threads). See "The checkpointJVM() API" section of their >> blog post on CRIU for more details [0]. >> > > The "single thread mode", imo, corresponds to the "serializable isolation" > approach in data processing and DB transactions. The OpenJ9 blogs are very > informative and like the jdk invoke API would need time to digest. > But I have one conceptual question. What part of this should/cloud be > implemented by the CRaC inself and what abstractions should be exposed to > the CRaC users? > If CRaC were to adopt the single-threaded mode, then almost all of the work for that would be in the CRaC project (ie: Hotspot) itself. Users would only need to be sure their before/after checkpoint methods were "safe" to run. > > >>> The problem with RCU is tracking which threads are in the critical >>> section. I've found RCU-like implementations for Java that avoid >>> excessive overhead using a spread out array - each thread marks >>> entering/leaving the critical section by writes to its own counter, >>> preventing cache ping-pong (assuming no false sharing). Synchronizer >>> thread uses another flag to request synchronization; reading this by >>> each thread is not totally without cost but reasonably cheap, and in >>> that case worker threads can enter a blocking slow path. The simple >>> implementation assumes a fixed number of threads; if the list of threads >>> is dynamic the solution would be probably more complicated. It might >>> also make sense to implement this in native code with a per-CPU >>> counters, rather than per-thread. A downside, besides some overhead in >>> terms of both cycles and memory usage, is that we'd need to modify the >>> code and explicitly mark the critical sections. >>> >>> Another solution could try to leverage existing JVM mechanics for code >>> deoptimization, replacing the critical sections with a slower, blocking >>> stub, and reverting back after restore. Or even independently requesting >>> a safe-point and inspecting stack of threads until the synchronization >>> is possible. >>> >> >> This will have a high risk of livelock. The OpenJ9 experience >> implementing single-threaded mode for CRIU indicates there are a lot of >> strange locking patterns in the world. >> >> >>> >>> So I probably can't offer a ready-to-use performant solution; pick your >>> poison. The future, though, offers a few possibilities and I'd love to >>> hear others' opinions about which one would look the most feasible. >>> Because unless we offer something that does not harm a no-CRaC use-case >>> I am afraid that the adoption will be quite limited. >>> >> >> Successful solutions will push the costs into the checkpoint / restore >> paths as much as possible. Going back to the explicit lock mechanism you >> first mentioned, I wonder if there's a role for >> java.lang.invoke.Switchpoint [1] here? Switchpoint was added as a tool for >> language implementers that wanted to be able speculate on a particular >> condition (ie: CHA assumptions) and get the same kind of low cost state >> change that existing JITTED code gets. I'm not sure how well that vision >> worked in practice or how well Hotspot optimizes it yet, but this might be >> a reason to push on its performance. >> >> Roughly the idea would be to add a couple of Switchpoints to >> jdk.crac.Core: >> >> public SwitchPoint getBeforeSwitchpoint(); >> public SwitchPoint getAfterSwitchpoint(); >> >> and users could then write their code using MethodHandles to implementing >> the branching logic: >> >> MethodHandle normalPath = ...... // existing code >> MethodHandle fallbackPath = ..... // before Checkpoint extra work >> MethodHandle guardWithTest = >> getBeforeSwitchPoint.guardWithTest(normalPath, fallbackPath); >> >> and the jdk.crac.Core class would invalidate the "before" SwitchPoint >> prior to the checkpoint and "after" one after the restore. Aside from the >> painful programming model, this might give us the tools we need to make it >> performant. >> > > @Dan, this is very interesting! > Could you please elaborate a bit further. Perhaps in the context of the > CrackDemoExt.java sample? > Let me think on that. I'll see if I can pull something together that shows the api use. --Dan > > >> >> Needs more exploration and prototyping but would provide a potential path >> to reasonable performance by burying the extra locking in the fallback >> paths. And it would be a single pattern to optimize, rather than all the >> variations users could produce. >> --Dan >> [0] >> https://blog.openj9.org/2022/10/14/openj9-criu-support-a-look-under-the-hood/ >> [1] >> https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/invoke/SwitchPoint.html >> > > Thank you, > - Christian > > > >> >>> Cheers, >>> >>> Radim >>> >>> [1] https://en.wikipedia.org/wiki/Read-copy-update >>> >>> On 03. 04. 23 22:30, Christian Tzolov wrote: >>> > Hi, I'm testing CRaC in the context of long-running applications (e.g. >>> streaming, continuous processing ...) and I've stumbled on an issue related >>> to the coordination of the resolved threads. >>> > >>> > For example, let's have a Processor that performs continuous >>> computations. This processor depends on a ProcessorContext and later must >>> be fully initialized before the processor can process any data. >>> > >>> > When the application is first started (e.g. not from checkpoints) it >>> ensures that the ProcessorContext is initialized before starting the >>> Processor loop. >>> > >>> > To leverage CRaC I've implemented a ProcessorContextResource >>> gracefully stops the context on beforeCheckpoint and then re-initialized it >>> on afterRestore. >>> > >>> > When the checkpoint is performed, CRaC calls the >>> ProcessorContextResource.beforeCheckpoint and also preserves the current >>> Processor call stack. On Restore processor's call stack is expectedly >>> restored at the point it was stopped but unfortunately it doesn't wait for >>> the ProcessorContextResource.afterRestore complete. This expectedly crashes >>> the processor. >>> > >>> > The https://github.com/tzolov/crac-demo illustreates this issue. The >>> README explains how to reproduce the issue. The OUTPUT.md ( >>> https://github.com/tzolov/crac-demo/blob/main/OUTPUT.md ) offers >>> terminal snapshots of the observed behavior. >>> > >>> > I've used latest JDK CRaC release: >>> > openjdk 17-crac 2021-09-14 >>> > OpenJDK Runtime Environment (build 17-crac+5-19) >>> > OpenJDK 64-Bit Server VM (build 17-crac+5-19, mixed mode, sharing) >>> > >>> > As I'm new to CRaC, I'd appreciate your thoughts on this issue. >>> > >>> > Cheers, >>> > Christian >>> > >>> > >>> > >>> > >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From asmehra at redhat.com Thu Apr 6 15:03:42 2023 From: asmehra at redhat.com (Ashutosh Mehra) Date: Thu, 6 Apr 2023 11:03:42 -0400 Subject: On restore the "main" thread is started before the Resource's afterRestore has completed In-Reply-To: References: Message-ID: Another aspect where OpenJ9 differs from CRaC is that the latter allows the user to take a checkpoint at random point in time using jcmd. I don't think OpenJ9 supports that. With the jcmd approach, it is almost impossible for the user to envisage the coordination needed between different entities in the system. Compare this to the case where the user has to explicitly call a checkpoint API in the code and knows to a large extent what the state of the system is and what actions it can take to maintain the integrity of the system after restore. I think using checkpoint API makes it easier for the user by making the problem of coordination more manageable. - Ashutosh Mehra On Thu, Apr 6, 2023 at 10:31?AM Dan Heidinga wrote: > > > On Thu, Apr 6, 2023 at 9:59?AM Christian Tzolov < > christian.tzolov at gmail.com> wrote: > >> Hi Dan and Radim, >> >> >> Thanks for the feedback and suggestions! >> >> It is the first time I?m facing the java.lang.invoke.* API and it might >> take some time to wrap my head around it. >> >> So be prepared plese for lame questions, as those inlined below. >> >> >> On Wed, Apr 5, 2023 at 4:28?PM Dan Heidinga wrote: >> >>> Hi Radim, >>> >>> Thanks for the write up of the various options in this space. >>> >>> On Tue, Apr 4, 2023 at 2:49?AM Radim Vansa wrote: >>> >>>> Hi Christian, >>>> >>>> I believe this is a common problem when porting existing architecture >>>> under CRaC; the obvious solution is to guard access to the resource >>>> (ProcessorContext in this case) with a RW lock that'd be read-acquired >>>> by 'regular' access and acquired for write in beforeCheckpoint/released >>>> in afterRestore. However this introduces extra synchronization (at >>>> least >>>> in form of volatile writes) even in case that C/R is not used at all, >>>> especially if the support is added into libraries. >>>> >>> >>> I've seen variations of this approach go by in code reviews but have we >>> written up a good example of how to do this well? Having a >>> canonical pattern would help to highlight the best way to do it today and >>> make the tradeoffs explicit. >>> >> >> @Radim, your ?guard access? suggestion made me realise that perhaps I?ve >> oversimplified my sample. >> >> So I?ve modified it a bit: >> https://github.com/tzolov/crac-demo/blob/main/src/main/java/com/example/crac/CrackDemoExt.java >> by introducing a new ProcessorState used by the Processor for its >> computation. >> At the same time I?ve removed the direct Processor dependency on the >> ProcessorContext. Instead the ProcessorContext is responsible for managing >> the lifecycle of the ProcessorState before the Processor can use it. >> Then given your original suggestion is it right to assume that the ?guard >> access to the resource? now should guard the ProcessorState not the >> ProcessorContext? >> > > I think the example is still too simple as there is no state being > protected. Typically, the beforeCheckpoint/afterRestore methods are used > to modify the state of a Class so that the class's invariants continue to > hold across the restore. This often (though not always) has to do with the > external environment - if the application has captured a view of the > environment (particular ports, # of cpus, env vars, etc) and made decisions > based on that view, then after restore, that view needs to be updated. > > The lifecycle works by giving the application an opportunity prior to > checkpoint to stop using the old state. It also gives an application an > opportunity to update that state after restore. Those are the > {beforeChecpoint, afterRestore} apis on Resource. This produces an > indeterminate length of time from the start of the checkpoint (and first > call to beforeCheckpoint) thru to the completion of the last afterRestore > call. During this period, threads may see the original value, the > beforeCheckpoint updated value, the afterRestore updated value, or some > combination of all three depending on timing and thread scheduling. > > >> And if this is true then how one would be able to identify all possible >> ?resources? to be guarded? >> > > That's the million dollar question. The answer so far has been code > inspection or trial-and-error. And the answer of which "resources" depends > a bit on the use case - the set of resources for a desktop application that > will be checkpointed/restored on the same machine may be very different > than a server application that will be spread across a K8 cluster or a > different set of Lambda endpoints. > > >> >> >> >>> Anton Kozlov proposed techniques like RCU [1] but at this point there's >>>> no support for this in Java. Even the Linux implementation might >>>> require >>>> some additional properties from the code in critical (read) section >>>> like >>>> not calling any blocking code; this might be too limiting. >>>> >>>> The situation is simpler if the application uses a single threaded >>>> event-loop; beforeCheckpoint can enqueue a task that would, upon its >>>> execution, block on a primitive and notify the C/R notification thread >>>> that it may now deinit the resource; in afterRestore the resource is >>>> initialized and the eventloop is unblocked. This way we don't impose >>>> any >>>> extra overhead when C/R is happening. >>>> >>> >>> That's a nice idea! >>> >>> >>>> >>>> To avoid extra synchronization it could be technically possible to >>>> modify CRaC implementation to keep all other threads frozen during >>>> restore. There's a risk of some form of deadlock if the thread >>>> performing C/R would require other threads to progress, though, so any >>>> such solution would require extra thoughts. Besides, this does not >>>> guarantee exclusivity so the afterRestore would need to restore the >>>> resource to the *exactly* same state (as some of its before-checkpoint >>>> state might have leaked to the thread in Processor). In my opinion this >>>> is not the best way. >>>> >>> >>> This is the approach that OpenJ9 took to solve the consistency problems >>> introduced by updating resources before / after checkpoints. OpenJ9 enters >>> "single threaded mode" when creating the checkpoint and executing the >>> before checkkpoint fixups. On restore, it continues in single-threaded >>> mode while executing the after checkpoint fixups. This makes it easier to >>> avoid additional runtime costs related to per-resource locking for >>> checkpoints, but complicates locking and wait/notify in general. >>> >>> This means a checkpoint hook operation can't wait on another thread >>> (would block indefinitely as other threads are paused), can't wait on a >>> lock being held by another thread (again, would deadlock), and sending >>> notify may result in inconsistent behaviour (wrong number of notifies >>> received by other threads). See "The checkpointJVM() API" section of their >>> blog post on CRIU for more details [0]. >>> >> >> The "single thread mode", imo, corresponds to the >> "serializable isolation" approach in data processing and DB transactions. >> The OpenJ9 blogs are very informative and like the jdk invoke API would >> need time to digest. >> But I have one conceptual question. What part of this should/cloud be >> implemented by the CRaC inself and what abstractions should be exposed to >> the CRaC users? >> > > If CRaC were to adopt the single-threaded mode, then almost all of the > work for that would be in the CRaC project (ie: Hotspot) itself. Users > would only need to be sure their before/after checkpoint methods were > "safe" to run. > > >> >> >>>> The problem with RCU is tracking which threads are in the critical >>>> section. I've found RCU-like implementations for Java that avoid >>>> excessive overhead using a spread out array - each thread marks >>>> entering/leaving the critical section by writes to its own counter, >>>> preventing cache ping-pong (assuming no false sharing). Synchronizer >>>> thread uses another flag to request synchronization; reading this by >>>> each thread is not totally without cost but reasonably cheap, and in >>>> that case worker threads can enter a blocking slow path. The simple >>>> implementation assumes a fixed number of threads; if the list of >>>> threads >>>> is dynamic the solution would be probably more complicated. It might >>>> also make sense to implement this in native code with a per-CPU >>>> counters, rather than per-thread. A downside, besides some overhead in >>>> terms of both cycles and memory usage, is that we'd need to modify the >>>> code and explicitly mark the critical sections. >>>> >>>> Another solution could try to leverage existing JVM mechanics for code >>>> deoptimization, replacing the critical sections with a slower, blocking >>>> stub, and reverting back after restore. Or even independently >>>> requesting >>>> a safe-point and inspecting stack of threads until the synchronization >>>> is possible. >>>> >>> >>> This will have a high risk of livelock. The OpenJ9 experience >>> implementing single-threaded mode for CRIU indicates there are a lot of >>> strange locking patterns in the world. >>> >>> >>>> >>>> So I probably can't offer a ready-to-use performant solution; pick your >>>> poison. The future, though, offers a few possibilities and I'd love to >>>> hear others' opinions about which one would look the most feasible. >>>> Because unless we offer something that does not harm a no-CRaC use-case >>>> I am afraid that the adoption will be quite limited. >>>> >>> >>> Successful solutions will push the costs into the checkpoint / restore >>> paths as much as possible. Going back to the explicit lock mechanism you >>> first mentioned, I wonder if there's a role for >>> java.lang.invoke.Switchpoint [1] here? Switchpoint was added as a tool for >>> language implementers that wanted to be able speculate on a particular >>> condition (ie: CHA assumptions) and get the same kind of low cost state >>> change that existing JITTED code gets. I'm not sure how well that vision >>> worked in practice or how well Hotspot optimizes it yet, but this might be >>> a reason to push on its performance. >>> >>> Roughly the idea would be to add a couple of Switchpoints to >>> jdk.crac.Core: >>> >>> public SwitchPoint getBeforeSwitchpoint(); >>> public SwitchPoint getAfterSwitchpoint(); >>> >>> and users could then write their code using MethodHandles to >>> implementing the branching logic: >>> >>> MethodHandle normalPath = ...... // existing code >>> MethodHandle fallbackPath = ..... // before Checkpoint extra work >>> MethodHandle guardWithTest = >>> getBeforeSwitchPoint.guardWithTest(normalPath, fallbackPath); >>> >>> and the jdk.crac.Core class would invalidate the "before" SwitchPoint >>> prior to the checkpoint and "after" one after the restore. Aside from the >>> painful programming model, this might give us the tools we need to make it >>> performant. >>> >> >> @Dan, this is very interesting! >> Could you please elaborate a bit further. Perhaps in the context of the >> CrackDemoExt.java sample? >> > > Let me think on that. I'll see if I can pull something together that > shows the api use. > > --Dan > > >> >> >>> >>> Needs more exploration and prototyping but would provide a potential >>> path to reasonable performance by burying the extra locking in the fallback >>> paths. And it would be a single pattern to optimize, rather than all the >>> variations users could produce. >>> --Dan >>> [0] >>> https://blog.openj9.org/2022/10/14/openj9-criu-support-a-look-under-the-hood/ >>> [1] >>> https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/invoke/SwitchPoint.html >>> >> >> Thank you, >> - Christian >> >> >> >>> >>>> Cheers, >>>> >>>> Radim >>>> >>>> [1] https://en.wikipedia.org/wiki/Read-copy-update >>>> >>>> On 03. 04. 23 22:30, Christian Tzolov wrote: >>>> > Hi, I'm testing CRaC in the context of long-running applications >>>> (e.g. streaming, continuous processing ...) and I've stumbled on an issue >>>> related to the coordination of the resolved threads. >>>> > >>>> > For example, let's have a Processor that performs continuous >>>> computations. This processor depends on a ProcessorContext and later must >>>> be fully initialized before the processor can process any data. >>>> > >>>> > When the application is first started (e.g. not from checkpoints) it >>>> ensures that the ProcessorContext is initialized before starting the >>>> Processor loop. >>>> > >>>> > To leverage CRaC I've implemented a ProcessorContextResource >>>> gracefully stops the context on beforeCheckpoint and then re-initialized it >>>> on afterRestore. >>>> > >>>> > When the checkpoint is performed, CRaC calls the >>>> ProcessorContextResource.beforeCheckpoint and also preserves the current >>>> Processor call stack. On Restore processor's call stack is expectedly >>>> restored at the point it was stopped but unfortunately it doesn't wait for >>>> the ProcessorContextResource.afterRestore complete. This expectedly crashes >>>> the processor. >>>> > >>>> > The https://github.com/tzolov/crac-demo illustreates this issue. The >>>> README explains how to reproduce the issue. The OUTPUT.md ( >>>> https://github.com/tzolov/crac-demo/blob/main/OUTPUT.md ) offers >>>> terminal snapshots of the observed behavior. >>>> > >>>> > I've used latest JDK CRaC release: >>>> > openjdk 17-crac 2021-09-14 >>>> > OpenJDK Runtime Environment (build 17-crac+5-19) >>>> > OpenJDK 64-Bit Server VM (build 17-crac+5-19, mixed mode, sharing) >>>> > >>>> > As I'm new to CRaC, I'd appreciate your thoughts on this issue. >>>> > >>>> > Cheers, >>>> > Christian >>>> > >>>> > >>>> > >>>> > >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From duke at openjdk.org Wed Apr 12 09:15:26 2023 From: duke at openjdk.org (Radim Vansa) Date: Wed, 12 Apr 2023 09:15:26 GMT Subject: [crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking Message-ID: This implementation is suitable for uses where the write-locking happens very rarely (if at all), as in the case of CRaC checkpoint, and we don't want to slow down regular access to the protected resource. ------------- Commit messages: - RCU Lock - RW lock with very lightweight read- and heavyweight write-locking Changes: https://git.openjdk.org/crac/pull/58/files Webrev: https://webrevs.openjdk.org/?repo=crac&pr=58&range=00 Stats: 623 lines in 7 files changed: 622 ins; 0 del; 1 mod Patch: https://git.openjdk.org/crac/pull/58.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/58/head:pull/58 PR: https://git.openjdk.org/crac/pull/58 From duke at openjdk.org Wed Apr 12 09:15:27 2023 From: duke at openjdk.org (Radim Vansa) Date: Wed, 12 Apr 2023 09:15:27 GMT Subject: [crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking In-Reply-To: References: Message-ID: On Wed, 12 Apr 2023 08:57:48 GMT, Radim Vansa wrote: > This implementation is suitable for uses where the write-locking happens very rarely (if at all), as in the case of CRaC checkpoint, and we don't want to slow down regular access to the protected resource. After implementing this I realized that it could be possible to implement the check using `Thread.getAllStackTraces()` and process the stacks in Java; that's even heavier, though. I will yet explore extending implementation with more dynamic list of methods; such implementation could let us use only one instance of RCULock in the VM. ------------- PR Comment: https://git.openjdk.org/crac/pull/58#issuecomment-1504920463 From heidinga at openjdk.org Wed Apr 12 12:19:06 2023 From: heidinga at openjdk.org (Dan Heidinga) Date: Wed, 12 Apr 2023 12:19:06 GMT Subject: [crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking In-Reply-To: References: Message-ID: On Wed, 12 Apr 2023 08:57:48 GMT, Radim Vansa wrote: > This implementation is suitable for uses where the write-locking happens very rarely (if at all), as in the case of CRaC checkpoint, and we don't want to slow down regular access to the protected resource. src/java.base/share/classes/jdk/crac/RCULock.java line 136: > 134: MethodHandle noop = MethodHandles.lookup().findSpecial(RCULock.class, "noop", voidType, RCULock.class); > 135: MethodHandle readLockImpl = MethodHandles.lookup().findSpecial(RCULock.class, "readLockImpl", voidType, RCULock.class); > 136: MethodHandle readUnlockImpl = MethodHandles.lookup().findSpecial(RCULock.class, "readUnlockImpl", voidType, RCULock.class); Creating a Lookup object is expensive. Better to cache in a local Suggestion: MethodHandles.Lookup lookup = MethodHandles.lookup(); MethodHandle noop = lookup.findSpecial(RCULock.class, "noop", voidType, RCULock.class); MethodHandle readLockImpl = lookup.findSpecial(RCULock.class, "readLockImpl", voidType, RCULock.class); MethodHandle readUnlockImpl = lookup.findSpecial(RCULock.class, "readUnlockImpl", voidType, RCULock.class); ------------- PR Review Comment: https://git.openjdk.org/crac/pull/58#discussion_r1164049509 From duke at openjdk.org Wed Apr 12 12:27:14 2023 From: duke at openjdk.org (Radim Vansa) Date: Wed, 12 Apr 2023 12:27:14 GMT Subject: [crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking [v2] In-Reply-To: References: Message-ID: > This implementation is suitable for uses where the write-locking happens very rarely (if at all), as in the case of CRaC checkpoint, and we don't want to slow down regular access to the protected resource. Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Reuse MethodHandles.lookup() Co-authored-by: Dan Heidinga ------------- Changes: - all: https://git.openjdk.org/crac/pull/58/files - new: https://git.openjdk.org/crac/pull/58/files/37fa1aa6..98732ab9 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=58&range=01 - incr: https://webrevs.openjdk.org/?repo=crac&pr=58&range=00-01 Stats: 4 lines in 1 file changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/crac/pull/58.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/58/head:pull/58 PR: https://git.openjdk.org/crac/pull/58 From heidinga at openjdk.org Wed Apr 12 12:27:16 2023 From: heidinga at openjdk.org (Dan Heidinga) Date: Wed, 12 Apr 2023 12:27:16 GMT Subject: [crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking [v2] In-Reply-To: References: Message-ID: <8HsVuh4a66wiZojrfhbahTV4Q10BnDWrS6pYeBpzMeg=.8f0b4123-f8c0-4752-8af1-d8d1f2cabff4@github.com> On Wed, 12 Apr 2023 12:22:37 GMT, Radim Vansa wrote: >> This implementation is suitable for uses where the write-locking happens very rarely (if at all), as in the case of CRaC checkpoint, and we don't want to slow down regular access to the protected resource. > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Reuse MethodHandles.lookup() > > Co-authored-by: Dan Heidinga src/java.base/share/native/libjava/RCULock.c line 43: > 41: { > 42: readerThreadsListField = (*env)->GetFieldID(env, cls, "readerThreadsList", "J"); > 43: readCriticalMethodsField = (*env)->GetFieldID(env, cls, "readCriticalMethods", "J"); Looking up fields correctly in JNI is a pain to get the error handling correct. I prefer to do the lookup reflectively and then pass the j.l.reflect.Field objects into the JNI method and convert them using `ToReflectField` [0] It moves all the tricky error handling into Java and makes for obviously correct code. If we keep the existing code, we need to null check the `readerThreadsListField` value before attempting a second JNI call as we can't call in with an exception pending. [0] https://docs.oracle.com/javase/7/docs/technotes/guides/jni/spec/functions.html#from_reflected_field ------------- PR Review Comment: https://git.openjdk.org/crac/pull/58#discussion_r1164056344 From heidinga at openjdk.org Wed Apr 12 12:30:04 2023 From: heidinga at openjdk.org (Dan Heidinga) Date: Wed, 12 Apr 2023 12:30:04 GMT Subject: [crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking [v2] In-Reply-To: References: Message-ID: On Wed, 12 Apr 2023 12:27:14 GMT, Radim Vansa wrote: >> This implementation is suitable for uses where the write-locking happens very rarely (if at all), as in the case of CRaC checkpoint, and we don't want to slow down regular access to the protected resource. > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Reuse MethodHandles.lookup() > > Co-authored-by: Dan Heidinga src/java.base/share/native/libjava/RCULock.c line 70: > 68: } > 69: for (int i = 0; i < num; ++i) { > 70: jobject el = (*env)->GetObjectArrayElement(env, methods, i); I think for correctness, this needs to be followed by an exception check if ((*env)->ExceptionOccurred(env)) { free_up_to(c_methods, i); // exception already pending return; } ------------- PR Review Comment: https://git.openjdk.org/crac/pull/58#discussion_r1164062255 From duke at openjdk.org Wed Apr 12 13:10:03 2023 From: duke at openjdk.org (Radim Vansa) Date: Wed, 12 Apr 2023 13:10:03 GMT Subject: [crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking [v3] In-Reply-To: References: Message-ID: > This implementation is suitable for uses where the write-locking happens very rarely (if at all), as in the case of CRaC checkpoint, and we don't want to slow down regular access to the protected resource. Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Add Core.defaultLock() * Add RCU lock instance that's registered as JDKResource for synchronization around checkpoint. * Allow amend the list of read-critical methods. * Use binary search when searching the list of read-critical methods. ------------- Changes: - all: https://git.openjdk.org/crac/pull/58/files - new: https://git.openjdk.org/crac/pull/58/files/98732ab9..6dacbc11 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=58&range=02 - incr: https://webrevs.openjdk.org/?repo=crac&pr=58&range=01-02 Stats: 191 lines in 5 files changed: 160 ins; 9 del; 22 mod Patch: https://git.openjdk.org/crac/pull/58.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/58/head:pull/58 PR: https://git.openjdk.org/crac/pull/58 From duke at openjdk.org Wed Apr 12 13:11:07 2023 From: duke at openjdk.org (Radim Vansa) Date: Wed, 12 Apr 2023 13:11:07 GMT Subject: [crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking [v2] In-Reply-To: References: Message-ID: <6EU4octlZwaii8xFaMMd8-WkfoMFYcP_1DwIUtqiliY=.981c62a4-777e-4971-8586-9022b0f4a2ce@github.com> On Wed, 12 Apr 2023 12:27:14 GMT, Radim Vansa wrote: >> This implementation is suitable for uses where the write-locking happens very rarely (if at all), as in the case of CRaC checkpoint, and we don't want to slow down regular access to the protected resource. > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Reuse MethodHandles.lookup() > > Co-authored-by: Dan Heidinga @DanHeidinga Thanks for those hints on error handling, I've corrected them in the last commit. I've also made the list of methods mutable, implemented binary search for lookup and created a central lock that other components could easily use. This is not used anywhere, yet; the exact way to consume this in user code is up for discussion. ------------- PR Comment: https://git.openjdk.org/crac/pull/58#issuecomment-1505250260 From duke at openjdk.org Wed Apr 12 13:21:55 2023 From: duke at openjdk.org (Radim Vansa) Date: Wed, 12 Apr 2023 13:21:55 GMT Subject: [crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking [v4] In-Reply-To: References: Message-ID: > This implementation is suitable for uses where the write-locking happens very rarely (if at all), as in the case of CRaC checkpoint, and we don't want to slow down regular access to the protected resource. Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Drop some trailing whitespaces ------------- Changes: - all: https://git.openjdk.org/crac/pull/58/files - new: https://git.openjdk.org/crac/pull/58/files/6dacbc11..91cd4451 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=58&range=03 - incr: https://webrevs.openjdk.org/?repo=crac&pr=58&range=02-03 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/crac/pull/58.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/58/head:pull/58 PR: https://git.openjdk.org/crac/pull/58 From duke at openjdk.org Wed Apr 12 13:57:09 2023 From: duke at openjdk.org (Radim Vansa) Date: Wed, 12 Apr 2023 13:57:09 GMT Subject: [crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking [v4] In-Reply-To: References: Message-ID: On Wed, 12 Apr 2023 13:21:55 GMT, Radim Vansa wrote: >> This implementation is suitable for uses where the write-locking happens very rarely (if at all), as in the case of CRaC checkpoint, and we don't want to slow down regular access to the protected resource. > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Drop some trailing whitespaces Actually, the *big lock* as I've done it doesn't make much sense; usually we'd need to do some cleanup in the resource as well. So it might make more sense to have a full Context guarded by this lock. ------------- PR Comment: https://git.openjdk.org/crac/pull/58#issuecomment-1505320627 From duke at openjdk.org Wed Apr 12 15:02:05 2023 From: duke at openjdk.org (Radim Vansa) Date: Wed, 12 Apr 2023 15:02:05 GMT Subject: [crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking [v5] In-Reply-To: References: Message-ID: > This implementation is suitable for uses where the write-locking happens very rarely (if at all), as in the case of CRaC checkpoint, and we don't want to slow down regular access to the protected resource. Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Add synchronized context ------------- Changes: - all: https://git.openjdk.org/crac/pull/58/files - new: https://git.openjdk.org/crac/pull/58/files/91cd4451..1116ace6 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=58&range=04 - incr: https://webrevs.openjdk.org/?repo=crac&pr=58&range=03-04 Stats: 37 lines in 1 file changed: 19 ins; 9 del; 9 mod Patch: https://git.openjdk.org/crac/pull/58.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/58/head:pull/58 PR: https://git.openjdk.org/crac/pull/58 From duke at openjdk.org Wed Apr 12 15:31:37 2023 From: duke at openjdk.org (Radim Vansa) Date: Wed, 12 Apr 2023 15:31:37 GMT Subject: [crac] RFR: Harden criuengine cppath reading In-Reply-To: References: Message-ID: On Fri, 24 Mar 2023 19:00:38 GMT, Anton Kozlov wrote: > On some older OSes I see a few `fgets error` coming from criuengine restore function. They are intermittent and hard to debug. After replacing libc fopen/fgets invocations with open/read, the problem went away. I'm still not completely sure why the problem with libc file functions exists but I suspect that EINTR is not correctly handled there, or the fact we store a sequence in characters without `\0` or `\n` in the file we read. So I propose using the lower-level interface to read the file. The only objection here could be not closing the file descriptor upon error, but as the process terminates upon error anyway it's a non-issue. ------------- Marked as reviewed by rvansa at github.com (no known OpenJDK username). PR Review: https://git.openjdk.org/crac/pull/56#pullrequestreview-1381577462 From duke at openjdk.org Wed Apr 12 15:37:18 2023 From: duke at openjdk.org (Radim Vansa) Date: Wed, 12 Apr 2023 15:37:18 GMT Subject: [crac] RFR: Backout new API to sync with Reference Handler In-Reply-To: References: Message-ID: On Thu, 10 Nov 2022 15:34:23 GMT, Anton Kozlov wrote: > This reverts commit 9cf1995693eead85d3807fb4c83ab38c14e27042 and makes #22 obsolete. > > The API introduced in 9cf1995693eead85d3807fb4c83ab38c14e27042 (waitForWaiters) and changed in #22 waits for the state when all discovered references are processed. So WaitForWaiters is used to implement predictable Reference Handling, ensuring that clean-up actions have fired for an object after it becomes unreachable. > > I think that API was a mistake and should be reverted. > > In general, the problem of predictable Reference Handling is independent of CRaC. So I thought about extracting that out of CRaC and found a few issues with the approach. A user needs to know what RefQueue gets References after an object becomes unreachable, to call waitForWaiters on that queue. The queue is not necessarily evident, so a deep understanding of refs and queues in an application is required to select the proper queue to wait on, and to build the right order of them to wait on. Also, it's required somehow to know the number of threads servicing a queue. And there are situations when waitForWaiters may report that all refs are processed, but some of them are not -- consider a thread that is polling a queue and gets refs to be processed but then buffers them in another queue for later, in this example waitForWaiters does not provide the guarantee that corresponding clean-up actions were performed. > > The common and more straightforward way to have predictable clean-up is to call an explicit method like close()/release()/cleanup() that performs object-specific clean-up actions predictably. @AntonKozlov Please rebase. I think that since we don't have a test that would demonstrate any undesired behaviour (the RefQueueTest verifies the functionality but does not really show something that should be fixed) this can be integrated. ------------- PR Comment: https://git.openjdk.org/crac/pull/34#issuecomment-1505485143 From akozlov at openjdk.org Thu Apr 13 13:04:10 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Thu, 13 Apr 2023 13:04:10 GMT Subject: [crac] RFR: Harden criuengine cppath reading In-Reply-To: References: Message-ID: On Fri, 24 Mar 2023 19:00:38 GMT, Anton Kozlov wrote: > On some older OSes I see a few `fgets error` coming from criuengine restore function. They are intermittent and hard to debug. After replacing libc fopen/fgets invocations with open/read, the problem went away. I'm still not completely sure why the problem with libc file functions exists but I suspect that EINTR is not correctly handled there, or the fact we store a sequence in characters without `\0` or `\n` in the file we read. So I propose using the lower-level interface to read the file. Thanks for review, > The only objection here could be not closing the file descriptor upon error, but as the process terminates upon error anyway it's a non-issue. Exactly, this why we don't close the fd explicitly. ------------- PR Comment: https://git.openjdk.org/crac/pull/56#issuecomment-1506924158 From akozlov at openjdk.org Thu Apr 13 13:04:10 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Thu, 13 Apr 2023 13:04:10 GMT Subject: [crac] Integrated: Harden criuengine cppath reading In-Reply-To: References: Message-ID: On Fri, 24 Mar 2023 19:00:38 GMT, Anton Kozlov wrote: > On some older OSes I see a few `fgets error` coming from criuengine restore function. They are intermittent and hard to debug. After replacing libc fopen/fgets invocations with open/read, the problem went away. I'm still not completely sure why the problem with libc file functions exists but I suspect that EINTR is not correctly handled there, or the fact we store a sequence in characters without `\0` or `\n` in the file we read. So I propose using the lower-level interface to read the file. This pull request has now been integrated. Changeset: f91de330 Author: Anton Kozlov URL: https://git.openjdk.org/crac/commit/f91de3309aefbd8ff6acdbefe4e80528d0e04d57 Stats: 19 lines in 1 file changed: 11 ins; 2 del; 6 mod Harden criuengine cppath reading ------------- PR: https://git.openjdk.org/crac/pull/56 From akozlov at openjdk.org Thu Apr 13 13:18:09 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Thu, 13 Apr 2023 13:18:09 GMT Subject: [crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking [v5] In-Reply-To: References: Message-ID: On Wed, 12 Apr 2023 15:02:05 GMT, Radim Vansa wrote: >> This implementation is suitable for uses where the write-locking happens very rarely (if at all), as in the case of CRaC checkpoint, and we don't want to slow down regular access to the protected resource. > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Add synchronized context AFAICS this is an optimized version of ReadWriteLock, but it does not implement the j.u.c.l.ReadWriteLock interface. Can it be changed to implement the interface? Also, having this is an optimization, what is the benefit of the new RCULock compared to e.g. a standard j.u.c.l.ReentrantReadWriteLock? A JMH benchmark would be nice. ------------- PR Comment: https://git.openjdk.org/crac/pull/58#issuecomment-1506946513 From heidinga at redhat.com Thu Apr 13 13:20:41 2023 From: heidinga at redhat.com (Dan Heidinga) Date: Thu, 13 Apr 2023 09:20:41 -0400 Subject: On restore the "main" thread is started before the Resource's afterRestore has completed In-Reply-To: References: Message-ID: > >> @Dan, this is very interesting! >> Could you please elaborate a bit further. Perhaps in the context of the >> CrackDemoExt.java sample? >> > > Let me think on that. I'll see if I can pull something together that > shows the api use. > I put together a small example showing the use of SwitchPoint to toggle between phases: normal mode, beforeCheckpoint, afterRestore, normal mode. [0] In the CRaCPhase class, there are two methods that take Function arguments that allow the user to provide phase-specific behaviour: * beforeGuard which allows a switching from normal mode to checkpoint mode: https://github.com/DanHeidinga/SwitchPointExample/blob/b09fdb2a5d203950abc9de4facbd1435585bf3af/CRaCPhase.java#L15 * aroundGuard which allows switching from normal mode to checkpoint mode and back to normal mode: https://github.com/DanHeidinga/SwitchPointExample/blob/b09fdb2a5d203950abc9de4facbd1435585bf3af/CRaCPhase.java#L28 There's a use of this pattern in the "Test" class [1] which transitions from a regular get to a locked get. The ideas are all there though the code is a little unpleasant to work with due to the exception handling and general complexity of MethodHandles. Radim has an RCU lock that use Switchpoints as well though his API appears to be more pleasant for users: https://github.com/openjdk/crac/pull/58/files [0] https://github.com/DanHeidinga/SwitchPointExample/blob/main/CRaCPhase.java [1] https://github.com/DanHeidinga/SwitchPointExample/blob/b09fdb2a5d203950abc9de4facbd1435585bf3af/CRaCPhase.java#L114-L140 > > --Dan > > >> >> >>> >>> Needs more exploration and prototyping but would provide a potential >>> path to reasonable performance by burying the extra locking in the fallback >>> paths. And it would be a single pattern to optimize, rather than all the >>> variations users could produce. >>> --Dan >>> [0] >>> https://blog.openj9.org/2022/10/14/openj9-criu-support-a-look-under-the-hood/ >>> [1] >>> https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/invoke/SwitchPoint.html >>> >> >> Thank you, >> - Christian >> >> >> >>> >>>> Cheers, >>>> >>>> Radim >>>> >>>> [1] https://en.wikipedia.org/wiki/Read-copy-update >>>> >>>> On 03. 04. 23 22:30, Christian Tzolov wrote: >>>> > Hi, I'm testing CRaC in the context of long-running applications >>>> (e.g. streaming, continuous processing ...) and I've stumbled on an issue >>>> related to the coordination of the resolved threads. >>>> > >>>> > For example, let's have a Processor that performs continuous >>>> computations. This processor depends on a ProcessorContext and later must >>>> be fully initialized before the processor can process any data. >>>> > >>>> > When the application is first started (e.g. not from checkpoints) it >>>> ensures that the ProcessorContext is initialized before starting the >>>> Processor loop. >>>> > >>>> > To leverage CRaC I've implemented a ProcessorContextResource >>>> gracefully stops the context on beforeCheckpoint and then re-initialized it >>>> on afterRestore. >>>> > >>>> > When the checkpoint is performed, CRaC calls the >>>> ProcessorContextResource.beforeCheckpoint and also preserves the current >>>> Processor call stack. On Restore processor's call stack is expectedly >>>> restored at the point it was stopped but unfortunately it doesn't wait for >>>> the ProcessorContextResource.afterRestore complete. This expectedly crashes >>>> the processor. >>>> > >>>> > The https://github.com/tzolov/crac-demo illustreates this issue. The >>>> README explains how to reproduce the issue. The OUTPUT.md ( >>>> https://github.com/tzolov/crac-demo/blob/main/OUTPUT.md ) offers >>>> terminal snapshots of the observed behavior. >>>> > >>>> > I've used latest JDK CRaC release: >>>> > openjdk 17-crac 2021-09-14 >>>> > OpenJDK Runtime Environment (build 17-crac+5-19) >>>> > OpenJDK 64-Bit Server VM (build 17-crac+5-19, mixed mode, sharing) >>>> > >>>> > As I'm new to CRaC, I'd appreciate your thoughts on this issue. >>>> > >>>> > Cheers, >>>> > Christian >>>> > >>>> > >>>> > >>>> > >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From akozlov at openjdk.org Thu Apr 13 13:26:09 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Thu, 13 Apr 2023 13:26:09 GMT Subject: [crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking [v5] In-Reply-To: References: Message-ID: On Wed, 12 Apr 2023 15:02:05 GMT, Radim Vansa wrote: >> This implementation is suitable for uses where the write-locking happens very rarely (if at all), as in the case of CRaC checkpoint, and we don't want to slow down regular access to the protected resource. > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Add synchronized context src/java.base/share/classes/jdk/crac/RCULock.java line 32: > 30: * acquire the read-lock inside the critical > 31: * section (at the beginning) and release it outside, > 32: * preferrably in the finally block. This assimetry in the interface does not look nice. Likely because the implementation, we can release the lock only outside a method with the critical code. But this leaks too much implementation details to the interface. src/java.base/share/classes/jdk/crac/RCULock.java line 141: > 139: * @param readCriticalMethods List of signatures for methods invoked in the read-critical section. > 140: */ > 141: public RCULock(String[] readCriticalMethods) { It's a big question, why (from the interface point of view) the lock should know methods in which it can be used. The second tier question, why the method names are String[] and not Method[] at least. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/58#discussion_r1165507583 PR Review Comment: https://git.openjdk.org/crac/pull/58#discussion_r1165512496 From duke at openjdk.org Thu Apr 13 13:36:17 2023 From: duke at openjdk.org (Radim Vansa) Date: Thu, 13 Apr 2023 13:36:17 GMT Subject: [crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking [v5] In-Reply-To: References: Message-ID: <23p2OH9-eV-KkzeLHUmQ5aVTz0jOwV9yli0dSgCO08A=.018b61ab-0439-4534-8a0d-7eb9acf696a7@github.com> On Thu, 13 Apr 2023 13:20:20 GMT, Anton Kozlov wrote: >> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: >> >> Add synchronized context > > src/java.base/share/classes/jdk/crac/RCULock.java line 141: > >> 139: * @param readCriticalMethods List of signatures for methods invoked in the read-critical section. >> 140: */ >> 141: public RCULock(String[] readCriticalMethods) { > > It's a big question, why (from the interface point of view) the lock should know methods in which it can be used. > > The second tier question, why the method names are String[] and not Method[] at least. There's no point in asking 'why'. Interface is there to specify a contract. You declare the methods, and then the implementation fulfills the contract. And you already know why from the point of implementation. There is another constructor that accepts `Method[]`, being able to specify the signature is just an option if it's more convenient. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/58#discussion_r1165530405 From rvansa at azul.com Thu Apr 13 14:20:22 2023 From: rvansa at azul.com (Radim Vansa) Date: Thu, 13 Apr 2023 16:20:22 +0200 Subject: On restore the "main" thread is started before the Resource's afterRestore has completed In-Reply-To: References: Message-ID: <01071774-4b51-741d-15c8-9bd92d0ae0a3@azul.com> On 13. 04. 23 15:20, Dan Heidinga wrote: > Caution: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe. > > > > @Dan, this is very interesting! > Could you please elaborate a bit further. Perhaps in the context of the CrackDemoExt.java sample? > > Let me think on that. I'll see if I can pull something together that shows the api use. > > I put together a small example showing the use of SwitchPoint to toggle between phases: normal mode, beforeCheckpoint, afterRestore, normal mode. [0] > > In the CRaCPhase class, there are two methods that take Function arguments that allow the user to provide phase-specific behaviour: > * beforeGuard which allows a switching from normal mode to checkpoint mode:https://github.com/DanHeidinga/SwitchPointExample/blob/b09fdb2a5d203950abc9de4facbd1435585bf3af/CRaCPhase.java#L15 > > * aroundGuard which allows switching from normal mode to checkpoint mode and back to normal mode:https://github.com/DanHeidinga/SwitchPointExample/blob/b09fdb2a5d203950abc9de4facbd1435585bf3af/CRaCPhase.java#L28 > > There's a use of this pattern in the "Test" class [1] which transitions from a regular get to a locked get. > > The ideas are all there though the code is a little unpleasant to work with due to the exception handling and general complexity of MethodHandles. > > Radim has an RCU lock that use Switchpoints as well though his API appears to be more pleasant for users:https://github.com/openjdk/crac/pull/58/files I think that it's not only about nicer API; I think that your example does not prevent running Test.getSpecialValueRaw() and resource beforeCheckpoint/afterRestore concurrently - if one of the threads enters the Test.getSpecialValueRaw method there's nothing that would prevent calling beforeCheckpoint(). In other words, you'd need the special single-threaded mode. While I've also used SwitchPoint as you suggested in my PR, can you tell what's the difference between just reading a volatile variable (and deciding based on the value) and using this class? It seems that it's used mostly in scripting support, so I could imagine the utility of generating a compact MethodHandle, but is there really any magic? Radim > > > [0]https://github.com/DanHeidinga/SwitchPointExample/blob/main/CRaCPhase.java > [1]https://github.com/DanHeidinga/SwitchPointExample/blob/b09fdb2a5d203950abc9de4facbd1435585bf3af/CRaCPhase.java#L114-L140 > > > --Dan > > > > Needs more exploration and prototyping but would provide a potential path to reasonable performance by burying the extra locking in the fallback paths. And it would be a single pattern to optimize, rather than all the variations users could produce. > --Dan > [0]https://blog.openj9.org/2022/10/14/openj9-criu-support-a-look-under-the-hood/ > [1]https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/invoke/SwitchPoint.html > > Thank you, > - Christian > > > > Cheers, > > Radim > > [1]https://en.wikipedia.org/wiki/Read-copy-update > > On 03. 04. 23 22:30, Christian Tzolov wrote: >> Hi, I'm testing CRaC in the context of long-running applications (e.g. streaming, continuous processing ...) and I've stumbled on an issue related to the coordination of the resolved threads. >> >> For example, let's have a Processor that performs continuous computations. This processor depends on a ProcessorContext and later must be fully initialized before the processor can process any data. >> >> When the application is first started (e.g. not from checkpoints) it ensures that the ProcessorContext is initialized before starting the Processor loop. >> >> To leverage CRaC I've implemented a ProcessorContextResource gracefully stops the context on beforeCheckpoint and then re-initialized it on afterRestore. >> >> When the checkpoint is performed, CRaC calls the ProcessorContextResource.beforeCheckpoint and also preserves the current Processor call stack. On Restore processor's call stack is expectedly restored at the point it was stopped but unfortunately it doesn't wait for the ProcessorContextResource.afterRestore complete. This expectedly crashes the processor. >> >> Thehttps://github.com/tzolov/crac-demo illustreates this issue. The README explains how to reproduce the issue. The OUTPUT.md (https://github.com/tzolov/crac-demo/blob/main/OUTPUT.md ) offers terminal snapshots of the observed behavior. >> >> I've used latest JDK CRaC release: >> openjdk 17-crac 2021-09-14 >> OpenJDK Runtime Environment (build 17-crac+5-19) >> OpenJDK 64-Bit Server VM (build 17-crac+5-19, mixed mode, sharing) >> >> As I'm new to CRaC, I'd appreciate your thoughts on this issue. >> >> Cheers, >> Christian >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From duke at openjdk.org Thu Apr 13 15:01:30 2023 From: duke at openjdk.org (Radim Vansa) Date: Thu, 13 Apr 2023 15:01:30 GMT Subject: [crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking [v5] In-Reply-To: References: Message-ID: <505Go3FAC5W3dZewwLqB0Hg7wJEWEBCFQo5tetiMXH0=.d9651739-a128-4aa6-952a-e171018244be@github.com> On Wed, 12 Apr 2023 15:02:05 GMT, Radim Vansa wrote: >> This implementation is suitable for uses where the write-locking happens very rarely (if at all), as in the case of CRaC checkpoint, and we don't want to slow down regular access to the protected resource. > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Add synchronized context It follows the same principle but its use is not interchangeable (and cannot be made so) - had you replaced existing place that uses `ReadWriteLock` with this one it wouldn't work. I already made a benchmark this includes a noop baseline (`unsync`), and executes the `quick` method as fast as it can in 8 threads, and `slow` method with 10/100 ms think time (single thread): Benchmark (impl) (pause) Mode Cnt Score Error Units SwitchPointBenchmark.g:quick unsync 10 thrpt 5 2420777624.146 ? 249641306.573 ops/s SwitchPointBenchmark.g:slow unsync 10 thrpt 5 99.248 ? 0.264 ops/s SwitchPointBenchmark.g:quick unsync 100 thrpt 5 2244724220.494 ? 328435039.061 ops/s SwitchPointBenchmark.g:slow unsync 100 thrpt 5 9.992 ? 0.002 ops/s SwitchPointBenchmark.g:quick rwlock 10 thrpt 5 4414608.947 ? 1525681.326 ops/s SwitchPointBenchmark.g:slow rwlock 10 thrpt 5 99.191 ? 0.160 ops/s SwitchPointBenchmark.g:quick rwlock 100 thrpt 5 4541641.249 ? 3166622.432 ops/s SwitchPointBenchmark.g:slow rwlock 100 thrpt 5 9.989 ? 0.003 ops/s SwitchPointBenchmark.g:quick rculock 10 thrpt 5 196537498.940 ? 305743615.522 ops/s SwitchPointBenchmark.g:slow rculock 10 thrpt 5 94.168 ? 2.159 ops/s SwitchPointBenchmark.g:quick rculock 100 thrpt 5 772304327.917 ? 28329265.290 ops/s SwitchPointBenchmark.g:slow rculock 100 thrpt 5 9.909 ? 0.025 ops/s In case of 10 ms think time (which is really extremely often) results show more than 20x speedup compared to ReentrantReadWriteLock.readLock().lock()+unlock() combo, and just 10x slowdown vs. noop. With 100 ms think time it's order of magnitude better, > 150x speedup vs. < 3x slowdown. I've also run benchmark with no pause time to see the maximum frequency of synchronization, and it shows about 4.5k syncs/s (it would be less with more threads and longer stacks for sure). Benchmark (impl) (pause) Mode Cnt Score Error Units SwitchPointBenchmark.g:quick rculock 0 thrpt 5 1417151.441 ? 72183.322 ops/s SwitchPointBenchmark.g:slow rculock 0 thrpt 5 4486.629 ? 201.970 ops/s Note that these results use single fork VM and just few short iterations, but it gives some idea about the order of magnitude. ------------- PR Comment: https://git.openjdk.org/crac/pull/58#issuecomment-1507121658 From duke at openjdk.org Thu Apr 13 15:01:28 2023 From: duke at openjdk.org (Radim Vansa) Date: Thu, 13 Apr 2023 15:01:28 GMT Subject: [crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking [v6] In-Reply-To: References: Message-ID: > This implementation is suitable for uses where the write-locking happens very rarely (if at all), as in the case of CRaC checkpoint, and we don't want to slow down regular access to the protected resource. Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Add forgotten condition.signal() call ------------- Changes: - all: https://git.openjdk.org/crac/pull/58/files - new: https://git.openjdk.org/crac/pull/58/files/1116ace6..984cb5d3 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=58&range=05 - incr: https://webrevs.openjdk.org/?repo=crac&pr=58&range=04-05 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/crac/pull/58.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/58/head:pull/58 PR: https://git.openjdk.org/crac/pull/58 From duke at openjdk.org Thu Apr 13 15:06:07 2023 From: duke at openjdk.org (Radim Vansa) Date: Thu, 13 Apr 2023 15:06:07 GMT Subject: [crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking [v5] In-Reply-To: References: Message-ID: On Thu, 13 Apr 2023 13:16:37 GMT, Anton Kozlov wrote: >> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: >> >> Add synchronized context > > src/java.base/share/classes/jdk/crac/RCULock.java line 32: > >> 30: * acquire the read-lock inside the critical >> 31: * section (at the beginning) and release it outside, >> 32: * preferrably in the finally block. > > This assimetry in the interface does not look nice. Likely because the implementation, we can release the lock only outside a method with the critical code. But this leaks too much implementation details to the interface. It would *almost* work except for interpreted mode; in compiled code the safepoint poll is only at method exit (and in uncounted loops). I couldn't find the specific place in code but I've read that in interpreted mode the safepoint may happen on every bytecode instruction, so we would have to somehow disable this. Sounds too fragile to expect that you could jump straight into the critical section. Alternatively we could try to do flow analysis on the bytecode and check where's the critical section, but that's also too much work just to make it look a tad better. This type of lock is intended for very specific use anyway. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/58#discussion_r1165670679 From akozlov at openjdk.org Thu Apr 13 15:12:12 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Thu, 13 Apr 2023 15:12:12 GMT Subject: [crac] RFR: Support repeated checkpoint and restore operations In-Reply-To: References: Message-ID: On Thu, 6 Apr 2023 11:51:31 GMT, Radim Vansa wrote: > * VM option CRaCCheckpointTo is recognized when restoring the application (destination can be changed) > * The main problem for checkpoint after restore was old checkpoint image mmapped to files (CRaC-specific CRIU optimization for faster boot). Before performing checkpoint we transparently swap this with memory using anonymous mapping. src/hotspot/os/linux/os_linux.cpp line 392: > 390: next_checkpoint = ""; > 391: } > 392: return write_check_error(fd, next_checkpoint, strlen(next_checkpoint) + 1); As CRaCCheckpointTo is not the only option that may require this, we probably want a more generic approach. Like a new option attribute (RESTORE_UPDATEABLE?) https://github.com/openjdk/crac/blob/master/src/hotspot/share/runtime/globals.hpp#L57 The implementation should check only UPDATEABLE options are provided along -XX:CRaCRestoreFrom. This be better done in a separate PR. src/hotspot/os/linux/os_linux.cpp line 6383: > 6381: bool ok = !_dry_run; > 6382: > 6383: remap_old_imagedir(); VM was not bothered the way CREngine saved the memory content. The mmaping is an implementation detail of the CR mechnism. Have you considered switching off the mmaping in CRIU in this repeated checkpoint-restore sequence? Assuming we would be able communicate that to CREngine (in CRIU mmaping is an option). Semantically, this patch propopses to handle a mapping twice, once in CRIU with mmaping and another time in the VM. There are some benefits of doing everything in the VM and having better control over the process. So it would be cleaner to do a practically big part of the memory management in the VM and leaving bootstraping only to the CRIU. src/hotspot/os/linux/os_linux.cpp line 6707: > 6705: } > 6706: > 6707: // Since putenv does not do its own copy of the strings we need to keep What is the point of these putenv changes? src/java.base/unix/native/criuengine/criuengine.c line 304: > 302: if (WIFEXITED(status)) { > 303: return WEXITSTATUS(status); > 304: } else if (WIFSIGNALED(status)) { WIFSIGNALED is handled on line 306 (310) below. Something looks unnecessary here or there. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/57#discussion_r1165535817 PR Review Comment: https://git.openjdk.org/crac/pull/57#discussion_r1165677722 PR Review Comment: https://git.openjdk.org/crac/pull/57#discussion_r1165527689 PR Review Comment: https://git.openjdk.org/crac/pull/57#discussion_r1165525555 From akozlov at openjdk.org Thu Apr 13 15:31:49 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Thu, 13 Apr 2023 15:31:49 GMT Subject: [crac] RFR: Correct System.nanotime() value after restore [v3] In-Reply-To: References: <-i0uB8ZW7r54hoKQJ_wODUXNKVkOI5rH7SJTEhSHiDw=.75ebe53a-9081-40c6-911f-048b17e8850e@github.com> Message-ID: On Thu, 6 Apr 2023 13:26:32 GMT, Radim Vansa wrote: >> Oh wait a sec, you're partially right - since we always use javaTimeNanos() if the offset calculated after the first restore wouldn't be zero, we wouldn't have this right. I should zero the offset before calculating it again. Too bad I can't create a test for that yet. > > Fixed the problem above; I did a (unpublished) merge with #57 and wrote a test to validate the behaviour. Can contribute it after #57 gets merged (possibly in another PR). My concern here about this code. This code assumes multiple cycles possible, and does something in not straightforward way. I think we'd better streamline this code (assuming repeated cycles, or not). Should not we just record checkpoint millis and nanos unconditionally? ------------- PR Review Comment: https://git.openjdk.org/crac/pull/53#discussion_r1165686639 From akozlov at openjdk.org Thu Apr 13 15:31:49 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Thu, 13 Apr 2023 15:31:49 GMT Subject: [crac] RFR: Correct System.nanotime() value after restore [v3] In-Reply-To: References: <-i0uB8ZW7r54hoKQJ_wODUXNKVkOI5rH7SJTEhSHiDw=.75ebe53a-9081-40c6-911f-048b17e8850e@github.com> Message-ID: On Thu, 13 Apr 2023 15:15:38 GMT, Anton Kozlov wrote: >> Fixed the problem above; I did a (unpublished) merge with #57 and wrote a test to validate the behaviour. Can contribute it after #57 gets merged (possibly in another PR). > > My concern here about this code. This code assumes multiple cycles possible, and does something in not straightforward way. I think we'd better streamline this code (assuming repeated cycles, or not). > > Should not we just record checkpoint millis and nanos unconditionally? > It estabilishes relation between real time and monotonic time, and it's sufficient to do that just once I don't belive this is true. Real time can be squezed and extended https://man7.org/linux/man-pages/man3/adjtime.3.html. So it won't be correct to apply a difference in real time to the monotonic clock to calculate the value of monotonic clock in another point of time. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/53#discussion_r1165691228 From akozlov at openjdk.org Thu Apr 13 15:31:53 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Thu, 13 Apr 2023 15:31:53 GMT Subject: [crac] RFR: Correct System.nanotime() value after restore [v3] In-Reply-To: References: <-i0uB8ZW7r54hoKQJ_wODUXNKVkOI5rH7SJTEhSHiDw=.75ebe53a-9081-40c6-911f-048b17e8850e@github.com> Message-ID: On Thu, 30 Mar 2023 14:34:20 GMT, Radim Vansa wrote: > Therefore things snapshot might seem to take no time, but System.nanoTime() is still monotonic. That's true and that is my concern. Before this change it was possible to measure how much time we've spent in checkpoint. My comment was that if we change the code as proposed, we'll deliberately provide inaccurate measurements based on realtime, not on the monotinic clock. Doing the adjustements is worse than doing nothing when restoring in the same environment. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/53#discussion_r1165699208 From akozlov at openjdk.org Thu Apr 13 15:49:11 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Thu, 13 Apr 2023 15:49:11 GMT Subject: [crac] RFR: Correct System.nanotime() value after restore [v2] In-Reply-To: References: <-i0uB8ZW7r54hoKQJ_wODUXNKVkOI5rH7SJTEhSHiDw=.75ebe53a-9081-40c6-911f-048b17e8850e@github.com> Message-ID: On Thu, 30 Mar 2023 12:59:37 GMT, Ashutosh Mehra wrote: > Since you mention it was a bug in kernel or criu and it has been almost 3 years since your commit, may be it is worth enabling the criu changes again to see if the timedwait problem still exists, unless you have already done that. AFAIK the bug is fixed, but I see no point of relying on OS here. Is there one? Timens that is not changed by CRIU provides correct values for our nanoTime() [1]. > The value returned represents nanoseconds since some fixed but arbitrary origin time (perhaps in the future, so values may be negative). The same origin is used by all invocations of this method in an instance of a Java virtual machine [1] https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/System.html#nanoTime() ------------- PR Comment: https://git.openjdk.org/crac/pull/53#issuecomment-1507198101 From rmarchenko at openjdk.org Thu Apr 13 16:16:12 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Thu, 13 Apr 2023 16:16:12 GMT Subject: [crac] Withdrawn: RestoreEnvironmentTest refactoring In-Reply-To: References: Message-ID: On Tue, 31 Jan 2023 07:23:11 GMT, Roman Marchenko wrote: > The test was extended with the example to illustrate the scenario when an user don't like to propagate some of environment variables into a restored process (see `RESTORE_ENVIRONMENT_TEST_VAR2` in `RestoreEnvironmentTest.sh`). See the initial discussion here #30 This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/crac/pull/42 From duke at openjdk.org Fri Apr 14 07:13:09 2023 From: duke at openjdk.org (Radim Vansa) Date: Fri, 14 Apr 2023 07:13:09 GMT Subject: [crac] RFR: Correct System.nanotime() value after restore [v3] In-Reply-To: References: <-i0uB8ZW7r54hoKQJ_wODUXNKVkOI5rH7SJTEhSHiDw=.75ebe53a-9081-40c6-911f-048b17e8850e@github.com> Message-ID: On Thu, 13 Apr 2023 15:19:12 GMT, Anton Kozlov wrote: >> My concern here about this code. This code assumes multiple cycles possible, and does something in not straightforward way. I think we'd better streamline this code (assuming repeated cycles, or not). >> >> Should not we just record checkpoint millis and nanos unconditionally? > >> It estabilishes relation between real time and monotonic time, and it's sufficient to do that just once > > I don't belive this is true. Real time can be squezed and extended https://man7.org/linux/man-pages/man3/adjtime.3.html. So it won't be correct to apply a difference in real time to the monotonic clock to calculate the value of monotonic clock in another point of time. Alright then, changing wall clock time might be of concern, and I realized it should work even when this is set every time. I'll update it. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/53#discussion_r1166383397 From duke at openjdk.org Fri Apr 14 07:23:05 2023 From: duke at openjdk.org (Radim Vansa) Date: Fri, 14 Apr 2023 07:23:05 GMT Subject: [crac] RFR: Correct System.nanotime() value after restore [v3] In-Reply-To: References: <-i0uB8ZW7r54hoKQJ_wODUXNKVkOI5rH7SJTEhSHiDw=.75ebe53a-9081-40c6-911f-048b17e8850e@github.com> Message-ID: <0OdOj43fLqFTWrOb43klHnmDqL_wzykofZm4qbbdzZU=.485bf2b0-4493-47f9-a8a8-79cdfb02eeda@github.com> On Thu, 13 Apr 2023 15:25:17 GMT, Anton Kozlov wrote: >> If the monotonic time on the machine advanced by X, the offset can't be lower than -X (as the millis part is always positive) and therefore any difference between times read before and after will be at least 0. Therefore things snapshot might seem to take no time, but System.nanoTime() is still monotonic. > >> Therefore things snapshot might seem to take no time, but System.nanoTime() is still monotonic. > > That's true and that is my concern. Before this change it was possible to measure how much time we've spent in checkpoint. My comment was that if we change the code as proposed, we'll deliberately provide inaccurate measurements based on realtime, not on the monotinic clock. Doing the adjustements is worse than doing nothing when restoring in the same environment. Looks like there's `/proc/sys/kernel/random/boot_id` so we can set the offset only when the boot ID changes. Since monotonic time is ~time since boot, too, this could work well. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/53#discussion_r1166393299 From duke at openjdk.org Fri Apr 14 07:43:07 2023 From: duke at openjdk.org (Radim Vansa) Date: Fri, 14 Apr 2023 07:43:07 GMT Subject: [crac] RFR: X11 CRaC reinitializing on CheckpointRestore [v4] In-Reply-To: References: Message-ID: On Mon, 12 Sep 2022 07:50:57 GMT, Ilya Kuznetsov wrote: >> Allows CRaC to perform a CheckpointRestore operation for applications using GUI (Swing, AWT) and X11 connection. >> >> Resources are registered only if the application uses the GUI. The order in which resources are reinitialized matters: Toolkit should be cleared before reference handling for a proper garbage collection, and GraphicsEnvironment after handling for a correct X11 disconnection. Some resources restore lazily. >> >> The `beforeCheckpoint()` operation dispose necessary toolkit and connection resources and disconnects from X11. This allows CRaC to perform a Checkpoint since there is no external connection. >> The `afterRestore()` operations reconnect to X11 and then restore necessary connection and toolkit resources. >> >> Thus, after the Restore operation, we have a clean X11 connection. It is ready to restore the original GUI state. > > Ilya Kuznetsov has updated the pull request incrementally with one additional commit since the last revision: > > Fix newline @i1ya-kznts9v Looks like this did not receive much attention; would you consider getting that ready again? Besides the merge conflict, this PR is rather big. Isolating (and describing) particular changes, esp. these that are not X11-related, into separate PRs would make the review much simpler. ------------- PR Comment: https://git.openjdk.org/crac/pull/19#issuecomment-1508068190 From duke at openjdk.org Fri Apr 14 08:21:22 2023 From: duke at openjdk.org (Radim Vansa) Date: Fri, 14 Apr 2023 08:21:22 GMT Subject: [crac] RFR: Support repeated checkpoint and restore operations In-Reply-To: References: Message-ID: On Thu, 13 Apr 2023 13:37:36 GMT, Anton Kozlov wrote: >> * VM option CRaCCheckpointTo is recognized when restoring the application (destination can be changed) >> * The main problem for checkpoint after restore was old checkpoint image mmapped to files (CRaC-specific CRIU optimization for faster boot). Before performing checkpoint we transparently swap this with memory using anonymous mapping. > > src/hotspot/os/linux/os_linux.cpp line 392: > >> 390: next_checkpoint = ""; >> 391: } >> 392: return write_check_error(fd, next_checkpoint, strlen(next_checkpoint) + 1); > > As CRaCCheckpointTo is not the only option that may require this, we probably want a more generic approach. > Like a new option attribute (RESTORE_UPDATEABLE?) > > https://github.com/openjdk/crac/blob/master/src/hotspot/share/runtime/globals.hpp#L57 > > The implementation should check only UPDATEABLE options are provided along -XX:CRaCRestoreFrom. > > This be better done in a separate PR. I agree we should not mix a complex change like that into this PR. Regarding another option - while a generic way to specify updatable options is a good idea I wouldn't add it to each and every option. Many of those options would need careful analysis if it's possible to make them updatable, so I would rather keep an independent list. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/57#discussion_r1166471872 From duke at openjdk.org Fri Apr 14 08:32:06 2023 From: duke at openjdk.org (Radim Vansa) Date: Fri, 14 Apr 2023 08:32:06 GMT Subject: [crac] RFR: Support repeated checkpoint and restore operations In-Reply-To: References: Message-ID: On Thu, 13 Apr 2023 15:08:33 GMT, Anton Kozlov wrote: >> * VM option CRaCCheckpointTo is recognized when restoring the application (destination can be changed) >> * The main problem for checkpoint after restore was old checkpoint image mmapped to files (CRaC-specific CRIU optimization for faster boot). Before performing checkpoint we transparently swap this with memory using anonymous mapping. > > src/hotspot/os/linux/os_linux.cpp line 6383: > >> 6381: bool ok = !_dry_run; >> 6382: >> 6383: remap_old_imagedir(); > > VM was not bothered the way CREngine saved the memory content. The mmaping is an implementation detail of the CR mechnism. > > Have you considered switching off the mmaping in CRIU in this repeated checkpoint-restore sequence? Assuming we would be able communicate that to CREngine (in CRIU mmaping is an option). > > Semantically, this patch propopses to handle a mapping twice, once in CRIU with mmaping and another time in the VM. There are some benefits of doing everything in the VM and having better control over the process. So it would be cleaner to do a practically big part of the memory management in the VM and leaving bootstraping only to the CRIU. I think that mmaping in CRIU is an important optimization that speeds up boot, so I did not want to force disabling that if you ever want to do the checkpoint again. You're right that this mixes the abstractions and responsibilities. There's an alternative solution that would not require changes in the VM, but it's technically more complex: we could ptrace VM and replace the mapping for it externally (though I am not sure how exactly should we invoke syscall on behalf of the tracee - maybe a parasite code would be needed?). Since ptracing process needs elevated priviledges we should probably add this as a separate criu command that would be invoked by criuengine. The advantage is clearer semantics and not relying on SIGSEGV handling here which some consider a sketchy practice. But it's a more complex solution. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/57#discussion_r1166487358 From duke at openjdk.org Fri Apr 14 08:35:05 2023 From: duke at openjdk.org (Radim Vansa) Date: Fri, 14 Apr 2023 08:35:05 GMT Subject: [crac] RFR: Support repeated checkpoint and restore operations In-Reply-To: References: Message-ID: On Thu, 13 Apr 2023 13:31:34 GMT, Anton Kozlov wrote: >> * VM option CRaCCheckpointTo is recognized when restoring the application (destination can be changed) >> * The main problem for checkpoint after restore was old checkpoint image mmapped to files (CRaC-specific CRIU optimization for faster boot). Before performing checkpoint we transparently swap this with memory using anonymous mapping. > > src/hotspot/os/linux/os_linux.cpp line 6707: > >> 6705: } >> 6706: >> 6707: // Since putenv does not do its own copy of the strings we need to keep > > What is the point of these putenv changes? See the comment in the old code: > // left this pointer unowned, it is freed when process dies Without repeated CRs it was fine to leak some memory, but now we have to keep it around until the vars are overwritten and release afterwards. Regrettably `putenv` does not make an internal copy. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/57#discussion_r1166491181 From duke at openjdk.org Mon Apr 17 09:04:39 2023 From: duke at openjdk.org (Radim Vansa) Date: Mon, 17 Apr 2023 09:04:39 GMT Subject: [crac] RFR: Improved open file descriptors tracking [v8] In-Reply-To: References: Message-ID: > Tracks `java.io.FileDescriptor` instances as CRaC resource; before checkpoint these are reported and if not allow-listed (e.g. as opened as standard descriptors) an exception is thrown. Further investigation can use system property `jdk.crac.collect-fd-stacktraces=true` to record origin of those file descriptors. > File descriptors claimed in Java code are passed to native; native code checks all open file descriptors and reports error if there's an unexpected FD that is not included in the list passed previously. Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: On exception, invoke afterRestore on throwing Context, too. ------------- Changes: - all: https://git.openjdk.org/crac/pull/43/files - new: https://git.openjdk.org/crac/pull/43/files/4f506168..8b51d32a Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=43&range=07 - incr: https://webrevs.openjdk.org/?repo=crac&pr=43&range=06-07 Stats: 44 lines in 2 files changed: 39 ins; 0 del; 5 mod Patch: https://git.openjdk.org/crac/pull/43.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/43/head:pull/43 PR: https://git.openjdk.org/crac/pull/43 From akozlov at openjdk.org Tue Apr 18 15:22:02 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Tue, 18 Apr 2023 15:22:02 GMT Subject: [crac] RFR: CRaC related documentation in JDK classes using custom tag [v2] In-Reply-To: References: Message-ID: On Tue, 28 Feb 2023 12:33:11 GMT, Radim Vansa wrote: >> As proposed in https://mail.openjdk.org/pipermail/crac-dev/2023-February/000496.html I am adding some CRaC related documentation. > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Add timers related docs src/java.base/share/classes/java/lang/System.java line 551: > 549: * of expected bounds, or {@link javax.crac.Context#register(javax.crac.Resource) register} > 550: * a resource that will help with adapting after the restore. > 551: * The described behavior looks like a bug. After we got to the concensus in #53, this needs to be updated. Or may be just "TODO" here? src/java.base/share/classes/java/lang/System.java line 793: > 791: * a resource and in the {@link javax.crac.Resource#afterRestore(javax.crac.Context) afterRestore method} > 792: * reload system properties, propagating any change. > 793: * The comment above is about standard system properties. Here we should say that system properties are updated after restore. The app can check the updated value in the afterRestore. src/java.base/share/classes/java/lang/System.java line 1111: > 1109: * a resource and in the {@link javax.crac.Resource#afterRestore(javax.crac.Context) afterRestore method} > 1110: * reload environment variables, propagating any change. > 1111: * The constantness of the environment is JDK implementation detail (although likely to be very expected by users). It should be enough to say the env is updated along restore, and that app check the update. But we don't need to force ("The app should ...") to reload values. src/java.base/share/classes/java/net/InetAddress.java line 197: > 195: * @crac This class holds a cache of resolved hostname-address pairs; > 196: * this cache is wiped out before checkpoint since this mapping might be > 197: * outdated or invalid in the environment where the process is restored. The description and rationale behind should be separated. "this cache is wiped out before checkpoint, so after the process is restored any lookup causes name address resolution. This ensures addresses are actual in the new environment"? And I'm not sure rationale part is not evident here. src/java.base/share/classes/java/security/SecureRandom.java line 366: > 364: * the {@link Security#getProviders() Security.getProviders()} method. > 365: * > 366: * @crac See provider documentation for details of behaviour after restore from a checkpoint. Shouldn't be a link here? src/java.base/share/classes/java/util/Timer.java line 328: > 326: * could execute many times after a restore. This is likely an undesired > 327: * behaviour, therefore it is recommended to cancel the task before > 328: * checkpoint and schedule it again after restore. In this wording this looks like a problem, and I'm not convinced that the problem exists (otherwise we may want probably to do something about that). Could you change the description to something more neutral? "could execute many times after a restore, catching up as described above. If this is not desirable, the task can be canceled and scheduled again in a {@link Resource} implementation" src/java.base/share/classes/jdk/internal/util/jar/PersistentJarFile.java line 39: > 37: > 38: /** > 39: * @crac It is assumed that JAR files opened through this class thatn are open "thatn" typo src/java.base/unix/classes/sun/net/www/protocol/jar/JarFileFactory.java line 46: > 44: * > 45: * @crac All JarFile instances that are not referenced from elsewhere are > 46: * removed from the cache before a checkpoint. Looks OK for a non-public class. But in general, the new tag is used for implementation notes like this and also for the description of the behavior of the internal Resource implementation with more impact on the user. It would be nice to somehow distinguish these two (or more) uses of the tag. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/51#discussion_r1165751039 PR Review Comment: https://git.openjdk.org/crac/pull/51#discussion_r1165757063 PR Review Comment: https://git.openjdk.org/crac/pull/51#discussion_r1165765261 PR Review Comment: https://git.openjdk.org/crac/pull/51#discussion_r1170163618 PR Review Comment: https://git.openjdk.org/crac/pull/51#discussion_r1170164574 PR Review Comment: https://git.openjdk.org/crac/pull/51#discussion_r1170180639 PR Review Comment: https://git.openjdk.org/crac/pull/51#discussion_r1170187510 PR Review Comment: https://git.openjdk.org/crac/pull/51#discussion_r1170196932 From duke at openjdk.org Tue Apr 18 16:23:34 2023 From: duke at openjdk.org (Radim Vansa) Date: Tue, 18 Apr 2023 16:23:34 GMT Subject: [crac] RFR: CRaC related documentation in JDK classes using custom tag [v2] In-Reply-To: References: Message-ID: <3_zIXA73vhJ3ZI3j0GCrX2bYzObfi-hNlA7ptuTYL3o=.38ef3ea3-ae25-402e-b6c5-bd9ba42455d6@github.com> On Thu, 13 Apr 2023 16:06:50 GMT, Anton Kozlov wrote: >> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: >> >> Add timers related docs > > src/java.base/share/classes/java/lang/System.java line 551: > >> 549: * of expected bounds, or {@link javax.crac.Context#register(javax.crac.Resource) register} >> 550: * a resource that will help with adapting after the restore. >> 551: * > > The described behavior looks like a bug. After we got to the concensus in #53, this needs to be updated. Or may be just "TODO" here? Yes, if this gets integrated before #53 I am going to change that one and update the comment with the actual behaviour. > src/java.base/share/classes/java/lang/System.java line 793: > >> 791: * a resource and in the {@link javax.crac.Resource#afterRestore(javax.crac.Context) afterRestore method} >> 792: * reload system properties, propagating any change. >> 793: * > > The comment above is about standard system properties. > > Here we should say that system properties are updated after restore. The app can check the updated value in the afterRestore. Isn't that exactly what is written in the comment? > src/java.base/share/classes/java/security/SecureRandom.java line 366: > >> 364: * the {@link Security#getProviders() Security.getProviders()} method. >> 365: * >> 366: * @crac See provider documentation for details of behaviour after restore from a checkpoint. > > Shouldn't be a link here? You mean to the generic provider interface? Here I meant the implementation of the provider. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/51#discussion_r1170267006 PR Review Comment: https://git.openjdk.org/crac/pull/51#discussion_r1170268773 PR Review Comment: https://git.openjdk.org/crac/pull/51#discussion_r1170273881 From akozlov at openjdk.org Wed Apr 19 12:10:23 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Wed, 19 Apr 2023 12:10:23 GMT Subject: [crac] RFR: Improved open file descriptors tracking [v8] In-Reply-To: References: Message-ID: <6OBTI6ISwHn3dLGQ9_ZxK8IrVHikJhNTXrmqcpFL29k=.1500642d-845f-4cec-863c-8fa3a5c1807a@github.com> On Mon, 17 Apr 2023 09:04:39 GMT, Radim Vansa wrote: >> Tracks `java.io.FileDescriptor` instances as CRaC resource; before checkpoint these are reported and if not allow-listed (e.g. as opened as standard descriptors) an exception is thrown. Further investigation can use system property `jdk.crac.collect-fd-stacktraces=true` to record origin of those file descriptors. >> File descriptors claimed in Java code are passed to native; native code checks all open file descriptors and reports error if there's an unexpected FD that is not included in the list passed previously. > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > On exception, invoke afterRestore on throwing Context, too. src/java.base/share/classes/java/io/FileDescriptor.java line 379: > 377: msg += JDKContext.COLLECT_FD_STACKTRACES_HINT; > 378: } > 379: throw new CheckpointOpenFileException(msg, resource.stackTraceHolder); Now, the message looks like: Suppressed: jdk.crac.impl.CheckpointOpenFileException: FileDescriptor 323 left open. at java.base/java.io.FileDescriptor.beforeCheckpoint(FileDescriptor.java:379) at java.base/java.io.FileDescriptor$Resource.beforeCheckpoint(FileDescriptor.java:80) at java.base/jdk.crac.impl.AbstractContextImpl.runBeforeCheckpoint(AbstractContextImpl.java:114) at java.base/jdk.crac.impl.AbstractContextImpl.beforeCheckpoint(AbstractContextImpl.java:83) at java.base/jdk.internal.crac.JDKContext.beforeCheckpoint(JDKContext.java:85) at java.base/jdk.crac.impl.AbstractContextImpl.runBeforeCheckpoint(AbstractContextImpl.java:114) at java.base/jdk.crac.impl.AbstractContextImpl.beforeCheckpoint(AbstractContextImpl.java:83) at java.base/jdk.crac.Core.checkpointRestore1(Core.java:121) ... 2 more Caused by: java.lang.Exception: This file descriptor was created here at java.base/java.io.FileDescriptor$Resource.(FileDescriptor.java:71) at java.base/java.io.FileDescriptor.(FileDescriptor.java:100) at java.base/sun.nio.ch.IOUtil.newFD(IOUtil.java:544) at java.base/sun.nio.ch.Net.socket(Net.java:524) at java.base/sun.nio.ch.SocketChannelImpl.(SocketChannelImpl.java:146) at java.base/sun.nio.ch.SocketChannelImpl.(SocketChannelImpl.java:129) at java.base/sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:77) at java.base/java.nio.channels.SocketChannel.open(SocketChannel.java:192) While in the past the FD (which is actually a socket, not a file) was reported by the VM like: Suppressed: jdk.crac.impl.CheckpointOpenSocketException: tcp localAddr 0.0.0.0 localPort 8080 remoteAddr 0.0.0.0 remotePort 0 at java.base/jdk.crac.Core.translateJVMExceptions(Core.java:92) at java.base/jdk.crac.Core.checkpointRestore1(Core.java:157) We miss the fact this is a socket now, as well as details (although the stack trace is very useful!). We can ask the VM for the FD type and details. To so we'll report report the details as we used to, by reusing the existing code providing the details, that should not be very hard. src/java.base/share/classes/jdk/crac/LoggerContainer.java line 9: > 7: * Therefore, we isolate the logger into a subclass and initialize lazily. > 8: */ > 9: public class LoggerContainer { This should be package private at least. src/java.base/share/classes/jdk/crac/impl/AbstractContextImpl.java line 31: > 29: import java.util.concurrent.locks.ReentrantLock; > 30: > 31: public abstract class AbstractContextImpl extends Context { `P` is not used anymore. The semantic of this class is being changed significantly, as well as implementation. It would be very nice to extract these modifications into a separate PR with a more clear desription of the reason of the change, tests, etc. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/43#discussion_r1171241969 PR Review Comment: https://git.openjdk.org/crac/pull/43#discussion_r1170350155 PR Review Comment: https://git.openjdk.org/crac/pull/43#discussion_r1170359272 From duke at openjdk.org Wed Apr 19 12:15:28 2023 From: duke at openjdk.org (Radim Vansa) Date: Wed, 19 Apr 2023 12:15:28 GMT Subject: [crac] RFR: Improved open file descriptors tracking [v8] In-Reply-To: <6OBTI6ISwHn3dLGQ9_ZxK8IrVHikJhNTXrmqcpFL29k=.1500642d-845f-4cec-863c-8fa3a5c1807a@github.com> References: <6OBTI6ISwHn3dLGQ9_ZxK8IrVHikJhNTXrmqcpFL29k=.1500642d-845f-4cec-863c-8fa3a5c1807a@github.com> Message-ID: On Tue, 18 Apr 2023 17:15:51 GMT, Anton Kozlov wrote: >> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: >> >> On exception, invoke afterRestore on throwing Context, too. > > src/java.base/share/classes/jdk/crac/LoggerContainer.java line 9: > >> 7: * Therefore, we isolate the logger into a subclass and initialize lazily. >> 8: */ >> 9: public class LoggerContainer { > > This should be package private at least. It's not package private as I meant to create one logger for all things CRaC, including logging in one of the implementation packages. I could probably move that to `jdk.crac.impl` or `jdk.internal.crac`. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/43#discussion_r1171247591 From heidinga at openjdk.org Wed Apr 19 14:34:19 2023 From: heidinga at openjdk.org (Dan Heidinga) Date: Wed, 19 Apr 2023 14:34:19 GMT Subject: [crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking [v6] In-Reply-To: References: Message-ID: On Thu, 13 Apr 2023 15:01:28 GMT, Radim Vansa wrote: >> This implementation is suitable for uses where the write-locking happens very rarely (if at all), as in the case of CRaC checkpoint, and we don't want to slow down regular access to the protected resource. > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Add forgotten condition.signal() call src/java.base/share/classes/jdk/crac/RCULock.java line 209: > 207: lockImpl = lockSwitchPoint.guardWithTest(noop, readLockImpl); > 208: unlockSwitchPoint = new SwitchPoint(); > 209: unlockImpl = lockSwitchPoint.guardWithTest(noop, readUnlockImpl); After spending a lot of time digging through the Hotspot C2 code, I don't think this use of SwitchPoints will be optimized in the way we might want (ie: deopt on SwitchPoint invalidation). SwitchPoint is implemented as a special MutableCallSite that changes from a known target MH that returns true to a known target MH that returns false. This is all wrapped into a standard guardWithTest MH (basically a MH "if" statement). As far as I can tell, to benefit from C2 optimizing the SW, we need to have the MutableCallSite or the GuardWithTest MH rooted in a static final field or in an invokedynamic bytecode. See the code in type.cpp::make_constant_from_field which asserts a Dependency between the CallSite and the target MH for the current method. Otherwise, we don't have a Dependency to rely on and trigger the eventual deopt. The above analysis may be wrong so any corrections by C2 experts would be appreciated. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/58#discussion_r1171436557 From duke at openjdk.org Wed Apr 19 14:43:57 2023 From: duke at openjdk.org (Radim Vansa) Date: Wed, 19 Apr 2023 14:43:57 GMT Subject: [crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking [v6] In-Reply-To: References: Message-ID: On Wed, 19 Apr 2023 14:31:42 GMT, Dan Heidinga wrote: >> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: >> >> Add forgotten condition.signal() call > > src/java.base/share/classes/jdk/crac/RCULock.java line 209: > >> 207: lockImpl = lockSwitchPoint.guardWithTest(noop, readLockImpl); >> 208: unlockSwitchPoint = new SwitchPoint(); >> 209: unlockImpl = lockSwitchPoint.guardWithTest(noop, readUnlockImpl); > > After spending a lot of time digging through the Hotspot C2 code, I don't think this use of SwitchPoints will be optimized in the way we might want (ie: deopt on SwitchPoint invalidation). > > SwitchPoint is implemented as a special MutableCallSite that changes from a known target MH that returns true to a known target MH that returns false. This is all wrapped into a standard guardWithTest MH (basically a MH "if" statement). > > As far as I can tell, to benefit from C2 optimizing the SW, we need to have the MutableCallSite or the GuardWithTest MH rooted in a static final field or in an invokedynamic bytecode. See the code in type.cpp::make_constant_from_field which asserts a Dependency between the CallSite and the target MH for the current method. Otherwise, we don't have a Dependency to rely on and trigger the eventual deopt. > > The above analysis may be wrong so any corrections by C2 experts would be appreciated. Thanks for the insight! So I guess this could be really simplified to a bunch of `if`s - and it would be more clear to the reader, as he wouldn't expect some mysterious magic reason to pick SP. I'll rerun my benchmarks after that and we'll see if there's any difference. While I an imagine limiting this to a `static` field (coarse-graining this to singleton for the whole VM - it's not too bad anyway since we execute a VM-wide operation), but I can't make this `final` as we need to flip this many times. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/58#discussion_r1171448934 From heidinga at redhat.com Wed Apr 19 14:46:55 2023 From: heidinga at redhat.com (Dan Heidinga) Date: Wed, 19 Apr 2023 10:46:55 -0400 Subject: On restore the "main" thread is started before the Resource's afterRestore has completed In-Reply-To: <01071774-4b51-741d-15c8-9bd92d0ae0a3@azul.com> References: <01071774-4b51-741d-15c8-9bd92d0ae0a3@azul.com> Message-ID: On Thu, Apr 13, 2023 at 10:20?AM Radim Vansa wrote: > > On 13. 04. 23 15:20, Dan Heidinga wrote: > > Caution: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe. > > > > @Dan, this is very interesting! > Could you please elaborate a bit further. Perhaps in the context of the CrackDemoExt.java sample? > > Let me think on that. I'll see if I can pull something together that shows the api use. > > I put together a small example showing the use of SwitchPoint to toggle between phases: normal mode, beforeCheckpoint, afterRestore, normal mode. [0] > > In the CRaCPhase class, there are two methods that take Function arguments that allow the user to provide phase-specific behaviour: > * beforeGuard which allows a switching from normal mode to checkpoint mode: https://github.com/DanHeidinga/SwitchPointExample/blob/b09fdb2a5d203950abc9de4facbd1435585bf3af/CRaCPhase.java#L15 > > * aroundGuard which allows switching from normal mode to checkpoint mode and back to normal mode: https://github.com/DanHeidinga/SwitchPointExample/blob/b09fdb2a5d203950abc9de4facbd1435585bf3af/CRaCPhase.java#L28 > > There's a use of this pattern in the "Test" class [1] which transitions from a regular get to a locked get. > > The ideas are all there though the code is a little unpleasant to work with due to the exception handling and general complexity of MethodHandles. > > Radim has an RCU lock that use Switchpoints as well though his API appears to be more pleasant for users: https://github.com/openjdk/crac/pull/58/files > > > I think that it's not only about nicer API; I think that your example does > not prevent running Test.getSpecialValueRaw() and resource beforeCheckpoint/afterRestore > concurrently - if one of the threads enters the Test.getSpecialValueRaw > method there's nothing that would prevent calling beforeCheckpoint(). In > other words, you'd need the special single-threaded mode. > You're right. Sufficiently bad timing with thread scheduling could allow the old value to be seen concurrently (or worse, even after) beforeCheckpoint/afterRestore. > While I've also used SwitchPoint as you suggested in my PR, can you tell > what's the difference between just reading a volatile variable (and > deciding based on the value) and using this class? It seems that it's used > mostly in scripting support, so I could imagine the utility of generating a > compact MethodHandle, but is there really any magic? > The benefits of SwitchPoints (which are built on top of MutableCallSite (MCS) and its syncAll behaviour) is that when rooted in a static final field or invokedynamic callsite, C2 can create a Dependency on the methods that call through the SwitchPoint (ie: the underlying MCS) and force a deoptimization when the MCS.target MH is changed. This makes the "if" target basically free as there's a deopt when Switchpoint flips and forces the MCS target MH to change. Lots of caveats on the above analysis and on actually getting the optimization to happen in practice. I'm not sure we can reliably provoke it without generating bytecode ourselves. --Dan > Radim > > > [0] https://github.com/DanHeidinga/SwitchPointExample/blob/main/CRaCPhase.java > [1] https://github.com/DanHeidinga/SwitchPointExample/blob/b09fdb2a5d203950abc9de4facbd1435585bf3af/CRaCPhase.java#L114-L140 > > > --Dan > > > > Needs more exploration and prototyping but would provide a potential path to reasonable performance by burying the extra locking in the fallback paths. And it would be a single pattern to optimize, rather than all the variations users could produce. > --Dan > [0] https://blog.openj9.org/2022/10/14/openj9-criu-support-a-look-under-the-hood/ > [1] https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/invoke/SwitchPoint.html > > Thank you, > - Christian > > > > Cheers, > > Radim > > [1] https://en.wikipedia.org/wiki/Read-copy-update > > On 03. 04. 23 22:30, Christian Tzolov wrote: > > Hi, I'm testing CRaC in the context of long-running applications (e.g. streaming, continuous processing ...) and I've stumbled on an issue related to the coordination of the resolved threads. > > For example, let's have a Processor that performs continuous computations. This processor depends on a ProcessorContext and later must be fully initialized before the processor can process any data. > > When the application is first started (e.g. not from checkpoints) it ensures that the ProcessorContext is initialized before starting the Processor loop. > > To leverage CRaC I've implemented a ProcessorContextResource gracefully stops the context on beforeCheckpoint and then re-initialized it on afterRestore. > > When the checkpoint is performed, CRaC calls the ProcessorContextResource.beforeCheckpoint and also preserves the current Processor call stack. On Restore processor's call stack is expectedly restored at the point it was stopped but unfortunately it doesn't wait for the ProcessorContextResource.afterRestore complete. This expectedly crashes the processor. > > The https://github.com/tzolov/crac-demo illustreates this issue. The README explains how to reproduce the issue. The OUTPUT.md (https://github.com/tzolov/crac-demo/blob/main/OUTPUT.md ) offers terminal snapshots of the observed behavior. > > I've used latest JDK CRaC release: > openjdk 17-crac 2021-09-14 > OpenJDK Runtime Environment (build 17-crac+5-19) > OpenJDK 64-Bit Server VM (build 17-crac+5-19, mixed mode, sharing) > > As I'm new to CRaC, I'd appreciate your thoughts on this issue. > > Cheers, > Christian > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From duke at openjdk.org Thu Apr 20 08:09:26 2023 From: duke at openjdk.org (Radim Vansa) Date: Thu, 20 Apr 2023 08:09:26 GMT Subject: [crac] RFR: Correct System.nanotime() value after restore [v4] In-Reply-To: References: Message-ID: <06iEWAtgXnr3kPcIq_R7hWA69CsgGD-L8vBmooXYrp8=.7379a114-fe32-44f0-b6b1-6a8362380a3e@github.com> > There are various places both inside JDK and in libraries that rely on monotonicity of `System.nanotime()`. When the process is restored on a different machine the value will likely differ as the implementation provides time since machine boot. This PR records wall clock time before checkpoint and after restore and tries to adjust the value provided by nanotime() to reasonably correct value. Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Set nanotime only if bootid changes ------------- Changes: - all: https://git.openjdk.org/crac/pull/53/files - new: https://git.openjdk.org/crac/pull/53/files/b59d738a..5cc81961 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=53&range=03 - incr: https://webrevs.openjdk.org/?repo=crac&pr=53&range=02-03 Stats: 135 lines in 6 files changed: 98 ins; 3 del; 34 mod Patch: https://git.openjdk.org/crac/pull/53.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/53/head:pull/53 PR: https://git.openjdk.org/crac/pull/53 From duke at openjdk.org Thu Apr 20 08:28:40 2023 From: duke at openjdk.org (Radim Vansa) Date: Thu, 20 Apr 2023 08:28:40 GMT Subject: [crac] RFR: Correct System.nanotime() value after restore [v4] In-Reply-To: <06iEWAtgXnr3kPcIq_R7hWA69CsgGD-L8vBmooXYrp8=.7379a114-fe32-44f0-b6b1-6a8362380a3e@github.com> References: <06iEWAtgXnr3kPcIq_R7hWA69CsgGD-L8vBmooXYrp8=.7379a114-fe32-44f0-b6b1-6a8362380a3e@github.com> Message-ID: On Thu, 20 Apr 2023 08:09:26 GMT, Radim Vansa wrote: >> There are various places both inside JDK and in libraries that rely on monotonicity of `System.nanotime()`. When the process is restored on a different machine the value will likely differ as the implementation provides time since machine boot. This PR records wall clock time before checkpoint and after restore and tries to adjust the value provided by nanotime() to reasonably correct value. > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Set nanotime only if bootid changes I've updated the PR to change nanotime only if machine `boot_id` changes, and always record the millis + nanos combo before checkpoint. Now I use `ghcr.io/rvansa/crac-test-base` image as the base for tests (we need some native libraries on top of Ubuntu), hopefully we'll set up an org-based repository (`ghcr.io/crac/test-base`) and I'll change update it in the PR. ------------- PR Comment: https://git.openjdk.org/crac/pull/53#issuecomment-1515927515 From akozlov at openjdk.org Thu Apr 20 12:27:21 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Thu, 20 Apr 2023 12:27:21 GMT Subject: [crac] RFR: Backout new API to sync with Reference Handler [v2] In-Reply-To: References: Message-ID: > This reverts commit 9cf1995693eead85d3807fb4c83ab38c14e27042 and makes #22 obsolete. > > The API introduced in 9cf1995693eead85d3807fb4c83ab38c14e27042 (waitForWaiters) and changed in #22 waits for the state when all discovered references are processed. So WaitForWaiters is used to implement predictable Reference Handling, ensuring that clean-up actions have fired for an object after it becomes unreachable. > > I think that API was a mistake and should be reverted. > > In general, the problem of predictable Reference Handling is independent of CRaC. So I thought about extracting that out of CRaC and found a few issues with the approach. A user needs to know what RefQueue gets References after an object becomes unreachable, to call waitForWaiters on that queue. The queue is not necessarily evident, so a deep understanding of refs and queues in an application is required to select the proper queue to wait on, and to build the right order of them to wait on. Also, it's required somehow to know the number of threads servicing a queue. And there are situations when waitForWaiters may report that all refs are processed, but some of them are not -- consider a thread that is polling a queue and gets refs to be processed but then buffers them in another queue for later, in this example waitForWaiters does not provide the guarantee that corresponding clean-up actions were performed. > > The common and more straightforward way to have predictable clean-up is to call an explicit method like close()/release()/cleanup() that performs object-specific clean-up actions predictably. Anton Kozlov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: - Merge remote-tracking branch 'jdk/crac/crac' into revert-new-ref-handler-api - Backout new API to sync with Reference Handler This reverts commit 9cf1995693eead85d3807fb4c83ab38c14e27042. ------------- Changes: https://git.openjdk.org/crac/pull/34/files Webrev: https://webrevs.openjdk.org/?repo=crac&pr=34&range=01 Stats: 141 lines in 5 files changed: 0 ins; 139 del; 2 mod Patch: https://git.openjdk.org/crac/pull/34.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/34/head:pull/34 PR: https://git.openjdk.org/crac/pull/34 From duke at openjdk.org Thu Apr 20 12:45:23 2023 From: duke at openjdk.org (Radim Vansa) Date: Thu, 20 Apr 2023 12:45:23 GMT Subject: [crac] RFR: Backout new API to sync with Reference Handler [v2] In-Reply-To: References: Message-ID: On Thu, 20 Apr 2023 12:27:21 GMT, Anton Kozlov wrote: >> This reverts commit 9cf1995693eead85d3807fb4c83ab38c14e27042 and makes #22 obsolete. >> >> The API introduced in 9cf1995693eead85d3807fb4c83ab38c14e27042 (waitForWaiters) and changed in #22 waits for the state when all discovered references are processed. So WaitForWaiters is used to implement predictable Reference Handling, ensuring that clean-up actions have fired for an object after it becomes unreachable. >> >> I think that API was a mistake and should be reverted. >> >> In general, the problem of predictable Reference Handling is independent of CRaC. So I thought about extracting that out of CRaC and found a few issues with the approach. A user needs to know what RefQueue gets References after an object becomes unreachable, to call waitForWaiters on that queue. The queue is not necessarily evident, so a deep understanding of refs and queues in an application is required to select the proper queue to wait on, and to build the right order of them to wait on. Also, it's required somehow to know the number of threads servicing a queue. And there are situations when waitForWaiters may report that all refs are processed, but some of them are not -- consider a thread that is polling a queue and gets refs to be processed but then buffers them in another queue for later, in this example waitForWaiters does not provide the guarantee that corresponding clean-up actions were performed. >> >> The common and more straightforward way to have predictable clean-up is to call an explicit method like close()/release()/cleanup() that performs object-specific clean-up actions predictably. > > Anton Kozlov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge remote-tracking branch 'jdk/crac/crac' into revert-new-ref-handler-api > - Backout new API to sync with Reference Handler > > This reverts commit 9cf1995693eead85d3807fb4c83ab38c14e27042. LGTM, let's integrate when the tests pass. ------------- PR Comment: https://git.openjdk.org/crac/pull/34#issuecomment-1516261388 From akozlov at openjdk.org Thu Apr 20 16:06:19 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Thu, 20 Apr 2023 16:06:19 GMT Subject: [crac] RFR: Improve modules handling in initial FD bookkeeping Message-ID: <9TCVZKFTb3Qd4mf6KjtGBqxt8vEUUsxrFClYpOW6ViQ=.716ac9bb-f5e6-4950-a4c8-bc20c9b3ec7a@github.com> Replace paths comparision with `os::same_file` which compares paths first, then compares st_{ino,dev}. This makes the check a bit more robust and fixes CRaC example-lambda [1] Also, an annoying empty line is fixed in the warning, now that looks like: anton at mercury:~/proj/crac$ ./jdk/bin/java -XX:CRaCCheckpointTo=./cr -version [0.001s][warning][os] CRaC closing file descriptor 31: /dev/ptmx openjdk version "17-internal" 2021-09-14 OpenJDK Runtime Environment (build 17-internal+0-adhoc..crac) OpenJDK 64-Bit Server VM (build 17-internal+0-adhoc..crac, mixed mode) [1] https://github.com/CRaC/example-lambda ------------- Commit messages: - Improve modules handling in initial FD bookkeeping Changes: https://git.openjdk.org/crac/pull/59/files Webrev: https://webrevs.openjdk.org/?repo=crac&pr=59&range=00 Stats: 16 lines in 1 file changed: 8 ins; 6 del; 2 mod Patch: https://git.openjdk.org/crac/pull/59.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/59/head:pull/59 PR: https://git.openjdk.org/crac/pull/59 From duke at openjdk.org Fri Apr 21 08:18:14 2023 From: duke at openjdk.org (Radim Vansa) Date: Fri, 21 Apr 2023 08:18:14 GMT Subject: [crac] RFR: Improve modules handling in initial FD bookkeeping In-Reply-To: <9TCVZKFTb3Qd4mf6KjtGBqxt8vEUUsxrFClYpOW6ViQ=.716ac9bb-f5e6-4950-a4c8-bc20c9b3ec7a@github.com> References: <9TCVZKFTb3Qd4mf6KjtGBqxt8vEUUsxrFClYpOW6ViQ=.716ac9bb-f5e6-4950-a4c8-bc20c9b3ec7a@github.com> Message-ID: On Thu, 20 Apr 2023 15:59:42 GMT, Anton Kozlov wrote: > Replace paths comparision with `os::same_file` which compares paths first, then compares st_{ino,dev}. This makes the check a bit more robust and fixes CRaC example-lambda [1] > > Also, an annoying empty line is fixed in the warning, now that looks like: > > anton at mercury:~/proj/crac$ ./jdk/bin/java -XX:CRaCCheckpointTo=./cr -version > [0.001s][warning][os] CRaC closing file descriptor 31: /dev/ptmx > openjdk version "17-internal" 2021-09-14 > OpenJDK Runtime Environment (build 17-internal+0-adhoc..crac) > OpenJDK 64-Bit Server VM (build 17-internal+0-adhoc..crac, mixed mode) > > > [1] https://github.com/CRaC/example-lambda Thanks, the change looks fine to me. Any reason to move the modules check under the ignored files check? ------------- PR Comment: https://git.openjdk.org/crac/pull/59#issuecomment-1517454368 From duke at openjdk.org Fri Apr 21 09:04:19 2023 From: duke at openjdk.org (Radim Vansa) Date: Fri, 21 Apr 2023 09:04:19 GMT Subject: [crac] RFR: Improved open file descriptors tracking [v8] In-Reply-To: <6OBTI6ISwHn3dLGQ9_ZxK8IrVHikJhNTXrmqcpFL29k=.1500642d-845f-4cec-863c-8fa3a5c1807a@github.com> References: <6OBTI6ISwHn3dLGQ9_ZxK8IrVHikJhNTXrmqcpFL29k=.1500642d-845f-4cec-863c-8fa3a5c1807a@github.com> Message-ID: On Tue, 18 Apr 2023 17:25:09 GMT, Anton Kozlov wrote: >> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: >> >> On exception, invoke afterRestore on throwing Context, too. > > src/java.base/share/classes/jdk/crac/impl/AbstractContextImpl.java line 31: > >> 29: import java.util.concurrent.locks.ReentrantLock; >> 30: >> 31: public abstract class AbstractContextImpl extends Context { > > `P` is not used anymore. > > The semantic of this class is being changed significantly, as well as implementation. It would be very nice to extract these modifications into a separate PR with a more clear desription of the reason of the change, tests, etc. Done in https://github.com/openjdk/crac/pull/60 ------------- PR Review Comment: https://git.openjdk.org/crac/pull/43#discussion_r1173527025 From duke at openjdk.org Fri Apr 21 09:21:27 2023 From: duke at openjdk.org (Radim Vansa) Date: Fri, 21 Apr 2023 09:21:27 GMT Subject: [crac] RFR: Fix ordering of invocation on Resources Message-ID: * When Context.beforeCheckpoint throws, invoke Context.afterRestore anyway (otherwise some resources stay in suspended state). * Handle Resource.beforeCheckpoint triggering a registration of another resource ** Do not cause deadlock when registering from another thread ** Global resource can register JDKResource ** JDKResource can register resource with higher priority ** Other registrations are prohibited ------------- Commit messages: - Fix ordering of invocation on Resources Changes: https://git.openjdk.org/crac/pull/60/files Webrev: https://webrevs.openjdk.org/?repo=crac&pr=60&range=00 Stats: 584 lines in 8 files changed: 376 ins; 165 del; 43 mod Patch: https://git.openjdk.org/crac/pull/60.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/60/head:pull/60 PR: https://git.openjdk.org/crac/pull/60 From duke at openjdk.org Fri Apr 21 09:54:15 2023 From: duke at openjdk.org (Radim Vansa) Date: Fri, 21 Apr 2023 09:54:15 GMT Subject: [crac] RFR: Improve modules handling in initial FD bookkeeping In-Reply-To: <9TCVZKFTb3Qd4mf6KjtGBqxt8vEUUsxrFClYpOW6ViQ=.716ac9bb-f5e6-4950-a4c8-bc20c9b3ec7a@github.com> References: <9TCVZKFTb3Qd4mf6KjtGBqxt8vEUUsxrFClYpOW6ViQ=.716ac9bb-f5e6-4950-a4c8-bc20c9b3ec7a@github.com> Message-ID: On Thu, 20 Apr 2023 15:59:42 GMT, Anton Kozlov wrote: > Replace paths comparision with `os::same_file` which compares paths first, then compares st_{ino,dev}. This makes the check a bit more robust and fixes CRaC example-lambda [1] > > Also, an annoying empty line is fixed in the warning, now that looks like: > > anton at mercury:~/proj/crac$ ./jdk/bin/java -XX:CRaCCheckpointTo=./cr -version > [0.001s][warning][os] CRaC closing file descriptor 31: /dev/ptmx > openjdk version "17-internal" 2021-09-14 > OpenJDK Runtime Environment (build 17-internal+0-adhoc..crac) > OpenJDK 64-Bit Server VM (build 17-internal+0-adhoc..crac, mixed mode) > > > [1] https://github.com/CRaC/example-lambda Marked as reviewed by rvansa at github.com (no known OpenJDK username). ------------- PR Review: https://git.openjdk.org/crac/pull/59#pullrequestreview-1395467512 From akozlov at openjdk.org Fri Apr 21 09:54:15 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Fri, 21 Apr 2023 09:54:15 GMT Subject: [crac] RFR: Improve modules handling in initial FD bookkeeping In-Reply-To: <9TCVZKFTb3Qd4mf6KjtGBqxt8vEUUsxrFClYpOW6ViQ=.716ac9bb-f5e6-4950-a4c8-bc20c9b3ec7a@github.com> References: <9TCVZKFTb3Qd4mf6KjtGBqxt8vEUUsxrFClYpOW6ViQ=.716ac9bb-f5e6-4950-a4c8-bc20c9b3ec7a@github.com> Message-ID: On Thu, 20 Apr 2023 15:59:42 GMT, Anton Kozlov wrote: > Replace paths comparision with `os::same_file` which compares paths first, then compares st_{ino,dev}. This makes the check a bit more robust and fixes CRaC example-lambda [1] > > Also, an annoying empty line is fixed in the warning, now that looks like: > > anton at mercury:~/proj/crac$ ./jdk/bin/java -XX:CRaCCheckpointTo=./cr -version > [0.001s][warning][os] CRaC closing file descriptor 31: /dev/ptmx > openjdk version "17-internal" 2021-09-14 > OpenJDK Runtime Environment (build 17-internal+0-adhoc..crac) > OpenJDK 64-Bit Server VM (build 17-internal+0-adhoc..crac, mixed mode) > > > [1] https://github.com/CRaC/example-lambda Thank you for review. Could you please click "Approve" in Review? The modules check became a bit more expensive, involves stat syscall. So we have a chance to match file path/fd and avoid modules test, that anyway can succeed only once. ------------- PR Comment: https://git.openjdk.org/crac/pull/59#issuecomment-1517573020 From akozlov at openjdk.org Fri Apr 21 10:08:11 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Fri, 21 Apr 2023 10:08:11 GMT Subject: [crac] RFR: Backout new API to sync with Reference Handler [v2] In-Reply-To: References: Message-ID: On Thu, 20 Apr 2023 12:27:21 GMT, Anton Kozlov wrote: >> This reverts commit 9cf1995693eead85d3807fb4c83ab38c14e27042 and makes #22 obsolete. >> >> The API introduced in 9cf1995693eead85d3807fb4c83ab38c14e27042 (waitForWaiters) and changed in #22 waits for the state when all discovered references are processed. So WaitForWaiters is used to implement predictable Reference Handling, ensuring that clean-up actions have fired for an object after it becomes unreachable. >> >> I think that API was a mistake and should be reverted. >> >> In general, the problem of predictable Reference Handling is independent of CRaC. So I thought about extracting that out of CRaC and found a few issues with the approach. A user needs to know what RefQueue gets References after an object becomes unreachable, to call waitForWaiters on that queue. The queue is not necessarily evident, so a deep understanding of refs and queues in an application is required to select the proper queue to wait on, and to build the right order of them to wait on. Also, it's required somehow to know the number of threads servicing a queue. And there are situations when waitForWaiters may report that all refs are processed, but some of them are not -- consider a thread that is polling a queue and gets refs to be processed but then buffers them in another queue for later, in this example waitForWaiters does not provide the guarantee that corresponding clean-up actions were performed. >> >> The common and more straightforward way to have predictable clean-up is to call an explicit method like close()/release()/cleanup() that performs object-specific clean-up actions predictably. > > Anton Kozlov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge remote-tracking branch 'jdk/crac/crac' into revert-new-ref-handler-api > - Backout new API to sync with Reference Handler > > This reverts commit 9cf1995693eead85d3807fb4c83ab38c14e27042. jdk/crac/JarFileFactoryCacheTest/JarFileFactoryCacheTest.java failed, investigating. ----------stdout:(7/762)---------- JVM: FD fd=0 type=fifo: details1="pipe:[35938]" OK: inherited from process env JVM: FD fd=1 type=fifo: details1="pipe:[35939]" OK: inherited from process env JVM: FD fd=2 type=fifo: details1="pipe:[35940]" OK: inherited from process env JVM: FD fd=3 type=regular: details1="/home/runner/jdk-linux-x64-debug/jdk-17-internal+0_linux-x64_bin-debug/jdk-17/fastdebug/lib/modules" OK: inherited from process env JVM: FD fd=4 type=character: details1="/dev/random" OK: always available, random or urandom JVM: FD fd=5 type=character: details1="/dev/urandom" OK: always available, random or urandom JVM: FD fd=6 type=regular: details1="/home/runner/work/crac/crac/build/run-test-prebuilt/test-support/jtreg_test_jdk_jdk_crac/scratch/test.jar" BAD: opened by application ----------stderr:(11/718)---------- Exception in thread "main" jdk.crac.CheckpointException at java.base/jdk.crac.Core.checkpointRestore1(Core.java:141) at java.base/jdk.crac.Core.checkpointRestore(Core.java:246) at java.base/jdk.crac.Core.checkpointRestore(Core.java:231) at JarFileFactoryCacheTest.exec(JarFileFactoryCacheTest.java:75) at jdk.test.lib.crac.CracTest.run(CracTest.java:157) at jdk.test.lib.crac.CracTest.main(CracTest.java:89) Suppressed: jdk.crac.impl.CheckpointOpenFileException: /home/runner/work/crac/crac/build/run-test-prebuilt/test-support/jtreg_test_jdk_jdk_crac/scratch/test.jar at java.base/jdk.crac.Core.translateJVMExceptions(Core.java:87) at java.base/jdk.crac.Core.checkpointRestore1(Core.java:145) ... 5 more ------------- PR Comment: https://git.openjdk.org/crac/pull/34#issuecomment-1517594606 From akozlov at openjdk.org Fri Apr 21 10:23:14 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Fri, 21 Apr 2023 10:23:14 GMT Subject: [crac] Integrated: Improve modules handling in initial FD bookkeeping In-Reply-To: <9TCVZKFTb3Qd4mf6KjtGBqxt8vEUUsxrFClYpOW6ViQ=.716ac9bb-f5e6-4950-a4c8-bc20c9b3ec7a@github.com> References: <9TCVZKFTb3Qd4mf6KjtGBqxt8vEUUsxrFClYpOW6ViQ=.716ac9bb-f5e6-4950-a4c8-bc20c9b3ec7a@github.com> Message-ID: <3VCMwwTUrcEnUpzXyYO_nkOwFIq4U8rBNgBlr-U9tGw=.f707a185-5384-4193-827a-8c334e3f7d73@github.com> On Thu, 20 Apr 2023 15:59:42 GMT, Anton Kozlov wrote: > Replace paths comparision with `os::same_file` which compares paths first, then compares st_{ino,dev}. This makes the check a bit more robust and fixes CRaC example-lambda [1] > > Also, an annoying empty line is fixed in the warning, now that looks like: > > anton at mercury:~/proj/crac$ ./jdk/bin/java -XX:CRaCCheckpointTo=./cr -version > [0.001s][warning][os] CRaC closing file descriptor 31: /dev/ptmx > openjdk version "17-internal" 2021-09-14 > OpenJDK Runtime Environment (build 17-internal+0-adhoc..crac) > OpenJDK 64-Bit Server VM (build 17-internal+0-adhoc..crac, mixed mode) > > > [1] https://github.com/CRaC/example-lambda This pull request has now been integrated. Changeset: 95394e84 Author: Anton Kozlov URL: https://git.openjdk.org/crac/commit/95394e84683f1a816c0283f8c834072324516fba Stats: 16 lines in 1 file changed: 8 ins; 6 del; 2 mod Improve modules handling in initial FD bookkeeping ------------- PR: https://git.openjdk.org/crac/pull/59 From akozlov at openjdk.org Fri Apr 21 14:57:13 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Fri, 21 Apr 2023 14:57:13 GMT Subject: [crac] RFR: Correct System.nanotime() value after restore [v4] In-Reply-To: <06iEWAtgXnr3kPcIq_R7hWA69CsgGD-L8vBmooXYrp8=.7379a114-fe32-44f0-b6b1-6a8362380a3e@github.com> References: <06iEWAtgXnr3kPcIq_R7hWA69CsgGD-L8vBmooXYrp8=.7379a114-fe32-44f0-b6b1-6a8362380a3e@github.com> Message-ID: On Thu, 20 Apr 2023 08:09:26 GMT, Radim Vansa wrote: >> There are various places both inside JDK and in libraries that rely on monotonicity of `System.nanotime()`. When the process is restored on a different machine the value will likely differ as the implementation provides time since machine boot. This PR records wall clock time before checkpoint and after restore and tries to adjust the value provided by nanotime() to reasonably correct value. > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Set nanotime only if bootid changes src/hotspot/os/linux/os_linux.cpp line 6605: > 6603: > 6604: bool os::read_bootid(char *dest, size_t size) { > 6605: int fd = ::open("/proc/sys/kernel/random/boot_id", O_RDONLY); Bood_id looks interesting! But AFAICS it remains the same for each new container, and the boot time may have been adjusted for that container. anton at mercury:~$ docker run -it ubuntu:20.04 cat /proc/sys/kernel/random/boot_id 9b913973-3082-471a-add5-6b802a04a9b2 anton at mercury:~$ docker run -it ubuntu:20.04 cat /proc/sys/kernel/random/boot_id 9b913973-3082-471a-add5-6b802a04a9b2 anton at mercury:~$ cat /proc/sys/kernel/random/boot_id 9b913973-3082-471a-add5-6b802a04a9b2 Should not we mix something extra to boot_id, for example, a hostname (which is different for each container)? src/hotspot/share/runtime/os.cpp line 2045: > 2043: char buf[UUID_LENGTH + 1]; > 2044: // We will change the nanotime offset only if this is not the same boot > 2045: // to prevent reducing the accuracy of System.nanoTime() unnecessarily But it would be nice to ensure monotonicity even if it looks like the same boot. Like if (!same_boot) { ... } else if ((diff = (checkpoint_nanos - javaTimeNanos()) > 0) { javaTimeNanos_offset = diff + 1; } src/hotspot/share/runtime/os.cpp line 2055: > 2053: // Make the javaTimeNanos() on the next line return true monotonic time > 2054: javaTimeNanos_offset = 0; > 2055: javaTimeNanos_offset = checkpoint_nanos - javaTimeNanos() + diff_millis * 1000000L; First assignment does not make effect. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/53#discussion_r1173871537 PR Review Comment: https://git.openjdk.org/crac/pull/53#discussion_r1173867249 PR Review Comment: https://git.openjdk.org/crac/pull/53#discussion_r1173862993 From duke at openjdk.org Fri Apr 21 14:57:13 2023 From: duke at openjdk.org (Radim Vansa) Date: Fri, 21 Apr 2023 14:57:13 GMT Subject: [crac] RFR: Correct System.nanotime() value after restore [v4] In-Reply-To: References: <06iEWAtgXnr3kPcIq_R7hWA69CsgGD-L8vBmooXYrp8=.7379a114-fe32-44f0-b6b1-6a8362380a3e@github.com> Message-ID: <7d-VAhvBvVnagjh9Z8mUMwIibGGkQ8e7UJZPaRp2iXs=.9ee2abd8-9db1-4886-90ad-5fed81172979@github.com> On Fri, 21 Apr 2023 14:44:39 GMT, Anton Kozlov wrote: >> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: >> >> Set nanotime only if bootid changes > > src/hotspot/share/runtime/os.cpp line 2055: > >> 2053: // Make the javaTimeNanos() on the next line return true monotonic time >> 2054: javaTimeNanos_offset = 0; >> 2055: javaTimeNanos_offset = checkpoint_nanos - javaTimeNanos() + diff_millis * 1000000L; > > First assignment does not make effect. It does; `javaTimeNanos()` uses `javaTimeNanos_offset` underhood. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/53#discussion_r1173872701 From duke at openjdk.org Fri Apr 21 15:03:16 2023 From: duke at openjdk.org (Radim Vansa) Date: Fri, 21 Apr 2023 15:03:16 GMT Subject: [crac] RFR: Correct System.nanotime() value after restore [v4] In-Reply-To: References: <06iEWAtgXnr3kPcIq_R7hWA69CsgGD-L8vBmooXYrp8=.7379a114-fe32-44f0-b6b1-6a8362380a3e@github.com> Message-ID: <0yfFAUcHvfq9iRwpgzBvlT2Gy0KLg4TGreYAWOb1tOU=.967d2e26-a8d1-4b0e-b252-fcc10b3b4902@github.com> On Fri, 21 Apr 2023 14:52:49 GMT, Anton Kozlov wrote: >> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: >> >> Set nanotime only if bootid changes > > src/hotspot/os/linux/os_linux.cpp line 6605: > >> 6603: >> 6604: bool os::read_bootid(char *dest, size_t size) { >> 6605: int fd = ::open("/proc/sys/kernel/random/boot_id", O_RDONLY); > > Bood_id looks interesting! But AFAICS it remains the same for each new container, and the boot time may have been adjusted for that container. > > anton at mercury:~$ docker run -it ubuntu:20.04 cat /proc/sys/kernel/random/boot_id > 9b913973-3082-471a-add5-6b802a04a9b2 > anton at mercury:~$ docker run -it ubuntu:20.04 cat /proc/sys/kernel/random/boot_id > 9b913973-3082-471a-add5-6b802a04a9b2 > anton at mercury:~$ cat /proc/sys/kernel/random/boot_id > 9b913973-3082-471a-add5-6b802a04a9b2 > > > Should not we mix something extra to boot_id, for example, a hostname (which is different for each container)? The aim of this work is to prevent 'random' readings. New containers will keep the `boot_id` but their monotonic time *might* be adjusted, true. I agree with the comment above that in that case we should still ensure monotonicity, but since the readings are not 'random' I think that we can assume that the user did adjust monotonic time intentionally, and the application should observe that. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/53#discussion_r1173879844 From akozlov at openjdk.org Fri Apr 21 15:19:19 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Fri, 21 Apr 2023 15:19:19 GMT Subject: [crac] RFR: Correct System.nanotime() value after restore [v4] In-Reply-To: <0yfFAUcHvfq9iRwpgzBvlT2Gy0KLg4TGreYAWOb1tOU=.967d2e26-a8d1-4b0e-b252-fcc10b3b4902@github.com> References: <06iEWAtgXnr3kPcIq_R7hWA69CsgGD-L8vBmooXYrp8=.7379a114-fe32-44f0-b6b1-6a8362380a3e@github.com> <0yfFAUcHvfq9iRwpgzBvlT2Gy0KLg4TGreYAWOb1tOU=.967d2e26-a8d1-4b0e-b252-fcc10b3b4902@github.com> Message-ID: <_czDRlcy-S8RWuE6Zh1pZXnKTs-zgvzE2TQP5vmk35s=.65782c0e-bd80-4b14-b6e5-40f84fb4c5c4@github.com> On Fri, 21 Apr 2023 15:00:47 GMT, Radim Vansa wrote: >> src/hotspot/os/linux/os_linux.cpp line 6605: >> >>> 6603: >>> 6604: bool os::read_bootid(char *dest, size_t size) { >>> 6605: int fd = ::open("/proc/sys/kernel/random/boot_id", O_RDONLY); >> >> Bood_id looks interesting! But AFAICS it remains the same for each new container, and the boot time may have been adjusted for that container. >> >> anton at mercury:~$ docker run -it ubuntu:20.04 cat /proc/sys/kernel/random/boot_id >> 9b913973-3082-471a-add5-6b802a04a9b2 >> anton at mercury:~$ docker run -it ubuntu:20.04 cat /proc/sys/kernel/random/boot_id >> 9b913973-3082-471a-add5-6b802a04a9b2 >> anton at mercury:~$ cat /proc/sys/kernel/random/boot_id >> 9b913973-3082-471a-add5-6b802a04a9b2 >> >> >> Should not we mix something extra to boot_id, for example, a hostname (which is different for each container)? > > The aim of this work is to prevent 'random' readings. New containers will keep the `boot_id` but their monotonic time *might* be adjusted, true. I agree with the comment above that in that case we should still ensure monotonicity, but since the readings are not 'random' I think that we can assume that the user did adjust monotonic time intentionally, and the application should observe that. OK, agree. This may be a comment in the code, explaining that we leave some control of the time in the user's hands. >> src/hotspot/share/runtime/os.cpp line 2055: >> >>> 2053: // Make the javaTimeNanos() on the next line return true monotonic time >>> 2054: javaTimeNanos_offset = 0; >>> 2055: javaTimeNanos_offset = checkpoint_nanos - javaTimeNanos() + diff_millis * 1000000L; >> >> First assignment does not make effect. > > It does; `javaTimeNanos()` uses `javaTimeNanos_offset` underhood. Indeed! Could you extend the comment, highlightint that javaTimeNanos uses javaTimeNanos_offset ------------- PR Review Comment: https://git.openjdk.org/crac/pull/53#discussion_r1173893881 PR Review Comment: https://git.openjdk.org/crac/pull/53#discussion_r1173896654 From akozlov at openjdk.org Fri Apr 21 17:17:14 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Fri, 21 Apr 2023 17:17:14 GMT Subject: [crac] RFR: Fix ordering of invocation on Resources In-Reply-To: References: Message-ID: <27W9LomvSWtxS6iElpOqQ9acC9WGqV5cEDNqfwHMS70=.d488d6ba-c91d-4b46-a218-a9d4aa4b1090@github.com> On Fri, 21 Apr 2023 09:01:07 GMT, Radim Vansa wrote: > * When Context.beforeCheckpoint throws, invoke Context.afterRestore anyway (otherwise some resources stay in suspended state). > * Handle Resource.beforeCheckpoint triggering a registration of another resource ** Do not cause deadlock when registering from another thread ** Global resource can register JDKResource > ** JDKResource can register resource with higher priority ** Other registrations are prohibited src/java.base/share/classes/jdk/crac/LoggerContainer.java line 9: > 7: * Therefore, we isolate the logger into a subclass and initialize lazily. > 8: */ > 9: public class LoggerContainer { Technically, this is not a part of Coordintation API, but an implementation of the logging for JDK needs. So still does not look like it should be `public`. Having this public in `jdk.internal.crac`, etc is totally fine. src/java.base/share/classes/jdk/crac/Resource.java line 41: > 39: * Invoked by a {@code Context} as a notification about checkpoint. > 40: * Order of checkpoint notification is the reverse order of > 41: * {@link Context#register(Resource) registration}. This is correct for the Global Context, but the order can be different in other Contexts. I don't think Resource javadoc should describe the ordering (otherwise, a Context that uses a different ordering should refuse registration, but the Context does not have any mean to know the resource assumes some particular ordering). src/java.base/share/classes/jdk/crac/impl/AbstractContextImpl.java line 54: > 52: locked = true; > 53: // This is important for the case of recursive registration > 54: throwIfCheckpointInProgress(priority); What is expected handling of this exception? And in the current form the exception is not checked, so in most cases that exception won't be expected. Having that in most cases resources are registered during class or object initialization, those entities will be caught in partially constructed states, likely leaving the whole system stuck. Would it not better to allow registration, but throw CheckpointException at the end of Context.beforeCheckpoint? That is pretty legit exception that means the checkpoint can be attempted again, and another attempt can be pretty well successful, if all Resources do not throw and no new Resources are registered. src/java.base/share/classes/jdk/crac/impl/AbstractContextImpl.java line 117: > 115: restoreQ.add(r); > 116: } catch (CheckpointException e) { > 117: enqueueIfContext(r); Ohh, I see the problem > When Context.beforeCheckpoint throws, invoke Context.afterRestore anyway (otherwise some resources stay in suspended state). So this is proposed to be handled in the parent context? Have you considered fixing that in the child context, run afterRestore for successfully checkpointed resources, before throwing CheckpointException to the parent context? ------------- PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1173903740 PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1173905238 PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1174003392 PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1173995614 From duke at openjdk.org Mon Apr 24 06:52:16 2023 From: duke at openjdk.org (Radim Vansa) Date: Mon, 24 Apr 2023 06:52:16 GMT Subject: [crac] RFR: Fix ordering of invocation on Resources In-Reply-To: <27W9LomvSWtxS6iElpOqQ9acC9WGqV5cEDNqfwHMS70=.d488d6ba-c91d-4b46-a218-a9d4aa4b1090@github.com> References: <27W9LomvSWtxS6iElpOqQ9acC9WGqV5cEDNqfwHMS70=.d488d6ba-c91d-4b46-a218-a9d4aa4b1090@github.com> Message-ID: On Fri, 21 Apr 2023 17:04:17 GMT, Anton Kozlov wrote: >> * When Context.beforeCheckpoint throws, invoke Context.afterRestore anyway (otherwise some resources stay in suspended state). >> * Handle Resource.beforeCheckpoint triggering a registration of another resource ** Do not cause deadlock when registering from another thread ** Global resource can register JDKResource >> ** JDKResource can register resource with higher priority ** Other registrations are prohibited > > src/java.base/share/classes/jdk/crac/impl/AbstractContextImpl.java line 117: > >> 115: restoreQ.add(r); >> 116: } catch (CheckpointException e) { >> 117: enqueueIfContext(r); > > Ohh, I see the problem > >> When Context.beforeCheckpoint throws, invoke Context.afterRestore anyway (otherwise some resources stay in suspended state). > > So this is proposed to be handled in the parent context? Have you considered fixing that in the child context, run afterRestore for successfully checkpointed resources, before throwing CheckpointException to the parent context? Right now we run the checkpoint for all resources even if some fail. Are you fine with erroring out on the first exception? ------------- PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1174852349 From duke at openjdk.org Mon Apr 24 07:02:19 2023 From: duke at openjdk.org (Radim Vansa) Date: Mon, 24 Apr 2023 07:02:19 GMT Subject: [crac] RFR: Fix ordering of invocation on Resources In-Reply-To: <27W9LomvSWtxS6iElpOqQ9acC9WGqV5cEDNqfwHMS70=.d488d6ba-c91d-4b46-a218-a9d4aa4b1090@github.com> References: <27W9LomvSWtxS6iElpOqQ9acC9WGqV5cEDNqfwHMS70=.d488d6ba-c91d-4b46-a218-a9d4aa4b1090@github.com> Message-ID: On Fri, 21 Apr 2023 17:14:13 GMT, Anton Kozlov wrote: >> * When Context.beforeCheckpoint throws, invoke Context.afterRestore anyway (otherwise some resources stay in suspended state). >> * Handle Resource.beforeCheckpoint triggering a registration of another resource ** Do not cause deadlock when registering from another thread ** Global resource can register JDKResource >> ** JDKResource can register resource with higher priority ** Other registrations are prohibited > > src/java.base/share/classes/jdk/crac/impl/AbstractContextImpl.java line 54: > >> 52: locked = true; >> 53: // This is important for the case of recursive registration >> 54: throwIfCheckpointInProgress(priority); > > What is expected handling of this exception? And in the current form the exception is not checked, so in most cases that exception won't be expected. Having that in most cases resources are registered during class or object initialization, those entities will be caught in partially constructed states, likely leaving the whole system stuck. > > Would it not better to allow registration, but throw CheckpointException at the end of Context.beforeCheckpoint? That is pretty legit exception that means the checkpoint can be attempted again, and another attempt can be pretty well successful, if all Resources do not throw and no new Resources are registered. So, making `register` throw a checked exception wouldn't be a good solution; most of the time there wouldn't be a better handling than to log && rethrow. What you propose - having a form of exception stack in the context and checking it anything was added after `beforeCheckpoint` could work, though this is definitely not a pattern that would be natural in Java - it smells of JNI developer :) I think that a situation when the ISE is thrown is a programming error, rather than reacting on the state of 'outer world' (as in case of e.g. I/O...) and as such this can stay as unchecked exception, possibly leading to inconsistent state of the program. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1174861133 From duke at openjdk.org Mon Apr 24 07:06:15 2023 From: duke at openjdk.org (Radim Vansa) Date: Mon, 24 Apr 2023 07:06:15 GMT Subject: [crac] RFR: Fix ordering of invocation on Resources In-Reply-To: <27W9LomvSWtxS6iElpOqQ9acC9WGqV5cEDNqfwHMS70=.d488d6ba-c91d-4b46-a218-a9d4aa4b1090@github.com> References: <27W9LomvSWtxS6iElpOqQ9acC9WGqV5cEDNqfwHMS70=.d488d6ba-c91d-4b46-a218-a9d4aa4b1090@github.com> Message-ID: On Fri, 21 Apr 2023 15:24:41 GMT, Anton Kozlov wrote: >> * When Context.beforeCheckpoint throws, invoke Context.afterRestore anyway (otherwise some resources stay in suspended state). >> * Handle Resource.beforeCheckpoint triggering a registration of another resource ** Do not cause deadlock when registering from another thread ** Global resource can register JDKResource >> ** JDKResource can register resource with higher priority ** Other registrations are prohibited > > src/java.base/share/classes/jdk/crac/Resource.java line 41: > >> 39: * Invoked by a {@code Context} as a notification about checkpoint. >> 40: * Order of checkpoint notification is the reverse order of >> 41: * {@link Context#register(Resource) registration}. > > This is correct for the Global Context, but the order can be different in other Contexts. > > I don't think Resource javadoc should describe the ordering (otherwise, a Context that uses a different ordering should refuse registration, but the Context does not have any mean to know the resource assumes some particular ordering). Alright, I think that the order for the global context is rather hidden in the `package-info.java`. So, I'll add a note here that the order is defined by the context, and move this info to `Core.getGlobalContext` javadoc. Can we say that the fact that afterRestore is called in an inverse order to beforeCheckpoint is an universal rule, or would you keep that specific to context impl? ------------- PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1174864397 From akozlov at openjdk.org Mon Apr 24 08:28:16 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Mon, 24 Apr 2023 08:28:16 GMT Subject: [crac] RFR: Fix ordering of invocation on Resources In-Reply-To: References: <27W9LomvSWtxS6iElpOqQ9acC9WGqV5cEDNqfwHMS70=.d488d6ba-c91d-4b46-a218-a9d4aa4b1090@github.com> Message-ID: On Mon, 24 Apr 2023 07:03:31 GMT, Radim Vansa wrote: >> src/java.base/share/classes/jdk/crac/Resource.java line 41: >> >>> 39: * Invoked by a {@code Context} as a notification about checkpoint. >>> 40: * Order of checkpoint notification is the reverse order of >>> 41: * {@link Context#register(Resource) registration}. >> >> This is correct for the Global Context, but the order can be different in other Contexts. >> >> I don't think Resource javadoc should describe the ordering (otherwise, a Context that uses a different ordering should refuse registration, but the Context does not have any mean to know the resource assumes some particular ordering). > > Alright, I think that the order for the global context is rather hidden in the `package-info.java`. So, I'll add a note here that the order is defined by the context, and move this info to `Core.getGlobalContext` javadoc. > > Can we say that the fact that afterRestore is called in an inverse order to beforeCheckpoint is an universal rule, or would you keep that specific to context impl? I think that should be leaved to Context, as ordering does not look quite Resource concern. Contextes were made to hande ordering. >> src/java.base/share/classes/jdk/crac/impl/AbstractContextImpl.java line 54: >> >>> 52: locked = true; >>> 53: // This is important for the case of recursive registration >>> 54: throwIfCheckpointInProgress(priority); >> >> What is expected handling of this exception? And in the current form the exception is not checked, so in most cases that exception won't be expected. Having that in most cases resources are registered during class or object initialization, those entities will be caught in partially constructed states, likely leaving the whole system stuck. >> >> Would it not better to allow registration, but throw CheckpointException at the end of Context.beforeCheckpoint? That is pretty legit exception that means the checkpoint can be attempted again, and another attempt can be pretty well successful, if all Resources do not throw and no new Resources are registered. > > So, making `register` throw a checked exception wouldn't be a good solution; most of the time there wouldn't be a better handling than to log && rethrow. > > What you propose - having a form of exception stack in the context and checking it anything was added after `beforeCheckpoint` could work, though this is definitely not a pattern that would be natural in Java - it smells of JNI developer :) > > I think that a situation when the ISE is thrown is a programming error, rather than reacting on the state of 'outer world' (as in case of e.g. I/O...) and as such this can stay as unchecked exception, possibly leading to inconsistent state of the program. I admit that Resource is usually registered implicitly, and the current state (silently ignore newly registered Resources when checkpoint is in progress) is not adequate and may leads to some state leaking to the image, while that is not supposed to (if Resource would be registered some time before). With unchecked ISE, the user will have to somehow ensure that registration likley being implicit, changing the code to break the abstraction above that Resource/registration. With the check for new Resources, we'll detect exactly the situtaion user will likely like to know (if everything in the state ready to be stored to the image), and if not, a recovery will be trying again. Every Resource may throw CheckpointException for their own reasons, and CheckpointException("timeout for waiting some critical clean up, try again") is expected to be something very possible and is expected. >> src/java.base/share/classes/jdk/crac/impl/AbstractContextImpl.java line 117: >> >>> 115: restoreQ.add(r); >>> 116: } catch (CheckpointException e) { >>> 117: enqueueIfContext(r); >> >> Ohh, I see the problem >> >>> When Context.beforeCheckpoint throws, invoke Context.afterRestore anyway (otherwise some resources stay in suspended state). >> >> So this is proposed to be handled in the parent context? Have you considered fixing that in the child context, run afterRestore for successfully checkpointed resources, before throwing CheckpointException to the parent context? > > Right now we run the checkpoint for all resources even if some fail. Are you fine with erroring out on the first exception? Why is that necessary? If we are trying to avoid second afterRestore for sucessfully checkpointed Resources, that would mean that Context.afterRestore is called, that controverts the problem definition. Assuming afterRestore is not called for the Resource with failed beforeCheckpoint. Suppose we have R1, R2 in a child Context, and R1.beforeCheckpoint succeed, and R2.beforeCheckpoint fails. We can call R1.afterRestore, before throwing from Ctx.beforeCheckpoint. That will restore successfully checkpointed resources. But I checked the code and it seems that we do call afterRestore regardless of the result of beforeCheckpoint https://github.com/openjdk/crac/blob/95394e84683f1a816c0283f8c834072324516fba/src/java.base/share/classes/jdk/crac/impl/AbstractContextImpl.java#L54 Every Resource available is collected in List resources and then restore queue is exactly the reverse of that list, it does not matter if any of resources throwed CheckpointException. Right? ------------- PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1174890341 PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1174909443 PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1174946729 From duke at openjdk.org Mon Apr 24 08:49:31 2023 From: duke at openjdk.org (Radim Vansa) Date: Mon, 24 Apr 2023 08:49:31 GMT Subject: [crac] RFR: Fix ordering of invocation on Resources In-Reply-To: References: <27W9LomvSWtxS6iElpOqQ9acC9WGqV5cEDNqfwHMS70=.d488d6ba-c91d-4b46-a218-a9d4aa4b1090@github.com> Message-ID: On Mon, 24 Apr 2023 08:24:16 GMT, Anton Kozlov wrote: >> Right now we run the checkpoint for all resources even if some fail. Are you fine with erroring out on the first exception? > > Why is that necessary? If we are trying to avoid second afterRestore for sucessfully checkpointed Resources, that would mean that Context.afterRestore is called, that controverts the problem definition. > > Assuming afterRestore is not called for the Resource with failed beforeCheckpoint. Suppose we have R1, R2 in a child Context, and R1.beforeCheckpoint succeed, and R2.beforeCheckpoint fails. We can call R1.afterRestore, before throwing from Ctx.beforeCheckpoint. That will restore successfully checkpointed resources. > > But I checked the code and it seems that we do call afterRestore regardless of the result of beforeCheckpoint > > https://github.com/openjdk/crac/blob/95394e84683f1a816c0283f8c834072324516fba/src/java.base/share/classes/jdk/crac/impl/AbstractContextImpl.java#L54 > > Every Resource available is collected in List resources and then restore queue is exactly the reverse of that list, it does not matter if any of resources throwed CheckpointException. Right? On regular resources `afterRestore` is not called if the resource throws, it does not end up in the restoreQ. I am a bit worried about calling `R3.bC` that would live in a different context when `R1` is already restored. The components were called in certain order for a reason, and if R3 expect R1 to be checkpointed this could end up badly. You might argue that R2 is not in checkpointed state anyway and R3 might expect it to be, but at least here one component failed and there is a track of consequences. On the other hand R1 and R3 might be completely unrelevant to R2 and the fact that R2 failed shouldn't interfere. What is the point of continuing with the checkpoint when we now that it won't happen in the end, with all the suppressed exceptions in the first place? Just to collect all errors at once rather than one-by-one? Is it worth potential false positives? ------------- PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1174972609 From duke at openjdk.org Mon Apr 24 09:02:20 2023 From: duke at openjdk.org (Radim Vansa) Date: Mon, 24 Apr 2023 09:02:20 GMT Subject: [crac] RFR: Fix ordering of invocation on Resources In-Reply-To: References: <27W9LomvSWtxS6iElpOqQ9acC9WGqV5cEDNqfwHMS70=.d488d6ba-c91d-4b46-a218-a9d4aa4b1090@github.com> Message-ID: On Mon, 24 Apr 2023 07:49:56 GMT, Anton Kozlov wrote: >> So, making `register` throw a checked exception wouldn't be a good solution; most of the time there wouldn't be a better handling than to log && rethrow. >> >> What you propose - having a form of exception stack in the context and checking it anything was added after `beforeCheckpoint` could work, though this is definitely not a pattern that would be natural in Java - it smells of JNI developer :) >> >> I think that a situation when the ISE is thrown is a programming error, rather than reacting on the state of 'outer world' (as in case of e.g. I/O...) and as such this can stay as unchecked exception, possibly leading to inconsistent state of the program. > > I admit that Resource is usually registered implicitly, and the current state (silently ignore newly registered Resources when checkpoint is in progress) is not adequate and may leads to some state leaking to the image, while that is not supposed to (if Resource would be registered some time before). > > With unchecked ISE, the user will have to somehow ensure that registration likley being implicit, changing the code to break the abstraction above that Resource/registration. > > With the check for new Resources, we'll detect exactly the situtaion user will likely like to know (if everything in the state ready to be stored to the image), and if not, a recovery will be trying again. Every Resource may throw CheckpointException for their own reasons, and CheckpointException("timeout for waiting some critical clean up, try again") is expected to be something very possible and is expected. Generally speaking, your proposal to try performing the checkpoint until it succeeds is more permissive, as users can back-off from a bad/fragile design through retrying, rather than being forced to correct the dependencies between components. In the past it seemed to me that you were more on the "good design required" rather than "practical" side of things - please don't take this as any form of "ad hominem" argument, I would just like to understand why you opt for such solution here. In my view a timeout/IO error are a totally different category of error. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1174988162 From akozlov at openjdk.org Mon Apr 24 09:27:11 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Mon, 24 Apr 2023 09:27:11 GMT Subject: [crac] RFR: Fix ordering of invocation on Resources In-Reply-To: References: <27W9LomvSWtxS6iElpOqQ9acC9WGqV5cEDNqfwHMS70=.d488d6ba-c91d-4b46-a218-a9d4aa4b1090@github.com> Message-ID: On Mon, 24 Apr 2023 08:46:50 GMT, Radim Vansa wrote: > On regular resources afterRestore is not called if the resource throws, it does not end up in the restoreQ. I'm puzzled how this is possible in the implementation before the patch. The code says it should List resources = checkpointQ. ... ; for (Resource r : resources) { ... } Collections.reverse(resources); restoreQ = resources; We'll have to document the expected behavior, but that is another concern. > I am a bit worried about calling R3.bC that would live in a different context when R1 is already restored. This is a question of specification. If we set that, it will be adopted by rearranging Contextes and Resources, pretty manageable. In general, resources should not have a lot of assumptions about other resources. Otherwise, they should directly refer to each other, rathen than relying on the Context. That will be more straightforward and less error-prone. > What is the point of continuing with the checkpoint when we now that it won't happen in the end, with all the suppressed exceptions in the first place? Just to collect all errors at once rather than one-by-one? Is it worth potential false positives? Correct, to run all registered code, that may also have some side-effects (unpreferable, but possible). Having that checkpoint won't happen anyway, false positive does not look a problem. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1175016444 From akozlov at openjdk.org Mon Apr 24 09:48:16 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Mon, 24 Apr 2023 09:48:16 GMT Subject: [crac] RFR: Fix ordering of invocation on Resources In-Reply-To: References: <27W9LomvSWtxS6iElpOqQ9acC9WGqV5cEDNqfwHMS70=.d488d6ba-c91d-4b46-a218-a9d4aa4b1090@github.com> Message-ID: On Mon, 24 Apr 2023 08:59:54 GMT, Radim Vansa wrote: >> I admit that Resource is usually registered implicitly, and the current state (silently ignore newly registered Resources when checkpoint is in progress) is not adequate and may leads to some state leaking to the image, while that is not supposed to (if Resource would be registered some time before). >> >> With unchecked ISE, the user will have to somehow ensure that registration likley being implicit, changing the code to break the abstraction above that Resource/registration. >> >> With the check for new Resources, we'll detect exactly the situtaion user will likely like to know (if everything in the state ready to be stored to the image), and if not, a recovery will be trying again. Every Resource may throw CheckpointException for their own reasons, and CheckpointException("timeout for waiting some critical clean up, try again") is expected to be something very possible and is expected. > > Generally speaking, your proposal to try performing the checkpoint until it succeeds is more permissive, as users can back-off from a bad/fragile design through retrying, rather than being forced to correct the dependencies between components. In the past it seemed to me that you were more on the "good design required" rather than "practical" side of things - please don't take this as any form of "ad hominem" argument, I would just like to understand why you opt for such solution here. In my view a timeout/IO error are a totally different category of error. The CheckpointException is a CRaC-specific, deliberately thrown exception to communicate the problem with the state to the requester of the checkpoint. The exception may suggest rewriting the code, or another attempt. We don't care a lot about the preceise meaning, we just need to deliver the exception. Although lesser number of exceptions is better of course. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1175040797 From duke at openjdk.org Mon Apr 24 10:15:17 2023 From: duke at openjdk.org (Radim Vansa) Date: Mon, 24 Apr 2023 10:15:17 GMT Subject: [crac] RFR: Fix ordering of invocation on Resources In-Reply-To: References: <27W9LomvSWtxS6iElpOqQ9acC9WGqV5cEDNqfwHMS70=.d488d6ba-c91d-4b46-a218-a9d4aa4b1090@github.com> Message-ID: On Mon, 24 Apr 2023 09:24:49 GMT, Anton Kozlov wrote: >> On regular resources `afterRestore` is not called if the resource throws, it does not end up in the restoreQ. >> >> I am a bit worried about calling `R3.bC` that would live in a different context when `R1` is already restored. The components were called in certain order for a reason, and if R3 expect R1 to be checkpointed this could end up badly. >> You might argue that R2 is not in checkpointed state anyway and R3 might expect it to be, but at least here one component failed and there is a track of consequences. On the other hand R1 and R3 might be completely unrelevant to R2 and the fact that R2 failed shouldn't interfere. >> >> What is the point of continuing with the checkpoint when we now that it won't happen in the end, with all the suppressed exceptions in the first place? Just to collect all errors at once rather than one-by-one? Is it worth potential false positives? > >> On regular resources afterRestore is not called if the resource throws, it does not end up in the restoreQ. > > I'm puzzled how this is possible in the implementation before the patch. The code says it should > > List resources = checkpointQ. ... ; > for (Resource r : resources) { ... } > Collections.reverse(resources); > restoreQ = resources; > > > We'll have to document the expected behavior, but that is another concern. > >> I am a bit worried about calling R3.bC that would live in a different context when R1 is already restored. > > This is a question of specification. If we set that, it will be adopted by rearranging Contextes and Resources, pretty manageable. > > In general, resources should not have a lot of assumptions about other resources. Otherwise, they should directly refer to each other, rathen than relying on the Context. That will be more straightforward and less error-prone. > >> What is the point of continuing with the checkpoint when we now that it won't happen in the end, with all the suppressed exceptions in the first place? Just to collect all errors at once rather than one-by-one? Is it worth potential false positives? > > Correct, to run all registered code, that may also have some side-effects (unpreferable, but possible). Having that checkpoint won't happen anyway, false positive does not look a problem. True, checking again before the patch it was really calling it on resources even after failure - I had "fixed" that probably way before and remembered it that way. When a code is throwing it is aware of that, so it should either revert automatically or be considered completely broken. I would bring an analogy with a locking code - `lock()` method is executed outside of the try-finally block so if it fails it should keep the lock unlocked. The R1 and R3 issue - it is fixable, true, but it's the infra code that introduced the failure. User sees both R2 and R3 failures, decides to investigate R3 only to find that it was a red herring. > resources should not have a lot of assumptions about other resources JDK code is a direct contradiction to that. We shall not expect user code to be any different. > false positive does not look a problem Red herrings are UX problem. And it is a problem when the failure leaves the JVM in a state that does not allow inspection - not sure why but I had cases where neither `jstack` nor `kill -3` worked. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1175074513 From duke at openjdk.org Mon Apr 24 10:54:07 2023 From: duke at openjdk.org (Radim Vansa) Date: Mon, 24 Apr 2023 10:54:07 GMT Subject: [crac] RFR: Fix ordering of invocation on Resources In-Reply-To: References: <27W9LomvSWtxS6iElpOqQ9acC9WGqV5cEDNqfwHMS70=.d488d6ba-c91d-4b46-a218-a9d4aa4b1090@github.com> Message-ID: On Mon, 24 Apr 2023 09:45:39 GMT, Anton Kozlov wrote: >> Generally speaking, your proposal to try performing the checkpoint until it succeeds is more permissive, as users can back-off from a bad/fragile design through retrying, rather than being forced to correct the dependencies between components. In the past it seemed to me that you were more on the "good design required" rather than "practical" side of things - please don't take this as any form of "ad hominem" argument, I would just like to understand why you opt for such solution here. In my view a timeout/IO error are a totally different category of error. > > The CheckpointException is a CRaC-specific, deliberately thrown exception to communicate the problem with the state to the requester of the checkpoint. The exception may suggest rewriting the code, or another attempt. We don't care a lot about the preceise meaning, we just need to deliver the exception. Although lesser number of exceptions is better of course. I realized that you actually asked for a different behaviour than what I thought initially, something like the `ConcurrentModificationException` while iterating a collection. That is not that bad, and we can add debug option to track where this happened. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1175114397 From duke at openjdk.org Mon Apr 24 13:27:30 2023 From: duke at openjdk.org (Radim Vansa) Date: Mon, 24 Apr 2023 13:27:30 GMT Subject: [crac] RFR: Improved open file descriptors tracking [v9] In-Reply-To: References: Message-ID: > Tracks `java.io.FileDescriptor` instances as CRaC resource; before checkpoint these are reported and if not allow-listed (e.g. as opened as standard descriptors) an exception is thrown. Further investigation can use system property `jdk.crac.collect-fd-stacktraces=true` to record origin of those file descriptors. > File descriptors claimed in Java code are passed to native; native code checks all open file descriptors and reports error if there's an unexpected FD that is not included in the list passed previously. Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Provide more information for file descriptors ------------- Changes: - all: https://git.openjdk.org/crac/pull/43/files - new: https://git.openjdk.org/crac/pull/43/files/8b51d32a..8cba86df Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=43&range=08 - incr: https://webrevs.openjdk.org/?repo=crac&pr=43&range=07-08 Stats: 360 lines in 7 files changed: 281 ins; 72 del; 7 mod Patch: https://git.openjdk.org/crac/pull/43.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/43/head:pull/43 PR: https://git.openjdk.org/crac/pull/43 From duke at openjdk.org Mon Apr 24 13:36:15 2023 From: duke at openjdk.org (Radim Vansa) Date: Mon, 24 Apr 2023 13:36:15 GMT Subject: [crac] RFR: Fix ordering of invocation on Resources In-Reply-To: References: <27W9LomvSWtxS6iElpOqQ9acC9WGqV5cEDNqfwHMS70=.d488d6ba-c91d-4b46-a218-a9d4aa4b1090@github.com> Message-ID: On Mon, 24 Apr 2023 10:50:54 GMT, Radim Vansa wrote: >> The CheckpointException is a CRaC-specific, deliberately thrown exception to communicate the problem with the state to the requester of the checkpoint. The exception may suggest rewriting the code, or another attempt. We don't care a lot about the preceise meaning, we just need to deliver the exception. Although lesser number of exceptions is better of course. > > I realized that you actually asked for a different behaviour than what I thought initially, something like the `ConcurrentModificationException` while iterating a collection. That is not that bad, and we can add debug option to track where this happened. I figured out there is another problem with allowing the registration and failing afterwards: let's assume that you have two children contexts, C1 and C2. C1.beforeCheckpoint succeeds, and during C2.beforeCheckpoint you try to register something on C1. You suggest to allow this, but C1 won't be invoked and we cannot find that there's a new resource that should have failed the checkpoint - so it's kind of silently failing. The best you could do is to invoke the beforeCheckpoint on the resource when you find C1 to be done, but this is not what we discussed before. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1175291941 From duke at openjdk.org Mon Apr 24 13:52:26 2023 From: duke at openjdk.org (Radim Vansa) Date: Mon, 24 Apr 2023 13:52:26 GMT Subject: [crac] RFR: Fix ordering of invocation on Resources [v2] In-Reply-To: References: Message-ID: > * When Context.beforeCheckpoint throws, invoke Context.afterRestore anyway (otherwise some resources stay in suspended state). > * Handle Resource.beforeCheckpoint triggering a registration of another resource ** Do not cause deadlock when registering from another thread ** Global resource can register JDKResource > ** JDKResource can register resource with higher priority ** Other registrations are prohibited Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Fix docs & package ------------- Changes: - all: https://git.openjdk.org/crac/pull/60/files - new: https://git.openjdk.org/crac/pull/60/files/e801310e..fb7d4ea7 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=60&range=01 - incr: https://webrevs.openjdk.org/?repo=crac&pr=60&range=00-01 Stats: 26 lines in 7 files changed: 15 ins; 1 del; 10 mod Patch: https://git.openjdk.org/crac/pull/60.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/60/head:pull/60 PR: https://git.openjdk.org/crac/pull/60 From duke at openjdk.org Tue Apr 25 07:34:51 2023 From: duke at openjdk.org (Radim Vansa) Date: Tue, 25 Apr 2023 07:34:51 GMT Subject: [crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking [v7] In-Reply-To: References: Message-ID: > This implementation is suitable for uses where the write-locking happens very rarely (if at all), as in the case of CRaC checkpoint, and we don't want to slow down regular access to the protected resource. Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Don't use SwitchPoints ------------- Changes: - all: https://git.openjdk.org/crac/pull/58/files - new: https://git.openjdk.org/crac/pull/58/files/984cb5d3..22e75f7c Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=58&range=06 - incr: https://webrevs.openjdk.org/?repo=crac&pr=58&range=05-06 Stats: 88 lines in 1 file changed: 26 ins; 58 del; 4 mod Patch: https://git.openjdk.org/crac/pull/58.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/58/head:pull/58 PR: https://git.openjdk.org/crac/pull/58 From duke at openjdk.org Tue Apr 25 11:57:42 2023 From: duke at openjdk.org (Radim Vansa) Date: Tue, 25 Apr 2023 11:57:42 GMT Subject: [crac] RFR: Fix ordering of invocation on Resources [v2] In-Reply-To: References: <27W9LomvSWtxS6iElpOqQ9acC9WGqV5cEDNqfwHMS70=.d488d6ba-c91d-4b46-a218-a9d4aa4b1090@github.com> Message-ID: On Mon, 24 Apr 2023 13:33:31 GMT, Radim Vansa wrote: >> I realized that you actually asked for a different behaviour than what I thought initially, something like the `ConcurrentModificationException` while iterating a collection. That is not that bad, and we can add debug option to track where this happened. > > I figured out there is another problem with allowing the registration and failing afterwards: let's assume that you have two children contexts, C1 and C2. C1.beforeCheckpoint succeeds, and during C2.beforeCheckpoint you try to register something on C1. You suggest to allow this, but C1 won't be invoked and we cannot find that there's a new resource that should have failed the checkpoint - so it's kind of silently failing. The best you could do is to invoke the beforeCheckpoint on the resource when you find C1 to be done, but this is not what we discussed before. Thinking about this again, the current code does not prevent that from happening either, had JDK context registered something new in the global context this would be silently ignored. I can imagine that Core could keep a checkpoint counter and the Context would record last finished counter. During registration, if the current counter equals finished counter the registration could throw. However if we're to rollback the checkpoint, we would probably need to set a global flag in Core, too. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1176403873 From duke at openjdk.org Tue Apr 25 13:31:48 2023 From: duke at openjdk.org (Radim Vansa) Date: Tue, 25 Apr 2023 13:31:48 GMT Subject: [crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking [v7] In-Reply-To: References: Message-ID: <8gxVuJYZbRh97pKyhPvC3Wbi-TPb5OSksxyJeSEKzqg=.7fc4c33b-6573-463a-93c2-e93e816c4bb4@github.com> On Tue, 25 Apr 2023 07:34:51 GMT, Radim Vansa wrote: >> This implementation is suitable for uses where the write-locking happens very rarely (if at all), as in the case of CRaC checkpoint, and we don't want to slow down regular access to the protected resource. > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Don't use SwitchPoints Okay, using just a volatile field does not change much. Anton suggested to have a look on the totally uncontended readers case; a single-threaded benchmark results (including the 'new' implementation that fakes an empty read-lock) look like this: Benchmark (blackhole) (impl) Mode Cnt Score Error Units InlinedCall.newImpl false N/A thrpt 3 2441232707.809 ? 138092251.983 ops/s InlinedCall.rcuLocked false N/A thrpt 3 1116925952.834 ? 105484443.430 ops/s InlinedCall.rwLocked false N/A thrpt 3 59334557.465 ? 267412.636 ops/s InlinedCall.unsync false N/A thrpt 3 2424408670.896 ? 696453915.897 ops/s VirtualCall.component false unsync thrpt 3 2439576011.016 ? 194572187.841 ops/s VirtualCall.component false rwlock thrpt 3 59518346.446 ? 91540.972 ops/s VirtualCall.component false rculock thrpt 3 1059312172.029 ? 91850879.226 ops/s VirtualCall.component false new thrpt 3 1845830506.910 ? 256628384.459 ops/s When I run this with 6 threads I get this: Benchmark (blackhole) (impl) Mode Cnt Score Error Units InlinedCall.newImpl false N/A thrpt 3 12031688167.608 ? 4218322551.532 ops/s InlinedCall.rcuLocked false N/A thrpt 3 5660447942.467 ? 237218291.900 ops/s InlinedCall.rwLocked false N/A thrpt 3 341089781.812 ? 65370977.211 ops/s InlinedCall.unsync false N/A thrpt 3 11648096223.269 ? 282528592.912 ops/s VirtualCall.component false unsync thrpt 3 11406362088.019 ? 1207457766.872 ops/s VirtualCall.component false rwlock thrpt 3 331446602.891 ? 10006346.435 ops/s VirtualCall.component false rculock thrpt 3 5213272454.145 ? 707041721.132 ops/s VirtualCall.component false new thrpt 3 8796928884.245 ? 2177441524.351 ops/s I've tried to see why the virtual invocation vs. inlined of `unsync` does not change while `new` has a significant difference, but I can't really tell after looking into `perfasm` results. I've also checked with disabled inlining of the entry method, and the result is quite different. It's hard to tell which version would be the 'right' one - I think that the advantage of RCU vs. RW lock is clear, and having a 2 fold slowdown vs. empty implementation isn't that bad. `perfasm` points most of the weight to those volatile reads. ------------- PR Comment: https://git.openjdk.org/crac/pull/58#issuecomment-1521792720 From akozlov at azul.com Tue Apr 25 13:40:50 2023 From: akozlov at azul.com (Anton Kozlov) Date: Tue, 25 Apr 2023 16:40:50 +0300 Subject: On restore the "main" thread is started before the Resource's afterRestore has completed In-Reply-To: References: Message-ID: On 4/6/23 16:59, Christian Tzolov wrote: > Then given your original suggestion is it right to assume that the ?guard access to the resource? now should guard the ProcessorState not the ProcessorContext? I'm not sure about the difference between Context and State, as both look like a state Processor uses. For the discussed RWLock approach, it may look like [1]. That is, accessing the state needs to be done under Read lock. The beforeCheckpoint() locks the corresponding Write lock, thus blocking access to the state until the afterRestore releases the Write lock. Was ProcessorState intended for modification? Or in the example that means something that we don't control, or is too complex to control, so do we'd like to keep modifications limited to ProcessorContext? In this case, we need some synchronization between Processor and the ProcessorContext, like [2]. Do these options work for you? These are probably not the best approaches for the example, rather demo what we mean by RWLock. RCULock being implemented by Radim will reduce the overhead of the Read part of the lock, but the semantics will be the same. [1] https://github.com/tzolov/crac-demo/compare/main...AntonKozlov:crac-demo:main [2] https://github.com/tzolov/crac-demo/compare/main...AntonKozlov:crac-demo:proc-sync > And if this is true then how one would be able to identify all possible ?resources? to be guarded? Resource usually refers to something outside the Java program, a file, socket, or pipe,.. which represents a higher-level link of the program with the world. There is no real need to add CRaC Resource in this example. If you need that, I assume it's required because of links with the world. Which will be detected and reported by CRaC implementation if they exist at the checkpoint. Or that somehow relates to the program logic. Thanks, Anton From akozlov at openjdk.org Tue Apr 25 13:53:39 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Tue, 25 Apr 2023 13:53:39 GMT Subject: [crac] RFR: Fix ordering of invocation on Resources [v2] In-Reply-To: References: <27W9LomvSWtxS6iElpOqQ9acC9WGqV5cEDNqfwHMS70=.d488d6ba-c91d-4b46-a218-a9d4aa4b1090@github.com> Message-ID: On Tue, 25 Apr 2023 11:54:32 GMT, Radim Vansa wrote: >> I figured out there is another problem with allowing the registration and failing afterwards: let's assume that you have two children contexts, C1 and C2. C1.beforeCheckpoint succeeds, and during C2.beforeCheckpoint you try to register something on C1. You suggest to allow this, but C1 won't be invoked and we cannot find that there's a new resource that should have failed the checkpoint - so it's kind of silently failing. The best you could do is to invoke the beforeCheckpoint on the resource when you find C1 to be done, but this is not what we discussed before. > > Thinking about this again, the current code does not prevent that from happening either, had JDK context registered something new in the global context this would be silently ignored. > I can imagine that Core could keep a checkpoint counter and the Context would record last finished counter. During registration, if the current counter equals finished counter the registration could throw. However if we're to rollback the checkpoint, we would probably need to set a global flag in Core, too. Yes, the current implementation allows new Resources registered during notification and silently ignored. I hope this change will deal with the problem somehow. > The best you could do is to invoke the beforeCheckpoint on the resource when you find C1 to be done, but this is not what we discussed before. This is an interesting approach, but this won't suit e.g. Global Context, as last registered Resource would need to be notified first. If notification is in progress, we've lost our chance. But I assume some Context implementation may opt to this, so we need to leave some room in the spec to allow such implementations. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1176549896 From duke at openjdk.org Wed Apr 26 15:22:23 2023 From: duke at openjdk.org (Radim Vansa) Date: Wed, 26 Apr 2023 15:22:23 GMT Subject: [crac] RFR: Fix ordering of invocation on Resources [v3] In-Reply-To: References: Message-ID: > * When Context.beforeCheckpoint throws, invoke Context.afterRestore anyway (otherwise some resources stay in suspended state). > * Handle Resource.beforeCheckpoint triggering a registration of another resource ** Do not cause deadlock when registering from another thread ** Global resource can register JDKResource > ** JDKResource can register resource with higher priority ** Other registrations are prohibited Radim Vansa has updated the pull request incrementally with two additional commits since the last revision: - More fine-grained synchronization - Rework context ordering (round 2) * call afterRestore even if beforeCheckpoint throws * registering resource in previous/running context does not trigger exception immediatelly ** instead this will be one of the recorded exceptions and the resource has a chance to fire next time * we don't guarantee threads not deadlocking when trying to register a resource, though ------------- Changes: - all: https://git.openjdk.org/crac/pull/60/files - new: https://git.openjdk.org/crac/pull/60/files/fb7d4ea7..1f2c7b39 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=60&range=02 - incr: https://webrevs.openjdk.org/?repo=crac&pr=60&range=01-02 Stats: 616 lines in 13 files changed: 338 ins; 161 del; 117 mod Patch: https://git.openjdk.org/crac/pull/60.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/60/head:pull/60 PR: https://git.openjdk.org/crac/pull/60 From duke at openjdk.org Wed Apr 26 15:30:58 2023 From: duke at openjdk.org (Radim Vansa) Date: Wed, 26 Apr 2023 15:30:58 GMT Subject: [crac] RFR: Fix ordering of invocation on Resources [v2] In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 13:52:26 GMT, Radim Vansa wrote: >> * keeps the original handling of exceptions: afterRestore is called even if beforeCheckpoint throws >> * allows to register a resource in a context that did not start beforeCheckpoint invocations yet >> * registering resource in previous/running context fails the checkpoint but does not trigger exception immediately >> * instead this will be one of the recorded exceptions and the resource has a chance to fire next time >> * allowed registration of resources can be invoked from other thread without deadlock; illegal registration can deadlock, though > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Fix docs & package @AntonKozlov It took some effort but I managed to implement the semantics you have requested. ------------- PR Comment: https://git.openjdk.org/crac/pull/60#issuecomment-1523597484 From duke at openjdk.org Thu Apr 27 08:23:27 2023 From: duke at openjdk.org (Radim Vansa) Date: Thu, 27 Apr 2023 08:23:27 GMT Subject: [crac] RFR: RFC: -XX:CPUFeatures=0xnumber for CPU migration [v8] In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 11:46:18 GMT, Jan Kratochvil wrote: >> I think in this PR we can concentrate on CPU features, as CPU core number is a different problem, that can arise even with the same feature set. >> >> As for ifunc change and Watcher Thread problems, this won't happen if we re-execute with GLIBC_TUNABLES. And re-execute will also resolve the maintainability concern. > >> I think in this PR we can concentrate on CPU features, as CPU core number is a different problem, that can arise even with the same feature set. > > I could split the patch but it is not testable/usable without the CPU count fix/hack. > > But I am now preparing the IFUNC patch for glibc upstreaming, whether it will be accepted or not. > > As long as you do not want a temporary solution in CRaC we can suspend this patch until glibc upstreaming gets resolved. Hi @jankratochvil , the checks still show some trouble with whitespaces (tabs). Could you also merge in the current `crac` branch - looks like this does not use the workflow including CRaC tests. ------------- PR Comment: https://git.openjdk.org/crac/pull/41#issuecomment-1525077214 From duke at openjdk.org Thu Apr 27 08:27:24 2023 From: duke at openjdk.org (Radim Vansa) Date: Thu, 27 Apr 2023 08:27:24 GMT Subject: [crac] RFR: Correct System.nanotime() value after restore [v5] In-Reply-To: References: Message-ID: > There are various places both inside JDK and in libraries that rely on monotonicity of `System.nanotime()`. When the process is restored on a different machine the value will likely differ as the implementation provides time since machine boot. This PR records wall clock time before checkpoint and after restore and tries to adjust the value provided by nanotime() to reasonably correct value. Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Ensure monotonicity for the same boot ------------- Changes: - all: https://git.openjdk.org/crac/pull/53/files - new: https://git.openjdk.org/crac/pull/53/files/5cc81961..725c6723 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=53&range=04 - incr: https://webrevs.openjdk.org/?repo=crac&pr=53&range=03-04 Stats: 35 lines in 2 files changed: 23 ins; 0 del; 12 mod Patch: https://git.openjdk.org/crac/pull/53.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/53/head:pull/53 PR: https://git.openjdk.org/crac/pull/53 From duke at openjdk.org Thu Apr 27 11:55:53 2023 From: duke at openjdk.org (Radim Vansa) Date: Thu, 27 Apr 2023 11:55:53 GMT Subject: [crac] RFR: Correct System.nanotime() value after restore [v6] In-Reply-To: References: Message-ID: > There are various places both inside JDK and in libraries that rely on monotonicity of `System.nanotime()`. When the process is restored on a different machine the value will likely differ as the implementation provides time since machine boot. This PR records wall clock time before checkpoint and after restore and tries to adjust the value provided by nanotime() to reasonably correct value. Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Use image under ghcr.io/crac ------------- Changes: - all: https://git.openjdk.org/crac/pull/53/files - new: https://git.openjdk.org/crac/pull/53/files/725c6723..c42768b2 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=53&range=05 - incr: https://webrevs.openjdk.org/?repo=crac&pr=53&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/crac/pull/53.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/53/head:pull/53 PR: https://git.openjdk.org/crac/pull/53 From akozlov at openjdk.org Thu Apr 27 17:29:54 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Thu, 27 Apr 2023 17:29:54 GMT Subject: [crac] RFR: CRaC related documentation in JDK classes using custom tag [v2] In-Reply-To: <3_zIXA73vhJ3ZI3j0GCrX2bYzObfi-hNlA7ptuTYL3o=.38ef3ea3-ae25-402e-b6c5-bd9ba42455d6@github.com> References: <3_zIXA73vhJ3ZI3j0GCrX2bYzObfi-hNlA7ptuTYL3o=.38ef3ea3-ae25-402e-b6c5-bd9ba42455d6@github.com> Message-ID: On Tue, 18 Apr 2023 16:06:58 GMT, Radim Vansa wrote: >> src/java.base/share/classes/java/lang/System.java line 793: >> >>> 791: * a resource and in the {@link javax.crac.Resource#afterRestore(javax.crac.Context) afterRestore method} >>> 792: * reload system properties, propagating any change. >>> 793: * >> >> The comment above is about standard system properties. >> >> Here we should say that system properties are updated after restore. The app can check the updated value in the afterRestore. > > Isn't that exactly what is written in the comment? Sorry, missed the comment. To highlight, I read the the it's being discouraged to change _standard_ system properties, which may be cached during the JDK implementation. Users can chose another set of assumptions, and with CRaC, they'd likely want for properties to be updated (to get more input from user, for more flexibile programs). So we don't want to color the statement and just want to inform the system properties are updated. And in the second statement, I propose change "should" -> "can" at least, documenting the possibilty rather than obligation. >> src/java.base/share/classes/java/security/SecureRandom.java line 366: >> >>> 364: * the {@link Security#getProviders() Security.getProviders()} method. >>> 365: * >>> 366: * @crac See provider documentation for details of behaviour after restore from a checkpoint. >> >> Shouldn't be a link here? > > You mean to the generic provider interface? Here I meant the implementation of the provider. I see, I was thinking you were refering something concrete, in that case it's better to have a `{@link ...}` to that. But it looks like "the checkpoint/restore behavior of the instance depends on the particular implementation". Is that correct? If so, worth to rephrase. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/51#discussion_r1179472639 PR Review Comment: https://git.openjdk.org/crac/pull/51#discussion_r1179484540 From duke at openjdk.org Fri Apr 28 07:41:54 2023 From: duke at openjdk.org (Radim Vansa) Date: Fri, 28 Apr 2023 07:41:54 GMT Subject: [crac] RFR: Support updating MANAGEABLE JVM options during restore Message-ID: When a JVM option is MANAGEABLE it can be set at any time during runtime, therefore it is safe to change it during the restore operation. Rather than silently ignoring JVM options passed along with -XX:CRaCRestoreFrom we send them to the restored process and either update or print a warning if given option cannot be changed. ------------- Commit messages: - Support updating MANAGEABLE JVM options during restore Changes: https://git.openjdk.org/crac/pull/61/files Webrev: https://webrevs.openjdk.org/?repo=crac&pr=61&range=00 Stats: 135 lines in 10 files changed: 125 ins; 1 del; 9 mod Patch: https://git.openjdk.org/crac/pull/61.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/61/head:pull/61 PR: https://git.openjdk.org/crac/pull/61 From duke at openjdk.org Fri Apr 28 07:42:52 2023 From: duke at openjdk.org (Radim Vansa) Date: Fri, 28 Apr 2023 07:42:52 GMT Subject: [crac] RFR: Support updating MANAGEABLE JVM options during restore In-Reply-To: References: Message-ID: <__diCB-AIoqJfdLtUHJ1oEdVLxsEfXPr3EMb8ajWH6Y=.0b93404e-ee58-47d3-ada8-a1d6a1cc7fb1@github.com> On Fri, 28 Apr 2023 07:24:06 GMT, Radim Vansa wrote: > When a JVM option is MANAGEABLE it can be set at any time during runtime, therefore it is safe to change it during the restore operation. Rather than silently ignoring JVM options passed along with -XX:CRaCRestoreFrom we send them to the restored process and either update or print a warning if given option cannot be changed. This is based on @AntonKozlov 's request to have a generic way of passing JVM options: https://github.com/openjdk/crac/pull/57#discussion_r1165535817 ------------- PR Comment: https://git.openjdk.org/crac/pull/61#issuecomment-1527112829 From duke at openjdk.org Fri Apr 28 13:41:52 2023 From: duke at openjdk.org (Radim Vansa) Date: Fri, 28 Apr 2023 13:41:52 GMT Subject: [crac] RFR: Improved open file descriptors tracking [v9] In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 13:27:30 GMT, Radim Vansa wrote: >> Tracks `java.io.FileDescriptor` instances as CRaC resource; before checkpoint these are reported and if not allow-listed (e.g. as opened as standard descriptors) an exception is thrown. Further investigation can use system property `jdk.crac.collect-fd-stacktraces=true` to record origin of those file descriptors. >> File descriptors claimed in Java code are passed to native; native code checks all open file descriptors and reports error if there's an unexpected FD that is not included in the list passed previously. > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Provide more information for file descriptors src/java.base/share/classes/jdk/internal/crac/JDKContext.java line 62: > 60: String classpath = System.getProperty("java.class.path"); > 61: int index = 0; > 62: while (index >= 0) { Bug: looks like the update of `index` was dropped somewhere, this loop may run infinitely... ------------- PR Review Comment: https://git.openjdk.org/crac/pull/43#discussion_r1180410276 From akozlov at openjdk.org Fri Apr 28 14:47:23 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Fri, 28 Apr 2023 14:47:23 GMT Subject: [crac] RFR: Fix ordering of invocation on Resources [v3] In-Reply-To: References: Message-ID: <-aCCM-B8gVgwEFggEN0XsbkdxBZTh_-g5blSYd5AXN8=.6ddeed9f-a8bb-44b8-89e4-985b7b9e3143@github.com> On Wed, 26 Apr 2023 15:22:23 GMT, Radim Vansa wrote: >> * keeps the original handling of exceptions: afterRestore is called even if beforeCheckpoint throws >> * allows to register a resource in a context that did not start beforeCheckpoint invocations yet >> * registering resource in previous/running context fails the checkpoint but does not trigger exception immediately >> * instead this will be one of the recorded exceptions and the resource has a chance to fire next time >> * allowed registration of resources can be invoked from other thread without deadlock; illegal registration can deadlock, though > > Radim Vansa has updated the pull request incrementally with two additional commits since the last revision: > > - More fine-grained synchronization > - Rework context ordering (round 2) > > * call afterRestore even if beforeCheckpoint throws > * registering resource in previous/running context does not trigger exception immediatelly > ** instead this will be one of the recorded exceptions and the resource has a chance to fire next time > * we don't guarantee threads not deadlocking when trying to register a resource, though src/java.base/share/classes/jdk/crac/Core.java line 104: > 102: * Order of invoking {@link Resource#afterRestore(Context)} is the reverse > 103: * of the order of {@link Resource#beforeCheckpoint(Context) checkpoint notification}, > 104: * hence the same as the order of {@link Context#register(Resource) registration}. How about moving the Global Context description from the package level here (removing there). In javax.crac it should be fine to link to here IMO. src/java.base/share/classes/jdk/crac/Core.java line 118: > 116: public static synchronized boolean isRestoring() { > 117: return restoring; > 118: } Undocumented public API, I'm not sure it should be public. src/java.base/share/classes/jdk/crac/impl/AbstractContextImpl.java line 42: > 40: for (Throwable t : suppressed) { > 41: Core.recordException(t); > 42: } Unwrap Checkpoint/RestoreException only? src/java.base/share/classes/jdk/crac/impl/AbstractContextImpl.java line 55: > 53: restoreQ.add(resource); > 54: try { > 55: resource.beforeCheckpoint(semanticContext()); Does this mean a Resource may get another Context and not the one to which it has been registered? This may be very unexpected for the Resource implementation. src/java.base/share/classes/jdk/crac/impl/AbstractContextImpl.java line 59: > 57: recordExceptions(e); > 58: } catch (Exception e) { > 59: Core.recordException(e); Why is there is the distinction? I think we should throw all exceptions from the context, rather than publishing them to a central store, otherwise the parent Context (if any), won't be able to do anything about those. src/java.base/share/classes/jdk/crac/impl/AbstractContextImpl.java line 75: > 73: restoreQ = new ArrayList<>(); > 74: runBeforeCheckpoint(); > 75: Collections.reverse(restoreQ); Smelly code, restoreQ should be maintained either here or in runBeforeCheckpoint() src/java.base/share/classes/jdk/crac/impl/AbstractContextImpl.java line 78: > 76: } > 77: > 78: protected abstract void runBeforeCheckpoint(); This is intended to be overwritten (becomes a part of the class interface). The intent behind the separate method is not evident. Corresponding runAfterRestore is private though. After AbstractContexImpl has lost parameter P and comparator, a distinction between AbstractContexImpl and OrderedContext has been lost. Merging AbstractContexImpl into OrderedContext likely will provided clearer code. src/java.base/share/classes/jdk/crac/impl/OrderedContext.java line 60: > 58: // It is possible that something registers to us during restore but before > 59: // this context's afterRestore was called. > 60: if (checkpointing && !Core.isRestoring()) { There is a small window between all beforeCheckpoint() are finished and checkpoint. In this window we'll call setModified(). An there is another window between restore and afterRestore() processing is started, where we'll won't call setModified(). Getting the exception or not will be a result of a race between checkpoint/restore (actual event with near-zero duration, without calling Resources) and registration. A Resource may also have an empty beforeCheckpoint() and some afterRestore() clean up. We'll register the resource for the next round of checkpoint/restore and will be silence about newly registered Resource. But since beforeCheckpoint() is empty, the original intent could be to do something useful on restore, which won't be done. src/java.base/share/classes/jdk/crac/impl/PriorityContext.java line 21: > 19: // CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) > 20: // ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE > 21: // POSSIBILITY OF SUCH DAMAGE. Use standard copyright please ------------- PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1180390202 PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1180303950 PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1180311461 PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1180326193 PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1180320313 PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1180318502 PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1180322259 PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1180374418 PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1180313813 From akozlov at openjdk.org Fri Apr 28 15:31:23 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Fri, 28 Apr 2023 15:31:23 GMT Subject: [crac] RFR: Support updating MANAGEABLE JVM options during restore In-Reply-To: References: Message-ID: On Fri, 28 Apr 2023 07:24:06 GMT, Radim Vansa wrote: > When a JVM option is MANAGEABLE it can be set at any time during runtime, therefore it is safe to change it during the restore operation. Rather than silently ignoring JVM options passed along with -XX:CRaCRestoreFrom we send them to the restored process and either update or print a warning if given option cannot be changed. src/hotspot/os/linux/os_linux.cpp line 6557: > 6555: ::_restore_start_counter = hdr->_restore_counter; > 6556: > 6557: for (int i = 0; i < hdr->_nflags; i++) { This check can be done in the "bootstrap" process (the one that execs to CREngine): just to avoid restoring and finding out the problem. See the other comment about producing the error. src/hotspot/os/linux/os_linux.cpp line 6579: > 6577: } > 6578: if (result != JVMFlag::Error::SUCCESS) { > 6579: warning("VM Option '%s' cannot be changed, ignoring: %s", A significant set of options cannot be set on restore at the moment, so it will be even better to highlight they don't have effect and produce an error. It may be useful to revert back to warning (with e.g. an option), but by default it should be disabled (leading to the error) src/hotspot/share/runtime/globals.hpp line 2096: > 2094: /* It does not make sense to change this flag in runtime but we'll tag */ \ > 2095: /* it MANAGEABLE to prevent warnings when setting this on restore. */ \ > 2096: product(ccstr, CRaCRestoreFrom, NULL, MANAGEABLE, \ This is an example why we want "can be set on restore" (RESTOREBLE?) flag. So MANAGABLE will be implying RESTORABLE, but not every RESTORABLE will be MANAGEABLE. src/hotspot/share/runtime/globals.hpp line 2106: > 2104: "unavailable") \ > 2105: \ > 2106: product(ccstr, CRaCIgnoredFileDescriptors, NULL, MANAGEABLE, \ Having this is considered only on init, this also does not fit MANAGEABLE concept (there is no need to change that over management) ------------- PR Review Comment: https://git.openjdk.org/crac/pull/61#discussion_r1180515445 PR Review Comment: https://git.openjdk.org/crac/pull/61#discussion_r1180527485 PR Review Comment: https://git.openjdk.org/crac/pull/61#discussion_r1180504943 PR Review Comment: https://git.openjdk.org/crac/pull/61#discussion_r1180533487