On restore the "main" thread is started before the Resource's afterRestore has completed

Thu Apr 6 14:29:02 UTC 2023

Hi, comments inline...

On 06. 04. 23 15:59, Christian Tzolov wrote:
>
> 	
> Caution: This email originated from outside of the organization. Do 
> not click links or open attachments unless you recognize the sender 
> and know the content is safe.
>
>
> Hi Dan and Radim,
>
>
> Thanks for the feedback and suggestions!
>
> It is the first time I’m facing the java.lang.invoke.* API and it 
> might take some time to wrap my head around it.
>
> So be prepared plese for lame questions, as those inlined below.
>
>
> On Wed, Apr 5, 2023 at 4:28 PM Dan Heidinga <heidinga at redhat.com> wrote:
>
>     Hi Radim,
>
>     Thanks for the write up of the various options in this space.
>
>     On Tue, Apr 4, 2023 at 2:49 AM Radim Vansa <rvansa at azul.com> wrote:
>
>         Hi Christian,
>
>         I believe this is a common problem when porting existing
>         architecture
>         under CRaC; the obvious solution is to guard access to the
>         resource
>         (ProcessorContext in this case) with a RW lock that'd be
>         read-acquired
>         by 'regular' access and acquired for write in
>         beforeCheckpoint/released
>         in afterRestore. However this introduces extra synchronization
>         (at least
>         in form of volatile writes) even in case that C/R is not used
>         at all,
>         especially if the support is added into libraries.
>
>
>     I've seen variations of this approach go by in code reviews but
>     have we written up a good example of how to do this well?  Having
>     a canonical pattern would help to highlight the best way to do it
>     today and make the tradeoffs explicit.
>
>
> @Radim, your “guard access” suggestion made me realise that perhaps 
> I’ve oversimplified my  sample.
>
> So I’ve modified it a bit: 
> https://github.com/tzolov/crac-demo/blob/main/src/main/java/com/example/crac/CrackDemoExt.java 
> by introducing a new ProcessorState used by the Processor for its 
> computation.
> At the same time I’ve removed the direct Processor dependency on the 
> ProcessorContext. Instead the ProcessorContext is responsible for 
> managing the lifecycle of the ProcessorState before the Processor can 
> use it.
> Then given your original suggestion is it right to assume that the 
> “guard access to the resource” now should guard the ProcessorState not 
> the ProcessorContext?
> And if this is true then how one would be able to identify all 
> possible “resources” to be guarded?

It seems that the separation between Context and State is a bit 
artificial, but anyway... Context here would hold a RW lock, 
write-locked in constructor. At the end of start() method it would 
unlock it, and at the beginning of stop() it would lock it. In your case 
the Processor uses that state directly, rather than through Context - 
that gives you no place to put the read lock. Instead, it should be 
delegated through Context that would read-lock it before useState() and 
unlock afterwards.

>         Anton Kozlov proposed techniques like RCU [1] but at this
>         point there's
>         no support for this in Java. Even the Linux implementation
>         might require
>         some additional properties from the code in critical (read)
>         section like
>         not calling any blocking code; this might be too limiting.
>
>         The situation is simpler if the application uses a single
>         threaded
>         event-loop; beforeCheckpoint can enqueue a task that would,
>         upon its
>         execution, block on a primitive and notify the C/R
>         notification thread
>         that it may now deinit the resource; in afterRestore the
>         resource is
>         initialized and the eventloop is unblocked. This way we don't
>         impose any
>         extra overhead when C/R is happening.
>
>
>     That's a nice idea!
>
>
>         To avoid extra synchronization it could be technically
>         possible to
>         modify CRaC implementation to keep all other threads frozen
>         during
>         restore. There's a risk of some form of deadlock if the thread
>         performing C/R would require other threads to progress,
>         though, so any
>         such solution would require extra thoughts. Besides, this does
>         not
>         guarantee exclusivity so the afterRestore would need to
>         restore the
>         resource to the *exactly* same state (as some of its
>         before-checkpoint
>         state might have leaked to the thread in Processor). In my
>         opinion this
>         is not the best way.
>
>
>     This is the approach that OpenJ9 took to solve the consistency
>     problems introduced by updating resources before / after
>     checkpoints.  OpenJ9 enters "single threaded mode" when creating
>     the checkpoint and executing the before checkkpoint fixups.  On
>     restore, it continues in single-threaded mode while executing the
>     after checkpoint fixups.  This makes it easier to avoid additional
>     runtime costs related to per-resource locking for checkpoints, but
>     complicates locking and wait/notify in general.
>
>     This means a checkpoint hook operation can't wait on another
>     thread (would block indefinitely as other threads are paused),
>     can't wait on a lock being held by another thread (again, would
>     deadlock), and sending notify may result in inconsistent behaviour
>     (wrong number of notifies received by other threads).  See "The
>     checkpointJVM() API" section of their blog post on CRIU for more
>     details [0].
>
>
> The "single thread mode", imo, corresponds to the 
> "serializable isolation" approach in data processing and DB transactions.

Serialization into one thread is one way to achieve serializable 
isolation, but there are different strategies too. Though beware that no 
major database nowadays supports strict serializable isolation (even if 
it calls some mode serializable) - can't find a proper link to show, and 
I am digressing anyway.

> The OpenJ9 blogs are very informative and like the jdk invoke API 
> would need time to digest.
> But I have one conceptual question. What part of this should/cloud be 
> implemented by the CRaC inself and what abstractions should be exposed 
> to the CRaC users?

CRaC users need to be aware that their task is to clean up before 
checkpoint. Ideally if they use a library this should do it 
transparently to any practical extent. Anything beyond is just utilities 
provided, you need to dig it so here's (hopefully appropriate) spade.

Radim

>
>
>         The problem with RCU is tracking which threads are in the
>         critical
>         section. I've found RCU-like implementations for Java that avoid
>         excessive overhead using a spread out array - each thread marks
>         entering/leaving the critical section by writes to its own
>         counter,
>         preventing cache ping-pong (assuming no false sharing).
>         Synchronizer
>         thread uses another flag to request synchronization; reading
>         this by
>         each thread is not totally without cost but reasonably cheap,
>         and in
>         that case worker threads can enter a blocking slow path. The
>         simple
>         implementation assumes a fixed number of threads; if the list
>         of threads
>         is dynamic the solution would be probably more complicated. It
>         might
>         also make sense to implement this in native code with a per-CPU
>         counters, rather than per-thread. A downside, besides some
>         overhead in
>         terms of both cycles and memory usage, is that we'd need to
>         modify the
>         code and explicitly mark the critical sections.
>
>         Another solution could try to leverage existing JVM mechanics
>         for code
>         deoptimization, replacing the critical sections with a slower,
>         blocking
>         stub, and reverting back after restore. Or even independently
>         requesting
>         a safe-point and inspecting stack of threads until the
>         synchronization
>         is possible.
>
>
>     This will have a high risk of livelock.  The OpenJ9 experience
>     implementing single-threaded mode for CRIU indicates there are a
>     lot of strange locking patterns in the world.
>
>
>         So I probably can't offer a ready-to-use performant solution;
>         pick your
>         poison. The future, though, offers a few possibilities and I'd
>         love to
>         hear others' opinions about which one would look the most
>         feasible.
>         Because unless we offer something that does not harm a no-CRaC
>         use-case
>         I am afraid that the adoption will be quite limited.
>
>
>     Successful solutions will push the costs into the checkpoint /
>     restore paths as much as possible. Going back to the explicit lock
>     mechanism you first mentioned, I wonder if there's a role for
>     java.lang.invoke.Switchpoint [1] here?  Switchpoint was added as a
>     tool for language implementers that wanted to be able speculate on
>     a particular condition (ie: CHA assumptions) and get the same kind
>     of low cost state change that existing JITTED code gets.  I'm not
>     sure how well that vision worked in practice or how well Hotspot
>     optimizes it yet, but this might be a reason to push on its
>     performance.
>
>     Roughly the idea would be to add a couple of Switchpoints to
>     jdk.crac.Core:
>
>        public SwitchPoint getBeforeSwitchpoint();
>        public SwitchPoint getAfterSwitchpoint();
>
>     and users could then write their code using MethodHandles to
>     implementing the branching logic:
>
>         MethodHandle normalPath = ...... // existing code
>         MethodHandle fallbackPath = ..... // before Checkpoint extra work
>         MethodHandle guardWithTest =
>     getBeforeSwitchPoint.guardWithTest(normalPath, fallbackPath);
>
>     and the jdk.crac.Core class would invalidate the "before"
>     SwitchPoint prior to the checkpoint and "after" one after the
>     restore.  Aside from the painful programming model, this might
>     give us the tools we need to make it performant.
>
>
> @Dan, this is very interesting!
> Could you please elaborate a bit further. Perhaps in the context of 
> the CrackDemoExt.java sample?
>
>
>     Needs more exploration and prototyping but would provide a
>     potential path to reasonable performance by burying the extra
>     locking in the fallback paths. And it would be a single pattern to
>     optimize, rather than all the variations users could produce.
>     --Dan
>     [0]
>     https://blog.openj9.org/2022/10/14/openj9-criu-support-a-look-under-the-hood/
>     [1]
>     https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/invoke/SwitchPoint.html
>
>
> Thank you,
>  - Christian
>
>
>         Cheers,
>
>         Radim
>
>         [1] https://en.wikipedia.org/wiki/Read-copy-update
>
>         On 03. 04. 23 22:30, Christian Tzolov wrote:
>         > Hi, I'm testing CRaC in the context of long-running
>         applications (e.g. streaming, continuous processing ...) and
>         I've stumbled on an issue related to the coordination of the
>         resolved threads.
>         >
>         > For example, let's have a Processor that performs continuous
>         computations. This processor depends on a ProcessorContext and
>         later must be fully initialized before the processor can
>         process any data.
>         >
>         > When the application is first started (e.g. not from
>         checkpoints) it ensures that the ProcessorContext is
>         initialized before starting the Processor loop.
>         >
>         > To leverage CRaC I've implemented a ProcessorContextResource
>         gracefully stops the context on beforeCheckpoint and then
>         re-initialized it on afterRestore.
>         >
>         > When the checkpoint is performed, CRaC calls the
>         ProcessorContextResource.beforeCheckpoint and also preserves
>         the current Processor call stack. On Restore processor's call
>         stack is expectedly restored at the point it was stopped but
>         unfortunately it doesn't wait for the
>         ProcessorContextResource.afterRestore complete. This
>         expectedly crashes the processor.
>         >
>         > The https://github.com/tzolov/crac-demo illustreates this
>         issue. The README explains how to reproduce the issue. The
>         OUTPUT.md
>         (https://github.com/tzolov/crac-demo/blob/main/OUTPUT.md )
>         offers terminal snapshots of the observed behavior.
>         >
>         > I've used latest JDK CRaC release:
>         >    openjdk 17-crac 2021-09-14
>         >    OpenJDK Runtime Environment (build 17-crac+5-19)
>         >    OpenJDK 64-Bit Server VM (build 17-crac+5-19, mixed mode,
>         sharing)
>         >
>         > As I'm new to CRaC, I'd appreciate your thoughts on this issue.
>         >
>         > Cheers,
>         > Christian
>         >
>         >
>         >
>         >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/crac-dev/attachments/20230406/9c4e8000/attachment-0001.htm>