On restore the "main" thread is started before the Resource's afterRestore has completed

Dan Heidinga heidinga at redhat.com
Wed Apr 5 20:40:29 UTC 2023


On Wed, Apr 5, 2023 at 12:01 PM Radim Vansa <rvansa at azul.com> wrote:

> Lots of interesting ideas, comments inline...
>
> On 05. 04. 23 16:28, Dan Heidinga wrote:
> > Hi Radim,
> >
> > Thanks for the write up of the various options in this space.
> >
> > On Tue, Apr 4, 2023 at 2:49 AM Radim Vansa <rvansa at azul.com<mailto:
> rvansa at azul.com>> wrote:
> > Hi Christian,
> >
> > I believe this is a common problem when porting an existing architecture
> > to CRaC; the obvious solution is to guard access to the resource
> > (ProcessorContext in this case) with a RW lock that'd be read-acquired
> > by 'regular' access and acquired for write in beforeCheckpoint/released
> > in afterRestore. However, this introduces extra synchronization (at
> > least in the form of volatile writes) even when C/R is not used at all,
> > especially if the support is added into libraries.
> >
> > I've seen variations of this approach go by in code reviews but have we
> written up a good example of how to do this well?  Having a canonical
> pattern would help to highlight the best way to do it today and make the
> tradeoffs explicit.
>
> TBH I am rather shy to demo a solution that's quite imperfect unless we
> find that there's no way around that.
>

That's fair.  Sometimes it's easier for people to start from something and
modify it to be a better pattern than to have to generate it from scratch.
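As a starting point for such a canonical pattern, here is a minimal sketch of the RW-lock guard Radim described. The Resource interface below is a simplified stand-in for jdk.crac.Resource (the real callbacks take a Context parameter), and the sketch assumes beforeCheckpoint and afterRestore run on the same thread, which I believe holds for the current jdk.crac implementation:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Simplified stand-in for jdk.crac.Resource; the real interface's
// callbacks take a Context parameter and may throw checked exceptions.
interface Resource {
    void beforeCheckpoint() throws Exception;
    void afterRestore() throws Exception;
}

class GuardedProcessorContext implements Resource {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private volatile boolean initialized = true;

    // 'Regular' access path: read-acquire around every use of the context.
    <T> T withContext(Supplier<T> action) {
        lock.readLock().lock();
        try {
            if (!initialized) throw new IllegalStateException("context is down");
            return action.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    @Override
    public void beforeCheckpoint() {
        lock.writeLock().lock();   // blocks until all readers have drained
        initialized = false;       // deinit the real resource here
    }

    @Override
    public void afterRestore() {
        initialized = true;        // re-init the real resource here
        lock.writeLock().unlock(); // same thread releases; readers resume
    }
}
```

Note the cost this imposes: every 'regular' access pays for the read-lock acquire/release even when no checkpoint ever happens, which is exactly the overhead being discussed.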

>
>
> >
> >
> > Anton Kozlov proposed techniques like RCU [1], but at this point there's
> > no support for this in Java. Even the Linux implementation might require
> > some additional properties from the code in the critical (read) section,
> > like not calling any blocking code; this might be too limiting.
> >
> > The situation is simpler if the application uses a single-threaded
> > event loop; beforeCheckpoint can enqueue a task that would, upon its
> > execution, block on a primitive and notify the C/R notification thread
> > that it may now deinit the resource; in afterRestore the resource is
> > initialized and the event loop is unblocked. This way we don't impose
> > any extra overhead unless C/R is actually happening.
> >
> > That's a nice idea!
> >
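For concreteness, here is a rough sketch of that quiesce/park handshake. All class and method names here are mine, not part of the CRaC API; the ExecutorService stands in for the application's event loop:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical resource wrapping a single-threaded event loop.
class EventLoopResource {
    private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();
    private volatile CountDownLatch restored;

    void submit(Runnable task) {
        eventLoop.execute(task);
    }

    void beforeCheckpoint() throws InterruptedException {
        CountDownLatch gate = new CountDownLatch(1);
        CountDownLatch quiesced = new CountDownLatch(1);
        restored = gate;
        eventLoop.execute(() -> {
            quiesced.countDown();   // tell the C/R thread the loop is idle
            try {
                gate.await();       // park the loop until afterRestore
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        quiesced.await();           // from here on it is safe to deinit
    }

    void afterRestore() {
        // re-initialize the resource here, then unblock the loop
        restored.countDown();
    }
}
```

The appeal is that the fast path (submit) is completely untouched; all the coordination cost lands on the checkpoint/restore path.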
> >
> > To avoid extra synchronization it could be technically possible to
> > modify the CRaC implementation to keep all other threads frozen during
> > restore. There's a risk of some form of deadlock if the thread
> > performing C/R required other threads to make progress, though, so any
> > such solution would require extra thought. Besides, this does not
> > guarantee exclusivity, so afterRestore would need to restore the
> > resource to *exactly* the same state (as some of its before-checkpoint
> > state might have leaked to the thread in Processor). In my opinion this
> > is not the best way.
> >
> > This is the approach that OpenJ9 took to solve the consistency problems
> introduced by updating resources before / after checkpoints.  OpenJ9 enters
> "single threaded mode" when creating the checkpoint and executing the
> before-checkpoint fixups.  On restore, it continues in single-threaded
> mode while executing the after-checkpoint fixups.  This makes it easier to
> avoid additional runtime costs related to per-resource locking for
> checkpoints, but complicates locking and wait/notify in general.
> >
> > This means a checkpoint hook operation can't wait on another thread
> (would block indefinitely as other threads are paused), can't wait on a
> lock being held by another thread (again, would deadlock), and sending
> notify may result in inconsistent behaviour (wrong number of notifies
> received by other threads).  See "The checkpointJVM() API" section of their
> blog post on CRIU for more details [0].
>
> Great post, I should probably go through the whole blog. I think that
> the single-threaded mode is conceptually simple to reason about, and with
> the @NotCheckpointSafe annotation it deals well with the issue. Have you
> run into any edge cases where this doesn't work well? For example, I've
> seen a deadlock because in beforeCheckpoint something was supposed to
> run in the reference cleaner thread.
>

I think we're still playing whack-a-mole with the places that may
need @NotCheckpointSafe.  There's no good static analysis that will
indicate where the annotation is needed so we're finding places to add it
as we go.  Examples include MethodType interning - if a halted thread was
interning a MethodType, and the checkpoint thread needs to resolve a
MethodType, we deadlock.  Also ClassValue::initializeMap(),
ClassSpecializer::findSpecies, and ConcurrentHashMap::computeIfAbsent.

The more places we add the annotation, though, the harder it will be to find
a "safe" point to take the checkpoint without a livelock.  It's a mostly
working band-aid rather than a general solution, but it still might be good
enough to make most applications work.
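To make the frozen-thread hazard concrete, here is a small runnable illustration (all names are mine; the parked holder thread stands in for an application thread halted by single-threaded mode, and tryLock with a timeout lets us observe the failure instead of actually hanging):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// A checkpoint hook that needs a lock owned by a frozen thread can
// never acquire it; this sketch simulates the frozen owner.
public class FrozenLockDemo {
    public static void main(String[] args) throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        Thread frozenHolder = new Thread(() -> {
            lock.lock();   // acquired, then the thread is "frozen" forever
            try { Thread.sleep(Long.MAX_VALUE); } catch (InterruptedException ignored) {}
        });
        frozenHolder.setDaemon(true);
        frozenHolder.start();
        while (!lock.isLocked()) Thread.onSpinWait();  // wait until it owns the lock

        // A real hook would call lock.lock() here and deadlock;
        // the timed tryLock makes the failure observable instead.
        boolean acquired = lock.tryLock(200, TimeUnit.MILLISECONDS);
        System.out.println(acquired ? "acquired" : "hook would deadlock");
    }
}
```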


>
> Also, did you run any tests to see the performance impact of your
> changes to wait/notify? Do you also have to tweak higher-level
> synchronization such as java.util.concurrent.*?
>


The OpenJ9 locking code is quite different so results may vary.  I don't
recall any measurable impact due to the wait/notify changes as they already
tend to be slow paths.

I'm not aware of any changes to the j.u.c synchronization classes.  As I
said, whack-a-mole and I don't think this "mole" has poked its head up yet.


>
>
> >
> >
> > The problem with RCU is tracking which threads are in the critical
> > section. I've found RCU-like implementations for Java that avoid
> > excessive overhead using a spread-out array - each thread marks
> > entering/leaving the critical section by writes to its own counter,
> > preventing cache ping-pong (assuming no false sharing). The synchronizer
> > thread uses another flag to request synchronization; reading this by
> > each thread is not totally without cost but reasonably cheap, and in
> > that case worker threads can enter a blocking slow path. The simple
> > implementation assumes a fixed number of threads; if the set of threads
> > is dynamic the solution would probably be more complicated. It might
> > also make sense to implement this in native code with per-CPU
> > counters, rather than per-thread. A downside, besides some overhead in
> > terms of both cycles and memory usage, is that we'd need to modify the
> > code and explicitly mark the critical sections.
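A toy version of that scheme, as I understand it, might look like the following. All names are mine; it assumes a fixed number of worker slots and relies on Java's sequentially consistent volatile semantics so that either the reader sees the sync request or the synchronizer sees the reader's flag:

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// RCU-like guard: each worker flags entry/exit in its own padded slot;
// the checkpoint thread raises a flag and drains current readers.
class ReadMostlyGuard {
    private static final int PAD = 16;   // spread slots across cache lines
    private final AtomicIntegerArray inside;
    private volatile boolean syncRequested;

    ReadMostlyGuard(int maxThreads) {
        inside = new AtomicIntegerArray(maxThreads * PAD);
    }

    void enter(int slot) {
        while (true) {
            inside.set(slot * PAD, 1);
            if (!syncRequested) return;   // fast path: no C/R in progress
            inside.set(slot * PAD, 0);    // back off, take the slow path
            while (syncRequested) Thread.onSpinWait();
        }
    }

    void exit(int slot) {
        inside.set(slot * PAD, 0);
    }

    // Called by the checkpoint thread: block new readers, drain current ones.
    void synchronizeReaders() {
        syncRequested = true;
        for (int i = 0; i < inside.length(); i += PAD) {
            while (inside.get(i) != 0) Thread.onSpinWait();
        }
    }

    void release() {
        syncRequested = false;
    }
}
```

The slow path here spins rather than blocking; a production version would park the thread, and dynamic thread registration would complicate it further, as noted above.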
> >
> > Another solution could try to leverage existing JVM mechanics for code
> > deoptimization, replacing the critical sections with a slower, blocking
> > stub and reverting back after restore. Or even independently requesting
> > a safepoint and inspecting the stacks of threads until synchronization
> > is possible.
> >
> > This will have a high risk of livelock.  The OpenJ9 experience
> implementing single-threaded mode for CRIU indicates there are a lot of
> strange locking patterns in the world.
>
> There are weird patterns, but a more fine-grained exclusivity should be
> more permissive than single-threaded mode, which works as an implicit Big
> Fat Lock. Yes, any problems are probably much more easily reproducible
> with that one.
>
> >
> >
> > So I probably can't offer a ready-to-use performant solution; pick your
> > poison. The future, though, offers a few possibilities and I'd love to
> > hear others' opinions about which one would look the most feasible.
> > Because unless we offer something that does not harm the no-CRaC use
> > case, I am afraid that adoption will be quite limited.
> >
> > Successful solutions will push the costs into the checkpoint / restore
> paths as much as possible.  Going back to the explicit lock mechanism you
> first mentioned, I wonder if there's a role for
> java.lang.invoke.SwitchPoint [1] here?  SwitchPoint was added as a tool for
> language implementers that wanted to be able to speculate on a particular
> condition (i.e. CHA assumptions) and get the same kind of low-cost state
> change that existing JITted code gets.  I'm not sure how well that vision
> worked in practice or how well Hotspot optimizes it yet, but this might be
> a reason to push on its performance.
> >
> > Roughly the idea would be to add a couple of Switchpoints to
> jdk.crac.Core:
> >
> >     public SwitchPoint getBeforeSwitchPoint();
> >     public SwitchPoint getAfterSwitchPoint();
> >
> > and users could then write their code using MethodHandles to
> implement the branching logic:
> >
> >      MethodHandle normalPath = ...... // existing code
> >      MethodHandle fallbackPath = ..... // before-checkpoint extra work
> >      MethodHandle guarded =
> getBeforeSwitchPoint().guardWithTest(normalPath, fallbackPath);
> >
> > and the jdk.crac.Core class would invalidate the "before" SwitchPoint
> prior to the checkpoint and the "after" one after the restore.  Aside from
> the painful programming model, this might give us the tools we need to
> make it performant.
> >
> > Needs more exploration and prototyping but would provide a potential
> path to reasonable performance by burying the extra locking in the fallback
> paths.  And it would be a single pattern to optimize, rather than all the
> variations users could produce.
>
> Well noted; I admit that I haven't heard about SwitchPoint before, I'll
> need some time to digest it and maybe write a JMH test to see. However,
> from a first look it is not something that would be too convenient for
> users; I would consider some form of interception introduced through
> annotations (yes, I come from the EE world).
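To show the SwitchPoint plumbing end to end, here is a runnable sketch. The class and method names are illustrative only; jdk.crac.Core has no such hooks today, so the invalidation call marks where the checkpoint would trigger it:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

// The guarded handle runs the fast path until the SwitchPoint is
// invalidated, then falls back to the slow (locking) path forever after.
public class SwitchPointDemo {
    static String normalPath()   { return "fast"; }
    static String fallbackPath() { return "slow"; }  // would take the C/R lock

    static MethodHandle makeGuarded(SwitchPoint sp) throws Exception {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType mt = MethodType.methodType(String.class);
        return sp.guardWithTest(
            lookup.findStatic(SwitchPointDemo.class, "normalPath", mt),
            lookup.findStatic(SwitchPointDemo.class, "fallbackPath", mt));
    }

    public static void main(String[] args) throws Throwable {
        SwitchPoint beforeCheckpoint = new SwitchPoint();
        MethodHandle guarded = makeGuarded(beforeCheckpoint);

        System.out.println((String) guarded.invokeExact());  // prints "fast"

        // jdk.crac.Core would do this just before taking the checkpoint:
        SwitchPoint.invalidateAll(new SwitchPoint[] { beforeCheckpoint });

        System.out.println((String) guarded.invokeExact());  // prints "slow"
    }
}
```

Note that a SwitchPoint is one-shot: after invalidation the restore path would have to install a fresh SwitchPoint and a freshly guarded handle, which is part of what makes the programming model painful.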
>

Not a fan of annotations in the long term.  They are hard to maintain as
the underlying implementation changes / is refactored.  Even something as
straightforward as the @CallerSensitive annotation is tough to maintain.
These would be a lot harder as they would apply in many more circumstances.

--Dan


>
> Thanks!
>
> Radim
>
>
> >
> > --Dan
> >
> >
> > [0]
> https://blog.openj9.org/2022/10/14/openj9-criu-support-a-look-under-the-hood/
> > [1]
> https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/invoke/SwitchPoint.html
> >
> > Cheers,
> >
> > Radim
> >
> > [1] https://en.wikipedia.org/wiki/Read-copy-update
> >
> > On 03. 04. 23 22:30, Christian Tzolov wrote:
> >> Hi, I'm testing CRaC in the context of long-running applications (e.g.
> streaming, continuous processing ...) and I've stumbled on an issue related
> to the coordination of the restored threads.
> >>
> >> For example, let's have a Processor that performs continuous
> computations. This processor depends on a ProcessorContext, and the latter
> must be fully initialized before the processor can process any data.
> >>
> >> When the application is first started (i.e. not from a checkpoint) it
> ensures that the ProcessorContext is initialized before starting the
> Processor loop.
> >>
> >> To leverage CRaC I've implemented a ProcessorContextResource that
> gracefully stops the context on beforeCheckpoint and then re-initializes
> it on afterRestore.
> >>
> >> When the checkpoint is performed, CRaC calls
> ProcessorContextResource.beforeCheckpoint and also preserves the current
> Processor call stack. On restore, the processor's call stack is, as
> expected, restored at the point where it was stopped, but unfortunately it
> doesn't wait for ProcessorContextResource.afterRestore to complete. This
> expectedly crashes the processor.
> >>
> >> The https://github.com/tzolov/crac-demo project illustrates this issue.
> The README explains how to reproduce it. The OUTPUT.md (
> https://github.com/tzolov/crac-demo/blob/main/OUTPUT.md ) offers terminal
> snapshots of the observed behavior.
> >>
> >> I've used the latest JDK CRaC release:
> >>     openjdk 17-crac 2021-09-14
> >>     OpenJDK Runtime Environment (build 17-crac+5-19)
> >>     OpenJDK 64-Bit Server VM (build 17-crac+5-19, mixed mode, sharing)
> >>
> >> As I'm new to CRaC, I'd appreciate your thoughts on this issue.
> >>
> >> Cheers,
> >> Christian
> >>
> >>
> >>
> >>
> >
>
>


More information about the crac-dev mailing list