On restore the "main" thread is started before the Resource's afterRestore has completed

Radim Vansa rvansa at azul.com
Tue Apr 4 06:48:44 UTC 2023


Hi Christian,

I believe this is a common problem when porting an existing architecture 
to CRaC; the obvious solution is to guard access to the resource 
(ProcessorContext in this case) with a RW lock that is read-acquired 
by 'regular' access, write-acquired in beforeCheckpoint and released 
in afterRestore. However, this introduces extra synchronization (at least 
in the form of volatile writes) even when C/R is not used at all, 
which matters especially if the support is added into libraries.
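As a rough illustration of that RW-lock approach, here is a minimal sketch using only java.util.concurrent; the class and method names are hypothetical, and in real code beforeCheckpoint/afterRestore would be the CRaC Resource callbacks (which are invoked on the same thread, so the write lock's ownership requirement holds):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical guard: regular users wrap each use of the context in the
// read lock; the C/R thread holds the write lock across checkpoint/restore.
class GuardedContext {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private volatile boolean initialized = true;

    // 'Regular' access path: read-acquire around every use of the resource.
    <T> T withContext(Supplier<T> action) {
        lock.readLock().lock();
        try {
            if (!initialized) throw new IllegalStateException("context is down");
            return action.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    // Would be called from Resource.beforeCheckpoint in real code.
    void beforeCheckpoint() {
        lock.writeLock().lock();   // blocks until no reader is inside
        initialized = false;       // deinit the underlying resource here
    }

    // Would be called from Resource.afterRestore, on the same thread
    // (ReentrantReadWriteLock must be released by its owner).
    void afterRestore() {
        initialized = true;        // re-init the underlying resource here
        lock.writeLock().unlock();
    }
}
```

The cost is exactly the one described above: every regular access pays for the read lock even if a checkpoint never happens.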

Anton Kozlov proposed techniques like RCU [1], but at this point there is 
no support for this in Java. Even the Linux implementation may require 
additional properties from the code in the critical (read) section, such 
as not calling any blocking code; this might be too limiting.

The situation is simpler if the application uses a single-threaded 
event loop: beforeCheckpoint can enqueue a task that would, upon its 
execution, block on a primitive and notify the C/R notification thread 
that it may now deinit the resource; in afterRestore the resource is 
re-initialized and the event loop is unblocked. This way we don't impose 
any extra overhead unless C/R actually happens.
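The handshake above can be sketched with two latches; this is a minimal, hypothetical illustration (class and method names are invented, and the two methods stand in for the CRaC Resource callbacks):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical single-threaded event loop parked around checkpoint/restore.
class EventLoopParker {
    private final ExecutorService loop = Executors.newSingleThreadExecutor();
    private CountDownLatch parked;   // loop has stopped touching the resource
    private CountDownLatch resume;   // restore finished, loop may continue

    // Would run from Resource.beforeCheckpoint: returns once the loop thread
    // is parked, so the resource can be safely deinitialized afterwards.
    void beforeCheckpoint() throws InterruptedException {
        parked = new CountDownLatch(1);
        resume = new CountDownLatch(1);
        loop.execute(() -> {
            parked.countDown();          // tell the C/R thread we are idle
            try {
                resume.await();          // block until afterRestore
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        parked.await();                  // now safe to deinit the resource
    }

    // Would run from Resource.afterRestore: re-init first, then unblock.
    void afterRestore() {
        resume.countDown();
    }

    void submit(Runnable task) { loop.execute(task); }
    void shutdown() { loop.shutdown(); }
}
```

Note that no synchronization is touched on the regular task path; the latches only come into play around an actual checkpoint.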

To avoid extra synchronization it would technically be possible to 
modify the CRaC implementation to keep all other threads frozen during 
restore. There's a risk of some form of deadlock if the thread 
performing C/R requires other threads to make progress, though, so any 
such solution would need extra thought. Besides, this does not 
guarantee exclusivity, so afterRestore would need to restore the 
resource to the *exact* same state (as some of its before-checkpoint 
state might have leaked to the thread in Processor). In my opinion this 
is not the best way.

The problem with RCU is tracking which threads are in the critical 
section. I've found RCU-like implementations for Java that avoid 
excessive overhead using a spread-out array: each thread marks 
entering/leaving the critical section by writes to its own counter, 
preventing cache ping-pong (assuming no false sharing). The synchronizer 
thread uses another flag to request synchronization; reading this flag 
in each thread is not entirely free but reasonably cheap, and when it is 
set the worker threads can enter a blocking slow path. The simple 
implementation assumes a fixed number of threads; if the set of threads 
is dynamic the solution would probably be more complicated. It might 
also make sense to implement this in native code with per-CPU 
counters rather than per-thread ones. A downside, besides some overhead 
in terms of both cycles and memory usage, is that we'd need to modify 
the code and explicitly mark the critical sections.
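To make the scheme concrete, here is a minimal sketch under the stated assumptions (fixed number of threads, explicitly marked critical sections); the class and method names are invented, and it deliberately omits the padding needed to actually avoid false sharing:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical RCU-like guard. Each worker bumps only its own counter
// (odd = inside the critical section), so readers never contend on a
// shared cache line; the synchronizer flips one flag and waits.
class RcuLikeGuard {
    private final AtomicLong[] epochs;
    private volatile boolean syncRequested;

    RcuLikeGuard(int threads) {
        epochs = new AtomicLong[threads];
        for (int i = 0; i < threads; i++) epochs[i] = new AtomicLong();
    }

    // Worker thread 'tid' enters its critical section.
    void enter(int tid) {
        for (;;) {
            while (syncRequested) LockSupport.parkNanos(1_000); // slow path
            epochs[tid].incrementAndGet();     // tentatively enter (odd)
            if (!syncRequested) return;        // synchronizer will see us
            epochs[tid].incrementAndGet();     // raced with sync: back out
        }
    }

    void exit(int tid) {
        epochs[tid].incrementAndGet();         // even again: outside
    }

    // Synchronizer (e.g. beforeCheckpoint): block new entries, then wait
    // until every thread is outside its critical section.
    void synchronize() {
        syncRequested = true;
        for (AtomicLong e : epochs) {
            while ((e.get() & 1) == 1) {       // odd: still inside
                LockSupport.parkNanos(1_000);
            }
        }
    }

    void release() { syncRequested = false; }  // e.g. afterRestore
}
```

The enter/back-out dance handles the race where a worker increments its counter just as the synchronizer raises the flag; a production version would also need per-counter padding and probably a fairer wakeup than parkNanos.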

Another solution could try to leverage the existing JVM mechanics for 
code deoptimization, replacing the critical sections with a slower, 
blocking stub and reverting back after restore. Or even independently 
requesting a safepoint and inspecting the stacks of threads until the 
synchronization is possible.

So I probably can't offer a ready-to-use performant solution; pick your 
poison. The future, though, offers a few possibilities, and I'd love to 
hear others' opinions about which one looks the most feasible. 
Unless we offer something that does not harm the no-CRaC use-case, 
I am afraid adoption will be quite limited.

Cheers,

Radim

[1] https://en.wikipedia.org/wiki/Read-copy-update

On 03. 04. 23 22:30, Christian Tzolov wrote:
> Hi, I'm testing CRaC in the context of long-running applications (e.g. streaming, continuous processing ...) and I've stumbled on an issue related to the coordination of the involved threads.
>
> For example, let's have a Processor that performs continuous computations. This processor depends on a ProcessorContext, and the latter must be fully initialized before the processor can process any data.
>
> When the application is first started (i.e. not from a checkpoint) it ensures that the ProcessorContext is initialized before starting the Processor loop.
>
> To leverage CRaC I've implemented a ProcessorContextResource that gracefully stops the context on beforeCheckpoint and then re-initializes it on afterRestore.
>
> When the checkpoint is performed, CRaC calls ProcessorContextResource.beforeCheckpoint and also preserves the current Processor call stack. On restore the processor's call stack is, as expected, restored at the point where it was stopped, but unfortunately it doesn't wait for ProcessorContextResource.afterRestore to complete. This expectedly crashes the processor.
>
> The https://github.com/tzolov/crac-demo repository illustrates this issue. The README explains how to reproduce it, and OUTPUT.md (https://github.com/tzolov/crac-demo/blob/main/OUTPUT.md ) offers terminal snapshots of the observed behavior.
>
> I've used the latest JDK CRaC release:
>    openjdk 17-crac 2021-09-14
>    OpenJDK Runtime Environment (build 17-crac+5-19)
>    OpenJDK 64-Bit Server VM (build 17-crac+5-19, mixed mode, sharing)
>
> As I'm new to CRaC, I'd appreciate your thoughts on this issue.
>
> Cheers,
> Christian

