On restore the "main" thread is started before the Resource's afterRestore has completed

Dan Heidinga heidinga at redhat.com
Thu Apr 13 13:20:41 UTC 2023


>
>> @Dan, this is very interesting!
>> Could you please elaborate a bit further. Perhaps in the context of the
>> CrackDemoExt.java sample?
>>
>
> Let me think on that.  I'll see if I can pull something together that
> shows the api use.
>

I put together a small example showing the use of SwitchPoint to toggle
between phases: normal mode, beforeCheckpoint, afterRestore, normal mode.
[0]

In the CRaCPhase class, there are two methods that take Function arguments
that allow the user to provide phase-specific behaviour:
* beforeGuard, which allows switching from normal mode to checkpoint mode:
https://github.com/DanHeidinga/SwitchPointExample/blob/b09fdb2a5d203950abc9de4facbd1435585bf3af/CRaCPhase.java#L15

* aroundGuard, which allows switching from normal mode to checkpoint mode
and back to normal mode:
https://github.com/DanHeidinga/SwitchPointExample/blob/b09fdb2a5d203950abc9de4facbd1435585bf3af/CRaCPhase.java#L28
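
For readers who have not used SwitchPoint before, here is a minimal sketch of
the one-way switch underneath something like beforeGuard. It is not the code
from the repository above: the class name PhaseSwitchSketch and the methods
normalMode, checkpointMode, apply and enterCheckpointMode are made up for
illustration. Only SwitchPoint.guardWithTest and SwitchPoint.invalidateAll are
real java.lang.invoke API.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

// Hypothetical sketch: one SwitchPoint guards a fast path and a fallback.
public class PhaseSwitchSketch {
    private final SwitchPoint switchPoint = new SwitchPoint();
    private final MethodHandle guarded;

    public PhaseSwitchSketch() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType type = MethodType.methodType(String.class, String.class);
            MethodHandle normal =
                    lookup.findStatic(PhaseSwitchSketch.class, "normalMode", type);
            MethodHandle checkpoint =
                    lookup.findStatic(PhaseSwitchSketch.class, "checkpointMode", type);
            // While the SwitchPoint is valid, calls go to 'normal';
            // once it is invalidated, they go to 'checkpoint'.
            guarded = switchPoint.guardWithTest(normal, checkpoint);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Phase-specific behaviour (stand-ins for the user-supplied functions).
    static String normalMode(String input) {
        return "normal:" + input;
    }

    static String checkpointMode(String input) {
        return "checkpoint:" + input;
    }

    // Wraps invokeExact so callers do not have to deal with Throwable.
    public String apply(String input) {
        try {
            return (String) guarded.invokeExact(input);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Would be called from beforeCheckpoint. Invalidation is permanent:
    // a SwitchPoint cannot be made valid again.
    public void enterCheckpointMode() {
        SwitchPoint.invalidateAll(new SwitchPoint[] { switchPoint });
    }

    public static void main(String[] args) {
        PhaseSwitchSketch sketch = new PhaseSwitchSketch();
        System.out.println(sketch.apply("x"));
        sketch.enterCheckpointMode();
        System.out.println(sketch.apply("x"));
    }
}
```

Because an invalidated SwitchPoint stays invalidated, going back to normal
mode (the aroundGuard case) needs a fresh SwitchPoint and a new guarded handle.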

There's a use of this pattern in the "Test" class [1] which transitions
from a regular get to a locked get.
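
To make the "regular get to locked get" idea concrete, here is a rough sketch
under my own assumptions; it is not the Test class from the repository. The
class GuardedValue and all of its method names are hypothetical, and the
beforeCheckpoint/afterRestore methods are plain methods that a real
org.crac.Resource would delegate to. It also assumes both callbacks run on the
same thread, since a ReentrantLock must be released by the thread that took it.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.lang.invoke.SwitchPoint;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: readers use a plain get in normal mode and a locked
// get while a checkpoint/restore is in progress.
public class GuardedValue {
    private final ReentrantLock lock = new ReentrantLock();
    private final MutableCallSite site =
            new MutableCallSite(MethodType.methodType(String.class));
    private final MethodHandle invoker = site.dynamicInvoker();
    private final MethodHandle plain = bind("plainGet");
    private final MethodHandle locked = bind("lockedGet");
    private volatile SwitchPoint switchPoint;
    private volatile String value = "initial";

    public GuardedValue() {
        arm();
    }

    private MethodHandle bind(String name) {
        try {
            return MethodHandles.lookup()
                    .findVirtual(GuardedValue.class, name,
                            MethodType.methodType(String.class))
                    .bindTo(this);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Installs a fresh SwitchPoint, since an invalidated one cannot be reused.
    private void arm() {
        switchPoint = new SwitchPoint();
        site.setTarget(switchPoint.guardWithTest(plain, locked));
    }

    private String plainGet() {
        return value;
    }

    // Fallback path: blocks while the checkpointing thread holds the lock.
    private String lockedGet() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }

    public String get() {
        try {
            return (String) invoker.invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public boolean usingFastPath() {
        return !switchPoint.hasBeenInvalidated();
    }

    // Take the lock first, then push all new readers onto the locked path.
    public void beforeCheckpoint() {
        lock.lock();
        SwitchPoint.invalidateAll(new SwitchPoint[] { switchPoint });
    }

    // Re-initialize, re-arm the fast path, then release waiting readers.
    public void afterRestore(String restoredValue) {
        value = restoredValue;
        arm();
        lock.unlock();
    }
}
```

With this shape, a reader from another thread that calls get() between
beforeCheckpoint and afterRestore waits until the restore work is done instead
of seeing a half-initialized state. The sketch does not handle readers that
were already inside plainGet when the SwitchPoint was invalidated.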

The ideas are all there, though the code is a little unpleasant to work with
due to the exception handling and general complexity of MethodHandles.

Radim has an RCU lock that uses SwitchPoints as well, though his API appears
to be more pleasant for users: https://github.com/openjdk/crac/pull/58/files


[0]
https://github.com/DanHeidinga/SwitchPointExample/blob/main/CRaCPhase.java
[1]
https://github.com/DanHeidinga/SwitchPointExample/blob/b09fdb2a5d203950abc9de4facbd1435585bf3af/CRaCPhase.java#L114-L140


>
> --Dan
>
>
>>
>>
>>>
>>> Needs more exploration and prototyping but would provide a potential
>>> path to reasonable performance by burying the extra locking in the fallback
>>> paths.  And it would be a single pattern to optimize, rather than all the
>>> variations users could produce.
>>> --Dan
>>> [0]
>>> https://blog.openj9.org/2022/10/14/openj9-criu-support-a-look-under-the-hood/
>>> [1]
>>> https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/invoke/SwitchPoint.html
>>>
>>
>> Thank you,
>>  - Christian
>>
>>
>>
>>>
>>>> Cheers,
>>>>
>>>> Radim
>>>>
>>>> [1] https://en.wikipedia.org/wiki/Read-copy-update
>>>>
>>>> On 03. 04. 23 22:30, Christian Tzolov wrote:
>>>> > Hi, I'm testing CRaC in the context of long-running applications
>>>> (e.g. streaming, continuous processing ...) and I've stumbled on an issue
>>>> related to the coordination of the restored threads.
>>>> >
>>>> > For example, let's have a Processor that performs continuous
>>>> computations. This processor depends on a ProcessorContext, and the latter
>>>> must be fully initialized before the processor can process any data.
>>>> >
>>>> > When the application is first started (i.e. not from a checkpoint) it
>>>> ensures that the ProcessorContext is initialized before starting the
>>>> Processor loop.
>>>> >
>>>> > To leverage CRaC I've implemented a ProcessorContextResource that
>>>> gracefully stops the context on beforeCheckpoint and then re-initializes it
>>>> on afterRestore.
>>>> >
>>>> > When the checkpoint is performed, CRaC calls
>>>> ProcessorContextResource.beforeCheckpoint and also preserves the current
>>>> Processor call stack. On restore, the processor's call stack is, as expected,
>>>> restored at the point where it was stopped, but unfortunately it doesn't wait
>>>> for ProcessorContextResource.afterRestore to complete. This predictably
>>>> crashes the processor.
>>>> >
>>>> > The https://github.com/tzolov/crac-demo illustrates this issue. The
>>>> README explains how to reproduce the issue. The OUTPUT.md (
>>>> https://github.com/tzolov/crac-demo/blob/main/OUTPUT.md ) offers
>>>> terminal snapshots of the observed behavior.
>>>> >
>>>> > I've used the latest JDK CRaC release:
>>>> >    openjdk 17-crac 2021-09-14
>>>> >    OpenJDK Runtime Environment (build 17-crac+5-19)
>>>> >    OpenJDK 64-Bit Server VM (build 17-crac+5-19, mixed mode, sharing)
>>>> >
>>>> > As I'm new to CRaC, I'd appreciate your thoughts on this issue.
>>>> >
>>>> > Cheers,
>>>> > Christian
>>>> >
>>>> >
>>>> >
>>>> >
>>>>
>>>>