On restore the "main" thread is started before the Resource's afterRestore has completed

Radim Vansa rvansa at azul.com
Thu Apr 13 14:20:22 UTC 2023


On 13. 04. 23 15:20, Dan Heidinga wrote:
>
>
>
> @Dan, this is very interesting!
> Could you please elaborate a bit further. Perhaps in the context of the CrackDemoExt.java sample?
>
> Let me think on that.  I'll see if I can pull something together that shows the api use.
>
> I put together a small example showing the use of SwitchPoint to toggle between phases: normal mode, beforeCheckpoint, afterRestore, normal mode. [0]
>
> In the CRaCPhase class, there are two methods that take Function arguments that allow the user to provide phase-specific behaviour:
> * beforeGuard, which allows switching from normal mode to checkpoint mode: https://github.com/DanHeidinga/SwitchPointExample/blob/b09fdb2a5d203950abc9de4facbd1435585bf3af/CRaCPhase.java#L15
>
> * aroundGuard, which allows switching from normal mode to checkpoint mode and back to normal mode: https://github.com/DanHeidinga/SwitchPointExample/blob/b09fdb2a5d203950abc9de4facbd1435585bf3af/CRaCPhase.java#L28
>
> There's a use of this pattern in the "Test" class [1] which transitions from a regular get to a locked get.
>
> The ideas are all there though the code is a little unpleasant to work with due to the exception handling and general complexity of MethodHandles.
>
> Radim has an RCU lock that uses SwitchPoints as well, though his API appears to be more pleasant for users: https://github.com/openjdk/crac/pull/58/files


I think that it's not only about nicer API; I think that your example 
does not prevent running Test.getSpecialValueRaw() and resource 
beforeCheckpoint/afterRestore concurrently - if one of the threads 
enters the Test.getSpecialValueRaw method there's nothing that would 
prevent calling beforeCheckpoint(). In other words, you'd need the 
special single-threaded mode.
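
To make the concern concrete, here is a minimal sketch of one way to exclude readers while the checkpoint callbacks run. Everything in it is hypothetical: GuardedValue is an illustrative class, its beforeCheckpoint/afterRestore methods only mirror the Resource callbacks and do not depend on the CRaC API, and it is neither the code from the SwitchPointExample repository nor the RCU lock from the PR. It also assumes both callbacks are invoked on the same thread, since a ReentrantReadWriteLock write lock must be released by the thread that acquired it.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration: readers hold the read lock for the whole call,
// so beforeCheckpoint() cannot start while a reader is still inside get().
final class GuardedValue {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int value = 42; // stands in for state torn down at checkpoint

    int get() {
        lock.readLock().lock();
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Assumption: called on the same thread that later calls afterRestore().
    void beforeCheckpoint() {
        lock.writeLock().lock(); // blocks until in-flight readers have left
        value = -1;              // "release" the resource
    }

    void afterRestore() {
        value = 42;                // "re-initialize" the resource
        lock.writeLock().unlock(); // readers may enter again
    }

    boolean isCheckpointing() {
        return lock.isWriteLocked();
    }
}
```

This puts a lock acquisition on every read, which is the cost the SwitchPoint and RCU approaches try to keep off the fast path; it is shown only to illustrate the exclusion that a guard check alone does not give.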

While I've also used SwitchPoint as you suggested in my PR, can you tell 
me what the difference is between just reading a volatile variable (and 
deciding based on the value) and using this class? It seems that it's 
used mostly in scripting support, so I could imagine the utility of 
generating a compact MethodHandle, but is there really any magic?
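
For reference, the two alternatives being compared could be sketched side by side as below. The class and method names (SwitchPointVsVolatile, fast, slow, enterCheckpointMode) are made up for illustration; only SwitchPoint.guardWithTest, SwitchPoint.invalidateAll and the MethodHandles lookup calls are real JDK API. The sketch shows that both variants select the same branch; it does not measure or demonstrate any performance difference.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

// Hypothetical illustration of the two ways to pick a code path.
final class SwitchPointVsVolatile {
    static String fast() { return "fast"; } // normal-mode path
    static String slow() { return "slow"; } // checkpoint-mode fallback

    // Variant 1: read a volatile flag on every call and branch on it.
    private static volatile boolean checkpointing = false;

    static String viaVolatile() {
        return checkpointing ? slow() : fast();
    }

    // Variant 2: a SwitchPoint-guarded MethodHandle; the target is used
    // while the SwitchPoint is valid, the fallback after invalidation.
    private static final SwitchPoint SP = new SwitchPoint();
    private static final MethodHandle GUARDED;
    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType type = MethodType.methodType(String.class);
            MethodHandle fast = lookup.findStatic(SwitchPointVsVolatile.class, "fast", type);
            MethodHandle slow = lookup.findStatic(SwitchPointVsVolatile.class, "slow", type);
            GUARDED = SP.guardWithTest(fast, slow);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static String viaSwitchPoint() {
        try {
            return (String) GUARDED.invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Flip both variants to the slow path.
    static void enterCheckpointMode() {
        checkpointing = true;
        SwitchPoint.invalidateAll(new SwitchPoint[] { SP });
    }
}
```

One difference visible at the API level: the volatile flag can be set back to false, while invalidating a SwitchPoint cannot be reversed, so returning to the fast path requires building a new SwitchPoint and a new guarded handle.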

Radim


>
>
> [0] https://github.com/DanHeidinga/SwitchPointExample/blob/main/CRaCPhase.java
> [1] https://github.com/DanHeidinga/SwitchPointExample/blob/b09fdb2a5d203950abc9de4facbd1435585bf3af/CRaCPhase.java#L114-L140
>
>
> --Dan
>
>
>
> Needs more exploration and prototyping but would provide a potential path to reasonable performance by burying the extra locking in the fallback paths.  And it would be a single pattern to optimize, rather than all the variations users could produce.
> --Dan
> [0] https://blog.openj9.org/2022/10/14/openj9-criu-support-a-look-under-the-hood/
> [1] https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/invoke/SwitchPoint.html
>
> Thank you,
>   - Christian
>
>
>
> Cheers,
>
> Radim
>
> [1] https://en.wikipedia.org/wiki/Read-copy-update
>
> On 03. 04. 23 22:30, Christian Tzolov wrote:
>> Hi, I'm testing CRaC in the context of long-running applications (e.g. streaming, continuous processing ...) and I've stumbled on an issue related to the coordination of the restored threads.
>>
>> For example, take a Processor that performs continuous computations. This processor depends on a ProcessorContext, and the latter must be fully initialized before the processor can process any data.
>>
>> When the application is first started (i.e. not from a checkpoint) it ensures that the ProcessorContext is initialized before starting the Processor loop.
>>
>> To leverage CRaC I've implemented a ProcessorContextResource that gracefully stops the context in beforeCheckpoint and then re-initializes it in afterRestore.
>>
>> When the checkpoint is performed, CRaC calls ProcessorContextResource.beforeCheckpoint and also preserves the current Processor call stack. On restore, the processor's call stack is restored, as expected, at the point where it was stopped, but unfortunately it doesn't wait for ProcessorContextResource.afterRestore to complete. Unsurprisingly, this crashes the processor.
>>
>> The https://github.com/tzolov/crac-demo project illustrates this issue. The README explains how to reproduce it. The OUTPUT.md (https://github.com/tzolov/crac-demo/blob/main/OUTPUT.md) offers terminal snapshots of the observed behavior.
>>
>> I've used latest JDK CRaC release:
>>     openjdk 17-crac 2021-09-14
>>     OpenJDK Runtime Environment (build 17-crac+5-19)
>>     OpenJDK 64-Bit Server VM (build 17-crac+5-19, mixed mode, sharing)
>>
>> As I'm new to CRaC, I'd appreciate your thoughts on this issue.
>>
>> Cheers,
>> Christian
>>
>>
>>
>>
>