On restore the "main" thread is started before the Resource's afterRestore has completed

Christian Tzolov christian.tzolov at gmail.com
Wed Apr 5 12:05:50 UTC 2023


Hi Radim,

(Unfortunately, the mailing list delivered neither the original message
nor the reply, so I'm reposting with "Re" instead.)

Distributed process synchronization is a hard topic and, IMO, CRaC should
at least offer some help with it. For example, if the primary goal is to
provide a functional, warmed-up clone of the application (as opposed to
data replication), then perhaps we can relax data consistency as long as
the application and its components are up and running in the right order.

Then approaches such as:

> To avoid extra synchronization it could be technically possible to
> modify CRaC implementation to keep all other threads frozen during
> restore.
or
> Another solution could try to leverage existing JVM mechanics for code
> deoptimization, replacing the critical sections with a slower, blocking
> stub, and reverting back after restore. Or even independently requesting
> a safe-point and inspecting stack of threads until the synchronization
> is possible.

would be workable solutions, despite the inconsistencies they may
introduce.

I agree with your observation that:
> unless we offer something that does not harm a no-CRaC use-case
> I am afraid that the adoption will be quite limited.

I had wrongly assumed that CRaC provides some process synchronization
mechanism, while in reality it imposes a new programming model that leaves
the synchronization tasks to the end developers.
In that case, the complexity of using CRaC with existing applications would
be an order of magnitude higher than that of making the same applications
GraalVM compliant.
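
To make the burden concrete, here is a minimal sketch of the kind of
synchronization an application would have to add by hand so that its
"main" thread does not proceed before afterRestore has completed. It uses
only java.util.concurrent and is not part of the CRaC API: RestoreGate and
its methods close(), open() and awaitOpen() are hypothetical names. I am
assuming that a Resource would call close() in beforeCheckpoint and open()
as the last step of afterRestore.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper, not part of org.crac: a gate that application
// threads must pass before touching state that a Resource re-creates
// in afterRestore.
final class RestoreGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition reopened = lock.newCondition();
    private boolean closed = false;

    // Assumed to be called from Resource.beforeCheckpoint.
    void close() {
        lock.lock();
        try {
            closed = true;
        } finally {
            lock.unlock();
        }
    }

    // Assumed to be called as the last step of Resource.afterRestore.
    void open() {
        lock.lock();
        try {
            closed = false;
            reopened.signalAll(); // wake every thread parked in awaitOpen()
        } finally {
            lock.unlock();
        }
    }

    // Called by the "main" thread (or any worker) before each critical
    // section; blocks while the gate is closed.
    void awaitOpen() throws InterruptedException {
        lock.lock();
        try {
            while (closed) { // loop guards against spurious wakeups
                reopened.await();
            }
        } finally {
            lock.unlock();
        }
    }

    boolean isClosed() {
        lock.lock();
        try {
            return closed;
        } finally {
            lock.unlock();
        }
    }
}
```

Every code path that depends on the restored resource has to call
awaitOpen() first, and it takes the lock on each call even when no
checkpoint is ever made. That is exactly the overhead for the no-CRaC
use-case mentioned above.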

Cheers,
Christian