[crac] RFR: Fix ordering of invocation on Resources

Mon Apr 24 08:49:31 UTC 2023

On Mon, 24 Apr 2023 08:24:16 GMT, Anton Kozlov <akozlov at openjdk.org> wrote:

>> Right now we run the checkpoint for all resources even if some fail. Are you fine with erroring out on the first exception?
>
> Why is that necessary? If we are trying to avoid second afterRestore for sucessfully checkpointed Resources, that would mean that Context.afterRestore is called, that controverts the problem definition.
> 
> Assuming afterRestore is not called for the Resource with failed beforeCheckpoint. Suppose we have R1, R2 in a child Context, and R1.beforeCheckpoint succeed, and R2.beforeCheckpoint fails. We can call R1.afterRestore, before throwing from Ctx.beforeCheckpoint. That will restore successfully checkpointed resources.
> 
> But I checked the code and it seems that we do call afterRestore regardless of the result of beforeCheckpoint 
> 
> https://github.com/openjdk/crac/blob/95394e84683f1a816c0283f8c834072324516fba/src/java.base/share/classes/jdk/crac/impl/AbstractContextImpl.java#L54
> 
> Every Resource available is collected in List resources and then restore queue is exactly the reverse of that list, it does not matter if any of resources throwed CheckpointException. Right?

On regular resources `afterRestore` is not called if the resource throws, it does not end up in the restoreQ.

I am a bit worried about calling `R3.bC` that would live in a different context when `R1` is already restored. The components were called in certain order for a reason, and if R3 expect R1 to be checkpointed this could end up badly.
You might argue that R2 is not in checkpointed state anyway and R3 might expect it to be, but at least here one component failed and there is a track of consequences. On the other hand R1 and R3 might be completely unrelevant to R2 and the fact that R2 failed shouldn't interfere.

What is the point of continuing with the checkpoint when we now that it won't happen in the end, with all the suppressed exceptions in the first place? Just to collect all errors at once rather than one-by-one? Is it worth potential false positives?

-------------

PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1174972609