[crac] RFR: Fix ordering of invocation on Resources

Mon Apr 24 10:15:17 UTC 2023

On Mon, 24 Apr 2023 09:24:49 GMT, Anton Kozlov <akozlov at openjdk.org> wrote:

>> On regular resources `afterRestore` is not called if the resource throws, it does not end up in the restoreQ.
>> 
>> I am a bit worried about calling `R3.bC` that would live in a different context when `R1` is already restored. The components were called in certain order for a reason, and if R3 expect R1 to be checkpointed this could end up badly.
>> You might argue that R2 is not in checkpointed state anyway and R3 might expect it to be, but at least here one component failed and there is a track of consequences. On the other hand R1 and R3 might be completely unrelevant to R2 and the fact that R2 failed shouldn't interfere.
>> 
>> What is the point of continuing with the checkpoint when we now that it won't happen in the end, with all the suppressed exceptions in the first place? Just to collect all errors at once rather than one-by-one? Is it worth potential false positives?
>
>> On regular resources afterRestore is not called if the resource throws, it does not end up in the restoreQ.
> 
> I'm puzzled how this is possible in the implementation before the patch. The code says it should
> 
>         List<R> resources = checkpointQ. ... ;
>         for (Resource r : resources) { ... }
>         Collections.reverse(resources);
>         restoreQ = resources;
> 
> 
> We'll have to document the expected behavior, but that is another concern.
> 
>> I am a bit worried about calling R3.bC that would live in a different context when R1 is already restored.
> 
> This is a question of specification. If we set that, it will be adopted by rearranging Contextes and Resources, pretty manageable.
> 
> In general, resources should not have a lot of assumptions about other resources. Otherwise, they should directly refer to each other, rathen than relying on the Context. That will be more straightforward and less error-prone.
> 
>> What is the point of continuing with the checkpoint when we now that it won't happen in the end, with all the suppressed exceptions in the first place? Just to collect all errors at once rather than one-by-one? Is it worth potential false positives?
> 
> Correct, to run all registered code, that may also have some side-effects (unpreferable, but possible). Having that checkpoint won't happen anyway, false positive does not look a problem.

True, checking again before the patch it was really calling it on resources even after failure - I had "fixed" that probably way before and remembered it that way. When a code is throwing it is aware of that, so it should either revert automatically or be considered completely broken. I would bring an analogy with a locking code - `lock()` method is executed outside of the try-finally block so if it fails it should keep the lock unlocked.

The R1 and R3 issue - it is fixable, true, but it's the infra code that introduced the failure. User sees both R2 and R3 failures, decides to investigate R3 only to find that it was a red herring.

> resources should not have a lot of assumptions about other resources

JDK code is a direct contradiction to that. We shall not expect user code to be any different.

> false positive does not look a problem

Red herrings are UX problem. And it is a problem when the failure leaves the JVM in a state that does not allow inspection - not sure why but I had cases where neither `jstack` nor `kill -3` worked.

-------------

PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1175074513