[crac] RFR: Fix ordering of invocation on Resources

Radim Vansa duke at openjdk.org
Mon Apr 24 09:02:20 UTC 2023


On Mon, 24 Apr 2023 07:49:56 GMT, Anton Kozlov <akozlov at openjdk.org> wrote:

>> So, making `register` throw a checked exception wouldn't be a good solution; most of the time there wouldn't be a better handling than to log && rethrow.
>> 
>> What you propose - having a form of exception stack in the context and checking it anything was added after `beforeCheckpoint` could work, though this is definitely not a pattern that would be natural in Java - it smells of JNI developer :)
>> 
>> I think that a situation when the ISE is thrown is a programming error, rather than reacting on the state of 'outer world' (as in case of e.g. I/O...) and as such this can stay as unchecked exception, possibly leading to inconsistent state of the program.
>
> I admit that Resource is usually registered implicitly, and the current state (silently ignore newly registered Resources when checkpoint is in progress) is not adequate and may leads to some state leaking to the image, while that is not supposed to (if Resource would be registered some time before).
> 
> With unchecked ISE, the user will have to somehow ensure that registration likley being implicit, changing the code to break the abstraction above that Resource/registration.
> 
> With the check for new Resources, we'll detect exactly the situtaion user will likely like to know (if everything in the state ready to be stored to the image), and if not, a recovery will be trying again. Every Resource may throw  CheckpointException for their own reasons, and CheckpointException("timeout for waiting some critical clean up, try again") is expected to be something very possible and is expected.

Generally speaking, your proposal to try performing the checkpoint until it succeeds is more permissive, as users can back-off from a bad/fragile design through retrying, rather than being forced to correct the dependencies between components. In the past it seemed to me that you were more on the "good design required" rather than "practical" side of things - please don't take this as any form of "ad hominem" argument, I would just like to understand why you opt for such solution here. In my view a timeout/IO error are a totally different category of error.

-------------

PR Review Comment: https://git.openjdk.org/crac/pull/60#discussion_r1174988162


More information about the crac-dev mailing list