[crac] RFR: Introduce per-Priority Context with different policies [v3]
Anton Kozlov
akozlov at openjdk.org
Tue May 30 16:03:32 UTC 2023
On Tue, 30 May 2023 14:29:41 GMT, Radim Vansa <duke at openjdk.org> wrote:
>> It was burried along other things that PR was doing. After some thought [1] Resources are used for two different purposes: fix-up the state to ensure checkpoint won't fail later (close file descriptors, which otherwise will be detected later); and later we discovered some resources are ensuring properties of the image (e.g. cleaning secrets). The blocking behavior makes sense for the latter, and for real concurrent registration it uses the race to delay registration and thus the problem is resolved without user intervention. But it brings significant usability drawback with the deadlock with registration from the same thread when notification has been started. And this is IMO more frequent that a registration from another thread, which is waited for by the thread executing e.g. beforeCheckpoint. For the clean-up, blocking
>>
>> We should not impose such a big inconvenience for the more frequent use case, regardless how the error is detected or reported. Applications, in their turn, can implement a blocking context with necessary guarantees (or use some implementation, which we eventually expose, once we got this contextes shaked out).
>>
>> My take on blocking is preffered to an exception was in context of JDK implementation: we do not expose the problem via API, but auto-deadlocks is a bug in the JDK implementaiton.
>>
>> [1] https://github.com/openjdk/crac/pull/60#issuecomment-1545853147
>
> You won't know whether the resource registered in global context is to be invoked optionally or if it's mandatory; any checks like those for FDs is only a last line of defense and debugging help. Therefore not being executed before C/R is a problem the platform should detect, preventing C/R from happening, to be on the safe side. We should not choose the unsafe for default, because users likely won't find out that something fishy is happening without any 'inconvenience'. As you say, users can change the semantics in their own context once they know.
>
> I agree that deadlock is not the best way to report this, if you want to choice different means for the global context I am okay with that. We've talked about `register` throwing might not be the best option as the exception could be swallowed, exception from `Core.checkpointRestore` along with stack trace of the registration would be preferable.
Exactly, that's why existing global context cannot satisfy all requirements. I'm fine to continue this discussion, but as a part of separate discussion about the global context behavior, separated from the refactoring. OK, to make it more comfortable, let's use BlockingContext, once I allowed that to sleep in.
-------------
PR Review Comment: https://git.openjdk.org/crac/pull/74#discussion_r1210499871
More information about the crac-dev
mailing list