[crac] RFR: Introduce per-Priority Context with different policies [v2]

Anton Kozlov akozlov at openjdk.org
Tue May 30 12:39:25 UTC 2023


On Tue, 30 May 2023 06:20:24 GMT, Radim Vansa <duke at openjdk.org> wrote:

>> Having that we've met so many self-deadlocks after the blocking was introduced, it seems it will happen with the users who will run into the same problem with the global context, which we suggest by default. I'm still trying to untagle with consequences of #60, with this PR in particular. You can see the policy for the global context is going to be specified by a single line of code. Anyway sorry for reviews taking so long. You can simplify reviewer job by separating functional and non-functional changes. Rephrasing javadoc for readability is non functional, changing the meaning is functional.
>
> One of the points in #60 was to make user errors *visible*, rather than silently passing - that was one reason I pushed that PR hard. Now looking again into `OrderedContext` that's what this PR brings and I strongly disagree. There's two ways to cope with those errors in the order of registration I know of: either throw an error (and I don't mind if it's from the `Core.checkpointRestore` or straight from `register`) or block, and you've chosen the latter option, stating that the problem is clearly visible from the threaddump.

It was burried along other things that PR was doing. After some thought [1] Resources are used for two different purposes: fix-up the state to ensure checkpoint won't fail later (close file descriptors, which otherwise will be detected later); and later we discovered some resources are ensuring properties of the image (e.g. cleaning secrets). The blocking behavior makes sense for the latter, and for real concurrent registration it uses the race to delay registration and thus the problem is resolved without user intervention. But it brings significant usability drawback with the deadlock with registration from the same thread when notification has been started. And this is IMO more frequent that a registration from another thread, which is waited for by the thread executing e.g. beforeCheckpoint. For the clean-up, blocking 

We should not impose such a big inconvenience for the more frequent use case, regardless how the error is detected or reported. Applications, in their turn, can implement a blocking context with necessary guarantees (or use some implementation, which we eventually expose, once we got this contextes shaked out).

My take on blocking is preffered to an exception was in context of JDK implementation: we do not expose the problem via API, but auto-deadlocks is a bug in the JDK implementaiton.

[1] https://github.com/openjdk/crac/pull/60#issuecomment-1545853147

-------------

PR Review Comment: https://git.openjdk.org/crac/pull/74#discussion_r1210208633


More information about the crac-dev mailing list