RFR: 8334594: Generational ZGC: Deadlock after OopMap rewrites in 8331572
Zhengyu Gu
zgu at openjdk.org
Thu Jun 20 13:17:11 UTC 2024
On Thu, 20 Jun 2024 09:11:04 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
> As shown in the bug, there are cases when acquiring the `ServiceLock` for opportunistic notification leads to deadlock. We can untie the deadlock by checking whether `ServiceLock` can be acquired on the triggering path, and never blocking otherwise.
>
> Additional testing:
> - [ ] Linux x86_64 service fastdebug, `all`
> - [ ] Linux AArch64 service fastdebug, `all`
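My reading of the approach, as a rough sketch rather than the actual patch: the triggering path only tries to take the lock and skips the notification if the lock is busy. `ServiceNotifier`, `try_notify` and `work_pending` are made-up names, and `std::mutex` only stands in for `ServiceLock`; this is not the HotSpot `Monitor` API.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the service thread's lock and wakeup state.
struct ServiceNotifier {
    std::mutex lock;              // plays the role of ServiceLock
    std::condition_variable cv;   // what the service thread would wait on
    bool work_pending = false;

    // Opportunistic notification: returns true if the notification was
    // delivered, false if the lock was busy and the attempt was skipped.
    bool try_notify() {
        std::unique_lock<std::mutex> guard(lock, std::try_to_lock);
        if (!guard.owns_lock()) {
            return false;         // never block on the triggering path
        }
        work_pending = true;
        cv.notify_all();
        return true;
    }
};
```

Skipping is acceptable here only because the notification is opportunistic: a missed wakeup delays the cleanup until a later trigger, it does not lose work.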
LGTM
> This looks good as a direct fix to the bug. I agree though with the assessment that we should use a different lock for the queue going forward.
>
> It's also interesting to read the placement of these hooks. For the GC VM operation there is a comment saying that we probably just used the oop map cache so now is a good time to trigger cleanup.
>
> The same comment is present for ZGC and Shenandoah where we perform safepoints. But there the situation is typically the exact opposite of what the comment suggests. Since we perform concurrent root scanning, we are _just about to_ use the oop map cache, so the placement is probably the most unfortunate one. It seems like we clean the caches _just before_ using them instead of just after. But again, that seems like a follow-up thing.
The cleanup only touches already stale/evicted oop map entries, which are no longer accessible, so I don't see the placement issue.
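To illustrate the point with a simplified model (not the actual oop map cache code; `TinyCache`, `retired_` and `cleanup_retired` are made-up names): evicted entries move to a separate list that lookups never consult, so freeing that list does not change what a later lookup finds.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Entry { int key; int value; };

class TinyCache {
    std::array<std::unique_ptr<Entry>, 4> slots_;  // live entries, reachable by lookup
    std::vector<std::unique_ptr<Entry>> retired_;  // evicted entries, never looked up
public:
    // Insert an entry; whatever occupied the slot is retired, not freed.
    void put(int key, int value) {
        auto& slot = slots_[static_cast<std::size_t>(key) % slots_.size()];
        if (slot) {
            retired_.push_back(std::move(slot));
        }
        slot = std::make_unique<Entry>(Entry{key, value});
    }

    // Lookups only ever see the live slots.
    const Entry* lookup(int key) const {
        const auto& slot = slots_[static_cast<std::size_t>(key) % slots_.size()];
        return (slot && slot->key == key) ? slot.get() : nullptr;
    }

    // Frees retired entries only; returns how many were freed.
    std::size_t cleanup_retired() {
        std::size_t freed = retired_.size();
        retired_.clear();
        return freed;
    }
};
```

In this model, calling `cleanup_retired()` just before or just after a burst of lookups gives the same lookup results; only the moment at which memory is released differs.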
-------------
Marked as reviewed by zgu (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19800#pullrequestreview-2130401926
PR Comment: https://git.openjdk.org/jdk/pull/19800#issuecomment-2180657257