RFR: 8266963: Reentrance condition for safepoint/handshake

Fri May 14 20:04:04 UTC 2021

On Fri, 14 May 2021 08:01:12 GMT, Yude Lin <github.com+16811675+linade at openjdk.org> wrote:

> Shenandoah hangs when running the SPECjvm2008 derby benchmark. The reason is that a Java thread reenters safepoint/handshake processing and blocks on itself. Please check out the bug ID for more details. After discussion with @zhengyu123, we think this might not be Shenandoah-specific. I propose adding a check before processing the safepoint/handshake.
> 
> An alternative approach (also an insight from @zhengyu123) is to move the check a little earlier, to the specific place where the Java thread does ThreadBlockInVM. To be reassured that no further reentrance exists in other places, I still leave the check in safepoint/handshake as debug code. See https://github.com/openjdk/jdk/compare/master...linade:reentrancecond
> 
> I'd appreciate more of your thoughts on these, as I understand this could be a rather critical part of the code.

Hi Yude,

Comments about the issue below.

Thanks,
Patricio

src/hotspot/share/runtime/safepointMechanism.cpp line 157:

> 155:     // reentrance of the handshake mutex. We also don't need to do anything
> 156:     // because the process() routine will be retried after the handshake returns.
> 157:     return;

We cannot simply return here, because a safepoint could already be in progress once the thread has transitioned out of the blocked state. The handshake would then execute concurrently with the safepoint operation, which is not allowed.
We used to have a flag in HandshakeState to avoid these reentrant cases [1], but we removed it after adding the NoSafepointVerifier checks in handshake.cpp. I'm guessing this failure was seen with release bits; otherwise you should have hit the assert in check_possible_safepoint() in ThreadBlockInVM. So unless we also remove the NoSafepointVerifier checks in handshake.cpp, bringing that flag back would only solve this issue for release builds.
I think the question is then whether it is safe to poll for safepoints inside a handshake closure. Before stack watermarks there may have been no issues, but now I don't think it is safe. If ThreadA is executing a handshake on behalf of ThreadB and blocks in ThreadBlockInVM, a safepoint could happen. After resuming, I don't think it is safe for ThreadA to keep poking into ThreadB's stack before calling StackWatermarkSet::start_processing() on ThreadB. Maybe @fisk could confirm?
Note that the NoSafepointVerifier checks are also there to prevent requesting a VM operation inside the handshake, since that can deadlock too. So even if polling were fine, we would still need to keep checking for that (not necessarily with NoSafepointVerifier, though).
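
To make the debug-versus-release point concrete, here is a rough, self-contained C++ sketch of a debug-only "no safepoint" scope. It is a toy model, not HotSpot code: ToyThread, ToyNoSafepointScope, toy_poll_safepoint and the debug_checks field are hypothetical names invented for this illustration, and an exception stands in for a failed assert.

```cpp
#include <cassert>
#include <stdexcept>

// Toy model only. None of these names exist in HotSpot.
struct ToyThread {
  int  no_safepoint_depth = 0;  // > 0 while inside a "no safepoint" scope
  bool debug_checks = true;     // assumption: stands in for a debug build
  int  unchecked_polls = 0;     // polls inside a scope that nobody caught
};

// RAII scope marking a region in which polling for a safepoint is forbidden
// (in this model, the body of a handshake closure).
class ToyNoSafepointScope {
  ToyThread& _t;
 public:
  explicit ToyNoSafepointScope(ToyThread& t) : _t(t) { ++_t.no_safepoint_depth; }
  ~ToyNoSafepointScope() {
    assert(_t.no_safepoint_depth > 0);
    --_t.no_safepoint_depth;
  }
};

// Called wherever the thread might block, e.g. a blocking state transition.
void toy_poll_safepoint(ToyThread& t) {
  if (t.no_safepoint_depth > 0) {
    if (t.debug_checks) {
      // Debug build: the forbidden poll is reported immediately.
      throw std::logic_error("possible safepoint inside no-safepoint scope");
    }
    // Release build: the check is absent, so the poll goes ahead unnoticed.
    ++t.unchecked_polls;
  }
}
```

In this model a thread with debug_checks set fails as soon as it polls inside the scope, while a thread with debug_checks cleared silently continues. That is the sense in which a reentrancy flag alone would only hide the problem in release builds.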

[1] https://github.com/openjdk/jdk/blob/6d19fe65d1c1d8f41fd2f18afd09de134912505f/src/hotspot/share/runtime/handshake.hpp#L93

-------------

PR: https://git.openjdk.java.net/jdk/pull/4028