RFR(XS) 8252521: possible race in java_suspend_self_with_safepoint_check
David Holmes
david.holmes at oracle.com
Sun Sep 6 22:36:08 UTC 2020
Hi Dan,
On 5/09/2020 3:15 am, Daniel D. Daugherty wrote:
> Richard,
>
> Sorry for the late review. I know you have already pushed the fix.
>
> src/hotspot/share/runtime/thread.cpp
> L2625: } while (is_external_suspend());
> In the other places where we are checking for a racing suspend
> request, we check the is_external_suspend() condition while
> holding the SR_lock or we call is_external_suspend_with_lock().
>
> Here's the usual example I point people at:
>
> L2049: void JavaThread::exit(bool destroy_vm, ExitType
> exit_type) {
> <snip>
> L2121: while (true) {
> L2122: {
> L2123: MutexLocker ml(SR_lock(),
> Mutex::_no_safepoint_check_flag);
> L2124: if (!is_external_suspend()) {
>
> The JVM/TI SuspendThread() and JVM_SuspendThread() entry points
> call set_external_suspend() while holding the SR_lock so the
> only way to be sure you haven't lost the race is to hold the
> SR_lock while you're checking the flag yourself.
>
> Have I missed something in my analysis?
Holding the SR_lock while checking is_external_suspend doesn't really
achieve anything by itself - a racing suspend can come just before the
check or just after it. The SR_lock primarily ensures correct
synchronization for the actual suspension (when it waits on the SR_lock)
and resumption - and as per the exit code, it ensures there is no thread
termination race by setting is_exiting under the lock.
What we are racing with in this changeset are the
java_suspend()/JvmtiSuspendControl::suspend() calls which don't hold the
SR_lock. The race we have to avoid is the race where another thread
completes the safepoint/handshake operation which is supposed to ensure
the target is suspended, and the loop checking is_external_suspend()
achieves that.
You could argue that without the lock we may see a stale value for
is_external_suspend() but that is not possible when a
safepoint/handshake has been issued as we have full memory
synchronization between threads in that case.
Put another way, when the target thread returns from the
safepoint/handshake and sees is_external_suspend() then it knows there
is another suspend request in progress, and it will honour it (and any
racing resume is handled in java_suspend_self()). If it doesn't see
is_external_suspend() set then any racing suspend request can't have
initiated the safepoint/handshake yet and so that suspend request will
be seen the next time the target does a safepoint/handshake poll.
Cheers,
David
-----
> Dan
>
>
>
> On 9/2/20 11:15 AM, Reingruber, Richard wrote:
>> Hi,
>>
>> please help review this fix for a race condition in
>> JavaThread::java_suspend_self_with_safepoint_check() that allows a
>> suspended
>> thread to continue executing java for an arbitrary long time (see
>> repro test
>> attached to bug report).
>>
>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8252521/webrev.0/
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8252521
>>
>> The fix is to add a do-while-loop to
>> java_suspend_self_with_safepoint_check()
>> that checks if the current thread was suspended again after returning
>> from
>> java_suspend_self() and before restoring the original thread state.
>> The check is
>> performed after restoring the original state because then we are
>> guaranteed to
>> see the suspend request issued before the requester observed that
>> target to be
>> _thread_blocked and executed VM_ThreadSuspend.
>>
>> Thanks, Richard.
>
More information about the hotspot-runtime-dev
mailing list