RFR(XS) 8252521: possible race in java_suspend_self_with_safepoint_check
David Holmes
david.holmes at oracle.com
Sun Sep 6 22:58:59 UTC 2020
Clarification ...
On 7/09/2020 8:36 am, David Holmes wrote:
> Hi Dan,
>
> On 5/09/2020 3:15 am, Daniel D. Daugherty wrote:
>> Richard,
>>
>> Sorry for the late review. I know you have already pushed the fix.
>>
>> src/hotspot/share/runtime/thread.cpp
>> L2625: } while (is_external_suspend());
>> In the other places where we are checking for a racing suspend
>> request, we check the is_external_suspend() condition while
>> holding the SR_lock or we call is_external_suspend_with_lock().
>>
>> Here's the usual example I point people at:
>>
>> L2049: void JavaThread::exit(bool destroy_vm, ExitType
>> exit_type) {
>> <snip>
>> L2121: while (true) {
>> L2122: {
>> L2123: MutexLocker ml(SR_lock(),
>> Mutex::_no_safepoint_check_flag);
>> L2124: if (!is_external_suspend()) {
>>
>> The JVM/TI SuspendThread() and JVM_SuspendThread() entry points
>> call set_external_suspend() while holding the SR_lock so the
>> only way to be sure you haven't lost the race is to hold the
>> SR_lock while you're checking the flag yourself.
>>
>> Have I missed something in my analysis?
>
> Holding the SR_lock while checking is_external_suspend doesn't really
> achieve anything by itself - a racing suspend can come just before the
> check or just after it.
By "racing suspend" I mean the part that calls set_external_suspend().
David
-----
The SR_lock primarily ensures correct
> synchronization for the actual suspension (when it waits on the SR_lock)
> and resumption - and as per the exit code, it ensures there is no thread
> termination race by setting is_exiting under the lock.
>
> What we are racing with in this changeset are the
> java_suspend()/JvmtiSuspendControl::suspend() calls which don't hold the
> SR_lock. The race we have to avoid is the race where another thread
> completes the safepoint/handshake operation which is supposed to ensure
> the target is suspended, and the loop checking is_external_suspend()
> achieves that.
>
> You could argue that without the lock we may see a stale value for
> is_external_suspend() but that is not possible when a
> safepoint/handshake has been issued as we have full memory
> synchronization between threads in that case.
>
> Put another way, when the target thread returns from the
> safepoint/handshake and sees is_external_suspend() then it knows there
> is another suspend request in progress, and it will honour it (and any
> racing resume is handled in java_suspend_self()). If it doesn't see
> is_external_suspend() set then any racing suspend request can't have
> initiated the safepoint/handshake yet and so that suspend request will
> be seen the next time the target does a safepoint/handshake poll.
>
> Cheers,
> David
> -----
>
>> Dan
>>
>>
>>
>> On 9/2/20 11:15 AM, Reingruber, Richard wrote:
>>> Hi,
>>>
>>> please help review this fix for a race condition in
>>> JavaThread::java_suspend_self_with_safepoint_check() that allows a
>>> suspended
>>> thread to continue executing java for an arbitrary long time (see
>>> repro test
>>> attached to bug report).
>>>
>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8252521/webrev.0/
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8252521
>>>
>>> The fix is to add a do-while-loop to
>>> java_suspend_self_with_safepoint_check()
>>> that checks if the current thread was suspended again after returning
>>> from
>>> java_suspend_self() and before restoring the original thread state.
>>> The check is
>>> performed after restoring the original state because then we are
>>> guaranteed to
>>> see the suspend request issued before the requester observed that
>>> target to be
>>> _thread_blocked and executed VM_ThreadSuspend.
>>>
>>> Thanks, Richard.
>>
More information about the hotspot-runtime-dev
mailing list