RFR(XS) 8252521: possible race in java_suspend_self_with_safepoint_check

Sun Sep 6 22:58:59 UTC 2020

Clarification ...

On 7/09/2020 8:36 am, David Holmes wrote:
> Hi Dan,
> 
> On 5/09/2020 3:15 am, Daniel D. Daugherty wrote:
>> Richard,
>>
>> Sorry for the late review. I know you have already pushed the fix.
>>
>> src/hotspot/share/runtime/thread.cpp
>>      L2625:   } while (is_external_suspend());
>>          In the other places where we are checking for a racing suspend
>>          request, we check the is_external_suspend() condition while
>>          holding the SR_lock or we call is_external_suspend_with_lock().
>>
>>          Here's the usual example I point people at:
>>
>>          L2049: void JavaThread::exit(bool destroy_vm, ExitType exit_type) {
>> <snip>
>>          L2121:     while (true) {
>>          L2122:       {
>>          L2123:         MutexLocker ml(SR_lock(), Mutex::_no_safepoint_check_flag);
>>          L2124:         if (!is_external_suspend()) {
>>
>>          The JVM/TI SuspendThread() and JVM_SuspendThread() entry points
>>          call set_external_suspend() while holding the SR_lock so the
>>          only way to be sure you haven't lost the race is to hold the
>>          SR_lock while you're checking the flag yourself.
>>
>>          Have I missed something in my analysis?
> 
> Holding the SR_lock while checking is_external_suspend doesn't really 
> achieve anything by itself - a racing suspend can come just before the 
> check or just after it.

By "racing suspend" I mean the part that calls set_external_suspend().

David
-----

> The SR_lock primarily ensures correct
> synchronization for the actual suspension (when it waits on the SR_lock) 
> and resumption - and as per the exit code, it ensures there is no thread 
> termination race by setting is_exiting under the lock.
> 
> What we are racing with in this changeset are the 
> java_suspend()/JvmtiSuspendControl::suspend() calls which don't hold the 
> SR_lock. The race we have to avoid is the race where another thread 
> completes the safepoint/handshake operation which is supposed to ensure 
> the target is suspended, and the loop checking is_external_suspend() 
> achieves that.
> 
> You could argue that without the lock we may see a stale value for 
> is_external_suspend() but that is not possible when a 
> safepoint/handshake has been issued as we have full memory 
> synchronization between threads in that case.
> 
> Put another way, when the target thread returns from the 
> safepoint/handshake and sees is_external_suspend() then it knows there 
> is another suspend request in progress, and it will honour it (and any 
> racing resume is handled in java_suspend_self()). If it doesn't see 
> is_external_suspend() set then any racing suspend request can't have 
> initiated the safepoint/handshake yet and so that suspend request will 
> be seen the next time the target does a safepoint/handshake poll.
> 
> Cheers,
> David
> -----
> 
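To make the timing point concrete, here is a minimal single-threaded sketch in standard C++ (not HotSpot code). SuspendFlag, its sr_lock member, both of its methods and check_then_racing_suspend() are hypothetical stand-ins that only borrow HotSpot's names; the assumption is that the requester sets the flag under the lock, as Dan describes for the SuspendThread() entry points. It replays one interleaving to show that a locked check only reports the flag's value at the moment of the check, so a request that lands just afterwards is missed with or without the lock.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical model of a suspend flag guarded by a lock. The names mirror
// HotSpot's SR_lock and is_external_suspend(), but this is not HotSpot code.
struct SuspendFlag {
    std::mutex sr_lock;
    bool external_suspend = false;

    // Requester side: set the flag while holding the lock.
    void set_external_suspend() {
        std::lock_guard<std::mutex> guard(sr_lock);
        external_suspend = true;
    }

    // Target side: read the flag while holding the lock.
    bool is_external_suspend_with_lock() {
        std::lock_guard<std::mutex> guard(sr_lock);
        return external_suspend;
    }
};

// Replays one interleaving: the target checks first, then the request lands.
// Returns what the target observed at the time of its check.
bool check_then_racing_suspend(SuspendFlag& flag) {
    bool seen = flag.is_external_suspend_with_lock();  // lock is released on return
    flag.set_external_suspend();                       // request arrives "just after"
    assert(flag.is_external_suspend_with_lock());      // visible now, but too late for 'seen'
    return seen;
}
```

In this model the target's answer is already stale by the time it acts on it, which is why the argument above rests on the safepoint/handshake ordering rather than on the lock.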
>> Dan
>>
>>
>>
>> On 9/2/20 11:15 AM, Reingruber, Richard wrote:
>>> Hi,
>>>
>>> please help review this fix for a race condition in
>>> JavaThread::java_suspend_self_with_safepoint_check() that allows a suspended
>>> thread to continue executing Java for an arbitrarily long time (see repro test
>>> attached to the bug report).
>>>
>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8252521/webrev.0/
>>> Bug:    https://bugs.openjdk.java.net/browse/JDK-8252521
>>>
>>> The fix is to add a do-while loop to java_suspend_self_with_safepoint_check()
>>> that checks whether the current thread was suspended again after returning from
>>> java_suspend_self() and before restoring the original thread state. The check is
>>> performed after restoring the original state because then we are guaranteed to
>>> see the suspend request issued before the requester observed the target to be
>>> _thread_blocked and executed VM_ThreadSuspend.
>>>
>>> Thanks, Richard.
>>
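For readers without the webrev at hand, the shape of the loop Richard describes can be modelled in standard C++ roughly as follows. This is a sketch under stated assumptions, not the HotSpot change itself: ModelThread, ThreadState, suspend_self(), racing_suspends_to_inject and suspend_self_with_state_restore() are all hypothetical names, and the racing requester is simulated by a counter rather than a second thread. The only part taken from the thread is the ordering: block, self-suspend, restore the saved state, and only then re-check the suspend flag.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for HotSpot's thread states; not the real enum.
enum class ThreadState { in_java, blocked };

struct ModelThread {
    std::atomic<bool> external_suspend{false};
    std::atomic<ThreadState> state{ThreadState::in_java};
    int self_suspend_count = 0;
    // Number of extra suspend requests to simulate arriving after a resume
    // but before the saved state is restored (assumption for illustration).
    int racing_suspends_to_inject = 0;

    // Stand-in for java_suspend_self(): the real function waits until the
    // thread is resumed. Here a resume is modelled by clearing the flag.
    void suspend_self() {
        ++self_suspend_count;
        external_suspend.store(false);
        if (racing_suspends_to_inject > 0) {
            --racing_suspends_to_inject;
            external_suspend.store(true);  // a new request raced in
        }
    }

    // Model of the fixed control flow: re-check the flag only after the
    // original state has been restored, and loop if a request is pending.
    void suspend_self_with_state_restore() {
        const ThreadState saved = state.load();
        do {
            state.store(ThreadState::blocked);
            suspend_self();
            state.store(saved);  // seq_cst store, standing in for a fenced state change
        } while (external_suspend.load());
        assert(state.load() == saved);
    }
};
```

With one injected racing request, the model suspends itself twice before returning in its original state; without the loop it would return after the first suspension with the flag still set.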