RFR(XS) 8252521: possible race in java_suspend_self_with_safepoint_check

David Holmes david.holmes at oracle.com
Sun Sep 6 22:36:08 UTC 2020


Hi Dan,

On 5/09/2020 3:15 am, Daniel D. Daugherty wrote:
> Richard,
> 
> Sorry for the late review. I know you have already pushed the fix.
> 
> src/hotspot/share/runtime/thread.cpp
>      L2625:   } while (is_external_suspend());
>          In the other places where we are checking for a racing suspend
>          request, we check the is_external_suspend() condition while
>          holding the SR_lock or we call is_external_suspend_with_lock().
> 
>          Here's the usual example I point people at:
> 
>          L2049: void JavaThread::exit(bool destroy_vm, ExitType 
> exit_type) {
> <snip>
>          L2121:     while (true) {
>          L2122:       {
>          L2123:         MutexLocker ml(SR_lock(), 
> Mutex::_no_safepoint_check_flag);
>          L2124:         if (!is_external_suspend()) {
> 
>          The JVM/TI SuspendThread() and JVM_SuspendThread() entry points
>          call set_external_suspend() while holding the SR_lock so the
>          only way to be sure you haven't lost the race is to hold the
>          SR_lock while you're checking the flag yourself.
> 
>          Have I missed something in my analysis?

Holding the SR_lock while checking is_external_suspend doesn't really 
achieve anything by itself - a racing suspend can come just before the 
check or just after it. The SR_lock primarily ensures correct 
synchronization for the actual suspension (when it waits on the SR_lock) 
and resumption - and as per the exit code, it ensures there is no thread 
termination race by setting is_exiting under the lock.

What we are racing with in this changeset are the 
java_suspend()/JvmtiSuspendControl::suspend() calls which don't hold the 
SR_lock. The race we have to avoid is the race where another thread 
completes the safepoint/handshake operation which is supposed to ensure 
the target is suspended, and the loop checking is_external_suspend() 
achieves that.

You could argue that without the lock we may see a stale value for 
is_external_suspend() but that is not possible when a 
safepoint/handshake has been issued as we have full memory 
synchronization between threads in that case.

Put another way, when the target thread returns from the 
safepoint/handshake and sees is_external_suspend() then it knows there 
is another suspend request in progress, and it will honour it (and any 
racing resume is handled in java_suspend_self()). If it doesn't see 
is_external_suspend() set then any racing suspend request can't have 
initiated the safepoint/handshake yet and so that suspend request will 
be seen the next time the target does a safepoint/handshake poll.

Cheers,
David
-----

> Dan
> 
> 
> 
> On 9/2/20 11:15 AM, Reingruber, Richard wrote:
>> Hi,
>>
>> please help review this fix for a race condition in
>> JavaThread::java_suspend_self_with_safepoint_check() that allows a 
>> suspended
>> thread to continue executing java for an arbitrary long time (see 
>> repro test
>> attached to bug report).
>>
>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8252521/webrev.0/
>> Bug:    https://bugs.openjdk.java.net/browse/JDK-8252521
>>
>> The fix is to add a do-while-loop to 
>> java_suspend_self_with_safepoint_check()
>> that checks if the current thread was suspended again after returning 
>> from
>> java_suspend_self() and before restoring the original thread state. 
>> The check is
>> performed after restoring the original state because then we are 
>> guaranteed to
>> see the suspend request issued before the requester observed that 
>> target to be
>> _thread_blocked and executed VM_ThreadSuspend.
>>
>> Thanks, Richard.
> 


More information about the hotspot-runtime-dev mailing list