RFR(XS) 8252521: possible race in java_suspend_self_with_safepoint_check

Reingruber, Richard richard.reingruber at sap.com
Mon Sep 7 08:28:11 UTC 2020


Hi Dan,

  > Sorry for the late review. I know you have already pushed the fix.

Thanks for taking the time! And sorry for being a bit quick with the push. I
wanted it to happen before the switch to git.

Like David, I think the VM_ThreadSuspend on the suspender side and restoring the
saved thread state with a fence on the suspendee side are sufficient
synchronization to avoid the issue.

2613 void JavaThread::java_suspend_self_with_safepoint_check() {
2614   assert(this == Thread::current(), "invariant");
2615   JavaThreadState state = thread_state();
2616 
2617   do {
2618     set_thread_state(_thread_blocked);
2619     java_suspend_self();
2620     // The current thread could have been suspended again. We have to check for
2621     // suspend after restoring the saved state. Without this the current thread
2622     // might return to _thread_in_Java and execute bytecodes for an arbitrary
2623     // long time.
2624     set_thread_state_fence(state);
2625   } while (is_external_suspend());
2626 
2627   // Since we are not using a regular thread-state transition helper here,
2628   // we must manually emit the instruction barrier after leaving a safe state.
2629   OrderAccess::cross_modify_fence();
2630   if (state != _thread_in_native) {
2631     SafepointMechanism::block_if_requested(this);
2632   }
2633 }

The issue was that (A) a java thread T1 returns from a call T2.suspend() and
then observes (B) T2 is still running. With the fix either (A) or (B) does
not happen.

(A) means that T1 has set _external_suspend for T2 and then executed
VM_ThreadSuspend and (B) means that it did so before T2 reached the safepoint in
2631.

The fence in line 2624 (*) guarantees that the check in line 2625 will see a suspend
request if the associated safepoint was established before the state change in
line 2624 was executed. So if the check 2625 returns false this means that the
VM_ThreadSuspend has not begun and T1 cannot yet return from T2.suspend(). So
either not (A) or not (B).

Thanks, Richard.

(*) together with the safepoint sync in T1

-----Original Message-----
From: Daniel D. Daugherty <daniel.daugherty at oracle.com> 
Sent: Freitag, 4. September 2020 19:16
To: Reingruber, Richard <richard.reingruber at sap.com>; Hotspot dev runtime <hotspot-runtime-dev at openjdk.java.net>
Subject: Re: RFR(XS) 8252521: possible race in java_suspend_self_with_safepoint_check

Richard,

Sorry for the late review. I know you have already pushed the fix.

src/hotspot/share/runtime/thread.cpp
     L2625:   } while (is_external_suspend());
         In the other places where we are checking for a racing suspend
         request, we check the is_external_suspend() condition while
         holding the SR_lock or we call is_external_suspend_with_lock().

         Here's the usual example I point people at:

         L2049: void JavaThread::exit(bool destroy_vm, ExitType exit_type) {
<snip>
         L2121:     while (true) {
         L2122:       {
         L2123:         MutexLocker ml(SR_lock(), 
Mutex::_no_safepoint_check_flag);
         L2124:         if (!is_external_suspend()) {

         The JVM/TI SuspendThread() and JVM_SuspendThread() entry points
         call set_external_suspend() while holding the SR_lock so the
         only way to be sure you haven't lost the race is to hold the
         SR_lock while you're checking the flag yourself.

         Have I missed something in my analysis?

Dan



On 9/2/20 11:15 AM, Reingruber, Richard wrote:
> Hi,
>
> please help review this fix for a race condition in
> JavaThread::java_suspend_self_with_safepoint_check() that allows a suspended
> thread to continue executing java for an arbitrary long time (see repro test
> attached to bug report).
>
> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8252521/webrev.0/
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8252521
>
> The fix is to add a do-while-loop to java_suspend_self_with_safepoint_check()
> that checks if the current thread was suspended again after returning from
> java_suspend_self() and before restoring the original thread state. The check is
> performed after restoring the original state because then we are guaranteed to
> see the suspend request issued before the requester observed that target to be
> _thread_blocked and executed VM_ThreadSuspend.
>
> Thanks, Richard.



More information about the hotspot-runtime-dev mailing list