RFR(XS): VM times out in VM_HandshakeAllThreads::doit() with RunThese30M

David Holmes david.holmes at oracle.com
Wed Aug 15 10:40:58 UTC 2018


Hi Robbin,

On 15/08/2018 7:13 PM, Robbin Ehn wrote:
> Hi all, please review.
> 
> The issue is that external suspend don't set the thread state to 
> blocked. The safepoint code handles this internally but, as handshakes 
> do, using e.g. safepoint_safe() this was not handled. External 
> suspending a thread causes the handshake not to come through until you 
> resume that thread (bad for ZGC).
> 
> The proper fix (at least IMHO) to this is to implement suspend (and the 
> other uses of suspend_flag) with handshakes instead, but that will 
> require asynch handshakes and would require all platforms to implement 
> handshakes (or they would suffer a safepoint). So if you read this have 
> a platform which you do not have implemented handshakes for, please 
> start looking into it.
> 
> Below patch adds the unhandle case for handshakes.
> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8207334
> 
> Passed tier 1-5 and I could no longer get prolonged handshakes due to 
> suspend.
> 
> Thanks, Robbin
> 
> diff -r f3cf91d5373f src/hotspot/share/runtime/handshake.cpp
> --- a/src/hotspot/share/runtime/handshake.cpp    Mon Aug 13 14:04:43 
> 2018 +0200
> +++ b/src/hotspot/share/runtime/handshake.cpp    Wed Aug 15 11:00:22 
> 2018 +0200
> @@ -340,3 +340,8 @@
>   bool HandshakeState::vmthread_can_process_handshake(JavaThread* target) {
> -  return SafepointSynchronize::safepoint_safe(target, 
> target->thread_state());
> +  // We must block the S/R protocol to resume a thread while the vm thread
> +  // is executing the operation, by holding the Threads_lock.
> +  // This is problematic, since we now can't remove the Threads_lock in 
> handshakes.
> +  assert(Threads_lock->owned_by_self(), "Not holding Threads_lock.");
> +  return SafepointSynchronize::safepoint_safe(target, 
> target->thread_state()) ||
> +         target->is_ext_suspended();
>   }

The context in which this is executed is not at all clear so assessing 
the suitability of the fix is also not clear. IIRC a thread can only be 
resumed at a safepoint so if the above code executes in the VMThread 
then it can't be racing with a resume and the Threads_lock holding seems 
unnecessary wrt. this action. ??

Thanks,
David



More information about the hotspot-runtime-dev mailing list