~ThreadInVMForHandshake() should call handle_special_runtime_exit_condition()

Fri May 10 07:42:12 UTC 2019

Hi Dan,

thanks for looking at the issue.

  >                 _semaphore.wait_with_safepoint_check(thread)
  >                 : ThreadBlockInVM tbivm
  >                 :   make_walkable()
  >                 :   trans(_thread_in_vm, _thread_blocked)
  >                 : _impl.wait()
  >                 : <VM_thread signals>
  >                 : <VM_ThreadSuspend on target thread>

Presume the safepoint ends before T (target thread) is scheduled back on a CPU again. Makes things
easier. Then T will go straight back to handle_polling_page_exception() and from there to Java.

Note we reach process_self_inner() as well coming from
JavaThread::check_special_condition_for_native_trans(JavaThread *thread), that is when returning
from native to java.

(copied from Eclipse CDT)

HandshakeState::process_self_inner(JavaThread *) : void
	HandshakeState::process_by_self(JavaThread *) : void
		JavaThread::handshake_process_by_self() : void
			SafepointMechanism::block_or_handshake(JavaThread *) : void
				SafepointMechanism::block_if_requested_slow(JavaThread *) : void
					SafepointMechanism::block_if_requested(JavaThread *) : void
						JavaThread::check_safepoint_and_suspend_for_native_trans(JavaThread *) : void
							JavaThread::check_special_condition_for_native_trans(JavaThread *) : void
								JavaThread::check_special_condition_for_native_trans_and_transition(JavaThread *) : void
								SharedRuntime::generate_native_wrapper(MacroAssembler *, const methodHandle &, int, enum BasicType *, VMRegPair *, enum BasicType) : nmethod *
								TemplateInterpreterGenerator::generate_native_entry(bool) : address

And note that async. exceptions are handled there, too.

  > You don't mention how you tested this fix.

I've put the patch through our nighly testing at SAP: proprietary tests, tck, jtreg, with and
without -Xcomp on 9 platforms.

  > You also don't say whether you have a test that reproduces this issue.

I do have stress testing that reproduces the issue. Unfortunately it's based on a larger project
still in developement.

A dedicated test case could look like this:

Target threads T1...TN run in a tight java loop
Another thread U executes handshake operations with T1...TN in an endless loop
Another thread V in a loop calls java_suspend() on T1...TN and asserts that stacks are walkable

Thanks, Richard.

-----Original Message-----
From: Daniel D. Daugherty <daniel.daugherty at oracle.com> 
Sent: Freitag, 10. Mai 2019 01:11
To: Reingruber, Richard <richard.reingruber at sap.com>; hotspot-runtime-dev at openjdk.java.net; Robbin Ehn <robbin.ehn at oracle.com>
Subject: Re: ~ThreadInVMForHandshake() should call handle_special_runtime_exit_condition()

Richard,

Very nice catch!

On 5/9/19 6:05 AM, Reingruber, Richard wrote:
> Hi,
>
> could I please get reviews and sponsoring for
>
> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/2019/8223572/webrev.1/

src/hotspot/share/runtime/interfaceSupport.inline.hpp
     Verified that your new conditional block for calling
     handle_special_runtime_exit_condition() and the
     conditions for the parameter match the code at the
     end of SafepointSynchronize::block(JavaThread *thread).

I had to walk through the target thread call sequence:

SafepointSynchronize::handle_polling_page_exception(JavaThread *thread)
   thread->safepoint_state()->handle_polling_page_exception
     SafepointMechanism::block_if_requested(thread())
       SafepointMechanism::block_if_requested_slow(thread)
         SafepointMechanism::block_or_handshake()
           // not a global poll
           thread->handshake_process_by_self()
             _handshake.process_by_self(this)
               process_self_inner(thread)
                 ThreadInVMForHandshake tivm(thread)
                 :  make_walkable()
                 :  set _thread_in_vm
                 _semaphore.wait_with_safepoint_check(thread)
                 : ThreadBlockInVM tbivm
                 :   make_walkable()
                 :   trans(_thread_in_vm, _thread_blocked)
                 : _impl.wait()
                 : <VM_thread signals>
                 : <VM_ThreadSuspend on target thread>
                 : ~ThreadBlockInVM
                 :   trans(_thread_blocked, _thread_in_vm)
                 :   : transition(_thread, from, to)
                 :   :   set _thread_blocked+1
                 :   : SafepointMechanism::block_if_requested(thread)
                 :   :   : 
SafepointMechanism::block_if_requested_slow(thread)
                 :   :   : SafepointMechanism::block_or_handshake()
                 :   :   :     if (global_poll()) {
                 :   :   : SafepointSynchronize::block(thread)
                 :   :   :         make_walkable()
                 :   :   :         set _thread_blocked
                 :   :   :         _wait_barrier->wait()
                 :   :   :         // wait is notified when the 
safepoint is over
                 :   :   :         restore _thread_blocked+1  // 
_thread_blocked_trans
                 :   :   :         skip the call to 
handle_special_runtime_exit_condition()
                 :   :   :           because state == _thread_blocked_trans
                 :   :   // return all the way back to transition()
                 :   :   set _thread_in_vm
                 ~ThreadInVMForHandshake
                   transition_back()
                     set _thread_in_vm_trans
                     SafepointMechanism::block_if_requested()
                     set _original_state
// Now we return all the way back to handle_polling_page_exception().
// If we came from a _thread_in_Java state, then we just returned to
// that state without self suspending.
// The fix adds this:
                     if (_original_state != _thread_blocked_trans &&
                         _original_state != _thread_in_vm_trans &&
_thread->has_special_runtime_exit_condition()) {
_thread->handle_special_runtime_exit_condition(
                           !_thread->is_at_poll_safepoint() &&
                           (_original_state != _thread_in_native_trans));

And handle_special_runtime_exit_condition() does this:

   bool do_self_suspend = is_external_suspend_with_lock();
   if (do_self_suspend && (!AllowJNIEnvProxy || this == 
JavaThread::current())) {
     frame_anchor()->make_walkable(this);
     java_suspend_self_with_safepoint_check();
   }

Okay. Now I grok where the bug is and how your fix handles it.
Thumbs up on the fix itself!

You don't mention how you tested this fix. You also don't say
whether you have a test that reproduces this issue.

Obviously, we also need to hear from Robbin on this thread.

Dan

> Bug:    https://bugs.openjdk.java.net/browse/JDK-8223572
>
> A java thread T can continue executing bytecodes after it was suspended using the vm operation
> VM_ThreadSuspend. This can happen in HandshakeState::process_self_inner() when T becomes safepoint
> safe calling _semaphore.wait_with_safepoint_check(thread).
>
> Fix: poll has_special_runtime_exit_condition() and conditionally call
> handle_special_runtime_exit_condition() as it is done in SafepointSynchronize::block(), too.
>
> Thanks, Richard.
>
> PS: see also mail thread https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-May/034144.html