~ThreadInVMForHandshake() should call handle_special_runtime_exit_condition()

Fri May 10 07:46:32 UTC 2019

Hi David,

  > There's no regression test as this was not seen in the wild only
  > hypothesised:

I've got stress tests in a larger project still in development that trigger the issue. I took me a
week to find the cause for sporadic crashes when walking stacks.

  > I missed the use in ThreadToNativeFromVM and SafepointSynchronize::block ...

Please note that process_self_inner and SafepointSynchronize::block are dominated by
SafepointMechanism::block_or_handshake():

HandshakeState::process_self_inner(JavaThread *) : void
	HandshakeState::process_by_self(JavaThread *) : void
		JavaThread::handshake_process_by_self() : void
			SafepointMechanism::block_or_handshake(JavaThread *) : void

SafepointSynchronize::block(JavaThread *) : void
	SafepointMechanism::block_or_handshake(JavaThread *) : void

I.e. we reach there only by coming through block_or_handshake(). And we reach there with the same
thread states. That's why I think ~ThreadToNativeFromVM::transition_back() should do the same as
SafepointSynchronize::block() does regarding the special runtime exit conditions.

Thanks for looking at this, Richard.

-----Original Message-----
From: David Holmes <david.holmes at oracle.com> 
Sent: Freitag, 10. Mai 2019 03:38
To: Reingruber, Richard <richard.reingruber at sap.com>; hotspot-runtime-dev at openjdk.java.net; Robbin Ehn <robbin.ehn at oracle.com>
Subject: Re: ~ThreadInVMForHandshake() should call handle_special_runtime_exit_condition()

Correction ...

On 10/05/2019 11:25 am, David Holmes wrote:
> Actually I'm not convinced this is the right solution. 
> handle_special_runtime_exit_condition as its name suggests is for 
> exiting the runtime (VM) and implicitly returning to (or going to) Java. 
> It is only checked by the ThreadInVMFromJava destructor, and the 
> JavaCallWrapper. Separately we check for thread suspension when a thread 
> enters the VM from native.

I missed the use in ThreadToNativeFromVM and SafepointSynchronize::block ...

> ThreadInVMForHandshake seems to shortcut the thread-state transition 
> process. It doesn't seem to care what state it is coming from it just 
> ducks into the VM for the handkshake, then restores the original state 
> and continues. That doesn't seem right as per the current bug report.
> 
> However AFAICS the correct fix would be to:
> 
> a) check for suspension in ThreadInVMForHandshake when coming from 
> _thread_in_native; and
> b) check handle_special_runtime_exit_condition in 
> ~ThreadInVMForHandshake when the original state is _thread_in_Java
> 
> Now it may be that for the first case, as we're only processing a 
> handshake, that it doesn't matter if we do the suspend check before or 
> after the handshake. Hence that would mean we do (b) when the original 
> state is thread_in_Java or _thread_in_native - which is what Richard 
> originally proposed.
> 
> So I don't understand how the original:
> 
> +    if ((_original_state == _thread_in_Java || _original_state == 
> _thread_in_native) &&
> +        _thread->has_special_runtime_exit_condition()) {
> 
> became
> 
> +     if (_original_state != _thread_blocked_trans &&  _original_state 
> != _thread_in_vm_trans &&
> +        _thread->has_special_runtime_exit_condition()) {
> 
> ? Perhaps they are equivalent (I don't know all the possible initial 
> states especially given the way this can be entered recursively!), but 
> the former is much clearer than the latter.

I now see that this is mimicking the code in SafepointSynchronize::block 
for which there are 5 potential states, three of which need the check 
and two that do not - hence its shorter to skip the two. What remains 
unclear to me is that ThreadInVMForHandShake has the same set of 
possible initial states?

Thanks,
David

> Cheers,
> David
> -----
> 
> On 10/05/2019 10:05 am, David Holmes wrote:
>> Hi Dan,
>>
>> Many thanks for doing that detailed walkthrough! I also tried to 
>> verify the conditions of the _thread_state as they are not what was 
>> originally proposed. But once I saw the recursion I called it a night! 
>> The recursion here makes me very nervous (it's preexisting of course) 
>> - this stuff is hard enough to reason about with only one level of 
>> nesting. :(
>>
>> There's no regression test as this was not seen in the wild only 
>> hypothesised:
>>
>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-May/034144.html 
>>
>>
>> Thanks,
>> David
>>
>> On 10/05/2019 9:11 am, Daniel D. Daugherty wrote:
>>> Richard,
>>>
>>> Very nice catch!
>>>
>>>
>>> On 5/9/19 6:05 AM, Reingruber, Richard wrote:
>>>> Hi,
>>>>
>>>> could I please get reviews and sponsoring for
>>>>
>>>> Webrev: 
>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8223572/webrev.1/
>>>
>>> src/hotspot/share/runtime/interfaceSupport.inline.hpp
>>>      Verified that your new conditional block for calling
>>>      handle_special_runtime_exit_condition() and the
>>>      conditions for the parameter match the code at the
>>>      end of SafepointSynchronize::block(JavaThread *thread).
>>>
>>> I had to walk through the target thread call sequence:
>>>
>>> SafepointSynchronize::handle_polling_page_exception(JavaThread *thread)
>>>    thread->safepoint_state()->handle_polling_page_exception
>>>      SafepointMechanism::block_if_requested(thread())
>>>        SafepointMechanism::block_if_requested_slow(thread)
>>>          SafepointMechanism::block_or_handshake()
>>>            // not a global poll
>>>            thread->handshake_process_by_self()
>>>              _handshake.process_by_self(this)
>>>                process_self_inner(thread)
>>>                  ThreadInVMForHandshake tivm(thread)
>>>                  :  make_walkable()
>>>                  :  set _thread_in_vm
>>>                  _semaphore.wait_with_safepoint_check(thread)
>>>                  : ThreadBlockInVM tbivm
>>>                  :   make_walkable()
>>>                  :   trans(_thread_in_vm, _thread_blocked)
>>>                  : _impl.wait()
>>>                  : <VM_thread signals>
>>>                  : <VM_ThreadSuspend on target thread>
>>>                  : ~ThreadBlockInVM
>>>                  :   trans(_thread_blocked, _thread_in_vm)
>>>                  :   : transition(_thread, from, to)
>>>                  :   :   set _thread_blocked+1
>>>                  :   : SafepointMechanism::block_if_requested(thread)
>>>                  :   :   : 
>>> SafepointMechanism::block_if_requested_slow(thread)
>>>                  :   :   : SafepointMechanism::block_or_handshake()
>>>                  :   :   :     if (global_poll()) {
>>>                  :   :   : SafepointSynchronize::block(thread)
>>>                  :   :   :         make_walkable()
>>>                  :   :   :         set _thread_blocked
>>>                  :   :   :         _wait_barrier->wait()
>>>                  :   :   :         // wait is notified when the 
>>> safepoint is over
>>>                  :   :   :         restore _thread_blocked+1  // 
>>> _thread_blocked_trans
>>>                  :   :   :         skip the call to 
>>> handle_special_runtime_exit_condition()
>>>                  :   :   :           because state == 
>>> _thread_blocked_trans
>>>                  :   :   // return all the way back to transition()
>>>                  :   :   set _thread_in_vm
>>>                  ~ThreadInVMForHandshake
>>>                    transition_back()
>>>                      set _thread_in_vm_trans
>>>                      SafepointMechanism::block_if_requested()
>>>                      set _original_state
>>> // Now we return all the way back to handle_polling_page_exception().
>>> // If we came from a _thread_in_Java state, then we just returned to
>>> // that state without self suspending.
>>> // The fix adds this:
>>>                      if (_original_state != _thread_blocked_trans &&
>>>                          _original_state != _thread_in_vm_trans &&
>>> _thread->has_special_runtime_exit_condition()) {
>>> _thread->handle_special_runtime_exit_condition(
>>>                            !_thread->is_at_poll_safepoint() &&
>>>                            (_original_state != 
>>> _thread_in_native_trans));
>>>
>>> And handle_special_runtime_exit_condition() does this:
>>>
>>>    bool do_self_suspend = is_external_suspend_with_lock();
>>>    if (do_self_suspend && (!AllowJNIEnvProxy || this == 
>>> JavaThread::current())) {
>>>      frame_anchor()->make_walkable(this);
>>>      java_suspend_self_with_safepoint_check();
>>>    }
>>>
>>> Okay. Now I grok where the bug is and how your fix handles it.
>>> Thumbs up on the fix itself!
>>>
>>> You don't mention how you tested this fix. You also don't say
>>> whether you have a test that reproduces this issue.
>>>
>>> Obviously, we also need to hear from Robbin on this thread.
>>>
>>> Dan
>>>
>>>> Bug:    https://bugs.openjdk.java.net/browse/JDK-8223572
>>>>
>>>> A java thread T can continue executing bytecodes after it was 
>>>> suspended using the vm operation
>>>> VM_ThreadSuspend. This can happen in 
>>>> HandshakeState::process_self_inner() when T becomes safepoint
>>>> safe calling _semaphore.wait_with_safepoint_check(thread).
>>>>
>>>> Fix: poll has_special_runtime_exit_condition() and conditionally call
>>>> handle_special_runtime_exit_condition() as it is done in 
>>>> SafepointSynchronize::block(), too.
>>>>
>>>> Thanks, Richard.
>>>>
>>>> PS: see also mail thread 
>>>> https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-May/034144.html 
>>>>
>>>