~ThreadInVMForHandshake() should call handle_special_runtime_exit_condition()
Reingruber, Richard
richard.reingruber at sap.com
Fri May 10 07:46:32 UTC 2019
Hi David,
> There's no regression test as this was not seen in the wild only
> hypothesised:
I've got stress tests in a larger project still in development that trigger the issue. I took me a
week to find the cause for sporadic crashes when walking stacks.
> I missed the use in ThreadToNativeFromVM and SafepointSynchronize::block ...
Please note that process_self_inner and SafepointSynchronize::block are dominated by
SafepointMechanism::block_or_handshake():
HandshakeState::process_self_inner(JavaThread *) : void
HandshakeState::process_by_self(JavaThread *) : void
JavaThread::handshake_process_by_self() : void
SafepointMechanism::block_or_handshake(JavaThread *) : void
SafepointSynchronize::block(JavaThread *) : void
SafepointMechanism::block_or_handshake(JavaThread *) : void
I.e. we reach there only by coming through block_or_handshake(). And we reach there with the same
thread states. That's why I think ~ThreadToNativeFromVM::transition_back() should do the same as
SafepointSynchronize::block() does regarding the special runtime exit conditions.
Thanks for looking at this, Richard.
-----Original Message-----
From: David Holmes <david.holmes at oracle.com>
Sent: Freitag, 10. Mai 2019 03:38
To: Reingruber, Richard <richard.reingruber at sap.com>; hotspot-runtime-dev at openjdk.java.net; Robbin Ehn <robbin.ehn at oracle.com>
Subject: Re: ~ThreadInVMForHandshake() should call handle_special_runtime_exit_condition()
Correction ...
On 10/05/2019 11:25 am, David Holmes wrote:
> Actually I'm not convinced this is the right solution.
> handle_special_runtime_exit_condition as its name suggests is for
> exiting the runtime (VM) and implicitly returning to (or going to) Java.
> It is only checked by the ThreadInVMFromJava destructor, and the
> JavaCallWrapper. Separately we check for thread suspension when a thread
> enters the VM from native.
I missed the use in ThreadToNativeFromVM and SafepointSynchronize::block ...
> ThreadInVMForHandshake seems to shortcut the thread-state transition
> process. It doesn't seem to care what state it is coming from it just
> ducks into the VM for the handkshake, then restores the original state
> and continues. That doesn't seem right as per the current bug report.
>
> However AFAICS the correct fix would be to:
>
> a) check for suspension in ThreadInVMForHandshake when coming from
> _thread_in_native; and
> b) check handle_special_runtime_exit_condition in
> ~ThreadInVMForHandshake when the original state is _thread_in_Java
>
> Now it may be that for the first case, as we're only processing a
> handshake, that it doesn't matter if we do the suspend check before or
> after the handshake. Hence that would mean we do (b) when the original
> state is thread_in_Java or _thread_in_native - which is what Richard
> originally proposed.
>
> So I don't understand how the original:
>
> + if ((_original_state == _thread_in_Java || _original_state ==
> _thread_in_native) &&
> + _thread->has_special_runtime_exit_condition()) {
>
> became
>
> + if (_original_state != _thread_blocked_trans && _original_state
> != _thread_in_vm_trans &&
> + _thread->has_special_runtime_exit_condition()) {
>
> ? Perhaps they are equivalent (I don't know all the possible initial
> states especially given the way this can be entered recursively!), but
> the former is much clearer than the latter.
I now see that this is mimicking the code in SafepointSynchronize::block
for which there are 5 potential states, three of which need the check
and two that do not - hence its shorter to skip the two. What remains
unclear to me is that ThreadInVMForHandShake has the same set of
possible initial states?
Thanks,
David
> Cheers,
> David
> -----
>
> On 10/05/2019 10:05 am, David Holmes wrote:
>> Hi Dan,
>>
>> Many thanks for doing that detailed walkthrough! I also tried to
>> verify the conditions of the _thread_state as they are not what was
>> originally proposed. But once I saw the recursion I called it a night!
>> The recursion here makes me very nervous (it's preexisting of course)
>> - this stuff is hard enough to reason about with only one level of
>> nesting. :(
>>
>> There's no regression test as this was not seen in the wild only
>> hypothesised:
>>
>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-May/034144.html
>>
>>
>> Thanks,
>> David
>>
>> On 10/05/2019 9:11 am, Daniel D. Daugherty wrote:
>>> Richard,
>>>
>>> Very nice catch!
>>>
>>>
>>> On 5/9/19 6:05 AM, Reingruber, Richard wrote:
>>>> Hi,
>>>>
>>>> could I please get reviews and sponsoring for
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8223572/webrev.1/
>>>
>>> src/hotspot/share/runtime/interfaceSupport.inline.hpp
>>> Verified that your new conditional block for calling
>>> handle_special_runtime_exit_condition() and the
>>> conditions for the parameter match the code at the
>>> end of SafepointSynchronize::block(JavaThread *thread).
>>>
>>> I had to walk through the target thread call sequence:
>>>
>>> SafepointSynchronize::handle_polling_page_exception(JavaThread *thread)
>>> thread->safepoint_state()->handle_polling_page_exception
>>> SafepointMechanism::block_if_requested(thread())
>>> SafepointMechanism::block_if_requested_slow(thread)
>>> SafepointMechanism::block_or_handshake()
>>> // not a global poll
>>> thread->handshake_process_by_self()
>>> _handshake.process_by_self(this)
>>> process_self_inner(thread)
>>> ThreadInVMForHandshake tivm(thread)
>>> : make_walkable()
>>> : set _thread_in_vm
>>> _semaphore.wait_with_safepoint_check(thread)
>>> : ThreadBlockInVM tbivm
>>> : make_walkable()
>>> : trans(_thread_in_vm, _thread_blocked)
>>> : _impl.wait()
>>> : <VM_thread signals>
>>> : <VM_ThreadSuspend on target thread>
>>> : ~ThreadBlockInVM
>>> : trans(_thread_blocked, _thread_in_vm)
>>> : : transition(_thread, from, to)
>>> : : set _thread_blocked+1
>>> : : SafepointMechanism::block_if_requested(thread)
>>> : : :
>>> SafepointMechanism::block_if_requested_slow(thread)
>>> : : : SafepointMechanism::block_or_handshake()
>>> : : : if (global_poll()) {
>>> : : : SafepointSynchronize::block(thread)
>>> : : : make_walkable()
>>> : : : set _thread_blocked
>>> : : : _wait_barrier->wait()
>>> : : : // wait is notified when the
>>> safepoint is over
>>> : : : restore _thread_blocked+1 //
>>> _thread_blocked_trans
>>> : : : skip the call to
>>> handle_special_runtime_exit_condition()
>>> : : : because state ==
>>> _thread_blocked_trans
>>> : : // return all the way back to transition()
>>> : : set _thread_in_vm
>>> ~ThreadInVMForHandshake
>>> transition_back()
>>> set _thread_in_vm_trans
>>> SafepointMechanism::block_if_requested()
>>> set _original_state
>>> // Now we return all the way back to handle_polling_page_exception().
>>> // If we came from a _thread_in_Java state, then we just returned to
>>> // that state without self suspending.
>>> // The fix adds this:
>>> if (_original_state != _thread_blocked_trans &&
>>> _original_state != _thread_in_vm_trans &&
>>> _thread->has_special_runtime_exit_condition()) {
>>> _thread->handle_special_runtime_exit_condition(
>>> !_thread->is_at_poll_safepoint() &&
>>> (_original_state !=
>>> _thread_in_native_trans));
>>>
>>> And handle_special_runtime_exit_condition() does this:
>>>
>>> bool do_self_suspend = is_external_suspend_with_lock();
>>> if (do_self_suspend && (!AllowJNIEnvProxy || this ==
>>> JavaThread::current())) {
>>> frame_anchor()->make_walkable(this);
>>> java_suspend_self_with_safepoint_check();
>>> }
>>>
>>> Okay. Now I grok where the bug is and how your fix handles it.
>>> Thumbs up on the fix itself!
>>>
>>> You don't mention how you tested this fix. You also don't say
>>> whether you have a test that reproduces this issue.
>>>
>>> Obviously, we also need to hear from Robbin on this thread.
>>>
>>> Dan
>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8223572
>>>>
>>>> A java thread T can continue executing bytecodes after it was
>>>> suspended using the vm operation
>>>> VM_ThreadSuspend. This can happen in
>>>> HandshakeState::process_self_inner() when T becomes safepoint
>>>> safe calling _semaphore.wait_with_safepoint_check(thread).
>>>>
>>>> Fix: poll has_special_runtime_exit_condition() and conditionally call
>>>> handle_special_runtime_exit_condition() as it is done in
>>>> SafepointSynchronize::block(), too.
>>>>
>>>> Thanks, Richard.
>>>>
>>>> PS: see also mail thread
>>>> https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-May/034144.html
>>>>
>>>
More information about the hotspot-runtime-dev
mailing list