RFR: 8221207: Redo JDK-8218446 - SuspendAtExit hangs

Mon Mar 25 03:56:33 UTC 2019

Thanks Dan! Much appreciated.

I'll push this now.

David

On 25/03/2019 12:38 pm, Daniel D. Daugherty wrote:
> Hi David,
> 
> Stress run is done. Zero failures on both Linux-X64 and Solaris-X64.
> So this version of the fix looks good to me. I've attached the
> summary for both runs. I have more detailed logs for each run
> available (until I start the jdk-13+14 run later this week) if
> you happen to need any other info.
> 
> Dan
> 
> 
> 
> On 3/24/19 12:46 AM, Daniel D. Daugherty wrote:
>> My jdk-13+13 stress runs finished in just over 18 hours each.
>> Don't know why. Don't really care at the moment.
>>
>> I just kicked off a stress run for 8221207. Will keep you posted.
>>
>> Dan
>>
>>
>> On 3/21/19 6:33 PM, David Holmes wrote:
>>> Thanks Dan! webrev updated in place.
>>>
>>> I'll wait to push this until after your stress testing has been done.
>>>
>>> David
>>>
>>> On 22/03/2019 8:17 am, Daniel D. Daugherty wrote:
>>>> On 3/21/19 5:52 PM, David Holmes wrote:
>>>>> Hi Dan,
>>>>>
>>>>> Thanks again for the eagle-eyed analysis ...
>>>>>
>>>>> On 22/03/2019 5:57 am, Daniel D. Daugherty wrote:
>>>>>> On 3/21/19 4:47 AM, David Holmes wrote:
>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8221207
>>>>>>>
>>>>>>> There was one small oversight in the original fix that led to 
>>>>>>> crashes, seen (randomly) in JDI tests. The safepoint check must 
>>>>>>> not happen if the thread-state is already _thread_in_native. I've 
>>>>>>> checked the thread-state on all call paths to confirm that.
>>>>>>>
>>>>>>> Incremental webrev from original fix: 
>>>>>>> http://cr.openjdk.java.net/~dholmes/8221207/webrev.inc/
>>>>>>>
>>>>>>> Full webrev: http://cr.openjdk.java.net/~dholmes/8221207/webrev/
>>>>>>
>>>>>> src/hotspot/share/runtime/thread.hpp
>>>>>>      No comments.
>>>>>>
>>>>>> src/hotspot/share/runtime/thread.cpp
>>>>>>      So here's the crashing stack:
>>>>>>
>>>>>>      V [libjvm.so+0x14b1fee] 
>>>>>> SafepointSynchronize::block(JavaThread*)+0xae
>>>>>>      V [libjvm.so+0x14ba78d] 
>>>>>> SafepointMechanism::block_if_requested_slow(JavaThread*)+0x6d
>>>>>>      V [libjvm.so+0x1637207] 
>>>>>> JavaThread::handle_special_runtime_exit_condition(bool)+0x67
>>>>>>      V [libjvm.so+0x1073b1e] 
>>>>>> JvmtiJavaThreadEventTransition::JvmtiJavaThreadEventTransition(JavaThread*)+0x1ae 
>>>>>>
>>>>>>      V [libjvm.so+0x1069990] 
>>>>>> JvmtiExport::post_class_prepare(JavaThread*, Klass*)+0x1b0
>>>>>>
>>>>>>      so we have a JvmtiJavaThreadEventTransition helper object to
>>>>>>      handle the transition from thread_in_vm -> thread_in_native:
>>>>>>
>>>>>>      src/hotspot/share/prims/jvmtiExport.cpp
>>>>>>
>>>>>>          class JvmtiJavaThreadEventTransition : StackObj {
>>>>>>          private:
>>>>>>            ResourceMark _rm;
>>>>>>            ThreadToNativeFromVM _transition;
>>>>>>            HandleMark _hm;
>>>>>>
>>>>>>          public:
>>>>>>            JvmtiJavaThreadEventTransition(JavaThread *thread) :
>>>>>>              _rm(),
>>>>>>              _transition(thread),
>>>>>>              _hm(thread)  {};
>>>>>>          };
>>>>>>
>>>>>>      so that's just a wrapper around ThreadToNativeFromVM:
>>>>>>
>>>>>>      src/hotspot/share/runtime/interfaceSupport.inline.hpp
>>>>>>
>>>>>>          class ThreadToNativeFromVM : public ThreadStateTransition {
>>>>>>           public:
>>>>>>            ThreadToNativeFromVM(JavaThread *thread) : 
>>>>>> ThreadStateTransition(thread) {
>>>>>>              // We are leaving the VM at this point and going 
>>>>>> directly to native code.
>>>>>>              // Block, if we are in the middle of a safepoint 
>>>>>> synchronization.
>>>>>>              assert(!thread->owns_locks(), "must release all locks 
>>>>>> when leaving VM");
>>>>>> thread->frame_anchor()->make_walkable(thread);
>>>>>>              trans_and_fence(_thread_in_vm, _thread_in_native);
>>>>>>              // Check for pending. async. exceptions or suspends.
>>>>>>              if (_thread->has_special_runtime_exit_condition()) 
>>>>>> _thread->handle_special_runtime_exit_condition(false);
>>>>>>            }
>>>>>>
>>>>>>            ~ThreadToNativeFromVM() {
>>>>>>              trans_from_native(_thread_in_vm);
>>>>>> assert(!_thread->is_pending_jni_exception_check(), "Pending JNI 
>>>>>> Exception Check");
>>>>>>              // We don't need to clear_walkable because it will 
>>>>>> happen automagically when we return to java
>>>>>>            }
>>>>>>          };
>>>>>>
>>>>>>      so trans_and_fence() calls transition_and_fence() which
>>>>>>      does this:
>>>>>>
>>>>>>          static inline void transition_and_fence(JavaThread 
>>>>>> *thread, JavaThreadState from, JavaThreadState to) {
>>>>>>            assert(thread->thread_state() == from, "coming from 
>>>>>> wrong thread state");
>>>>>>            assert((from & 1) == 0 && (to & 1) == 0, "odd numbers 
>>>>>> are transitions states");
>>>>>>            // Change to transition state
>>>>>> thread->set_thread_state((JavaThreadState)(from + 1));
>>>>>>
>>>>>> InterfaceSupport::serialize_thread_state_with_handler(thread);
>>>>>>
>>>>>>            SafepointMechanism::block_if_requested(thread);
>>>>>>            thread->set_thread_state(to);
>>>>>>
>>>>>> CHECK_UNHANDLED_OOPS_ONLY(thread->clear_unhandled_oops();)
>>>>>>          }
>>>>>>
>>>>>>      So for this use of handle_special_runtime_exit_condition(false),
>>>>>>      a safepoint is already handled by the previous 
>>>>>> transition_and_fence()
>>>>>>      with the thread still in the right thread state. However, if 
>>>>>> that
>>>>>>      handle_special_runtime_exit_condition() honors a self-suspend
>>>>>>      request and there's another safepoint, then we run the risk of
>>>>>>      the VMThread seeing _thread_blocked during the self-suspend
>>>>>>      phase of the thread and then the thread will go ahead into
>>>>>>      thread_native without stopping for the safepoint.
>>>>>>
>>>>>>      Okay, but do we care? I don't think so. The thread will be off
>>>>>>      in native code and if it returns quickly and the safepoint is
>>>>>>      still active, then ~ThreadToNativeFromVM() should cause the
>>>>>>      thread to block for the safepoint.
>>>>>
>>>>> The way I looked at this is that we are trying to ensure that the 
>>>>> VMThread doesn't see _thread_blocked just before we restore the 
>>>>> true state, and allow the thread to escape the safepoint. In this 
>>>>> case the true state is _thread_in_native, which is safepoint-safe 
>>>>> just as _thread_blocked is, so it doesn't matter if the VMThread 
>>>>> sees the true state or _thread_blocked as the result is the same. 
>>>>> Hence for _thread_in_native we don't need to do the additional 
>>>>> safepoint check.
>>>>>
>>>>>>      So this is a long winded way of saying I think the revised
>>>>>>      fix is okay. :-)
>>>>>
>>>>> Okay - thanks :)
>>>>>
>>>>>>      You added this comment for the new if-statement:
>>>>>>
>>>>>>      +// But it's more complicated than that as not all initial 
>>>>>> thread-states are suitable for
>>>>>>      +// doing safepoint checks. Fortunately, _thread_in_native is 
>>>>>> the only unsuitable state we
>>>>>>      +// can encounter based on our two callers.
>>>>>>
>>>>>>      and I'm okay with it.
>>>>>>
>>>>>>      Please consider adding this comment:
>>>>>>
>>>>>>      +  if (state != _thread_in_native) {
>>>>>>           // _thread_in_native will block for a safepoint when it 
>>>>>> transitions back.
>>>>>>           SafepointMechanism::block_if_requested(this);
>>>>>>      +  }
>>>>>
>>>>> It's a correct statement but I'm not sure its really relevant to 
>>>>> the context. But I didn't explain in the code why _thread_in_native 
>>>>> is special. My "But it's more complicated ..." comment doesn't 
>>>>> explain things properly. How about:
>>>>>
>>>>> // However, not all initial-states are allowed when performing a
>>>>> // safepoint check, as we should never be blocking at a safepoint
>>>>> // whilst in those states. Of these 'bad' states only 
>>>>> _thread_in_native
>>>>> // is possible when executing this code (based on our two callers).
>>>>> // A thread that is _thread_in_native is already safepoint-safe and so
>>>>> // it doesn't matter whether the VMThread sees the _thread_blocked
>>>>> // state, or the _thread_in_native state, and so we don't need the
>>>>> // explicit safepoint check.
>>>>
>>>> I like it!
>>>>
>>>> Dan
>>>>
>>>>
>>>>>
>>>>> ?
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>>> Thumbs up!
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Re-tested in mach5 tiers 1-3 and com/sun/jdi tests (but they 
>>>>>>> passed last time too.).
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>
>>>>
>>>
>>
>>
>