Fwd: RFR: 8233375: JFR emergency dump do not recover thread state

David Holmes david.holmes at oracle.com
Tue Nov 5 05:34:48 UTC 2019


On 5/11/2019 3:13 pm, Yasumasa Suenaga wrote:
> Hi David,
> 
>> It would be cleaner/simpler if prepare_for_emergency_dump takes the 
>> thread argument. As it is just a static function that doesn't impact 
>> anything else.
> 
> prepare_for_emergency_dump() returns false if some critical locks could 
> not unlock.
> So what should we return if NULL is passed as the argument? true?

But you're not calling prepare_for_emergency_dump when the thread is NULL:

  454   Thread* thread = Thread::current_or_null_safe();
  455
  456   // Ensure a JavaThread is _thread_in_vm when we make this call
  457   JavaThreadInVM jtivm(thread);
  458   if ((thread != NULL) && !prepare_for_emergency_dump()) {
  459     return;
  460   }

All I'm saying is that you pass "thread" as a parameter so you can then 
delete the existing call to Thread::current() that is inside 
prepare_for_emergency_dump.

David
-----

> It might break semantics of this function, so I did not add argument to 
> prepare_for_emergency_dump() in this webrev.
> 
> 
> Thanks,
> 
> Yasumasa
> 
> 
> On 2019/11/05 13:56, David Holmes wrote:
>> On 5/11/2019 1:56 pm, Yasumasa Suenaga wrote:
>>> On 2019/11/05 9:17, David Holmes wrote:
>>>> On 5/11/2019 9:56 am, Yasumasa Suenaga wrote:
>>>>> On 2019/11/04 22:43, Yasumasa Suenaga wrote:
>>>>>> Hi Markus,
>>>>>>
>>>>>> I thought similar change, and it is running on submit repo:
>>>>>>
>>>>>>    http://hg.openjdk.java.net/jdk/submit/rev/76703c4ec1ea
>>>>>>
>>>>>> If it passes all tests, I will send review request again.
>>>>>
>>>>> This change passed all tests on submit repo 
>>>>> (mach5-one-ysuenaga-JDK-8233375-2-20191104-1317-6405064).
>>>>> Could you review again?
>>>>>
>>>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.02/
>>>>>
>>>>> In Markus's change, emergency dump will not perform when 
>>>>> Thread::current_or_null_safe() returns NULL.
>>>>> However it will occur when coredump signal (e.g. SIGSEGV) throws to 
>>>>> PID by `kill` command - main thread of the process will be already 
>>>>> detached (out of JVM).
>>>>> Also the crash might happen in native thread - created by 
>>>>> pthread_create (on Linux) from JNI code.
>>>>>
>>>>> Thus we should continue to perform emergency dump even if 
>>>>> Thread::current_or_null_safe() returns NULL.
>>>>
>>>> I didn't quite follow all that, but if there is no current thread 
>>>> then prepare_for_emergency_dump() is either going to assert here:
>>>>
>>>>   348   Thread* const thread = Thread::current();
>>>>
>>>> or crash here:
>>>>
>>>>   349   if (thread->is_Watcher_thread()) {
>>>
>>> Thanks David!
>>> I fixed it in new webrev:
>>>
>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.03/
>>
>> It would be cleaner/simpler if prepare_for_emergency_dump takes the 
>> thread argument. As it is just a static function that doesn't impact 
>> anything else.
>>
>> Thanks,
>> David
>>
>>> It works fine on submit repo 
>>> (mach5-one-ysuenaga-JDK-8233375-2-20191105-0117-6422777).
>>>
>>>
>>> Yasumasa
>>>
>>>
>>>> David
>>>> -----
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>> On 2019/11/04 22:23, Markus Gronlund wrote:
>>>>>>> Hi Yasumasa and David,
>>>>>>>
>>>>>>> Apologies about the suggestion to use ThreadInVMFromUnknown. I 
>>>>>>> realized later that, as you have pointed out, it would perform a 
>>>>>>> real thread transition. Sorry.
>>>>>>>
>>>>>>> Taking some input from ThreadInVMFromUnknown and the situations I 
>>>>>>> have seen at this location, I think the only case we need to be 
>>>>>>> concerned about here is when a JavaThread is _thread_in_native. 
>>>>>>> _thread_in_java transition to _thread_in_vm via stubs in 
>>>>>>> SharedRuntime (i believe) as part of coming out of the exception 
>>>>>>> handler(s). Unfortunately I cannot give a proper argument now to 
>>>>>>> give the premises where this invariant is enforced, so let's work 
>>>>>>> with the original thread state as you suggested Yasumasa.
>>>>>>>
>>>>>>> If we can avoid passing the thread all the way through, I think 
>>>>>>> that is preferable (this is not performance critical code). David 
>>>>>>> also alluded to the fact that you always manipulate the current 
>>>>>>> thread anyway. Although very unlikely, we could have run into an 
>>>>>>> issue with thread local storage, so it makes sense to test this 
>>>>>>> up front. If we cannot read the thread local, the operations we 
>>>>>>> intend to perform will fail, so we might just bail out already.
>>>>>>>
>>>>>>> I took the liberty to tighten up the transition class a little 
>>>>>>> bit; you only need to restore the thread state if there was an 
>>>>>>> actual change.
>>>>>>>
>>>>>>> Perhaps we can do it like this?
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>>>>>
>>>>>>> Thanks for your patience investigating this
>>>>>>>
>>>>>>> Markus
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: David Holmes
>>>>>>> Sent: den 4 november 2019 05:24
>>>>>>> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Markus Gronlund 
>>>>>>> <markus.gronlund at oracle.com>; hotspot-jfr-dev at openjdk.java.net; 
>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>> Cc: yasuenag at gmail.com
>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover 
>>>>>>> thread state
>>>>>>>
>>>>>>> So looking at Yasumasa's proposed fix ...
>>>>>>>
>>>>>>> I don't think it is worth the disruption to pass the "thread" all 
>>>>>>> the way through these API's. It is simpler/cleaner to just call
>>>>>>> Thread::current_or_null_safe() when you need the current thread.
>>>>>>>
>>>>>>> 357   assert(thread->is_Java_thread() &&
>>>>>>> (((JavaThread*)thread)->thread_state() == _thread_in_vm), 
>>>>>>> "invariant");
>>>>>>>
>>>>>>> This assertion is incorrect. As this can be called via
>>>>>>> VMError::report_or_die() there is AFAICS absolutely no guarantee 
>>>>>>> that we need be in a JavaThread at all.
>>>>>>>
>>>>>>>    428 class ThreadInVMForJFR : public StackObj {
>>>>>>>
>>>>>>> Can I suggest JavaThreadInVM to make it clear this only affects 
>>>>>>> JavaThreads. And as it is local we don't need the "forJFR" part.
>>>>>>>
>>>>>>> Based on Markus's proposed change, and with a view to constrain 
>>>>>>> the scope even further can I suggest the following:
>>>>>>>
>>>>>>> if (!guard_reentrancy()) {
>>>>>>>     return;
>>>>>>> } else {
>>>>>>>     // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>     JavaThreadInVM jtivm(Thread::current_or_null_safe());
>>>>>>>     if (!prepare_for_emergency_dump()) {
>>>>>>>       return;
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 4/11/2019 12:24 pm, David Holmes wrote:
>>>>>>>> Correction ...
>>>>>>>>
>>>>>>>> On 4/11/2019 12:11 pm, David Holmes wrote:
>>>>>>>>> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>>>>>>>>>> On 2019/11/04 7:38, David Holmes wrote:
>>>>>>>>>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>
>>>>>>>>>>>> I think you can simplify it to something like this:
>>>>>>>>>>>>
>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>>>>>>>>>
>>>>>>>>>>> That is more like I had envisaged for this. Reusing existing
>>>>>>>>>>> thread-state transition code is preferable to adding more custom
>>>>>>>>>>> code that directly manipulates thread-state.
>>>>>>>>>>
>>>>>>>>>> I do not agree with this change.
>>>>>>>>>>
>>>>>>>>>> VMError::report_and_die() has "Thread* thread" in its 
>>>>>>>>>> arguments. So
>>>>>>>>>> Thread::current() might be different with it.
>>>>>>>>>
>>>>>>>>> Not sure what you mean. You only ever manipulate the thread 
>>>>>>>>> state of
>>>>>>>>> the current thread.
>>>>>>>>>
>>>>>>>>>> In addition, ThreadInVMfromUnknown uses 
>>>>>>>>>> transition_from_native() to
>>>>>>>>>> change the thread state.
>>>>>>>>>> It checks (and manipulates?) something which relates to 
>>>>>>>>>> safepoint.
>>>>>>>>>
>>>>>>>>> Yes it does - which would be a problem if a safepoint (or 
>>>>>>>>> handshake)
>>>>>>>>> were pending. But the path through before_exit already has 
>>>>>>>>> safepoint
>>>>>>>>> checks when you acquire the BeforeExit_lock.
>>>>>>>>
>>>>>>>> But that isn't relevant. The issue is we don't want a safepoint 
>>>>>>>> check
>>>>>>>> on the report_and_die() path. So a custom transition helper is 
>>>>>>>> needed
>>>>>>>> to avoid that.
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>>> The main problem with the suggestion is it seems we may not be
>>>>>>>>> running in a JavaThread:
>>>>>>>>>
>>>>>>>>>    349   Thread* const thread = Thread::current();
>>>>>>>>>    350   if (thread->is_Watcher_thread()) {
>>>>>>>>>
>>>>>>>>> so we can't use the existing thread-state helpers, unless we 
>>>>>>>>> narrow
>>>>>>>>> the scope (as you do) to after the check for the WatcherThread.
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>>> Thus I added ThreadInVMForJFR to new my webrev.
>>>>>>>>>
>>>>>>>>> Your change still seems overly complicated.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Yasumasa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Markus
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>> Sent: den 2 november 2019 16:57
>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net;
>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>> Cc: David Holmes <david.holmes at oracle.com>; yasuenag at gmail.com;
>>>>>>>>>>>> Markus Gronlund <markus.gronlund at oracle.com>
>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not 
>>>>>>>>>>>> recover
>>>>>>>>>>>> thread state
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Markus commented in JBS this change should be kept local to 
>>>>>>>>>>>> JFR.
>>>>>>>>>>>> So I updated webrev. Could you review it?
>>>>>>>>>>>>
>>>>>>>>>>>>      
>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/
>>>>>>>>>>>>
>>>>>>>>>>>> This change passed all tests on submit repo
>>>>>>>>>>>> (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2019/11/01 18:41, Yasumasa Suenaga wrote:
>>>>>>>>>>>>> Forward to hotspot-runtime-dev.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As David commented in JBS, it may need to be fixed in JFR 
>>>>>>>>>>>>> code.
>>>>>>>>>>>>> But I'm not unclear why thread state is not recover.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'd like to hear about this from JFR folks.
>>>>>>>>>>>>> If it is just a bug in JFR, I will create a patch which 
>>>>>>>>>>>>> recover
>>>>>>>>>>>>> it in JFR code.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>>>>> Subject: RFR: 8233375: JFR emergency dump do not recover 
>>>>>>>>>>>>> thread
>>>>>>>>>>>>> state
>>>>>>>>>>>>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please review this change:
>>>>>>>>>>>>>
>>>>>>>>>>>>>      JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>>>>>>>>>>>>>      webrev:
>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>>>>>>>>>>>>
>>>>>>>>>>>>> If JFR is running when JVM crashes, JFR will dump data to
>>>>>>>>>>>>> hs_err_pid<PID>.jfr .
>>>>>>>>>>>>> It would perform in prepare_for_emergency_dump().
>>>>>>>>>>>>> However this function transits thread state to 
>>>>>>>>>>>>> "_thread_in_vm".
>>>>>>>>>>>>>
>>>>>>>>>>>>> This change has been tested on submit repo as
>>>>>>>>>>>>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>>>>>>>>>>>>> It failed at compiler/types/correctness/CorrectnessTest.java
>>>>>>>>>>>>> However this test is for JIT compiler, and related issue 
>>>>>>>>>>>>> has been
>>>>>>>>>>>>> reported as JDK-8225620.
>>>>>>>>>>>>> So I think this patch can go through.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yasumasa


More information about the hotspot-runtime-dev mailing list