RFR 8150689: Thread dump report "waiting to re-lock in wait()" shows incorrectly
David Holmes
david.holmes at oracle.com
Mon Nov 26 21:47:45 UTC 2018
I'm fine with this final version.
If there are tests lurking somewhere that parse a stack dump (other than
the jstack test we already found) then we will deal with them as they arise.
Thanks,
David
On 27/11/2018 7:31 am, Patricio Chilano wrote:
> Hi Dan,
>
> On 11/26/18 2:56 PM, Daniel D. Daugherty wrote:
>> On 11/26/18 2:21 PM, Patricio Chilano wrote:
>>> Hi Dan,
>>>
>>> I reverted the change on safepoint.cpp and just removed the extra
>>> "print_thread_state_on()" line on thread.cpp.
>>> Here is the new webrev:
>>>
>>> http://cr.openjdk.java.net/~pchilanomate/8150689.02/webrev
>>
>> src/hotspot/share/runtime/thread.cpp
>> Okay. So the extra print_thread_state_on() call output that would
>> appear
>> before the "Thread: ..." output line from
>> _safepoint_state->print_on()
>> is now gone. Good.
>>
>> The print_thread_state_on() output will still appear after the
>> "Thread: ..."
>> line in a _safepoint_state->print_on() call which means that
>> safepoint
>> logging will still have that info. That should address David's
>> concern.
>>
>> src/hotspot/share/runtime/vframe.cpp
>> No comments.
>>
>> Thumbs up on the code change. Maybe I missed it, but it is not clear
>> from a quick scan of this email what kind of testing was done. I'm
>> strongly hoping that we don't have any tests that rely on either:
>>
>> - print_thread_state_on() output happening before "Thread: ..."
>> - two print_thread_state_on() output line
> I run mach tiers1-3 and all the test passed. There is a test found by
> David (https://bugs.openjdk.java.net/browse/JDK-8214148) that should
> failed based on this change but it is still working because there is
> something else wrong with it. I fixed that test too but I will send it
> for the 8214148 RFR.
>
>>> Is this reviewed?
>>
>> Yes, but please wait for David H. to chime in and confirm also.
> Ok, thanks Dan!
>
>
> Thanks,
> Patricio
>
>> Dan
>>
>>
>>>
>>> Thanks!
>>> Patricio
>>>
>>> On 11/21/18 7:18 PM, Daniel D. Daugherty wrote:
>>>> On 11/21/18 5:45 PM, Patricio Chilano wrote:
>>>>> Hi Dan,
>>>>>
>>>>> On 11/21/18 2:52 PM, Daniel D. Daugherty wrote:
>>>>>> On 11/20/18 12:51 PM, Patricio Chilano wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Could you review this small fix for thread dump reports?
>>>>>>> Now the "waiting to re-lock in wait()" message is shown in the
>>>>>>> "at java.lang.Object.wait(Native Method)" frame where the
>>>>>>> re-locking is actually occurring. In the frame where the lock was
>>>>>>> first taken we always show "locked". Please check if there is
>>>>>>> some scenario where you think the lock info on the stack is not
>>>>>>> being printed as you would expect.
>>>>>>>
>>>>>>> I also noticed the _thread_state attribute was printed twice so I
>>>>>>> removed one instance. I don't know if there was a specific reason
>>>>>>> to print it twice so I can add it back.
>>>>>>>
>>>>>>> Webrev URL:
>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8150689.01/webrev
>>>>>>
>>>>>> src/hotspot/share/runtime/safepoint.cpp
>>>>>> new L1122: _thread->print_thread_state_on(st);
>>>>>> L1123: st->print_cr("Thread: " INTPTR_FORMAT
>>>>>> L1124: " [0x%2x] State: %s _has_called_back
>>>>>> %d _at_poll_safepoint %d",
>>>>>> L1125: p2i(_thread),
>>>>>> _thread->osthread()->thread_id(), s, _has_called_back,
>>>>>> L1126: _at_poll_safepoint);
>>>>>> old L1127: _thread->print_thread_state_on(st);
>>>>>>
>>>>>> The reason for moving the line isn't jumping out at me (yet).
>>>>>> Update: see next file's comments.
>>>>>>
>>>>>> src/hotspot/share/runtime/thread.cpp
>>>>>> L2945: #ifndef PRODUCT
>>>>>> old L2946: print_thread_state_on(st);
>>>>>> L2947: _safepoint_state->print_on(st);
>>>>>> L2948: #endif // PRODUCT
>>>>>>
>>>>>> Okay so now I understand the change to safepoint.cpp. I agree
>>>>>> with deleting L2946 from thread.cpp because it would
>>>>>> result in
>>>>>> two print_thread_state_on() calls in non-PRODUCT bits.
>>>>>>
>>>>>> However, I don't agree with moving the
>>>>>> print_thread_state_on()
>>>>>> call in safepoint.cpp. If you do that then you'll change the
>>>>>> output order for the "Thread: ..." line to be after the
>>>>>> print_thread_state_on() output for all build configs.
>>>>> We could keep it in the same order in
>>>>> ThreadSafepointState::print_on() and if JavaThreadState and
>>>>> ThreadStatus need to be printed together maybe we could add a flag
>>>>> to decide whether to print the JavaThreadState in
>>>>> ThreadSafepointState::print_on(). (?)
>>>>>
>>>>>> So I just took a look at the output for baseline release bits
>>>>>> on my MBP and I'm surprised to see this:
>>>>>>
>>>>>> "main" #1 prio=5 os_prio=31 cpu=121.30ms elapsed=1.41s
>>>>>> tid=0x00007f8abb808800 nid=0x2003 waiting on condition
>>>>>> [0x0000000100aff000]
>>>>>> java.lang.Thread.State: TIMED_WAITING (sleeping)
>>>>>> JavaThread state: _thread_blocked
>>>>>> Thread: 0x00007f8abb808800 [0x2003] State: _at_safepoint
>>>>>> _has_called_back 0 _at_poll_safepoint 0
>>>>>> JavaThread state: _thread_blocked
>>>>>> at java.lang.Thread.sleep(java.base at 12-internal/Native Method)
>>>>>> at Sleeper.main(Sleeper.java:5)
>>>>>>
>>>>>> So even in baseline 'release' bits, I'm seeing two of these:
>>>>>>
>>>>>> JavaThread state: _thread_blocked
>>>>>>
>>>>>> I don't know what to think here. Can you explain why release
>>>>>> bits are also showing double print_thread_state_on() output?
>>>>>> Maybe I'm not understanding about the thread.cpp code.
>>>>> Odd, I did the same test in my MBP for a "release" build but I
>>>>> don't see the extra output. I also tested it for Linux and it
>>>>> doesn't show the extra output either. Can you confirm this?
>>>>
>>>> I just reran my Sleeper.java and did a CTRL-\ thread dump on
>>>> both Linux and MacOS with release bits. This time I did not
>>>> get the extra output with either...
>>>>
>>>> I just scrolled back thru my window to see what I did... I
>>>> set JAVA_HOME to the release bits and then I pasted the full
>>>> path to the fastdebug bits on my MBP... So instead of using
>>>> $JAVA_HOME/bin/java Sleeper... I ran the wrong version...
>>>>
>>>> Sorry for the confusion.
>>>>
>>>> Dan
>>>>
>>>>
>>>>>
>>>>>> So I went back to JDK8 release bits on my Solaris box:
>>>>>>
>>>>>> "main" #1 prio=5 os_prio=64 tid=0x000000000041e800 nid=0x2 waiting
>>>>>> on condition [0xffff80ffbf1ae000]
>>>>>> java.lang.Thread.State: TIMED_WAITING (sleeping)
>>>>>> at java.lang.Thread.sleep(Native Method)
>>>>>> at Sleeper.main(Sleeper.java:5)
>>>>>>
>>>>>> So there's no "JavaThread state: _thread_blocked" line at
>>>>>> all.
>>>>>> I'm guessing we added that to 'release' bits output after
>>>>>> JDK8.
>>>>>>
>>>>>> And here's the JDK8 fastdebug bits on my Solaris box:
>>>>>>
>>>>>> "main" #1 prio=5 os_prio=64 tid=0x0000000000421800 nid=0x2 waiting
>>>>>> on condition [0xffff80ffbf1ae000]
>>>>>> java.lang.Thread.State: TIMED_WAITING (sleeping)
>>>>>> JavaThread state: _thread_blocked
>>>>>> Thread: 0x0000000000421800 [0x 2] State: _at_safepoint
>>>>>> _has_called_back 0 _at_poll_safepoint 0
>>>>>> JavaThread state: _thread_blocked
>>>>>> at java.lang.Thread.sleep(Native Method)
>>>>>> at Sleeper.main(Sleeper.java:5)
>>>>>>
>>>>>> Short version: I agree that there are two "JavaThread
>>>>>> state: _thread_blocked"
>>>>>> and we should fix that. I have no preference on whether
>>>>>> the line is before or
>>>>>> after the "Thread: ..." line. Others may have an opinion.
>>>>>>
>>>>>> src/hotspot/share/runtime/vframe.cpp
>>>>>> I agree with the comment rewrite on L174-6.
>>>>> I actually kept thinking about the comment I added in lines
>>>>> L174-L176, but now I think it's not completely true. Because
>>>>> although that window of times exists there is no safepoint check
>>>>> inside it so the VMthread would never be able to stop the
>>>>> JavaThread during that time. Maybe I should remove that comment?
>>>>>
>>>>>> I like this new code:
>>>>>>
>>>>>> L182:
>>>>>> if(java_lang_Thread::get_thread_status(thread()->threadObj()) ==
>>>>>> java_lang_Thread::BLOCKED_ON_MONITOR_ENTER){
>>>>>> L183: wait_state = "waiting to re-lock in wait()";
>>>>>> L184: }
>>>>>> Needs space between 'if' and '(' and between ')' and '{'.
>>>>> Done!
>>>>>>
>>>>>> I agree with deleting old L232->250. That's the code that was
>>>>>> causing the confusing "waiting to re-lock in wait()" output
>>>>>> in the frame(s) after frame 0.
>>>>>>
>>>>>> Thumbs up!
>>>>>>
>>>>>> As David says, having some example current and new output in the
>>>>>> bug report
>>>>>> would help.
>>>>> Done!
>>>>>
>>>>>
>>>>> Thanks!
>>>>> Patricio
>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>>> Bug URL: https://bugs.openjdk.java.net/browse/JDK-8150689
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Patricio
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
More information about the hotspot-runtime-dev
mailing list