RFR 8150689: Thread dump report "waiting to re-lock in wait()" shows incorrectly
Patricio Chilano
patricio.chilano.mateo at oracle.com
Mon Nov 26 19:21:15 UTC 2018
Hi Dan,
I reverted the change on safepoint.cpp and just removed the extra
"print_thread_state_on()" line on thread.cpp.
Here is the new webrev:
http://cr.openjdk.java.net/~pchilanomate/8150689.02/webrev
Is this reviewed?
Thanks!
Patricio
On 11/21/18 7:18 PM, Daniel D. Daugherty wrote:
> On 11/21/18 5:45 PM, Patricio Chilano wrote:
>> Hi Dan,
>>
>> On 11/21/18 2:52 PM, Daniel D. Daugherty wrote:
>>> On 11/20/18 12:51 PM, Patricio Chilano wrote:
>>>> Hi all,
>>>>
>>>> Could you review this small fix for thread dump reports?
>>>> Now the "waiting to re-lock in wait()" message is shown in the "at
>>>> java.lang.Object.wait(Native Method)" frame where the re-locking is
>>>> actually occurring. In the frame where the lock was first taken we
>>>> always show "locked". Please check if there is some scenario where
>>>> you think the lock info on the stack is not being printed as you
>>>> would expect.
>>>>
>>>> I also noticed the _thread_state attribute was printed twice so I
>>>> removed one instance. I don't know if there was a specific reason
>>>> to print it twice so I can add it back.
>>>>
>>>> Webrev URL: http://cr.openjdk.java.net/~pchilanomate/8150689.01/webrev
>>>
>>> src/hotspot/share/runtime/safepoint.cpp
>>> new L1122: _thread->print_thread_state_on(st);
>>> L1123: st->print_cr("Thread: " INTPTR_FORMAT
>>> L1124: " [0x%2x] State: %s _has_called_back
>>> %d _at_poll_safepoint %d",
>>> L1125: p2i(_thread),
>>> _thread->osthread()->thread_id(), s, _has_called_back,
>>> L1126: _at_poll_safepoint);
>>> old L1127: _thread->print_thread_state_on(st);
>>>
>>> The reason for moving the line isn't jumping out at me (yet).
>>> Update: see next file's comments.
>>>
>>> src/hotspot/share/runtime/thread.cpp
>>> L2945: #ifndef PRODUCT
>>> old L2946: print_thread_state_on(st);
>>> L2947: _safepoint_state->print_on(st);
>>> L2948: #endif // PRODUCT
>>>
>>> Okay so now I understand the change to safepoint.cpp. I agree
>>> with deleting L2946 from thread.cpp because it would result in
>>> two print_thread_state_on() calls in non-PRODUCT bits.
>>>
>>> However, I don't agree with moving the print_thread_state_on()
>>> call in safepoint.cpp. If you do that then you'll change the
>>> output order for the "Thread: ..." line to be after the
>>> print_thread_state_on() output for all build configs.
>> We could keep it in the same order in
>> ThreadSafepointState::print_on() and if JavaThreadState and
>> ThreadStatus need to be printed together maybe we could add a flag to
>> decide whether to print the JavaThreadState in
>> ThreadSafepointState::print_on(). (?)
>>
>>> So I just took a look at the output for baseline release bits
>>> on my MBP and I'm surprised to see this:
>>>
>>> "main" #1 prio=5 os_prio=31 cpu=121.30ms elapsed=1.41s
>>> tid=0x00007f8abb808800 nid=0x2003 waiting on condition
>>> [0x0000000100aff000]
>>> java.lang.Thread.State: TIMED_WAITING (sleeping)
>>> JavaThread state: _thread_blocked
>>> Thread: 0x00007f8abb808800 [0x2003] State: _at_safepoint
>>> _has_called_back 0 _at_poll_safepoint 0
>>> JavaThread state: _thread_blocked
>>> at java.lang.Thread.sleep(java.base at 12-internal/Native Method)
>>> at Sleeper.main(Sleeper.java:5)
>>>
>>> So even in baseline 'release' bits, I'm seeing two of these:
>>>
>>> JavaThread state: _thread_blocked
>>>
>>> I don't know what to think here. Can you explain why release
>>> bits are also showing double print_thread_state_on() output?
>>> Maybe I'm not understanding about the thread.cpp code.
>> Odd, I did the same test in my MBP for a "release" build but I don't
>> see the extra output. I also tested it for Linux and it doesn't show
>> the extra output either. Can you confirm this?
>
> I just reran my Sleeper.java and did a CTRL-\ thread dump on
> both Linux and MacOS with release bits. This time I did not
> get the extra output with either...
>
> I just scrolled back thru my window to see what I did... I
> set JAVA_HOME to the release bits and then I pasted the full
> path to the fastdebug bits on my MBP... So instead of using
> $JAVA_HOME/bin/java Sleeper... I ran the wrong version...
>
> Sorry for the confusion.
>
> Dan
>
>
>>
>>> So I went back to JDK8 release bits on my Solaris box:
>>>
>>> "main" #1 prio=5 os_prio=64 tid=0x000000000041e800 nid=0x2 waiting
>>> on condition [0xffff80ffbf1ae000]
>>> java.lang.Thread.State: TIMED_WAITING (sleeping)
>>> at java.lang.Thread.sleep(Native Method)
>>> at Sleeper.main(Sleeper.java:5)
>>>
>>> So there's no "JavaThread state: _thread_blocked" line at all.
>>> I'm guessing we added that to 'release' bits output after JDK8.
>>>
>>> And here's the JDK8 fastdebug bits on my Solaris box:
>>>
>>> "main" #1 prio=5 os_prio=64 tid=0x0000000000421800 nid=0x2 waiting
>>> on condition [0xffff80ffbf1ae000]
>>> java.lang.Thread.State: TIMED_WAITING (sleeping)
>>> JavaThread state: _thread_blocked
>>> Thread: 0x0000000000421800 [0x 2] State: _at_safepoint
>>> _has_called_back 0 _at_poll_safepoint 0
>>> JavaThread state: _thread_blocked
>>> at java.lang.Thread.sleep(Native Method)
>>> at Sleeper.main(Sleeper.java:5)
>>>
>>> Short version: I agree that there are two "JavaThread state:
>>> _thread_blocked"
>>> and we should fix that. I have no preference on whether the
>>> line is before or
>>> after the "Thread: ..." line. Others may have an opinion.
>>>
>>> src/hotspot/share/runtime/vframe.cpp
>>> I agree with the comment rewrite on L174-6.
>> I actually kept thinking about the comment I added in lines
>> L174-L176, but now I think it's not completely true. Because although
>> that window of times exists there is no safepoint check inside it so
>> the VMthread would never be able to stop the JavaThread during that
>> time. Maybe I should remove that comment?
>>
>>> I like this new code:
>>>
>>> L182:
>>> if(java_lang_Thread::get_thread_status(thread()->threadObj()) ==
>>> java_lang_Thread::BLOCKED_ON_MONITOR_ENTER){
>>> L183: wait_state = "waiting to re-lock in wait()";
>>> L184: }
>>> Needs space between 'if' and '(' and between ')' and '{'.
>> Done!
>>>
>>> I agree with deleting old L232->250. That's the code that was
>>> causing the confusing "waiting to re-lock in wait()" output
>>> in the frame(s) after frame 0.
>>>
>>> Thumbs up!
>>>
>>> As David says, having some example current and new output in the bug
>>> report
>>> would help.
>> Done!
>>
>>
>> Thanks!
>> Patricio
>>
>>> Dan
>>>
>>>
>>>> Bug URL: https://bugs.openjdk.java.net/browse/JDK-8150689
>>>>
>>>> Thanks,
>>>> Patricio
>>>
>>
>>
>
More information about the hotspot-runtime-dev
mailing list