RFR 8150689: Thread dump report "waiting to re-lock in wait()" shows incorrectly

Mon Nov 26 19:56:08 UTC 2018

On 11/26/18 2:21 PM, Patricio Chilano wrote:
> Hi Dan,
>
> I reverted the change on safepoint.cpp and just removed the extra 
> "print_thread_state_on()" line  on thread.cpp.
> Here is the new webrev:
>
> http://cr.openjdk.java.net/~pchilanomate/8150689.02/webrev

src/hotspot/share/runtime/thread.cpp
     Okay. So the extra print_thread_state_on() call output that would 
appear
     before the "Thread: ..." output line from _safepoint_state->print_on()
     is now gone. Good.

     The print_thread_state_on() output will still appear after the 
"Thread: ..."
     line in a _safepoint_state->print_on() call which means that safepoint
     logging will still have that info. That should address David's concern.

src/hotspot/share/runtime/vframe.cpp
     No comments.

Thumbs up on the code change. Maybe I missed it, but it is not clear
from a quick scan of this email what kind of testing was done. I'm
strongly hoping that we don't have any tests that rely on either:

   - print_thread_state_on() output happening before "Thread: ..."
   - two print_thread_state_on() output line

> Is this reviewed?

Yes, but please wait for David H. to chime in and confirm also.

Dan

>
> Thanks!
> Patricio
>
> On 11/21/18 7:18 PM, Daniel D. Daugherty wrote:
>> On 11/21/18 5:45 PM, Patricio Chilano wrote:
>>> Hi Dan,
>>>
>>> On 11/21/18 2:52 PM, Daniel D. Daugherty wrote:
>>>> On 11/20/18 12:51 PM, Patricio Chilano wrote:
>>>>> Hi all,
>>>>>
>>>>> Could you review this small fix for thread dump reports?
>>>>> Now the "waiting to re-lock in wait()" message is shown in the "at 
>>>>> java.lang.Object.wait(Native Method)" frame where the re-locking 
>>>>> is actually occurring. In the frame where the lock was first taken 
>>>>> we always show "locked". Please check if there is some scenario 
>>>>> where you think the lock info on the stack is not being printed as 
>>>>> you would expect.
>>>>>
>>>>> I also noticed the _thread_state attribute was printed twice so I 
>>>>> removed one instance. I don't know if there was a specific reason 
>>>>> to print it twice so I can add it back.
>>>>>
>>>>> Webrev URL: 
>>>>> http://cr.openjdk.java.net/~pchilanomate/8150689.01/webrev
>>>>
>>>> src/hotspot/share/runtime/safepoint.cpp
>>>>     new L1122:   _thread->print_thread_state_on(st);
>>>>         L1123:   st->print_cr("Thread: " INTPTR_FORMAT
>>>>         L1124:               "  [0x%2x] State: %s _has_called_back 
>>>> %d _at_poll_safepoint %d",
>>>>         L1125:                p2i(_thread), 
>>>> _thread->osthread()->thread_id(), s, _has_called_back,
>>>>         L1126:                _at_poll_safepoint);
>>>>     old L1127:   _thread->print_thread_state_on(st);
>>>>
>>>>         The reason for moving the line isn't jumping out at me (yet).
>>>>         Update: see next file's comments.
>>>>
>>>> src/hotspot/share/runtime/thread.cpp
>>>>         L2945: #ifndef PRODUCT
>>>>     old L2946:   print_thread_state_on(st);
>>>>         L2947:   _safepoint_state->print_on(st);
>>>>         L2948: #endif // PRODUCT
>>>>
>>>>         Okay so now I understand the change to safepoint.cpp. I agree
>>>>         with deleting L2946 from thread.cpp because it would result in
>>>>         two print_thread_state_on() calls in non-PRODUCT bits.
>>>>
>>>>         However, I don't agree with moving the print_thread_state_on()
>>>>         call in safepoint.cpp. If you do that then you'll change the
>>>>         output order for the "Thread: ..." line to be after the
>>>>         print_thread_state_on() output for all build configs.
>>> We could keep it in the same order in 
>>> ThreadSafepointState::print_on() and if JavaThreadState and 
>>> ThreadStatus need to be printed together maybe we could add a flag 
>>> to decide whether to print the JavaThreadState in 
>>> ThreadSafepointState::print_on(). (?)
>>>
>>>> So I just took a look at the output for baseline release bits
>>>>         on my MBP and I'm surprised to see this:
>>>>
>>>> "main" #1 prio=5 os_prio=31 cpu=121.30ms elapsed=1.41s 
>>>> tid=0x00007f8abb808800 nid=0x2003 waiting on condition 
>>>> [0x0000000100aff000]
>>>>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>>>>    JavaThread state: _thread_blocked
>>>> Thread: 0x00007f8abb808800  [0x2003] State: _at_safepoint 
>>>> _has_called_back 0 _at_poll_safepoint 0
>>>>    JavaThread state: _thread_blocked
>>>>     at java.lang.Thread.sleep(java.base at 12-internal/Native Method)
>>>>     at Sleeper.main(Sleeper.java:5)
>>>>
>>>>         So even in baseline 'release' bits, I'm seeing two of these:
>>>>
>>>>            JavaThread state: _thread_blocked
>>>>
>>>>         I don't know what to think here. Can you explain why release
>>>>         bits are also showing double print_thread_state_on() output?
>>>>         Maybe I'm not understanding about the thread.cpp code.
>>> Odd, I did the same test in my MBP for a "release" build but I don't 
>>> see the extra output. I also tested it for Linux and it doesn't show 
>>> the extra output either. Can you confirm this?
>>
>> I just reran my Sleeper.java and did a CTRL-\ thread dump on
>> both Linux and MacOS with release bits. This time I did not
>> get the extra output with either...
>>
>> I just scrolled back thru my window to see what I did... I
>> set JAVA_HOME to the release bits and then I pasted the full
>> path to the fastdebug bits on my MBP... So instead of using
>> $JAVA_HOME/bin/java Sleeper... I ran the wrong version...
>>
>> Sorry for the confusion.
>>
>> Dan
>>
>>
>>>
>>>> So I went back to JDK8 release bits on my Solaris box:
>>>>
>>>> "main" #1 prio=5 os_prio=64 tid=0x000000000041e800 nid=0x2 waiting 
>>>> on condition [0xffff80ffbf1ae000]
>>>>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>>>>         at java.lang.Thread.sleep(Native Method)
>>>>         at Sleeper.main(Sleeper.java:5)
>>>>
>>>>         So there's no "JavaThread state: _thread_blocked" line at all.
>>>>         I'm guessing we added that to 'release' bits output after 
>>>> JDK8.
>>>>
>>>>         And here's the JDK8 fastdebug bits on my Solaris box:
>>>>
>>>> "main" #1 prio=5 os_prio=64 tid=0x0000000000421800 nid=0x2 waiting 
>>>> on condition [0xffff80ffbf1ae000]
>>>>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>>>>    JavaThread state: _thread_blocked
>>>> Thread: 0x0000000000421800  [0x 2] State: _at_safepoint 
>>>> _has_called_back 0 _at_poll_safepoint 0
>>>>    JavaThread state: _thread_blocked
>>>>         at java.lang.Thread.sleep(Native Method)
>>>>         at Sleeper.main(Sleeper.java:5)
>>>>
>>>>         Short version: I agree that there are two "JavaThread 
>>>> state: _thread_blocked"
>>>>         and we should fix that. I have no preference on whether the 
>>>> line is before or
>>>>         after the "Thread: ..." line. Others may have an opinion.
>>>>
>>>> src/hotspot/share/runtime/vframe.cpp
>>>>     I agree with the comment rewrite on L174-6.
>>> I actually kept thinking about the comment I added in lines 
>>> L174-L176, but now I think it's not completely true. Because 
>>> although that window of times exists there is no safepoint check 
>>> inside it so the VMthread would never be able to stop the JavaThread 
>>> during that time. Maybe I should remove that comment?
>>>
>>>>     I like this new code:
>>>>
>>>>     L182: 
>>>> if(java_lang_Thread::get_thread_status(thread()->threadObj()) == 
>>>> java_lang_Thread::BLOCKED_ON_MONITOR_ENTER){
>>>>     L183:             wait_state = "waiting to re-lock in wait()";
>>>>     L184:           }
>>>>         Needs space between 'if' and '(' and between ')' and '{'.
>>> Done!
>>>>
>>>>     I agree with deleting old L232->250. That's the code that was
>>>>     causing the confusing "waiting to re-lock in wait()" output
>>>>     in the frame(s) after frame 0.
>>>>
>>>> Thumbs up!
>>>>
>>>> As David says, having some example current and new output in the 
>>>> bug report
>>>> would help.
>>> Done!
>>>
>>>
>>> Thanks!
>>> Patricio
>>>
>>>> Dan
>>>>
>>>>
>>>>> Bug URL: https://bugs.openjdk.java.net/browse/JDK-8150689
>>>>>
>>>>> Thanks,
>>>>> Patricio
>>>>
>>>
>>>
>>
>