RFR 8150689: Thread dump report "waiting to re-lock in wait()" shows incorrectly

Patricio Chilano patricio.chilano.mateo at oracle.com
Tue Nov 27 00:19:00 UTC 2018


Ok, thanks David!

Thanks,
Patricio
On 11/26/18 4:47 PM, David Holmes wrote:
> I'm fine with this final version.
>
> If there are tests lurking somewhere that parse a stack dump (other 
> than the jstack test we already found) then we will deal with them as 
> they arise.
>
> Thanks,
> David
>
> On 27/11/2018 7:31 am, Patricio Chilano wrote:
>> Hi Dan,
>>
>> On 11/26/18 2:56 PM, Daniel D. Daugherty wrote:
>>> On 11/26/18 2:21 PM, Patricio Chilano wrote:
>>>> Hi Dan,
>>>>
>>>> I reverted the change on safepoint.cpp and just removed the extra 
>>>> "print_thread_state_on()" line  on thread.cpp.
>>>> Here is the new webrev:
>>>>
>>>> http://cr.openjdk.java.net/~pchilanomate/8150689.02/webrev
>>>
>>> src/hotspot/share/runtime/thread.cpp
>>>     Okay. So the extra print_thread_state_on() call output that 
>>> would appear
>>>     before the "Thread: ..." output line from 
>>> _safepoint_state->print_on()
>>>     is now gone. Good.
>>>
>>>     The print_thread_state_on() output will still appear after the 
>>> "Thread: ..."
>>>     line in a _safepoint_state->print_on() call which means that 
>>> safepoint
>>>     logging will still have that info. That should address David's 
>>> concern.
>>>
>>> src/hotspot/share/runtime/vframe.cpp
>>>     No comments.
>>>
>>> Thumbs up on the code change. Maybe I missed it, but it is not clear
>>> from a quick scan of this email what kind of testing was done. I'm
>>> strongly hoping that we don't have any tests that rely on either:
>>>
>>>   - print_thread_state_on() output happening before "Thread: ..."
>>>   - two print_thread_state_on() output line
>> I run mach tiers1-3 and all the test passed. There is a test found by 
>> David (https://bugs.openjdk.java.net/browse/JDK-8214148) that should 
>> failed based on this change but it is still working because there is 
>> something else wrong with it. I fixed that test too but I will send 
>> it for the 8214148 RFR.
>>
>>>> Is this reviewed?
>>>
>>> Yes, but please wait for David H. to chime in and confirm also.
>> Ok, thanks Dan!
>>
>>
>> Thanks,
>> Patricio
>>
>>> Dan
>>>
>>>
>>>>
>>>> Thanks!
>>>> Patricio
>>>>
>>>> On 11/21/18 7:18 PM, Daniel D. Daugherty wrote:
>>>>> On 11/21/18 5:45 PM, Patricio Chilano wrote:
>>>>>> Hi Dan,
>>>>>>
>>>>>> On 11/21/18 2:52 PM, Daniel D. Daugherty wrote:
>>>>>>> On 11/20/18 12:51 PM, Patricio Chilano wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Could you review this small fix for thread dump reports?
>>>>>>>> Now the "waiting to re-lock in wait()" message is shown in the 
>>>>>>>> "at java.lang.Object.wait(Native Method)" frame where the 
>>>>>>>> re-locking is actually occurring. In the frame where the lock 
>>>>>>>> was first taken we always show "locked". Please check if there 
>>>>>>>> is some scenario where you think the lock info on the stack is 
>>>>>>>> not being printed as you would expect.
>>>>>>>>
>>>>>>>> I also noticed the _thread_state attribute was printed twice so 
>>>>>>>> I removed one instance. I don't know if there was a specific 
>>>>>>>> reason to print it twice so I can add it back.
>>>>>>>>
>>>>>>>> Webrev URL: 
>>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8150689.01/webrev
>>>>>>>
>>>>>>> src/hotspot/share/runtime/safepoint.cpp
>>>>>>>     new L1122: _thread->print_thread_state_on(st);
>>>>>>>         L1123:   st->print_cr("Thread: " INTPTR_FORMAT
>>>>>>>         L1124:               "  [0x%2x] State: %s 
>>>>>>> _has_called_back %d _at_poll_safepoint %d",
>>>>>>>         L1125:                p2i(_thread), 
>>>>>>> _thread->osthread()->thread_id(), s, _has_called_back,
>>>>>>>         L1126:                _at_poll_safepoint);
>>>>>>>     old L1127: _thread->print_thread_state_on(st);
>>>>>>>
>>>>>>>         The reason for moving the line isn't jumping out at me 
>>>>>>> (yet).
>>>>>>>         Update: see next file's comments.
>>>>>>>
>>>>>>> src/hotspot/share/runtime/thread.cpp
>>>>>>>         L2945: #ifndef PRODUCT
>>>>>>>     old L2946:   print_thread_state_on(st);
>>>>>>>         L2947:   _safepoint_state->print_on(st);
>>>>>>>         L2948: #endif // PRODUCT
>>>>>>>
>>>>>>>         Okay so now I understand the change to safepoint.cpp. I 
>>>>>>> agree
>>>>>>>         with deleting L2946 from thread.cpp because it would 
>>>>>>> result in
>>>>>>>         two print_thread_state_on() calls in non-PRODUCT bits.
>>>>>>>
>>>>>>>         However, I don't agree with moving the 
>>>>>>> print_thread_state_on()
>>>>>>>         call in safepoint.cpp. If you do that then you'll change 
>>>>>>> the
>>>>>>>         output order for the "Thread: ..." line to be after the
>>>>>>>         print_thread_state_on() output for all build configs.
>>>>>> We could keep it in the same order in 
>>>>>> ThreadSafepointState::print_on() and if JavaThreadState and 
>>>>>> ThreadStatus need to be printed together maybe we could add a 
>>>>>> flag to decide whether to print the JavaThreadState in 
>>>>>> ThreadSafepointState::print_on(). (?)
>>>>>>
>>>>>>> So I just took a look at the output for baseline release bits
>>>>>>>         on my MBP and I'm surprised to see this:
>>>>>>>
>>>>>>> "main" #1 prio=5 os_prio=31 cpu=121.30ms elapsed=1.41s 
>>>>>>> tid=0x00007f8abb808800 nid=0x2003 waiting on condition 
>>>>>>> [0x0000000100aff000]
>>>>>>>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>>>>>>>    JavaThread state: _thread_blocked
>>>>>>> Thread: 0x00007f8abb808800  [0x2003] State: _at_safepoint 
>>>>>>> _has_called_back 0 _at_poll_safepoint 0
>>>>>>>    JavaThread state: _thread_blocked
>>>>>>>     at java.lang.Thread.sleep(java.base at 12-internal/Native Method)
>>>>>>>     at Sleeper.main(Sleeper.java:5)
>>>>>>>
>>>>>>>         So even in baseline 'release' bits, I'm seeing two of 
>>>>>>> these:
>>>>>>>
>>>>>>>            JavaThread state: _thread_blocked
>>>>>>>
>>>>>>>         I don't know what to think here. Can you explain why 
>>>>>>> release
>>>>>>>         bits are also showing double print_thread_state_on() 
>>>>>>> output?
>>>>>>>         Maybe I'm not understanding about the thread.cpp code.
>>>>>> Odd, I did the same test in my MBP for a "release" build but I 
>>>>>> don't see the extra output. I also tested it for Linux and it 
>>>>>> doesn't show the extra output either. Can you confirm this?
>>>>>
>>>>> I just reran my Sleeper.java and did a CTRL-\ thread dump on
>>>>> both Linux and MacOS with release bits. This time I did not
>>>>> get the extra output with either...
>>>>>
>>>>> I just scrolled back thru my window to see what I did... I
>>>>> set JAVA_HOME to the release bits and then I pasted the full
>>>>> path to the fastdebug bits on my MBP... So instead of using
>>>>> $JAVA_HOME/bin/java Sleeper... I ran the wrong version...
>>>>>
>>>>> Sorry for the confusion.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>>
>>>>>>> So I went back to JDK8 release bits on my Solaris box:
>>>>>>>
>>>>>>> "main" #1 prio=5 os_prio=64 tid=0x000000000041e800 nid=0x2 
>>>>>>> waiting on condition [0xffff80ffbf1ae000]
>>>>>>>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>>>>>>>         at java.lang.Thread.sleep(Native Method)
>>>>>>>         at Sleeper.main(Sleeper.java:5)
>>>>>>>
>>>>>>>         So there's no "JavaThread state: _thread_blocked" line 
>>>>>>> at all.
>>>>>>>         I'm guessing we added that to 'release' bits output 
>>>>>>> after JDK8.
>>>>>>>
>>>>>>>         And here's the JDK8 fastdebug bits on my Solaris box:
>>>>>>>
>>>>>>> "main" #1 prio=5 os_prio=64 tid=0x0000000000421800 nid=0x2 
>>>>>>> waiting on condition [0xffff80ffbf1ae000]
>>>>>>>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>>>>>>>    JavaThread state: _thread_blocked
>>>>>>> Thread: 0x0000000000421800  [0x 2] State: _at_safepoint 
>>>>>>> _has_called_back 0 _at_poll_safepoint 0
>>>>>>>    JavaThread state: _thread_blocked
>>>>>>>         at java.lang.Thread.sleep(Native Method)
>>>>>>>         at Sleeper.main(Sleeper.java:5)
>>>>>>>
>>>>>>>         Short version: I agree that there are two "JavaThread 
>>>>>>> state: _thread_blocked"
>>>>>>>         and we should fix that. I have no preference on whether 
>>>>>>> the line is before or
>>>>>>>         after the "Thread: ..." line. Others may have an opinion.
>>>>>>>
>>>>>>> src/hotspot/share/runtime/vframe.cpp
>>>>>>>     I agree with the comment rewrite on L174-6.
>>>>>> I actually kept thinking about the comment I added in lines 
>>>>>> L174-L176, but now I think it's not completely true. Because 
>>>>>> although that window of times exists there is no safepoint check 
>>>>>> inside it so the VMthread would never be able to stop the 
>>>>>> JavaThread during that time. Maybe I should remove that comment?
>>>>>>
>>>>>>>     I like this new code:
>>>>>>>
>>>>>>>     L182: 
>>>>>>> if(java_lang_Thread::get_thread_status(thread()->threadObj()) == 
>>>>>>> java_lang_Thread::BLOCKED_ON_MONITOR_ENTER){
>>>>>>>     L183:             wait_state = "waiting to re-lock in wait()";
>>>>>>>     L184:           }
>>>>>>>         Needs space between 'if' and '(' and between ')' and '{'.
>>>>>> Done!
>>>>>>>
>>>>>>>     I agree with deleting old L232->250. That's the code that was
>>>>>>>     causing the confusing "waiting to re-lock in wait()" output
>>>>>>>     in the frame(s) after frame 0.
>>>>>>>
>>>>>>> Thumbs up!
>>>>>>>
>>>>>>> As David says, having some example current and new output in the 
>>>>>>> bug report
>>>>>>> would help.
>>>>>> Done!
>>>>>>
>>>>>>
>>>>>> Thanks!
>>>>>> Patricio
>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>>> Bug URL: https://bugs.openjdk.java.net/browse/JDK-8150689
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Patricio
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>



More information about the hotspot-runtime-dev mailing list