RFR(XS) 8036823: Stack trace sometimes shows 'locked' instead of 'waiting to lock'

Wed Jun 11 09:40:55 UTC 2014

On 11/06/2014 2:35 PM, Daniel D. Daugherty wrote:
> On 6/10/14 6:49 AM, Daniel D. Daugherty wrote:
>> David,
>>
>> Thanks for the fast code review!
>>
>>
>> On 6/10/14 1:12 AM, David Holmes wrote:
>>> On 10/06/2014 2:45 PM, serguei.spitsyn at oracle.com wrote:
>>>> It looks good to me.
>>>> It is still hard to guarantee there are no other unexpected thread states,
>>>
>>> I also wonder about that.
>>>
>>> But I also question whether this is even possible:
>>>
>>>     // Example 2: (This example has not yet been spotted)
>>>     // "ContendingThread-1" #23 prio=5 os_prio=64 tid=0x0000000001556000 nid=0x30 waiting on condition [0xfffffd7fbf17e000]
>>>     //    java.lang.Thread.State: BLOCKED (on object monitor)
>>>
>>> I don't see how you can be blocking on an object monitor but suddenly
>>> appear to be waiting on a condition ??
>>
>> That would be why I said "This example has not yet been spotted".
>> Yes, I'm being paranoid, but you knew that anyway :-)
>>
>>
>>> Looking at the comments in the bug, the Threads_lock is a mutex so it
>>> uses ParkEvent::park, which is PlatformEvent::park which doesn't use
>>> OSThreadWaitState.
>>
>> I'll chase this down with a native stack trace to see where
>> we are exactly. I don't know if I'll be able to get that done
>> today...
>
> So I chased this down to the VM_Deopt call that results from
> the -XX:+DeoptimizeALot option. So yes, my original comment
> about Parker::park() is wrong, but the bug now has the gory
> details of how we really got there...

Yep - nasty stuff :)

Thanks,
David

> Dan
>
>
>>
>> Dan
>>
>>
>>>
>>> David
>>> -----
>>>
>>>
>>>> but running for 3600 samples with no fails is convincing.
>>>>
>>>> Thanks,
>>>> Serguei
>>>>
>>>> On 6/9/14 9:09 PM, Daniel D. Daugherty wrote:
>>>>> I forgot to include a handy link to the bug:
>>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8046287
>>>>>
>>>>> Also, the new test was run for 3600 samples with the
>>>>> -Xcomp and -XX:+DeoptimizeALot options which takes
>>>>> about 45 minutes. With those options, the bad test
>>>>> would fail in < 15 samples.
>>>>>
>>>>> The new test was also run for 3600 samples in the
>>>>> default config which takes about 42 minutes and did
>>>>> not fail in that config.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On 6/9/14 10:04 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> Nightly testing has revealed a bug in the test that reproduces
>>>>>> nicely when these options are used: -Xcomp -XX:+DeoptimizeALot
>>>>>>
>>>>>> Here's the webrev URL for the minor tweak to catch yet another
>>>>>> variation of the waiting pattern:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8046287-webrev/0-jdk9-hs-rt/
>>>>>>
>>>>>> Thanks to Vladimir K for reporting the test failure and for
>>>>>> providing the right details in the bug report.
>>>>>>
>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On 5/29/14 8:49 AM, Daniel D. Daugherty wrote:
>>>>>>> One more round of review after refactoring the test based on
>>>>>>> comments from David H and Serguei.
>>>>>>>
>>>>>>> Here's the webrev for this round:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8036823-webrev/2-jdk9-hs-rt/
>>>>>>>
>>>>>>> Had to change the default sample size from 30 -> 15 in order to
>>>>>>> get the test to pass reliably on Solaris SPARC JPRT machines.
>>>>>>>
>>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> On 5/22/14 10:18 PM, Daniel D. Daugherty wrote:
>>>>>>>> Zhengyu is tied up with some other work so I've taken on this fix.
>>>>>>>>
>>>>>>>> Here's the webrev URL for the next round:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8036823-webrev/1-jdk9-hs-rt/
>>>>>>>>
>>>>>>>> The fix has been tested with vm.quick on all Aurora Adhoc platforms.
>>>>>>>> The new test has been run with the fix via JPRT and passes on all
>>>>>>>> JPRT platforms. The new test has also been run without the fix and
>>>>>>>> fails on most platforms. Since the default sample size is just 30,
>>>>>>>> it is possible to get 30 runs in a row without failing.
>>>>>>>>
>>>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>> On 5/19/14 7:58 AM, Zhengyu Gu wrote:
>>>>>>>>> This is a simple fix for incorrect lock state.
>>>>>>>>>
>>>>>>>>> The timing of setting a thread's pending monitor can result in
>>>>>>>>> the stack trace dump reporting an incorrect lock state. The
>>>>>>>>> solution is to check the monitor's owner: if the owner is a
>>>>>>>>> thread other than the current thread, then the monitor is, or is
>>>>>>>>> in the process of being, set as the pending monitor of the
>>>>>>>>> current thread.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8036823
>>>>>>>>> Webrev: http://cr.openjdk.java.net/~zgu/8036823/webrev.00/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> -Zhengyu
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>>
>>
>>
>
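
The symptom in the subject line can be checked mechanically against a
thread dump. What follows is a minimal Python sketch of that idea, not the
test under review. It assumes the usual HotSpot dump layout (a quoted
thread name, a "java.lang.Thread.State:" line, then frames with
"- waiting to lock <addr>" or "- locked <addr>" lines) and treats a BLOCKED
thread whose first lock line says "locked" as suspect. The function name,
the Demo.run frame, and the lock address are hypothetical.

```python
import re

# Patterns for the parts of a thread-dump entry this sketch cares about.
HEADER = re.compile(r'^"(?P<name>[^"]+)"')
STATE = re.compile(r'^\s+java\.lang\.Thread\.State: (?P<state>\w+)')
LOCK = re.compile(
    r'^\s+- (?P<kind>waiting to lock|locked) <(?P<addr>0x[0-9a-fA-F]+)>')


def find_mislabeled_blocked_threads(dump):
    """Return names of BLOCKED threads whose first lock line says 'locked'.

    Simplification (an assumption of this sketch): only the first lock
    line after each thread header is inspected.
    """
    suspects = []
    name = state = None
    seen_lock = False
    for line in dump.splitlines():
        m = HEADER.match(line)
        if m:
            # A new thread entry starts; reset the per-thread bookkeeping.
            name, state, seen_lock = m.group("name"), None, False
            continue
        m = STATE.match(line)
        if m:
            state = m.group("state")
            continue
        m = LOCK.match(line)
        if m and name is not None and not seen_lock:
            seen_lock = True
            if state == "BLOCKED" and m.group("kind") == "locked":
                suspects.append(name)
    return suspects


# Illustrative entries only; the frame and lock address are made up.
_HEAD = ('"ContendingThread-1" #23 prio=5 os_prio=64 '
         'tid=0x0000000001556000 nid=0x30 waiting for monitor entry '
         '[0xfffffd7fbf17e000]\n'
         '   java.lang.Thread.State: BLOCKED (on object monitor)\n'
         '\tat Demo.run(Demo.java:10)\n')
GOOD_DUMP = _HEAD + '\t- waiting to lock <0x00000000deadbeef> (a java.lang.Object)\n'
BAD_DUMP = _HEAD + '\t- locked <0x00000000deadbeef> (a java.lang.Object)\n'

if __name__ == "__main__":
    print(find_mislabeled_blocked_threads(GOOD_DUMP))
    print(find_mislabeled_blocked_threads(BAD_DUMP))
```

A real check would also need to accept the header variations discussed
above (for example "waiting on condition" in place of "waiting for monitor
entry"), which is why this sketch keys on the Thread.State line and
ignores the rest of the header.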