RFR(XS) 8036823: Stack trace sometimes shows 'locked' instead of 'waiting to lock'
Daniel D. Daugherty
daniel.daugherty at oracle.com
Wed Jun 11 04:35:12 UTC 2014
On 6/10/14 6:49 AM, Daniel D. Daugherty wrote:
> David,
>
> Thanks for the fast code review!
>
>
> On 6/10/14 1:12 AM, David Holmes wrote:
>> On 10/06/2014 2:45 PM, serguei.spitsyn at oracle.com wrote:
>>> It looks good to me.
>>> It is still hard to guarantee there are no other unexpected thread
>>> states,
>>
>> I also wonder about that.
>>
>> But I also question whether this is even possible:
>>
>> 253 // Example 2: (This example has not yet been spotted)
>> 254 // "ContendingThread-1" #23 prio=5 os_prio=64 tid=0x0000000001556000 nid=0x30 waiting on condition [0xfffffd7fbf17e000]
>> 255 // java.lang.Thread.State: BLOCKED (on object monitor)
>>
>> I don't see how you can be blocking on an object monitor but suddenly
>> appear to be waiting on a condition ??
>
> That would be why I said "This example has not yet been spotted".
> Yes, I'm being paranoid, but you knew that anyway :-)
>
>
>> Looking at the comments in the bug, the Threads_lock is a mutex so it
>> uses ParkEvent::park, which is PlatformEvent::park which doesn't use
>> OSThreadWaitState.
>
> I'll chase this down with a native stack trace to see where
> we are exactly. I don't know if I'll be able to get that done
> today...

So I chased this down to the VM_Deopt call that results from
the -XX:+DeoptimizeALot option. So yes, my original comment
about Parker::park() is wrong, but the bug now has the gory
details of how we really got there...

Dan
>
> Dan
>
>
>>
>> David
>> -----
>>
>>
>>> but running for 3600 samples with no fails is convincing.
>>>
>>> Thanks,
>>> Serguei
>>>
>>> On 6/9/14 9:09 PM, Daniel D. Daugherty wrote:
>>>> I forgot to include a handy link to the bug:
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8046287
>>>>
>>>> Also, the new test was run for 3600 samples with the
>>>> -Xcomp and -XX:+DeoptimizeALot options which takes
>>>> about 45 minutes. With those options, the bad test
>>>> would fail in < 15 samples.
>>>>
>>>> The new test was also run for 3600 samples in the
>>>> default config which takes about 42 minutes and did
>>>> not fail in that config.
>>>>
>>>> Dan
>>>>
>>>>
>>>> On 6/9/14 10:04 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> Nightly testing has revealed a bug in the test that reproduces
>>>>> nicely when these options are used: -Xcomp -XX:+DeoptimizeALot
>>>>>
>>>>> Here's the webrev URL for the minor tweak to catch yet another
>>>>> variation of the waiting pattern:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8046287-webrev/0-jdk9-hs-rt/
>>>>>
>>>>> Thanks to Vladimir K for reporting the test failure and for
>>>>> providing the right details in the bug report.
>>>>>
>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On 5/29/14 8:49 AM, Daniel D. Daugherty wrote:
>>>>>> One more round of review after refactoring the test based on
>>>>>> comments from David H and Serguei.
>>>>>>
>>>>>> Here's the webrev for this round:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8036823-webrev/2-jdk9-hs-rt/
>>>>>>
>>>>>> Had to change the default sample size from 30 -> 15 in order to
>>>>>> get the test to pass reliably on Solaris SPARC JPRT machines.
>>>>>>
>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On 5/22/14 10:18 PM, Daniel D. Daugherty wrote:
>>>>>>> Zhengyu is tied up with some other work so I've taken on this fix.
>>>>>>>
>>>>>>> Here's the webrev URL for the next round:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8036823-webrev/1-jdk9-hs-rt/
>>>>>>>
>>>>>>> The fix has been tested with vm.quick on all Aurora Adhoc platforms.
>>>>>>> The new test has been run with the fix via JPRT and passes on all
>>>>>>> JPRT platforms. The new test has also been run without the fix and
>>>>>>> fails on most platforms. Since the default sample size is just 30,
>>>>>>> it is possible to get 30 runs in a row without failing.
>>>>>>>
>>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> On 5/19/14 7:58 AM, Zhengyu Gu wrote:
>>>>>>>> This is a simple fix for incorrect lock state.
>>>>>>>>
>>>>>>>> The timing of setting a thread's pending monitor can result in a
>>>>>>>> stack trace dump reporting an incorrect lock state. The solution
>>>>>>>> is to check the monitor's owner: if the owner is not the current
>>>>>>>> thread, then the monitor is, or is in the process of being set as,
>>>>>>>> the pending monitor of the current thread.
>>>>>>>>
>>>>>>>>
>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8036823
>>>>>>>> Webrev: http://cr.openjdk.java.net/~zgu/8036823/webrev.00/
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> -Zhengyu
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
>
>
>
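
For readers following the thread, here is a minimal sketch of the kind of
monitor contention being discussed. It is not the test under review and not
the HotSpot fix. The class name ContentionSketch and the helper
sampleBlockedThread() are hypothetical, and the sketch reads the contending
thread's state through the standard java.lang.management API rather than
parsing a stack trace dump. The thread name "ContendingThread-1" is borrowed
from the dump quoted above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ContentionSketch {

    // Hypothetical helper: starts a thread that contends for a monitor held
    // by the caller and returns a snapshot taken while it is blocked.
    static ThreadInfo sampleBlockedThread() throws InterruptedException {
        final Object lock = new Object();
        Thread contender = new Thread(() -> {
            synchronized (lock) {
                // Entered only after the caller releases the monitor.
            }
        }, "ContendingThread-1");

        ThreadInfo info;
        synchronized (lock) {
            contender.start();
            // Wait until the contender is blocked entering the monitor.
            while (contender.getState() != Thread.State.BLOCKED) {
                Thread.sleep(1);
            }
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            // Snapshot taken while the caller still owns the monitor.
            info = mx.getThreadInfo(contender.getId());
        }
        contender.join();
        return info;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadInfo info = sampleBlockedThread();
        System.out.println(info.getThreadName()
                + " state=" + info.getThreadState()
                + " lock=" + info.getLockName()
                + " owner=" + info.getLockOwnerName());
    }
}
```

In a stack trace dump, a thread in this situation is the one that should be
reported as "waiting to lock" the monitor, while the owner is reported as
having "locked" it. Running the sketch with -Xcomp -XX:+DeoptimizeALot, as in
the thread, needs a debug build of the VM, since -XX:+DeoptimizeALot is not
available in product builds.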