RFR(XS) 8036823: Stack trace sometimes shows 'locked' instead of 'waiting to lock'

Thu Jun 12 02:38:11 UTC 2014

On 12/06/2014 12:14 PM, Daniel D. Daugherty wrote:
> Thanks for the very fast review!
>
> Would you be OK if I just went with what I have since I've tested
> that thoroughly? I've burned quite a few cycles on the original bug
> and this test and I'd like to get back to my primary task...

Sure, it was a nit :)

David

> Dan
>
>
> On 6/11/14 7:55 PM, David Holmes wrote:
>> Hi Dan,
>>
>> My only nit would be to use a CountDownLatch rather than roll your own
>> via wait/notify :)
>>
>> Cheers,
>> David
>>
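David's nit was to replace the hand-rolled wait/notify start-up handshake with a `java.util.concurrent.CountDownLatch`. A minimal sketch of what that could look like, assuming two contenders named "ContendingThread-1" and "ContendingThread-2" that repeatedly enter a shared lock; the class name, the method name, and the extra `stop` latch are hypothetical and not taken from the actual test:

```java
import java.util.concurrent.CountDownLatch;

public class ContenderStartup {
    // Hypothetical shared lock; the real test's lock object may differ.
    private static final Object LOCK = new Object();

    /**
     * Starts n contending threads and waits until every one of them has
     * finished initializing before "sampling" would begin. Returns how many
     * contenders were alive at that point.
     */
    static int startContendersAndSample(int n) {
        CountDownLatch ready = new CountDownLatch(n); // one count per contender
        CountDownLatch stop = new CountDownLatch(1);  // tells contenders to exit
        Thread[] contenders = new Thread[n];
        for (int i = 0; i < n; i++) {
            contenders[i] = new Thread(() -> {
                ready.countDown();            // signal: initialization is done
                while (stop.getCount() > 0) {
                    synchronized (LOCK) {     // contend for the shared monitor
                        Thread.yield();
                    }
                }
            }, "ContendingThread-" + (i + 1));
            contenders[i].setDaemon(true);    // never keep the JVM alive
            contenders[i].start();
        }
        int alive = 0;
        try {
            ready.await(); // replaces the hand-rolled wait()/notifyAll() handshake
            for (Thread t : contenders) {
                if (t.isAlive()) {
                    alive++;
                }
            }
            // ... stack-trace samples would be taken here ...
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            stop.countDown();
            for (Thread t : contenders) {
                try {
                    t.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
        return alive;
    }

    public static void main(String[] args) {
        System.out.println(startContendersAndSample(2));
    }
}
```

The appeal of the latch is that `await()` returns immediately if every contender has already counted down, so there is no lost-wakeup window of the kind a hand-rolled wait/notify handshake has to guard against with a separate condition flag.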
>> On 12/06/2014 8:36 AM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> Let's try this hopefully one last time:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8046287-webrev/1-jdk9-hs-rt/
>>> https://bugs.openjdk.java.net/browse/JDK-8046287
>>>
>>> Changes relative to the ORIGINAL version of the test:
>>>
>>> - add a new header waiting pattern to catch the case where
>>>    the target thread is waiting on a condition (like a VM op)
>>> - add synchronization to the start-up of the contending threads
>>>    so that we don't start sampling while the contending threads
>>>    are initializing
>>> - add a sanity check for observing only two "ContendingThread-*"
>>>    stack traces
>>>
>>> - rename some variables to make their use more clear
>>> - update/add various comments
>>> - add counters for the various checks and report a summary
>>>    of all the sampling runs
>>> - issue a warning if the specific scenario encountered by
>>>    the original bug (8036823) is never seen
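The counters, sanity check, and summary described in the list above might be organized as follows. The counter names and the "INFO:" line format are copied from the summaries reported in this message; the class, its methods, and the warning text are hypothetical. How each sample gets classified into a bucket is left out because the thread does not spell it out:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SampleSummary {
    // Counter names as they appear in the reported summaries.
    static final List<String> COUNTERS = Arrays.asList(
        "both_running_cnt", "both_waiting_cnt", "contended_cnt",
        "one_waiting_cnt", "uncontended_cnt");

    // TreeMap keeps the names sorted, giving a stable summary order.
    private final Map<String, Integer> counts = new TreeMap<>();

    SampleSummary() {
        for (String name : COUNTERS) {
            counts.put(name, 0);
        }
    }

    /** Bumps one counter after a sample has been classified. */
    void record(String counter) {
        if (!counts.containsKey(counter)) {
            throw new IllegalArgumentException("unknown counter: " + counter);
        }
        counts.merge(counter, 1, Integer::sum);
    }

    int get(String counter) {
        return counts.get(counter);
    }

    /** Sanity check: each sample should contain exactly two ContendingThread-* traces. */
    static void checkContenderCount(List<String> threadNames) {
        long n = threadNames.stream()
                .filter(name -> name.startsWith("ContendingThread-"))
                .count();
        if (n != 2) {
            throw new IllegalStateException(
                "expected 2 ContendingThread-* stack traces, saw " + n);
        }
    }

    /** Builds the end-of-run summary; warns if the original bug's scenario never showed up. */
    List<String> summary() {
        List<String> lines = new ArrayList<>();
        lines.add("INFO: Summary for all samples:");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            lines.add("INFO: " + e.getKey() + "=" + e.getValue());
        }
        if (get("contended_cnt") == 0) {
            // Hypothetical wording; only the existence of a warning is described.
            lines.add("WARNING: the 8036823 contended scenario was never sampled");
        }
        return lines;
    }
}
```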
>>>
>>> Testing:
>>>
>>> - JPRT test run of the test using product and fastdebug
>>>    bits on all the usual platforms
>>>
>>> - 3600 sample run with fastdebug bits:
>>>      INFO: Summary for all samples:
>>>      INFO: both_running_cnt=0
>>>      INFO: both_waiting_cnt=0
>>>      INFO: contended_cnt=2005
>>>      INFO: one_waiting_cnt=1405
>>>      INFO: uncontended_cnt=190
>>>
>>> - 3600 sample run with fastdebug bits w/ -Xcomp:
>>>      INFO: Summary for all samples:
>>>      INFO: both_running_cnt=0
>>>      INFO: both_waiting_cnt=0
>>>      INFO: contended_cnt=1867
>>>      INFO: one_waiting_cnt=1548
>>>      INFO: uncontended_cnt=185
>>>
>>> - 3600 sample run with fastdebug bits w/ -Xcomp -XX:+DeoptimizeALot:
>>>      INFO: Summary for all samples:
>>>      INFO: both_running_cnt=46
>>>      INFO: both_waiting_cnt=0
>>>      INFO: contended_cnt=3135
>>>      INFO: one_waiting_cnt=3
>>>      INFO: uncontended_cnt=416
>>>
>>> The contended_cnt is where we're hitting the original
>>> bug's scenario and we've got great coverage there.
>>> The other counts reflect how often we hit the edge
>>> cases...
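As a quick arithmetic check on the three summaries, each run's five counters add up to exactly 3600, which is consistent with every sample landing in exactly one bucket. The sketch below uses only the numbers reported above (the class and method names are illustrative) and computes the contended share per run:

```java
import java.util.Locale;

public class SampleTotals {
    // Counters from the three 3600-sample runs above, in the order
    // both_running, both_waiting, contended, one_waiting, uncontended.
    static final int[][] RUNS = {
        {0, 0, 2005, 1405, 190},  // fastdebug
        {0, 0, 1867, 1548, 185},  // fastdebug -Xcomp
        {46, 0, 3135, 3, 416},    // fastdebug -Xcomp -XX:+DeoptimizeALot
    };

    static int total(int[] run) {
        int sum = 0;
        for (int c : run) {
            sum += c;
        }
        return sum;
    }

    /** Percentage of samples that hit the original bug's (contended) scenario. */
    static double contendedPercent(int[] run) {
        return 100.0 * run[2] / total(run);
    }

    public static void main(String[] args) {
        for (int[] run : RUNS) {
            // Locale.ROOT keeps '.' as the decimal separator everywhere.
            System.out.printf(Locale.ROOT, "total=%d contended=%.1f%%%n",
                              total(run), contendedPercent(run));
        }
    }
}
```

Running it prints a total of 3600 for every run, with contended shares of 55.7%, 51.9%, and 87.1%, so the contended case is the majority bucket in all three configurations.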
>>>
>>> Thanks, in advance, for any comments, questions or suggestions.
>>>
>>> Dan
>>>
>>>
>>> On 6/9/14 10:04 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> Nightly testing has revealed a bug in the test that reproduces
>>>> nicely when these options are used: -Xcomp -XX:+DeoptimizeALot
>>>>
>>>> Here's the webrev URL for the minor tweak to catch yet another
>>>> variation of the waiting pattern:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8046287-webrev/0-jdk9-hs-rt/
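For context, a HotSpot thread dump prints one header line per thread whose trailing text describes its state, for example "waiting for monitor entry" for a thread blocked entering a monitor, or "waiting on condition" for a thread blocked on a condition (such as while a VM operation runs). A minimal sketch of accepting more than one waiting variation, assuming the test keys off those strings; the class, method, and pattern names are hypothetical, and the real regexes are in the webrev:

```java
import java.util.regex.Pattern;

public class HeaderPatterns {
    // Hypothetical patterns for illustration only.
    private static final Pattern CONTENDER =
        Pattern.compile("^\"ContendingThread-\\d+\"");
    // State text for a thread blocked entering a monitor.
    private static final Pattern WAITING_FOR_MONITOR =
        Pattern.compile("waiting for monitor entry");
    // State text for a thread blocked on a condition, e.g. during a VM op.
    private static final Pattern WAITING_ON_CONDITION =
        Pattern.compile("waiting on condition");

    /** True if the header belongs to one of the test's contending threads. */
    static boolean isContenderHeader(String header) {
        return CONTENDER.matcher(header).find();
    }

    /** True if the header shows the thread waiting, under either variation. */
    static boolean isWaitingHeader(String header) {
        return WAITING_FOR_MONITOR.matcher(header).find()
            || WAITING_ON_CONDITION.matcher(header).find();
    }
}
```

Keeping each variation as its own pattern makes it cheap to add the next one if nightly testing turns up yet another header form.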
>>>>
>>>> Thanks to Vladimir K for reporting the test failure and for
>>>> providing the right details in the bug report.
>>>>
>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>
>>>> Dan
>>>>
>>>>
>>>> On 5/29/14 8:49 AM, Daniel D. Daugherty wrote:
>>>>> One more round of review after refactoring the test based on comments
>>>>> from David H and Serguei.
>>>>>
>>>>> Here's the webrev for this round:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8036823-webrev/2-jdk9-hs-rt/
>>>>>
>>>>> Had to change the default sample size from 30 -> 15 in order to
>>>>> get the test to pass reliably on Solaris SPARC JPRT machines.
>>>>>
>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On 5/22/14 10:18 PM, Daniel D. Daugherty wrote:
>>>>>> Zhengyu is tied up with some other work so I've taken on this fix.
>>>>>>
>>>>>> Here's the webrev URL for the next round:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8036823-webrev/1-jdk9-hs-rt/
>>>>>>
>>>>>> The fix has been tested with vm.quick on all Aurora Adhoc platforms.
>>>>>> The new test has been run with the fix via JPRT and passes on all
>>>>>> JPRT platforms. The new test has also been run without the fix and
>>>>>> fails on most platforms. Since the default sample size is just 30,
>>>>>> it is possible to get 30 runs in a row without failing.
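To put a rough number on the chance of 30 passing runs without the fix, assume (this is not measured anywhere in the thread) that each sample independently exposes the bogus "locked" state on an unfixed VM with some probability p. With a purely hypothetical p = 0.1:

```latex
\Pr[\text{no failure in } n \text{ samples}] = (1 - p)^{n}

% hypothetical p = 0.1
(0.9)^{30} \approx 0.042, \qquad (0.9)^{15} \approx 0.206

% halving n takes the square root of the miss probability
(1 - p)^{15} = \sqrt{(1 - p)^{30}}
```

The same reasoning covers the later reduction of the default sample size from 30 to 15: fewer samples per run makes an all-pass run on an unfixed VM more likely, which is the trade-off accepted to keep the test reliable on slower machines.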
>>>>>>
>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On 5/19/14 7:58 AM, Zhengyu Gu wrote:
>>>>>>> This is a simple fix for an incorrect lock state.
>>>>>>>
>>>>>>> The timing of setting a thread's pending monitor can result in a
>>>>>>> stack trace dump reporting an incorrect lock state. The solution is
>>>>>>> to check the monitor's owner: if the owner is not the current
>>>>>>> thread, then the monitor is, or is in the process of being, set as
>>>>>>> the current thread's pending monitor.
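To make the described check concrete, here is a simplified Java model of the decision. The actual change is to HotSpot's C++ code, not Java; `Monitor`, `TargetThread`, `pendingMonitor`, and `lockLine` are hypothetical stand-ins, and "current thread" in the description is modeled as the thread whose stack is being dumped:

```java
public class LockStateModel {
    /** Hypothetical stand-in for a VM monitor; only its owner matters here. */
    static final class Monitor {
        volatile Thread owner;
    }

    /** Hypothetical stand-in for the VM's per-thread state. */
    static final class TargetThread {
        final Thread thread;
        // Set when the thread starts blocking on a monitor; can lag behind.
        volatile Monitor pendingMonitor;

        TargetThread(Thread thread) {
            this.thread = thread;
        }
    }

    /** Chooses the lock line to print for monitor m seen in a frame of target t. */
    static String lockLine(TargetThread t, Monitor m) {
        if (t.pendingMonitor == m) {
            return "waiting to lock";
        }
        // Race window: t may already be blocking on m even though
        // pendingMonitor has not been set yet. If t does not own m,
        // t cannot have locked it, so report it as waiting.
        if (m.owner != t.thread) {
            return "waiting to lock";
        }
        return "locked";
    }
}
```

In this model, dropping the owner check makes the race-window case fall through to "locked", which is exactly the symptom in the subject line.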
>>>>>>>
>>>>>>>
>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8036823
>>>>>>> Webrev: http://cr.openjdk.java.net/~zgu/8036823/webrev.00/
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> -Zhengyu
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>