RFR(XS) 8036823: Stack trace sometimes shows 'locked' instead of 'waiting to lock'

Thu Jun 12 20:57:59 UTC 2014

Hi Dan,

Sorry for the latency.

As I understand it, you added more control to the test to improve its
stability, which is good.
The details are not easy to follow, but I do not see any issues.

Ship it! :)

Thanks,
Serguei

On 6/11/14 3:36 PM, Daniel D. Daugherty wrote:
> Greetings,
>
> Let's try this hopefully one last time:
>
> http://cr.openjdk.java.net/~dcubed/8046287-webrev/1-jdk9-hs-rt/
> https://bugs.openjdk.java.net/browse/JDK-8046287
>
> Changes relative to the ORIGINAL version of the test:
>
> - added a new header waiting pattern to catch the case where
>   the target thread is waiting on a condition (like a VM op)
> - add synchronization to the start-up of the contending threads
>   so that we don't start sampling while the contending threads
>   are initializing
> - add sanity check for observing only two "ContendingThread-*"
>   stack traces
>
> - rename some variables to make their use more clear
> - update/add various comments
> - add counters for the various checks and report a summary
>   of all the sampling runs
> - issue a warning if the specific scenario encountered by
>   the original bug (8036823) is never seen
>
> Testing:
>
> - JPRT test run of the test using product and fastdebug
>   bits on all the usual platforms
>
> - 3600 sample run with fastdebug bits:
>     INFO: Summary for all samples:
>     INFO: both_running_cnt=0
>     INFO: both_waiting_cnt=0
>     INFO: contended_cnt=2005
>     INFO: one_waiting_cnt=1405
>     INFO: uncontended_cnt=190
>
> - 3600 sample run with fastdebug bits w/ -Xcomp:
>     INFO: Summary for all samples:
>     INFO: both_running_cnt=0
>     INFO: both_waiting_cnt=0
>     INFO: contended_cnt=1867
>     INFO: one_waiting_cnt=1548
>     INFO: uncontended_cnt=185
>
> - 3600 sample run with fastdebug bits w/ -Xcomp -XX:+DeoptimizeALot:
>     INFO: Summary for all samples:
>     INFO: both_running_cnt=46
>     INFO: both_waiting_cnt=0
>     INFO: contended_cnt=3135
>     INFO: one_waiting_cnt=3
>     INFO: uncontended_cnt=416
>
> The contended_cnt is where we're hitting the original
> bug's scenario and we've got great coverage there.
> The other counts reflect how often we hit the edge
> cases...
>
> Thanks, in advance, for any comments, questions or suggestions.
>
> Dan
>
>
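The start-up synchronization and the "only two ContendingThread-*" sanity check described above can be illustrated roughly as follows. This is a minimal sketch, not the code in the webrev: the class name StartupSyncSketch, the method sampleOnce() and the use of CountDownLatch are assumptions made for illustration, and the real test samples stack dumps from outside the target VM rather than calling Thread.getAllStackTraces().

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: do not start sampling until every contending
// thread has finished initializing, then sanity-check the thread count.
final class StartupSyncSketch {
    static final int N_THREADS = 2;

    // Returns the number of live "ContendingThread-*" threads observed.
    static int sampleOnce() {
        final Object lock = new Object();
        final CountDownLatch ready = new CountDownLatch(N_THREADS);
        final CountDownLatch done = new CountDownLatch(1);
        Thread[] threads = new Thread[N_THREADS];

        for (int i = 0; i < N_THREADS; i++) {
            threads[i] = new Thread(() -> {
                ready.countDown();              // initialization is complete
                while (done.getCount() > 0) {   // contend until told to stop
                    synchronized (lock) {
                        // hold the monitor briefly
                    }
                }
            }, "ContendingThread-" + (i + 1));
            threads[i].setDaemon(true);
            threads[i].start();
        }

        try {
            ready.await();                      // sampling starts only now
            int seen = 0;
            for (Thread t : Thread.getAllStackTraces().keySet()) {
                if (t.getName().startsWith("ContendingThread-")) {
                    seen++;
                }
            }
            done.countDown();
            for (Thread t : threads) {
                t.join();
            }
            return seen;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A sampler built this way would treat any count other than two as a failed sanity check rather than as evidence about the lock state.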
> On 6/9/14 10:04 PM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> Nightly testing has revealed a bug in the test that reproduces
>> nicely when these options are used: -Xcomp -XX:+DeoptimizeALot
>>
>> Here's the webrev URL for the minor tweak to catch yet another
>> variation of the waiting pattern:
>>
>> http://cr.openjdk.java.net/~dcubed/8046287-webrev/0-jdk9-hs-rt/
>>
>> Thanks to Vladimir K for reporting the test failure and for
>> providing the right details in the bug report.
>>
>> Thanks, in advance, for any comments, questions or suggestions.
>>
>> Dan
>>
>>
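To make the "waiting pattern" discussion concrete, here is a rough sketch of classifying thread dump lines with regular expressions. The class DumpLineClassifier, its enum and its exact patterns are assumptions made for illustration and are not the patterns used in the test; only the general shape of the dump lines (a quoted thread name in the header, followed by "- locked <...>" or "- waiting to lock <...>" lines) is taken from ordinary HotSpot thread dumps.

```java
import java.util.regex.Pattern;

// Hypothetical classifier for lines of a thread dump.
final class DumpLineClassifier {
    enum Kind {
        HEADER_BLOCKED, HEADER_WAITING_ON_CONDITION, HEADER_RUNNABLE,
        LOCKED, WAITING_TO_LOCK, OTHER
    }

    // Header lines start with the quoted thread name; the state text
    // follows the tid/nid fields.
    private static final String HEADER_PREFIX =
        "^\"ContendingThread-\\d+\".*\\s";
    private static final Pattern BLOCKED =
        Pattern.compile(HEADER_PREFIX + "waiting for monitor entry\\b.*");
    private static final Pattern ON_CONDITION =
        Pattern.compile(HEADER_PREFIX + "waiting on condition\\b.*");
    private static final Pattern RUNNABLE =
        Pattern.compile(HEADER_PREFIX + "runnable\\b.*");
    private static final Pattern LOCKED =
        Pattern.compile("^\\s*- locked <.*>.*");
    private static final Pattern WAITING_TO_LOCK =
        Pattern.compile("^\\s*- waiting to lock <.*>.*");

    static Kind classify(String line) {
        if (BLOCKED.matcher(line).matches()) return Kind.HEADER_BLOCKED;
        if (ON_CONDITION.matcher(line).matches()) return Kind.HEADER_WAITING_ON_CONDITION;
        if (RUNNABLE.matcher(line).matches()) return Kind.HEADER_RUNNABLE;
        if (LOCKED.matcher(line).matches()) return Kind.LOCKED;
        if (WAITING_TO_LOCK.matcher(line).matches()) return Kind.WAITING_TO_LOCK;
        return Kind.OTHER;
    }
}
```

Keeping each header state as its own pattern means a newly observed variation can be added as one more pattern without touching the others.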
>> On 5/29/14 8:49 AM, Daniel D. Daugherty wrote:
>>> One more round of review after refactoring the test based on comments
>>> from David H and Serguei.
>>>
>>> Here's the webrev for this round:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8036823-webrev/2-jdk9-hs-rt/
>>>
>>> Had to change the default sample size from 30 -> 15 in order to
>>> get the test to pass reliably on Solaris SPARC JPRT machines.
>>>
>>> Thanks, in advance, for any comments, questions or suggestions.
>>>
>>> Dan
>>>
>>>
>>> On 5/22/14 10:18 PM, Daniel D. Daugherty wrote:
>>>> Zhengyu is tied up with some other work so I've taken on this fix.
>>>>
>>>> Here's the webrev URL for the next round:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8036823-webrev/1-jdk9-hs-rt/
>>>>
>>>> The fix has been tested with vm.quick on all Aurora Adhoc platforms.
>>>> The new test has been run with the fix via JPRT and passes on all
>>>> JPRT platforms. The new test has also been run without the fix and
>>>> fails on most platforms. Since the default sample size is just 30,
>>>> it is possible to get 30 runs in a row without failing.
>>>>
>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>
>>>> Dan
>>>>
>>>>
>>>> On 5/19/14 7:58 AM, Zhengyu Gu wrote:
>>>>> This is a simple fix for incorrect lock state.
>>>>>
>>>>> The timing of setting a thread's pending monitor can result in a
>>>>> stack trace dump reporting an incorrect lock state. The solution
>>>>> is to check the monitor's owner: if the owner is a thread other
>>>>> than the current thread, then the monitor is, or is in the process
>>>>> of being, set as the pending monitor of the current thread.
>>>>>
>>>>>
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8036823
>>>>> Webrev: http://cr.openjdk.java.net/~zgu/8036823/webrev.00/
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -Zhengyu
>>>>>
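The owner check described above can be modeled with a toy example. This is a sketch in Java for illustration only: the actual fix lives in HotSpot's C++ code, and the Monitor class, its owner field and the lockState() method below are hypothetical names, not VM APIs.

```java
// Toy model of the decision made when a stack dump prints a monitor
// that appears in the first frame of a thread.
final class LockStateSketch {
    // Hypothetical stand-in for a VM monitor: just records its owner.
    static final class Monitor {
        volatile Thread owner;
    }

    // 'pending' is the monitor the thread has announced it is blocked on;
    // it may still be null if the dump races with the thread setting it.
    static String lockState(Monitor m, Monitor pending, Thread current) {
        if (m == pending) {
            return "waiting to lock";
        }
        Thread owner = m.owner;
        if (owner != null && owner != current) {
            // Another thread owns the monitor, so the current thread cannot
            // hold it: it is blocked, or about to be, even though the
            // pending monitor has not been recorded yet.
            return "waiting to lock";
        }
        return "locked";
    }
}
```

The second branch is the point of the model: relying on the pending monitor alone leaves a window in which a blocked thread would be reported as holding the lock.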
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>