RFR for JDK-6772009 Intermittent test failure: java/util/concurrent/locks/ReentrantLock/CancelledLockLoops.java test failed with 'Completed != 2'

Tue Jan 7 13:50:07 UTC 2014

Hi David/Martin,

If you agree with Kalyan's fix for this issue, could one of you please sponsor the push.

Thanks,
Sandeep

On Dec 23, 2013, at 11:17 AM, srikalyan chandrashekar <srikalyan.chandrashekar at oracle.com> wrote:

> Hi David/Martin, could any one of you sponsor this change for me?
> 
> ---
> Thanks
> kalyan
> 
> On 12/20/2013 10:28 PM, David Holmes wrote:
>> On 21/12/2013 4:19 AM, srikalyan wrote:
>>> Hi David, i retained only the changes to ITERS, ProbleMList.txt and
>>> upstream changes by Doug Lea(as pointed by Martin), could you please
>>> review the new change available here
>>> http://cr.openjdk.java.net/~srikchan/Regression/6772009-CancelledLockLoop-webrev/ 
>>> .
>> 
>> Ok.
>> 
>> Thanks,
>> David
>> 
>>> -- 
>>> Thanks
>>> kalyan
>>> Ph: (408)-585-8040
>>> 
>>> 
>>> On 12/19/13, 8:10 PM, David Holmes wrote:
>>>> Sorry Kalyan but I don't see the need for all the incidental changes
>>>> if the primary change is to just increase the iterations. I also don't
>>>> see why you need to do anything for BrokenBarrierException as it is
>>>> not expected to happen and the test should just fail if it does.
>>>> 
>>>> David
>>>> 
>>>> On 10/12/2013 6:15 AM, srikalyan wrote:
>>>>> Hi David/Martin a gentle reminder for review.
>>>>> 
>>>>> -- 
>>>>> Thanks
>>>>> kalyan
>>>>> Ph: (408)-585-8040
>>>>> 
>>>>> 
>>>>> On 12/2/13, 11:21 AM, srikalyan wrote:
>>>>>> Hi David, Thanks for the review, the new webrev is hosted at
>>>>>> http://cr.openjdk.java.net/~cl/host_for_kal/6772009-CancelledLockLoop/ 
>>>>>> . Please see inline text.
>>>>>> 
>>>>>> On 11/20/13, 6:35 PM, David Holmes wrote:
>>>>>>> On 21/11/2013 10:28 AM, Martin Buchholz wrote:
>>>>>>>> I again tried and failed to reproduce a failure.  Even if I go whole
>>>>>>>> hog
>>>>>>>> and multiply TIMEOUT by 100 and divide ITERS by 100, the test
>>>>>>>> continues to
>>>>>>>> PASS.  Is it just me?!
>>>>>>> 
>>>>>>> I think you are going the wrong way Martin - you want the timeout to
>>>>>>> be smaller than the time it takes to execute ITERS.
>>>>>>> 
>>>>>>>> I don't think there's any reason to make result long.  It's not even
>>>>>>>> used
>>>>>>>> except to inhibit hotspot optimizations.
>>>>>>>> 
>>>>>>>> +        private volatile long result = 17;//Can get int overflow,so
>>>>>>>> using long
>>>>>>> 
>>>>>>> Further the subsequent use of += is incorrect as it is not an atomic
>>>>>>> operation. Even if we don't care about the value, it looks bad.
>>>>>> Made the necessary changes for atomic update.
>>>>>>> 
>>>>>>> I'm not sure result must be updated if we get a
>>>>>>> BrokenBarrierException either. Probably harmless, but necessary?
>>>>>> I retained it in the fix for completeness in updating the numbers,
>>>>>> please let me know if you still think otherwise.
>>>>>>> 
>>>>>>>> need to fix spelling and spacing below.
>>>>>>>> 
>>>>>>>> +                barrier.await();//If a BrokeBarrierException happens
>>>>>>>> here(due to
>>>>>>> 
>>>>>>> There are a number of style issues with spacing around comments.
>>>>>> Fixed the spelling error and styling issues.
>>>>>>> 
>>>>>>> And I don't think this change is sufficient to claim co-author status
>>>>>>> with Doug either ;-)
>>>>>> Removed the claim :)
>>>>>>> 
>>>>>>> The additional tracing may be useful but seems stylistically
>>>>>>> different from the rest of the code.
>>>>>> Retained the tracking to understand if it is again the timing issue
>>>>>> which is the cause in an event of a failure, however i can remove it
>>>>>> if you think it is not necessary (OR) include an alternate solution as
>>>>>> you may want to suggest.
>>>>>>> 
>>>>>>> Overall I'm suspicious that the changed timeout actually fixes
>>>>>>> anything - normally we need to add longer timeouts not shorter ones.
>>>>>>> Does this fail on a range of machines or only specific ones? Have we
>>>>>>> verified that the clocks/timers are behaving properly on those
>>>>>>> systems?
>>>>>> Here the time out is not about waiting for threads to complete
>>>>>> something but to "be interrupted" before being considered done, so we
>>>>>> decreased the timeout. However we now chose to increase the number of
>>>>>> iterations to 5000000 from 1000000(thanks to tristan for the
>>>>>> suggestion) instead of decreasing the timeout as done earlier because
>>>>>> the increasing iterations ensures the threads are busy for long time
>>>>>> curtailing the need to touch the timeout.
>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> David
>>>>>> 
>>>>>> -- 
>>>>>> Thanks
>>>>>> kalyan
>>>>>> Ph: (408)-585-8040
>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Nov 20, 2013 at 11:52 AM, srikalyan <
>>>>>>>> srikalyan.chandrashekar at oracle.com> wrote:
>>>>>>>> 
>>>>>>>>>  Hi Martin , apologies for the delay , was trying to get help for
>>>>>>>>> hosting
>>>>>>>>> my webrev.  .  Please see inline text.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 11/19/13, 10:35 PM, Martin Buchholz wrote:
>>>>>>>>> 
>>>>>>>>> Hi Kalyan,
>>>>>>>>> 
>>>>>>>>>  None of us can review your changes yet because you haven't given
>>>>>>>>> us a
>>>>>>>>> URL of your webrev.
>>>>>>>>> 
>>>>>>>>> It is located here
>>>>>>>>> http://cr.openjdk.java.net/~cl/host_for_srikalyan_6772009_CancelledLockLoops/ 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>  I've tried to make the jsr166 copy of CancelledLockLoops fail by
>>>>>>>>> adjusting ITERS and TIMEOUT radically up and down, but the test
>>>>>>>>> just keeps
>>>>>>>>> on passing for me.  Hints appreciated.
>>>>>>>>> 
>>>>>>>>> Bump up the timeout to 500ms and you will see a failure (i can
>>>>>>>>> see it
>>>>>>>>> consistently on my machine Linux 64bit,8GBRAM,dual cores, with
>>>>>>>>> JDK 1.8
>>>>>>>>> latest any promoted build).
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Tue, Nov 19, 2013 at 6:39 PM, srikalyan chandrashekar <
>>>>>>>>> srikalyan.chandrashekar at oracle.com> wrote:
>>>>>>>>> 
>>>>>>>>>>    Suggested Fix:
>>>>>>>>>>> a) Decrease the timeout from 100 to 50ms which will ensure that
>>>>>>>>>>> the test
>>>>>>>>>>> will pass even on faster machines
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>  This doesn't look like a permanent fix - it just makes the
>>>>>>>>> failing case
>>>>>>>>> rarer.
>>>>>>>>> 
>>>>>>>>> Thats true , the other way is to make the main thread wait on
>>>>>>>>> TIMEOUT
>>>>>>>>> after firing the interrupts instead of other way round, but that
>>>>>>>>> would be
>>>>>>>>> over-optimization which probably is not desirable as well.  The 50
>>>>>>>>> ms was
>>>>>>>>> arrived at empirically after running several 100 times on multiple
>>>>>>>>> configurations and did not cause failures.
>>>>>>>>> 
>>>>>>>>> -- 
>>>>>>>>> Thanks
>>>>>>>>> kalyan
>>>>>>>>> Ph: (408)-585-8040
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>