RFR for JDK-6772009 Intermittent test failure: java/util/concurrent/locks/ReentrantLock/CancelledLockLoops.java test failed with 'Completed != 2'

Fri Dec 20 18:19:39 UTC 2013

Hi David, i retained only the changes to ITERS, ProbleMList.txt and 
upstream changes by Doug Lea(as pointed by Martin), could you please 
review the new change available here 
http://cr.openjdk.java.net/~srikchan/Regression/6772009-CancelledLockLoop-webrev/ 
.

--
Thanks
kalyan
Ph: (408)-585-8040

On 12/19/13, 8:10 PM, David Holmes wrote:
> Sorry Kalyan but I don't see the need for all the incidental changes 
> if the primary change is to just increase the iterations. I also don't 
> see why you need to do anything for BrokenBarrierException as it is 
> not expected to happen and the test should just fail if it does.
>
> David
>
> On 10/12/2013 6:15 AM, srikalyan wrote:
>> Hi David/Martin a gentle reminder for review.
>>
>> -- 
>> Thanks
>> kalyan
>> Ph: (408)-585-8040
>>
>>
>> On 12/2/13, 11:21 AM, srikalyan wrote:
>>> Hi David, Thanks for the review, the new webrev is hosted at
>>> http://cr.openjdk.java.net/~cl/host_for_kal/6772009-CancelledLockLoop/
>>> . Please see inline text.
>>>
>>> On 11/20/13, 6:35 PM, David Holmes wrote:
>>>> On 21/11/2013 10:28 AM, Martin Buchholz wrote:
>>>>> I again tried and failed to reproduce a failure.  Even if I go whole
>>>>> hog
>>>>> and multiply TIMEOUT by 100 and divide ITERS by 100, the test
>>>>> continues to
>>>>> PASS.  Is it just me?!
>>>>
>>>> I think you are going the wrong way Martin - you want the timeout to
>>>> be smaller than the time it takes to execute ITERS.
>>>>
>>>>> I don't think there's any reason to make result long.  It's not even
>>>>> used
>>>>> except to inhibit hotspot optimizations.
>>>>>
>>>>> +        private volatile long result = 17;//Can get int overflow,so
>>>>> using long
>>>>
>>>> Further the subsequent use of += is incorrect as it is not an atomic
>>>> operation. Even if we don't care about the value, it looks bad.
>>> Made the necessary changes for atomic update.
>>>>
>>>> I'm not sure result must be updated if we get a
>>>> BrokenBarrierException either. Probably harmless, but necessary?
>>> I retained it in the fix for completeness in updating the numbers,
>>> please let me know if you still think otherwise.
>>>>
>>>>> need to fix spelling and spacing below.
>>>>>
>>>>> +                barrier.await();//If a BrokeBarrierException happens
>>>>> here(due to
>>>>
>>>> There are a number of style issues with spacing around comments.
>>> Fixed the spelling error and styling issues.
>>>>
>>>> And I don't think this change is sufficient to claim co-author status
>>>> with Doug either ;-)
>>> Removed the claim :)
>>>>
>>>> The additional tracing may be useful but seems stylistically
>>>> different from the rest of the code.
>>> Retained the tracking to understand if it is again the timing issue
>>> which is the cause in an event of a failure, however i can remove it
>>> if you think it is not necessary (OR) include an alternate solution as
>>> you may want to suggest.
>>>>
>>>> Overall I'm suspicious that the changed timeout actually fixes
>>>> anything - normally we need to add longer timeouts not shorter ones.
>>>> Does this fail on a range of machines or only specific ones? Have we
>>>> verified that the clocks/timers are behaving properly on those 
>>>> systems?
>>> Here the time out is not about waiting for threads to complete
>>> something but to "be interrupted" before being considered done, so we
>>> decreased the timeout. However we now chose to increase the number of
>>> iterations to 5000000 from 1000000(thanks to tristan for the
>>> suggestion) instead of decreasing the timeout as done earlier because
>>> the increasing iterations ensures the threads are busy for long time
>>> curtailing the need to touch the timeout.
>>>
>>>>
>>>> Thanks,
>>>> David
>>>
>>> -- 
>>> Thanks
>>> kalyan
>>> Ph: (408)-585-8040
>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Nov 20, 2013 at 11:52 AM, srikalyan <
>>>>> srikalyan.chandrashekar at oracle.com> wrote:
>>>>>
>>>>>>   Hi Martin , apologies for the delay , was trying to get help for
>>>>>> hosting
>>>>>> my webrev.  .  Please see inline text.
>>>>>>
>>>>>>
>>>>>> On 11/19/13, 10:35 PM, Martin Buchholz wrote:
>>>>>>
>>>>>> Hi Kalyan,
>>>>>>
>>>>>>   None of us can review your changes yet because you haven't given
>>>>>> us a
>>>>>> URL of your webrev.
>>>>>>
>>>>>> It is located here
>>>>>> http://cr.openjdk.java.net/~cl/host_for_srikalyan_6772009_CancelledLockLoops/ 
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>   I've tried to make the jsr166 copy of CancelledLockLoops fail by
>>>>>> adjusting ITERS and TIMEOUT radically up and down, but the test
>>>>>> just keeps
>>>>>> on passing for me.  Hints appreciated.
>>>>>>
>>>>>> Bump up the timeout to 500ms and you will see a failure (i can 
>>>>>> see it
>>>>>> consistently on my machine Linux 64bit,8GBRAM,dual cores, with 
>>>>>> JDK 1.8
>>>>>> latest any promoted build).
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 19, 2013 at 6:39 PM, srikalyan chandrashekar <
>>>>>> srikalyan.chandrashekar at oracle.com> wrote:
>>>>>>
>>>>>>>     Suggested Fix:
>>>>>>>> a) Decrease the timeout from 100 to 50ms which will ensure that
>>>>>>>> the test
>>>>>>>> will pass even on faster machines
>>>>>>>>
>>>>>>>
>>>>>>   This doesn't look like a permanent fix - it just makes the
>>>>>> failing case
>>>>>> rarer.
>>>>>>
>>>>>> Thats true , the other way is to make the main thread wait on 
>>>>>> TIMEOUT
>>>>>> after firing the interrupts instead of other way round, but that
>>>>>> would be
>>>>>> over-optimization which probably is not desirable as well.  The 50
>>>>>> ms was
>>>>>> arrived at empirically after running several 100 times on multiple
>>>>>> configurations and did not cause failures.
>>>>>>
>>>>>> -- 
>>>>>> Thanks
>>>>>> kalyan
>>>>>> Ph: (408)-585-8040
>>>>>>
>>>>>>
>>>>>>