RFR for JDK-6772009 Intermittent test failure: java/util/concurrent/locks/ReentrantLock/CancelledLockLoops.java test failed with 'Completed != 2'

Mon Dec 23 19:17:43 UTC 2013

Hi David/Martin, could any one of you sponsor this change for me?

---
Thanks
kalyan

On 12/20/2013 10:28 PM, David Holmes wrote:
> On 21/12/2013 4:19 AM, srikalyan wrote:
>> Hi David, i retained only the changes to ITERS, ProbleMList.txt and
>> upstream changes by Doug Lea(as pointed by Martin), could you please
>> review the new change available here
>> http://cr.openjdk.java.net/~srikchan/Regression/6772009-CancelledLockLoop-webrev/ 
>>
>> .
>
> Ok.
>
> Thanks,
> David
>
>> -- 
>> Thanks
>> kalyan
>> Ph: (408)-585-8040
>>
>>
>> On 12/19/13, 8:10 PM, David Holmes wrote:
>>> Sorry Kalyan but I don't see the need for all the incidental changes
>>> if the primary change is to just increase the iterations. I also don't
>>> see why you need to do anything for BrokenBarrierException as it is
>>> not expected to happen and the test should just fail if it does.
>>>
>>> David
>>>
>>> On 10/12/2013 6:15 AM, srikalyan wrote:
>>>> Hi David/Martin a gentle reminder for review.
>>>>
>>>> -- 
>>>> Thanks
>>>> kalyan
>>>> Ph: (408)-585-8040
>>>>
>>>>
>>>> On 12/2/13, 11:21 AM, srikalyan wrote:
>>>>> Hi David, Thanks for the review, the new webrev is hosted at
>>>>> http://cr.openjdk.java.net/~cl/host_for_kal/6772009-CancelledLockLoop/ 
>>>>>
>>>>> . Please see inline text.
>>>>>
>>>>> On 11/20/13, 6:35 PM, David Holmes wrote:
>>>>>> On 21/11/2013 10:28 AM, Martin Buchholz wrote:
>>>>>>> I again tried and failed to reproduce a failure.  Even if I go 
>>>>>>> whole
>>>>>>> hog
>>>>>>> and multiply TIMEOUT by 100 and divide ITERS by 100, the test
>>>>>>> continues to
>>>>>>> PASS.  Is it just me?!
>>>>>>
>>>>>> I think you are going the wrong way Martin - you want the timeout to
>>>>>> be smaller than the time it takes to execute ITERS.
>>>>>>
>>>>>>> I don't think there's any reason to make result long.  It's not 
>>>>>>> even
>>>>>>> used
>>>>>>> except to inhibit hotspot optimizations.
>>>>>>>
>>>>>>> +        private volatile long result = 17;//Can get int 
>>>>>>> overflow,so
>>>>>>> using long
>>>>>>
>>>>>> Further the subsequent use of += is incorrect as it is not an atomic
>>>>>> operation. Even if we don't care about the value, it looks bad.
>>>>> Made the necessary changes for atomic update.
>>>>>>
>>>>>> I'm not sure result must be updated if we get a
>>>>>> BrokenBarrierException either. Probably harmless, but necessary?
>>>>> I retained it in the fix for completeness in updating the numbers,
>>>>> please let me know if you still think otherwise.
>>>>>>
>>>>>>> need to fix spelling and spacing below.
>>>>>>>
>>>>>>> +                barrier.await();//If a BrokeBarrierException 
>>>>>>> happens
>>>>>>> here(due to
>>>>>>
>>>>>> There are a number of style issues with spacing around comments.
>>>>> Fixed the spelling error and styling issues.
>>>>>>
>>>>>> And I don't think this change is sufficient to claim co-author 
>>>>>> status
>>>>>> with Doug either ;-)
>>>>> Removed the claim :)
>>>>>>
>>>>>> The additional tracing may be useful but seems stylistically
>>>>>> different from the rest of the code.
>>>>> Retained the tracking to understand if it is again the timing issue
>>>>> which is the cause in an event of a failure, however i can remove it
>>>>> if you think it is not necessary (OR) include an alternate 
>>>>> solution as
>>>>> you may want to suggest.
>>>>>>
>>>>>> Overall I'm suspicious that the changed timeout actually fixes
>>>>>> anything - normally we need to add longer timeouts not shorter ones.
>>>>>> Does this fail on a range of machines or only specific ones? Have we
>>>>>> verified that the clocks/timers are behaving properly on those
>>>>>> systems?
>>>>> Here the time out is not about waiting for threads to complete
>>>>> something but to "be interrupted" before being considered done, so we
>>>>> decreased the timeout. However we now chose to increase the number of
>>>>> iterations to 5000000 from 1000000(thanks to tristan for the
>>>>> suggestion) instead of decreasing the timeout as done earlier because
>>>>> the increasing iterations ensures the threads are busy for long time
>>>>> curtailing the need to touch the timeout.
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>
>>>>> -- 
>>>>> Thanks
>>>>> kalyan
>>>>> Ph: (408)-585-8040
>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Nov 20, 2013 at 11:52 AM, srikalyan <
>>>>>>> srikalyan.chandrashekar at oracle.com> wrote:
>>>>>>>
>>>>>>>>   Hi Martin , apologies for the delay , was trying to get help for
>>>>>>>> hosting
>>>>>>>> my webrev.  .  Please see inline text.
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11/19/13, 10:35 PM, Martin Buchholz wrote:
>>>>>>>>
>>>>>>>> Hi Kalyan,
>>>>>>>>
>>>>>>>>   None of us can review your changes yet because you haven't given
>>>>>>>> us a
>>>>>>>> URL of your webrev.
>>>>>>>>
>>>>>>>> It is located here
>>>>>>>> http://cr.openjdk.java.net/~cl/host_for_srikalyan_6772009_CancelledLockLoops/ 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>   I've tried to make the jsr166 copy of CancelledLockLoops fail by
>>>>>>>> adjusting ITERS and TIMEOUT radically up and down, but the test
>>>>>>>> just keeps
>>>>>>>> on passing for me.  Hints appreciated.
>>>>>>>>
>>>>>>>> Bump up the timeout to 500ms and you will see a failure (i can
>>>>>>>> see it
>>>>>>>> consistently on my machine Linux 64bit,8GBRAM,dual cores, with
>>>>>>>> JDK 1.8
>>>>>>>> latest any promoted build).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Nov 19, 2013 at 6:39 PM, srikalyan chandrashekar <
>>>>>>>> srikalyan.chandrashekar at oracle.com> wrote:
>>>>>>>>
>>>>>>>>>     Suggested Fix:
>>>>>>>>>> a) Decrease the timeout from 100 to 50ms which will ensure that
>>>>>>>>>> the test
>>>>>>>>>> will pass even on faster machines
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>   This doesn't look like a permanent fix - it just makes the
>>>>>>>> failing case
>>>>>>>> rarer.
>>>>>>>>
>>>>>>>> Thats true , the other way is to make the main thread wait on
>>>>>>>> TIMEOUT
>>>>>>>> after firing the interrupts instead of other way round, but that
>>>>>>>> would be
>>>>>>>> over-optimization which probably is not desirable as well.  The 50
>>>>>>>> ms was
>>>>>>>> arrived at empirically after running several 100 times on multiple
>>>>>>>> configurations and did not cause failures.
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> Thanks
>>>>>>>> kalyan
>>>>>>>> Ph: (408)-585-8040
>>>>>>>>
>>>>>>>>
>>>>>>>>