RFR for JDK-6772009 Intermittent test failure: java/util/concurrent/locks/ReentrantLock/CancelledLockLoops.java test failed with 'Completed != 2'

Mon Dec 2 19:21:51 UTC 2013

Hi David, Thanks for the review, the new webrev is hosted at 
http://cr.openjdk.java.net/~cl/host_for_kal/6772009-CancelledLockLoop/ . 
Please see inline text.

On 11/20/13, 6:35 PM, David Holmes wrote:
> On 21/11/2013 10:28 AM, Martin Buchholz wrote:
>> I again tried and failed to reproduce a failure.  Even if I go whole hog
>> and multiply TIMEOUT by 100 and divide ITERS by 100, the test 
>> continues to
>> PASS.  Is it just me?!
>
> I think you are going the wrong way Martin - you want the timeout to 
> be smaller than the time it takes to execute ITERS.
>
>> I don't think there's any reason to make result long.  It's not even 
>> used
>> except to inhibit hotspot optimizations.
>>
>> +        private volatile long result = 17;//Can get int overflow,so 
>> using long
>
> Further the subsequent use of += is incorrect as it is not an atomic 
> operation. Even if we don't care about the value, it looks bad.
Made the necessary changes for atomic update.
>
> I'm not sure result must be updated if we get a BrokenBarrierException 
> either. Probably harmless, but necessary?
I retained it in the fix for completeness in updating the numbers, 
please let me know if you still think otherwise.
>
>> need to fix spelling and spacing below.
>>
>> +                barrier.await();//If a BrokeBarrierException happens
>> here(due to
>
> There are a number of style issues with spacing around comments.
Fixed the spelling error and styling issues.
>
> And I don't think this change is sufficient to claim co-author status 
> with Doug either ;-)
Removed the claim :)
>
> The additional tracing may be useful but seems stylistically different 
> from the rest of the code.
Retained the tracking to understand if it is again the timing issue 
which is the cause in an event of a failure, however i can remove it if 
you think it is not necessary (OR) include an alternate solution as you 
may want to suggest.
>
> Overall I'm suspicious that the changed timeout actually fixes 
> anything - normally we need to add longer timeouts not shorter ones. 
> Does this fail on a range of machines or only specific ones? Have we 
> verified that the clocks/timers are behaving properly on those systems?
Here the time out is not about waiting for threads to complete something 
but to "be interrupted" before being considered done, so we decreased 
the timeout. However we now chose to increase the number of iterations 
to 5000000 from 1000000(thanks to tristan for the suggestion) instead of 
decreasing the timeout as done earlier because the increasing iterations 
ensures the threads are busy for long time curtailing the need to touch 
the timeout.

>
> Thanks,
> David

--
Thanks
kalyan
Ph: (408)-585-8040

>
>>
>>
>>
>> On Wed, Nov 20, 2013 at 11:52 AM, srikalyan <
>> srikalyan.chandrashekar at oracle.com> wrote:
>>
>>>   Hi Martin , apologies for the delay , was trying to get help for 
>>> hosting
>>> my webrev.  .  Please see inline text.
>>>
>>>
>>> On 11/19/13, 10:35 PM, Martin Buchholz wrote:
>>>
>>> Hi Kalyan,
>>>
>>>   None of us can review your changes yet because you haven't given us a
>>> URL of your webrev.
>>>
>>> It is located here
>>> http://cr.openjdk.java.net/~cl/host_for_srikalyan_6772009_CancelledLockLoops/ 
>>>
>>>
>>>
>>>   I've tried to make the jsr166 copy of CancelledLockLoops fail by
>>> adjusting ITERS and TIMEOUT radically up and down, but the test just 
>>> keeps
>>> on passing for me.  Hints appreciated.
>>>
>>> Bump up the timeout to 500ms and you will see a failure (i can see it
>>> consistently on my machine Linux 64bit,8GBRAM,dual cores, with JDK 1.8
>>> latest any promoted build).
>>>
>>>
>>>
>>> On Tue, Nov 19, 2013 at 6:39 PM, srikalyan chandrashekar <
>>> srikalyan.chandrashekar at oracle.com> wrote:
>>>
>>>>     Suggested Fix:
>>>>> a) Decrease the timeout from 100 to 50ms which will ensure that 
>>>>> the test
>>>>> will pass even on faster machines
>>>>>
>>>>
>>>   This doesn't look like a permanent fix - it just makes the failing 
>>> case
>>> rarer.
>>>
>>> Thats true , the other way is to make the main thread wait on TIMEOUT
>>> after firing the interrupts instead of other way round, but that 
>>> would be
>>> over-optimization which probably is not desirable as well.  The 50 
>>> ms was
>>> arrived at empirically after running several 100 times on multiple
>>> configurations and did not cause failures.
>>>
>>> -- 
>>> Thanks
>>> kalyan
>>> Ph: (408)-585-8040
>>>
>>>
>>>