RFR for JDK-6772009 Intermittent test failure: java/util/concurrent/locks/ReentrantLock/CancelledLockLoops.java test failed with 'Completed != 2'

Martin Buchholz martinrb at google.com
Tue Jan 7 16:55:44 UTC 2014


Sorry, I am not volunteering to be a sponsor at this time.  Chris or David?


On Tue, Jan 7, 2014 at 5:50 AM, Sandeep Konchady <
sandeep.konchady at oracle.com> wrote:

> Hi David/Martin,
>
> If you agree with Kalyan's fix for this issue, could one of you please
> sponsor the push.
>
> Thanks,
> Sandeep
>
> On Dec 23, 2013, at 11:17 AM, srikalyan chandrashekar <
> srikalyan.chandrashekar at oracle.com> wrote:
>
> > Hi David/Martin, could any one of you sponsor this change for me?
> >
> > ---
> > Thanks
> > kalyan
> >
> > On 12/20/2013 10:28 PM, David Holmes wrote:
> >> On 21/12/2013 4:19 AM, srikalyan wrote:
> >>> Hi David, i retained only the changes to ITERS, ProbleMList.txt and
> >>> upstream changes by Doug Lea(as pointed by Martin), could you please
> >>> review the new change available here
> >>>
> http://cr.openjdk.java.net/~srikchan/Regression/6772009-CancelledLockLoop-webrev/
> >>> .
> >>
> >> Ok.
> >>
> >> Thanks,
> >> David
> >>
> >>> --
> >>> Thanks
> >>> kalyan
> >>> Ph: (408)-585-8040
> >>>
> >>>
> >>> On 12/19/13, 8:10 PM, David Holmes wrote:
> >>>> Sorry Kalyan but I don't see the need for all the incidental changes
> >>>> if the primary change is to just increase the iterations. I also don't
> >>>> see why you need to do anything for BrokenBarrierException as it is
> >>>> not expected to happen and the test should just fail if it does.
> >>>>
> >>>> David
> >>>>
> >>>> On 10/12/2013 6:15 AM, srikalyan wrote:
> >>>>> Hi David/Martin a gentle reminder for review.
> >>>>>
> >>>>> --
> >>>>> Thanks
> >>>>> kalyan
> >>>>> Ph: (408)-585-8040
> >>>>>
> >>>>>
> >>>>> On 12/2/13, 11:21 AM, srikalyan wrote:
> >>>>>> Hi David, Thanks for the review, the new webrev is hosted at
> >>>>>>
> http://cr.openjdk.java.net/~cl/host_for_kal/6772009-CancelledLockLoop/
> >>>>>> . Please see inline text.
> >>>>>>
> >>>>>> On 11/20/13, 6:35 PM, David Holmes wrote:
> >>>>>>> On 21/11/2013 10:28 AM, Martin Buchholz wrote:
> >>>>>>>> I again tried and failed to reproduce a failure.  Even if I go
> whole
> >>>>>>>> hog
> >>>>>>>> and multiply TIMEOUT by 100 and divide ITERS by 100, the test
> >>>>>>>> continues to
> >>>>>>>> PASS.  Is it just me?!
> >>>>>>>
> >>>>>>> I think you are going the wrong way Martin - you want the timeout
> to
> >>>>>>> be smaller than the time it takes to execute ITERS.
> >>>>>>>
> >>>>>>>> I don't think there's any reason to make result long.  It's not
> even
> >>>>>>>> used
> >>>>>>>> except to inhibit hotspot optimizations.
> >>>>>>>>
> >>>>>>>> +        private volatile long result = 17;//Can get int
> overflow,so
> >>>>>>>> using long
> >>>>>>>
> >>>>>>> Further the subsequent use of += is incorrect as it is not an
> atomic
> >>>>>>> operation. Even if we don't care about the value, it looks bad.
> >>>>>> Made the necessary changes for atomic update.
> >>>>>>>
> >>>>>>> I'm not sure result must be updated if we get a
> >>>>>>> BrokenBarrierException either. Probably harmless, but necessary?
> >>>>>> I retained it in the fix for completeness in updating the numbers,
> >>>>>> please let me know if you still think otherwise.
> >>>>>>>
> >>>>>>>> need to fix spelling and spacing below.
> >>>>>>>>
> >>>>>>>> +                barrier.await();//If a BrokeBarrierException
> happens
> >>>>>>>> here(due to
> >>>>>>>
> >>>>>>> There are a number of style issues with spacing around comments.
> >>>>>> Fixed the spelling error and styling issues.
> >>>>>>>
> >>>>>>> And I don't think this change is sufficient to claim co-author
> status
> >>>>>>> with Doug either ;-)
> >>>>>> Removed the claim :)
> >>>>>>>
> >>>>>>> The additional tracing may be useful but seems stylistically
> >>>>>>> different from the rest of the code.
> >>>>>> Retained the tracking to understand if it is again the timing issue
> >>>>>> which is the cause in an event of a failure, however i can remove it
> >>>>>> if you think it is not necessary (OR) include an alternate solution
> as
> >>>>>> you may want to suggest.
> >>>>>>>
> >>>>>>> Overall I'm suspicious that the changed timeout actually fixes
> >>>>>>> anything - normally we need to add longer timeouts not shorter
> ones.
> >>>>>>> Does this fail on a range of machines or only specific ones? Have
> we
> >>>>>>> verified that the clocks/timers are behaving properly on those
> >>>>>>> systems?
> >>>>>> Here the time out is not about waiting for threads to complete
> >>>>>> something but to "be interrupted" before being considered done, so
> we
> >>>>>> decreased the timeout. However we now chose to increase the number
> of
> >>>>>> iterations to 5000000 from 1000000(thanks to tristan for the
> >>>>>> suggestion) instead of decreasing the timeout as done earlier
> because
> >>>>>> the increasing iterations ensures the threads are busy for long time
> >>>>>> curtailing the need to touch the timeout.
> >>>>>>
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> David
> >>>>>>
> >>>>>> --
> >>>>>> Thanks
> >>>>>> kalyan
> >>>>>> Ph: (408)-585-8040
> >>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Wed, Nov 20, 2013 at 11:52 AM, srikalyan <
> >>>>>>>> srikalyan.chandrashekar at oracle.com> wrote:
> >>>>>>>>
> >>>>>>>>>  Hi Martin , apologies for the delay , was trying to get help for
> >>>>>>>>> hosting
> >>>>>>>>> my webrev.  .  Please see inline text.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 11/19/13, 10:35 PM, Martin Buchholz wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Kalyan,
> >>>>>>>>>
> >>>>>>>>>  None of us can review your changes yet because you haven't given
> >>>>>>>>> us a
> >>>>>>>>> URL of your webrev.
> >>>>>>>>>
> >>>>>>>>> It is located here
> >>>>>>>>>
> http://cr.openjdk.java.net/~cl/host_for_srikalyan_6772009_CancelledLockLoops/
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>  I've tried to make the jsr166 copy of CancelledLockLoops fail by
> >>>>>>>>> adjusting ITERS and TIMEOUT radically up and down, but the test
> >>>>>>>>> just keeps
> >>>>>>>>> on passing for me.  Hints appreciated.
> >>>>>>>>>
> >>>>>>>>> Bump up the timeout to 500ms and you will see a failure (i can
> >>>>>>>>> see it
> >>>>>>>>> consistently on my machine Linux 64bit,8GBRAM,dual cores, with
> >>>>>>>>> JDK 1.8
> >>>>>>>>> latest any promoted build).
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Tue, Nov 19, 2013 at 6:39 PM, srikalyan chandrashekar <
> >>>>>>>>> srikalyan.chandrashekar at oracle.com> wrote:
> >>>>>>>>>
> >>>>>>>>>>    Suggested Fix:
> >>>>>>>>>>> a) Decrease the timeout from 100 to 50ms which will ensure that
> >>>>>>>>>>> the test
> >>>>>>>>>>> will pass even on faster machines
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>  This doesn't look like a permanent fix - it just makes the
> >>>>>>>>> failing case
> >>>>>>>>> rarer.
> >>>>>>>>>
> >>>>>>>>> Thats true , the other way is to make the main thread wait on
> >>>>>>>>> TIMEOUT
> >>>>>>>>> after firing the interrupts instead of other way round, but that
> >>>>>>>>> would be
> >>>>>>>>> over-optimization which probably is not desirable as well.  The
> 50
> >>>>>>>>> ms was
> >>>>>>>>> arrived at empirically after running several 100 times on
> multiple
> >>>>>>>>> configurations and did not cause failures.
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Thanks
> >>>>>>>>> kalyan
> >>>>>>>>> Ph: (408)-585-8040
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >
>
>



More information about the core-libs-dev mailing list