RFR for bug JDK-8004807: java/util/Timer/Args.java failing intermittently in HS testing
roger riggs
roger.riggs at oracle.com
Mon Jun 9 15:03:13 UTC 2014
Hi Eric, Martin,
I'm fine with the re-write. I'm not sure why the re-ordering of y3 will
change the behavior of the test, but it will provide more debugging info.
Roger
On 6/6/2014 9:32 PM, Martin Buchholz wrote:
> If you don't want to go with my rewrite, you can conservatively just
> check in a 10x increase in all the constant durations and see whether
> the flakiness goes away.
>
>
> On Thu, Jun 5, 2014 at 9:46 PM, Martin Buchholz <martinrb at google.com> wrote:
>
> As with David, the cause of the failure is mystifying.
> How can things fail when we stay below the timeout value of 500ms?
> There's a bug either in Timer or my own understanding of what
> should be happening.
>
> Anyways, raising the timeout value (as I have done in my minor
> rewrite) seems prudent. Fortunately, we can write this test in a
> way that doesn't require actually waiting for the timeout to elapse.
>
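
A minimal sketch of the idea of not waiting for the timeout to elapse (an illustration, not the actual rewrite of Args.java): a one-shot task is scheduled in the past, and the caller blocks on a latch with a generous bound. The class name PastTaskNoWait and the value of TIMEOUT_MS are assumptions made for this example. The wait returns as soon as the task runs, so the bound only costs time when something is already wrong.

```java
import java.util.Date;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not the actual Args.java test.
class PastTaskNoWait {
    // Generous bound (assumed value). It is only waited out if the task never runs.
    static final long TIMEOUT_MS = 100_000;

    // Schedules a one-shot task one second in the past and returns how long
    // (in ms) it took to observe the task running.
    static long runAndMeasure() {
        final CountDownLatch done = new CountDownLatch(1);
        Timer timer = new Timer(true); // daemon thread, so the JVM can still exit
        long startNanos = System.nanoTime();
        try {
            Date past = new Date(System.currentTimeMillis() - 1000);
            timer.schedule(new TimerTask() {
                @Override public void run() { done.countDown(); }
            }, past);
            // Returns immediately once the task has run; no fixed sleep needed.
            if (!done.await(TIMEOUT_MS, TimeUnit.MILLISECONDS)) {
                throw new AssertionError("task did not run within " + TIMEOUT_MS + " ms");
            }
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            timer.cancel();
        }
    }

    public static void main(String[] args) {
        System.out.println("elapsed ms: " + runAndMeasure());
    }
}
```

On a healthy host the reported elapsed time should be far below the bound, which is what lets the bound be raised without slowing the test down.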
>
> On Wed, Jun 4, 2014 at 1:23 PM, roger riggs <roger.riggs at oracle.com> wrote:
>
> Hi Martin, Eric,
>
> Of several hundred failures of this test, most occurred in a
> JRE run with -Xcomp set. A few failures occurred with -Xmixed, none with
> -Xint.
>
> The printed "elapsed" times (not normalized to hardware or OS)
> range from
> 24 to 132 (ms) with most falling into several buckets in the
> 30's, 40's, 50's and 70's.
>
> I don't spot anything in the Timer.mainLoop code that might
> break when highly
> optimized but that's one possibility.
>
> Roger
>
>
>
> On 6/4/2014 3:25 PM, Martin Buchholz wrote:
>
> Tests for Timer are inherently timing (!) dependent.
> It's reasonable for tests to assume that:
> - reasonable events like creating a thread and executing a
> simple task
> should complete in less than, say 2500ms.
> - system clock will not change by a significant amount (>
> 1 sec) during the
> test. Yes, that means Timer tests are likely to fail
> during daylight
> saving time switchover - we can live with that. (we could
> even try to fix
> that, by detecting deviations between clock time and
> elapsed time, but
> probably not worth it)
>
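
The "deviations between clock time and elapsed time" check mentioned above could look roughly like the following sketch. It compares System.currentTimeMillis (wall clock, which can be adjusted) with System.nanoTime (elapsed time) around an action. The class name ClockDriftCheck, its method names, and the tolerance are assumptions made for this example; nothing like it is claimed to exist in the test.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper; names and tolerance are illustrative assumptions.
class ClockDriftCheck {
    // Runs the action and returns (wall-clock delta) - (elapsed-time delta) in ms.
    // A value near zero means the system clock was not adjusted meanwhile.
    static long driftMillis(Runnable action) {
        long wallStart = System.currentTimeMillis();
        long monoStart = System.nanoTime();
        action.run();
        long monoElapsedMs =
            TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - monoStart);
        long wallElapsedMs = System.currentTimeMillis() - wallStart;
        return wallElapsedMs - monoElapsedMs;
    }

    // True if the drift exceeds the tolerance in either direction.
    static boolean clockJumped(long driftMs, long toleranceMs) {
        return Math.abs(driftMs) > toleranceMs;
    }

    public static void main(String[] args) {
        long drift = driftMillis(new Runnable() {
            @Override public void run() { /* the timing-sensitive test body */ }
        });
        // 1000 ms mirrors the "> 1 sec" figure in the message above.
        System.out.println("clock jumped: " + clockJumped(drift, 1000));
    }
}
```

A test could use such a check to discard a failing run as inconclusive when the clock moved, rather than reporting a Timer failure.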
> Can you detect any real-world unreliability in my latest
> version of the
> test, not counting daylight saving time switch?
>
> I continue to resist your efforts to "fix" the test by
> removing chances for
> the SUT code to go wrong.
>
>
> On Tue, Jun 3, 2014 at 11:28 PM, Eric Wang <yiming.wang at oracle.com> wrote:
>
> Hi Martin,
>
> Thanks for the explanation. Now I understand why you set DELAY_MS to
> 100 seconds: it prevents failures on a slow host. However, I
> still have some concerns.
> Because the test schedules tasks at a time in the past, all
> 13 tasks should be executed immediately and finish within a short time.
> If the elapsed-time limit is set to 50s (DELAY_MS/2), the
> timer has plenty of time to finish the tasks, so I wonder whether
> that test point is lost.
>
> Back to the original test: I think this is a test stabilization
> issue, because the original test assumes that the timer is cancelled
> within 1 second, before the 14th task is called. That assumption
> is not guaranteed, for 2 reasons:
> 1. the test may be executed in jtreg concurrent mode on a slow host.
> 2. the system clock of a virtual machine may not be accurate (it may
> run faster than the physical clock).
>
> To support the point, I changed the test as attached to print the
> execution times, to see whether the timer behaves as the API
> documentation describes. The result is as expected.
>
> The unrepeated task executed immediately: [1401855509336]
> The repeated task executed immediately and repeated
> per 1 second:
> [1401855509337, 1401855510337, 1401855511338]
> The fixed-rate task executed immediately and catch up
> the delay:
> [1401855509338, 1401855509338, 1401855509338,
> 1401855509338, 1401855509338,
> 1401855509338, 1401855509338, 1401855509338,
> 1401855509338, 1401855509338,
> 1401855509338, 1401855509836, 1401855510836]
>
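
The catch-up pattern in the fixed-rate timestamps above can be reproduced with a short sketch. This is not the attached test. The class name FixedRateCatchUp and the values of PERIOD_MS and MISSED are assumptions chosen for illustration. With a first execution time MISSED periods in the past, scheduleAtFixedRate should run the task MISSED + 1 times back to back before settling into the regular period.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of fixed-rate catch-up; not the attached test.
class FixedRateCatchUp {
    static final long PERIOD_MS = 10_000; // assumed period
    static final int MISSED = 4;          // assumed number of missed periods

    // Returns the timestamps of the catch-up executions.
    static List<Long> run() {
        final List<Long> stamps =
            Collections.synchronizedList(new ArrayList<Long>());
        final CountDownLatch caughtUp = new CountDownLatch(MISSED + 1);
        Timer timer = new Timer(true); // daemon thread
        try {
            Date past = new Date(System.currentTimeMillis() - MISSED * PERIOD_MS);
            timer.scheduleAtFixedRate(new TimerTask() {
                @Override public void run() {
                    stamps.add(System.currentTimeMillis());
                    caughtUp.countDown();
                }
            }, past, PERIOD_MS);
            // The missed executions fire without delay, so this wait is short.
            if (!caughtUp.await(PERIOD_MS / 2, TimeUnit.MILLISECONDS)) {
                throw new AssertionError("catch-up did not complete in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            timer.cancel(); // stop before the next regular execution is due
        }
        synchronized (stamps) {
            return new ArrayList<Long>(stamps);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The timer is cancelled long before the next regular execution is due, so only the catch-up executions are recorded.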
>
> Thanks,
> Eric
> On 2014/6/4 9:16, Martin Buchholz wrote:
>
>
>
>
> On Tue, Jun 3, 2014 at 6:12 PM, Eric Wang <yiming.wang at oracle.com> wrote:
>
> Hi Martin,
>
> A sleep(1000) is not enough to reproduce the failure, because it is much
> shorter than the period DELAY_MS (10*1000) of the repeating task created by
> "scheduleAtFixedRate(t, counter(y3), past, DELAY_MS)".
>
> Try sleep(DELAY_MS); then the failure can be reproduced.
>
> Well sure, then the task is rescheduled, so I expect
> it to fail in this
> case.
>
> But in my version I had set DELAY_MS to 100 seconds.
> The point of
> extending the DELAY_MS is to prevent flaky failure on
> a slow machine.
>
> Again, how do we know that this test hasn't found a
> Timer bug?
>
> I still can't reproduce it.