RFR for bug JDK-8004807: java/util/Timer/Args.java failing intermittently in HS testing

Wed Jun 4 20:23:43 UTC 2014

Hi Martin, Eric,

Of several hundred failures of this test, most occurred in JRE runs with
-Xcomp set.  A few failures occurred with -Xmixed, and none with -Xint.

The printed "elapsed" times (not normalized to hardware or OS) range from
24 to 132 ms, with most falling into buckets in the 30s, 40s,
50s and 70s.

I don't spot anything in the Timer.mainLoop code that might break when
highly optimized, but that's one possibility.

Roger

On 6/4/2014 3:25 PM, Martin Buchholz wrote:
> Tests for Timer are inherently timing (!) dependent.
> It's reasonable for tests to assume that:
> - reasonable events like creating a thread and executing a simple task
> should complete in less than, say, 2500ms.
> - system clock will not change by a significant amount (> 1 sec) during the
> test.  Yes, that means Timer tests are likely to fail during daylight
> saving time switchover - we can live with that. (we could even try to fix
> that, by detecting deviations between clock time and elapsed time, but
> probably not worth it)
>
> Can you detect any real-world unreliability in my latest version of the
> test, not counting daylight saving time switch?
>
> I continue to resist your efforts to "fix" the test by removing chances for
> the SUT code to go wrong.
>
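Martin's aside about detecting deviations between clock time and elapsed
time can be sketched roughly as follows. This is only an illustration, not
part of the test under review: the class name ClockDriftCheck, the method
clockJumped and the 1000 ms tolerance (taken from the "> 1 sec" figure in
the message) are all assumptions. It compares the interval measured with
System.currentTimeMillis() against the same interval measured with
System.nanoTime().

```java
import java.util.concurrent.TimeUnit;

public class ClockDriftCheck {
    /**
     * Hypothetical helper: returns true when the wall-clock interval and the
     * monotonic interval disagree by more than toleranceMs.
     */
    public static boolean clockJumped(long wallStartMs, long wallEndMs,
                                      long monoStartNs, long monoEndNs,
                                      long toleranceMs) {
        long wallElapsedMs = wallEndMs - wallStartMs;
        long monoElapsedMs = TimeUnit.NANOSECONDS.toMillis(monoEndNs - monoStartNs);
        return Math.abs(wallElapsedMs - monoElapsedMs) > toleranceMs;
    }

    public static void main(String[] args) throws InterruptedException {
        long wallStart = System.currentTimeMillis();
        long monoStart = System.nanoTime();

        Thread.sleep(100); // stand-in for the body of a timing-dependent test

        boolean jumped = clockJumped(wallStart, System.currentTimeMillis(),
                                     monoStart, System.nanoTime(),
                                     1000L); // assumed tolerance: 1 second
        System.out.println(jumped
                ? "system clock changed during the run; result is unreliable"
                : "system clock was stable during the run");
    }
}
```

A test could use such a check to discard a run rather than report a
failure, at the cost of extra code in the test itself.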
>
> On Tue, Jun 3, 2014 at 11:28 PM, Eric Wang <yiming.wang at oracle.com> wrote:
>
>>   Hi Martin,
>>
>> Thanks for the explanation. Now I understand why you set DELAY_MS to
>> 100 seconds: it prevents failures on a slow host. However, I
>> still have some concerns.
>> Because the test schedules tasks at a time in the past, all
>> 13 tasks should be executed immediately and finish within a short time.
>> If the elapsed-time limit is set to 50s (DELAY_MS/2), the
>> timer has plenty of time to finish the tasks, so I wonder whether that
>> test point is lost.
>>
>> Back to the original test: I think it is a test stabilization
>> issue, because the original test assumes that the timer is cancelled
>> within 1 second, before the 14th task is called. This assumption may not
>> hold, for 2 reasons:
>> 1. The test may be executed in jtreg concurrent mode on a slow host.
>> 2. The system clock of a virtual machine may not be accurate (it may run
>> faster than the physical clock).
>>
>> To support this point, I changed the test as attached to print the
>> execution times and check whether the timer behaves as the API
>> documentation describes. The result is as expected.
>>
>> The non-repeating task executed immediately: [1401855509336]
>> The repeated task executed immediately and repeated every 1 second:
>> [1401855509337, 1401855510337, 1401855511338]
>> The fixed-rate task executed immediately and caught up on the delay:
>> [1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338,
>> 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338,
>> 1401855509338, 1401855509836, 1401855510836]
>>
>>
>> Thanks,
>> Eric
>> On 2014/6/4 9:16, Martin Buchholz wrote:
>>
>>
>>
>>
>> On Tue, Jun 3, 2014 at 6:12 PM, Eric Wang <yiming.wang at oracle.com> wrote:
>>
>>> Hi Martin,
>>>
>>> A sleep(1000) is not enough to reproduce the failure, because it is much
>>> shorter than the period DELAY_MS (10*1000) of the repeated task created by
>>> "scheduleAtFixedRate(t, counter(y3), past, DELAY_MS)".
>>>
>>> Try sleep(DELAY_MS); then the failure can be reproduced.
>>>
>>   Well sure, then the task is rescheduled, so I expect it to fail in this
>> case.
>>
>>   But in my version I had set DELAY_MS to 100 seconds.  The point of
>> extending the DELAY_MS is to prevent flaky failure on a slow machine.
>>
>>   Again, how do we know that this test hasn't found a Timer bug?
>>
>>   I still can't reproduce it.
>>
>>
>>
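
The catch-up behaviour visible in Eric's fixed-rate output above (11 runs
with the same timestamp, then the next one half a period later) can be
reproduced with a small standalone sketch. This is not the Args.java test
itself: the class name FixedRateCatchUp, the method runCatchUp and the
chosen period are assumptions for illustration. The first execution time is
placed 10.5 periods in the past, so java.util.Timer should run the task
once for each missed slot before settling into the regular schedule.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class FixedRateCatchUp {
    /**
     * Hypothetical helper: schedules a fixed-rate task whose first execution
     * time lies (missedPeriods + 0.5) periods in the past, waits for the
     * catch-up runs, cancels the timer and returns the recorded run times.
     */
    public static List<Long> runCatchUp(long periodMs, int missedPeriods) {
        final List<Long> runTimes =
                Collections.synchronizedList(new ArrayList<Long>());
        final int expectedRuns = missedPeriods + 1;
        final CountDownLatch caughtUp = new CountDownLatch(expectedRuns);

        Date past = new Date(System.currentTimeMillis()
                - (missedPeriods * periodMs + periodMs / 2));

        Timer timer = new Timer(true); // daemon thread, so the JVM can exit
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                runTimes.add(System.currentTimeMillis());
                caughtUp.countDown();
            }
        }, past, periodMs);

        try {
            // The next regular run is half a period away, so waiting at most
            // a quarter of a period leaves room to cancel before it fires.
            caughtUp.await(periodMs / 4, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            timer.cancel();
        }

        synchronized (runTimes) {
            return new ArrayList<Long>(runTimes);
        }
    }

    public static void main(String[] args) {
        // 10 missed periods plus the slot half a period ago: 11 runs expected.
        List<Long> times = runCatchUp(60_000L, 10);
        System.out.println(times.size() + " catch-up runs: " + times);
    }
}
```

The long period is deliberate: as in the discussion above, it keeps the
next regular execution far enough away that a slow host is unlikely to let
it slip in before the timer is cancelled.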