RFR for bug JDK-8004807: java/util/Timer/Args.java failing intermittently in HS testing

Fri Jun 6 04:32:34 UTC 2014

On 5/06/2014 5:25 AM, Martin Buchholz wrote:
> Tests for Timer are inherently timing (!) dependent.
> It's reasonable for tests to assume that:
> - reasonable events like creating a thread and executing a simple task
> should complete in less than, say 2500ms.

Not necessarily, given the wrong combination of -Xcomp, fastdebug builds and
low-power/slow devices. :(

> - system clock will not change by a significant amount (> 1 sec) during the
> test.  Yes, that means Timer tests are likely to fail during daylight
> saving time switchover - we can live with that. (we could even try to fix
> that, by detecting deviations between clock time and elapsed time, but
> probably not worth it)

Virtual environments have notoriously bad time keeping.
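Martin's idea of detecting deviations between clock time and elapsed time can be sketched as follows. `ClockDriftCheck` is a hypothetical helper, not part of Args.java or the JDK. It compares how far `System.currentTimeMillis()` (the wall clock that `java.util.Timer` schedules against) moved with how far `System.nanoTime()` moved over the same interval. `nanoTime()` is meant only for measuring elapsed time and is not tied to the wall clock, so a large disagreement suggests the system clock was stepped during the test.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not in Args.java or the JDK): compares wall-clock movement
// with elapsed-time movement over the same interval.
final class ClockDriftCheck {
    private final long wallStartMillis = System.currentTimeMillis();
    private final long monoStartNanos = System.nanoTime();

    /** Wall-clock elapsed time minus nanoTime-based elapsed time, in milliseconds. */
    long driftMillis() {
        long wallElapsed = System.currentTimeMillis() - wallStartMillis;
        long monoElapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - monoStartNanos);
        return wallElapsed - monoElapsed;
    }

    /** True if the two clocks disagree by more than toleranceMillis, e.g. after a clock step. */
    boolean clockJumped(long toleranceMillis) {
        return Math.abs(driftMillis()) > toleranceMillis;
    }

    /** Text suitable for appending to a failure message. */
    String describe() {
        return "wall clock drifted " + driftMillis() + "ms relative to elapsed time";
    }
}
```

A failing test could append `describe()` to its error message, or treat `clockJumped(1000)` as a sign that the failure says more about the environment than about Timer.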

> Can you detect any real-world unreliability in my latest version of the
> test, not counting daylight saving time switch?
>
> I continue to resist your efforts to "fix" the test by removing chances for
> the SUT code to go wrong.

I haven't been tracking this one and although I just had a good look at 
the bug report and the patch I'm not getting a good handle on what is 
failing.

Trying to write timing dependent tests that actually show that something 
took too long is very difficult given the range of test conditions that 
have to be accounted for. We've raised lots of timeouts when tests fail 
just to get them to stop failing. Rarely is consideration given as to 
whether the failure was reasonable, i.e. that something really shouldn't
have taken as long as it did. Unfortunately it's nearly impossible to 
answer this as we can't determine where the time actually went.
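One partial answer to "where did the time go" is to record checkpoints as the test runs and put them in the failure message. A minimal sketch, using a hypothetical `PhaseLog` class; the labels and limits are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: records named checkpoints so a timing failure reports
// the elapsed time at each phase, not just "took too long".
final class PhaseLog {
    private final long startNanos = System.nanoTime();
    private final List<String> entries = new ArrayList<>();

    /** Records the elapsed time since construction under the given label (thread-safe). */
    synchronized void mark(String label) {
        long ms = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
        entries.add(label + " @ " + ms + "ms");
    }

    synchronized String report() {
        return String.join(", ", entries);
    }

    /** Throws an AssertionError carrying the whole checkpoint trail if the limit is reached. */
    void checkElapsedBelow(long limitMillis) {
        long ms = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
        if (ms >= limitMillis) {
            throw new AssertionError("took " + ms + "ms (limit " + limitMillis + "ms): " + report());
        }
    }
}
```

Calling `mark(...)` from both the test thread and the timer tasks would show whether the delay was in thread startup, task execution, or the test thread itself.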

David
-----

>
> On Tue, Jun 3, 2014 at 11:28 PM, Eric Wang <yiming.wang at oracle.com> wrote:
>
>> Hi Martin,
>>
>> Thanks for the explanation; now I understand why you set DELAY_MS to
>> 100 seconds. It is true that it prevents failures on a slow host; however, I
>> still have some concerns.
>> Because the test schedules tasks at a time in the past, all
>> 13 tasks should be executed immediately and finish within a short time.
>> If the elapsed-time limit is set to 50 s (DELAY_MS/2), the timer has
>> plenty of time to finish the tasks, so I wonder whether the test then
>> loses the point it was meant to check.
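The check Eric describes (schedule tasks at a time in the past, then require every immediate execution to happen within a limit) can be sketched with `CountDownLatch`es. This is not the actual Args.java code; `PastScheduleCheck`, its `counter` helper and the offset/period/limit parameters are assumptions. It uses the three `java.util.Timer` forms discussed in this thread. Per the Timer documentation, a fixed-rate task whose first time is in the past runs its missed executions immediately to catch up, so an offset of 10.5 s with a 1 s period gives 11 fixed-rate runs, or 13 immediate executions in total together with the one-shot and fixed-delay tasks.

```java
import java.util.Date;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the real Args.java: schedule three kinds of task at a
// time in the past and wait, with one overall limit, for every overdue execution.
final class PastScheduleCheck {
    /** A TimerTask that counts down the given latch each time it runs. */
    static TimerTask counter(CountDownLatch latch) {
        return new TimerTask() {
            @Override public void run() { latch.countDown(); }
        };
    }

    /** Returns true if all overdue executions complete within limitMillis. */
    static boolean allRanWithin(long offsetMillis, long periodMillis, long limitMillis) {
        // Fixed-rate runs at past + k*period for k = 0..offset/period are already due.
        int catchUps = (int) (offsetMillis / periodMillis) + 1;
        CountDownLatch oneShot = new CountDownLatch(1);
        CountDownLatch fixedDelay = new CountDownLatch(1);
        CountDownLatch fixedRate = new CountDownLatch(catchUps);
        Timer timer = new Timer(true); // daemon thread, so a hung timer cannot block JVM exit
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(limitMillis);
        try {
            Date past = new Date(System.currentTimeMillis() - offsetMillis);
            timer.schedule(counter(oneShot), past);
            timer.schedule(counter(fixedDelay), past, periodMillis);
            timer.scheduleAtFixedRate(counter(fixedRate), past, periodMillis);
            for (CountDownLatch latch : new CountDownLatch[] {oneShot, fixedDelay, fixedRate}) {
                // A non-positive timeout makes await() return immediately.
                if (!latch.await(deadline - System.nanoTime(), TimeUnit.NANOSECONDS)) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            timer.cancel();
        }
    }
}
```

The choice of `limitMillis` is exactly the trade-off being debated: a tight limit checks that the executions really are immediate, while a generous one (such as DELAY_MS/2) mainly guards against slow hosts.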
>>
>> Back to the original test: I think it is a test stabilization
>> issue, because the original test assumes that the timer will be cancelled
>> within 1 second, before the 14th task is called. This assumption may not
>> hold, for 2 reasons:
>> 1. the test may be executed in jtreg concurrent mode on a slow host.
>> 2. the system clock of a virtual machine may not be accurate (it may run
>> faster than a physical one).
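The window before the next fixed-rate run can be worked out from the schedule. Executions fall at `past + k*period`; those with `past + k*period <= now` are overdue, so there are `offset/period + 1` of them (integer division), and the first one that is not overdue is due `(offset/period + 1)*period - offset` milliseconds after scheduling. `FixedRateWindow` and the `offset` parameter (how far in the past the first execution time is) are hypothetical names for this sketch.

```java
// Hypothetical helper: arithmetic of a fixed-rate schedule whose first time is in the past.
final class FixedRateWindow {
    /** Runs at past + k*period that are already due at scheduling time: k = 0..offset/period. */
    static long overdueRuns(long offsetMillis, long periodMillis) {
        return offsetMillis / periodMillis + 1;
    }

    /** Milliseconds after scheduling until the first run that is not overdue. */
    static long windowMillis(long offsetMillis, long periodMillis) {
        return overdueRuns(offsetMillis, periodMillis) * periodMillis - offsetMillis;
    }
}
```

For example, an offset of 10.5 s with a 1 s period gives 11 overdue runs and a 500 ms window; the gap between 1401855509336 and 1401855509836 in Eric's output is also 500 ms. Since the window is always greater than zero and at most one period, only a longer period (such as a larger DELAY_MS) allows a larger window.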
>>
>> To support this point, I changed the test (attached) to print the
>> execution times, to see whether the timer behaves as the API
>> documentation describes. The result is as expected:
>>
>> The non-repeating task executed immediately: [1401855509336]
>> The repeating task executed immediately and repeated every 1 second:
>> [1401855509337, 1401855510337, 1401855511338]
>> The fixed-rate task executed immediately and caught up on the delay:
>> [1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338,
>> 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338,
>> 1401855509338, 1401855509836, 1401855510836]
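Eric's modified test is in the attachment, not in this thread; a minimal sketch of the same idea records `System.currentTimeMillis()` inside each run of a fixed-rate task. `TimestampRecorder` and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical sketch: record when each execution of a fixed-rate task happens.
final class TimestampRecorder {
    /** Runs a fixed-rate task first scheduled offsetMillis in the past, for about observeMillis. */
    static List<Long> fixedRateTimes(long offsetMillis, long periodMillis, long observeMillis) {
        List<Long> times = new ArrayList<>();
        Timer timer = new Timer(true); // daemon thread
        try {
            Date past = new Date(System.currentTimeMillis() - offsetMillis);
            timer.scheduleAtFixedRate(new TimerTask() {
                @Override public void run() {
                    synchronized (times) { times.add(System.currentTimeMillis()); }
                }
            }, past, periodMillis);
            Thread.sleep(observeMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            timer.cancel();
        }
        synchronized (times) { return new ArrayList<>(times); }
    }
}
```

Printing the returned list should show a burst of nearly identical timestamps (the catch-up runs) followed by roughly one-period spacing, the same pattern as the fixed-rate line above.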
>>
>>
>> Thanks,
>> Eric
>> On 2014/6/4 9:16, Martin Buchholz wrote:
>>
>> On Tue, Jun 3, 2014 at 6:12 PM, Eric Wang <yiming.wang at oracle.com> wrote:
>>
>>> Hi Martin,
>>>
>>> sleep(1000) is not enough to reproduce the failure, because it is much
>>> shorter than the period DELAY_MS (10*1000) of the repeating task created by
>>> "scheduleAtFixedRate(t, counter(y3), past, DELAY_MS)".
>>>
>>> Try sleep(DELAY_MS); with that, the failure can be reproduced.
>>>
>>
>> Well, sure: then the task is rescheduled, so I expect it to fail in this
>> case.
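Why a longer sleep reproduces the failure can be shown directly: if the thread that should cancel the timer is held up past the next scheduled time after the catch-up burst, the fixed-rate task legitimately runs again. A sketch with hypothetical names and small values so it finishes quickly; it is not code from either version of the test.

```java
import java.util.Date;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: count fixed-rate executions when cancel() is delayed.
final class DelayedCancel {
    /**
     * Schedules a fixed-rate task offsetMillis in the past, waits for the overdue runs,
     * sleeps delayMillis (standing in for a slow test thread), then cancels.
     * Returns how many times the task ran.
     */
    static int runsWithDelayedCancel(long offsetMillis, long periodMillis, long delayMillis) {
        int overdue = (int) (offsetMillis / periodMillis) + 1;
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch caughtUp = new CountDownLatch(overdue);
        Timer timer = new Timer(true); // daemon thread, so it cannot keep the JVM alive
        try {
            Date past = new Date(System.currentTimeMillis() - offsetMillis);
            timer.scheduleAtFixedRate(new TimerTask() {
                @Override public void run() {
                    runs.incrementAndGet();
                    caughtUp.countDown();
                }
            }, past, periodMillis);
            if (!caughtUp.await(10, TimeUnit.SECONDS)) {
                throw new AssertionError("overdue executions did not run");
            }
            Thread.sleep(delayMillis); // the test thread has not cancelled yet
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new AssertionError("interrupted", e);
        } finally {
            timer.cancel(); // does not stop a run already in progress
        }
        return runs.get();
    }
}
```

With a delay well below the time until the next scheduled run, the count normally stays at the number of overdue runs; once the delay passes that point, at least one more run is expected, which is ordinary Timer behaviour rather than a bug.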
>>
>> But in my version I had set DELAY_MS to 100 seconds. The point of
>> extending DELAY_MS is to prevent flaky failures on a slow machine.
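Instead of a fixed 100-second DELAY_MS, a delay could also be scaled by the harness's timeout factor, so runs configured with a larger factor get proportionally more time. A sketch, assuming the factor is available as the `test.timeout.factor` system property (jtreg passes its timeout factor to tests this way; treat that as an assumption for other harnesses). `ScaledDelay` is a hypothetical name.

```java
// Hypothetical sketch: scale a base delay by the "test.timeout.factor" system property.
final class ScaledDelay {
    static long scaled(long baseMillis) {
        String prop = System.getProperty("test.timeout.factor", "1.0");
        double factor;
        try {
            factor = Double.parseDouble(prop);
        } catch (NumberFormatException e) {
            factor = 1.0; // malformed property: fall back to the unscaled delay
        }
        // Never shrink the delay below its base value, only stretch it.
        return (long) Math.ceil(baseMillis * Math.max(1.0, factor));
    }
}
```

Clamping the factor at 1.0 is a design choice: it keeps fast configurations from tightening a limit that was already chosen to be safe.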
>>
>> Again, how do we know that this test hasn't found a Timer bug?
>>
>> I still can't reproduce it.