RFR for bug JDK-8004807: java/util/Timer/Args.java failing intermittently in HS testing

Fri Jun 6 04:46:44 UTC 2014

Like David, I find the cause of the failure mystifying.
How can things fail when we stay below the timeout value of 500ms?
There's a bug either in Timer or in my own understanding of what should be
happening.

Anyway, raising the timeout value (as I have done in my minor rewrite)
seems prudent.  Fortunately, we can write this test in a way that doesn't
require actually waiting for the timeout to elapse.
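
A minimal sketch of that idea, not the actual rewrite: the task is scheduled
at a time in the past and signals a CountDownLatch, so the test returns as
soon as the task has run, and the generous bound is only consumed in full
when something is broken. The class name, the method name and the 30 second
value are made up for illustration.

```java
import java.util.Date;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class PastScheduleSketch {
    // Hypothetical generous bound; a passing run never waits this long.
    static final long DELAY_MS = 30 * 1000L;

    /** Returns true if a task scheduled in the past ran within the bound. */
    static boolean runsPromptly() throws InterruptedException {
        Timer timer = new Timer(true);  // daemon timer thread
        try {
            final CountDownLatch ran = new CountDownLatch(1);
            // A first execution time in the past means "run immediately".
            Date past = new Date(System.currentTimeMillis() - DELAY_MS);
            timer.schedule(new TimerTask() {
                @Override public void run() { ran.countDown(); }
            }, past);
            // Returns as soon as the task has run; times out only on failure.
            return ran.await(DELAY_MS, TimeUnit.MILLISECONDS);
        } finally {
            timer.cancel();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        if (!runsPromptly())
            throw new AssertionError("task did not run within " + DELAY_MS + "ms");
    }
}
```

On a healthy system this completes in milliseconds, so raising the bound
costs nothing in the passing case.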

On Wed, Jun 4, 2014 at 1:23 PM, roger riggs <roger.riggs at oracle.com> wrote:

> Hi Martin, Eric,
>
> Of several hundred failures of this test, most were done in a JRE run with
> -Xcomp set.  A few failures occurred with -Xmixed, none with -Xint.
>
> The printed "elapsed" times (not normalized to hardware or OS) range from
> 24 to 132 (ms) with most falling into several buckets in the 30's, 40's,
> 50's and 70's.
>
> I don't spot anything in the Timer.mainLoop code that might break when
> highly optimized, but that's one possibility.
>
> Roger
>
>
>
> On 6/4/2014 3:25 PM, Martin Buchholz wrote:
>
>> Tests for Timer are inherently timing (!) dependent.
>> It's reasonable for tests to assume that:
>> - reasonable events like creating a thread and executing a simple task
>> should complete in less than, say, 2500ms.
>> - the system clock will not change by a significant amount (> 1 sec)
>> during the test.  Yes, that means Timer tests are likely to fail during
>> a daylight saving time switchover - we can live with that. (We could even
>> try to fix that by detecting deviations between clock time and elapsed
>> time, but it's probably not worth it.)
>>
>> Can you detect any real-world unreliability in my latest version of the
>> test, not counting daylight saving time switch?
>>
>> I continue to resist your efforts to "fix" the test by removing chances
>> for the SUT code to go wrong.
>>
>>
>> On Tue, Jun 3, 2014 at 11:28 PM, Eric Wang <yiming.wang at oracle.com>
>> wrote:
>>
>>    Hi Martin,
>>>
>>> Thanks for the explanation. Now I understand why you set DELAY_MS to
>>> 100 seconds; it is true that it prevents failures on a slow host.
>>> However, I still have some concerns.
>>> The test schedules tasks at a time in the past, so all 13 tasks should
>>> be executed immediately and finish within a short time. If the elapsed
>>> time limit is set to 50s (DELAY_MS/2), the timer has plenty of time to
>>> finish the tasks, so I wonder whether that test point is lost.
>>>
>>> Back to the original test: I think it is a test stabilization issue,
>>> because the original test assumes that the timer is cancelled in less
>>> than 1 second, before the 14th task is called. This assumption may not
>>> hold, for 2 reasons:
>>> 1. The test may be executed in jtreg concurrent mode on a slow host.
>>> 2. The system clock of a virtual machine may not be accurate (it may run
>>> faster than the physical clock).
>>>
>>> To support this point, I changed the test as attached to print the
>>> execution times, to see whether the timer behaves as the API
>>> documentation describes. The result is as expected.
>>>
>>> The non-repeating task executed immediately: [1401855509336]
>>> The repeated task executed immediately and repeated every 1 second:
>>> [1401855509337, 1401855510337, 1401855511338]
>>> The fixed-rate task executed immediately and caught up on the delay:
>>> [1401855509338, 1401855509338, 1401855509338, 1401855509338,
>>> 1401855509338, 1401855509338, 1401855509338, 1401855509338,
>>> 1401855509338, 1401855509338, 1401855509338, 1401855509836,
>>> 1401855510836]
>>>
>>>
>>> Thanks,
>>> Eric
>>> On 2014/6/4 9:16, Martin Buchholz wrote:
>>>
>>>
>>>
>>>
>>> On Tue, Jun 3, 2014 at 6:12 PM, Eric Wang <yiming.wang at oracle.com>
>>> wrote:
>>>
>>>  Hi Martin,
>>>>
>>>> sleep(1000) is not enough to reproduce the failure, because it is much
>>>> shorter than the period DELAY_MS (10*1000) of the repeated task created
>>>> by "scheduleAtFixedRate(t, counter(y3), past, DELAY_MS)".
>>>>
>>>> Try sleep(DELAY_MS); then the failure can be reproduced.
>>>>
>>>   Well sure, then the task is rescheduled, so I expect it to fail in
>>> this case.
>>>
>>>   But in my version I had set DELAY_MS to 100 seconds.  The point of
>>> extending DELAY_MS is to prevent flaky failures on a slow machine.
>>>
>>>   Again, how do we know that this test hasn't found a Timer bug?
>>>
>>>   I still can't reproduce it.
>>>
>>>
>>>
>>>
>
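
For reference, the fixed-rate catch-up visible in Eric's output quoted above
can be reproduced with a sketch like the following. It is not the attached
test: the class name, the method name and the constants are made up for
illustration. A first execution time 10 periods in the past should produce
11 immediate runs (the original one plus 10 missed ones), with the 12th a
full period away, which leaves plenty of room to cancel the timer first.

```java
import java.util.Date;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class FixedRateCatchUpSketch {
    // Hypothetical values: a long period keeps the next run far away.
    static final long PERIOD_MS = 60 * 1000L;
    static final int MISSED = 10;

    /** Counts how often a fixed-rate task dated MISSED periods back runs. */
    static int countCatchUpRuns() throws InterruptedException {
        Timer timer = new Timer(true);  // daemon timer thread
        try {
            final AtomicInteger runs = new AtomicInteger();
            final CountDownLatch caughtUp = new CountDownLatch(MISSED + 1);
            Date past = new Date(System.currentTimeMillis() - MISSED * PERIOD_MS);
            timer.scheduleAtFixedRate(new TimerTask() {
                @Override public void run() {
                    runs.incrementAndGet();
                    caughtUp.countDown();
                }
            }, past, PERIOD_MS);
            // Wait for the catch-up runs, but never as long as a full period.
            caughtUp.await(PERIOD_MS / 2, TimeUnit.MILLISECONDS);
            return runs.get();
        } finally {
            timer.cancel();  // cancel well before the next run is due
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("catch-up runs: " + countCatchUpRuns());
    }
}
```

If the count ever exceeds MISSED + 1 here, the timer was not cancelled
before the next period elapsed, which is the situation discussed above.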