RFR for bug JDK-8004807: java/util/Timer/Args.java failing intermittently in HS testing

older
RFR: 8046085: (s) inserting null...

Eric Wang

30 May 2014 30 May '14

7:27 a.m.

Hi, Please help to review the fix for bug https://bugs.openjdk.java.net/browse/JDK-8004807 as below: http://cr.openjdk.java.net/~ewang/JDK-8004807/webrev.00/ The root cause of the failure is if Timer.cancel() is not called in time, the recurring timer task keeps running which cause the the assertion "check(latch.getCount() > 0)" failed. The fix includes: 1. As the purpose of the assertion "check(latch.getCount() > 0)" in timer task is used to check if the unrepeatable task is executed repeatedly and this assertion is meaningless for a repeatable task. it can be replaced by other way, see the webrev. 2. The timer thread should be terminated ASAP once all tasks are finished to avoid time spent on thread switch. 3. The test should be executed in othervm mode to avoid time spent on thread switch. Thanks, Eric

Show replies by date

Martin Buchholz

30 May 30 May

8:26 p.m.

HI Eric, Thanks for working on my old dusty tests. I can't view the bug report - I get Permission Violation. It does look like you're making the test more robust, but at the cost of throwing away much of what the test is testing, which is not so good. This test hardcodes 1 second as the period - it assumes all 13 task invocations can be performed in 500ms, which may be a little over-optimistic. Nevertheless, this failure should be very rare, eh? But bump the period to a much higher number and we should be OK. I think making this an othervm test is overkill. Better to make the test itself more robust. I propose instead this major rewrite: public class Args { static final long DELAY_MS = 100 * 1000L; void schedule(final Timer t, final TimerTask task, final Date d) { t.schedule(task, d); assertThrows (IllegalStateException.class, () -> t.schedule(task, d)); } void schedule(final Timer t, final TimerTask task, final Date d, final long period) { t.schedule(task, d, period); assertThrows (IllegalStateException.class, () -> t.schedule(task, d, period)); } void scheduleAtFixedRate(final Timer t, final TimerTask task, final Date d, final long period) { t.scheduleAtFixedRate(task, d, period); assertThrows (IllegalStateException.class, () -> t.scheduleAtFixedRate(task, d, period)); } TimerTask counter(final CountDownLatch latch) { return new TimerTask() { public void run() { check(latch.getCount() > 0); latch.countDown(); }}; } TimerTask nop() { return new TimerTask() { public void run() { }}; } void test(String[] args) throws Throwable { final Timer t = new Timer(); try { test(t); } finally { // Ensure this test doesn't interfere with subsequent // tests even in case of failure. t.cancel(); } // Attempts to schedule tasks on a cancelled Timer result in ISE. final Date past = new Date(System.currentTimeMillis() - DELAY_MS); final Date future = new Date(System.currentTimeMillis() + DELAY_MS); assertThrows (IllegalStateException.class, () -> t.schedule(nop(), 42), () -> t.schedule(nop(), 42), () -> t.schedule(nop(), past), () -> t.schedule(nop(), 42, 42), () -> t.schedule(nop(), past, 42), () -> t.scheduleAtFixedRate(nop(), 42, 42), () -> t.scheduleAtFixedRate(nop(), past, 42), () -> t.scheduleAtFixedRate(nop(), future, 42)); } void test(Timer t) throws Throwable { final TimerTask x = new TimerTask() { public void run() {}}; assertThrows (IllegalArgumentException.class, () -> t.schedule(x, -42), () -> t.schedule(x, new Date(-42)), () -> t.schedule(x, Long.MAX_VALUE), () -> t.schedule(x, -42, 42), () -> t.schedule(x, new Date(-42), 42), () -> t.schedule(x, Long.MAX_VALUE, 42), () -> t.schedule(x, 42, 0), () -> t.schedule(x, new Date(42), 0), () -> t.schedule(x, 42, -42), () -> t.schedule(x, new Date(42), -42), () -> t.scheduleAtFixedRate(x, -42, 42), () -> t.scheduleAtFixedRate(x, new Date(-42), 42), () -> t.scheduleAtFixedRate(x, Long.MAX_VALUE, 42), () -> t.scheduleAtFixedRate(x, 42, 0), () -> t.scheduleAtFixedRate(x, new Date(42), 0), () -> t.scheduleAtFixedRate(x, 42, -42), () -> t.scheduleAtFixedRate(x, new Date(42), -42)); assertThrows (NullPointerException.class, () -> t.schedule(null, 42), () -> t.schedule(x, (Date)null), () -> t.schedule(null, 42, 42), () -> t.schedule(x, (Date)null, 42), () -> t.scheduleAtFixedRate(null, 42, 42), () -> t.scheduleAtFixedRate(x, (Date)null, 42)); final CountDownLatch y1 = new CountDownLatch(1); final CountDownLatch y2 = new CountDownLatch(1); final CountDownLatch y3 = new CountDownLatch(11); final long start = System.currentTimeMillis(); final Date past = new Date(start - (10 * DELAY_MS + DELAY_MS / 2)); schedule( t, counter(y1), past); schedule( t, counter(y2), past, DELAY_MS); scheduleAtFixedRate(t, counter(y3), past, DELAY_MS); y3.await(); y1.await(); y2.await(); final long elapsed = System.currentTimeMillis() - start; if (elapsed >= DELAY_MS / 2) fail(String.format("Test took too long: elapsed=%d%n", elapsed)); } //--------------------- Infrastructure --------------------------- volatile int passed = 0, failed = 0; void pass() {passed++;} void fail() {failed++; Thread.dumpStack();} void fail(String msg) {System.err.println(msg); fail();} void unexpected(Throwable t) {failed++; t.printStackTrace();} void check(boolean cond) {if (cond) pass(); else fail();} void equal(Object x, Object y) { if (x == null ? y == null : x.equals(y)) pass(); else fail(x + " not equal to " + y);} public static void main(String[] args) throws Throwable { new Args().instanceMain(args);} void instanceMain(String[] args) throws Throwable { try {test(args);} catch (Throwable t) {unexpected(t);} System.out.printf("%nPassed = %d, failed = %d%n%n", passed, failed); if (failed > 0) throw new AssertionError("Some tests failed");} interface F { void f() throws Throwable; } void assertThrows(Class<? extends Throwable> k, F... fs) { for (F f : fs) try {f.f(); fail("Expected " + k.getName() + " not thrown");} catch (Throwable t) { if (k.isAssignableFrom(t.getClass())) pass(); else unexpected(t);}} } On Fri, May 30, 2014 at 12:27 AM, Eric Wang <yiming.wang@oracle.com> wrote:

...

Hi,

Please help to review the fix for bug https://bugs.openjdk.java.net/ browse/JDK-8004807 as below: http://cr.openjdk.java.net/~ewang/JDK-8004807/webrev.00/

The root cause of the failure is if Timer.cancel() is not called in time, the recurring timer task keeps running which cause the the assertion "check(latch.getCount() > 0)" failed.

The fix includes: 1. As the purpose of the assertion "check(latch.getCount() > 0)" in timer task is used to check if the unrepeatable task is executed repeatedly and this assertion is meaningless for a repeatable task. it can be replaced by other way, see the webrev. 2. The timer thread should be terminated ASAP once all tasks are finished to avoid time spent on thread switch. 3. The test should be executed in othervm mode to avoid time spent on thread switch.

Thanks, Eric

Eric Wang

3 Jun 3 Jun

1:51 p.m.

Hi Martin, Glad to know you are the original author and thank you to rewrite the test using modern way. I have some concerns about the limit of elapsed time (DELAY_MS / 2 = 50000ms) which is much higher than the old value (500ms). The reason is that i checked the nightly failure histories, there are dozens of failures which are caused by exception thrown at the call "check(latch.getCount() > 0)" in the run method of TimerTask. it is actually caused by the the task scheduled with fixed rate is not cancelled after latch reaches zero. None of the test failed due to the elapsed time larger than 500 ms. Sorry that I should paste the failure to clarify the problem: http://cr.openjdk.java.net/~ewang/JDK-8004807/failure.txt So are you OK that i adopt your test with 2 small changes: http://cr.openjdk.java.net/~ewang/JDK-8004807/webrev.01/ 1. keep the limit of elapsed time 500ms as it was. 2. cancel the task when latch reached zero. Thanks, Eric On 2014/5/31 4:26, Martin Buchholz wrote:

...

HI Eric,

Thanks for working on my old dusty tests.

I can't view the bug report - I get Permission Violation.

It does look like you're making the test more robust, but at the cost of throwing away much of what the test is testing, which is not so good.

This test hardcodes 1 second as the period - it assumes all 13 task invocations can be performed in 500ms, which may be a little over-optimistic. Nevertheless, this failure should be very rare, eh? But bump the period to a much higher number and we should be OK.

I think making this an othervm test is overkill. Better to make the test itself more robust.

I propose instead this major rewrite:

public class Args { static final long DELAY_MS = 100 * 1000L;

void schedule(final Timer t, final TimerTask task, final Date d) { t.schedule(task, d); assertThrows (IllegalStateException.class, () -> t.schedule(task, d)); }

void schedule(final Timer t, final TimerTask task, final Date d, final long period) { t.schedule(task, d, period); assertThrows (IllegalStateException.class, () -> t.schedule(task, d, period)); }

void scheduleAtFixedRate(final Timer t, final TimerTask task, final Date d, final long period) { t.scheduleAtFixedRate(task, d, period); assertThrows (IllegalStateException.class, () -> t.scheduleAtFixedRate(task, d, period)); }

TimerTask counter(final CountDownLatch latch) { return new TimerTask() { public void run() { check(latch.getCount() > 0); latch.countDown(); }}; }

TimerTask nop() { return new TimerTask() { public void run() { }}; }

void test(String[] args) throws Throwable { final Timer t = new Timer(); try { test(t); } finally { // Ensure this test doesn't interfere with subsequent // tests even in case of failure. t.cancel(); }

// Attempts to schedule tasks on a cancelled Timer result in ISE.

final Date past = new Date(System.currentTimeMillis() - DELAY_MS); final Date future = new Date(System.currentTimeMillis() + DELAY_MS); assertThrows (IllegalStateException.class, () -> t.schedule(nop(), 42), () -> t.schedule(nop(), 42), () -> t.schedule(nop(), past), () -> t.schedule(nop(), 42, 42), () -> t.schedule(nop(), past, 42), () -> t.scheduleAtFixedRate(nop(), 42, 42), () -> t.scheduleAtFixedRate(nop(), past, 42), () -> t.scheduleAtFixedRate(nop(), future, 42)); }

void test(Timer t) throws Throwable { final TimerTask x = new TimerTask() { public void run() {}}; assertThrows (IllegalArgumentException.class, () -> t.schedule(x, -42), () -> t.schedule(x, new Date(-42)),

() -> t.schedule(x, Long.MAX_VALUE), () -> t.schedule(x, -42, 42), () -> t.schedule(x, new Date(-42), 42), () -> t.schedule(x, Long.MAX_VALUE, 42), () -> t.schedule(x, 42, 0), () -> t.schedule(x, new Date(42), 0), () -> t.schedule(x, 42, -42), () -> t.schedule(x, new Date(42), -42),

() -> t.scheduleAtFixedRate(x, -42, 42), () -> t.scheduleAtFixedRate(x, new Date(-42), 42), () -> t.scheduleAtFixedRate(x, Long.MAX_VALUE, 42), () -> t.scheduleAtFixedRate(x, 42, 0), () -> t.scheduleAtFixedRate(x, new Date(42), 0), () -> t.scheduleAtFixedRate(x, 42, -42), () -> t.scheduleAtFixedRate(x, new Date(42), -42));

assertThrows (NullPointerException.class, () -> t.schedule(null, 42), () -> t.schedule(x, (Date)null),

() -> t.schedule(null, 42, 42), () -> t.schedule(x, (Date)null, 42),

() -> t.scheduleAtFixedRate(null, 42, 42), () -> t.scheduleAtFixedRate(x, (Date)null, 42));

final CountDownLatch y1 = new CountDownLatch(1); final CountDownLatch y2 = new CountDownLatch(1); final CountDownLatch y3 = new CountDownLatch(11); final long start = System.currentTimeMillis(); final Date past = new Date(start - (10 * DELAY_MS + DELAY_MS / 2));

schedule( t, counter(y1), past); schedule( t, counter(y2), past, DELAY_MS); scheduleAtFixedRate(t, counter(y3), past, DELAY_MS); y3.await(); y1.await(); y2.await();

final long elapsed = System.currentTimeMillis() - start; if (elapsed >= DELAY_MS / 2) fail(String.format("Test took too long: elapsed=%d%n", elapsed)); }

//--------------------- Infrastructure --------------------------- volatile int passed = 0, failed = 0; void pass() {passed++;} void fail() {failed++; Thread.dumpStack();} void fail(String msg) {System.err.println(msg); fail();} void unexpected(Throwable t) {failed++; t.printStackTrace();} void check(boolean cond) {if (cond) pass(); else fail();} void equal(Object x, Object y) { if (x == null ? y == null : x.equals(y)) pass(); else fail(x + " not equal to " + y);} public static void main(String[] args) throws Throwable { new Args().instanceMain(args);} void instanceMain(String[] args) throws Throwable { try {test(args);} catch (Throwable t) {unexpected(t);} System.out.printf("%nPassed = %d, failed = %d%n%n", passed, failed); if (failed > 0) throw new AssertionError("Some tests failed");} interface F { void f() throws Throwable; } void assertThrows(Class<? extends Throwable> k, F... fs) { for (F f : fs) try {f.f(); fail("Expected " + k.getName() + " not thrown");} catch (Throwable t) { if (k.isAssignableFrom(t.getClass())) pass(); else unexpected(t);}} }

On Fri, May 30, 2014 at 12:27 AM, Eric Wang <yiming.wang@oracle.com <mailto:yiming.wang@oracle.com>> wrote:

Hi,

Please help to review the fix for bug https://bugs.openjdk.java.net/browse/JDK-8004807 as below: http://cr.openjdk.java.net/~ewang/JDK-8004807/webrev.00/ <http://cr.openjdk.java.net/%7Eewang/JDK-8004807/webrev.00/>

The root cause of the failure is if Timer.cancel() is not called in time, the recurring timer task keeps running which cause the the assertion "check(latch.getCount() > 0)" failed.

The fix includes: 1. As the purpose of the assertion "check(latch.getCount() > 0)" in timer task is used to check if the unrepeatable task is executed repeatedly and this assertion is meaningless for a repeatable task. it can be replaced by other way, see the webrev. 2. The timer thread should be terminated ASAP once all tasks are finished to avoid time spent on thread switch. 3. The test should be executed in othervm mode to avoid time spent on thread switch.

Thanks, Eric

Martin Buchholz

5:48 p.m.

We can compromise and set the DELAY_MS to 10 seconds instead of 100 seconds. Note that the test is supposed to run quickly and should never actually hit DELAY_MS, so making it large is not a problem. I continue to not understand why you would need to cancel the task when the count reaches zero. Checking that the tasks are run the expected number of times, and no more .... ===> IS THE WHOLE POINT OF THIS TEST. <=== Maybe there's actually a bug in Timer that this test has discovered. Or perhaps there's a misunderstanding on my part about how many task scheduling is supposed to work. Please explain. On Tue, Jun 3, 2014 at 6:51 AM, Eric Wang <yiming.wang@oracle.com> wrote:

...

Hi Martin,

Glad to know you are the original author and thank you to rewrite the test using modern way.

I have some concerns about the limit of elapsed time (DELAY_MS / 2 = 50000ms) which is much higher than the old value (500ms). The reason is that i checked the nightly failure histories, there are dozens of failures which are caused by exception thrown at the call "check(latch.getCount() > 0)" in the run method of TimerTask. it is actually caused by the the task scheduled with fixed rate is not cancelled after latch reaches zero. None of the test failed due to the elapsed time larger than 500 ms.

Sorry that I should paste the failure to clarify the problem: http://cr.openjdk.java.net/~ewang/JDK-8004807/failure.txt

So are you OK that i adopt your test with 2 small changes: http://cr.openjdk.java.net/~ewang/JDK-8004807/webrev.01/ 1. keep the limit of elapsed time 500ms as it was. 2. cancel the task when latch reached zero.

Thanks, Eric

On 2014/5/31 4:26, Martin Buchholz wrote:

HI Eric,

Thanks for working on my old dusty tests.

I can't view the bug report - I get Permission Violation.

It does look like you're making the test more robust, but at the cost of throwing away much of what the test is testing, which is not so good.

This test hardcodes 1 second as the period - it assumes all 13 task invocations can be performed in 500ms, which may be a little over-optimistic. Nevertheless, this failure should be very rare, eh? But bump the period to a much higher number and we should be OK.

I think making this an othervm test is overkill. Better to make the test itself more robust.

I propose instead this major rewrite:

public class Args { static final long DELAY_MS = 100 * 1000L;

void schedule(final Timer t, final TimerTask task, final Date d) { t.schedule(task, d); assertThrows (IllegalStateException.class, () -> t.schedule(task, d)); }

void schedule(final Timer t, final TimerTask task, final Date d, final long period) { t.schedule(task, d, period); assertThrows (IllegalStateException.class, () -> t.schedule(task, d, period)); }

void scheduleAtFixedRate(final Timer t, final TimerTask task, final Date d, final long period) { t.scheduleAtFixedRate(task, d, period); assertThrows (IllegalStateException.class, () -> t.scheduleAtFixedRate(task, d, period)); }

TimerTask counter(final CountDownLatch latch) { return new TimerTask() { public void run() { check(latch.getCount() > 0); latch.countDown(); }}; }

TimerTask nop() { return new TimerTask() { public void run() { }}; }

void test(String[] args) throws Throwable { final Timer t = new Timer(); try { test(t); } finally { // Ensure this test doesn't interfere with subsequent // tests even in case of failure. t.cancel(); }

// Attempts to schedule tasks on a cancelled Timer result in ISE.

final Date past = new Date(System.currentTimeMillis() - DELAY_MS); final Date future = new Date(System.currentTimeMillis() + DELAY_MS); assertThrows (IllegalStateException.class, () -> t.schedule(nop(), 42), () -> t.schedule(nop(), 42), () -> t.schedule(nop(), past), () -> t.schedule(nop(), 42, 42), () -> t.schedule(nop(), past, 42), () -> t.scheduleAtFixedRate(nop(), 42, 42), () -> t.scheduleAtFixedRate(nop(), past, 42), () -> t.scheduleAtFixedRate(nop(), future, 42)); }

void test(Timer t) throws Throwable { final TimerTask x = new TimerTask() { public void run() {}}; assertThrows (IllegalArgumentException.class, () -> t.schedule(x, -42), () -> t.schedule(x, new Date(-42)),

() -> t.schedule(x, Long.MAX_VALUE), () -> t.schedule(x, -42, 42), () -> t.schedule(x, new Date(-42), 42), () -> t.schedule(x, Long.MAX_VALUE, 42), () -> t.schedule(x, 42, 0), () -> t.schedule(x, new Date(42), 0), () -> t.schedule(x, 42, -42), () -> t.schedule(x, new Date(42), -42),

() -> t.scheduleAtFixedRate(x, -42, 42), () -> t.scheduleAtFixedRate(x, new Date(-42), 42), () -> t.scheduleAtFixedRate(x, Long.MAX_VALUE, 42), () -> t.scheduleAtFixedRate(x, 42, 0), () -> t.scheduleAtFixedRate(x, new Date(42), 0), () -> t.scheduleAtFixedRate(x, 42, -42), () -> t.scheduleAtFixedRate(x, new Date(42), -42));

assertThrows (NullPointerException.class, () -> t.schedule(null, 42), () -> t.schedule(x, (Date)null),

() -> t.schedule(null, 42, 42), () -> t.schedule(x, (Date)null, 42),

() -> t.scheduleAtFixedRate(null, 42, 42), () -> t.scheduleAtFixedRate(x, (Date)null, 42));

final CountDownLatch y1 = new CountDownLatch(1); final CountDownLatch y2 = new CountDownLatch(1); final CountDownLatch y3 = new CountDownLatch(11); final long start = System.currentTimeMillis(); final Date past = new Date(start - (10 * DELAY_MS + DELAY_MS / 2));

schedule( t, counter(y1), past); schedule( t, counter(y2), past, DELAY_MS); scheduleAtFixedRate(t, counter(y3), past, DELAY_MS); y3.await(); y1.await(); y2.await();

final long elapsed = System.currentTimeMillis() - start; if (elapsed >= DELAY_MS / 2) fail(String.format("Test took too long: elapsed=%d%n", elapsed)); }

//--------------------- Infrastructure --------------------------- volatile int passed = 0, failed = 0; void pass() {passed++;} void fail() {failed++; Thread.dumpStack();} void fail(String msg) {System.err.println(msg); fail();} void unexpected(Throwable t) {failed++; t.printStackTrace();} void check(boolean cond) {if (cond) pass(); else fail();} void equal(Object x, Object y) { if (x == null ? y == null : x.equals(y)) pass(); else fail(x + " not equal to " + y);} public static void main(String[] args) throws Throwable { new Args().instanceMain(args);} void instanceMain(String[] args) throws Throwable { try {test(args);} catch (Throwable t) {unexpected(t);} System.out.printf("%nPassed = %d, failed = %d%n%n", passed, failed); if (failed > 0) throw new AssertionError("Some tests failed");} interface F { void f() throws Throwable; } void assertThrows(Class<? extends Throwable> k, F... fs) { for (F f : fs) try {f.f(); fail("Expected " + k.getName() + " not thrown");} catch (Throwable t) { if (k.isAssignableFrom(t.getClass())) pass(); else unexpected(t);}} }

On Fri, May 30, 2014 at 12:27 AM, Eric Wang <yiming.wang@oracle.com> wrote:

...
Hi,

Please help to review the fix for bug https://bugs.openjdk.java.net/browse/JDK-8004807 as below: http://cr.openjdk.java.net/~ewang/JDK-8004807/webrev.00/

The root cause of the failure is if Timer.cancel() is not called in time, the recurring timer task keeps running which cause the the assertion "check(latch.getCount() > 0)" failed.

The fix includes: 1. As the purpose of the assertion "check(latch.getCount() > 0)" in timer task is used to check if the unrepeatable task is executed repeatedly and this assertion is meaningless for a repeatable task. it can be replaced by other way, see the webrev. 2. The timer thread should be terminated ASAP once all tasks are finished to avoid time spent on thread switch. 3. The test should be executed in othervm mode to avoid time spent on thread switch.

Thanks, Eric

roger riggs

6:12 p.m.

Hi Martin, Is it possible that the main thread takes too long to go from y3.await() to the t.cancel() that the fixed rate timer (y3) gets called again? It seems unlikely, but with the printf, its not just compute time. Canceling the timer before the printf would likely reduce the failures. Another alternative is to increase the period to allow more time for the main thread to respond. From the error, its not clear which timer was called an extra time. Roger On 6/3/2014 1:48 PM, Martin Buchholz wrote:

...

We can compromise and set the DELAY_MS to 10 seconds instead of 100 seconds. Note that the test is supposed to run quickly and should never actually hit DELAY_MS, so making it large is not a problem.

I continue to not understand why you would need to cancel the task when the count reaches zero. Checking that the tasks are run the expected number of times, and no more ....

===> IS THE WHOLE POINT OF THIS TEST. <===

Maybe there's actually a bug in Timer that this test has discovered. Or perhaps there's a misunderstanding on my part about how many task scheduling is supposed to work. Please explain.

On Tue, Jun 3, 2014 at 6:51 AM, Eric Wang <yiming.wang@oracle.com> wrote:

...
Hi Martin,

Glad to know you are the original author and thank you to rewrite the test using modern way.

I have some concerns about the limit of elapsed time (DELAY_MS / 2 = 50000ms) which is much higher than the old value (500ms). The reason is that i checked the nightly failure histories, there are dozens of failures which are caused by exception thrown at the call "check(latch.getCount() > 0)" in the run method of TimerTask. it is actually caused by the the task scheduled with fixed rate is not cancelled after latch reaches zero. None of the test failed due to the elapsed time larger than 500 ms.

Sorry that I should paste the failure to clarify the problem: http://cr.openjdk.java.net/~ewang/JDK-8004807/failure.txt

So are you OK that i adopt your test with 2 small changes: http://cr.openjdk.java.net/~ewang/JDK-8004807/webrev.01/ 1. keep the limit of elapsed time 500ms as it was. 2. cancel the task when latch reached zero.

Thanks, Eric

On 2014/5/31 4:26, Martin Buchholz wrote:

HI Eric,

Thanks for working on my old dusty tests.

I can't view the bug report - I get Permission Violation.

It does look like you're making the test more robust, but at the cost of throwing away much of what the test is testing, which is not so good.

This test hardcodes 1 second as the period - it assumes all 13 task invocations can be performed in 500ms, which may be a little over-optimistic. Nevertheless, this failure should be very rare, eh? But bump the period to a much higher number and we should be OK.

I think making this an othervm test is overkill. Better to make the test itself more robust.

I propose instead this major rewrite:

public class Args { static final long DELAY_MS = 100 * 1000L;

void schedule(final Timer t, final TimerTask task, final Date d) { t.schedule(task, d); assertThrows (IllegalStateException.class, () -> t.schedule(task, d)); }

void schedule(final Timer t, final TimerTask task, final Date d, final long period) { t.schedule(task, d, period); assertThrows (IllegalStateException.class, () -> t.schedule(task, d, period)); }

void scheduleAtFixedRate(final Timer t, final TimerTask task, final Date d, final long period) { t.scheduleAtFixedRate(task, d, period); assertThrows (IllegalStateException.class, () -> t.scheduleAtFixedRate(task, d, period)); }

TimerTask counter(final CountDownLatch latch) { return new TimerTask() { public void run() { check(latch.getCount() > 0); latch.countDown(); }}; }

TimerTask nop() { return new TimerTask() { public void run() { }}; }

void test(String[] args) throws Throwable { final Timer t = new Timer(); try { test(t); } finally { // Ensure this test doesn't interfere with subsequent // tests even in case of failure. t.cancel(); }

// Attempts to schedule tasks on a cancelled Timer result in ISE.

final Date past = new Date(System.currentTimeMillis() - DELAY_MS); final Date future = new Date(System.currentTimeMillis() + DELAY_MS); assertThrows (IllegalStateException.class, () -> t.schedule(nop(), 42), () -> t.schedule(nop(), 42), () -> t.schedule(nop(), past), () -> t.schedule(nop(), 42, 42), () -> t.schedule(nop(), past, 42), () -> t.scheduleAtFixedRate(nop(), 42, 42), () -> t.scheduleAtFixedRate(nop(), past, 42), () -> t.scheduleAtFixedRate(nop(), future, 42)); }

void test(Timer t) throws Throwable { final TimerTask x = new TimerTask() { public void run() {}}; assertThrows (IllegalArgumentException.class, () -> t.schedule(x, -42), () -> t.schedule(x, new Date(-42)),

() -> t.schedule(x, Long.MAX_VALUE), () -> t.schedule(x, -42, 42), () -> t.schedule(x, new Date(-42), 42), () -> t.schedule(x, Long.MAX_VALUE, 42), () -> t.schedule(x, 42, 0), () -> t.schedule(x, new Date(42), 0), () -> t.schedule(x, 42, -42), () -> t.schedule(x, new Date(42), -42),

() -> t.scheduleAtFixedRate(x, -42, 42), () -> t.scheduleAtFixedRate(x, new Date(-42), 42), () -> t.scheduleAtFixedRate(x, Long.MAX_VALUE, 42), () -> t.scheduleAtFixedRate(x, 42, 0), () -> t.scheduleAtFixedRate(x, new Date(42), 0), () -> t.scheduleAtFixedRate(x, 42, -42), () -> t.scheduleAtFixedRate(x, new Date(42), -42));

assertThrows (NullPointerException.class, () -> t.schedule(null, 42), () -> t.schedule(x, (Date)null),

() -> t.schedule(null, 42, 42), () -> t.schedule(x, (Date)null, 42),

() -> t.scheduleAtFixedRate(null, 42, 42), () -> t.scheduleAtFixedRate(x, (Date)null, 42));

final CountDownLatch y1 = new CountDownLatch(1); final CountDownLatch y2 = new CountDownLatch(1); final CountDownLatch y3 = new CountDownLatch(11); final long start = System.currentTimeMillis(); final Date past = new Date(start - (10 * DELAY_MS + DELAY_MS / 2));

schedule( t, counter(y1), past); schedule( t, counter(y2), past, DELAY_MS); scheduleAtFixedRate(t, counter(y3), past, DELAY_MS); y3.await(); y1.await(); y2.await();

final long elapsed = System.currentTimeMillis() - start; if (elapsed >= DELAY_MS / 2) fail(String.format("Test took too long: elapsed=%d%n", elapsed)); }

//--------------------- Infrastructure --------------------------- volatile int passed = 0, failed = 0; void pass() {passed++;} void fail() {failed++; Thread.dumpStack();} void fail(String msg) {System.err.println(msg); fail();} void unexpected(Throwable t) {failed++; t.printStackTrace();} void check(boolean cond) {if (cond) pass(); else fail();} void equal(Object x, Object y) { if (x == null ? y == null : x.equals(y)) pass(); else fail(x + " not equal to " + y);} public static void main(String[] args) throws Throwable { new Args().instanceMain(args);} void instanceMain(String[] args) throws Throwable { try {test(args);} catch (Throwable t) {unexpected(t);} System.out.printf("%nPassed = %d, failed = %d%n%n", passed, failed); if (failed > 0) throw new AssertionError("Some tests failed");} interface F { void f() throws Throwable; } void assertThrows(Class<? extends Throwable> k, F... fs) { for (F f : fs) try {f.f(); fail("Expected " + k.getName() + " not thrown");} catch (Throwable t) { if (k.isAssignableFrom(t.getClass())) pass(); else unexpected(t);}} }

On Fri, May 30, 2014 at 12:27 AM, Eric Wang <yiming.wang@oracle.com> wrote:

...
Hi,

Please help to review the fix for bug https://bugs.openjdk.java.net/browse/JDK-8004807 as below: http://cr.openjdk.java.net/~ewang/JDK-8004807/webrev.00/

The root cause of the failure is if Timer.cancel() is not called in time, the recurring timer task keeps running which cause the the assertion "check(latch.getCount() > 0)" failed.

The fix includes: 1. As the purpose of the assertion "check(latch.getCount() > 0)" in timer task is used to check if the unrepeatable task is executed repeatedly and this assertion is meaningless for a repeatable task. it can be replaced by other way, see the webrev. 2. The timer thread should be terminated ASAP once all tasks are finished to avoid time spent on thread switch. 3. The test should be executed in othervm mode to avoid time spent on thread switch.

Thanks, Eric

Martin Buchholz

4 Jun 4 Jun

12:33 a.m.

On Tue, Jun 3, 2014 at 11:12 AM, roger riggs <roger.riggs@oracle.com> wrote:

...

Hi Martin,

Is it possible that the main thread takes too long to go from y3.await() to the t.cancel() that the fixed rate timer (y3) gets called again?

If that was true, adding a sleep(1000) after the awaits in my version would cause consistent failures, but I don't see any.

...

It seems unlikely, but with the printf, its not just compute time. Canceling the timer before the printf would likely reduce the failures.

Again, we need to understand the failures independently. I have never seen this test fail. Is there some environmental thing?

...

Another alternative is to increase the period to allow more time for the main thread to respond.

I agree that this would make the test more robust, and that's one thing my version does.

...

From the error, its not clear which timer was called an extra time.

True. My updated version tries to fix that problem. Hopefully you can report which task is being scheduled too often. import java.util.*; import java.util.concurrent.*; public class Args { static final long DELAY_MS = 10 * 1000L; void schedule(final Timer t, final TimerTask task, final Date d) { t.schedule(task, d); assertThrows (IllegalStateException.class, () -> t.schedule(task, d)); } void schedule(final Timer t, final TimerTask task, final Date d, final long period) { t.schedule(task, d, period); assertThrows (IllegalStateException.class, () -> t.schedule(task, d, period)); } void scheduleAtFixedRate(final Timer t, final TimerTask task, final Date d, final long period) { t.scheduleAtFixedRate(task, d, period); assertThrows (IllegalStateException.class, () -> t.scheduleAtFixedRate(task, d, period)); } TimerTask counter(final CountDownLatch latch) { return new TimerTask() { public void run() { if (latch.getCount() == 0) fail(String.format("Latch counted down too many times: " + latch)); latch.countDown(); }}; } TimerTask nop() { return new TimerTask() { public void run() { }}; } void test(String[] args) throws Throwable { final Timer t = new Timer(); try { test(t); } finally { // Ensure this test doesn't interfere with subsequent // tests even in case of failure. t.cancel(); } // Attempts to schedule tasks on a cancelled Timer result in ISE. final Date past = new Date(System.currentTimeMillis() - DELAY_MS); final Date future = new Date(System.currentTimeMillis() + DELAY_MS); assertThrows (IllegalStateException.class, () -> t.schedule(nop(), 42), () -> t.schedule(nop(), 42), () -> t.schedule(nop(), past), () -> t.schedule(nop(), 42, 42), () -> t.schedule(nop(), past, 42), () -> t.scheduleAtFixedRate(nop(), 42, 42), () -> t.scheduleAtFixedRate(nop(), past, 42), () -> t.scheduleAtFixedRate(nop(), future, 42)); } void test(Timer t) throws Throwable { final TimerTask x = new TimerTask() { public void run() {}}; assertThrows (IllegalArgumentException.class, () -> t.schedule(x, -42), () -> t.schedule(x, new Date(-42)), () -> t.schedule(x, Long.MAX_VALUE), () -> t.schedule(x, -42, 42), () -> t.schedule(x, new Date(-42), 42), () -> t.schedule(x, Long.MAX_VALUE, 42), () -> t.schedule(x, 42, 0), () -> t.schedule(x, new Date(42), 0), () -> t.schedule(x, 42, -42), () -> t.schedule(x, new Date(42), -42), () -> t.scheduleAtFixedRate(x, -42, 42), () -> t.scheduleAtFixedRate(x, new Date(-42), 42), () -> t.scheduleAtFixedRate(x, Long.MAX_VALUE, 42), () -> t.scheduleAtFixedRate(x, 42, 0), () -> t.scheduleAtFixedRate(x, new Date(42), 0), () -> t.scheduleAtFixedRate(x, 42, -42), () -> t.scheduleAtFixedRate(x, new Date(42), -42)); assertThrows (NullPointerException.class, () -> t.schedule(null, 42), () -> t.schedule(x, (Date)null), () -> t.schedule(null, 42, 42), () -> t.schedule(x, (Date)null, 42), () -> t.scheduleAtFixedRate(null, 42, 42), () -> t.scheduleAtFixedRate(x, (Date)null, 42)); // Create local classes for clearer diagnostics in case of failure class OneShotLatch extends CountDownLatch { OneShotLatch() { super(1); } } class FixedRateLatch extends CountDownLatch { FixedRateLatch() { super(11); } } class FixedDelayLatch extends CountDownLatch { FixedDelayLatch() { super(1); } } final CountDownLatch y1 = new OneShotLatch(); final CountDownLatch y2 = new FixedDelayLatch(); final CountDownLatch y3 = new FixedRateLatch(); final long start = System.currentTimeMillis(); final Date past = new Date(start - (10 * DELAY_MS + DELAY_MS / 2)); schedule( t, counter(y1), past); schedule( t, counter(y2), past, DELAY_MS); scheduleAtFixedRate(t, counter(y3), past, DELAY_MS); y1.await(); y2.await(); y3.await(); final long elapsed = System.currentTimeMillis() - start; if (elapsed >= DELAY_MS / 2) fail(String.format("Test took too long: elapsed=%d%n", elapsed)); } //--------------------- Infrastructure --------------------------- volatile int passed = 0, failed = 0; void pass() {passed++;} void fail() {failed++; Thread.dumpStack();} void fail(String msg) {System.err.println(msg); fail();} void unexpected(Throwable t) {failed++; t.printStackTrace();} void check(boolean cond) {if (cond) pass(); else fail();} void equal(Object x, Object y) { if (x == null ? y == null : x.equals(y)) pass(); else fail(x + " not equal to " + y);} public static void main(String[] args) throws Throwable { new Args().instanceMain(args);} void instanceMain(String[] args) throws Throwable { try {test(args);} catch (Throwable t) {unexpected(t);} System.out.printf("%nPassed = %d, failed = %d%n%n", passed, failed); if (failed > 0) throw new AssertionError("Some tests failed");} interface F { void f() throws Throwable; } void assertThrows(Class<? extends Throwable> k, F... fs) { for (F f : fs) try {f.f(); fail("Expected " + k.getName() + " not thrown");} catch (Throwable t) { if (k.isAssignableFrom(t.getClass())) pass(); else unexpected(t);}} }

Eric Wang

1:12 a.m.

Hi Martin, To sleep(1000) is not enough to reproduce the failure, because it is much lower than the period DELAY_MS (10*1000) of the repeated task created by "scheduleAtFixedRate(t, counter(y3), past, DELAY_MS)". Try sleep(DELAY_MS), the failure can be reproduced. Thanks, Eric On 2014/6/4 8:33, Martin Buchholz wrote:

...

On Tue, Jun 3, 2014 at 11:12 AM, roger riggs <roger.riggs@oracle.com> wrote:

...
Hi Martin,

Is it possible that the main thread takes too long to go from y3.await() to the t.cancel() that the fixed rate timer (y3) gets called again?

If that was true, adding a sleep(1000) after the awaits in my version would cause consistent failures, but I don't see any.

...
It seems unlikely, but with the printf, its not just compute time. Canceling the timer before the printf would likely reduce the failures.

Again, we need to understand the failures independently. I have never seen this test fail. Is there some environmental thing?

...
Another alternative is to increase the period to allow more time for the main thread to respond.

I agree that this would make the test more robust, and that's one thing my version does.

...
From the error, its not clear which timer was called an extra time.

True. My updated version tries to fix that problem. Hopefully you can report which task is being scheduled too often.

import java.util.*; import java.util.concurrent.*;

public class Args { static final long DELAY_MS = 10 * 1000L;

void schedule(final Timer t, final TimerTask task, final Date d) { t.schedule(task, d); assertThrows (IllegalStateException.class, () -> t.schedule(task, d)); }

void schedule(final Timer t, final TimerTask task, final Date d, final long period) { t.schedule(task, d, period); assertThrows (IllegalStateException.class, () -> t.schedule(task, d, period)); }

void scheduleAtFixedRate(final Timer t, final TimerTask task, final Date d, final long period) { t.scheduleAtFixedRate(task, d, period); assertThrows (IllegalStateException.class, () -> t.scheduleAtFixedRate(task, d, period)); }

TimerTask counter(final CountDownLatch latch) { return new TimerTask() { public void run() { if (latch.getCount() == 0) fail(String.format("Latch counted down too many times: " + latch)); latch.countDown(); }}; }

TimerTask nop() { return new TimerTask() { public void run() { }}; }

void test(String[] args) throws Throwable { final Timer t = new Timer(); try { test(t); } finally { // Ensure this test doesn't interfere with subsequent // tests even in case of failure. t.cancel(); }

// Attempts to schedule tasks on a cancelled Timer result in ISE.

final Date past = new Date(System.currentTimeMillis() - DELAY_MS); final Date future = new Date(System.currentTimeMillis() + DELAY_MS); assertThrows (IllegalStateException.class, () -> t.schedule(nop(), 42), () -> t.schedule(nop(), 42), () -> t.schedule(nop(), past), () -> t.schedule(nop(), 42, 42), () -> t.schedule(nop(), past, 42), () -> t.scheduleAtFixedRate(nop(), 42, 42), () -> t.scheduleAtFixedRate(nop(), past, 42), () -> t.scheduleAtFixedRate(nop(), future, 42)); }

void test(Timer t) throws Throwable { final TimerTask x = new TimerTask() { public void run() {}}; assertThrows (IllegalArgumentException.class, () -> t.schedule(x, -42), () -> t.schedule(x, new Date(-42)),

() -> t.schedule(x, Long.MAX_VALUE), () -> t.schedule(x, -42, 42), () -> t.schedule(x, new Date(-42), 42), () -> t.schedule(x, Long.MAX_VALUE, 42), () -> t.schedule(x, 42, 0), () -> t.schedule(x, new Date(42), 0), () -> t.schedule(x, 42, -42), () -> t.schedule(x, new Date(42), -42),

() -> t.scheduleAtFixedRate(x, -42, 42), () -> t.scheduleAtFixedRate(x, new Date(-42), 42), () -> t.scheduleAtFixedRate(x, Long.MAX_VALUE, 42), () -> t.scheduleAtFixedRate(x, 42, 0), () -> t.scheduleAtFixedRate(x, new Date(42), 0), () -> t.scheduleAtFixedRate(x, 42, -42), () -> t.scheduleAtFixedRate(x, new Date(42), -42));

assertThrows (NullPointerException.class, () -> t.schedule(null, 42), () -> t.schedule(x, (Date)null),

() -> t.schedule(null, 42, 42), () -> t.schedule(x, (Date)null, 42),

() -> t.scheduleAtFixedRate(null, 42, 42), () -> t.scheduleAtFixedRate(x, (Date)null, 42));

// Create local classes for clearer diagnostics in case of failure class OneShotLatch extends CountDownLatch { OneShotLatch() { super(1); } } class FixedRateLatch extends CountDownLatch { FixedRateLatch() { super(11); } } class FixedDelayLatch extends CountDownLatch { FixedDelayLatch() { super(1); } } final CountDownLatch y1 = new OneShotLatch(); final CountDownLatch y2 = new FixedDelayLatch(); final CountDownLatch y3 = new FixedRateLatch(); final long start = System.currentTimeMillis(); final Date past = new Date(start - (10 * DELAY_MS + DELAY_MS / 2));

schedule( t, counter(y1), past); schedule( t, counter(y2), past, DELAY_MS); scheduleAtFixedRate(t, counter(y3), past, DELAY_MS); y1.await(); y2.await(); y3.await();

final long elapsed = System.currentTimeMillis() - start; if (elapsed >= DELAY_MS / 2) fail(String.format("Test took too long: elapsed=%d%n", elapsed)); }

//--------------------- Infrastructure --------------------------- volatile int passed = 0, failed = 0; void pass() {passed++;} void fail() {failed++; Thread.dumpStack();} void fail(String msg) {System.err.println(msg); fail();} void unexpected(Throwable t) {failed++; t.printStackTrace();} void check(boolean cond) {if (cond) pass(); else fail();} void equal(Object x, Object y) { if (x == null ? y == null : x.equals(y)) pass(); else fail(x + " not equal to " + y);} public static void main(String[] args) throws Throwable { new Args().instanceMain(args);} void instanceMain(String[] args) throws Throwable { try {test(args);} catch (Throwable t) {unexpected(t);} System.out.printf("%nPassed = %d, failed = %d%n%n", passed, failed); if (failed > 0) throw new AssertionError("Some tests failed");} interface F { void f() throws Throwable; } void assertThrows(Class<? extends Throwable> k, F... fs) { for (F f : fs) try {f.f(); fail("Expected " + k.getName() + " not thrown");} catch (Throwable t) { if (k.isAssignableFrom(t.getClass())) pass(); else unexpected(t);}} }

Martin Buchholz

1:16 a.m.

On Tue, Jun 3, 2014 at 6:12 PM, Eric Wang <yiming.wang@oracle.com> wrote:

...

Hi Martin,

To sleep(1000) is not enough to reproduce the failure, because it is much lower than the period DELAY_MS (10*1000) of the repeated task created by "scheduleAtFixedRate(t, counter(y3), past, DELAY_MS)".

Try sleep(DELAY_MS), the failure can be reproduced.

Well sure, then the task is rescheduled, so I expect it to fail in this case. But in my version I had set DELAY_MS to 100 seconds. The point of extending the DELAY_MS is to prevent flaky failure on a slow machine. Again, how do we know that this test hasn't found a Timer bug? I still can't reproduce it.

Eric Wang

6:28 a.m.

Hi Martin, Thanks for explanation, now I can understand why you set the DELAY_MS to 100 seconds, it is true that it prevents failure on a slow host, however, i still have some concerns. Because the test tests to schedule tasks at the time in the past, so all 13 tasks should be executed immediately and finished within a short time. If set the elapsed time limitation to 50s (DELAY_MS/2), it seems that the timer have plenty of time to finish tasks, so whether it causes above test point lost. Back to the original test, i think it should be a test stabilization issue, because the original test assumes that the timer should be cancelled within < 1 second before the 14th task is called. this assumption may not be guaranteed due to 2 reasons: 1. if test is executed in jtreg concurrent mode on a slow host. 2. the system clock of virtual machine may not be accurate (maybe faster than physical). To support the point, i changed the test as attached to print the execution time to see whether the timer behaves expected as the API document described. the result is as expected. The unrepeated task executed immediately: [1401855509336] The repeated task executed immediately and repeated per 1 second: [1401855509337, 1401855510337, 1401855511338] The fixed-rate task executed immediately and catch up the delay: [1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509836, 1401855510836] Thanks, Eric On 2014/6/4 9:16, Martin Buchholz wrote:

...

On Tue, Jun 3, 2014 at 6:12 PM, Eric Wang <yiming.wang@oracle.com <mailto:yiming.wang@oracle.com>> wrote:

Hi Martin,

To sleep(1000) is not enough to reproduce the failure, because it is much lower than the period DELAY_MS (10*1000) of the repeated task created by "scheduleAtFixedRate(t, counter(y3), past, DELAY_MS)".

Try sleep(DELAY_MS), the failure can be reproduced.

Well sure, then the task is rescheduled, so I expect it to fail in this case.

But in my version I had set DELAY_MS to 100 seconds. The point of extending the DELAY_MS is to prevent flaky failure on a slow machine.

Again, how do we know that this test hasn't found a Timer bug?

I still can't reproduce it.

Martin Buchholz

7:25 p.m.

Tests for Timer are inherently timing (!) dependent. It's reasonable for tests to assume that: - reasonable events like creating a thread and executing a simple task should complete in less than, say 2500ms. - system clock will not change by a significant amount (> 1 sec) during the test. Yes, that means Timer tests are likely to fail during daylight saving time switchover - we can live with that. (we could even try to fix that, by detecting deviations between clock time and elapsed time, but probably not worth it) Can you detect any real-world unreliability in my latest version of the test, not counting daylight saving time switch? I continue to resist your efforts to "fix" the test by removing chances for the SUT code to go wrong. On Tue, Jun 3, 2014 at 11:28 PM, Eric Wang <yiming.wang@oracle.com> wrote:

...

Hi Martin,

Thanks for explanation, now I can understand why you set the DELAY_MS to 100 seconds, it is true that it prevents failure on a slow host, however, i still have some concerns. Because the test tests to schedule tasks at the time in the past, so all 13 tasks should be executed immediately and finished within a short time. If set the elapsed time limitation to 50s (DELAY_MS/2), it seems that the timer have plenty of time to finish tasks, so whether it causes above test point lost.

Back to the original test, i think it should be a test stabilization issue, because the original test assumes that the timer should be cancelled within < 1 second before the 14th task is called. this assumption may not be guaranteed due to 2 reasons: 1. if test is executed in jtreg concurrent mode on a slow host. 2. the system clock of virtual machine may not be accurate (maybe faster than physical).

To support the point, i changed the test as attached to print the execution time to see whether the timer behaves expected as the API document described. the result is as expected.

The unrepeated task executed immediately: [1401855509336] The repeated task executed immediately and repeated per 1 second: [1401855509337, 1401855510337, 1401855511338] The fixed-rate task executed immediately and catch up the delay: [1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509836, 1401855510836]

Thanks, Eric On 2014/6/4 9:16, Martin Buchholz wrote:

On Tue, Jun 3, 2014 at 6:12 PM, Eric Wang <yiming.wang@oracle.com> wrote:

...
Hi Martin,

To sleep(1000) is not enough to reproduce the failure, because it is much lower than the period DELAY_MS (10*1000) of the repeated task created by "scheduleAtFixedRate(t, counter(y3), past, DELAY_MS)".

Try sleep(DELAY_MS), the failure can be reproduced.

Well sure, then the task is rescheduled, so I expect it to fail in this case.

But in my version I had set DELAY_MS to 100 seconds. The point of extending the DELAY_MS is to prevent flaky failure on a slow machine.

Again, how do we know that this test hasn't found a Timer bug?

I still can't reproduce it.

roger riggs

8:23 p.m.

Hi Martin, Eric, Of several hundred failures of this test, most were done in a JRE run with -Xcomp set. A few failures occurred with -Xmixed, none with -Xint. The printed "elapsed" times (not normalized to hardware or OS) range from 24 to 132 (ms) with most falling into several buckets in the 30's, 40's, 50's and 70's. I don't spot anything in the Timer.mainLoop code that might break when highly optimized but that's one possibility. Roger On 6/4/2014 3:25 PM, Martin Buchholz wrote:

...

Tests for Timer are inherently timing (!) dependent. It's reasonable for tests to assume that: - reasonable events like creating a thread and executing a simple task should complete in less than, say 2500ms. - system clock will not change by a significant amount (> 1 sec) during the test. Yes, that means Timer tests are likely to fail during daylight saving time switchover - we can live with that. (we could even try to fix that, by detecting deviations between clock time and elapsed time, but probably not worth it)

Can you detect any real-world unreliability in my latest version of the test, not counting daylight saving time switch?

I continue to resist your efforts to "fix" the test by removing chances for the SUT code to go wrong.

On Tue, Jun 3, 2014 at 11:28 PM, Eric Wang <yiming.wang@oracle.com> wrote:

...
Hi Martin,

Thanks for explanation, now I can understand why you set the DELAY_MS to 100 seconds, it is true that it prevents failure on a slow host, however, i still have some concerns. Because the test tests to schedule tasks at the time in the past, so all 13 tasks should be executed immediately and finished within a short time. If set the elapsed time limitation to 50s (DELAY_MS/2), it seems that the timer have plenty of time to finish tasks, so whether it causes above test point lost.

Back to the original test, i think it should be a test stabilization issue, because the original test assumes that the timer should be cancelled within < 1 second before the 14th task is called. this assumption may not be guaranteed due to 2 reasons: 1. if test is executed in jtreg concurrent mode on a slow host. 2. the system clock of virtual machine may not be accurate (maybe faster than physical).

To support the point, i changed the test as attached to print the execution time to see whether the timer behaves expected as the API document described. the result is as expected.

The unrepeated task executed immediately: [1401855509336] The repeated task executed immediately and repeated per 1 second: [1401855509337, 1401855510337, 1401855511338] The fixed-rate task executed immediately and catch up the delay: [1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509836, 1401855510836]

Thanks, Eric On 2014/6/4 9:16, Martin Buchholz wrote:

On Tue, Jun 3, 2014 at 6:12 PM, Eric Wang <yiming.wang@oracle.com> wrote:

...
Hi Martin,

To sleep(1000) is not enough to reproduce the failure, because it is much lower than the period DELAY_MS (10*1000) of the repeated task created by "scheduleAtFixedRate(t, counter(y3), past, DELAY_MS)".

Try sleep(DELAY_MS), the failure can be reproduced.

Well sure, then the task is rescheduled, so I expect it to fail in this case.

But in my version I had set DELAY_MS to 100 seconds. The point of extending the DELAY_MS is to prevent flaky failure on a slow machine.

Again, how do we know that this test hasn't found a Timer bug?

I still can't reproduce it.

Martin Buchholz

6 Jun 6 Jun

4:46 a.m.

As with David, the cause of the failure is mystifying. How can things fail when we stay below the timeout value of 500ms? There's a bug either in Timer or my own understanding of what should be happening. Anyways, raising the timeout value (as I have done in my minor rewrite) seems prudent. Fortunately, we can write this test in a way that doesn't require actually waiting for the timeout to elapse. On Wed, Jun 4, 2014 at 1:23 PM, roger riggs <roger.riggs@oracle.com> wrote:

...

Hi Martin, Eric,

Of several hundred failures of this test, most were done in a JRE run with -Xcomp set. A few failures occurred with -Xmixed, none with -Xint.

The printed "elapsed" times (not normalized to hardware or OS) range from 24 to 132 (ms) with most falling into several buckets in the 30's, 40's, 50's and 70's.

I don't spot anything in the Timer.mainLoop code that might break when highly optimized but that's one possibility.

Roger

On 6/4/2014 3:25 PM, Martin Buchholz wrote:

...
Tests for Timer are inherently timing (!) dependent. It's reasonable for tests to assume that: - reasonable events like creating a thread and executing a simple task should complete in less than, say 2500ms. - system clock will not change by a significant amount (> 1 sec) during the test. Yes, that means Timer tests are likely to fail during daylight saving time switchover - we can live with that. (we could even try to fix that, by detecting deviations between clock time and elapsed time, but probably not worth it)

Can you detect any real-world unreliability in my latest version of the test, not counting daylight saving time switch?

I continue to resist your efforts to "fix" the test by removing chances for the SUT code to go wrong.

On Tue, Jun 3, 2014 at 11:28 PM, Eric Wang <yiming.wang@oracle.com> wrote:

Hi Martin,

...
Thanks for explanation, now I can understand why you set the DELAY_MS to 100 seconds, it is true that it prevents failure on a slow host, however, i still have some concerns. Because the test tests to schedule tasks at the time in the past, so all 13 tasks should be executed immediately and finished within a short time. If set the elapsed time limitation to 50s (DELAY_MS/2), it seems that the timer have plenty of time to finish tasks, so whether it causes above test point lost.

Back to the original test, i think it should be a test stabilization issue, because the original test assumes that the timer should be cancelled within < 1 second before the 14th task is called. this assumption may not be guaranteed due to 2 reasons: 1. if test is executed in jtreg concurrent mode on a slow host. 2. the system clock of virtual machine may not be accurate (maybe faster than physical).

To support the point, i changed the test as attached to print the execution time to see whether the timer behaves expected as the API document described. the result is as expected.

The unrepeated task executed immediately: [1401855509336] The repeated task executed immediately and repeated per 1 second: [1401855509337, 1401855510337, 1401855511338] The fixed-rate task executed immediately and catch up the delay: [1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509836, 1401855510836]

Thanks, Eric On 2014/6/4 9:16, Martin Buchholz wrote:

On Tue, Jun 3, 2014 at 6:12 PM, Eric Wang <yiming.wang@oracle.com> wrote:

Hi Martin,

...
To sleep(1000) is not enough to reproduce the failure, because it is much lower than the period DELAY_MS (10*1000) of the repeated task created by "scheduleAtFixedRate(t, counter(y3), past, DELAY_MS)".

Try sleep(DELAY_MS), the failure can be reproduced.

Well sure, then the task is rescheduled, so I expect it to fail in

this case.

But in my version I had set DELAY_MS to 100 seconds. The point of extending the DELAY_MS is to prevent flaky failure on a slow machine.

Again, how do we know that this test hasn't found a Timer bug?

I still can't reproduce it.

Martin Buchholz

7 Jun 7 Jun

1:32 a.m.

If you don't want to go with my rewrite, you can conservatively just check in a 10x increase in all the constant durations and see whether the flakiness goes away. On Thu, Jun 5, 2014 at 9:46 PM, Martin Buchholz <martinrb@google.com> wrote:

...

As with David, the cause of the failure is mystifying. How can things fail when we stay below the timeout value of 500ms? There's a bug either in Timer or my own understanding of what should be happening.

Anyways, raising the timeout value (as I have done in my minor rewrite) seems prudent. Fortunately, we can write this test in a way that doesn't require actually waiting for the timeout to elapse.

On Wed, Jun 4, 2014 at 1:23 PM, roger riggs <roger.riggs@oracle.com> wrote:

...
Hi Martin, Eric,

Of several hundred failures of this test, most were done in a JRE run with -Xcomp set. A few failures occurred with -Xmixed, none with -Xint.

The printed "elapsed" times (not normalized to hardware or OS) range from 24 to 132 (ms) with most falling into several buckets in the 30's, 40's, 50's and 70's.

I don't spot anything in the Timer.mainLoop code that might break when highly optimized but that's one possibility.

Roger

On 6/4/2014 3:25 PM, Martin Buchholz wrote:

...
Tests for Timer are inherently timing (!) dependent. It's reasonable for tests to assume that: - reasonable events like creating a thread and executing a simple task should complete in less than, say 2500ms. - system clock will not change by a significant amount (> 1 sec) during the test. Yes, that means Timer tests are likely to fail during daylight saving time switchover - we can live with that. (we could even try to fix that, by detecting deviations between clock time and elapsed time, but probably not worth it)

Can you detect any real-world unreliability in my latest version of the test, not counting daylight saving time switch?

I continue to resist your efforts to "fix" the test by removing chances for the SUT code to go wrong.

On Tue, Jun 3, 2014 at 11:28 PM, Eric Wang <yiming.wang@oracle.com> wrote:

Hi Martin,

...
Thanks for explanation, now I can understand why you set the DELAY_MS to 100 seconds, it is true that it prevents failure on a slow host, however, i still have some concerns. Because the test tests to schedule tasks at the time in the past, so all 13 tasks should be executed immediately and finished within a short time. If set the elapsed time limitation to 50s (DELAY_MS/2), it seems that the timer have plenty of time to finish tasks, so whether it causes above test point lost.

Back to the original test, i think it should be a test stabilization issue, because the original test assumes that the timer should be cancelled within < 1 second before the 14th task is called. this assumption may not be guaranteed due to 2 reasons: 1. if test is executed in jtreg concurrent mode on a slow host. 2. the system clock of virtual machine may not be accurate (maybe faster than physical).

To support the point, i changed the test as attached to print the execution time to see whether the timer behaves expected as the API document described. the result is as expected.

The unrepeated task executed immediately: [1401855509336] The repeated task executed immediately and repeated per 1 second: [1401855509337, 1401855510337, 1401855511338] The fixed-rate task executed immediately and catch up the delay: [1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509836, 1401855510836]

Thanks, Eric On 2014/6/4 9:16, Martin Buchholz wrote:

On Tue, Jun 3, 2014 at 6:12 PM, Eric Wang <yiming.wang@oracle.com> wrote:

Hi Martin,

...
To sleep(1000) is not enough to reproduce the failure, because it is much lower than the period DELAY_MS (10*1000) of the repeated task created by "scheduleAtFixedRate(t, counter(y3), past, DELAY_MS)".

Try sleep(DELAY_MS), the failure can be reproduced.

Well sure, then the task is rescheduled, so I expect it to fail in

this case.

But in my version I had set DELAY_MS to 100 seconds. The point of extending the DELAY_MS is to prevent flaky failure on a slow machine.

Again, how do we know that this test hasn't found a Timer bug?

I still can't reproduce it.

roger riggs

9 Jun 9 Jun

3:03 p.m.

Hi Eric, Martin, I'm fine with the re-write. I'm not sure why the re-ordering of y3 will change the behavior of the test but it will provide more debugging info. Roger On 6/6/2014 9:32 PM, Martin Buchholz wrote:

...

If you don't want to go with my rewrite, you can conservatively just check in a 10x increase in all the constant durations and see whether the flakiness goes away.

On Thu, Jun 5, 2014 at 9:46 PM, Martin Buchholz <martinrb@google.com <mailto:martinrb@google.com>> wrote:

As with David, the cause of the failure is mystifying. How can things fail when we stay below the timeout value of 500ms? There's a bug either in Timer or my own understanding of what should be happening.

Anyways, raising the timeout value (as I have done in my minor rewrite) seems prudent. Fortunately, we can write this test in a way that doesn't require actually waiting for the timeout to elapse.

On Wed, Jun 4, 2014 at 1:23 PM, roger riggs <roger.riggs@oracle.com <mailto:roger.riggs@oracle.com>> wrote:

Hi Martin, Eric,

Of several hundred failures of this test, most were done in a JRE run with -Xcomp set. A few failures occurred with -Xmixed, none with -Xint.

The printed "elapsed" times (not normalized to hardware or OS) range from 24 to 132 (ms) with most falling into several buckets in the 30's, 40's, 50's and 70's.

I don't spot anything in the Timer.mainLoop code that might break when highly optimized but that's one possibility.

Roger

On 6/4/2014 3:25 PM, Martin Buchholz wrote:

Tests for Timer are inherently timing (!) dependent. It's reasonable for tests to assume that: - reasonable events like creating a thread and executing a simple task should complete in less than, say 2500ms. - system clock will not change by a significant amount (> 1 sec) during the test. Yes, that means Timer tests are likely to fail during daylight saving time switchover - we can live with that. (we could even try to fix that, by detecting deviations between clock time and elapsed time, but probably not worth it)

Can you detect any real-world unreliability in my latest version of the test, not counting daylight saving time switch?

I continue to resist your efforts to "fix" the test by removing chances for the SUT code to go wrong.

On Tue, Jun 3, 2014 at 11:28 PM, Eric Wang <yiming.wang@oracle.com <mailto:yiming.wang@oracle.com>> wrote:

Hi Martin,

Thanks for explanation, now I can understand why you set the DELAY_MS to 100 seconds, it is true that it prevents failure on a slow host, however, i still have some concerns. Because the test tests to schedule tasks at the time in the past, so all 13 tasks should be executed immediately and finished within a short time. If set the elapsed time limitation to 50s (DELAY_MS/2), it seems that the timer have plenty of time to finish tasks, so whether it causes above test point lost.

Back to the original test, i think it should be a test stabilization issue, because the original test assumes that the timer should be cancelled within < 1 second before the 14th task is called. this assumption may not be guaranteed due to 2 reasons: 1. if test is executed in jtreg concurrent mode on a slow host. 2. the system clock of virtual machine may not be accurate (maybe faster than physical).

To support the point, i changed the test as attached to print the execution time to see whether the timer behaves expected as the API document described. the result is as expected.

The unrepeated task executed immediately: [1401855509336] The repeated task executed immediately and repeated per 1 second: [1401855509337, 1401855510337, 1401855511338] The fixed-rate task executed immediately and catch up the delay: [1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509836, 1401855510836]

Thanks, Eric On 2014/6/4 9:16, Martin Buchholz wrote:

On Tue, Jun 3, 2014 at 6:12 PM, Eric Wang <yiming.wang@oracle.com <mailto:yiming.wang@oracle.com>> wrote:

Hi Martin,

To sleep(1000) is not enough to reproduce the failure, because it is much lower than the period DELAY_MS (10*1000) of the repeated task created by "scheduleAtFixedRate(t, counter(y3), past, DELAY_MS)".

Try sleep(DELAY_MS), the failure can be reproduced.

Well sure, then the task is rescheduled, so I expect it to fail in this case.

But in my version I had set DELAY_MS to 100 seconds. The point of extending the DELAY_MS is to prevent flaky failure on a slow machine.

Again, how do we know that this test hasn't found a Timer bug?

I still can't reproduce it.

Martin Buchholz

5:18 p.m.

On Mon, Jun 9, 2014 at 8:03 AM, roger riggs <roger.riggs@oracle.com> wrote:

...

Hi Eric, Martin,

I'm fine with the re-write. I'm not sure why the re-ordering of y3 will change the behavior of the test but it will provide more debugging info.

No reason for re-ordering except consistency. Instead of untimed await, it seems slightly better to use timed wait for the latches. Here's my latest version of the mini-rewrite: import java.util.*; import java.util.concurrent.*; import static java.util.concurrent.TimeUnit.*; public class Args { static final long DELAY_MS = 30 * 1000L; void schedule(final Timer t, final TimerTask task, final Date d) { t.schedule(task, d); assertThrows (IllegalStateException.class, () -> t.schedule(task, d)); } void schedule(final Timer t, final TimerTask task, final Date d, final long period) { t.schedule(task, d, period); assertThrows (IllegalStateException.class, () -> t.schedule(task, d, period)); } void scheduleAtFixedRate(final Timer t, final TimerTask task, final Date d, final long period) { t.scheduleAtFixedRate(task, d, period); assertThrows (IllegalStateException.class, () -> t.scheduleAtFixedRate(task, d, period)); } TimerTask counter(final CountDownLatch latch) { return new TimerTask() { public void run() { if (latch.getCount() == 0) fail(String.format("Latch counted down too many times: " + latch)); latch.countDown(); }}; } TimerTask nop() { return new TimerTask() { public void run() { }}; } void test(String[] args) throws Throwable { final Timer t = new Timer(); try { test(t); } finally { // Ensure this test doesn't interfere with subsequent // tests even in case of failure. t.cancel(); } // Attempts to schedule tasks on a cancelled Timer result in ISE. final Date past = new Date(System.currentTimeMillis() - DELAY_MS); final Date future = new Date(System.currentTimeMillis() + DELAY_MS); assertThrows (IllegalStateException.class, () -> t.schedule(nop(), 42), () -> t.schedule(nop(), 42), () -> t.schedule(nop(), past), () -> t.schedule(nop(), 42, 42), () -> t.schedule(nop(), past, 42), () -> t.scheduleAtFixedRate(nop(), 42, 42), () -> t.scheduleAtFixedRate(nop(), past, 42), () -> t.scheduleAtFixedRate(nop(), future, 42)); } void test(Timer t) throws Throwable { final TimerTask x = new TimerTask() { public void run() {}}; assertThrows (IllegalArgumentException.class, () -> t.schedule(x, -42), () -> t.schedule(x, new Date(-42)), () -> t.schedule(x, Long.MAX_VALUE), () -> t.schedule(x, -42, 42), () -> t.schedule(x, new Date(-42), 42), () -> t.schedule(x, Long.MAX_VALUE, 42), () -> t.schedule(x, 42, 0), () -> t.schedule(x, new Date(42), 0), () -> t.schedule(x, 42, -42), () -> t.schedule(x, new Date(42), -42), () -> t.scheduleAtFixedRate(x, -42, 42), () -> t.scheduleAtFixedRate(x, new Date(-42), 42), () -> t.scheduleAtFixedRate(x, Long.MAX_VALUE, 42), () -> t.scheduleAtFixedRate(x, 42, 0), () -> t.scheduleAtFixedRate(x, new Date(42), 0), () -> t.scheduleAtFixedRate(x, 42, -42), () -> t.scheduleAtFixedRate(x, new Date(42), -42)); assertThrows (NullPointerException.class, () -> t.schedule(null, 42), () -> t.schedule(x, (Date)null), () -> t.schedule(null, 42, 42), () -> t.schedule(x, (Date)null, 42), () -> t.scheduleAtFixedRate(null, 42, 42), () -> t.scheduleAtFixedRate(x, (Date)null, 42)); // Create local classes for clearer diagnostics in case of failure class OneShotLatch extends CountDownLatch { OneShotLatch() { super(1); } } class FixedDelayLatch extends CountDownLatch { FixedDelayLatch() { super(1); } } class FixedRateLatch extends CountDownLatch { FixedRateLatch() { super(11); } } final CountDownLatch y1 = new OneShotLatch(); final CountDownLatch y2 = new FixedDelayLatch(); final CountDownLatch y3 = new FixedRateLatch(); final long start = System.currentTimeMillis(); final Date past = new Date(start - (10 * DELAY_MS + DELAY_MS / 2)); schedule( t, counter(y1), past); schedule( t, counter(y2), past, DELAY_MS); scheduleAtFixedRate(t, counter(y3), past, DELAY_MS); check(y1.await(DELAY_MS / 4, MILLISECONDS)); check(y2.await(DELAY_MS / 4, MILLISECONDS)); check(y3.await(DELAY_MS / 4, MILLISECONDS)); final long elapsed = System.currentTimeMillis() - start; if (elapsed >= DELAY_MS / 2) fail(String.format("Test took too long: elapsed=%d%n", elapsed)); } //--------------------- Infrastructure --------------------------- volatile int passed = 0, failed = 0; void pass() {passed++;} void fail() {failed++; Thread.dumpStack();} void fail(String msg) {System.err.println(msg); fail();} void unexpected(Throwable t) {failed++; t.printStackTrace();} void check(boolean cond) {if (cond) pass(); else fail();} void equal(Object x, Object y) { if (x == null ? y == null : x.equals(y)) pass(); else fail(x + " not equal to " + y);} public static void main(String[] args) throws Throwable { new Args().instanceMain(args);} void instanceMain(String[] args) throws Throwable { try {test(args);} catch (Throwable t) {unexpected(t);} System.out.printf("%nPassed = %d, failed = %d%n%n", passed, failed); if (failed > 0) throw new AssertionError("Some tests failed");} interface F { void f() throws Throwable; } void assertThrows(Class<? extends Throwable> k, F... fs) { for (F f : fs) try {f.f(); fail("Expected " + k.getName() + " not thrown");} catch (Throwable t) { if (k.isAssignableFrom(t.getClass())) pass(); else unexpected(t);}} }

David Holmes

6 Jun 6 Jun

4:32 a.m.

On 5/06/2014 5:25 AM, Martin Buchholz wrote:

...

Tests for Timer are inherently timing (!) dependent. It's reasonable for tests to assume that: - reasonable events like creating a thread and executing a simple task should complete in less than, say 2500ms.

Not necessarily with the wrong combination of Xcomp, fastdebug and low-power/slow devices. :(

...

- system clock will not change by a significant amount (> 1 sec) during the test. Yes, that means Timer tests are likely to fail during daylight saving time switchover - we can live with that. (we could even try to fix that, by detecting deviations between clock time and elapsed time, but probably not worth it)

Virtual environments have notoriously bad time keeping.

...

Can you detect any real-world unreliability in my latest version of the test, not counting daylight saving time switch?

I continue to resist your efforts to "fix" the test by removing chances for the SUT code to go wrong.

I haven't been tracking this one and although I just had a good look at the bug report and the patch I'm not getting a good handle on what is failing. Trying to write timing dependent tests that actually show that something took too long is very difficult given the range of test conditions that have to be accounted for. We've raised lots of timeouts when tests fail just to get them to stop failing. Rarely is consideration given as to whether the failure was reasonable ie that something really shouldn't have taken as long as it did. Unfortunately it's nearly impossible to answer this as we can't determine where the time actually went. David -----

...

On Tue, Jun 3, 2014 at 11:28 PM, Eric Wang <yiming.wang@oracle.com> wrote:

...
Hi Martin,

Thanks for explanation, now I can understand why you set the DELAY_MS to 100 seconds, it is true that it prevents failure on a slow host, however, i still have some concerns. Because the test tests to schedule tasks at the time in the past, so all 13 tasks should be executed immediately and finished within a short time. If set the elapsed time limitation to 50s (DELAY_MS/2), it seems that the timer have plenty of time to finish tasks, so whether it causes above test point lost.

Back to the original test, i think it should be a test stabilization issue, because the original test assumes that the timer should be cancelled within < 1 second before the 14th task is called. this assumption may not be guaranteed due to 2 reasons: 1. if test is executed in jtreg concurrent mode on a slow host. 2. the system clock of virtual machine may not be accurate (maybe faster than physical).

To support the point, i changed the test as attached to print the execution time to see whether the timer behaves expected as the API document described. the result is as expected.

The unrepeated task executed immediately: [1401855509336] The repeated task executed immediately and repeated per 1 second: [1401855509337, 1401855510337, 1401855511338] The fixed-rate task executed immediately and catch up the delay: [1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509836, 1401855510836]

Thanks, Eric On 2014/6/4 9:16, Martin Buchholz wrote:

On Tue, Jun 3, 2014 at 6:12 PM, Eric Wang <yiming.wang@oracle.com> wrote:

...
Hi Martin,

To sleep(1000) is not enough to reproduce the failure, because it is much lower than the period DELAY_MS (10*1000) of the repeated task created by "scheduleAtFixedRate(t, counter(y3), past, DELAY_MS)".

Try sleep(DELAY_MS), the failure can be reproduced.

Well sure, then the task is rescheduled, so I expect it to fail in this case.

But in my version I had set DELAY_MS to 100 seconds. The point of extending the DELAY_MS is to prevent flaky failure on a slow machine.

Again, how do we know that this test hasn't found a Timer bug?

I still can't reproduce it.

4291

Age (days ago)

4301

Last active (days ago)

List overview

Download

15 comments

4 participants

participants (4)

David Holmes
Eric Wang
Martin Buchholz
roger riggs