RFR 8078143: java/lang/management/ThreadMXBean/AllThreadIds.java fails intermittently

Thu Apr 30 18:27:23 UTC 2015

On 30.4.2015 19:18, Martin Buchholz wrote:
> Tests that sleep can almost always be better written some other way.
> In this case, I would prefer busy-waiting with timeout until the
> expected condition becomes true.

The thing is that, in case of a real issue with the thread counters, we
a/ would be busy-waiting until the test times out (using an arbitrary
delay is also problematic because performance differs between machines
running with different configurations)
b/ would end up with a rather confusing message about the test timing out

-JB-

>
> Here's some code from jsr166 tck tests:
>
>      /**
>       * Spin-waits until sync.isQueued(t) becomes true.
>       */
>      void waitForQueuedThread(AbstractQueuedSynchronizer sync, Thread t) {
>          long startTime = System.nanoTime();
>          while (!sync.isQueued(t)) {
>              if (millisElapsedSince(startTime) > LONG_DELAY_MS)
>                  throw new AssertionFailedError("timed out");
>              Thread.yield();
>          }
>          assertTrue(t.isAlive());
>      }
>
>
> On Thu, Apr 30, 2015 at 7:25 AM, Jaroslav Bachorik
> <jaroslav.bachorik at oracle.com> wrote:
>
>     Please, review the following test change
>
>     Issue : https://bugs.openjdk.java.net/browse/JDK-8078143
>     Webrev: http://cr.openjdk.java.net/~jbachorik/8078143/webrev.00
>
>     The test fails intermittently due to inconsistent reporting of the
>     live thread count. It is related to (or, better said, caused by)
>     https://bugs.openjdk.java.net/browse/JDK-8021335: certain performance
>     counters used for thread accounting are updated under a mutex but
>     read without it, which can lead to stale data being reported. More
>     details are available in the linked issue and discussion.
>
>     Because of this it is not enough to join() a terminating thread to
>     make sure the numbers would be correct. Luckily enough, it seems to
>     be sufficient to wait for a short time before actually accessing the
>     counters to be able to get a consistent view. In this fix I opted
>     for a ridiculously huge interval of 500ms just to be sure.
>
>     Thanks,
>
>     -JB-
>
>
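
For illustration only, here is a minimal sketch of the bounded-polling
alternative discussed in this thread. It is not taken from the webrev or
from the jsr166 TCK; the class name BoundedPoll, the method awaitValue and
the 10 ms poll interval are assumptions made up for this example. It polls
a counter until it reaches the expected value and, on timeout, fails with
a message that names the expected and the last observed value rather than
leaving only a generic harness timeout.

```java
import java.util.function.IntSupplier;

/** Hypothetical helper; not part of the webrev or the jsr166 TCK. */
final class BoundedPoll {
    private BoundedPoll() {}

    /**
     * Polls 'actual' until it returns 'expected'. Throws AssertionError
     * once 'timeoutMillis' has elapsed without seeing the expected value.
     */
    static void awaitValue(String what, IntSupplier actual, int expected,
                           long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        int observed = actual.getAsInt();
        while (observed != expected) {
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                throw new AssertionError(what + ": expected " + expected
                        + " but last observed " + observed
                        + " after " + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(10); // assumed poll interval; avoids a hot spin
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError(what + ": interrupted while waiting", e);
            }
            observed = actual.getAsInt();
        }
    }
}
```

With a ThreadMXBean obtained from ManagementFactory.getThreadMXBean(), a
test could call something like
BoundedPoll.awaitValue("live thread count", mbean::getThreadCount, expected, 5000).
The timeout value is still an arbitrary choice, so this addresses only the
concern about the confusing failure message, not the one about differing
machine performance.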