RFR: 8335124: com/sun/management/ThreadMXBean/ThreadCpuTimeArray.java failed with CPU time out of expected range

Kevin Walls kevinw at openjdk.org
Thu Jun 27 09:32:33 UTC 2024


On Thu, 27 Jun 2024 08:54:16 GMT, Kevin Walls <kevinw at openjdk.org> wrote:

> This test has had occasional failures for years, possibly forever.
> A previous update made the test "othervm" which removed some interruptions, but a time accounting problem remains.
> 
> This change adds a simple sleep after observing that the test threads are all sleeping.
> 
> The idea is that threads may not have actually started sleeping when we observe their java.lang.Thread.State as WAITING, they may use some CPU time after that in order to actually get to sleep.

The test waits for the state of test threads to be WAITING.  There are a few more lines of test code (which might take 20,000 or 50,0000 ns using simple System.nanoTime calls), then we record threadmxbean.getCpuTime() for threads, do a sleep, and check these times again.

Usually they are the same, no more cpu time was used (correct!).  But the times can be some thousands of nanos different and the test fails.

My theory is that sometimes, a test thread has not really started sleeping yet when we read its state as WAITING and record the first time.  The test should sleep for a few hundred millis before taking the first time readings.

Commonly, in the failures the time discrepancy is under 100,000 nanos, with an example I see of nearly 200,000 nanos and the worst I saw was 15625000 nanos, approx 15ms.


When test threads are waiting, they look like this:

"MyThread-6" #38 [16479] prio=5 os_prio=0 cpu=56.05ms elapsed=34.89s tid=0x000014f83806c590 nid=16479 in Object.wait()  [0x000014f8dd343000]
   java.lang.Thread.State: WAITING (on object monitor)
Thread: 0x000014f83806c590  [0x405f] State: _at_safepoint _at_poll_safepoint 0
   JavaThread state: _thread_blocked
        at java.lang.Object.wait0(java.base at 24-internal/Native Method)
        - waiting on <0x00000000ffa66fc0> (a java.lang.Object)
        at java.lang.Object.wait(java.base at 24-internal/Object.java:374)
        at java.lang.Object.wait(java.base at 24-internal/Object.java:348)
        at ThreadCpuTimeArray$MyThread.run(ThreadCpuTimeArray.java:263)
        - locked <0x00000000ffa66fc0> (a java.lang.Object)


This is a harmless test change either way.  If the theory is wrong, we will see the "out of expected range" failure again.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19924#issuecomment-2194213706


More information about the serviceability-dev mailing list