RFR: 8081652: java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java timed out intermittently

Wed Jun 3 19:08:14 UTC 2020

Hi Chris,

I was not able to reproduce the original issue anymore in Mach5. However, the test itself has a potential for a deadlock (that was also reported) and in the proposed change we fix it.  The log still should be printed and the expectation is that we will be able to see the underlying problem in it if it ever reproduced.

I could create a separate bug ( not sure if the subtask is a good fit here since the change fixes some problem in the test ) and close the current one as not reproducible if you think it is a better approach.

Regarding Thread.suspend() and Thread.resume() methods the test also checks the thread state after these methods are invoked  and since these deprecated methods are still  in API I don’t think we should exclude them from being tested.

Best regards,

Daniil

From: Chris Plummer <chris.plummer at oracle.com>
Date: Wednesday, June 3, 2020 at 11:40 AM
To: David Holmes <david.holmes at oracle.com>, Daniil Titov <daniil.x.titov at oracle.com>, serviceability-dev <serviceability-dev at openjdk.java.net>
Subject: Re: RFR: 8081652: java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java timed out intermittently

On 6/1/20 12:10 AM, David Holmes wrote:

Hi Daniil, 

On 30/05/2020 10:07 am, Daniil Titov wrote: 

Please review a change [1] that  fixes an intermittent test timeout. 

The main logic of the test has this basic structure: 

try { 
   // lots of thread state manipulation of target 
} 
finally { 
   thread.getLog(); 
} 

and as David noticed in his comment  ( the last comment in [2] )  if an exception occurs anywhere 
in the try block we can hang waiting for the join() in getLog() because we haven't executed the logic that 
tells the thread to terminate. 

So the fix puts a timeout on the join() which means the test will no longer timeout but it will still fail when whatever was leading to the timeout now happens. So as a diagnostic fix this seems fine. Hopefully the logger will show what we need to see and determine the real underlying problem. 

If this change is really just diagnostic in nature, then it should be a subtask. However, it seems to me it will actually hide the failure. The test won't get a timeout and won't print the log. Am I missing something?

Also, after reading through the bug comments it looks like the getLog()/join() timeout issue is different from the main issue that caused the CR to be filed in the first place. Comments regarding the initial problem are:

"According to the stack trace the test seems to hang on trying to load the 'java.lang.Math' class concurrently. "

"Need to see some native stacks to understand why the classloading thread is not proceeding even though RUNNABLE."

"I should have looked at the test first - it uses Thread.suspend and Thread.resume and so is inherently deadlock prone."

Does this issue no longer exist, or have we decided that since the test is expected to be deadlock prone to just ignore it.

thanks,

Chris

Thanks, 
David 
----- 

Testing:  Running a modified test that explicitly throws a runtime exception inside the try block shows the fix solves the problem. 
                  Mach5 tier1-tier3 tests passed. Mach5 tier4-tier5 tests are in progress. 

[1] http://cr.openjdk.java.net/~dtitov/8081652/webrev.01/ 
[2] https://bugs.openjdk.java.net/browse/JDK-8081652 

Thank you, 
Daniil 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20200603/4e1e135f/attachment.htm>