RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently

serguei.spitsyn at oracle.com serguei.spitsyn at oracle.com
Thu Jun 4 23:08:58 UTC 2020


Hi Daniil,


On 6/4/20 16:01, Daniil Titov wrote:
> Hi Serguei,
>
>> 201     private static void checkLiveThreads(int numNewThreads,
>> 202                                          int numTerminatedThreads) {
>> 203         int diff = numNewThreads - numTerminatedThreads;
>> 204         long threadCount = mbean.getThreadCount();
>> 205         long expectedThreadCount = prevLiveTestThreadCount + diff;
>> 206         if (threadCount < expectedThreadCount) {
>> 207             testFailed = true;
>> When all threads are counted with mbean.getThreadCount() it is not clear
>> there is no race with new non-tested threads creation. Is it possible?
>> If so, then the check at line 206 is going to fail.
> Even if some Internal (non-tested) threads are created  the value mbean.getThreadCount() returns should be  no less  than the expected number of live test threads (please note that  prevLiveTestThreadCount counts only *test* threads) that means that condition on line 206 will be evaluated to *false* and line 207 will not be executed and the test will pass.

Okay, I see that it is failure condition.
But then is there a race with (non-tested) threads termination?
Note, the threads can be terminated even after the diff value is 
calculated at line 203.
I'm sorry, if the same questions are repeated again.

Thanks,
Serguei

> --Best regards,
> Daniil
>
> From: "serguei.spitsyn at oracle.com" <serguei.spitsyn at oracle.com>
> Date: Thursday, June 4, 2020 at 3:03 PM
> To: Daniil Titov <daniil.x.titov at oracle.com>, Alex Menkov <alexey.menkov at oracle.com>, serviceability-dev <serviceability-dev at openjdk.java.net>
> Subject: Re: RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently
>
> Hi Daniil,
>
> It is hard to be on top of all the details in these review rounds.
> When all threads are counted with mbean.getThreadCount() it is not clear
> there is no race with new non-tested threads creation. Is it possible?
> If so, then the check at line 206 is going to fail.
>   201     private static void checkLiveThreads(int numNewThreads,
>   202                                          int numTerminatedThreads) {
>   203         int diff = numNewThreads - numTerminatedThreads;
>   204         long threadCount = mbean.getThreadCount();
>   205         long expectedThreadCount = prevLiveTestThreadCount + diff;
>   206         if (threadCount < expectedThreadCount) {
>   207             testFailed = true;
>
> Thanks,
> Serguei
>
> On 6/3/20 20:42, Daniil Titov wrote:
> Hi Alex,
>
> Please review a new version of the webrev [1] that no longer uses waitTillEquals() method.
>
> [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.04/
> [2] https://bugs.openjdk.java.net/browse/JDK-8131745
>
> Thank you,
> Daniil
>
> On 6/3/20, 4:42 PM, "Alex Menkov" mailto:alexey.menkov at oracle.com wrote:
>
>      Hi again Daniil,
>
>      On 06/03/2020 16:31, Daniil Titov wrote:
>      > Hi Alex,
>      >
>      > Thanks for this suggestion. You are right, we actually don't need this waitForAllThreads() method.
>      >
>      > I will include this change in the new  version of  the webrev.
>      >
>      >>      207         int diff = numNewThreads - numTerminatedThreads;
>      >>       208         long threadCount = mbean.getThreadCount();
>      >>       209         long expectedThreadCount = prevLiveTestThreadCount + diff;
>      >>      210         if (threadCount < expectedThreadCount) {
>      >>     if some internal thread terminates, we'll get failure here
>      >
>      > The failure will not happen. Please note that  prevLiveTestThreadCount counts only *test* threads. Thus even if some Internal threads terminated   the value mbean.getThreadCount() returns should still be no less  than the expected number of live test threads.
>      >
>      > 310         prevLiveTestThreadCount = getTestThreadCount();
>
>      Oh, yes, I missed it.
>
>      LGTM.
>
>      --alex
>
>      >
>      > Best regards,
>      > Daniil
>      >
>      >
>      > On 6/3/20, 3:08 PM, "Alex Menkov" mailto:alexey.menkov at oracle.com wrote:
>      >
>      >      Hi Daniil,
>      >
>      >      couple notes:
>      >
>      >      198         waitForThreads(numNewThreads, numTerminatedThreads);
>      >
>      >      You don't actually need any wait here.
>      >      Test cases wait until all threads are in desired state
>      >      (checkAllThreadsAlive uses startupCheck, checkDaemonThreadsDead and
>      >      checkAllThreadsDead use join())
>      >
>      >
>      >        205     private static void checkLiveThreads(int numNewThreads,
>      >        206                                          int numTerminatedThreads) {
>      >        207         int diff = numNewThreads - numTerminatedThreads;
>      >        208         long threadCount = mbean.getThreadCount();
>      >        209         long expectedThreadCount = prevLiveTestThreadCount + diff;
>      >        210         if (threadCount < expectedThreadCount) {
>      >
>      >      if some internal thread terminates, we'll get failure here
>      >
>      >
>      >      --alex
>      >
>      >      On 06/02/2020 21:00, Daniil Titov wrote:
>      >      > Hi Alex, Serguei, and Martin,
>      >      >
>      >      > Thank you for your comments. Please review a new version of the fix that addresses them, specifically:
>      >      > 1)  Replaces a double loop in checkAllThreadsAlive() with a code that uses collections and containsAll()  method.
>      >      > 2)  Restores the checks for other ThreadMXBean methods (getThreadCount(), getTotalStartedThreadCount(), getPeakThreadCount()) but with more relaxed conditions.
>      >      > 3)  Relaxes the check inside checkThreadIds() method
>      >      >
>      >      >
>      >      > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.03/
>      >      > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
>      >      >
>      >      > Thank you,
>      >      > Daniil
>      >      >
>      >      > On 6/1/20, 5:06 PM, "Alex Menkov" mailto:alexey.menkov at oracle.com wrote:
>      >      >
>      >      >      Hi Daniil,
>      >      >
>      >      >      1. before the fix checkLiveThreads() tested
>      >      >      ThreadMXBean.getThreadCount(), but now as far as I see it tests
>      >      >      Thread.getAllStackTraces();
>      >      >
>      >      >      2.
>      >      >        237     private static void checkThreadIds() throws InterruptedException {
>      >      >        238         long[] list = mbean.getAllThreadIds();
>      >      >        239
>      >      >        240         waitTillEquals(
>      >      >        241             list.length,
>      >      >        242             ()->(long)mbean.getThreadCount(),
>      >      >        243             "Array length returned by " +
>      >      >        244                 "getAllThreadIds() = %1$d not matched count =
>      >      >      ${provided}",
>      >      >        245             ()->list.length
>      >      >        246         );
>      >      >        247     }
>      >      >
>      >      >      I suppose purpose of waitTillEquals() is to handle creation/termination
>      >      >      of VM internal threads.
>      >      >      But if some internal thread terminates after mbean.getAllThreadIds() and
>      >      >      before 1st mbean.getThreadCount() call and then VM does not need to
>      >      >      restart it, waitTillEquals will wait forever.
>      >      >
>      >      >      --alex
>      >      >
>      >      >
>      >      >      On 05/29/2020 16:28, Daniil Titov wrote:
>      >      >      > Hi Alex and Serguei,
>      >      >      >
>      >      >      > Please review a new version of the change [1] that makes sure that the test counts
>      >      >      > only the threads it creates and ignores  Internal threads VM might create or  destroy.
>      >      >      >
>      >      >      > Testing: Running this test in Mach5 with Graal on several hundred times ,
>      >      >      >   tier1-tier3 tests  are in progress.
>      >      >      >
>      >      >      > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.02/
>      >      >      > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
>      >      >      >
>      >      >      > Thank you,
>      >      >      > Daniil
>      >      >      >
>      >      >      > On 5/22/20, 10:26 AM, "Alex Menkov" mailto:alexey.menkov at oracle.com wrote:
>      >      >      >
>      >      >      >      Hi Daniil,
>      >      >      >
>      >      >      >      I'm not sure all this retry logic is a good way.
>      >      >      >      As mentioned in jira the most important part of the testing is ensuring
>      >      >      >      that you find all the created threads when they are alive, and you don't
>      >      >      >      find them when they are dead. The actual thread count checking is not
>      >      >      >      that important.
>      >      >      >      I agree with this and I'd just simplify the test by removing checks for
>      >      >      >      thread count. VM may create and destroy internal threads when it needs it.
>      >      >      >
>      >      >      >      --alex
>      >      >      >
>      >      >      >      On 05/18/2020 10:31, Daniil Titov wrote:
>      >      >      >      > Please review the change [1] that fixes an intermittent failure of the test.
>      >      >      >      >
>      >      >      >      > This test creates and destroys a given number of daemon/user threads and validates the count of those started/stopped threads against values returned from ThreadMXBean thread counts.  The problem here is that if some internal threads is started ( e.g. " HotSpotGraalManagement Bean Registration"), or destroyed  (e.g. "JVMCI CompilerThread ") the test hangs waiting for expected number of live threads.
>      >      >      >      >
>      >      >      >      > The fix limits the time the test is waiting for desired number of live threads and in case if this limit is exceeded the test repeats itself.
>      >      >      >      >
>      >      >      >      > Testing. Test with Graal on and Mach5 tier1-tier7 test passed.
>      >      >      >      >
>      >      >      >      > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.01
>      >      >      >      > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
>      >      >      >      >
>      >      >      >      > Thank you,
>      >      >      >      > Daniil
>      >      >      >      >
>      >      >      >      >
>      >      >      >      >
>      >      >      >
>      >      >      >
>      >      >
>      >      >
>      >
>      >
>
>
>
>
>
>



More information about the serviceability-dev mailing list