RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently

Daniil Titov daniil.x.titov at oracle.com
Thu Jun 4 23:45:53 UTC 2020


Hi Serguei,

> Note, the threads can be terminated even after the diff value is 
>    calculated at line 203.

Please note that the diff value calculated on line 203 shows how many *test* threads were created or terminated,
numNewThreads is number of new *test* threads and numTerminatedThreads is number of terminated *test* threads.

No *test* thread can terminate or start after the diff value is calculated.

Number of threads mbean.getThreadCount() could be seen as number of live *test* threads plus number of live internal (non-test) threads,
or A = B + C , where A - result of mbean.getThreadCount(), B - number of live test threads, C - number of live non-test threads.

Regardless what happens with internal "non-tested" threads the invariant  that this method tests is that number of threads
mbean.getThreadCount() returns could not be less than number of live test threads, or that A >= B.


Best regards,
Daniil

On 6/4/20, 4:08 PM, "serguei.spitsyn at oracle.com" <serguei.spitsyn at oracle.com> wrote:

    Hi Daniil,


    On 6/4/20 16:01, Daniil Titov wrote:
    > Hi Serguei,
    >
    >> 201     private static void checkLiveThreads(int numNewThreads,
    >> 202                                          int numTerminatedThreads) {
    >> 203         int diff = numNewThreads - numTerminatedThreads;
    >> 204         long threadCount = mbean.getThreadCount();
    >> 205         long expectedThreadCount = prevLiveTestThreadCount + diff;
    >> 206         if (threadCount < expectedThreadCount) {
    >> 207             testFailed = true;
    >> When all threads are counted with mbean.getThreadCount() it is not clear
    >> there is no race with new non-tested threads creation. Is it possible?
    >> If so, then the check at line 206 is going to fail.
    > Even if some Internal (non-tested) threads are created  the value mbean.getThreadCount() returns should be  no less  than the expected number of live test threads (please note that  prevLiveTestThreadCount counts only *test* threads) that means that condition on line 206 will be evaluated to *false* and line 207 will not be executed and the test will pass.

    Okay, I see that it is failure condition.
    But then is there a race with (non-tested) threads termination?
    Note, the threads can be terminated even after the diff value is 
    calculated at line 203.
    I'm sorry, if the same questions are repeated again.

    Thanks,
    Serguei

    > --Best regards,
    > Daniil
    >
    > From: "serguei.spitsyn at oracle.com" <serguei.spitsyn at oracle.com>
    > Date: Thursday, June 4, 2020 at 3:03 PM
    > To: Daniil Titov <daniil.x.titov at oracle.com>, Alex Menkov <alexey.menkov at oracle.com>, serviceability-dev <serviceability-dev at openjdk.java.net>
    > Subject: Re: RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently
    >
    > Hi Daniil,
    >
    > It is hard to be on top of all the details in these review rounds.
    > When all threads are counted with mbean.getThreadCount() it is not clear
    > there is no race with new non-tested threads creation. Is it possible?
    > If so, then the check at line 206 is going to fail.
    >   201     private static void checkLiveThreads(int numNewThreads,
    >   202                                          int numTerminatedThreads) {
    >   203         int diff = numNewThreads - numTerminatedThreads;
    >   204         long threadCount = mbean.getThreadCount();
    >   205         long expectedThreadCount = prevLiveTestThreadCount + diff;
    >   206         if (threadCount < expectedThreadCount) {
    >   207             testFailed = true;
    >
    > Thanks,
    > Serguei
    >
    > On 6/3/20 20:42, Daniil Titov wrote:
    > Hi Alex,
    >
    > Please review a new version of the webrev [1] that no longer uses waitTillEquals() method.
    >
    > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.04/
    > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
    >
    > Thank you,
    > Daniil
    >
    > On 6/3/20, 4:42 PM, "Alex Menkov" mailto:alexey.menkov at oracle.com wrote:
    >
    >      Hi again Daniil,
    >
    >      On 06/03/2020 16:31, Daniil Titov wrote:
    >      > Hi Alex,
    >      >
    >      > Thanks for this suggestion. You are right, we actually don't need this waitForAllThreads() method.
    >      >
    >      > I will include this change in the new  version of  the webrev.
    >      >
    >      >>      207         int diff = numNewThreads - numTerminatedThreads;
    >      >>       208         long threadCount = mbean.getThreadCount();
    >      >>       209         long expectedThreadCount = prevLiveTestThreadCount + diff;
    >      >>      210         if (threadCount < expectedThreadCount) {
    >      >>     if some internal thread terminates, we'll get failure here
    >      >
    >      > The failure will not happen. Please note that  prevLiveTestThreadCount counts only *test* threads. Thus even if some Internal threads terminated   the value mbean.getThreadCount() returns should still be no less  than the expected number of live test threads.
    >      >
    >      > 310         prevLiveTestThreadCount = getTestThreadCount();
    >
    >      Oh, yes, I missed it.
    >
    >      LGTM.
    >
    >      --alex
    >
    >      >
    >      > Best regards,
    >      > Daniil
    >      >
    >      >
    >      > On 6/3/20, 3:08 PM, "Alex Menkov" mailto:alexey.menkov at oracle.com wrote:
    >      >
    >      >      Hi Daniil,
    >      >
    >      >      couple notes:
    >      >
    >      >      198         waitForThreads(numNewThreads, numTerminatedThreads);
    >      >
    >      >      You don't actually need any wait here.
    >      >      Test cases wait until all threads are in desired state
    >      >      (checkAllThreadsAlive uses startupCheck, checkDaemonThreadsDead and
    >      >      checkAllThreadsDead use join())
    >      >
    >      >
    >      >        205     private static void checkLiveThreads(int numNewThreads,
    >      >        206                                          int numTerminatedThreads) {
    >      >        207         int diff = numNewThreads - numTerminatedThreads;
    >      >        208         long threadCount = mbean.getThreadCount();
    >      >        209         long expectedThreadCount = prevLiveTestThreadCount + diff;
    >      >        210         if (threadCount < expectedThreadCount) {
    >      >
    >      >      if some internal thread terminates, we'll get failure here
    >      >
    >      >
    >      >      --alex
    >      >
    >      >      On 06/02/2020 21:00, Daniil Titov wrote:
    >      >      > Hi Alex, Serguei, and Martin,
    >      >      >
    >      >      > Thank you for your comments. Please review a new version of the fix that addresses them, specifically:
    >      >      > 1)  Replaces a double loop in checkAllThreadsAlive() with a code that uses collections and containsAll()  method.
    >      >      > 2)  Restores the checks for other ThreadMXBean methods (getThreadCount(), getTotalStartedThreadCount(), getPeakThreadCount()) but with more relaxed conditions.
    >      >      > 3)  Relaxes the check inside checkThreadIds() method
    >      >      >
    >      >      >
    >      >      > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.03/
    >      >      > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
    >      >      >
    >      >      > Thank you,
    >      >      > Daniil
    >      >      >
    >      >      > On 6/1/20, 5:06 PM, "Alex Menkov" mailto:alexey.menkov at oracle.com wrote:
    >      >      >
    >      >      >      Hi Daniil,
    >      >      >
    >      >      >      1. before the fix checkLiveThreads() tested
    >      >      >      ThreadMXBean.getThreadCount(), but now as far as I see it tests
    >      >      >      Thread.getAllStackTraces();
    >      >      >
    >      >      >      2.
    >      >      >        237     private static void checkThreadIds() throws InterruptedException {
    >      >      >        238         long[] list = mbean.getAllThreadIds();
    >      >      >        239
    >      >      >        240         waitTillEquals(
    >      >      >        241             list.length,
    >      >      >        242             ()->(long)mbean.getThreadCount(),
    >      >      >        243             "Array length returned by " +
    >      >      >        244                 "getAllThreadIds() = %1$d not matched count =
    >      >      >      ${provided}",
    >      >      >        245             ()->list.length
    >      >      >        246         );
    >      >      >        247     }
    >      >      >
    >      >      >      I suppose purpose of waitTillEquals() is to handle creation/termination
    >      >      >      of VM internal threads.
    >      >      >      But if some internal thread terminates after mbean.getAllThreadIds() and
    >      >      >      before 1st mbean.getThreadCount() call and then VM does not need to
    >      >      >      restart it, waitTillEquals will wait forever.
    >      >      >
    >      >      >      --alex
    >      >      >
    >      >      >
    >      >      >      On 05/29/2020 16:28, Daniil Titov wrote:
    >      >      >      > Hi Alex and Serguei,
    >      >      >      >
    >      >      >      > Please review a new version of the change [1] that makes sure that the test counts
    >      >      >      > only the threads it creates and ignores  Internal threads VM might create or  destroy.
    >      >      >      >
    >      >      >      > Testing: Running this test in Mach5 with Graal on several hundred times ,
    >      >      >      >   tier1-tier3 tests  are in progress.
    >      >      >      >
    >      >      >      > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.02/
    >      >      >      > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
    >      >      >      >
    >      >      >      > Thank you,
    >      >      >      > Daniil
    >      >      >      >
    >      >      >      > On 5/22/20, 10:26 AM, "Alex Menkov" mailto:alexey.menkov at oracle.com wrote:
    >      >      >      >
    >      >      >      >      Hi Daniil,
    >      >      >      >
    >      >      >      >      I'm not sure all this retry logic is a good way.
    >      >      >      >      As mentioned in jira the most important part of the testing is ensuring
    >      >      >      >      that you find all the created threads when they are alive, and you don't
    >      >      >      >      find them when they are dead. The actual thread count checking is not
    >      >      >      >      that important.
    >      >      >      >      I agree with this and I'd just simplify the test by removing checks for
    >      >      >      >      thread count. VM may create and destroy internal threads when it needs it.
    >      >      >      >
    >      >      >      >      --alex
    >      >      >      >
    >      >      >      >      On 05/18/2020 10:31, Daniil Titov wrote:
    >      >      >      >      > Please review the change [1] that fixes an intermittent failure of the test.
    >      >      >      >      >
    >      >      >      >      > This test creates and destroys a given number of daemon/user threads and validates the count of those started/stopped threads against values returned from ThreadMXBean thread counts.  The problem here is that if some internal threads is started ( e.g. " HotSpotGraalManagement Bean Registration"), or destroyed  (e.g. "JVMCI CompilerThread ") the test hangs waiting for expected number of live threads.
    >      >      >      >      >
    >      >      >      >      > The fix limits the time the test is waiting for desired number of live threads and in case if this limit is exceeded the test repeats itself.
    >      >      >      >      >
    >      >      >      >      > Testing. Test with Graal on and Mach5 tier1-tier7 test passed.
    >      >      >      >      >
    >      >      >      >      > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.01
    >      >      >      >      > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
    >      >      >      >      >
    >      >      >      >      > Thank you,
    >      >      >      >      > Daniil
    >      >      >      >      >
    >      >      >      >      >
    >      >      >      >      >
    >      >      >      >
    >      >      >      >
    >      >      >
    >      >      >
    >      >
    >      >
    >
    >
    >
    >
    >
    >





More information about the serviceability-dev mailing list