RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently
Daniil Titov
daniil.x.titov at oracle.com
Thu Jun 4 23:45:53 UTC 2020
Hi Serguei,
> Note, the threads can be terminated even after the diff value is
> calculated at line 203.
Please note that the diff value calculated on line 203 shows how many *test* threads were created or terminated,
numNewThreads is number of new *test* threads and numTerminatedThreads is number of terminated *test* threads.
No *test* thread can terminate or start after the diff value is calculated.
Number of threads mbean.getThreadCount() could be seen as number of live *test* threads plus number of live internal (non-test) threads,
or A = B + C , where A - result of mbean.getThreadCount(), B - number of live test threads, C - number of live non-test threads.
Regardless what happens with internal "non-tested" threads the invariant that this method tests is that number of threads
mbean.getThreadCount() returns could not be less than number of live test threads, or that A >= B.
Best regards,
Daniil
On 6/4/20, 4:08 PM, "serguei.spitsyn at oracle.com" <serguei.spitsyn at oracle.com> wrote:
Hi Daniil,
On 6/4/20 16:01, Daniil Titov wrote:
> Hi Serguei,
>
>> 201 private static void checkLiveThreads(int numNewThreads,
>> 202 int numTerminatedThreads) {
>> 203 int diff = numNewThreads - numTerminatedThreads;
>> 204 long threadCount = mbean.getThreadCount();
>> 205 long expectedThreadCount = prevLiveTestThreadCount + diff;
>> 206 if (threadCount < expectedThreadCount) {
>> 207 testFailed = true;
>> When all threads are counted with mbean.getThreadCount() it is not clear
>> there is no race with new non-tested threads creation. Is it possible?
>> If so, then the check at line 206 is going to fail.
> Even if some Internal (non-tested) threads are created the value mbean.getThreadCount() returns should be no less than the expected number of live test threads (please note that prevLiveTestThreadCount counts only *test* threads) that means that condition on line 206 will be evaluated to *false* and line 207 will not be executed and the test will pass.
Okay, I see that it is failure condition.
But then is there a race with (non-tested) threads termination?
Note, the threads can be terminated even after the diff value is
calculated at line 203.
I'm sorry, if the same questions are repeated again.
Thanks,
Serguei
> --Best regards,
> Daniil
>
> From: "serguei.spitsyn at oracle.com" <serguei.spitsyn at oracle.com>
> Date: Thursday, June 4, 2020 at 3:03 PM
> To: Daniil Titov <daniil.x.titov at oracle.com>, Alex Menkov <alexey.menkov at oracle.com>, serviceability-dev <serviceability-dev at openjdk.java.net>
> Subject: Re: RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently
>
> Hi Daniil,
>
> It is hard to be on top of all the details in these review rounds.
> When all threads are counted with mbean.getThreadCount() it is not clear
> there is no race with new non-tested threads creation. Is it possible?
> If so, then the check at line 206 is going to fail.
> 201 private static void checkLiveThreads(int numNewThreads,
> 202 int numTerminatedThreads) {
> 203 int diff = numNewThreads - numTerminatedThreads;
> 204 long threadCount = mbean.getThreadCount();
> 205 long expectedThreadCount = prevLiveTestThreadCount + diff;
> 206 if (threadCount < expectedThreadCount) {
> 207 testFailed = true;
>
> Thanks,
> Serguei
>
> On 6/3/20 20:42, Daniil Titov wrote:
> Hi Alex,
>
> Please review a new version of the webrev [1] that no longer uses waitTillEquals() method.
>
> [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.04/
> [2] https://bugs.openjdk.java.net/browse/JDK-8131745
>
> Thank you,
> Daniil
>
> On 6/3/20, 4:42 PM, "Alex Menkov" mailto:alexey.menkov at oracle.com wrote:
>
> Hi again Daniil,
>
> On 06/03/2020 16:31, Daniil Titov wrote:
> > Hi Alex,
> >
> > Thanks for this suggestion. You are right, we actually don't need this waitForAllThreads() method.
> >
> > I will include this change in the new version of the webrev.
> >
> >> 207 int diff = numNewThreads - numTerminatedThreads;
> >> 208 long threadCount = mbean.getThreadCount();
> >> 209 long expectedThreadCount = prevLiveTestThreadCount + diff;
> >> 210 if (threadCount < expectedThreadCount) {
> >> if some internal thread terminates, we'll get failure here
> >
> > The failure will not happen. Please note that prevLiveTestThreadCount counts only *test* threads. Thus even if some Internal threads terminated the value mbean.getThreadCount() returns should still be no less than the expected number of live test threads.
> >
> > 310 prevLiveTestThreadCount = getTestThreadCount();
>
> Oh, yes, I missed it.
>
> LGTM.
>
> --alex
>
> >
> > Best regards,
> > Daniil
> >
> >
> > On 6/3/20, 3:08 PM, "Alex Menkov" mailto:alexey.menkov at oracle.com wrote:
> >
> > Hi Daniil,
> >
> > couple notes:
> >
> > 198 waitForThreads(numNewThreads, numTerminatedThreads);
> >
> > You don't actually need any wait here.
> > Test cases wait until all threads are in desired state
> > (checkAllThreadsAlive uses startupCheck, checkDaemonThreadsDead and
> > checkAllThreadsDead use join())
> >
> >
> > 205 private static void checkLiveThreads(int numNewThreads,
> > 206 int numTerminatedThreads) {
> > 207 int diff = numNewThreads - numTerminatedThreads;
> > 208 long threadCount = mbean.getThreadCount();
> > 209 long expectedThreadCount = prevLiveTestThreadCount + diff;
> > 210 if (threadCount < expectedThreadCount) {
> >
> > if some internal thread terminates, we'll get failure here
> >
> >
> > --alex
> >
> > On 06/02/2020 21:00, Daniil Titov wrote:
> > > Hi Alex, Serguei, and Martin,
> > >
> > > Thank you for your comments. Please review a new version of the fix that addresses them, specifically:
> > > 1) Replaces a double loop in checkAllThreadsAlive() with a code that uses collections and containsAll() method.
> > > 2) Restores the checks for other ThreadMXBean methods (getThreadCount(), getTotalStartedThreadCount(), getPeakThreadCount()) but with more relaxed conditions.
> > > 3) Relaxes the check inside checkThreadIds() method
> > >
> > >
> > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.03/
> > > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
> > >
> > > Thank you,
> > > Daniil
> > >
> > > On 6/1/20, 5:06 PM, "Alex Menkov" mailto:alexey.menkov at oracle.com wrote:
> > >
> > > Hi Daniil,
> > >
> > > 1. before the fix checkLiveThreads() tested
> > > ThreadMXBean.getThreadCount(), but now as far as I see it tests
> > > Thread.getAllStackTraces();
> > >
> > > 2.
> > > 237 private static void checkThreadIds() throws InterruptedException {
> > > 238 long[] list = mbean.getAllThreadIds();
> > > 239
> > > 240 waitTillEquals(
> > > 241 list.length,
> > > 242 ()->(long)mbean.getThreadCount(),
> > > 243 "Array length returned by " +
> > > 244 "getAllThreadIds() = %1$d not matched count =
> > > ${provided}",
> > > 245 ()->list.length
> > > 246 );
> > > 247 }
> > >
> > > I suppose purpose of waitTillEquals() is to handle creation/termination
> > > of VM internal threads.
> > > But if some internal thread terminates after mbean.getAllThreadIds() and
> > > before 1st mbean.getThreadCount() call and then VM does not need to
> > > restart it, waitTillEquals will wait forever.
> > >
> > > --alex
> > >
> > >
> > > On 05/29/2020 16:28, Daniil Titov wrote:
> > > > Hi Alex and Serguei,
> > > >
> > > > Please review a new version of the change [1] that makes sure that the test counts
> > > > only the threads it creates and ignores Internal threads VM might create or destroy.
> > > >
> > > > Testing: Running this test in Mach5 with Graal on several hundred times ,
> > > > tier1-tier3 tests are in progress.
> > > >
> > > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.02/
> > > > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
> > > >
> > > > Thank you,
> > > > Daniil
> > > >
> > > > On 5/22/20, 10:26 AM, "Alex Menkov" mailto:alexey.menkov at oracle.com wrote:
> > > >
> > > > Hi Daniil,
> > > >
> > > > I'm not sure all this retry logic is a good way.
> > > > As mentioned in jira the most important part of the testing is ensuring
> > > > that you find all the created threads when they are alive, and you don't
> > > > find them when they are dead. The actual thread count checking is not
> > > > that important.
> > > > I agree with this and I'd just simplify the test by removing checks for
> > > > thread count. VM may create and destroy internal threads when it needs it.
> > > >
> > > > --alex
> > > >
> > > > On 05/18/2020 10:31, Daniil Titov wrote:
> > > > > Please review the change [1] that fixes an intermittent failure of the test.
> > > > >
> > > > > This test creates and destroys a given number of daemon/user threads and validates the count of those started/stopped threads against values returned from ThreadMXBean thread counts. The problem here is that if some internal threads is started ( e.g. " HotSpotGraalManagement Bean Registration"), or destroyed (e.g. "JVMCI CompilerThread ") the test hangs waiting for expected number of live threads.
> > > > >
> > > > > The fix limits the time the test is waiting for desired number of live threads and in case if this limit is exceeded the test repeats itself.
> > > > >
> > > > > Testing. Test with Graal on and Mach5 tier1-tier7 test passed.
> > > > >
> > > > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.01
> > > > > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
> > > > >
> > > > > Thank you,
> > > > > Daniil
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
>
>
>
>
>
More information about the serviceability-dev
mailing list