RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently
Daniil Titov
daniil.x.titov at oracle.com
Fri Jun 5 00:39:29 UTC 2020
Thank you, Serguei!
I will add a comment before pushing the fix.
Best regards,
Daniil
On 6/4/20, 4:56 PM, "serguei.spitsyn at oracle.com" <serguei.spitsyn at oracle.com> wrote:
Hi Daniil,
Got it, thanks.
I think, some short comment above this comparisons would be helpful.
LGTM.
Thanks,
Serguei
On 6/4/20 16:45, Daniil Titov wrote:
> Hi Serguei,
>
>> Note, the threads can be terminated even after the diff value is
>> calculated at line 203.
> Please note that the diff value calculated on line 203 shows how many *test* threads were created or terminated,
> numNewThreads is number of new *test* threads and numTerminatedThreads is number of terminated *test* threads.
>
> No *test* thread can terminate or start after the diff value is calculated.
>
> Number of threads mbean.getThreadCount() could be seen as number of live *test* threads plus number of live internal (non-test) threads,
> or A = B + C , where A - result of mbean.getThreadCount(), B - number of live test threads, C - number of live non-test threads.
>
> Regardless what happens with internal "non-tested" threads the invariant that this method tests is that number of threads
> mbean.getThreadCount() returns could not be less than number of live test threads, or that A >= B.
>
>
> Best regards,
> Daniil
>
> On 6/4/20, 4:08 PM, "serguei.spitsyn at oracle.com" <serguei.spitsyn at oracle.com> wrote:
>
> Hi Daniil,
>
>
> On 6/4/20 16:01, Daniil Titov wrote:
> > Hi Serguei,
> >
> >> 201 private static void checkLiveThreads(int numNewThreads,
> >> 202 int numTerminatedThreads) {
> >> 203 int diff = numNewThreads - numTerminatedThreads;
> >> 204 long threadCount = mbean.getThreadCount();
> >> 205 long expectedThreadCount = prevLiveTestThreadCount + diff;
> >> 206 if (threadCount < expectedThreadCount) {
> >> 207 testFailed = true;
> >> When all threads are counted with mbean.getThreadCount() it is not clear
> >> there is no race with new non-tested threads creation. Is it possible?
> >> If so, then the check at line 206 is going to fail.
> > Even if some Internal (non-tested) threads are created the value mbean.getThreadCount() returns should be no less than the expected number of live test threads (please note that prevLiveTestThreadCount counts only *test* threads) that means that condition on line 206 will be evaluated to *false* and line 207 will not be executed and the test will pass.
>
> Okay, I see that it is failure condition.
> But then is there a race with (non-tested) threads termination?
> Note, the threads can be terminated even after the diff value is
> calculated at line 203.
> I'm sorry, if the same questions are repeated again.
>
> Thanks,
> Serguei
>
> > --Best regards,
> > Daniil
> >
> > From: "serguei.spitsyn at oracle.com" <serguei.spitsyn at oracle.com>
> > Date: Thursday, June 4, 2020 at 3:03 PM
> > To: Daniil Titov <daniil.x.titov at oracle.com>, Alex Menkov <alexey.menkov at oracle.com>, serviceability-dev <serviceability-dev at openjdk.java.net>
> > Subject: Re: RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently
> >
> > Hi Daniil,
> >
> > It is hard to be on top of all the details in these review rounds.
> > When all threads are counted with mbean.getThreadCount() it is not clear
> > there is no race with new non-tested threads creation. Is it possible?
> > If so, then the check at line 206 is going to fail.
> > 201 private static void checkLiveThreads(int numNewThreads,
> > 202 int numTerminatedThreads) {
> > 203 int diff = numNewThreads - numTerminatedThreads;
> > 204 long threadCount = mbean.getThreadCount();
> > 205 long expectedThreadCount = prevLiveTestThreadCount + diff;
> > 206 if (threadCount < expectedThreadCount) {
> > 207 testFailed = true;
> >
> > Thanks,
> > Serguei
> >
> > On 6/3/20 20:42, Daniil Titov wrote:
> > Hi Alex,
> >
> > Please review a new version of the webrev [1] that no longer uses waitTillEquals() method.
> >
> > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.04/
> > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
> >
> > Thank you,
> > Daniil
> >
> > On 6/3/20, 4:42 PM, "Alex Menkov" mailto:alexey.menkov at oracle.com wrote:
> >
> > Hi again Daniil,
> >
> > On 06/03/2020 16:31, Daniil Titov wrote:
> > > Hi Alex,
> > >
> > > Thanks for this suggestion. You are right, we actually don't need this waitForAllThreads() method.
> > >
> > > I will include this change in the new version of the webrev.
> > >
> > >> 207 int diff = numNewThreads - numTerminatedThreads;
> > >> 208 long threadCount = mbean.getThreadCount();
> > >> 209 long expectedThreadCount = prevLiveTestThreadCount + diff;
> > >> 210 if (threadCount < expectedThreadCount) {
> > >> if some internal thread terminates, we'll get failure here
> > >
> > > The failure will not happen. Please note that prevLiveTestThreadCount counts only *test* threads. Thus even if some Internal threads terminated the value mbean.getThreadCount() returns should still be no less than the expected number of live test threads.
> > >
> > > 310 prevLiveTestThreadCount = getTestThreadCount();
> >
> > Oh, yes, I missed it.
> >
> > LGTM.
> >
> > --alex
> >
> > >
> > > Best regards,
> > > Daniil
> > >
> > >
> > > On 6/3/20, 3:08 PM, "Alex Menkov" mailto:alexey.menkov at oracle.com wrote:
> > >
> > > Hi Daniil,
> > >
> > > couple notes:
> > >
> > > 198 waitForThreads(numNewThreads, numTerminatedThreads);
> > >
> > > You don't actually need any wait here.
> > > Test cases wait until all threads are in desired state
> > > (checkAllThreadsAlive uses startupCheck, checkDaemonThreadsDead and
> > > checkAllThreadsDead use join())
> > >
> > >
> > > 205 private static void checkLiveThreads(int numNewThreads,
> > > 206 int numTerminatedThreads) {
> > > 207 int diff = numNewThreads - numTerminatedThreads;
> > > 208 long threadCount = mbean.getThreadCount();
> > > 209 long expectedThreadCount = prevLiveTestThreadCount + diff;
> > > 210 if (threadCount < expectedThreadCount) {
> > >
> > > if some internal thread terminates, we'll get failure here
> > >
> > >
> > > --alex
> > >
> > > On 06/02/2020 21:00, Daniil Titov wrote:
> > > > Hi Alex, Serguei, and Martin,
> > > >
> > > > Thank you for your comments. Please review a new version of the fix that addresses them, specifically:
> > > > 1) Replaces a double loop in checkAllThreadsAlive() with a code that uses collections and containsAll() method.
> > > > 2) Restores the checks for other ThreadMXBean methods (getThreadCount(), getTotalStartedThreadCount(), getPeakThreadCount()) but with more relaxed conditions.
> > > > 3) Relaxes the check inside checkThreadIds() method
> > > >
> > > >
> > > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.03/
> > > > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
> > > >
> > > > Thank you,
> > > > Daniil
> > > >
> > > > On 6/1/20, 5:06 PM, "Alex Menkov" mailto:alexey.menkov at oracle.com wrote:
> > > >
> > > > Hi Daniil,
> > > >
> > > > 1. before the fix checkLiveThreads() tested
> > > > ThreadMXBean.getThreadCount(), but now as far as I see it tests
> > > > Thread.getAllStackTraces();
> > > >
> > > > 2.
> > > > 237 private static void checkThreadIds() throws InterruptedException {
> > > > 238 long[] list = mbean.getAllThreadIds();
> > > > 239
> > > > 240 waitTillEquals(
> > > > 241 list.length,
> > > > 242 ()->(long)mbean.getThreadCount(),
> > > > 243 "Array length returned by " +
> > > > 244 "getAllThreadIds() = %1$d not matched count =
> > > > ${provided}",
> > > > 245 ()->list.length
> > > > 246 );
> > > > 247 }
> > > >
> > > > I suppose purpose of waitTillEquals() is to handle creation/termination
> > > > of VM internal threads.
> > > > But if some internal thread terminates after mbean.getAllThreadIds() and
> > > > before 1st mbean.getThreadCount() call and then VM does not need to
> > > > restart it, waitTillEquals will wait forever.
> > > >
> > > > --alex
> > > >
> > > >
> > > > On 05/29/2020 16:28, Daniil Titov wrote:
> > > > > Hi Alex and Serguei,
> > > > >
> > > > > Please review a new version of the change [1] that makes sure that the test counts
> > > > > only the threads it creates and ignores Internal threads VM might create or destroy.
> > > > >
> > > > > Testing: Running this test in Mach5 with Graal on several hundred times ,
> > > > > tier1-tier3 tests are in progress.
> > > > >
> > > > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.02/
> > > > > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
> > > > >
> > > > > Thank you,
> > > > > Daniil
> > > > >
> > > > > On 5/22/20, 10:26 AM, "Alex Menkov" mailto:alexey.menkov at oracle.com wrote:
> > > > >
> > > > > Hi Daniil,
> > > > >
> > > > > I'm not sure all this retry logic is a good way.
> > > > > As mentioned in jira the most important part of the testing is ensuring
> > > > > that you find all the created threads when they are alive, and you don't
> > > > > find them when they are dead. The actual thread count checking is not
> > > > > that important.
> > > > > I agree with this and I'd just simplify the test by removing checks for
> > > > > thread count. VM may create and destroy internal threads when it needs it.
> > > > >
> > > > > --alex
> > > > >
> > > > > On 05/18/2020 10:31, Daniil Titov wrote:
> > > > > > Please review the change [1] that fixes an intermittent failure of the test.
> > > > > >
> > > > > > This test creates and destroys a given number of daemon/user threads and validates the count of those started/stopped threads against values returned from ThreadMXBean thread counts. The problem here is that if some internal threads is started ( e.g. " HotSpotGraalManagement Bean Registration"), or destroyed (e.g. "JVMCI CompilerThread ") the test hangs waiting for expected number of live threads.
> > > > > >
> > > > > > The fix limits the time the test is waiting for desired number of live threads and in case if this limit is exceeded the test repeats itself.
> > > > > >
> > > > > > Testing. Test with Graal on and Mach5 tier1-tier7 test passed.
> > > > > >
> > > > > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.01
> > > > > > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
> > > > > >
> > > > > > Thank you,
> > > > > > Daniil
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
> >
> >
> >
> >
> >
>
>
>
More information about the serviceability-dev
mailing list