RFR(S): 8211932: [ppc][testbug] runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads don't terminate immediately
Lindenmaier, Goetz
goetz.lindenmaier at sap.com
Wed Oct 10 15:01:44 UTC 2018
Hi David,
I implemented your little experiment and did 4 runs with my fix.
I copied the relevant output for you here:
http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/01/with_my_fix.txt
Your code completes in one loop.
From my output you can see that the CPU time is increasing a little, but
after 3-4 iterations the thread goes away.
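The wait-and-retry behaviour described above can be sketched in C roughly as follows. This is only an illustration, not the actual change to the test: wait_until_gone, cpu_time_probe, fake_thread and fake_probe are hypothetical names, and the fake probe merely mimics a CPU time that keeps growing for a few polls before it reports -1.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical probe: returns a thread's CPU time, or -1 once the
 * thread can no longer be found. */
typedef long long (*cpu_time_probe)(void *ctx);

/* Polls the probe up to max_attempts times, sleeping delay_ms between
 * polls. Returns the number of the poll that first saw -1, or -1 if the
 * thread was still visible after the last attempt. */
int wait_until_gone(cpu_time_probe probe, void *ctx,
                    int max_attempts, long delay_ms)
{
    assert(probe != NULL);
    struct timespec pause = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (probe(ctx) == -1) {
            return attempt;
        }
        if (attempt < max_attempts) {
            nanosleep(&pause, NULL);
        }
    }
    return -1;
}

/* Stand-in for a terminated thread that stays visible for a few polls
 * (assumption for illustration only). */
struct fake_thread {
    int calls;       /* number of polls so far */
    int gone_after;  /* polls that still report a CPU time */
};

long long fake_probe(void *ctx)
{
    struct fake_thread *t = ctx;
    t->calls++;
    return (t->calls > t->gone_after) ? -1 : 1000LL * t->calls;
}
```

With a fake thread that stays visible for three polls, wait_until_gone(fake_probe, &t, 10, 0) returns on the fourth poll; if the thread never disappears within the attempt limit, the helper returns -1 and the caller can report a failure.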
I also did 4 runs without my fix:
http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/01/without_my_fix.txt
I got 3 failures and 1 pass.
Here, too, your code completes in one loop.
Best regards,
Goetz.
> -----Original Message-----
> From: David Holmes <david.holmes at oracle.com>
> Sent: Wednesday, October 10, 2018 14:32
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(S): 8211932: [ppc][testbug] runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads don't terminate immediately
>
> Hi Goetz,
>
> On 10/10/2018 8:25 PM, Lindenmaier, Goetz wrote:
> > Hi David,
> >
> > This failure reproduces very reliably, but only on linuxppc64 and
> > linuxppc64le.
>
> That doesn't really make sense to me. I would not expect the
> process/thread lifecycle management code to be different based on the
> CPU involved. This should be a simple kernel + NPTL/libc issue.
>
> > I implemented this fix in July, just missed the RDP, and the patch has been
> > used in our nightly builds since then. Since that date I haven't seen a
> > single failure. We run these nightly tests with the fastdebug build, though.
> > But linuxx86_64 and linuxs390x don't show the issue, nor do any of the other
> > platforms. As there is no especially high load, and because it reproduces so
> > reliably, I don't think I am reading the information of a thread of another
> > process with the same thread id.
> > With the output I implemented in the test, I see that the cpu time keeps
> > increasing a bit, then it's stable for a few iterations, and then -1.
>
> That can also be explained by a thread-id being recycled and then the
> new thread also terminating. Granted, the timing and reproducibility
> make that unlikely.
>
> This is quite bizarre and I don't like bizarre. :)
>
> Are you able to apply this patch to the test and run some tests on ppc?
>
>      if ((res = pthread_join(thread, NULL)) != 0) {
>          fprintf(stderr, "TEST ERROR: pthread_join failed: %s (%d)\n",
>                  strerror(res), res);
>          exit(1);
>      }
>
> +    while (pthread_kill(thread, 0) == 0) {
> +        res++;
> +    }
> +    printf("Native thread was gone after %d iterations\n", res);
>      return nativeThread;
>  }
>
> Once pthread_kill gives ESRCH then so should pthread_getcpuclockid().
> At least until the thread-id is recycled.
>
> Thanks,
> David
>
> > Best regards,
> > Goetz.
> >
> >
> >> -----Original Message-----
> >> From: David Holmes <david.holmes at oracle.com>
> >> Sent: Wednesday, October 10, 2018 01:22
> >> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-dev at openjdk.java.net
> >> Subject: Re: RFR(S): 8211932: [ppc][testbug] runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads don't terminate immediately
> >>
> >> Hi Goetz,
> >>
> >> There is already an open bug for this issue - JDK-8208159 - but it has
> >> only reproduced in a stress environment where we think thread-ids are
> >> being recycled (which means waiting longer won't help). This should be
> >> OS not CPU specific so I'm very interested to know in what circumstances
> >> you see this failure.
> >>
> >> I created an instrumented version of the test that did a pthread_kill on
> >> the target to check for ESRCH - which it got - yet we still see failures
> >> in those stress environments.
> >>
> >> David
> >>
> >> On 10/10/2018 1:10 AM, Lindenmaier, Goetz wrote:
> >>> Hi,
> >>>
> >>> On ppc, one still sees increasing thread cpu times after a thread has joined.
> >>> This makes TestTerminatedThread fail.
> >>>
> >>> This change gives the check a few seconds to wait until the thread disappears.
> >>> Please review.
> >>> http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/01/test/hotspot/jtreg/runtime/jni/terminatedThread/TestTerminatedThread.java.udiff.html
> >>>
> >>> Best regards,
> >>> Goetz.
> >>>