RFR(S): 8211932: [ppc][testbug] runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads don't terminate immediately

Lindenmaier, Goetz goetz.lindenmaier at sap.com
Thu Oct 11 11:23:21 UTC 2018


Hi,

> I may just have to kill off this part of the test
You mean we should skip the tests for -1? Like this: 
   http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/02/
?

Best regards,
  Goetz.


> -----Original Message-----
> From: David Holmes <david.holmes at oracle.com>
> Sent: Thursday, 11 October 2018 08:03
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>;
> hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(S): 8211932: [ppc][testbug]
> runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads
> don't terminate immediately
> 
> Hi Goetz,
> 
> On 11/10/2018 1:01 AM, Lindenmaier, Goetz wrote:
> > Hi David,
> >
> > I implemented your little experiment, and did 4 runs with my fix.
> > I copied the relevant output for you here:
> > http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/01/with_my_fix.txt
> >
> > Your code completes in one loop.
> > From my output you can see that the CPU time is increasing a little, but
> > after 3-4 iterations the thread goes away.
> >
> > I also did 4 runs without my fix:
> > http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/01/without_my_fix.txt
> > I got 3 failures, one pass.
> > Also here, your code completes in one loop.
> 
> Many thanks for doing that. This is so perplexing. While adding the
> extra loops as per your fix may have solved your problem, it will make
> the recycled thread-id problem that we have seen in stress testing even
> more likely.
> 
> I may just have to kill off this part of the test. It's wasting too many
> cycles just to try and check we are graceful when encountering a
> terminated unattached thread.
> 
> Thanks,
> David
> 
> > Best regards,
> >    Goetz.
> >
> >
> >> -----Original Message-----
> >> From: David Holmes <david.holmes at oracle.com>
> >> Sent: Wednesday, 10 October 2018 14:32
> >> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>;
> >> hotspot-runtime-dev at openjdk.java.net
> >> Subject: Re: RFR(S): 8211932: [ppc][testbug]
> >> runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads
> >> don't terminate immediately
> >>
> >> Hi Goetz,
> >>
> >> On 10/10/2018 8:25 PM, Lindenmaier, Goetz wrote:
> >>> Hi David,
> >>>
> >>> This failure is very well reproducible, but only on linuxppc64 and
> >>> linuxppc64le.
> >>
> >> That doesn't really make sense to me. I would not expect the
> >> process/thread lifecycle management code to be different based on the
> >> CPU involved. This should be a simple kernel + NPTL/libc issue.
> >>
> >>> I implemented this fix in July but just missed the RDP; the patch has been
> >>> in our nightly builds since then, and I have not seen a single failure
> >>> since. We run these nightly tests with the fastdebug build, though.
> >>> But linuxx86_64 and linuxs390x don't show the issue, nor do any of the
> >>> other platforms. As there is no particularly high load, and because it is
> >>> so reliably reproducible, I don't think I am reading the information of a
> >>> thread of another process with the same thread id.
> >>> With the output I added to the test, I see that the cpu time keeps
> >>> increasing a bit, then it's stable for a few iterations, and then -1.
> >>
> >> That can also be explained by a thread-id being recycled and then the
> >> new thread also terminating. Granted the timing and reproducibility
> >> makes that unlikely.
> >>
> >> This is quite bizarre and I don't like bizarre. :)
> >>
> >> Are you able to apply this patch to the test and run some tests on ppc?
> >>
> >>     if ((res = pthread_join(thread, NULL)) != 0) {
> >>       fprintf(stderr, "TEST ERROR: pthread_join failed: %s (%d)\n",
> >>               strerror(res), res);
> >>       exit(1);
> >>     }
> >>
> >> +  while (pthread_kill(thread, 0) == 0) {
> >> +    res++;
> >> +  }
> >> +  printf("Native thread was gone after %d iterations\n", res);
> >>     return nativeThread;
> >> }
> >>
> >> Once pthread_kill gives ESRCH then so should pthread_getcpuclockid().
> >> At least until the thread-id is recycled.
> >>
> >> Thanks,
> >> David
> >>
> >>> Best regards,
> >>>     Goetz.
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: David Holmes <david.holmes at oracle.com>
> >>>> Sent: Wednesday, 10 October 2018 01:22
> >>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>;
> >>>> hotspot-runtime-dev at openjdk.java.net
> >>>> Subject: Re: RFR(S): 8211932: [ppc][testbug]
> >>>> runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads
> >>>> don't terminate immediately
> >>>>
> >>>> Hi Goetz,
> >>>>
> >>>> There is already an open bug for this issue - JDK-8208159 - but it has
> >>>> only reproduced in a stress environment where we think thread-ids are
> >>>> being recycled (which means waiting longer won't help). This should be
> >>>> OS not CPU specific so I'm very interested to know in what circumstances
> >>>> you see this failure.
> >>>>
> >>>> I created an instrumented version of the test that did a pthread_kill on
> >>>> the target to check for ESRCH - which it got - yet we still see failures
> >>>> in those stress environments.
> >>>>
> >>>> David
> >>>>
> >>>> On 10/10/2018 1:10 AM, Lindenmaier, Goetz wrote:
> >>>>> Hi,
> >>>>>
> >>>>> On ppc, one still sees increasing thread cpu times after a thread has
> >>>>> been joined.
> >>>>> This makes TestTerminatedThread fail.
> >>>>>
> >>>>> This change gives the check a few seconds to wait until the thread
> >>>>> disappears.
> >>>>> Please review.
> >>>>> http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/01/test/hotspot/jtreg/runtime/jni/terminatedThread/TestTerminatedThread.java.udiff.html
> >>>>>
> >>>>> Best regards,
> >>>>>      Goetz.
> >>>>>
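
For readers following the thread, the two ideas discussed above (a bounded
wait until the thread disappears, and counting how many probes it takes) can
be sketched roughly as below. This is only an illustration, not the code from
the webrev: wait_until_gone, thread_probe_fn and fake_probe are hypothetical
names. The stand-in probe avoids touching a real thread id, since POSIX
leaves the use of a thread id after pthread_join undefined; a real probe
might wrap pthread_kill(thread, 0) as in the patch quoted above, with that
caveat in mind.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical probe type: returns 0 while the thread still appears to
 * exist, or a non-zero errno value (e.g. ESRCH) once it is gone. */
typedef int (*thread_probe_fn)(void *ctx);

/* Poll `probe` at most `max_attempts` times, sleeping `sleep_ms` between
 * attempts. Returns the number of probes that still saw the thread before
 * it disappeared, or -1 if it was still visible after the last attempt.
 * Bounding the loop matters: if the thread id has been recycled, waiting
 * longer will never help. */
static int wait_until_gone(thread_probe_fn probe, void *ctx,
                           int max_attempts, long sleep_ms) {
    assert(probe != NULL && max_attempts >= 0 && sleep_ms >= 0);
    struct timespec pause = { sleep_ms / 1000, (sleep_ms % 1000) * 1000000L };
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (probe(ctx) != 0) {
            return attempt;          /* thread no longer visible */
        }
        nanosleep(&pause, NULL);     /* give the OS time to clean up */
    }
    return -1;                       /* gave up: still visible */
}

/* Stand-in probe for illustration only: pretends the thread stays visible
 * for *ctx more calls and then reports ESRCH. */
static int fake_probe(void *ctx) {
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return 0;
    }
    return ESRCH;
}
```

Calling wait_until_gone(fake_probe, &n, 10, 1) with n set to 3 reports three
probes before the thread is gone, which mirrors the "3-4 iterations" pattern
described earlier; a caller that gets -1 back would skip or fail the check
rather than loop forever.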

