RFR(S): 8211932: [ppc][testbug] runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads don't terminate immediately

Lindenmaier, Goetz goetz.lindenmaier at sap.com
Thu Oct 11 12:12:23 UTC 2018


> Something like that yes. :) 
Can I consider this a review? :))

Best regards,
  Goetz.

> -----Original Message-----
> From: David Holmes <david.holmes at oracle.com>
> Sent: Thursday, 11 October 2018 13:53
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(S): 8211932: [ppc][testbug] runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads don't terminate immediately
> 
> On 11/10/2018 9:23 PM, Lindenmaier, Goetz wrote:
> > Hi,
> >
> >> I may just have to kill off this part of the test
> > You mean we should skip the tests for -1? Like this:
> >     http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/02/
> > ?
> 
> Something like that yes. :) The main thing this test is doing is
> ensuring we don't crash when we encounter these
> terminated-but-still-attached threads.
> 
> Thanks,
> David
> 
> > Best regards,
> >    Goetz.
> >
> >
> >> -----Original Message-----
> >> From: David Holmes <david.holmes at oracle.com>
> >> Sent: Thursday, 11 October 2018 08:03
> >> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-dev at openjdk.java.net
> >> Subject: Re: RFR(S): 8211932: [ppc][testbug] runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads don't terminate immediately
> >>
> >> Hi Goetz,
> >>
> >> On 11/10/2018 1:01 AM, Lindenmaier, Goetz wrote:
> >>> Hi David,
> >>>
> >>> I implemented your little experiment, and did 4 runs with my fix.
> >>> I copied you the relevant output here:
> >>> http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/01/with_my_fix.txt
> >>>
> >>> Your code completes in one loop.
> >>> From my output you can see that the CPU time is increasing a little, but
> >>> after 3-4 iterations the thread goes away.
> >>>
> >>> I also did 4 runs without my fix:
> >>> http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/01/without_my_fix.txt
> >>> I got 3 failures, one pass.
> >>> Also here, your code completes in one loop.
> >>
> >> Many thanks for doing that. This is so perplexing. While adding the
> >> extra loops as per your fix may have solved your problem, it will make
> >> the recycled thread-id problem that we have seen in stress testing even
> >> more likely.
> >>
> >> I may just have to kill off this part of the test. It's wasting too many
> >> cycles just to try and check we are graceful when encountering a
> >> terminated unattached thread.
> >>
> >> Thanks,
> >> David
> >>
> >>> Best regards,
> >>>     Goetz.
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: David Holmes <david.holmes at oracle.com>
> >>>> Sent: Wednesday, 10 October 2018 14:32
> >>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-dev at openjdk.java.net
> >>>> Subject: Re: RFR(S): 8211932: [ppc][testbug] runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads don't terminate immediately
> >>>>
> >>>> Hi Goetz,
> >>>>
> >>>> On 10/10/2018 8:25 PM, Lindenmaier, Goetz wrote:
> >>>>> Hi David,
> >>>>>
> >>>>> This failure is very well reproducible, but only on linuxppc64 and linuxppc64le.
> >>>>
> >>>> That doesn't really make sense to me. I would not expect the
> >>>> process/thread lifecycle management code to be different based on the
> >>>> CPU involved. This should be a simple kernel + NPTL/libc issue.
> >>>>
> >>>>> I implemented this fix in July, just missed the RDP, and the patch has been used
> >>>>> in our nightly builds since then. Since that date I haven't seen a single
> >>>>> failure.  We run these nightly tests with the fastdebug build, though.
> >>>>> But linuxx86_64 and linuxs390x don't show the issue, nor do any of the other
> >>>>> platforms.  As there is no especially high load, and because it's that
> >>>>> well reproducible, I don't think I am reading the information of a thread of
> >>>>> another process with the same thread id.
> >>>>> With the output I implemented in the test, I see that the cpu time keeps
> >>>>> increasing a bit, then it's stable for a few iterations, and then -1.
> >>>>
> >>>> That can also be explained by a thread-id being recycled and then the
> >>>> new thread also terminating. Granted, the timing and reproducibility
> >>>> make that unlikely.
> >>>>
> >>>> This is quite bizarre and I don't like bizarre. :)
> >>>>
> >>>> Are you able to apply this patch to the test and run some tests on ppc?
> >>>>
> >>>>      if ((res = pthread_join(thread, NULL)) != 0) {
> >>>>        fprintf(stderr, "TEST ERROR: pthread_join failed: %s (%d)\n",
> >>>>                strerror(res), res);
> >>>>        exit(1);
> >>>>      }
> >>>>
> >>>> +  while (pthread_kill(thread, 0) == 0) {
> >>>> +    res++;
> >>>> +  }
> >>>> +  printf("Native thread was gone after %d iterations\n", res);
> >>>>      return nativeThread;
> >>>> }
> >>>>
> >>>> Once pthread_kill gives ESRCH then so should pthread_getcpuclockid().
> >>>> At least until the thread-id is recycled.
> >>>>
> >>>> Thanks,
> >>>> David
> >>>>
> >>>>> Best regards,
> >>>>>      Goetz.
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: David Holmes <david.holmes at oracle.com>
> >>>>>> Sent: Wednesday, 10 October 2018 01:22
> >>>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-dev at openjdk.java.net
> >>>>>> Subject: Re: RFR(S): 8211932: [ppc][testbug] runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads don't terminate immediately
> >>>>>>
> >>>>>> Hi Goetz,
> >>>>>>
> >>>>>> There is already an open bug for this issue - JDK-8208159 - but it has
> >>>>>> only reproduced in a stress environment where we think thread-ids are
> >>>>>> being recycled (which means waiting longer won't help). This should be
> >>>>>> OS not CPU specific so I'm very interested to know in what circumstances
> >>>>>> you see this failure.
> >>>>>>
> >>>>>> I created an instrumented version of the test that did a pthread_kill on
> >>>>>> the target to check for ESRCH - which it got - yet we still see failures
> >>>>>> in those stress environments.
> >>>>>>
> >>>>>> David
> >>>>>>
> >>>>>> On 10/10/2018 1:10 AM, Lindenmaier, Goetz wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> On ppc, one still sees increasing thread cpu times after a thread has been joined.
> >>>>>>> This makes TestTerminatedThread fail.
> >>>>>>>
> >>>>>>> This change gives the check a few seconds to wait until the thread disappears.
> >>>>>>> Please review.
> >>>>>>> http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/01/test/hotspot/jtreg/runtime/jni/terminatedThread/TestTerminatedThread.java.udiff.html
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>>       Goetz.
> >>>>>>>
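
For readers who want to repeat the experiment outside the JDK test, here is a minimal sketch of the "how many probes until the thread is gone" loop discussed in the thread. It is not the code from the webrev. It assumes Linux, and instead of passing the joined pthread_t to pthread_kill as the quoted patch does, it probes the kernel thread id with tgkill and signal 0, since POSIX leaves use of a pthread_t after pthread_join undefined. The names record_tid and probes_until_thread_gone, the 1 ms pause, and the probe limit are illustrative assumptions.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Thread body: store this thread's kernel thread id where the caller can read it. */
static void *record_tid(void *arg) {
    *(pid_t *)arg = (pid_t)syscall(SYS_gettid);
    return NULL;
}

/*
 * Create a thread, join it, then count how many probes it takes until the
 * kernel reports ESRCH for its thread id. Returns the number of probes that
 * still found the thread (0 means it was already gone on the first probe),
 * or -1 on setup failure or if the thread was still visible after max_probes.
 *
 * Assumption: nothing else creates threads in this process while the loop
 * runs, so the thread id cannot be recycled within the process. The sketch
 * does not guard against recycling in general.
 */
int probes_until_thread_gone(int max_probes) {
    assert(max_probes > 0);

    pthread_t thread;
    pid_t tid = 0;
    if (pthread_create(&thread, NULL, record_tid, &tid) != 0) {
        return -1;
    }
    if (pthread_join(thread, NULL) != 0) {
        return -1;
    }

    /* Hypothetical pacing: 1 ms between probes, so the wait stays bounded. */
    struct timespec pause = { .tv_sec = 0, .tv_nsec = 1000000 };

    for (int probes = 0; probes < max_probes; probes++) {
        /* Signal 0 delivers nothing; it only checks that the target exists. */
        long rc = syscall(SYS_tgkill, (long)getpid(), (long)tid, 0);
        if (rc == -1 && errno == ESRCH) {
            return probes;
        }
        nanosleep(&pause, NULL);
    }
    return -1;
}
```

Calling probes_until_thread_gone(5000) from a small main and printing the result on ppc64 and x86_64 would give a rough comparison of how long the joined thread stays visible to the kernel. The bounded loop mirrors the "wait a few seconds" idea of the proposed fix, and it shares its weakness: waiting longer does not help once thread ids are being recycled.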

