RFR(S): 8211932: [ppc][testbug] runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads don't terminate immediately

Lindenmaier, Goetz goetz.lindenmaier at sap.com
Fri Oct 12 06:29:32 UTC 2018


Thanks David!

Best regards,
  Goetz.

> -----Original Message-----
> From: David Holmes <david.holmes at oracle.com>
> Sent: Thursday, 11 October 2018 14:34
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>;
> hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(S): 8211932: [ppc][testbug]
> runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads
> don't terminate immediately
> 
> On 11/10/2018 10:12 PM, Lindenmaier, Goetz wrote:
> >> Something like that yes. :)
> > Can I consider this a review ? :))
> 
> Yes :)
> 
> Thanks,
> David
> 
> > Best regards,
> >    Goetz.
> >
> >> -----Original Message-----
> >> From: David Holmes <david.holmes at oracle.com>
> >> Sent: Thursday, 11 October 2018 13:53
> >> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>;
> >> hotspot-runtime-dev at openjdk.java.net
> >> Subject: Re: RFR(S): 8211932: [ppc][testbug]
> >> runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads
> >> don't terminate immediately
> >>
> >> On 11/10/2018 9:23 PM, Lindenmaier, Goetz wrote:
> >>> Hi,
> >>>
> >>>> I may just have to kill off this part of the test
> >>> You mean we should skip the tests for -1? Like this:
> >>>      http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/02/
> >>> ?
> >>
> >> Something like that yes. :) The main thing this test is doing is
> >> ensuring we don't crash when we encounter these
> >> terminated-but-still-attached threads.
> >>
> >> Thanks,
> >> David
> >>
> >>> Best regards,
> >>>     Goetz.
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: David Holmes <david.holmes at oracle.com>
> >>>> Sent: Thursday, 11 October 2018 08:03
> >>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>;
> >>>> hotspot-runtime-dev at openjdk.java.net
> >>>> Subject: Re: RFR(S): 8211932: [ppc][testbug]
> >>>> runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads
> >>>> don't terminate immediately
> >>>>
> >>>> Hi Goetz,
> >>>>
> >>>> On 11/10/2018 1:01 AM, Lindenmaier, Goetz wrote:
> >>>>> Hi David,
> >>>>>
> >>>>> I implemented your little experiment, and did 4 runs with my fix.
> >>>>> I copied you the relevant output here:
> >>>>> http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/01/with_my_fix.txt
> >>>>>
> >>>>> Your code completes in one loop.
> >>>>> From my output you can see that the CPU time is increasing a little,
> >>>>> but after 3-4 iterations the thread goes away.
> >>>>>
> >>>>> I also did 4 runs without my fix:
> >>>>> http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/01/without_my_fix.txt
> >>>>> I got 3 failures, one pass.
> >>>>> Also here, your code completes in one loop.
> >>>>
> >>>> Many thanks for doing that. This is so perplexing. While adding the
> >>>> extra loops as per your fix may have solved your problem, it will make
> >>>> the recycled thread-id problem that we have seen in stress testing even
> >>>> more likely.
> >>>>
> >>>> I may just have to kill off this part of the test. It's wasting too many
> >>>> cycles just to try and check we are graceful when encountering a
> >>>> terminated unattached thread.
> >>>>
> >>>> Thanks,
> >>>> David
> >>>>
> >>>>> Best regards,
> >>>>>      Goetz.
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: David Holmes <david.holmes at oracle.com>
> >>>>>> Sent: Wednesday, 10 October 2018 14:32
> >>>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>;
> >>>>>> hotspot-runtime-dev at openjdk.java.net
> >>>>>> Subject: Re: RFR(S): 8211932: [ppc][testbug]
> >>>>>> runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads
> >>>>>> don't terminate immediately
> >>>>>>
> >>>>>> Hi Goetz,
> >>>>>>
> >>>>>> On 10/10/2018 8:25 PM, Lindenmaier, Goetz wrote:
> >>>>>>> Hi David,
> >>>>>>>
> >>>>>>> This failure is very well reproducible, but only on linuxppc64 and
> >>>>>>> linuxppc64le.
> >>>>>>
> >>>>>> That doesn't really make sense to me. I would not expect the
> >>>>>> process/thread lifecycle management code to be different based on the
> >>>>>> CPU involved. This should be a simple kernel + NPTL/libc issue.
> >>>>>>
> >>>>>>> I implemented this fix in July, just missed the RDP, and the patch has
> >>>>>>> been used in our nightly builds since then. Since that date I haven't
> >>>>>>> seen a single failure. We run these nightly tests with the fastdebug
> >>>>>>> build, though.
> >>>>>>> But linuxx86_64 and linuxs390x don't show the issue, nor do any of the
> >>>>>>> other platforms. As there is no special high load, and because it's that
> >>>>>>> well reproducible, I don't think I read the information of a thread of
> >>>>>>> another process with the same thread id.
> >>>>>>> With the output I implemented in the test, I see that the cpu time keeps
> >>>>>>> increasing a bit, then it's stable for a few iterations, and then it
> >>>>>>> becomes -1.
> >>>>>>
> >>>>>> That can also be explained by a thread-id being recycled and then the
> >>>>>> new thread also terminating. Granted, the timing and reproducibility
> >>>>>> make that unlikely.
> >>>>>>
> >>>>>> This is quite bizarre and I don't like bizarre. :)
> >>>>>>
> >>>>>> Are you able to apply this patch to the test and run some tests on ppc?
> >>>>>>
> >>>>>>       if ((res = pthread_join(thread, NULL)) != 0) {
> >>>>>>         fprintf(stderr, "TEST ERROR: pthread_join failed: %s (%d)\n",
> >>>>>>                 strerror(res), res);
> >>>>>>         exit(1);
> >>>>>>       }
> >>>>>>
> >>>>>> +  while (pthread_kill(thread, 0) == 0) {
> >>>>>> +    res++;
> >>>>>> +  }
> >>>>>> +  printf("Native thread was gone after %d iterations\n", res);
> >>>>>>       return nativeThread;
> >>>>>> }
> >>>>>>
> >>>>>> Once pthread_kill gives ESRCH then so should pthread_getcpuclockid().
> >>>>>> At least until the thread-id is recycled.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> David
> >>>>>>
> >>>>>>> Best regards,
> >>>>>>>       Goetz.
> >>>>>>>
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: David Holmes <david.holmes at oracle.com>
> >>>>>>>> Sent: Wednesday, 10 October 2018 01:22
> >>>>>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>;
> >>>>>>>> hotspot-runtime-dev at openjdk.java.net
> >>>>>>>> Subject: Re: RFR(S): 8211932: [ppc][testbug]
> >>>>>>>> runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads
> >>>>>>>> don't terminate immediately
> >>>>>>>>
> >>>>>>>> Hi Goetz,
> >>>>>>>>
> >>>>>>>> There is already an open bug for this issue - JDK-8208159 - but it has
> >>>>>>>> only reproduced in a stress environment where we think thread-ids are
> >>>>>>>> being recycled (which means waiting longer won't help). This should be
> >>>>>>>> OS not CPU specific so I'm very interested to know in what circumstances
> >>>>>>>> you see this failure.
> >>>>>>>>
> >>>>>>>> I created an instrumented version of the test that did a pthread_kill on
> >>>>>>>> the target to check for ESRCH - which it got - yet we still see failures
> >>>>>>>> in those stress environments.
> >>>>>>>>
> >>>>>>>> David
> >>>>>>>>
> >>>>>>>> On 10/10/2018 1:10 AM, Lindenmaier, Goetz wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> On ppc, one still sees increasing thread cpu times after a thread has
> >>>>>>>>> joined.
> >>>>>>>>> This makes TestTerminatedThread fail.
> >>>>>>>>>
> >>>>>>>>> This change gives the check a few seconds to wait until the thread
> >>>>>>>>> disappears.
> >>>>>>>>> Please review.
> >>>>>>>>> http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/01/test/hotspot/jtreg/runtime/jni/terminatedThread/TestTerminatedThread.java.udiff.html
> >>>>>>>>>
> >>>>>>>>> Best regards,
> >>>>>>>>>        Goetz.
> >>>>>>>>>
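
The bounded wait described in the original mail ("gives the check a few
seconds to wait until the thread disappears") can be sketched in C as below.
This is only an illustration under stated assumptions, not the code from the
webrev: wait_until_gone, cpu_time_probe and fake_probe are hypothetical names,
and the probe stands in for whatever call reports the thread's CPU time (or -1
once the thread is gone). As pointed out in the thread, a longer wait also
widens the window in which a thread id could be recycled.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() */
#include <assert.h>
#include <time.h>

/* Hypothetical probe type: returns a thread's CPU time, or -1 once the
 * thread is no longer visible. Not part of the actual test. */
typedef long (*cpu_time_probe)(void *ctx);

/* Calls `probe` up to `max_attempts` times, sleeping `sleep_ms` milliseconds
 * between attempts. Returns the attempt number on which the probe first
 * returned -1, or 0 if it never did within the budget. */
static int wait_until_gone(cpu_time_probe probe, void *ctx,
                           int max_attempts, long sleep_ms)
{
    struct timespec pause_time;

    assert(max_attempts > 0 && sleep_ms >= 0);
    pause_time.tv_sec = sleep_ms / 1000;
    pause_time.tv_nsec = (sleep_ms % 1000) * 1000000L;

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (probe(ctx) == -1) {
            return attempt;              /* thread has disappeared */
        }
        if (attempt < max_attempts) {
            nanosleep(&pause_time, NULL); /* give the OS time to clean up */
        }
    }
    return 0;                            /* still visible: give up */
}

/* Stand-in probe for illustration only: reports a CPU time for the first
 * `*remaining` calls, then -1. */
static long fake_probe(void *ctx)
{
    int *remaining = (int *)ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return 1000L;
    }
    return -1;
}
```

With the stand-in probe and a counter of 3, wait_until_gone(fake_probe,
&counter, 10, 1) returns 4: three attempts still see a CPU time and the fourth
sees -1. The fixed attempt budget keeps the wait bounded even if the thread
never disappears.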

