RFR(S): 8211932: [ppc][testbug] runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads don't terminate immediately

David Holmes david.holmes at oracle.com
Wed Oct 10 12:31:47 UTC 2018


Hi Goetz,

On 10/10/2018 8:25 PM, Lindenmaier, Goetz wrote:
> Hi David,
> 
> This failure is very reliably reproducible, but only on linuxppc64 and linuxppc64le.

That doesn't really make sense to me. I would not expect the 
process/thread lifecycle management code to be different based on the 
CPU involved. This should be a simple kernel + NPTL/libc issue.

> I implemented this fix in July, just missed the RDP, and the patch has been
> used in our nightly builds since then. Since that date I haven't seen a
> single failure. We run these nightly tests with the fastdebug build, though.
> But linuxx86_64 and linuxs390x don't show the issue, nor do any of the
> other platforms. As there is no particularly high load, and because it's
> so reliably reproducible, I don't think I am reading the information of a
> thread in another process that happens to have the same thread id.
> With the output I added to the test, I see that the cpu time keeps
> increasing a bit, then stays stable for a few iterations, and then is -1.

That can also be explained by a thread-id being recycled and then the 
new thread also terminating. Granted the timing and reproducibility 
makes that unlikely.

This is quite bizarre and I don't like bizarre. :)

Are you able to apply this patch to the test and run some tests on ppc?

   if ((res = pthread_join(thread, NULL)) != 0) {
     fprintf(stderr, "TEST ERROR: pthread_join failed: %s (%d)\n", strerror(res), res);
     exit(1);
   }

+  while (pthread_kill(thread, 0) == 0) {
+    res++;
+  }
+  printf("Native thread was gone after %d iterations\n", res);
   return nativeThread;
}

Once pthread_kill returns ESRCH, pthread_getcpuclockid() should fail as
well, at least until the thread-id is recycled.

Thanks,
David

> Best regards,
>    Goetz.
> 
> 
>> -----Original Message-----
>> From: David Holmes <david.holmes at oracle.com>
>> Sent: Wednesday, 10 October 2018 01:22
>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-dev at openjdk.java.net
>> Subject: Re: RFR(S): 8211932: [ppc][testbug] runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads don't terminate immediately
>>
>> Hi Goetz,
>>
>> There is already an open bug for this issue, JDK-8208159, but it has
>> only been reproduced in a stress environment where we think thread-ids
>> are being recycled (which means waiting longer won't help). This should
>> be OS-specific, not CPU-specific, so I'm very interested to know in what
>> circumstances you see this failure.
>>
>> I created an instrumented version of the test that did a pthread_kill on
>> the target to check for ESRCH, which it got, yet we still see failures
>> in those stress environments.
>>
>> David
>>
>> On 10/10/2018 1:10 AM, Lindenmaier, Goetz wrote:
>>> Hi,
>>>
>>> On ppc, one still sees increasing thread cpu times after a thread has been joined.
>>> This makes TestTerminatedThread fail.
>>>
>>> This change gives the check a few seconds to wait until the thread disappears.
>>> Please review.
>>> http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/01/test/hotspot/jtreg/runtime/jni/terminatedThread/TestTerminatedThread.java.udiff.html
>>>
>>> Best regards,
>>>     Goetz.
>>>


More information about the hotspot-runtime-dev mailing list