RFR(S): 8211932: [ppc][testbug] runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads don't terminate immediately

David Holmes david.holmes at oracle.com
Thu Oct 11 06:02:41 UTC 2018


Hi Goetz,

On 11/10/2018 1:01 AM, Lindenmaier, Goetz wrote:
> Hi David,
> 
> I implemented your little experiment, and did 4 runs with my fix.
> I have put the relevant output here:
> http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/01/with_my_fix.txt
> 
> Your code completes in one loop.
> From my output you can see that the CPU time is increasing a little, but
> after 3-4 iterations the thread goes away.
> 
> I also did 4 runs without my fix:
> http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/01/without_my_fix.txt
> I got 3 failures, one pass.
> Also here, your code completes in one loop.

Many thanks for doing that. This is so perplexing. While adding the 
extra loops as per your fix may have solved your problem, it will make 
the recycled thread-id problem that we have seen in stress testing even 
more likely.

I may just have to kill off this part of the test. It's wasting too many 
cycles just to try and check we are graceful when encountering a 
terminated unattached thread.
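
For reference, the CPU-time readings discussed in this thread come from the per-thread CPU clock. Below is a minimal sketch, assuming Linux with glibc, of how such a reading can be taken with pthread_getcpuclockid() and clock_gettime(). The names spin, thread_cpu_ns and sample_live_thread_cpu_time are hypothetical and not taken from the test. The sketch samples only while the thread is still alive; it deliberately does not probe the thread after pthread_join(), because POSIX leaves the use of a thread ID after its lifetime has ended undefined, which is the situation the test is poking at.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

static atomic_int stop_flag;

/* Hypothetical worker: burns CPU until asked to stop. */
static void *spin(void *arg) {
  (void)arg;
  while (!atomic_load(&stop_flag)) {
    /* busy loop so that the thread accumulates CPU time */
  }
  return NULL;
}

/* Returns the CPU time consumed by 'thread' in nanoseconds, or -1 on error. */
static long long thread_cpu_ns(pthread_t thread) {
  clockid_t cid;
  struct timespec ts;
  if (pthread_getcpuclockid(thread, &cid) != 0) {
    return -1;
  }
  if (clock_gettime(cid, &ts) != 0) {
    return -1;
  }
  return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Samples a live thread twice; returns 1 if both readings are valid and
 * the second is not smaller than the first, 0 otherwise. */
int sample_live_thread_cpu_time(void) {
  pthread_t thread;
  struct timespec pause = { 0, 50 * 1000 * 1000 }; /* 50 ms */
  long long first, second;

  atomic_store(&stop_flag, 0);
  if (pthread_create(&thread, NULL, spin, NULL) != 0) {
    return 0;
  }

  first = thread_cpu_ns(thread);   /* thread is alive: ID is valid */
  nanosleep(&pause, NULL);
  second = thread_cpu_ns(thread);  /* still alive, still valid */

  atomic_store(&stop_flag, 1);
  if (pthread_join(thread, NULL) != 0) {
    return 0;
  }
  /* From here on the thread ID must not be used any more. */

  printf("cpu time: first=%lld ns, second=%lld ns\n", first, second);
  return (first >= 0 && second >= first) ? 1 : 0;
}
```

Calling sample_live_thread_cpu_time() from a main() and linking with -pthread should be enough to try it. It says nothing about what happens once the thread has been joined.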

Thanks,
David

> Best regards,
>    Goetz.
> 
> 
>> -----Original Message-----
>> From: David Holmes <david.holmes at oracle.com>
>> Sent: Mittwoch, 10. Oktober 2018 14:32
>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>;
>> hotspot-runtime-dev at openjdk.java.net
>> Subject: Re: RFR(S): 8211932: [ppc][testbug]
>> runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads
>> don't terminate immediately
>>
>> Hi Goetz,
>>
>> On 10/10/2018 8:25 PM, Lindenmaier, Goetz wrote:
>>> Hi David,
>>>
>>> This failure is very well reproducible, but only on linuxppc64 and
>>> linuxppc64le.
>>
>> That doesn't really make sense to me. I would not expect the
>> process/thread lifecycle management code to be different based on the
>> CPU involved. This should be a simple kernel + NPTL/libc issue.
>>
>>> I implemented this fix in July, just missed the RDP, and the patch has been
>>> used in our nightly builds since then. Since that date I haven't seen a single
>>> failure.  We run these nightly tests with the fastdebug build, though.
>>> But linuxx86_64 and linuxs390x don't show the issue, nor do any of the other
>>> platforms.  As there is no especially high load, and because it's that
>>> well reproducible, I don't think I am reading the information of a thread
>>> of another process with the same thread id.
>>> With the output I implemented in the test, I see that the cpu time keeps
>>> increasing a bit, then it's stable for a few iterations, and then -1.
>>
>> That can also be explained by a thread-id being recycled and then the
>> new thread also terminating. Granted, the timing and reproducibility
>> make that unlikely.
>>
>> This is quite bizarre and I don't like bizarre. :)
>>
>> Are you able to apply this patch to the test and run some tests on ppc?
>>
>>     if ((res = pthread_join(thread, NULL)) != 0) {
>>       fprintf(stderr, "TEST ERROR: pthread_join failed: %s (%d)\n",
>>               strerror(res), res);
>>       exit(1);
>>     }
>>
>> +  while (pthread_kill(thread, 0) == 0) {
>> +    res++;
>> +  }
>> +  printf("Native thread was gone after %d iterations\n", res);
>>     return nativeThread;
>> }
>>
>> Once pthread_kill gives ESRCH then so should pthread_getcpuclockid().
>> At least until the thread-id is recycled.
>>
>> Thanks,
>> David
>>
>>> Best regards,
>>>     Goetz.
>>>
>>>
>>>> -----Original Message-----
>>>> From: David Holmes <david.holmes at oracle.com>
>>>> Sent: Mittwoch, 10. Oktober 2018 01:22
>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>;
>>>> hotspot-runtime-dev at openjdk.java.net
>>>> Subject: Re: RFR(S): 8211932: [ppc][testbug]
>>>> runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads
>>>> don't terminate immediately
>>>>
>>>> Hi Goetz,
>>>>
>>>> There is already an open bug for this issue - JDK-8208159 - but it has
>>>> only reproduced in a stress environment where we think thread-ids are
>>>> being recycled (which means waiting longer won't help). This should be
>>>> OS not CPU specific so I'm very interested to know in what circumstances
>>>> you see this failure.
>>>>
>>>> I created an instrumented version of the test that did a pthread_kill on
>>>> the target to check for ESRCH - which it got - yet we still see failures
>>>> in those stress environments.
>>>>
>>>> David
>>>>
>>>> On 10/10/2018 1:10 AM, Lindenmaier, Goetz wrote:
>>>>> Hi,
>>>>>
>>>>> On ppc, one still sees increasing thread cpu times after a thread has
>>>>> been joined.
>>>>> This makes TestTerminatedThread fail.
>>>>>
>>>>> This change gives the check a few seconds to wait until the thread
>>>>> disappears.
>>>>> Please review.
>>>>> http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/01/test/hotspot/jtreg/runtime/jni/terminatedThread/TestTerminatedThread.java.udiff.html
>>>>>
>>>>> Best regards,
>>>>>      Goetz.
>>>>>
