RFR(S): 8211932: [ppc][testbug] runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads don't terminate immediately

David Holmes david.holmes at oracle.com
Thu Oct 11 11:52:41 UTC 2018


On 11/10/2018 9:23 PM, Lindenmaier, Goetz wrote:
> Hi,
> 
>> I may just have to kill off this part of the test
> You mean we should skip the tests for -1? Like this:
>     http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/02/
> ?

Something like that yes. :) The main thing this test is doing is 
ensuring we don't crash when we encounter these 
terminated-but-still-attached threads.

Thanks,
David

> Best regards,
>    Goetz.
> 
> 
>> -----Original Message-----
>> From: David Holmes <david.holmes at oracle.com>
>> Sent: Donnerstag, 11. Oktober 2018 08:03
>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-
>> dev at openjdk.java.net
>> Subject: Re: RFR(S): 8211932: [ppc][testbug]
>> runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads
>> don't terminate immediately
>>
>> Hi Goetz,
>>
>> On 11/10/2018 1:01 AM, Lindenmaier, Goetz wrote:
>>> Hi David,
>>>
>>> I implemented your little experiment, and did 4 runs with my fix.
>>> I copied you the relevant output here:
>>> http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/01/with_my_fix.txt
>>>
>>> Your code completes in one loop.
>>> From my output you can see that the CPU time is increasing a little, but
>>> after 3-4 iterations the thread goes away.
>>>
>>> I also did 4 runs without my fix:
>>> http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/01/without_my_fix.txt
>>> I got 3 failures, one pass.
>>> Also here, your code completes in one loop.
>>
>> Many thanks for doing that. This is so perplexing. While adding the
>> extra loops as per your fix may have solved your problem, it will make
>> the recycled thread-id problem that we have seen in stress testing even
>> more likely.
>>
>> I may just have to kill off this part of the test. It's wasting too many
>> cycles just to try and check we are graceful when encountering a
>> terminated unattached thread.
>>
>> Thanks,
>> David
>>
>>> Best regards,
>>>     Goetz.
>>>
>>>
>>>> -----Original Message-----
>>>> From: David Holmes <david.holmes at oracle.com>
>>>> Sent: Mittwoch, 10. Oktober 2018 14:32
>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-dev at openjdk.java.net
>>>> Subject: Re: RFR(S): 8211932: [ppc][testbug]
>>>> runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads
>>>> don't terminate immediately
>>>>
>>>> Hi Goetz,
>>>>
>>>> On 10/10/2018 8:25 PM, Lindenmaier, Goetz wrote:
>>>>> Hi David,
>>>>>
>>>>> This failure reproduces very reliably, but only on linuxppc64 and
>>>>> linuxppc64le.
>>>>
>>>> That doesn't really make sense to me. I would not expect the
>>>> process/thread lifecycle management code to be different based on the
>>>> CPU involved. This should be a simple kernel + NPTL/libc issue.
>>>>
>>>>> I implemented this fix in July but just missed the RDP, and the patch has
>>>>> been in our nightly builds since then. Since that date I haven't seen a
>>>>> single failure.  We run these nightly tests with the fastdebug build, though.
>>>>> linuxx86_64 and linuxs390x don't show the issue, nor do any of the other
>>>>> platforms.  As there is no especially high load, and because the failure is
>>>>> that reproducible, I don't think I am reading the information of a thread
>>>>> of another process that has the same thread id.
>>>>> With the output I added to the test, I see that the cpu time keeps
>>>>> increasing a bit, then it's stable for a few iterations, and then it is -1.
>>>>
>>>> That can also be explained by a thread-id being recycled and then the
>>>> new thread also terminating. Granted, the timing and reproducibility
>>>> make that unlikely.
>>>>
>>>> This is quite bizarre and I don't like bizarre. :)
>>>>
>>>> Are you able to apply this patch to the test and run some tests on ppc?
>>>>
>>>>      if ((res = pthread_join(thread, NULL)) != 0) {
>>>>        fprintf(stderr, "TEST ERROR: pthread_join failed: %s (%d)\n",
>>>>                strerror(res), res);
>>>>        exit(1);
>>>>      }
>>>>
>>>> +  while (pthread_kill(thread, 0) == 0) {
>>>> +    res++;
>>>> +  }
>>>> +  printf("Native thread was gone after %d iterations\n", res);
>>>>      return nativeThread;
>>>> }
>>>>
>>>> Once pthread_kill gives ESRCH then so should pthread_getcpuclockid().
>>>> At least until the thread-id is recycled.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> Best regards,
>>>>>      Goetz.
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>>> Sent: Mittwoch, 10. Oktober 2018 01:22
>>>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-dev at openjdk.java.net
>>>>>> Subject: Re: RFR(S): 8211932: [ppc][testbug]
>>>>>> runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads
>>>>>> don't terminate immediately
>>>>>>
>>>>>> Hi Goetz,
>>>>>>
>>>>>> There is already an open bug for this issue - JDK-8208159 - but it has
>>>>>> only reproduced in a stress environment where we think thread IDs are
>>>>>> being recycled (which means waiting longer won't help). This should be
>>>>>> OS-specific, not CPU-specific, so I'm very interested to know in what
>>>>>> circumstances you see this failure.
>>>>>>
>>>>>> I created an instrumented version of the test that did a pthread_kill on
>>>>>> the target to check for ESRCH - which it got - yet we still see failures
>>>>>> in those stress environments.
>>>>>>
>>>>>> David
>>>>>>
>>>>>> On 10/10/2018 1:10 AM, Lindenmaier, Goetz wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On ppc, one still sees increasing thread cpu times after a thread has
>>>>>>> been joined. This makes TestTerminatedThread fail.
>>>>>>>
>>>>>>> This change gives the check a few seconds to wait until the thread
>>>>>>> disappears.
>>>>>>> Please review.
>>>>>>> http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/01/test/hotspot/jtreg/runtime/jni/terminatedThread/TestTerminatedThread.java.udiff.html
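
The "wait a bit until the thread disappears" idea is a bounded polling loop. The sketch below is not the code from the webrev (the actual change is in the Java test); `poll_until`, `probe_fn` and `countdown_probe` are made-up names for illustration, and the interval and attempt count are arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Hypothetical probe type: returns nonzero once the awaited condition
 * holds (for example "the thread's cpu time now reads as -1"). */
typedef int (*probe_fn)(void *ctx);

/* Poll probe(ctx) at most max_polls times, sleeping interval_ms between
 * attempts. Returns the 1-based attempt on which the probe succeeded,
 * or -1 if it never did within the budget. */
static int poll_until(probe_fn probe, void *ctx, int max_polls, long interval_ms) {
    struct timespec pause;
    pause.tv_sec = interval_ms / 1000;
    pause.tv_nsec = (interval_ms % 1000) * 1000000L;
    for (int attempt = 1; attempt <= max_polls; attempt++) {
        if (probe(ctx)) {
            return attempt;
        }
        nanosleep(&pause, NULL);
    }
    return -1;
}

/* Stand-in probe for demonstration: decrements the int counter behind
 * ctx and succeeds on the call that brings it to zero. */
static int countdown_probe(void *ctx) {
    int *remaining = ctx;
    return --(*remaining) == 0;
}
```

With a counter starting at 3, `poll_until(countdown_probe, &n, 10, 1)` succeeds on the third attempt. As noted elsewhere in this thread, a longer wait window also gives the system more time to recycle the thread id, so the budget is a trade-off and not a guarantee.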
>>>>>>>
>>>>>>> Best regards,
>>>>>>>       Goetz.
>>>>>>>