RFR(S): 8211932: [ppc][testbug] runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads don't terminate immediately

Doerr, Martin martin.doerr at sap.com
Thu Oct 11 14:37:06 UTC 2018


Hi Götz,

thanks for the fix. Looks good to me, too.
It's a little unfortunate that we can't verify that the bean eventually returns -1, but ok.
The test still seems to fulfill its purpose, so I'm fine with it.

Best regards,
Martin


-----Original Message-----
From: hotspot-runtime-dev <hotspot-runtime-dev-bounces at openjdk.java.net> On Behalf Of David Holmes
Sent: Thursday, October 11, 2018 14:34
To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(S): 8211932: [ppc][testbug] runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads don't terminate immediately

On 11/10/2018 10:12 PM, Lindenmaier, Goetz wrote:
>> Something like that yes. :)
> Can I consider this a review ? :))

Yes :)

Thanks,
David

> Best regards,
>    Goetz.
> 
>> -----Original Message-----
>> From: David Holmes <david.holmes at oracle.com>
>> Sent: Thursday, October 11, 2018 13:53
>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-dev at openjdk.java.net
>> Subject: Re: RFR(S): 8211932: [ppc][testbug] runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads don't terminate immediately
>>
>> On 11/10/2018 9:23 PM, Lindenmaier, Goetz wrote:
>>> Hi,
>>>
>>>> I may just have to kill off this part of the test
>>> You mean we should skip the tests for -1? Like this:
>>>      http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/02/
>>> ?
>>
>> Something like that yes. :) The main thing this test is doing is
>> ensuring we don't crash when we encounter these
>> terminated-but-still-attached threads.
>>
>> Thanks,
>> David
>>
>>> Best regards,
>>>     Goetz.
>>>
>>>
>>>> -----Original Message-----
>>>> From: David Holmes <david.holmes at oracle.com>
>>>> Sent: Thursday, October 11, 2018 08:03
>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-dev at openjdk.java.net
>>>> Subject: Re: RFR(S): 8211932: [ppc][testbug] runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads don't terminate immediately
>>>>
>>>> Hi Goetz,
>>>>
>>>> On 11/10/2018 1:01 AM, Lindenmaier, Goetz wrote:
>>>>> Hi David,
>>>>>
>>>>> I implemented your little experiment, and did 4 runs with my fix.
>>>>> I copied you the relevant output here:
>>>>> http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/01/with_my_fix.txt
>>>>>
>>>>> Your code completes in one loop.
>>>>> From my output you can see that the CPU time is increasing a little, but
>>>>> after 3-4 iterations the thread goes away.
>>>>>
>>>>> I also did 4 runs without my fix:
>>>>> http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/01/without_my_fix.txt
>>>>> I got 3 failures, one pass.
>>>>> Also here, your code completes in one loop.
>>>>
>>>> Many thanks for doing that. This is so perplexing. While adding the
>>>> extra loops as per your fix may have solved your problem, it will make
>>>> the recycled thread-id problem that we have seen in stress testing even
>>>> more likely.
>>>>
>>>> I may just have to kill off this part of the test. It's wasting too many
>>>> cycles just to try and check we are graceful when encountering a
>>>> terminated unattached thread.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> Best regards,
>>>>>      Goetz.
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>>> Sent: Wednesday, October 10, 2018 14:32
>>>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-dev at openjdk.java.net
>>>>>> Subject: Re: RFR(S): 8211932: [ppc][testbug] runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads don't terminate immediately
>>>>>>
>>>>>> Hi Goetz,
>>>>>>
>>>>>> On 10/10/2018 8:25 PM, Lindenmaier, Goetz wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> This failure reproduces very reliably, but only on linuxppc64 and
>>>>>>> linuxppc64le.
>>>>>>
>>>>>> That doesn't really make sense to me. I would not expect the
>>>>>> process/thread lifecycle management code to be different based on the
>>>>>> CPU involved. This should be a simple kernel + NPTL/libc issue.
>>>>>>
>>>>>>> I implemented this fix in July, just missed the RDP, and the patch has been
>>>>>>> used in our nightly builds since then. Since that date I haven't seen a
>>>>>>> single failure. We run these nightly tests with the fastdebug build, though.
>>>>>>> Neither linuxx86_64, linuxs390x, nor any of the other platforms show the
>>>>>>> issue. As there is no particularly high load, and because it reproduces
>>>>>>> so reliably, I don't think I am reading the information of a thread of
>>>>>>> another process with the same thread id.
>>>>>>> With the output I added to the test, I see that the CPU time keeps
>>>>>>> increasing a bit, then it's stable for a few iterations, and then it is -1.
>>>>>>
>>>>>> That can also be explained by a thread-id being recycled and then the
>>>>>> new thread also terminating. Granted the timing and reproducibility
>>>>>> makes that unlikely.
>>>>>>
>>>>>> This is quite bizarre and I don't like bizarre. :)
>>>>>>
>>>>>> Are you able to apply this patch to the test and run some tests on ppc?
>>>>>>
>>>>>>       if ((res = pthread_join(thread, NULL)) != 0) {
>>>>>>         fprintf(stderr, "TEST ERROR: pthread_join failed: %s (%d)\n",
>>>>>>                 strerror(res), res);
>>>>>>         exit(1);
>>>>>>       }
>>>>>>
>>>>>> +  while (pthread_kill(thread, 0) == 0) {
>>>>>> +    res++;
>>>>>> +  }
>>>>>> +  printf("Native thread was gone after %d iterations\n", res);
>>>>>>       return nativeThread;
>>>>>> }
>>>>>>
>>>>>> Once pthread_kill gives ESRCH then so should pthread_getcpuclockid().
>>>>>> At least until the thread-id is recycled.
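
To make the relationship between the thread's CPU clock and the -1 result concrete, here is a minimal C sketch of reading a thread's CPU time through `pthread_getcpuclockid()` and `clock_gettime()`. The helper name `thread_cpu_time_ns` is hypothetical and not taken from the test or from HotSpot; mapping any failure (such as ESRCH) to -1 is an assumption made to mirror the behaviour discussed in this thread. The sketch assumes Linux with glibc and should only be called with a thread ID that is still valid, since POSIX leaves the use of a thread ID after a successful join undefined.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Hypothetical helper (not from the test sources): returns the CPU time
 * consumed by `thread` in nanoseconds, or -1 if the thread's CPU clock
 * cannot be obtained or read (for example when the lookup reports ESRCH). */
static long long thread_cpu_time_ns(pthread_t thread) {
  clockid_t clock_id;
  struct timespec ts;

  /* Look up the per-thread CPU-time clock; non-zero means failure. */
  if (pthread_getcpuclockid(thread, &clock_id) != 0) {
    return -1;
  }
  /* Read the clock; a failure here is also reported as -1. */
  if (clock_gettime(clock_id, &ts) != 0) {
    return -1;
  }
  return (long long)ts.tv_sec * 1000000000LL + (long long)ts.tv_nsec;
}
```

Calling `thread_cpu_time_ns(pthread_self())` from a running thread should yield a non-negative value; the -1 path is only reached when the clock lookup or read fails.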
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>> Best regards,
>>>>>>>       Goetz.
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>>>>> Sent: Wednesday, October 10, 2018 01:22
>>>>>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-dev at openjdk.java.net
>>>>>>>> Subject: Re: RFR(S): 8211932: [ppc][testbug] runtime/jni/terminatedThread/TestTerminatedThread.java fails as threads don't terminate immediately
>>>>>>>>
>>>>>>>> Hi Goetz,
>>>>>>>>
>>>>>>>> There is already an open bug for this issue - JDK-8208159 - but it has
>>>>>>>> only reproduced in a stress environment where we think thread-ids are
>>>>>>>> being recycled (which means waiting longer won't help). This should be
>>>>>>>> OS-specific, not CPU-specific, so I'm very interested to know in what
>>>>>>>> circumstances you see this failure.
>>>>>>>>
>>>>>>>> I created an instrumented version of the test that did a pthread_kill on
>>>>>>>> the target to check for ESRCH - which it got - yet we still see failures
>>>>>>>> in those stress environments.
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>> On 10/10/2018 1:10 AM, Lindenmaier, Goetz wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On ppc, one still sees increasing thread cpu times after a thread has
>>>>>>>>> joined. This makes TestTerminatedThread fail.
>>>>>>>>>
>>>>>>>>> This change gives the check a few seconds to wait until the thread
>>>>>>>>> disappears.
>>>>>>>>> Please review.
>>>>>>>>> http://cr.openjdk.java.net/~goetz/wr18/8211931-terminatedThrd/01/test/hotspot/jtreg/runtime/jni/terminatedThread/TestTerminatedThread.java.udiff.html
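
The idea of giving the check a bounded amount of time can be sketched as a simple polling loop. The actual change is to the Java test and is not reproduced here; the C sketch below only illustrates the technique, and every name in it (`cpu_time_probe`, `wait_until_gone`, `fake_probe`) is hypothetical. The probe is assumed to return -1 once the thread is no longer visible. As noted elsewhere in this thread, waiting longer does not help when a thread-id has been recycled.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Hypothetical probe type: returns a CPU time, or -1 once the thread is gone. */
typedef long long (*cpu_time_probe)(void *ctx);

/* Polls `probe` up to `max_attempts` times, sleeping `delay_ms` between polls.
 * Returns the 1-based attempt on which the probe first returned -1,
 * or -1 if the thread was still visible after the last attempt. */
static int wait_until_gone(cpu_time_probe probe, void *ctx,
                           int max_attempts, long delay_ms) {
  struct timespec delay = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
  for (int attempt = 1; attempt <= max_attempts; attempt++) {
    if (probe(ctx) == -1) {
      return attempt;
    }
    nanosleep(&delay, NULL);
  }
  return -1;
}

/* Stand-in probe for illustration: reports a CPU time for the first
 * `*remaining` calls, then -1. */
static long long fake_probe(void *ctx) {
  int *remaining = ctx;
  if (*remaining > 0) {
    (*remaining)--;
    return 1000;
  }
  return -1;
}
```

With `fake_probe` and a counter of 3, `wait_until_gone(fake_probe, &counter, 10, 1)` returns 4, matching the pattern of a few non-negative readings followed by -1. A cap on the number of attempts keeps the test from hanging if the thread never disappears.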
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>>        Goetz.
>>>>>>>>>

