RFR (11): 8205878: pthread_getcpuclockid is expected to return 0 code

David Holmes david.holmes at oracle.com
Sun Jul 8 23:58:32 UTC 2018


tl;dr skip the new regression test on Solaris

New webrev:

http://cr.openjdk.java.net/~dholmes/8205878/webrev.v3/

This excludes the test from running on Solaris, so the makefile doesn't 
bother compiling this native test and the Java part of the test adds:

! * @requires os.family != "windows" & os.family != "solaris"
   * @summary Basic test of Thread and ThreadMXBean queries on a natively
   *          attached thread that has failed to detach before terminating.
+ * @comment The native code only supports POSIX so no windows testing; also
+ *          we have to skip solaris as a terminating thread that fails to
+ *          detach will hit an infinite loop due to TLS destructor 
issues - see
+ *          comments in JDK-8156708

Note this means that Solaris is not affected by the original issue 
because a still-attached native thread can't actually terminate due to 
the TLS destructor infinite-loop issue.

Thanks,
David

On 6/07/2018 6:07 PM, David Holmes wrote:
> <sigh> The new test is hanging on Solaris. I just discovered we don't 
> run these tests on Solaris until tier4.
> 
> David
> 
> On 6/07/2018 8:40 AM, David Holmes wrote:
>> Hi Chris,
>>
>> Thanks for looking at this.
>>
>> Updated webrev:
>>
>> http://cr.openjdk.java.net/~dholmes/8205878/webrev.v2/
>>
>> Only real changes in ji05t001.c. (And fixed typo in the new test)
>>
>> More below ...
>>
>> On 6/07/2018 7:55 AM, Chris Plummer wrote:
>>> Hi David,
>>>
>>> Solaris problems aside, overall it looks fine. Some minor things I 
>>> noted:
>>>
>>> I noticed that exitCode is never modified in agentA() or agentB(), so 
>>> there isn't much point to having it. If you reach the bottom of the 
>>> function, it passed, so PASSED can be returned. The code would be 
>>> more clear if it did this. As-is it is implied that you can reach the 
>>> bottom when it fails.
>>
>> I resisted any and all urges to do any kind of unrelated code cleanup 
>> in the tests - once you start you may end up doing a full rewrite.
>>
>>> Is detaching the threads along the failure paths really needed? 
>>> exit() is called, so this would seem to make it unnecessary.
>>
>> You're right that isn't necessary. I'll remove the changes from before 
>> the exits in ji05t001.c
>>
>>> I prefer assignments not to be embedded inside the "if" condition. 
>>> The DetachCurrentThread code in THREAD_return() is much more readable 
>>> than the similar code in agentA() and agentB().
>>
>> It's an existing style already used in that test e.g.
>>
>>   287     if ((res =
>>   288             JNI_ENV_PTR(vm)->AttachCurrentThread(
>>   289                 JNI_ENV_ARG(vm, (void **) &env), (void *) 0)) != 
>> 0) {
>>
>> and I don't mind it, so I'd prefer not to change it.
>>
>>> In the test:
>>>
>>>    54         // Generally as long as we don't crash of throw unexpected
>>>    55         // exceptions then the test passes. In some cases we 
>>> know exactly
>>>
>>> "of" should be "or".
>>
>> Well spotted. Thanks.
>>
>>> Shouldn't you be catching exceptions for all the Thread methods you 
>>> are calling? Otherwise the test will exit if one is thrown, and the 
>>> above comment indicates that you don't want this.
>>
>> I'm not expecting there to be any exceptions from any of the called 
>> methods. That would potentially indicate a problem in handling the 
>> terminated native thread, so would indicate a test failure.
>>
>>> Don't we normally put these tests in a package?
>>
>> Doesn't seem to be any hard and fast rule. I only uses packages when 
>> they are important for the test. In runtime we have 905 java files and 
>> only 116 have a package statement. It varies elsewhere.
>>
>> Thanks,
>> David
>>
>>> thanks,
>>>
>>> Chris
>>>
>>> On 7/5/18 2:58 AM, David Holmes wrote:
>>>> <sigh> Solaris compiler complains about doing a return from inside a 
>>>> do-while loop. I'll have to rework part of the fix tomorrow.
>>>>
>>>> David
>>>>
>>>> On 5/07/2018 6:19 PM, David Holmes wrote:
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8205878
>>>>> Webrev: http://cr.openjdk.java.net/~dholmes/8205878/webrev/
>>>>>
>>>>> Problem:
>>>>>
>>>>> The tests create native threads that attach to the VM through 
>>>>> JNI_AttachCurrentThread but which then terminate without detaching 
>>>>> themselves. When the VM exits and we're using Flight Recorder 
>>>>> "dumponexit" this leads to a call to VM_PrintThreads that in part 
>>>>> wants to print the per-thread CPU usage. When we encounter the 
>>>>> threads that have terminated already the low level 
>>>>> pthread_getcpuclockid calls returns ESRCH but the code doesn't 
>>>>> expect that and so fails an assert in debug mode and can SEGV in 
>>>>> product mode.
>>>>>
>>>>> Solution:
>>>>>
>>>>> Serviceability-side: fix the tests
>>>>>
>>>>> Change the tests so that the threads detach before terminating. The 
>>>>> two tests are (surprisingly) written in completely different 
>>>>> styles, so the solution also takes on two different styles.
>>>>>
>>>>> Runtime-side: make the VM more robust in the fact of JNI attached 
>>>>> threads that terminate before detaching, and add a regression test
>>>>>
>>>>> I took a good look at the low-level code for interacting with 
>>>>> arbitrary threads and as far as I can see the problem only exists 
>>>>> for this one case of pthread_getcpuclockid on Linux. Elsewhere the 
>>>>> potential for a library call failure just reports an error value 
>>>>> (such as -1 for the cpu time used).
>>>>>
>>>>> So the fix is simply to allow for ESRCH when calling 
>>>>> pthread_getcpuclockid and return -1 for the cpu usage in that case.
>>>>>
>>>>> I created a new regression test to create a new native thread, 
>>>>> attach it and then let it terminate while still attached. The java 
>>>>> code then calls various Thread and ThreadMXBean functions on it to 
>>>>> ensure there are no crashes or unexpected exceptions.
>>>>>
>>>>> Testing:
>>>>>   - old tests with fixed run-time
>>>>>   - old run-time with fixed tests
>>>>>   - mach tier4 (which exposed the problem - that's where we enable 
>>>>> Flight recorder for the tests) [in progress]
>>>>>   - mach5 tier 1-3 for good measure [in progress]
>>>>>   - new regression test
>>>>>
>>>>> Thanks,
>>>>> David
>>>
>>>
>>>


More information about the serviceability-dev mailing list