RFR (11): 8205878: pthread_getcpuclockid is expected to return 0 code

David Holmes david.holmes at oracle.com
Tue Jul 10 00:14:44 UTC 2018


Thanks for looking at this Coleen.

David

On 10/07/2018 10:11 AM, coleen.phillimore at oracle.com wrote:
> 
> This looks good!  Thank you for fixing these failures.
> Coleen
> 
> On 7/9/18 7:22 PM, David Holmes wrote:
>> Adding back runtime
>>
>> On 10/07/2018 8:45 AM, Alex Menkov wrote:
>>> +1
>>
>> Thanks for looking at this Alex!
>>
>>> couple minor notes (no need to resend review)
>>
>> Webrev updated in place (v3) for others to see.
>>
>>> src/hotspot/os/linux/os_linux.cpp
>>> please replace
>>>
>>> 5581     }
>>> 5582     else {
>>>
>>> with
>>>      } else {
>>
>> Done.
>>
>>>
>>> test/hotspot/jtreg/runtime/jni/terminatedThread/libterminatedThread.c
>>> please fix error reporting (I suppose you mean "TEST ERROR: 
>>> pthread_create failed"/"TEST ERROR: pthread_join failed"):
>>>
>>>    85   if ((res = pthread_create(&thread, NULL, thread_start, NULL)) 
>>> != 0) {
>>>    86     fprintf(stderr, "TEST ERROR: pthread_created failed: %s 
>>> (%d)\n", strerror(res), res);
>>>    87     exit(1);
>>>    88   }
>>>    89
>>>    90   if ((res = pthread_join(thread, NULL)) != 0) {
>>>    91     fprintf(stderr, "TEST ERROR: pthread_created failed: %s 
>>> (%d)\n", strerror(res), res);
>>>    92     exit(1);
>>>    93   }
>>
>> Fixed - well spotted!
>>
>> Thanks,
>> David
>>
>>> --alex
>>>
>>> On 07/09/2018 15:17, David Holmes wrote:
>>>> Thanks Chris!
>>>>
>>>> Can I please get a second review.
>>>>
>>>> David
>>>>
>>>> On 10/07/2018 7:50 AM, Chris Plummer wrote:
>>>>> On 7/9/18 2:41 PM, David Holmes wrote:
>>>>>> Hi Chris,
>>>>>>
>>>>>> On 10/07/2018 4:22 AM, Chris Plummer wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> Would it be better to problem list this test on solaris using 
>>>>>>> JDK-8156708. That way when JDK-8156708 is fixed it can come off 
>>>>>>> the problem list and start executing on solaris.
>>>>>>
>>>>>> JDK-8156708 is already fixed - it's a dup of JDK-8154715. We could 
>>>>>> only fix this for VM created threads. The general problem of TLS 
>>>>>> destructors looping if a thread terminates without detaching from 
>>>>>> the VM is not solvable - other than by not using TLS in the VM.
>>>>> Ok, I misunderstood your comments in the test.
>>>>>
>>>>> Changes look fine.
>>>>>
>>>>> Chris
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>> thanks,
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>> On 7/8/18 4:58 PM, David Holmes wrote:
>>>>>>>> tl;dr skip the new regression test on Solaris
>>>>>>>>
>>>>>>>> New webrev:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dholmes/8205878/webrev.v3/
>>>>>>>>
>>>>>>>> This excludes the test from running on Solaris, so the makefile 
>>>>>>>> doesn't bother compiling this native test and the Java part of 
>>>>>>>> the test adds:
>>>>>>>>
>>>>>>>> ! * @requires os.family != "windows" & os.family != "solaris"
>>>>>>>>   * @summary Basic test of Thread and ThreadMXBean queries on a 
>>>>>>>> natively
>>>>>>>>   *          attached thread that has failed to detach before 
>>>>>>>> terminating.
>>>>>>>> + * @comment The native code only supports POSIX so no windows 
>>>>>>>> testing; also
>>>>>>>> + *          we have to skip solaris as a terminating thread 
>>>>>>>> that fails to
>>>>>>>> + *          detach will hit an infinite loop due to TLS 
>>>>>>>> destructor issues - see
>>>>>>>> + *          comments in JDK-8156708
>>>>>>>>
>>>>>>>> Note this means that Solaris is not affected by the original 
>>>>>>>> issue because a still-attached native thread can't actually 
>>>>>>>> terminate due to the TLS destructor infinite-loop issue.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>> On 6/07/2018 6:07 PM, David Holmes wrote:
>>>>>>>>> <sigh> The new test is hanging on Solaris. I just discovered we 
>>>>>>>>> don't run these tests on Solaris until tier4.
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>> On 6/07/2018 8:40 AM, David Holmes wrote:
>>>>>>>>>> Hi Chris,
>>>>>>>>>>
>>>>>>>>>> Thanks for looking at this.
>>>>>>>>>>
>>>>>>>>>> Updated webrev:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~dholmes/8205878/webrev.v2/
>>>>>>>>>>
>>>>>>>>>> Only real changes in ji05t001.c. (And fixed typo in the new test)
>>>>>>>>>>
>>>>>>>>>> More below ...
>>>>>>>>>>
>>>>>>>>>> On 6/07/2018 7:55 AM, Chris Plummer wrote:
>>>>>>>>>>> Hi David,
>>>>>>>>>>>
>>>>>>>>>>> Solaris problems aside, overall it looks fine. Some minor 
>>>>>>>>>>> things I noted:
>>>>>>>>>>>
>>>>>>>>>>> I noticed that exitCode is never modified in agentA() or 
>>>>>>>>>>> agentB(), so there isn't much point to having it. If you 
>>>>>>>>>>> reach the bottom of the function, it passed, so PASSED can be 
>>>>>>>>>>> returned. The code would be more clear if it did this. As-is 
>>>>>>>>>>> it is implied that you can reach the bottom when it fails.
>>>>>>>>>>
>>>>>>>>>> I resisted any and all urges to do any kind of unrelated code 
>>>>>>>>>> cleanup in the tests - once you start you may end up doing a 
>>>>>>>>>> full rewrite.
>>>>>>>>>>
>>>>>>>>>>> Is detaching the threads along the failure paths really 
>>>>>>>>>>> needed? exit() is called, so this would seem to make it 
>>>>>>>>>>> unnecessary.
>>>>>>>>>>
>>>>>>>>>> You're right that isn't necessary. I'll remove the changes 
>>>>>>>>>> from before the exits in ji05t001.c
>>>>>>>>>>
>>>>>>>>>>> I prefer assignments not to be embedded inside the "if" 
>>>>>>>>>>> condition. The DetachCurrentThread code in THREAD_return() is 
>>>>>>>>>>> much more readable than the similar code in agentA() and 
>>>>>>>>>>> agentB().
>>>>>>>>>>
>>>>>>>>>> It's an existing style already used in that test e.g.
>>>>>>>>>>
>>>>>>>>>>   287     if ((res =
>>>>>>>>>>   288 JNI_ENV_PTR(vm)->AttachCurrentThread(
>>>>>>>>>>   289                 JNI_ENV_ARG(vm, (void **) &env), (void 
>>>>>>>>>> *) 0)) != 0) {
>>>>>>>>>>
>>>>>>>>>> and I don't mind it, so I'd prefer not to change it.
>>>>>>>>>>
>>>>>>>>>>> In the test:
>>>>>>>>>>>
>>>>>>>>>>>    54         // Generally as long as we don't crash of throw 
>>>>>>>>>>> unexpected
>>>>>>>>>>>    55         // exceptions then the test passes. In some 
>>>>>>>>>>> cases we know exactly
>>>>>>>>>>>
>>>>>>>>>>> "of" should be "or".
>>>>>>>>>>
>>>>>>>>>> Well spotted. Thanks.
>>>>>>>>>>
>>>>>>>>>>> Shouldn't you be catching exceptions for all the Thread 
>>>>>>>>>>> methods you are calling? Otherwise the test will exit if one 
>>>>>>>>>>> is thrown, and the above comment indicates that you don't 
>>>>>>>>>>> want this.
>>>>>>>>>>
>>>>>>>>>> I'm not expecting there to be any exceptions from any of the 
>>>>>>>>>> called methods. That would potentially indicate a problem in 
>>>>>>>>>> handling the terminated native thread, so would indicate a 
>>>>>>>>>> test failure.
>>>>>>>>>>
>>>>>>>>>>> Don't we normally put these tests in a package?
>>>>>>>>>>
>>>>>>>>>> Doesn't seem to be any hard and fast rule. I only uses 
>>>>>>>>>> packages when they are important for the test. In runtime we 
>>>>>>>>>> have 905 java files and only 116 have a package statement. It 
>>>>>>>>>> varies elsewhere.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>>> thanks,
>>>>>>>>>>>
>>>>>>>>>>> Chris
>>>>>>>>>>>
>>>>>>>>>>> On 7/5/18 2:58 AM, David Holmes wrote:
>>>>>>>>>>>> <sigh> Solaris compiler complains about doing a return from 
>>>>>>>>>>>> inside a do-while loop. I'll have to rework part of the fix 
>>>>>>>>>>>> tomorrow.
>>>>>>>>>>>>
>>>>>>>>>>>> David
>>>>>>>>>>>>
>>>>>>>>>>>> On 5/07/2018 6:19 PM, David Holmes wrote:
>>>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8205878
>>>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~dholmes/8205878/webrev/
>>>>>>>>>>>>>
>>>>>>>>>>>>> Problem:
>>>>>>>>>>>>>
>>>>>>>>>>>>> The tests create native threads that attach to the VM 
>>>>>>>>>>>>> through JNI_AttachCurrentThread but which then terminate 
>>>>>>>>>>>>> without detaching themselves. When the VM exits and we're 
>>>>>>>>>>>>> using Flight Recorder "dumponexit" this leads to a call to 
>>>>>>>>>>>>> VM_PrintThreads that in part wants to print the per-thread 
>>>>>>>>>>>>> CPU usage. When we encounter the threads that have 
>>>>>>>>>>>>> terminated already the low level pthread_getcpuclockid 
>>>>>>>>>>>>> calls returns ESRCH but the code doesn't expect that and so 
>>>>>>>>>>>>> fails an assert in debug mode and can SEGV in product mode.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Solution:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Serviceability-side: fix the tests
>>>>>>>>>>>>>
>>>>>>>>>>>>> Change the tests so that the threads detach before 
>>>>>>>>>>>>> terminating. The two tests are (surprisingly) written in 
>>>>>>>>>>>>> completely different styles, so the solution also takes on 
>>>>>>>>>>>>> two different styles.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Runtime-side: make the VM more robust in the fact of JNI 
>>>>>>>>>>>>> attached threads that terminate before detaching, and add a 
>>>>>>>>>>>>> regression test
>>>>>>>>>>>>>
>>>>>>>>>>>>> I took a good look at the low-level code for interacting 
>>>>>>>>>>>>> with arbitrary threads and as far as I can see the problem 
>>>>>>>>>>>>> only exists for this one case of pthread_getcpuclockid on 
>>>>>>>>>>>>> Linux. Elsewhere the potential for a library call failure 
>>>>>>>>>>>>> just reports an error value (such as -1 for the cpu time 
>>>>>>>>>>>>> used).
>>>>>>>>>>>>>
>>>>>>>>>>>>> So the fix is simply to allow for ESRCH when calling 
>>>>>>>>>>>>> pthread_getcpuclockid and return -1 for the cpu usage in 
>>>>>>>>>>>>> that case.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I created a new regression test to create a new native 
>>>>>>>>>>>>> thread, attach it and then let it terminate while still 
>>>>>>>>>>>>> attached. The java code then calls various Thread and 
>>>>>>>>>>>>> ThreadMXBean functions on it to ensure there are no crashes 
>>>>>>>>>>>>> or unexpected exceptions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Testing:
>>>>>>>>>>>>>   - old tests with fixed run-time
>>>>>>>>>>>>>   - old run-time with fixed tests
>>>>>>>>>>>>>   - mach tier4 (which exposed the problem - that's where we 
>>>>>>>>>>>>> enable Flight recorder for the tests) [in progress]
>>>>>>>>>>>>>   - mach5 tier 1-3 for good measure [in progress]
>>>>>>>>>>>>>   - new regression test
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
> 


More information about the serviceability-dev mailing list