RFR (11): 8205878: pthread_getcpuclockid is expected to return 0 code

coleen.phillimore at oracle.com coleen.phillimore at oracle.com
Tue Jul 10 00:11:34 UTC 2018


This looks good!  Thank you for fixing these failures.
Coleen

On 7/9/18 7:22 PM, David Holmes wrote:
> Adding back runtime
>
> On 10/07/2018 8:45 AM, Alex Menkov wrote:
>> +1
>
> Thanks for looking at this Alex!
>
>> couple minor notes (no need to resend review)
>
> Webrev updated in place (v3) for others to see.
>
>> src/hotspot/os/linux/os_linux.cpp
>> please replace
>>
>> 5581     }
>> 5582     else {
>>
>> with
>>      } else {
>
> Done.
>
>>
>> test/hotspot/jtreg/runtime/jni/terminatedThread/libterminatedThread.c
>> please fix error reporting (I suppose you mean "TEST ERROR: 
>> pthread_create failed"/"TEST ERROR: pthread_join failed"):
>>
>>    85   if ((res = pthread_create(&thread, NULL, thread_start, NULL)) 
>> != 0) {
>>    86     fprintf(stderr, "TEST ERROR: pthread_created failed: %s 
>> (%d)\n", strerror(res), res);
>>    87     exit(1);
>>    88   }
>>    89
>>    90   if ((res = pthread_join(thread, NULL)) != 0) {
>>    91     fprintf(stderr, "TEST ERROR: pthread_created failed: %s 
>> (%d)\n", strerror(res), res);
>>    92     exit(1);
>>    93   }
>
> Fixed - well spotted!
>
> Thanks,
> David
>
>> --alex
>>
>> On 07/09/2018 15:17, David Holmes wrote:
>>> Thanks Chris!
>>>
>>> Can I please get a second review.
>>>
>>> David
>>>
>>> On 10/07/2018 7:50 AM, Chris Plummer wrote:
>>>> On 7/9/18 2:41 PM, David Holmes wrote:
>>>>> Hi Chris,
>>>>>
>>>>> On 10/07/2018 4:22 AM, Chris Plummer wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> Would it be better to problem list this test on solaris using 
>>>>>> JDK-8156708. That way when JDK-8156708 is fixed it can come off 
>>>>>> the problem list and start executing on solaris.
>>>>>
>>>>> JDK-8156708 is already fixed - it's a dup of JDK-8154715. We could 
>>>>> only fix this for VM created threads. The general problem of TLS 
>>>>> destructors looping if a thread terminates without detaching from 
>>>>> the VM is not solvable - other than by not using TLS in the VM.
>>>> Ok, I misunderstood your comments in the test.
>>>>
>>>> Changes look fine.
>>>>
>>>> Chris
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>> On 7/8/18 4:58 PM, David Holmes wrote:
>>>>>>> tl;dr skip the new regression test on Solaris
>>>>>>>
>>>>>>> New webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dholmes/8205878/webrev.v3/
>>>>>>>
>>>>>>> This excludes the test from running on Solaris, so the makefile 
>>>>>>> doesn't bother compiling this native test and the Java part of 
>>>>>>> the test adds:
>>>>>>>
>>>>>>> ! * @requires os.family != "windows" & os.family != "solaris"
>>>>>>>   * @summary Basic test of Thread and ThreadMXBean queries on a 
>>>>>>> natively
>>>>>>>   *          attached thread that has failed to detach before 
>>>>>>> terminating.
>>>>>>> + * @comment The native code only supports POSIX so no windows 
>>>>>>> testing; also
>>>>>>> + *          we have to skip solaris as a terminating thread 
>>>>>>> that fails to
>>>>>>> + *          detach will hit an infinite loop due to TLS 
>>>>>>> destructor issues - see
>>>>>>> + *          comments in JDK-8156708
>>>>>>>
>>>>>>> Note this means that Solaris is not affected by the original 
>>>>>>> issue because a still-attached native thread can't actually 
>>>>>>> terminate due to the TLS destructor infinite-loop issue.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>> On 6/07/2018 6:07 PM, David Holmes wrote:
>>>>>>>> <sigh> The new test is hanging on Solaris. I just discovered we 
>>>>>>>> don't run these tests on Solaris until tier4.
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>> On 6/07/2018 8:40 AM, David Holmes wrote:
>>>>>>>>> Hi Chris,
>>>>>>>>>
>>>>>>>>> Thanks for looking at this.
>>>>>>>>>
>>>>>>>>> Updated webrev:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dholmes/8205878/webrev.v2/
>>>>>>>>>
>>>>>>>>> Only real changes in ji05t001.c. (And fixed typo in the new test)
>>>>>>>>>
>>>>>>>>> More below ...
>>>>>>>>>
>>>>>>>>> On 6/07/2018 7:55 AM, Chris Plummer wrote:
>>>>>>>>>> Hi David,
>>>>>>>>>>
>>>>>>>>>> Solaris problems aside, overall it looks fine. Some minor 
>>>>>>>>>> things I noted:
>>>>>>>>>>
>>>>>>>>>> I noticed that exitCode is never modified in agentA() or 
>>>>>>>>>> agentB(), so there isn't much point to having it. If you 
>>>>>>>>>> reach the bottom of the function, it passed, so PASSED can be 
>>>>>>>>>> returned. The code would be more clear if it did this. As-is 
>>>>>>>>>> it is implied that you can reach the bottom when it fails.
>>>>>>>>>
>>>>>>>>> I resisted any and all urges to do any kind of unrelated code 
>>>>>>>>> cleanup in the tests - once you start you may end up doing a 
>>>>>>>>> full rewrite.
>>>>>>>>>
>>>>>>>>>> Is detaching the threads along the failure paths really 
>>>>>>>>>> needed? exit() is called, so this would seem to make it 
>>>>>>>>>> unnecessary.
>>>>>>>>>
>>>>>>>>> You're right that isn't necessary. I'll remove the changes 
>>>>>>>>> from before the exits in ji05t001.c
>>>>>>>>>
>>>>>>>>>> I prefer assignments not to be embedded inside the "if" 
>>>>>>>>>> condition. The DetachCurrentThread code in THREAD_return() is 
>>>>>>>>>> much more readable than the similar code in agentA() and 
>>>>>>>>>> agentB().
>>>>>>>>>
>>>>>>>>> It's an existing style already used in that test e.g.
>>>>>>>>>
>>>>>>>>>   287     if ((res =
>>>>>>>>>   288 JNI_ENV_PTR(vm)->AttachCurrentThread(
>>>>>>>>>   289                 JNI_ENV_ARG(vm, (void **) &env), (void 
>>>>>>>>> *) 0)) != 0) {
>>>>>>>>>
>>>>>>>>> and I don't mind it, so I'd prefer not to change it.
>>>>>>>>>
>>>>>>>>>> In the test:
>>>>>>>>>>
>>>>>>>>>>    54         // Generally as long as we don't crash of throw 
>>>>>>>>>> unexpected
>>>>>>>>>>    55         // exceptions then the test passes. In some 
>>>>>>>>>> cases we know exactly
>>>>>>>>>>
>>>>>>>>>> "of" should be "or".
>>>>>>>>>
>>>>>>>>> Well spotted. Thanks.
>>>>>>>>>
>>>>>>>>>> Shouldn't you be catching exceptions for all the Thread 
>>>>>>>>>> methods you are calling? Otherwise the test will exit if one 
>>>>>>>>>> is thrown, and the above comment indicates that you don't 
>>>>>>>>>> want this.
>>>>>>>>>
>>>>>>>>> I'm not expecting there to be any exceptions from any of the 
>>>>>>>>> called methods. That would potentially indicate a problem in 
>>>>>>>>> handling the terminated native thread, so would indicate a 
>>>>>>>>> test failure.
>>>>>>>>>
>>>>>>>>>> Don't we normally put these tests in a package?
>>>>>>>>>
>>>>>>>>> Doesn't seem to be any hard and fast rule. I only uses 
>>>>>>>>> packages when they are important for the test. In runtime we 
>>>>>>>>> have 905 java files and only 116 have a package statement. It 
>>>>>>>>> varies elsewhere.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>> thanks,
>>>>>>>>>>
>>>>>>>>>> Chris
>>>>>>>>>>
>>>>>>>>>> On 7/5/18 2:58 AM, David Holmes wrote:
>>>>>>>>>>> <sigh> Solaris compiler complains about doing a return from 
>>>>>>>>>>> inside a do-while loop. I'll have to rework part of the fix 
>>>>>>>>>>> tomorrow.
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>> On 5/07/2018 6:19 PM, David Holmes wrote:
>>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8205878
>>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~dholmes/8205878/webrev/
>>>>>>>>>>>>
>>>>>>>>>>>> Problem:
>>>>>>>>>>>>
>>>>>>>>>>>> The tests create native threads that attach to the VM 
>>>>>>>>>>>> through JNI_AttachCurrentThread but which then terminate 
>>>>>>>>>>>> without detaching themselves. When the VM exits and we're 
>>>>>>>>>>>> using Flight Recorder "dumponexit" this leads to a call to 
>>>>>>>>>>>> VM_PrintThreads that in part wants to print the per-thread 
>>>>>>>>>>>> CPU usage. When we encounter the threads that have 
>>>>>>>>>>>> terminated already the low level pthread_getcpuclockid 
>>>>>>>>>>>> calls returns ESRCH but the code doesn't expect that and so 
>>>>>>>>>>>> fails an assert in debug mode and can SEGV in product mode.
>>>>>>>>>>>>
>>>>>>>>>>>> Solution:
>>>>>>>>>>>>
>>>>>>>>>>>> Serviceability-side: fix the tests
>>>>>>>>>>>>
>>>>>>>>>>>> Change the tests so that the threads detach before 
>>>>>>>>>>>> terminating. The two tests are (surprisingly) written in 
>>>>>>>>>>>> completely different styles, so the solution also takes on 
>>>>>>>>>>>> two different styles.
>>>>>>>>>>>>
>>>>>>>>>>>> Runtime-side: make the VM more robust in the fact of JNI 
>>>>>>>>>>>> attached threads that terminate before detaching, and add a 
>>>>>>>>>>>> regression test
>>>>>>>>>>>>
>>>>>>>>>>>> I took a good look at the low-level code for interacting 
>>>>>>>>>>>> with arbitrary threads and as far as I can see the problem 
>>>>>>>>>>>> only exists for this one case of pthread_getcpuclockid on 
>>>>>>>>>>>> Linux. Elsewhere the potential for a library call failure 
>>>>>>>>>>>> just reports an error value (such as -1 for the cpu time 
>>>>>>>>>>>> used).
>>>>>>>>>>>>
>>>>>>>>>>>> So the fix is simply to allow for ESRCH when calling 
>>>>>>>>>>>> pthread_getcpuclockid and return -1 for the cpu usage in 
>>>>>>>>>>>> that case.
>>>>>>>>>>>>
>>>>>>>>>>>> I created a new regression test to create a new native 
>>>>>>>>>>>> thread, attach it and then let it terminate while still 
>>>>>>>>>>>> attached. The java code then calls various Thread and 
>>>>>>>>>>>> ThreadMXBean functions on it to ensure there are no crashes 
>>>>>>>>>>>> or unexpected exceptions.
>>>>>>>>>>>>
>>>>>>>>>>>> Testing:
>>>>>>>>>>>>   - old tests with fixed run-time
>>>>>>>>>>>>   - old run-time with fixed tests
>>>>>>>>>>>>   - mach tier4 (which exposed the problem - that's where we 
>>>>>>>>>>>> enable Flight recorder for the tests) [in progress]
>>>>>>>>>>>>   - mach5 tier 1-3 for good measure [in progress]
>>>>>>>>>>>>   - new regression test
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>



More information about the hotspot-runtime-dev mailing list