RFR (11): 8205878: pthread_getcpuclockid is expected to return 0 code
David Holmes
david.holmes at oracle.com
Mon Jul 9 23:22:21 UTC 2018
Adding back runtime
On 10/07/2018 8:45 AM, Alex Menkov wrote:
> +1
Thanks for looking at this Alex!
> couple minor notes (no need to resend review)
Webrev updated in place (v3) for others to see.
> src/hotspot/os/linux/os_linux.cpp
> please replace
>
> 5581 }
> 5582 else {
>
> with
> } else {
Done.
>
> test/hotspot/jtreg/runtime/jni/terminatedThread/libterminatedThread.c
> please fix error reporting (I suppose you mean "TEST ERROR:
> pthread_create failed"/"TEST ERROR: pthread_join failed"):
>
> 85 if ((res = pthread_create(&thread, NULL, thread_start, NULL)) !=
> 0) {
> 86 fprintf(stderr, "TEST ERROR: pthread_created failed: %s
> (%d)\n", strerror(res), res);
> 87 exit(1);
> 88 }
> 89
> 90 if ((res = pthread_join(thread, NULL)) != 0) {
> 91 fprintf(stderr, "TEST ERROR: pthread_created failed: %s
> (%d)\n", strerror(res), res);
> 92 exit(1);
> 93 }
Fixed - well spotted!
Thanks,
David
> --alex
>
> On 07/09/2018 15:17, David Holmes wrote:
>> Thanks Chris!
>>
>> Can I please get a second review.
>>
>> David
>>
>> On 10/07/2018 7:50 AM, Chris Plummer wrote:
>>> On 7/9/18 2:41 PM, David Holmes wrote:
>>>> Hi Chris,
>>>>
>>>> On 10/07/2018 4:22 AM, Chris Plummer wrote:
>>>>> Hi David,
>>>>>
>>>>> Would it be better to problem list this test on solaris using
>>>>> JDK-8156708. That way when JDK-8156708 is fixed it can come off the
>>>>> problem list and start executing on solaris.
>>>>
>>>> JDK-8156708 is already fixed - it's a dup of JDK-8154715. We could
>>>> only fix this for VM created threads. The general problem of TLS
>>>> destructors looping if a thread terminates without detaching from
>>>> the VM is not solvable - other than by not using TLS in the VM.
>>> Ok, I misunderstood your comments in the test.
>>>
>>> Changes look fine.
>>>
>>> Chris
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> thanks,
>>>>>
>>>>> Chris
>>>>>
>>>>> On 7/8/18 4:58 PM, David Holmes wrote:
>>>>>> tl;dr skip the new regression test on Solaris
>>>>>>
>>>>>> New webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dholmes/8205878/webrev.v3/
>>>>>>
>>>>>> This excludes the test from running on Solaris, so the makefile
>>>>>> doesn't bother compiling this native test and the Java part of the
>>>>>> test adds:
>>>>>>
>>>>>> ! * @requires os.family != "windows" & os.family != "solaris"
>>>>>> * @summary Basic test of Thread and ThreadMXBean queries on a
>>>>>> natively
>>>>>> * attached thread that has failed to detach before
>>>>>> terminating.
>>>>>> + * @comment The native code only supports POSIX so no windows
>>>>>> testing; also
>>>>>> + * we have to skip solaris as a terminating thread that
>>>>>> fails to
>>>>>> + * detach will hit an infinite loop due to TLS
>>>>>> destructor issues - see
>>>>>> + * comments in JDK-8156708
>>>>>>
>>>>>> Note this means that Solaris is not affected by the original issue
>>>>>> because a still-attached native thread can't actually terminate
>>>>>> due to the TLS destructor infinite-loop issue.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>> On 6/07/2018 6:07 PM, David Holmes wrote:
>>>>>>> <sigh> The new test is hanging on Solaris. I just discovered we
>>>>>>> don't run these tests on Solaris until tier4.
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>> On 6/07/2018 8:40 AM, David Holmes wrote:
>>>>>>>> Hi Chris,
>>>>>>>>
>>>>>>>> Thanks for looking at this.
>>>>>>>>
>>>>>>>> Updated webrev:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dholmes/8205878/webrev.v2/
>>>>>>>>
>>>>>>>> Only real changes in ji05t001.c. (And fixed typo in the new test)
>>>>>>>>
>>>>>>>> More below ...
>>>>>>>>
>>>>>>>> On 6/07/2018 7:55 AM, Chris Plummer wrote:
>>>>>>>>> Hi David,
>>>>>>>>>
>>>>>>>>> Solaris problems aside, overall it looks fine. Some minor
>>>>>>>>> things I noted:
>>>>>>>>>
>>>>>>>>> I noticed that exitCode is never modified in agentA() or
>>>>>>>>> agentB(), so there isn't much point to having it. If you reach
>>>>>>>>> the bottom of the function, it passed, so PASSED can be
>>>>>>>>> returned. The code would be more clear if it did this. As-is it
>>>>>>>>> is implied that you can reach the bottom when it fails.
>>>>>>>>
>>>>>>>> I resisted any and all urges to do any kind of unrelated code
>>>>>>>> cleanup in the tests - once you start you may end up doing a
>>>>>>>> full rewrite.
>>>>>>>>
>>>>>>>>> Is detaching the threads along the failure paths really needed?
>>>>>>>>> exit() is called, so this would seem to make it unnecessary.
>>>>>>>>
>>>>>>>> You're right that isn't necessary. I'll remove the changes from
>>>>>>>> before the exits in ji05t001.c
>>>>>>>>
>>>>>>>>> I prefer assignments not to be embedded inside the "if"
>>>>>>>>> condition. The DetachCurrentThread code in THREAD_return() is
>>>>>>>>> much more readable than the similar code in agentA() and agentB().
>>>>>>>>
>>>>>>>> It's an existing style already used in that test e.g.
>>>>>>>>
>>>>>>>> 287 if ((res =
>>>>>>>> 288 JNI_ENV_PTR(vm)->AttachCurrentThread(
>>>>>>>> 289 JNI_ENV_ARG(vm, (void **) &env), (void *)
>>>>>>>> 0)) != 0) {
>>>>>>>>
>>>>>>>> and I don't mind it, so I'd prefer not to change it.
>>>>>>>>
>>>>>>>>> In the test:
>>>>>>>>>
>>>>>>>>> 54 // Generally as long as we don't crash of throw
>>>>>>>>> unexpected
>>>>>>>>> 55 // exceptions then the test passes. In some cases
>>>>>>>>> we know exactly
>>>>>>>>>
>>>>>>>>> "of" should be "or".
>>>>>>>>
>>>>>>>> Well spotted. Thanks.
>>>>>>>>
>>>>>>>>> Shouldn't you be catching exceptions for all the Thread methods
>>>>>>>>> you are calling? Otherwise the test will exit if one is thrown,
>>>>>>>>> and the above comment indicates that you don't want this.
>>>>>>>>
>>>>>>>> I'm not expecting there to be any exceptions from any of the
>>>>>>>> called methods. That would potentially indicate a problem in
>>>>>>>> handling the terminated native thread, so would indicate a test
>>>>>>>> failure.
>>>>>>>>
>>>>>>>>> Don't we normally put these tests in a package?
>>>>>>>>
>>>>>>>> Doesn't seem to be any hard and fast rule. I only uses packages
>>>>>>>> when they are important for the test. In runtime we have 905
>>>>>>>> java files and only 116 have a package statement. It varies
>>>>>>>> elsewhere.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>>
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>> On 7/5/18 2:58 AM, David Holmes wrote:
>>>>>>>>>> <sigh> Solaris compiler complains about doing a return from
>>>>>>>>>> inside a do-while loop. I'll have to rework part of the fix
>>>>>>>>>> tomorrow.
>>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>> On 5/07/2018 6:19 PM, David Holmes wrote:
>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8205878
>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~dholmes/8205878/webrev/
>>>>>>>>>>>
>>>>>>>>>>> Problem:
>>>>>>>>>>>
>>>>>>>>>>> The tests create native threads that attach to the VM through
>>>>>>>>>>> JNI_AttachCurrentThread but which then terminate without
>>>>>>>>>>> detaching themselves. When the VM exits and we're using
>>>>>>>>>>> Flight Recorder "dumponexit" this leads to a call to
>>>>>>>>>>> VM_PrintThreads that in part wants to print the per-thread
>>>>>>>>>>> CPU usage. When we encounter the threads that have terminated
>>>>>>>>>>> already the low level pthread_getcpuclockid calls returns
>>>>>>>>>>> ESRCH but the code doesn't expect that and so fails an assert
>>>>>>>>>>> in debug mode and can SEGV in product mode.
>>>>>>>>>>>
>>>>>>>>>>> Solution:
>>>>>>>>>>>
>>>>>>>>>>> Serviceability-side: fix the tests
>>>>>>>>>>>
>>>>>>>>>>> Change the tests so that the threads detach before
>>>>>>>>>>> terminating. The two tests are (surprisingly) written in
>>>>>>>>>>> completely different styles, so the solution also takes on
>>>>>>>>>>> two different styles.
>>>>>>>>>>>
>>>>>>>>>>> Runtime-side: make the VM more robust in the fact of JNI
>>>>>>>>>>> attached threads that terminate before detaching, and add a
>>>>>>>>>>> regression test
>>>>>>>>>>>
>>>>>>>>>>> I took a good look at the low-level code for interacting with
>>>>>>>>>>> arbitrary threads and as far as I can see the problem only
>>>>>>>>>>> exists for this one case of pthread_getcpuclockid on Linux.
>>>>>>>>>>> Elsewhere the potential for a library call failure just
>>>>>>>>>>> reports an error value (such as -1 for the cpu time used).
>>>>>>>>>>>
>>>>>>>>>>> So the fix is simply to allow for ESRCH when calling
>>>>>>>>>>> pthread_getcpuclockid and return -1 for the cpu usage in that
>>>>>>>>>>> case.
>>>>>>>>>>>
>>>>>>>>>>> I created a new regression test to create a new native
>>>>>>>>>>> thread, attach it and then let it terminate while still
>>>>>>>>>>> attached. The java code then calls various Thread and
>>>>>>>>>>> ThreadMXBean functions on it to ensure there are no crashes
>>>>>>>>>>> or unexpected exceptions.
>>>>>>>>>>>
>>>>>>>>>>> Testing:
>>>>>>>>>>> - old tests with fixed run-time
>>>>>>>>>>> - old run-time with fixed tests
>>>>>>>>>>> - mach tier4 (which exposed the problem - that's where we
>>>>>>>>>>> enable Flight recorder for the tests) [in progress]
>>>>>>>>>>> - mach5 tier 1-3 for good measure [in progress]
>>>>>>>>>>> - new regression test
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>>
>>>
>>>
More information about the serviceability-dev
mailing list