RFR (11): 8205878: pthread_getcpuclockid is expected to return 0 code
David Holmes
david.holmes at oracle.com
Mon Jul 9 21:41:02 UTC 2018
Hi Chris,
On 10/07/2018 4:22 AM, Chris Plummer wrote:
> Hi David,
>
> Would it be better to problem list this test on solaris using
> JDK-8156708. That way when JDK-8156708 is fixed it can come off the
> problem list and start executing on solaris.
JDK-8156708 is already fixed - it's a dup of JDK-8154715. We could only
fix this for VM created threads. The general problem of TLS destructors
looping if a thread terminates without detaching from the VM is not
solvable - other than by not using TLS in the VM.
Thanks,
David
> thanks,
>
> Chris
>
> On 7/8/18 4:58 PM, David Holmes wrote:
>> tl;dr skip the new regression test on Solaris
>>
>> New webrev:
>>
>> http://cr.openjdk.java.net/~dholmes/8205878/webrev.v3/
>>
>> This excludes the test from running on Solaris, so the makefile
>> doesn't bother compiling this native test and the Java part of the
>> test adds:
>>
>> ! * @requires os.family != "windows" & os.family != "solaris"
>> * @summary Basic test of Thread and ThreadMXBean queries on a natively
>> * attached thread that has failed to detach before
>> terminating.
>> + * @comment The native code only supports POSIX so no windows
>> testing; also
>> + * we have to skip solaris as a terminating thread that
>> fails to
>> + * detach will hit an infinite loop due to TLS destructor
>> issues - see
>> + * comments in JDK-8156708
>>
>> Note this means that Solaris is not affected by the original issue
>> because a still-attached native thread can't actually terminate due to
>> the TLS destructor infinite-loop issue.
>>
>> Thanks,
>> David
>>
>> On 6/07/2018 6:07 PM, David Holmes wrote:
>>> <sigh> The new test is hanging on Solaris. I just discovered we don't
>>> run these tests on Solaris until tier4.
>>>
>>> David
>>>
>>> On 6/07/2018 8:40 AM, David Holmes wrote:
>>>> Hi Chris,
>>>>
>>>> Thanks for looking at this.
>>>>
>>>> Updated webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dholmes/8205878/webrev.v2/
>>>>
>>>> Only real changes in ji05t001.c. (And fixed typo in the new test)
>>>>
>>>> More below ...
>>>>
>>>> On 6/07/2018 7:55 AM, Chris Plummer wrote:
>>>>> Hi David,
>>>>>
>>>>> Solaris problems aside, overall it looks fine. Some minor things I
>>>>> noted:
>>>>>
>>>>> I noticed that exitCode is never modified in agentA() or agentB(),
>>>>> so there isn't much point to having it. If you reach the bottom of
>>>>> the function, it passed, so PASSED can be returned. The code would
>>>>> be more clear if it did this. As-is it is implied that you can
>>>>> reach the bottom when it fails.
>>>>
>>>> I resisted any and all urges to do any kind of unrelated code
>>>> cleanup in the tests - once you start you may end up doing a full
>>>> rewrite.
>>>>
>>>>> Is detaching the threads along the failure paths really needed?
>>>>> exit() is called, so this would seem to make it unnecessary.
>>>>
>>>> You're right that isn't necessary. I'll remove the changes from
>>>> before the exits in ji05t001.c
>>>>
>>>>> I prefer assignments not to be embedded inside the "if" condition.
>>>>> The DetachCurrentThread code in THREAD_return() is much more
>>>>> readable than the similar code in agentA() and agentB().
>>>>
>>>> It's an existing style already used in that test e.g.
>>>>
>>>> 287 if ((res =
>>>> 288 JNI_ENV_PTR(vm)->AttachCurrentThread(
>>>> 289 JNI_ENV_ARG(vm, (void **) &env), (void *) 0))
>>>> != 0) {
>>>>
>>>> and I don't mind it, so I'd prefer not to change it.
>>>>
>>>>> In the test:
>>>>>
>>>>> 54 // Generally as long as we don't crash of throw
>>>>> unexpected
>>>>> 55 // exceptions then the test passes. In some cases we
>>>>> know exactly
>>>>>
>>>>> "of" should be "or".
>>>>
>>>> Well spotted. Thanks.
>>>>
>>>>> Shouldn't you be catching exceptions for all the Thread methods you
>>>>> are calling? Otherwise the test will exit if one is thrown, and the
>>>>> above comment indicates that you don't want this.
>>>>
>>>> I'm not expecting there to be any exceptions from any of the called
>>>> methods. That would potentially indicate a problem in handling the
>>>> terminated native thread, so would indicate a test failure.
>>>>
>>>>> Don't we normally put these tests in a package?
>>>>
>>>> Doesn't seem to be any hard and fast rule. I only uses packages when
>>>> they are important for the test. In runtime we have 905 java files
>>>> and only 116 have a package statement. It varies elsewhere.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> thanks,
>>>>>
>>>>> Chris
>>>>>
>>>>> On 7/5/18 2:58 AM, David Holmes wrote:
>>>>>> <sigh> Solaris compiler complains about doing a return from inside
>>>>>> a do-while loop. I'll have to rework part of the fix tomorrow.
>>>>>>
>>>>>> David
>>>>>>
>>>>>> On 5/07/2018 6:19 PM, David Holmes wrote:
>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8205878
>>>>>>> Webrev: http://cr.openjdk.java.net/~dholmes/8205878/webrev/
>>>>>>>
>>>>>>> Problem:
>>>>>>>
>>>>>>> The tests create native threads that attach to the VM through
>>>>>>> JNI_AttachCurrentThread but which then terminate without
>>>>>>> detaching themselves. When the VM exits and we're using Flight
>>>>>>> Recorder "dumponexit" this leads to a call to VM_PrintThreads
>>>>>>> that in part wants to print the per-thread CPU usage. When we
>>>>>>> encounter the threads that have terminated already the low level
>>>>>>> pthread_getcpuclockid calls returns ESRCH but the code doesn't
>>>>>>> expect that and so fails an assert in debug mode and can SEGV in
>>>>>>> product mode.
>>>>>>>
>>>>>>> Solution:
>>>>>>>
>>>>>>> Serviceability-side: fix the tests
>>>>>>>
>>>>>>> Change the tests so that the threads detach before terminating.
>>>>>>> The two tests are (surprisingly) written in completely different
>>>>>>> styles, so the solution also takes on two different styles.
>>>>>>>
>>>>>>> Runtime-side: make the VM more robust in the fact of JNI attached
>>>>>>> threads that terminate before detaching, and add a regression test
>>>>>>>
>>>>>>> I took a good look at the low-level code for interacting with
>>>>>>> arbitrary threads and as far as I can see the problem only exists
>>>>>>> for this one case of pthread_getcpuclockid on Linux. Elsewhere
>>>>>>> the potential for a library call failure just reports an error
>>>>>>> value (such as -1 for the cpu time used).
>>>>>>>
>>>>>>> So the fix is simply to allow for ESRCH when calling
>>>>>>> pthread_getcpuclockid and return -1 for the cpu usage in that case.
>>>>>>>
>>>>>>> I created a new regression test to create a new native thread,
>>>>>>> attach it and then let it terminate while still attached. The
>>>>>>> java code then calls various Thread and ThreadMXBean functions on
>>>>>>> it to ensure there are no crashes or unexpected exceptions.
>>>>>>>
>>>>>>> Testing:
>>>>>>> - old tests with fixed run-time
>>>>>>> - old run-time with fixed tests
>>>>>>> - mach tier4 (which exposed the problem - that's where we
>>>>>>> enable Flight recorder for the tests) [in progress]
>>>>>>> - mach5 tier 1-3 for good measure [in progress]
>>>>>>> - new regression test
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>
>>>>>
>>>>>
>
>
More information about the serviceability-dev
mailing list