RFR (11): 8205878: pthread_getcpuclockid is expected to return 0 code
Chris Plummer
chris.plummer at oracle.com
Mon Jul 9 18:22:46 UTC 2018
Hi David,
Would it be better to problem list this test on solaris using
JDK-8156708. That way when JDK-8156708 is fixed it can come off the
problem list and start executing on solaris.
thanks,
Chris
On 7/8/18 4:58 PM, David Holmes wrote:
> tl;dr skip the new regression test on Solaris
>
> New webrev:
>
> http://cr.openjdk.java.net/~dholmes/8205878/webrev.v3/
>
> This excludes the test from running on Solaris, so the makefile
> doesn't bother compiling this native test and the Java part of the
> test adds:
>
> ! * @requires os.family != "windows" & os.family != "solaris"
> * @summary Basic test of Thread and ThreadMXBean queries on a natively
> * attached thread that has failed to detach before
> terminating.
> + * @comment The native code only supports POSIX so no windows
> testing; also
> + * we have to skip solaris as a terminating thread that
> fails to
> + * detach will hit an infinite loop due to TLS destructor
> issues - see
> + * comments in JDK-8156708
>
> Note this means that Solaris is not affected by the original issue
> because a still-attached native thread can't actually terminate due to
> the TLS destructor infinite-loop issue.
>
> Thanks,
> David
>
> On 6/07/2018 6:07 PM, David Holmes wrote:
>> <sigh> The new test is hanging on Solaris. I just discovered we don't
>> run these tests on Solaris until tier4.
>>
>> David
>>
>> On 6/07/2018 8:40 AM, David Holmes wrote:
>>> Hi Chris,
>>>
>>> Thanks for looking at this.
>>>
>>> Updated webrev:
>>>
>>> http://cr.openjdk.java.net/~dholmes/8205878/webrev.v2/
>>>
>>> Only real changes in ji05t001.c. (And fixed typo in the new test)
>>>
>>> More below ...
>>>
>>> On 6/07/2018 7:55 AM, Chris Plummer wrote:
>>>> Hi David,
>>>>
>>>> Solaris problems aside, overall it looks fine. Some minor things I
>>>> noted:
>>>>
>>>> I noticed that exitCode is never modified in agentA() or agentB(),
>>>> so there isn't much point to having it. If you reach the bottom of
>>>> the function, it passed, so PASSED can be returned. The code would
>>>> be more clear if it did this. As-is it is implied that you can
>>>> reach the bottom when it fails.
>>>
>>> I resisted any and all urges to do any kind of unrelated code
>>> cleanup in the tests - once you start you may end up doing a full
>>> rewrite.
>>>
>>>> Is detaching the threads along the failure paths really needed?
>>>> exit() is called, so this would seem to make it unnecessary.
>>>
>>> You're right that isn't necessary. I'll remove the changes from
>>> before the exits in ji05t001.c
>>>
>>>> I prefer assignments not to be embedded inside the "if" condition.
>>>> The DetachCurrentThread code in THREAD_return() is much more
>>>> readable than the similar code in agentA() and agentB().
>>>
>>> It's an existing style already used in that test e.g.
>>>
>>> 287 if ((res =
>>> 288 JNI_ENV_PTR(vm)->AttachCurrentThread(
>>> 289 JNI_ENV_ARG(vm, (void **) &env), (void *) 0))
>>> != 0) {
>>>
>>> and I don't mind it, so I'd prefer not to change it.
>>>
>>>> In the test:
>>>>
>>>> 54 // Generally as long as we don't crash of throw
>>>> unexpected
>>>> 55 // exceptions then the test passes. In some cases we
>>>> know exactly
>>>>
>>>> "of" should be "or".
>>>
>>> Well spotted. Thanks.
>>>
>>>> Shouldn't you be catching exceptions for all the Thread methods you
>>>> are calling? Otherwise the test will exit if one is thrown, and the
>>>> above comment indicates that you don't want this.
>>>
>>> I'm not expecting there to be any exceptions from any of the called
>>> methods. That would potentially indicate a problem in handling the
>>> terminated native thread, so would indicate a test failure.
>>>
>>>> Don't we normally put these tests in a package?
>>>
>>> Doesn't seem to be any hard and fast rule. I only uses packages when
>>> they are important for the test. In runtime we have 905 java files
>>> and only 116 have a package statement. It varies elsewhere.
>>>
>>> Thanks,
>>> David
>>>
>>>> thanks,
>>>>
>>>> Chris
>>>>
>>>> On 7/5/18 2:58 AM, David Holmes wrote:
>>>>> <sigh> Solaris compiler complains about doing a return from inside
>>>>> a do-while loop. I'll have to rework part of the fix tomorrow.
>>>>>
>>>>> David
>>>>>
>>>>> On 5/07/2018 6:19 PM, David Holmes wrote:
>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8205878
>>>>>> Webrev: http://cr.openjdk.java.net/~dholmes/8205878/webrev/
>>>>>>
>>>>>> Problem:
>>>>>>
>>>>>> The tests create native threads that attach to the VM through
>>>>>> JNI_AttachCurrentThread but which then terminate without
>>>>>> detaching themselves. When the VM exits and we're using Flight
>>>>>> Recorder "dumponexit" this leads to a call to VM_PrintThreads
>>>>>> that in part wants to print the per-thread CPU usage. When we
>>>>>> encounter the threads that have terminated already the low level
>>>>>> pthread_getcpuclockid calls returns ESRCH but the code doesn't
>>>>>> expect that and so fails an assert in debug mode and can SEGV in
>>>>>> product mode.
>>>>>>
>>>>>> Solution:
>>>>>>
>>>>>> Serviceability-side: fix the tests
>>>>>>
>>>>>> Change the tests so that the threads detach before terminating.
>>>>>> The two tests are (surprisingly) written in completely different
>>>>>> styles, so the solution also takes on two different styles.
>>>>>>
>>>>>> Runtime-side: make the VM more robust in the fact of JNI attached
>>>>>> threads that terminate before detaching, and add a regression test
>>>>>>
>>>>>> I took a good look at the low-level code for interacting with
>>>>>> arbitrary threads and as far as I can see the problem only exists
>>>>>> for this one case of pthread_getcpuclockid on Linux. Elsewhere
>>>>>> the potential for a library call failure just reports an error
>>>>>> value (such as -1 for the cpu time used).
>>>>>>
>>>>>> So the fix is simply to allow for ESRCH when calling
>>>>>> pthread_getcpuclockid and return -1 for the cpu usage in that case.
>>>>>>
>>>>>> I created a new regression test to create a new native thread,
>>>>>> attach it and then let it terminate while still attached. The
>>>>>> java code then calls various Thread and ThreadMXBean functions on
>>>>>> it to ensure there are no crashes or unexpected exceptions.
>>>>>>
>>>>>> Testing:
>>>>>> - old tests with fixed run-time
>>>>>> - old run-time with fixed tests
>>>>>> - mach tier4 (which exposed the problem - that's where we
>>>>>> enable Flight recorder for the tests) [in progress]
>>>>>> - mach5 tier 1-3 for good measure [in progress]
>>>>>> - new regression test
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>
>>>>
>>>>
More information about the serviceability-dev
mailing list