RFR (11): 8205878: pthread_getcpuclockid is expected to return 0 code
Chris Plummer
chris.plummer at oracle.com
Mon Jul 9 21:50:09 UTC 2018
On 7/9/18 2:41 PM, David Holmes wrote:
> Hi Chris,
>
> On 10/07/2018 4:22 AM, Chris Plummer wrote:
>> Hi David,
>>
>> Would it be better to problem list this test on solaris using
>> JDK-8156708. That way when JDK-8156708 is fixed it can come off the
>> problem list and start executing on solaris.
>
> JDK-8156708 is already fixed - it's a dup of JDK-8154715. We could
> only fix this for VM created threads. The general problem of TLS
> destructors looping if a thread terminates without detaching from the
> VM is not solvable - other than by not using TLS in the VM.
Ok, I misunderstood your comments in the test.
Changes look fine.
Chris
>
> Thanks,
> David
>
>> thanks,
>>
>> Chris
>>
>> On 7/8/18 4:58 PM, David Holmes wrote:
>>> tl;dr skip the new regression test on Solaris
>>>
>>> New webrev:
>>>
>>> http://cr.openjdk.java.net/~dholmes/8205878/webrev.v3/
>>>
>>> This excludes the test from running on Solaris, so the makefile
>>> doesn't bother compiling this native test and the Java part of the
>>> test adds:
>>>
>>> ! * @requires os.family != "windows" & os.family != "solaris"
>>> * @summary Basic test of Thread and ThreadMXBean queries on a
>>> natively
>>> * attached thread that has failed to detach before
>>> terminating.
>>> + * @comment The native code only supports POSIX so no windows
>>> testing; also
>>> + * we have to skip solaris as a terminating thread that
>>> fails to
>>> + * detach will hit an infinite loop due to TLS destructor
>>> issues - see
>>> + * comments in JDK-8156708
>>>
>>> Note this means that Solaris is not affected by the original issue
>>> because a still-attached native thread can't actually terminate due
>>> to the TLS destructor infinite-loop issue.
>>>
>>> Thanks,
>>> David
>>>
>>> On 6/07/2018 6:07 PM, David Holmes wrote:
>>>> <sigh> The new test is hanging on Solaris. I just discovered we
>>>> don't run these tests on Solaris until tier4.
>>>>
>>>> David
>>>>
>>>> On 6/07/2018 8:40 AM, David Holmes wrote:
>>>>> Hi Chris,
>>>>>
>>>>> Thanks for looking at this.
>>>>>
>>>>> Updated webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dholmes/8205878/webrev.v2/
>>>>>
>>>>> Only real changes in ji05t001.c. (And fixed typo in the new test)
>>>>>
>>>>> More below ...
>>>>>
>>>>> On 6/07/2018 7:55 AM, Chris Plummer wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> Solaris problems aside, overall it looks fine. Some minor things
>>>>>> I noted:
>>>>>>
>>>>>> I noticed that exitCode is never modified in agentA() or
>>>>>> agentB(), so there isn't much point to having it. If you reach
>>>>>> the bottom of the function, it passed, so PASSED can be returned.
>>>>>> The code would be more clear if it did this. As-is it is implied
>>>>>> that you can reach the bottom when it fails.
>>>>>
>>>>> I resisted any and all urges to do any kind of unrelated code
>>>>> cleanup in the tests - once you start you may end up doing a full
>>>>> rewrite.
>>>>>
>>>>>> Is detaching the threads along the failure paths really needed?
>>>>>> exit() is called, so this would seem to make it unnecessary.
>>>>>
>>>>> You're right that isn't necessary. I'll remove the changes from
>>>>> before the exits in ji05t001.c
>>>>>
>>>>>> I prefer assignments not to be embedded inside the "if"
>>>>>> condition. The DetachCurrentThread code in THREAD_return() is
>>>>>> much more readable than the similar code in agentA() and agentB().
>>>>>
>>>>> It's an existing style already used in that test e.g.
>>>>>
>>>>> 287 if ((res =
>>>>> 288 JNI_ENV_PTR(vm)->AttachCurrentThread(
>>>>> 289 JNI_ENV_ARG(vm, (void **) &env), (void *)
>>>>> 0)) != 0) {
>>>>>
>>>>> and I don't mind it, so I'd prefer not to change it.
>>>>>
>>>>>> In the test:
>>>>>>
>>>>>> 54 // Generally as long as we don't crash of throw
>>>>>> unexpected
>>>>>> 55 // exceptions then the test passes. In some cases
>>>>>> we know exactly
>>>>>>
>>>>>> "of" should be "or".
>>>>>
>>>>> Well spotted. Thanks.
>>>>>
>>>>>> Shouldn't you be catching exceptions for all the Thread methods
>>>>>> you are calling? Otherwise the test will exit if one is thrown,
>>>>>> and the above comment indicates that you don't want this.
>>>>>
>>>>> I'm not expecting there to be any exceptions from any of the
>>>>> called methods. That would potentially indicate a problem in
>>>>> handling the terminated native thread, so would indicate a test
>>>>> failure.
>>>>>
>>>>>> Don't we normally put these tests in a package?
>>>>>
>>>>> Doesn't seem to be any hard and fast rule. I only uses packages
>>>>> when they are important for the test. In runtime we have 905 java
>>>>> files and only 116 have a package statement. It varies elsewhere.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>> On 7/5/18 2:58 AM, David Holmes wrote:
>>>>>>> <sigh> Solaris compiler complains about doing a return from
>>>>>>> inside a do-while loop. I'll have to rework part of the fix
>>>>>>> tomorrow.
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>> On 5/07/2018 6:19 PM, David Holmes wrote:
>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8205878
>>>>>>>> Webrev: http://cr.openjdk.java.net/~dholmes/8205878/webrev/
>>>>>>>>
>>>>>>>> Problem:
>>>>>>>>
>>>>>>>> The tests create native threads that attach to the VM through
>>>>>>>> JNI_AttachCurrentThread but which then terminate without
>>>>>>>> detaching themselves. When the VM exits and we're using Flight
>>>>>>>> Recorder "dumponexit" this leads to a call to VM_PrintThreads
>>>>>>>> that in part wants to print the per-thread CPU usage. When we
>>>>>>>> encounter the threads that have terminated already the low
>>>>>>>> level pthread_getcpuclockid calls returns ESRCH but the code
>>>>>>>> doesn't expect that and so fails an assert in debug mode and
>>>>>>>> can SEGV in product mode.
>>>>>>>>
>>>>>>>> Solution:
>>>>>>>>
>>>>>>>> Serviceability-side: fix the tests
>>>>>>>>
>>>>>>>> Change the tests so that the threads detach before terminating.
>>>>>>>> The two tests are (surprisingly) written in completely
>>>>>>>> different styles, so the solution also takes on two different
>>>>>>>> styles.
>>>>>>>>
>>>>>>>> Runtime-side: make the VM more robust in the fact of JNI
>>>>>>>> attached threads that terminate before detaching, and add a
>>>>>>>> regression test
>>>>>>>>
>>>>>>>> I took a good look at the low-level code for interacting with
>>>>>>>> arbitrary threads and as far as I can see the problem only
>>>>>>>> exists for this one case of pthread_getcpuclockid on Linux.
>>>>>>>> Elsewhere the potential for a library call failure just reports
>>>>>>>> an error value (such as -1 for the cpu time used).
>>>>>>>>
>>>>>>>> So the fix is simply to allow for ESRCH when calling
>>>>>>>> pthread_getcpuclockid and return -1 for the cpu usage in that
>>>>>>>> case.
>>>>>>>>
>>>>>>>> I created a new regression test to create a new native thread,
>>>>>>>> attach it and then let it terminate while still attached. The
>>>>>>>> java code then calls various Thread and ThreadMXBean functions
>>>>>>>> on it to ensure there are no crashes or unexpected exceptions.
>>>>>>>>
>>>>>>>> Testing:
>>>>>>>> - old tests with fixed run-time
>>>>>>>> - old run-time with fixed tests
>>>>>>>> - mach tier4 (which exposed the problem - that's where we
>>>>>>>> enable Flight recorder for the tests) [in progress]
>>>>>>>> - mach5 tier 1-3 for good measure [in progress]
>>>>>>>> - new regression test
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>
>>>>>>
>>>>>>
>>
>>
More information about the serviceability-dev
mailing list