RFR (11): 8205878: pthread_getcpuclockid is expected to return 0 code
David Holmes
david.holmes at oracle.com
Mon Jul 9 22:17:13 UTC 2018
Thanks Chris!
Can I please get a second review.
David
On 10/07/2018 7:50 AM, Chris Plummer wrote:
> On 7/9/18 2:41 PM, David Holmes wrote:
>> Hi Chris,
>>
>> On 10/07/2018 4:22 AM, Chris Plummer wrote:
>>> Hi David,
>>>
>>> Would it be better to problem list this test on solaris using
>>> JDK-8156708. That way when JDK-8156708 is fixed it can come off the
>>> problem list and start executing on solaris.
>>
>> JDK-8156708 is already fixed - it's a dup of JDK-8154715. We could
>> only fix this for VM created threads. The general problem of TLS
>> destructors looping if a thread terminates without detaching from the
>> VM is not solvable - other than by not using TLS in the VM.
> Ok, I misunderstood your comments in the test.
>
> Changes look fine.
>
> Chris
>>
>> Thanks,
>> David
>>
>>> thanks,
>>>
>>> Chris
>>>
>>> On 7/8/18 4:58 PM, David Holmes wrote:
>>>> tl;dr skip the new regression test on Solaris
>>>>
>>>> New webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dholmes/8205878/webrev.v3/
>>>>
>>>> This excludes the test from running on Solaris, so the makefile
>>>> doesn't bother compiling this native test and the Java part of the
>>>> test adds:
>>>>
>>>> ! * @requires os.family != "windows" & os.family != "solaris"
>>>> * @summary Basic test of Thread and ThreadMXBean queries on a
>>>> natively
>>>> * attached thread that has failed to detach before
>>>> terminating.
>>>> + * @comment The native code only supports POSIX so no windows
>>>> testing; also
>>>> + * we have to skip solaris as a terminating thread that
>>>> fails to
>>>> + * detach will hit an infinite loop due to TLS destructor
>>>> issues - see
>>>> + * comments in JDK-8156708
>>>>
>>>> Note this means that Solaris is not affected by the original issue
>>>> because a still-attached native thread can't actually terminate due
>>>> to the TLS destructor infinite-loop issue.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On 6/07/2018 6:07 PM, David Holmes wrote:
>>>>> <sigh> The new test is hanging on Solaris. I just discovered we
>>>>> don't run these tests on Solaris until tier4.
>>>>>
>>>>> David
>>>>>
>>>>> On 6/07/2018 8:40 AM, David Holmes wrote:
>>>>>> Hi Chris,
>>>>>>
>>>>>> Thanks for looking at this.
>>>>>>
>>>>>> Updated webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dholmes/8205878/webrev.v2/
>>>>>>
>>>>>> Only real changes in ji05t001.c. (And fixed typo in the new test)
>>>>>>
>>>>>> More below ...
>>>>>>
>>>>>> On 6/07/2018 7:55 AM, Chris Plummer wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> Solaris problems aside, overall it looks fine. Some minor things
>>>>>>> I noted:
>>>>>>>
>>>>>>> I noticed that exitCode is never modified in agentA() or
>>>>>>> agentB(), so there isn't much point to having it. If you reach
>>>>>>> the bottom of the function, it passed, so PASSED can be returned.
>>>>>>> The code would be more clear if it did this. As-is it is implied
>>>>>>> that you can reach the bottom when it fails.
>>>>>>
>>>>>> I resisted any and all urges to do any kind of unrelated code
>>>>>> cleanup in the tests - once you start you may end up doing a full
>>>>>> rewrite.
>>>>>>
>>>>>>> Is detaching the threads along the failure paths really needed?
>>>>>>> exit() is called, so this would seem to make it unnecessary.
>>>>>>
>>>>>> You're right that isn't necessary. I'll remove the changes from
>>>>>> before the exits in ji05t001.c
>>>>>>
>>>>>>> I prefer assignments not to be embedded inside the "if"
>>>>>>> condition. The DetachCurrentThread code in THREAD_return() is
>>>>>>> much more readable than the similar code in agentA() and agentB().
>>>>>>
>>>>>> It's an existing style already used in that test e.g.
>>>>>>
>>>>>> 287 if ((res =
>>>>>> 288 JNI_ENV_PTR(vm)->AttachCurrentThread(
>>>>>> 289 JNI_ENV_ARG(vm, (void **) &env), (void *)
>>>>>> 0)) != 0) {
>>>>>>
>>>>>> and I don't mind it, so I'd prefer not to change it.
>>>>>>
>>>>>>> In the test:
>>>>>>>
>>>>>>> 54 // Generally as long as we don't crash of throw
>>>>>>> unexpected
>>>>>>> 55 // exceptions then the test passes. In some cases
>>>>>>> we know exactly
>>>>>>>
>>>>>>> "of" should be "or".
>>>>>>
>>>>>> Well spotted. Thanks.
>>>>>>
>>>>>>> Shouldn't you be catching exceptions for all the Thread methods
>>>>>>> you are calling? Otherwise the test will exit if one is thrown,
>>>>>>> and the above comment indicates that you don't want this.
>>>>>>
>>>>>> I'm not expecting there to be any exceptions from any of the
>>>>>> called methods. That would potentially indicate a problem in
>>>>>> handling the terminated native thread, so would indicate a test
>>>>>> failure.
>>>>>>
>>>>>>> Don't we normally put these tests in a package?
>>>>>>
>>>>>> Doesn't seem to be any hard and fast rule. I only uses packages
>>>>>> when they are important for the test. In runtime we have 905 java
>>>>>> files and only 116 have a package statement. It varies elsewhere.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>> thanks,
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>> On 7/5/18 2:58 AM, David Holmes wrote:
>>>>>>>> <sigh> Solaris compiler complains about doing a return from
>>>>>>>> inside a do-while loop. I'll have to rework part of the fix
>>>>>>>> tomorrow.
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>> On 5/07/2018 6:19 PM, David Holmes wrote:
>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8205878
>>>>>>>>> Webrev: http://cr.openjdk.java.net/~dholmes/8205878/webrev/
>>>>>>>>>
>>>>>>>>> Problem:
>>>>>>>>>
>>>>>>>>> The tests create native threads that attach to the VM through
>>>>>>>>> JNI_AttachCurrentThread but which then terminate without
>>>>>>>>> detaching themselves. When the VM exits and we're using Flight
>>>>>>>>> Recorder "dumponexit" this leads to a call to VM_PrintThreads
>>>>>>>>> that in part wants to print the per-thread CPU usage. When we
>>>>>>>>> encounter the threads that have terminated already the low
>>>>>>>>> level pthread_getcpuclockid calls returns ESRCH but the code
>>>>>>>>> doesn't expect that and so fails an assert in debug mode and
>>>>>>>>> can SEGV in product mode.
>>>>>>>>>
>>>>>>>>> Solution:
>>>>>>>>>
>>>>>>>>> Serviceability-side: fix the tests
>>>>>>>>>
>>>>>>>>> Change the tests so that the threads detach before terminating.
>>>>>>>>> The two tests are (surprisingly) written in completely
>>>>>>>>> different styles, so the solution also takes on two different
>>>>>>>>> styles.
>>>>>>>>>
>>>>>>>>> Runtime-side: make the VM more robust in the fact of JNI
>>>>>>>>> attached threads that terminate before detaching, and add a
>>>>>>>>> regression test
>>>>>>>>>
>>>>>>>>> I took a good look at the low-level code for interacting with
>>>>>>>>> arbitrary threads and as far as I can see the problem only
>>>>>>>>> exists for this one case of pthread_getcpuclockid on Linux.
>>>>>>>>> Elsewhere the potential for a library call failure just reports
>>>>>>>>> an error value (such as -1 for the cpu time used).
>>>>>>>>>
>>>>>>>>> So the fix is simply to allow for ESRCH when calling
>>>>>>>>> pthread_getcpuclockid and return -1 for the cpu usage in that
>>>>>>>>> case.
>>>>>>>>>
>>>>>>>>> I created a new regression test to create a new native thread,
>>>>>>>>> attach it and then let it terminate while still attached. The
>>>>>>>>> java code then calls various Thread and ThreadMXBean functions
>>>>>>>>> on it to ensure there are no crashes or unexpected exceptions.
>>>>>>>>>
>>>>>>>>> Testing:
>>>>>>>>> - old tests with fixed run-time
>>>>>>>>> - old run-time with fixed tests
>>>>>>>>> - mach tier4 (which exposed the problem - that's where we
>>>>>>>>> enable Flight recorder for the tests) [in progress]
>>>>>>>>> - mach5 tier 1-3 for good measure [in progress]
>>>>>>>>> - new regression test
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>
>>>>>>>
>>>>>>>
>>>
>>>
>
>
More information about the serviceability-dev
mailing list