RFR (11): 8205878: pthread_getcpuclockid is expected to return 0 code
serguei.spitsyn at oracle.com
serguei.spitsyn at oracle.com
Tue Jul 10 02:07:39 UTC 2018
Hi David,
It looks good modulo the minor comments that others have already found.
Could I ask you to fix a couple of really minor issues in new test?
Unneeded spaces are at lines 84 and 51 in .java and .c files:
83 if (mbean.isThreadCpuTimeSupported() &&
84 mbean.isThreadCpuTimeEnabled() ) {
. . .
51 class_id = (*env)->FindClass (env, "java/lang/Thread");
Thanks,
Serguei
On 7/9/18 15:17, David Holmes wrote:
> Thanks Chris!
>
> Can I please get a second review.
>
> David
>
> On 10/07/2018 7:50 AM, Chris Plummer wrote:
>> On 7/9/18 2:41 PM, David Holmes wrote:
>>> Hi Chris,
>>>
>>> On 10/07/2018 4:22 AM, Chris Plummer wrote:
>>>> Hi David,
>>>>
>>>> Would it be better to problem list this test on solaris using
>>>> JDK-8156708. That way when JDK-8156708 is fixed it can come off the
>>>> problem list and start executing on solaris.
>>>
>>> JDK-8156708 is already fixed - it's a dup of JDK-8154715. We could
>>> only fix this for VM created threads. The general problem of TLS
>>> destructors looping if a thread terminates without detaching from
>>> the VM is not solvable - other than by not using TLS in the VM.
>> Ok, I misunderstood your comments in the test.
>>
>> Changes look fine.
>>
>> Chris
>>>
>>> Thanks,
>>> David
>>>
>>>> thanks,
>>>>
>>>> Chris
>>>>
>>>> On 7/8/18 4:58 PM, David Holmes wrote:
>>>>> tl;dr skip the new regression test on Solaris
>>>>>
>>>>> New webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dholmes/8205878/webrev.v3/
>>>>>
>>>>> This excludes the test from running on Solaris, so the makefile
>>>>> doesn't bother compiling this native test and the Java part of the
>>>>> test adds:
>>>>>
>>>>> ! * @requires os.family != "windows" & os.family != "solaris"
>>>>> * @summary Basic test of Thread and ThreadMXBean queries on a
>>>>> natively
>>>>> * attached thread that has failed to detach before
>>>>> terminating.
>>>>> + * @comment The native code only supports POSIX so no windows
>>>>> testing; also
>>>>> + * we have to skip solaris as a terminating thread that
>>>>> fails to
>>>>> + * detach will hit an infinite loop due to TLS
>>>>> destructor issues - see
>>>>> + * comments in JDK-8156708
>>>>>
>>>>> Note this means that Solaris is not affected by the original issue
>>>>> because a still-attached native thread can't actually terminate
>>>>> due to the TLS destructor infinite-loop issue.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>> On 6/07/2018 6:07 PM, David Holmes wrote:
>>>>>> <sigh> The new test is hanging on Solaris. I just discovered we
>>>>>> don't run these tests on Solaris until tier4.
>>>>>>
>>>>>> David
>>>>>>
>>>>>> On 6/07/2018 8:40 AM, David Holmes wrote:
>>>>>>> Hi Chris,
>>>>>>>
>>>>>>> Thanks for looking at this.
>>>>>>>
>>>>>>> Updated webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dholmes/8205878/webrev.v2/
>>>>>>>
>>>>>>> Only real changes in ji05t001.c. (And fixed typo in the new test)
>>>>>>>
>>>>>>> More below ...
>>>>>>>
>>>>>>> On 6/07/2018 7:55 AM, Chris Plummer wrote:
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> Solaris problems aside, overall it looks fine. Some minor
>>>>>>>> things I noted:
>>>>>>>>
>>>>>>>> I noticed that exitCode is never modified in agentA() or
>>>>>>>> agentB(), so there isn't much point to having it. If you reach
>>>>>>>> the bottom of the function, it passed, so PASSED can be
>>>>>>>> returned. The code would be more clear if it did this. As-is it
>>>>>>>> is implied that you can reach the bottom when it fails.
>>>>>>>
>>>>>>> I resisted any and all urges to do any kind of unrelated code
>>>>>>> cleanup in the tests - once you start you may end up doing a
>>>>>>> full rewrite.
>>>>>>>
>>>>>>>> Is detaching the threads along the failure paths really needed?
>>>>>>>> exit() is called, so this would seem to make it unnecessary.
>>>>>>>
>>>>>>> You're right that isn't necessary. I'll remove the changes from
>>>>>>> before the exits in ji05t001.c
>>>>>>>
>>>>>>>> I prefer assignments not to be embedded inside the "if"
>>>>>>>> condition. The DetachCurrentThread code in THREAD_return() is
>>>>>>>> much more readable than the similar code in agentA() and agentB().
>>>>>>>
>>>>>>> It's an existing style already used in that test e.g.
>>>>>>>
>>>>>>> 287 if ((res =
>>>>>>> 288 JNI_ENV_PTR(vm)->AttachCurrentThread(
>>>>>>> 289 JNI_ENV_ARG(vm, (void **) &env), (void *)
>>>>>>> 0)) != 0) {
>>>>>>>
>>>>>>> and I don't mind it, so I'd prefer not to change it.
>>>>>>>
>>>>>>>> In the test:
>>>>>>>>
>>>>>>>> 54 // Generally as long as we don't crash of throw
>>>>>>>> unexpected
>>>>>>>> 55 // exceptions then the test passes. In some cases
>>>>>>>> we know exactly
>>>>>>>>
>>>>>>>> "of" should be "or".
>>>>>>>
>>>>>>> Well spotted. Thanks.
>>>>>>>
>>>>>>>> Shouldn't you be catching exceptions for all the Thread methods
>>>>>>>> you are calling? Otherwise the test will exit if one is thrown,
>>>>>>>> and the above comment indicates that you don't want this.
>>>>>>>
>>>>>>> I'm not expecting there to be any exceptions from any of the
>>>>>>> called methods. That would potentially indicate a problem in
>>>>>>> handling the terminated native thread, so would indicate a test
>>>>>>> failure.
>>>>>>>
>>>>>>>> Don't we normally put these tests in a package?
>>>>>>>
>>>>>>> Doesn't seem to be any hard and fast rule. I only uses packages
>>>>>>> when they are important for the test. In runtime we have 905
>>>>>>> java files and only 116 have a package statement. It varies
>>>>>>> elsewhere.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>> thanks,
>>>>>>>>
>>>>>>>> Chris
>>>>>>>>
>>>>>>>> On 7/5/18 2:58 AM, David Holmes wrote:
>>>>>>>>> <sigh> Solaris compiler complains about doing a return from
>>>>>>>>> inside a do-while loop. I'll have to rework part of the fix
>>>>>>>>> tomorrow.
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>> On 5/07/2018 6:19 PM, David Holmes wrote:
>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8205878
>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~dholmes/8205878/webrev/
>>>>>>>>>>
>>>>>>>>>> Problem:
>>>>>>>>>>
>>>>>>>>>> The tests create native threads that attach to the VM through
>>>>>>>>>> JNI_AttachCurrentThread but which then terminate without
>>>>>>>>>> detaching themselves. When the VM exits and we're using
>>>>>>>>>> Flight Recorder "dumponexit" this leads to a call to
>>>>>>>>>> VM_PrintThreads that in part wants to print the per-thread
>>>>>>>>>> CPU usage. When we encounter the threads that have terminated
>>>>>>>>>> already the low level pthread_getcpuclockid calls returns
>>>>>>>>>> ESRCH but the code doesn't expect that and so fails an assert
>>>>>>>>>> in debug mode and can SEGV in product mode.
>>>>>>>>>>
>>>>>>>>>> Solution:
>>>>>>>>>>
>>>>>>>>>> Serviceability-side: fix the tests
>>>>>>>>>>
>>>>>>>>>> Change the tests so that the threads detach before
>>>>>>>>>> terminating. The two tests are (surprisingly) written in
>>>>>>>>>> completely different styles, so the solution also takes on
>>>>>>>>>> two different styles.
>>>>>>>>>>
>>>>>>>>>> Runtime-side: make the VM more robust in the fact of JNI
>>>>>>>>>> attached threads that terminate before detaching, and add a
>>>>>>>>>> regression test
>>>>>>>>>>
>>>>>>>>>> I took a good look at the low-level code for interacting with
>>>>>>>>>> arbitrary threads and as far as I can see the problem only
>>>>>>>>>> exists for this one case of pthread_getcpuclockid on Linux.
>>>>>>>>>> Elsewhere the potential for a library call failure just
>>>>>>>>>> reports an error value (such as -1 for the cpu time used).
>>>>>>>>>>
>>>>>>>>>> So the fix is simply to allow for ESRCH when calling
>>>>>>>>>> pthread_getcpuclockid and return -1 for the cpu usage in that
>>>>>>>>>> case.
>>>>>>>>>>
>>>>>>>>>> I created a new regression test to create a new native
>>>>>>>>>> thread, attach it and then let it terminate while still
>>>>>>>>>> attached. The java code then calls various Thread and
>>>>>>>>>> ThreadMXBean functions on it to ensure there are no crashes
>>>>>>>>>> or unexpected exceptions.
>>>>>>>>>>
>>>>>>>>>> Testing:
>>>>>>>>>> - old tests with fixed run-time
>>>>>>>>>> - old run-time with fixed tests
>>>>>>>>>> - mach tier4 (which exposed the problem - that's where we
>>>>>>>>>> enable Flight recorder for the tests) [in progress]
>>>>>>>>>> - mach5 tier 1-3 for good measure [in progress]
>>>>>>>>>> - new regression test
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>
>>>>
>>
>>
More information about the serviceability-dev
mailing list