RFR(M): 8133749, 8133747, 8133740: NMT detail stack trace cleanup

David Holmes david.holmes at oracle.com
Wed Aug 10 04:19:10 UTC 2016


On 10/08/2016 1:56 PM, Chris Plummer wrote:
> On 8/8/16 5:52 PM, David Holmes wrote:
>> On 9/08/2016 6:22 AM, Chris Plummer wrote:
>>> Hi David,
>>>
>>> Did you want me to implement any of the additional cleanup work I
>>> mentioned: manually inline _get_previous_fp, change os::current_frame()
>>> to walk back one less frame, possibly rename os::current_frame()?
>>
>> Up to you. I'm not insisting on anything, but the less reliance we
>> have on uncheckable (at build time) compiler behaviour, the better.
> Ok. Given the relatively short amount of time left to resolve p4s, I
> think I will just leave it as-is. I've started some more robust testing
> the NMT detailed enabled. Once that completes I'll do the push. I'll
> also file a couple of RFEs for further cleanup/improvements.
>
> Can I consider the changes officially reviewed by you now?

Yes.

Thanks,
David

> thanks,
>
> Chris
>>
>> Thanks,
>> David
>>
>>> thanks,
>>>
>>> Chris
>>>
>>> On 8/7/16 4:26 PM, David Holmes wrote:
>>>> Hi Chris,
>>>>
>>>> I don't have any good suggestions for this. So go with (2) and lets
>>>> work on (3).
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On 5/08/2016 5:05 PM, Chris Plummer wrote:
>>>>> Hi David,
>>>>>
>>>>> If fixing os::current_frame() to have a better name and also make
>>>>> it go
>>>>> up one less frame makes these changes more palatable, I'm willing to
>>>>> make that change. I would prefer to do it with a follow up CR (it
>>>>> would
>>>>> probably have to be an RFE), but will do it with these changes if
>>>>> necessary. I still pull hairs over the proper name for this method,
>>>>> even
>>>>> if it is modified to return the frame of whoever called it. Usually
>>>>> the
>>>>> meaning conveyed by a method's name does not change based on
>>>>> whether you
>>>>> choose the caller's or callee's point of view, but in this case it
>>>>> does,
>>>>> and I'm not sure which point of view makes more sense. If we choose
>>>>> the
>>>>> caller's point of view, then the proper name remains
>>>>> os::current_frame(). If we choose the callee's point of view, then it
>>>>> should be os::callers_frame(). Maybe there's a name that is
>>>>> agnostic and
>>>>> means the same thing from both view points. I just haven't thought of
>>>>> one yet.
>>>>>
>>>>> With respect to ALWAYSINLINE, it does not work for solaris and windows
>>>>> slowdebug builds. Note the special case in the test I wrote to
>>>>> allow for
>>>>> AllocateHeap() in the stack trace in this case, even though it
>>>>> shouldn't
>>>>> be there because it uses ALWAYSINLINE. I could have made changes in
>>>>> the
>>>>> source to get rid of it from the stack trace, but I didn't feel the
>>>>> source code disruption was worth it for a slowdebug build, especially
>>>>> since there are only a allocation call sites where it is a problem. I
>>>>> could use ALWAYSINLINE for the cases where it will work to inline
>>>>> _get_previous_fp, but I don't really see that as being any more
>>>>> reliable
>>>>> than what is there now.
>>>>>
>>>>> As for making _get_previous_fp() a macro, that's made more complicated
>>>>> because it has #ifdefs already. I could move its implementation
>>>>> directly
>>>>> into os::current_frame(). That would fix the inlining problem. I think
>>>>> it could also use some cleanup with the #ifdefs. For example, for
>>>>> linux-x86 do we have to worry about the SPARC_WORKS and __clang__
>>>>> cases?
>>>>>
>>>>> And yes, even with my changes the code is no less fragile, and no less
>>>>> misdirected in its approach to getting a consistent allocation back
>>>>> trace.  As I see it, there are 3 options:
>>>>>
>>>>> (1) Do nothing, and leave it both broken and fragile.
>>>>> (2) Do the cleanup I've done to at least correct the known stack trace
>>>>> issues.
>>>>> (3) Find another solution that doesn't suffer from these fragility
>>>>> issues.
>>>>>
>>>>> Note that (3) does not preclude doing (2) first, and (2) seems a
>>>>> better
>>>>> alternative than leaving it in its broken state (1). That's why I have
>>>>> pursued these changes even though I know things will still be fragile.
>>>>>
>>>>> thanks,
>>>>>
>>>>> Chris
>>>>>
>>>>> On 8/4/16 9:47 PM, David Holmes wrote:
>>>>>> Hi Chris,
>>>>>>
>>>>>> On 5/08/2016 7:53 AM, Chris Plummer wrote:
>>>>>>> Ping!
>>>>>>
>>>>>> I took another look at this and my earlier comments from JDK-8133749.
>>>>>> I hate to see the functionality "fixed" yet still have a completely
>>>>>> confusing and mis-named API. I'm still far from convinced that
>>>>>> returning the callers caller wasn't an "error" that was done due to
>>>>>> the lack of inlining and the appearance of an unexpected stackframe.
>>>>>> You've now made things consistent - but os::current_frame() is
>>>>>> completely mis-leading in name. And I'm still concerned that
>>>>>> correctness here depends on C compiler inlining choices, with no way
>>>>>> to verify at build time that they were indeed inlined or not!
>>>>>> Don't we
>>>>>> have ALWAYSINLINE to mark things like _get_previous_fp ? For that
>>>>>> matter shouldn't _get_previous_fp be a macro so inlining plays no
>>>>>> role ?
>>>>>>
>>>>>> Sorry but this code seems to simply limp from one broken state to
>>>>>> another due to its fragility.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> On 8/2/16 1:31 PM, Chris Plummer wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Please review the following:
>>>>>>>>
>>>>>>>> webrev:
>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8133749-8133747-8133740/webrev-01/webrev.hotspot/
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Bugs fixed:
>>>>>>>>
>>>>>>>> JDK-8133749: os::current_frame() is not returning the proper
>>>>>>>> frame on
>>>>>>>> ARM and solaris-x64
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8133749
>>>>>>>>
>>>>>>>> JDK-8133747: NMT includes an extra stack frame due to assumption
>>>>>>>> NMT
>>>>>>>> is making on tail calls being used
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8133747
>>>>>>>>
>>>>>>>> JDK-8133740: NMT for Linux/x86/x64 and bsd/x64 slowdebug builds
>>>>>>>> includes NativeCallStack::NativeCallStack() frame in backtrace
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8133740
>>>>>>>>
>>>>>>>> The above bugs all result in the NMT detail stack traces including
>>>>>>>> extra frames in the stack traces. Certain frames are suppose to be
>>>>>>>> skipped, but sometimes are not. The frames that show up are:
>>>>>>>>
>>>>>>>>   NativeCallStack::NativeCallStack
>>>>>>>>   os::get_native_stack
>>>>>>>>
>>>>>>>> These are both methods used to generate the stack trace, and
>>>>>>>> therefore
>>>>>>>> should not be included it.  However, under some (most)
>>>>>>>> circumstances,
>>>>>>>> they were.
>>>>>>>>
>>>>>>>> Also, there was no test to make sure that any NMT detail output is
>>>>>>>> generated, or that it is correct. I've added one with this
>>>>>>>> webrev. Of
>>>>>>>> the 27 possible builds (9 platforms * 3 build flavors), only 9
>>>>>>>> of the
>>>>>>>> 27 initially passed this new test. They were the product and
>>>>>>>> fastdebug
>>>>>>>> builds for solaris-sparc, bsd-x64, and linux-x64; and the slowdebug
>>>>>>>> builds for solaris-x64, windows-x86, and windows-x64. All the rest
>>>>>>>> failed. They now all pass with my fixes in place.
>>>>>>>>
>>>>>>>> Here's a summary of the changes:
>>>>>>>>
>>>>>>>> src/os/posix/vm/os_posix.cpp
>>>>>>>> src/os/windows/vm/os_windows.cpp
>>>>>>>>
>>>>>>>> JDK-8133747 fixes: There was some frame skipping logic here
>>>>>>>> which was
>>>>>>>> sort of correct, but was misplace. There are no extra frames being
>>>>>>>> added in os::get_native_stack() due to lack of inlining or lack
>>>>>>>> of a
>>>>>>>> tail call, so no need for toSkip++ here. The logic has been
>>>>>>>> moved to
>>>>>>>> NativeCallStack::NativeCallStack, which is where the tail call is
>>>>>>>> (sometimes) made, and also corrected (see nativeCallStack.cpp
>>>>>>>> below).
>>>>>>>>
>>>>>>>> src/share/vm/utilities/nativeCallStack.cpp
>>>>>>>>
>>>>>>>> JDK-8133747 fixes: The frame skipping logic that was moved here
>>>>>>>> assumed that NativeCallStack::NativeCallStack would not appear
>>>>>>>> in the
>>>>>>>> call stack (due to a tail call be using to call
>>>>>>>> os::get_native_stack)
>>>>>>>> except in slow debug builds. However, some platforms also don't
>>>>>>>> use a
>>>>>>>> tail call even when optimized. From what I can tell that is the
>>>>>>>> case
>>>>>>>> for 32-bit platforms and for windows.
>>>>>>>>
>>>>>>>> src/os_cpu/bsd_x86/vm/os_bsd_x86.cpp
>>>>>>>> src/os_cpu/windows_x86/vm/os_windows_x86.cpp
>>>>>>>> src/os_cpu/linux_x86/vm/os_linux_x86.cpp
>>>>>>>>
>>>>>>>> JDK-8133740 fixes: When _get_previous_fp is not inlined, we need to
>>>>>>>> skip one extra frame
>>>>>>>>
>>>>>>>> src/os_cpu/solaris_x86/vm/os_solaris_x86.cpp
>>>>>>>>
>>>>>>>> JDK-8133749 fixes: os:current_frame() was not consistent with other
>>>>>>>> platforms and needs to skip one more frame. This means it
>>>>>>>> returns the
>>>>>>>> frame for the caller's caller. So when called by
>>>>>>>> os:get_native_stack(), it returns the frame for whoever called
>>>>>>>> os::get_native_stack(). Although not intuitive, this is what
>>>>>>>> os:get_native_stack() expects. Probably a method rename and/or a
>>>>>>>> behavior change is justified here, but I would prefer to do that
>>>>>>>> with
>>>>>>>> a followup CR if anyone has a good suggestion on what to do.
>>>>>>>>
>>>>>>>> test/runtime/NMT/CheckForProperDetailStackTrace.java
>>>>>>>>
>>>>>>>> This is the new NTM detail test. It checks for frames that
>>>>>>>> shouldn't
>>>>>>>> be present and validates at least one stack trace is what is
>>>>>>>> expected.
>>>>>>>>
>>>>>>>> I verified that the above test now passes on all supported
>>>>>>>> platforms,
>>>>>>>> and also did a full jprt "-testset hotpot" run. I plan on doing
>>>>>>>> some
>>>>>>>> RBT testing with NMT detail enabled before committing.
>>>>>>>>
>>>>>>>> Regarding the community contributed ports that Oracle does not
>>>>>>>> support, I didn't make any changes there, but it looks like some of
>>>>>>>> these bugs do exist. Notably:
>>>>>>>>
>>>>>>>> -linux-aarch64: Looks like it suffers from JDK-8133740. The changes
>>>>>>>> done to the
>>>>>>>>  os_linux_x86.cp should also be applied here.
>>>>>>>> -linux-ppc: Hard to say for sure since the implementation of
>>>>>>>> os::current_frame is
>>>>>>>>  different than others, but it looks to me like it suffers from
>>>>>>>> both
>>>>>>>> JDK-8133749
>>>>>>>>  and JDK-8133740.
>>>>>>>> -aix-ppc: Looks to be the same implementation as linux-ppc, so
>>>>>>>> would
>>>>>>>> need the
>>>>>>>>  same changes.
>>>>>>>>
>>>>>>>> These ports may also be suffering from JDK-8133747, but that fix
>>>>>>>> is in
>>>>>>>> shared code (nativeCallStack.cpp). My changes there will need some
>>>>>>>> tweaking for these ports they don't use a tail call to call
>>>>>>>> os::get_native_stack().
>>>>>>>>
>>>>>>>> If the maintainers of these ports could send me some NMT detail
>>>>>>>> output, I can advise better on what changes are needed. Then you
>>>>>>>> can
>>>>>>>> implement and test them, and then send them back to me and I'll
>>>>>>>> include them with my changes. What I need is the following command
>>>>>>>> run
>>>>>>>> on product and slowdebug builds. Initially run without any of my
>>>>>>>> changes applied. If needed I may followup with a request that
>>>>>>>> they be
>>>>>>>> run with the changes applied:
>>>>>>>>
>>>>>>>> bin/java -XX:+UnlockDiagnosticVMOptions
>>>>>>>> -XX:NativeMemoryTracking=detail -XX:+PrintNMTStatistics -version
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>>
>>>>>>>> Chris
>>>>>>>>
>>>>>>>
>>>>>
>>>
>


More information about the hotspot-runtime-dev mailing list