RFR(M): 8133749, 8133747, 8133740: NMT detail stack trace cleanup

David Holmes david.holmes at oracle.com
Sun Aug 7 23:26:20 UTC 2016


Hi Chris,

I don't have any good suggestions for this. So go with (2) and lets work 
on (3).

Thanks,
David

On 5/08/2016 5:05 PM, Chris Plummer wrote:
> Hi David,
>
> If fixing os::current_frame() to have a better name and also make it go
> up one less frame makes these changes more palatable, I'm willing to
> make that change. I would prefer to do it with a follow up CR (it would
> probably have to be an RFE), but will do it with these changes if
> necessary. I still pull hairs over the proper name for this method, even
> if it is modified to return the frame of whoever called it. Usually the
> meaning conveyed by a method's name does not change based on whether you
> choose the caller's or callee's point of view, but in this case it does,
> and I'm not sure which point of view makes more sense. If we choose the
> caller's point of view, then the proper name remains
> os::current_frame(). If we choose the callee's point of view, then it
> should be os::callers_frame(). Maybe there's a name that is agnostic and
> means the same thing from both view points. I just haven't thought of
> one yet.
>
> With respect to ALWAYSINLINE, it does not work for solaris and windows
> slowdebug builds. Note the special case in the test I wrote to allow for
> AllocateHeap() in the stack trace in this case, even though it shouldn't
> be there because it uses ALWAYSINLINE. I could have made changes in the
> source to get rid of it from the stack trace, but I didn't feel the
> source code disruption was worth it for a slowdebug build, especially
> since there are only a allocation call sites where it is a problem. I
> could use ALWAYSINLINE for the cases where it will work to inline
> _get_previous_fp, but I don't really see that as being any more reliable
> than what is there now.
>
> As for making _get_previous_fp() a macro, that's made more complicated
> because it has #ifdefs already. I could move its implementation directly
> into os::current_frame(). That would fix the inlining problem. I think
> it could also use some cleanup with the #ifdefs. For example, for
> linux-x86 do we have to worry about the SPARC_WORKS and __clang__ cases?
>
> And yes, even with my changes the code is no less fragile, and no less
> misdirected in its approach to getting a consistent allocation back
> trace.  As I see it, there are 3 options:
>
> (1) Do nothing, and leave it both broken and fragile.
> (2) Do the cleanup I've done to at least correct the known stack trace
> issues.
> (3) Find another solution that doesn't suffer from these fragility issues.
>
> Note that (3) does not preclude doing (2) first, and (2) seems a better
> alternative than leaving it in its broken state (1). That's why I have
> pursued these changes even though I know things will still be fragile.
>
> thanks,
>
> Chris
>
> On 8/4/16 9:47 PM, David Holmes wrote:
>> Hi Chris,
>>
>> On 5/08/2016 7:53 AM, Chris Plummer wrote:
>>> Ping!
>>
>> I took another look at this and my earlier comments from JDK-8133749.
>> I hate to see the functionality "fixed" yet still have a completely
>> confusing and mis-named API. I'm still far from convinced that
>> returning the callers caller wasn't an "error" that was done due to
>> the lack of inlining and the appearance of an unexpected stackframe.
>> You've now made things consistent - but os::current_frame() is
>> completely mis-leading in name. And I'm still concerned that
>> correctness here depends on C compiler inlining choices, with no way
>> to verify at build time that they were indeed inlined or not! Don't we
>> have ALWAYSINLINE to mark things like _get_previous_fp ? For that
>> matter shouldn't _get_previous_fp be a macro so inlining plays no role ?
>>
>> Sorry but this code seems to simply limp from one broken state to
>> another due to its fragility.
>>
>> Thanks,
>> David
>> -----
>>
>>> On 8/2/16 1:31 PM, Chris Plummer wrote:
>>>> Hello,
>>>>
>>>> Please review the following:
>>>>
>>>> webrev:
>>>> http://cr.openjdk.java.net/~cjplummer/8133749-8133747-8133740/webrev-01/webrev.hotspot/
>>>>
>>>>
>>>>
>>>> Bugs fixed:
>>>>
>>>> JDK-8133749: os::current_frame() is not returning the proper frame on
>>>> ARM and solaris-x64
>>>> https://bugs.openjdk.java.net/browse/JDK-8133749
>>>>
>>>> JDK-8133747: NMT includes an extra stack frame due to assumption NMT
>>>> is making on tail calls being used
>>>> https://bugs.openjdk.java.net/browse/JDK-8133747
>>>>
>>>> JDK-8133740: NMT for Linux/x86/x64 and bsd/x64 slowdebug builds
>>>> includes NativeCallStack::NativeCallStack() frame in backtrace
>>>> https://bugs.openjdk.java.net/browse/JDK-8133740
>>>>
>>>> The above bugs all result in the NMT detail stack traces including
>>>> extra frames in the stack traces. Certain frames are suppose to be
>>>> skipped, but sometimes are not. The frames that show up are:
>>>>
>>>>   NativeCallStack::NativeCallStack
>>>>   os::get_native_stack
>>>>
>>>> These are both methods used to generate the stack trace, and therefore
>>>> should not be included it.  However, under some (most) circumstances,
>>>> they were.
>>>>
>>>> Also, there was no test to make sure that any NMT detail output is
>>>> generated, or that it is correct. I've added one with this webrev. Of
>>>> the 27 possible builds (9 platforms * 3 build flavors), only 9 of the
>>>> 27 initially passed this new test. They were the product and fastdebug
>>>> builds for solaris-sparc, bsd-x64, and linux-x64; and the slowdebug
>>>> builds for solaris-x64, windows-x86, and windows-x64. All the rest
>>>> failed. They now all pass with my fixes in place.
>>>>
>>>> Here's a summary of the changes:
>>>>
>>>> src/os/posix/vm/os_posix.cpp
>>>> src/os/windows/vm/os_windows.cpp
>>>>
>>>> JDK-8133747 fixes: There was some frame skipping logic here which was
>>>> sort of correct, but was misplace. There are no extra frames being
>>>> added in os::get_native_stack() due to lack of inlining or lack of a
>>>> tail call, so no need for toSkip++ here. The logic has been moved to
>>>> NativeCallStack::NativeCallStack, which is where the tail call is
>>>> (sometimes) made, and also corrected (see nativeCallStack.cpp below).
>>>>
>>>> src/share/vm/utilities/nativeCallStack.cpp
>>>>
>>>> JDK-8133747 fixes: The frame skipping logic that was moved here
>>>> assumed that NativeCallStack::NativeCallStack would not appear in the
>>>> call stack (due to a tail call be using to call os::get_native_stack)
>>>> except in slow debug builds. However, some platforms also don't use a
>>>> tail call even when optimized. From what I can tell that is the case
>>>> for 32-bit platforms and for windows.
>>>>
>>>> src/os_cpu/bsd_x86/vm/os_bsd_x86.cpp
>>>> src/os_cpu/windows_x86/vm/os_windows_x86.cpp
>>>> src/os_cpu/linux_x86/vm/os_linux_x86.cpp
>>>>
>>>> JDK-8133740 fixes: When _get_previous_fp is not inlined, we need to
>>>> skip one extra frame
>>>>
>>>> src/os_cpu/solaris_x86/vm/os_solaris_x86.cpp
>>>>
>>>> JDK-8133749 fixes: os:current_frame() was not consistent with other
>>>> platforms and needs to skip one more frame. This means it returns the
>>>> frame for the caller's caller. So when called by
>>>> os:get_native_stack(), it returns the frame for whoever called
>>>> os::get_native_stack(). Although not intuitive, this is what
>>>> os:get_native_stack() expects. Probably a method rename and/or a
>>>> behavior change is justified here, but I would prefer to do that with
>>>> a followup CR if anyone has a good suggestion on what to do.
>>>>
>>>> test/runtime/NMT/CheckForProperDetailStackTrace.java
>>>>
>>>> This is the new NTM detail test. It checks for frames that shouldn't
>>>> be present and validates at least one stack trace is what is expected.
>>>>
>>>> I verified that the above test now passes on all supported platforms,
>>>> and also did a full jprt "-testset hotpot" run. I plan on doing some
>>>> RBT testing with NMT detail enabled before committing.
>>>>
>>>> Regarding the community contributed ports that Oracle does not
>>>> support, I didn't make any changes there, but it looks like some of
>>>> these bugs do exist. Notably:
>>>>
>>>> -linux-aarch64: Looks like it suffers from JDK-8133740. The changes
>>>> done to the
>>>>  os_linux_x86.cp should also be applied here.
>>>> -linux-ppc: Hard to say for sure since the implementation of
>>>> os::current_frame is
>>>>  different than others, but it looks to me like it suffers from both
>>>> JDK-8133749
>>>>  and JDK-8133740.
>>>> -aix-ppc: Looks to be the same implementation as linux-ppc, so would
>>>> need the
>>>>  same changes.
>>>>
>>>> These ports may also be suffering from JDK-8133747, but that fix is in
>>>> shared code (nativeCallStack.cpp). My changes there will need some
>>>> tweaking for these ports they don't use a tail call to call
>>>> os::get_native_stack().
>>>>
>>>> If the maintainers of these ports could send me some NMT detail
>>>> output, I can advise better on what changes are needed. Then you can
>>>> implement and test them, and then send them back to me and I'll
>>>> include them with my changes. What I need is the following command run
>>>> on product and slowdebug builds. Initially run without any of my
>>>> changes applied. If needed I may followup with a request that they be
>>>> run with the changes applied:
>>>>
>>>> bin/java -XX:+UnlockDiagnosticVMOptions
>>>> -XX:NativeMemoryTracking=detail -XX:+PrintNMTStatistics -version
>>>>
>>>> thanks,
>>>>
>>>> Chris
>>>>
>>>
>


More information about the hotspot-runtime-dev mailing list