A hotspot patch for stack profiling (frame pointer)

Mikael Gerdin mikael.gerdin at oracle.com
Mon Dec 8 17:15:35 UTC 2014


Maynard,

On 2014-12-08 16:05, Maynard Johnson wrote:
> On 12/05/2014 05:09 PM, Brendan Gregg wrote:
>> G'Day Volker,
>>
>> On Fri, Dec 5, 2014 at 11:22 AM, Volker Simonis
>> <volker.simonis at gmail.com> wrote:
>>> Hi Brendan,
>>>
>>> I'm still not understanding who is taking the actual stack traces (let
>>> alone the symbols) in your examples. Is this done by 'perf' itself
>>> based only on the frame pointer?
>>
>> perf is walking the frame pointers.
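A minimal sketch of what "walking the frame pointers" means. It assumes the x86-64 convention where each function's prologue does `push rbp; mov rbp, rsp`, so [rbp] holds the caller's saved rbp and [rbp+8] holds the return address. Memory is modeled as a Python dict of 8-byte words, and all addresses are made up. This illustrates the idea only; it is not perf's kernel code.

```python
def walk_frame_pointers(pc, fp, memory, max_depth=64):
    """Return the sampled PC followed by one return address per frame.

    Assumes x86-64 frame layout: memory[fp] is the caller's saved frame
    pointer and memory[fp + 8] is the return address into the caller.
    """
    frames = [pc]
    while fp != 0 and len(frames) < max_depth:
        saved_fp = memory.get(fp)
        ret_addr = memory.get(fp + 8)
        if saved_fp is None or ret_addr is None:
            break  # fp points at unmapped memory: the chain is broken
        frames.append(ret_addr)
        fp = saved_fp  # step to the caller's frame
    return frames


# Three nested frames (innermost first); a saved fp of 0 marks the outermost frame.
stack = {
    0x1000: 0x1040, 0x1008: 0x400B10,
    0x1040: 0x1080, 0x1048: 0x400A20,
    0x1080: 0x0,    0x1088: 0x400900,
}
print([hex(a) for a in walk_frame_pointers(0x400C00, 0x1000, stack)])
# ['0x400c00', '0x400b10', '0x400a20', '0x400900']
```

If any function on the path uses rbp as a general-purpose register (which -fomit-frame-pointer permits), the saved value is garbage. The walk then stops early or follows nonsense. A JIT that does not preserve the frame pointer causes exactly this failure mode.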
> Volker, to be specific, the perf profiling tool has a user space part and a
> kernel space part. The collection of stack traces is done by the kernel.
> When a user-specified event (or series of events) occurs, the process
> being profiled is interrupted and the sampled information (which can
> optionally include a full stack trace) is made available to the user space
> perf tool to be saved to a file for future post-profiling processing.
>
> During the profiling phase, the perf tool collects information about the
> profiled process's memory mappings, which allows for address-to-symbol
> resolution. It's in the post-profiling phase that each sampled instruction
> address, along with its associated stack trace, is resolved to the appropriate
> symbol (i.e., function/method) in a specific binary file (e.g., library,
> executable).
>
> And if the VM creates a /tmp/perf-<PID>.map file to save information about
> JITed methods, perf's post-profiling tool will find it and use it to
> correlate sampled addresses it collected from the VM's executable anonymous
> memory mappings to the method names.
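A minimal sketch of the map-file side of this. It assumes the format documented for perf's JIT interface: one line per symbol, START and SIZE as hex without a 0x prefix, and the rest of the line as the symbol name. The addresses and symbol names below are made up, and the lookup only imitates what perf's post-processing does; it is not perf's code.

```python
import bisect


def perf_map_path(pid):
    # perf looks for JIT symbol files at this fixed location.
    return f"/tmp/perf-{pid}.map"


def format_perf_map_line(start, size, name):
    # One symbol per line: START and SIZE in hex without "0x", then the name.
    return f"{start:x} {size:x} {name}"


def load_perf_map(lines):
    """Parse map lines into a sorted list of (start, size, name)."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        start_hex, size_hex, name = line.split(" ", 2)  # name may contain spaces
        entries.append((int(start_hex, 16), int(size_hex, 16), name))
    entries.sort()
    return entries


def resolve(entries, addr):
    """Return the symbol whose [start, start + size) range contains addr, or None."""
    starts = [start for start, _, _ in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0:
        start, size, name = entries[i]
        if addr < start + size:
            return name
    return None


lines = [
    format_perf_map_line(0x7F3A10001000, 0x120, "Interpreter"),
    format_perf_map_line(0x7F3A10002000, 0x80, "java.lang.String::hashCode"),
]
entries = load_perf_map(lines)
print(perf_map_path(4242), lines[1])
print(resolve(entries, 0x7F3A10002010))  # java.lang.String::hashCode
```

Sorting by start address and bisecting keeps each lookup logarithmic, which matters when a long-running VM has emitted many thousands of JITed methods.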

I seem to recall reading about perf having support for DWARF debug info.

If the VM (or a JVM/TI agent) could create DWARF debug symbols, could 
that be used to convey information about inlined functions and stack 
unwinding without frame pointers?
I realize that emitting DWARF debug symbols for generated code is not a 
trivial undertaking, but since perf does its sampling in the kernel and 
we can't disable inlining, that seems to be one of the few ways we can 
get complete stack traces.
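To make the idea concrete, DWARF call frame information (CFI) lets an unwinder compute each frame's canonical frame address (CFA) from the stack pointer plus a per-PC offset, so no frame pointer is needed. The sketch below is a heavily simplified model. It assumes x86-64 (return address stored at CFA - 8, caller's stack pointer equal to the CFA) and a hypothetical table with one rule per function. Real CFI has per-instruction rows and also records where callee-saved registers were spilled. For JITed code, the VM or an agent would have to supply such a table. For native code, perf can already unwind this way with `perf record --call-graph dwarf`, which copies part of the user stack at each sample and unwinds it afterwards.

```python
def unwind_with_cfi(pc, sp, memory, cfi_rules, max_depth=64):
    """Unwind using CFA = sp + offset rules instead of frame pointers.

    cfi_rules is a list of (lo, hi, cfa_offset) entries: for lo <= pc < hi,
    CFA = sp + cfa_offset. This is a simplified, hypothetical rule table.
    """
    frames = [pc]
    lookup_pc = pc
    while len(frames) < max_depth:
        offset = next((off for lo, hi, off in cfi_rules if lo <= lookup_pc < hi), None)
        if offset is None:
            break  # no unwind info covers this PC
        cfa = sp + offset
        ret_addr = memory.get(cfa - 8)  # x86-64: return address sits just below the CFA
        if not ret_addr:
            break
        frames.append(ret_addr)
        sp = cfa  # after the return, the caller's sp equals this frame's CFA
        # A return address can point just past a function's last instruction, so
        # unwinders look up the rule for the call instruction (ret_addr - 1).
        lookup_pc = ret_addr - 1
    return frames


rules = [
    (0x400A00, 0x400B00, 0x10),  # function A
    (0x400B00, 0x400C00, 0x28),  # function B
    (0x400C00, 0x400D00, 0x18),  # function C (the sampled one)
]
stack = {0x1010: 0x400B20, 0x1038: 0x400A30, 0x1048: 0x400900}
print([hex(a) for a in unwind_with_cfi(0x400C40, 0x1000, stack, rules)])
# ['0x400c40', '0x400b20', '0x400a30', '0x400900']
```

Note that nothing here reads rbp. The walk depends only on the rule table being present for every PC on the stack, which is why JITed code without such a table would still break the chain.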

There would be several other advantages to having DWARF symbols for 
generated code; for example, GDB could use them when debugging the JVM.

An alternate approach could be to extend the information in 
perf-<PID>.map to have more detailed PC ranges with information about 
which functions are inlined. A lot of that information is available in 
the VM, but it is not necessarily exposed via the tool APIs.
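A sketch of how such an extended map could be consumed. It assumes a hypothetical format in which each PC range of a compiled method carries its chain of inlined methods, outermost first. Neither the format nor the method names exist in perf or HotSpot; the point is only that one sampled PC can expand into several logical Java frames.

```python
# Hypothetical extended map entries: (start, end, inline chain, outermost first).
INLINE_MAP = [
    (0x7F3A10003000, 0x7F3A10003040, ["Foo.run"]),
    (0x7F3A10003040, 0x7F3A10003080, ["Foo.run", "Bar.compute"]),
    (0x7F3A10003080, 0x7F3A100030C0, ["Foo.run", "Bar.compute", "Baz.helper"]),
]


def expand_pc(pc, inline_map):
    """Return the logical frames for pc, innermost first, or None if unknown."""
    for start, end, chain in inline_map:
        if start <= pc < end:
            return list(reversed(chain))
    return None


def expand_stack(pcs, inline_map):
    """Expand a physical stack (innermost first) into logical Java frames."""
    frames = []
    for pc in pcs:
        logical = expand_pc(pc, inline_map)
        frames.extend(logical if logical else [hex(pc)])  # keep unknown PCs as raw addresses
    return frames


print(expand_stack([0x7F3A10003090, 0x400A20], INLINE_MAP))
# ['Baz.helper', 'Bar.compute', 'Foo.run', '0x400a20']
```

With data like this, a profiler could in principle restore the Java stacks that inlining currently makes look much shallower than they really are.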

/Mikael

>
> -Maynard
>>
>> A JVMTI agent, perf-map-agent, is providing a map file for symbol
>> translation under /tmp/perf-PID.map. Linux perf already hunts for such
>> a file when doing symbol translation.
>>
>>>
>>> As I wrote before, this is pretty hard to get right for a JVM, but
>>> there are good approximations. Have you looked at the 'jstack' tool
>>> which is part of the JDK? If you run it on a Java process, it will
>>> give you exact stack traces with full inlining information. However,
>>> this only works at safepoints, so it is probably not suitable for
>>> profiling with performance counters.
>>
>> Right, jstack works, and I get full correct stacks. I do really want
>> to take stacks at any moment: not just CPU samples, but when tracing
>> kernel TCP events, or PMC cache miss profiling, etc. perf can already
>> do many advanced tracing and profiling activities. I just needed the
>> Java stacks for context.
>>
>>> But you can also use 'jstack -F
>>> -m', which gives you a 'best effort' mixed Java/C++ stack trace (most
>>> of the time even with inlined Java frames). This is probably the best
>>> you can get when interrupting a running JVM at an arbitrary point in
>>> time. As you mentioned in one of your blogs, the VM can be in the
>>> C library or even in the kernel at that time, and these don't preserve
>>> the frame pointer either. So it will already be hard even to walk up to
>>> the first Java frame.
>>
>> Well, the JVMs I'm looking at are already built with
>> -fno-omit-frame-pointer (which is good). I edited hotspot to preserve
>> it as well.
>>
>> Here's before I changed hotspot:
>>
>> http://www.brendangregg.com/FlameGraphs/cpu-mixedmode-nofp.svg
>>
>> Yes, most stacks are clearly broken.
>>
>> After changing hotspot:
>>
>> http://www.brendangregg.com/FlameGraphs/cpu-mixedmode-vertx.svg
>>
>> It's looking pretty good. If you look carefully on the far left and
>> right, about 0.8% of stacks are in read() and write() called directly
>> from Java, which may well be broken (unless a Java thread is calling these
>> directly; there could also be some gcc inlining going on). Even if
>> they are broken, I can see 98% of my profile. Plus, I'd be interested
>> to know what exactly is reusing the frame pointer, so we could fix
>> that too.
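One way to hunt for whatever is reusing the frame pointer is to add sanity checks to the walk and report where the chain first goes bad. This is a heuristic sketch. It assumes the x86-64 layout where [rbp] holds the caller's saved rbp and [rbp+8] the return address, plus a known stack range for the sampled thread. The bounds and addresses are hypothetical. A plausible frame pointer is 8-byte aligned, lies inside the stack, and increases from callee to caller (the stack grows down). When a check fails, the last address collected usually lies in the function whose rbp value was saved in the bad frame, which makes that function a suspect.

```python
def walk_checked(pc, fp, memory, stack_low, stack_high, max_depth=64):
    """Frame-pointer walk with sanity checks.

    Returns (frames, complete). When complete is False because a check
    failed, frames[-1] is the last trustworthy address; it usually lies in
    the function whose frame pointer looked invalid.
    """
    frames = [pc]
    while len(frames) < max_depth:
        if fp == 0:
            return frames, True  # reached the outermost frame
        # A frame pointer must be aligned and leave room for [fp] and [fp + 8].
        if fp % 8 != 0 or not (stack_low <= fp <= stack_high - 16):
            return frames, False
        saved_fp = memory.get(fp)
        ret_addr = memory.get(fp + 8)
        if saved_fp is None or ret_addr is None:
            return frames, False
        frames.append(ret_addr)
        # The stack grows down, so the caller's frame must sit at a higher address.
        if saved_fp != 0 and saved_fp <= fp:
            return frames, False
        fp = saved_fp
    return frames, False  # depth limit reached; treat as incomplete


# The second frame saved a garbage rbp (0x7), as if its caller used rbp freely.
stack = {
    0x1000: 0x1040, 0x1008: 0x400B10,
    0x1040: 0x7,    0x1048: 0x400A20,
}
frames, complete = walk_checked(0x400C00, 0x1000, stack, 0x1000, 0x2000)
print(complete, hex(frames[-1]))  # False 0x400a20
```

Resolving frames[-1] across many samples and counting which symbols it hits most often would point at the likely culprits. A garbage value can still pass every check by chance, so this narrows the search rather than proving anything.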
>>
>> The Java stacks themselves are also about a third as deep as they
>> should be, due to inlining.
>>
>>>
>>> But nevertheless, if the output of 'jstack -F -m' is "good enough" for
>>> your purpose, you can implement something similar in 'perf' or a
>>> helper library of 'perf' and be happy (I don't actually know how perf
>>> takes stack traces, but I suppose there may be some kind of callback
>>> mechanism for walking unknown frames). This is actually not so hard.
>>> I've recently implemented a "print_native_stack()" function within
>>> hotspot itself (you can call it, for example, from gdb during debugging;
>>> see http://hg.openjdk.java.net/jdk9/dev/hotspot/rev/86183a940db4).
>>> Maybe you could call this function directly from 'perf' if perf
>>> attaches with ptrace to the process (I assume it does, or how else
>>> could it walk the stack)?
>>
>> An OS-cooperative stack walker would be great, and I think the hotspot
>> team is already doing this for Oracle Solaris. Thanks for the code
>> too, this is pretty interesting.
>>
>> jstack -F -m eats 0.5s of CPU for me, so it would need work to make
>> this into a 99 Hertz-capable profiler. Plus I'd like to pick arbitrary
>> kernel functions or tracepoints and get Java context from them, too.
>> E.g., TCP functions, memory allocation, disk I/O, etc.
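The arithmetic behind the 99 Hz concern, assuming each sample would need one walk costing the 0.5 s of CPU measured for jstack -F -m:

```latex
0.5\,\frac{\text{CPU-s}}{\text{sample}} \times 99\,\frac{\text{samples}}{\text{s}}
  = 49.5\,\frac{\text{CPU-s}}{\text{s}},
\qquad
\frac{0.01\,\text{CPU-s/s}}{99\,\text{samples/s}} \approx 101\,\mu\text{s per sample}
```

That is roughly 50 CPUs kept busy just collecting stacks for one profiled process. Holding the overhead near 1% of one CPU would leave about 100 µs per sample, roughly 5,000 times less than the measured cost.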
>>
>>>
>>> These were just some random thoughts with the hope that they may be helpful.
>>
>> Yes, thanks!
>>
>> Brendan
>>
>


More information about the serviceability-dev mailing list