A hotspot patch for stack profiling (frame pointer)

Johannes Rudolph johannes.rudolph at googlemail.com
Mon Dec 8 18:32:46 UTC 2014


Hi there,

I'm the one who started the perf-map-agent project [1]. Thanks, Brendan
and all others, for having this discussion here.

On Fri, Dec 5, 2014 at 11:22 AM, Volker Simonis <volker.simonis at
gmail.com> wrote:
> That said, I still don't know how perf creates stack traces. Does it
> attach to the process with ptrace or how else does it inspect the
> stacks after a performance counter event?

`perf` is part of the Linux kernel. It supports different kinds of
trace events, such as the hardware performance counters in the CPU. IIUC
it sets up a counter that is incremented on certain events (CPU
cycles, cache misses, etc.), and when the counter overflows, the
currently running program is interrupted and the kernel's event
processing gets to record the event. No debugger is needed, as the
event is generated directly in the context of the monitored process
(IIUC, right?).
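
To make the overflow idea concrete, here is a toy model in Python. It is
only a sketch of the principle, not how the kernel implements it: the
names `sample_on_overflow`, `trace` and `period` are made up for this
illustration, and the "program" is just a list of (function, event
count) pairs.

```python
from collections import Counter

def sample_on_overflow(trace, period):
    """Toy model of counter-overflow sampling.

    `trace` is a list of (function_name, events) pairs describing how many
    events (e.g. CPU cycles) each stretch of execution generated. Every time
    the running counter reaches `period`, one sample is attributed to the
    function that was running at that moment.
    """
    samples = Counter()
    counter = 0
    for name, events in trace:
        counter += events
        # A long stretch may overflow the counter several times.
        while counter >= period:
            counter -= period
            samples[name] += 1
    return samples

# Hypothetical execution: "compute" burns most of the cycles.
trace = [("parse", 30), ("compute", 250), ("parse", 20), ("compute", 100)]
samples = sample_on_overflow(trace, period=100)
print(dict(samples))
```

The sample counts approximate where the events happened, which is why
no cooperation from the monitored process is needed.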

You can instruct the perf infrastructure (and thus the kernel) to
capture a stacktrace when the event triggers and the (very simple)
capturing logic is then run inside the kernel. [2] I guess adapting
the kernel stacktrace collection logic would be possible but probably
hard with all the constraints kernel code has to run under.
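
As a rough sketch of what such a simple frame-pointer walk looks like,
assuming the usual x86-64 convention where `[rbp]` holds the caller's
saved `rbp` and `[rbp + 8]` holds the return address: the code below
models memory as a Python dict, and all addresses and the function name
`walk_frame_pointers` are invented for the example. It is not the
kernel's code.

```python
def walk_frame_pointers(memory, pc, rbp, max_depth=64):
    """Follow the chain of saved frame pointers.

    `memory` maps an address to the 8-byte word stored there (a stand-in
    for reading the user stack). Returns the list of code addresses found,
    starting with the interrupted `pc`.
    """
    trace = [pc]
    while rbp != 0 and len(trace) < max_depth:
        saved_rbp = memory.get(rbp)        # caller's frame pointer
        return_addr = memory.get(rbp + 8)  # where the caller resumes
        if saved_rbp is None or return_addr is None:
            break  # rbp does not point at readable stack memory
        trace.append(return_addr)
        # The stack grows down, so caller frames live at higher addresses.
        if saved_rbp != 0 and saved_rbp <= rbp:
            break
        rbp = saved_rbp
    return trace

# Hypothetical stack with three frames; the outermost saved rbp is 0.
memory = {
    0x7000: 0x7040, 0x7008: 0x401200,
    0x7040: 0x7080, 0x7048: 0x401100,
    0x7080: 0x0,    0x7088: 0x401000,
}
trace = walk_frame_pointers(memory, pc=0x401300, rbp=0x7000)
print([hex(addr) for addr in trace])
```

If a method uses `rbp` for something other than the frame pointer, the
first lookup already reads garbage and the walk stops after the first
entry, e.g. `walk_frame_pointers(memory, 0x401300, 0x1234)` returns only
the interrupted `pc`.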

As I just found out (and as Mikael confirms), there's an alternative
stacktrace collection mode, enabled by the `-g dwarf` setting of `perf
record`, which instructs the kernel to copy the top X bytes of the
stack [3] so that the `perf` tool can walk the stack in user mode
afterwards. It would probably be much easier to adapt this user-mode
code to figure out stacktraces for JIT'd methods if we can recover
their frame pointers.

Would that work? Maybe it's enough to get the stackframe size of the
nmethod, because if `rsp` never changes during a JIT'd method (i.e. if
nmethods have a constant frame size) you may be able to walk the stack
by calculating the frame from `rsp` and the frame size (it seems
that's basically also how hotspot does it). How could you get at the
frame size information?
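
To illustrate the idea, here is a sketch of such a walk over a copied
chunk of stack, under these assumptions: `rsp` is constant within a
method body, the return address sits exactly `frame_size` bytes above
`rsp`, and the frame size can be looked up from the program counter.
The `code_ranges` table is purely hypothetical (it is exactly the
information asked about above), as are all names and addresses.

```python
import struct

WORD = 8  # bytes per stack slot on x86-64

def read_word(snapshot, base_rsp, addr):
    """Read one little-endian 8-byte word from the copied stack bytes."""
    offset = addr - base_rsp
    if offset < 0 or offset + WORD > len(snapshot):
        return None  # outside the captured part of the stack
    return struct.unpack_from("<Q", snapshot, offset)[0]

def frame_size_for(pc, code_ranges):
    """Look up the (assumed constant) frame size of the method containing pc."""
    for start, end, frame_size in code_ranges:
        if start <= pc < end:
            return frame_size
    return None

def walk_by_frame_size(snapshot, base_rsp, pc, code_ranges, max_depth=64):
    """Walk the stack using rsp plus per-method frame sizes, no frame pointer."""
    trace = [pc]
    rsp = base_rsp
    while len(trace) < max_depth:
        frame_size = frame_size_for(pc, code_ranges)
        if frame_size is None:
            break  # left the code we have frame sizes for
        return_addr = read_word(snapshot, base_rsp, rsp + frame_size)
        if return_addr is None:
            break
        trace.append(return_addr)
        pc = return_addr
        rsp += frame_size + WORD  # pop the frame and its return address
    return trace

# Hypothetical methods A (16-byte frame) and B (32-byte frame).
code_ranges = [(0x1000, 0x1100, 16), (0x2000, 0x2100, 32)]
snapshot = (b"\x00" * 16 + struct.pack("<Q", 0x2020)
            + b"\x00" * 32 + struct.pack("<Q", 0x3000))
trace = walk_by_frame_size(snapshot, 0x7000, 0x1010, code_ranges)
print([hex(addr) for addr in trace])
```

The walk stops at the first address with no known frame size, where a
real unwinder would have to fall back to some other method.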

--
Johannes

[1] https://github.com/jrudolph/perf-map-agent
[2] https://github.com/torvalds/linux/blob/master/arch/x86/kernel/cpu/perf_event.c#L2051
[3] https://github.com/torvalds/linux/blob/master/tools/perf/util/evsel.c#L547

-----------------------------------------------
Johannes Rudolph
http://virtual-void.net


More information about the serviceability-dev mailing list