A hotspot patch for stack profiling (frame pointer)
brendan.d.gregg at gmail.com
Thu Dec 4 22:55:37 UTC 2014
I've hacked hotspot to return the frame pointer, in part to see what this
involves, and also to have a working prototype for analysis. Along with an
agent to resolve symbols, this has allowed full stack profiling using Linux
perf_events. The following flame graphs show the resulting profiles.
A mixed mode CPU flame graph of a vert.x benchmark (click to zoom):
Same thing, but this time disabling inlining, to show more frames:
As expected, performance is worse without inlining. You can compare the
flame graphs side by side to see why. Less time spent doing work / I/O!
is my patch, and currently only works for x86-64. It removes RBP from the
register pools, and inserts "mov(rbp, rsp)" into two function prologues. It
is also unsupported: use at your own risk. I'm not a veteran hotspot
engineer, so the chances that I messed something up are high.
I'd love to be able to enable frame pointers in Oracle JDK, eg, with an
-XX:+NoOmitFramePointer option. It could be put under
-XX:+UnlockDiagnosticVMOptions or -XX:+UnlockExperimentalVMOptions, so long
as we had some way to turn it on. If someone wants to include (improve,
rewrite) my patch, please do.
I don't have much perf data yet, but on the vert.x microbenchmark it looked
like returning the frame pointer cost 2.6% performance. I hope that's
somewhat worst-case for production workloads. (I was also able to recover
the 2.6% by fine tuning other options, so were this a production change,
I'd be hoping not to regress performance at all.)
We've discussed this before (
The Solaris-assisted approach that Serguei Spitsyn described (JDK-6617153)
should work very well. The JVM can run as-is, full stacks can be generated
on-demand, and symbols should always be correct.
The frame pointer approach costs a little performance, and only shows
partial stacks after inlining (unless you disable inlining, but that can
cost >40% performance). There is the other issue Volker Simonis mentioned
as well, where some stacks may not be profiled correctly. And, if you are
unlucky, symbols can move during the profile, so any static perf-map-agent
map will translate some incorrectly. (I've considered developing a way to
detect this, and highlight such frames as dubious.)
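
For reference, the map that perf reads for JIT symbols is a plain text
file, /tmp/perf-PID.map, with one "START SIZE symbol" line per compiled
method (START and SIZE in hex, no 0x prefix). Below is a minimal sketch of
the lookup done against that file. It also shows why a stale snapshot
mistranslates: whatever symbol the file lists for an address range is what
gets reported, even if the code has since moved. The function name
perf_map_lookup is hypothetical and not part of perf or perf-map-agent.

```shell
# Hypothetical helper: print the symbol whose [START, START+SIZE) range
# contains the given hex address, using a perf map file as input.
# Usage: perf_map_lookup MAP_FILE HEX_ADDRESS
perf_map_lookup() {
    pml_file=$1
    # Accept the address with or without a leading 0x.
    pml_addr=$((0x${2#0x}))
    while read -r pml_start pml_size pml_sym; do
        # Skip blank or malformed lines.
        [ -n "$pml_size" ] || continue
        pml_lo=$((0x$pml_start))
        pml_hi=$((pml_lo + 0x$pml_size))
        if [ "$pml_addr" -ge "$pml_lo" ] && [ "$pml_addr" -lt "$pml_hi" ]; then
            printf '%s\n' "$pml_sym"
            return 0
        fi
    done < "$pml_file"
    # No range matched: the address is unknown to this snapshot.
    return 1
}
```

For example, "perf_map_lookup /tmp/perf-12345.map 0x7f3a1c0010a0" (PID and
address made up) prints the matching method name, or returns non-zero if
no range in the snapshot covers the address.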
At Netflix we are mostly Java on Linux. Switching to Oracle Solaris for
this feature is going to be a tough sell, especially when the value of full
stack profiling isn't widely understood. I personally think it might be a
bit easier if a -XX:+NoOmitFramePointer option existed, so Linux users can
try the feature, then consider the better Solaris version after gaining
solid experience on why it is so important.
We recently blogged about the value of stack profiling and flame graphs,
http://techblog.netflix.com/2014/11/nodejs-in-flames.html, although this
was for Node.js, which already has frame pointer support.
If anyone wants to try generating these mixed mode CPU flame graphs
themselves (in a test environment!), the first step is to compile OpenJDK 8
b132 with the previous patch, and get that running. Also install the
packages for the "perf" command. The remaining steps would be something
like:
# git clone --depth=1 https://github.com/brendangregg/FlameGraph
# git clone --depth=1 https://github.com/jrudolph/perf-map-agent
# cd perf-map-agent
# export JAVA_HOME=/...
# cmake .
# make
# perf record -F 99 -p `pgrep -n java` -g -- sleep 30
# java -cp attach-main.jar:$JAVA_HOME/lib/tools.jar \
    net.virtualvoid.perf.AttachOnce `pgrep -n java`
# perf script > ../FlameGraph/out.stacks
# cd ../FlameGraph
# ./stackcollapse-perf.pl < out.stacks | ./flamegraph.pl --color=java >
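
One way to sanity-check the result before rendering is to look at how deep
the collapsed stacks are: if the frame pointer is not being preserved, the
Java stacks typically come out very shallow. This is a rough sketch,
assuming the folded format that stackcollapse-perf.pl emits (frames joined
by ";", then a space and a sample count). folded_depth is a hypothetical
helper name and not part of FlameGraph.

```shell
# Hypothetical helper: summarize stack depth of folded stacks read from
# the named files, or from stdin if no files are given.
folded_depth() {
    awk '{
        count = $NF                      # last field is the sample count
        stack = $0
        sub(/ [0-9]+$/, "", stack)       # drop the trailing count
        depth = split(stack, frames, ";")
        samples += count
        weighted += depth * count
        if (depth > max) max = depth
    }
    END {
        if (samples > 0)
            printf "samples=%d mean_depth=%.1f max_depth=%d\n",
                samples, weighted / samples, max
    }' "$@"
}
```

It can be fed directly from the collapse step, for example
"./stackcollapse-perf.pl < out.stacks | folded_depth". Running it on
profiles taken with and without inlining gives a quick numeric view of how
many more frames the non-inlined profile shows.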
Finally, if you are new to CPU flame graphs, see