RFR: 8112: Flamegraph model creation performance improvements

Vincent Alexander Beelte duke at openjdk.org
Thu Jul 27 15:40:10 UTC 2023


This pull request improves the performance of creating the model that the flame graph visualization is drawn from.

The first issue this fixes is not actually really "performance" but rather a case where the thread pool used to create the models can be fully saturated with tasks that are already invaliated and blocking the newest useful task.
The use case where that tends to happen to me goes as follows:
My jfr file contains about 1.8 million events (1.5 mio "Allocation in new TLAB", 200k "Allocation outside TLAB", 70k "Method Profiling Sample"). It was created by async-profiler during a 5 minute load test.
When loading this file and switching to the Java Application view (with the Flame Graph visualization already in focus) multiple tasks are generated to create a flamegraph for all event types and all threads. I did not investigate why there would be multiple tasks. "multiple" is "at least two"
Then without waiting for the graph to load I go on to filter to the thread that I am interested about which generates multiple tasks to generate a flamegraph for all event types in that thread.
Lastly I filter to the Method Profiling Samples which creates the last task, which then has to wait for enough of the other tasks to finish that one of the 3 thread in the pool can run it.
All in all with this specific jfr file and the current master (everything including commit 5ace151) in this scenario I am waiting about 55 seconds until I see the graph. (just measured with a stopwatch app)

My solution to this was to give the StacktraceTreeModel constructor a stop flag that it checks at the start both the inner and the outer loop. So that it can return early. The flag is then implemented by the FlamegraphSwingView.ModelRebuildRunnable.isInvalid field which is already checked at some places to see if the current task is still needed. It does feel strange to do this in a constructor but it was the easiest way to interrupt the most expensive part of creating the StacktraceTreeModel.

With this flag alone use case above goes from 55 seconds to about 6 seconds and there I am already the limiting factor stopping the clock in time.

When looking at the flamegraph of a flight recording of jmc where I had it draw me a bunch of different flamegraph with different filters in my 1.8 mio event file it did look like there where some additional low hanging fruit to pick:
![streams](https://github.com/openjdk/jmc/assets/917408/892a5851-ed3c-4337-8646-07ee802d3e63)
This is filtered to only the threads of the flame graph threadpool and method profiling samples.
It looks like a lot of the time is spend creating java.util streams and computing a node id
To remove the streams I implemented StacktraceTreeModel.getOrCreateNode just using a plain old for each loop.
To avoid computing the node id I moved Node.nodeId to do lazy initialization outside the constructor. (later I found that the nodeId wasn't actually used and I fully removed it)
That improved my "benchmark" (me clicking filters and writing down the timing logs I added) of generating the flamegraph for all threads and all events in my 1.8 mio event file from about 28 seconds to about 4 seconds.

That made the flamegraph look like this
![nostreams](https://github.com/openjdk/jmc/assets/917408/4693cb81-4fc3-4ff0-9026-a252c829cfaf)
I wondered if I could get more performance if I removed the switch over an enum that seemed to add a Enum.ordinal call in FrameSeparator.getCategory but other than the code now looking a little nicer (at least to my eyes with the delegation instead of a switch) that did not do anything I could measure. I was still at 4 seconds for generating the graph for all threads and events. But given that it looked nicer and performed the same I still left it in the pull request so the final flamegraph looks like this
![end](https://github.com/openjdk/jmc/assets/917408/77ee0b04-c9a8-49b6-a243-4166b08ebc2b)
Which at least to me does not contain any more obvious places other than maybe the big part of the AggregatableFrame.equals
I did make one attempt of saving the Node.children as a Map<AggregatableFrame, Node> but that was actually worse (about 8 seconds for the benchmark). I am guessing that is a case where the list is generally small enough that more time is lost from additionally having to compute hash codes than is gained from a better class of algorithm (or I just did it wrong ;))


I was not able to run the ui tests with "mvn verify -P uitests -Dspotbugs.skip=true"
maven fails for the module "org.openjdk.jmc.test.jemmy" with the message saying:
"[ERROR] Failed to execute goal org.eclipse.tycho:tycho-surefire-plugin:3.0.4:integration-test (default-integration-test) on project org.openjdk.jmc.test.jemmy: Could not find application "org.openjdk.jmc.rcp.application.app" in the test runtime. Make sure that the test runtime includes the bundle which defines this application."
This also happens on the master branch.
I do not think I touched anything that would not already be failing a unitest, but if I am not mistaken contributors are expected to run them before their pull requests. So I might need some help with that.

Also there are a lot of unrelated changes in this pull request from code formatting. They come from the usual save action to format the full file and since I imported the formatter settings I do not think the formatting is wrong. I can go through and revert them if that is a problem.
Some other unrelated changes are from eclipse being eclipse I assume. At least I did not intentionally touch any .project or .classpath I can revert those to if that is just something my eclipse did for my environment.

-------------

Commit messages:
 - Merge remote-tracking branch 'origin/master' into flamegraph_performance
 - Merge branch 'openjdk:master' into flamegraph_performance
 - Merge remote-tracking branch 'origin/master' into flamegraph_performance
 - Merge remote-tracking branch 'origin/master' into flamegraph_performance
 - fix automatic missmerge
 - remove node id because it was unused
 - add a test for stop flag
 - document the stop flag and also check it in the inner loop.
 - see what happens when delegating directly to the enums
 - avoiding one more stream in strack trace tree model and making node id
 - ... and 2 more: https://git.openjdk.org/jmc/compare/688729fe...d5acdfcc

Changes: https://git.openjdk.org/jmc/pull/502/files
 Webrev: https://webrevs.openjdk.org/?repo=jmc&pr=502&range=00
  Issue: https://bugs.openjdk.org/browse/JMC-8112
  Stats: 257 lines in 10 files changed: 102 ins; 50 del; 105 mod
  Patch: https://git.openjdk.org/jmc/pull/502.diff
  Fetch: git fetch https://git.openjdk.org/jmc.git pull/502/head:pull/502

PR: https://git.openjdk.org/jmc/pull/502


More information about the jmc-dev mailing list