RFR: 8112: Flamegraph model creation performance improvements

Thu Jul 27 15:40:10 UTC 2023

On Tue, 27 Jun 2023 16:06:45 GMT, Vincent Alexander Beelte <duke at openjdk.org> wrote:

> This pull request improves the performance of creating the model that the flame graph visualization is drawn from.
> 
> The first issue this fixes is not really a "performance" problem but rather a case where the thread pool used to create the models can be fully saturated with tasks that are already invalidated, blocking the newest useful task.
> The use case where that tends to happen to me goes as follows:
> My jfr file contains about 1.8 million events (1.5 mio "Allocation in new TLAB", 200k "Allocation outside TLAB", 70k "Method Profiling Sample"). It was created by async-profiler during a 5 minute load test.
> When loading this file and switching to the Java Application view (with the Flame Graph visualization already in focus), multiple tasks are generated to create a flamegraph for all event types and all threads. I did not investigate why there are multiple tasks; "multiple" here means "at least two".
> Then, without waiting for the graph to load, I go on to filter to the thread that I am interested in, which generates multiple tasks to create a flamegraph for all event types in that thread.
> Lastly I filter to the Method Profiling Samples, which creates the last task. That task then has to wait until enough of the other tasks have finished that one of the 3 threads in the pool can run it.
> All in all with this specific jfr file and the current master (everything including commit 5ace151) in this scenario I am waiting about 55 seconds until I see the graph. (just measured with a stopwatch app)
> 
> My solution to this was to give the StacktraceTreeModel constructor a stop flag that it checks at the start of both the inner and the outer loop, so that it can return early. The flag is backed by the FlamegraphSwingView.ModelRebuildRunnable.isInvalid field, which is already checked in some places to see if the current task is still needed. It does feel strange to do this in a constructor, but it was the easiest way to interrupt the most expensive part of creating the StacktraceTreeModel.
> 
> With this flag alone the use case above goes from 55 seconds to about 6 seconds, and at that point I am already the limiting factor in stopping the clock in time.
> 
> When looking at the flamegraph of a flight recording of jmc, where I had it draw a bunch of different flamegraphs with different filters on my 1.8 mio event file, it did look like there was some additional low-hanging fruit to pick:
> ![streams](https://github.com/openjdk/jmc/assets/917408/892a5851-ed3c-4337-8646-07ee802d3e63)
> This is filtered to on...

Just sneaking in a couple of points while this is still fresh; I'll try to take an actual look at this some time soon.

> When loading this file and switching to the Java Application view (with the Flame Graph visualization already in focus), multiple tasks are generated to create a flamegraph for all event types and all threads. I did not investigate why there are multiple tasks; "multiple" here means "at least two". Then, without waiting for the graph to load, I go on to filter to the thread that I am interested in, which generates multiple tasks to create a flamegraph for all event types in that thread. Lastly I filter to the Method Profiling Samples, which creates the last task. That task then has to wait until enough of the other tasks have finished that one of the 3 threads in the pool can run it. All in all with this specific jfr file and the current master (everything including commit [5ace151](https://github.com/openjdk/jmc/commit/5ace151b6dc00096b5b3212edfad40e86f8bcf8d)) in this scenario I am waiting about 55 seconds until I see the graph. (just measured with a stopwatch app)

This would be a great fix, and there's already an outstanding JIRA ticket for something like this at: https://bugs.openjdk.org/browse/JMC-7080
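
For anyone skimming the thread, the stop-flag idea described in the PR text can be sketched roughly as below. This is only an illustration under assumptions, not the code in the PR: `StopFlagSketch`, `TreeModelSketch`, and the `BooleanSupplier` parameter are hypothetical names, and the "model" here just counts frames instead of building a real StacktraceTreeModel.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

class StopFlagSketch {

    /** Hypothetical stand-in for an expensive tree model; it only counts frames. */
    static final class TreeModelSketch {
        final int framesProcessed;
        final boolean completed;

        TreeModelSketch(List<List<String>> stacks, BooleanSupplier stop) {
            int count = 0;
            boolean finished = true;
            outer:
            for (List<String> stack : stacks) {
                // Outer-loop check: skip the remaining stack traces once the task is stale.
                if (stop.getAsBoolean()) {
                    finished = false;
                    break;
                }
                for (String frame : stack) {
                    // Inner-loop check: bail out even in the middle of a deep stack.
                    if (stop.getAsBoolean()) {
                        finished = false;
                        break outer;
                    }
                    count++; // placeholder for the real per-frame work
                }
            }
            this.framesProcessed = count;
            this.completed = finished;
        }
    }

    public static void main(String[] args) {
        List<List<String>> stacks =
                List.of(List.of("main", "run", "work"), List.of("main", "idle"));
        // Assumed analogue of an "is this task still needed" field on the runnable.
        AtomicBoolean invalid = new AtomicBoolean(false);

        TreeModelSketch full = new TreeModelSketch(stacks, invalid::get);
        System.out.println(full.completed + " " + full.framesProcessed);   // true 5

        invalid.set(true); // a newer filter selection makes the task obsolete
        TreeModelSketch stale = new TreeModelSketch(stacks, invalid::get);
        System.out.println(stale.completed + " " + stale.framesProcessed); // false 0
    }
}
```

The point of passing a supplier rather than a plain boolean is that the flag can flip while the loops are running, so an invalidated task frees its pool thread quickly instead of finishing work nobody will look at.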

> I was not able to run the ui tests with "mvn verify -P uitests -Dspotbugs.skip=true". Maven fails for the module "org.openjdk.jmc.test.jemmy" with the message: "[ERROR] Failed to execute goal org.eclipse.tycho:tycho-surefire-plugin:3.0.4:integration-test (default-integration-test) on project org.openjdk.jmc.test.jemmy: Could not find application "org.openjdk.jmc.rcp.application.app" in the test runtime. Make sure that the test runtime includes the bundle which defines this application." This also happens on the master branch. I do not think I touched anything that would not already be failing a unit test, but if I am not mistaken contributors are expected to run the ui tests before submitting their pull requests. So I might need some help with that.

What operating system are you running on?

I had encountered this last week (https://github.com/openjdk/jmc/pull/495#issuecomment-1597556556) when looking at the jmc/core refactoring PR, and from what I can tell it relates back to the commit (https://github.com/openjdk/jmc/commit/ae2fbf359aa8f7612a0d3e6f18857e08cfdfc309) to use JDK 17. I was able to reproduce this consistently on Linux, and @RealCLanger encountered this as well on Mac. I was able to run the uitests without issue on my Windows machine though, unless I had been mistakenly using an older commit.

The good (?) news is that there are no flame view-specific uitests to worry about verifying here, so running the uitests would mainly be to verify that these changes don't affect any of the other UI pages.

Here's a JIRA issue that you can reference for this PR: https://bugs.openjdk.org/browse/JMC-8112

-------------

PR Comment: https://git.openjdk.org/jmc/pull/502#issuecomment-1609940693
PR Comment: https://git.openjdk.org/jmc/pull/502#issuecomment-1652238680