RFR: 8344009: Improve compiler memory statistics
Thomas Stuefe
stuefe at openjdk.org
Thu Feb 13 16:14:02 UTC 2025
Greetings,
This is a rewrite of the compiler memory statistics. The primary new feature is the ability to track allocations by C2 phase, which allows for a much faster and more thorough analysis of footprint issues.
Tracking arena memory movement is not trivial, since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource-area data. A phase can also release more than it allocates, when arenas carried over from other phases go out of scope within it. Finally, a phase can have high temporary peaks that vanish before it ends.
I wanted to track that information correctly and display it clearly in a way that is easy to understand.
The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea).
The nice thing about this technique is that it also allows for quick analysis of a suspected hot spot (e.g., the inside of a loop): drop a TracePhase with a descriptive name in there, and you can see the allocations made inside that scope.
The statistic gives us two new forms of output:
1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase:
Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816:
Phase                         Total       ra     node     comp  type index reglive regsplit    cienv   other
none                        1205512   155104   982984    33712     0     0       0        0        0   33712
parse                      11685376   720016  6578728  1899064     0     0       0        0  1832888  654680
optimizer                    916584        0   556416        0     0     0       0        0        0  360168
escapeAnalysis              1983400        0  1276392   707008     0     0       0        0        0       0
connectionGraph              720016        0        0   621832     0     0       0        0    98184       0
macroEliminate               196448        0   196448        0     0     0       0        0        0       0
iterGVN                      327440        0   196368   131072     0     0       0        0        0       0
incrementalInline           3992816        0  3043704   621832     0     0       0        0   261824   65456
incrementalInline_igvn       458512        0   163640   294872     0     0       0        0        0       0
incrementalInline_inline      32728        0        0    32728     0     0       0        0        0       0
                             337880        0   108504   229376     0     0       0        0        0       0
idealLoop                   2499568        0   566696  1932872     0     0       0        0        0       0
idealLoopVerify              327600        0        0   327600     0     0       0        0        0       0
ccp                           65456        0    32728        0     0     0       0        0        0   32728
macroExpand                 1898544        0  1570944   327600     0     0       0        0        0       0
graphReshape                 347920        0   347920        0     0     0       0        0        0       0
matcher                     9480400  1817928  6417448  1245024     0     0       0        0        0       0
postselect_cleanup           163800   163800        0        0     0     0       0        0        0       0
scheduler                    458192    32728   425464        0     0     0       0        0        0       0
regalloc                     178072   178072        0        0     0     0       0        0        0       0
ctorChaitin                   39432    39432        0        0     0     0       0        0        0       0
regAllocSplit               1865496    32728  1832768        0     0     0       0        0        0       0
chaitinCoalesce1            1277112   196608  1080504        0     0     0       0        0        0       0
peephole                      32728        0    32728        0     0     0       0        0        0       0
output                     17868312 17868312        0        0     0     0       0        0        0       0
shorten branches             458472    65456    32728   360288     0     0       0        0        0       0
This is mostly self-explanatory. In this example, when the compilation hit its peak of 58 MB, the table shows how much memory (column `Total`) had been allocated on behalf of each individual C2 phase. Note that "none" means allocations that happened outside of any TracePhase scope. The columns following `Total` break the per-phase total down into the individual arena types (resource area, node arena, etc.).
2) We also get a detailed timeline of phase execution and the gradual memory buildup. This also shows large memory spikes that were confined to a single phase and were therefore invisible before:
Allocation timeline by phase:
Phase seq. number Bytes Nodes
>0 (outside) 102120 (+102120) 3 (+3)
>1 parse 11787496 (+11685376) 7151 (+7148)
<0 (cont.) (outside) 11787496 (+0) 7151 (+0)
>2 optimizer 11787496 (+0) 7151 (+0)
>3 iterGVN 12180392 (+392896) 6313 (-838)
<2 (cont.) optimizer 12180392 (+0) 6313 (+0)
>4 incrementalInline 12180392 (+0) 6313 (+0)
>5 incrementalInline_inline 12213120 (+32728) 6330 (+17)
<4 (cont.) incrementalInline 12213120 (+0) 6330 (+0)
>6 incrementalInline_pru 12213120 (+0) 6287 (-43)
<4 (cont.) incrementalInline 12213120 (+0) 6287 (+0)
>7 incrementalInline_igvn 12213120 (+0) 6286 (-1)
<4 (cont.) incrementalInline 16631400 (+4418280) 17200 (+10914)
>8 incrementalInline_pru 16631400 (+0) 10185 (-7015)
<4 (cont.) incrementalInline 16631400 (+0) 10185 (+0)
>9 incrementalInline_igvn 17122640 (+491240) 9374 (-811)
<4 (cont.) incrementalInline 17122640 (+0) 9374 (+0)
<2 (cont.) optimizer 17122640 (+0) 9396 (+22)
>10 incrementalInline_pru 17122640 (+0) 9360 (-36)
<2 (cont.) optimizer 17122640 (+0) 9360 (+0)
>11 incrementalInline_igvn 17122640 (+0) 9353 (-7)
<2 (cont.) optimizer 17745072 (+622432) 9318 (-35)
>12 18082952 (+337880) 9317 (-1)
<2 (cont.) optimizer 18082952 (+0) 9317 (+0)
>13 idealLoop 18420712 (+337760) 9247 (-70) significant temporary peak: 19762880 (+1679928)
< SNIP SNIP >
<41 (cont.) regalloc 52599072 (+0) 23604 (+0)
>71 chaitinSelect 52599072 (+0) 23604 (+0)
<41 (cont.) regalloc 52599072 (+0) 23604 (+0)
>72 postAllocCopyRemoval 52599072 (+0) 19285 (-4319) significant temporary peak: 58359200 (+5760128)
<41 (cont.) regalloc 52599072 (+0) 19285 (+0)
>73 mergeMultidefs 52599072 (+0) 19285 (+0)
<41 (cont.) regalloc 52599072 (+0) 19285 (+0)
>74 fixupSpills 52599072 (+0) 19256 (-29)
<41 (cont.) regalloc 40458304 (-12140768) 19256 (+0)
<0 (cont.) (outside) 40458304 (+0) 19256 (+0)
>75 blockOrdering 40458304 (+0) 19268 (+12)
<0 (cont.) (outside) 40458304 (+0) 19268 (+0)
>76 peephole 40491032 (+32728) 19332 (+64)
<0 (cont.) (outside) 40491032 (+0) 19332 (+0)
>77 output 40491032 (+0) 19336 (+4)
>78 shorten branches 40949504 (+458472) 19239 (-97)
<77 (cont.) output 40949504 (+0) 19240 (+1) significant temporary peak: 58817816 (+17868312)
>79 bldOopMaps 48294864 (+7345360) 19240 (+0)
<77 (cont.) output 48294864 (+0) 19240 (+0)
>80 fill buffer 50635256 (+2340392) 19574 (+334)
<77 (cont.) output 50635256 (+0) 19574 (+0)
>81 install_code 50635256 (+0) 19574 (+0)
<77 (cont.) output 50634272 (-984) 19574 (+0)
<0 (cont.) (outside) 2225624 (-48408648) 0 (-19574)
The timeline shows the individual phases in the order in which they were executed. The left side is the tree of possibly nested TracePhase scopes. Every phase execution (a phase can run multiple times) gets a unique phase sequence number. The tree shows those sequence numbers, and when a child phase ends, you can see that we are back in the enclosing phase ("cont."). After that follow the phase name, the arena footprint at that point (Bytes, with the delta caused by this phase in parentheses), and the node count (Nodes, likewise with its delta). If the phase caused a large temporary spike, it shows up as a "significant temporary peak".
Here you can find a full printout of this information on a run of springboot petclinic with `-XX:CompileCommand=memstat,*.*` via `jcmd spring Compiler.memory verbose`: https://gist.github.com/tstuefe/9d00e7129d4cdf5d1dbc294a80cbe4ed
----
We also now print more detailed information if the JVM hits a MemLimit and crashes (option `-XX:CompileCommand=memlimit`). Note that debug JVMs run with a memlimit of 1 GB by default, and we have already found a couple of footprint issues with this setting. See an example of a generated hs-err file here: https://gist.github.com/tstuefe/777ecd68b313097a8e9020ac9fea239a
----
Performance costs: not much to tell here; we only do a tiny bit more work per arena chunk allocation and after each compilation ends, nothing crazy. Nevertheless, I tested with SPECjbb before and after this patch; nothing rose above background noise.
Footprint costs: the new information makes the per-compilation data considerably bulkier. That is why I reduced the amount of information stored long-term: we now only keep the 32 most expensive compilations. To see the cost of every compilation in its gory details, use the `-XX:CompileCommand=memstat,*.*,print` option. Of course, you can always reduce the scope of the tracking, e.g. limit it to a certain class only.
---
Tests:
- I tested manually on x64 Linux. GHAs ran (including the new tests).
- Tests at SAP are green
-------------
Commit messages:
- more code grooming and comments
- grooming
- wip
- Improved compiler memory statistics
Changes: https://git.openjdk.org/jdk/pull/23530/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23530&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8344009
Stats: 1753 lines in 28 files changed: 1187 ins; 237 del; 329 mod
Patch: https://git.openjdk.org/jdk/pull/23530.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23530/head:pull/23530
PR: https://git.openjdk.org/jdk/pull/23530