RFR: 8344009: Improve compiler memory statistics
Thomas Stuefe
stuefe at openjdk.org
Fri Feb 14 06:42:09 UTC 2025
On Sat, 8 Feb 2025 06:56:40 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
> Greetings,
>
> This is a rewrite of the Compiler Memory Statistic. The primary new feature is the capability to track allocations by C2 phases. This will allow for a much faster, more thorough analysis of footprint issues.
>
> Tracking Arena memory movement is not trivial since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than allocated when Arenas carried over from other phases go out of scope in this phase. Finally, it can have high temporary peaks that vanish before the phase ends.
>
> I wanted to track that information correctly and display it clearly in a way that is easy to understand.
>
> The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea).
>
> The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (e.g., the inside of a loop): drop a TracePhase in there with a descriptive name, and you can see the allocations inside that phase.
>
> The statistic gives us two new forms of output:
>
> 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase:
>
>
> Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816:
> Phase                 Total      ra     node     comp  type  index  reglive  regsplit    cienv   other
> none                1205512  155104   982984    33712     0      0        0         0        0   33712
> parse              11685376  720016  6578728  1899064     0      0        0         0  1832888  654680
> optimizer            916584       0   556416        0     0      0        0         0        0  360168
> escapeAnalysis      1983400       0  1276392   707008     0      0        0         0        0       0
> connectionGraph      720016       0        0   621832     0      0        0         0    98184       0
> macroEliminate       196448       0   196448        0     0      0        0         0        0       0
> iterGVN              327440       0   196368   131072     0      0        0         0        0       0
> incrementalInline   3992816       0  3043704   621832     0      0        0         0   261824...
Some additional technical information about how this statistic works:
The JVM informs the statistics about the following events:
A) When a compilation starts
B) When a compilation ends.
C) When a new compilation phase starts. Phases can be nested.
D) When a compilation phase ends.
E) Whenever an arena grows a new chunk (regardless of whether this was a cached chunk from the chunk pool or a newly allocated chunk).
F) When an arena sheds chunks - either by rolling back to a previous ResourceMark or because the arena itself gets deleted.
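To make the ordering rules of (A) to (F) concrete, here is a minimal C++ sketch that models the notifications as an event stream and checks whether a sequence is well-formed. `MemEvent` and `well_formed` are hypothetical names for illustration, not HotSpot code. The sketch also assumes chunk events are only reported for the compiler thread while a compilation is active.

```cpp
#include <cassert>
#include <vector>

// Hypothetical event kinds mirroring notifications (A) to (F);
// not HotSpot's actual API.
enum class MemEvent {
  CompilationStart,  // (A)
  CompilationEnd,    // (B)
  PhaseStart,        // (C), may nest
  PhaseEnd,          // (D)
  ChunkAdded,        // (E)
  ChunksShed         // (F)
};

// Returns true if the sequence is well-formed: one compilation at a time,
// phases strictly nested inside it, and (by assumption) chunk events only
// reported while a compilation is active.
bool well_formed(const std::vector<MemEvent>& events) {
  bool in_compilation = false;
  int phase_depth = 0;
  for (MemEvent e : events) {
    switch (e) {
      case MemEvent::CompilationStart:
        if (in_compilation) return false;
        in_compilation = true;
        break;
      case MemEvent::CompilationEnd:
        // All phases must have ended before the compilation ends.
        if (!in_compilation || phase_depth != 0) return false;
        in_compilation = false;
        break;
      case MemEvent::PhaseStart:
        if (!in_compilation) return false;
        ++phase_depth;
        break;
      case MemEvent::PhaseEnd:
        if (!in_compilation || phase_depth == 0) return false;
        --phase_depth;
        break;
      case MemEvent::ChunkAdded:
      case MemEvent::ChunksShed:
        if (!in_compilation) return false;
        break;
    }
  }
  // A trailing, unfinished compilation is not well-formed.
  return !in_compilation;
}
```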
During compilation (between (A) and (B)), we keep the statistic state for this compilation in an `ArenaStatCounter` object that is attached to the current compiler thread.
When a new compilation phase starts (C), we push the phase info onto a `PhaseInfoStack`. When a phase ends, we pop that information.
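A minimal sketch of the push/pop discipline, assuming a TracePhase-like RAII object drives it: the constructor corresponds to (C) and the destructor to (D), so nesting follows directly from C++ scoping. `PhaseInfo`, `PhaseStack` and `PhaseScope` are hypothetical stand-ins, not the actual `PhaseInfoStack` or `TracePhase` classes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical phase record; field names are assumptions.
struct PhaseInfo {
  int id;
  std::string name;
};

// Hypothetical stand-in for a phase info stack.
class PhaseStack {
  std::vector<PhaseInfo> _stack;
 public:
  void push(const PhaseInfo& info) { _stack.push_back(info); }
  void pop() { assert(!_stack.empty()); _stack.pop_back(); }
  const PhaseInfo& top() const { assert(!_stack.empty()); return _stack.back(); }
  std::size_t depth() const { return _stack.size(); }
};

// RAII guard in the spirit of TracePhase: the constructor reports
// "phase starts" (C), the destructor reports "phase ends" (D).
class PhaseScope {
  PhaseStack& _stack;
 public:
  PhaseScope(PhaseStack& stack, int id, const std::string& name) : _stack(stack) {
    _stack.push({id, name});
  }
  ~PhaseScope() { _stack.pop(); }
  PhaseScope(const PhaseScope&) = delete;
  PhaseScope& operator=(const PhaseScope&) = delete;
};
```

Because the pop happens in a destructor, leaving a nested phase early (return, bailout) still keeps the stack balanced.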
When we are informed of a new chunk allocation (E), we:
- Set a stamp in the chunk header to mark it as being owned by this phase and this arena type.
- In the `ArenaStatCounter` object, we adjust global counters and counters in a two-dimensional table (`ArenaCounterTable`) that keeps counters per arena tag and compilation phase.
- If total memory consumption for this compilation reaches a new peak, we take a snapshot of all counters as peak state.
- We also handle `MemLimit` violations here: if `-XX:CompileCommand=memlimit...` was enabled and the total footprint of the compilation surpasses that limit, we either end the JVM with a fatal error or bail out of the compilation, depending on the sub-option given to the command.
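The allocation path can be sketched as follows. This is a simplified model, not the actual `ArenaStatCounter`: the table dimensions, the `ChunkStamp` layout, `ArenaStatSketch` and the `LimitAction` result are hypothetical, and the caller is left to act on a limit violation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical dimensions; the real statistic defines its own arena tags
// and phase IDs.
constexpr int kNumPhases = 4;
constexpr int kNumTags = 3;

// Assumed layout of the stamp written into each chunk header.
struct ChunkStamp {
  int phase;
  int tag;
};

struct Chunk {
  std::size_t size;
  ChunkStamp stamp;
};

using CounterTable = std::array<std::array<std::size_t, kNumTags>, kNumPhases>;

// Hypothetical outcome of the memlimit check.
enum class LimitAction { None, Crash, Bailout };

struct ArenaStatSketch {
  CounterTable counters{};       // live bytes per (phase, tag)
  CounterTable peak_counters{};  // copy of `counters` at the highest total
  std::size_t total = 0;
  std::size_t peak = 0;
  std::size_t limit = 0;         // 0 means no memlimit
  bool crash_on_limit = false;

  // Event (E): an arena of type `tag` got a new chunk while `phase` runs.
  LimitAction on_chunk_added(Chunk& chunk, int phase, int tag) {
    assert(phase >= 0 && phase < kNumPhases && tag >= 0 && tag < kNumTags);
    chunk.stamp = {phase, tag};  // remember who owns this chunk
    counters[phase][tag] += chunk.size;
    total += chunk.size;
    if (total > peak) {          // new peak: snapshot the whole table
      peak = total;
      peak_counters = counters;
    }
    if (limit != 0 && total > limit) {
      return crash_on_limit ? LimitAction::Crash : LimitAction::Bailout;
    }
    return LimitAction::None;
  }
};
```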
When informed of a chunk deletion (F), we:
- Extract the stamp from the chunk header to know which phase and arena type this deallocation accounts to.
- Adjust the counters for that phase/arena type in the `ArenaCounterTable`.
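The deallocation side, sketched under simplifying assumptions (hypothetical `CounterTable`, `Chunk` and `ChunkStamp` types, fixed table dimensions): the freed bytes are charged back to the (phase, tag) cell named in the chunk's stamp, so a phase that frees an arena inherited from an earlier phase lowers that earlier phase's counter.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical dimensions and stamp layout for a simplified model.
constexpr int kNumPhases = 4;
constexpr int kNumTags = 3;

struct ChunkStamp {
  int phase;
  int tag;
};

struct Chunk {
  std::size_t size;
  ChunkStamp stamp;
};

struct CounterTable {
  std::array<std::array<std::size_t, kNumTags>, kNumPhases> bytes{};
  std::size_t total = 0;

  // Event (E), reduced to what is needed here: stamp and count the chunk.
  void add(Chunk& c, int phase, int tag) {
    c.stamp = {phase, tag};
    bytes[phase][tag] += c.size;
    total += c.size;
  }

  // Event (F): charge the freed bytes back to the (phase, tag) cell recorded
  // in the stamp, not to the phase that is running when the chunk is freed.
  void remove(const Chunk& c) {
    std::size_t& cell = bytes[c.stamp.phase][c.stamp.tag];
    assert(cell >= c.size && total >= c.size);
    cell -= c.size;
    total -= c.size;
  }
};
```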
When a compilation phase ends (D), we adjust the "footprint timeline". The footprint timeline (`FootprintTimeline`) is a one-dimensional buffer of (phase info, counter) tuples. It represents the "flattened out" form of the phase invocation tree: an invocation of a child phase nested in a parent phase "interrupts" the parent phase, and when the child phase ends, the parent phase is "restarted" as a new entry in the timeline.
For example, let's say we execute phase "optimizer", and inside that, call the phases "iterGVN" and then "incrementalInline". Between these two child phases, we allocate 1MB from the resource area. The invocation tree looks like this:
> optimizer                1024 KB
    > iterGVN              1032 KB
< optimizer (cont.)        1032 KB + 1MB resource arena
    > incrementalInline    1032 KB + 1MB resource arena
< optimizer (cont.)        1032 KB + 1MB resource arena
The flattened-out footprint timeline will look somewhat like this:
Phase Sequence Number | Phase Name        | Footprint
5                     | optimizer         | 1024 KB
6                     | iterGVN           | 1032 KB
5                     | optimizer         | 1032 KB + 1MB
7                     | incrementalInline | 1032 KB + 1MB
5                     | optimizer         | 1032 KB + 1MB
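A minimal sketch of how such a flattened timeline can be built from phase start/end events, assuming each entry records the footprint at the moment its segment is closed (either because a nested phase starts or because the phase itself ends). `Timeline` and `TimelineEntry` are hypothetical names; the real `FootprintTimeline` may store more per entry.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One entry of the flattened timeline (illustrative layout).
struct TimelineEntry {
  int phase_id;
  std::string name;
  std::size_t footprint_at_end;  // footprint when this segment was closed
};

class Timeline {
  struct Open { int id; std::string name; };
  std::vector<Open> _open;  // currently active (nested) phases
 public:
  std::vector<TimelineEntry> entries;

  // (C): the running parent segment is "interrupted" and closed.
  void phase_start(int id, const std::string& name, std::size_t footprint) {
    if (!_open.empty()) {
      entries.push_back({_open.back().id, _open.back().name, footprint});
    }
    _open.push_back({id, name});
  }

  // (D): the child's segment is closed; the parent implicitly "restarts"
  // and its new segment is closed by the next start or end event.
  void phase_end(std::size_t footprint) {
    assert(!_open.empty());
    entries.push_back({_open.back().id, _open.back().name, footprint});
    _open.pop_back();
  }
};
```

Feeding it the optimizer example (footprints in KB, with 1MB = 1024 KB) yields five entries with phase ids 5, 6, 5, 7, 5 and footprints 1024, 1032, 2056, 2056, 2056, matching the table.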
Finally, when the compilation ends, we print the statistic for it (if the sub-option `print` was given with `-XX:CompileCommand=memstat`). We also save a copy of the counters to a global table that contains the N most expensive compilations. That table is printed by `jcmd <pid> Compiler.memory`, and it is also printed into the hs-err file.
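Keeping only the N most expensive compilations is a bounded top-N insert. A minimal sketch, assuming compilations are ranked by peak footprint; `CompilationRecord` and `TopCompilations` are hypothetical names, and the ranking key and size of the real global table are not specified here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record of one finished compilation.
struct CompilationRecord {
  std::string method;
  std::size_t peak_bytes;
};

// Keeps the N records with the highest peak footprint, sorted descending.
// A linear insert into a small sorted vector is enough for small N.
class TopCompilations {
  std::size_t _capacity;
  std::vector<CompilationRecord> _records;
 public:
  explicit TopCompilations(std::size_t capacity) : _capacity(capacity) {}

  void add(const CompilationRecord& r) {
    if (_capacity == 0) return;
    if (_records.size() == _capacity && r.peak_bytes <= _records.back().peak_bytes) {
      return;  // not more expensive than anything we keep
    }
    // Insert before the first record with a smaller peak (ties keep older first).
    auto pos = std::upper_bound(_records.begin(), _records.end(), r,
        [](const CompilationRecord& a, const CompilationRecord& b) {
          return a.peak_bytes > b.peak_bytes;
        });
    _records.insert(pos, r);
    if (_records.size() > _capacity) _records.pop_back();  // evict the cheapest
  }

  const std::vector<CompilationRecord>& records() const { return _records; }
};
```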
-------------
PR Comment: https://git.openjdk.org/jdk/pull/23530#issuecomment-2658400920
More information about the serviceability-dev
mailing list