RFR: 8344009: Improve compiler memory statistics [v4]
Ashutosh Mehra
asmehra at openjdk.org
Tue Feb 25 06:37:57 UTC 2025
On Thu, 20 Feb 2025 13:14:34 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
>> Greetings,
>>
>> This is a rewrite of the Compiler Memory Statistic. The primary new feature is the capability to track allocations by C2 phases. This will allow for a much faster, more thorough analysis of footprint issues.
>>
>> Tracking Arena memory movement is not trivial since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than allocated when Arenas carried over from other phases go out of scope in this phase. Finally, it can have high temporary peaks that vanish before the phase ends.
>>
>> I wanted to track that information correctly and display it clearly in a way that is easy to understand.
>>
>> The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea).
>>
>> The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (eg, the inside of a loop): drop a TracePhase in there with a speaking name, and you can see the allocations inside that phase.
>>
>> The statistic gives us two new forms of output:
>>
>> 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase:
>>
>>
>> Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816:
>> Phase Total ra node comp type index reglive regsplit cienv other
>> none 1205512 155104 982984 33712 0 0 0 0 0 33712
>> parse 11685376 720016 6578728 1899064 0 0 0 0 1832888 654680
>> optimizer 916584 0 556416 0 0 0 0 0 0 360168
>> escapeAnalysis 1983400 0 1276392 707008 0 0 0 0 0 0
>> connectionGraph 720016 0 0 621832 0 0 0 0 98184 0
>> macroEliminate 196448 0 196448 0 0 0 0 0 0 0
>> iterGVN 327440 0 196368 131072 0 0 0 0 0 0
>> incrementalInline 3992816 0 3043704 62...
>
> Thomas Stuefe has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>
> avoid Thread::current in high traffic chunk alloc path
src/hotspot/share/compiler/compilationMemStatInternals.hpp line 92:
> 90:
> 91: // A very simple fixed-width FIFO buffer, used for the phase timeline
> 92: template <typename T, int max>
Would `size` be a better name than `max`?
src/hotspot/share/compiler/compilationMemStatInternals.hpp line 160:
> 158: void init(T v) { start = cur = peak = v; }
> 159: void update(T v) { cur = v; if (v > peak) peak = v; }
> 160: dT end_delta() const { return (dT)cur - (dT)start; }
Should it be `return (dT)(cur - start); }`
src/hotspot/share/compiler/compilationMemStatInternals.hpp line 161:
> 159: void update(T v) { cur = v; if (v > peak) peak = v; }
> 160: dT end_delta() const { return (dT)cur - (dT)start; }
> 161: size_t temporary_peak_size() const { return MIN2(peak - cur, peak - start); }
shouldn't it be `MAX2(peak - cur, peak - start)`? Why not just `peak - start`?
src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 149:
> 147: int col = start_indent;
> 148: check_phase_trace_id(e.info.id);
> 149: if (omit_empty_phases && e._bytes.end_delta() == 0 && e._bytes.temporary_peak_size() == 0) {
`omit_empty_phases` is always false. Can it be just removed?
src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 205:
> 203: // seed current entry
> 204: Entry& e = _fifo.current();
> 205: e._bytes.start = e._bytes.cur = e._bytes.peak = cur_abs;
This can be replaced by `e._bytes.init(cur_abs)`.
Same for the next statement.
On same lines I would suggest to add `Entry::init()` and call it here.
src/hotspot/share/memory/arena.hpp line 48:
> 46: const size_t _len; // Size of this Chunk
> 47: // Used for Compilation Memory Statistic
> 48: uint64_t _stamp;
This is wasted space if compilation memory stats is not enabled. One way to avoid this is to subclass `Chunk` as a `StampedChunk` and use that if compilation memory stats is enabled. Is this complexity worth the space saving?
src/hotspot/share/opto/phase.hpp line 125:
> 123: f( _t_temporaryTimer1, "tempTimer1") \
> 124: f( _t_temporaryTimer2, "tempTimer2") \
> 125: f( _t_testTimer1, "testTimer1") \
Would `_t_testPhase1` and `_t_testPhase2` be a better name?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968993551
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968993720
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968993803
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968994072
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968994121
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968999519
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1969041777
More information about the serviceability-dev
mailing list