RFR: JDK-8297958: NMT: Display peak values [v4]
Stefan Johansson
sjohanss at openjdk.org
Tue Dec 6 13:12:10 UTC 2022
On Tue, 6 Dec 2022 12:46:09 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
>> Historical peak values can be very useful in analyzing memory footprint.
>>
>> For example, want to know how much memory the compiler needs during warmup? You have to get an NMT report at the exact time, with compile arenas at their largest combined extensions. But if we had peak values, you'd see how much the compiler needed by just looking at its arena peak size.
>>
>> We already collect peak values in debug builds, but never actually display them. Since we already pay for them, we might just as well print them.
>>
>> There is also a small issue that peak size and count are updated separately. It does not make much sense to treat these as independent values. Therefore this patch changes the meaning and implementation of peak count from today's "highest count" to "count at the point peak size was reached".
>>
>> What this looks like:
>>
>>
>> - GC (reserved=425748KB, committed=94868KB)
>> (malloc=38552KB #1086) (peak=38622KB #1409) <<<<
>> (mmap: reserved=387196KB, committed=56316KB)
>>
>> - GCCardSet (reserved=29KB, committed=29KB)
>> (malloc=29KB #387)
>>
>> - Compiler (reserved=200KB, committed=200KB)
>> (malloc=36KB #59) (peak=37KB #74) <<<<
>> (arena=165KB #5) (peak=6192KB #18) <<<<
>>
>>
>>
>> [0x00007f7e6b0866b0] Arena::grow(unsigned long, AllocFailStrategy::AllocFailEnum)+0x40
>> [0x00007f7e6c117a29] OopMap::OopMap(int, int)+0x69
>> [0x00007f7e6b2857a5] LinearScan::compute_oop_map(IntervalWalker*, LIR_Op*, CodeEmitInfo*, bool)+0x85
>> [0x00007f7e6b285f2a] LinearScan::compute_oop_map(IntervalWalker*, LIR_OpVisitState const&, LIR_Op*)+0x5a
>> (malloc=32KB type=Arena Chunk #1) (peak=288KB #9) <<<<
>>
>>
>> Notes:
>>
>> - This RFE just adds peak to the *debug* VM. In debug, we already have all values, it's just a matter of printing them. I would love to see peak values in release builds too, but collecting them does cost one or more CAS per malloc and therefore we must analyze performance before enabling. Having them in release would also remove the remaining #ifdef ASSERT.
>>
>> - I omit printing peak values when we are at peak. So if "peak" is missing, current peak is implied.
>>
>> - I only print peak values in summary and detail mode, not in any of the diff modes, to keep code complexity low and because diff modes are more about baseline compare.
>>
>> - In detail mode, there is a small display issue: call sites that have no current allocations are omitted. A hypothetical call site that allocated a zillion bytes, then freed them all, will not be shown even though its peak value would be interesting. That is an issue for another RFE.
>
> Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision:
>
> - feedback johan
> - at peak
A very useful change, Thomas, and it looks good in general. Just two small things to clean up.
Regarding having it on in release builds, I think that would make sense as long as the perf impact isn't too big. I would not expect it to be so bad, but it would certainly be good to measure it closely before doing it.
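
To make the cost under discussion concrete, here is a minimal sketch of a counter that tracks a peak size together with the "count at the point peak size was reached". It is an illustration only: `PeakCounter` and its members are hypothetical names built on `std::atomic`, not the HotSpot `MemoryCounter`. The CAS loop in `update_peak` is the extra per-allocation work that would need measuring.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an NMT-style counter; not the HotSpot class.
class PeakCounter {
  std::atomic<size_t> _count{0};
  std::atomic<size_t> _size{0};
  std::atomic<size_t> _peak_size{0};
  std::atomic<size_t> _peak_count{0};

  // Raise the peak if 'size' exceeds it. This loop is the "one or more CAS
  // per malloc" cost. Note that peak size and peak count are stored in two
  // steps, so a concurrent reader may briefly see a mismatched pair.
  void update_peak(size_t size, size_t count) {
    size_t peak = _peak_size.load(std::memory_order_relaxed);
    while (size > peak) {
      // On failure, 'peak' is refreshed with the current value and we retry.
      if (_peak_size.compare_exchange_weak(peak, size,
                                           std::memory_order_relaxed)) {
        _peak_count.store(count, std::memory_order_relaxed);
        break;
      }
    }
  }

 public:
  void allocate(size_t sz) {
    size_t cnt = _count.fetch_add(1, std::memory_order_relaxed) + 1;
    size_t sum = _size.fetch_add(sz, std::memory_order_relaxed) + sz;
    update_peak(sum, cnt);
  }

  void deallocate(size_t sz) {
    assert(_size.load(std::memory_order_relaxed) >= sz && "size underflow");
    _count.fetch_sub(1, std::memory_order_relaxed);
    _size.fetch_sub(sz, std::memory_order_relaxed);
  }

  size_t count() const      { return _count.load(std::memory_order_relaxed); }
  size_t size() const       { return _size.load(std::memory_order_relaxed); }
  size_t peak_size() const  { return _peak_size.load(std::memory_order_relaxed); }
  size_t peak_count() const { return _peak_count.load(std::memory_order_relaxed); }
};
```

With this definition, one large allocation that is freed and followed by three small ones leaves the peak count at 1, even though the highest count ever seen was 3.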
src/hotspot/share/services/memReporter.cpp line 64:
> 62: }
> 63:
> 64: // blends out mtChunk count number
Before, we explicitly passed in 0 as the count for `mtChunk`, but this is no longer the case since we now pass in the `MemoryCounter`. So if we want to omit the `mtChunk` count, we need to reset the count here or handle it some other way.
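
One possible shape for "some other way", as a rough sketch only: pass an explicit flag to the printing helper instead of relying on a zero count. Everything here is hypothetical (`CounterSnapshot`, `format_malloc_line`, and the `omit_count` parameter are made up for illustration and are not the actual `memReporter.cpp` code), and whether the peak count should also be suppressed for `mtChunk` is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical snapshot of a counter; sizes are already scaled to KB.
struct CounterSnapshot {
  size_t size_kb;
  size_t count;
  size_t peak_size_kb;
  size_t peak_count;
};

// Format a malloc line. When omit_count is true (e.g. for mtChunk, where the
// number of arena chunks in use is unknown), no "#count" is printed at all.
std::string format_malloc_line(const CounterSnapshot& c, bool omit_count) {
  std::string out = "(malloc=" + std::to_string(c.size_kb) + "KB";
  if (!omit_count && c.count > 0) {
    out += " #" + std::to_string(c.count);
  }
  out += ")";
  // Print the peak only if we are not currently at peak.
  if (c.size_kb != c.peak_size_kb) {
    out += " (peak=" + std::to_string(c.peak_size_kb) + "KB";
    if (!omit_count) {
      out += " #" + std::to_string(c.peak_count);
    }
    out += ")";
  }
  return out;
}
```

A caller would then pass something like `flag == mtChunk` as `omit_count`, which keeps the decision in one place.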
src/hotspot/share/services/memReporter.cpp line 231:
> 229: // We don't know how many arena chunks are in used, so don't report the count
> 230: size_t count = (flag == mtChunk) ? 0 : malloc_memory->malloc_count();
> 231: print_malloc_line(malloc_memory->malloc_counter());
The above `count` is no longer needed.
-------------
Changes requested by sjohanss (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11497