RFR: 8157023: Integrate NMT with JFR

Thomas Stuefe stuefe at openjdk.org
Thu Dec 1 18:44:55 UTC 2022


On Thu, 1 Dec 2022 10:48:51 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:

> Please review this enhancement to include NMT information in JFR recordings.
> 
> **Summary**
> Native Memory Tracking summary information can be obtained from a running VM using `jcmd` if started with `-XX:NativeMemoryTracking=summary/detail`. Using `jcmd` requires you to run a separate process and to parse the output to get the needed information. This change adds JFR events for NMT information to enable additional ways to consume the NMT data.
> 
> There are two new events added:
> * _NativeMemoryUsage_ - The total native memory usage.
> * _NativeMemoryUsagePart_ - The native memory usage for each component.
> 
> These events are sent periodically, by default every 1s. The interval can of course be discussed, but that is the starting point. When NMT is not enabled, no events will be sent.
> 
> **Testing**
> * Added a simple test to verify that the events are sent as expected depending on whether NMT is enabled or not.
> * Mach5 sanity testing

Hi Stefan,

I think this looks interesting, and potentially very useful. But I am not yet convinced that exposing all tags is the way to go. For those interested, here is the original ML thread: https://mail.openjdk.org/pipermail/core-libs-dev/2022-December/097404.html.

The number of values may expand considerably in the future: we may want to use tags in a far more fine-grained manner than we do now, and/or change their encoding, e.g. to work in a hierarchy, in groups, or in combinations in a UL-like fashion. So both their number and their meaning may change, which could quickly render this report obsolete. For example, if we add tag hierarchies, do we then only report leaf tags? How useful would that be? If we allow tag combinations, how would we report those?
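To make the hierarchy question concrete, here is a minimal sketch assuming a hypothetical slash-separated tag scheme (NMT has no such hierarchy today; all names are illustrative). The same recorded data yields different numbers depending on whether a report uses leaf-only or rolled-up values, so an event schema would have to commit to one interpretation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical hierarchical tags: "GC" is the parent of "GC/G1" and
// "GC/Shenandoah". Usage may be recorded against any tag, not just leaves.
using Usage = std::map<std::string, std::size_t>;

// True if `tag` is `root` itself or a descendant ("root/...").
// "GCLocker" is deliberately NOT a child of "GC".
bool in_subtree(const std::string& tag, const std::string& root) {
  return tag == root ||
         (tag.size() > root.size() &&
          tag.compare(0, root.size(), root) == 0 &&
          tag[root.size()] == '/');
}

// Leaf-style reporting: only bytes recorded directly against `tag`.
std::size_t direct(const Usage& u, const std::string& tag) {
  auto it = u.find(tag);
  return it == u.end() ? 0 : it->second;
}

// Rolled-up reporting: bytes for `root` plus all of its descendants.
std::size_t rolled_up(const Usage& u, const std::string& root) {
  std::size_t sum = 0;
  for (const auto& kv : u) {
    if (in_subtree(kv.first, root)) sum += kv.second;
  }
  return sum;
}
```

With usage recorded as `GC`=10, `GC/G1`=100 and `GC/Shenandoah`=5, a leaf-style report says 10 for `GC` while a rolled-up report says 115; consumers of a fixed event format cannot tell which one they are getting unless the schema fixes it.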

Also note that I am currently working on a patch for showing NMT peak malloc values, see https://bugs.openjdk.org/browse/JDK-8297958. Peak values are very useful to have. So do we expose them too? That would be one more value per category. But leaving them out would make the JFR NMT report less useful.
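For illustration, a sketch of what tracking a peak next to the current value involves, assuming a simple hypothetical per-category counter (this is not NMT's actual `MallocMemory` nor the JDK-8297958 patch). Each category gains a second number that an event would have to carry.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical per-category malloc counter with a high-water mark.
class PeakCounter {
  std::atomic<std::size_t> _current{0};
  std::atomic<std::size_t> _peak{0};

 public:
  void allocate(std::size_t bytes) {
    // Value of the counter right after this allocation.
    std::size_t now = _current.fetch_add(bytes, std::memory_order_relaxed) + bytes;
    // Raise the peak if we exceeded it; on CAS failure old_peak is reloaded
    // and we retry only while `now` is still larger.
    std::size_t old_peak = _peak.load(std::memory_order_relaxed);
    while (now > old_peak &&
           !_peak.compare_exchange_weak(old_peak, now, std::memory_order_relaxed)) {
    }
  }

  void deallocate(std::size_t bytes) {
    _current.fetch_sub(bytes, std::memory_order_relaxed);
  }

  std::size_t current() const { return _current.load(std::memory_order_relaxed); }
  std::size_t peak() const { return _peak.load(std::memory_order_relaxed); }
};
```

After allocating 100 and 50 bytes and freeing 120, `current()` is 30 while `peak()` stays at 150; a report that only shows the current value would never reveal that spike.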

Bottom line: I am not yet convinced that reporting all NMT categories is that useful, and doing so exposes implementation details that may cause breakage in the future. We could restrict the report to a subset of useful categories.
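One way a restricted report could decouple the event format from NMT internals is to map internal categories onto a small fixed set and fold everything else into an "Other" bucket. The sketch below assumes a hypothetical choice of reported categories (the names match NMT summary categories, but which ones to keep is an open question); adding new internal tags then never changes the reported schema.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical stable subset of categories exposed in the report.
// Anything not listed is summed into "Other".
const std::set<std::string> kReported = {
    "Java Heap", "Class", "Thread", "Code", "GC",
};

// Fold internal per-category usage into the stable reported set.
std::map<std::string, std::size_t> to_report(
    const std::map<std::string, std::size_t>& internal) {
  std::map<std::string, std::size_t> out;
  for (const auto& kv : internal) {
    std::string key = kReported.count(kv.first) != 0 ? kv.first : std::string("Other");
    out[key] += kv.second;
  }
  return out;
}
```

The trade-off is that "Other" hides detail, but the detail remains available via `jcmd VM.native_memory` for those who need it.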

Another thought: for virtual memory mappings you report both reserved and committed memory. But I doubt that "reserved" is of much use. By itself, reserving memory does not cost anything, at least not on 64-bit. For a select few categories (e.g. heap and code space) it can signify the largest amount of committable memory, but those are already reported in JFR. So I think we could omit "reserved", save a bunch of code, and make the NMT JFR report less overwhelming.
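To illustrate why reserved-but-uncommitted memory is essentially free, here is a minimal Linux-only sketch using plain `mmap`/`mprotect`/`mincore` (not HotSpot's `os::reserve_memory`/`os::commit_memory` code). It reserves a large `PROT_NONE` range, commits and touches a small prefix, and asks `mincore()` which pages are resident; the function and struct names are hypothetical.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct Residency {
  std::size_t committed_resident;  // resident pages in the committed prefix
  std::size_t reserved_resident;   // resident pages in the reserved-only tail
};

// Reserve `reserve_bytes`, commit the first `commit_bytes` (both assumed to
// be multiples of the page size), then count resident pages per region.
Residency reserve_then_commit(std::size_t reserve_bytes, std::size_t commit_bytes) {
  const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));

  // "Reserve": address space only, no access, nothing backed yet.
  void* base = mmap(nullptr, reserve_bytes, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  assert(base != MAP_FAILED);

  // "Commit" the prefix and touch it so it actually becomes resident.
  int rc = mprotect(base, commit_bytes, PROT_READ | PROT_WRITE);
  assert(rc == 0);
  std::memset(base, 1, commit_bytes);

  // One status byte per page; bit 0 set means the page is resident.
  std::vector<unsigned char> vec((reserve_bytes + page - 1) / page);
  rc = mincore(base, reserve_bytes, vec.data());
  assert(rc == 0);
  (void)rc;

  Residency r{0, 0};
  const std::size_t commit_pages = commit_bytes / page;
  for (std::size_t i = 0; i < vec.size(); i++) {
    if ((vec[i] & 1) == 0) continue;  // not resident
    if (i < commit_pages) {
      r.committed_resident++;
    } else {
      r.reserved_resident++;
    }
  }
  munmap(base, reserve_bytes);
  return r;
}
```

Reserving 1 GiB and committing 1 MiB leaves the reserved-only tail with zero resident pages: the "reserved" number says little about what the process actually costs.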

Cheers, Thomas

src/hotspot/share/jfr/metadata/metadata.xml line 711:

> 709:   <Event name="NativeMemoryUsagePart" category="Java Virtual Machine, Memory" label="Component Native Memory Usage" description="Native memory usage for a component" stackTrace="false" thread="false"
> 710:     startTime="false" period="everyChunk">
> 711:     <Field type="string" name="type" label="Memory Type" description="Component allocating the native memory" />

Is there a better way than to re-transmit the category name with every event?
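JFR already deduplicates some values through per-chunk constant pools (as it does for threads and classes), so one option might be a constant type for the NMT category instead of a plain string field. The sketch below only shows the general interning idea with a hypothetical `NameTable`; it does not use JFR's actual serializer API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical name table: each distinct name is written once (e.g. once per
// recording chunk); events then carry only a small integer id.
class NameTable {
  std::unordered_map<std::string, std::uint32_t> _ids;
  std::vector<std::string> _names;  // index == id

 public:
  // Returns the existing id for `name`, or assigns the next free one.
  std::uint32_t intern(const std::string& name) {
    auto it = _ids.find(name);
    if (it != _ids.end()) return it->second;
    std::uint32_t id = static_cast<std::uint32_t>(_names.size());
    _ids.emplace(name, id);
    _names.push_back(name);
    return id;
  }
  const std::string& name(std::uint32_t id) const { return _names.at(id); }
  std::size_t size() const { return _names.size(); }
};

// A per-component sample as it might be emitted: an id instead of the string.
struct UsagePart {
  std::uint32_t type_id;
  std::uint64_t committed;
};
```

With a 1s period and a few dozen categories, this replaces tens of repeated strings per second with small integers, while the names themselves are written only once per table.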

src/hotspot/share/jfr/metadata/metadata.xml line 712:

> 710:     startTime="false" period="everyChunk">
> 711:     <Field type="string" name="type" label="Memory Type" description="Component allocating the native memory" />
> 712:     <Field type="ulong" contentType="bytes" name="reserved" label="Reserved Memory" description="Reserved bytes by this component" />

See my comment above. I am not sure we need reserved. If not, we could cut out a lot of code.

src/hotspot/share/services/memReporter.cpp line 25:

> 23:  */
> 24: #include "precompiled.hpp"
> 25: #include "jfr/jfrEvents.hpp"

I think this is an intermixing of layers. It would be cleaner if JFR accessed the current values from outside, instead of JFR knowledge leaking into NMT.
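A minimal sketch of the suggested layering, with all names hypothetical: NMT hands out a plain-data snapshot and never includes JFR headers, and the JFR periodic code pulls that snapshot and turns it into events. The fake data in `nmt_take_snapshot()` stands in for NMT's real counters, and `EmittedEvent` stands in for committing real JFR events.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// --- NMT side: plain data, no JFR includes ------------------------------
struct NMTCategoryUsage {
  std::string name;
  std::size_t committed;
};

struct NMTSnapshot {
  std::vector<NMTCategoryUsage> categories;
  std::size_t total_committed() const {
    std::size_t sum = 0;
    for (const auto& c : categories) sum += c.committed;
    return sum;
  }
};

// Hypothetical accessor; a real one would read NMT's counters.
NMTSnapshot nmt_take_snapshot() {
  NMTSnapshot s;
  s.categories.push_back({"Java Heap", 1024});
  s.categories.push_back({"Thread", 256});
  return s;
}

// --- JFR side: depends on NMT, not the other way round -------------------
struct EmittedEvent {
  std::string type;
  std::size_t committed;
};

// Periodic hook: pull the snapshot and emit one event per category + total.
std::vector<EmittedEvent> jfr_emit_native_memory_usage() {
  std::vector<EmittedEvent> events;
  NMTSnapshot s = nmt_take_snapshot();
  for (const auto& c : s.categories) events.push_back({c.name, c.committed});
  events.push_back({"Total", s.total_committed()});
  return events;
}
```

The dependency then points one way only: NMT can change its internals or its report format without touching JFR, as long as the snapshot accessor stays stable.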

src/hotspot/share/services/memReporter.cpp line 36:

> 34: #include "utilities/globalDefinitions.hpp"
> 35: 
> 36: size_t MemReporterBase::reserved_total(const MallocMemory* malloc, const VirtualMemory* vm) {

Why the const removal?

src/hotspot/share/services/memReporter.cpp line 861:

> 859: 
> 860:   MemBaseline usage;
> 861:   usage.baseline(true);

Note that this is quite expensive, so it depends on how often we do this. How often are these samples taken?

E.g. `MemBaseline::baseline()` walks all thread stacks (probing each of them via mincore()). It also copies a lot of counters around.

It is also not thread-safe. Are we at a safepoint here? Normally, NMT reports are only done at safepoints.
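One generic way to bound the cost, sketched with a hypothetical `CachedSampler` (not existing HotSpot code): serialize snapshot requests and reuse a result that is younger than some maximum age, so that several consumers or a short period do not each trigger a full walk. The mutex only serializes callers of this wrapper; it does not make the underlying walk safe against concurrent NMT updates, which is what the safepoint question is about.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <mutex>
#include <utility>

// Wraps an expensive snapshot function (standing in for a baseline call).
template <typename Snapshot>
class CachedSampler {
  using Clock = std::chrono::steady_clock;

  std::function<Snapshot()> _take;  // the expensive operation
  Clock::duration _max_age;         // how long a result may be reused
  std::mutex _lock;                 // serializes snapshot requests
  bool _valid = false;
  Clock::time_point _taken_at;
  Snapshot _cached{};

 public:
  CachedSampler(std::function<Snapshot()> take, Clock::duration max_age)
      : _take(std::move(take)), _max_age(max_age) {}

  // Returns the cached snapshot, re-taking it only if missing or too old.
  Snapshot get() {
    std::lock_guard<std::mutex> guard(_lock);
    Clock::time_point now = Clock::now();
    if (!_valid || now - _taken_at >= _max_age) {
      _cached = _take();
      _taken_at = now;
      _valid = true;
    }
    return _cached;
  }
};
```

Whether a stale-by-up-to-one-period value is acceptable depends on the sampling interval that is finally chosen for these events.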

-------------

PR: https://git.openjdk.org/jdk/pull/11449

