RFR: 8157023: Integrate NMT with JFR

Stefan Johansson sjohanss at openjdk.org
Fri Dec 2 09:06:22 UTC 2022


On Thu, 1 Dec 2022 18:23:34 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Please review this enhancement to include NMT information in JFR recordings.
>> 
>> **Summary**
>> Native Memory Tracking summary information can be obtained from a running VM using `jcmd` if started with `-XX:NativeMemoryTracking=summary/detail`. Using `jcmd` requires you to run a separate process and to parse the output to get the needed information. This change adds JFR events for NMT information to enable additional ways to consume the NMT data.
>> 
>> There are two new events added:
>> * _NativeMemoryUsage_ - The total native memory usage.
>> * _NativeMemoryUsagePart_ - The native memory usage for each component.
>> 
>> These events are sent periodically and by default the interval is 1s. This can of course be discussed, but that is the starting point. When NMT is not enabled no events will be sent.
>> 
>> **Testing**
>> * Added a simple test to verify that the events are sent as expected depending on if NMT is enabled or not.
>> * Mach5 sanity testing
>
> src/hotspot/share/services/memReporter.cpp line 861:
> 
>> 859: 
>> 860:   MemBaseline usage;
>> 861:   usage.baseline(true);
> 
> Note that this is quite expensive. So it depends on how often we do this. How often are these samples taken?
> 
> Eg. Baseline.baseline does walk all thread stacks (so, probing all of them via mincore()). It also copies a lot of counters around. 
> 
> It is also not threadsafe. Are we at a safepoint here? Normally NMT reports are only done at safepoints.

I haven't looked at the details of `baseline(bool summaryOnly)` that much, but since we pass `summaryOnly = true` I don't think it actually walks the thread stacks, right?

We don't do this at a safepoint, but looking at `MallocMemorySnapshot::copy_to(...)` it uses `ThreadCritical` to avoid things being cleaned out at the same time. I'm not sure if there are other thread-safety problems though. I would expect this to have the same problems as a summary report triggered through `jcmd`, because that isn't run at a safepoint either. But I do see that when used from `jcmd` we take a lock to serialize the NMT query, so we should probably do the same here.

  // Query lock is used to synchronize the access to tracking data.
  // So far, it is only used by JCmd query, but it may be used by
  // other tools.
  static inline Mutex* query_lock() {
    assert(NMTQuery_lock != NULL, "not initialized!");
    return NMTQuery_lock;
  }

So we certainly need to look closer at this. I would like to understand why the query lock is needed.
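
To make the "serialize the query" idea concrete, here is a rough standalone sketch. It is not HotSpot code: it uses `std::mutex` and `std::lock_guard` as stand-ins for `NMTQuery_lock` and a scoped locker, and the names `g_query_lock`, `query_summary` and `observed_max_overlap` are made up for illustration. It only shows the pattern of holding one lock for the whole query so that two queries never overlap. It says nothing about why NMT needs that.

```cpp
#include <algorithm>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for NMTQuery_lock; this is not HotSpot code.
static std::mutex g_query_lock;

// Bookkeeping used only to observe whether two queries ever overlap.
// Both counters are only touched while g_query_lock is held, or while
// no worker threads are running.
static int g_active = 0;
static int g_max_active = 0;

// Hypothetical stand-in for "take a summary baseline and send the events".
static void query_summary() {
  std::lock_guard<std::mutex> guard(g_query_lock);  // one query at a time
  ++g_active;
  g_max_active = std::max(g_max_active, g_active);
  // ... the baseline would be taken and reported here ...
  --g_active;
}

// Runs the query from several threads and returns the largest number of
// queries that were ever inside the locked region at the same time.
int observed_max_overlap(int num_threads) {
  g_active = 0;
  g_max_active = 0;
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back([] {
      for (int j = 0; j < 1000; ++j) {
        query_summary();
      }
    });
  }
  for (auto& t : threads) {
    t.join();
  }
  return g_max_active;
}
```

With the lock held for the whole query, `observed_max_overlap(4)` returns 1, since a second query cannot enter until the first has left.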

-------------

PR: https://git.openjdk.org/jdk/pull/11449

