RFR: 8157023: Integrate NMT with JFR

Stefan Johansson sjohanss at openjdk.org
Fri Dec 2 10:33:21 UTC 2022


On Fri, 2 Dec 2022 09:03:50 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:

>> src/hotspot/share/services/memReporter.cpp line 861:
>> 
>>> 859: 
>>> 860:   MemBaseline usage;
>>> 861:   usage.baseline(true);
>> 
>> Note that this is quite expensive. So it depends on how often we do this. How often are these samples taken?
>> 
>> Eg. Baseline.baseline does walk all thread stacks (so, probing all of them via mincore()). It also copies a lot of counters around. 
>> 
>> It is also not threadsafe. Are we at a safepoint here? Normally NMT reports are only done at safepoints.
>
> Haven't looked at the details of `baseline(bool summaryOnly)` that much, but since it `summaryOnly = true` I don't think it actually walk the thread stacks, right? 
> 
> We don't do this at a safepoint but looking at `MallocMemorySnapshot::copy_to(...)` it uses `ThreadCritical` to avoid things being cleaned out at the same time. Not sure if there are other thread safety problems though. I would expect this to have the same problems as a summary report triggered through `jcmd` because that isn't run at a safepoint either. But I do see that when used from `jcmd` we take a lock to serialize the NMT query, so we should probably do the same here.
> 
>   // Query lock is used to synchronize the access to tracking data.
>   // So far, it is only used by JCmd query, but it may be used by
>   // other tools.
>   static inline Mutex* query_lock() {
>     assert(NMTQuery_lock != NULL, "not initialized!");
>     return NMTQuery_lock;
>   }
> 
> So we certainly need to look closer at this. I would like to understand why the query lock is needed.

Looking a bit more at this. To me it seems like we are protecting the stored baseline with the lock is:

class MemTracker : AllStatic {
...
  // Stored baseline
  static MemBaseline      _baseline;
...
};

That the `NMTDCmd` code uses through `MemTracker::get_baseline()`. So if the JFR events keep its own baseline we should be fine without the lock. Or am I missing something here?

-------------

PR: https://git.openjdk.org/jdk/pull/11449


More information about the hotspot-dev mailing list