Extend Native Memory Tracking over the JDK ? (was: Proposal: track zlib native memory usage with NMT)

Thomas Stüfe thomas.stuefe at gmail.com
Thu Dec 1 07:26:07 UTC 2022


Hi Carter, Stefan,

thank you. I think it is good to have this discussion; it is important.

Side note, the discussion steered away from my original question - whether
to instrument the JDK with NMT. I still would love to discuss that, too.

About opening NMT up for user consumption: that is of course possible. But
I think the bigger questions are which data we want to open for user
consumption, at what granularity, and what contracts we enter when we do
this.

NMT was originally a hotspot-dev-centric tool. It has a lot of
idiosyncrasies. Interpreting the results needs detailed knowledge about
hotspot memory management. Some examples:

- Its reports are not consistent across JDK versions, not even across
different patch levels of the same JDK. So you cannot compare results, say,
between JDK 11 and JDK 17.
- Before a certain version (I believe JDK 11), the full thread stacks
were accounted for instead of just the in-use portion of the thread stacks.
I remember reading blogs about how thread stack consumption went down when
all that had changed was NMT reporting.
- The memory sizes it shows may not have much to do with real RSS. It
systematically underreports some things, since it omits libc overhead and
retention as well as usage by system and JNI libraries. But it also
overreports things, since it mostly (not always) accounts in terms of
"committed" memory, which usually means mmap()ed or malloc()ed memory. That
memory is merely committed, not physical; it may never be touched, so it
does not translate to RSS usage directly. OTOH, NMT probes thread stacks
with mincore(), so for that section, "committed" really means "physical".
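
To illustrate the reserved/committed distinction, here is a minimal Java
sketch that parses one summary line of the kind printed by
"jcmd <pid> VM.native_memory summary". The class and method names are
hypothetical, and the line shape in the comment is an assumption, since the
report format differs between JDK versions. Neither number it extracts is
RSS.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the JDK.
final class NmtSummaryLine {
    // Assumed line shape (varies by JDK version):
    //   "-   Thread (reserved=1056KB, committed=64KB)"
    private static final Pattern LINE = Pattern.compile(
        "^-\\s*(.+?)\\s*\\(reserved=(\\d+)KB, committed=(\\d+)KB\\)\\s*$");

    final String category;
    final long reservedKb;
    final long committedKb;

    private NmtSummaryLine(String category, long reservedKb, long committedKb) {
        this.category = category;
        this.reservedKb = reservedKb;
        this.committedKb = committedKb;
    }

    // Returns empty if the line does not match the assumed shape.
    static Optional<NmtSummaryLine> parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new NmtSummaryLine(
            m.group(1),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3))));
    }

    // Address space that is reserved but not committed. Committed memory
    // itself may still be untouched, so this says nothing about RSS.
    long reservedButUncommittedKb() {
        return reservedKb - committedKb;
    }
}
```

Because the format is not a stable contract, a parser like this would have
to be revisited for each JDK version it is pointed at.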

I am fine with opening up NMT via JFR. But does this mean we have to be
more consistent? Do we have to care about backward compatibility of NMT
reports? Are we then still free to redesign the tag system (see my original
mail) or will this tie us down with the current NMT tag system forever? As
a negative example, JFR exposes metaspace allocator details (chunk
statistics) which have been broken ever since JDK 16 when the underlying
implementation changed.

Therefore I am curious about what end users really use NMT for.

@Carter: can you give us examples of which NMT sections have been
particularly useful to you? Maybe we can define a subset to expose instead
of exposing all tags. E.g. I can see thread stack usage being very useful,
but things like ObjectMonitor footprint not so much.
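
To make the subset idea concrete, a consumer on the JFR streaming side could
look roughly like the sketch below. It is only an illustration: the event
name "jdk.NativeMemoryUsage" and the field names "type" and "committed" are
assumptions that are not fixed anywhere in this thread, the class is
hypothetical, and the tag names passed in are up to the caller.

```java
import java.time.Duration;
import java.util.Set;
import jdk.jfr.consumer.RecordingStream;

// Hypothetical consumer that only reports a chosen subset of NMT tags.
final class NmtSubsetConsumer {
    // Assumed event name; adjust to whatever the JDK in use provides.
    static final String EVENT = "jdk.NativeMemoryUsage";

    private final Set<String> exposedTypes;

    NmtSubsetConsumer(Set<String> exposedTypes) {
        this.exposedTypes = Set.copyOf(exposedTypes);
    }

    // True if the given NMT tag name belongs to the exposed subset.
    boolean isExposed(String type) {
        return type != null && exposedTypes.contains(type);
    }

    // Blocks until the stream is closed. The JVM must run with NMT on,
    // e.g. -XX:NativeMemoryTracking=summary, or no events are sent.
    void run() {
        try (RecordingStream rs = new RecordingStream()) {
            rs.enable(EVENT).withPeriod(Duration.ofSeconds(1));
            rs.onEvent(EVENT, e -> {
                String type = e.getString("type");      // assumed field
                if (isExposed(type)) {
                    long committed = e.getLong("committed"); // assumed field
                    System.out.printf("%s committed=%d%n", type, committed);
                }
            });
            rs.start();
        }
    }
}
```

Filtering on the consumer side like this does not answer the contract
question; it only shows what a narrowed view might look like to a user.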

Cheers, Thomas




On Wed, Nov 30, 2022 at 9:45 PM Carter Kozak <ckozak at ckozak.net> wrote:

> This looks fantastic, thank you so much! I can confirm that the proposed
> design would solve my use-case.
>
> I'd enjoy discussing the NMT event contract somewhere more specific
> to the implementation, but I don't want to muddle this thread with
> implementation details.
>
> Carter Kozak
>
> On Wed, Nov 30, 2022, at 03:37, Stefan Johansson wrote:
>
> Hi Carter,
>
> Your mail made me pick up an old item from my wishlist: to have native
> memory tracking information available in JFR recordings. When we, in GC,
> do improvements to decrease the native memory overhead of our
> algorithms, NMT is a very good tool to track the progress. We have
> scripts that sound very similar to what you describe and more than once
> I've been thinking about adding this information into JFR. But it has
> not been a priority and the greater value has been unclear.
>
> Hearing that others might also benefit from such a change, I had a
> discussion with the JFR team on how best to proceed with this. I have
> created a branch for this and will probably create a PR for it shortly,
> but I thought I would drop it here first:
> https://github.com/kstefanj/jdk/tree/8157023-jfr-events-for-nmt
>
> The change adds two new JFR events: one for the total usage and one for
> the usage of each memory type. These are sent only if Native Memory
> Tracking is turned on, and they are enabled in the default JFR profile
> with an interval of 1s. This might change during reviewing but it was a
> good starting point.
>
> With this you will be able to use JFR streaming to access the events
> from within your running process. I hope this will help your use cases
> and please let us know if you have any comments or suggestions.
>
> Thanks,
> Stefan
>
>
>


More information about the core-libs-dev mailing list