Extend Native Memory Tracking over the JDK ? (was: Proposal: track zlib native memory usage with NMT)
Kennke, Roman
rkennke at amazon.de
Mon Dec 5 11:03:18 UTC 2022
Hi Thomas,
I very much like the idea and also your proposals for how to do it. Insight
into the JDK's native memory usage is sorely lacking and would be very useful!
I don't have all that much to add about the details beyond what you
already covered, though :-)
Cheers,
Roman
> Are there any opinions about whether or not to extend NMT across the JDK?
>
> This blocks https://bugs.openjdk.org/browse/JDK-8296360, and I had a PR
> prepared as https://github.com/openjdk/jdk/pull/10988. Originally I was
> hoping to get this into JDK 20, but I don't think that is realistic anymore.
> I am fine with postponing my work in favor of a baseline discussion, but so
> far there is very little discussion about this topic.
>
> How should I proceed?
>
> Thanks, Thomas
>
>
>
> On Wed, Nov 9, 2022 at 8:12 AM Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
>
> Hi Alan,
>
> (replaced hotspot-runtime-dev with hotspot-dev, since it's more of a
> general topic)
>
> thank you for your time!
>
> I am very happy to talk this through. I think native memory
> observability in the JDK (and customer code!) is sorely lacking.
> Witness the countless "where did my native memory go" blog articles.
> At SAP we have been struggling with this topic for a long time and
> have come up with a mixture of solutions. The aforementioned tracker
> was one, which extended our version of NMT across the JDK. Our
> SapMachine MallocTracer, which allows us to trace uninstrumented
> customer code, another. We even experimented with exchanging the
> allocator (using jemalloc) to gain insights. But that is a whole
> different topic with deep logistical implications, I don't want to
> touch it here. Exchanging the allocator does not help to observe
> virtual memory or the brk segment, of course.
>
> And to make the picture complete, another insight we currently lack
> is the implicit allocator overhead, which can be very significant
> and is hidden by the libc. We also have observability for that in
> the SapMachine, and I miss it in OpenJDK.
>
> As you noticed, my original intent was just to instrument Zlib and
> possibly improve tracking for DBBs. Although, thinking beyond that,
> another attractive instrumentation target would be mapped NIO
> buffers at least.
>
> So I think native memory observability is important. Arguably we
> could even extend observability to cover other OS resources, e.g.
> file handles. If we shift code around to Java/Panama, data that
> moves to the Java heap does not need to be tracked, but other memory
> will always come from one of the basic system APIs, regardless of
> who allocates it and where in the stack allocation happens. Be it
> native JDK code, Panama, or even customer JNI code.
>
> If we agree on the importance of native memory observability, then I
> believe NMT is the right tool for it. It is a good tool. The
> machinery is already there. It covers both C-heap and virtual memory
> APIs, as well as thread stacks, and could easily be extended to
> cover sbrk if needed. And I assume that whatever shape OpenJDK takes
> on in the future, there always will be a libjvm.so at its core, so
> we will always have it. But even if not, NMT could be separated from
> libjvm.so quite easily, since it has no deep ties with the JVM.
>
> About coupling JVM with outside code: We don't have to directly link
> against libjvm.so. We can keep things loose if the intent is to be
> runnable without a JVM, or be JVM-version-agnostic. That could take
> the form of a function-pointer interface like JVMTI. Or outside code
> could dynamically dlsym the JVM allocation hooks. In any case
> gracefully falling back to system allocation routines when necessary.
>
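A minimal sketch of the function-pointer variant described above, assuming a hypothetical table type (alloc_iface) and hypothetical entry points (lib_set_alloc_iface, lib_malloc, lib_free). None of these names exist in the JDK; the demo_* functions only stand in for the JVM side. The dlsym variant would differ only in how the table gets filled in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical interface: a table of allocation functions that a JVM could
 * hand to library code, similar in spirit to a JVMTI function table. */
typedef struct {
    void *(*malloc_fn)(size_t size, int tag);
    void  (*free_fn)(void *p);
} alloc_iface;

static const alloc_iface *g_iface = NULL;   /* NULL means: no JVM attached */

/* Install the table, or pass NULL to return to plain malloc/free.
 * Blocks must be freed under the same setting they were allocated with. */
void lib_set_alloc_iface(const alloc_iface *iface) {
    assert(iface == NULL || (iface->malloc_fn != NULL && iface->free_fn != NULL));
    g_iface = iface;
}

/* Use the tracked allocator when present, otherwise fall back to libc. */
void *lib_malloc(size_t size, int tag) {
    return g_iface != NULL ? g_iface->malloc_fn(size, tag) : malloc(size);
}

void lib_free(void *p) {
    if (g_iface != NULL) g_iface->free_fn(p);
    else free(p);
}

/* Stand-in for the JVM side, only here to make the sketch testable. */
static int demo_malloc_calls, demo_free_calls, demo_last_tag;

static void *demo_malloc(size_t size, int tag) {
    demo_malloc_calls++;
    demo_last_tag = tag;
    return malloc(size);
}

static void demo_free(void *p) {
    demo_free_calls++;
    free(p);
}

static const alloc_iface demo_iface = { demo_malloc, demo_free };
```

Library code would call lib_malloc/lib_free everywhere; without a registered table it behaves exactly like malloc/free, which is the graceful fallback mentioned above.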
> And I agree, polluting the NMT tag space with outside meaning is
> ugly. I only did it because I planned to go no further than
> instrumenting Zlib and possibly DBBs. But if we take this further,
> my preferred solution would be a reserved tag range or -ranges for
> outside use, whose inner meaning would be opaque to the JVM. Kind of
> like SIGRTMIN+SIGRTMAX. Then, outside code could register tags and
> their meta information with the JVM, or we find a different way to
> convey the tag meaning to NMT (config files, or callbacks). That
> could even be opened up for customer use.
>
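To illustrate the reserved-range idea, here is a rough sketch of a registry that hands out tags from a range the JVM treats as opaque. The range 200..255 and the names ext_tag_register/ext_tag_name are assumptions made up for this example; NMT has no such API today.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed split: tags below 200 belong to the JVM, 200..255 are reserved
 * for outside code. The numbers are illustrative only. */
enum {
    EXT_TAG_MIN   = 200,
    EXT_TAG_MAX   = 255,
    EXT_TAG_COUNT = EXT_TAG_MAX - EXT_TAG_MIN + 1
};

static const char *ext_tag_names[EXT_TAG_COUNT];
static int ext_tags_used = 0;

/* Register a name and return its tag. Registering the same name twice
 * returns the same tag. Returns -1 once the reserved range is exhausted.
 * The caller must keep the name string alive. Not thread-safe. */
int ext_tag_register(const char *name) {
    assert(name != NULL);
    for (int i = 0; i < ext_tags_used; i++) {
        if (strcmp(ext_tag_names[i], name) == 0) return EXT_TAG_MIN + i;
    }
    if (ext_tags_used == EXT_TAG_COUNT) return -1;
    ext_tag_names[ext_tags_used] = name;
    return EXT_TAG_MIN + ext_tags_used++;
}

/* Map a tag back to its registered name, or NULL if it is not a
 * registered outside tag. This is what a report printer would call. */
const char *ext_tag_name(int tag) {
    if (tag < EXT_TAG_MIN || tag >= EXT_TAG_MIN + ext_tags_used) return NULL;
    return ext_tag_names[tag - EXT_TAG_MIN];
}
```

The JVM would only need the number for accounting; the name is looked up when a report is printed, so the meaning of each outside tag stays with the code that registered it.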
> This also touches on another question, that of NMT tag space. NMT
> tags are very useful since they allow cheap tracking without
> capturing call stacks. However, tags are underused and show growing
> pains since they are too one-dimensional and restrictive. We had
> competing interests in the past about tag granularity. It is all
> over the place. We have coarse-grained tags like "mtThread", and
> very fine-grained ones like "mtObjectMonitor". There are several
> ways we could improve, e.g., by making them combinable like UL does,
> or allowing for a hierarchy of them - either a hard-wired limited
> one like "domain"+"tag", or an unlimited tree-like one. Technically
> interesting since whatever the new encoding is, they still must fit
> into a malloc header. I opened
> https://bugs.openjdk.org/browse/JDK-8281819 to track ideas like these.
>
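As a sketch of the hard-wired "domain"+"tag" variant, the following packs both into a single byte. The one-byte budget and the 3/5 bit split are assumptions chosen for illustration; the space actually available in the NMT malloc header may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed budget: one byte, split into 3 bits of domain and 5 bits of
 * tag within that domain (8 domains with 32 tags each). */
enum {
    DOMAIN_BITS = 3,
    TAG_BITS    = 5,
    DOMAIN_MAX  = (1 << DOMAIN_BITS) - 1,
    TAG_MAX     = (1 << TAG_BITS) - 1
};

/* Pack domain into the high bits and tag into the low bits. */
uint8_t tag_encode(unsigned domain, unsigned tag) {
    assert(domain <= DOMAIN_MAX && tag <= TAG_MAX);
    return (uint8_t)((domain << TAG_BITS) | tag);
}

/* Recover the two components from the packed byte. */
unsigned tag_domain(uint8_t encoded) { return encoded >> TAG_BITS; }
unsigned tag_local(uint8_t encoded)  { return encoded & TAG_MAX; }
```

A summary report could then aggregate by tag_domain() for a coarse view and by the full byte for a fine-grained one, without storing anything beyond that byte per allocation.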
> Instrumenting Panama allocations, including the ability to tag
> allocations, would be a very good idea. For instance, if we ever
> remove the native Zlib layer and convert it to Java using Panama, we
> can do the same with Panama as I do now natively - use the Zlib zalloc
> interface to hook in JVM memory allocation functions. The result
> could be completely identical, and the end user looking at the NMT
> output need never know that anything changed.
>
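For readers unfamiliar with the zalloc hook, here is a simplified sketch of allocation callbacks in the shape zlib expects (opaque pointer, item count, item size). It is not the code from the PR: it uses plain malloc plus a small per-stream counter where the real thing would call JVM allocation functions, and the names zmem_stats, tracked_zalloc and tracked_zfree are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Per-stream accounting, reachable through the callbacks' opaque pointer. */
typedef struct {
    size_t live_bytes;
    size_t live_blocks;
} zmem_stats;

/* Size header stored in front of each block; the union pads it so the
 * payload that follows keeps maximum alignment. */
typedef union {
    size_t      size;
    max_align_t align;
} zmem_header;

/* Same shape as zlib's alloc_func: returns items*size bytes or NULL. */
void *tracked_zalloc(void *opaque, unsigned items, unsigned size) {
    zmem_stats *st = opaque;
    assert(st != NULL);
    size_t bytes = (size_t)items * size;
    zmem_header *h = malloc(sizeof(zmem_header) + bytes);
    if (h == NULL) return NULL;
    h->size = bytes;
    st->live_bytes += bytes;
    st->live_blocks++;
    return h + 1;               /* payload starts right after the header */
}

/* Same shape as zlib's free_func: undo the accounting, then release. */
void tracked_zfree(void *opaque, void *address) {
    zmem_stats *st = opaque;
    if (address == NULL) return;
    zmem_header *h = (zmem_header *)address - 1;
    st->live_bytes -= h->size;
    st->live_blocks--;
    free(h);
}
```

With zlib itself, these would be assigned to the zalloc, zfree and opaque fields of a z_stream before calling deflateInit or inflateInit, so every allocation zlib makes for that stream is counted.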
> And that goes for all instrumentation - if today we add it to JNI
> code, and that code gets removed tomorrow, we can add it to Panama
> code too. Unless data structures move to the heap, in which case
> there is no need to track them.
>
> You mentioned that NMT was more of an in-house support tool. Our
> experience is different. Even though it was positioned as a tool for
> JVM developers, and we never cared for the backward compatibility or
> consistency, it gets used a *lot* by our customers. We have to
> explain its output frequently. Also, many blog articles exist
> documenting its use. So, maybe it would be okay to elevate it to a
> user-facing tool since it seems to occupy that role anyway. We may
> also open up consumption of NMT results via java APIs, or expose its
> results via MXBeans.
>
> If this is to be a JEP, okay, but I'm afraid it would stall things a
> bit. I am interested in getting a simpler and quicker solution for
> older support releases at least, possibly based on my PR. I know
> that would be unconventional though.
>
> Thank you,
>
> Thomas
>
>
> On Sun, Nov 6, 2022 at 9:31 AM Alan Bateman <Alan.Bateman at oracle.com> wrote:
>
> On 04/11/2022 16:54, Thomas Stüfe wrote:
> > Hi all,
> >
> > I am currently working on https://bugs.openjdk.org/browse/JDK-8296360;
> > I was preparing the final PR [1], but then Alan did ask me to discuss
> > this on core-libs first.
> >
> > Backstory:
> >
> > NMT tracks hotspot native allocations but does not cover the JDK
> > libraries (small exception: Unsafe.AllocateMemory). However, the
> > native memory footprint of JDK libraries can be significant. We have
> > no in-VM tracker for these and need tools like valgrind or our
> > SapMachine MallocTracer [2] to observe them.
>
> Thanks for starting a discussion on this as this is a topic that
> requires agreement from several areas. If this is the start of something
> bigger, where you want to have all allocation sites in the libraries
> using NMT, then I think it needs a write-up, maybe a JEP.
>
> For starters, I think it needs some agreement on using NMT for memory
> allocated outside of libjvm. You mentioned Unsafe as an exception but
> that is implemented in the VM so you get tracking for free, albeit I
> think all allocations are in the "mtOther" category.
>
> A general concern is that it creates more coupling between the VM code
> and the libraries code. As you probably know, we've removed most of the
> dependences on JVM_* functions from non-core areas over many years. So I
> think that needs consideration as I assume we don't want
> memory/allocation.hpp declaring a dozen categories for allocations done
> in, say, the java.desktop module. Maybe your proposal will be
> strictly limited to java.base but even then, do we really want the VM
> even knowing about categories that are specific to zip compression or
> decompression?
>
> There are probably longer term trends that should be part of the
> discussion too. One general trend is that "run time" is becoming more
> and more a hybrid of code in libjvm and the Java libraries. Lambdas, the
> module system, and virtual threads implementations are a few examples in
> the last few releases. This comes with many "Java on Java" challenges,
> including serviceability where users of the platform will expect tools
> to just work and won't care where the code is. NMT is probably more for
> support teams and not something that most developers will ever use but I
> think it is part of the challenge of having serviceability solutions "just
> work".
>
> In addition to having more of the Java runtime written in Java, there
> will likely be less JNI code in the future. It's very possible that the
> JNI code (including the JNI methods in libzip) will be replaced with
> code that uses Panama memory and linker APIs once they become
> permanent. The effect of that would be to have a lot of the memory
> allocations be tracked in the mtOther category again. Maybe integration
> with memory tracking should be looked at in conjunction with these APIs
> and this migration. I could imagine the proposed "Arena" API
> (MemorySession in Java 19) having some integration with NMT and it might
> be interesting to look into that.
>
> So yes, this topic does need broader discussion and it might be a bit
> premature to start with a PR for libzip without talking about the bigger
> picture first.
>
> -Alan
>
>
>