Extend Native Memory Tracking over the JDK ? (was: Proposal: track zlib native memory usage with NMT)
Thomas Stüfe
thomas.stuefe at gmail.com
Mon Dec 5 12:43:47 UTC 2022
Thank you for the positive encouragement, Roman :-)
Cheers, Thomas
On Mon, Dec 5, 2022 at 12:03 PM Kennke, Roman <rkennke at amazon.de> wrote:
> Hi Thomas,
>
> I very much like the idea and also your proposals for how to do it. Insight
> into the JDK's native memory usage is sorely lacking and would be very useful!
> I don't have all that much to add about the details beyond what you
> already covered, though :-)
>
> Cheers,
> Roman
>
>
> > Are there any opinions about whether or not to extend NMT across the JDK?
> >
> > This blocks https://bugs.openjdk.org/browse/JDK-8296360, and I had a PR
> > prepared as https://github.com/openjdk/jdk/pull/10988. Originally I was
> > hoping to get this into JDK 20, but I don't think that is realistic
> > anymore. I am fine with postponing my work in favor of a baseline
> > discussion, but so far there is very little discussion about this topic.
> >
> > How should I proceed?
> >
> > Thanks, Thomas
> >
> >
> >
> > On Wed, Nov 9, 2022 at 8:12 AM Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
> >
> > Hi Alan,
> >
> > (replaced hotspot-runtime-dev with hotspot-dev, since it's more of a
> > general topic)
> >
> > thank you for your time!
> >
> > I am very happy to talk this through. I think native memory
> > observability in the JDK (and customer code!) is sorely lacking.
> > Witness the countless "where did my native memory go" blog articles.
> > At SAP we have been struggling with this topic for a long time and
> > have come up with a mixture of solutions. The aforementioned tracker
> > was one, which extended our version of NMT across the JDK. Our
> > SapMachine MallocTracer, which allows us to trace uninstrumented
> > customer code, another. We even experimented with exchanging the
> > allocator (using jemalloc) to gain insights. But that is a whole
> > different topic with deep logistical implications, and I don't want to
> > touch it here. Exchanging the allocator does not help to observe
> > virtual memory or the brk segment, of course.
> >
> > And to make the picture complete, another insight we currently lack
> > is the implicit allocator overhead, which can be very significant
> > and is hidden by the libc. We also have observability for that in
> > the SapMachine, and I miss it in OpenJDK.
> >
> > As you noticed, my original intent was just to instrument Zlib and
> > possibly improve tracking for DBBs. Although, thinking beyond that,
> > another attractive instrumentation target would be mapped NIO
> > buffers at least.
> >
> > So I think native memory observability is important. Arguably we
> > could even extend observability to cover other OS resources, e.g.
> > file handles. If we shift code around, to Java/Panama: data that
> > moves to the Java heap does not need to be tracked, but other memory
> > will always come from one of the basic system APIs, regardless of
> > who allocates it and where in the stack allocation happens. Be it
> > native JDK code, Panama, or even customer JNI code.
> >
> > If we agree on the importance of native memory observability, then I
> > believe NMT is the right tool for it. It is a good tool. The
> > machinery is already there. It covers both C-heap and virtual memory
> > APIs, as well as thread stacks, and could easily be extended to
> > cover sbrk if needed. And I assume that whatever shape OpenJDK takes
> > on in the future, there always will be a libjvm.so at its core, so
> > we will always have it. But even if not, NMT could be separated from
> > libjvm.so quite easily, since it has no deep ties with the JVM.
> >
> > About coupling JVM with outside code: We don't have to directly link
> > against libjvm.so. We can keep things loose if the intent is to be
> > runnable without a JVM, or be JVM-version-agnostic. That could take
> > the form of a function-pointer interface like JVMTI. Or outside code
> > could dynamically dlsym the JVM allocation hooks. In any case
> > gracefully falling back to system allocation routines when necessary.
> >
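
A minimal sketch of the dlsym variant described above, in C. The hook
signatures and the two symbol names are invented for illustration; they are
not real libjvm exports, and the real interface could look quite different.
The point is only the shape: look the hooks up at run time, and fall back to
the system allocator when they are absent.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook signatures. The symbol names used below are invented
 * for illustration; they are not real libjvm exports. */
typedef void *(*tracked_malloc_fn)(size_t size, int tag);
typedef void (*tracked_free_fn)(void *ptr);

static tracked_malloc_fn hook_malloc = NULL;
static tracked_free_fn hook_free = NULL;
static int hooks_resolved = 0;

/* Look the hooks up once. Not thread-safe; a real version would need
 * pthread_once or similar. */
static void resolve_hooks(void) {
    if (hooks_resolved) return;
    hooks_resolved = 1;
    /* dlopen(NULL, ...) returns a handle for the main program, which also
     * covers globally visible libraries, so no link-time dependency on
     * libjvm is needed. */
    void *self = dlopen(NULL, RTLD_LAZY);
    if (self == NULL) return;
    void *m = dlsym(self, "Hypothetical_NMT_malloc");
    void *f = dlsym(self, "Hypothetical_NMT_free");
    if (m != NULL && f != NULL) { /* use the hooks only as a pair */
        hook_malloc = (tracked_malloc_fn)m;
        hook_free = (tracked_free_fn)f;
    }
}

/* Returns 1 if allocations go through the (hypothetical) JVM hooks. */
int lib_is_tracked(void) {
    resolve_hooks();
    return hook_malloc != NULL;
}

/* Allocate through the JVM hook if present, else plain malloc. */
void *lib_malloc(size_t size, int tag) {
    resolve_hooks();
    return hook_malloc != NULL ? hook_malloc(size, tag) : malloc(size);
}

/* Free with the counterpart of whatever lib_malloc used. */
void lib_free(void *ptr) {
    resolve_hooks();
    if (hook_free != NULL) hook_free(ptr); else free(ptr);
}
```

Library code would call lib_malloc/lib_free instead of malloc/free. Run
without a JVM that exports such hooks, it behaves exactly like the plain
system allocator.
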
> > And I agree, polluting the NMT tag space with outside meaning is
> > ugly. I only did it because I planned to go no further than
> > instrumenting Zlib and possibly DBBs. But if we take this further,
> > my preferred solution would be a reserved tag range or -ranges for
> > outside use, whose inner meaning would be opaque to the JVM. Kind of
> > like the SIGRTMIN..SIGRTMAX range. Then, outside code could register tags and
> > their meta information with the JVM, or we find a different way to
> > convey the tag meaning to NMT (config files, or callbacks). That
> > could even be opened up for customer use.
> >
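
To make the reserved-range idea concrete, here is a small C sketch. All names
and the range 128..255 are assumptions made up for this example; nothing like
this exists in NMT today. Outside code registers a name and gets back an
opaque tag from the reserved range; the tracker only keeps a counter per tag
and never interprets the meaning.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reserved range for tags owned by code outside the JVM. */
enum {
    EXT_TAG_FIRST = 128,
    EXT_TAG_LAST = 255,
    EXT_TAG_COUNT = EXT_TAG_LAST - EXT_TAG_FIRST + 1
};

static const char *ext_tag_names[EXT_TAG_COUNT]; /* meta info, opaque to the tracker */
static size_t ext_tag_bytes[EXT_TAG_COUNT];      /* live bytes per tag */
static int ext_tags_used = 0;

/* Register a new outside tag. Returns the tag, or -1 if the range is full. */
int ext_tag_register(const char *name) {
    assert(name != NULL);
    if (ext_tags_used == EXT_TAG_COUNT) return -1;
    ext_tag_names[ext_tags_used] = name;
    return EXT_TAG_FIRST + ext_tags_used++;
}

/* A tag is valid only if it lies in the reserved range and was registered. */
static int ext_tag_valid(int tag) {
    return tag >= EXT_TAG_FIRST && tag < EXT_TAG_FIRST + ext_tags_used;
}

/* Account an allocation. Returns 0 on success, -1 for an unknown tag. */
int ext_tag_record_alloc(int tag, size_t size) {
    if (!ext_tag_valid(tag)) return -1;
    ext_tag_bytes[tag - EXT_TAG_FIRST] += size;
    return 0;
}

/* Account a deallocation. Returns 0 on success, -1 for an unknown tag. */
int ext_tag_record_free(int tag, size_t size) {
    if (!ext_tag_valid(tag)) return -1;
    ext_tag_bytes[tag - EXT_TAG_FIRST] -= size;
    return 0;
}

/* Name registered for a tag, or NULL for an unknown tag. */
const char *ext_tag_name(int tag) {
    return ext_tag_valid(tag) ? ext_tag_names[tag - EXT_TAG_FIRST] : NULL;
}

/* Live bytes currently accounted to a tag (0 for an unknown tag). */
size_t ext_tag_total(int tag) {
    return ext_tag_valid(tag) ? ext_tag_bytes[tag - EXT_TAG_FIRST] : 0;
}
```

A report would then simply print ext_tag_name(tag) next to
ext_tag_total(tag) for every registered tag, without the tracker knowing
what "zlib" or any other name stands for.
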
> > This also touches on another question, that of NMT tag space. NMT
> > tags are very useful since they allow cheap tracking without
> > capturing call stacks. However, tags are underused and show growing
> > pains since they are too one-dimensional and restrictive. We had
> > competing interests in the past about tag granularity. It is all
> > over the place. We have coarse-grained tags like "mtThread", and
> > very fine-grained ones like "mtObjectMonitor". There are several
> > ways we could improve, e.g., by making them combinable like UL does,
> > or allowing for a hierarchy of them - either a hard-wired limited
> > one like "domain"+"tag", or an unlimited tree-like one. Technically
> > interesting since whatever the new encoding is, they still must fit
> > into a malloc header. I opened
> > https://bugs.openjdk.org/browse/JDK-8281819 to track ideas like these.
> >
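
As an illustration of the hard-wired "domain"+"tag" variant, a C sketch of one
possible encoding. The 16-bit budget and the 6/10 bit split are assumptions
chosen for the example, not the actual NMT malloc header layout.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed budget: 16 bits in the malloc header, split into a 6-bit
 * domain and a 10-bit tag local to that domain. */
typedef uint16_t packed_tag_t;

enum { TAG_BITS = 10, DOMAIN_BITS = 6 };
enum { TAG_MASK = (1 << TAG_BITS) - 1, DOMAIN_MASK = (1 << DOMAIN_BITS) - 1 };

/* Combine a domain and a domain-local tag into one header field. */
packed_tag_t tag_pack(unsigned domain, unsigned tag) {
    assert(domain <= DOMAIN_MASK && tag <= TAG_MASK);
    return (packed_tag_t)((domain << TAG_BITS) | tag);
}

/* Extract the domain (upper 6 bits). */
unsigned tag_domain(packed_tag_t packed) {
    return (unsigned)packed >> TAG_BITS;
}

/* Extract the domain-local tag (lower 10 bits). */
unsigned tag_local(packed_tag_t packed) {
    return (unsigned)packed & TAG_MASK;
}
```

Summing per domain gives the coarse-grained view and summing per packed
value gives the fine-grained one, so both kinds of interest can be served
from the same header field.
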
> > Instrumenting Panama allocations, including the ability to tag
> > allocations, would be a very good idea. For instance, if we ever
> > remove the native Zlib layer and convert it to Java using Panama, we
> > can do with Panama what I do now natively: use the Zlib zalloc
> > interface to hook in JVM memory allocation functions. The result
> > could be completely identical, and the end user looking at the NMT
> > output need never know that anything changed.
> >
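
For readers unfamiliar with the zalloc interface: zlib lets the caller supply
allocation callbacks per stream. Below is a C sketch of such callbacks with
simple byte accounting. The function signatures match zlib's alloc_func and
free_func; the stats struct and the use of plain malloc are stand-ins invented
for this example (a real integration would call into JVM allocation functions
instead), so this is not the code from the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Per-stream accounting, handed to zlib through z_stream.opaque.
 * Hypothetical type for this sketch. */
typedef struct {
    size_t live_bytes;
    size_t peak_bytes;
} zlib_mem_stats;

/* Small header in front of each block so the free callback knows the size.
 * The union keeps the user pointer suitably aligned. */
typedef union {
    size_t size;
    max_align_t align;
} block_header;

/* Same shape as zlib's alloc_func: (opaque, items, size). */
void *tracked_zalloc(void *opaque, unsigned items, unsigned size) {
    assert(opaque != NULL);
    zlib_mem_stats *stats = opaque;
    size_t bytes = (size_t)items * size;
    block_header *header = malloc(sizeof *header + bytes);
    if (header == NULL) return NULL; /* zlib treats NULL as allocation failure */
    header->size = bytes;
    stats->live_bytes += bytes;
    if (stats->live_bytes > stats->peak_bytes) stats->peak_bytes = stats->live_bytes;
    return header + 1;
}

/* Same shape as zlib's free_func: (opaque, address). */
void tracked_zfree(void *opaque, void *address) {
    assert(opaque != NULL);
    if (address == NULL) return;
    zlib_mem_stats *stats = opaque;
    block_header *header = (block_header *)address - 1;
    stats->live_bytes -= header->size;
    free(header);
}

/* Wiring, with <zlib.h> available:
 *   z_stream strm = {0};
 *   strm.zalloc = tracked_zalloc;
 *   strm.zfree  = tracked_zfree;
 *   strm.opaque = &stats;
 *   deflateInit(&strm, Z_DEFAULT_COMPRESSION);
 */
```

Because the hook point is the zlib interface itself, the same accounting
would apply whether the caller is JNI code or a Panama downcall.
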
> > And that goes for all instrumentation - if today we add it to JNI
> > code, and that code gets removed tomorrow, we can add it to Panama
> > code too. Unless data structures move to the heap, in which case
> > there is no need to track them.
> >
> > You mentioned that NMT was more of an in-house support tool. Our
> > experience is different. Even though it was positioned as a tool for
> > JVM developers, and we never cared much about backward compatibility or
> > consistency, it gets used a *lot* by our customers. We have to
> > explain its output frequently. Also, many blog articles exist
> > documenting its use. So, maybe it would be okay to elevate it to a
> > user-facing tool since it seems to occupy that role anyway. We may
> > also open up consumption of NMT results via Java APIs, or expose its
> > results via MXBeans.
> >
> > If this is to be a JEP, okay, but I'm afraid it would stall things a
> > bit. I am interested in getting a simpler and quicker solution for
> > older support releases at least, possibly based on my PR. I know
> > that would be unconventional though.
> >
> > Thank you,
> >
> > Thomas
> >
> >
> > On Sun, Nov 6, 2022 at 9:31 AM Alan Bateman <Alan.Bateman at oracle.com> wrote:
> >
> > On 04/11/2022 16:54, Thomas Stüfe wrote:
> > > Hi all,
> > >
> > > I am currently working on https://bugs.openjdk.org/browse/JDK-8296360;
> > > I was preparing the final PR [1], but then Alan asked me to discuss
> > > this on core-libs first.
> > >
> > > Backstory:
> > >
> > > NMT tracks hotspot native allocations but does not cover the JDK
> > > libraries (small exception: Unsafe.allocateMemory). However, the
> > > native memory footprint of JDK libraries can be significant. We have
> > > no in-VM tracker for these and need tools like valgrind or our
> > > SapMachine MallocTracer [2] to observe them.
> >
> > Thanks for starting a discussion on this as this is a topic that
> > requires agreement from several areas. If this is the start of something
> > bigger, where you want to have all allocation sites in the libraries
> > using NMT, then I think it needs a write-up, maybe a JEP.
> >
> > For starters, I think it needs some agreement on using NMT for memory
> > allocated outside of libjvm. You mentioned Unsafe as an exception but
> > that is implemented in the VM so you get tracking for free, albeit I
> > think all allocations are in the "mtOther" category.
> >
> > A general concern is that it creates more coupling between the VM code
> > and the libraries code. As you probably know, we've removed most of the
> > dependences on JVM_* functions from non-core areas over many years. So I
> > think that needs consideration as I assume we don't want
> > memory/allocation.hpp declaring a dozen categories for allocations done
> > in, say, the java.desktop module. Maybe your proposal will be
> > strictly limited to java.base but even then, do we really want the VM
> > even knowing about categories that are specific to zip compression or
> > decompression?
> >
> > There are probably longer term trends that should be part of the
> > discussion too. One general trend is that "run time" is becoming more
> > and more a hybrid of code in libjvm and the Java libraries. Lambdas,
> > the module system, and virtual threads implementations are a few
> > examples in the last few releases. This comes with many "Java on Java"
> > challenges, including serviceability where users of the platform will
> > expect tools to just work and won't care where the code is. NMT is
> > probably more for support teams and not something that most developers
> > will ever use but I think it is part of the challenge of having
> > serviceability solutions "just work".
> >
> > In addition to having more of the Java runtime written in Java, there
> > will likely be less JNI code in the future. It's very possible that the
> > JNI code (including the JNI methods in libzip) will be replaced with
> > code that uses Panama memory and linker APIs once they become
> > permanent. The effect of that would be to have a lot of the memory
> > allocations be tracked in the mtOther category again. Maybe integration
> > with memory tracking should be looked at in conjunction with these APIs
> > and this migration. I could imagine the proposed "Arena" API
> > (MemorySession in Java 19) having some integration with NMT and it might
> > be interesting to look into that.
> >
> > So yes, this topic does need broader discussion and it might be a bit
> > premature to start with a PR for libzip without talking about the bigger
> > picture first.
> >
> > -Alan
> >
> >
> >