Proposal: Extend Native Memory Tracking across the whole process via interposition

Thomas Stüfe thomas.stuefe at gmail.com
Wed Dec 6 13:21:08 UTC 2023


Hi Robbin,

On Wed, Dec 6, 2023 at 12:29 PM Robbin Ehn <rehn at rivosinc.com> wrote:

> Hi Thomas, cool!
>
> Yea, this solves many problems folks have in the wild.
>
> I'd like the entire NMT to move out of libjvm.so and be a launcher +
> separate lib thingy :)
> That would make classifying a bit of a hassle.
>
> Anyways, maybe I'm reading things wrong, but it seems like with this,
> even free() is serialized in the entire process? (the_free() +
> CriticalSection)
> It seems like we need to avoid serializing as much as possible to not
> be the bottleneck?
> Have you done any benchmarking? Or do you consider this just a
> functional POC?
>
>
Yes, this was just a functional POC. I first wanted to see if I could get it
running. We should probably be able to get rid of the mutex and make the
logic lock-free.
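
To illustrate what getting rid of the mutex could look like for the plain
accounting part, here is a minimal sketch using C11 atomics. It is not taken
from the prototype: the counter names and the record_alloc()/record_free()
functions are hypothetical, and it only covers summary counters, not call
stack bookkeeping.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical process-wide counters (names are illustrative only). */
static atomic_size_t g_live_bytes;
static atomic_size_t g_live_allocs;
static atomic_size_t g_peak_bytes;

/* Account for an allocation without taking a lock. */
void record_alloc(size_t size) {
    size_t now = atomic_fetch_add_explicit(&g_live_bytes, size,
                                           memory_order_relaxed) + size;
    atomic_fetch_add_explicit(&g_live_allocs, 1, memory_order_relaxed);

    /* Raise the peak with a CAS loop; a failed CAS reloads 'peak'. */
    size_t peak = atomic_load_explicit(&g_peak_bytes, memory_order_relaxed);
    while (now > peak &&
           !atomic_compare_exchange_weak_explicit(&g_peak_bytes, &peak, now,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
        /* retry with the refreshed 'peak' */
    }
}

/* Account for a free without taking a lock. */
void record_free(size_t size) {
    size_t before = atomic_fetch_sub_explicit(&g_live_bytes, size,
                                              memory_order_relaxed);
    /* Sketch-level sanity check: we never free more than we recorded. */
    assert(before >= size);
    (void)before;
    atomic_fetch_sub_explicit(&g_live_allocs, 1, memory_order_relaxed);
}
```

Relaxed ordering is enough here under the assumption that the counters are
only read for reporting and do not guard any other data.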


> Also doing the callbacks outside of the CriticalSection is less
> error-prone, so I would suggest that.
>
>
Sure thing. And thanks for the positive feedback!
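
A rough sketch of the pattern Robbin suggests, assuming a pthread mutex
stands in for the CriticalSection: update shared state and snapshot the
callback pointer under the lock, then invoke the callback after unlocking.
All names below (on_alloc, set_alloc_hook, example_hook) are hypothetical and
not from the prototype.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef void (*alloc_hook_t)(void *ptr, size_t size);

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static alloc_hook_t g_hook;   /* guarded by g_lock */
static size_t g_total;        /* guarded by g_lock */
static int g_hook_calls;      /* demo only, touched by a single thread */

void set_alloc_hook(alloc_hook_t hook) {
    pthread_mutex_lock(&g_lock);
    g_hook = hook;
    pthread_mutex_unlock(&g_lock);
}

size_t get_total(void) {
    pthread_mutex_lock(&g_lock);
    size_t total = g_total;
    pthread_mutex_unlock(&g_lock);
    return total;
}

void on_alloc(void *ptr, size_t size) {
    pthread_mutex_lock(&g_lock);
    g_total += size;
    alloc_hook_t hook = g_hook;   /* snapshot while holding the lock */
    pthread_mutex_unlock(&g_lock);

    /* Called with the lock released, so the hook may re-enter on_alloc(). */
    if (hook != NULL) {
        hook(ptr, size);
    }
}

/* Demo hook that re-enters on_alloc() once. If it were called with the
   non-recursive lock still held, this re-entry would deadlock. */
void example_hook(void *ptr, size_t size) {
    (void)ptr;
    (void)size;
    if (++g_hook_calls == 1) {
        on_alloc(NULL, 1);
    }
}
```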


> Thanks, Robbin
>
>
Cheers, Thomas

>
> On Fri, Dec 1, 2023 at 6:32 PM Thomas Stuefe <tstuefe at redhat.com> wrote:
> >
> > Hi, community,
> >
> > I experimented with extending Native Memory Tracking across the whole
> process. I want to share my findings and propose a new JDK feature to allow
> us to do that.
> >
> > TL;DR
> >
> > Proposed is a "native memory interposition library" shipped with the JDK
> that would intercept all native memory calls from everywhere and redirect
> them to NMT.
> >
> > Motivation:
> >
> > NMT is very useful but limited in its coverage. It only covers Hotspot
> and a select few sites from the JDK. Most of the JDK, third-party native
> code, and system libraries are not covered. This is a large hole in our
> observability. I have seen people do (and have done myself, e.g. [1]) strange and
> weird things to hunt memory leaks in native code. This is especially tricky
> in locked-down customer scenarios.
> >
> > But NMT is a capable tracker. We could use it for much more than just
> tracking Hotspot.
> >
> > In the past, developers have attempted to extend NMT instrumentation
> over parts of the JDK (e.g. [2]), which met resistance from Oracle. This is
> understandable: a naive extension would require libraries to link against
> the libjvm and instrument their code. That introduces new dependencies
> nobody wants.
> >
> > ---
> >
> > I propose a different way that works without instrumenting any caller
> code. I hope this proposal proves less controversial than brute-force NMT
> instrumentation of the JDK. And it would allow introspection of non-JDK
> parts too.
> >
> > We could ship an interception library (a "libjnmt.so") within the JDK.
> That library, if preloaded, would redirect native memory requests to NMT. A
> customer who wants to analyze the native memory footprint of its apps could
> start the JVM with LD_PRELOAD=libjnmt and then use NMT for introspection.
> >
> > Oracle and we continuously improve NMT; extending its reach across the
> whole process would leverage that investment nicely.
> >
> > It also meshes well with other improvements. For example, we have reported NMT
> numbers via JFR since [4]; with interposition, we could now expose
> third-party native allocations via JFR. The new jcmd "System.map" would
> automatically show memory mappings from outside Hotspot. There is a
> precedent (libjsig), so shipping interposition libraries is not that
> strange.
> >
> > ---
> >
> > I have a Linux-based POC that works and looks promising [3]. With that
> prototype, I can see:
> >
> > - allocations from the JDK - e.g., now I finally see mapped byte buffers.
> > - allocations from third-party user code
> > - most allocations from system libraries, e.g., from the system zlib
> > - allocations via the new FFI interface
> >
> > The prototype tracks both mmap and malloc. Technically, the tricky part
> was to handle the initialization window: being able to correctly handle
> allocations starting at the process C++ initialization while dynamically
> handing over allocations to the libjvm once it is loaded and NMT is
> initialized. Another tricky problem was to prevent circularities stemming
> from call interception. The prototype solves these problems and is already
> stable enough to be used.
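
The two problems named above (the initialization window and circular calls)
can be sketched as follows. This is an illustration under stated assumptions,
not the prototype's code: traced_malloc(), register_tracker() and
example_tracker() are hypothetical names, and the wrapper simply calls the
libc malloc(), whereas a real preloaded library would export malloc itself
and look up the underlying implementation, for example with
dlsym(RTLD_NEXT, "malloc").

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*tracker_fn)(void *ptr, size_t size);

static _Atomic(tracker_fn) g_tracker;   /* NULL until the tracker registers */
static atomic_size_t g_preinit_bytes;   /* seen before the handover */
static _Thread_local int t_in_hook;     /* per-thread recursion guard */
static size_t g_tracked_bytes;          /* demo only, single-threaded */

/* Called once the tracking side (here: a stand-in for NMT) is ready. */
void register_tracker(tracker_fn fn) {
    atomic_store(&g_tracker, fn);
}

void *traced_malloc(size_t size) {
    void *p = malloc(size);
    /* Skip bookkeeping on failure, or if we are already inside the hook
       on this thread (an allocation made by the tracker itself). */
    if (p == NULL || t_in_hook) {
        return p;
    }
    t_in_hook = 1;
    tracker_fn fn = atomic_load(&g_tracker);
    if (fn != NULL) {
        fn(p, size);                               /* after handover */
    } else {
        atomic_fetch_add(&g_preinit_bytes, size);  /* initialization window */
    }
    t_in_hook = 0;
    return p;
}

/* Demo tracker that allocates while tracking; the guard keeps this from
   recursing back into the tracker. */
void example_tracker(void *ptr, size_t size) {
    (void)ptr;
    g_tracked_bytes += size;
    free(traced_malloc(32));
}
```

Before register_tracker() is called, allocations are only counted locally;
afterwards they are forwarded. How the prototype actually hands over early
allocations to NMT is not described in this mail.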
> >
> > Note that the patch is not complex or large. Some small interaction with
> the JVM is needed, though, so this cannot be done just with an outside
> library.
> >
> > The prototype was developed and tested on Linux x64 and with glibc 2.31.
> It seems stable so far, but of course, the work is in an early stage, and
> bugs may exist. If you want to play with the prototype, build it [3] and
> then call:
> >
> > LD_PRELOAD=${JDK_DIR}/lib/server/libjnmt.so ${JDK_DIR}/bin/java -XX:NativeMemoryTracking=detail <program> <args>
> >
> > Example: quarkus with "third-party code" injected that leaks
> periodically [5]:
> >
> > LEAK_MALLOC=1 LEAK_MMAP=1 LD_PRELOAD=${JDK_DIR}/lib/server/libjnmt.so ${JDK_DIR}/bin/java -agentpath:/shared/projects/jvmti-leak/leaker.so -XX:NativeMemoryTracking=detail -jar ./quarkus-profiling-workshop/target/quarkus-app/quarkus-run.jar
> >
> > In Summary mode, we see the slowly growing leaks:
> >
> > -External (via interposition) (reserved=82216KB, committed=82216KB)
> >                             (malloc=81588KB #585) (at peak)
> >                             (mmap: reserved=628KB, committed=628KB, at peak)
> >
> >
> > and in Detail mode, their call stacks:
> >
> > [0x00007ff067ee7000 - 0x00007ff067ee8000] reserved and committed 4KB for External (via interposition) from
> >     [0x00007ff067ef5056]the_mmap(void*, unsigned long, int, int, int, long)+0x66 in libjnmt.so
> >     [0x00007ff067ef5781]mmap+0x71 in libjnmt.so
> >     [0x00007ff067ee955a]leak_mmap+0x3f in leaker.so
> >     [0x00007ff067ee95b1]leakleak+0x1c in leaker.so
> >     [0x00007ff067ee95c6]leakleakleak+0x12 in leaker.so
> >     [0x00007ff067ee95db]leakabit+0x12 in leaker.so
> >     [0x00007ff067ee95f8]leaky_thread+0x1a in leaker.so
> >
> >
> > [0x00007ff067ef5166]the_malloc(unsigned long)+0x106 in libjnmt.so
> > [0x00007ff067ee94ae]do_malloc+0xb8 in leaker.so
> > [0x00007ff067ee9518]leak_malloc+0x20 in leaker.so
> > [0x00007ff067ee95a7]leakleak+0x12 in leaker.so
> > [0x00007ff067ee95c6]leakleakleak+0x12 in leaker.so
> > [0x00007ff067ee95db]leakabit+0x12 in leaker.so
> > [0x00007ff067ee95f8]leaky_thread+0x1a in leaker.so
> >                              (malloc=17679KB type=External (via interposition) #34) (at peak)
> >
> > ---
> >
> > What about MEMFLAGS?
> >
> > The prototype does not extend MEMFLAGS apart from introducing a new
> "External" category that tracks allocations done via interposition. The
> question of MEMFLAGS - in particular, opening it up to outside extension -
> has been contentious. It is orthogonal to this proposal - nice but not
> required.
> >
> > This proposal makes external allocations visible under the new
> "External" tag:
> > - in NMT summary mode, we only have the "External" total, which is
> already useful even as a lump sum: it shows the footprint non-hotspot
> libraries contribute to RSS. An RSS increase that is reflected neither by
> hotspot allocations nor by "External" can only stem from a select few
> places, e.g. from libc malloc retention.
> > - In NMT detail mode, this proposal shows us the call stacks to foreign
> call sites, pinpointing at least the libraries involved.
> >
> > --
> >
> > What do you think, does this make sense?
> >
> > Thanks, Thomas
> >
> >
> > [1] https://github.com/SAP/SapMachine/wiki/SapMachine-MallocTracer
> > [2] https://mail.openjdk.org/pipermail/core-libs-dev/2022-November/096197.html
> > [3] https://github.com/tstuefe/jdk/tree/libjnmt
> > [4] https://bugs.openjdk.org/browse/JDK-8157023
> > [5] https://github.com/tstuefe/jvmti_leak
>