Proposal: Extend Native Memory Tracking across the whole process via interposition
Thomas Stuefe
tstuefe at redhat.com
Tue Dec 5 12:50:13 UTC 2023
Hi Brice,
On Tue, Dec 5, 2023 at 12:49 AM Brice Dutheil <brice.dutheil at gmail.com>
wrote:
> Hi Johan,
>
> Thomas will correct me, as he proposed the idea and is much more
> experienced; also, I'm a mere reader of this ML.
>
> So, I have not toyed with the code, but I believe this should work, at
> least on Linux, if the linker has no restrictions.
>
> Typically, interception works because a function with the right signature
> is preloaded (via `LD_PRELOAD`), so the dynamic linker finds it first. The
> magic works because the interposer can do its own work and then invoke the
> real function further down the line using `dlsym(RTLD_NEXT, name)`. That
> should resolve to the next library in the search order, or to the system
> library, since the linker processes `LD_PRELOAD` entries from left to right.
>
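> ```c
> #define _GNU_SOURCE   // exposes RTLD_NEXT in <dlfcn.h>
> #include <dlfcn.h>
> #include <stddef.h>
>
> void *malloc(size_t size) {
>     // Resolve the next "malloc" in lookup order (e.g. libc's)
>     void *(*p_malloc)(size_t) = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
>
>     // report back mem operation
>
>     return p_malloc(size);
> }
> ```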
>
> https://man7.org/linux/man-pages/man8/ld.so.8.html
> https://www.man7.org/linux/man-pages/man3/dlsym.3.html
>
> That said, it might be tricky to avoid loops if a function called from
> within the interposer itself calls `malloc`.
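
A minimal sketch of one common way to break such loops, assuming a per-thread reentrancy flag; this is not necessarily how the libjnmt prototype does it. The names `guarded_malloc`, `record_allocation`, `real_malloc`, and the counters are hypothetical. To exercise the guard in isolation, `real_malloc` is bound directly to libc's `malloc` instead of being resolved via `dlsym(RTLD_NEXT, "malloc")`:

```c
#include <stdlib.h>

/* Hypothetical: a real interposer would obtain this via
 * dlsym(RTLD_NEXT, "malloc"); here it points at libc's malloc so the
 * guard logic can be tested on its own. */
static void *(*real_malloc)(size_t) = malloc;

/* Per-thread flag, set while this thread runs our own bookkeeping. */
static _Thread_local int in_hook = 0;

/* Stand-ins for "report back mem operation" (hypothetical, not thread-safe). */
static size_t recorded_calls = 0;
static size_t recorded_bytes = 0;

static void record_allocation(size_t size);

void *guarded_malloc(size_t size) {
    void *p = real_malloc(size);
    if (p == NULL || in_hook) {
        /* Failed, or a nested call from our own bookkeeping: do not record. */
        return p;
    }
    in_hook = 1;
    record_allocation(size);
    in_hook = 0;
    return p;
}

static void record_allocation(size_t size) {
    recorded_calls++;
    recorded_bytes += size;
    /* Bookkeeping that itself allocates (e.g. a stack-trace buffer).
     * Without the in_hook guard, this call would recurse indefinitely. */
    void *scratch = guarded_malloc(64);
    free(scratch);
}
```

The flag has to be thread-local: a process-wide flag would let one thread's bookkeeping silently disable tracking for every other thread.
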
>
I think a simpler way would be to just add a way for libjnmt.so to use
custom allocators. If it is just about using a standard replacement like
jemalloc, a custom-tailored solution for that would be a lot simpler. But,
again, I'm not sure about the use case.
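
A minimal sketch of what "use custom allocators" could look like, assuming a hypothetical backend table that the tracking layer forwards to; `allocator_backend`, `set_allocator_backend`, `tracked_alloc`, `tracked_release`, and `counting_alloc` are illustrative names, not part of the prototype:

```c
#include <stdlib.h>

/* Hypothetical backend table: the allocator the tracking layer forwards to. */
typedef struct {
    void *(*alloc)(size_t);
    void  (*release)(void *);
} allocator_backend;

/* Default backend: the system allocator. */
static allocator_backend backend = { malloc, free };

/* Hypothetical hook to plug in a replacement such as jemalloc's entry points. */
void set_allocator_backend(allocator_backend b) { backend = b; }

/* Stand-in for NMT bookkeeping: number of live tracked blocks. */
static size_t live_allocs = 0;

void *tracked_alloc(size_t size) {
    void *p = backend.alloc(size);
    if (p != NULL) {
        live_allocs++;
    }
    return p;
}

void tracked_release(void *p) {
    if (p != NULL) {
        live_allocs--;
    }
    backend.release(p);
}

/* Example replacement backend (hypothetical): counts calls, then uses malloc. */
static size_t custom_calls = 0;
static void *counting_alloc(size_t size) {
    custom_calls++;
    return malloc(size);
}
```

A block must be released by the backend that allocated it, so in a design like this the switch would have to happen before the first tracked allocation.
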
Cheers, Thomas
>
> Also, I suppose this could work on macOS via `DYLD_INSERT_LIBRARIES`, but
> I'm unsure since macOS has some restrictions.
>
> --
> Brice
>
>
> On Mon, Dec 4, 2023 at 13:14 Johan Sjölén <johan.sjolen at oracle.com> wrote:
>
>> Hi Thomas,
>>
>> If a user would like to switch out the malloc which a JVM is using, would
>> they be able to do that while simultaneously using your interception
>> library?
>>
>> Thank you,
>> Johan
>>
>> Hi, community,
>>
>> I experimented with extending Native Memory Tracking across the whole
>> process. I want to share my findings and propose a new JDK feature to allow
>> us to do that.
>>
>> TL;DR
>>
>> Proposed is a "native memory interposition library" shipped with the JDK
>> that would intercept all native memory calls from everywhere and redirect
>> them to NMT.
>>
>> Motivation:
>>
>> NMT is very useful but limited in its coverage. It only covers Hotspot
>> and a select few sites from the JDK. Most of the JDK, third-party native
>> code, and system libraries are not covered. This is a large hole in our
>> observability. I have seen people do (and have done myself! e.g. [1]) strange and
>> weird things to hunt memory leaks in native code. This is especially tricky
>> in locked-down customer scenarios.
>>
>> But NMT is a capable tracker. We could use it for much more than just
>> tracking Hotspot.
>>
>> In the past, developers have attempted to extend NMT instrumentation over
>> parts of the JDK (e.g. [2]), which met resistance from Oracle. This is
>> understandable: a naive extension would require libraries to link against
>> the libjvm and instrument their code. That introduces new dependencies
>> nobody wants.
>>
>> ---
>>
>> I propose a different way that works without instrumenting any caller
>> code. I hope this proposal proves less controversial than brute-force NMT
>> instrumentation of the JDK. And it would allow introspection of non-JDK
>> parts too.
>>
>> We could ship an interception library (a "libjnmt.so") within the JDK.
>> That library, if preloaded, would redirect native memory requests to NMT. A
>> customer who wants to analyze the native memory footprint of its apps could
>> start the JVM with LD_PRELOAD=libjnmt and then use NMT for introspection.
>>
>> Oracle and we continuously improve NMT; extending its reach across the
>> whole process would leverage that investment nicely.
>>
>> It also meshes well with other improvements. For example, we report NMT
>> numbers via JFR since [4] - with interposition, we could now expose
>> third-party native allocations via JFR. The new jcmd "System.map" would
>> automatically show memory mappings from outside Hotspot. There is a
>> precedent (libjsig), so shipping interposition libraries is not that
>> strange.
>>
>> ---
>>
>> I have a Linux-based POC that works and looks promising [3]. With that
>> prototype, I can see:
>>
>> - allocations from the JDK - e.g., now I finally see mapped byte buffers.
>> - allocations from third-party user code
>> - most allocations from system libraries, e.g., from the system zlib
>> - allocations via the new FFI interface
>>
>> The prototype tracks both mmap and malloc. Technically, the tricky part
>> was to handle the initialization window: being able to correctly handle
>> allocations starting at the process C++ initialization while dynamically
>> handing over allocations to the libjvm once it is loaded and NMT is
>> initialized. Another tricky problem was to prevent circularities stemming
>> from call interception. The prototype solves these problems and is already
>> stable enough to be used.
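
A minimal sketch of the hand-over part of the initialization window, assuming the JVM publishes an NMT-backed allocation function through an atomic pointer once NMT is initialized. `jnmt_register`, `handover_malloc`, `fake_jvm_malloc`, and the counters are hypothetical names, not the prototype's actual interface:

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical type of the NMT-backed allocator the JVM would hand over. */
typedef void *(*malloc_fn)(size_t);

/* NULL until libjvm has loaded and initialized NMT. */
static _Atomic(malloc_fn) jvm_malloc = NULL;

/* Allocations served during the initialization window (hypothetical counter). */
static atomic_size_t early_allocs = 0;

/* Hypothetical registration entry point, called by libjvm after NMT init. */
void jnmt_register(malloc_fn fn) {
    atomic_store_explicit(&jvm_malloc, fn, memory_order_release);
}

void *handover_malloc(size_t size) {
    malloc_fn fn = atomic_load_explicit(&jvm_malloc, memory_order_acquire);
    if (fn != NULL) {
        return fn(size);                 /* NMT is up: route through the JVM */
    }
    atomic_fetch_add(&early_allocs, 1);  /* still in the initialization window */
    return malloc(size);
}

/* Stand-in for the JVM side (hypothetical): counts routed calls. */
static atomic_size_t jvm_allocs = 0;
static void *fake_jvm_malloc(size_t size) {
    atomic_fetch_add(&jvm_allocs, 1);
    return malloc(size);
}
```

This covers only the allocation direction. A real implementation typically also has to recognize a `free()` that arrives after the hand-over for a block allocated before it, and must not pass such a pointer to the tracked free path.
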
>>
>> Note that the patch is not complex or large. Some small interaction with
>> the JVM is needed, though, so this cannot be done just with an outside
>> library.
>>
>> The prototype was developed and tested on Linux x64 and with glibc 2.31.
>> It seems stable so far, but of course, the work is in an early stage, and
>> bugs may exist. If you want to play with the prototype, build it [3] and
>> then call:
>>
>> LD_PRELOAD=${JDK_DIR}/lib/server/libjnmt.so ${JDK_DIR}/bin/java \
>>     -XX:NativeMemoryTracking=detail <program> <args>
>>
>> Example: Quarkus with "third-party code" injected that leaks periodically
>> [5]:
>>
>> LEAK_MALLOC=1 LEAK_MMAP=1 LD_PRELOAD=${JDK_DIR}/lib/server/libjnmt.so \
>>     ${JDK_DIR}/bin/java -agentpath:/shared/projects/jvmti-leak/leaker.so \
>>     -XX:NativeMemoryTracking=detail \
>>     -jar ./quarkus-profiling-workshop/target/quarkus-app/quarkus-run.jar
>>
>> In Summary mode, we see the slowly growing leaks:
>>
>> -External (via interposition) (reserved=82216KB, committed=82216KB)
>>                              (malloc=81588KB #585) (at peak)
>>                              (mmap: reserved=628KB, committed=628KB, at peak)
>>
>>
>> and in Detail mode, their call stacks:
>>
>> [0x00007ff067ee7000 - 0x00007ff067ee8000] reserved and committed 4KB for External (via interposition) from
>>     [0x00007ff067ef5056]the_mmap(void*, unsigned long, int, int, int, long)+0x66 in libjnmt.so
>>     [0x00007ff067ef5781]mmap+0x71 in libjnmt.so
>>     [0x00007ff067ee955a]leak_mmap+0x3f in leaker.so
>>     [0x00007ff067ee95b1]leakleak+0x1c in leaker.so
>>     [0x00007ff067ee95c6]leakleakleak+0x12 in leaker.so
>>     [0x00007ff067ee95db]leakabit+0x12 in leaker.so
>>     [0x00007ff067ee95f8]leaky_thread+0x1a in leaker.so
>>
>> [0x00007ff067ef5166]the_malloc(unsigned long)+0x106 in libjnmt.so
>> [0x00007ff067ee94ae]do_malloc+0xb8 in leaker.so
>> [0x00007ff067ee9518]leak_malloc+0x20 in leaker.so
>> [0x00007ff067ee95a7]leakleak+0x12 in leaker.so
>> [0x00007ff067ee95c6]leakleakleak+0x12 in leaker.so
>> [0x00007ff067ee95db]leakabit+0x12 in leaker.so
>> [0x00007ff067ee95f8]leaky_thread+0x1a in leaker.so
>>                              (malloc=17679KB type=External (via interposition) #34) (at peak)
>>
>> ---
>>
>> What about MEMFLAGS?
>>
>> The prototype does not extend MEMFLAGS apart from introducing a new
>> "External" category that tracks allocations done via interposition. The
>> question of MEMFLAGS - in particular, opening it up to outside extension -
>> has been contentious. It is orthogonal to this proposal - nice but not
>> required.
>>
>> This proposal makes external allocations visible under the new "External"
>> tag:
>> - In NMT summary mode, we only have the "External" total, which is
>> already useful even as a lump sum: it shows the footprint non-Hotspot
>> libraries contribute to RSS. An RSS increase that is reflected neither by
>> Hotspot allocations nor by "External" can only stem from a select few
>> places, e.g. from libc malloc retention.
>> - In NMT detail mode, this proposal shows us the call stacks to foreign
>> call sites, pinpointing at least the libraries involved.
>>
>> --
>>
>> What do you think, does this make sense?
>>
>> Thanks, Thomas
>>
>>
>> [1] https://github.com/SAP/SapMachine/wiki/SapMachine-MallocTracer
>> [2] https://mail.openjdk.org/pipermail/core-libs-dev/2022-November/096197.html
>> [3] https://github.com/tstuefe/jdk/tree/libjnmt
>> [4] https://bugs.openjdk.org/browse/JDK-8157023
>> [5] https://github.com/tstuefe/jvmti_leak
>>
>>
More information about the jdk-dev
mailing list