Extend NMT to JDK native libraries?
Thomas Stüfe
thomas.stuefe at gmail.com
Wed Nov 28 14:19:08 UTC 2018
Hi Coleen,
On Wed, Nov 28, 2018 at 12:59 AM <coleen.phillimore at oracle.com> wrote:
>
>
>
> On 11/26/18 3:43 PM, Thomas Stüfe wrote:
> > Hi Zhengyu,
> >
> >
> > On Wed, Nov 21, 2018 at 5:26 PM Zhengyu Gu <zgu at redhat.com> wrote:
> >>> But note the unbalanced-malloc issue. The more we expose NMT to the
> >>> JDK wild, the more probable are such bugs. Within the hotspot all is
> >>> nice and tidy.
> >> Hotspot also has a few unbalanced-malloc instances.
> > Oh, where? Really, raw malloc with os::free, or the other way around?
> >
> > Would they not create the problem I described, corrupted C-heap etc?
> >
> >> And yes, we will
> >> find more in library, I would consider it as initial investment to fix
> >> them, especially, if we can do them module-by-module, no?
> > Yes, they should be fixed if possible. My concern is that it is
> > realistic to assume that we miss some, or even if not that new ones
> > will creep in if the code changes. And that the penalties for these
> > unbalanced mallocs is severe - crashes and heap corruptions - if we
> > use malloc headers.
> >
> > Even worse, they would only show up, in release builds, if NMT is
> > switched on, since only NMT adds malloc headers in release builds.
> > That would be annoying, telling a customer to switch on NMT just for
> > him to hit such a code path and crash the VM.
> >
> > I don't know, I seem to miss something here. Do you not see this as
> > serious problem preventing NMT from covering code outside hotspot?
>
> It seems like it would be a lot more effort than it is worth and
> probably very hard to get bug-free. I don't think we currently have any
> mismatched os::malloc -> ::free or vice versa calls in the vm now.
> There are a couple intentional ::malloc/::free pairs though.
>
> For tracking memory outside of the JVM, it seems like tools like
> valgrind or other OS specific tools are probably a lot better than NMT,
> and somebody has already written and debugged them.
>
Well, we did it and got it bug-free 99% of all cases - but you are
right, 100% are almost impossible. That is why we avoided malloc
headers and instead opted for a hash map to track mallocs. Funny
enough, that was not even bad in terms of performance. We leave it on
by default.
But I understand that you are skeptic about the usefulness of full
coverage. We did find a few memory leaks, but that could be found with
valgrind too - on nice platforms, on terrible ones like AIX you are on
your own :(
Which leaves, as a purpose, memory tracking. But for memory tracking,
I think we can instrument the JDK piecemeal, as Zhengyu mentioned.
Instrumenting big users like mmaps from NIO would already be very
useful.
Thanks, Thomas
> Coleen
>
>
> >
> >> Mismatched statistics is quite annoying ... Did see people actually
> >> counting bytes and expecting to match :-) JDK-8191369 actually was
> >> driven by customers, who tried to match smap.
> >>
> > I remember :) and yes, we have many curious customers too. But that is
> > fine. We want NMT to be as exact as possible.
> >
> > Cheers, Thomas
> >
> >> Thanks,
> >>
> >> -Zhengyu
> >>
> >>> If we wanted to avoid these bugs, we would have to remove malloc
> >>> headers from both os::malloc() and NMT MallocTracker and add a malloc
> >>> pointer hashtable of some sorts to the latter. This is not very
> >>> difficult, but would still need an initial investment.
> >>
> >>
> >>> Thanks, Thomas
> >>>
> >>>> Thanks,
> >>>>
> >>>> -Zhengyu
> >>>>
> >>>>> (I think even if we were to instrument parts of the JDK - e.g. just
> >>>>> NIO - this would already be very helpful. In parts we do this already
> >>>>> for mtOther.).
> >>>>>
> >>>>> On Wed, Nov 21, 2018 at 3:54 PM Zhengyu Gu <zgu at redhat.com> wrote:
> >>>>>> FYI: There was a phase 2 RFE: Native Memory Tracking (Phase 2)
> >>>>>> https://bugs.openjdk.java.net/browse/JDK-6995789
> >>>>>>
> >>>>>> -Zhengyu
> >>>>>>
> >>>>>> On 11/21/18 9:28 AM, Thomas Stüfe wrote:
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> (yet again not sure if this is serviceablity-dev or not - I start at
> >>>>>>> hs-dev, feel free to move this mail around.)
> >>>>>>>
> >>>>>>> Do we have any plans to extend NMT to cover native JDK libaries too?
> >>>>>>> That would be a really cool feature.
> >>>>>>>
> >>>>>>> --
> >>>>>>>
> >>>>>>> We at SAP have done a similar thing in the past:
> >>>>>>>
> >>>>>>> We have a monitoring facility in our port which tracks C-heap
> >>>>>>> allocations, non-imaginatively called "malloc statistic". This feature
> >>>>>>> predates NMT somewhat - had we had NMT at that time, we would not have
> >>>>>>> bothered. Our Malloc statistic is less powerful than NMT and
> >>>>>>> implementation-wise completely at odds with it, so I never felt the
> >>>>>>> urge to bring it upstream. However, one thing we did do is we extended
> >>>>>>> its coverage to the JDK native code.
> >>>>>>>
> >>>>>>> This has been quite helpful in the past to find leaks in JDK, see
> >>>>>>> e.g.: https://bugs.openjdk.java.net/browse/JDK-8155211
> >>>>>>>
> >>>>>>> We did this by exposing os::malloc, os::free etc from libjvm.so
> >>>>>>> ("JVM_malloc", "JVM_realloc", "JVM_free"). In the JDK native code, we
> >>>>>>> then either manually replaced calls to raw ::malloc(), ::free() etc
> >>>>>>> with JVM_malloc(), JVM_free(). Or, in places where this was possible,
> >>>>>>> we did this replacement stuff wholesale by employing a header which
> >>>>>>> re-defined malloc(), free() etc JVM_malloc, JVM_free etc. Of course,
> >>>>>>> we also had to add a number of linkage dependencies to the libjvm.so.
> >>>>>>>
> >>>>>>> All this is pretty standard stuff.
> >>>>>>>
> >>>>>>> One detail stood out: malloc headers are evil. In our experience, JDK
> >>>>>>> native code was more difficult to control and "unbalanced
> >>>>>>> malloc/frees" kept creeping in - especially with the
> >>>>>>> wholesale-redefinition technique. Unbalanced mallocs/frees means cases
> >>>>>>> where malloc() is instrumented but ::free() stays raw, or the other
> >>>>>>> way around. Both combinations are catastrophic since os::malloc uses
> >>>>>>> malloc headers. We typically corrupted the C-Heap and crashed, often
> >>>>>>> much later in completely unrelated places.
> >>>>>>>
> >>>>>>> These types of bugs were very hard to spot and hence very expensive.
> >>>>>>> And they can creep in in many ways. One example, there exist a
> >>>>>>> surprising number of system APIs which return results in C-heap and
> >>>>>>> require the user to free that, which of course must happen with raw
> >>>>>>> ::free(), not os::free().
> >>>>>>>
> >>>>>>> We fixed this by not using malloc headers. That means a pointer
> >>>>>>> returned by os::malloc() is compatible with raw ::free() and vice
> >>>>>>> versa. The only bad thing happening would be our statistic numbers
> >>>>>>> being slightly off.
> >>>>>>>
> >>>>>>> Instead of malloc headers we use a hand-groomed hash table to track
> >>>>>>> the malloced memory. It is actually quite fast, fast enough that this
> >>>>>>> malloc statistic feature is on-by-default in our port.
> >>>>>>>
> >>>>>>> --
> >>>>>>>
> >>>>>>> Of course, if we extend NMT to JDK native code we also would want to
> >>>>>>> extend it to mmap() etc - we never did this with our statistic, since
> >>>>>>> it only tracked malloc.
> >>>>>>>
> >>>>>>> What do you think? Did anyone else play with similar ideas? Would it
> >>>>>>> be worth the effort?
> >>>>>>>
> >>>>>>> Cheers, Thomas
> >>>>>>>
>
More information about the hotspot-dev
mailing list