Extend NMT to JDK native libraries?
zgu at redhat.com
zgu at redhat.com
Mon Dec 3 14:33:00 UTC 2018
On Mon, 2018-11-26 at 21:43 +0100, Thomas Stüfe wrote:
Hi Thomas,
Sorry for replying so late.
> Hi Zhengyu,
>
>
> On Wed, Nov 21, 2018 at 5:26 PM Zhengyu Gu <zgu at redhat.com> wrote:
> >
> > >
> > > But note the unbalanced-malloc issue. The more we expose NMT to
> > > the
> > > JDK wild, the more probable are such bugs. Within the hotspot all
> > > is
> > > nice and tidy.
> >
> > Hotspot also has a few unbalanced-malloc instances.
>
> Oh, where? Really, raw malloc with os::free, or the other way around?
Maybe I misunderstand what you mean of unbalanced-malloc. I was
thinking something like [1], the library calls that potential realloc
buffers? or could you point me some examples in JDK library code that
you referred to?
There is another hurdle in my mind, I am not sure how did you solve it
with hashtable approach, that is GuardedMemory, although, it is debug
only feature, it also installs a header. Could you elaborate?
Thanks,
-Zhengyu
>
> Would they not create the problem I described, corrupted C-heap etc?
>
> > And yes, we will
> > find more in library, I would consider it as initial investment to
> > fix
> > them, especially, if we can do them module-by-module, no?
>
> Yes, they should be fixed if possible. My concern is that it is
> realistic to assume that we miss some, or even if not that new ones
> will creep in if the code changes. And that the penalties for these
> unbalanced mallocs is severe - crashes and heap corruptions - if we
> use malloc headers.
>
> Even worse, they would only show up, in release builds, if NMT is
> switched on, since only NMT adds malloc headers in release builds.
> That would be annoying, telling a customer to switch on NMT just for
> him to hit such a code path and crash the VM.
>
> I don't know, I seem to miss something here. Do you not see this as
> serious problem preventing NMT from covering code outside hotspot?
>
> >
> > Mismatched statistics is quite annoying ... Did see people actually
> > counting bytes and expecting to match :-) JDK-8191369 actually was
> > driven by customers, who tried to match smap.
> >
>
> I remember :) and yes, we have many curious customers too. But that
> is
> fine. We want NMT to be as exact as possible.
>
> Cheers, Thomas
>
> >
> > Thanks,
> >
> > -Zhengyu
> >
> > >
> > > If we wanted to avoid these bugs, we would have to remove malloc
> > > headers from both os::malloc() and NMT MallocTracker and add a
> > > malloc
> > > pointer hashtable of some sorts to the latter. This is not very
> > > difficult, but would still need an initial investment.
> >
> >
> >
> > >
> > > Thanks, Thomas
> > >
> > > >
> > > > Thanks,
> > > >
> > > > -Zhengyu
> > > >
> > > > >
> > > > > (I think even if we were to instrument parts of the JDK -
> > > > > e.g. just
> > > > > NIO - this would already be very helpful. In parts we do this
> > > > > already
> > > > > for mtOther.).
> > > > >
> > > > > On Wed, Nov 21, 2018 at 3:54 PM Zhengyu Gu <zgu at redhat.com>
> > > > > wrote:
> > > > > >
> > > > > > FYI: There was a phase 2 RFE: Native Memory Tracking (Phase
> > > > > > 2)
> > > > > > https://bugs.openjdk.java.net/browse/JDK-6995789
> > > > > >
> > > > > > -Zhengyu
> > > > > >
> > > > > > On 11/21/18 9:28 AM, Thomas Stüfe wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > (yet again not sure if this is serviceablity-dev or not -
> > > > > > > I start at
> > > > > > > hs-dev, feel free to move this mail around.)
> > > > > > >
> > > > > > > Do we have any plans to extend NMT to cover native JDK
> > > > > > > libaries too?
> > > > > > > That would be a really cool feature.
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > We at SAP have done a similar thing in the past:
> > > > > > >
> > > > > > > We have a monitoring facility in our port which tracks C-
> > > > > > > heap
> > > > > > > allocations, non-imaginatively called "malloc statistic".
> > > > > > > This feature
> > > > > > > predates NMT somewhat - had we had NMT at that time, we
> > > > > > > would not have
> > > > > > > bothered. Our Malloc statistic is less powerful than NMT
> > > > > > > and
> > > > > > > implementation-wise completely at odds with it, so I
> > > > > > > never felt the
> > > > > > > urge to bring it upstream. However, one thing we did do
> > > > > > > is we extended
> > > > > > > its coverage to the JDK native code.
> > > > > > >
> > > > > > > This has been quite helpful in the past to find leaks in
> > > > > > > JDK, see
> > > > > > > e.g.: https://bugs.openjdk.java.net/browse/JDK-8155211
> > > > > > >
> > > > > > > We did this by exposing os::malloc, os::free etc from
> > > > > > > libjvm.so
> > > > > > > ("JVM_malloc", "JVM_realloc", "JVM_free"). In the JDK
> > > > > > > native code, we
> > > > > > > then either manually replaced calls to raw ::malloc(),
> > > > > > > ::free() etc
> > > > > > > with JVM_malloc(), JVM_free(). Or, in places where this
> > > > > > > was possible,
> > > > > > > we did this replacement stuff wholesale by employing a
> > > > > > > header which
> > > > > > > re-defined malloc(), free() etc JVM_malloc, JVM_free etc.
> > > > > > > Of course,
> > > > > > > we also had to add a number of linkage dependencies to
> > > > > > > the libjvm.so.
> > > > > > >
> > > > > > > All this is pretty standard stuff.
> > > > > > >
> > > > > > > One detail stood out: malloc headers are evil. In our
> > > > > > > experience, JDK
> > > > > > > native code was more difficult to control and "unbalanced
> > > > > > > malloc/frees" kept creeping in - especially with the
> > > > > > > wholesale-redefinition technique. Unbalanced
> > > > > > > mallocs/frees means cases
> > > > > > > where malloc() is instrumented but ::free() stays raw, or
> > > > > > > the other
> > > > > > > way around. Both combinations are catastrophic since
> > > > > > > os::malloc uses
> > > > > > > malloc headers. We typically corrupted the C-Heap and
> > > > > > > crashed, often
> > > > > > > much later in completely unrelated places.
> > > > > > >
> > > > > > > These types of bugs were very hard to spot and hence very
> > > > > > > expensive.
> > > > > > > And they can creep in in many ways. One example, there
> > > > > > > exist a
> > > > > > > surprising number of system APIs which return results in
> > > > > > > C-heap and
> > > > > > > require the user to free that, which of course must
> > > > > > > happen with raw
> > > > > > > ::free(), not os::free().
> > > > > > >
> > > > > > > We fixed this by not using malloc headers. That means a
> > > > > > > pointer
> > > > > > > returned by os::malloc() is compatible with raw ::free()
> > > > > > > and vice
> > > > > > > versa. The only bad thing happening would be our
> > > > > > > statistic numbers
> > > > > > > being slightly off.
> > > > > > >
> > > > > > > Instead of malloc headers we use a hand-groomed hash
> > > > > > > table to track
> > > > > > > the malloced memory. It is actually quite fast, fast
> > > > > > > enough that this
> > > > > > > malloc statistic feature is on-by-default in our port.
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Of course, if we extend NMT to JDK native code we also
> > > > > > > would want to
> > > > > > > extend it to mmap() etc - we never did this with our
> > > > > > > statistic, since
> > > > > > > it only tracked malloc.
> > > > > > >
> > > > > > > What do you think? Did anyone else play with similar
> > > > > > > ideas? Would it
> > > > > > > be worth the effort?
> > > > > > >
> > > > > > > Cheers, Thomas
> > > > > > >
More information about the hotspot-dev
mailing list