Extend NMT to JDK native libraries?
Thomas Stüfe
thomas.stuefe at gmail.com
Mon Dec 3 17:20:10 UTC 2018
Hi Zhengyu,
On Mon, Dec 3, 2018 at 3:33 PM <zgu at redhat.com> wrote:
>
> On Mon, 2018-11-26 at 21:43 +0100, Thomas Stüfe wrote:
>
> Hi Thomas,
>
> Sorry for replying so late.
>
no problem, this is a backburner discussion :)
> > Oh, where? Really, raw malloc with os::free, or the other way around?
>
> Maybe I misunderstand what you mean of unbalanced-malloc. I was
> thinking something like [1], the library calls that potential realloc
> buffers? or could you point me some examples in JDK library code that
> you referred to?
I do not have concreate examples, but our experience, when we did
this, was that it was very difficult to get right 100% of the time.
And it bitrots very easily. For example, if you go the
wholesale-redefine-malloc-road, and someone accidentally changes the
includes and one of the c files now misses your redefinition includes
and falls back to raw malloc. Or if you miss a system API which
returns malloced space. Basically, this is like herding cats, very
difficult to catch every single one :)
All this is solvable given enough manpower, but after trying a bit we
went another road and just avoided malloc headers. That way, we can
live with 99% coverage - the remaining 1% are annoying but do not
crash the system, and we can hunt them down at our leisure.
Hotspot is less of a problem, because the project scope is small and
there is a limited number of devs which all are all experienced and
know to avoid raw malloc() and free(). Plus I think we have safeguards
at build time now.
>
> There is another hurdle in my mind, I am not sure how did you solve it
> with hashtable approach, that is GuardedMemory, although, it is debug
> only feature, it also installs a header. Could you elaborate?
We just did not use it. Basically, we rolled our own
os::malloc/os::free implementation.
To catch errors, we use two techniques:
- we still have trailing guards
- quite heavyweight but powerful: we have a mode were every single
allocation gets a full page and is preceeded/succeeded by a protected
page (a bit smarter than that, we need just 2 pages per allocation).
Overflowing boundaries will instantly crash. Also, we protect the
payload page after its allocation has been free()d, for a certain
amount of time, and therefore can catch use-after-free errors.
I always found the mechanism neat, but it is completely at odds with
the current hotspot implementation of os::malloc and NMT, so we never
attempted to bring it upstream.
Cheers, Thomas
>
> Thanks,
>
> -Zhengyu
> >
> > Would they not create the problem I described, corrupted C-heap etc?
> >
> > > And yes, we will
> > > find more in library, I would consider it as initial investment to
> > > fix
> > > them, especially, if we can do them module-by-module, no?
> >
> > Yes, they should be fixed if possible. My concern is that it is
> > realistic to assume that we miss some, or even if not that new ones
> > will creep in if the code changes. And that the penalties for these
> > unbalanced mallocs is severe - crashes and heap corruptions - if we
> > use malloc headers.
> >
> > Even worse, they would only show up, in release builds, if NMT is
> > switched on, since only NMT adds malloc headers in release builds.
> > That would be annoying, telling a customer to switch on NMT just for
> > him to hit such a code path and crash the VM.
> >
> > I don't know, I seem to miss something here. Do you not see this as
> > serious problem preventing NMT from covering code outside hotspot?
> >
> > >
> > > Mismatched statistics is quite annoying ... Did see people actually
> > > counting bytes and expecting to match :-) JDK-8191369 actually was
> > > driven by customers, who tried to match smap.
> > >
> >
> > I remember :) and yes, we have many curious customers too. But that
> > is
> > fine. We want NMT to be as exact as possible.
> >
> > Cheers, Thomas
> >
> > >
> > > Thanks,
> > >
> > > -Zhengyu
> > >
> > > >
> > > > If we wanted to avoid these bugs, we would have to remove malloc
> > > > headers from both os::malloc() and NMT MallocTracker and add a
> > > > malloc
> > > > pointer hashtable of some sorts to the latter. This is not very
> > > > difficult, but would still need an initial investment.
> > >
> > >
> > >
> > > >
> > > > Thanks, Thomas
> > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > -Zhengyu
> > > > >
> > > > > >
> > > > > > (I think even if we were to instrument parts of the JDK -
> > > > > > e.g. just
> > > > > > NIO - this would already be very helpful. In parts we do this
> > > > > > already
> > > > > > for mtOther.).
> > > > > >
> > > > > > On Wed, Nov 21, 2018 at 3:54 PM Zhengyu Gu <zgu at redhat.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > FYI: There was a phase 2 RFE: Native Memory Tracking (Phase
> > > > > > > 2)
> > > > > > > https://bugs.openjdk.java.net/browse/JDK-6995789
> > > > > > >
> > > > > > > -Zhengyu
> > > > > > >
> > > > > > > On 11/21/18 9:28 AM, Thomas Stüfe wrote:
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > (yet again not sure if this is serviceablity-dev or not -
> > > > > > > > I start at
> > > > > > > > hs-dev, feel free to move this mail around.)
> > > > > > > >
> > > > > > > > Do we have any plans to extend NMT to cover native JDK
> > > > > > > > libaries too?
> > > > > > > > That would be a really cool feature.
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > We at SAP have done a similar thing in the past:
> > > > > > > >
> > > > > > > > We have a monitoring facility in our port which tracks C-
> > > > > > > > heap
> > > > > > > > allocations, non-imaginatively called "malloc statistic".
> > > > > > > > This feature
> > > > > > > > predates NMT somewhat - had we had NMT at that time, we
> > > > > > > > would not have
> > > > > > > > bothered. Our Malloc statistic is less powerful than NMT
> > > > > > > > and
> > > > > > > > implementation-wise completely at odds with it, so I
> > > > > > > > never felt the
> > > > > > > > urge to bring it upstream. However, one thing we did do
> > > > > > > > is we extended
> > > > > > > > its coverage to the JDK native code.
> > > > > > > >
> > > > > > > > This has been quite helpful in the past to find leaks in
> > > > > > > > JDK, see
> > > > > > > > e.g.: https://bugs.openjdk.java.net/browse/JDK-8155211
> > > > > > > >
> > > > > > > > We did this by exposing os::malloc, os::free etc from
> > > > > > > > libjvm.so
> > > > > > > > ("JVM_malloc", "JVM_realloc", "JVM_free"). In the JDK
> > > > > > > > native code, we
> > > > > > > > then either manually replaced calls to raw ::malloc(),
> > > > > > > > ::free() etc
> > > > > > > > with JVM_malloc(), JVM_free(). Or, in places where this
> > > > > > > > was possible,
> > > > > > > > we did this replacement stuff wholesale by employing a
> > > > > > > > header which
> > > > > > > > re-defined malloc(), free() etc JVM_malloc, JVM_free etc.
> > > > > > > > Of course,
> > > > > > > > we also had to add a number of linkage dependencies to
> > > > > > > > the libjvm.so.
> > > > > > > >
> > > > > > > > All this is pretty standard stuff.
> > > > > > > >
> > > > > > > > One detail stood out: malloc headers are evil. In our
> > > > > > > > experience, JDK
> > > > > > > > native code was more difficult to control and "unbalanced
> > > > > > > > malloc/frees" kept creeping in - especially with the
> > > > > > > > wholesale-redefinition technique. Unbalanced
> > > > > > > > mallocs/frees means cases
> > > > > > > > where malloc() is instrumented but ::free() stays raw, or
> > > > > > > > the other
> > > > > > > > way around. Both combinations are catastrophic since
> > > > > > > > os::malloc uses
> > > > > > > > malloc headers. We typically corrupted the C-Heap and
> > > > > > > > crashed, often
> > > > > > > > much later in completely unrelated places.
> > > > > > > >
> > > > > > > > These types of bugs were very hard to spot and hence very
> > > > > > > > expensive.
> > > > > > > > And they can creep in in many ways. One example, there
> > > > > > > > exist a
> > > > > > > > surprising number of system APIs which return results in
> > > > > > > > C-heap and
> > > > > > > > require the user to free that, which of course must
> > > > > > > > happen with raw
> > > > > > > > ::free(), not os::free().
> > > > > > > >
> > > > > > > > We fixed this by not using malloc headers. That means a
> > > > > > > > pointer
> > > > > > > > returned by os::malloc() is compatible with raw ::free()
> > > > > > > > and vice
> > > > > > > > versa. The only bad thing happening would be our
> > > > > > > > statistic numbers
> > > > > > > > being slightly off.
> > > > > > > >
> > > > > > > > Instead of malloc headers we use a hand-groomed hash
> > > > > > > > table to track
> > > > > > > > the malloced memory. It is actually quite fast, fast
> > > > > > > > enough that this
> > > > > > > > malloc statistic feature is on-by-default in our port.
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Of course, if we extend NMT to JDK native code we also
> > > > > > > > would want to
> > > > > > > > extend it to mmap() etc - we never did this with our
> > > > > > > > statistic, since
> > > > > > > > it only tracked malloc.
> > > > > > > >
> > > > > > > > What do you think? Did anyone else play with similar
> > > > > > > > ideas? Would it
> > > > > > > > be worth the effort?
> > > > > > > >
> > > > > > > > Cheers, Thomas
> > > > > > > >
More information about the hotspot-dev
mailing list