Extend NMT to JDK native libraries?

coleen.phillimore at oracle.com coleen.phillimore at oracle.com
Tue Nov 27 23:57:18 UTC 2018



On 11/26/18 3:43 PM, Thomas Stüfe wrote:
> Hi Zhengyu,
>
>
> On Wed, Nov 21, 2018 at 5:26 PM Zhengyu Gu <zgu at redhat.com> wrote:
>>> But note the unbalanced-malloc issue. The more we expose NMT to the
>>> JDK wild, the more probable are such bugs. Within the hotspot all is
>>> nice and tidy.
>> Hotspot also has a few unbalanced-malloc instances.
> Oh, where? Really, raw malloc with os::free, or the other way around?
>
> Would they not create the problem I described, corrupted C-heap etc?
>
>> And yes, we will
>> find more in library, I would consider it as initial investment to fix
>> them, especially, if we can do them module-by-module, no?
> Yes, they should be fixed if possible. My concern is that it is
> realistic to assume that we miss some, or even if not that new ones
> will creep in if the code changes. And that the penalties for these
> unbalanced mallocs is severe - crashes and heap corruptions - if we
> use malloc headers.
>
> Even worse, they would only show up, in release builds, if NMT is
> switched on, since only NMT adds malloc headers in release builds.
> That would be annoying, telling a customer to switch on NMT just for
> him to hit such a code path and crash the VM.
>
> I don't know, I seem to miss something here. Do you not see this as
> serious problem preventing NMT from covering code outside hotspot?

It seems like it would be a lot more effort than it is worth and 
probably very hard to get bug-free.  I don't think we currently have any 
mismatched os::malloc -> ::free or vice versa calls in the vm now.  
There are a couple intentional ::malloc/::free pairs though.

For tracking memory outside of the JVM, it seems like tools like 
valgrind or other OS specific tools are probably a lot better than NMT, 
and somebody has already written and debugged them.

Coleen


>
>> Mismatched statistics is quite annoying ... Did see people actually
>> counting bytes and expecting to match :-) JDK-8191369 actually was
>> driven by customers, who tried to match smap.
>>
> I remember :) and yes, we have many curious customers too. But that is
> fine. We want NMT to be as exact as possible.
>
> Cheers, Thomas
>
>> Thanks,
>>
>> -Zhengyu
>>
>>> If we wanted to avoid these bugs, we would have to remove malloc
>>> headers from both os::malloc() and NMT MallocTracker and add a malloc
>>> pointer hashtable of some sorts to the latter. This is not very
>>> difficult, but would still need an initial investment.
>>
>>
>>> Thanks, Thomas
>>>
>>>> Thanks,
>>>>
>>>> -Zhengyu
>>>>
>>>>> (I think even if we were to instrument parts of the JDK - e.g. just
>>>>> NIO - this would already be very helpful. In parts we do this already
>>>>> for mtOther.).
>>>>>
>>>>> On Wed, Nov 21, 2018 at 3:54 PM Zhengyu Gu <zgu at redhat.com> wrote:
>>>>>> FYI: There was a phase 2 RFE: Native Memory Tracking (Phase 2)
>>>>>> https://bugs.openjdk.java.net/browse/JDK-6995789
>>>>>>
>>>>>> -Zhengyu
>>>>>>
>>>>>> On 11/21/18 9:28 AM, Thomas Stüfe wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> (yet again not sure if this is serviceablity-dev or not - I start at
>>>>>>> hs-dev, feel free to move this mail around.)
>>>>>>>
>>>>>>> Do we have any plans to extend NMT to cover native JDK libaries too?
>>>>>>> That would be a really cool feature.
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> We at SAP have done a similar thing in the past:
>>>>>>>
>>>>>>> We have a monitoring facility in our port which tracks C-heap
>>>>>>> allocations, non-imaginatively called "malloc statistic". This feature
>>>>>>> predates NMT somewhat - had we had NMT at that time, we would not have
>>>>>>> bothered. Our Malloc statistic is less powerful than NMT and
>>>>>>> implementation-wise completely at odds with it, so I never felt the
>>>>>>> urge to bring it upstream. However, one thing we did do is we extended
>>>>>>> its coverage to the JDK native code.
>>>>>>>
>>>>>>> This has been quite helpful in the past to find leaks in JDK, see
>>>>>>> e.g.: https://bugs.openjdk.java.net/browse/JDK-8155211
>>>>>>>
>>>>>>> We did this by exposing os::malloc, os::free etc from libjvm.so
>>>>>>> ("JVM_malloc", "JVM_realloc", "JVM_free"). In the JDK native code, we
>>>>>>> then either manually replaced calls to raw ::malloc(), ::free() etc
>>>>>>> with JVM_malloc(), JVM_free(). Or, in places where this was possible,
>>>>>>> we did this replacement stuff wholesale by employing a header which
>>>>>>> re-defined malloc(), free() etc JVM_malloc, JVM_free etc. Of course,
>>>>>>> we also had to add a number of linkage dependencies to the libjvm.so.
>>>>>>>
>>>>>>> All this is pretty standard stuff.
>>>>>>>
>>>>>>> One detail stood out: malloc headers are evil. In our experience, JDK
>>>>>>> native code was more difficult to control and "unbalanced
>>>>>>> malloc/frees" kept creeping in - especially with the
>>>>>>> wholesale-redefinition technique. Unbalanced mallocs/frees means cases
>>>>>>> where malloc() is instrumented but ::free() stays raw, or the other
>>>>>>> way around. Both combinations are catastrophic since os::malloc uses
>>>>>>> malloc headers. We typically corrupted the C-Heap and crashed, often
>>>>>>> much later in completely unrelated places.
>>>>>>>
>>>>>>> These types of bugs were very hard to spot and hence very expensive.
>>>>>>> And they can creep in in many ways. One example, there exist a
>>>>>>> surprising number of system APIs which return results in C-heap and
>>>>>>> require the user to free that, which of course must happen with raw
>>>>>>> ::free(), not os::free().
>>>>>>>
>>>>>>> We fixed this by not using malloc headers. That means a pointer
>>>>>>> returned by os::malloc() is compatible with raw ::free() and vice
>>>>>>> versa. The only bad thing happening would be our statistic numbers
>>>>>>> being slightly off.
>>>>>>>
>>>>>>> Instead of malloc headers we use a hand-groomed hash table to track
>>>>>>> the malloced memory. It is actually quite fast, fast enough that this
>>>>>>> malloc statistic feature is on-by-default in our port.
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Of course, if we extend NMT to JDK native code we also would want to
>>>>>>> extend it to mmap() etc - we never did this with our statistic, since
>>>>>>> it only tracked malloc.
>>>>>>>
>>>>>>> What do you think? Did anyone else play with similar ideas? Would it
>>>>>>> be worth the effort?
>>>>>>>
>>>>>>> Cheers, Thomas
>>>>>>>



More information about the hotspot-dev mailing list