RFR: JDK-8275320: NMT should perform buffer overrun checks

Tue Nov 23 06:48:14 UTC 2021

On Sun, 17 Oct 2021 13:30:17 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:

>>> > > > Sorry, we already have GuardedMemory for detecting buffer overrun, why introduce a new one?
>>> > > 
>>> > > 
>>> > > GuardedMemory has a number of disadvantages, and I'd like to remove it in favor of NMT doing buffer overrun checks. For my full reasoning, please see my reasoning in the umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301:
>>> > > Disadvantages of the current solution:
>>> > > 
>>> > > * We have no way to do C-heap checking in release builds. But there, it is sorely missed. We ship release VMs, and those VMs get used in myriad ways with a lot of faulty third-party native code. I would love to be able to flip a product switch at a customer site and have some basic C-heap checks done, without relegating to external tools or debug c-libs.
>>> > > * The debug-only guards in os::malloc() are quite fat, really, a whopping 48 bytes per allocation on 64-bit, 40 bytes on 32-bit. That is for guarding alone. They distort the memory allocation picture, since blowing up every allocation this way causes the underlying libc to do different things. Therefore we have different memory layouts and allocation patterns between debug and release. In addition, we have different code paths too, e.g. in debug os::realloc calls os::malloc + os::free whereas in release builds it calls directly into libc ::realloc. All that means that in debug builds we test something different than what we ship in release builds.
>>> > > * The canary in the headers of the debug-only guards do not directly precede the user portion of the data, so we won't catch negative buffer overflows of only a few bytes.
>>> > > * The guarding added by CheckJNICalls is unnecessarily expensive too, since it copies the memory around, handing a copy of the guarded memory up to the caller.
>>> > > * The fact that three different code sections all do malloc headers incurs unnecessary costs, and the code is unnecessarily complex. It makes also statistics difficult to understand since the silent overhead can be large (compare the rise in RSS with the rise in NMT allocations in a debug build).
>>> > > * None of the current overflow checkers print out hex dumps of the violated memory. That is what the libc usually does and it is very useful.
>>> > > 
>>> > > Thanks, Thomas
>>> > 
>>> > 
>>> > p.s. I contemplated to do NMT overflow checks and removal of old guarding code in one RFE but was concerned that it would be too confusing and get stuck in review limbo. Maybe that was wrong. But this RFE here makes more sense when viewed as part of a whole.
>>> 
>>> Thanks for explanation. So, buffer overrun detection is now only available when NMT is on, vs. always on with GuardedMemory in debug build. Right?
>> 
>> Well, not with this patch obviously. But yes, that would be my proposal. To get "always-on", we could switch NMT on by default in debug builds. "summary" level is not really expensive at all, it uses less memory than GuardedMemory does, and the per-flag accounting does not really add much overhead (GuardedMemory also does some accounting btw).
>> 
>> Though tbh my first priority is to give us overflow checks in release builds. If we only do that and leave GuardedMemory in place I would be happy already. I had two customer cases very recently with heap overwriters, one of which I misused NMT to trigger a crash and analyze the core. A neighboring (non-VM-allocated) block was overwriting the following (VM allocated) heap block. 
>> 
>> Cheers, Thomas
>
>> > > > > Sorry, we already have GuardedMemory for detecting buffer overrun, why introduce a new one?
>> > > > 
>> > > > 
>> > > > GuardedMemory has a number of disadvantages, and I'd like to remove it in favor of NMT doing buffer overrun checks. For my full reasoning, please see my reasoning in the umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301:
>> > > > Disadvantages of the current solution:
>> > > > 
>> > > > * We have no way to do C-heap checking in release builds. But there, it is sorely missed. We ship release VMs, and those VMs get used in myriad ways with a lot of faulty third-party native code. I would love to be able to flip a product switch at a customer site and have some basic C-heap checks done, without relegating to external tools or debug c-libs.
>> > > > * The debug-only guards in os::malloc() are quite fat, really, a whopping 48 bytes per allocation on 64-bit, 40 bytes on 32-bit. That is for guarding alone. They distort the memory allocation picture, since blowing up every allocation this way causes the underlying libc to do different things. Therefore we have different memory layouts and allocation patterns between debug and release. In addition, we have different code paths too, e.g. in debug os::realloc calls os::malloc + os::free whereas in release builds it calls directly into libc ::realloc. All that means that in debug builds we test something different than what we ship in release builds.
>> > > > * The canary in the headers of the debug-only guards do not directly precede the user portion of the data, so we won't catch negative buffer overflows of only a few bytes.
>> > > > * The guarding added by CheckJNICalls is unnecessarily expensive too, since it copies the memory around, handing a copy of the guarded memory up to the caller.
>> > > > * The fact that three different code sections all do malloc headers incurs unnecessary costs, and the code is unnecessarily complex. It makes also statistics difficult to understand since the silent overhead can be large (compare the rise in RSS with the rise in NMT allocations in a debug build).
>> > > > * None of the current overflow checkers print out hex dumps of the violated memory. That is what the libc usually does and it is very useful.
>> > > > 
>> > > > Thanks, Thomas
>> > > 
>> > > 
>> > > p.s. I contemplated to do NMT overflow checks and removal of old guarding code in one RFE but was concerned that it would be too confusing and get stuck in review limbo. Maybe that was wrong. But this RFE here makes more sense when viewed as part of a whole.
>> > 
>> > 
>> > Thanks for explanation. So, buffer overrun detection is now only available when NMT is on, vs. always on with GuardedMemory in debug build. Right?
>> 
>> Well, not with this patch obviously. But yes, that would be my proposal. To get "always-on", we could switch NMT on by default in debug builds. "summary" level is not really expensive at all, it uses less memory than GuardedMemory does, and the per-flag accounting does not really add much overhead (GuardedMemory also does some accounting btw).
>> 
>> Though tbh my first priority is to give us overflow checks in release builds. If we only do that and leave GuardedMemory in place I would be happy already. I had two customer cases very recently with heap overwriters, one of which I misused NMT to trigger a crash and analyze the core. A neighboring (non-VM-allocated) block was overwriting the following (VM allocated) heap block.
>> 
>> Cheers, Thomas
> 
> I have no problem on technical side. Changing NMT default value, I believe, needs CSR. Probably should start with a CSR to get a consensus.
> 
> Thanks.
> 
> -Zhengyu

Thank you @zhengyu123 and @simonis! I'll do one round of stress tests more, then push.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5952