RFR: JDK-8277822: Remove debug-only heap overrun checks in os::malloc and friends [v3]
Thomas Stuefe
stuefe at openjdk.java.net
Fri Jan 14 19:18:29 UTC 2022
On Fri, 14 Jan 2022 14:20:09 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
>> After integrating C heap overrun checks into NMT (https://bugs.openjdk.java.net/browse/JDK-8275320), we can now remove debug-only guarding memory for os::malloc() and friends.
>>
>> This will:
>> - reduce complexity of `os::malloc`, `os::free`, `os::realloc` (especially that one since realloc is tricky)
>> - saves *48 bytes per malloc* in debug builds. In my experiences this can sum up to a quite substantial overhead compared to release builds, especially since malloc allocations are often fine grained.
>> - reduce behavioral differences between debug and release builds wrt memory layout and allocation paths.
>>
>> Currently, we have always-on heap overrun checks in debug VMs. In order to preserve this feature after the patch, NMT is switched always on per default in debug builds (but can be switched off if needed, as opposed to the baked-in debug-only guards today).
>>
>> We use NMT level summary here, which is very cheap since we don't track stacks, we just count some numbers in NMT categories. Memory overhead will still be a lot less than what we paid before with the quite generous debug-only guards.
>>
>> Finally, this would also be a good coverage test for NMT.
>>
>> For more information, please refer to the umbrella JBS (https://bugs.openjdk.java.net/browse/JDK-8275301).
>>
>> -----------
>>
>> Memory stomp reports are better now since NMT buffer overrun reporting (see JDK-8275320) is better.
>>
>> Before:
>>
>>
>> [0.051s][warning][malloc,free] ## nof_mallocs = 11948, nof_frees = 2017
>> [0.051s][warning][malloc,free] ## memory stomp:
>> [0.051s][warning][malloc,free] GuardedMemory(0x00007fb3115ea560) base_addr=0x00007fb30c1df630 tag=0x0000000000000000 user_size=4096 user_data=0x00007fb30c1df650
>> [0.051s][warning][malloc,free] Header guard @0x00007fb30c1df630 is OK
>> [0.051s][warning][malloc,free] Trailer guard @0x00007fb30c1e0650 is BROKEN
>> [0.051s][warning][malloc,free] User data appears to be in use
>> # To suppress the following error report, specify this argument
>>
>>
>> Old reporting was susceptible to C-heap corruption: a sufficiently corrupted C-heap caused the libc to abort before reporting was completed:
>>
>>
>> thomas at starfish:/shared/projects/openjdk/jdk-jdk/output-fastdebug$ ./images/jdk/bin/java -XX:+UseNewCode
>> [0.047s][warning][malloc,free] ## nof_mallocs = 11948, nof_frees = 2017
>> [0.047s][warning][malloc,free] ## memory stomp:
>> java: malloc.c:2379: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.
>> Aborted (core dumped)
>>
>>
>> The new reports, done by NMT, print out a hex dump of the corrupted area, before calling into libc. So this printout is something we always get, even if C-heap is corrupted beyond help:
>>
>>
>> thomas at starfish:/shared/projects/openjdk/jdk-jdk/output-fastdebug$ ./images/jdk/bin/java -XX:+UseNewCode
>> NMT Block at 0x00007f33c01a8e70, corruption at: 0x00007f33c01a9e80:
>> 0x00007f33c01a8df0: 00 00 00 00 00 00 00 00 75 00 00 00 00 00 00 00
>> 0x00007f33c01a8e00: 50 00 00 00 00 00 00 00 00 00 00 00 01 00 9e e9
>> 0x00007f33c01a8e10: a8 b2 46 c6 33 7f 00 00 d0 7a 11 c0 33 7f 00 00
>> 0x00007f33c01a8e20: f0 4e 11 c0 33 7f 00 00 50 50 52 c6 33 7f 00 00
>> 0x00007f33c01a8e30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a8e40: 00 00 00 00 00 00 00 00 a0 ae 52 c6 33 7f 00 00
>> 0x00007f33c01a8e50: ff b7 f7 c5 33 7f 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a8e60: e8 8e 00 00 00 00 00 00 25 10 00 00 00 00 00 00
>> 0x00007f33c01a8e70: 00 10 00 00 00 00 00 00 00 00 00 00 0f 00 9e e9
>> 0x00007f33c01a8e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a8e90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a8ea0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a8eb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a8ec0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a8ed0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a8ee0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> ...
>> 0x00007f33c01a9e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9e10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9e20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9e30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9e40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9e50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9e60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9e70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9e80: 00 00 00 00 00 00 00 00 81 01 00 00 00 00 00 00
>> 0x00007f33c01a9e90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9ea0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9eb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9ec0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9ed0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9ee0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9ef0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> # To suppress the following error report, specify this argument
>>
>>
>> At first glance it seems that "nof_mallocs" etc are missing from this output. This is true, I removed them. They are not needed since the hs-err file contains an NMT summary report, which has more and finer grained information than that.
>>
>> -----------
>>
>>
>> Tests:
>> - GHAs (the windows failures are related to 8241403: "JavaThread::get_thread_name() should be ThreadSMR-aware")
>> - we tested this patch for several days in SAP nightlies
>> - manual tests x64 and x86 Linux
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>
> - Merge branch 'master' into JDK-8277822-Remove-debug-only-heap-overrun-checks-in-os-malloc-and-friends
> - Merge branch 'master' into JDK-8277822-Remove-debug-only-heap-overrun-checks-in-os-malloc-and-friends
> - initialize newly malloced memory
> - JDK-8277822-Remove-debug-only-heap-overrun-checks-in-os-malloc-and-friends
Hi Coleen,
thanks for looking at this! I already had given up hope :)
> Could you provide a link where NMT checks for heap overwrites to save me looking for it? Thanks.
Sure. This came all with https://bugs.openjdk.java.net/browse/JDK-8275320.
Upon free (either one of os::free or os::realloc) we call, if NMT is enabled, `MallocTracker::record_free` -> `MallocHeader::release()` -> `MallocHeader::check_block_integrity()`. There, both header and footer canary of the block-to-be-freed are checked, as well as some other stuff.
https://github.com/openjdk/jdk/blob/fb8fdc0fbf17dd7e900cb688df4917b97b26b9ab/src/hotspot/share/services/mallocTracker.cpp#L160-L220
I also added an ASCII picture of the NMT headers:
https://github.com/openjdk/jdk/blob/fb8fdc0fbf17dd7e900cb688df4917b97b26b9ab/src/hotspot/share/services/mallocTracker.hpp#L240-L292
that may clarify things. We have two canaries, one in the header directly preceding the user payload, one following the user payload.
-------------
PR: https://git.openjdk.java.net/jdk/pull/6554
More information about the hotspot-runtime-dev
mailing list