RFR: JDK-8277822: Remove debug-only heap overrun checks in os::malloc and friends [v3]

David Holmes dholmes at openjdk.java.net
Tue Jan 18 01:36:28 UTC 2022


On Fri, 14 Jan 2022 14:20:09 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> After integrating C heap overrun checks into NMT (https://bugs.openjdk.java.net/browse/JDK-8275320), we can now remove debug-only guarding memory for os::malloc() and friends.
>> 
>> This will:
>> - reduce complexity of `os::malloc`, `os::free`, `os::realloc` (especially that one since realloc is tricky)
>> - saves *48 bytes per malloc* in debug builds. In my experiences this can sum up to a quite substantial overhead compared to release builds, especially since malloc allocations are often fine grained.
>> - reduce behavioral differences between debug and release builds wrt memory layout and allocation paths.
>> 
>> Currently, we have always-on heap overrun checks in debug VMs. In order to preserve this feature after the patch, NMT is switched always on per default in debug builds (but can be switched off if needed, as opposed to the baked-in debug-only guards today).
>> 
>> We use NMT level summary here, which is very cheap since we don't track stacks, we just count some numbers in NMT categories. Memory overhead will still be a lot less than what we paid before with the quite generous debug-only guards.
>> 
>> Finally, this would also be a good coverage test for NMT.
>> 
>> For more information, please refer to the umbrella JBS (https://bugs.openjdk.java.net/browse/JDK-8275301).
>> 
>> -----------
>> 
>> Memory stomp reports are better now since NMT buffer overrun reporting (see JDK-8275320) is better. 
>> 
>> Before:
>> 
>> 
>> [0.051s][warning][malloc,free] ## nof_mallocs = 11948, nof_frees = 2017
>> [0.051s][warning][malloc,free] ## memory stomp:
>> [0.051s][warning][malloc,free] GuardedMemory(0x00007fb3115ea560) base_addr=0x00007fb30c1df630 tag=0x0000000000000000 user_size=4096 user_data=0x00007fb30c1df650
>> [0.051s][warning][malloc,free]   Header guard @0x00007fb30c1df630 is OK
>> [0.051s][warning][malloc,free]   Trailer guard @0x00007fb30c1e0650 is BROKEN
>> [0.051s][warning][malloc,free]   User data appears to be in use
>> # To suppress the following error report, specify this argument
>> 
>> 
>> Old reporting was susceptible to C-heap corruption: a sufficiently corrupted C-heap caused the libc to abort before reporting was completed:
>> 
>> 
>> thomas at starfish:/shared/projects/openjdk/jdk-jdk/output-fastdebug$ ./images/jdk/bin/java -XX:+UseNewCode
>> [0.047s][warning][malloc,free] ## nof_mallocs = 11948, nof_frees = 2017
>> [0.047s][warning][malloc,free] ## memory stomp:
>> java: malloc.c:2379: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.
>> Aborted (core dumped)
>> 
>> 
>> The new reports, done by NMT, print out a hex dump of the corrupted area, before calling into libc. So this printout is something we always get, even if C-heap is corrupted beyond help:
>> 
>> 
>> thomas at starfish:/shared/projects/openjdk/jdk-jdk/output-fastdebug$ ./images/jdk/bin/java -XX:+UseNewCode
>> NMT Block at 0x00007f33c01a8e70, corruption at: 0x00007f33c01a9e80: 
>> 0x00007f33c01a8df0:   00 00 00 00 00 00 00 00 75 00 00 00 00 00 00 00
>> 0x00007f33c01a8e00:   50 00 00 00 00 00 00 00 00 00 00 00 01 00 9e e9
>> 0x00007f33c01a8e10:   a8 b2 46 c6 33 7f 00 00 d0 7a 11 c0 33 7f 00 00
>> 0x00007f33c01a8e20:   f0 4e 11 c0 33 7f 00 00 50 50 52 c6 33 7f 00 00
>> 0x00007f33c01a8e30:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a8e40:   00 00 00 00 00 00 00 00 a0 ae 52 c6 33 7f 00 00
>> 0x00007f33c01a8e50:   ff b7 f7 c5 33 7f 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a8e60:   e8 8e 00 00 00 00 00 00 25 10 00 00 00 00 00 00
>> 0x00007f33c01a8e70:   00 10 00 00 00 00 00 00 00 00 00 00 0f 00 9e e9
>> 0x00007f33c01a8e80:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a8e90:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a8ea0:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a8eb0:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a8ec0:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a8ed0:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a8ee0:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>> ...
>> 0x00007f33c01a9e00:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9e10:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9e20:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9e30:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9e40:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9e50:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9e60:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9e70:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9e80:   00 00 00 00 00 00 00 00 81 01 00 00 00 00 00 00
>> 0x00007f33c01a9e90:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9ea0:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9eb0:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9ec0:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9ed0:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9ee0:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00007f33c01a9ef0:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>> # To suppress the following error report, specify this argument
>> 
>> 
>> At first glance it seems that "nof_mallocs" etc are missing from this output. This is true, I removed them. They are not needed since the hs-err file contains an NMT summary report, which has more and finer grained information than that.
>> 
>> -----------
>> 
>> 
>> Tests: 
>> - GHAs (the windows failures are related to 8241403: "JavaThread::get_thread_name() should be ThreadSMR-aware")
>> - we tested this patch for several days in SAP nightlies
>> - manual tests x64 and x86 Linux
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8277822-Remove-debug-only-heap-overrun-checks-in-os-malloc-and-friends
>  - Merge branch 'master' into JDK-8277822-Remove-debug-only-heap-overrun-checks-in-os-malloc-and-friends
>  - initialize newly malloced memory
>  - JDK-8277822-Remove-debug-only-heap-overrun-checks-in-os-malloc-and-friends

src/hotspot/share/runtime/globals.hpp line 546:

> 544:           range(0, 9)                                                       \
> 545:                                                                             \
> 546:   product(ccstr, NativeMemoryTracking, DEBUG_ONLY("summary") NOT_DEBUG("off"), \

Hi Thomas,

I'm a bit concerned about always enabling part of NMT in debug as that affects the bulk of our hotspot testing. What is the potential overhead from enabling NMT this way? We would need to run this patch through our CI to ensure it doesn't introduce any issues.

Cheers,
David

-------------

PR: https://git.openjdk.java.net/jdk/pull/6554


More information about the hotspot-runtime-dev mailing list