RFR: JDK-8275320: NMT should perform buffer overrun checks [v2]
Volker Simonis
simonis at openjdk.java.net
Thu Nov 18 13:37:49 UTC 2021
On Thu, 11 Nov 2021 06:30:15 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
>> This is part of a number of RFE I plan to improve and simplify C-heap overflow checking in hotspot.
>>
>> This proposal adds NMT buffer overflow checking:
>>
>> - it gives us C-heap overflow checking in release builds
>> - the costs are neglectable: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway.
>> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check.
>> - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch.
>>
>> For more details, please see the JBS issue.
>>
>> ----
>>
>> Patch notes:
>>
>> - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. For more details, see the comment in mallocTracker.hpp.
>> - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte-footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits. That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode. With 40 bits it was over-dimensioned, and even 16-bits arguably still are: malloc site table width is 512.
>> - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere.
>>
>> - I added a footer canary trailing the user allocation to catch tail buffer overruns. To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios.
>>
>> - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting.
>>
>> - I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE).
>>
>> - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now.
>>
>> - I made the assert for malloc site table width a compile time STATIC_ASSERT.
>>
>> --------------
>>
>> Example output a buffer overrun would provide:
>>
>>
>> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
>> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1:
>> 0x00005600f86136a8: 21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
>> 0x00005600f86136b8: 00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00
>> 0x00005600f86136c8: 41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f86136d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f86136e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f86136f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613708: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613718: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613728: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613738: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)#
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> # Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805
>> # fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
>> #
>>
>> -------
>>
>> Tests:
>> - manual tests with Linux x64, x86, minimal build
>> - GHAs all clean
>> - SAP nightlies ran for 4 weeks now without problems
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
>
> - Merge
> - Let NMT do overflow detection
Hi Thoms,
your change looks good. I only have a few remarks and comments inline.
Best regards,
Volker
src/hotspot/share/services/mallocTracker.cpp line 138:
> 136: os::print_hex_dump(st, from, to, 1);
> 137: assert(bad_address >= from, "sanity");
> 138: // if the corruption is in the block body of in the footer, print out that part too
// If the ... body or in the footer...
src/hotspot/share/services/mallocTracker.cpp line 143:
> 141: from2 = MAX2(to, from2);
> 142: address to2 = from2 + 96;
> 143: if (to2 > to) {
Don't understand this. If `from2 = MAX2(to, from2)` then `from2 >= to`. So shouldn't `to2` (which is `from2 + 96`) always be bigger then `to`?
src/hotspot/share/services/mallocTracker.cpp line 169:
> 167: // use SafeFetch but since this is a hot path we don't. If we are
> 168: // wrong, we will crash when accessing the canary, which hopefully
> 169: // generates distinct crash report.
No need for two spaces after `//`
src/hotspot/share/services/mallocTracker.cpp line 174:
> 172: // we check here are the bare minimum of what we know will malloc() give us
> 173: // (which is 64-bit even on 32-bit platforms).
> 174: if (!is_aligned(this, sizeof(uint64_t))) {
Where does this information come from? As far as I can see, the man-page of `malloc()` only mentions:
> "malloc returns a pointer which is suitably aligned for any built-in type"
Why is this 64 bit on a 32-bit platform?
src/hotspot/share/services/mallocTracker.hpp line 314:
> 312: static const uint8_t _footer_canary_dead_mark = 0xFB;
> 313: NOT_LP64(static const uint32_t _header_alt_canary_life_mark = 0xFAFA1F1F;)
> 314: NOT_LP64(static const uint32_t _header_alt_canary_dead_mark = 0xFBFB1F1F;)
Just out of interest, how did you choose these canary marks? Is there some evidence that they appear less frequently in real code/data than other values?
test/hotspot/gtest/nmt/test_nmt_buffer_overflow_detection.cpp line 71:
> 69: // this should generate two hex dumps, one with the front header, one with the overwritten
> 70: // portion.
> 71: static void test_overwrite_back_long() {
I think the test isn't really checking that we get two hex dumps, right?
-------------
PR: https://git.openjdk.java.net/jdk/pull/5952
More information about the hotspot-dev
mailing list