RFR: 8369393: NMT: poison the canaries of malloc header under ASAN build [v24]

Tue Nov 25 14:25:32 UTC 2025

On Wed, 19 Nov 2025 10:48:35 GMT, Afshin Zafari <azafari at openjdk.org> wrote:

>> NMT can detect malloc'd memory corruption using canary tests at header and footer of every memory region. This can only be done at free time of the memory where NMT checks the canaries and report error if they are not as expected.
>> In this PR, the canary parts also are poisoned using ASAN API to get notified whenever a read/write op is done. on the canary parts. `_size` member of the malloc header is also poisoned, since it is used for finding the footer address.
>
> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision:
> 
>   pragma fix

Hi @afshin-zafari , 

In general, I support the goal of this PR, but in its current form, it incurs too high a cost for the added complexity. 

The benefit this PR provides is that, with NMT enabled, we won't catch 1-2-byte overwriters at the end and up to 16-byte underwriters at the start of a block. Underwriters are very rare; overwriters are a lot more common. So, if someone overwrites by one or two bytes, today we only see it if the block is free'd. If someone overwrites three bytes, we already have protection. With your patch, we see also 1-2 byte overwriters.

But there is a straightforward workaround: just disable NMT. Then, the protection extends right to the borders of the user payload.

Considering that, the benefit of this PR is there, but it's not huge. Therefore, the complexity cost should be correspondingly low.

As I wrote before, I don't think poisoning and unpoisoning in this fine-grained fashion is the best approach. It often makes little sense. For example, in os::malloc we have at least one poison-unpoison-poison cycle, possibly more. It would be a lot simpler to do this carefully and only where it's needed:

- poison the malloc header and footer when we *hand out a malloc'ed block* to the user *at the end of*  `os::malloc` or `os::realloc`
- unpoison the header and footer *on the entry* to `os::realloc` and `os::free` to be able to read the MST marker.

That's it. No more poisoning should be needed.

NMT block integrity checks should simply not run for ASAN. They make zero sense. They check header corruptions, but who is going to corrupt the canary if the header is now protected by ASAN? In fact, we should not even need to initialize the canaries.

Basically, that means not calling either one of `check_block_integrity` or `is_valid_malloced_pointer` or `print_block_on_error` if ASAN is enabled. Resp. stubbing them out for ASAN. But I see only one call site that does this. Ideally, one would have a new define like "NMT_BLOCK_INTEGRITY_CHECKS", and guard the NMT integrity checks with that one, and switch it off for ASAN.

That means that a large part of this PR can be simplified. Many functions can stay as they are, and there should be no need for an RAII object like `AsanPoisoningHelper`. We should only have two poison calls for the header and footer, and two unpoison calls, none of which share the same extent, so RAII can't be used.

Other than that, there are many cosmetic changes in this PR, e.g., many accessor functions added. That is a matter of taste - they also create unnecessary noise. But I would prefer it if these cosmetics were done in a separate PR - I would really like to see only the functional changes in this PR.

src/hotspot/share/nmt/mallocHeader.hpp line 141:

> 139:   void asan_unpoison_self() const {
> 140:     AsanPoisoningHelper::unpoison_memory(this, sizeof(MallocHeader));
> 141:     AsanPoisoningHelper::unpoison_memory(footer_address(), sizeof(uint16_t));

Why not use the macros directly?

src/hotspot/share/nmt/mallocHeader.inline.hpp line 38:

> 36: 
> 37: inline MallocHeader::MallocHeader(size_t size, MemTag mem_tag, uint32_t mst_marker)
> 38:   : _size(size), _mst_marker(mst_marker), _mem_tag(mem_tag), _unused(0),

Why this initialization? It seems incidental to the goal of the PR.

src/hotspot/share/nmt/mallocTracker.cpp line 199:

> 197:   {
> 198:     const MallocHeader* header2 = MallocHeader::resolve_checked(memblock);
> 199:     AsanPoisoningHelper aph(header2, sizeof(MallocHeader));

Why is this needed? We just malloced, the header should be unpoisened.

src/hotspot/share/sanitizers/address.hpp line 85:

> 83: // General un/poisoning API are provided too.
> 84: class AsanPoisoningHelper {
> 85:   const void* _memory_region;

I meant pointer mutability:

Suggestion:

  const void* const _memory_region;

The object should be immutable. There is no need for it to be mutable; changing the address in-flight would be an error.

src/hotspot/share/sanitizers/address.hpp line 95:

> 93:     ASAN_POISON_MEMORY_REGION(_memory_region, _size);
> 94:   }
> 95:   static void poison_memory(const void* addr, const size_t size) {

Why are these functions needed? They don't seem to add anything to `ASAN_POISON_MEMORY_REGION` ?

Not only for complexity reduction, but also for "grep-ability" . The more aliasing of names one does, the more complex it gets to do simple searches quickly like "where do we unpoison memory?"

src/hotspot/share/sanitizers/address.hpp line 106:

> 104: 
> 105: class AsanPoisoningHelper {
> 106:  public:

For more brevity, this class can look like this:

struct AsanPoisoningHelper {
  AsanPoisoningHelper() = delete;
  AsanPoisoningHelper(const void* addr, const size_t size) {}
};

src/hotspot/share/sanitizers/address.hpp line 109:

> 107:   AsanPoisoningHelper() = delete;
> 108:   AsanPoisoningHelper(const void* addr, const size_t size) {}
> 109:   ~AsanPoisoningHelper() { }

destructor is not needed

src/hotspot/share/sanitizers/address.hpp line 111:

> 109:   ~AsanPoisoningHelper() { }
> 110:   static void poison_memory(const void* addr, const size_t size) { }
> 111:   static void unpoison_memory(const void* addr, size_t size) { }

Arguably not needed, see my remark above

-------------

PR Review: https://git.openjdk.org/jdk/pull/27685#pullrequestreview-3482227957
PR Review Comment: https://git.openjdk.org/jdk/pull/27685#discussion_r2559974057
PR Review Comment: https://git.openjdk.org/jdk/pull/27685#discussion_r2559975874
PR Review Comment: https://git.openjdk.org/jdk/pull/27685#discussion_r2559962335
PR Review Comment: https://git.openjdk.org/jdk/pull/27685#discussion_r2558658174
PR Review Comment: https://git.openjdk.org/jdk/pull/27685#discussion_r2541647225
PR Review Comment: https://git.openjdk.org/jdk/pull/27685#discussion_r2559498936
PR Review Comment: https://git.openjdk.org/jdk/pull/27685#discussion_r2559496332
PR Review Comment: https://git.openjdk.org/jdk/pull/27685#discussion_r2559500304