RFR: 8360048: NMT crash in gtest/NMTGtests.java: fatal error: NMT corruption: Block at 0x0000017748307120: header canary broken [v2]

Gerard Ziemski gziemski at openjdk.org
Mon Jul 14 22:20:39 UTC 2025


On Mon, 14 Jul 2025 20:56:08 GMT, David Holmes <dholmes at openjdk.org> wrote:

> Really hard to understand where the fix is. Bug synopsis also does not help :) I assume the problem is really in the test?

Agreed, it's confusing. Afshin said:

> The canary header test failed since there were concurrent remove and free() from the tree. The remove operations are synch'ed with corresponding NMT lock.

but frankly I don't see any locks involved in this code path:

This is where we detect the issue:


inline OutTypeParam MallocHeader::resolve_checked_impl(InTypeParam memblock) {
  char msg[256];
  address corruption = nullptr;
  if (!is_valid_malloced_pointer(memblock, msg, sizeof(msg))) {
    fatal("Not a valid malloc pointer: " PTR_FORMAT ": %s", p2i(memblock), msg);
  }
  OutTypeParam header_pointer = (OutTypeParam)memblock - 1;
  if (!header_pointer->check_block_integrity(msg, sizeof(msg), &corruption)) {
    header_pointer->print_block_on_error(tty, corruption != nullptr ? corruption : (address)header_pointer);
    fatal("NMT corruption: Block at " PTR_FORMAT ": %s", p2i(memblock), msg);
  }
  return header_pointer;
}


called by:


inline MallocHeader* MallocHeader::resolve_checked(void* memblock) {
  return MallocHeader::resolve_checked_impl<void*, MallocHeader*>(memblock);
}


called by:


void* MallocTracker::record_free_block(void* memblock) {
...
  MallocHeader* header = MallocHeader::resolve_checked(memblock);
...
}


called by:


static inline void* record_free(void* memblock) {
...
    return MallocTracker::record_free_block(memblock);
}


called by:


void  os::free(void *memblock) {
...
  permit_forbidden_function::free(old_outer_ptr);
}


called by:


  void Treap::remove_all() {
...
      _allocator.free(head);
...
  }


called by:


  static void test_add_committed_region_adjacent_overlapping() {
    RegionsTree* rtree = VirtualMemoryTracker::Instance::tree();
    rtree->tree().remove_all();

    size_t size  = 0x01000000;
    ReservedSpace rs = MemoryReserver::reserve(size, mtTest);
    MemTracker::NmtVirtualMemoryLocker nvml;
...


As you can see, in the old code we call `remove_all()` before we take the lock (`MemTracker::NmtVirtualMemoryLocker`).

I think the simplest temp fix here would be to do:


  static void test_add_committed_region_adjacent_overlapping() {
    MemTracker::NmtVirtualMemoryLocker nvml;

    RegionsTree* rtree = VirtualMemoryTracker::Instance::tree();
    rtree->tree().remove_all();

    size_t size  = 0x01000000;
    ReservedSpace rs = MemoryReserver::reserve(size, mtTest);


In the new code we don't call `remove_all()` at all.
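
To show the ordering issue outside HotSpot, here is a minimal standalone sketch in plain standard C++. It is only an illustration, not the actual fix: `SharedTree`, `reset_and_fill` and `run_concurrent_resets` are hypothetical names, and `std::mutex` with `std::lock_guard` stands in for the NMT lock and `NmtVirtualMemoryLocker`. The point is that the guard is constructed before `remove_all()`, so clearing the tree cannot interleave with another thread touching the same nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Hypothetical stand-in for the shared regions tree; this is not HotSpot code.
class SharedTree {
 public:
  std::mutex& lock() { return _lock; }

  // Callers are expected to hold lock() around all three operations below.
  void insert(int v) { _nodes.insert(v); }
  void remove_all() { _nodes.clear(); }
  std::size_t size() const { return _nodes.size(); }

 private:
  std::mutex _lock;
  std::set<int> _nodes;
};

// Mirrors the reordered test: take the lock first, then clear and refill.
std::size_t reset_and_fill(SharedTree& tree, int base, int count) {
  // Assumption: lock_guard plays the role of NmtVirtualMemoryLocker here.
  std::lock_guard<std::mutex> guard(tree.lock());
  tree.remove_all();
  for (int i = 0; i < count; i++) {
    tree.insert(base + i);
  }
  return tree.size();  // still under the lock, so nobody else has modified the tree
}

// Runs several threads through reset_and_fill and reports whether each one
// saw exactly `count` nodes right after filling the tree.
bool run_concurrent_resets(int threads, int count) {
  SharedTree tree;
  std::vector<std::size_t> seen(threads, 0);
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; t++) {
    workers.emplace_back([&tree, &seen, t, count] {
      seen[t] = reset_and_fill(tree, t * count, count);
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  for (std::size_t s : seen) {
    if (s != static_cast<std::size_t>(count)) {
      return false;
    }
  }
  return true;
}
```

If the `lock_guard` line were moved below `tree.remove_all()`, the clear would run unlocked, which is the same shape as the old test code above.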

Afshin's original fix incorporated feedback that was not directly applicable to this fix, and now I wish we had gone with a simple fix and left the other enhancements for later. Live and learn...

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26284#issuecomment-3071197578

