RFR: 8360048: NMT crash in gtest/NMTGtests.java: fatal error: NMT corruption: Block at 0x0000017748307120: header canary broken [v2]
Gerard Ziemski
gziemski at openjdk.org
Mon Jul 14 22:20:39 UTC 2025
On Mon, 14 Jul 2025 20:56:08 GMT, David Holmes <dholmes at openjdk.org> wrote:
> Really hard to understand where the fix is. Bug synopsis also does not help :) I assume the problem is really in the test?
Agreed, it's confusing. Afshin said:
> The canary header test failed since there were concurrent remove and free() from the tree. The remove operations are synch'ed with corresponding NMT lock.
but frankly I don't see any locks involved in this code path.
This is where we detect the issue:
inline OutTypeParam MallocHeader::resolve_checked_impl(InTypeParam memblock) {
  char msg[256];
  address corruption = nullptr;
  if (!is_valid_malloced_pointer(memblock, msg, sizeof(msg))) {
    fatal("Not a valid malloc pointer: " PTR_FORMAT ": %s", p2i(memblock), msg);
  }
  OutTypeParam header_pointer = (OutTypeParam)memblock - 1;
  if (!header_pointer->check_block_integrity(msg, sizeof(msg), &corruption)) {
    header_pointer->print_block_on_error(tty, corruption != nullptr ? corruption : (address)header_pointer);
    fatal("NMT corruption: Block at " PTR_FORMAT ": %s", p2i(memblock), msg);
  }
  return header_pointer;
}
called by:
inline MallocHeader* MallocHeader::resolve_checked(void* memblock) {
  return MallocHeader::resolve_checked_impl<void*, MallocHeader*>(memblock);
}
called by:
void* MallocTracker::record_free_block(void* memblock) {
  ...
  MallocHeader* header = MallocHeader::resolve_checked(memblock);
  ...
}
called by:
static inline void* record_free(void* memblock) {
  ...
  return MallocTracker::record_free_block(memblock);
}
called by:
void os::free(void *memblock) {
  ...
  permit_forbidden_function::free(old_outer_ptr);
}
called by:
void Treap::remove_all() {
  ...
  _allocator.free(head);
  ...
}
called by:
static void test_add_committed_region_adjacent_overlapping() {
  RegionsTree* rtree = VirtualMemoryTracker::Instance::tree();
  rtree->tree().remove_all();
  size_t size = 0x01000000;
  ReservedSpace rs = MemoryReserver::reserve(size, mtTest);
  MemTracker::NmtVirtualMemoryLocker nvml;
  ...
As you can see, in the old code we call `remove_all()` before we take the lock (`MemTracker::NmtVirtualMemoryLocker`).
I think the simplest temporary fix here would be to do:
static void test_add_committed_region_adjacent_overlapping() {
  MemTracker::NmtVirtualMemoryLocker nvml;
  RegionsTree* rtree = VirtualMemoryTracker::Instance::tree();
  rtree->tree().remove_all();
  size_t size = 0x01000000;
  ReservedSpace rs = MemoryReserver::reserve(size, mtTest);
In the new code we don't call `remove_all()`.
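To make the ordering point concrete, here is a minimal standalone sketch of the "take the lock first, then reset the tree" pattern. It is not HotSpot code: `GuardedTree`, `reset_then_add` and `run_concurrent_resets` are hypothetical names made up for illustration, a `std::mutex` stands in for the NMT lock, and a `std::map` stands in for the region tree.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a lock-protected region tree (not the HotSpot Treap).
class GuardedTree {
 public:
  std::mutex& lock() { return _lock; }

  // Callers are expected to hold lock() around all three operations.
  void add(std::size_t base, std::size_t size) { _nodes[base] = size; }
  void remove_all() { _nodes.clear(); }
  std::size_t count() const { return _nodes.size(); }

 private:
  std::mutex _lock;
  std::map<std::size_t, std::size_t> _nodes;
};

// Same ordering as the suggested test change: acquire the lock first,
// and only then clear and repopulate the shared tree.
std::size_t reset_then_add(GuardedTree& tree, std::size_t base, std::size_t size) {
  std::lock_guard<std::mutex> guard(tree.lock());
  tree.remove_all();
  tree.add(base, size);
  return tree.count();
}

// Runs several resets concurrently. Because each reset happens entirely under
// the lock, the tree holds exactly one node once all threads have finished.
std::size_t run_concurrent_resets(int threads) {
  assert(threads > 0);
  GuardedTree tree;
  std::vector<std::thread> workers;
  for (int i = 0; i < threads; i++) {
    workers.emplace_back([&tree, i] {
      reset_then_add(tree, 0x1000 * (i + 1), 0x100);
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  std::lock_guard<std::mutex> guard(tree.lock());
  return tree.count();
}
```

If `remove_all()` were moved above the `lock_guard`, as in the old test ordering, the clear could race with another thread's `add()`, which is the kind of unsynchronized access the analysis above points at.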
Afshin's original fix incorporated feedback that was not directly applicable to this bug, and now I wish we had gone with a simple fix and left the other enhancements for later. Live and learn...
-------------
PR Comment: https://git.openjdk.org/jdk/pull/26284#issuecomment-3071197578