RFR: JDK-8304815: Use NMT for more precise hs_err location printing [v4]

Johan Sjölen jsjolen at openjdk.org
Fri Mar 24 11:47:35 UTC 2023


On Fri, 24 Mar 2023 07:13:23 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> (This is a byproduct of work on the arm port for https://github.com/openjdk/jdk/pull/10907. I needed better debugging information in the hs-err file and in gdb.)
>> 
>> Back in 2022 @zhengyu123 had the very nice idea of using NMT mapping info for smartening up pp in gdb: [JDK-8280289](https://bugs.openjdk.org/browse/JDK-8280289).
>> 
>> The same idea can be applied to hs_err file location printing. NMT has information about all its mappings and can tell us where it thinks a given unknown pointer points into.
>> 
>> This could be even more useful if the "find malloc block" part of that functionality would be smarter. As it is now, it only works if the pointer in question points to the start of a user-allocated area. Would be nice if the code could (carefully) search for the next valid-looking malloc header instead.
>> 
>> --------------
>> 
>> This patch does this: we introduce a new API, MemTracker::print_containing_region(void*), that tries to make sense of a given unknown pointer.
>> 
>> It will search its mmap regions and print those if found. For malloc'ed pointers, it will carefully sniff out the immediate surroundings of the block, trying to find what looks like a valid malloc header. It uses SafeFetch to not trip over unmapped or protected pages. Note that, of course, we may get false recognition positives if it finds something that looks like a valid header. But even that could be useful (e.g. a remnant dead header may indicate we access memory after free).
>> 
>> Looks like this (its arm, so 32-bit pointers):
>> 
>> 
>> Register to memory mapping:
>> 
>> ->  r0  = 0x728a6ae0 into life malloced block starting at 0x728a6ae0, size 104, tag mtSynchronizer
>> ->  r1  = 0x75b02010 into life malloced block starting at 0x75b02010, size 184, tag mtObjectMonitor
>> ->  r2  = 0x728a6ae0 into life malloced block starting at 0x728a6ae0, size 104, tag mtSynchronizer
>>     r3  = 0x0 is nullptr
>> ->  r4  = 0x728a6ae0 into life malloced block starting at 0x728a6ae0, size 104, tag mtSynchronizer
>>     r5  = 0xb6d3bbc8: <offset 0x018f6bc8> in /shared/projects/openjdk/jdk-jdk/output-fastdebug-arm/images/jdk/lib/server/libjvm.so at 0xb5445000
>>     r6  = 0xffffffff is an unknown value
>>     r7  = 0x0 is nullptr
>>     r8  = 0x0000000a is an unknown value
>>     r9  = 0x728a4308 is a thread
>>     r10 = 0x0000000b is an unknown value
>> ->  fp  = 0x753fe8cc in mmap'd memory region [0x75380000 - 0x75400000] by Thread Stack
>>     r12 = 0x0 is nullptr
>> ->  sp  = 0x753fe8b0 in mmap'd memory region [0x75380000 - 0x75400000] by Thread Stack
>>     lr  = 0xb69bc1d0: <offset 0x015771d0> in /shared/projects/openjdk/jdk-jdk/output-fastdebug-arm/images/jdk/lib/server/libjvm.so at 0xb5445000
>>     pc  = 0xb69c8670: <offset 0x01583670> in /shared/projects/openjdk/jdk-jdk/output-fastdebug-arm/images/jdk/lib/server/libjvm.so at 0xb5445000
>> 
>> 
>> 
>> The small caveat here is that NMT reporting needs ThreadCritical, and thus using it for location printing may block error reporting if it crashed inside a ThreadCritical section. We face the same issue today when printing the NMT report as part of the hs-err file. 
>> 
>> I think the usefulness of these printouts justify this risk. However I opened https://bugs.openjdk.org/browse/JDK-8304824 to investigate a better locking strategy for NMT.
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   More small fixes

Thanks!

I've got some requested changes, but I like it as a whole.

src/hotspot/share/services/mallocHeader.inline.hpp line 122:

> 120: // Used for debugging purposes only. Check header if it could constitute a valid (life or dead) header.
> 121: inline bool MallocHeader::looks_valid() const {
> 122:   // Note: we define these restrictions lose enough to also catch moderately corrupted blocks.

lose -> loose

src/hotspot/share/services/mallocTracker.cpp line 201:

> 199:   if (!MemTracker::enabled()) {
> 200:     return false;
> 201:   }

I'll paraphrase you: "Your robbing MemTracker of its purpose in life!" :). This should be an `assert` and the corresponding `MemTracker` function should check whether it's enabled or not (which it does).

src/hotspot/share/services/mallocTracker.cpp line 253:

> 251:       }
> 252:     }
> 253:     candidate --;

I rewrote the loop, trying to do as many up front checks as possible and add more comments. It works exactly the same, but I thought this was a bit easier for a first reader to go through. If you like some or all of it, please apply it!

```c++
for (; candidate >= last_candidate; candidate--) {
  if (!os::is_readable_pointer(candidate)) {
    // Probably OOB, give up
    return false;
  }
  if (!candidate->looks_valid()) {
    // This is definitely not a header, go on to the next candidate.
    continue;
  }

  // fudge factor:
  // We don't report blocks for which p is clearly outside of. That would cause us to return true and possibly prevent
  // subsequent tests of p, see os::print_location(). But if p is just outside of the found block, this may be a
  // narrow oob error and we'd like to know that.
  const int fudge = 8;
  const address start_block = (address)candidate;
  const address start_payload = (address)(candidate + 1);
  const address end_payload = start_payload + candidate->size();
  const address end_payload_plus_fudge = end_payload + fudge;
  if (!(addr >= start_block && addr < end_payload_plus_fudge)) {
    // The address of the pointer is not within what the header recorded.
    if (candidate->is_live()) {
      // The candidate is live, meaning we found a header
      // of some allocation that is not part of this pointer.
      // Then we can give up, we're clearly reading into some other payload.
      return false;
    } else {
      // The candidate is dead, so this may be a remnant of an older freed block.
      // We continue searching.
      continue;
    }
  }

  // We've found a reasonable candidate. Print the info.
  const char* where = nullptr;
  if (addr < start_payload) {
    where = "into header of";
  } else if (addr < end_payload) {
    where = "into";
  } else {
    where = "just outside of";
  }
  st->print_cr(PTR_FORMAT " %s %s malloced block starting at " PTR_FORMAT ", size " SIZE_FORMAT ", tag %s",
               p2i(p), where,
               (candidate->is_dead() ? "dead" : "live"),
               p2i(candidate + 1), // lets print the payload start, not the header
               candidate->size(), NMTUtil::flag_to_enum_name(candidate->flags()));
  if (MemTracker::tracking_level() == NMT_detail) {
    NativeCallStack ncs;
    if (candidate->get_stack(ncs)) {
      ncs.print_on(st);
      st->cr();
    }
  }
  return true;
}

test/hotspot/gtest/nmt/test_nmt_locationprinting.cpp line 31:

> 29: #include "unittest.hpp"
> 30: 
> 31: //#define LOG_PLEASE

Outdated?

-------------

Changes requested by jsjolen (Committer).

PR Review: https://git.openjdk.org/jdk/pull/13162#pullrequestreview-1356413633
PR Review Comment: https://git.openjdk.org/jdk/pull/13162#discussion_r1147397935
PR Review Comment: https://git.openjdk.org/jdk/pull/13162#discussion_r1147405406
PR Review Comment: https://git.openjdk.org/jdk/pull/13162#discussion_r1147469057
PR Review Comment: https://git.openjdk.org/jdk/pull/13162#discussion_r1147471544


More information about the hotspot-dev mailing list