Crash in the MallocTracker::record_free
David Holmes
david.holmes at oracle.com
Mon Aug 15 22:52:48 UTC 2022
On 16/08/2022 2:04 am, Thomas Stüfe wrote:
> From the hex dump, I think it possible that you try to delete a
> HandleMark, and that crashes.
>
> This corresponds with the code location in the stack, which is
>
> ```
> V [libjvm.so+0x1a6d77a] Thread::~Thread()+0x8a
> ```
>
> if gdb does not lie to me, this is near:
>
> ```
> 0x7ffff74f91a9 <Thread::~Thread()+137>: callq 0x7ffff6843210
> <HandleMark::operator delete(void*)>
> ```
>
> He complains about a broken NMT header at 0x00007fba3f56ca90. So this is
> the assumed NMT header:
>
> ```
> 0x00007fba3f56ca90: 00 00 00 00 00 00 00 00 98 d0 37 3f ba 7f 00 00
> ```
>
> But this is what the caller actually freed. Since NMT header precedes
> the allocation:
>
> ```
> 0x00007fba3f56caa0: 90 98 02 38 ba 7f 00 00 80 a3 02 38 ba 7f 00 00
> 0x00007fba3f56cab0: e0 a3 02 38 ba 7f 00 00 f0 a3 02 38 ba 7f 00 00
> 0x00007fba3f56cac0: c8 a4 02 38 ba 7f 00 00 d8 00 00 00 00 00 00 00
> 0x00007fba3f56cad0: 60 a6 02 38 ba 7f 00 00 a0 e4 89 3e ba 7f 00 00
> 0x00007fba3f56cae0: 90 98 02 38 ba 7f 00 00 90 ac 02 38 ba 7f 00 00
> 0x00007fba3f56caf0: 78 05 40 f0 b9 7f 00 00 00 00 00 00 00 00 00 00
> 0x00007fba3f56cb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ```
>
> HandleMark has seven pointer-sized members:
>
> ```
> Thread *_thread; // thread that owns this mark
> HandleArea *_area; // saved handle area
> Chunk *_chunk; // saved arena chunk
> char *_hwm, *_max; // saved arena info
> size_t _size_in_bytes; // size of handle area
> // Link to previous active HandleMark in thread
> HandleMark* _previous_handle_mark;
> ```
>
> If I interprete 0x00007fba3f56caa0 as HandleMark*:
>
> ```
> 0x00007fba3f56caa0: 90 98 02 38 ba 7f 00 00 80 a3 02 38 ba 7f 00 00
> _thread _area
> 0x00007fba3f56cab0: e0 a3 02 38 ba 7f 00 00 f0 a3 02 38 ba 7f 00 00
> _chunk _hwm
> 0x00007fba3f56cac0: c8 a4 02 38 ba 7f 00 00 d8 00 00 00 00 00 00 00
> _max _size_in_bytes
> 0x00007fba3f56cad0: 60 a6 02 38 ba 7f 00 00
> _previous_handle_mark
> ```
>
> Note how this fits:
>
> - The only 8-byte datum in this range is a size-like value for
> _size_in_bytes (d8 = 216). All other slots look like pointers.
>
> - _chunk (0x7fba3802a3e0), _hwm (0x7fba3802a3f0) and _max
> (0x7fba3802a4c8) look like they fit an Arena of size 216 perfectly:
> _max - _chunk = 232. A Chunk looks like this:
>
> ```
> //
> // +-----------+--+--------------------------------------------+
> // | |g | |
> // | Chunk |a | Payload |
> // | |p | |
> // +-----------+--+--------------------------------------------+
> // A B C D
> ```
>
> and it has two pointer-sized members, so sizeof(Chunk) = 16. (232 - 16)
> gives you the arena size of 216.
>
> Also, _hwm points to the beginning of the Arena (_chunk + 16). So this
> looks like a mark for a HandleArea that has been cleared or never used.
>
> -------------
>
> I see two possibilities. Either someone overwrote the NMT header of the
> block. Or, this HandleMark has never been allocated on the C-heap but
> lives on the stack. In that case, something with
> `Thread::thread->last_handle_mark()` could have gone wrong. Most
> HandleMark instances live on the stack, the only ones that don't are the
> initial marks created in `Thread::Thread()`.
>
> -----
>
> This is all pure hypotheses, since I don't have an hs-err file or any
> information other than the stack and the hex dump.
Thanks Thomas! That was way more than I expected! I suspect there is a
HandleMark on the stack. I'm doing something unusual, but not prohibited.
Cheers,
David
> Cheers, Thomas
>
> On Mon, Aug 15, 2022 at 4:02 PM <daniel.daugherty at oracle.com
> <mailto:daniel.daugherty at oracle.com>> wrote:
>
> This discussion reminded me of one of the bugs that I filed last week:
>
> JDK-8292318
> runtime/cds/appcds/dynamicArchive/methodHandles/MethodHandlesAsCollectorTest.java
>
> fails with "fatal error: NMT corruption: Block at <addr>: block address
> is unaligned"
> https://bugs.openjdk.org/browse/JDK-8292318
> <https://bugs.openjdk.org/browse/JDK-8292318>
>
> So there might some overlap here...
>
> Dan
>
>
>
> On 8/15/22 9:17 AM, David Holmes wrote:
> > On 15/08/2022 7:21 pm, Thomas Stüfe wrote:
> >>
> >>
> >> On Mon, Aug 15, 2022 at 11:16 AM David Holmes
> >> <david.holmes at oracle.com <mailto:david.holmes at oracle.com>
> <mailto:david.holmes at oracle.com <mailto:david.holmes at oracle.com>>>
> wrote:
> >>
> >> On 15/08/2022 6:28 pm, Thomas Stüfe wrote:
> >> > On Mon, Aug 15, 2022 at 10:27 AM Thomas Stüfe
> >> <thomas.stuefe at gmail.com <mailto:thomas.stuefe at gmail.com>
> <mailto:thomas.stuefe at gmail.com <mailto:thomas.stuefe at gmail.com>>
> >> > <mailto:thomas.stuefe at gmail.com
> <mailto:thomas.stuefe at gmail.com>
> >> <mailto:thomas.stuefe at gmail.com
> <mailto:thomas.stuefe at gmail.com>>>> wrote:
> >> >
> >> >
> >> > On Mon, Aug 15, 2022 at 10:08 AM David Holmes
> >> > <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>>
> >> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>>>>
> >> wrote:
> >> >
> >> > Hi Thomas,
> >> >
> >> > On 15/08/2022 3:42 pm, Thomas Stüfe wrote:
> >> > > ..The only way I could see this happening is
> if tty
> >> itself is
> >> > not valid.
> >> > > Is this very early during initialization?
> >> >
> >> > Quite the opposite, it is during VM termination.
> >> >
> >> >
> >> > Yes, that makes sense too. tty
> is defaultStream::instance,
> >> which we
> >> > delete in ostream_exit() in DestroyVM.
> >> >
> >> > Since this has bugging me for a while (more during
> >> > pre-initialization, but its the same problem) I opened
> >> > https://bugs.openjdk.org/browse/JDK-8292351
> <https://bugs.openjdk.org/browse/JDK-8292351>
> >> <https://bugs.openjdk.org/browse/JDK-8292351
> <https://bugs.openjdk.org/browse/JDK-8292351>>
> >> > <https://bugs.openjdk.org/browse/JDK-8292351
> <https://bugs.openjdk.org/browse/JDK-8292351>
> >> <https://bugs.openjdk.org/browse/JDK-8292351
> <https://bugs.openjdk.org/browse/JDK-8292351>>> to track this. Should
> >> > be trivial to fix.
> >>
> >> Perhaps trivial to not crash, but it means we are attempting
> various
> >> "logging" actions after they can't be done - which is a pain.
> >> Afterall
> >> we do want to see the real error here.
> >>
> >>
> >> That can happen though. As in your case, NMT telling us that the
> >> C-heap is corrupted during VM cleanup. So it is a valid scenario.
> >
> > Sure, I guess we can try and do the stream/tty deletion a bit
> later ...
> >
> >> >
> >> > As for your crash, if you comment out "delete
> >> defaultStream::instance;"
> >> > in ostream.cpp, you should be able to see the real issue.
> >>
> >> Thanks - needed to do a little more than just that but I got
> the
> >> idea.
> >> So that leads me to:
> >>
> >> # Internal Error
> >>
> (/scratch/users/daholme/jdk-dev3.git/open/src/hotspot/share/services/mallocHeader.inline.hpp:61),
> >>
> >> pid=2666, tid=2668
> >> # fatal error: NMT corruption: Block at 0x00007fba3f56ca90:
> header
> >> canary broken
> >>
> >>
> >> No hex dump?
> >
> > stdout: [NMT Block at 0x00007fba3f56ca90, corruption at:
> > 0x00007fba3f56ca90:
> > 0x00007fba3f56ca10: e0 ca 56 3f ba 7f 00 00 f8 cc 56 3f ba 7f 00 00
> > 0x00007fba3f56ca20: 0a 00 00 00 00 00 00 00 78 05 40 f0 b9 7f 00 00
> > 0x00007fba3f56ca30: 40 03 7e 27 ba 7f 00 00 08 cc 56 3f ba 7f 00 00
> > 0x00007fba3f56ca40: 50 cb 56 3f ba 7f 00 00 f0 27 6f 3d ba 7f 00 00
> > 0x00007fba3f56ca50: 01 00 00 00 00 00 00 00 90 98 02 38 ba 7f 00 00
> > 0x00007fba3f56ca60: 02 00 00 00 00 00 00 00 a0 e4 89 3e ba 7f 00 00
> > 0x00007fba3f56ca70: a0 ca 56 3f ba 7f 00 00 0a 00 00 00 0e 00 00 00
> > 0x00007fba3f56ca80: 40 03 7e 27 ba 7f 00 00 f8 cc 56 3f ba 7f 00 00
> > 0x00007fba3f56ca90: 00 00 00 00 00 00 00 00 98 d0 37 3f ba 7f 00 00
> > 0x00007fba3f56caa0: 90 98 02 38 ba 7f 00 00 80 a3 02 38 ba 7f 00 00
> > 0x00007fba3f56cab0: e0 a3 02 38 ba 7f 00 00 f0 a3 02 38 ba 7f 00 00
> > 0x00007fba3f56cac0: c8 a4 02 38 ba 7f 00 00 d8 00 00 00 00 00 00 00
> > 0x00007fba3f56cad0: 60 a6 02 38 ba 7f 00 00 a0 e4 89 3e ba 7f 00 00
> > 0x00007fba3f56cae0: 90 98 02 38 ba 7f 00 00 90 ac 02 38 ba 7f 00 00
> > 0x00007fba3f56caf0: 78 05 40 f0 b9 7f 00 00 00 00 00 00 00 00 00 00
> > 0x00007fba3f56cb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >
> > David
> > -----
> >
> >>
> >> >
> >> > Cheers,
> >> > David
> >> >
> >> > > On Mon, Aug 15, 2022 at 7:40 AM Thomas Stüfe
> >> > <thomas.stuefe at gmail.com
> <mailto:thomas.stuefe at gmail.com> <mailto:thomas.stuefe at gmail.com
> <mailto:thomas.stuefe at gmail.com>>
> >> <mailto:thomas.stuefe at gmail.com
> <mailto:thomas.stuefe at gmail.com> <mailto:thomas.stuefe at gmail.com
> <mailto:thomas.stuefe at gmail.com>>>
> >> > > <mailto:thomas.stuefe at gmail.com
> <mailto:thomas.stuefe at gmail.com>
> >> <mailto:thomas.stuefe at gmail.com
> <mailto:thomas.stuefe at gmail.com>>
> >> > <mailto:thomas.stuefe at gmail.com
> <mailto:thomas.stuefe at gmail.com>
> >> <mailto:thomas.stuefe at gmail.com
> <mailto:thomas.stuefe at gmail.com>>>>> wrote:
> >> > >
> >> > > Hi David,
> >> > >
> >> > > NMT detected a heap corruption or the
> pointer to
> >> be freed is
> >> > > invalid. And then the reporter itself crashed
> >> while
> >> > printing the
> >> > > report. That is strange, since the reporter
> >> should be
> >> > resilient
> >> > > against crashes. It uses
> os::print_hex_dump, which
> >> should use
> >> > > SafeFetch to check if the to-be-dumped info is
> >> accessible.
> >> > >
> >> > > Any more information?
> >> > >
> >> > > Cheers, Thomas
> >> > >
> >> > > On Mon, Aug 15, 2022 at 7:18 AM David Holmes
> >> > > <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>
> >> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>
> >> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>>>
> >> > <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>
> >> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>>
> >> > <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>
> >> <mailto:david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>>>>> wrote:
> >> > >
> >> > > I'm getting the below crash:
> >> > >
> >> > > #
> >> > > # A fatal error has been detected by
> the Java
> >> Runtime
> >> > Environment:
> >> > > #
> >> > > # SIGSEGV (0xb) at pc=0x00007f3e3e24b417,
> >> pid=4537,
> >> > tid=4538
> >> > > #
> >> > > # JRE version: Java(TM) SE Runtime
> Environment
> >> (20.0)
> >> > (fastdebug
> >> > > build
> >> > > 20-internal-2022-06-10-0001400.daholme...)
> >> > > # Java VM: Java HotSpot(TM) 64-Bit
> Server VM
> >> (fastdebug
> >> > > 20-internal-2022-06-10-0001400.daholme..., mixed
> >> > mode, sharing,
> >> > > tiered,
> >> > > compressed oops, compressed class
> ptrs, g1 gc,
> >> > linux-amd64)
> >> > > # Problematic frame:
> >> > > # V [libjvm.so+0x174b417]
> >> > outputStream::print_cr(char const*,
> >> > > ...)+0x67
> >> > >
> >> > > --------------- T H R E A D
> ---------------
> >> > >
> >> > > Current thread (0x00007f3e38029890):
> Thread
> >> "Unknown
> >> > thread"
> >> > > [stack:
> >> > > 0x00007f3e3f8a9000,0x00007f3e3f9aa000] [id=4538]
> >> > >
> >> > > Stack:
> >> [0x00007f3e3f8a9000,0x00007f3e3f9aa000],
> >> > > sp=0x00007f3e3f9a84b0,
> >> > > free space=1021k
> >> > > Native frames: (J=compiled Java code,
> >> j=interpreted,
> >> > Vv=VM code,
> >> > > C=native code)
> >> > > V [libjvm.so+0x174b417]
> >> outputStream::print_cr(char
> >> > const*,
> >> > > ...)+0x67
> >> > > V [libjvm.so+0x157d708]
> >> > > MallocHeader::print_block_on_error(outputStream*,
> >> > unsigned
> >> > > char*) const+0x38
> >> > > V [libjvm.so+0x157f2ef]
> >> > MallocHeader::assert_block_integrity()
> >> > > const+0x10f
> >> > > V [libjvm.so+0x157eb6b]
> >> > MallocTracker::record_free(void*)+0x3b
> >> > > V [libjvm.so+0x172d3ef]
> os::free(void*)+0xbf
> >> > > V [libjvm.so+0x1a6d77a]
> >> Thread::~Thread()+0x8a
> >> > > V [libjvm.so+0x1060292]
> >> JavaThread::~JavaThread()+0x12
> >> > > V [libjvm.so+0x1a8b174]
> >> Threads::destroy_vm()+0x264
> >> > >
> >> > > Any tips on trying to figure out what
> is going
> >> wrong?
> >> > >
> >> > > Thanks,
> >> > > David
> >> > >
> >> >
> >>
>
More information about the hotspot-runtime-dev
mailing list