<!DOCTYPE html>

<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>Hi again,</p>

    <p>I just realized that I made a typo in the reproduction repository

      link. This is the right one:</p>

    <p>   

      <a class="moz-txt-link-freetext" href="https://github.com/atorrescogollo/poc-jdk-sigabrt-coredump-bug">https://github.com/atorrescogollo/poc-jdk-sigabrt-coredump-bug</a></p>

    <p>Sorry about that.</p>

    <p>Álvaro</p>

    <p><br>

    </p>

    <div class="moz-cite-prefix">On 12/2/26 18:04, Álvaro Torres Cogollo

      wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:c9a85c1c-103a-43b0-92de-2bdee85a33eb@gmail.com">


      Hi,<br>

      <br>

      We've been hitting a problem in production that I think might be a

      bug in hotspot's signal handling. Let me know if this should go

      somewhere else.<br>

      <br>

      The issue is that when a native library crashes due to memory

      corruption (like an invalid free() call), the JVM exits

      immediately without generating any core dump or error report, even

      though we have -XX:+CreateCoredumpOnCrash enabled.<br>

      <br>

      Here's what we're seeing when it crashes:

      <pre>    munmap_chunk(): invalid pointer</pre>

      <br>

      Or when using tcmalloc:

      <pre>    src/tcmalloc.cc:333] Attempt to free invalid pointer 0xffff38000b60</pre>

      <br>

      We're running with:

      <pre>    JAVA_TOOL_OPTIONS=-XX:+CreateCoredumpOnCrash -XX:ErrorFile=/core-dumps/hs_err_pid%p.log</pre>

      <br>

      But when these crashes happen, we get nothing - just the error

      message above and the process dies. This makes debugging really

      difficult, especially since the crashes happen randomly in

      production.<br>

      <br>

      After digging through the hotspot source, I noticed that signal

      handlers are installed for SIGSEGV, SIGBUS, SIGFPE, etc., but not

      for SIGABRT:<br>

      <br>

          <a class="moz-txt-link-freetext"

href="https://github.com/openjdk/jdk/blob/37dc1be67d4c15a040dc99dbc105c3269c65063d/src/hotspot/os/posix/signals_posix.cpp#L1352-L1358"

        moz-do-not-send="true">https://github.com/openjdk/jdk/blob/37dc1be67d4c15a040dc99dbc105c3269c65063d/src/hotspot/os/posix/signals_posix.cpp#L1352-L1358</a><br>

      <br>

      When glibc detects the memory corruption, it calls abort() which

      raises SIGABRT. Since there's no handler for it, the JVM can't

      catch it and generate the diagnostics.<br>

      <br>

      To demonstrate the issue, I put together a small reproduction

      case:<br>

      <br>

          <a class="moz-txt-link-freetext"

href="https://github.com/atorrescogollo/poc-jdk-sigabrt-coredump-handling"

        moz-do-not-send="true">https://github.com/atorrescogollo/poc-jdk-sigabrt-coredump-handling</a><br>

      <br>

      The repo has a Spring Boot app with three endpoints that show the

      problem:<br>

      <br>

      1. /crash/unsafe - Uses Java Unsafe to write to address 0<br>

         Result: SIGSEGV -> Works correctly, generates hs_err file<br>

      <br>

      2. /crash/null - JNI code that dereferences a null pointer<br>

         Result: SIGSEGV -> Works correctly, generates hs_err file<br>

      <br>

      3. /crash/free - JNI code that calls free() on a stack variable<br>

         Result: SIGABRT -> BROKEN, just prints "munmap_chunk():

      invalid pointer" and dies<br>

      <br>

      You can reproduce it with:

      <pre>    docker-compose up -d

    curl localhost:8080/crash/free

    docker-compose logs</pre>

      <br>

      You'll see that it just prints the error and exits; no hs_err file

      or core dump is created.<br>

      <br>

      I also tested a potential fix by adding SIGABRT handling to

      hotspot. With that change, scenario 3 correctly generates an

      hs_err file and core dump. The patch, linked below, does two things:<br>

      <p>    <a class="moz-txt-link-freetext"

href="https://github.com/atorrescogollo/poc-jdk-sigabrt-coredump-bug/blob/main/jdk17.patch"

          moz-do-not-send="true">https://github.com/atorrescogollo/poc-jdk-sigabrt-coredump-bug/blob/main/jdk17.patch</a></p>

      - Adds set_signal_handler(SIGABRT) in signals_posix.cpp<br>

      - Resets SIGABRT to SIG_DFL before calling abort() in os_posix.cpp

      to avoid recursive handling<br>

      <br>

      After applying it, the /crash/free endpoint generates proper

      diagnostics:

      <pre>    # SIGABRT (0x6) at pc=0x0000ffffbd177608 (sent by kill), pid=1, tid=41

    # Problematic frame:

    # C  [libc.so.6+0x87608]

    # Core dump will be written. Default location: //core

    # An error report file with more information is saved as:

    # /core-dumps/java_error1.log</pre>

      <br>

      I'm not sure if there's a specific reason why SIGABRT isn't

      handled currently. If there is, are there any alternative

      approaches to capture diagnostics when native libraries trigger

      abort()? For us and probably others dealing with native library

      bugs in production, having some way to get these diagnostics would

      be really valuable.<br>

      <br>

      Thanks,<br>

      <br>

      Álvaro<br>

      <br>

    </blockquote>

  </body>

</html>