RFC: call report_java_out_of_memory_error() for -XX:AbortVMOnException=java.lang.OutOfMemoryError

Liu, Xin xxinliu at amazon.com
Wed Sep 8 18:48:06 UTC 2021


Hi Volker,

I think it's possible to allow OnError=jcmd to attach to the parent
process. HotSpot defines OnError as "Run user-defined commands on fatal
error; see VMError.cpp for examples". It's a callback for fatal errors.
When fatal() is called, HotSpot starts aborting on the calling thread
and does not notify the other threads. In other words, the other threads
are still in a normal state. I have observed this in gdb: all threads
except the main Java thread successfully reach a safepoint-safe state.

I am not advocating using jcmd %p in OnError; I am exploring a
possibility. Currently, doing so ends in a deadlock. If we made it
work, we would at least get more information out of HotSpot, and we
would still fail fast if something bad happens. I think failing fast is
still better than a hanging process.
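
To make the circular wait concrete, here is a toy model in plain Java. It
is only a sketch under stated assumptions: the class name, the two latches
and the mainBlocksSafepoint switch are invented for illustration and are
not HotSpot code. One latch stands for "the main thread is safepoint-safe",
the other for "the OnError command has exited". The second call only shows
what lifting the constraint would mean inside the model; it is not a
proposed fix.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Toy model only: the latches stand in for HotSpot state; nothing here is HotSpot API.
class OnErrorDeadlockModel {

    // Returns true if the modeled OnError command finished within timeoutMs.
    static boolean run(boolean mainBlocksSafepoint, long timeoutMs) {
        CountDownLatch mainIsSafepointSafe = new CountDownLatch(1);
        CountDownLatch commandDone = new CountDownLatch(1);

        // Models 'jcmd %p Thread.print': its request needs a safepoint,
        // so it cannot finish until the main thread is safepoint-safe.
        Thread command = new Thread(() -> {
            try {
                mainIsSafepointSafe.await();
                commandDone.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        command.setDaemon(true);
        command.start();

        // Models the main thread waiting for the command to exit.
        // Hypothetical variant: become safepoint-safe before waiting.
        if (!mainBlocksSafepoint) {
            mainIsSafepointSafe.countDown();
        }
        try {
            return commandDone.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            command.interrupt(); // release the modeled command if it is still stuck
        }
    }

    public static void main(String[] args) {
        System.out.println("main blocks safepoint:  finished=" + run(true, 200));
        System.out.println("main is safepoint-safe: finished=" + run(false, 2000));
    }
}
```

In the first call each side waits for the other, so the wait should only
end through the timeout, which plays the role of SafepointTimeout here. In
the second call the command can complete because the waiting thread no
longer holds up the modeled safepoint.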

thanks,
--lx




On 9/8/21 10:35 AM, Volker Simonis wrote:
> I'm not sure if running a jcmd process which attaches to the dying VM
> as part of the OnError scripts is a use case we really want to
> support?
> 
> There's a reason why the VM is crashing and attaching to this dying VM
> will most probably only cause other follow-up errors.
> 
> On Wed, Sep 8, 2021 at 7:22 PM Liu, Xin <xxinliu at amazon.com> wrote:
>>
>> Hi, David,
>>
>> Thanks for the heads-up. Yes, it works for me.
>>
>> There's one more thing. One drawback is that the script provided to
>> OnError can't call back into HotSpot itself, or we end up with a deadlock.
>>
>>
>> If we use 'jcmd %p Thread.print' or 'jcmd %p GC.heap_dump <file>' in
>> OnError= (%p expands to the pid of the java process itself), the main
>> Java thread, which is waiting in os::fork_and_exec(cmd), prevents
>> HotSpot from reaching a safepoint. It's a deadlock: jcmd needs the
>> safepoint to finish, and fork_and_exec can't complete until jcmd exits.
>>
>> e.g.
>> $java -Xmx50m -XX:AbortVMOnException=java.lang.OutOfMemoryError
>> -XX:OnError='jcmd %p Thread.print' -XX:+SafepointTimeout OomDumpExample
>> direct
>> # To suppress the following error report, specify this argument
>> # after -XX: or in .hotspotrc:  SuppressErrorAt=/exceptions.cpp:541
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  Internal Error
>> (/home/xxinliu/Devel/jdk/src/hotspot/share/utilities/exceptions.cpp:541), pid=107552,
>> tid=107553
>> #  fatal error: Saw java.lang.OutOfMemoryError, aborting
>> #
>> # JRE version: OpenJDK Runtime Environment (18.0) (slowdebug build
>> 18-internal+0-adhoc.xxinliu.jdk)
>> # Java VM: OpenJDK 64-Bit Server VM (slowdebug
>> 18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops,
>> compressed class ptrs, g1 gc, linux-amd64)
>> # Problematic frame:
>> # V  [libjvm.so+0x924e8c]  Exceptions::debug_check_abort(char const*,
>> char const*)+0x8a
>> #
>> # No core dump will be written. Core dumps have been disabled. To enable
>> core dumping, try "ulimit -c unlimited" before starting Java again
>> #
>> # An error report file with more information is saved as:
>> # /local/home/xxinliu/JDK-2085/hs_err_pid107552.log
>> #
>> # If you would like to submit a bug report, please visit:
>> #   https://bugreport.java.com/bugreport/crash.jsp
>> #
>> #
>> # -XX:OnError="jcmd %p Thread.print"
>> #   Executing /bin/sh -c "jcmd 107552 Thread.print" ...
>> 107552:
>> [13.045s][warning][safepoint]
>> [13.045s][warning][safepoint] # SafepointSynchronize::begin: Timeout
>> detected:
>> [13.045s][warning][safepoint] # SafepointSynchronize::begin: Timed out
>> while spinning to reach a safepoint.
>> [13.045s][warning][safepoint] # SafepointSynchronize::begin: Threads
>> which did not reach the safepoint:
>> [13.045s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=1552.12ms
>> elapsed=13.04s tid=0x00007f43600278e0 nid=107553 runnable
>> [0x00007f4369d9f000]
>> [13.045s][warning][safepoint]    java.lang.Thread.State: RUNNABLE
>> [13.045s][warning][safepoint] Thread: 0x00007f43600278e0  [0x1a421]
>> State: _running _at_poll_safepoint 0
>> [13.045s][warning][safepoint]    JavaThread state: _thread_in_vm
>> [13.045s][warning][safepoint]
>> [13.045s][warning][safepoint] # SafepointSynchronize::begin: (End of list)
>>
>>
>> I haven't figured out how yet, but I think I can lift this constraint.
>> Once I do, OnError would have more freedom to dump threads or the heap
>> before dying. Can I file a bug about this?
>>
>> thanks,
>> --lx
>>
>>
>> On 8/30/21 9:26 PM, David Holmes wrote:
>>> Hi,
>>>
>>> On 28/08/2021 4:54 am, Liu, Xin wrote:
>>>> Hi,
>>>>
>>>> I recently revisited JDK-8155004/JDK-8257790 because a new team
>>>> tripped over it. -XX:AbortVMOnException=java.lang.OutOfMemoryError
>>>> works. I wonder whether it is a good idea to call
>>>> report_java_out_of_memory() when an OOME is trapped. That way,
>>>> HotSpot would trigger the OnOutOfMemoryError callbacks.
>>>
>>> Why not just use AbortVMOnException together with OnError to get the
>>> callbacks?
>>>
>>> Cheers,
>>> David
>>>
>>>> I understand JDK-8257790 is not a bug, and I don't want to overturn
>>>> that conclusion. I just wonder if we can handle it better in the
>>>> presence of -XX:AbortVMOnException=java.lang.OutOfMemoryError.
>>>>
>>>> For Java web servers, an OOME may lead to a zombie process. We may have
>>>> a bug in the code or really run out of memory. Either way, the OOME is
>>>> swallowed or terminates the thread but doesn't terminate the java
>>>> process, e.g.
>>>>
>>>> public class Main {
>>>>      volatile static boolean done = false;
>>>>
>>>>      public static void main(String[] args) {
>>>>          String msg = "a long long message.";
>>>>          Runnable runnable = () -> {
>>>>              int cnt = Integer.MAX_VALUE / msg.length() + 1;
>>>>              // The requested length exceeds the String limit, so
>>>>              // repeat() throws an OutOfMemoryError on this thread.
>>>>              msg.repeat(cnt);
>>>>              done = true;
>>>>          };
>>>>
>>>>          Thread thread = new Thread(runnable);
>>>>          thread.start();
>>>>          while(!done) {
>>>>          } // this simulates the main loop of event handling
>>>>      }
>>>> }
>>>>
>>>> Java developers can use
>>>> -XX:AbortVMOnException=java.lang.OutOfMemoryError to apply the
>>>> fail-fast principle. Java web applications that handle traffic are
>>>> usually distributed across a cluster. The failure of a single host is
>>>> usually not a big deal. As long as java exits, it's easy to restart
>>>> and backfill it.
>>>>
>>>> My proposed change is very simple: just call
>>>> report_java_out_of_memory() if value_string is OOME. It's a no-op if
>>>> users never specify anything. If they do specify flags like
>>>> Crash/ExitOnOutOfMemoryError, OnOutOfMemoryError or
>>>> HeapDumpOnOutOfMemoryError, HotSpot will let
>>>> report_java_out_of_memory() do the cleanup job. fatal() works but is
>>>> too brutal. I think we should let java exit with an error code.
>>>>
>>>>
>>>> diff --git a/src/hotspot/share/utilities/exceptions.cpp
>>>> b/src/hotspot/share/utilities/exceptions.cpp
>>>> index bd95b8306be..fd8a83deaf3 100644
>>>> --- a/src/hotspot/share/utilities/exceptions.cpp
>>>> +++ b/src/hotspot/share/utilities/exceptions.cpp
>>>> @@ -538,6 +538,9 @@ void Exceptions::debug_check_abort(const char
>>>> *value_string, const char* message
>>>>         strstr(value_string, AbortVMOnException)) {
>>>>       if (AbortVMOnExceptionMessage == NULL || (message != NULL &&
>>>>           strstr(message, AbortVMOnExceptionMessage))) {
>>>> +      if(!strcmp(value_string, "java.lang.OutOfMemoryError")) {
>>>> +        report_java_out_of_memory(message);
>>>> +      }
>>>>         fatal("Saw %s, aborting", value_string);
>>>>       }
>>>>     }
>>>>
>>>>
>>>> thanks,
>>>> --lx
>>>>


More information about the hotspot-runtime-dev mailing list