RFC: call report_java_out_of_memory_error() for -XX:AbortVMOnException=java.lang.OutOfMemoryError

David Holmes david.holmes at oracle.com
Wed Sep 8 22:08:07 UTC 2021


On 9/09/2021 4:48 am, Liu, Xin wrote:
> hi, Volker,
> 
> I think it's possible to allow OnError=jcmd to attach to the parent
> process. HotSpot defines OnError as "Run user-defined commands on fatal
> error; see VMError.cpp for examples". It's a callback for fatal errors.
> fatal() means HotSpot starts aborting and does not notify other threads.
> In other words, the other threads are in normal states. I observed this in
> gdb: all other threads successfully reach a safepoint-safe state
> except the main Java thread.
> 
> I am not advocating using jcmd %p in OnError. I am exploring a
> possibility. Currently, we end up with a deadlock if you do so. If we
> made it work, we would at least get more information from HotSpot. We would
> fail fast if something bad happens. I think failing fast is still better
> than a hanging process.

It may be possible to fix the safepoint deadlock by transitioning the 
thread executing the OnError command to _thread_in_native beforehand - 
but there are constraints on doing that, e.g. no locks can be held. And 
if the error is processed in the VMThread then there is nothing that can 
be done.
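
To make the constraint above concrete, here is a toy Java model of the handshake. It is only a sketch and not HotSpot code: the class, enum, and method names are invented for illustration, and the "safepoint" is just a spin loop that waits for the reporting thread to be marked IN_NATIVE. It shows why a reporter that blocks while still counted as in the VM starves the attached command, while one that transitions first does not.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Toy model only: none of these names are HotSpot APIs. */
final class SafepointToyModel {

    /** Hypothetical stand-ins for _thread_in_vm and _thread_in_native. */
    enum State { IN_VM, IN_NATIVE }

    /**
     * Returns true if the simulated OnError command finished before the timeout.
     * transitionToNative models moving the reporting thread to a safepoint-safe
     * state before it blocks on the child command.
     */
    static boolean onErrorCommandCompletes(boolean transitionToNative, long timeoutMillis) {
        AtomicReference<State> reporter = new AtomicReference<>(State.IN_VM);
        CountDownLatch commandDone = new CountDownLatch(1);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);

        // Stand-in for 'jcmd %p Thread.print': its request is serviced only once
        // the reporting thread is no longer counted as running inside the VM.
        Thread command = new Thread(() -> {
            while (reporter.get() == State.IN_VM) {
                if (System.nanoTime() - deadline >= 0) {
                    return; // give up, loosely modelling a safepoint timeout
                }
                Thread.onSpinWait();
            }
            commandDone.countDown();
        });
        command.setDaemon(true);

        if (transitionToNative) {
            reporter.set(State.IN_NATIVE); // the proposed transition, done beforehand
        }
        command.start();

        // The reporting thread now blocks on the child, as it would while
        // waiting for the forked OnError command to exit.
        try {
            return commandDone.await(2 * timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("reporter blocked in VM:     "
                + onErrorCommandCompletes(false, 200));
        System.out.println("reporter blocked in native: "
                + onErrorCommandCompletes(true, 200));
    }
}
```

The model deliberately ignores the hard parts mentioned above (locks held by the reporting thread, and errors raised on the VMThread itself), which are exactly the cases where the transition is not available.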

But I agree with Volker that given a fatal error has been encountered, 
trying to report other information about the VM in a live manner is 
fraught with peril. Using a core dump for post mortem analysis would 
probably be better.

Cheers,
David

> thanks,
> --lx
> 
> 
> 
> 
> On 9/8/21 10:35 AM, Volker Simonis wrote:
>> I'm not sure if running a jcmd process which attaches to the dying VM
>> as part of the OnError scripts is a use case we really want to
>> support?
>>
>> There's a reason why the VM is crashing and attaching to this dying VM
>> will most probably only cause other follow-up errors.
>>
>> On Wed, Sep 8, 2021 at 7:22 PM Liu, Xin <xxinliu at amazon.com> wrote:
>>>
>>> Hi, David,
>>>
>>> Thanks for the heads-up. Yes, it works for me.
>>>
>>> There's one more thing. One drawback is that the script provided to
>>> OnError can't call back into HotSpot itself, or we end up with a deadlock.
>>>
>>>
>>> If we use 'jcmd %p Thread.print' or 'jcmd %p GC.heap_dump <file>' in
>>> OnError= (%p means the java process itself), the main Java thread, which
>>> is waiting for os::fork_and_exec(cmd), prevents HotSpot from reaching the
>>> safepoint. It's a deadlock because without the safepoint fork_and_exec
>>> can't complete.
>>>
>>> eg.
>>> $java -Xmx50m -XX:AbortVMOnException=java.lang.OutOfMemoryError
>>> -XX:OnError='jcmd %p Thread.print' -XX:+SafepointTimeout OomDumpExample
>>> direct
>>> # To suppress the following error report, specify this argument
>>> # after -XX: or in .hotspotrc:  SuppressErrorAt=/exceptions.cpp:541
>>> #
>>> # A fatal error has been detected by the Java Runtime Environment:
>>> #
>>> #  Internal Error
>>> (/home/xxinliu/Devel/jdk/src/hotspot/share/utilities/exceptions.cpp:541), pid=107552,
>>> tid=107553
>>> #  fatal error: Saw java.lang.OutOfMemoryError, aborting
>>> #
>>> # JRE version: OpenJDK Runtime Environment (18.0) (slowdebug build
>>> 18-internal+0-adhoc.xxinliu.jdk)
>>> # Java VM: OpenJDK 64-Bit Server VM (slowdebug
>>> 18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops,
>>> compressed class ptrs, g1 gc, linux-amd64)
>>> # Problematic frame:
>>> # V  [libjvm.so+0x924e8c]  Exceptions::debug_check_abort(char const*,
>>> char const*)+0x8a
>>> #
>>> # No core dump will be written. Core dumps have been disabled. To enable
>>> core dumping, try "ulimit -c unlimited" before starting Java again
>>> #
>>> # An error report file with more information is saved as:
>>> # /local/home/xxinliu/JDK-2085/hs_err_pid107552.log
>>> #
>>> # If you would like to submit a bug report, please visit:
>>> #   https://bugreport.java.com/bugreport/crash.jsp
>>> #
>>> #
>>> # -XX:OnError="jcmd %p Thread.print"
>>> #   Executing /bin/sh -c "jcmd 107552 Thread.print" ...
>>> 107552:
>>> [13.045s][warning][safepoint]
>>> [13.045s][warning][safepoint] # SafepointSynchronize::begin: Timeout
>>> detected:
>>> [13.045s][warning][safepoint] # SafepointSynchronize::begin: Timed out
>>> while spinning to reach a safepoint.
>>> [13.045s][warning][safepoint] # SafepointSynchronize::begin: Threads
>>> which did not reach the safepoint:
>>> [13.045s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=1552.12ms
>>> elapsed=13.04s tid=0x00007f43600278e0 nid=107553 runnable
>>> [0x00007f4369d9f000]
>>> [13.045s][warning][safepoint]    java.lang.Thread.State: RUNNABLE
>>> [13.045s][warning][safepoint] Thread: 0x00007f43600278e0  [0x1a421]
>>> State: _running _at_poll_safepoint 0
>>> [13.045s][warning][safepoint]    JavaThread state: _thread_in_vm
>>> [13.045s][warning][safepoint]
>>> [13.045s][warning][safepoint] # SafepointSynchronize::begin: (End of list)
>>>
>>>
>>> I haven't figured out how yet, but I think I can lift this constraint.
>>> Once I do, OnError would have more freedom to dump threads or the heap
>>> before dying. Can I file a bug about this?
>>>
>>> thanks,
>>> --lx
>>>
>>>
>>> On 8/30/21 9:26 PM, David Holmes wrote:
>>>> Hi,
>>>>
>>>> On 28/08/2021 4:54 am, Liu, Xin wrote:
>>>>> Hi,
>>>>>
>>>>> Recently I revisited JDK-8155004/JDK-8257790 because a new team tripped over it.
>>>>> -XX:AbortVMOnException=java.lang.OutOfMemoryError works. I wonder
>>>>> whether it is a good idea to call report_java_out_of_memory_error() when
>>>>> OOME is trapped. In this way, HotSpot will trigger OnOutOfMemoryError
>>>>> callbacks.
>>>>
>>>> Why not just use AbortVMOnException together with OnError to get the
>>>> callbacks?
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>>> I understand JDK-8257790 is not a bug. I don't want to overturn that
>>>>> conclusion. I just wonder if we can handle it better in the presence of
>>>>> -XX:AbortVMOnException=java.lang.OutOfMemoryError.
>>>>>
>>>>> For Java webservers, OOME may lead to a zombie process. We may have a
>>>>> bug in code or indeed run out of memory. The OOME is suppressed or terminates
>>>>> the thread but doesn't terminate the java process. eg.
>>>>>
>>>>> public class Main {
>>>>>       volatile static boolean done = false;
>>>>>
>>>>>       public static void main(String[] args) {
>>>>>           String msg = "a long long message.";
>>>>>           // write your code here
>>>>>           Runnable runnable = () -> {
>>>>>               int cnt = Integer.MAX_VALUE / msg.length() + 1;
>>>>>               // This throws an OutOfMemoryError.
>>>>>               msg.repeat(cnt);
>>>>>               done = true;
>>>>>           };
>>>>>
>>>>>           Thread thread = new Thread(runnable);
>>>>>           thread.start();
>>>>>           while(!done) {
>>>>>           } // this simulates the main loop of event handling
>>>>>       }
>>>>> }
>>>>>
>>>>> Java developers can use
>>>>> -XX:AbortVMOnException=java.lang.OutOfMemoryError to exercise the fail-fast
>>>>> principle. Java web applications that handle traffic are usually
>>>>> distributed across a cluster. The failure of a single host is usually not a
>>>>> big deal. As long as java exits, it's easy to restart and backfill it.
>>>>>
>>>>> My proposed change is very simple. Just call
>>>>> report_java_out_of_memory() if value_string is OOME. It's a no-op if users
>>>>> never specify anything. If they do specify flags like
>>>>> Crash/ExitOnOutOfMemory, OnOutOfMemoryError or
>>>>> HeapDumpOnOutOfMemoryError, HotSpot will let report_java_out_of_memory
>>>>> do the cleanup job. fatal() works but is too brutal. I think we should
>>>>> let java exit with an error code.
>>>>>
>>>>>
>>>>> diff --git a/src/hotspot/share/utilities/exceptions.cpp
>>>>> b/src/hotspot/share/utilities/exceptions.cpp
>>>>> index bd95b8306be..fd8a83deaf3 100644
>>>>> --- a/src/hotspot/share/utilities/exceptions.cpp
>>>>> +++ b/src/hotspot/share/utilities/exceptions.cpp
>>>>> @@ -538,6 +538,9 @@ void Exceptions::debug_check_abort(const char
>>>>> *value_string, const char* message
>>>>>          strstr(value_string, AbortVMOnException)) {
>>>>>        if (AbortVMOnExceptionMessage == NULL || (message != NULL &&
>>>>>            strstr(message, AbortVMOnExceptionMessage))) {
>>>>> +      if(!strcmp(value_string, "java.lang.OutOfMemoryError")) {
>>>>> +        report_java_out_of_memory(message);
>>>>> +      }
>>>>>          fatal("Saw %s, aborting", value_string);
>>>>>        }
>>>>>      }
>>>>>
>>>>>
>>>>> thanks,
>>>>> --lx
>>>>>
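
The zombie-process scenario in the original proposal quoted above can be observed directly with a small variation of that Main example. This is only a sketch: the class and method names are made up here, and it swaps the busy loop for join() plus an uncaught-exception handler so that the outcome can be inspected. It relies on the same oversized String.repeat call to raise the OutOfMemoryError.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Illustrative variant of the Main example; names are hypothetical. */
final class OomeKillsThreadOnly {

    /** Runs the oversized repeat on a worker thread and returns what killed it, or null. */
    static Throwable runWorker() {
        AtomicReference<Throwable> cause = new AtomicReference<>();
        String msg = "a long long message.";

        Thread worker = new Thread(() -> {
            int cnt = Integer.MAX_VALUE / msg.length() + 1;
            // The requested length exceeds the maximum String size, so this throws.
            msg.repeat(cnt);
        });
        // Record the throwable instead of letting the default handler print it.
        worker.setUncaughtExceptionHandler((t, e) -> cause.set(e));
        worker.start();

        try {
            worker.join(); // the worker terminates; the JVM does not
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return cause.get();
    }

    public static void main(String[] args) {
        Throwable t = runWorker();
        System.out.println("worker died with: " + t);
        System.out.println("main thread is still running");
    }
}
```

Run without any -XX flags, only the worker thread goes away and the process carries on, which is the behaviour the fail-fast options discussed in this thread are meant to replace.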


More information about the hotspot-runtime-dev mailing list