RFC: call report_java_out_of_memory_error() for -XX:AbortVMOnException=java.lang.OutOfMemoryError

Liu, Xin xxinliu at amazon.com
Wed Sep 8 17:21:25 UTC 2021


Hi, David,

Thanks for the heads-up. Yes, it works for me.

There's one more thing. One drawback is that the command passed to
OnError can't target the HotSpot process itself, or we end up with a deadlock.


If we use 'jcmd %p Thread.print' or 'jcmd %p GC.heap_dump <file>' in
OnError= (%p expands to the pid of the Java process itself), the main Java
thread, which is blocked in os::fork_and_exec(cmd) waiting for the command
to finish, prevents HotSpot from reaching a safepoint. That is a deadlock:
the jcmd operation cannot run without a safepoint, so the command never
completes and fork_and_exec() never returns.

E.g.:
$ java -Xmx50m -XX:AbortVMOnException=java.lang.OutOfMemoryError -XX:OnError='jcmd %p Thread.print' -XX:+SafepointTimeout OomDumpExample direct
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/exceptions.cpp:541
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/xxinliu/Devel/jdk/src/hotspot/share/utilities/exceptions.cpp:541), pid=107552, tid=107553
#  fatal error: Saw java.lang.OutOfMemoryError, aborting
#
# JRE version: OpenJDK Runtime Environment (18.0) (slowdebug build 18-internal+0-adhoc.xxinliu.jdk)
# Java VM: OpenJDK 64-Bit Server VM (slowdebug 18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x924e8c]  Exceptions::debug_check_abort(char const*, char const*)+0x8a
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /local/home/xxinliu/JDK-2085/hs_err_pid107552.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
#
# -XX:OnError="jcmd %p Thread.print"
#   Executing /bin/sh -c "jcmd 107552 Thread.print" ...
107552:
[13.045s][warning][safepoint]
[13.045s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected:
[13.045s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.
[13.045s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint:
[13.045s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=1552.12ms elapsed=13.04s tid=0x00007f43600278e0 nid=107553 runnable  [0x00007f4369d9f000]
[13.045s][warning][safepoint]    java.lang.Thread.State: RUNNABLE
[13.045s][warning][safepoint] Thread: 0x00007f43600278e0  [0x1a421] State: _running _at_poll_safepoint 0
[13.045s][warning][safepoint]    JavaThread state: _thread_in_vm
[13.045s][warning][safepoint]
[13.045s][warning][safepoint] # SafepointSynchronize::begin: (End of list)
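
The cycle can be reduced to a toy model. This C++ sketch is not HotSpot code: one thread stands in for jcmd, whose operation needs a safepoint, and the calling thread stands in for the crashing thread blocked in os::fork_and_exec(), which therefore never reaches that safepoint. run_safepoint_deadlock_model is a hypothetical name, and the bounded waits exist only so the model terminates and reports the cycle instead of hanging; in the log above, -XX:+SafepointTimeout likewise just reports the thread that never reached the safepoint.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Toy model, not HotSpot code. Returns true when the cycle occurs: the
// "jcmd" side gives up waiting for a safepoint, and the crashing side
// never sees the command finish.
bool run_safepoint_deadlock_model(std::chrono::milliseconds timeout) {
  std::mutex m;
  std::condition_variable cv;
  bool reached_safepoint = false;  // never set: the crashing thread is stuck
  bool command_finished = false;
  bool safepoint_timed_out = false;

  // Plays "jcmd %p Thread.print": its operation needs a safepoint to run.
  std::thread command([&] {
    std::unique_lock<std::mutex> lock(m);
    // Bounded wait so the model terminates instead of hanging forever.
    if (cv.wait_for(lock, timeout, [&] { return reached_safepoint; })) {
      command_finished = true;
    } else {
      safepoint_timed_out = true;
    }
    cv.notify_all();
  });

  // Plays the crashing thread in os::fork_and_exec(): it blocks on the
  // child and, being blocked, never sets reached_safepoint.
  bool finished;
  {
    std::unique_lock<std::mutex> lock(m);
    finished = cv.wait_for(lock, timeout * 2, [&] { return command_finished; });
  }
  command.join();
  return !finished && safepoint_timed_out;
}
```

Called as run_safepoint_deadlock_model(std::chrono::milliseconds(50)), it returns true after roughly 100 ms, because nothing ever sets reached_safepoint.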


I haven't figured out how yet, but I think I can lift this constraint.
Once I do, OnError will have more freedom to dump threads or the heap
before the VM dies. Can I file a bug for this?

thanks,
--lx


On 8/30/21 9:26 PM, David Holmes wrote:
> Hi,
> 
> On 28/08/2021 4:54 am, Liu, Xin wrote:
>> Hi,
>>
>> Recently I revisited JDK-8155004/JDK-8257790 because a new team tripped
>> over it. -XX:AbortVMOnException=java.lang.OutOfMemoryError works, but I
>> wonder whether it would be a good idea to call report_java_out_of_memory()
>> when the OOME is trapped. That way, HotSpot would also trigger the
>> OnOutOfMemoryError callbacks.
> 
> Why not just use AbortVMOnException together with OnError to get the
> callbacks?
> 
> Cheers,
> David
> 
>> I understand JDK-8257790 is not a bug. I don't want to overturn that
>> conclusion. I just wonder if we can handle it better in the presence of
>> -XX:AbortVMOnException=java.lang.OutOfMemoryError.
>>
>> For Java web servers, an OOME may leave behind a zombie process. Whether
>> we have a bug in our code or have genuinely run out of memory, the OOME
>> is either swallowed or terminates only the thread that threw it, not the
>> Java process. E.g.:
>>
>> public class Main {
>>     static volatile boolean done = false;
>>
>>     public static void main(String[] args) {
>>         String msg = "a long long message.";
>>         Runnable runnable = () -> {
>>             // msg.length() * cnt exceeds Integer.MAX_VALUE, so
>>             // String.repeat() throws an OutOfMemoryError.
>>             int cnt = Integer.MAX_VALUE / msg.length() + 1;
>>             msg.repeat(cnt);
>>             done = true; // never reached: the OOME kills this thread
>>         };
>>
>>         Thread thread = new Thread(runnable);
>>         thread.start();
>>         while (!done) {
>>             // Spins forever; this simulates the main event-handling loop.
>>         }
>>     }
>> }
>>
>> Java developers can use
>> -XX:AbortVMOnException=java.lang.OutOfMemoryError to apply the fail-fast
>> principle. Java web applications that serve traffic are usually deployed
>> in a cluster, where the failure of a single host is usually not a big
>> deal. As long as the java process exits, it's easy to restart and
>> backfill it.
>>
>> My proposed change is very simple: call report_java_out_of_memory() if
>> value_string is java.lang.OutOfMemoryError. It's a no-op if users don't
>> specify any OOME-related flags. If they do specify flags like
>> CrashOnOutOfMemoryError/ExitOnOutOfMemoryError, OnOutOfMemoryError or
>> HeapDumpOnOutOfMemoryError, HotSpot will let report_java_out_of_memory()
>> do the cleanup job. fatal() works but is too brutal; I think we should
>> let java exit with an error code.
>>
>>
>> diff --git a/src/hotspot/share/utilities/exceptions.cpp b/src/hotspot/share/utilities/exceptions.cpp
>> index bd95b8306be..fd8a83deaf3 100644
>> --- a/src/hotspot/share/utilities/exceptions.cpp
>> +++ b/src/hotspot/share/utilities/exceptions.cpp
>> @@ -538,6 +538,9 @@ void Exceptions::debug_check_abort(const char *value_string, const char* message
>>         strstr(value_string, AbortVMOnException)) {
>>       if (AbortVMOnExceptionMessage == NULL || (message != NULL &&
>>           strstr(message, AbortVMOnExceptionMessage))) {
>> +      if(!strcmp(value_string, "java.lang.OutOfMemoryError")) {
>> +        report_java_out_of_memory(message);
>> +      }
>>         fatal("Saw %s, aborting", value_string);
>>       }
>>     }
>>
>>
>> thanks,
>> --lx
>>
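
To make the quoted proposal concrete, here is a standalone C++ model of the check in Exceptions::debug_check_abort() with the proposed hook added. It is a sketch, not HotSpot code: AbortFlags and check_abort_model are hypothetical names, the two flags are passed in explicitly instead of being VM globals, and the outer null checks are an assumption, since the diff context shows only the last line of that condition. Instead of acting, the model returns the names of the calls the VM would make, in order.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-ins for -XX:AbortVMOnException and
// -XX:AbortVMOnExceptionMessage; nullptr means "flag not set".
struct AbortFlags {
  const char* on_exception = nullptr;
  const char* on_exception_message = nullptr;
};

// Model of debug_check_abort() with the proposed OOME hook. Returns the
// names of the calls that would be made, in order (empty: no abort).
std::vector<std::string> check_abort_model(const AbortFlags& flags,
                                           const char* value_string,
                                           const char* message) {
  std::vector<std::string> calls;
  // Assumed outer condition: the exception name must contain the flag value.
  if (flags.on_exception == nullptr || value_string == nullptr ||
      std::strstr(value_string, flags.on_exception) == nullptr) {
    return calls;
  }
  // If a message filter is set, the message must contain it.
  if (flags.on_exception_message != nullptr &&
      (message == nullptr ||
       std::strstr(message, flags.on_exception_message) == nullptr)) {
    return calls;
  }
  // The proposed addition: let the OOME machinery run before fatal().
  if (std::strcmp(value_string, "java.lang.OutOfMemoryError") == 0) {
    calls.push_back("report_java_out_of_memory");
  }
  calls.push_back("fatal");
  return calls;
}
```

With -XX:AbortVMOnException=java.lang.OutOfMemoryError and an OOME, the model yields report_java_out_of_memory followed by fatal; any other matching exception yields only fatal. In the real VM, report_java_out_of_memory() may itself end the process when ExitOnOutOfMemoryError or CrashOnOutOfMemoryError is set, in which case fatal() is never reached.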


More information about the hotspot-runtime-dev mailing list