RFC: call report_java_out_of_memory_error() for -XX:AbortVMOnException=java.lang.OutOfMemoryError
Liu, Xin
xxinliu at amazon.com
Wed Sep 8 17:21:25 UTC 2021
Hi, David,
Thanks for the heads-up. Yes, it works for me.
There's one more thing. One drawback is that the script provided to
OnError can't call back into hotspot itself, or we end up with a deadlock.
If we use 'jcmd %p Thread.print' or 'jcmd %p GC.heap_dump <file>' in
OnError= (%p means the java process itself), the main java thread, which
is waiting for os::fork_and_exec(cmd), prevents hotspot from reaching a
safepoint. It's a deadlock: without a safepoint the command never
returns, so fork_and_exec can't complete.
e.g.
$ java -Xmx50m -XX:AbortVMOnException=java.lang.OutOfMemoryError \
    -XX:OnError='jcmd %p Thread.print' -XX:+SafepointTimeout OomDumpExample direct
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc: SuppressErrorAt=/exceptions.cpp:541
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error
(/home/xxinliu/Devel/jdk/src/hotspot/share/utilities/exceptions.cpp:541), pid=107552,
tid=107553
# fatal error: Saw java.lang.OutOfMemoryError, aborting
#
# JRE version: OpenJDK Runtime Environment (18.0) (slowdebug build
18-internal+0-adhoc.xxinliu.jdk)
# Java VM: OpenJDK 64-Bit Server VM (slowdebug
18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops,
compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x924e8c] Exceptions::debug_check_abort(char const*,
char const*)+0x8a
#
# No core dump will be written. Core dumps have been disabled. To enable
core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /local/home/xxinliu/JDK-2085/hs_err_pid107552.log
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
#
#
# -XX:OnError="jcmd %p Thread.print"
# Executing /bin/sh -c "jcmd 107552 Thread.print" ...
107552:
[13.045s][warning][safepoint]
[13.045s][warning][safepoint] # SafepointSynchronize::begin: Timeout
detected:
[13.045s][warning][safepoint] # SafepointSynchronize::begin: Timed out
while spinning to reach a safepoint.
[13.045s][warning][safepoint] # SafepointSynchronize::begin: Threads
which did not reach the safepoint:
[13.045s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=1552.12ms
elapsed=13.04s tid=0x00007f43600278e0 nid=107553 runnable
[0x00007f4369d9f000]
[13.045s][warning][safepoint] java.lang.Thread.State: RUNNABLE
[13.045s][warning][safepoint] Thread: 0x00007f43600278e0 [0x1a421]
State: _running _at_poll_safepoint 0
[13.045s][warning][safepoint] JavaThread state: _thread_in_vm
[13.045s][warning][safepoint]
[13.045s][warning][safepoint] # SafepointSynchronize::begin: (End of list)
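The dependency cycle behind this timeout can be illustrated with a toy Java model. This is only a sketch, not HotSpot code: the class OnErrorDeadlockModel, the method runModel and both latches are hypothetical names, and the "fix" branch merely assumes that the waiting thread could somehow be made safepoint-safe. Timeouts are used so the model reports the hang instead of actually hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Toy model of the OnError hang. All names are hypothetical, not HotSpot code. */
public class OnErrorDeadlockModel {

    /**
     * Returns true if the simulated helper command finished within timeoutMillis.
     * mainPollsSafepoint == false models the log above: the reporting thread
     * waits for the helper without ever letting a safepoint proceed.
     */
    public static boolean runModel(boolean mainPollsSafepoint, long timeoutMillis) {
        // Counted down once "main" has checked in at the safepoint.
        CountDownLatch mainAtSafepoint = new CountDownLatch(1);
        // Counted down once the helper is done; stands in for fork_and_exec returning.
        CountDownLatch helperDone = new CountDownLatch(1);

        Thread helper = new Thread(() -> {
            try {
                // Like "jcmd <pid> Thread.print": needs a safepoint, so it needs "main".
                if (mainAtSafepoint.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    helperDone.countDown();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "helper");
        helper.start();

        if (mainPollsSafepoint) {
            // Assumed fix: the wait itself counts as being at a safepoint.
            mainAtSafepoint.countDown();
        }
        try {
            // "main" waits for the helper, as it waits in os::fork_and_exec(cmd).
            boolean finished = helperDone.await(timeoutMillis, TimeUnit.MILLISECONDS);
            helper.join();
            return finished;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("blocked in VM:  helper finished = " + runModel(false, 200));
        System.out.println("safepoint-safe: helper finished = " + runModel(true, 2000));
    }
}
```

In this model the first call cannot succeed, because each side waits for the other; the second succeeds only because "main" checks in before it starts waiting.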
I haven't figured out how yet, but I think I can lift this constraint.
Once I do, OnError would have more freedom to dump threads or the heap
before dying. Can I file a bug about this?
thanks,
--lx
On 8/30/21 9:26 PM, David Holmes wrote:
> Hi,
>
> On 28/08/2021 4:54 am, Liu, Xin wrote:
>> Hi,
>>
>> Recently I revisited JDK-8155004/JDK-8257790 because a new team tripped
>> over it. -XX:AbortVMOnException=java.lang.OutOfMemoryError works. I wonder
>> whether it is a good idea to call report_java_out_of_memory_error() when
>> an OOME is trapped. That way, HotSpot would trigger the OnOutOfMemoryError
>> callbacks.
>
> Why not just use AbortVMOnException together with OnError to get the
> callbacks?
>
> Cheers,
> David
>
>> I understand JDK-8257790 is not a bug. I don't want to overturn that
>> conclusion. I just wonder if we can handle it better in the presence of
>> -XX:AbortVMOnException=java.lang.OutOfMemoryError.
>>
>> For Java web servers, an OOME may lead to a zombie process. We may have a
>> bug in the code or may indeed run out of memory. The OOME is either
>> suppressed or terminates the thread, but it doesn't terminate the java
>> process. e.g.
>>
>> public class Main {
>>     volatile static boolean done = false;
>>
>>     public static void main(String[] args) {
>>         String msg = "a long long message.";
>>         // write your code here
>>         Runnable runnable = () -> {
>>             int cnt = Integer.MAX_VALUE / msg.length() + 1;
>>             // it will throw an OutOfMemoryError.
>>             msg.repeat(cnt);
>>             done = true;
>>         };
>>
>>         Thread thread = new Thread(runnable);
>>         thread.start();
>>         while (!done) {
>>         } // this simulates the main loop of event handling
>>     }
>> }
>>
>> Java developers can use
>> -XX:AbortVMOnException=java.lang.OutOfMemoryError to apply the fail-fast
>> principle. Java web applications that handle traffic are usually
>> distributed across a cluster. The failure of a single host is usually not
>> a big deal. As long as java exits, it's easy to restart and backfill it.
>>
>> My proposed change is very simple: just call
>> report_java_out_of_memory() if value_string is OOME. It's a no-op if users
>> never specify anything. If they do specify flags like
>> Crash/ExitOnOutOfMemory, OnOutOfMemoryError or
>> HeapDumpOnOutOfMemoryError, HotSpot will let report_java_out_of_memory
>> do the cleanup job. fatal() works but is too brutal. I think we should
>> let java exit with an error code.
>>
>>
>> diff --git a/src/hotspot/share/utilities/exceptions.cpp
>> b/src/hotspot/share/utilities/exceptions.cpp
>> index bd95b8306be..fd8a83deaf3 100644
>> --- a/src/hotspot/share/utilities/exceptions.cpp
>> +++ b/src/hotspot/share/utilities/exceptions.cpp
>> @@ -538,6 +538,9 @@ void Exceptions::debug_check_abort(const char
>> *value_string, const char* message
>> strstr(value_string, AbortVMOnException)) {
>> if (AbortVMOnExceptionMessage == NULL || (message != NULL &&
>> strstr(message, AbortVMOnExceptionMessage))) {
>> + if(!strcmp(value_string, "java.lang.OutOfMemoryError")) {
>> + report_java_out_of_memory(message);
>> + }
>> fatal("Saw %s, aborting", value_string);
>> }
>> }
>>
>>
>> thanks,
>> --lx
>>