Emergency JFR dump at OOME

Yasumasa Suenaga yasuenag at gmail.com
Thu Nov 1 03:15:39 UTC 2018


Hi Erik,

> If a user on the other hand has specified a flag that the JVM should exit on OOME, then it makes sense to dump a recording with the old object events and shortest path-to-gc-root.
> 
> That part seems to be missing.

I tried to add a flag `EmitLeakProfilerEventsOnOOME` to control it as below.
It works fine on my environment.

```
diff -r 3a8208766f7b src/hotspot/share/jfr/jfr.cpp
--- a/src/hotspot/share/jfr/jfr.cpp     Thu Nov 01 02:12:13 2018 +0100
+++ b/src/hotspot/share/jfr/jfr.cpp     Thu Nov 01 12:13:17 2018 +0900
@@ -91,6 +91,10 @@
    return JfrOptionSet::parse_start_flight_recording_option(option, delimiter);
  }

+void Jfr::emit_leak_profiler_events(jlong cutoff_ticks, bool emit_all) {
+  LeakProfiler::emit_events(cutoff_ticks, emit_all);
+}
+
  Thread* Jfr::sampler_thread() {
    return JfrThreadSampling::sampler_thread();
  }
diff -r 3a8208766f7b src/hotspot/share/jfr/jfr.hpp
--- a/src/hotspot/share/jfr/jfr.hpp     Thu Nov 01 02:12:13 2018 +0100
+++ b/src/hotspot/share/jfr/jfr.hpp     Thu Nov 01 12:13:17 2018 +0900
@@ -52,6 +52,7 @@
    static bool on_flight_recorder_option(const JavaVMOption** option, char* delimiter);
    static bool on_start_flight_recording_option(const JavaVMOption** option, char* delimiter);
    static void weak_oops_do(BoolObjectClosure* is_alive, OopClosure* f);
+  static void emit_leak_profiler_events(jlong cutoff_ticks, bool emit_all);
    static Thread* sampler_thread();
  };

diff -r 3a8208766f7b src/hotspot/share/runtime/globals.hpp
--- a/src/hotspot/share/runtime/globals.hpp     Thu Nov 01 02:12:13 2018 +0100
+++ b/src/hotspot/share/runtime/globals.hpp     Thu Nov 01 12:13:17 2018 +0900
@@ -2596,6 +2596,9 @@
    JFR_ONLY(product(ccstr, StartFlightRecording, NULL,                       \
            "Start flight recording with options"))                           \
                                                                              \
+  JFR_ONLY(product(bool, EmitLeakProfilerEventsOnOOME, false,               \
+          "Emit LeakProfiler events when OutOfMemoryError occurs"))         \
+                                                                            \
    experimental(bool, UseFastUnorderedTimeStamps, false,                     \
            "Use platform unstable time where supported for timestamps only")

diff -r 3a8208766f7b src/hotspot/share/utilities/debug.cpp
--- a/src/hotspot/share/utilities/debug.cpp     Thu Nov 01 02:12:13 2018 +0100
+++ b/src/hotspot/share/utilities/debug.cpp     Thu Nov 01 12:13:17 2018 +0900
@@ -58,6 +58,9 @@
  #include "utilities/globalDefinitions.hpp"
  #include "utilities/macros.hpp"
  #include "utilities/vmError.hpp"
+#if INCLUDE_JFR
+#include "jfr/jfr.hpp"
+#endif

  #include <stdio.h>

@@ -306,6 +309,13 @@
    // commands multiple times we just do it once when the first threads reports
    // the error.
    if (Atomic::cmpxchg(1, &out_of_memory_reported, 0) == 0) {
+
+#if INCLUDE_JFR
+      if (EmitLeakProfilerEventsOnOOME) {
+        Jfr::emit_leak_profiler_events(max_jlong, false);
+      }
+#endif
+
      // create heap dump before OnOutOfMemoryError commands are executed
      if (HeapDumpOnOutOfMemoryError) {
        tty->print_cr("java.lang.OutOfMemoryError: %s", message);
```


Thanks,

Yasumasa


On 2018/11/01 8:47, Erik Gahlin wrote:
> 
> 
>> On 31 Oct 2018, at 03:58, Yasumasa Suenaga <yasuenag at gmail.com> wrote:
>>
>> 2018年10月31日(水) 0:35 Markus Gronlund <markus.gronlund at oracle.com>:
>>>
>>> I think I provided you with the wrong settings.
>>>
>>> Please change:
>>>
>>> JFR_ONLY(Jfr::emit_leak_profiler_events(0, true);)
>>>
>>> To
>>>
>>> JFR_ONLY(Jfr::emit_leak_profiler_events(max_jlong, false);)
>>>
>>> I think this will get you the GC roots as well.
>>
>> Thanks! It works fine.
>>
>>
>>> We need to think about if / how this should be integrated. If so, it might be that it needs to be guarded behind some flag to not always issue a full safepoint, root scanning and edge traversals on every OOME.
>>
>> Do you mean VM operation should be added for Jfr::emit_leak_profiler_events()?
>> I think it can do so because HeapDumper is called from this function.
> 
> When a recording ends, old object samples are written. They have the most up to date information about what is leaking. I don’t think we should emit old object events before that happens. It will make recordings harder to analyze and introduce an intrusive safepoint.
> 
> If a user on the other hand has specified a flag that the JVM should exit on OOME, then it makes sense to dump a recording with the old object events and shortest path-to-gc-root.
> 
> That part seems to be missing.
> 
> Erik
> 
> 
>>
>> Anyway, I want you to merge this change to JFR. :-)
>>
>>
>> Yasumasa
>>
>>
>>> Markus
>>>
>>>
>>> -----Original Message-----
>>> From: Yasumasa Suenaga <yasuenag at gmail.com>
>>> Sent: den 30 oktober 2018 14:54
>>> To: Markus Gronlund <markus.gronlund at oracle.com>
>>> Cc: hotspot-jfr-dev at openjdk.java.net
>>> Subject: Re: Emergency JFR dump at OOME
>>>
>>> Thanks Markus!
>>> It works fine on my environment.
>>>
>>> Could you apply this change to JFR?
>>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>> P.S.
>>>    I got flight record with path-to-gc-roots=true, however "GC Root" in JMC
>>>    is null. Is it correct?
>>>    (Application is OOME.java which I shared before)
>>>
>>>
>>> On 2018/10/30 21:01, Markus Gronlund wrote:
>>>> Hi again,
>>>>
>>>> Maybe you can try something like this:
>>>>
>>>> # HG changeset patch
>>>> # User mgronlun
>>>> # Date 1540900357 -3600
>>>> #      Tue Oct 30 12:52:37 2018 +0100
>>>> # Node ID 32a48c323970c5fc4d0d1ffff5860a3c55c4a4dc
>>>> # Parent  80d104390dd2821fd95d56981bf9d37f1cc2e363
>>>> [mq]: yasumasa
>>>>
>>>> diff --git a/src/hotspot/share/jfr/jfr.cpp
>>>> b/src/hotspot/share/jfr/jfr.cpp
>>>> --- a/src/hotspot/share/jfr/jfr.cpp
>>>> +++ b/src/hotspot/share/jfr/jfr.cpp
>>>> @@ -91,6 +91,10 @@
>>>>     return JfrOptionSet::parse_start_flight_recording_option(option, delimiter);
>>>>   }
>>>>
>>>> +void Jfr::emit_leak_profiler_events(jlong cutoff_ticks, bool
>>>> +emit_all) {
>>>> +  LeakProfiler::emit_events(cutoff_ticks, emit_all); }
>>>> +
>>>>   Thread* Jfr::sampler_thread() {
>>>>     return JfrThreadSampling::sampler_thread();
>>>>   }
>>>> diff --git a/src/hotspot/share/jfr/jfr.hpp
>>>> b/src/hotspot/share/jfr/jfr.hpp
>>>> --- a/src/hotspot/share/jfr/jfr.hpp
>>>> +++ b/src/hotspot/share/jfr/jfr.hpp
>>>> @@ -52,6 +52,7 @@
>>>>     static bool on_flight_recorder_option(const JavaVMOption** option, char* delimiter);
>>>>     static bool on_start_flight_recording_option(const JavaVMOption** option, char* delimiter);
>>>>     static void weak_oops_do(BoolObjectClosure* is_alive, OopClosure*
>>>> f);
>>>> +  static void emit_leak_profiler_events(jlong cutoff_ticks, bool
>>>> + emit_all);
>>>>     static Thread* sampler_thread();
>>>>   };
>>>>
>>>> diff --git a/src/hotspot/share/utilities/debug.cpp
>>>> b/src/hotspot/share/utilities/debug.cpp
>>>> --- a/src/hotspot/share/utilities/debug.cpp
>>>> +++ b/src/hotspot/share/utilities/debug.cpp
>>>> @@ -58,6 +58,9 @@
>>>>   #include "utilities/globalDefinitions.hpp"
>>>>   #include "utilities/macros.hpp"
>>>>   #include "utilities/vmError.hpp"
>>>> +#if INCLUDE_JFR
>>>> +#include "jfr/jfr.hpp"
>>>> +#endif
>>>>
>>>>   #include <stdio.h>
>>>>
>>>> @@ -306,6 +309,8 @@
>>>>     // commands multiple times we just do it once when the first threads reports
>>>>     // the error.
>>>>     if (Atomic::cmpxchg(1, &out_of_memory_reported, 0) == 0) {
>>>> +    JFR_ONLY(Jfr::emit_leak_profiler_events(0, true);)
>>>> +
>>>>       // create heap dump before OnOutOfMemoryError commands are executed
>>>>       if (HeapDumpOnOutOfMemoryError) {
>>>>         tty->print_cr("java.lang.OutOfMemoryError: %s", message);
>>>>
>>>>
>>>>
>>>> This should write the contents of the leak profiler at the first reported OOME; it will go through the regular chunk writing mechanism in order that it do not destroy existing dump logic.
>>>>
>>>> Let me know if it works for you.
>>>>
>>>> Thanks
>>>> Markus
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Yasumasa Suenaga <yasuenag at gmail.com>
>>>> Sent: den 30 oktober 2018 02:35
>>>> To: Markus Gronlund <markus.gronlund at oracle.com>
>>>> Cc: hotspot-jfr-dev at openjdk.java.net
>>>> Subject: Re: Emergency JFR dump at OOME
>>>>
>>>> Hi Markus,
>>>>
>>>> I confirmed with GDB that Leak Profiler is called by shutdown hook.
>>>> I think it is very useful to obtain information when OOME occurs because the user might not get heap dump.
>>>>
>>>> So I want JFR to call Leak Profiler when OOME occurs before being destroyed problematic thread.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa
>>>>
>>>> 2018年10月29日(月) 23:41 Markus Gronlund <markus.gronlund at oracle.com>:
>>>>>
>>>>> I think it is called.
>>>>>
>>>>> Remember that main thread has already exited when the shutdown logic is called, the allocations in your test can already have been removed by the GC at this point (marked as dead in the Leak Profiler).
>>>>>
>>>>> In general, small tests like these are not representative for the Leak Profiler, because it works by acquiring samples over longer periods of time, and then there is the problem of the main thread could already have exited at the point of dump.
>>>>>
>>>>> Please take a look at the tests located in test/jdk/jdk/jfr/event/oldobject for some reference on how you can take more control to increase the chances of getting samples.
>>>>>
>>>>> Thanks
>>>>> Markus
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Yasumasa Suenaga <yasuenag at gmail.com>
>>>>> Sent: den 29 oktober 2018 15:13
>>>>> To: Markus Gronlund <markus.gronlund at oracle.com>
>>>>> Cc: hotspot-jfr-dev at openjdk.java.net
>>>>> Subject: Re: Emergency JFR dump at OOME
>>>>>
>>>>> Hi Markus,
>>>>>
>>>>> I tried to get flight record of OOME sample application as below:
>>>>> ---------------
>>>>> import java.util.*;
>>>>>
>>>>> public class OOME{
>>>>>     public static void main(String[] args){
>>>>>       var list = new ArrayList<byte[]>();
>>>>>       while(true){
>>>>>         list.add(new byte[1024]);
>>>>>       }
>>>>>     }
>>>>> }
>>>>> ---------------
>>>>>
>>>>> Command:
>>>>>     $ /usr/local/jdk-11.0.1/bin/java
>>>>> -XX:StartFlightRecording=filename=oome.jfr,settings=profile -Xmx256m
>>>>> OOME
>>>>>
>>>>>
>>>>> I could get flight record into oome.jfr, but JMC did not show any objects on "Live Objects" window.
>>>>> OOME.java will finish immidiatery. It will not invoke any periodic tasks.
>>>>> So I guess LeakProfiler::emit_events() will not be called.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> On 2018/10/29 22:47, Markus Gronlund wrote:
>>>>>> Rotate() and / or stop() is always called as part of shutdown, indirectly by the shutdown thread asking the recorder thread in the VM.
>>>>>>
>>>>>> LeakProfiler::emit_events() will be called on shutdown if the jdk.OldObjectSample event is enabled.
>>>>>>
>>>>>> Absence of EventDumpReason implies a normal shutdown.
>>>>>>
>>>>>> Markus
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Yasumasa Suenaga <yasuenag at gmail.com>
>>>>>> Sent: den 29 oktober 2018 11:58
>>>>>> To: Markus Gronlund <markus.gronlund at oracle.com>
>>>>>> Cc: hotspot-jfr-dev at openjdk.java.net
>>>>>> Subject: Re: Emergency JFR dump at OOME
>>>>>>
>>>>>> Hi Markus,
>>>>>>
>>>>>>> This would screw up the logic for the registered Shutdown hook that will run if the VM is also shutting down (which is not implied).
>>>>>>
>>>>>> Does it mean JfrRecorderService::rotate() will be called at JfrEmergencyDump::on_vm_shutdown() ?
>>>>>> I think EventDumpReason::commit() and  LeakProfiler::emit_events() should be called when OOME occurs even if it would not be treated as emergency dump. So we should add them to Universe::gen_out_of_memory_error().
>>>>>>
>>>>>> What do you think?
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>> On 2018/10/29 17:28, Markus Gronlund wrote:
>>>>>>> Hi Yasumasa,
>>>>>>>
>>>>>>> I don't think so.
>>>>>>>
>>>>>>> Handling OOMEs is very intricate, OOMEs are thread local and it is difficult to get it right.
>>>>>>>
>>>>>>> The suggested patch would do an emergency dump unconditionally on the first reported OOME. This would screw up the logic for the registered Shutdown hook that will run if the VM is also shutting down (which is not implied).
>>>>>>> But, it is only the first invocation of report_java_out_of_memory() that is happening here and user code can catch OOME's. Other threads might run fine for quite some time and might not even run into OOME's.
>>>>>>> The shutdown hook registered for dumping JFR recordings on VM Exit is set up to attempt to handle graceful shutdown if possible (no OOME), but if it gets an OOME, it will trigger the OOME emergency dump logic.
>>>>>>>
>>>>>>> Remember that you will need to state you would like a recording dumped out to disk on VM exit for the shutdown hook logic to complete.
>>>>>>>
>>>>>>> This you can do by:
>>>>>>>
>>>>>>> -XX:StartFlightRecording:dumponexit=true
>>>>>>>
>>>>>>> Or by:
>>>>>>>
>>>>>>> -XX:StartFlightRecording:filename=myrec.jfr
>>>>>>>
>>>>>>> If the Shutdown hook gets an OOME during the exit logic, it will take the emergency path to create a file called hs_oom_<pid>.jfr.
>>>>>>>
>>>>>>> There is a current known issue with the fact that OOME's are pre-allocated so they don't turn up in the recordings as Errors (because they are pre-allocated before JFR starts). We might want to add something to Universe::gen_out_of_memory_error() to report this in some way.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Markus
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Yasumasa Suenaga <yasuenag at gmail.com>
>>>>>>> Sent: den 25 oktober 2018 12:58
>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>> Subject: Emergency JFR dump at OOME
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> According to [1], I guess JFR dumps flight record to file.
>>>>>>> But current JFR don't do so.
>>>>>>>
>>>>>>> Should we fix it as below?
>>>>>>> --------------------
>>>>>>> diff -r 003c062e16ea src/hotspot/share/utilities/debug.cpp
>>>>>>> --- a/src/hotspot/share/utilities/debug.cpp     Wed Oct 24 21:17:30 2018 -0700
>>>>>>> +++ b/src/hotspot/share/utilities/debug.cpp     Thu Oct 25 19:56:54 2018 +0900
>>>>>>> @@ -58,6 +58,9 @@
>>>>>>>      #include "utilities/globalDefinitions.hpp"
>>>>>>>      #include "utilities/macros.hpp"
>>>>>>>      #include "utilities/vmError.hpp"
>>>>>>> +#if INCLUDE_JFR
>>>>>>> +#include "jfr/jfr.hpp"
>>>>>>> +#endif
>>>>>>>
>>>>>>>      #include <stdio.h>
>>>>>>>
>>>>>>> @@ -321,6 +324,8 @@
>>>>>>>            fatal("OutOfMemory encountered: %s", message);
>>>>>>>          }
>>>>>>>
>>>>>>> +    JFR_ONLY(Jfr::on_vm_shutdown(false);)
>>>>>>> +
>>>>>>>          if (ExitOnOutOfMemoryError) {
>>>>>>>            tty->print_cr("Terminating due to java.lang.OutOfMemoryError: %s", message);
>>>>>>>            os::exit(3);
>>>>>>> --------------------
>>>>>>>
>>>>>>> I will file it to JBS and will send review request if it is verified.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>> [1]
>>>>>>> https://hg.openjdk.java.net/jdk/jdk/file/003c062e16ea/src/hotspot/s
>>>>>>> hare/jfr/recorder/repository/jfrEmergencyDump.cpp#l159
>>>>>>>
> 


More information about the hotspot-jfr-dev mailing list