PING: Emergency JFR dump at OOME

Erik Gahlin erik.gahlin at oracle.com
Thu Jan 10 13:08:18 UTC 2019


Hi Yasumasa,

I will look into this next week.

Thanks
Erik

> PING: Did you read my email?
>
> I believe this enhancement would help us resolve memory issues.
>
>
> Yasumasa
>
>
> On 2019/01/01 13:16, Yasumasa Suenaga wrote:
>> Hi,
>>
>> I want to discuss this enhancement (JDK-8213435) again.
>>
>> I uploaded my proposal:
>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8213435/webrev.00/
>>
>> This change adds a new option, "emitOnOOME", to the *.jfc file.
>> If this option is set to true, old object sample events are 
>> emitted when an OOME occurs.
>>
>> This option is false by default, so it does not affect Epsilon 
>> users :-)
>>
>>
>> Thanks,
>>
>> Yasumasa
>>
>>
>> On 2018/11/07 5:29, Erik Gahlin wrote:
>>> On 2018-11-06 15:39, Erik Gahlin wrote:
>>>> On 2018-11-01 14:26, Yasumasa Suenaga wrote:
>>>>> Hi Erik,
>>>>>
>>>>> On 2018/11/01 14:13, Erik Gahlin wrote:
>>>>>> Hi Yasumasa,
>>>>>>
>>>>>> Thanks for looking into this, but I don’t think emitting the Old 
>>>>>> Object events is the right thing to do unless we produce a JFR 
>>>>>> file at the same time.
>>>>>>
>>>>>> If anything should be added, it should be when the JVM exits.
>>>>>>
>>>>>>    if (ExitOnOutOfMemoryError) {
>>>>>>        tty->print_cr("Terminating due to java.lang.OutOfMemoryError: %s", message);
>>>>>> + perhaps call emergency dump jfr here?
>>>>>>        os::exit(3);
>>>>>>    }
>>>>>>
>>>>>> That said, thinking this over, I’m not even sure this is a good 
>>>>>> idea.
>>>>>>
>>>>>> I don’t know how the ExitOnOutOfMemoryError flag is used in 
>>>>>> production environments. Do people want a .jfr file at that 
>>>>>> point? For all use cases? Will it irritate people if they need 
>>>>>> to clean up jfr files? Will it make them turn off Flight Recorder?
>>>>>
>>>>> IMHO ExitOnOutOfMemoryError (and CrashOnOutOfMemoryError) should 
>>>>> be used in production systems because any thread might catch an OOME 
>>>>> that was caused by another thread.
>>>>> For example, when a request processor in Tomcat consumes a lot of 
>>>>> memory, the Tomcat acceptor thread might catch an OOME. If so, Tomcat 
>>>>> cannot process any requests even though the process is still running.
>>>> Yes, but do you always want a .jfr if this happens?
>>>>
>>>> Let's say you are using the Epsilon GC (which sets 
>>>> ExitOnOutOfMemoryError) and have a script that restarts the JVM 
>>>> when it exits. Then your hard disk may fill up with .jfr files.
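One way to keep such a restart loop from filling the disk is a small retention step that runs before each JVM launch and keeps only the newest recordings. The sketch below is hypothetical and not part of JFR: the class name `JfrRetention`, the method `pruneOldRecordings` and the keep-count policy are all assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not a JFR API: keeps only the newest .jfr files in a directory.
final class JfrRetention {

    // Deletes all but the `keep` most recently modified *.jfr files; returns the deleted paths.
    static List<Path> pruneOldRecordings(Path dir, int keep) {
        try (Stream<Path> entries = Files.list(dir)) {
            List<Path> newestFirst = entries
                    .filter(p -> p.getFileName().toString().endsWith(".jfr"))
                    .sorted(Comparator.comparingLong((Path p) -> p.toFile().lastModified()).reversed())
                    .collect(Collectors.toList());
            int firstToDelete = Math.min(Math.max(0, keep), newestFirst.size());
            List<Path> deleted = new ArrayList<>();
            for (Path old : newestFirst.subList(firstToDelete, newestFirst.size())) {
                Files.delete(old);
                deleted.add(old);
            }
            return deleted;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Self-check: creates five fake recordings plus one unrelated file in a temp
    // directory, prunes down to two, and returns how many .jfr files remain.
    static long demo() {
        try {
            Path dir = Files.createTempDirectory("jfr-retention");
            long base = System.currentTimeMillis();
            for (int i = 0; i < 5; i++) {
                File f = Files.createFile(dir.resolve("hs_oom_" + i + ".jfr")).toFile();
                f.setLastModified(base + i * 1000L); // distinct timestamps, newest last
            }
            Files.createFile(dir.resolve("notes.txt")); // must not be touched
            pruneOldRecordings(dir, 2);
            try (Stream<Path> left = Files.list(dir)) {
                return left.filter(p -> p.toString().endsWith(".jfr")).count();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("recordings left: " + demo());
    }
}
```

A restart script could call something like `pruneOldRecordings` on the dump directory before starting the JVM again; whether a count-based or size-based limit fits better depends on the deployment.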
>>>>
>>>> One idea would be to only do it for recordings that have specified 
>>>> -XX:StartFlightRecording:dumponexit=true (which is implicitly set 
>>>> by -XX:StartFlightRecording:filename=<filename>) That way, it would 
>>>> be opt in behavior.
>>>>
>>>> This is non-trivial to implement since we can't call Java and 
>>>> allocate objects when we are out of memory. One could perhaps make 
>>>> an up call to Java and there take a previously prepared String 
>>>> representation of the destination path and push that reference back 
>>>> into native, which can then later be inspected.  A filename must 
>>>> also be generated if a user hasn't specified one, which is trickier 
>>>> to do in native. Files in the disk repository should also be 
>>>> removed when the JVM exits.
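The idea of preparing the destination path ahead of time can be sketched on the Java side as follows. This is only an illustration under stated assumptions, not HotSpot or JFR code: the class `EmergencyDumpPath` and its methods are hypothetical, and the generated name simply reuses the `hs_oom_<pid>.jfr` pattern mentioned elsewhere in this thread.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not HotSpot code: resolve the dump destination while memory is
// still available, so that nothing has to be allocated once an OOME has been reported.
final class EmergencyDumpPath {

    // Prepared once (for example when the recording starts); only read afterwards.
    private static volatile byte[] prepared;

    // Stores the user-supplied filename, or generates one from the pid if none was given.
    static void prepare(String userFilename) {
        String name = (userFilename != null)
                ? userFilename
                : "hs_oom_" + ProcessHandle.current().pid() + ".jfr";
        prepared = name.getBytes(StandardCharsets.UTF_8);
    }

    // Allocation-free: hands back the reference prepared earlier (null if never prepared).
    static byte[] prepared() {
        return prepared;
    }

    public static void main(String[] args) {
        prepare(null);
        System.out.println(new String(prepared(), StandardCharsets.UTF_8));
    }
}
```

The point of the design is that `prepared()` only returns an existing reference, which is the kind of operation that stays safe when the heap is exhausted; how native code would actually obtain that reference is left open here.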
>>>>
>>> I filed an enhancement request where this could be discussed further.
>>> https://bugs.openjdk.java.net/browse/JDK-8213435
>>>
>>> Erik
>>>
>>>>>
>>>>>
>>>>>> Emergency (native) dumps should probably be reserved for cases 
>>>>>> when the JVM crashes, similar to the hs_err file.
>>>>>
>>>>> I thought JfrEmergencyDump::on_vm_shutdown() should handle all 
>>>>> OOMEs, but it seems not to be the case.
>>>>>
>>>>>
>>>>>> I’m reluctant to add a specific flag for the Old Object event for 
>>>>>> the following reasons:
>>>>>>
>>>>>> 1) Very few people would find out about the flag, so it would add 
>>>>>> little value in practice. Focus should be to build a product that 
>>>>>> works well out of the box and not tie the implementation to a 
>>>>>> flag that must be respected for the next decade or so.
>>>>>>
>>>>>> 2) The feature is new and there is not a good tool for 
>>>>>> visualising old object samples. Once it exists, it’s easier to 
>>>>>> see if emitting events and dumping a recording at this stage 
>>>>>> would help users in real world scenarios.
>>>>>>
>>>>>> 3) The way configuration issues have been handled historically is 
>>>>>> using a .jfc file. For instance, one could imagine something like 
>>>>>> this:
>>>>>>
>>>>>> <event name="jdk.OldObjectSample">
>>>>>>   <setting name="firstOOME">true</setting>
>>>>>> </event>
>>>>>>
>>>>>> or perhaps a dedicated event, for example 
>>>>>> “jdk.FirstOOMEObjectSample” (if there really is a use case to 
>>>>>> emit events at this particular time that is not covered by the 
>>>>>> Old Object events emitted when the recording ends). There are 
>>>>>> plans to allow events to be configured on command line, so you 
>>>>>> would not need to create a new jfc file to make a slight change. 
>>>>>> That feature would then automatically provide command line 
>>>>>> capabilities for the Old Object event in the scenario you describe.
>>>>>>
>>>>>> To me it seems best to wait with the enhancement for now.
>>>>>
>>>>> I agree with you that "jdk.FirstOOMEObjectSample" should be added as 
>>>>> an event setting.
>>>>> I don't mind whether it is set through a *.jfc file or a command-line option.
>>>>>
>>>>> Anyway, I think it is very useful if we can get old object 
>>>>> information in the flight recording when an OOME occurs.
>>>>> Of course we can get a heap dump with HeapDumpOnOutOfMemoryError, 
>>>>> but it does not contain time-series data like a flight recording.
>>>>>
>>>>>
>>>>> I hope this proposal is accepted by the JFR team.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>>> Thanks
>>>>>> Erik
>>>>>>
>>>>>>
>>>>>>> On 1 Nov 2018, at 04:15, Yasumasa Suenaga <yasuenag at gmail.com> 
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Erik,
>>>>>>>
>>>>>>>> If a user on the other hand has specified a flag that the JVM 
>>>>>>>> should exit on OOME, then it makes sense to dump a recording 
>>>>>>>> with the old object events and shortest path-to-gc-root.
>>>>>>>> That part seems to be missing.
>>>>>>>
>>>>>>> I tried adding a flag `EmitLeakProfilerEventsOnOOME` to control 
>>>>>>> it, as shown below.
>>>>>>> It works fine in my environment.
>>>>>>>
>>>>>>> ```
>>>>>>> diff -r 3a8208766f7b src/hotspot/share/jfr/jfr.cpp
>>>>>>> --- a/src/hotspot/share/jfr/jfr.cpp     Thu Nov 01 02:12:13 2018 +0100
>>>>>>> +++ b/src/hotspot/share/jfr/jfr.cpp     Thu Nov 01 12:13:17 2018 +0900
>>>>>>> @@ -91,6 +91,10 @@
>>>>>>>    return JfrOptionSet::parse_start_flight_recording_option(option, delimiter);
>>>>>>> }
>>>>>>>
>>>>>>> +void Jfr::emit_leak_profiler_events(jlong cutoff_ticks, bool emit_all) {
>>>>>>> +  LeakProfiler::emit_events(cutoff_ticks, emit_all);
>>>>>>> +}
>>>>>>> +
>>>>>>> Thread* Jfr::sampler_thread() {
>>>>>>>    return JfrThreadSampling::sampler_thread();
>>>>>>> }
>>>>>>> diff -r 3a8208766f7b src/hotspot/share/jfr/jfr.hpp
>>>>>>> --- a/src/hotspot/share/jfr/jfr.hpp     Thu Nov 01 02:12:13 2018 +0100
>>>>>>> +++ b/src/hotspot/share/jfr/jfr.hpp     Thu Nov 01 12:13:17 2018 +0900
>>>>>>> @@ -52,6 +52,7 @@
>>>>>>>    static bool on_flight_recorder_option(const JavaVMOption** option, char* delimiter);
>>>>>>>    static bool on_start_flight_recording_option(const JavaVMOption** option, char* delimiter);
>>>>>>>    static void weak_oops_do(BoolObjectClosure* is_alive, OopClosure* f);
>>>>>>> +  static void emit_leak_profiler_events(jlong cutoff_ticks, bool emit_all);
>>>>>>>    static Thread* sampler_thread();
>>>>>>> };
>>>>>>>
>>>>>>> diff -r 3a8208766f7b src/hotspot/share/runtime/globals.hpp
>>>>>>> --- a/src/hotspot/share/runtime/globals.hpp     Thu Nov 01 02:12:13 2018 +0100
>>>>>>> +++ b/src/hotspot/share/runtime/globals.hpp     Thu Nov 01 12:13:17 2018 +0900
>>>>>>> @@ -2596,6 +2596,9 @@
>>>>>>>    JFR_ONLY(product(ccstr, StartFlightRecording, NULL,                       \
>>>>>>>            "Start flight recording with options"))                           \
>>>>>>> \
>>>>>>> +  JFR_ONLY(product(bool, EmitLeakProfilerEventsOnOOME, false,               \
>>>>>>> +          "Emit LeakProfiler events when OutOfMemoryError occurs"))         \
>>>>>>> + \
>>>>>>>    experimental(bool, UseFastUnorderedTimeStamps, false,                     \
>>>>>>>            "Use platform unstable time where supported for timestamps only")
>>>>>>>
>>>>>>> diff -r 3a8208766f7b src/hotspot/share/utilities/debug.cpp
>>>>>>> --- a/src/hotspot/share/utilities/debug.cpp     Thu Nov 01 02:12:13 2018 +0100
>>>>>>> +++ b/src/hotspot/share/utilities/debug.cpp     Thu Nov 01 12:13:17 2018 +0900
>>>>>>> @@ -58,6 +58,9 @@
>>>>>>> #include "utilities/globalDefinitions.hpp"
>>>>>>> #include "utilities/macros.hpp"
>>>>>>> #include "utilities/vmError.hpp"
>>>>>>> +#if INCLUDE_JFR
>>>>>>> +#include "jfr/jfr.hpp"
>>>>>>> +#endif
>>>>>>>
>>>>>>> #include <stdio.h>
>>>>>>>
>>>>>>> @@ -306,6 +309,13 @@
>>>>>>>    // commands multiple times we just do it once when the first threads reports
>>>>>>>    // the error.
>>>>>>>    if (Atomic::cmpxchg(1, &out_of_memory_reported, 0) == 0) {
>>>>>>> +
>>>>>>> +#if INCLUDE_JFR
>>>>>>> +      if (EmitLeakProfilerEventsOnOOME) {
>>>>>>> +        Jfr::emit_leak_profiler_events(max_jlong, false);
>>>>>>> +      }
>>>>>>> +#endif
>>>>>>> +
>>>>>>>      // create heap dump before OnOutOfMemoryError commands are executed
>>>>>>>      if (HeapDumpOnOutOfMemoryError) {
>>>>>>>        tty->print_cr("java.lang.OutOfMemoryError: %s", message);
>>>>>>> ```
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>> On 2018/11/01 8:47, Erik Gahlin wrote:
>>>>>>>>> On 31 Oct 2018, at 03:58, Yasumasa Suenaga 
>>>>>>>>> <yasuenag at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> On Wed, Oct 31, 2018 at 0:35, Markus Gronlund 
>>>>>>>>> <markus.gronlund at oracle.com> wrote:
>>>>>>>>>>
>>>>>>>>>> I think I provided you with the wrong settings.
>>>>>>>>>>
>>>>>>>>>> Please change:
>>>>>>>>>>
>>>>>>>>>> JFR_ONLY(Jfr::emit_leak_profiler_events(0, true);)
>>>>>>>>>>
>>>>>>>>>> To
>>>>>>>>>>
>>>>>>>>>> JFR_ONLY(Jfr::emit_leak_profiler_events(max_jlong, false);)
>>>>>>>>>>
>>>>>>>>>> I think this will get you the GC roots as well.
>>>>>>>>>
>>>>>>>>> Thanks! It works fine.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> We need to think about if / how this should be integrated. If 
>>>>>>>>>> so, it might be that it needs to be guarded behind some flag 
>>>>>>>>>> to not always issue a full safepoint, root scanning and edge 
>>>>>>>>>> traversals on every OOME.
>>>>>>>>>
>>>>>>>>> Do you mean a VM operation should be added for 
>>>>>>>>> Jfr::emit_leak_profiler_events()?
>>>>>>>>> I think that can be done, because HeapDumper is also called from this 
>>>>>>>>> function.
>>>>>>>> When a recording ends, old object samples are written. They 
>>>>>>>> have the most up to date information about what is leaking. I 
>>>>>>>> don’t think we should emit old object events before that 
>>>>>>>> happens. It will make recordings harder to analyze and 
>>>>>>>> introduce an intrusive safepoint.
>>>>>>>> If a user on the other hand has specified a flag that the JVM 
>>>>>>>> should exit on OOME, then it makes sense to dump a recording 
>>>>>>>> with the old object events and shortest path-to-gc-root.
>>>>>>>> That part seems to be missing.
>>>>>>>> Erik
>>>>>>>>>
>>>>>>>>> Anyway, I want you to merge this change to JFR. :-)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Markus
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Yasumasa Suenaga <yasuenag at gmail.com>
>>>>>>>>>> Sent: 30 October 2018 14:54
>>>>>>>>>> To: Markus Gronlund <markus.gronlund at oracle.com>
>>>>>>>>>> Cc: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>> Subject: Re: Emergency JFR dump at OOME
>>>>>>>>>>
>>>>>>>>>> Thanks Markus!
>>>>>>>>>> It works fine in my environment.
>>>>>>>>>>
>>>>>>>>>> Could you apply this change to JFR?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Yasumasa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> P.S.
>>>>>>>>>>    I got a flight recording with path-to-gc-roots=true, but 
>>>>>>>>>>    "GC Root" in JMC is null. Is that expected?
>>>>>>>>>>    (The application is OOME.java, which I shared before.)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2018/10/30 21:01, Markus Gronlund wrote:
>>>>>>>>>>> Hi again,
>>>>>>>>>>>
>>>>>>>>>>> Maybe you can try something like this:
>>>>>>>>>>>
>>>>>>>>>>> # HG changeset patch
>>>>>>>>>>> # User mgronlun
>>>>>>>>>>> # Date 1540900357 -3600
>>>>>>>>>>> #      Tue Oct 30 12:52:37 2018 +0100
>>>>>>>>>>> # Node ID 32a48c323970c5fc4d0d1ffff5860a3c55c4a4dc
>>>>>>>>>>> # Parent 80d104390dd2821fd95d56981bf9d37f1cc2e363
>>>>>>>>>>> [mq]: yasumasa
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/src/hotspot/share/jfr/jfr.cpp b/src/hotspot/share/jfr/jfr.cpp
>>>>>>>>>>> --- a/src/hotspot/share/jfr/jfr.cpp
>>>>>>>>>>> +++ b/src/hotspot/share/jfr/jfr.cpp
>>>>>>>>>>> @@ -91,6 +91,10 @@
>>>>>>>>>>>     return JfrOptionSet::parse_start_flight_recording_option(option, delimiter);
>>>>>>>>>>>   }
>>>>>>>>>>>
>>>>>>>>>>> +void Jfr::emit_leak_profiler_events(jlong cutoff_ticks, bool emit_all) {
>>>>>>>>>>> +  LeakProfiler::emit_events(cutoff_ticks, emit_all);
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>>   Thread* Jfr::sampler_thread() {
>>>>>>>>>>>     return JfrThreadSampling::sampler_thread();
>>>>>>>>>>>   }
>>>>>>>>>>> diff --git a/src/hotspot/share/jfr/jfr.hpp b/src/hotspot/share/jfr/jfr.hpp
>>>>>>>>>>> --- a/src/hotspot/share/jfr/jfr.hpp
>>>>>>>>>>> +++ b/src/hotspot/share/jfr/jfr.hpp
>>>>>>>>>>> @@ -52,6 +52,7 @@
>>>>>>>>>>>     static bool on_flight_recorder_option(const JavaVMOption** option, char* delimiter);
>>>>>>>>>>>     static bool on_start_flight_recording_option(const JavaVMOption** option, char* delimiter);
>>>>>>>>>>>     static void weak_oops_do(BoolObjectClosure* is_alive, OopClosure* f);
>>>>>>>>>>> +  static void emit_leak_profiler_events(jlong cutoff_ticks, bool emit_all);
>>>>>>>>>>>     static Thread* sampler_thread();
>>>>>>>>>>>   };
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/src/hotspot/share/utilities/debug.cpp b/src/hotspot/share/utilities/debug.cpp
>>>>>>>>>>> --- a/src/hotspot/share/utilities/debug.cpp
>>>>>>>>>>> +++ b/src/hotspot/share/utilities/debug.cpp
>>>>>>>>>>> @@ -58,6 +58,9 @@
>>>>>>>>>>>   #include "utilities/globalDefinitions.hpp"
>>>>>>>>>>>   #include "utilities/macros.hpp"
>>>>>>>>>>>   #include "utilities/vmError.hpp"
>>>>>>>>>>> +#if INCLUDE_JFR
>>>>>>>>>>> +#include "jfr/jfr.hpp"
>>>>>>>>>>> +#endif
>>>>>>>>>>>
>>>>>>>>>>>   #include <stdio.h>
>>>>>>>>>>>
>>>>>>>>>>> @@ -306,6 +309,8 @@
>>>>>>>>>>>     // commands multiple times we just do it once when the first threads reports
>>>>>>>>>>>     // the error.
>>>>>>>>>>>     if (Atomic::cmpxchg(1, &out_of_memory_reported, 0) == 0) {
>>>>>>>>>>> +    JFR_ONLY(Jfr::emit_leak_profiler_events(0, true);)
>>>>>>>>>>> +
>>>>>>>>>>>       // create heap dump before OnOutOfMemoryError commands are executed
>>>>>>>>>>>       if (HeapDumpOnOutOfMemoryError) {
>>>>>>>>>>>         tty->print_cr("java.lang.OutOfMemoryError: %s", message);
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This should write the contents of the leak profiler at the 
>>>>>>>>>>> first reported OOME; it will go through the regular chunk 
>>>>>>>>>>> writing mechanism so that it does not break the existing 
>>>>>>>>>>> dump logic.
>>>>>>>>>>>
>>>>>>>>>>> Let me know if it works for you.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Markus
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Yasumasa Suenaga <yasuenag at gmail.com>
>>>>>>>>>>> Sent: 30 October 2018 02:35
>>>>>>>>>>> To: Markus Gronlund <markus.gronlund at oracle.com>
>>>>>>>>>>> Cc: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>> Subject: Re: Emergency JFR dump at OOME
>>>>>>>>>>>
>>>>>>>>>>> Hi Markus,
>>>>>>>>>>>
>>>>>>>>>>> I confirmed with GDB that the Leak Profiler is called by the 
>>>>>>>>>>> shutdown hook.
>>>>>>>>>>> I think it is very useful to obtain this information when an OOME 
>>>>>>>>>>> occurs because the user might not get a heap dump.
>>>>>>>>>>>
>>>>>>>>>>> So I want JFR to call the Leak Profiler when an OOME occurs, before 
>>>>>>>>>>> the problematic thread is destroyed.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Yasumasa
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 29, 2018 at 23:41, Markus Gronlund 
>>>>>>>>>>> <markus.gronlund at oracle.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I think it is called.
>>>>>>>>>>>>
>>>>>>>>>>>> Remember that the main thread has already exited when the 
>>>>>>>>>>>> shutdown logic is called, so the allocations in your test may 
>>>>>>>>>>>> already have been removed by the GC at this point (marked 
>>>>>>>>>>>> as dead in the Leak Profiler).
>>>>>>>>>>>>
>>>>>>>>>>>> In general, small tests like these are not representative 
>>>>>>>>>>>> for the Leak Profiler, because it works by acquiring 
>>>>>>>>>>>> samples over longer periods of time, and then there is the 
>>>>>>>>>>>> problem that the main thread could already have exited at the 
>>>>>>>>>>>> point of the dump.
>>>>>>>>>>>>
>>>>>>>>>>>> Please take a look at the tests located in 
>>>>>>>>>>>> test/jdk/jdk/jfr/event/oldobject for some reference on how 
>>>>>>>>>>>> you can take more control to increase the chances of 
>>>>>>>>>>>> getting samples.
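As a rough illustration of taking more control from the application side, the sketch below starts a recording through the public `jdk.jfr.Recording` API, enables `jdk.OldObjectSample` with `cutoff=infinity`, keeps the allocated arrays reachable from a static field, and dumps the recording to a temporary file. The class and method names are hypothetical, the allocation count is arbitrary, and a run this short is not guaranteed to produce any old object samples.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

// Hypothetical sketch: names and sizes are assumptions, only the jdk.jfr calls are real API.
final class OldObjectSampleSketch {

    // Static root so the allocations stay reachable until the recording has ended.
    static final List<byte[]> LEAK = new ArrayList<>();

    // Records while allocating `chunks` arrays of 1 KiB and returns the dumped .jfr file.
    static Path recordLeak(int chunks) {
        try (Recording recording = new Recording()) {
            recording.enable("jdk.OldObjectSample")
                     .with("cutoff", "infinity") // allow path-to-GC-root lookup without a time limit
                     .withStackTrace();
            recording.start();
            for (int i = 0; i < chunks; i++) {
                LEAK.add(new byte[1024]);
            }
            recording.stop(); // old object samples are written when the recording ends
            Path file = Files.createTempFile("oldobject", ".jfr");
            recording.dump(file);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counts jdk.OldObjectSample events in a recording file.
    static long countOldObjectSamples(Path file) {
        try {
            long count = 0;
            for (RecordedEvent event : RecordingFile.readAllEvents(file)) {
                if (event.getEventType().getName().equals("jdk.OldObjectSample")) {
                    count++;
                }
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = recordLeak(100_000);
        System.out.println(file + ": " + countOldObjectSamples(file) + " old object sample(s)");
    }
}
```

Because sampling is probabilistic, the number printed by `main` can vary between runs, including zero for small workloads.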
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Markus
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Yasumasa Suenaga <yasuenag at gmail.com>
>>>>>>>>>>>> Sent: 29 October 2018 15:13
>>>>>>>>>>>> To: Markus Gronlund <markus.gronlund at oracle.com>
>>>>>>>>>>>> Cc: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>> Subject: Re: Emergency JFR dump at OOME
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Markus,
>>>>>>>>>>>>
>>>>>>>>>>>> I tried to get a flight recording of an OOME sample application, as 
>>>>>>>>>>>> below:
>>>>>>>>>>>> ---------------
>>>>>>>>>>>> import java.util.*;
>>>>>>>>>>>>
>>>>>>>>>>>> public class OOME{
>>>>>>>>>>>>     public static void main(String[] args){
>>>>>>>>>>>>       var list = new ArrayList<byte[]>();
>>>>>>>>>>>>       while(true){
>>>>>>>>>>>>         list.add(new byte[1024]);
>>>>>>>>>>>>       }
>>>>>>>>>>>>     }
>>>>>>>>>>>> }
>>>>>>>>>>>> ---------------
>>>>>>>>>>>>
>>>>>>>>>>>> Command:
>>>>>>>>>>>>     $ /usr/local/jdk-11.0.1/bin/java -XX:StartFlightRecording=filename=oome.jfr,settings=profile -Xmx256m OOME
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I could get a flight recording in oome.jfr, but JMC did not 
>>>>>>>>>>>> show any objects in the "Live Objects" window.
>>>>>>>>>>>> OOME.java finishes immediately, so it does not invoke any 
>>>>>>>>>>>> periodic tasks.
>>>>>>>>>>>> So I guess LeakProfiler::emit_events() is not called.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2018/10/29 22:47, Markus Gronlund wrote:
>>>>>>>>>>>>> Rotate() and / or stop() is always called as part of 
>>>>>>>>>>>>> shutdown, indirectly by the shutdown thread asking the 
>>>>>>>>>>>>> recorder thread in the VM.
>>>>>>>>>>>>>
>>>>>>>>>>>>> LeakProfiler::emit_events() will be called on shutdown if 
>>>>>>>>>>>>> the jdk.OldObjectSample event is enabled.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Absence of EventDumpReason implies a normal shutdown.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: Yasumasa Suenaga <yasuenag at gmail.com>
>>>>>>>>>>>>> Sent: 29 October 2018 11:58
>>>>>>>>>>>>> To: Markus Gronlund <markus.gronlund at oracle.com>
>>>>>>>>>>>>> Cc: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>> Subject: Re: Emergency JFR dump at OOME
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Markus,
>>>>>>>>>>>>>
>>>>>>>>>>>>>> This would screw up the logic for the registered Shutdown 
>>>>>>>>>>>>>> hook that will run if the VM is also shutting down (which 
>>>>>>>>>>>>>> is not implied).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does that mean JfrRecorderService::rotate() will be called 
>>>>>>>>>>>>> from JfrEmergencyDump::on_vm_shutdown()?
>>>>>>>>>>>>> I think EventDumpReason::commit() and 
>>>>>>>>>>>>> LeakProfiler::emit_events() should be called when OOME 
>>>>>>>>>>>>> occurs even if it would not be treated as emergency dump. 
>>>>>>>>>>>>> So we should add them to Universe::gen_out_of_memory_error().
>>>>>>>>>>>>>
>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2018/10/29 17:28, Markus Gronlund wrote:
>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't think so.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Handling OOMEs is very intricate: OOMEs are thread local 
>>>>>>>>>>>>>> and it is difficult to get it right.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The suggested patch would do an emergency dump 
>>>>>>>>>>>>>> unconditionally on the first reported OOME. This would 
>>>>>>>>>>>>>> screw up the logic for the registered Shutdown hook that 
>>>>>>>>>>>>>> will run if the VM is also shutting down (which is not 
>>>>>>>>>>>>>> implied).
>>>>>>>>>>>>>> But, it is only the first invocation of 
>>>>>>>>>>>>>> report_java_out_of_memory() that is happening here and 
>>>>>>>>>>>>>> user code can catch OOME's. Other threads might run fine 
>>>>>>>>>>>>>> for quite some time and might not even run into OOME's.
>>>>>>>>>>>>>> The shutdown hook registered for dumping JFR recordings 
>>>>>>>>>>>>>> on VM Exit is set up to attempt to handle graceful 
>>>>>>>>>>>>>> shutdown if possible (no OOME), but if it gets an OOME, 
>>>>>>>>>>>>>> it will trigger the OOME emergency dump logic.
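The "first invocation only" behaviour discussed here comes from a compare-and-swap guard (`Atomic::cmpxchg(1, &out_of_memory_reported, 0) == 0` in the patches in this thread). A minimal Java analogue of that pattern, with hypothetical class and method names, might look like this:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical Java analogue of a once-only guard; not the HotSpot implementation.
final class FirstOomeGuard {

    private final AtomicBoolean reported = new AtomicBoolean(false);

    // Runs `action` only for the first caller that flips the flag; later callers skip it.
    boolean runOnFirstReport(Runnable action) {
        if (reported.compareAndSet(false, true)) {
            action.run();
            return true;
        }
        return false;
    }

    // Lets `threadCount` threads race on one guard and returns how often the action ran.
    static int demo(int threadCount) {
        FirstOomeGuard guard = new FirstOomeGuard();
        AtomicInteger runs = new AtomicInteger();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> guard.runOnFirstReport(runs::incrementAndGet));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("action ran " + demo(8) + " time(s)");
    }
}
```

The guard says nothing about whether the VM is shutting down, which is the concern raised above: the first reporting thread triggers the action even if other threads keep running normally.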
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Remember that you will need to state you would like a 
>>>>>>>>>>>>>> recording dumped out to disk on VM exit for the shutdown 
>>>>>>>>>>>>>> hook logic to complete.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This you can do by:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -XX:StartFlightRecording:dumponexit=true
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Or by:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -XX:StartFlightRecording:filename=myrec.jfr
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If the Shutdown hook gets an OOME during the exit logic, 
>>>>>>>>>>>>>> it will take the emergency path to create a file called 
>>>>>>>>>>>>>> hs_oom_<pid>.jfr.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There is a current known issue with the fact that OOME's 
>>>>>>>>>>>>>> are pre-allocated so they don't turn up in the recordings 
>>>>>>>>>>>>>> as Errors (because they are pre-allocated before JFR 
>>>>>>>>>>>>>> starts). We might want to add something to 
>>>>>>>>>>>>>> Universe::gen_out_of_memory_error() to report this in 
>>>>>>>>>>>>>> some way.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>> From: Yasumasa Suenaga <yasuenag at gmail.com>
>>>>>>>>>>>>>> Sent: 25 October 2018 12:58
>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>>> Subject: Emergency JFR dump at OOME
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> According to [1], I guess JFR is supposed to dump the flight recording to a file.
>>>>>>>>>>>>>> But the current JFR doesn't do so.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Should we fix it as below?
>>>>>>>>>>>>>> --------------------
>>>>>>>>>>>>>> diff -r 003c062e16ea src/hotspot/share/utilities/debug.cpp
>>>>>>>>>>>>>> --- a/src/hotspot/share/utilities/debug.cpp Wed Oct 24 21:17:30 2018 -0700
>>>>>>>>>>>>>> +++ b/src/hotspot/share/utilities/debug.cpp Thu Oct 25 19:56:54 2018 +0900
>>>>>>>>>>>>>> @@ -58,6 +58,9 @@
>>>>>>>>>>>>>>      #include "utilities/globalDefinitions.hpp"
>>>>>>>>>>>>>>      #include "utilities/macros.hpp"
>>>>>>>>>>>>>>      #include "utilities/vmError.hpp"
>>>>>>>>>>>>>> +#if INCLUDE_JFR
>>>>>>>>>>>>>> +#include "jfr/jfr.hpp"
>>>>>>>>>>>>>> +#endif
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>      #include <stdio.h>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @@ -321,6 +324,8 @@
>>>>>>>>>>>>>>            fatal("OutOfMemory encountered: %s", message);
>>>>>>>>>>>>>>          }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +        JFR_ONLY(Jfr::on_vm_shutdown(false);)
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>          if (ExitOnOutOfMemoryError) {
>>>>>>>>>>>>>>            tty->print_cr("Terminating due to java.lang.OutOfMemoryError: %s", message);
>>>>>>>>>>>>>>            os::exit(3);
>>>>>>>>>>>>>> --------------------
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I will file it in JBS and send a review request if this 
>>>>>>>>>>>>>> is confirmed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>> https://hg.openjdk.java.net/jdk/jdk/file/003c062e16ea/src/hotspot/share/jfr/recorder/repository/jfrEmergencyDump.cpp#l159
>>>>>>>>>>>>>>
>>>>>>
>>>>
>>>


