PING: Emergency JFR dump at OOME
Erik Gahlin
erik.gahlin at oracle.com
Tue Jan 22 14:48:40 UTC 2019
On 2019-01-22 15:15, Yasumasa Suenaga wrote:
> Hi Erik,
>
> On 2019/01/22 22:43, Erik Gahlin wrote:
>> There are several problems with the patch: it will not combine settings
>> if there are multiple recordings running at the same time and the
>> configuration will not show up in JMC. It is also unlikely that someone
>> will discover the setting in the .jfc file so it will add little benefit
>> for the community.
>>
>> I think it is better to add support through the existing mechanism
>> (dumponexit=true), which can be done when we have proper native support
>> for dumping recordings [1].
>
> It would be great!
>
> Should I add a dependency on JDK-8196050 to JDK-8213435 (this enhancement)
> on JBS?
> Or should I close JDK-8213435 as a duplicate of JDK-8196050?
You could close it as a duplicate, but please add a comment to JDK-8196050
noting it should also work with ExitOnOutOfMemoryError.
Thanks
Erik
>
>
> Thanks,
>
> Yasumasa
>
>
>
>> From a user perspective, it is irrelevant if the recording dumps/exits
>> in Java (in the shutdown hook), or in native (due to OOME).
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8196050
>>
>> Erik
>>
>>> Hi Yasumasa,
>>>
>>> I will look into this next week.
>>>
>>> Thanks
>>> Erik
>>>
>>>> PING: Did you read my email?
>>>>
>>>> I believe this enhancement helps us resolve memory issues.
>>>>
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>> On 2019/01/01 13:16, Yasumasa Suenaga wrote:
>>>>> Hi,
>>>>>
>>>>> I want to discuss this enhancement (JDK-8213435) again.
>>>>>
>>>>> I uploaded my proposal:
>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8213435/webrev.00/
>>>>>
>>>>> This change adds a new option, "emitOnOOME", to the *.jfc file.
>>>>> If this option is set to true, old object sample events will be
>>>>> emitted when an OOME occurs.
>>>>>
>>>>> This option is set to false by default, so it does not affect Epsilon
>>>>> users :-)
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> On 2018/11/07 5:29, Erik Gahlin wrote:
>>>>>> On 2018-11-06 15:39, Erik Gahlin wrote:
>>>>>>> On 2018-11-01 14:26, Yasumasa Suenaga wrote:
>>>>>>>> Hi Erik,
>>>>>>>>
>>>>>>>> On 2018/11/01 14:13, Erik Gahlin wrote:
>>>>>>>>> Hi Yasumasa,
>>>>>>>>>
>>>>>>>>> Thanks for looking into this, but I don't think emitting the Old
>>>>>>>>> Object events is the right thing to do unless we produce a JFR
>>>>>>>>> file at the same time.
>>>>>>>>>
>>>>>>>>> If anything should be added, it should be when the JVM exits.
>>>>>>>>>
>>>>>>>>> if (ExitOnOutOfMemoryError) {
>>>>>>>>>   tty->print_cr("Terminating due to java.lang.OutOfMemoryError: %s", message);
>>>>>>>>> + perhaps call emergency dump jfr here?
>>>>>>>>>   os::exit(3);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> That said, thinking this over, I'm not even sure this is a good
>>>>>>>>> idea.
>>>>>>>>>
>>>>>>>>> I don't know how the ExitOnOutOfMemoryError flag is used in
>>>>>>>>> production environments. Do people want a .jfr file at that
>>>>>>>>> point? For all use cases? Will it irritate people if they need
>>>>>>>>> to clean up jfr files? Will it make them turn off Flight
>>>>>>>>> Recorder?
>>>>>>>>
>>>>>>>> IMHO ExitOnOutOfMemoryError (and CrashOnOutOfMemoryError) should
>>>>>>>> be used in production systems because any thread might hit an
>>>>>>>> OOME that was caused by another thread.
>>>>>>>> For example, when a request processor in Tomcat consumes a lot of
>>>>>>>> memory, the Tomcat acceptor thread might hit the OOME. If so, Tomcat
>>>>>>>> cannot process any requests even though the process is still running.
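
The scenario above can be sketched in plain Java. This is only an illustration: the OutOfMemoryError is simulated with an explicit throw, and the names ThreadLocalOomeSketch and "acceptor" are made up for the example rather than taken from Tomcat or the JDK. It shows that the error ends only the thread it is thrown on, while the rest of the process keeps running.

```java
import java.util.concurrent.atomic.AtomicReference;

final class ThreadLocalOomeSketch {
    /** Runs the task on its own thread and returns the Throwable that ended it, or null. */
    static Throwable runAndCapture(Runnable task) {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread worker = new Thread(task, "acceptor"); // hypothetical thread name
        // Record whatever error escapes the worker instead of printing it.
        worker.setUncaughtExceptionHandler((thread, error) -> failure.set(error));
        worker.start();
        try {
            worker.join(); // wait until the worker thread has ended
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return failure.get();
    }

    public static void main(String[] args) {
        // Simulated: a real OOME is raised by the VM on whichever thread allocates next.
        Throwable failure = runAndCapture(() -> {
            throw new OutOfMemoryError("simulated: Java heap space");
        });
        System.out.println("acceptor thread ended with: " + failure);
        System.out.println("main thread is still running");
    }
}
```

With a VM-raised OOME and -XX:+ExitOnOutOfMemoryError the JVM would exit instead, so a supervisor can restart it rather than leaving a half-working process behind.
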
>>>>>>> Yes, but do you always want a .jfr if this happens?
>>>>>>>
>>>>>>> Let's say you are using the Epsilon GC (which sets
>>>>>>> ExitOnOutOfMemoryError) and have a script that restarts the JVM
>>>>>>> when it exits. Then your hard disk may fill up with .jfr files.
>>>>>>>
>>>>>>> One idea would be to only do it for recordings that have specified
>>>>>>> -XX:StartFlightRecording:dumponexit=true (which is implicitly set
>>>>>>> by -XX:StartFlightRecording:filename=<filename>) That way, it
>>>>>>> would be opt in behavior.
>>>>>>>
>>>>>>> This is non-trivial to implement since we can't call Java and
>>>>>>> allocate objects when we are out of memory. One could perhaps make
>>>>>>> an up call to Java and there take a previously prepared String
>>>>>>> representation of the destination path and push that reference
>>>>>>> back into native, which can then later be inspected. A filename
>>>>>>> must also be generated if a user hasn't specified one, which is
>>>>>>> trickier to do in native. Files in the disk repository should also
>>>>>>> be removed when the JVM exits.
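
For comparison, the opt-in part can already be expressed on the Java side through the jdk.jfr API. The sketch below assumes a JDK that ships the jdk.jfr module; the class name OptInDumpSketch and the file name are invented for illustration. It resolves the destination path up front, while memory is still available, and marks the recording as dump-on-exit. It does not attempt the native part discussed above.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import jdk.jfr.Recording;

final class OptInDumpSketch {
    /** Resolves the destination eagerly so nothing has to be computed at exit time. */
    static Path prepareDestination(String fileName) {
        return Paths.get(fileName).toAbsolutePath();
    }

    /** Starts a recording that is written to the destination when the JVM exits. */
    static Recording startOptInRecording(Path destination) throws IOException {
        Recording recording = new Recording();
        recording.enable("jdk.OldObjectSample");
        recording.setDestination(destination);
        recording.setDumpOnExit(true); // the opt-in: no dump unless this is requested
        recording.start();
        return recording;
    }

    public static void main(String[] args) throws IOException {
        startOptInRecording(prepareDestination("myrec.jfr"));
        // The recording is left running; it is dumped when the JVM exits.
    }
}
```
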
>>>>>>>
>>>>>> I filed an enhancement request where this could be discussed further.
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8213435
>>>>>>
>>>>>> Erik
>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Emergency (native) dumps should probably be reserved for cases
>>>>>>>>> when the JVM crashes, similar to the hs_err file.
>>>>>>>>
>>>>>>>> I thought JfrEmergencyDump::on_vm_shutdown() would handle every
>>>>>>>> OOME, but that seems not to be the case.
>>>>>>>>
>>>>>>>>
>>>>>>>>> I'm reluctant to add a specific flag for the Old Object event
>>>>>>>>> for the following reasons:
>>>>>>>>>
>>>>>>>>> 1) Very few people would find out about the flag, so it would
>>>>>>>>> add little value in practise. Focus should be to build a product
>>>>>>>>> that works well out of the box and not tie the implementation to
>>>>>>>>> a flag that must be respected for the next decade or so.
>>>>>>>>>
>>>>>>>>> 2) The feature is new and there is not a good tool for
>>>>>>>>> visualising old object samples. Once it exists, it's easier to
>>>>>>>>> see if emitting events and dumping a recording at this stage
>>>>>>>>> would help users in real world scenarios.
>>>>>>>>>
>>>>>>>>> 3) The way configuration issues have been handled historically
>>>>>>>>> is using a .jfc file. For instance, one could imagine something
>>>>>>>>> like this:
>>>>>>>>>
>>>>>>>>> <event name="jdk.OldObjectSample">
>>>>>>>>>   <setting name="firstOOME">true</setting>
>>>>>>>>>   ...
>>>>>>>>> </event>
>>>>>>>>>
>>>>>>>>> or perhaps a dedicated event, for example
>>>>>>>>> "jdk.FirstOOMEObjectSample" (if there really is a use case to
>>>>>>>>> emit events at this particular time that is not covered by the
>>>>>>>>> Old Object events emitted when the recording ends). There are
>>>>>>>>> plans to allow events to be configured on command line, so you
>>>>>>>>> would not need to create a new jfc file to make a slight change.
>>>>>>>>> That feature would then automatically provide command line
>>>>>>>>> capabilities for the Old Object event in the scenario you
>>>>>>>>> describe.
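
On the Java side, per-event settings of this kind are passed as a flat map with "<event>#<setting>" keys, the form accepted by jdk.jfr.Recording.setSettings(Map). A small sketch follows; the firstOOME setting is the hypothetical one from the XML above and does not exist in the JDK, and OldObjectSettingsSketch is an invented name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class OldObjectSettingsSketch {
    /** Builds the settings map for the Old Object event. */
    static Map<String, String> settings(boolean firstOome) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("jdk.OldObjectSample#enabled", "true");
        // Hypothetical setting, shown only to mirror the .jfc fragment.
        settings.put("jdk.OldObjectSample#firstOOME", Boolean.toString(firstOome));
        return settings;
    }

    public static void main(String[] args) {
        settings(true).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```
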
>>>>>>>>>
>>>>>>>>> To me it seems best to wait with the enhancement for now.
>>>>>>>>
>>>>>>>> I agree with you that "jdk.FirstOOMEObjectSample" should be added as
>>>>>>>> an event setting.
>>>>>>>> I don't mind whether it is set through *.jfc or a command-line option.
>>>>>>>>
>>>>>>>> Anyway, I think it would be very useful to get old object
>>>>>>>> information in a flight recording when an OOME occurs.
>>>>>>>> Of course we can get a heap dump with HeapDumpOnOutOfMemoryError,
>>>>>>>> but it does not contain time-series data like a flight recording.
>>>>>>>>
>>>>>>>>
>>>>>>>> I hope this proposal is accepted by the JFR team.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Erik
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On 1 Nov 2018, at 04:15, Yasumasa Suenaga <yasuenag at
>>>>>>>>>> gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Erik,
>>>>>>>>>>
>>>>>>>>>>> If a user on the other hand has specified a flag that the JVM
>>>>>>>>>>> should exit on OOME, then it makes sense to dump a recording
>>>>>>>>>>> with the old object events and shortest path-to-gc-root.
>>>>>>>>>>> That part seems to be missing.
>>>>>>>>>>
>>>>>>>>>> I tried to add a flag `EmitLeakProfilerEventsOnOOME` to control
>>>>>>>>>> it as below.
>>>>>>>>>> It works fine on my environment.
>>>>>>>>>>
>>>>>>>>>> ```
>>>>>>>>>> diff -r 3a8208766f7b src/hotspot/share/jfr/jfr.cpp
>>>>>>>>>> --- a/src/hotspot/share/jfr/jfr.cpp Thu Nov 01 02:12:13
>>>>>>>>>> 2018 +0100
>>>>>>>>>> +++ b/src/hotspot/share/jfr/jfr.cpp Thu Nov 01 12:13:17
>>>>>>>>>> 2018 +0900
>>>>>>>>>> @@ -91,6 +91,10 @@
>>>>>>>>>> return
>>>>>>>>>> JfrOptionSet::parse_start_flight_recording_option(option,
>>>>>>>>>> delimiter);
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> +void Jfr::emit_leak_profiler_events(jlong cutoff_ticks, bool
>>>>>>>>>> emit_all) {
>>>>>>>>>> + LeakProfiler::emit_events(cutoff_ticks, emit_all);
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> Thread* Jfr::sampler_thread() {
>>>>>>>>>> return JfrThreadSampling::sampler_thread();
>>>>>>>>>> }
>>>>>>>>>> diff -r 3a8208766f7b src/hotspot/share/jfr/jfr.hpp
>>>>>>>>>> --- a/src/hotspot/share/jfr/jfr.hpp Thu Nov 01 02:12:13
>>>>>>>>>> 2018 +0100
>>>>>>>>>> +++ b/src/hotspot/share/jfr/jfr.hpp Thu Nov 01 12:13:17
>>>>>>>>>> 2018 +0900
>>>>>>>>>> @@ -52,6 +52,7 @@
>>>>>>>>>> static bool on_flight_recorder_option(const JavaVMOption**
>>>>>>>>>> option, char* delimiter);
>>>>>>>>>> static bool on_start_flight_recording_option(const
>>>>>>>>>> JavaVMOption** option, char* delimiter);
>>>>>>>>>> static void weak_oops_do(BoolObjectClosure* is_alive,
>>>>>>>>>> OopClosure* f);
>>>>>>>>>> + static void emit_leak_profiler_events(jlong cutoff_ticks,
>>>>>>>>>> bool emit_all);
>>>>>>>>>> static Thread* sampler_thread();
>>>>>>>>>> };
>>>>>>>>>>
>>>>>>>>>> diff -r 3a8208766f7b src/hotspot/share/runtime/globals.hpp
>>>>>>>>>> --- a/src/hotspot/share/runtime/globals.hpp Thu Nov 01
>>>>>>>>>> 02:12:13 2018 +0100
>>>>>>>>>> +++ b/src/hotspot/share/runtime/globals.hpp Thu Nov 01
>>>>>>>>>> 12:13:17 2018 +0900
>>>>>>>>>> @@ -2596,6 +2596,9 @@
>>>>>>>>>> JFR_ONLY(product(ccstr, StartFlightRecording,
>>>>>>>>>> NULL, \
>>>>>>>>>> "Start flight recording with
>>>>>>>>>> options")) \
>>>>>>>>>> \
>>>>>>>>>> + JFR_ONLY(product(bool, EmitLeakProfilerEventsOnOOME,
>>>>>>>>>> false, \
>>>>>>>>>> + "Emit LeakProfiler events when OutOfMemoryError
>>>>>>>>>> occurs")) \
>>>>>>>>>> + \
>>>>>>>>>> experimental(bool, UseFastUnorderedTimeStamps,
>>>>>>>>>> false, \
>>>>>>>>>> "Use platform unstable time where supported for
>>>>>>>>>> timestamps only")
>>>>>>>>>>
>>>>>>>>>> diff -r 3a8208766f7b src/hotspot/share/utilities/debug.cpp
>>>>>>>>>> --- a/src/hotspot/share/utilities/debug.cpp Thu Nov 01
>>>>>>>>>> 02:12:13 2018 +0100
>>>>>>>>>> +++ b/src/hotspot/share/utilities/debug.cpp Thu Nov 01
>>>>>>>>>> 12:13:17 2018 +0900
>>>>>>>>>> @@ -58,6 +58,9 @@
>>>>>>>>>> #include "utilities/globalDefinitions.hpp"
>>>>>>>>>> #include "utilities/macros.hpp"
>>>>>>>>>> #include "utilities/vmError.hpp"
>>>>>>>>>> +#if INCLUDE_JFR
>>>>>>>>>> +#include "jfr/jfr.hpp"
>>>>>>>>>> +#endif
>>>>>>>>>>
>>>>>>>>>> #include <stdio.h>
>>>>>>>>>>
>>>>>>>>>> @@ -306,6 +309,13 @@
>>>>>>>>>> // commands multiple times we just do it once when the first
>>>>>>>>>> threads reports
>>>>>>>>>> // the error.
>>>>>>>>>> if (Atomic::cmpxchg(1, &out_of_memory_reported, 0) == 0) {
>>>>>>>>>> +
>>>>>>>>>> +#if INCLUDE_JFR
>>>>>>>>>> + if (EmitLeakProfilerEventsOnOOME) {
>>>>>>>>>> + Jfr::emit_leak_profiler_events(max_jlong, false);
>>>>>>>>>> + }
>>>>>>>>>> +#endif
>>>>>>>>>> +
>>>>>>>>>> // create heap dump before OnOutOfMemoryError commands are
>>>>>>>>>> executed
>>>>>>>>>> if (HeapDumpOnOutOfMemoryError) {
>>>>>>>>>> tty->print_cr("java.lang.OutOfMemoryError: %s", message);
>>>>>>>>>> ```
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Yasumasa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2018/11/01 8:47, Erik Gahlin wrote:
>>>>>>>>>>>> On 31 Oct 2018, at 03:58, Yasumasa Suenaga
>>>>>>>>>>>> <yasuenag at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> 2018-10-31 (Wed) 0:35 Markus Gronlund
>>>>>>>>>>>> <markus.gronlund at oracle.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think I provided you with the wrong settings.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please change:
>>>>>>>>>>>>>
>>>>>>>>>>>>> JFR_ONLY(Jfr::emit_leak_profiler_events(0, true);)
>>>>>>>>>>>>>
>>>>>>>>>>>>> To
>>>>>>>>>>>>>
>>>>>>>>>>>>> JFR_ONLY(Jfr::emit_leak_profiler_events(max_jlong, false);)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think this will get you the GC roots as well.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks! It works fine.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> We need to think about if / how this should be integrated.
>>>>>>>>>>>>> If so, it might be that it needs to be guarded behind some
>>>>>>>>>>>>> flag to not always issue a full safepoint, root scanning and
>>>>>>>>>>>>> edge traversals on every OOME.
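
The guard could look roughly like the following Java analogue of the native logic. It is a sketch, not HotSpot code: FirstOomeGuardSketch and its emitOnOome flag are invented names, and the counter merely stands in for the safepoint, root scan and edge traversal. The compare-and-set mirrors the Atomic::cmpxchg on out_of_memory_reported in the patch, so the expensive work runs at most once and only when the flag asks for it.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class FirstOomeGuardSketch {
    private final AtomicBoolean reported = new AtomicBoolean(false);
    private final AtomicInteger emissions = new AtomicInteger();
    private final boolean emitOnOome; // hypothetical opt-in flag

    FirstOomeGuardSketch(boolean emitOnOome) {
        this.emitOnOome = emitOnOome;
    }

    /** Returns true only for the single call that performed the emission. */
    boolean onOutOfMemory() {
        // Only the first reporter wins the compare-and-set.
        if (!reported.compareAndSet(false, true)) {
            return false;
        }
        if (!emitOnOome) {
            return false;
        }
        emissions.incrementAndGet(); // stand-in for the expensive leak profiler work
        return true;
    }

    int emissionCount() {
        return emissions.get();
    }

    public static void main(String[] args) {
        FirstOomeGuardSketch guard = new FirstOomeGuardSketch(true);
        System.out.println("first report emitted: " + guard.onOutOfMemory());
        System.out.println("second report emitted: " + guard.onOutOfMemory());
    }
}
```
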
>>>>>>>>>>>>
>>>>>>>>>>>> Do you mean a VM operation should be added for
>>>>>>>>>>>> Jfr::emit_leak_profiler_events()?
>>>>>>>>>>>> I think it can do so because HeapDumper is called from this
>>>>>>>>>>>> function.
>>>>>>>>>>> When a recording ends, old object samples are written. They
>>>>>>>>>>> have the most up to date information about what is leaking. I
>>>>>>>>>>> don't think we should emit old object events before that
>>>>>>>>>>> happens. It will make recordings harder to analyze and
>>>>>>>>>>> introduce an intrusive safepoint.
>>>>>>>>>>> If a user on the other hand has specified a flag that the JVM
>>>>>>>>>>> should exit on OOME, then it makes sense to dump a recording
>>>>>>>>>>> with the old object events and shortest path-to-gc-root.
>>>>>>>>>>> That part seems to be missing.
>>>>>>>>>>> Erik
>>>>>>>>>>>>
>>>>>>>>>>>> Anyway, I want you to merge this change to JFR. :-)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: Yasumasa Suenaga <yasuenag at gmail.com>
>>>>>>>>>>>>> Sent: den 30 oktober 2018 14:54
>>>>>>>>>>>>> To: Markus Gronlund <markus.gronlund at oracle.com>
>>>>>>>>>>>>> Cc: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>> Subject: Re: Emergency JFR dump at OOME
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Markus!
>>>>>>>>>>>>> It works fine on my environment.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Could you apply this change to JFR?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> P.S.
>>>>>>>>>>>>> I got a flight recording with path-to-gc-roots=true; however,
>>>>>>>>>>>>> "GC Root" in JMC is null. Is that expected?
>>>>>>>>>>>>> (The application is OOME.java, which I shared before.)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2018/10/30 21:01, Markus Gronlund wrote:
>>>>>>>>>>>>>> Hi again,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maybe you can try something like this:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # HG changeset patch
>>>>>>>>>>>>>> # User mgronlun
>>>>>>>>>>>>>> # Date 1540900357 -3600
>>>>>>>>>>>>>> # Tue Oct 30 12:52:37 2018 +0100
>>>>>>>>>>>>>> # Node ID 32a48c323970c5fc4d0d1ffff5860a3c55c4a4dc
>>>>>>>>>>>>>> # Parent 80d104390dd2821fd95d56981bf9d37f1cc2e363
>>>>>>>>>>>>>> [mq]: yasumasa
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/src/hotspot/share/jfr/jfr.cpp
>>>>>>>>>>>>>> b/src/hotspot/share/jfr/jfr.cpp
>>>>>>>>>>>>>> --- a/src/hotspot/share/jfr/jfr.cpp
>>>>>>>>>>>>>> +++ b/src/hotspot/share/jfr/jfr.cpp
>>>>>>>>>>>>>> @@ -91,6 +91,10 @@
>>>>>>>>>>>>>> return
>>>>>>>>>>>>>> JfrOptionSet::parse_start_flight_recording_option(option,
>>>>>>>>>>>>>> delimiter);
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +void Jfr::emit_leak_profiler_events(jlong cutoff_ticks, bool emit_all) {
>>>>>>>>>>>>>> +  LeakProfiler::emit_events(cutoff_ticks, emit_all);
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> Thread* Jfr::sampler_thread() {
>>>>>>>>>>>>>> return JfrThreadSampling::sampler_thread();
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> diff --git a/src/hotspot/share/jfr/jfr.hpp
>>>>>>>>>>>>>> b/src/hotspot/share/jfr/jfr.hpp
>>>>>>>>>>>>>> --- a/src/hotspot/share/jfr/jfr.hpp
>>>>>>>>>>>>>> +++ b/src/hotspot/share/jfr/jfr.hpp
>>>>>>>>>>>>>> @@ -52,6 +52,7 @@
>>>>>>>>>>>>>> static bool on_flight_recorder_option(const
>>>>>>>>>>>>>> JavaVMOption** option, char* delimiter);
>>>>>>>>>>>>>> static bool on_start_flight_recording_option(const
>>>>>>>>>>>>>> JavaVMOption** option, char* delimiter);
>>>>>>>>>>>>>> static void weak_oops_do(BoolObjectClosure* is_alive,
>>>>>>>>>>>>>> OopClosure*
>>>>>>>>>>>>>> f);
>>>>>>>>>>>>>> +  static void emit_leak_profiler_events(jlong cutoff_ticks, bool emit_all);
>>>>>>>>>>>>>> static Thread* sampler_thread();
>>>>>>>>>>>>>> };
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/src/hotspot/share/utilities/debug.cpp
>>>>>>>>>>>>>> b/src/hotspot/share/utilities/debug.cpp
>>>>>>>>>>>>>> --- a/src/hotspot/share/utilities/debug.cpp
>>>>>>>>>>>>>> +++ b/src/hotspot/share/utilities/debug.cpp
>>>>>>>>>>>>>> @@ -58,6 +58,9 @@
>>>>>>>>>>>>>> #include "utilities/globalDefinitions.hpp"
>>>>>>>>>>>>>> #include "utilities/macros.hpp"
>>>>>>>>>>>>>> #include "utilities/vmError.hpp"
>>>>>>>>>>>>>> +#if INCLUDE_JFR
>>>>>>>>>>>>>> +#include "jfr/jfr.hpp"
>>>>>>>>>>>>>> +#endif
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> #include <stdio.h>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @@ -306,6 +309,8 @@
>>>>>>>>>>>>>> // commands multiple times we just do it once when the
>>>>>>>>>>>>>> first threads reports
>>>>>>>>>>>>>> // the error.
>>>>>>>>>>>>>> if (Atomic::cmpxchg(1, &out_of_memory_reported, 0)
>>>>>>>>>>>>>> == 0) {
>>>>>>>>>>>>>> + JFR_ONLY(Jfr::emit_leak_profiler_events(0, true);)
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> // create heap dump before OnOutOfMemoryError
>>>>>>>>>>>>>> commands are executed
>>>>>>>>>>>>>> if (HeapDumpOnOutOfMemoryError) {
>>>>>>>>>>>>>> tty->print_cr("java.lang.OutOfMemoryError: %s", message);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This should write the contents of the leak profiler at the
>>>>>>>>>>>>>> first reported OOME; it will go through the regular chunk
>>>>>>>>>>>>>> writing mechanism so that it does not break the existing
>>>>>>>>>>>>>> dump logic.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let me know if it works for you.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>> From: Yasumasa Suenaga <yasuenag at gmail.com>
>>>>>>>>>>>>>> Sent: den 30 oktober 2018 02:35
>>>>>>>>>>>>>> To: Markus Gronlund <markus.gronlund at oracle.com>
>>>>>>>>>>>>>> Cc: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>>> Subject: Re: Emergency JFR dump at OOME
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Markus,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I confirmed with GDB that Leak Profiler is called by
>>>>>>>>>>>>>> shutdown hook.
>>>>>>>>>>>>>> I think it is very useful to obtain information when OOME
>>>>>>>>>>>>>> occurs because the user might not get heap dump.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So I want JFR to call the Leak Profiler when an OOME occurs,
>>>>>>>>>>>>>> before the problematic thread is destroyed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2018-10-29 (Mon) 23:41 Markus Gronlund
>>>>>>>>>>>>>> <markus.gronlund at oracle.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think it is called.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Remember that main thread has already exited when the
>>>>>>>>>>>>>>> shutdown logic is called, the allocations in your test can
>>>>>>>>>>>>>>> already have been removed by the GC at this point (marked
>>>>>>>>>>>>>>> as dead in the Leak Profiler).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In general, small tests like these are not representative
>>>>>>>>>>>>>>> for the Leak Profiler, because it works by acquiring
>>>>>>>>>>>>>>> samples over longer periods of time, and then there is the
>>>>>>>>>>>>>>> problem that the main thread could already have exited at
>>>>>>>>>>>>>>> the point of the dump.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please take a look at the tests located in
>>>>>>>>>>>>>>> test/jdk/jdk/jfr/event/oldobject for some reference on how
>>>>>>>>>>>>>>> you can take more control to increase the chances of
>>>>>>>>>>>>>>> getting samples.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>> From: Yasumasa Suenaga <yasuenag at gmail.com>
>>>>>>>>>>>>>>> Sent: den 29 oktober 2018 15:13
>>>>>>>>>>>>>>> To: Markus Gronlund <markus.gronlund at oracle.com>
>>>>>>>>>>>>>>> Cc: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>>>> Subject: Re: Emergency JFR dump at OOME
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Markus,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I tried to get flight record of OOME sample application as
>>>>>>>>>>>>>>> below:
>>>>>>>>>>>>>>> ---------------
>>>>>>>>>>>>>>> import java.util.*;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> public class OOME{
>>>>>>>>>>>>>>> public static void main(String[] args){
>>>>>>>>>>>>>>> var list = new ArrayList<byte[]>();
>>>>>>>>>>>>>>> while(true){
>>>>>>>>>>>>>>> list.add(new byte[1024]);
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>> ---------------
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Command:
>>>>>>>>>>>>>>> $ /usr/local/jdk-11.0.1/bin/java \
>>>>>>>>>>>>>>>     -XX:StartFlightRecording=filename=oome.jfr,settings=profile \
>>>>>>>>>>>>>>>     -Xmx256m OOME
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I could get a flight recording in oome.jfr, but JMC did not
>>>>>>>>>>>>>>> show any objects in the "Live Objects" window.
>>>>>>>>>>>>>>> OOME.java finishes immediately, so it does not invoke any
>>>>>>>>>>>>>>> periodic tasks.
>>>>>>>>>>>>>>> So I guess LeakProfiler::emit_events() is not called.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2018/10/29 22:47, Markus Gronlund wrote:
>>>>>>>>>>>>>>>> Rotate() and / or stop() is always called as part of
>>>>>>>>>>>>>>>> shutdown, indirectly by the shutdown thread asking the
>>>>>>>>>>>>>>>> recorder thread in the VM.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> LeakProfiler::emit_events() will be called on shutdown if
>>>>>>>>>>>>>>>> the jdk.OldObjectSample event is enabled.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Absence of EventDumpReason implies a normal shutdown.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <yasuenag at gmail.com>
>>>>>>>>>>>>>>>> Sent: den 29 oktober 2018 11:58
>>>>>>>>>>>>>>>> To: Markus Gronlund <markus.gronlund at oracle.com>
>>>>>>>>>>>>>>>> Cc: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>>>>> Subject: Re: Emergency JFR dump at OOME
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Markus,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This would screw up the logic for the registered
>>>>>>>>>>>>>>>>> Shutdown hook that will run if the VM is also shutting
>>>>>>>>>>>>>>>>> down (which is not implied).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Does it mean JfrRecorderService::rotate() will be called
>>>>>>>>>>>>>>>> at JfrEmergencyDump::on_vm_shutdown() ?
>>>>>>>>>>>>>>>> I think EventDumpReason::commit() and
>>>>>>>>>>>>>>>> LeakProfiler::emit_events() should be called when OOME
>>>>>>>>>>>>>>>> occurs even if it would not be treated as emergency dump.
>>>>>>>>>>>>>>>> So we should add them to
>>>>>>>>>>>>>>>> Universe::gen_out_of_memory_error().
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2018/10/29 17:28, Markus Gronlund wrote:
>>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I don't think so.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Handling OOMEs is very intricate, OOMEs are thread local
>>>>>>>>>>>>>>>>> and it is difficult to get it right.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The suggested patch would do an emergency dump
>>>>>>>>>>>>>>>>> unconditionally on the first reported OOME. This would
>>>>>>>>>>>>>>>>> screw up the logic for the registered Shutdown hook that
>>>>>>>>>>>>>>>>> will run if the VM is also shutting down (which is not
>>>>>>>>>>>>>>>>> implied).
>>>>>>>>>>>>>>>>> But, it is only the first invocation of
>>>>>>>>>>>>>>>>> report_java_out_of_memory() that is happening here and
>>>>>>>>>>>>>>>>> user code can catch OOME's. Other threads might run fine
>>>>>>>>>>>>>>>>> for quite some time and might not even run into OOME's.
>>>>>>>>>>>>>>>>> The shutdown hook registered for dumping JFR recordings
>>>>>>>>>>>>>>>>> on VM Exit is set up to attempt to handle graceful
>>>>>>>>>>>>>>>>> shutdown if possible (no OOME), but if it gets an OOME,
>>>>>>>>>>>>>>>>> it will trigger the OOME emergency dump logic.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Remember that you will need to state you would like a
>>>>>>>>>>>>>>>>> recording dumped out to disk on VM exit for the shutdown
>>>>>>>>>>>>>>>>> hook logic to complete.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This you can do by:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -XX:StartFlightRecording:dumponexit=true
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Or by:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -XX:StartFlightRecording:filename=myrec.jfr
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If the Shutdown hook gets an OOME during the exit logic,
>>>>>>>>>>>>>>>>> it will take the emergency path to create a file called
>>>>>>>>>>>>>>>>> hs_oom_<pid>.jfr.
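
The fallback described here can be pictured with a small Java sketch. It is an illustration only: ShutdownDumpSketch and its method names are invented, the graceful dump is passed in as a Runnable, and the real emergency path is native code. The emergency file name is built before the dump is attempted, since allocating a string after an OOME may itself fail.

```java
final class ShutdownDumpSketch {
    /** Name of the emergency file, following the hs_oom_<pid>.jfr pattern. */
    static String emergencyFileName(long pid) {
        return "hs_oom_" + pid + ".jfr";
    }

    /** Tries the graceful dump first; returns the name of the file that was used. */
    static String dumpOnExit(Runnable gracefulDump, String requestedName, long pid) {
        String fallback = emergencyFileName(pid); // prepared while memory is available
        try {
            gracefulDump.run();
            return requestedName;
        } catch (OutOfMemoryError e) {
            // Graceful shutdown failed: take the emergency path instead.
            return fallback;
        }
    }

    public static void main(String[] args) {
        long pid = ProcessHandle.current().pid();
        System.out.println(dumpOnExit(() -> { }, "myrec.jfr", pid));
        System.out.println(dumpOnExit(() -> {
            throw new OutOfMemoryError("simulated");
        }, "myrec.jfr", pid));
    }
}
```
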
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> There is a current known issue with the fact that OOME's
>>>>>>>>>>>>>>>>> are pre-allocated so they don't turn up in the
>>>>>>>>>>>>>>>>> recordings as Errors (because they are pre-allocated
>>>>>>>>>>>>>>>>> before JFR starts). We might want to add something to
>>>>>>>>>>>>>>>>> Universe::gen_out_of_memory_error() to report this in
>>>>>>>>>>>>>>>>> some way.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <yasuenag at gmail.com>
>>>>>>>>>>>>>>>>> Sent: den 25 oktober 2018 12:58
>>>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>>>>>> Subject: Emergency JFR dump at OOME
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> According to [1], I would expect JFR to dump the flight
>>>>>>>>>>>>>>>>> recording to a file, but the current JFR does not do so.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Should we fix it as below?
>>>>>>>>>>>>>>>>> --------------------
>>>>>>>>>>>>>>>>> diff -r 003c062e16ea
>>>>>>>>>>>>>>>>> src/hotspot/share/utilities/debug.cpp
>>>>>>>>>>>>>>>>> --- a/src/hotspot/share/utilities/debug.cpp Wed Oct 24
>>>>>>>>>>>>>>>>> 21:17:30 2018 -0700
>>>>>>>>>>>>>>>>> +++ b/src/hotspot/share/utilities/debug.cpp Thu Oct 25
>>>>>>>>>>>>>>>>> 19:56:54 2018 +0900
>>>>>>>>>>>>>>>>> @@ -58,6 +58,9 @@
>>>>>>>>>>>>>>>>> #include "utilities/globalDefinitions.hpp"
>>>>>>>>>>>>>>>>> #include "utilities/macros.hpp"
>>>>>>>>>>>>>>>>> #include "utilities/vmError.hpp"
>>>>>>>>>>>>>>>>> +#if INCLUDE_JFR
>>>>>>>>>>>>>>>>> +#include "jfr/jfr.hpp"
>>>>>>>>>>>>>>>>> +#endif
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> #include <stdio.h>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> @@ -321,6 +324,8 @@
>>>>>>>>>>>>>>>>> fatal("OutOfMemory encountered: %s",
>>>>>>>>>>>>>>>>> message);
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> + JFR_ONLY(Jfr::on_vm_shutdown(false);)
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> if (ExitOnOutOfMemoryError) {
>>>>>>>>>>>>>>>>> tty->print_cr("Terminating due to
>>>>>>>>>>>>>>>>> java.lang.OutOfMemoryError: %s", message);
>>>>>>>>>>>>>>>>> os::exit(3);
>>>>>>>>>>>>>>>>> --------------------
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I will file it to JBS and will send review request if it
>>>>>>>>>>>>>>>>> is verified.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>> https://hg.openjdk.java.net/jdk/jdk/file/003c062e16ea/src/hotspot/share/jfr/recorder/repository/jfrEmergencyDump.cpp#l159
>>>>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>