PING: Emergency JFR dump at OOME
Yasumasa Suenaga
yasuenag at gmail.com
Tue Jan 22 14:15:04 UTC 2019
Hi Erik,
On 2019/01/22 22:43, Erik Gahlin wrote:
> There are several problems with the patch, it will not combine settings
> if there are multiple recordings running at the same time and the
> configuration will not show up in JMC. It is also unlikely that someone
> will discover the setting in the .jfc file so it will add little benefit
> for the community.
>
> I think it is better to add support through the existing mechanism
> (dumponexit=true), which can be done when we have proper native support
> for dumping recordings [1].
It would be great!
Should I add dependency to JDK-8196050 JDK-8213435 (this enhancement) on JBS?
or should I close JDK-8213435 as duplicate of JDK-8196050?
Thanks,
Yasumasa
> From a user perspective, it is irrelevant if the recording dumps/exits
> in Java (in the shutdown hook), or in native (due to OOME).
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8196050
>
> Erik
>
>> Hi Yasumasa,
>>
>> I will look into this next week.
>>
>> Thanks
>> Erik
>>
>>> PING: Did you read my email?
>>>
>>> I believe this enhancement helps us to resolve memory issue.
>>>
>>>
>>> Yasumasa
>>>
>>>
>>> On 2019/01/01 13:16, Yasumasa Suenaga wrote:
>>>> Hi,
>>>>
>>>> I want to discuss about this enhancement (JDK-8213435) again.
>>>>
>>>> I uploaded my proposal:
>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8213435/webrev.00/
>>>>
>>>> This change provides new option "emitOnOOME" to *.JFC file.
>>>> If this options set to true, old object sampling events will be
>>>> emitted when OOME occurred.
>>>>
>>>> This option set to false by default. So it is not affect to Epsilon
>>>> user :-)
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>> On 2018/11/07 5:29, Erik Gahlin wrote:
>>>>> On 2018-11-06 15:39, Erik Gahlin wrote:
>>>>>> On 2018-11-01 14:26, Yasumasa Suenaga wrote:
>>>>>>> Hi Erik,
>>>>>>>
>>>>>>> On 2018/11/01 14:13, Erik Gahlin wrote:
>>>>>>>> Hi Yasumasa,
>>>>>>>>
>>>>>>>> Thanks for looking into this, but I don?t think emitting the Old
>>>>>>>> Object events is the right thing to do unless we produce a JFR
>>>>>>>> file at the same time.
>>>>>>>>
>>>>>>>> If anything should be added, it should be when the JVM exits.
>>>>>>>>
>>>>>>>> if (ExitOnOutOfMemoryError) { "
>>>>>>>> tty->print_cr("Terminating due to
>>>>>>>> java.lang.OutOfMemoryError: %s", message);
>>>>>>>> + perhaps call emergency dump jfr here?
>>>>>>>> os::exit(3);
>>>>>>>> }
>>>>>>>>
>>>>>>>> That said, thinking this over, I?m not even sure this is a good
>>>>>>>> idea.
>>>>>>>>
>>>>>>>> I don?t know how the ExitOnOutOfMemoryError flag is used in
>>>>>>>> production environments. Do people want a .jfr file at that
>>>>>>>> point? For all uses cases? Will it irritate people if they need
>>>>>>>> to clean up jfr files? Will it make them turn off Flight Recorder?
>>>>>>>
>>>>>>> IMHO ExitOnOutOfMemoryError (and CrashOnOutOfMemoryError) should
>>>>>>> be used in production system because any threads might caught
>>>>>>> OOME which causes by another thread.
>>>>>>> For example, when request processor on Tomcat consumes a lot of
>>>>>>> memory, Tomcat acceptor thread might caught OOME. If so, Tomcat
>>>>>>> cannot process any requests in spite of the process is running.
>>>>>> Yes, but do you always want a .jfr if this happens?
>>>>>>
>>>>>> Let's say you are using the Epsilon GC (which sets
>>>>>> ExitOnOutOfMemoryError) and have a script that restarts the JVM
>>>>>> when it exits. Then your hard disk may fill up with .jfr files.
>>>>>>
>>>>>> One idea would be to only do it for recordings that have specified
>>>>>> -XX:StartFlightRecording:dumponexit=true (which is implicitly set
>>>>>> by -XX:StartFlightRecording:filename=<filename>) That way, it
>>>>>> would be opt in behavior.
>>>>>>
>>>>>> This is non-trivial to implement since we can't call Java and
>>>>>> allocate objects when we are out of memory. One could perhaps make
>>>>>> an up call to Java and there take a previously prepared String
>>>>>> representation of the destination path and push that reference
>>>>>> back into native, which can then later be inspected. A filename
>>>>>> must also be generated if a user hasn't specified one, which is
>>>>>> trickier do to in native. Files in the disk repository should also
>>>>>> be removed when the JVM exits.
>>>>>>
>>>>> I filed an enhancement request where this could discussed further.
>>>>> https://bugs.openjdk.java.net/browse/JDK-8213435
>>>>>
>>>>> Erik
>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Emergency (native) dumps should probably be reserved for cases
>>>>>>>> when the JVM crashes, similar to the hs_err file.
>>>>>>>
>>>>>>> I thought JfrEmergencyDump::on_vm_shutdown() should handle all of
>>>>>>> OOME, but It seems not to be the case.
>>>>>>>
>>>>>>>
>>>>>>>> I?m reluctant to add a specific flag for the Old Object event
>>>>>>>> for the following reasons:
>>>>>>>>
>>>>>>>> 1) Very few people would find out about the flag, so it would
>>>>>>>> add little value in practise. Focus should be to build a product
>>>>>>>> that works well out of the box and not tie the implementation to
>>>>>>>> a flag that must be respected for the next decade or so.
>>>>>>>>
>>>>>>>> 2) The feature is new and there is not a good tool for
>>>>>>>> visualising old object samples. Once it exists, it?e easier to
>>>>>>>> see if emitting events and dumping a recording at this stage
>>>>>>>> would help users in real world scenarios.
>>>>>>>>
>>>>>>>> 3) The way configuration issues have been handled historically
>>>>>>>> is using a .jfc file. For instance, one could imagine something
>>>>>>>> like this:
>>>>>>>>
>>>>>>>> <event name=?jdk.OldObjectSample?>
>>>>>>>> <setting name=?firstOOME?>true</setting>
>>>>>>>> ?
>>>>>>>> </event>
>>>>>>>>
>>>>>>>> or perhaps a dedicated event, for example
>>>>>>>> ?jdk.FirstOOMEObjectSample? (if there really is a use case to
>>>>>>>> emit events at this particular time that is not covered by the
>>>>>>>> Old Object events emitted when the recording ends). There are
>>>>>>>> plans to allow events to be configured on command line, so you
>>>>>>>> would not need to create a new jfc file to make a slight change.
>>>>>>>> That feature would then automatically provide command line
>>>>>>>> capabilities for the Old Object event in the scenario you describe.
>>>>>>>>
>>>>>>>> To me it seems best to wait with the enhancement for now.
>>>>>>>
>>>>>>> I agree with you "jdk.FirstOOMEObjectSample" should be added as
>>>>>>> event setting.
>>>>>>> I don't care it can set whether *.jfc and commandline option.
>>>>>>>
>>>>>>> Anyway, I think it is very useful if we can get old object
>>>>>>> information in flight record when OOME occurs.
>>>>>>> Of course we can get heap dump with HeapDumpOnOutOfMemoryError,
>>>>>>> but it is not contain time-series data like a flight record.
>>>>>>>
>>>>>>>
>>>>>>> I hope this proposal is accepted JFR team.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Erik
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 1 Nov 2018, at 04:15, Yasumasa Suenaga <yasuenag at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi Erik,
>>>>>>>>>
>>>>>>>>>> If a user on the other hand has specified a flag that the JVM
>>>>>>>>>> should exit on OOME, then it makes sense to dump a recording
>>>>>>>>>> with the old object events and shortest path-to-gc-root.
>>>>>>>>>> That part seems to be missing.
>>>>>>>>>
>>>>>>>>> I tried to add a flag `EmitLeakProfilerEventsOnOOME` to control
>>>>>>>>> it as below.
>>>>>>>>> It works fine on my environment.
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>> diff -r 3a8208766f7b src/hotspot/share/jfr/jfr.cpp
>>>>>>>>> --- a/src/hotspot/share/jfr/jfr.cpp Thu Nov 01 02:12:13
>>>>>>>>> 2018 +0100
>>>>>>>>> +++ b/src/hotspot/share/jfr/jfr.cpp Thu Nov 01 12:13:17
>>>>>>>>> 2018 +0900
>>>>>>>>> @@ -91,6 +91,10 @@
>>>>>>>>> return
>>>>>>>>> JfrOptionSet::parse_start_flight_recording_option(option,
>>>>>>>>> delimiter);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> +void Jfr::emit_leak_profiler_events(jlong cutoff_ticks, bool
>>>>>>>>> emit_all) {
>>>>>>>>> + LeakProfiler::emit_events(cutoff_ticks, emit_all);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> Thread* Jfr::sampler_thread() {
>>>>>>>>> return JfrThreadSampling::sampler_thread();
>>>>>>>>> }
>>>>>>>>> diff -r 3a8208766f7b src/hotspot/share/jfr/jfr.hpp
>>>>>>>>> --- a/src/hotspot/share/jfr/jfr.hpp Thu Nov 01 02:12:13
>>>>>>>>> 2018 +0100
>>>>>>>>> +++ b/src/hotspot/share/jfr/jfr.hpp Thu Nov 01 12:13:17
>>>>>>>>> 2018 +0900
>>>>>>>>> @@ -52,6 +52,7 @@
>>>>>>>>> static bool on_flight_recorder_option(const JavaVMOption**
>>>>>>>>> option, char* delimiter);
>>>>>>>>> static bool on_start_flight_recording_option(const
>>>>>>>>> JavaVMOption** option, char* delimiter);
>>>>>>>>> static void weak_oops_do(BoolObjectClosure* is_alive,
>>>>>>>>> OopClosure* f);
>>>>>>>>> + static void emit_leak_profiler_events(jlong cutoff_ticks,
>>>>>>>>> bool emit_all);
>>>>>>>>> static Thread* sampler_thread();
>>>>>>>>> };
>>>>>>>>>
>>>>>>>>> diff -r 3a8208766f7b src/hotspot/share/runtime/globals.hpp
>>>>>>>>> --- a/src/hotspot/share/runtime/globals.hpp Thu Nov 01
>>>>>>>>> 02:12:13 2018 +0100
>>>>>>>>> +++ b/src/hotspot/share/runtime/globals.hpp Thu Nov 01
>>>>>>>>> 12:13:17 2018 +0900
>>>>>>>>> @@ -2596,6 +2596,9 @@
>>>>>>>>> JFR_ONLY(product(ccstr, StartFlightRecording,
>>>>>>>>> NULL, \
>>>>>>>>> "Start flight recording with
>>>>>>>>> options")) \
>>>>>>>>> \
>>>>>>>>> + JFR_ONLY(product(bool, EmitLeakProfilerEventsOnOOME,
>>>>>>>>> false, \
>>>>>>>>> + "Emit LeakProfiler events when OutOfMemoryError
>>>>>>>>> occurs")) \
>>>>>>>>> + \
>>>>>>>>> experimental(bool, UseFastUnorderedTimeStamps,
>>>>>>>>> false, \
>>>>>>>>> "Use platform unstable time where supported for
>>>>>>>>> timestamps only")
>>>>>>>>>
>>>>>>>>> diff -r 3a8208766f7b src/hotspot/share/utilities/debug.cpp
>>>>>>>>> --- a/src/hotspot/share/utilities/debug.cpp Thu Nov 01
>>>>>>>>> 02:12:13 2018 +0100
>>>>>>>>> +++ b/src/hotspot/share/utilities/debug.cpp Thu Nov 01
>>>>>>>>> 12:13:17 2018 +0900
>>>>>>>>> @@ -58,6 +58,9 @@
>>>>>>>>> #include "utilities/globalDefinitions.hpp"
>>>>>>>>> #include "utilities/macros.hpp"
>>>>>>>>> #include "utilities/vmError.hpp"
>>>>>>>>> +#if INCLUDE_JFR
>>>>>>>>> +#include "jfr/jfr.hpp"
>>>>>>>>> +#endif
>>>>>>>>>
>>>>>>>>> #include <stdio.h>
>>>>>>>>>
>>>>>>>>> @@ -306,6 +309,13 @@
>>>>>>>>> // commands multiple times we just do it once when the first
>>>>>>>>> threads reports
>>>>>>>>> // the error.
>>>>>>>>> if (Atomic::cmpxchg(1, &out_of_memory_reported, 0) == 0) {
>>>>>>>>> +
>>>>>>>>> +#if INCLUDE_JFR
>>>>>>>>> + if (EmitLeakProfilerEventsOnOOME) {
>>>>>>>>> + Jfr::emit_leak_profiler_events(max_jlong, false);
>>>>>>>>> + }
>>>>>>>>> +#endif
>>>>>>>>> +
>>>>>>>>> // create heap dump before OnOutOfMemoryError commands are
>>>>>>>>> executed
>>>>>>>>> if (HeapDumpOnOutOfMemoryError) {
>>>>>>>>> tty->print_cr("java.lang.OutOfMemoryError: %s", message);
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2018/11/01 8:47, Erik Gahlin wrote:
>>>>>>>>>>> On 31 Oct 2018, at 03:58, Yasumasa Suenaga
>>>>>>>>>>> <yasuenag at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> 2018?10?31?(?) 0:35 Markus Gronlund
>>>>>>>>>>> <markus.gronlund at oracle.com>:
>>>>>>>>>>>>
>>>>>>>>>>>> I think I provided you with the wrong settings.
>>>>>>>>>>>>
>>>>>>>>>>>> Please change:
>>>>>>>>>>>>
>>>>>>>>>>>> JFR_ONLY(Jfr::emit_leak_profiler_events(0, true);)
>>>>>>>>>>>>
>>>>>>>>>>>> To
>>>>>>>>>>>>
>>>>>>>>>>>> JFR_ONLY(Jfr::emit_leak_profiler_events(max_jlong, false);)
>>>>>>>>>>>>
>>>>>>>>>>>> I think this will get you the GC roots as well.
>>>>>>>>>>>
>>>>>>>>>>> Thanks! It works fine.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> We need to think about if / how this should be integrated.
>>>>>>>>>>>> If so, it might be that it needs to be guarded behind some
>>>>>>>>>>>> flag to not always issue a full safepoint, root scanning and
>>>>>>>>>>>> edge traversals on every OOME.
>>>>>>>>>>>
>>>>>>>>>>> Do you mean VM operation should be added for
>>>>>>>>>>> Jfr::emit_leak_profiler_events()?
>>>>>>>>>>> I think it can do so because HeapDumper is called from this
>>>>>>>>>>> function.
>>>>>>>>>> When a recording ends, old object samples are written. They
>>>>>>>>>> have the most up to date information about what is leaking. I
>>>>>>>>>> don?t think we should emit old object events before that
>>>>>>>>>> happens. It will make recordings harder to analyze and
>>>>>>>>>> introduce an intrusive safepoint.
>>>>>>>>>> If a user on the other hand has specified a flag that the JVM
>>>>>>>>>> should exit on OOME, then it makes sense to dump a recording
>>>>>>>>>> with the old object events and shortest path-to-gc-root.
>>>>>>>>>> That part seems to be missing.
>>>>>>>>>> Erik
>>>>>>>>>>>
>>>>>>>>>>> Anyway, I want you to merge this change to JFR. :-)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Yasumasa
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Markus
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Yasumasa Suenaga <yasuenag at gmail.com>
>>>>>>>>>>>> Sent: den 30 oktober 2018 14:54
>>>>>>>>>>>> To: Markus Gronlund <markus.gronlund at oracle.com>
>>>>>>>>>>>> Cc: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>> Subject: Re: Emergency JFR dump at OOME
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks Markus!
>>>>>>>>>>>> It works fine on my environment.
>>>>>>>>>>>>
>>>>>>>>>>>> Could you apply this change to JFR?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> P.S.
>>>>>>>>>>>> I got flight record with path-to-gc-roots=true, however
>>>>>>>>>>>> "GC Root" in JMC
>>>>>>>>>>>> is null. Is it correct?
>>>>>>>>>>>> (Application is OOME.java which I shared before)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2018/10/30 21:01, Markus Gronlund wrote:
>>>>>>>>>>>>> Hi again,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe you can try something like this:
>>>>>>>>>>>>>
>>>>>>>>>>>>> # HG changeset patch
>>>>>>>>>>>>> # User mgronlun
>>>>>>>>>>>>> # Date 1540900357 -3600
>>>>>>>>>>>>> # Tue Oct 30 12:52:37 2018 +0100
>>>>>>>>>>>>> # Node ID 32a48c323970c5fc4d0d1ffff5860a3c55c4a4dc
>>>>>>>>>>>>> # Parent 80d104390dd2821fd95d56981bf9d37f1cc2e363
>>>>>>>>>>>>> [mq]: yasumasa
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/src/hotspot/share/jfr/jfr.cpp
>>>>>>>>>>>>> b/src/hotspot/share/jfr/jfr.cpp
>>>>>>>>>>>>> --- a/src/hotspot/share/jfr/jfr.cpp
>>>>>>>>>>>>> +++ b/src/hotspot/share/jfr/jfr.cpp
>>>>>>>>>>>>> @@ -91,6 +91,10 @@
>>>>>>>>>>>>> return
>>>>>>>>>>>>> JfrOptionSet::parse_start_flight_recording_option(option,
>>>>>>>>>>>>> delimiter);
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> +void Jfr::emit_leak_profiler_events(jlong cutoff_ticks, bool
>>>>>>>>>>>>> +emit_all) {
>>>>>>>>>>>>> + LeakProfiler::emit_events(cutoff_ticks, emit_all); }
>>>>>>>>>>>>> +
>>>>>>>>>>>>> Thread* Jfr::sampler_thread() {
>>>>>>>>>>>>> return JfrThreadSampling::sampler_thread();
>>>>>>>>>>>>> }
>>>>>>>>>>>>> diff --git a/src/hotspot/share/jfr/jfr.hpp
>>>>>>>>>>>>> b/src/hotspot/share/jfr/jfr.hpp
>>>>>>>>>>>>> --- a/src/hotspot/share/jfr/jfr.hpp
>>>>>>>>>>>>> +++ b/src/hotspot/share/jfr/jfr.hpp
>>>>>>>>>>>>> @@ -52,6 +52,7 @@
>>>>>>>>>>>>> static bool on_flight_recorder_option(const
>>>>>>>>>>>>> JavaVMOption** option, char* delimiter);
>>>>>>>>>>>>> static bool on_start_flight_recording_option(const
>>>>>>>>>>>>> JavaVMOption** option, char* delimiter);
>>>>>>>>>>>>> static void weak_oops_do(BoolObjectClosure* is_alive,
>>>>>>>>>>>>> OopClosure*
>>>>>>>>>>>>> f);
>>>>>>>>>>>>> + static void emit_leak_profiler_events(jlong
>>>>>>>>>>>>> cutoff_ticks, bool
>>>>>>>>>>>>> + emit_all);
>>>>>>>>>>>>> static Thread* sampler_thread();
>>>>>>>>>>>>> };
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/src/hotspot/share/utilities/debug.cpp
>>>>>>>>>>>>> b/src/hotspot/share/utilities/debug.cpp
>>>>>>>>>>>>> --- a/src/hotspot/share/utilities/debug.cpp
>>>>>>>>>>>>> +++ b/src/hotspot/share/utilities/debug.cpp
>>>>>>>>>>>>> @@ -58,6 +58,9 @@
>>>>>>>>>>>>> #include "utilities/globalDefinitions.hpp"
>>>>>>>>>>>>> #include "utilities/macros.hpp"
>>>>>>>>>>>>> #include "utilities/vmError.hpp"
>>>>>>>>>>>>> +#if INCLUDE_JFR
>>>>>>>>>>>>> +#include "jfr/jfr.hpp"
>>>>>>>>>>>>> +#endif
>>>>>>>>>>>>>
>>>>>>>>>>>>> #include <stdio.h>
>>>>>>>>>>>>>
>>>>>>>>>>>>> @@ -306,6 +309,8 @@
>>>>>>>>>>>>> // commands multiple times we just do it once when the
>>>>>>>>>>>>> first threads reports
>>>>>>>>>>>>> // the error.
>>>>>>>>>>>>> if (Atomic::cmpxchg(1, &out_of_memory_reported, 0) == 0) {
>>>>>>>>>>>>> + JFR_ONLY(Jfr::emit_leak_profiler_events(0, true);)
>>>>>>>>>>>>> +
>>>>>>>>>>>>> // create heap dump before OnOutOfMemoryError
>>>>>>>>>>>>> commands are executed
>>>>>>>>>>>>> if (HeapDumpOnOutOfMemoryError) {
>>>>>>>>>>>>> tty->print_cr("java.lang.OutOfMemoryError: %s", message);
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> This should write the contents of the leak profiler at the
>>>>>>>>>>>>> first reported OOME; it will go through the regular chunk
>>>>>>>>>>>>> writing mechanism in order that it do not destroy existing
>>>>>>>>>>>>> dump logic.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let me know if it works for you.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: Yasumasa Suenaga <yasuenag at gmail.com>
>>>>>>>>>>>>> Sent: den 30 oktober 2018 02:35
>>>>>>>>>>>>> To: Markus Gronlund <markus.gronlund at oracle.com>
>>>>>>>>>>>>> Cc: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>> Subject: Re: Emergency JFR dump at OOME
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Markus,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I confirmed with GDB that Leak Profiler is called by
>>>>>>>>>>>>> shutdown hook.
>>>>>>>>>>>>> I think it is very useful to obtain information when OOME
>>>>>>>>>>>>> occurs because the user might not get heap dump.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So I want JFR to call Leak Profiler when OOME occurs before
>>>>>>>>>>>>> being destroyed problematic thread.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2018?10?29?(?) 23:41 Markus Gronlund
>>>>>>>>>>>>> <markus.gronlund at oracle.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think it is called.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Remember that main thread has already exited when the
>>>>>>>>>>>>>> shutdown logic is called, the allocations in your test can
>>>>>>>>>>>>>> already have been removed by the GC at this point (marked
>>>>>>>>>>>>>> as dead in the Leak Profiler).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In general, small tests like these are not representative
>>>>>>>>>>>>>> for the Leak Profiler, because it works by acquiring
>>>>>>>>>>>>>> samples over longer periods of time, and then there is the
>>>>>>>>>>>>>> problem of the main thread could already have exited at
>>>>>>>>>>>>>> the point of dump.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please take a look at the tests located in
>>>>>>>>>>>>>> test/jdk/jdk/jfr/event/oldobject for some reference on how
>>>>>>>>>>>>>> you can take more control to increase the chances of
>>>>>>>>>>>>>> getting samples.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>> From: Yasumasa Suenaga <yasuenag at gmail.com>
>>>>>>>>>>>>>> Sent: den 29 oktober 2018 15:13
>>>>>>>>>>>>>> To: Markus Gronlund <markus.gronlund at oracle.com>
>>>>>>>>>>>>>> Cc: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>>> Subject: Re: Emergency JFR dump at OOME
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Markus,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I tried to get flight record of OOME sample application as
>>>>>>>>>>>>>> below:
>>>>>>>>>>>>>> ---------------
>>>>>>>>>>>>>> import java.util.*;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> public class OOME{
>>>>>>>>>>>>>> public static void main(String[] args){
>>>>>>>>>>>>>> var list = new ArrayList<byte[]>();
>>>>>>>>>>>>>> while(true){
>>>>>>>>>>>>>> list.add(new byte[1024]);
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> ---------------
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Command:
>>>>>>>>>>>>>> $ /usr/local/jdk-11.0.1/bin/java
>>>>>>>>>>>>>> -XX:StartFlightRecording=filename=oome.jfr,settings=profile -Xmx256m
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> OOME
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I could get flight record into oome.jfr, but JMC did not
>>>>>>>>>>>>>> show any objects on "Live Objects" window.
>>>>>>>>>>>>>> OOME.java will finish immidiatery. It will not invoke any
>>>>>>>>>>>>>> periodic tasks.
>>>>>>>>>>>>>> So I guess LeakProfiler::emit_events() will not be called.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2018/10/29 22:47, Markus Gronlund wrote:
>>>>>>>>>>>>>>> Rotate() and / or stop() is always called as part of
>>>>>>>>>>>>>>> shutdown, indirectly by the shutdown thread asking the
>>>>>>>>>>>>>>> recorder thread in the VM.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> LeakProfiler::emit_events() will be called on shutdown if
>>>>>>>>>>>>>>> the jdk.OldObjectSample event is enabled.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Absence of EventDumpReason implies a normal shutdown.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>> From: Yasumasa Suenaga <yasuenag at gmail.com>
>>>>>>>>>>>>>>> Sent: den 29 oktober 2018 11:58
>>>>>>>>>>>>>>> To: Markus Gronlund <markus.gronlund at oracle.com>
>>>>>>>>>>>>>>> Cc: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>>>> Subject: Re: Emergency JFR dump at OOME
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Markus,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This would screw up the logic for the registered
>>>>>>>>>>>>>>>> Shutdown hook that will run if the VM is also shutting
>>>>>>>>>>>>>>>> down (which is not implied).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does it mean JfrRecorderService::rotate() will be called
>>>>>>>>>>>>>>> at JfrEmergencyDump::on_vm_shutdown() ?
>>>>>>>>>>>>>>> I think EventDumpReason::commit() and
>>>>>>>>>>>>>>> LeakProfiler::emit_events() should be called when OOME
>>>>>>>>>>>>>>> occurs even if it would not be treated as emergency dump.
>>>>>>>>>>>>>>> So we should add them to
>>>>>>>>>>>>>>> Universe::gen_out_of_memory_error().
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2018/10/29 17:28, Markus Gronlund wrote:
>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I don't think so.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Handling OOMEs is very intricate, OOMEs are thread local
>>>>>>>>>>>>>>>> and it is difficult to get it right.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The suggested patch would do an emergency dump
>>>>>>>>>>>>>>>> unconditionally on the first reported OOME. This would
>>>>>>>>>>>>>>>> screw up the logic for the registered Shutdown hook that
>>>>>>>>>>>>>>>> will run if the VM is also shutting down (which is not
>>>>>>>>>>>>>>>> implied).
>>>>>>>>>>>>>>>> But, it is only the first invocation of
>>>>>>>>>>>>>>>> report_java_out_of_memory() that is happening here and
>>>>>>>>>>>>>>>> user code can catch OOME's. Other threads might run fine
>>>>>>>>>>>>>>>> for quite some time and might not even run into OOME's.
>>>>>>>>>>>>>>>> The shutdown hook registered for dumping JFR recordings
>>>>>>>>>>>>>>>> on VM Exit is set up to attempt to handle graceful
>>>>>>>>>>>>>>>> shutdown if possible (no OOME), but if it gets an OOME,
>>>>>>>>>>>>>>>> it will trigger the OOME emergency dump logic.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Remember that you will need to state you would like a
>>>>>>>>>>>>>>>> recording dumped out to disk on VM exit for the shutdown
>>>>>>>>>>>>>>>> hook logic to complete.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This you can do by:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -XX:StartFlightRecording:dumponexit=true
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Or by:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -XX:StartFlightRecording:filename=myrec.jfr
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If the Shutdown hook gets an OOME during the exit logic,
>>>>>>>>>>>>>>>> it will take the emergency path to create a file called
>>>>>>>>>>>>>>>> hs_oom_<pid>.jfr.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> There is a current known issue with the fact that OOME's
>>>>>>>>>>>>>>>> are pre-allocated so they don't turn up in the
>>>>>>>>>>>>>>>> recordings as Errors (because they are pre-allocated
>>>>>>>>>>>>>>>> before JFR starts). We might want to add something to
>>>>>>>>>>>>>>>> Universe::gen_out_of_memory_error() to report this in
>>>>>>>>>>>>>>>> some way.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <yasuenag at gmail.com>
>>>>>>>>>>>>>>>> Sent: den 25 oktober 2018 12:58
>>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>>>>> Subject: Emergency JFR dump at OOME
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> According to [1], I guess JFR dumps flight record to file.
>>>>>>>>>>>>>>>> But current JFR don't do so.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Should we fix it as below?
>>>>>>>>>>>>>>>> --------------------
>>>>>>>>>>>>>>>> diff -r 003c062e16ea src/hotspot/share/utilities/debug.cpp
>>>>>>>>>>>>>>>> --- a/src/hotspot/share/utilities/debug.cpp Wed Oct 24
>>>>>>>>>>>>>>>> 21:17:30 2018 -0700
>>>>>>>>>>>>>>>> +++ b/src/hotspot/share/utilities/debug.cpp Thu Oct 25
>>>>>>>>>>>>>>>> 19:56:54 2018 +0900
>>>>>>>>>>>>>>>> @@ -58,6 +58,9 @@
>>>>>>>>>>>>>>>> #include "utilities/globalDefinitions.hpp"
>>>>>>>>>>>>>>>> #include "utilities/macros.hpp"
>>>>>>>>>>>>>>>> #include "utilities/vmError.hpp"
>>>>>>>>>>>>>>>> +#if INCLUDE_JFR
>>>>>>>>>>>>>>>> +#include "jfr/jfr.hpp"
>>>>>>>>>>>>>>>> +#endif
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> #include <stdio.h>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> @@ -321,6 +324,8 @@
>>>>>>>>>>>>>>>> fatal("OutOfMemory encountered: %s", message);
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> + JFR_ONLY(Jfr::on_vm_shutdown(false);)
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> if (ExitOnOutOfMemoryError) {
>>>>>>>>>>>>>>>> tty->print_cr("Terminating due to
>>>>>>>>>>>>>>>> java.lang.OutOfMemoryError: %s", message);
>>>>>>>>>>>>>>>> os::exit(3);
>>>>>>>>>>>>>>>> --------------------
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I will file it to JBS and will send review request if it
>>>>>>>>>>>>>>>> is verified.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>> https://hg.openjdk.java.net/jdk/jdk/file/003c062e16ea/src/hotspot/s
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> hare/jfr/recorder/repository/jfrEmergencyDump.cpp#l159
>>>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>
More information about the hotspot-jfr-dev
mailing list