[11u] RFR: 8258414: OldObjectSample events too expensive

Doerr, Martin martin.doerr at sap.com
Thu Apr 22 09:19:13 UTC 2021


Hi,

it did not only break the debug build.

We have seen a crash on Windows:
#  Internal Error (./src/hotspot/share/runtime/safepoint.cpp:894), pid=11412, tid=12932
#  fatal error: Deadlock in safepoint code.  Should have called back to the VM before blocking.

V  [jvm.dll+0x247c64]  report_fatal+0x64  (debug.cpp:268)
V  [jvm.dll+0x65e30b]  SafepointSynchronize::block+0x14b  (safepoint.cpp:893)
V  [jvm.dll+0x101d44]  SafepointMechanism::block_if_requested+0x34  (safepointmechanism.inline.hpp:73)
V  [jvm.dll+0x6f0d9a]  JavaThread::check_safepoint_and_suspend_for_native_trans+0xca  (thread.cpp:2508)
V  [jvm.dll+0x5e3175]  ObjectSampleCheckpoint::on_rotation+0xa5  (objectsamplecheckpoint.cpp:301)
V  [jvm.dll+0x397694]  JfrRecorderService::pre_safepoint_write+0x64  (jfrrecorderservice.cpp:428)
V  [jvm.dll+0x3978a4]  JfrRecorderService::rotate+0x194  (jfrrecorderservice.cpp:325)
V  [jvm.dll+0x398324]  recorderthread_entry+0xd4  (jfrrecorderthreadloop.cpp:76)
V  [jvm.dll+0x6f7649]  JavaThread::run+0x139  (thread.cpp:1840)
V  [jvm.dll+0x6f06b4]  Thread::call_run+0x84  (thread.cpp:391)
V  [jvm.dll+0x5f8e36]  thread_native_entry+0xd6  (os_windows.cpp:460)

Test:
jdk/jfr/startupargs/TestOldObjectQueueSize.java

Best regards,
Martin


> -----Original Message-----
> From: jdk-updates-dev <jdk-updates-dev-retn at openjdk.java.net> On
> Behalf Of Hohensee, Paul
> Sent: Mittwoch, 21. April 2021 19:42
> To: Jaroslav Bachorík <jaroslav.bachorik at datadoghq.com>; Florian David
> <florian.david at datadoghq.com>
> Cc: jdk-updates-dev <jdk-updates-dev at openjdk.java.net>; Marcus Hirt
> <marcus.hirt at datadoghq.com>
> Subject: RE: [11u] RFR: 8258414: OldObjectSample events too expensive
> 
> Hi, Jaroslav.
> 
> This push (https://hg.openjdk.java.net/jdk-updates/jdk11u-
> dev/rev/b103107d7ceb) broke the debug builds, vis
> 
> /home/hohensee/workspaces/jdk11u-
> dev/src/hotspot/share/jfr/leakprofiler/checkpoint/objectSampleCheckpoint
> .cpp:298:14: error: ‘JfrJavaSupport’ has not been declared
>    DEBUG_ONLY(JfrJavaSupport::check_java_thread_in_native(thread);)
> 
> The fix is
> 
> diff -r b103107d7ceb
> src/hotspot/share/jfr/leakprofiler/checkpoint/objectSampleCheckpoint.cpp
> ---
> a/src/hotspot/share/jfr/leakprofiler/checkpoint/objectSampleCheckpoint.cp
> p  Wed Apr 21 12:24:28 2021 +0200
> +++
> b/src/hotspot/share/jfr/leakprofiler/checkpoint/objectSampleCheckpoint.c
> pp  Wed Apr 21 17:40:23 2021 +0000
> @@ -24,6 +24,7 @@
> 
>  #include "precompiled.hpp"
>  #include "jfr/jfrEvents.hpp"
> +#include "jfr/jni/jfrJavaSupport.hpp"
>  #include "jfr/leakprofiler/chains/edgeStore.hpp"
>  #include "jfr/leakprofiler/chains/objectSampleMarker.hpp"
>  #include "jfr/leakprofiler/checkpoint/objectSampleCheckpoint.hpp"
> 
> Thanks,
> Paul
> 
> -----Original Message-----
> From: jdk-updates-dev <jdk-updates-dev-retn at openjdk.java.net> on
> behalf of Jaroslav Bachorík <jaroslav.bachorik at datadoghq.com>
> Date: Monday, April 12, 2021 at 1:40 AM
> To: Florian David <florian.david at datadoghq.com>
> Cc: jdk-updates-dev <jdk-updates-dev at openjdk.java.net>, Marcus Hirt
> <marcus.hirt at datadoghq.com>
> Subject: RE: [11u] RFR: 8258414: OldObjectSample events too expensive
> 
> Great, thanks!
> 
> Reviewed!
> 
> -JB-
> 
> On Fri, Apr 9, 2021 at 11:49 PM Florian David
> <florian.david at datadoghq.com> wrote:
> >
> > After receiving feedback from Markus stating that:
> > > There is no need to attempt a lock replacement, because in 11, there is
> no concurrent class unloading. There, unloading only happens at a safepoint.
> > > With concurrent class unloading, there is a need to protect this list, but in
> 11 it is mutually exclusive and cannot be modified concurrently with the JFR
> Recorder thread calling "install_stack_traces(sampler").
> >
> > I removed the module lock and added an is_at_safepoint() assert.
> >
> > Update webrev link:
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8258414
> > webrev: http://cr.openjdk.java.net/~fdavid/8258414/webrev.02/
> > Original PR: https://github.com/openjdk/jdk/pull/2780
> >
> > Florian
> >
> > On Sun, Apr 4, 2021 at 8:33 PM Florian David
> <florian.david at datadoghq.com> wrote:
> >>
> >> Hi Jaroslav,
> >>
> >> Thanks for the review.
> >>
> >>>
> >>> -
> src/hotspot/share/jfr/leakprofiler/checkpoint/objectSampleCheckpoint.cpp
> >>>    L287 - IMO, the TODO is not necessary
> >>
> >> It's removed.
> >>
> >>>
> >>>    L293 - I think the comment `// caller needs ResourceMark` should
> >>> not be removed
> >>
> >> I added it back.
> >>
> >>>    L303 - Not sure if using Module_lock instead of
> >>> ClassLoaderDataGraph_lock is correct. Also, this change seems to be
> >>> bringing in changes unrelated to the patch.
> >>>               From what is happening in other places it would seem
> >>> that a safepoint should be asserted instead (or nothing should be
> >>> done).
> >>>               I will let Markus weigh in on this.
> >>
> >> No change for the moment.
> >>
> >>>
> >>> - src/hotspot/share/jfr/support/jfrAllocationTracer.cpp
> >>>    L38 - can this be completely removed?
> >>
> >> It's removed.
> >>
> >>>
> >>> - src/hotspot/share/jfr/support/jfrAllocationTracer.hpp
> >>>   L30 - I think `class JfrThreadLocal;` can also be removed
> >>>
> >> It's removed.
> >>
> >> Update webrev link:
> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8258414
> >> webrev: http://cr.openjdk.java.net/~fdavid/8258414/webrev.01/
> >> Original PR: https://github.com/openjdk/jdk/pull/2780
> >>
> >> Florian
> >>
> >> On Mon, Mar 29, 2021 at 12:22 PM Jaroslav Bachorík
> <jaroslav.bachorik at datadoghq.com> wrote:
> >>>
> >>> Hi Florian,
> >>>
> >>> a few nits:
> >>> -
> src/hotspot/share/jfr/leakprofiler/checkpoint/objectSampleCheckpoint.cpp
> >>>    L287 - IMO, the TODO is not necessary
> >>>    L293 - I think the comment `// caller needs ResourceMark` should
> >>> not be removed
> >>>    L303 - Not sure if using Module_lock instead of
> >>> ClassLoaderDataGraph_lock is correct. Also, this change seems to be
> >>> bringing in changes unrelated to the patch.
> >>>               From what is happening in other places it would seem
> >>> that a safepoint should be asserted instead (or nothing should be
> >>> done).
> >>>               I will let Markus weigh in on this.
> >>> - src/hotspot/share/jfr/support/jfrAllocationTracer.cpp
> >>>    L38 - can this be completely removed?
> >>> - src/hotspot/share/jfr/support/jfrAllocationTracer.hpp
> >>>   L30 - I think `class JfrThreadLocal;` can also be removed
> >>>
> >>> Cheers,
> >>>
> >>> -JB-
> >>>
> >>>
> >>> On Wed, Mar 24, 2021 at 8:42 PM Florian David
> >>> <florian.david at datadoghq.com> wrote:
> >>> >
> >>> > Hi,
> >>> >
> >>> > Please review this 11u backport. It's very similar to the original patch,
> >>> > except for the class loader data graph lock and
> JfrKlassUnloading::sort()
> >>> > method which are not available in 11u.
> >>> >
> >>> > The missing lock has been replaced by locking the module_lock
> instead,
> >>> > as it is done in jfrcheckpointManager.cpp:363.
> >>> > JfrKlassUnloading class does not exist and thus sort() method is not
> replaced,
> >>> > I added a TODO instead.
> >>> >
> >>> > Bug: https://bugs.openjdk.java.net/browse/JDK-8258414
> >>> > webrev: http://cr.openjdk.java.net/~fdavid/8258414/webrev.00/
> >>> > Original PR: https://github.com/openjdk/jdk/pull/2780
> >>> >
> >>> > Testing:
> >>> > - Linux x86_64 tier1 tests
> >>> > - SPECvm 2008 on a 96 cores/192 Gib server. JFR recording size is 75.12
> MB before the patch and 3.08 MB after the patch.
> >>> >
> >>> > Let me know what you think.
> >>> >
> >>> > Thanks,
> >>> > Florian



More information about the jdk-updates-dev mailing list