RFR: 8366122: Shenandoah: Implement efficient support for object count after gc events
William Kemper
wkemper at openjdk.org
Wed Sep 3 15:54:23 UTC 2025
On Thu, 28 Aug 2025 01:30:39 GMT, pf0n <duke at openjdk.org> wrote:
> ### Summary
>
> The new implementation of ObjectCountAfterGC for Shenandoah piggybacks off of the existing marking phases and records strongly marked objects in a histogram. If the event is disabled, the original marking closures are used. When enabled new mark-and-count closures are used by the worker threads. Each worker thread updates its local histogram as it marks an object. These local histograms are merged at the conclusion of the marking phase under a mutex. The event is emitted outside a safepoint. Because (most) Shenandoah's marking is done concurrently, so is the object counting work.
>
> ### Performance
> The performance test were ran using the Extremem benchmark on a default and stress workload. (will edit this section to include data after average time and test for GenShen)
>
> #### Default workload:
> ObjectCountAfterGC disabled (master branch):
> `[807.216s][info][gc,stats ] Pause Init Mark (G) = 0.003 s (a = 264 us)`
> `[807.216s][info][gc,stats ] Pause Init Mark (N) = 0.001 s (a = 91 us)`
> `[807.216s][info][gc,stats ] Concurrent Mark Roots = 0.041 s (a = 4099 us)`
> `[807.216s][info][gc,stats ] Concurrent Marking = 1.660 s (a = 166035 us)`
> `[807.216s][info][gc,stats ] Pause Final Mark (G) = 0.004 s (a = 446 us) `
> `[807.216s][info][gc,stats ] Pause Final Mark (G) = 0.004 s (a = 446 us) `
> `[807.216s][info][gc,stats ] Pause Final Mark (N) = 0.004 s (a = 357 us)`
>
> ObjectCountAfterGC disabled (feature branch):
> `[807.104s][info][gc,stats ] Pause Init Mark (G) = 0.003 s (a = 302 us)`
> `[807.104s][info][gc,stats ] Pause Init Mark (N) = 0.001 s (a = 92 us) `
> `[807.104s][info][gc,stats ] Concurrent Mark Roots = 0.048 s (a = 4827 us)`
> `[807.104s][info][gc,stats ] Concurrent Marking = 1.666 s (a = 166638 us) `
> `[807.104s][info][gc,stats ] Pause Final Mark (G) = 0.006 s (a = 603 us)`
> `[807.104s][info][gc,stats ] Pause Final Mark (N) = 0.005 s (a = 516 us)`
>
> ObjectCountAfterGC enabled (feature branch)
> `[807.299s][info][gc,stats ] Pause Init Mark (G) = 0.002 s (a = 227 us)`
> `[807.299s][info][gc,stats ] Pause Init Mark (N) = 0.001 s (a = 89 us) `
> `[807.299s][info][gc,stats ] Concurrent Mark Roots = 0.053 s (a = 5279 us)`
> `[807.299s][info][gc,st...
Left a few superficial suggestions. My big question is what work remains for this event to also work with the generational mode?
src/hotspot/share/gc/shared/gcTrace.hpp line 136:
> 134: void report_gc_reference_stats(const ReferenceProcessorStats& rp) const;
> 135: void report_object_count_after_gc(BoolObjectClosure* object_filter, WorkerThreads* workers) NOT_SERVICES_RETURN;
> 136: // Report object count by not performing a heap inspection. This method will
s/by not/without
src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 179:
> 177: // Use object counting closure if ObjectCount or ObjectCountAfterGC event is enabled.
> 178: const bool object_count_enabled = ObjectCountEventSender::should_send_event();
> 179: if (object_count_enabled && !ShenandoahHeap::heap()->mode()->is_generational()) {
Can the generational mode support this? It can also perform a _global_ collection.
src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 130:
> 128: // Create the KlassInfoTable for Shenandoah only if JFR is enabled.
> 129: #if INCLUDE_JFR
> 130: KlassInfoTable cit(false);
Might consider a `CADR` helper class here (Constructor Acquires, Destructor Releases). There are a lot of lines of code between the two `set_cit` calls (and the second one isn't guarded by `INCLUDE_JFR`).
src/hotspot/share/memory/heapInspection.hpp line 33:
> 31: #include "oops/objArrayOop.hpp"
> 32: #include "oops/oop.hpp"
> 33: #include "runtime/mutex.hpp"
Does this need to be in the header?
src/hotspot/share/runtime/mutexLocker.hpp line 90:
> 88: extern Monitor* Notify_lock; // a lock used to synchronize the start-up of the vm
> 89: extern Mutex* ExceptionCache_lock; // a lock used to synchronize exception cache updates
> 90: extern Mutex* TableMerge_lock; // a lock used to synchronize merging a thread-local table into a global table
Might call this `ObjectCountMerge_lock` and describe its usage as `merging a thread local object count`. There are many `Tables` in the JVM (SymbolTable, InternedStrings, etc.).
test/jdk/jdk/jfr/event/gc/objectcount/TestObjectCountAfterGCEventWithShenandoah.java line 11:
> 9: * & vm.opt.ExplicitGCInvokesConcurrent != true
> 10: * @library /test/lib /test/jdk
> 11: * @run main/othervm -XX:+UnlockExperimentalVMOptions -XX:-UseFastUnorderedTimeStamps -XX:+UseShenandoahGC -XX:MarkSweepDeadRatio=0 -XX:-UseCompressedOops -XX:-UseCompressedClassPointers -XX:+IgnoreUnrecognizedVMOptions jdk.jfr.event.gc.objectcount.TestObjectCountAfterGCEventWithShenandoah
Are all of these flags necessary to run this test? Can we pare any unnecessary options?
-------------
Changes requested by wkemper (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/26977#pullrequestreview-3178372422
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317339492
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317344826
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317348710
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317355707
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317358657
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317360739
More information about the hotspot-dev
mailing list