RFR: 8366122: Shenandoah: Implement efficient support for object count after gc events

Y. Srinivas Ramakrishna ysr at openjdk.org
Wed Sep 3 15:54:24 UTC 2025


On Thu, 28 Aug 2025 01:30:39 GMT, pf0n <duke at openjdk.org> wrote:

> ### Summary
> 
> The new implementation of ObjectCountAfterGC for Shenandoah piggybacks off of the existing marking phases and records strongly marked objects in a histogram. If the event is disabled, the original marking closures are used. When enabled new mark-and-count closures are used by the worker threads. Each worker thread updates its local histogram as it marks an object. These local histograms are merged at the conclusion of the marking phase under a mutex. The event is emitted outside a safepoint. Because (most) Shenandoah's marking is done concurrently, so is the object counting work.
> 
> ### Performance
> The performance test were ran using the Extremem benchmark on a default and stress workload. (will edit this section to include data after average time and test for GenShen)
> 
> #### Default workload:
> ObjectCountAfterGC disabled (master branch):
> `[807.216s][info][gc,stats    ] Pause Init Mark (G)            =    0.003 s (a =      264 us)`
> `[807.216s][info][gc,stats    ] Pause Init Mark (N)            =    0.001 s (a =       91 us)`
> `[807.216s][info][gc,stats    ] Concurrent Mark Roots          =    0.041 s (a =     4099 us)`
> `[807.216s][info][gc,stats    ] Concurrent Marking             =    1.660 s (a =   166035 us)`
> `[807.216s][info][gc,stats    ] Pause Final Mark (G)           =    0.004 s (a =      446 us) `
> `[807.216s][info][gc,stats    ] Pause Final Mark (G)           =    0.004 s (a =      446 us) `
> `[807.216s][info][gc,stats    ] Pause Final Mark (N)           =    0.004 s (a =      357 us)`
> 
> ObjectCountAfterGC disabled (feature branch):
> `[807.104s][info][gc,stats    ] Pause Init Mark (G)            =    0.003 s (a =      302 us)`
> `[807.104s][info][gc,stats    ] Pause Init Mark (N)            =    0.001 s (a =       92 us) `
> `[807.104s][info][gc,stats    ] Concurrent Mark Roots          =    0.048 s (a =     4827 us)`
> `[807.104s][info][gc,stats    ] Concurrent Marking             =    1.666 s (a =   166638 us) `
> `[807.104s][info][gc,stats    ] Pause Final Mark (G)           =    0.006 s (a =      603 us)`
> `[807.104s][info][gc,stats    ] Pause Final Mark (N)           =    0.005 s (a =      516 us)`
> 
> ObjectCountAfterGC enabled (feature branch)
> `[807.299s][info][gc,stats    ] Pause Init Mark (G)            =    0.002 s (a =      227 us)`
> `[807.299s][info][gc,stats    ] Pause Init Mark (N)            =    0.001 s (a =       89 us) `
> `[807.299s][info][gc,stats    ] Concurrent Mark Roots          =    0.053 s (a =     5279 us)`
> `[807.299s][info][gc,st...

@pf0n : 

1. For the performance numbers, use the final summaries (in the gc log) of quartile and range of time distributions of the pauses and the phases to better reflect the impact on collections throughout the entire run, rather than for specific spot collections.
2. For completeness in the PR description and to help reviewers, please comment upon whether, when the event is enabled, the counting closures are used also for collections other than concurrent ones and where the events are emitted in those cases.

Thanks!

src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 129:

> 127:     if (gc_requested) {
> 128:       // Create the KlassInfoTable for Shenandoah only if JFR is enabled.
> 129:       #if INCLUDE_JFR

Here and elsewhere:
1. for single line pre-proc directives, consider the use of `JFR_ONLY()`.
2. For multi-line pre-proc directives, unindent all the way to the zeroth column, in keeping with hotspot style precedent, even though newer pre-processors accept indented directives.

src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 232:

> 230:   volatile size_t _committed;
> 231:   shenandoah_padding(1);
> 232:   KlassInfoTable* _cit;

Move the comment from line 257 below to before this place.

src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 256:

> 254:   void set_soft_max_capacity(size_t v);
> 255: 
> 256:   // Create Shenandoah's KlassInfoTable.

Not create, but set?

src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 260:

> 258:   inline void set_cit(KlassInfoTable* cit);
> 259: 
> 260:   // Return Shenandoah's KlassInfoTable.

Put these together and use one comment like:


// Setter & accessor for class histogram

src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 653:

> 651: inline void ShenandoahHeap::set_cit(KlassInfoTable* cit) {
> 652:   assert((_cit == nullptr && cit != nullptr) ||
> 653:     (_cit != nullptr && cit == nullptr), "Initialize once & clear once");

The assert message isn't accurate for your current usage. Suggest splitting into two asserts with a suitable message:

assert(_cit == nullptr || cit == nullptr, "Overwriting an existing histogram");
assert(_cit != nullptr || cit != nullptr, "Already cleared");

src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 658:

> 656: 
> 657: inline KlassInfoTable* ShenandoahHeap::get_cit() {
> 658:   assert(_cit != nullptr, "KlassInfoTable for Shenandoah should be initialized");

Change the message to: "KlassInfoTable is null"
(suggestion is to avoid "initialized" here, since the value is repeatedly toggled from a pointer to stack allocated klass info table to null and back).

Essentially the heap is used as a post-office-box to pass a handle to a stack allocated object to worker threads using the object.

src/hotspot/share/gc/shenandoah/shenandoahMark.cpp line 72:

> 70:     mark_loop_work<Closure, GENERATION, CANCELLABLE, STRING_DEDUP>(&cl, ld, w, t, req);
> 71:   } else {
> 72:     #if INCLUDE_JFR

unindent (left-justify to 0th column)

src/hotspot/share/gc/shenandoah/shenandoahMark.hpp line 60:

> 58: public:
> 59:   template<class T, ShenandoahGenerationType GENERATION>
> 60:   static inline bool mark_through_ref(T* p, ShenandoahObjToScanQueue* q, ShenandoahObjToScanQueue* old_q, ShenandoahMarkingContext* const mark_context, bool weak);

document the semantics of the method's return value here and everywhere where a bool return replaces a previously void return.

src/hotspot/share/runtime/mutexLocker.hpp line 90:

> 88: extern Monitor* Notify_lock;                     // a lock used to synchronize the start-up of the vm
> 89: extern Mutex*   ExceptionCache_lock;             // a lock used to synchronize exception cache updates
> 90: extern Mutex*   TableMerge_lock;                 // a lock used to synchronize merging of thread-local KlassInfoTables

nit: `... merging a thread-local into a global table`

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26977#issuecomment-3235400211
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2308906242
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2308907269
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2308907857
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2308910727
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2308919955
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2308923148
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2308926296
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2308929362
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2308937729


More information about the hotspot-dev mailing list