RFR: 8366122: Shenandoah: Implement efficient support for object count after gc events

Wed Sep 3 15:54:22 UTC 2025

On Thu, 28 Aug 2025 01:30:39 GMT, pf0n <duke at openjdk.org> wrote:

> ### Summary
> 
> The new implementation of ObjectCountAfterGC for Shenandoah piggybacks on the existing marking phases and records strongly marked objects in a histogram. If the event is disabled, the original marking closures are used. When it is enabled, the worker threads use new mark-and-count closures. Each worker thread updates its local histogram as it marks an object. These local histograms are merged under a mutex at the conclusion of the marking phase. The event is emitted outside a safepoint. Because most of Shenandoah's marking is done concurrently, so is the object counting work.
> 
> ### Performance
> The performance tests were run using the Extremem benchmark on a default and a stress workload. (will edit this section to include data after average time and test for GenShen)
> 
> #### Default workload:
> ObjectCountAfterGC disabled (master branch):
> `[807.216s][info][gc,stats    ] Pause Init Mark (G)            =    0.003 s (a =      264 us)`
> `[807.216s][info][gc,stats    ] Pause Init Mark (N)            =    0.001 s (a =       91 us)`
> `[807.216s][info][gc,stats    ] Concurrent Mark Roots          =    0.041 s (a =     4099 us)`
> `[807.216s][info][gc,stats    ] Concurrent Marking             =    1.660 s (a =   166035 us)`
> `[807.216s][info][gc,stats    ] Pause Final Mark (G)           =    0.004 s (a =      446 us) `
> `[807.216s][info][gc,stats    ] Pause Final Mark (N)           =    0.004 s (a =      357 us)`
> 
> ObjectCountAfterGC disabled (feature branch):
> `[807.104s][info][gc,stats    ] Pause Init Mark (G)            =    0.003 s (a =      302 us)`
> `[807.104s][info][gc,stats    ] Pause Init Mark (N)            =    0.001 s (a =       92 us) `
> `[807.104s][info][gc,stats    ] Concurrent Mark Roots          =    0.048 s (a =     4827 us)`
> `[807.104s][info][gc,stats    ] Concurrent Marking             =    1.666 s (a =   166638 us) `
> `[807.104s][info][gc,stats    ] Pause Final Mark (G)           =    0.006 s (a =      603 us)`
> `[807.104s][info][gc,stats    ] Pause Final Mark (N)           =    0.005 s (a =      516 us)`
> 
> ObjectCountAfterGC enabled (feature branch)
> `[807.299s][info][gc,stats    ] Pause Init Mark (G)            =    0.002 s (a =      227 us)`
> `[807.299s][info][gc,stats    ] Pause Init Mark (N)            =    0.001 s (a =       89 us) `
> `[807.299s][info][gc,stats    ] Concurrent Mark Roots          =    0.053 s (a =     5279 us)`
> `[807.299s][info][gc,st...

I concur with comments by Ramki.

Also, I wonder if you can add to the summary overview a description of how much additional memory is required to enable concurrent object counting.  I believe there is a new thread-local table (how large is this?) for each GC worker thread.  I think service threads do not need this table.  Can you clarify?
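
To make the memory question concrete, here is a minimal sketch of the pattern the summary describes: each worker fills a private table while marking, then merges it into a shared table under a lock. This is not the PR's code. All names (`Entry`, `Histogram`, `MarkedObject`, `GlobalHistogram`, `worker`, `count_with_workers`) are hypothetical, and `std::unordered_map` and `std::mutex` stand in for whatever table and lock types the PR actually uses.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical per-class histogram entry: instance count and total bytes.
struct Entry {
  std::size_t count = 0;
  std::size_t bytes = 0;
};
using Histogram = std::unordered_map<std::string, Entry>;

// Hypothetical stand-in for an object a worker has just marked.
struct MarkedObject {
  std::string klass;
  std::size_t size;
};

class GlobalHistogram {
 public:
  // Fold one worker's local table into the shared table under the lock.
  void merge(const Histogram& local) {
    std::lock_guard<std::mutex> guard(_lock);
    for (const auto& kv : local) {
      Entry& g = _table[kv.first];
      g.count += kv.second.count;
      g.bytes += kv.second.bytes;
    }
  }
  // Read only after every worker has finished merging.
  const Histogram& table() const { return _table; }

 private:
  std::mutex _lock;
  Histogram _table;
};

// A worker counts into its own table with no synchronization,
// then takes the lock exactly once to merge.
void worker(const std::vector<MarkedObject>& slice, GlobalHistogram& global) {
  Histogram local;
  for (const MarkedObject& o : slice) {
    Entry& e = local[o.klass];
    e.count += 1;
    e.bytes += o.size;
  }
  global.merge(local);
}

// Run one worker thread per slice and return the merged result.
Histogram count_with_workers(const std::vector<std::vector<MarkedObject>>& slices) {
  GlobalHistogram global;
  std::vector<std::thread> threads;
  for (const auto& s : slices) {
    threads.emplace_back(worker, std::cref(s), std::ref(global));
  }
  for (auto& t : threads) {
    t.join();
  }
  return global.table();
}
```

In a layout like this, the extra memory is one table per worker, each growing with the number of distinct classes that worker encounters, plus the merged table. Whether the PR's tables behave the same way is what I would like the summary to spell out.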

Are the performance numbers quoted in the performance summary above for Shenandoah satb mode or for generational mode?  Maybe both should be reported.

src/hotspot/share/gc/shared/gcTrace.inline.hpp line 12:

> 10: 
> 11: // The ObjectCountEventSenderClosure will determine if only the ObjectCount
> 12: // event will be emitted instead of ObjectCountAfterGC. If false, then both

If "what" is false?  This comment is not clear.  Are you speaking of the SeparateEventEmission template parameter?

I think the use of future-tense "will" also makes this comment confusing.  Can you write this in present tense?

-------------

PR Review: https://git.openjdk.org/jdk/pull/26977#pullrequestreview-3166929412
PR Comment: https://git.openjdk.org/jdk/pull/26977#issuecomment-3235270167
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2308805732