RFR: 8368015: Shenandoah: fix error in computation of average allocation rate [v5]

Y. Srinivas Ramakrishna ysr at openjdk.org
Wed Sep 24 23:03:23 UTC 2025


On Tue, 23 Sep 2025 22:56:01 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

>> We use bytes_allocated_since_gc_start() to compute allocation rates.  This leaves a blind spot, as our current implementation ignores allocations and the time period between the moment we begin GC and the first time we update the allocation rate following the start of GC.  When this happens, we typically find that the sampled number of allocations is smaller than the allocations that had accumulated by the time we triggered the start of the current GC cycle.
>> 
>> This PR adds tracking for that accounting gap.
>
> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
> 
>   More override declarations to satisfy certain compilers

The changes look mostly correct (there is the small issue of ascribing time and allocation precisely to the right interval, which may be unavoidable), but I feel this code could be refactored to make it slightly less clunky.

However, it's probably best to leave that for a follow-up.

I left a few suggestions for documentation.

src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.hpp line 40:

> 38:   void allocation_counter_reset();
> 39: 
> 40:   double force_sample(size_t allocated, size_t &unaccounted_bytes_allocated);

Document these two APIs: force_sample() and sample().

A small block documentation comment on ShenandoahAllocationRate() is also a good idea.
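To make the suggestion concrete, here is a hedged sketch of what such documentation comments could look like. The names mirror the PR, but the signatures and wording are illustrative assumptions, not the actual ShenandoahAllocationRate code:

```cpp
#include <cstddef>

// Illustrative sketch only: documents intent for the two sampling entry
// points discussed in the review. Actual semantics belong to the PR author.
class ShenandoahAllocationRate {
public:
  // Takes a new sample if at least one sample interval has elapsed since the
  // previous sample, and returns the smoothed allocation rate (bytes/sec).
  double sample(size_t allocated);

  // Unconditionally takes a sample (e.g. at the start of GC), regardless of
  // the sample interval. Bytes allocated too recently to be folded into a
  // sample are returned through unaccounted_bytes_allocated so the caller
  // can ascribe them to the next cycle. Returns the updated rate.
  double force_sample(size_t allocated, size_t& unaccounted_bytes_allocated);
};
```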

src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.hpp line 244:

> 242:   double elapsed_cycle_time() const;
> 243: 
> 244:   virtual size_t force_alloc_rate_sample(size_t bytes_allocated) {

I realize none of these public APIs has any documentation of intent, but it might be a good idea to start doing that with this new API you are adding here.

src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2329:

> 2327:   // unaccounted_bytes is the bytes not accounted for by our forced sample.  If the sample interval is too short,
> 2328:   // the "forced sample" will not happen, and any recently allocated bytes are "unaccounted for".  We pretend these
> 2329:   // bytes are allocated after the start of subsequent gc.

I think these comments should be made slightly more abstract and moved to the spec of `reset` and of the `force` methods. They feel like specifics of the sampling and the "residue" computation leaking into objects they shouldn't leak into. So some thought on how best to organize this, by pushing everything into one place, would improve code maintainability and robustness.

As I understand it, the issue is that we use two separate APIs, with no locking, to read the allocated bytes and to reset them, all while they are being concurrently incremented. So you want the reset method to detect the residual allocation between those two points and ascribe it to the rate computation used in the next sample. I feel there may be a simpler way to do this: push all of these mechanics and details down into the sampling method, which has direct access to the counter and can manipulate it both for computing rates and, upon request, for resetting.
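As a rough sketch of that idea (hypothetical class and method names, not the actual Shenandoah code): if the sampler owns the counter, the read-and-reset happens in one place, and any bytes that arrived between the last sample and the reset fall out naturally as residue to be credited to the next interval.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical sketch: the sampler owns the allocation counter, so
// sampling and resetting are centralized and the "residue" computation
// never leaks into callers.
class AllocationSampler {
public:
  AllocationSampler() : _allocated(0), _last_sampled(0) {}

  // Mutator path: concurrent, relaxed increments.
  void record_allocation(size_t bytes) {
    _allocated.fetch_add(bytes, std::memory_order_relaxed);
  }

  // Returns bytes allocated since the previous sample (rate = delta / dt,
  // timing elided for brevity).
  size_t sample() {
    size_t current = _allocated.load(std::memory_order_relaxed);
    size_t delta = current - _last_sampled;
    _last_sampled = current;
    return delta;
  }

  // Reset at GC start: swap the counter to zero in one atomic step and
  // return the residue (bytes seen since the last sample), which the
  // caller can ascribe to the next interval's rate computation.
  size_t reset() {
    size_t current = _allocated.exchange(0, std::memory_order_relaxed);
    size_t residue = current - _last_sampled;
    _last_sampled = 0;
    return residue;
  }

private:
  std::atomic<size_t> _allocated;  // incremented concurrently by mutators
  size_t _last_sampled;            // value of _allocated at the last sample
};
```

Because `exchange` reads and zeroes the counter atomically, no allocation can slip between the read and the reset; it either lands in the residue or in the fresh counter for the next cycle.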

-------------

Marked as reviewed by ysr (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27398#pullrequestreview-3264863343
PR Review Comment: https://git.openjdk.org/jdk/pull/27398#discussion_r2377176198
PR Review Comment: https://git.openjdk.org/jdk/pull/27398#discussion_r2377172069
PR Review Comment: https://git.openjdk.org/jdk/pull/27398#discussion_r2377256204


More information about the hotspot-gc-dev mailing list