RFR: 8359348: G1: Improve cpu usage measurements for heap sizing [v3]

Thomas Schatzl tschatzl at openjdk.org
Thu Jul 31 08:53:56 UTC 2025


On Wed, 30 Jul 2025 20:48:25 GMT, Man Cao <manc at openjdk.org> wrote:

>>> As mentioned in the other comment, concurrent refinement threads typically consume very little CPU compared to concurrent mark workers, and most refinement threads are inactive most of the time. I think bloating up the divisor up to ConcRefinementThreads will undercount the wall-clock time "lost" by mutator threads due to concurrent GC.
>> 
>> What if there is significant concurrent refinement activity? That would unnecessarily expand the heap.
>> 
>> Running one of my go-to benchmarks (some "object storage"; you would not want to run it at that gc cpu usage level, i.e. 55% ;) ) gives the following thread times:
>> 
>> sun.threads.cpu_time.gc_conc_mark=       187537964718
>> sun.threads.cpu_time.gc_conc_refine=    1306781191282
>> sun.threads.cpu_time.gc_parallel_workers=561790592662
>> sun.threads.cpu_time.gc_service=            125919008
>> 
>> Not a very realistic scenario, still I would prefer to not discount refinement costs...
>> 
>> Maybe somehow separate marking and refinement cpu activity and merge again?
>> 
>>> For the other proposed approach using (concurrent-cpu-usage + gc-pause-cpu-usage) as dividend, it is much bigger change of behavior from the pause-time-based approach. I suspect it will suffer from the problem of fluctuating mutator CPU usage (https://bugs.openjdk.org/browse/JDK-8359348?focusedId=14799278), due to using total-cpu-usage-during-mutator in divisor. Perhaps it is better to experiment separately from this PR?
>> 
>> It is a combination of both: for pauses use the existing approach, for mutator time it uses cpu usage.
>> 
>> However I agree that probably it suffers from the same issue, weighing cpu usage too much if the mutator is mostly idle, causing overexpansion.
>
>> Not a very realistic scenario, still I would prefer to not discount refinement costs...
>> Maybe somehow separate marking and refinement cpu activity and merge again?
> 
> Fair enough. Perhaps we need to implement a metric that estimates how many refinement threads are active on average, between two pauses? So we don't overcount or undercount refinement cost.

Another idea: instead of using the application cpu usage reported by the OS in the divisor, calculate some `mutator-cpu-usage`, analogous to `gc-pause-cpu-usage`, by multiplying the time spent in the mutator by the number of active processors.

I.e. the resulting formula being:

  gc-time-ratio_new = (concurrent-cpu-usage + gc-pause-cpu-usage) / (mutator-cpu-usage + gc-pause-cpu-usage)
where

  mutator-cpu-usage = #active processors * mutator-duration

effectively making it:

  gc-time-ratio = (concurrent-cpu-usage + gc-pause-cpu-usage) / (#active processors * time-since-last-pause)

(`gc-pause-cpu-usage` calculated as before).

This is very similar to the current approach of dividing `concurrent-cpu-usage` by some arbitrary threading factor, but without needing to know that factor and without the inaccuracies that guessing it incurs.

Compare this formula to the current formula for determining `GCTimeRatio`:

  gc-time-ratio_old = gc-pause-time / time-since-last-pause
                    // multiply both dividend and divisor by #active-processors, i.e. multiply by 1
                    = (#active-processors * gc-pause-time) / (#active-processors * time-since-last-pause)


So just adding the known concurrent cpu usage to the dividend seems... straightforward, and no more susceptible to idle-mutator issues than the current approach.

There is a dependency on the number of active processors being "constant", but even the old formula uses it implicitly (i.e. the durations of the mutator phase and the pause already depend somewhat on the number of active processors, because the allocation rate depends on it).

What do you all think? To me this approximation seems no worse than either the current formula or the one suggested in this change. Somebody modifying the number of active processors for the VM at runtime, or setups like burstable VMs, are already very problematic anyway.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26351#discussion_r2244763473


More information about the hotspot-gc-dev mailing list