RFR: 8359348: G1: Improve cpu usage measurements for heap sizing [v3]

Wed Jul 23 07:18:57 UTC 2025

On Thu, 17 Jul 2025 13:16:29 GMT, Ivan Walulya <iwalulya at openjdk.org> wrote:

>> Hi,
>> 
>> Please review this patch which takes into account the impact of concurrent GC activity on mutator threads when computing the time spent on GC activity in a time interval. Previously, only the GC pause times were considered, and the overhead imposed by concurrent GC worker threads was not included.
>> 
>> With this change, we now estimate the impact of concurrent GC by dividing `elapsed_gc_cpu_time` by the number of CPUs. This provides an approximation of the additional time attributable to GC activity, assuming a fair CPU resource sharing.  Although this method does not account for contention on other shared resources (such as memory bandwidth or caches), it offers a reasonable estimate for most scenarios.
>> 
>> Testing: Tier 1
>
> Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits:
> 
>  - cleanup after merge
>  - Merge remote-tracking branch 'upstream/master' into ConcCPUImpact
>  - Merge branch 'NormalizeTiming' into ConcCPUImpact
>  - Thomas suggestion
>  - Thomas Review
>  - reorder
>  - concurrent gc impact
>  - clean out
>  - init

src/hotspot/share/gc/g1/g1Analytics.cpp line 173:

> 171:   // activity. We do not account for contention on other shared resources such as memory bandwidth and
> 172:   // caches, therefore underestimate the impact of the concurrent GC activity on mutator threads.
> 173:   uint num_cpus = (uint)os::active_processor_count();

It is a good idea to convert CPU time to wall-clock time "lost" by mutator threads. However, it could be challenging in some environments. E.g. a background controller process that periodically runs `taskset` to change CPU affinity mask for the Java process, leading to `os::active_processor_count()` returning variable values.
Another scenario is that the Java process runs in a CPU-constraint container, but for some reasons has to run with `-XX:-UseContainerSupport`. Then all of `os::active_processor_count()`, `ConcGCThreads`, `ParallelGCThreads` could be much higher than the max # of available CPU cores.

Our internal infra has both issues above, and I don't have a good idea to mitigate them in G1. I think the current approach is acceptable: due to the latter `-XX:-UseContainerSupport` issue, it will mostly lead to under-counting concurrent GC CPU time, which is not a show stopper.

src/hotspot/share/gc/g1/g1Analytics.cpp line 174:

> 172:   // caches, therefore underestimate the impact of the concurrent GC activity on mutator threads.
> 173:   uint num_cpus = (uint)os::active_processor_count();
> 174:   num_cpus = MIN2(num_cpus, MAX2(ConcGCThreads, ParallelGCThreads));

A minor issue is that `UseDynamicNumberOfGCThreads` may not activate all `ConcGCThreads`. I think it is acceptable the wall-clock time "lost" by mutator threads is a best-effort approximation anyway.

src/hotspot/share/gc/g1/g1Policy.cpp line 678:

> 676:   double cur_gc_cpu_time_ms = _g1h->elapsed_gc_cpu_time() * MILLIUNITS;
> 677: 
> 678:   double concurrent_gc_cpu_time_ms = cur_gc_cpu_time_ms - prev_gc_cpu_pause_end_ms;

`CollectedHeap::elapsed_gc_cpu_time()` includes  `_vmthread_cpu_time` and `string_dedup_cpu_time`. Should we exclude VM threads and StringDedup threads' CPU time from `concurrent_gc_cpu_time_ms`? Otherwise non-GC activities could affect G1's heap sizing decisions.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26351#discussion_r2224597028
PR Review Comment: https://git.openjdk.org/jdk/pull/26351#discussion_r2224629012
PR Review Comment: https://git.openjdk.org/jdk/pull/26351#discussion_r2224541165