RFR: 8359348: G1: Improve cpu usage measurements for heap sizing [v3]
Man Cao
manc at openjdk.org
Thu Jul 24 21:00:53 UTC 2025
On Thu, 24 Jul 2025 10:42:40 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:
>> src/hotspot/share/gc/g1/g1Analytics.cpp line 173:
>>
>>> 171: // activity. We do not account for contention on other shared resources such as memory bandwidth and
>>> 172: // caches, therefore underestimate the impact of the concurrent GC activity on mutator threads.
>>> 173: uint num_cpus = (uint)os::active_processor_count();
>>
>> It is a good idea to convert CPU time to wall-clock time "lost" by mutator threads. However, it could be challenging in some environments. E.g. a background controller process that periodically runs `taskset` to change the CPU affinity mask for the Java process, leading to `os::active_processor_count()` returning varying values.
>> Another scenario is that the Java process runs in a CPU-constrained container, but for some reason has to run with `-XX:-UseContainerSupport`. Then all of `os::active_processor_count()`, `ConcGCThreads`, `ParallelGCThreads` could be much higher than the max # of available CPU cores.
>>
>> Our internal infra has both issues above, and I don't have a good idea to mitigate them in G1. I think the current approach is acceptable: due to the latter `-XX:-UseContainerSupport` issue, it will mostly lead to under-counting concurrent GC CPU time, which is not a show stopper.
>
>> However, it could be challenging in some environments. E.g. a background controller process that periodically runs `taskset` to change the CPU affinity mask for the Java process, leading to `os::active_processor_count()` returning varying values.
>
> As bad as it sounds, this is already basically an unsupported use case. G1 heuristics will already be thrown off (in fact, lots of ergonomics/predictions do not cope well even with wildly changing values returned from ergonomic thread sizing, i.e. self-inflicted thread number changes).
>
> Use the `-XX:ActiveProcessorCount` option, set to an average value or so.
>
> I do not think it is the purpose of this change to try to go to great lengths to start supporting this use case. We can consider it though, but it should not prevent progress imo.
>
>> Another scenario is that the Java process runs in a CPU-constrained container, but for some reason has to run with `-XX:-UseContainerSupport`. Then all of `os::active_processor_count()`, `ConcGCThreads`, `ParallelGCThreads` could be much higher than the max # of available CPU cores.
>
> We can't do much about this (bad configuration), and it is already problematic even with respect to initial thread sizing.
Ack on both points. There is no need to address these issues for this PR. Our infra is basically "too dynamic", which could mess up many heuristics in the JVM. I also agree that `ActiveProcessorCount` is a reasonable workaround in such dynamic infra.
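
To make the under-counting concrete, here is a minimal sketch. It assumes the conversion simply spreads concurrent GC CPU time evenly over the processor count; the actual code in the PR may well differ. The helper name `estimated_mutator_time_lost_ms` and all numbers are hypothetical and for illustration only.

```cpp
// Hypothetical helper, not the actual G1Analytics code: converts concurrent
// GC CPU time into an estimate of wall-clock time "lost" by mutator threads
// by dividing it evenly across the CPUs the JVM believes it has.
static double estimated_mutator_time_lost_ms(double concurrent_gc_cpu_time_ms,
                                             unsigned num_cpus) {
  if (num_cpus == 0) {
    return 0.0;  // Defensive guard for this sketch; avoids division by zero.
  }
  return concurrent_gc_cpu_time_ms / num_cpus;
}

// Made-up scenario: 400 ms of concurrent GC CPU time in a container that is
// really limited to 4 CPUs, on a host with 64 CPUs.
static double estimate_with_real_limit() {
  return estimated_mutator_time_lost_ms(400.0, 4);   // 100 ms
}

// With -XX:-UseContainerSupport the JVM would see the host's 64 CPUs instead,
// so the same GC work appears to cost mutators far less.
static double estimate_with_host_count() {
  return estimated_mutator_time_lost_ms(400.0, 64);  // 6.25 ms
}
```

With these made-up numbers, the estimate drops from 100 ms to 6.25 ms when the processor count is inflated, which is the under-counting mentioned above. Pinning the count with `-XX:ActiveProcessorCount=<n>` keeps the divisor stable, since that option overrides the value returned by `os::active_processor_count()`.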
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/26351#discussion_r2229561943