RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v32]
Man Cao
manc at openjdk.org
Thu Oct 26 20:36:40 UTC 2023
On Thu, 26 Oct 2023 14:31:23 GMT, Volker Simonis <simonis at openjdk.org> wrote:
>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp line 138:
>>
>>> 136: _vtime_accum = (os::elapsedVTime() - _vtime_start);
>>> 137:
>>> 138: cm()->update_concurrent_mark_threads_cpu_time();
>>
>> Is there some overlapping btw this and the existing `_vtime_accum`. If so, can they be consolidated somehow?
>>
>> I believe the purpose of calling `update_concurrent_mark_threads_cpu_time` in multiple places is to get more up-to-date conc-cpu-time. Reading through the JBS ticket, I don't see the motivation for maintaining such a "fresh" value.
>>
>> Finally, is CSR required for this feature?
>
> @albertnetymk, the hsperf counters are a non-public API and the new counters have been added to the non-standard `sun.threads.cpu_time` name space which is "[unstable and unsupported](https://github.com/openjdk/jdk/blob/9864951dceb0ddc4479ced04b6d5a2363f1e307d/src/hotspot/share/runtime/perfData.cpp#L56)" so I don't think a CSR is required.
> Is there some overlapping btw this and the existing _vtime_accum. If so, can they be consolidated somehow?
We don't really like `_vtime_accum` for monitoring:
- `os::elapsedVTime()` could silently fall back from CPU time to wall time, according to os_linux.cpp. We'd rather to have true CPU time, or nothing. Mixing up CPU time and wall time is confusing to users.
- `_vtime_accum` only tracks the time consumed by the concurrent marking main thread, but not the concurrent worker threads.
There's a `G1ConcurrentMarkThread::vtime_accum()` that seems to account for concurrent worker threads. It looks only used for logging. It might be possible to replace the existing logging code with the value of the `sun.threads.cpu_time.gc_conc_mark` hsperf counter. However, it is better to do that separately, and I'll create an RFE.
> I believe the purpose of calling update_concurrent_mark_threads_cpu_time in multiple places is to get more up-to-date conc-cpu-time. Reading through the JBS ticket, I don't see the motivation for maintaining such a "fresh" value.
Yes. The reason is that a concurrent mark cycle could take several minutes for a large heap. For tools like [AHS](https://mail.openjdk.org/pipermail/hotspot-dev/2022-September/064190.html) that reads these CPU hsperf counters, they could read these CPU data every 1 to 5 seconds.
The current mechanism still needs to wait for the whole `G1ConcurrentMark::mark_from_roots()` to complete, which could still take minutes. I can create a separate RFE to make it update more frequently inside `G1CMConcurrentMarkingTask::work()`.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1373784494
More information about the hotspot-gc-dev
mailing list