RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48]
Man Cao
manc at openjdk.org
Fri Dec 1 21:04:51 UTC 2023
On Fri, 1 Dec 2023 10:21:31 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:
>> Right, see @simonis 's comments at https://github.com/openjdk/jdk/pull/15082#pullrequestreview-1613868256, https://github.com/openjdk/jdk/pull/15082#discussion_r1321703912.
>>
>> I initially had similar thought that `gc_total` isn't necessary and provides redundant data. Now I agree with @simonis that the `gc_total` is valuable to users. It saves users from extra work of aggregating different sets of counters for different garbage collectors, and potential mistakes of missing some counters. It is also future-proof that if GC implementation changes that add additional threads, users wouldn't need to change their code to add the counter for additional threads.
>>
>> I think the maintenance overhead is quite small for `gc_total` since it is mostly in this class. The benefit to users is worth it.
>
> I agree that the counter is valuable if always up-to-date, but if it is out of sync compared to the "concurrent counters" I think it will confuse some users. So if we want to keep it I think we should try to keep it in sync.
>
> I suppose adding a lock for updating `gc_total` should be ok. In the safepoint case we should never contend on that lock and for the concurrent updates it should not be a big deal. Basically what I think would be to change `update_counter(...)` to do something like:
>
> if (CPUTimeGroups::is_gc_counter(name)) {
> <lock gc_total_lock>
> instance->get_counter(CPUTimeGroups::CPUTimeType::gc_total)->inc(net_cpu_time);
> }
>
>
> This way we would also be able to remove the publish part above, right? Any other problems with this approach?
I think the ideal approach to simplify this is to support Atomic operation on a `PerfCounter`.
We could either introduce a `PerfAtomicCounter`/`PerfAtomicLongCounter` class, or perform `Atomic::add()` on the `PerfData::_valuep` pointer. There's already `PerfData::get_address()`, so we might be able to do:
Atomic::add((volatile jlong *)(instance->get_counter(CPUTimeGroups::CPUTimeType::gc_total)->get_address()), net_cpu_time);
However, a new class `PerfAtomicCounter` is likely cleaner. E.g., we may also want to make `PerfAtomicCounter::sample()` use a CAS. It is probably better to introduce `PerfAtomicCounter` in a separate RFE later.
Would the `Atomic::add()` with `PerfData::get_address()` approach be OK for now, or would we rather introduce a lock, or leave the `gc_total` mechanism as-is and address the out-of-sync-ness in a follow-up RFE?
IMO the out-of-sync-ness problem is minor, because users are likely to either look at a single `gc_total` counter, or look at each individual GC CPU counter and disregard `gc_total`.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1412569208
More information about the hotspot-gc-dev
mailing list