RFR: 8265836: OperatingSystemImpl.getCpuLoad() returns incorrect CPU load
Severin Gehwolf
sgehwolf at openjdk.java.net
Thu May 6 13:54:54 UTC 2021
On Thu, 6 May 2021 12:36:09 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:
>>> Thanks for linking that. It sounds reasonable to me to prefer `quota` in that case.
>>
>> Yes, flag `PreferContainerQuotaForCPUCount` is [true by default](https://github.com/openjdk/jdk/blob/739769c8/src/hotspot/os/linux/globals_linux.hpp#L62). Therefore, [my current implementation](https://github.com/openjdk/jdk/pull/3656/commits/b052b624c84) might be a minimal implementation.
>>
>> We can also cover the case where `PreferContainerQuotaForCPUCount` is false. There are two different ways.
>> 1. To access the value of `PreferContainerQuotaForCPUCount`, so that we can decide if we should use `quota` or (`quota` & `share`);
>> 2. Reuse `CgroupSubsystem::active_processor_count`. However, the function returns an integer. It is more reasonable to use a floating-point number.
>>
>> Looking forward to your suggestion.
>
>> We happened to hit a very similar problem when running in a container with OpenJDK 15.
>>
>> Given we effectively agree that the problem is `elapsedNanos` doesn't accurately reflect the cpu time allocated across all shares vs a single share, my proposal was to use `getCpuShares` as a multiplier for `periodLength` above.
>> Is there a good reason `getCpuQuota` is a better alternative?
>
> @argha-c The proposed fix is within the `quota > 0` condition. I.e. this is code only run when CPU quotas, *not* shares, are in effect. In docker/podman speak these are the `--cpu-quota=...` and `--cpu-period=...` switches. So no, in this case it wouldn't make sense to use cpu shares info in a branch which looks at cpu quotas ;-)
> Hi Argha, thanks a lot for your suggestion. I think both "quota" and "share" are worth considering.
@tanghaoth90 My local testing suggests that your fix addresses the issue of CPU quotas set via `--cpu-quota/--cpu-period`. When using `--cpu-shares`, the CPU load calculation is wrong since it will (wrongly) report host values. Let's look at them separately and fix the quota and shares cases individually (i.e. when not both are set). Once that's done, quota will be preferred in the OperatingSystemMXBean impl, which is reasonable. I don't think we need to account for the shares-preferred-over-quota case at this point, since that changed in HotSpot in the JDK 11 time-frame (JDK-8197589) and OperatingSystemMXBean has only been made container aware in JDK 14 (yes, it got backported, but still).
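
To make the quota-vs-shares distinction concrete, here is a rough sketch of the arithmetic. It is not the code in the PR: the class and method names are made up for illustration, and the 1024-shares-per-CPU convention is an assumption borrowed from how HotSpot interprets `--cpu-shares`. The idea is to derive a fractional CPU count from either quota/period or shares (quota preferred when both are set) and to scale the elapsed time by it when computing the load.

```java
// Hypothetical sketch; names are illustrative and not taken from the JDK sources.
public final class ContainerCpuSketch {

    // Assumption: 1024 shares correspond to one CPU.
    private static final double PER_CPU_SHARES = 1024.0;

    /** Fractional CPUs granted by a CFS quota, or -1 if no quota is set. */
    public static double cpusFromQuota(long quotaMicros, long periodMicros) {
        if (quotaMicros <= 0 || periodMicros <= 0) {
            return -1.0;
        }
        return (double) quotaMicros / periodMicros;
    }

    /** Fractional CPUs implied by CPU shares, or -1 if no shares are set. */
    public static double cpusFromShares(long shares) {
        if (shares <= 0) {
            return -1.0;
        }
        return shares / PER_CPU_SHARES;
    }

    /** Prefers quota over shares; falls back to the host CPU count. */
    public static double effectiveCpus(long quotaMicros, long periodMicros,
                                       long shares, int hostCpus) {
        double cpus = cpusFromQuota(quotaMicros, periodMicros);
        if (cpus < 0) {
            cpus = cpusFromShares(shares);
        }
        if (cpus < 0) {
            return hostCpus;
        }
        // A container cannot use more CPUs than the host has.
        return Math.min(cpus, hostCpus);
    }

    /** CPU load in [0, 1]: usage delta divided by the CPU time available. */
    public static double cpuLoad(long usageDeltaNanos, long elapsedNanos, double cpus) {
        if (elapsedNanos <= 0 || cpus <= 0) {
            return -1.0;
        }
        double load = usageDeltaNanos / (elapsedNanos * cpus);
        return Math.max(0.0, Math.min(1.0, load));
    }
}
```

For example, with `--cpu-quota=150000 --cpu-period=100000` the container gets 1.5 CPUs, so 750 ms of CPU time consumed over one second of wall-clock time corresponds to a load of 0.5 rather than the much smaller value a host-wide CPU count would give.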
-------------
PR: https://git.openjdk.java.net/jdk/pull/3656