RFR: 8265836: OperatingSystemImpl.getCpuLoad() returns incorrect CPU load inside a container [v6]

Hao Tang github.com+7947546+tanghaoth90 at openjdk.java.net
Wed May 26 06:24:15 UTC 2021


On Fri, 21 May 2021 16:11:49 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:

>>> @tanghaoth90 My local testing suggests that your fix addresses the issue of CPU quotas set via `--cpu-quota/--cpu-period`. When using `--cpu-shares` the CPU load calculation is wrong since it will (wrongly) report host values. Lets look at them individually, fix the quota and shares case individually (i.e. when not both are set). Once that's done, quota will be preferred in the OperatingSystemMXBean impl, which is reasonable. I don't think we need to account for the shares-preferred-over-quota at this point since that changed in HotSpot in JDK 11 time-frame (JDK-8197589) and OperatingSystemMXBean has only been made container aware in JDK 14 (yes, it got backported, but still).
>> 
>> @jerboaa I have updated my fix by inserting a new branch for `--cpu-shares`. By setting up containers with different restrictions based on your suggestion, I noticed two problems for the CPU load calculation. Take [TestLoad.java](https://bugs.openjdk.java.net/secure/attachment/94530/TestLoad.java) as the example.
>> 
>> 1. `--cpu-quota=700000 --cpu-period=100000` as the restriction: the result is getting close to `6/7` slowly as time goes by, which indicates that the result of 
>> 
>> long usageNanos = containerMetrics.getCpuUsage();
>> 
>> is an accumulated CPU usage rather than a real-time CPU usage.
>> According to the javadoc, `getCpuLoad()` _returns the "recent cpu usage"_. The current fix obviously does not address this issue.
>> 
>> 2. `--cpu-shares=2048` as the restriction: the "cpu-share" branch only returns `-1` since `containerMetrics.getCpuNumPeriods()` returns `0` (`nr_periods` of `cpu.stat` stays `0` when only `--cpu-shares` is set). By now, I have no idea to compute the elapsed time or the total available CPU time with the help of the methods of `jdk.internal.platform.CgroupMetrics`.
>> 
>> Any suggestion is appreciated. @jerboaa @argha-c
>
> @tanghaoth90
> 
> As to your observation of 1). Yes, that's true and I'm not sure it's worth changing. We could change the javadoc.
> 
> As for 2) I've created a PR for your branch with a suggested implementation for cpu shares based accounting. It uses a similar heuristics than the getCpuLoad0() uses here:
> https://github.com/openjdk/jdk/blob/master/src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c#L278
> 
> That's the best I could think of. It's certainly more accurate than the current implementation.

@jerboaa Just read your comment about my observation (due to the weird format of github pull request). Sorry for making further [change](https://github.com/openjdk/jdk/pull/3656/commits/eba3bc10b4ebd) for the `quota` branch without asking for your suggestion. For now, I still hold a different opinion about the `quota` branch. My local test shows that

podman run --cpu-shares=4096 <image>
podman run --cpu-quota=400000 --cpu-period=100000 <image>

yields the same result after the change, otherwise the second one still prints the average cpu load.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3656


More information about the serviceability-dev mailing list