RFR: 8265836: OperatingSystemImpl.getCpuLoad() returns incorrect CPU load inside a container [v6]

Yasumasa Suenaga ysuenaga at openjdk.java.net
Thu May 27 00:50:15 UTC 2021


On Tue, 25 May 2021 21:46:27 GMT, Hao Tang <github.com+7947546+tanghaoth90 at openjdk.org> wrote:

>> OperatingSystemImpl.getCpuLoad() may return 1.0 in a container, even though the CPU load is obviously below 100%.
>> 
>> We created a 5-core container and ran 4 "while (true)" loops in the container. OperatingSystemImpl.getCpuLoad() returned 1.0, which is incorrect (0.8 is correct).
>> "systemLoad" in getCpuLoad() is exactly 4.0 before "systemLoad = Math.min(1.0, systemLoad);". The problem is caused by using the elapsed time (specified by "cpu.cfs_period_us") instead of the total CPU time (specified by "cpu.cfs_quota_us"). Therefore, it is more reasonable to divide cpu usage time by "quotaNanos" instead of "elapsedNanos".
>
> Hao Tang has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Use historical-value-based formula for both cpu-quota-based and cpu-shares-based calculation
>  - rename usageTicks and totalTicks
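
As a rough check of the arithmetic in the description, here is a minimal sketch. The class and method names are hypothetical, the cgroup values are made up to model a 5-core container, and deriving the available CPU time as elapsed * quota / period is my assumption rather than the actual JDK code:

```java
public class QuotaLoadSketch {
    // Hypothetical helper: CPU time used divided by CPU time available,
    // clamped to 1.0 in the same way as Math.min(1.0, systemLoad).
    public static double load(long usageNanos, long availableNanos) {
        if (availableNanos <= 0) {
            return -1.0; // no meaningful value
        }
        return Math.min(1.0, (double) usageNanos / availableNanos);
    }

    public static void main(String[] args) {
        long periodUs = 100_000L;            // assumed cpu.cfs_period_us
        long quotaUs = 500_000L;             // assumed cpu.cfs_quota_us (5 cores)
        long elapsedNanos = 1_000_000_000L;  // 1 second of wall-clock time
        long usageNanos = 4 * elapsedNanos;  // four busy loops for 1 second

        // CPU time the container could have used during the elapsed time.
        long quotaNanos = elapsedNanos * quotaUs / periodUs;

        // Dividing by elapsed time gives 4.0, which is clamped to 1.0.
        System.out.println(load(usageNanos, elapsedNanos));
        // Dividing by the quota-based time gives 4/5 = 0.8.
        System.out.println(load(usageNanos, quotaNanos));
    }
}
```

With these numbers the first call prints 1.0 and the second prints 0.8, matching the values reported in the description.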

I haven't yet followed all of the discussion in this review, but I am concerned that this PR changes the meaning of `getCpuLoad()`.

Until now, `getCpuLoad()` has been based on the total time since the container started, but after this PR it is based on the ticks recorded at the previous call. Is that OK?  
IMHO it is acceptable because it behaves like the load average on Linux, but I am concerned that we may need a CSR because this PR changes behavior.
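
To make the difference concrete, here is a minimal sketch of the two interpretations. It is not the code in the PR: the class, the method names, and the sample tick values are all hypothetical, and I am assuming cumulative usage and total counters as inputs.

```java
public class DeltaLoadSketch {
    // Counters remembered from the previous call (hypothetical fields).
    private long prevUsage = 0;
    private long prevTotal = 0;

    // Load over the whole lifetime: cumulative usage / cumulative total.
    public static double sinceStart(long usage, long total) {
        if (total <= 0) {
            return -1.0; // no meaningful value
        }
        return Math.min(1.0, (double) usage / total);
    }

    // Load over the interval since the previous call: delta usage / delta total.
    public double sinceLastCall(long usage, long total) {
        long usageDelta = usage - prevUsage;
        long totalDelta = total - prevTotal;
        prevUsage = usage;
        prevTotal = total;
        if (totalDelta <= 0) {
            return -1.0; // no time has passed since the previous call
        }
        return Math.min(1.0, Math.max(0.0, (double) usageDelta / totalDelta));
    }

    public static void main(String[] args) {
        DeltaLoadSketch sketch = new DeltaLoadSketch();

        // First sample: the container has been idle (0 of 100 ticks used).
        System.out.println(sketch.sinceLastCall(0, 100));

        // Second sample: 80 of the next 100 ticks were used.
        System.out.println(sinceStart(80, 200));           // 80 / 200
        System.out.println(sketch.sinceLastCall(80, 200)); // 80 / 100
    }
}
```

For the second sample, the since-start calculation reports 0.4 while the since-last-call calculation reports 0.8, so a caller polling the same workload would see different values before and after the change.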

-------------

PR: https://git.openjdk.java.net/jdk/pull/3656


More information about the serviceability-dev mailing list