RFR: 8265836: OperatingSystemImpl.getCpuLoad() returns incorrect CPU load inside a container [v6]

Yasumasa Suenaga ysuenaga at openjdk.java.net
Thu May 27 09:22:09 UTC 2021


On Tue, 25 May 2021 21:46:27 GMT, Hao Tang <github.com+7947546+tanghaoth90 at openjdk.org> wrote:

>> OperatingSystemImpl.getCpuLoad() may return 1.0 in a container, even though the CPU load is obviously below 100%.
>> 
>> We created a 5-core container and ran 4 "while (true)" loops in the container. OperatingSystemImpl.getCpuLoad() returned 1.0, which is incorrect (0.8 is correct).
>> "systemLoad" in getCpuLoad() is exactly 4.0 before "systemLoad = Math.min(1.0, systemLoad);". The problem is caused by using the elapsed time (specified by "cpu.cfs_period_us") instead of the total CPU time (specified by "cpu.cfs_quota_us"). Therefore, it is more reasonable to divide cpu usage time by "quotaNanos" instead of "elapsedNanos".
>
> Hao Tang has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Use historical-value-based formula for both cpu-quota-based and cpu-shares-based calculation
>  - rename usageTicks and totalTicks
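
To make the arithmetic in the quoted description concrete, here is a minimal sketch of the two ways of normalizing CPU usage. It is not the JDK implementation: the class name, method names and the sample values (a 1 second interval, cfs_quota_us = 500000, cfs_period_us = 100000 for a 5-core container) are assumptions chosen only to reproduce the 4.0 versus 0.8 numbers mentioned above.

```java
public class CpuLoadSketch {
    // Hypothetical helper: divides CPU time used by wall-clock time elapsed.
    // With 4 busy loops this ratio is 4.0 and gets clamped to 1.0.
    static double loadByElapsed(long usageNanos, long elapsedNanos) {
        double systemLoad = (double) usageNanos / elapsedNanos;
        return Math.min(1.0, systemLoad);
    }

    // Hypothetical helper: divides CPU time used by the CPU time the
    // container was allowed to use in the same interval
    // (elapsed * quota / period).
    static double loadByQuota(long usageNanos, long elapsedNanos,
                              long quotaUs, long periodUs) {
        double quotaNanos = (double) elapsedNanos * quotaUs / periodUs;
        double systemLoad = usageNanos / quotaNanos;
        return Math.min(1.0, systemLoad);
    }

    public static void main(String[] args) {
        long elapsedNanos = 1_000_000_000L;   // assumed 1 s between samples
        long usageNanos = 4 * elapsedNanos;   // 4 fully busy loops

        // Assumed 5-core limit: quota 500000 us per 100000 us period.
        System.out.println(loadByElapsed(usageNanos, elapsedNanos));
        System.out.println(loadByQuota(usageNanos, elapsedNanos, 500_000L, 100_000L));
    }
}
```

Under these assumed values the first call is clamped to 1.0 and the second yields 0.8, which matches the behaviour the description reports and the value it expects.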

@tanghaoth90 @jerboaa Thank you for the explanation!

> The result might be too coarse/inaccurate, if the time between two calls is too long/short. Any comments for that?

I was concerned about that too. The Javadoc only says "recent cpu usage" about this, which is ambiguous. In other words, I think the inaccuracy we are worried about is tolerable IMHO.
Of course it would be better if we could get a sampled value, but that is difficult now. Maybe we could implement a sampling thread, but we would need to think through some things (e.g. the sampling frequency). I'm not sure it is worth the effort.
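
As a rough illustration of why the interval between two calls matters, here is a small sketch of a history-based calculation: each call reports the load over the interval since the previous call. This is only a sketch under stated assumptions, not the code in the PR; the class name, the `sample` method, its parameters and the choice to return -1.0 when no history exists yet are all hypothetical.

```java
public class DeltaLoadSketch {
    private long lastUsageNanos = -1;
    private long lastTimeNanos = -1;

    // usageNanos: cumulative CPU time consumed so far (assumed input)
    // nowNanos:   monotonic timestamp of this call (assumed input)
    // cpus:       number of CPUs the container may use (assumed input)
    double sample(long usageNanos, long nowNanos, double cpus) {
        double load = -1.0; // sketch choice: negative means "no history yet"
        if (lastTimeNanos >= 0 && nowNanos > lastTimeNanos) {
            long usageDelta = usageNanos - lastUsageNanos;
            long timeDelta = nowNanos - lastTimeNanos;
            // Load over the interval between the previous call and this one.
            load = usageDelta / (timeDelta * cpus);
            load = Math.max(0.0, Math.min(1.0, load));
        }
        // Remember this sample for the next call.
        lastUsageNanos = usageNanos;
        lastTimeNanos = nowNanos;
        return load;
    }

    public static void main(String[] args) {
        DeltaLoadSketch s = new DeltaLoadSketch();
        long second = 1_000_000_000L;
        System.out.println(s.sample(0L, 0L, 5.0));                     // no history yet
        System.out.println(s.sample(4 * second, second, 5.0));         // 1 s interval
        System.out.println(s.sample(8 * second, 101 * second, 5.0));   // 100 s interval
    }
}
```

In the last call the same 4 seconds of CPU time are spread over a 100 second interval, so a short burst is averaged away. That is the coarseness discussed above; a sampling thread would fix the interval, at the cost of choosing a frequency.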

I think it is important that `getCpuLoad()` behaves similarly in all cases, and I can't see any problems in your change. So I give +1 to your change.

-------------

Marked as reviewed by ysuenaga (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3656


More information about the serviceability-dev mailing list