RFR: 8268098: jdk.CPULoad event reports incorrect CPU usage inside a container

Severin Gehwolf sgehwolf at openjdk.java.net
Tue Jun 8 08:06:19 UTC 2021


On Tue, 8 Jun 2021 04:41:03 GMT, Yasumasa Suenaga <ysuenaga at openjdk.org> wrote:

>> src/hotspot/os/linux/os_perf_linux.cpp line 404:
>> 
>>> 402:     }
>>> 403:     *pkernelLoad *= scale;
>>> 404:     user_load *= scale;
>> 
>> If I read this correctly, the CPU load would only be scaled by the configured container CPU limit, with no bearing on the actual CPU usage of processes within the container. For example: let process A run in container A' with X/2 cpus and process B run in container B' with X/2 cpus, where X is the total number of cpus available on the host. Suppose A is using 100% cpu (i.e. `1.0`) and B is idle (i.e. close to `0.0`). With this patch I'd get a CPU load report of about `1.0` for each container, A' and B'. Am I missing something?
>> 
>> On the JDK side we have `Metrics.getCpuUsage()` API which isn't present in HotSpot code. I believe we'd have to have a similar API in OSContainer in order to implement this correctly. Thoughts?
>
> After further investigation, I think we need to read `cpuacct.usage` in cgroups v1, or `cpu.stat` in cgroups v2, when `get_cpu_load()` is called with `CPU_LOAD_GLOBAL` - @jerboaa is that what you were pointing out?
> 
> Of course I can add it to this PR, but if we work on it, I think we should consolidate the implementation - and that may not happen before RDP 1 (it might be a big change).
> 
> Hence I will withdraw this PR and close the JBS issue. Is that OK?

@YaSuenag Yes, that's what `Metrics.getCpuUsage()` does under the hood. I'd suggest first adding the `get_cpu_usage()` API to `OSContainer` and then, once it's there, using it here. Or maybe consolidate things so that you can get away with using the Java API. Either way I'm fine with waiting/withdrawing.
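
To illustrate the idea, here is a rough standalone sketch, not HotSpot code. It assumes the container's cumulative CPU time is sampled twice (from `cpuacct.usage` in nanoseconds on cgroups v1, or from the `usage_usec` field of `cpu.stat` on cgroups v2) and that the load is the usage delta divided by the elapsed wall-clock time times the container's CPU limit. The names `parse_usage_usec` and `container_cpu_load` are hypothetical and do not exist in `OSContainer`.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: extract "usage_usec" from the text of a cgroups v2
// cpu.stat file, which consists of "<key> <integer>" lines.
// Returns -1 if the field is not present.
static int64_t parse_usage_usec(const std::string& cpu_stat) {
  std::istringstream in(cpu_stat);
  std::string key;
  int64_t value = 0;
  while (in >> key >> value) {
    if (key == "usage_usec") {
      return value;
    }
  }
  return -1;
}

// Hypothetical helper: container CPU load in [0.0, 1.0] computed from two
// samples of cumulative CPU time (in nanoseconds), the wall-clock time
// between the samples, and the number of CPUs the container may use.
// Returns -1.0 on invalid input.
static double container_cpu_load(int64_t usage_ns_prev,
                                 int64_t usage_ns_now,
                                 int64_t elapsed_ns,
                                 double cpu_limit) {
  if (elapsed_ns <= 0 || cpu_limit <= 0.0 || usage_ns_now < usage_ns_prev) {
    return -1.0;
  }
  double used = static_cast<double>(usage_ns_now - usage_ns_prev);
  double available = static_cast<double>(elapsed_ns) * cpu_limit;
  double load = used / available;
  return load > 1.0 ? 1.0 : load;  // clamp sampling jitter
}
```

With the A'/B' example from above (two cpus per container, one second between samples), a busy container that consumed two CPU-seconds yields `1.0`, while an idle one with an unchanged counter yields `0.0`, which is the per-container distinction that plain scaling of the host load cannot make.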

-------------

PR: https://git.openjdk.java.net/jdk/pull/4299


More information about the hotspot-runtime-dev mailing list