RFR: 8268098: jdk.CPULoad event reports incorrect CPU usage inside a container

Yasumasa Suenaga ysuenaga at openjdk.java.net
Mon Jun 7 15:40:17 UTC 2021


On Mon, 7 Jun 2021 13:44:29 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:

>> The jdk.CPULoad event reports CPU usage.
>> If the JVM runs in a container with limited CPU resources (quota, shares, cpu), the reported value is incorrect.
>> 
>> When I ran the following program inside a container with `--cpuset-cpus=0,1`, I expected the jdk.CPULoad event to report 50%; however, it reported 25% because the container host has 4 CPUs.
>> 
>> 
>> public class InfiniteLoop{
>>   public static void main(String[] args){
>>     while(true){
>>     }
>>   }
>> }
>> 
>> 
>> The jdk.CPULoad event uses the result from `get_cpu_load()` in os_perf_linux.cpp, but that function does not take cgroups into account.
>
> src/hotspot/os/linux/os_perf_linux.cpp line 404:
> 
>> 402:     }
>> 403:     *pkernelLoad *= scale;
>> 404:     user_load *= scale;
> 
> If I read this correctly, the CPU load would only be scaled by the configured container limits, with no bearing on the actual CPU usage of processes within the container. For example: let process A run in container A' with X/2 CPUs and process B run in container B' with X/2 CPUs, where X is the total number of CPUs available on the host. Then A is using 100% CPU (i.e. `1.0`) and B is idle (i.e. close to `0.0`). With this patch I'd get a CPU load report of about `1.0` for each container, A' and B'. Am I missing something?
> 
> On the JDK side we have `Metrics.getCpuUsage()` API which isn't present in HotSpot code. I believe we'd have to have a similar API in OSContainer in order to implement this correctly. Thoughts?

In your case my proposal reports both `jvmUser` and `jvmSystem` correctly; however, `machineTotal` is incorrect (100%). I need to tweak it.
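
To make the arithmetic concrete, here is a rough sketch in Java. It is not the code from the PR: the class and method names are hypothetical, and it assumes the scale factor is simply (host CPUs / container CPUs) with the result capped at `1.0`. It shows why a per-process value comes out right after scaling while a host-wide value does not.

```java
// Illustrative arithmetic only; names are hypothetical and this is not the patch itself.
final class CpuLoadScaling {

    // Assumption: a load measured against all host CPUs is multiplied by
    // hostCpus / containerCpus and capped at 1.0.
    static double scale(double hostWideLoad, int hostCpus, int containerCpus) {
        double factor = (double) hostCpus / containerCpus;
        return Math.min(1.0, hostWideLoad * factor);
    }

    public static void main(String[] args) {
        // One busy thread on a 4-CPU host, container restricted to 2 CPUs:
        // 25% of the host corresponds to 50% of the container.
        System.out.println(scale(0.25, 4, 2));

        // Containers A' and B' each get half of a 4-CPU host.
        // A is fully busy, B is idle, so the host as a whole is 50% busy.
        System.out.println(scale(0.0, 4, 2)); // JVM load of B: stays near 0.0
        System.out.println(scale(0.5, 4, 2)); // machine total seen from B': 1.0
    }
}
```

The last line is the problem case: the host-wide total includes the work done in A', so scaling it makes idle container B' look fully loaded.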

> On the JDK side we have `Metrics.getCpuUsage()` API which isn't present in HotSpot code. I believe we'd have to have a similar API in OSContainer in order to implement this correctly. Thoughts?

That would be better, but I thought it would be much simpler if we could just handle CPU quotas correctly.
I understand it is better for the JDK and HotSpot to share the same logic, but I thought we could work on that in a future RFE. What do you think?
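
For comparison, a usage-based calculation could look roughly like the sketch below. This is only an assumption about the shape of such logic, not an existing HotSpot or JDK API: the class and method are hypothetical, and the inputs are assumed to be two samples of a cumulative container CPU time counter in nanoseconds (the kind of value a cgroup usage counter provides), the wall-clock time between the samples, and the number of CPUs available to the container.

```java
// Hypothetical helper; not an existing OSContainer or Metrics method.
final class ContainerCpuLoad {

    // Assumption: usage values are cumulative CPU time in nanoseconds consumed
    // by all tasks in the container, sampled at the start and end of an interval.
    static double load(long usageStartNanos, long usageEndNanos,
                       long elapsedNanos, double availableCpus) {
        if (elapsedNanos <= 0 || availableCpus <= 0) {
            return 0.0; // no meaningful interval or CPU count
        }
        double used = (double) (usageEndNanos - usageStartNanos);
        double capacity = elapsedNanos * availableCpus;
        // Clamp to [0.0, 1.0] to absorb sampling jitter.
        return Math.max(0.0, Math.min(1.0, used / capacity));
    }

    public static void main(String[] args) {
        long second = 1_000_000_000L;
        // One CPU-second consumed during one wall-clock second on 2 CPUs.
        System.out.println(load(0L, second, second, 2.0));
        // Idle container: no CPU time consumed in the interval.
        System.out.println(load(5L * second, 5L * second, second, 2.0));
    }
}
```

With this approach an idle container reports a load near `0.0` regardless of what other containers on the host are doing, because only the container's own CPU time enters the calculation.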

-------------

PR: https://git.openjdk.java.net/jdk/pull/4299


More information about the hotspot-runtime-dev mailing list