RFR: 8268098: jdk.CPULoad event reports incorrect CPU usage inside a container
Severin Gehwolf
sgehwolf at openjdk.java.net
Mon Jun 7 13:58:28 UTC 2021
On Wed, 2 Jun 2021 06:59:53 GMT, Yasumasa Suenaga <ysuenaga at openjdk.org> wrote:
> The jdk.CPULoad event reports CPU usage.
> If the JVM runs in a container with limited CPU resources (quota, shares, cpu), the reported value is incorrect.
>
> When I ran the following program inside a container with `--cpuset-cpus=0,1`, I expected the jdk.CPULoad event to report 50%; however, it reported 25% because the container host has 4 CPUs.
>
>
> public class InfiniteLoop {
>     public static void main(String[] args) {
>         while (true) {
>         }
>     }
> }
>
>
> The jdk.CPULoad event uses the result from `get_cpu_load()` in os_perf_linux.cpp, which does not take cgroups into account.
I'm concerned this implementation isn't correct.
> CPULoad in JMX returns container CPU usage if the JVM runs in a container.
> If JFR distinguishes ContainerCPUUsage from CPULoad, that is a bit odd. It seems inconsistent.
My understanding was that for JFR it's interesting to "see" both kinds of events, host CPU load and what's reported for the container, and then do some analysis across them.
> Also, as I said before, this problem affects CPU time in thread dumps, which is not a JFR event.
OK, good point.
src/hotspot/os/linux/os_perf_linux.cpp line 404:
> 402: }
> 403: *pkernelLoad *= scale;
> 404: user_load *= scale;
If I read this correctly, the CPU load would only be scaled by the configured container limits, with no bearing on the actual CPU usage of processes within the container. For example: let process A run in container A' with X/2 CPUs and process B run in container B' with X/2 CPUs, where X is the total number of CPUs available on the host. Suppose A is using 100% CPU (i.e. `1.0`) and B is idle (i.e. close to `0.0`). With this patch I'd get a CPU load report of about `1.0` for each container, A' and B'. Am I missing something?
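To spell out the arithmetic behind that example, here is a small sketch. The scaling rule (host-wide load multiplied by host CPUs divided by container CPUs) is only my reading of the patch, and the class and method names are hypothetical, not HotSpot code.

```java
// Hypothetical sketch: models the example above with X = 4 host CPUs.
// The scaling rule is an assumption about what the patch does.
final class ScaledLoadSketch {

    // Assumed patch behaviour: scale the host-wide load by hostCpus / containerCpus.
    static double scaledLoad(double hostLoad, int hostCpus, int containerCpus) {
        double scale = (double) hostCpus / containerCpus;
        return hostLoad * scale;
    }

    // What a container really uses: busy CPUs divided by its own CPU limit.
    static double actualLoad(double busyCpus, int containerCpus) {
        return busyCpus / containerCpus;
    }

    public static void main(String[] args) {
        int hostCpus = 4;               // X
        int limit = hostCpus / 2;       // X/2 CPUs for each of A' and B'
        double busyA = 2.0;             // A saturates both of its CPUs
        double busyB = 0.0;             // B is idle

        // Both containers observe the same host-wide load.
        double hostLoad = (busyA + busyB) / hostCpus;

        System.out.println("A' scaled: " + scaledLoad(hostLoad, hostCpus, limit)
                + ", actual: " + actualLoad(busyA, limit));
        System.out.println("B' scaled: " + scaledLoad(hostLoad, hostCpus, limit)
                + ", actual: " + actualLoad(busyB, limit));
    }
}
```

Since the scaled value depends only on the host-wide load and the limit, A' and B' cannot be told apart, which is the concern here.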
On the JDK side we have the `Metrics.getCpuUsage()` API, which isn't present in HotSpot code. I believe we'd need a similar API in OSContainer in order to implement this correctly. Thoughts?
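As a rough illustration of what such an API would enable, the sketch below derives a load from two samples of a cumulative CPU-time counter. It assumes the counter is in nanoseconds, which is how I understand `Metrics.getCpuUsage()` to report it. The class, method, and parameter names are hypothetical, and nothing like this exists in OSContainer today.

```java
// Hypothetical sketch: container CPU load from two samples of a cumulative
// CPU-time counter (assumed to be in nanoseconds) and the container CPU limit.
final class ContainerCpuLoadSketch {

    static double load(long usageStartNanos, long usageEndNanos,
                       long wallStartNanos, long wallEndNanos,
                       double containerCpus) {
        long usedNanos = usageEndNanos - usageStartNanos;
        long wallNanos = wallEndNanos - wallStartNanos;
        if (wallNanos <= 0 || containerCpus <= 0) {
            return 0.0; // no usable interval or limit
        }
        // Fraction of the CPU time the container was allowed to consume.
        double load = usedNanos / (wallNanos * containerCpus);
        // Clamp to [0.0, 1.0] to absorb sampling jitter.
        return Math.max(0.0, Math.min(1.0, load));
    }

    public static void main(String[] args) {
        long oneSecond = 1_000_000_000L;
        // Container A': 2 CPUs, consumed 2 s of CPU time in 1 s of wall time.
        System.out.println("A': " + load(0L, 2 * oneSecond, 0L, oneSecond, 2.0));
        // Container B': 2 CPUs, consumed no CPU time in the same interval.
        System.out.println("B': " + load(0L, 0L, 0L, oneSecond, 2.0));
    }
}
```

With per-container usage as the input, the busy container and the idle one get different values, unlike scaling the host-wide figure.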
-------------
Changes requested by sgehwolf (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/4299