[RFC containers] 8281181 JDK's interpretation of CPU Shares causes underutilization
Ioi Lam
ioi.lam at oracle.com
Thu Feb 3 07:30:46 UTC 2022
Please see the bug report [1] for a detailed description and test cases.
I'd like to have some discussion before we can decide what to do.
I discovered this issue when analyzing JDK-8279484 [2]. Under Kubernetes
(minikube), Runtime.availableProcessors() returns 1, despite the fact
that the machine has 32 CPUs, the Kubernetes node has a single
deployment, and no CPU limits were set.
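A minimal way to observe the value in question is sketched below. The class and method names (ShowCpus, reportedCpus) are made up for illustration; only Runtime.availableProcessors() comes from the JDK.

```java
// Prints the processor count the JVM believes it may use.
// ShowCpus and reportedCpus are illustrative names, not part of the JDK.
final class ShowCpus {
    static int reportedCpus() {
        // Inside a container this value reflects the JVM's cgroup-based
        // computation, not necessarily the number of host CPUs.
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("availableProcessors = " + reportedCpus());
    }
}
```

Running it once normally and once with -XX:-UseContainerSupport (which, as far as I know, turns off the JVM's cgroup detection) should show whether the container settings are what reduce the count.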
Specifically, I want to understand why the JDK is using
CgroupSubsystem::cpu_shares() to limit the number of CPUs used by the
Java process.
Cgroups provide other mechanisms designed specifically for limiting the
number of CPUs, e.g., the one read by CgroupSubsystem::cpu_quota(). Why
is using cpu_quota() alone not enough? Why did we choose the current
approach of considering both cpu_quota() and cpu_shares()?
My guess is that sometimes people don't limit the actual number of CPUs
per container, but instead use CPU Shares to set the relative scheduling
priority between containers.
I.e., they run "docker run --cpu-shares=1234" without using the "--cpus"
flag.
If this is indeed the reason, I can understand the (good) intention, but
the solution seems awfully insufficient.
CPU Shares is a *relative* number. How much CPU is allocated to you
depends on
- how many other processes are actively running
- what their CPU Shares are
The above information can change dynamically, as other processes may be
added or removed, and they can change between active and idle states.
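To make the dependence on other processes concrete, here is a rough sketch of proportional sharing. It assumes every listed container is fully CPU-bound and that all of them sit at the same cgroup level; the class name, method name and share values are illustrative only.

```java
// Simplified model of CPU Shares: a busy container's slice of the host is
// its shares divided by the sum of shares of all currently busy containers.
// RelativeShares and effectiveCpus are hypothetical names for illustration.
final class RelativeShares {
    static double effectiveCpus(int hostCpus, int myShares, int... otherActiveShares) {
        long total = myShares;
        for (int s : otherActiveShares) {
            total += s;
        }
        return hostCpus * (double) myShares / total;
    }

    public static void main(String[] args) {
        // Alone on a 32-CPU host: the whole machine, whatever the shares value.
        System.out.println(effectiveCpus(32, 512));        // 32.0
        // One equally weighted busy neighbour: half the machine.
        System.out.println(effectiveCpus(32, 512, 512));   // 16.0
        // A busy neighbour with three times the weight: a quarter.
        System.out.println(effectiveCpus(32, 1024, 3072)); // 8.0
    }
}
```

The same shares value yields anything from the full machine to a small fraction of it, depending entirely on who else is running at that moment.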
However, the JVM treats CPU Shares as an *absolute/static* number, and
derives the CPU limit of the current process from it using this very
simplistic formula:
Value of /sys/fs/cgroup/cpu.shares -> CPU limit:
    1023 -> 1 CPU
    1024 -> no limit (huh??)
    2048 -> 2 CPUs
    4096 -> 4 CPUs
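The mapping above can be sketched roughly as follows. This is reconstructed from the table only, not copied from the HotSpot source: the names SharesToCpus, cpusFromShares and PER_CPU_SHARES are illustrative, and rounding up is an assumption chosen because it is consistent with 1023 -> 1.

```java
// Sketch of the static shares-to-CPUs mapping described in the table.
// All names here are illustrative; rounding up is an assumption.
final class SharesToCpus {
    static final int PER_CPU_SHARES = 1024;

    static int cpusFromShares(int shares, int hostCpus) {
        if (shares == PER_CPU_SHARES) {
            // Exactly 1024 is treated as "no limit": use every host CPU.
            return hostCpus;
        }
        // Integer ceiling of shares / 1024 (assumed rounding).
        int count = (shares + PER_CPU_SHARES - 1) / PER_CPU_SHARES;
        // Never report fewer than 1 CPU or more than the host has.
        return Math.min(Math.max(count, 1), hostCpus);
    }

    public static void main(String[] args) {
        int hostCpus = 32;
        for (int shares : new int[] {1023, 1024, 2048, 4096}) {
            System.out.println(shares + " -> " + cpusFromShares(shares, hostCpus));
        }
    }
}
```

On a 32-CPU host this prints 1, 32, 2 and 4 for the four values in the table. Note that nothing in the computation looks at what any other container is doing.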
This seems just wrong to me. There's no way you can get a "correct"
result without knowing anything about other processes that are running
at the same time.
The net effect is that when Java is running in a container, more likely
than not, the JVM will limit itself to a single CPU. This seems really
inefficient to me.
What should we do?
Thanks
- Ioi
[1] https://bugs.openjdk.java.net/browse/JDK-8281181
[2] https://bugs.openjdk.java.net/browse/JDK-8279484