[RFC containers] 8281181 JDK's interpretation of CPU Shares causes underutilization
Harold Seigel
harold.seigel at oracle.com
Fri Feb 4 20:24:42 UTC 2022
Information on how the number of CPUs is calculated can be found here:
https://bugs.openjdk.java.net/browse/JDK-8197867
Harold
On 2/3/2022 2:30 AM, Ioi Lam wrote:
> Please see the bug report [1] for detailed description and test cases.
>
> I'd like to have some discussion before we can decide what to do.
>
> I discovered this issue when analyzing JDK-8279484 [2]. Under
> Kubernetes (minikube), Runtime.availableProcessors() returns 1,
> despite the fact that the machine has 32 CPUs, the Kubernetes node has
> a single deployment, and no CPU limits were set.
>
> Specifically, I want to understand why the JDK is using
> CgroupSubsystem::cpu_shares() to limit the number of CPUs used by the
> Java process.
>
> In cgroup, there are other ways that are designed specifically for
> limiting the number of CPUs, i.e., CgroupSubsystem::cpu_quota(). Why
> is using cpu_quota() alone not enough? Why did we choose the current
> approach of considering both cpu_quota() and cpu_shares()?
>
> My guess is that sometimes people don't limit the actual number of
> CPUs per container, but instead use CPU Shares to set the relative
> scheduling priority between containers.
>
> I.e., they run "docker run --cpu-shares=1234" without using the
> "--cpus" flag.
>
> If this is indeed the reason, I can understand the (good) intention,
> but the solution seems awfully insufficient.
>
> CPU Shares is a *relative* number. How much CPU is allocated to you
> depends on
>
> - how many other processes are actively running
> - what their CPU Shares are
>
> The above information can change dynamically, as other processes may
> be added or removed, and they can change between active and idle states.
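
To make the "relative" point concrete, here is a minimal sketch of how a share value turns into a fraction of CPU time when every listed cgroup is busy at once. The class and method names are hypothetical, and the proportional split is a simplification of what the scheduler does, not a description of its implementation.

```java
import java.util.List;

public class RelativeShares {
    // Fraction of CPU time one cgroup receives when it and all the
    // "activeOthers" cgroups are competing for the CPU at the same time.
    // Simplifying assumption: time is split in proportion to share values.
    static double fraction(int mine, List<Integer> activeOthers) {
        long total = mine;
        for (int s : activeOthers) {
            total += s;
        }
        return (double) mine / total;
    }

    public static void main(String[] args) {
        // The same share value (1024) yields very different results
        // depending on who else is running.
        System.out.println(fraction(1024, List.of()));           // alone
        System.out.println(fraction(1024, List.of(1024)));       // one equal peer
        System.out.println(fraction(1024, List.of(1024, 2048))); // two peers
    }
}
```

Under these assumptions the three calls return 1.0, 0.5 and 0.25, although the process's own share value never changed.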
>
> However, the JVM treats CPU Shares as an *absolute/static* number, and
> sets the CPU quota of the current process using this very simplistic
> formula.
>
> Value of /sys/fs/cgroup/cpu.shares -> cpu quota:
>
> 1023 -> 1 CPU
> 1024 -> no limit (huh??)
> 2048 -> 2 CPUs
> 4096 -> 4 CPUs
>
> This seems just wrong to me. There's no way you can get a "correct"
> result without knowing anything about other processes that are running
> at the same time.
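
The mapping in the table above can be reproduced with a few lines. This is only a sketch inferred from the four rows of the table: the class name, the method name and the constant are hypothetical, and it assumes that exactly 1024 is treated as "shares not set" while every other value is divided by 1024 and rounded up. The actual HotSpot code may differ in details (for example, in how the result is bounded by the host CPU count).

```java
public class SharesToCpus {
    // Assumed number of shares that corresponds to one CPU.
    static final int PER_CPU_SHARES = 1024;

    // Returns the CPU count derived from cpu.shares, or -1 for "no limit".
    static int cpusFromShares(int shares) {
        if (shares == PER_CPU_SHARES) {
            return -1; // assumed: the default value is read as "not set"
        }
        // Integer ceiling of shares / PER_CPU_SHARES.
        return (shares + PER_CPU_SHARES - 1) / PER_CPU_SHARES;
    }

    public static void main(String[] args) {
        int[] samples = {2, 1023, 1024, 2048, 4096};
        for (int shares : samples) {
            System.out.println(shares + " -> " + cpusFromShares(shares));
        }
    }
}
```

Note that nothing in the function looks at other cgroups: any share value below 1024, however small, comes out as 1 CPU regardless of how idle the rest of the machine is.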
>
> The net effect is that when Java is running in a container, more likely
> than not, the JVM will limit itself to a single CPU. This seems really
> inefficient to me.
>
> What should we do?
>
> Thanks
> - Ioi
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8281181
> [2] https://bugs.openjdk.java.net/browse/JDK-8279484