[RFC containers] 8281181 JDK's interpretation of CPU Shares causes underutilization

Severin Gehwolf sgehwolf at redhat.com
Thu Feb 3 11:29:46 UTC 2022


Hi Ioi,

On Wed, 2022-02-02 at 23:30 -0800, Ioi Lam wrote:
> Please see the bug report [1] for detailed description and test cases.
> 
> I'd like to have some discussion before we can decide what to do.
> 
> I discovered this issue when analyzing JDK-8279484 [2]. Under Kubernetes 
> (minikube), Runtime.availableProcessors() returns 1, despite the fact
> that the machine has 32 CPUs, the Kubernetes node has a single
> deployment, and no CPU limits were set.

From looking at the bug it would be good to know why a cpu.weight value
of 1 is being observed. The default is 100; i.e. this is what the trace
looks like when it is really unset:

$ sudo docker run --rm -v $(pwd)/jdk17:/opt/jdk:z fedora:35 /opt/jdk/bin/java -Xlog:os+container=trace --version
[0.000s][trace][os,container] OSContainer::init: Initializing Container Support
[0.001s][debug][os,container] Detected cgroups v2 unified hierarchy
[0.001s][trace][os,container] Path to /memory.max is /sys/fs/cgroup//memory.max
[0.001s][trace][os,container] Raw value for memory limit is: max
[0.001s][trace][os,container] Memory Limit is: Unlimited
[0.001s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup//cpu.max
[0.001s][trace][os,container] Raw value for CPU quota is: max
[0.001s][trace][os,container] CPU Quota is: -1
[0.001s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup//cpu.max
[0.001s][trace][os,container] CPU Period is: 100000
[0.001s][trace][os,container] Path to /cpu.weight is /sys/fs/cgroup//cpu.weight
[0.001s][trace][os,container] Raw value for CPU shares is: 100
[0.001s][debug][os,container] CPU Shares is: -1
[0.001s][trace][os,container] OSContainer::active_processor_count: 4
[0.001s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 4
[0.001s][debug][os,container] container memory limit unlimited: -1, using host value
[0.001s][debug][os,container] container memory limit unlimited: -1, using host value
[0.002s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 4
[0.007s][debug][os,container] container memory limit unlimited: -1, using host value
[0.014s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 4
[0.022s][trace][os,container] Path to /memory.max is /sys/fs/cgroup//memory.max
[0.022s][trace][os,container] Raw value for memory limit is: max
[0.022s][trace][os,container] Memory Limit is: Unlimited
[0.022s][debug][os,container] container memory limit unlimited: -1, using host value
openjdk 17.0.2-internal 2022-01-18
OpenJDK Runtime Environment (build 17.0.2-internal+0-adhoc.sgehwolf.jdk17u)
OpenJDK 64-Bit Server VM (build 17.0.2-internal+0-adhoc.sgehwolf.jdk17u, mixed mode, sharing)
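
As another data point it would be good to see the raw value inside the
affected pod. Something along these lines should show it (a sketch;
"mypod" is a placeholder, and this assumes a cgroup v2 node; on cgroup
v1 the file would be /sys/fs/cgroup/cpu/cpu.shares):

$ kubectl exec mypod -- cat /sys/fs/cgroup/cpu.weight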

> Specifically, I want to understand why the JDK is using 
> CgroupSubsystem::cpu_shares() to limit the number of CPUs used by the
> Java process.

TL;DR: probably Kubernetes and/or other container orchestration
frameworks. That choice dates back to the days of cgroups v1, though.

> In cgroup, there are other ways that are designed specifically for 
> limiting the number of CPUs, i.e., CgroupSubsystem::cpu_quota(). Why is 
> using cpu_quota() alone not enough? Why did we choose the current 
> approach of considering both cpu_quota() and cpu_shares()?

Kubernetes has the concepts of "cpu requests" and "cpu limits". It maps
(or at least mapped, with cgroups v1) cpu requests to cpu shares and cpu
limits to cpu quota.
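
As a rough illustration (not an exact reproduction of what the kubelet
does), a pod with "requests: cpu: 500m" and "limits: cpu: 2" ends up
with cgroup settings similar to what these docker flags produce on a
cgroup v1 host:

$ sudo docker run --rm --cpu-shares=512 --cpu-period=100000 --cpu-quota=200000 \
    fedora:35 cat /sys/fs/cgroup/cpu/cpu.shares /sys/fs/cgroup/cpu/cpu.cfs_quota_us

That is, requests get scaled to shares with 1 CPU == 1024 shares, while
limits become a hard cap via quota/period. On a cgroup v2 host docker
translates --cpu-shares to cpu.weight instead.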

> My guess is that sometimes people don't limit the actual number of CPUs 
> per container, but instead use CPU Shares to set the relative scheduling 
> priority between containers.
> 
> I.e., they run "docker run --cpu-shares=1234" without using the "--cpus" 
> flag.
> 
> If this is indeed the reason, I can understand the (good) intention, but 
> the solution seems awfully insufficient.
> 
> CPU Shares is a *relative* number. How much CPU is allocated to you 
> depends on
> 
> - how many other processes are actively running
> - what their CPU Shares are
> 
> The above information can change dynamically, as other processes may be 
> added or removed, and they can change between active and idle states.
> 
> However, the JVM treats CPU Shares as an *absolute/static* number, and 
> sets the CPU quota of the current process using this very simplistic 
> formula.
> 
> Value of /sys/fs/cgroup/cpu.shares -> cpu quota:
> 
>      1023 -> 1 CPU
>      1024 -> no limit (huh??)
>      2048 -> 2 CPUs
>      4096 -> 4 CPUs
> 
> This seems just wrong to me. There's no way you can get a "correct" 
> result without knowing anything about other processes that are running 
> at the same time.
> 
> The net effect is that when Java is running under a container, more
> likely than not the JVM will limit itself to a single CPU. This seems really
> inefficient to me.

I believe the point is that popular container orchestration frameworks
map their "cpu requests" setting to cpu.shares. I asked a similar
question about this a while ago, see JDK-8216366.
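
For what it's worth, the shares heuristic can be seen in isolation by
setting only --cpu-shares and looking at the container trace, e.g. with
a variant of the command above (untested sketch):

$ sudo docker run --rm --cpu-shares=2048 -v $(pwd)/jdk17:/opt/jdk:z fedora:35 \
    /opt/jdk/bin/java -Xlog:os+container=trace --version

On a cgroup v1 host with no quota set I'd expect active_processor_count
to come out as 2 (2048 shares / 1024 shares per CPU), regardless of how
many CPUs the machine has or what other containers are doing. On cgroup
v2 the value goes through docker's shares-to-weight conversion first, so
the result is less obvious.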

Here is what Bob Vandette had to say at the time:
http://mail.openjdk.java.net/pipermail/hotspot-dev/2019-January/036093.html

Thanks,
Severin

> 
> What should we do?
> 
> Thanks
> - Ioi
> 
> [1] https://bugs.openjdk.java.net/browse/JDK-8281181
> [2] https://bugs.openjdk.java.net/browse/JDK-8279484
> 


