[RFC containers] 8281181 JDK's interpretation of CPU Shares causes underutilization
Ioi Lam
ioi.lam at oracle.com
Tue Feb 8 06:29:52 UTC 2022
On 2022/02/07 10:36, Severin Gehwolf wrote:
> On Sun, 2022-02-06 at 20:16 -0800, Ioi Lam wrote:
>> Case (4) is the cause for the bug in JDK-8279484
>>
>> Kubernetes sets cpu.cfs_quota_us to 0 (no limit) and cpu.shares to 2.
>> This means:
>>
>> - This container is guaranteed a minimum amount of CPU resources
>> - If no other containers are executing, this container can use as
>> much CPU as available on the host
>> - If other containers are executing, the fraction of host CPU available
>> to this container is (2 / (sum of cpu.shares of all active
>> containers))
>>
>>
>> The fundamental problem with the current JVM implementation is that it
>> treats "CPU request" as a maximum value, the opposite of what Kubernetes
>> does. Because of this, in case (4), the JVM artificially limits itself
>> to a single CPU. This leads to CPU underutilization.
> I agree with your analysis. The key point is that in such a setup
> Kubernetes sets the CPU shares value to 2. It is, though, a very
> specific case.
>
> In contrast to Kubernetes, the JVM doesn't have insight into what other
> containers are doing (or how they are configured). It would, perhaps,
> be good to know what Kubernetes does for containers when the
> environment (i.e. other containers) changes. Do they get restarted?
> Restarted with different values for cpu shares?
My understanding is that Kubernetes will try to do load balancing and
may migrate the containers. According to this:
https://stackoverflow.com/questions/64891872/kubernetes-dynamic-configurationn-of-cpu-resource-limit
If you change the CPU limits, a currently running container will be shut
down and restarted (using the new limit), and may be relocated to a
different host if necessary.
I think this means that a JVM process doesn't need to worry about the
CPU limit changing during its lifetime :-)
> Either way, what are our options to fix this? Does it need fixing?
>
> * Should we stop taking cpu shares into account as a means to limit
> CPU? It would be a significant change to how previous JDKs
> worked. Maybe that wouldn't be such a bad idea :)
I think we should get rid of it. This feature was designed to work with
Kubernetes, but has no effect in most cases. The only time it does take
effect (when no resource limits are set), it does the opposite of what
the user expects.
Also, the current implementation is tightly tied to specific behaviors of
Kubernetes + docker (the 1024 and 100 constants). This will cause
problems with other container/orchestration software that uses different
algorithms and constants.
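
To make that concrete, here is a rough sketch (illustration only, not the
actual HotSpot source; the class and method names are made up) of the kind
of shares-to-CPU mapping being discussed, assuming the 1024-shares-per-CPU
convention:

    // Illustration only: roughly, 1024 shares are treated as "one CPU" and
    // the result is used as an upper bound on the reported processor count.
    class SharesToCpus {
        static final int PER_CPU_SHARES = 1024;   // docker/Kubernetes convention

        static int sharesToCpuCount(long cpuShares, int hostCpus) {
            if (cpuShares <= 0) {
                return hostCpus;                   // no shares configured
            }
            int shareCount = (int) Math.ceil((double) cpuShares / PER_CPU_SHARES);
            return Math.min(shareCount, hostCpus);
        }

        public static void main(String[] args) {
            // Kubernetes "no CPU request" sets cpu.shares to 2:
            // ceil(2 / 1024) == 1, so a 32-core host is reported as 1 CPU.
            System.out.println(sharesToCpuCount(2, 32));      // prints 1
            // An explicit request of 4 CPUs (4096 shares) maps to 4.
            System.out.println(sharesToCpuCount(4096, 32));   // prints 4
        }
    }

(The 100 presumably refers to the cgroups v2 cpu.weight default, which plays
the same role there.)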
> * How likely is CPU underutilization to happen in practice?
> If the container is not the only container on the node, then
> according to your formula it'll get one CPU or less anyway.
> Underutilization would thus only happen on an idle node with no
> other containers running. That would suggest doing nothing and
> letting the user override it as they see fit.
I think underutilization happens when the containers have a bursty
usage pattern. If other containers do not fully utilize their CPU
quotas, we should let the busy containers use the unused CPU time.
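
As a back-of-the-envelope illustration of that point (my own sketch, not
code from the JDK): cpu.shares is a relative weight that only matters while
containers are actually competing for CPU, so a cap derived from shares
alone is wrong whenever the neighbours are idle.

    // Illustration only: the fair share under contention follows the
    // formula quoted earlier (my shares / sum of shares of all busy
    // containers, including this one).
    class FairShare {
        static double contendedFraction(long myShares, long[] busyNeighbourShares) {
            long total = myShares;
            for (long s : busyNeighbourShares) {
                total += s;
            }
            return (double) myShares / total;
        }

        public static void main(String[] args) {
            int hostCpus = 32;
            long myShares = 2;                    // Kubernetes "no CPU request"
            long[] busyNeighbours = {1024, 1024}; // two busy containers

            // All containers busy: this one gets 2 / 2050 of the host,
            // i.e. a small fraction of one CPU.
            System.out.println(hostCpus * contendedFraction(myShares, busyNeighbours));

            // Neighbours idle: the kernel imposes no such limit, so the
            // container may burst up to all 32 host CPUs, unless the JVM
            // has already capped itself at 1.
        }
    }

In the meantime, a user can already work around the detection with
-XX:ActiveProcessorCount=<n> on the command line.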
Thanks
- Ioi
> * Something else I'm missing?
>
> Thanks,
> Severin
>