[RFC containers] 8281181 JDK's interpretation of CPU Shares causes underutilization
Severin Gehwolf
sgehwolf at redhat.com
Tue Feb 8 11:32:07 UTC 2022
On Mon, 2022-02-07 at 22:29 -0800, Ioi Lam wrote:
> On 2022/02/07 10:36, Severin Gehwolf wrote:
> > On Sun, 2022-02-06 at 20:16 -0800, Ioi Lam wrote:
> > > Case (4) is the cause for the bug in JDK-8279484
> > >
> > > Kubernetes sets cpu.cfs_quota_us to 0 (no limit) and cpu.shares to 2.
> > > This means:
> > >
> > > - This container is guaranteed a minimum amount of CPU resources
> > > - If no other containers are executing, this container can use as
> > > much CPU as available on the host
> > > - If other containers are executing, the amount of CPU available
> > > to this container is (2 / (sum of cpu.shares of all active
> > > containers))
> > >
> > >
> > > The fundamental problem with the current JVM implementation is that it
> > > treats "CPU request" as a maximum value, the opposite of what Kubernetes
> > > does. Because of this, in case (4), the JVM artificially limits itself
> > > to a single CPU. This leads to CPU underutilization.
> > I agree with your analysis. The key point is that in such a setup
> > Kubernetes sets the CPU shares value to 2. It is, though, a very
> > specific case.
> >
> > In contrast to Kubernetes, the JVM doesn't have insight into what other
> > containers are doing (or how they are configured). It would, perhaps,
> > be good to know what Kubernetes does for containers when the
> > environment (i.e. other containers) changes. Do they get restarted?
> > Restarted with different values for cpu shares?
>
> My understanding is that Kubernetes will try to do load balancing and
> may migrate the containers. According to this:
>
> https://stackoverflow.com/questions/64891872/kubernetes-dynamic-configurationn-of-cpu-resource-limit
>
> If you change the CPU limits, a currently running container will be shut
> down and restarted (using the new limit), and may be relocated to a
> different host if necessary.
>
> I think this means that a JVM process doesn't need to worry about the
> CPU limit changing during its lifetime :-)
> > Either way, what are our options to fix this? Does it need fixing?
> >
> > * Should we stop taking cpu shares into account as a means to limit
> >   CPU? It would be a significant change compared to how previous
> >   JDKs worked. Maybe that wouldn't be such a bad idea :)
>
> I think we should get rid of it. This feature was designed to work with
> Kubernetes, but has no effect in most cases. The only time it takes
> effect (when no resource limits are set), it does the opposite of what
> the user expects.
I tend to agree. We should start with a CSR review of this, though, as
it would be a behavioural change compared to previous versions of
the JDK.
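
Just to spell out what such a change would remove: the current
heuristic is roughly the following (a simplified sketch in Java for
illustration only, not the actual hotspot code; the class and method
names are made up, and the 1024-per-CPU constant and the preference
for a quota over shares reflect my reading of the current cgroup v1
behaviour):

  // Simplified sketch, for illustration only -- not the hotspot code.
  public class CpuShareHeuristic {

      // Docker/Kubernetes convention: 1024 shares correspond to one CPU.
      static final int PER_CPU_SHARES = 1024;

      // hostCpus:  CPUs visible on the host
      // cpuShares: value of cpu.shares (-1 if not set)
      // quotaCpus: CPU count derived from a hard quota (-1 if no quota)
      static int activeProcessorCount(int hostCpus, int cpuShares, int quotaCpus) {
          int limit = hostCpus;
          if (quotaCpus > 0) {
              // A hard quota, when present, wins over shares.
              limit = quotaCpus;
          } else if (cpuShares > 0) {
              // shares / 1024, rounded up, but at least 1.
              limit = Math.max(1, (cpuShares + PER_CPU_SHARES - 1) / PER_CPU_SHARES);
          }
          return Math.min(hostCpus, limit);
      }

      public static void main(String[] args) {
          // Kubernetes pod with no limits: cpu.shares == 2, no quota.
          // The JVM caps itself at 1 CPU no matter how big the host is.
          System.out.println(activeProcessorCount(32, 2, -1)); // prints 1
      }
  }

With shares == 2 and no quota this yields 1, which is exactly the
underutilization described in JDK-8279484; dropping the shares branch
is the behavioural change the CSR would have to describe.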
> Also, the current implementation is really tied to specific behaviors of
> Kubernetes + docker (the 1024 and 100 constants). This will cause
> problems with other container/orchestration software that uses different
> algorithms and constants.
There are other container orchestration frameworks, like Mesos, which
behave in a similar way (the 1024 constant is used there too). The good
news is that Mesos seems to have moved to a hard-limit default. See:
https://mesosphere.github.io/field-notes/faqs/utilization.html
https://mesos.apache.org/documentation/latest/quota/#deprecated-quota-guarantees
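
With a hard limit the CPU count is at least well defined: quota over
period. Something along these lines (again only a sketch; the method
name is made up and I'm rounding up here, the exact rounding in the
implementation may differ):

  // Sketch: CPU count from a CFS quota, i.e. cgroup v1
  // cpu.cfs_quota_us / cpu.cfs_period_us. Returns -1 for "no limit".
  static int cpusFromQuota(long quotaUs, long periodUs) {
      if (quotaUs <= 0 || periodUs <= 0) {
          return -1; // a quota of -1 means no hard limit is configured
      }
      // e.g. a quota of 200000us over a 100000us period -> 2 CPUs
      return (int) Math.max(1, (quotaUs + periodUs - 1) / periodUs);
  }

So for frameworks that default to hard limits the quota-based sizing
already does the sensible thing; it's only the shares-based guess that
is problematic.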
>
> > * How likely is CPU underutilization to happen in practice?
> >   If the container is not the only container on the node, then
> >   according to your formula it'll get one CPU or less anyway.
> >   Underutilization would, thus, only happen on an idle node with
> >   no other containers running. That would suggest doing nothing
> >   and letting the user override it as they see fit.
>
> I think underutilization happens when the containers have a bursty
> usage pattern. If other containers do not fully utilize their CPU
> quotas, we should distribute the unused CPUs to the busy containers.
Right, but this isn't really something the JVM process should care
about; distributing unused cycles is really a job for the
orchestration framework. All we could do is not limit CPU in those
cases. On the other hand, there is a risk of resource starvation too.
Consider a node with many cores, say 50, and a very small cpu shares
setting via container limits. The experience of running a JVM
application in such a setup would be very mediocre: the JVM thinks it
can use 50 cores 100% of the time, yet it would only get that when the
rest of the containers on the node are idle.
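
For completeness: users on either side of that trade-off can already
check and override what the JVM decided, e.g. with a trivial probe
like the one below (class name is just for this example) together with
the existing -XX:ActiveProcessorCount=<n> switch, or
-XX:-UseContainerSupport to turn container detection off entirely.

  // Tiny probe to print how many CPUs the JVM believes it may use.
  public class CpuProbe {
      public static void main(String[] args) {
          System.out.println("availableProcessors = "
                  + Runtime.getRuntime().availableProcessors());
      }
  }

Running it with -XX:ActiveProcessorCount=4 in the 50-core scenario
above pins the reported value to 4, independent of shares or quota.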
Thanks,
Severin