Container-aware heap sizing for OpenJDK
Stefan Karlsson
stefan.karlsson at oracle.com
Fri Sep 16 12:32:14 UTC 2022
ZGC already has a manageable flag to support the proposed "current
target heap size", called SoftMaxHeapSize. It would probably be good to
use the same name for the other GCs.
https://malloc.se/blog/zgc-softmaxheapsize
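For illustration, since SoftMaxHeapSize is a manageable flag, it can be set at launch or changed on a running JVM; the pid, jar name, and sizes below are placeholders:

```shell
# Set a soft heap target at launch (currently supported by ZGC):
java -XX:+UseZGC -Xmx16g -XX:SoftMaxHeapSize=10g -jar app.jar

# Because the flag is manageable, it can also be updated at runtime
# via jcmd against the running process:
jcmd <pid> VM.set_flag SoftMaxHeapSize 8g
```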
StefanK
On 2022-09-13 21:16, Jonathan Joo wrote:
>
> Hello hotspot-dev and hotspot-gc-dev,
>
>
> My name is Jonathan, and I'm working on the Java Platform Team at
> Google. Here, we are working on a project to address Java container
> memory issues: we noticed that a significant number of Java servers
> hit container OOMs because their heap size was incorrectly tuned
> relative to the container size. Because our containers have other RAM
> consumers whose usage fluctuates over time, it is often difficult to
> determine a priori an appropriate Xmx for a particular server.
>
>
> We set about solving this by dynamically adjusting Java heap/GC
> behavior based on the container usage information that we pass into
> the JVM. We have seen promising results so far, reducing container
> OOMs significantly and oftentimes also reducing average heap usage
> (with the tradeoff of more CPU time spent doing GC).
>
>
> Below (under the dotted line) is a more detailed explanation of our
> initial approach. Does this sound like something that may be useful
> for the general OpenJDK community? If so, would some of you be open to
> further discussion? I would also like to better understand what
> container environments look like outside of Google, to see how we
> could modify our approach for the more general case.
>
>
> Thank you!
>
> Jonathan
>
>
> ------------------------------------------------------------------------
>
>
> Introduction:
>
> Adaptable Heap Sizing (AHS) is a project internal to Google that is
> meant to simplify configuration and improve the stability of
> applications in container environments. The key is that in a
> containerized environment, we have access to container usage and limit
> information. This can be used as a signal to modify Java heap
> behavior, helping prevent container OOMs.
>
>
> Problem:
>
> * Containers at Google must be sized to accommodate not only the JVM
>   heap, but other memory consumers as well. These consumers include
>   non-heap Java memory (e.g. native code allocations) and
>   simultaneously running non-Java processes.
>
> * A common antipattern we see here at Google:
>
>   o An application runs into container OOMs.
>
>   o An engineer raises both the container memory limit and Xmx by the
>     same amount, since there appears to be insufficient memory.
>
>   o The application hits container OOMs less often, but remains prone
>     to them, since G1 continues to use most of Xmx.
>
> * The result is that many jobs are configured with much more RAM than
>   they need, yet still run into container OOM issues.
>
>
> Hypothesis:
>
> * For preventing container OOMs: why can't heap expansions be bounded
>   by the remaining free space in the container?
>
> * For preventing the "unnecessarily high Xmx" antipattern: why can't
>   the target heap size be set based on GC CPU overhead?
>
> * From our work on Adaptable Heap Sizing, it appears they can!
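To make the second idea concrete, here is a minimal sketch of how a target heap size could be derived from measured GC CPU overhead. The class name, the proportional policy, and the clamp bounds are all illustrative assumptions, not the actual AHS implementation:

```java
// Illustrative sketch only: derive a proposed target heap size from
// observed GC CPU overhead. The proportional policy and clamp bounds
// are hypothetical, not the actual AHS code.
public class GcOverheadPolicy {

    /**
     * If GC is using more CPU than the target overhead, propose a
     * larger heap (more headroom means less frequent GC); if it is
     * using less, propose a smaller heap to return memory to the
     * container.
     *
     * @param currentHeapBytes    current heap size in bytes
     * @param gcCpuOverhead       observed fraction of CPU spent in GC (e.g. 0.04)
     * @param targetGcCpuOverhead desired fraction (e.g. 0.02)
     * @return proposed target heap size in bytes
     */
    static long targetHeapSize(long currentHeapBytes,
                               double gcCpuOverhead,
                               double targetGcCpuOverhead) {
        // Simple proportional adjustment, clamped to [0.5x, 2x] per
        // step to avoid wild swings between updates.
        double ratio = gcCpuOverhead / targetGcCpuOverhead;
        double clamped = Math.max(0.5, Math.min(2.0, ratio));
        return (long) (currentHeapBytes * clamped);
    }

    public static void main(String[] args) {
        // GC twice as expensive as desired -> propose a larger heap.
        System.out.println(targetHeapSize(1_000_000_000L, 0.04, 0.02));
        // GC cheaper than the target -> propose a smaller heap.
        System.out.println(targetHeapSize(1_000_000_000L, 0.01, 0.02));
    }
}
```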
>
>
> Design:
>
> * We add two manageable flags to the JVM:
>
>   o Current maximum heap expansion size
>
>   o Current target heap size
>
> * A separate thread runs alongside the JVM, querying:
>
>   o Container memory usage/limits
>
>   o GC CPU overhead metrics from the JVM
>
> * This thread uses that information to calculate new values for the
>   two flags, and continually updates them at runtime.
>
> * The `Current maximum heap expansion size` tells the JVM the maximum
>   amount by which it may expand the heap while staying within
>   container limits. This is a hard limit; trying to expand beyond it
>   results in behavior equivalent to hitting the Xmx limit.
>
> * The `Current target heap size` is a soft target, used to resize the
>   heap (when possible) so as to bring GC CPU overhead toward its
>   target value.
>
>
> Results:
>
> * At Google, this design has worked very well in our initial
>   rollout, even for large and complex workloads.
>
> * After deploying it to dozens of applications, we have seen:
>
>   o Significant memory savings for previously misconfigured jobs
>     (many of which reduced their heap usage by 50% or more)
>
>   o Significantly fewer container OOMs (a 100% reduction in the vast
>     majority of cases)
>
>   o No correctness issues
>
>   o No latency regressions*
>
> * We plan to deploy AHS across a much wider subset of applications
>   by EOY '22.
>
>
> *Caveats:
>
> * Enabling this feature may require tuning the newly introduced
>   default GC CPU overhead target to avoid regressions.
>
> * Time spent doing GC may increase significantly for an application
>   (though in practice we have generally seen that even when this is
>   the case, end-to-end latency does not increase noticeably).
>
> * Enabling AHS results in frequent heap resizings, but we have not
>   seen evidence of any negative effects from these more frequent
>   resizings.
>
> * AHS is not necessarily a replacement for proper JVM tuning, but it
>   should generally work better than an untuned or improperly tuned
>   configuration.
>
> * AHS is not intended for every possible workload, and there could
>   be pathological cases where AHS results in worse behavior.
>