Configurable G1 heap expansion aggressiveness

Jaroslaw Odzga jarek.odzga at gmail.com
Thu Feb 13 13:24:17 UTC 2025


Thank you Kirk and Thomas for your answers!

What Kirk describes sounds great and is the right long-term approach; I
can't wait for it to ship. It also sounds like a feature we might have
to wait a while for (please correct me if I am wrong).

My proposal is just a tiny stopgap that might help alleviate some of
the problems but does not attempt to be a holistic solution and, as
you pointed out, has downsides.
I totally agree with your assessment: it is just exposing internal
constants, but the fact that these are constants is part of the problem,
because they bake in an eager heap expansion behavior that is not
always desirable.
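
For readers who have not looked at the code, a simplified model of the
second constant's role may help. This is a sketch, not the HotSpot source:
the class and method names are hypothetical, the 10-pause window length is
an assumption, and the other trigger (a full history window whose
long-term ratio is above the threshold) is left out. It counts pauses whose
GC time ratio exceeds the threshold and requests an expansion once the
count reaches the constant, which is the value the proposed
G1MinPausesOverThresholdForGrowth flag would make configurable.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of G1's young-GC expansion trigger.
# Not HotSpot code; names and the window length are assumptions.
MIN_OVER_THRESHOLD_FOR_GROWTH = 4  # the hardcoded constant discussed here
HISTORY_WINDOW = 10                # assumed number of pauses per window


@dataclass
class ExpansionTrigger:
    # Stand-in for the proposed G1MinPausesOverThresholdForGrowth flag.
    min_over_threshold: int = MIN_OVER_THRESHOLD_FOR_GROWTH
    over_count: int = 0
    pauses_in_window: int = 0

    def record_pause(self, gc_time_ratio: float, threshold: float) -> bool:
        """Return True if this pause should request a heap expansion."""
        if gc_time_ratio > threshold:
            self.over_count += 1
        if self.over_count >= self.min_over_threshold:
            self._reset()
            return True
        if self.over_count > 0:
            # A window starts at the first pause over the threshold; if it
            # fills up without reaching the count, start counting afresh.
            self.pauses_in_window += 1
            if self.pauses_in_window > HISTORY_WINDOW:
                self._reset()
        return False

    def _reset(self) -> None:
        self.over_count = 0
        self.pauses_in_window = 0


if __name__ == "__main__":
    spiky = [0.12, 0.03, 0.15, 0.02, 0.11, 0.13]
    default = ExpansionTrigger()
    lazy = ExpansionTrigger(min_over_threshold=10)
    print([default.record_pause(r, 0.10) for r in spiky])
    print([lazy.record_pause(r, 0.10) for r in spiky])
```

With the spiky ratios above and a 10% threshold, the default of 4
requests an expansion on the sixth pause, while a value of 10 does not
request one for this sequence.
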
I share your reluctance to add more obscure tuning flags: they have a
maintenance cost and a risk of misuse. I would not recommend that anyone
tune these flags without reading the source code and understanding
the tradeoffs.
These are not silver bullets and, as you pointed out, probably would
have to be used together with other tuning parameters to achieve
reasonable results.
To clarify, the way we plan to use these flags is to establish a
constant set of tuning parameters that achieve a good tradeoff between
latency, throughput and footprint and apply it to a large number of
services.
We want to avoid tuning each service individually because that approach
does not scale. Example configuration (used with jdk17):
        -XX:+UnlockExperimentalVMOptions -XX:+G1PeriodicGCInvokesConcurrent
        -XX:G1PeriodicGCInterval=60000 -XX:G1PeriodicGCSystemLoadThreshold=0
        -XX:GCTimeRatio=9 -XX:G1MixedGCLiveThresholdPercent=85
        -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=60
        -XX:MaxGCPauseMillis=200 -XX:GCPauseIntervalMillis=1000
        -XX:-G1UsePreventiveGC -XX:-G1ScaleWithHeapPauseTimeThreshold
        -XX:G1MinPausesOverThresholdForGrowth=10
From experiments so far it seems that we can leave adaptive IHOP
on, because even if it mispredicts, e.g. due to allocation spikes, the
heap is not aggressively expanded.
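
To put numbers on the first culprit: GCTimeRatio=9 corresponds to a
pause-time threshold of 1/(1+9) = 10%. The sketch below models how
scale_with_heap() lowers that threshold while the committed heap is at or
below half of the max heap. It is an illustration under assumptions, not
HotSpot code: the function names mirror the ones discussed in this thread,
the 1% floor reflects our reading of the JDK sources, and the `enabled`
parameter stands in for the proposed G1ScaleWithHeapPauseTimeThreshold flag.

```python
GB = 1024 ** 3


def pause_time_threshold(gc_time_ratio: int) -> float:
    """Target fraction of elapsed time spent in GC: 1 / (1 + GCTimeRatio)."""
    return 1.0 / (1.0 + gc_time_ratio)


def scale_with_heap(threshold: float, committed: int, max_heap: int,
                    enabled: bool = True) -> float:
    """Model of the threshold scaling; `enabled` mirrors the proposed
    G1ScaleWithHeapPauseTimeThreshold flag. The 1% floor is an assumption."""
    if enabled and committed <= max_heap / 2:
        # Scale linearly with how far the committed heap is below max/2.
        threshold *= committed / (max_heap / 2)
        threshold = max(threshold, 0.01)
    return threshold


if __name__ == "__main__":
    base = pause_time_threshold(9)  # -XX:GCTimeRatio=9
    for committed_gb in (8, 4, 2, 1):
        scaled = scale_with_heap(base, committed_gb * GB, 16 * GB)
        print(f"{committed_gb:>2} GB of 16 GB committed: threshold {scaled:.2%}")
```

With a 16 GB max heap the threshold falls from 10% at 8 GB committed to
5%, 2.5% and 1.25% at 4, 2 and 1 GB, so the larger the gap between the
committed and the max heap, the more eagerly G1 decides to expand.
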

On the plus side, the change itself is tiny, very localized and could
be trivially backported e.g. all the way to jdk17. Most importantly,
it seems to enable significant cost savings.

At the end of the day it is a tradeoff. Would it help if I provided
examples of the impact this change has had on real-life applications? At
Databricks we run hundreds of JVM services and initial results are
very promising. Or should I treat this proposal as officially
rejected?

> Wouldn't the option to make G1 keep GCTimeRatio better (e.g.
> https://bugs.openjdk.org/browse/JDK-8238687), and/or some configurable
> soft heap size goal (https://bugs.openjdk.org/browse/JDK-8236073) that
> the collector will keep also solve your issue while being easier to
> configure?
Thanks for sharing these. JDK-8238687 focuses on uncommit, while heap
expansion is what hurts us the most.
SoftMaxHeapSize could be used as a building block towards a
solution. I think there would still have to be some controller that
adjusts the value of SoftMaxHeapSize based on GC behavior, e.g.
increasing it when GC pressure is too high.
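
Such a controller could look roughly like the sketch below. Everything in
it is hypothetical: it assumes the collector honours a soft heap target
(as proposed for G1 in JDK-8236073), and the class name, the growth and
shrink factors, and the dead band below half of the target ratio are
illustrative choices, not tuned values.

```python
from dataclasses import dataclass

MB = 1024 ** 2


@dataclass
class SoftMaxHeapController:
    """Hypothetical feedback loop that adjusts a soft heap-size target.

    After each GC it compares the observed GC time ratio with a target:
    above the target it raises the soft max, well below it (under half of
    the target) it lowers the soft max slowly, otherwise it holds steady.
    """
    soft_max: int
    min_heap: int
    max_heap: int
    target_gc_ratio: float = 0.10  # e.g. 1 / (1 + GCTimeRatio) with GCTimeRatio=9
    grow_factor: float = 1.25      # illustrative, not tuned
    shrink_factor: float = 0.95    # illustrative, not tuned

    def update(self, observed_gc_ratio: float) -> int:
        if observed_gc_ratio > self.target_gc_ratio:
            proposed = int(self.soft_max * self.grow_factor)
        elif observed_gc_ratio < self.target_gc_ratio / 2:
            proposed = int(self.soft_max * self.shrink_factor)
        else:
            proposed = self.soft_max  # dead band: avoid oscillating
        # Never leave the hard bounds (min and max heap size).
        self.soft_max = min(self.max_heap, max(self.min_heap, proposed))
        return self.soft_max


if __name__ == "__main__":
    ctl = SoftMaxHeapController(soft_max=1024 * MB, min_heap=256 * MB,
                                max_heap=8192 * MB)
    for ratio in (0.20, 0.15, 0.07, 0.01, 0.01):
        print(f"GC ratio {ratio:.0%} -> soft max {ctl.update(ratio) // MB} MB")
```

The dead band keeps the target from moving on every GC, and growing faster
than shrinking reflects that running out of headroom costs more than
holding some extra memory for a while.
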

Best regards,
Jaroslaw

On Thu, Feb 13, 2025 at 2:49 AM Thomas Schatzl
<thomas.schatzl at oracle.com> wrote:
>
> Hi Jaroslaw,
>
>    thank you for contributing and speaking up with an itch of yours!
>
> The motivation and analysis are spot on: we agree that the
> aggressiveness of G1 heap expansion paired with reluctance to give back
> memory can make it hard to configure G1 as you would want in this situation.
>
> However we do not think that the proposed solution (adding even more
> customizability) is where we want to go.
>
> More background below, inline:
>
> On 09.02.25 20:54, Jaroslaw Odzga wrote:
> > Context and Motivation
> > In multi-tenant environments e.g. Kubernetes clusters in cloud
> > environments there is a strong incentive to use as little memory as
> > possible. Lower memory usage means more processes can be packed on a
> > single VM which directly translates to lower cloud cost.
> > Configuring G1 heap size in this setup is currently challenging. On
> > the one hand we would like to set the max heap size to a high value so
> > that the application doesn’t fail with heap OOME when faced with
> > unexpectedly high load or organic growth. On the other hand we need to
> > set the max heap size to as small a value as possible because G1 is very
> > eager to expand the heap even when tuned to collect garbage aggressively.
> >
> > Ideally, we would like to:
> > - Set the initial heap size to a small value.
> > - Set the max heap size to a value larger than expected usage so that
> > the application can handle unexpected load and organic growth.
> > - Configure G1 GC not to expand the heap aggressively. This is currently
> > not possible.
> >
> > We propose two new JVM G1 flags that would give us more control over
> > G1 heap expansion aggressiveness and realize significant cost savings
> > in multi-tenant environments.
>
> Understood.
>
> We are generally very reluctant to expose more flags in basically any
> collector due to the maintenance overhead. We understand that these are
> experimental flags that can be removed on a whim, but doing that
> if/when they are in use is still awkward.
>
>
> > At the same time we don’t want to change existing G1 behavior - with
> > default values of the new flags current G1 behavior would be
> > maintained.
> >
> > Analysis
> > Currently even with very aggressive G1 configuration such as:
> > -XX:-G1UseAdaptiveIHOP -XX:InitiatingHeapOccupancyPercent=20
> > -XX:GCTimeRatio=4 -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=60
> > the heap is fairly eagerly expanded.
> >
> > We found two culprits responsible for this in the
> > G1HeapSizingPolicy::young_collection_expansion_amount() function.
> > First, the scale_with_heap() function makes pause_time_threshold small
> > in cases where the current heap size is smaller than half of the max heap size.
> > While it is likely a desired behavior in many situations, it also
> > causes memory usage spikes in situations where max heap size is much
> > larger than current heap size.
> > Second, the MinOverThresholdForGrowth constant equal to 4 is an
> > arbitrary value which hardcodes the heap expansion aggressiveness. We
> > observed that short_term_pause_time_ratio can exceed
> > pause_time_threshold and trigger heap expansion too eagerly in many
> > situations, especially when the allocation rate is spiky.
> >
> > Proposal
> > We would like to introduce two new experimental flags:
> > - G1ScaleWithHeapPauseTimeThreshold: a binary flag that would allow
> > disabling scale_with_heap()
> > - G1MinPausesOverThresholdForGrowth: a value between 1 and 10, a
> > configurable replacement for the MinOverThresholdForGrowth constant.
> >
> > We don’t want to change the default behavior of G1. Default values for
> > these flags (G1ScaleWithHeapPauseTimeThreshold=true,
> > G1MinPausesOverThresholdForGrowth=4) would maintain the existing
> > behavior.
> >
> > Alternatives
> > There is currently no good alternative. Potentially we could configure
> > G1 aggressively to trigger GC very frequently e.g.:
> > -XX:-G1UseAdaptiveIHOP -XX:InitiatingHeapOccupancyPercent=20
> > -XX:GCTimeRatio=4 -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=60
> > Even with this configuration we see occasional large memory spikes
> > where the heap is quickly expanded. Even though the expanded heap
> > contracts eventually, this poses a significant problem because in
> > practice we don’t know if such a spike could have been avoided, so it
> > is not obvious how much memory the application really needs. Of course
> > such a configuration would also consume more CPU.
>
> The suggestion changes
>
> a) the aggressiveness of expansion if it has been decided that G1 should
> expand (G1ScaleWithHeapPauseTimeThreshold); looking at this particular
> piece of code, this behavior actually seems strange and unexpected, i.e.
> given that the user sets a GCTimeRatio, G1 is for some reason allowed to
> override it to a large extent.
>
> The reason is mostly historical: I collected thoughts in
> https://bugs.openjdk.org/browse/JDK-8349978.
>
> Note that just removing this behavior has quite a few unintended
> consequences as heap sizing is very much interconnected with general
> performance behavior.
>
> b) makes G1 more lazy about determining whether it needs to expand
> (G1MinPausesOverThresholdForGrowth) by increasing the number of
> consecutive GCs that GCTimeRatio needs to be over the threshold to cause
> expansion.
> (That's just exposing an internal constant :))
>
>
> These changes cover expansion behavior, but not shrinking again. I
> believe that the other slew of options mentioned above
>
> (-XX:-G1UseAdaptiveIHOP -XX:InitiatingHeapOccupancyPercent=20
> -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=60)
>
> is still needed to keep the heap stable and shrinking again over time (it may
> work with just changing GCTimeRatio in your particular case).
>
> That seems awfully complicated for an end user, and indicative of
> papering over the problem. We would like to avoid this.
>
>
> As Kirk indicates in his other email in this thread, there is work
> underway to make the VM (and G1) aware of other memory consumers in the
> VM. I am not sure whether that would also fix your problem in a more
> user-friendly (and hopefully generic) way.
>
>
>
> Wouldn't the option to make G1 keep GCTimeRatio better (e.g.
> https://bugs.openjdk.org/browse/JDK-8238687), and/or some configurable
> soft heap size goal (https://bugs.openjdk.org/browse/JDK-8236073) that
> the collector will keep also solve your issue while being easier to
> configure?
>
> (There are a lot of connected problems in the bug tracker, so make sure
> to follow related issues).
>
> Maybe you are interested and can find something to work on in that area;
> there has actually already been a lot of investigation (and some
> resulting, unfinished patches), so feel free to ask.
>
> Thanks,
>    Thomas
>
> Fwiw, we tried to label issues related to this area, see
> https://bugs.openjdk.org/issues/?jql=labels%20%3D%20gc-g1-heap-resizing .


More information about the hotspot-gc-dev mailing list