RFC: Adaptively resize heap at any GC/SoftMaxHeapSize for G1
Thomas Schatzl
thomas.schatzl at oracle.com
Wed Jun 10 09:31:00 UTC 2020
Hi all, Liang,
after a few busy months of working on G1 heap resizing and ultimately
SoftMaxHeapSize support, I am fairly happy with a first preview of the
associated changes. So I would like to ask for feedback
on the current changes for what I intend to complete in the (early)
jdk16 timeframe.
This is not a request for review of the changes for pushing, although
feedback on the code is also appreciated.
From my point of view only tuning a few heuristics and code polishing
is left to do as the change seems to do what it is intended to do.
In particular it would be nice if Liang Mao, the original requestor of
all this functionality, could help with feedback on his loads. :)
Just to recap: Sometime around end of last year, Liang posted review(s)
with functionality to:
- concurrently uncommit memory
- implement SoftMaxHeapSize by uncommitting free memory
That did not work well in some cases, so we agreed that we at Oracle
would take over. Today I would like to talk about the progress on the
second part :)
The original proposal did not work well because it did not really change
how G1 resized the heap: SoftMaxHeapSize related changes to the heap
were quickly undone by regular heap expansion, which was too aggressive
for several reasons (e.g. bugs like JDK-8243672, JDK-8244603),
uncooperative (JDK-8238686), and never actually helped with shrinking or
keeping a particular heap size.
This resulted in lots of unnecessary heap changes even on known constant
load.
After fixing these issues (at least internally ;)) and some analysis, I
concluded that to keep a particular heap size, G1 needs an element in
its heap sizing control loop that pushes back on (excessive) heap
expansion.
The best approach I came up with is to introduce a *lower*
GCTimeRatio that G1 tries to stay *above* by resizing the heap.
Effectively, G1 then tries to stay within ["LowerGCTimeRatio",
GCTimeRatio] for its actual gc time ratio.
That actually works out fairly well, and as of today I think the code,
while still heavily in development (and it still looks like it :)), is
in a state where it can be shared to gather feedback on more loads from
you.
First, how to try this out, before going into the details and the
questions I have:
This is a series of patches, which I put up on cr.openjdk.java.net, that
need to be applied on recent trunk:
These are the ones already out for review:
1) JDK-8243672: http://cr.openjdk.java.net/~tschatzl/8243672/webrev.1/
2) JDK-8244603: http://cr.openjdk.java.net/~tschatzl/8244603/webrev/
These are in the pipeline and not "fully complete" yet:
3) JDK-8238163: http://cr.openjdk.java.net/~tschatzl/8238163/webrev/
(optional)
4) JDK-8238686: http://cr.openjdk.java.net/~tschatzl/8238686/webrev/
5) JDK-8238687: http://cr.openjdk.java.net/~tschatzl/8238687/webrev/
6) JDK-8236073: http://cr.openjdk.java.net/~tschatzl/8236073/webrev/
All of the above:
http://cr.openjdk.java.net/~tschatzl/8236073/webrev.preview/
What these do:
(1) and (2) make the input variables to the control loop more
consistent. Since they are out for review, I would defer to the review
threads for them.
(3) stabilizes IHOP calculation a bit, trying to improve uncommon
situations. This change is optional.
(4) fixes the issue of resizing at Remark being totally disconnected
from actual load, causing some erratic expansions and shrinks.
After some time tinkering with that I decided to remove resizing at
Remark: since we check heap size at every gc anyway, it is no longer
required (although removing it also delays uncommit to the next gc).
(5) is the main change that implements what has been mentioned above:
G1 tries to keep actual GC time ratio within the range of
LowerGCTimeRatio and GCTimeRatio. As long as actual GC time ratio is
within this range, no action occurs. As soon as it finds that there is a
trend of being outside, it tries to correct for that, internally trying
to reach an actual gc time ratio in the middle of that range.
(6) implements SoftMaxHeapSize on top of that, trying to steer IHOP so
that G1 does not use more than that. (I.e. a complete mess of
potentially conflicting goals ;)
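
The range check described in (5) can be pictured as a simple dead band.
The following is only a minimal sketch under stated assumptions, not the
code from the patches: the names Position, position_in_range and
correction_target are hypothetical, and the value is treated as a plain
number compared against a lower and an upper bound. Which side then
leads to expansion and which to shrinking depends on how the patches
express the gc time ratio, and the trend detection that precedes any
correction is left out.

```python
from enum import Enum


class Position(Enum):
    """Where the measured value sits relative to the configured range."""
    BELOW = -1
    WITHIN = 0
    ABOVE = 1


def position_in_range(actual: float, lower: float, upper: float) -> Position:
    """Classify a measured gc time ratio against [lower, upper].

    Values inside the range (bounds included) call for no action.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if actual < lower:
        return Position.BELOW
    if actual > upper:
        return Position.ABOVE
    return Position.WITHIN


def correction_target(lower: float, upper: float) -> float:
    """Value aimed for once a correction is needed: the middle of the range."""
    return (lower + upper) / 2.0
```

With illustrative bounds of 6 and 12, a measurement of 9 is inside the
range and no action is taken, while a measurement of 13 is outside and a
correction would aim for the midpoint of the range.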
What I would like to ask you is to try out these changes on your load(s),
and potentially report back with at least
gc*,gc=debug,gc+ergo+heap=trace
logging.
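
For reference, the tag selection above is passed to the JVM through the
unified logging option -Xlog. A small sketch of assembling such a
command line follows; the helper java_command, the file name gc.log and
app.jar are placeholders, and the java binary is assumed to be one built
from trunk with the patches applied.

```python
import shlex


def java_command(app_args, log_file="gc.log"):
    """Build a java argument list that enables the requested GC logging.

    The selection string is the one asked for above; writing the output
    to a file via ':file=' is an optional convenience.
    """
    xlog = "-Xlog:gc*,gc=debug,gc+ergo+heap=trace:file=" + log_file
    return ["java", "-XX:+UseG1GC", xlog, *app_args]


cmd = java_command(["-jar", "app.jar"])
# shlex.join quotes the -Xlog argument so the shell does not expand "gc*".
print(shlex.join(cmd))
```

Quoting matters when typing this by hand as well, since an unquoted
gc* may be expanded by the shell.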
Of course more feedback about how it works for you is even better, and
if you are adventurous, maybe try tuning (internal) knobs a bit, which
I'll describe in a minute :)
As mentioned, the changes are not complete. Here is what I think should
still be tuned a bit, and what I expect will help. The interesting method is
G1HeapSizingPolicy::resize_amount_after_young_gc().
- determining the desired gc time ratio range: there is a new (likely
temporary) option G1MinimumPercentOfGCTimeRatio that determines the
lower gc time ratio described above as percentage of the original
GCTimeRatio. Currently set at 50%, which seems a good value as a too
tight range will cause lots of resizing (which might be good), and a too
large range will effectively disable shrinking (which also might be
desired).
Either way, this value works fairly well so far in my tests. Suggestions
are very much appreciated.
- detection of being outside of the expected gc time ratio range: this
somewhat works as before, separating short term and long term behavior.
Long term behavior: every X gcs without a heap resize, G1 checks whether
the long term gc time ratio is outside of the bounds and, if so, reacts.
I think this is fairly straightforward.
Short term behavior: tracks the number of times short term gc time ratio
exceeds the bounds in a single variable, incrementing or decrementing it
depending on whether current gc time ratio is above or below the gc time
ratio bounds. If that value exceeds certain thresholds, do something.
There is a new bias towards expansion at startup to make g1 react faster
at that time, and some decay towards "no action to be taken" if for a
"long" time nothing happens.
I reused the same values for "short" time (+/-4) and "long" (10) as
before, they seem to be okay.
- actual resizing: expansion is supposed to be the same as before,
relatively aggressive, which I intend to keep.
Shrinking is based on the number of free regions at the moment. This is
not optimal because e.g. you do not want to shrink below what is needed
for current eden (and the survivors of the next gc).
Other than that it is bounded by a percentage of the number of free
regions (G1ShrinkByPercentOfAvailable). That results in some heap size
undershoot in some cases (i.e. temporarily uncommitting a bit too much),
but in my tests it hasn't been too bad.
Still rather (too) simple, expect some tunings and changes particularly
here, deviating a bit more from the expansion code.
Comments and ideas in this area, particularly ones based on your
workloads, are especially appreciated.
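
To make the three tuning points above more concrete, here is a rough
sketch. It is not the code of
G1HeapSizingPolicy::resize_amount_after_young_gc(); all function and
class names are hypothetical. It assumes that the lower ratio is simply
the given percentage of GCTimeRatio, that the short term counter
triggers when it reaches +/-4 and is reset afterwards, and that the
shrink amount is capped at a percentage of the free regions. The startup
bias, the decay and the long term check are left out.

```python
from dataclasses import dataclass


def lower_gc_time_ratio(gc_time_ratio: float, min_percent: float) -> float:
    """Lower bound as a percentage of GCTimeRatio (assumed interpretation
    of G1MinimumPercentOfGCTimeRatio)."""
    return gc_time_ratio * min_percent / 100.0


@dataclass
class ShortTermTracker:
    """Single counter for how often the short term ratio left the range."""
    threshold: int = 4  # the "+/-4" value mentioned above
    counter: int = 0

    def record(self, position: int) -> int:
        """Record one gc.

        position is +1 if the ratio was above the range, -1 if below,
        0 if inside. Returns +1 or -1 once the counter reaches the
        threshold in that direction (and resets it), otherwise 0.
        """
        self.counter += position
        if self.counter >= self.threshold:
            self.counter = 0
            return 1
        if self.counter <= -self.threshold:
            self.counter = 0
            return -1
        return 0


def shrink_regions(free_regions: int, wanted: int, shrink_percent: int) -> int:
    """Cap a wanted shrink amount at a percentage of the free regions
    (assumed role of G1ShrinkByPercentOfAvailable)."""
    cap = free_regions * shrink_percent // 100
    return max(0, min(wanted, cap))
```

For example, with GCTimeRatio at 12 and the percentage at 50, the lower
ratio in this sketch is 6. Four consecutive gcs on the same side of the
range make the tracker report that side, and a request to uncommit 150
of 200 free regions with an illustrative cap of 50% is limited to 100
regions.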
Another big area not yet really tested is interaction with JEP 346:
Promptly Return Unused Committed Memory from G1, but I am certain that
with it you can reduce heap usage a lot (too much?).
My performance (throughput) tests so far look almost always encouraging:
20-30% less heap size with statistically insignificant throughput
changes. There are some exceptions; in these cases you lose 10% of
throughput for something like 90% less heap usage.
The only really bad results come from tests that try to find the maximum
throughput of G1 by incrementally increasing the load until they find
that it no longer works, slightly backing off, and then increasing the
load again to find an "equilibrium". From what I can tell it looks like
the heap sizing follows the application (i.e. what it's supposed to do),
making the application think it's already done while there is still more
heap available to potentially increase performance (looking at you
specjbb2015 out-of-box performance!).
Not yet sure how to counter that, but some decrease in default
GCTimeRatio to decrease the shrinking aggressiveness (and keeping more
heap longer) might fix this.
Of course, if you disable this adaptive heap sizing by fixing the heap
min/max in your benchmarks, there are no differences to before.
One interesting upcoming change is to make MinHeapSize manageable
(JDK-8224879) to help the algorithm a bit.
As closing words, given that the email is quite long already, thanks for
your attention and looking forward to feedback :)
If you have questions, please chime in too, I am happy to answer them.
Thanks,
Thomas
More information about the hotspot-gc-dev
mailing list