RFR (L): 8244603 and 8238858: Improve young gen sizing

Tue May 19 13:37:56 UTC 2020

Hi all,

   can I have reviews for this change that improves young gen sizing a 
lot to prepare for heap shrinking during young gc (JDK-8238687) ;)

In particular, with this change the following issues two related issues 
are fixed:

* 8238858: G1 Mixed gc young gen sizing might cause the first mixed gc 
to immediately follow the prepare mixed gc
* 8244603: G1 incorrectly limiting young gen size when using the reserve 
can result in repeated full gcs

These have been grouped together because it is too hard to separate them 
out as the bugs required significant rewrite in the young gen sizing.

This results in G1 following GCTimeRatio much better than before, 
leading to less erratic heap expansion. That is, constant loads do not 
result in that much overcommit any more.

Some background:

At end of gc and when the remembered set sampling thread (re-)evaluates 
the young gen G1 calculates two values:

- the *desired* (unconstrained) size of the young gen; 
desired/unconstrained meaning that these values are not limited by 
actually existing free regions. This value is interesting for adaptive 
IHOP so that it (better) converges to a fixed value. (There is some 
bugfix, JDK-8238163, coming for this to fix a few issues with that)

- the actual *target* size for the young gen, i.e. after taking 
constraints in available free regions into account, and whether we are 
currently already needing to use parts of the reserve or not.

Some problems that were fixed with the current code:

- during calculation of the *desired* young gen length G1 basically 
sizes the young gen during mixed gc to the minimum allowed size always. 
This causes unnecessary spikes in short/long term ratios, causing lots 
of heap increases even with a constant load.
Since we do not shrink the heap yet during regular gcs, this typically 
ended up in fully expanding the heap (unless the reclamations during 
Remark actually reclaimed something, but the equilibrium of committed 
heap between these two mechanisms is much higher).

E.g. on specjbb2015 fixed IR (constant load) with "small" and "medium" 
load G1 will use half the heap now while staying < GCTimeRatio.

- at end of gc g1 calculates the young gen for the *next* gc, i.e. 
during the prepare mixed gc g1 should already use the "reduced" amount 
of regions (that's JDK-8238858); similarly the *last* mixed gc in the 
mixed gc phase should already use the calculation for the young phase. 
The current code does not. This partially fixes some "at end of mixed gc 
it takes a while for g1 to achieve previous young gen size again" issues.
(There is a CR for that, but as mentioned, this change does not 
completely solve it).

- there were some calculations to ensure that "at least one region will 
be allocated" every time g1 recalculates young gen but that really 
resulted in g1 increasing the young gen by at least one. You do not 
always want that, particularly since we regularly revise the young gen. 
What you want is a minimum desired *eden* size.

- the desired and target young gen size calculation was done in a single 
huge method. This change splits up the code a bit.

- the code that calculated the actual *target* young length has been 
very inconsistent in handling the reserve. I.e. the limit to the 
actually available free regions only applied in some cases, and notably 
not if you were already using the reserve, causing strange behavior 
where at least the calculated young gen target length was higher than 
available free regions. This could cause full gcs.

- I added some detailed trace-level logging for the ergonomic decisions 
which really helps when debugging issues, but might be too 
large/intrusive for the code - I am not sure whether to leave it in.

Reviewing: I think the best entry point for this change is 
G1Policy::update_young_list_target_length(size_t) where the new code 
first calculates a desired young gen length and then target young length.

Some additional notes:
- eden before a mixed gc is calculated by predicting the time for the 
minimum amount of old gen regions we will definitely take 
(min_old_cset_length) and then letting eden fill up the remainder.

Often this results in significantly larger young gens than before this 
change, at worst young gen will be limited to minimum young gen size (as 
before). Overall it works fairly well, i.e. gives much smoother cpu 
usage. There is a caveat to that in that it depends on accuracy of 
predictions. Since G1 predictions are often too high, we might want to 
take more a lot more optional regions in the future to not be required 
to early terminate the mixed gc
I.e. I have often seen that we do not use up the 200ms pause time goal.

- currently, and like before, MMU desired length overrides the pause 
time desired length. I.e. *if* a GCPauseTimeIntervalMillis is set, the 
spacing between gcs is more important than actual pause times. The 
difference is that now that there is an explicit comment about this 
behavior there :)

- when eating into the reserve (last 10% of heap), we at most allow use 
of the sizer's min young length or half of the reserve regions (rounded 
up!), whichever is higher. This is an arbitrary decision, but since 
supposedly at that time we are already close to the next mixed gc due to 
adaptive ihop, so we can take more.

- note that all this is about calculating a *target* length, not the 
actual length. Actual length may be higher e.g. due to gclocker.

- there may be some out-of-box performance regressions since G1 does not 
expand the heap that much more. Performance can be restored by either 
decreasing GCTimeRatio, or better setting minimum heap sizes.

Actually, in the future, when shrinking is implemented (JDK-8238687), 
these may be more severe (in some benchmarks, actual gc usage is still 
<2%). I will likely try to balance that with decreasing default 
GCTimeRatio value in the future.

CR:
https://bugs.openjdk.java.net/browse/JDK-8244603
https://bugs.openjdk.java.net/browse/JDK-8238858
Webrev:
http://cr.openjdk.java.net/~tschatzl/8244603/webrev/
Testing:
mach5 tier1-5, perf testing

Thanks,
   Thomas