RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics
Thomas Schatzl
thomas.schatzl at oracle.com
Fri Feb 7 11:09:20 UTC 2020
Hi,
On 06.02.20 13:27, Liang Mao wrote:
> Hi Thomas,
>
> Thanks for the testing and evaluating!
>
> I tried your test with specjbb2015 and had some little different
> result maybe because of machine capability. The config I used is as below:
> -Xmx8g -Xms2g -Xlog:gc* -XX:GCTimeRatio=4
> -XX:+UseStringDeduplication
> -Dspecjbb.comm.connect.type=HTTP_Jetty
> -Dspecjbb.controller.type=PRESET
> -Dspecjbb.controller.preset.ir=5000
> -Dspecjbb.controller.preset.duration=10800000
>
> The heap was around 6GB after running for a while (300s). And
> I was able to use SoftMaxHeapSize to let it shrink to 5GB. It
> should be like your scenario to shrink the heap to 3GB
>
> The behavior is as I expected. But I thought you might expect
> more aggressive result. In my mind, for a constant load,
> the JVM might not need to shrink the heap, since the JVM is supposed to expand
> the heap to the right capacity.
Did you change Min/MaxHeapFreeRatio for your test? It does not look like
that, as I get roughly the same results if I don't. Given that we agree
that it is wrong to use Min/MaxHeapFreeRatio during Remark, the
observation is interesting, but does not seem to help here except
reinforcing that Min/MaxHeapFreeRatio are not a good thing to use.
Also, I doubt that G1's current heap size selection is optimal. Some
reasons off the top of my head:
- Min/MaxHeapFreeRatio has been chosen to avoid uncommit/commit
ping-pong and frequent (un-)commits (i.e. performance), not heap
compactness.
- adaptive IHOP (or at least the knowledge about expected amount of
memory used during gc operation) has not been available, hence the very
conservative values.
- the values have been chosen long before the uncommit at remark [2] has
been implemented. As author of that change I can authoritatively say that
fixing the policy had been out of scope for that change ;) however it
had been needed for JEP 346 Promptly Uncommit unused memory [1] to do
*something* without disrupting existing behavior too much to avoid
lengthy re-evaluation of sizing policies.
The logic went something like: what concurrent mark does roughly equals
full gc, so do the same sizing as during full gc. End.
- there is (rough) consensus that Min/MaxHeapFreeRatio is/has been a bad
idea, starting from the naming. ZGC and Shenandoah do not use it afaict.
- optimal heap size depends on application phase (e.g.
startup/operation/idle). Min/MaxHeapFreeRatio default values basically
prevent shrinking in many cases. Sometimes they even expand the heap
[3]. Given the high default value of MinHeapFreeRatio, G1 will most
likely end up using too much memory.
I.e. we apply MinHeapFreeRatio at Remark, which means that the heap size
will be kept at heap size at Remark + 40%. Given that Remark is where
heap usage almost peaked anyway, you get a really large commit size.
Unnecessarily large because (beginning with modestly large heaps of a few
GBs) the actual peak memory usage *at optimal operation* is what
adaptive IHOP determined. This is typically a lot less than 40% of
existing usage at Remark. So G1 keeps a lot of memory around for no
reason. This can be particularly significant in large heaps (say, double
digit GB) where those 40% can be a lot in absolute terms while G1 only
ever uses single digit additional GB during the cycle.
In my tests, e.g. the suggested 10% seems sufficient for that particular
case.
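
To put rough numbers on the above, here is a minimal sketch of the arithmetic. It assumes MinHeapFreeRatio is read as "at least that percentage of the committed heap must be free", which makes the resulting capacity even larger than the "Remark + 40%" shorthand. The class and method names, and the 3000 MB usage figure, are made up for illustration and are not G1 code.

```java
// Hypothetical illustration only; not taken from the G1 sources.
final class MinFreeRatioSketch {
    /**
     * Smallest committed size (in MB) that still leaves minFreePercent
     * of the committed heap free, given the usage measured at Remark.
     */
    static long minCapacityMb(long usedMb, int minFreePercent) {
        if (minFreePercent < 0 || minFreePercent >= 100) {
            throw new IllegalArgumentException("ratio must be in [0, 100)");
        }
        // used <= capacity * (1 - ratio)  =>  capacity >= used / (1 - ratio)
        return usedMb * 100 / (100 - minFreePercent);
    }

    public static void main(String[] args) {
        long usedAtRemarkMb = 3000; // made-up usage at Remark
        System.out.println(minCapacityMb(usedAtRemarkMb, 40)); // prints 5000
        System.out.println(minCapacityMb(usedAtRemarkMb, 10)); // prints 3333
    }
}
```

With these made-up numbers the default of 40 keeps about 2 GB of headroom committed, while 10 keeps roughly 330 MB.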
We also agree that uncommit at end of mixed gc is probably better, but
again, how much do you uncommit? Keeping as much as you expect to use
would be a good start, maybe a bit more. Not less, because then you
are going to do an unnecessary commit during that cycle for sure.
Currently the best idea about what we are going to need in the near
future is given by the IHOP goal value imho.
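
As a rough sketch of that idea (not the patch under review): take an estimate of what the next cycle will need, add a bit of headroom, and never shrink below that. Everything here is hypothetical: the names, the headroom percentage, and the use of a plain number as a stand-in for the IHOP goal value.

```java
// Hypothetical sizing sketch; the real G1 policy code looks different.
final class UncommitTargetSketch {
    /**
     * @param committedMb     currently committed heap size
     * @param expectedNeedMb  estimated peak need for the next cycle
     *                        (stand-in for the IHOP goal value)
     * @param headroomPercent extra slack kept on top of the estimate
     * @return the committed size to keep; this sketch only ever shrinks
     */
    static long targetCommittedMb(long committedMb, long expectedNeedMb,
                                  int headroomPercent) {
        long keep = expectedNeedMb + expectedNeedMb * headroomPercent / 100;
        // Keeping less than the expected need forces a commit in the next cycle.
        return Math.min(committedMb, keep);
    }

    public static void main(String[] args) {
        System.out.println(targetCommittedMb(6000, 2800, 10)); // prints 3080
        System.out.println(targetCommittedMb(3000, 2800, 10)); // prints 3000
    }
}
```

In the second call the heap is already below the estimate plus headroom, so the sketch leaves it alone rather than expanding.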
So overall, please do not read too much into existing heap sizing policy :)
> The soft limit I imagine is
> to bring the heap size down after a load spike. In Alibaba's
> workload, the heap shrink is controlled by cluster's unified
> control center which has the prediction data and the soft limit
> works more like a *hard* limit in our 8u implementation.
>
> So I think it is acceptable that heap size failed to shrink
> to 2GB in your test case. You can see that
> G1HeapSizingPolicy::can_shrink_heap_size_to is a bit conservative
> and we may be able to make it more aggressive.
>
>
> For almost idle application which doesn't have a GC for a
> rather long time, the shrink cannot happen. In our previous 8u
> patch, we have a timer to trigger GC and the softmx is changed by
> a jcmd which will also trigger a GC(there was no SoftMaxHeapSize option
> in 8u yet). Shall we introduce a timer GC as well?
>
Please give the functionality JEP 346 added a try if you haven't. It
should achieve what you suggest except that Min/MaxHeapFreeRatio may
prevent G1 from achieving the compact heap you expect (again).
Min/MaxHeapFreeRatio were changed to be manageable exactly for this
reason, i.e. if you are idle, and your control center knows that the
machine is going to be idle, instead of adjusting (in this case)
SoftMaxHeapSize it may as well set Min/MaxHeapFreeRatio to low values
and JEP 346 would do the rest. Before JEP 346 you needed to trigger a
manual System.gc() in addition.
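
For illustration, this is roughly what a control agent running inside the JVM could do, using the standard HotSpotDiagnosticMXBean to change the two manageable flags. The helper class, the chosen values 10/20, and the ordering logic are assumptions; in particular the ordering assumes the VM rejects a write that would make MinHeapFreeRatio larger than MaxHeapFreeRatio.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper; not part of the JDK or of the patch under review.
final class FreeRatioTuner {
    private static final HotSpotDiagnosticMXBean BEAN =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);

    /** Current value of a VM flag as a string. */
    static String current(String flag) {
        return BEAN.getVMOption(flag).getValue();
    }

    /** Sets both ratios, ordering the writes so that Min <= Max holds throughout. */
    static void setFreeRatios(int minPercent, int maxPercent) {
        if (minPercent < 0 || maxPercent > 100 || minPercent > maxPercent) {
            throw new IllegalArgumentException("need 0 <= min <= max <= 100");
        }
        int currentMin = Integer.parseInt(current("MinHeapFreeRatio"));
        if (maxPercent >= currentMin) {
            // Raising (or keeping) the upper bound first is safe.
            BEAN.setVMOption("MaxHeapFreeRatio", Integer.toString(maxPercent));
            BEAN.setVMOption("MinHeapFreeRatio", Integer.toString(minPercent));
        } else {
            // Lowering: move the lower bound out of the way first.
            BEAN.setVMOption("MinHeapFreeRatio", Integer.toString(minPercent));
            BEAN.setVMOption("MaxHeapFreeRatio", Integer.toString(maxPercent));
        }
    }

    public static void main(String[] args) {
        setFreeRatios(10, 20); // made-up "idle" values
        System.out.println(current("MinHeapFreeRatio") + "/"
                + current("MaxHeapFreeRatio"));
    }
}
```

The same change can be made from outside the process with any tool that writes manageable flags; the sketch only shows that no GC needs to be triggered by hand once the periodic collections from JEP 346 are enabled.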
So a simpler solution than the one suggested by you would be to just
drop usage of Min/MaxHeapFreeRatio and/or incorporate SoftMaxHeapSize in
the uncommit at remark in your case and let JEP 346 functionality do its job.
If JEP 346 does not work for your use case, we are eager to hear back
from you about your experience. We do know that it may be a little bit
too much focused on what "idle" is, but that can be tweaked.
The reason I am suggesting to try JEP 346 is that from my understanding
the suggested implementation seems to cover only exactly the same case
as JEP 346, but with side effects, e.g.
- causing commit/uncommit ping-pong if the application is slightly
active at worst, and no effect at best. While concurrent uncommit tries
to mitigate this (and it is still very interesting to do), doing less
commit/uncommit in the first place seems better.
- not covering e.g. the case where an existing Remark finishes after the
last GC that decreased the heap to SoftMaxHeapSize even in the idle case
(could be fixed as you mentioned above with a timer, but JEP 346 covers
this already)
- only limited to reducing heap to SoftMaxHeapSize (why? This may get fixed,
as you said you were thinking about a more aggressive policy)
In a SoftMaxHeapSize solution in the JVM that I envision, the change
should cover a wide(r) range of usage scenarios. We need to look a bit
further than this single use case (which afaict G1 should already handle).
In the case you need a real hard limit I recommend looking at
implementing that. There has been a proposal to do so some time ago, but
it is inactive at this time [0].
>
> Honestly, I don't think Min/MaxHeapFreeRatio is a good way to determine
> the heap expand/shrink in G1 and in our 8u practical experience we never
> have full GC so Min/MaxHeapFreeRatio is useless. Here when I reproduce
> your test, the only exception is the heap will expand to 6GB after
> shrinking to SoftMaxHeapSize=5g is because in remark we will resize the
> heap.
> BTW, I don't think remark is a good point to resize heap since in remark
> phase regions full of garbage haven't been reclaimed yet. IMHO we even don't
> need to resize in remark but just resize after mixed GC according to
> GCTimeRatio.
>
> Your change to make SoftMaxHeapSize sensible in adaptive IHOP controlling
> seems a similar approach as ZGC. ZGC is a single generation GC whose
> scenario is much simpler. Maybe we don't need SoftMaxHeapSize to guide GC
> decision in G1. Since we already have policy to determine the shrink of
> the heap by SoftMaxHeapSize, I'm not sure if we need to make adaptive IHOP
> according to SoftMaxHeapSize... We may encounter the situation that we
> cannot shrink the heap size to SoftMaxHeapSize but concurrent mark become
> frequent after affecting the IHOP policy.
ZGC will be generational at some point. This has been on its roadmap
since the beginning. Also, there is not much difference as you can see
from the patch. The difference is currently 1 LOC to set young gen sizes
in addition to the heap goal.
I also thought about the last point, i.e. when the user sets
SoftMaxHeapSize too low, then you get continuous marking cycles. My
answer to the user would be that, well, feel free to shoot yourself
in the foot, but compared to an OOME with a hard limit, this behavior
seems much better (but there are certainly situations where a hard limit
is better for someone so both seem useful).
Ultimately the only thing I can say is that there is no free lunch in the
throughput/latency/memory triangle, but there may be situations where
memory is more important than performance too (widening the appeal of
SoftMaxHeapSize).
In the test I gave, the 2g goal is maybe too low for this case, but the
3g (instead of 3.8g) looks really attractive (and G1 seems to find an
"optimal" size of 2.2-2.8g at that point; I think I found the reason for
the spikes above 3g and am looking into testing a fix).
The implementation suggested by me does not affect the idle case at all;
JEP 346 functionality will clean up and compact the heap nicely (you
would still need to fix the shrinking amount in the sizing policy, but
we already agreed that it is not good, and that doing the evaluation
at remark isn't the best idea either - but both are separate issues).
>
>> In the log I have, the problem seems to be that we are re-setting the
>> softmaxheapsize within the space reclamation phase (i.e. mixed gc) and
>> G1 sizing policies got confused, i.e. it partially keeps on using the 2g
>> goal for young gen sizing until the *2 problem expands it. That's a bug
>> and needs to be fixed.
>
> I don't think it's a problem that after mixed GC
> resize_heap_after_young_collection
> will evaluate if the heap can be shrunk to the new value of
> SoftMaxHeapSize.
Resizing (to SoftMaxHeapSize) after every gc will shrink and expand all
the time unnecessarily. I.e. you expand at one GC, at the next GC it may
happen that G1 can shrink to SoftMaxHeapSize again (e.g. because eager
reclaim freed a lot), next gc G1 commits again because of failed pause
time goal (or just commit during humongous allocation which can be
immediately reversed because of eager reclaim).
Even with concurrent uncommit, such behavior seems a waste of time. Imho
with concurrent (un-)commit unnecessary resizing should be avoided if
possible.
One option is to base that decision on the value that adaptive IHOP
gives you. It seems a very good start but there may be better
approaches. Fixed percentages like Min/MaxHeapFreeRatio are too simple, it
seems :)
Thanks,
Thomas
[0] https://bugs.openjdk.java.net/browse/JDK-8204088
[1] https://bugs.openjdk.java.net/browse/JDK-8204089
[2] https://bugs.openjdk.java.net/browse/JDK-6490394
[3] https://bugs.openjdk.java.net/browse/JDK-6490394?focusedCommentId=14283475&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14283475
(only just noticed)