Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics

Liang Mao maoliang.ml at alibaba-inc.com
Wed Jan 15 10:58:40 UTC 2020


Hi Thomas,

>> 3. SoftMaxHeapSizeConstraintFunc doesn't check Xms
>> The constraint function doesn't ensure that SoftMaxHeapSize is not
>> less than Xms. Do we need to add that check? It would not only
>> affect G1...

> I will check again later, but from what I remember from yesterday it
> does check it at VM start (-Xms sets both minimum and initial heap
> size). The constraint func does not check when the user changes the
> value during runtime. So code using it must still maintain this
> invariant in behavior.

The constraint function is checked both at VM startup and at runtime
when the value is changed via jinfo. Looking into the code, ZGC seems
to allow a SoftMaxHeapSize less than Xms. So do we need to start
another mail thread to discuss this?
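
To make the question concrete, here is a rough standalone sketch of
the extra check being discussed, i.e. rejecting a SoftMaxHeapSize
below Xms in addition to the existing upper bound against Xmx. This is
illustrative C++ only, not the actual HotSpot constraint code; the
function and parameter names are invented for the example:

  #include <cstddef>
  #include <cstdio>

  // Illustrative model of the constraint under discussion, not HotSpot code.
  // min_heap corresponds to -Xms (MinHeapSize), max_heap to -Xmx (MaxHeapSize).
  static bool soft_max_heap_size_is_valid(size_t value, size_t min_heap,
                                          size_t max_heap, bool verbose) {
    if (value > max_heap) {
      if (verbose) {
        fprintf(stderr, "SoftMaxHeapSize (%zu) must not exceed the maximum heap size (%zu)\n",
                value, max_heap);
      }
      return false;
    }
    // The additional check in question: require SoftMaxHeapSize >= Xms.
    // ZGC currently appears to accept values below Xms, so adding this
    // would affect more than just G1.
    if (value < min_heap) {
      if (verbose) {
        fprintf(stderr, "SoftMaxHeapSize (%zu) is below the minimum heap size (%zu)\n",
                value, min_heap);
      }
      return false;
    }
    return true;
  }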

>> 4. commit/uncommit parallelism
>> The concurrent uncommit will run while the VMThread is doing a GC,
>> and the GC may request to expand the heap if there are not enough
>> empty regions. So the parallelism is possible and immediate uncommit
>> is a solution.

> There may be others, but it actually seems easiest, as blocking such a
> request seems harder to implement, or at least less localized in the
> code. Completely *dropping* the request seems to go against the
> "SoftMaxHeapSize is a hint" guideline and may have other unforeseen
> consequences too. Like I said, since G1 does not expand then, there
> will be more GCs with the small heap, increasing the current
> GCTimeRatio more than it should. Which means that when the request
> ultimately comes through, as G1 will certainly try again, the increase
> may be huge. (The increase is proportional to the difference in actual
> and requested GCTimeRatio, iirc.)

> Again, if there are good reasons to do otherwise I am open to
> discussion, but it would be nice to have numbers to base decisions on.

I'm not on the side of blocking the expand request :)
G1RegionsLargerThanCommitSizeMapper can commit/uncommit in parallel,
and G1RegionsSmallerThanCommitSizeMapper can commit/uncommit
immediately. So I think we don't have any issues so far?
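
To illustrate the distinction with a rough standalone model (the class
and method names below are invented for the example and are not the
actual mapper code): regions at least as large as the OS commit
granularity map to whole pages and can be committed/uncommitted
independently, hence concurrently, while regions smaller than a page
share the page with neighbors, so the page can only be returned to the
OS once every region on it is uncommitted - which is why handling that
case immediately is simpler:

  #include <cstddef>
  #include <vector>

  // Rough model (not HotSpot code) of bookkeeping for regions that are
  // smaller than the OS commit granularity: several regions share one OS
  // page, so the page is committed on the first region and may only be
  // uncommitted once the last region on it is released.
  class SmallRegionCommitTracker {
    size_t _regions_per_page;
    std::vector<unsigned> _committed_regions_on_page; // one counter per page
  public:
    SmallRegionCommitTracker(size_t num_pages, size_t regions_per_page)
      : _regions_per_page(regions_per_page),
        _committed_regions_on_page(num_pages, 0) {}

    // Returns true if this request requires actually committing the page.
    bool commit_region(size_t region_index) {
      size_t page = region_index / _regions_per_page;
      return _committed_regions_on_page[page]++ == 0;
    }

    // Returns true if the underlying page can now be returned to the OS.
    bool uncommit_region(size_t region_index) {
      size_t page = region_index / _regions_per_page;
      return --_committed_regions_on_page[page] == 0;
    }
  };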

>> So it is a kind of *hard* limit and we need to expand immediately if
>> GCTimeRatio drops below 12. The difference in our workloads is that
>> we will keep a GCTimeRatio near the original value of 99 to keep GC in

>I.e. you set it to 99 at startup?

In fact we are not controlling GCTimeRatio. For many applications
running in exclusive containers we set Xms equal to Xmx to avoid any
heap expansion at runtime, which might cause allocation stalls and
timeouts.

>> I propose that we still use the original option
>> "-XX:+G1ElasticHeap" to keep the GCTimeRatio around 99 or a specified
>> number. The default flow will make sure the GCTimeRatio stays above
>> the threshold of 12, and concurrent commit/uncommit will adjust the
>> heap to keep GCTimeRatio at a value such that the adjustment is not
>> urgent.

> I am not completely sure what you want to achieve here or what the
> problem is. I probably need to understand more about the problem and
> potentially other solutions can be found.

> As for a new -XX:+G1ElasticHeap option, it does not seem to make a
> difference to set this or -XX:GCTimeRatio in this case (both are single
> options). But I do not completely know the details here.

Theoretically the Java heap does not return memory by default, and
ZGC/Shenandoah have the options "ZUncommit" and "ShenandoahUncommit"
to control this and to tell the user that memory can be uncommitted.
So I think G1 needs the same thing as well. In my opinion there are
two aspects. The default value of GCTimeRatio is the baseline, so we
might need to expand immediately to avoid frequent GCs when using the
concurrent flow. But G1ElasticHeap is an optimization to balance GC
health against memory utilization, so its policy should be more
conservative, and we also need to do the work concurrently so that it
does not bring any obvious pause overhead.
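
For context on the 12 vs. 99 numbers discussed above: GCTimeRatio
expresses the desired ratio of application time to GC time, so the
implied GC-time budget is 1/(1+GCTimeRatio) of total time. A tiny
standalone illustration (plain C++, not JVM code):

  #include <cstdio>

  // GCTimeRatio is the desired ratio of application time to GC time,
  // so the implied GC-time budget is 1 / (1 + GCTimeRatio).
  static double gc_time_budget(unsigned gc_time_ratio) {
    return 1.0 / (1.0 + gc_time_ratio);
  }

  int main() {
    // G1's default GCTimeRatio of 12 allows roughly 7.7% of time in GC;
    // a GCTimeRatio of 99 allows roughly 1%.
    printf("GCTimeRatio=12 -> %.1f%% GC time\n", 100.0 * gc_time_budget(12));
    printf("GCTimeRatio=99 -> %.1f%% GC time\n", 100.0 * gc_time_budget(99));
    return 0;
  }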

Thanks,
Liang



------------------------------------------------------------------
From: Thomas Schatzl <thomas.schatzl at oracle.com>
Send Time: 2020 Jan. 15 (Wed.) 16:37
To: "MAO, Liang" <maoliang.ml at alibaba-inc.com>; hotspot-gc-dev <hotspot-gc-dev at openjdk.java.net>
Subject: Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics

Hi,

On Wed, 2020-01-15 at 11:52 +0800, Liang Mao wrote:
> Hi Thomas,
> 
> I summarize the issues as follows:
> 
> 1. Criterion of SoftMaxHeapSize
> I agree to keep the policy of SoftMaxHeapSize similar to ZGC's to
> make it unified. So "expand_heap_after_young_collection" is used to
> meet the basic GCTimeRatio and expands the heap immediately, which
> cannot be blocked for any reason.
> "adjust_heap_after_young_collection" cannot change that logic, and
> I will take both expansion and shrinking into consideration. Is my
> understanding correct here?

Yes, ideally we would be close to ZGC in behavior with SoftMaxHeapSize.
If for some reason this does not work we may need to reconsider - but
we need a reason if possible backed by numbers/graphs of actual
behavior.

> 
> 2. Full GC with SoftMaxHeapSize
> In my view a non-explicit Full GC probably means the heap capacity is
> insufficient, so we may not want to keep shrinking to within
> SoftMaxHeapSize, but an explicit FGC doesn't have that issue. That's
> the only reason why I

People run explicit FGC for many reasons, and the one you describe is
just one of them.

E.g. explicit FGC can be converted to a concurrent cycle or disabled
for other reasons, so having special behavior for this particular case
may just not work as intended in many cases. Users may then need to
decide whether they want this behavior, or the System.gc()-starts-a-
concurrent-cycle one they might also rely on.

The lone "System.gc()" call is insufficient to transport the actual
intent of the user - but that is a different issue.

> checked if it is explicit. But we will have the same logic to
> determine whether the heap can be shrunk, so the "explicit" check
> could be meaningless and I will remove it.

Exactly. 

> 
> 3. SoftMaxHeapSizeConstraintFunc doesn't check Xms
> The constraint function doesn't ensure that SoftMaxHeapSize is not
> less than Xms. Do we need to add that check? It would not only
> affect G1...

I will check again later, but from what I remember from yesterday it
does check it at VM start (-Xms sets both minimum and initial heap
size). The constraint func does not check when the user changes the
value during runtime. So code using it must still maintain this
invariant in behavior.

> 4. commit/uncommit parallelism
> The concurrent uncommit will run while the VMThread is doing a GC,
> and the GC may request to expand the heap if there are not enough
> empty regions. So the parallelism is possible and immediate uncommit
> is a solution.

There may be others, but it actually seems easiest, as blocking such a
request seems harder to implement, or at least less localized in the
code. Completely *dropping* the request seems to go against the
"SoftMaxHeapSize is a hint" guideline and may have other unforeseen
consequences too. Like I said, since G1 does not expand then, there
will be more GCs with the small heap, increasing the current
GCTimeRatio more than it should. Which means that when the request
ultimately comes through, as G1 will certainly try again, the increase
may be huge. (The increase is proportional to the difference in actual
and requested GCTimeRatio, iirc.)

Again, if there are good reasons to do otherwise I am open to
discussion, but it would be nice to have numbers to base decisions on.

> 4. Further heap expansion/shrink heuristics
> We have some data and experience with dynamic heap adjustment in our
> workloads.
> The default GCTimeRatio of 12 is a really well-tuned number: we found
> that applications show obvious timeout errors if it is less than ~12.

It is actually *very* interesting to hear that the default G1
GCTimeRatio fits you well. Given the improvements in G1 GC performance
over time, I have privately been asking myself from time to time
whether to decrease the default GC-time percentage by increasing this
value (I hope I got the directions right ;)), and similarly adjust the
default MaxGCPauseMillis down to reflect that.

> So it is a kind of *hard* limit and we need to expand immediately if
> GCTimeRatio drops below 12. The difference in our workloads is that
> we will keep a GCTimeRatio near the original value of 99 to keep GC in

I.e. you set it to 99 at startup?

> a healthy state, because the allocation rate and outside input can
> vary so violently that we don't want frequent adjustments. You know
> that in our 8u implementation we just keep a conservative GC interval
> to achieve that. Compared to the current code in JDK 15, keeping
> GCTimeRatio at 99 is a different behavior which might have a larger
> memory footprint.

As mentioned above, given that we are both thinking about this, we
might actually evaluate changing the defaults.

> I propose that we still use the original option
> "-XX:+G1ElasticHeap" to keep the GCTimeRatio around 99 or a specified
> number. The default flow will make sure the GCTimeRatio stays above
> the threshold of 12, and concurrent commit/uncommit will adjust the
> heap to keep GCTimeRatio at a value such that the adjustment is not
> urgent.

I am not completely sure what you want to achieve here or what the
problem is. I probably need to understand more about the problem and
potentially other solutions can be found.

As for a new -XX:+G1ElasticHeap option, it does not seem to make a
difference to set this or -XX:GCTimeRatio in this case (both are single
options). But I do not completely know the details here.

Thanks,
  Thomas


