RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v8]

Fri Apr 4 07:26:54 UTC 2025

On Thu, 3 Apr 2025 07:08:19 GMT, Man Cao <manc at openjdk.org> wrote:

>> Hi all,
>> 
>> I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to the [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from the [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as:
>> 
>> - does not respect `MinHeapSize`;
>> - is too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`;
>> - does not affect heuristics to trigger a concurrent cycle;
>> 
>> [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context.
>
> Man Cao has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Use Atomic::load for flag

Thank you both for the quick and detailed responses!

> * JDK-8248324 effectively removes the use of `Min/MaxHeapFreeRatio` (apart from full GC, where they obviously also need to be handled in some way that fits into the system).
> * JDK-8238687 makes `GCTimeRatio` shrink the heap too, obviating the need for `Min/MaxHeapFreeRatio`, which are currently the knobs that limit excessive memory usage.
> 
> With no flags interfering with each other (no `Min/MaxHeapFreeRatio`), there is no need to consider their precedence.
> 
> As you mention, there is a need for some strategy to reconcile divergent goals - ultimately G1 needs a single value that tells it in which direction and to what degree to resize the heap.
> 
> Incidentally, the way `GCTimeRatio` (or actually the internal gc cpu usage target as an intermediate) is already in use fits these requirements. From some actual value you can calculate a difference to desired, with some smoothing applied, which gives you both direction and degree of the change in heap size (applying some magic factors/constants).
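
To make the quoted idea concrete, here is a minimal sketch in Python of turning a smoothed GC CPU measurement into a signed resize amount. This is not G1's actual code: the function names, the exponential smoothing, and the `alpha` and `gain` constants are all assumptions for illustration only.

```python
def smooth(prev, sample, alpha=0.3):
    """Exponentially smooth a GC CPU usage sample (alpha is a made-up constant)."""
    return alpha * sample + (1.0 - alpha) * prev


def resize_delta(heap_bytes, smoothed_gc_cpu, target_gc_cpu, gain=0.5):
    """Return a signed byte count: positive means expand, negative means shrink.

    The relative error against the target gives the direction (its sign)
    and the degree (its magnitude); gain is a hypothetical "magic factor".
    """
    error = (smoothed_gc_cpu - target_gc_cpu) / target_gc_cpu
    return int(heap_bytes * gain * error)


if __name__ == "__main__":
    usage = 0.05
    for sample in (0.06, 0.09, 0.12):  # GC CPU usage creeping above a 5% target
        usage = smooth(usage, sample)
        print(round(usage, 4), resize_delta(1 << 30, usage, 0.05))
```

A GC CPU usage above the target yields a positive delta (expand to reduce GC work), and one below the target yields a negative delta (shrink), so a single value carries both direction and degree.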

I was unaware until now that G1 plans to stop using `Min/MaxHeapFreeRatio`. It looks like [JDK-8238686](https://bugs.openjdk.org/browse/JDK-8238686) has a more relevant description. It sounds good to solve all of the above-mentioned issues and converge on a single flag such as `GCTimeRatio`, and to ensure that both incremental and full GCs respect this flag. (We should also fix [JDK-8349978](https://bugs.openjdk.org/browse/JDK-8349978) for converging on `GCTimeRatio`.) It would be nicer if we had a doc or a master bug that describes the overall plan.

In comparison, this PR's approach of a high-precedence, "harder" `SoftMaxHeapSize` is an easier and more expedient way to improve heap resizing, without solving all the other issues. However, it requires users to carefully maintain and dynamically adjust `SoftMaxHeapSize` to prevent GC thrashing. I think that if all the other issues are resolved, our existing internal use cases, which use a separate algorithm to dynamically calculate and set the high-precedence `SoftMaxHeapSize` (or `ProposedHeapSize`), could probably migrate to the `GCTimeRatio` approach and stop using `SoftMaxHeapSize`.

I'll need to discuss with my team what we would do next. Meanwhile, @mo-beck, do you have a preference on how `SoftMaxHeapSize` should work?

> 
> Now there is some question about the weights of these factors: we (in the gc team) prefer to keep G1's balancing between throughput and latency, particularly if the input this time is some value explicitly containing "soft" in its name. Using the 25% from ZGC as a max limit for gc cpu usage if we are (way) beyond what the user desires seems good enough for an initial guess. Not too high, guaranteeing some application progress in the worst case (for this factor!), not too low, guaranteeing that the intent of the user setting this value is respected.

Somewhat related to the above: our internal algorithm that adjusts `SoftMaxHeapSize` based on GC CPU overhead has encountered cases where it behaves poorly. The problem is that some workloads have a large variance in the mutator's CPU usage (e.g. peak hours vs. off-peak hours), but a smaller variance in GC CPU usage. In that case it does not make much sense to maintain a constant percentage for GC CPU overhead, since doing so could cause excessive heap expansion when mutator CPU usage is low. The workaround is to take live size into consideration when calculating `SoftMaxHeapSize`, which is similar to how `Min/MaxHeapFreeRatio` works.
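
As a rough illustration of the problem (not our internal algorithm; the numbers, function names, and the cap formula are all made up for this sketch), consider the GC overhead computed as GC CPU divided by total CPU, plus a live-size cap in the spirit of `MaxHeapFreeRatio`:

```python
def gc_overhead(gc_cpu, mutator_cpu):
    """Fraction of total CPU time spent in GC."""
    return gc_cpu / (gc_cpu + mutator_cpu)


def capped_soft_max(candidate_bytes, live_bytes, max_free_pct=70):
    """Cap a candidate soft max so that at most max_free_pct percent of the
    heap would be free given the current live size (hypothetical helper)."""
    cap = live_bytes * 100 // (100 - max_free_pct)
    return min(candidate_bytes, cap)


if __name__ == "__main__":
    # Same GC CPU in both periods, but mutator CPU drops off-peak.
    print(gc_overhead(2.0, 38.0))  # peak hours: GC is a small share of total CPU
    print(gc_overhead(2.0, 8.0))   # off-peak: same GC work, much larger share
    # An overhead-only controller would ask for a bigger heap off-peak;
    # the live-size cap bounds that request.
    print(capped_soft_max(4000, 300))
```

With GC CPU held constant, the overhead rises from 5% to 20% purely because the mutator is idle, so a controller targeting a constant percentage would expand the heap even though the amount of GC work is unchanged. The live-size cap limits how far that expansion can go.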

I'm not sure whether `GCTimeRatio`, which uses wall time and pause time, could run into similar issues. I'm happy to experiment once we make progress on JDK-8238687/JDK-8248324/JDK-8349978.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2777769994