RFR: 8339611: GenShen: Simplify ShenandoahOldHeuristics::trigger_collection_if_fragmented [v2]
Kelvin Nilsen
kdnilsen at openjdk.org
Mon Sep 9 20:11:25 UTC 2024
On Fri, 6 Sep 2024 21:29:11 GMT, William Kemper <wkemper at openjdk.org> wrote:
>> But if we are at 110% old_span_percent, we have violated our intended Humongous Reserves, so I'm thinking we should try to squeeze that last 10% out of old so we don't have to STW on the next humongous allocation request...
>
> I have a few broader questions:
> * why is `allowed_old_gen_span` defined in terms of humongous reserve?
> * why do we want to square `old_span_percent` and divide it into `old_density`? The comments suggest we ought to be comparing `old_density` directly?
The idea is that services that anticipate a need for "lots of humongous" allocation can specify a non-zero value for ShenandoahGenerationalHumongousReserve. If this value is non-zero, GenShen endeavors to avoid placing old-gen regions into the first N% of the heap. The low end of the heap is reserved for humongous regions, leaving the rest of the heap eligible for Old regions.
Each time we rebuild the freeset, the top-most available regions are reserved for the old collector. Below these, we reserve regions for the young collector (young "survivor regions"). The remaining heap is available for mutator allocations, with humongous allocations prioritized toward the bottom.
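As a rough sketch of the partitioning described above (the struct, function, and region arithmetic here are hypothetical illustrations, not GenShen's actual freeset API):

```cpp
#include <cstddef>

// Hypothetical sketch of how a freeset rebuild might partition a
// region-indexed heap: an old-collector reserve at the top, a
// young-collector (survivor) reserve just below it, and the remainder
// left to the mutator, with humongous allocations steered to the bottom.
struct FreeSetPartition {
  size_t mutator_end;          // regions [0, mutator_end) available to mutators
  size_t young_reserve_start;  // regions [young_reserve_start, old_reserve_start)
  size_t old_reserve_start;    // regions [old_reserve_start, num_regions)
};

FreeSetPartition partition_regions(size_t num_regions,
                                   size_t old_reserve_regions,
                                   size_t young_reserve_regions) {
  FreeSetPartition p;
  p.old_reserve_start = num_regions - old_reserve_regions;
  p.young_reserve_start = p.old_reserve_start - young_reserve_regions;
  p.mutator_end = p.young_reserve_start;
  return p;
}
```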
There's an "implicit assumption" in this code that the top-most old-gen region is aligned with the top of the heap. I'll change this by defining old_region_span as (num_regions() - first_old_region). This makes the correlation clearer. (I'll also add some comments.)
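For illustration, the proposed definition might look roughly like this (the standalone function signatures are approximations for the sake of the sketch, not the heap's verified accessors):

```cpp
#include <cstddef>

// Sketch of the proposed change: measure old's span from the first old
// region up to the top of the heap, rather than assuming the top-most
// old region sits exactly at the heap's end. Names are illustrative only.
size_t old_region_span(size_t num_regions, size_t first_old_region) {
  // If there are no old regions, first_old_region would be >= num_regions
  // and the span is zero.
  return (first_old_region < num_regions) ? (num_regions - first_old_region) : 0;
}

double old_span_percent(size_t num_regions, size_t first_old_region,
                        size_t allowed_old_gen_span) {
  return static_cast<double>(old_region_span(num_regions, first_old_region)) /
         static_cast<double>(allowed_old_gen_span);
}
```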
We square old_span_percent because we want non-linear triggering of old defragmentation. We don't really care about old fragmentation while old's span is small, and doing too much defragmentation of OLD has been shown to degrade performance when the defragmentation was not really "necessary". With a linear (not-squared) old_span_percent, we would trigger almost twice as frequently at lower span values and less aggressively once old_span_percent exceeds its allowed span:
// Trigger if old_span_percent is:   and density is below:
//              110%                        82.5%
//              100%                        75%     (same at this trigger)
//               90%                        67.5%
//               80%                        60%
//               70%                        56.25%
//               60%                        45%
//               50%                        37.5%
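To make the non-linearity concrete, here is a small sketch contrasting a linear threshold (0.75 * span) with a squared one (0.75 * span^2). The 0.75 factor and the exact form of the real trigger are my reading of the discussion above, not the verified heuristic code:

```cpp
// Sketch contrasting a linear trigger threshold with a squared one.
// With the squared form, the density below which we trigger falls off
// quickly at small spans (little urgency to defragment) and rises
// steeply as the span approaches its allowed maximum. The 0.75 factor
// is an assumption for illustration, not taken from the actual code.
double linear_threshold(double old_span_percent) {
  return 0.75 * old_span_percent;
}

double squared_threshold(double old_span_percent) {
  return 0.75 * old_span_percent * old_span_percent;
}
```

At old_span_percent of 0.5, the squared threshold (0.1875) is half the linear one (0.375), which is why the linear form would trigger "almost twice as frequently at lower span values"; above 1.0 the squared form becomes the more aggressive of the two.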
As old's span approaches the "target maximum span", it becomes increasingly important to defragment old. On the other hand, if we totally ignore fragmentation until we have exceeded the limit, there is a high probability of experiencing a STW Full GC due to humongous allocation failure. This is because:
1. Old defragmentation generally requires many GC cycles, first to do OLD marking, and then to do the mixed evacuations.
2. Defragmenting mixed evacuations typically run slower than "traditional" mixed evacuations. This is because a typical mixed evacuation only evacuates regions that have "abundant garbage". But a defragmenting mixed evacuation may have to evacuate regions that are highly utilized.
One other reason not to totally ignore defragmentation until it has become "urgent" is that early investment in defragmenting can prevent the more severe forms of fragmentation from manifesting. When large numbers of regions with low utilization have been promoted in place (which is one of the key causes of old fragmentation), it is easier to consolidate the live memory from multiple sparsely populated old-gen regions while they are still sparsely populated. If we delay the defragmentation effort, many of those sparsely populated regions are likely to become highly utilized before we defragment, and then we'll have to copy much more data around in order to move regions out of the humongous region zone.
There is room for more extensive performance measurements and improved heuristics, I'm sure.
-------------
PR Review Comment: https://git.openjdk.org/shenandoah/pull/492#discussion_r1750868532