RFR: 8324995: Shenandoah: Skip to full gc for humongous allocation failures [v3]

Thu Feb 8 19:53:04 UTC 2024

On Wed, 31 Jan 2024 21:50:06 GMT, William Kemper <wkemper at openjdk.org> wrote:

>> Shenandoah degenerated cycles do not compact regions. When a humongous allocation fails, it is likely due to fragmentation which is better addressed by a full gc.
>
> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix typo in comment

Some key observations, which are highlighted in the attached Excel spreadsheet:

1. Concurrent GCs take roughly twice as long as degenerated GCs, and use half as many worker threads. (out-of-cycle degen: 3.105s, fully conc gcs: 6.44s and 6.341s)
2. The sum of degen time plus 1/2 conc gc time that precedes degen is nearly constant: 3.368s, with std dev: 0.291
3. Note that a longer concurrent GC effort is accompanied by a shorter degen effort because degen GC leverages progress already made by concurrent GC.
4. Full GC requires 50% more time than a degen GC (5.026s avg vs 3.368s avg).  This is presumably because full GC compacts everything, whereas degen GC uses garbage-first heuristics to only evacuate regions that are "convenient".
5. Full GC typically reclaims 88% more garbage than conc gc and degen.  This is presumably because full GC reclaims all the floating garbage for allocations that occur following the start of a concurrent GC effort.
6. When Full GC is preceded by a failed concurrent effort and then a failed degen effort, the total duration of the GC cycle is much longer than a typical degen cycle.  The time for a conc/degen cycle ranges from 3.105s (for out-of-cycle degen with no concurrent phase) to 6.44s (for a concurrent phase without any degen).  In contrast, the duration of a full GC cycle ranges from 7.431s (when 2.393s of concurrent gc effort transitions directly to full GC) to 10.329s (when 2.483s of conc gc is followed by 2.636s of degen gc, which experiences bad progress and upgrades to full gc).
7. Under heavy allocation load, full GC tends to self-perpetuate.  This is caused by multiple factors:
    1. The very long GC duration means we're reclaiming memory less efficiently, even though each full GC yields greater available memory
    2. Full GCs, on average, are yielding 1,043 MB/s of wall-clock time spent in GC compared to 931 MB/s for concurrent and degen GC.  However, degen offers a higher peak yield of 1,770 MB/s compared to the peak yield of 1,181 MB/s for full GC.
    3. In terms of CPU time dedicated to GC, Full GC averages 1,239 MB/s vs. 1,529 MB/s for degen.  Peak performance also favors degen, with max of 2,029 MB/s vs. full GC max of 1,460 MB/s.
    4. A final strike against Full GC is that it leaves the heap in a state that is very difficult to recover from.  Specifically, following Full GC, there is no garbage in the heap.  If we immediately trigger concurrent GC, it will be unproductive because there is no garbage to be found in the heap, and any floating garbage created following the start of concurrent GC will not be found until the next concurrent GC cycle.  This is observed in the log, with GC(85) through GC(91) each upgrading to Full GC.

[full-vs-degen.xlsx](https://github.com/openjdk/jdk/files/14213994/full-vs-degen.xlsx)
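
For anyone who wants to re-check the ratios quoted in observations 1, 2 and 4 without opening the spreadsheet, here is a minimal sketch that uses only the figures stated above. The constant and function names are made up for illustration, and the 0.5 weight is simply the "1/2 conc gc time" from observation 2; the per-cycle data behind the averages lives in the spreadsheet and is not reproduced here.

```python
# Durations in seconds, copied from the observations above.
OUT_OF_CYCLE_DEGEN = 3.105        # degen with no preceding concurrent phase
FULLY_CONCURRENT = (6.44, 6.341)  # concurrent cycles with no degen
DEGEN_AVG = 3.368                 # average from observation 2
FULL_AVG = 5.026                  # average full GC time from observation 4


def normalized_effort(conc_seconds, degen_seconds):
    """Degen time plus half of the preceding concurrent time (observation 2)."""
    return degen_seconds + 0.5 * conc_seconds


# Observation 1: concurrent cycles take roughly twice as long as degen.
conc_to_degen = [c / OUT_OF_CYCLE_DEGEN for c in FULLY_CONCURRENT]

# Observation 4: full GC takes roughly 50% more time than degen.
full_to_degen = FULL_AVG / DEGEN_AVG

if __name__ == "__main__":
    print("conc/degen ratios:", [round(r, 3) for r in conc_to_degen])
    print("full/degen ratio:", round(full_to_degen, 3))
    print("normalized effort, out-of-cycle degen:",
          normalized_effort(0.0, OUT_OF_CYCLE_DEGEN))
    for c in FULLY_CONCURRENT:
        print("normalized effort, fully concurrent:",
              round(normalized_effort(c, 0.0), 4))
```

The normalized-effort values for these three boundary cases can be compared against the 3.368s average and 0.291 standard deviation reported in observation 2.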

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17638#issuecomment-1934831931