RFR: 8324995: Shenandoah: Skip to full gc for humongous allocation failures [v3]

Kelvin Nilsen kdnilsen at openjdk.org
Tue Feb 6 01:55:56 UTC 2024


On Wed, 31 Jan 2024 21:50:06 GMT, William Kemper <wkemper at openjdk.org> wrote:

>> Shenandoah degenerated cycles do not compact regions. When a humongous allocation fails, it is likely due to fragmentation which is better addressed by a full gc.
>
> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix typo in comment

Analysis of the spreadsheet results:

1. I repeated the experiment 5 times because the results were so "noisy"
2. It turns out the results have almost nothing to do with how many degens we do before upgrading to full following a humongous allocation failure.  The reason is that humongous allocation failures are very rare (on these workloads).
3. The biggest problem is "upgrading degen to full gc".  Whenever we see more than a few upgrades to full gc, the P50 latency explodes on us.
   1. Often, we see the same exact workload configuration run without any upgrades to full gc and perform much better.  Consider, for example, the 25g heap on Mainline.  Three runs ran without upgrades to full gc.  P50 latency was 1_147, 1_145, and 1_144 microseconds respectively.  Two runs experienced upgrades to full gc.  Those had p50 latencies of 223_236_560 and 86_091_599 microseconds respectively.
   2. Note that there's not much middle ground here.  If the workload avoids upgrading to full, the results are good.  If it experiences a single upgrade to full, it is likely to spiral into a lot more upgrades to full.
   3. My assessment: upgrading to full is generally counterproductive.  We take a "long pause" to do degenerated work, then we throw that work away and start all over with an even longer pause to do full gc.  Meanwhile, client requests continue to accumulate.
   
   
Based on these measurements, I'm inclined to recommend the following:

1. If a humongous allocation request fails, we should ask ourselves "How much humongous memory was available at the most recent freeset rebuild?"  If that amount of memory is greater than or equal to the size of the requested humongous allocation, we should degenerate (always).  If it is smaller, we should do Full GC.  (See the sketch below.)

2. We should never upgrade to full from degen.  The heuristic about "non-productive" degen is confusing us.  What typically happens is we get into an unproductive cycle consisting of the following:

   1. Concurrent GC fails to allocate
   2. Degenerated GC takes multiple seconds to complete
   3. We resume execution with multiple seconds of pent-up demand for allocation
   4. Concurrent GC triggers immediately, but may degenerate again if we are "near the edge of the cliff" due to the burst of "catch-up" work that we are trying to perform
   5. All the memory just allocated during concurrent GC is "floating garbage".  None of it can be reclaimed by degenerated GC.
   6. Degen is unproductive.
   7. We upgrade to Full GC.
   8. Full GC reclaims all the floating garbage.
   9. When we resume mutator work, the pent-up demand is even larger than in the previous scenario because we now have pent-up demand from both the STW degen phase and the STW full GC phase.
   10. So we repeat this cycle over and over and over again.
   
I'm going to try an experiment where the only time we do a full gc is when a humongous allocation fails and the humongous memory in the most recently built free set is too small to satisfy the allocation request.
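Here is a minimal sketch of the decision I have in mind for recommendation 1.  This is hypothetical illustration only, not actual Shenandoah code; the names are made up stand-ins for the quantities described above.  On a humongous allocation failure we compare the request size against the humongous capacity that was available at the most recent free-set rebuild, and fall back to full gc only when that capacity was already insufficient.

    // Hypothetical sketch -- these names do not exist in the Shenandoah
    // sources; they stand in for the quantities described above.
    #include <cstddef>
    #include <cstdio>

    enum class GCMode { DegeneratedGC, FullGC };

    // request_bytes: size of the humongous allocation that just failed.
    // humongous_bytes_at_last_rebuild: humongous capacity observed when the
    // free set was most recently rebuilt.
    static GCMode choose_gc_for_humongous_failure(size_t request_bytes,
                                                  size_t humongous_bytes_at_last_rebuild) {
      if (humongous_bytes_at_last_rebuild >= request_bytes) {
        // The last rebuild had enough humongous capacity, so the failure is
        // likely transient: a degenerated cycle should be able to satisfy it.
        return GCMode::DegeneratedGC;
      }
      // Capacity was already too small at the last rebuild: the heap is
      // fragmented and only a compacting full gc is likely to help.
      return GCMode::FullGC;
    }

    int main() {
      const size_t request = 64u * 1024 * 1024;           // 64 MB request
      const size_t at_last_rebuild = 128u * 1024 * 1024;  // 128 MB available
      GCMode mode = choose_gc_for_humongous_failure(request, at_last_rebuild);
      printf("chosen mode: %s\n",
             mode == GCMode::DegeneratedGC ? "degenerated" : "full");
      return 0;
    }

The key point is that there is no path from degenerated to full gc here: full gc is chosen up front, and only when the free-set evidence says degeneration cannot succeed.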

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17638#issuecomment-1928639977

