RFR: 8357471: GenShen: Share collector reserves between young and old

Wed May 21 16:08:05 UTC 2025

GenShen independently reserves memory to hold evacuations into the young and old generations. We have found that under duress, mixed evacuations sometimes struggle to make progress. The reserves in old are too small, and we cannot expand old because young runs so frequently that it never has the excess memory required to justify expanding old (and shrinking young).

This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In that case, we can share the young collector reserves with the old generation. This allows mixed evacuations to operate much more effectively when GC is running at or near its full capacity.

The following spreadsheet snapshots highlight the benefits of this change. In the control run with a 6G heap, we perform large numbers of mixed evacuations, but each one has very low productivity (e.g., one region at a time). This excessively delays reclaiming garbage from old, which is required to shrink old and expand young. That is why we see a large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with a 6G heap, there are far fewer mixed cycles, but each is much more productive, and the total number of GC cycles decreases significantly.

![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a)

With a 7G heap, the benefits of this PR manifest as fewer mixed evacuations, which also lets us reduce the total number of GC cycles. By reclaiming old garbage more quickly, we can expand young sooner, which decreases the number of young GC cycles and reduces CPU load. The impact on response times is less significant than with the 6G heap: we see slight improvement at p50-p99.9 and slight degradation at p99.99 through p100.

![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f)

With an 8G heap, the GC is not stressed at all. We see approximately the same number of GC cycles, slight degradation of response times at p50-p99, and slight improvement at p99.9-p100.

![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c)

The command line for these comparisons follows:

            ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \
                -XX:+UnlockExperimentalVMOptions \
                -XX:-ShenandoahPacing \
                -XX:+AlwaysPreTouch -XX:+DisableExplicitGC -Xms$s -Xmx$s \
                -XX:ShenandoahMinimumOldTimeMs=25 \
                -XX:ShenandoahFullGCThreshold=1024 \
                -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational \
                -Xlog:"gc*=info,ergo" \
                -Xlog:safepoint=trace -Xlog:safepoint=debug -Xlog:safepoint=info \
                -XX:+UnlockDiagnosticVMOptions \
                -jar ~/github/heapothesys/Extremem/src/main/java/extremem.jar \
                -dInitializationDelay=45s \
                -dDictionarySize=3000000 \
                -dNumCustomers=300000 \
                -dNumProducts=60000 \
                -dCustomerThreads=500 \
                -dCustomerPeriod=7s \
                -dCustomerThinkTime=1s \
                -dKeywordSearchCount=4 \
                -dServerThreads=5 \
                -dServerPeriod=5s \
                -dProductNameLength=10 \
                -dBrowsingHistoryQueueCount=5 \
                -dSalesTransactionQueueCount=5 \
                -dProductDescriptionLength=32 \
                -dProductReplacementPeriod=25s \
                -dProductReplacementCount=10 \
                -dCustomerReplacementPeriod=30s \
                -dCustomerReplacementCount=1000 \
                -dBrowsingExpiration=1m \
                -dPhasedUpdates=true \
                -dPhasedUpdateInterval=90s \
                -dSimulationDuration=25m \
                -dResponseTimeMeasurements=100000 \
                >$t.genshen.share-reserves.$r-evac-ratio.$s.out 2>$t.genshen.share-reserves.$r-evac-ratio.$s.err
            gzip $t.genshen.share-reserves.$r-evac-ratio.$s.out $t.genshen.share-reserves.$r-evac-ratio.$s.err

We have tested this patch through our performance pipeline. Both aarch64 and x86 show similar results: a small increase in concurrent evacuation time on the graphchi benchmark, and slight improvements in other metrics on a number of other workloads:

Genshen aarch64
-------------------------------------------------------------------------------------------------------
+16.35% graphchi/concurrent_evacuation p=0.00000
  Control:      1.895ms (+/-392.33us)        306
  Test:         2.205ms (+/-401.72us)        124

-33.43% specjbb2015/concurrent_marking_old p=0.00213
  Control:    513.923ms (+/-225.22ms)        338
  Test:       385.169ms (+/-231.25ms)         38

-28.58% specjbb2015/cm_parallel_mark_old p=0.00833
  Control:      1.022s  (+/-446.83ms)        333
  Test:       794.476ms (+/-440.83ms)         35

-25.31% crypto.aes/shenandoahfinalupdaterefs_stopped p=0.00000
  Control:      0.113ms (+/-  0.01ms)        285
  Test:         0.090ms (+/-  0.02ms)        158

-18.52% scimark.fft.small/shenandoahfinalupdaterefs_stopped p=0.00000
  Control:      0.106ms (+/-  0.01ms)        474
  Test:         0.090ms (+/-  0.01ms)        180

-15.29% hyperalloc_a3072_o2048/concurrent_marking_old p=0.00103
  Control:    384.599ms (+/- 76.47ms)        277
  Test:       333.581ms (+/- 89.51ms)         55

-15.28% hyperalloc_a3072_o2048/cm_total_old p=0.00105
  Control:    768.676ms (+/-152.94ms)        277
  Test:       666.786ms (+/-178.97ms)         55

-15.28% hyperalloc_a3072_o2048/cm_parallel_mark_old p=0.00105
  Control:    768.676ms (+/-152.94ms)        277
  Test:       666.786ms (+/-178.97ms)         55

Shenandoah aarch64
-------------------------------------------------------------------------------------------------------
-12.07% extremem-phased/update_references p=0.00050
  Control:    479.826ms (+/- 52.78ms)         23
  Test:       428.148ms (+/-  2.26ms)          3

Genshen x86
-------------------------------------------------------------------------------------------------------
+16.35% graphchi/concurrent_evacuation p=0.00000
  Control:      1.895ms (+/-392.33us)        306
  Test:         2.205ms (+/-401.72us)        124

-33.43% specjbb2015/concurrent_marking_old p=0.00213
  Control:    513.923ms (+/-225.22ms)        338
  Test:       385.169ms (+/-231.25ms)         38

-28.58% specjbb2015/cm_parallel_mark_old p=0.00833
  Control:      1.022s  (+/-446.83ms)        333
  Test:       794.476ms (+/-440.83ms)         35

-25.31% crypto.aes/shenandoahfinalupdaterefs_stopped p=0.00000
  Control:      0.113ms (+/-  0.01ms)        285
  Test:         0.090ms (+/-  0.02ms)        158

-18.52% scimark.fft.small/shenandoahfinalupdaterefs_stopped p=0.00000
  Control:      0.106ms (+/-  0.01ms)        474
  Test:         0.090ms (+/-  0.01ms)        180

-15.29% hyperalloc_a3072_o2048/concurrent_marking_old p=0.00103
  Control:    384.599ms (+/- 76.47ms)        277
  Test:       333.581ms (+/- 89.51ms)         55

-15.28% hyperalloc_a3072_o2048/cm_total_old p=0.00105
  Control:    768.676ms (+/-152.94ms)        277
  Test:       666.786ms (+/-178.97ms)         55

-15.28% hyperalloc_a3072_o2048/cm_parallel_mark_old p=0.00105
  Control:    768.676ms (+/-152.94ms)        277
  Test:       666.786ms (+/-178.97ms)         55

Shenandoah x86
-------------------------------------------------------------------------------------------------------
-12.07% extremem-phased/update_references p=0.00050
  Control:    479.826ms (+/- 52.78ms)         23
  Test:       428.148ms (+/-  2.26ms)          3

-------------

Commit messages:
 - Fix whitespace
 - Merge remote-tracking branch 'jdk/master' into share-collector-reserves
 - Merge branch 'share-collector-reserves' of https://github.com/kdnilsen/jdk into share-collector-reserves
 - make old gc more aggresive
 - Change fullgc phase5 return type
 - compute_old_generation_balance needs available computations under lock
 - Merge branch 'share-collector-reserves' of https://github.com/kdnilsen/jdk into share-collector-reserves
 - Fixup bugs introduced by most recent commit
 - Improve empty region accounting in FreeSet
 - Revert "Acquire heaplock before adjusting interval for old"
 - ... and 24 more: https://git.openjdk.org/jdk/compare/6162e2c5...3d55a646

Changes: https://git.openjdk.org/jdk/pull/25357/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8357471
  Stats: 1354 lines in 28 files changed: 781 ins; 333 del; 240 mod
  Patch: https://git.openjdk.org/jdk/pull/25357.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357

PR: https://git.openjdk.org/jdk/pull/25357