RFR: 8357471: GenShen: Share collector reserves between young and old
Kelvin Nilsen
kdnilsen at openjdk.org
Wed May 21 16:08:05 UTC 2025
GenShen independently reserves memory to hold evacuations into the young and old generations. We have found that, under duress, mixed evacuations sometimes struggle to make progress because the reserves in old are too small. We cannot expand old in this situation because young collections run so frequently that young does not have the excess memory required to justify expanding old (and shrinking young).
This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity.
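The sharing idea can be illustrated with a small sketch. This is not the HotSpot implementation (which is C++); the function names, parameters, and the example sizes are hypothetical, and the sketch assumes the shareable amount is simply whatever young's reserve holds beyond the anticipated need of its next cycle.

```python
MIB = 1024 * 1024

def shareable_young_surplus(young_reserve: int, anticipated_young_need: int) -> int:
    """Bytes of young collector reserve beyond what the next young cycle is
    expected to need (hypothetical helper; never negative)."""
    return max(0, young_reserve - anticipated_young_need)

def effective_old_evacuation_budget(old_reserve: int,
                                    young_reserve: int,
                                    anticipated_young_need: int) -> int:
    """Old's own reserve plus the surplus borrowed from young (assumed policy)."""
    return old_reserve + shareable_young_surplus(young_reserve, anticipated_young_need)

# Illustrative numbers only: old has 32 MiB reserved, young has 512 MiB
# reserved but expects to evacuate only 128 MiB in its next cycle.
budget = effective_old_evacuation_budget(32 * MIB, 512 * MIB, 128 * MIB)
print(budget // MIB, "MiB available for mixed evacuation into old")
```

With the surplus included, a mixed evacuation in this example could plan against a much larger budget than old's own reserve, without first resizing the generations.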
The following spreadsheet snapshots highlight the benefits of this change. In the control run with a 6G heap, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This excessively delays reclaiming garbage from old, which is required before we can shrink old and expand young. As a result, we see a large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment run with a 6G heap, there are far fewer mixed cycles, but each is much more productive, and the total number of GC cycles decreases significantly.

With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100.

At 8G heap size, the GC is not at all stressed. We see approximately the same number of GC cycles, a slight degradation of response times at p50-p99, and a slight improvement in response times at p99.9-p100.

The command line for these comparisons follows:
~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \
-XX:+UnlockExperimentalVMOptions \
-XX:-ShenandoahPacing \
-XX:+AlwaysPreTouch -XX:+DisableExplicitGC -Xms$s -Xmx$s \
-XX:ShenandoahMinimumOldTimeMs=25 \
-XX:ShenandoahFullGCThreshold=1024 \
-XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational \
-Xlog:"gc*=info,ergo" \
-Xlog:safepoint=trace -Xlog:safepoint=debug -Xlog:safepoint=info \
-XX:+UnlockDiagnosticVMOptions \
-jar ~/github/heapothesys/Extremem/src/main/java/extremem.jar \
-dInitializationDelay=45s \
-dDictionarySize=3000000 \
-dNumCustomers=300000 \
-dNumProducts=60000 \
-dCustomerThreads=500 \
-dCustomerPeriod=7s \
-dCustomerThinkTime=1s \
-dKeywordSearchCount=4 \
-dServerThreads=5 \
-dServerPeriod=5s \
-dProductNameLength=10 \
-dBrowsingHistoryQueueCount=5 \
-dSalesTransactionQueueCount=5 \
-dProductDescriptionLength=32 \
-dProductReplacementPeriod=25s \
-dProductReplacementCount=10 \
-dCustomerReplacementPeriod=30s \
-dCustomerReplacementCount=1000 \
-dBrowsingExpiration=1m \
-dPhasedUpdates=true \
-dPhasedUpdateInterval=90s \
-dSimulationDuration=25m \
-dResponseTimeMeasurements=100000 \
>$t.genshen.share-reserves.$r-evac-ratio.$s.out 2>$t.genshen.share-reserves.$r-evac-ratio.$s.err
gzip $t.genshen.share-reserves.$r-evac-ratio.$s.out $t.genshen.share-reserves.$r-evac-ratio.$s.err
We have tested this patch through our performance pipeline. Both aarch64 and x86 show similar results: a small increase in concurrent evacuation time on the graphchi benchmark, with slight improvements in other metrics on a number of other test workloads:
Genshen aarch64
-------------------------------------------------------------------------------------------------------
+16.35% graphchi/concurrent_evacuation p=0.00000
  Control:   1.895ms (+/-392.33us)  306
  Test:      2.205ms (+/-401.72us)  124
-33.43% specjbb2015/concurrent_marking_old p=0.00213
  Control: 513.923ms (+/-225.22ms)  338
  Test:    385.169ms (+/-231.25ms)   38
-28.58% specjbb2015/cm_parallel_mark_old p=0.00833
  Control:    1.022s (+/-446.83ms)  333
  Test:    794.476ms (+/-440.83ms)   35
-25.31% crypto.aes/shenandoahfinalupdaterefs_stopped p=0.00000
  Control:   0.113ms (+/- 0.01ms)   285
  Test:      0.090ms (+/- 0.02ms)   158
-18.52% scimark.fft.small/shenandoahfinalupdaterefs_stopped p=0.00000
  Control:   0.106ms (+/- 0.01ms)   474
  Test:      0.090ms (+/- 0.01ms)   180
-15.29% hyperalloc_a3072_o2048/concurrent_marking_old p=0.00103
  Control: 384.599ms (+/- 76.47ms)  277
  Test:    333.581ms (+/- 89.51ms)   55
-15.28% hyperalloc_a3072_o2048/cm_total_old p=0.00105
  Control: 768.676ms (+/-152.94ms)  277
  Test:    666.786ms (+/-178.97ms)   55
-15.28% hyperalloc_a3072_o2048/cm_parallel_mark_old p=0.00105
  Control: 768.676ms (+/-152.94ms)  277
  Test:    666.786ms (+/-178.97ms)   55

Shenandoah aarch64
-------------------------------------------------------------------------------------------------------
-12.07% extremem-phased/update_references p=0.00050
  Control: 479.826ms (+/- 52.78ms)   23
  Test:    428.148ms (+/- 2.26ms)     3

Genshen x86
-------------------------------------------------------------------------------------------------------
+16.35% graphchi/concurrent_evacuation p=0.00000
  Control:   1.895ms (+/-392.33us)  306
  Test:      2.205ms (+/-401.72us)  124
-33.43% specjbb2015/concurrent_marking_old p=0.00213
  Control: 513.923ms (+/-225.22ms)  338
  Test:    385.169ms (+/-231.25ms)   38
-28.58% specjbb2015/cm_parallel_mark_old p=0.00833
  Control:    1.022s (+/-446.83ms)  333
  Test:    794.476ms (+/-440.83ms)   35
-25.31% crypto.aes/shenandoahfinalupdaterefs_stopped p=0.00000
  Control:   0.113ms (+/- 0.01ms)   285
  Test:      0.090ms (+/- 0.02ms)   158
-18.52% scimark.fft.small/shenandoahfinalupdaterefs_stopped p=0.00000
  Control:   0.106ms (+/- 0.01ms)   474
  Test:      0.090ms (+/- 0.01ms)   180
-15.29% hyperalloc_a3072_o2048/concurrent_marking_old p=0.00103
  Control: 384.599ms (+/- 76.47ms)  277
  Test:    333.581ms (+/- 89.51ms)   55
-15.28% hyperalloc_a3072_o2048/cm_total_old p=0.00105
  Control: 768.676ms (+/-152.94ms)  277
  Test:    666.786ms (+/-178.97ms)   55
-15.28% hyperalloc_a3072_o2048/cm_parallel_mark_old p=0.00105
  Control: 768.676ms (+/-152.94ms)  277
  Test:    666.786ms (+/-178.97ms)   55

Shenandoah x86
-------------------------------------------------------------------------------------------------------
-12.07% extremem-phased/update_references p=0.00050
  Control: 479.826ms (+/- 52.78ms)   23
  Test:    428.148ms (+/- 2.26ms)     3
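
The listings do not state how the leading percentages are derived. As a reading aid, the sketch below assumes the change is the difference between the Test and Control values divided by the smaller of the two. This is an inference from the printed numbers, not a documented property of the pipeline, and the function name is hypothetical. Because the printed values are rounded, recomputed percentages can differ from the reported ones in the last digit or two.

```python
def relative_change_pct(control: float, test: float) -> float:
    """Percent change of test relative to control, normalized by the smaller
    of the two values (an assumed convention, inferred from the listings)."""
    return (test - control) / min(control, test) * 100.0

# (benchmark/metric, Control, Test), values in ms copied from the listings above
rows = [
    ("graphchi/concurrent_evacuation", 1.895, 2.205),
    ("specjbb2015/concurrent_marking_old", 513.923, 385.169),
    ("hyperalloc_a3072_o2048/concurrent_marking_old", 384.599, 333.581),
    ("extremem-phased/update_references", 479.826, 428.148),
]

for name, control, test in rows:
    print(f"{relative_change_pct(control, test):+.2f}% {name}")
```

Under this reading, a negative percentage means the Test build spent less time in that phase than Control.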
-------------
Commit messages:
- Fix whitespace
- Merge remote-tracking branch 'jdk/master' into share-collector-reserves
- Merge branch 'share-collector-reserves' of https://github.com/kdnilsen/jdk into share-collector-reserves
- make old gc more aggresive
- Change fullgc phase5 return type
- compute_old_generation_balance needs available computations under lock
- Merge branch 'share-collector-reserves' of https://github.com/kdnilsen/jdk into share-collector-reserves
- Fixup bugs introduced by most recent commit
- Improve empty region accounting in FreeSet
- Revert "Acquire heaplock before adjusting interval for old"
- ... and 24 more: https://git.openjdk.org/jdk/compare/6162e2c5...3d55a646
Changes: https://git.openjdk.org/jdk/pull/25357/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8357471
Stats: 1354 lines in 28 files changed: 781 ins; 333 del; 240 mod
Patch: https://git.openjdk.org/jdk/pull/25357.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357
PR: https://git.openjdk.org/jdk/pull/25357