RFR: 8373225: GenShen: More adaptive old-generation growth heuristics
Kelvin Nilsen
kdnilsen at openjdk.org
Sun Dec 7 17:54:24 UTC 2025
On Sat, 29 Nov 2025 01:10:02 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:
> When old-gen consumes a small percentage of heap size, trigger when old-gen expands by more than ShenandoahMinOldGenGrowthPercent, with default value 50%, from the live data in old at time of previous old-gen mark.
>
> When old-gen consumes a larger percentage of heap size, we trigger when old-gen expands by more than ShenandoahMinOldGenGrowthRemainingHeapPercent, with default value 25%, of the memory not live in old at the last marking of old.
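The two rules quoted above can be made concrete with a little arithmetic. The following is only an illustrative sketch, not the HotSpot implementation: the function names are hypothetical, sizes are in MB, and the percentages are the defaults quoted above. It does not model how the PR decides which rule applies, since that is not stated here.

```shell
#!/usr/bin/env bash
# Illustrative arithmetic only; NOT the HotSpot implementation.
# Function and variable names are hypothetical. Sizes are in MB.

min_old_gen_growth_percent=50                 # default of ShenandoahMinOldGenGrowthPercent
min_old_gen_growth_remaining_heap_percent=25  # default of ShenandoahMinOldGenGrowthRemainingHeapPercent

# Growth allowance relative to live old data at the previous old-gen mark.
growth_from_live() {
  local live_old_mb=$1
  echo $(( live_old_mb * min_old_gen_growth_percent / 100 ))
}

# Growth allowance relative to memory that was not live in old
# at the previous old-gen mark.
growth_from_remaining() {
  local heap_mb=$1 live_old_mb=$2
  echo $(( (heap_mb - live_old_mb) * min_old_gen_growth_remaining_heap_percent / 100 ))
}

heap_mb=8192   # matches -Xmx8g in the workload configuration
for live_old_mb in 512 2048 4096; do
  echo "live=${live_old_mb}MB" \
       "from_live=$(growth_from_live "$live_old_mb")MB" \
       "from_remaining=$(growth_from_remaining "$heap_mb" "$live_old_mb")MB"
done
```

With the default values, the two allowances are equal when live old data is one third of the heap; below that point the 50% rule yields the smaller allowance, and above it the 25% rule does.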
The benefits of this PR are demonstrated on an Extremem workload. Comparisons with master are highlighted in this spreadsheet:
<img width="2187" height="392" alt="image" src="https://github.com/user-attachments/assets/49935994-7a94-4ace-bc29-7a9e25b32299" />
Highlights:
1. Far fewer old GCs, with a slight increase in young GCs (74.45% improvement)
2. Because old GCs are much more costly than young GCs, CPU utilization improves by 4.5%.
3. Latencies improved across all percentiles (from a small improvement of 0.3% at p50 to a significant improvement of 51.2% at p99.999)
The workload is configured as follows:
~/github/jdk.11-17-2025/build/linux-x86_64-server-release/images/jdk/bin/java \
-XX:+UnlockExperimentalVMOptions \
-XX:+AlwaysPreTouch -XX:+DisableExplicitGC -Xms8g -Xmx8g \
-XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational \
-XX:ShenandoahMinFreeThreshold=5 \
-XX:ShenandoahFullGCThreshold=1024 \
-Xlog:"gc*=info,ergo" \
-Xlog:safepoint=trace -Xlog:safepoint=debug -Xlog:safepoint=info \
-XX:+UnlockDiagnosticVMOptions \
-jar ~/github/heapothesys/Extremem/src/main/java/extremem.jar \
-dInitializationDelay=45s \
-dDictionarySize=3000000 \
-dNumCustomers=300000 \
-dNumProducts=60000 \
-dCustomerThreads=750 \
-dCustomerPeriod=1600ms \
-dCustomerThinkTime=300ms \
-dKeywordSearchCount=4 \
-dServerThreads=5 \
-dServerPeriod=1s \
-dProductNameLength=10 \
-dBrowsingHistoryQueueCount=5 \
-dSalesTransactionQueueCount=5 \
-dProductDescriptionLength=32 \
-dProductReplacementPeriod=10s \
-dProductReplacementCount=10000 \
-dCustomerReplacementPeriod=5s \
-dCustomerReplacementCount=1000 \
-dBrowsingExpiration=1m \
-dPhasedUpdates=true \
-dPhasedUpdateInterval=30s \
-dSimulationDuration=25m \
-dResponseTimeMeasurements=100000 \
>$t.genshen.reproducer.baseline-8g.out 2>$t.genshen.reproducer.baseline-8g.err &
job_pid=$!
max_rss_kb=0
for s in {1..99}
do
sleep 15
rss_kb=$(ps -o rss= -p $job_pid)
if (( $rss_kb > $max_rss_kb ))
then
max_rss_kb=$rss_kb
fi
done
rss_mb=$((max_rss_kb / 1024))
cpu_percent=$(ps -o cputime -o etime -p $job_pid)
wait $job_pid
echo "RSS: $rss_mb MB" >>$t.genshen.reproducer.baseline-8g.out 2>>$t.genshen.reproducer.baseline-8g.err
echo "$cpu_percent" >>$t.genshen.reproducer.baseline-8g.out 2>>$t.genshen.reproducer.baseline-8g.err
gzip $t.genshen.reproducer.baseline-8g.out $t.genshen.reproducer.baseline-8g.err
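The script above stores the raw cputime and etime columns in a variable named cpu_percent. A minimal sketch of turning those two values into a utilization figure follows, assuming the [[dd-]hh:]mm:ss format that procps ps prints. The helper names to_seconds and cpu_utilization_percent are hypothetical and are not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of the original reproducer script.

# Convert a ps time value of the form [[dd-]hh:]mm:ss to whole seconds.
to_seconds() {
  local t=$1 days=0 h=0 m=0 s=0
  if [[ $t == *-* ]]; then
    days=${t%%-*}   # text before the first '-'
    t=${t#*-}       # text after the first '-'
  fi
  local IFS=:
  local parts=($t)  # split the remainder on ':'
  case ${#parts[@]} in
    3) h=${parts[0]}; m=${parts[1]}; s=${parts[2]} ;;
    2) m=${parts[0]}; s=${parts[1]} ;;
    1) s=${parts[0]} ;;
  esac
  # The 10# prefix forces base 10 so that fields like "08" are not read as octal.
  echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# CPU time as a percentage of elapsed time. The result can exceed 100
# for a multi-threaded process running on several cores.
cpu_utilization_percent() {
  local cpu elapsed
  cpu=$(to_seconds "$1")
  elapsed=$(to_seconds "$2")
  (( elapsed > 0 )) || { echo 0; return; }
  echo $(( 100 * cpu / elapsed ))
}

# Example with made-up values: 2h30m of CPU time over 25m of wall time.
cpu_utilization_percent "02:30:00" "25:00"
```

To feed real values into these helpers, the header line can be suppressed with `ps -o cputime= -o etime= -p $job_pid`, and the two fields read with `read -r cpu_time elapsed_time`.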
Note that this PR causes us to operate closer to the edge of the operating envelope. In more aggressively provisioned configurations (the same workload in a smaller heap, for example), we see some regression in latencies compared to tip. The regression stems from an increased number of degenerated GCs, which in turn result from starvation of mixed evacuations. This PR causes us to do fewer old GCs, but each old GC is expected to work more efficiently. We expect these regressions to be mitigated by other PRs that are currently under development and review, including:
1. Sharing of collector reserves between young and old
2. Accelerated triggers
3. Surging of GC workers
4. Adaptive old-evac ratio
-------------
PR Comment: https://git.openjdk.org/jdk/pull/28561#issuecomment-3622610260
PR Comment: https://git.openjdk.org/jdk/pull/28561#issuecomment-3622625901