RFR: 8373225: GenShen: More adaptive old-generation growth heuristics

Sun Dec 7 17:54:24 UTC 2025

On Sat, 29 Nov 2025 01:10:02 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

> When old-gen consumes a small percentage of heap size, trigger when old-gen expands by more than ShenandoahMinOldGenGrowthPercent, with default value 50%, from the live data in old at time of previous old-gen mark.
> 
> When old-gen consumes a larger percentage of heap size, we trigger when old-gen expands by more than  ShenandoahMinOldGenGrowthRemainingHeapPercent, with default value 25%, of the memory not live in old at the last marking of old.
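
To make the two rules concrete, here is a minimal arithmetic sketch in bash. It is not the HotSpot implementation. The function name `old_trigger_mb` is hypothetical, and it assumes the two rules are combined by taking whichever allows the smaller growth; the description above does not say how the "small" and "larger" regimes are selected. Only the two default percentages come from the PR description.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the JDK): old-gen usage, in MB, at which
# an old GC would be triggered under the two rules described above.
# ASSUMPTION: the rules are combined by taking the smaller permitted growth.
old_trigger_mb() {
    local heap_mb=$1 live_old_mb=$2
    local growth_pct=${3:-50}       # default of ShenandoahMinOldGenGrowthPercent
    local remaining_pct=${4:-25}    # default of ShenandoahMinOldGenGrowthRemainingHeapPercent

    # Rule 1: growth relative to live data in old at the previous old mark.
    local by_live=$(( live_old_mb * growth_pct / 100 ))
    # Rule 2: growth relative to the memory that was not live in old.
    local by_remaining=$(( (heap_mb - live_old_mb) * remaining_pct / 100 ))

    local growth=$(( by_live < by_remaining ? by_live : by_remaining ))
    echo $(( live_old_mb + growth ))
}

old_trigger_mb 8192 1024   # small old-gen in an 8g heap: prints 1536
old_trigger_mb 8192 4096   # large old-gen in an 8g heap: prints 5120
```

Under that assumption and the default values, the two rules permit equal growth when live old data is one third of the heap; below that point the 50% rule is the tighter one, above it the 25% rule is.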

The benefits of this PR are demonstrated on an Extremem workload. Comparisons with master are highlighted in this spreadsheet:

<img width="2187" height="392" alt="image" src="https://github.com/user-attachments/assets/49935994-7a94-4ace-bc29-7a9e25b32299" />

Highlights:
1. Far fewer old GCs, with slight increase in young GCs (74.45% improvement)
2. Because old GCs are much more costly than young GCs, CPU utilization improves by 4.5%.
3. Latencies improved across all percentiles (from a small improvement of 0.3% at p50 to a significant improvement of 51.2% at p99.999)

The workload is configured as follows:

               ~/github/jdk.11-17-2025/build/linux-x86_64-server-release/images/jdk/bin/java \
                -XX:+UnlockExperimentalVMOptions \
                -XX:+AlwaysPreTouch -XX:+DisableExplicitGC -Xms8g -Xmx8g \
                -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational \
                -XX:ShenandoahMinFreeThreshold=5 \
                -XX:ShenandoahFullGCThreshold=1024 \
                -Xlog:"gc*=info,ergo" \
                -Xlog:safepoint=trace -Xlog:safepoint=debug -Xlog:safepoint=info \
                -XX:+UnlockDiagnosticVMOptions \
                -jar ~/github/heapothesys/Extremem/src/main/java/extremem.jar \
                -dInitializationDelay=45s \
                -dDictionarySize=3000000 \
                -dNumCustomers=300000 \
                -dNumProducts=60000 \
                -dCustomerThreads=750 \
                -dCustomerPeriod=1600ms \
                -dCustomerThinkTime=300ms \
                -dKeywordSearchCount=4 \
                -dServerThreads=5 \
                -dServerPeriod=1s \
                -dProductNameLength=10 \
                -dBrowsingHistoryQueueCount=5 \
                -dSalesTransactionQueueCount=5 \
                -dProductDescriptionLength=32 \
                -dProductReplacementPeriod=10s \
                -dProductReplacementCount=10000 \
                -dCustomerReplacementPeriod=5s \
                -dCustomerReplacementCount=1000 \
                -dBrowsingExpiration=1m \
                -dPhasedUpdates=true \
                -dPhasedUpdateInterval=30s \
                -dSimulationDuration=25m \
                -dResponseTimeMeasurements=100000 \
                >$t.genshen.reproducer.baseline-8g.out 2>$t.genshen.reproducer.baseline-8g.err &
            job_pid=$!
            max_rss_kb=0
            for s in {1..99}
            do
                sleep 15
                rss_kb=$(ps -o rss= -p $job_pid)
                if (( $rss_kb > $max_rss_kb ))
                then
                    max_rss_kb=$rss_kb
                fi
            done
            rss_mb=$((max_rss_kb / 1024))
            cpu_percent=$(ps -o cputime -o etime -p $job_pid)
            wait $job_pid
            echo "RSS: $rss_mb MB" >>$t.genshen.reproducer.baseline-8g.out 2>>$t.genshen.reproducer.share-collector-reserves.err
            echo "$cpu_percent" >>$t.genshen.reproducer.baseline-8g.out 2>>$t.genshen.reproducer.share-collector-reserves.err
            gzip $t.genshen.reproducer.baseline-8g.out $t.genshen.reproducer.baseline-8g.err
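
The script above stores the raw `cputime` and `etime` columns from `ps` in `cpu_percent`. As a sketch of how those two values could be turned into an actual percentage, the helpers below parse the `[[dd-]hh:]mm:ss` format that Linux procps prints. The function names are hypothetical and were not part of the original run; other platforms may print fractional seconds, which this sketch does not handle.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the original script).

# Convert a ps time field of the form [[dd-]hh:]mm:ss to whole seconds.
ps_time_to_seconds() {
    local t=$1 days=0 h=0 m=0 s=0
    local -a parts
    if [[ $t == *-* ]]; then
        days=${t%%-*}     # text before the first '-'
        t=${t#*-}         # remainder after the first '-'
    fi
    IFS=: read -r -a parts <<< "$t"
    case ${#parts[@]} in
        3) h=${parts[0]}; m=${parts[1]}; s=${parts[2]} ;;
        2) m=${parts[0]}; s=${parts[1]} ;;
        1) s=${parts[0]} ;;
    esac
    # The 10# prefix forces base 10 so fields like "08" are not read as octal.
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# CPU time as a percentage of elapsed time; 100 means one fully busy core.
cpu_utilization_percent() {
    local cpu_s elapsed_s
    cpu_s=$(ps_time_to_seconds "$1")
    elapsed_s=$(ps_time_to_seconds "$2")
    if (( elapsed_s == 0 )); then
        echo 0
        return
    fi
    echo $(( 100 * cpu_s / elapsed_s ))
}

cpu_utilization_percent "01:40:00" "25:00"   # 6000s of CPU over 1500s: prints 400
```

In the script above, the two fields could be read with `read -r cpu elapsed < <(ps -o cputime= -o etime= -p "$job_pid")` and passed to `cpu_utilization_percent "$cpu" "$elapsed"`.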

Note that this PR causes us to operate closer to the edge of the operating envelope. In more aggressively provisioned configurations (the same workload in a smaller heap, for example), we see some regression in latencies compared to tip. The regression stems from an increased number of degenerated GCs, which in turn result from starvation of mixed evacuations. This PR causes us to do fewer old GCs, but each old GC is expected to work more efficiently. We expect these regressions to be mitigated by other PRs that are currently under development and review, including:
1. Sharing of collector reserves between young and old
2. Accelerated triggers
3. Surging of GC workers
4. Adaptive old-evac ratio

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28561#issuecomment-3622610260
PR Comment: https://git.openjdk.org/jdk/pull/28561#issuecomment-3622625901