RFR: 8312116: JDK GenShen: make instantaneous allocation rate triggers more timely

Mon Jan 5 19:31:14 UTC 2026

On Mon, 5 Jan 2026 15:10:52 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

> After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements:
> 
> 1. Track trends in GC times rather than always using the average GC time plus standard deviation.  In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory.
> 2. Sample allocation rates more frequently than once every 100 ms.
> 3. Track trends in allocation rates.  In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload.
> 4. When we detect acceleration of workload, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate.
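The four proposals can be illustrated with a small, self-contained sketch. It is not the PR's HotSpot code. The helper names (`fit_line`, `predict_gc_by_trend`, `predict_consumption`, `should_trigger`), the use of an ordinary least-squares line as the "trend", and the quadratic consumption model are all assumptions made for illustration. For proposal 1, the sketch fits a line through recent GC durations and extrapolates one cycle ahead, next to the average-plus-standard-deviation estimate it would replace. For proposals 2 through 4, it fits a line through allocation-rate samples; sampling more often simply yields more, closer-spaced points. When the slope is positive, it integrates the rising rate over the horizon instead of assuming a constant rate.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// A fitted line y = intercept + slope * x.
struct Line {
  double intercept;
  double slope;
  double at(double x) const { return intercept + slope * x; }
};

// Ordinary least-squares fit of ys against xs (illustrative, hypothetical helper).
Line fit_line(const std::vector<double>& xs, const std::vector<double>& ys) {
  const std::size_t n = xs.size();
  assert(n >= 2 && ys.size() == n);
  double sx = 0, sy = 0, sxx = 0, sxy = 0;
  for (std::size_t i = 0; i < n; ++i) {
    sx += xs[i];
    sy += ys[i];
    sxx += xs[i] * xs[i];
    sxy += xs[i] * ys[i];
  }
  const double denom = n * sxx - sx * sx;
  assert(denom > 0);  // requires at least two distinct x values
  const double slope = (n * sxy - sx * sy) / denom;
  return Line{(sy - slope * sx) / n, slope};
}

// Proposal 1, baseline: mean of recent GC durations plus one standard deviation.
double predict_gc_avg_plus_sd(const std::vector<double>& secs) {
  assert(!secs.empty());
  double sum = 0;
  for (double s : secs) sum += s;
  const double mean = sum / secs.size();
  double var = 0;
  for (double s : secs) var += (s - mean) * (s - mean);
  return mean + std::sqrt(var / secs.size());
}

// Proposal 1, trend: fit duration against cycle index and extrapolate one cycle.
double predict_gc_by_trend(const std::vector<double>& secs) {
  std::vector<double> idx;
  for (std::size_t i = 0; i < secs.size(); ++i) idx.push_back(static_cast<double>(i));
  return fit_line(idx, secs).at(static_cast<double>(secs.size()));
}

// Proposals 2-4: ts are sample times (s), rates are allocation rates (bytes/s).
// Returns the bytes expected to be allocated during the next horizon_secs.
double predict_consumption(const std::vector<double>& ts,
                           const std::vector<double>& rates, double horizon_secs) {
  const Line trend = fit_line(ts, rates);
  const double now = std::max(0.0, trend.at(ts.back()));
  if (trend.slope <= 0) {
    return now * horizon_secs;  // no acceleration: constant-rate assumption
  }
  // Accelerating: integral of (now + slope * t) dt over [0, horizon_secs].
  return now * horizon_secs + 0.5 * trend.slope * horizon_secs * horizon_secs;
}

// Trigger if the memory expected to be allocated while a GC runs exceeds
// the memory currently available.
bool should_trigger(const std::vector<double>& ts, const std::vector<double>& rates,
                    double available_bytes, double predicted_gc_secs) {
  return predict_consumption(ts, rates, predicted_gc_secs) > available_bytes;
}
```

Consider GC durations of 1, 2, 3, 4, and 5 seconds. Here `predict_gc_by_trend` returns 6.0, while `predict_gc_avg_plus_sd` returns about 4.41. A trigger that budgets for the latter would therefore allow too little time. Similarly, for rates climbing from 100 to 400 bytes/s over 0.3 s, the accelerated prediction for the next second is 900 bytes. A constant-rate model would predict only 400 bytes.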

This PR shows very slight improvements on SPECjbb2015, measured with the following command:

    ~/github/jdk.accelerated-triggers/build/linux-x86_64-server-release/jdk/bin/java \
      -XX:+UnlockExperimentalVMOptions \
      -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms10g -Xmx10g -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational \
      -javaagent:/home/kdnilsen/lib/jHiccup-2.0.10/jHiccup.jar=-l,results/specjbb2015-master-jhiccup.log,-i,1000,-a \
      -Xlog:async \
      -Xlog:gc*=info \
      -Xlog:safepoint*=info \
      -Xlog:handshake*=info \
      -jar /home/kdnilsen/lib/specjbb2015/specjbb2015.jar \
      -m composite -ikv \
      -p /home/kdnilsen/lib/specjbb2015/config/specjbb2015.props \
      -raw /home/kdnilsen/lib/specjbb2015/config/template-C.raw >$t.accelerated-trigger.specjbb2015.out 2>$t.accelerated-triggers.specjbb2015.err

<img width="395" height="174" alt="image" src="https://github.com/user-attachments/assets/5111e4d4-ae0a-4f04-b993-f3e6716dbf20" />

We have also tested this PR with several heap sizes on a particular Extremem workload. The results follow.

With a 16GB heap, both master and accelerated-triggers perform poorly. We consider the JVM under-provisioned for this workload, and in this configuration we consider the behavior of accelerated-triggers acceptable compared to master. Accelerated-triggers has 0.24% to 30.5% worse latency across the reported response-time percentiles. On average, it performs 57% more GC cycles and, because it triggers earlier, 50% fewer degenerated cycles. CPU utilization is 0.60% higher.

<img width="2428" height="421" alt="image" src="https://github.com/user-attachments/assets/c0299e0a-ddd3-409c-a7aa-10502275c86a" />
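The comparisons in this paragraph and the ones below are relative changes. The post does not state the formula. Assuming each change is computed relative to master, the arithmetic can be sketched as follows. `percent_change` and `change_range` are hypothetical helpers, and the inputs used below are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Relative change of `candidate` versus `baseline`, in percent.
// Positive means `candidate` is larger than `baseline`.
double percent_change(double baseline, double candidate) {
  assert(baseline != 0.0);
  return (candidate - baseline) / baseline * 100.0;
}

// Smallest and largest per-percentile change, the form of a summary such as
// "0.24% to 30.5% worse" across p50, p95, p99, and so on.
std::pair<double, double> change_range(const std::vector<double>& baseline,
                                       const std::vector<double>& candidate) {
  assert(!baseline.empty() && baseline.size() == candidate.size());
  std::vector<double> changes;
  for (std::size_t i = 0; i < baseline.size(); ++i) {
    changes.push_back(percent_change(baseline[i], candidate[i]));
  }
  const auto mm = std::minmax_element(changes.begin(), changes.end());
  return {*mm.first, *mm.second};
}
```

With hypothetical counts, going from 100 GC cycles on master to 157 with accelerated-triggers is +57%. Degenerated cycles falling from 10 to 5 is -50%.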

With a 20GB heap, the benefits of accelerated-triggers show up as improved p50, p95, and p99 latencies. Accelerated-triggers also completes, on average, 120% more old GCs than master; in this configuration, master is more vulnerable to starvation of old-generation processing. Accelerated-triggers performed 30% fewer degenerated cycles and 30% fewer full GC cycles than master.

<img width="2426" height="420" alt="image" src="https://github.com/user-attachments/assets/14d34bf6-6335-417b-bc54-7ac249091ae5" />

With a 24GB heap, both master and accelerated-triggers experienced degraded performance on one of five trials. In both cases, this appears to have resulted from starvation of old-gen processing. Even so, the accelerated-triggers run completed 5 old collections, versus only 4 with master. For this configuration, we report both average and trimmed average results. Average results favor accelerated-triggers at most percentiles, while trimmed average results favor master at most percentiles.

<img width="2430" height="526" alt="image" src="https://github.com/user-attachments/assets/92a8c69b-cd92-4950-8b87-e70c51b7d718" />
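The post does not say exactly how the trimmed averages were computed. A common form, assumed here, sorts the per-trial results and drops the k lowest and k highest values before averaging. With five trials and k = 1, the single degraded trial is excluded along with the best one. `trimmed_mean` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Mean after sorting and discarding the k smallest and k largest values.
// With five trials and k = 1, a single degraded trial cannot dominate.
double trimmed_mean(std::vector<double> values, std::size_t k) {
  assert(values.size() > 2 * k);
  std::sort(values.begin(), values.end());
  const auto off = static_cast<std::ptrdiff_t>(k);
  const double sum = std::accumulate(values.begin() + off, values.end() - off, 0.0);
  return sum / static_cast<double>(values.size() - 2 * k);
}
```

Take hypothetical p99 latencies of 10, 11, 12, 13, and 100 ms. The plain average is 29.2 ms, but the trimmed average with k = 1 is 12 ms. This shows how the two summaries can favor different configurations.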

With a 28GB heap, accelerated-triggers performs significantly better than master. Three of five master trials experienced degenerated cycles, and two of five experienced full GCs. None of the five accelerated-triggers trials experienced degenerated or full GC cycles. This shows up as generally better latency across all percentiles.

<img width="2427" height="527" alt="image" src="https://github.com/user-attachments/assets/c7ede7a7-fea9-4533-84d7-61adfd0882c2" />

With a 31GB heap, latencies are very similar between master and accelerated-triggers. Accelerated-triggers consumes 15% more CPU because it performs 103% more GCs. Note that accelerated-triggers completes one more old GC than master, indicating that it is less vulnerable than master to starvation of old-gen processing.

<img width="2427" height="418" alt="image" src="https://github.com/user-attachments/assets/09622d03-352a-4031-8e59-44f373dd280e" />

Note that typical service deployments tend to be provisioned with excess resources. This lets services operate more reliably under transient spikes in client workload and avoids "rare" triggering missteps that cause unwanted degenerated and full GC cycles. If this workload were a production service, it would most typically be deployed today with a 31GB heap. A goal of the GenShen engineering team is to enable more frugal use of CPU and memory resources. In the longer term, we hope to enable reliable production deployment of this workload in 28GB or 24GB of memory.

For some workloads, we have observed that accelerated-triggers increases contention between young-generation and old-generation GC activities, because it often forces more frequent young-generation activity. In practice, this is often offset by more timely collection of young, which reduces the "urgent" young collection efforts that occur when the JVM is under duress. Other development efforts are under way to let young-generation and old-generation concurrent activities cooperate more gracefully when both need CPU time.

The Extremem workload used in the tests above was run with this script:

            ~/github/jdk.accelerated-triggers/build/linux-x86_64-server-release/images/jdk/bin/java \
                -XX:ActiveProcessorCount=16 \
                -XX:+UnlockExperimentalVMOptions \
                -XX:+AlwaysPreTouch -XX:+DisableExplicitGC -Xms$m -Xmx$m \
                -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational \
                -XX:ShenandoahFullGCThreshold=1024 \
                -XX:ShenandoahGuaranteedOldGCInterval=0 \
                -XX:ShenandoahGuaranteedYoungGCInterval=0 \
                -Xlog:"gc*=info,ergo" \
                -Xlog:safepoint=trace -Xlog:safepoint=debug -Xlog:safepoint=info \
                -XX:+UnlockDiagnosticVMOptions \
                -jar ~/github/heapothesys/Extremem/src/main/java/extremem.jar \
                -dDictionarySize=3000000 \
                -dNumCustomers=9000000 \
                -dNumProducts=240000 \
                -dCustomerThreads=800 \
                -dAllowAnyMatch=false \
                -dCustomerPeriod=2s \
                -dCustomerThinkTime=300ms \
                -dKeywordSearchCount=4 \
                -dSelectionCriteriaCount=2 \
                -dProductReviewLength=12 \
                -dServerThreads=5 \
                -dServerPeriod=10s \
                -dProductNameLength=10 \
                -dBrowsingHistoryQueueCount=5 \
                -dSalesTransactionQueueCount=5 \
                -dProductDescriptionLength=320 \
                -dProductReplacementPeriod=60s \
                -dProductReplacementCount=25 \
                -dCustomerReplacementPeriod=60s \
                -dCustomerReplacementCount=1500 \
                -dBrowsingExpiration=1m \
                -dPhasedUpdates=true \
                -dPhasedUpdateInterval=60s \
                -dSimulationDuration=25m \
                -dResponseTimeMeasurements=100000 \
                >$t.$m.genshen.medium.accelerated.out \
                2>$t.$m.genshen.medium.accelerated.err &
            job_pid=$!
            sleep 1500
            cpu_percent=$(ps -o cputime -o etime -p $job_pid)
            rss_kb=$(ps -o rss= -p $job_pid)
            rss_mb=$((rss_kb / 1024))
            wait $job_pid
            echo "RSS: $rss_mb MB" >>$t.$m.genshen.medium.accelerated.out
            echo "$cpu_percent" >>$t.$m.genshen.medium.accelerated.out
            gzip $t.$m.genshen.medium.accelerated.out $t.$m.genshen.medium.accelerated.err
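The script saves the raw output of `ps -o cputime -o etime` in a variable named `cpu_percent`. The output file therefore records cumulative CPU time and elapsed time, not a percentage. Assuming the CPU utilization figures above are derived from these two fields, the conversion can be sketched as follows. `ps_time_to_secs` and `cpu_utilization_percent` are hypothetical helpers. The accepted formats are the ones procps `ps` documents for `cputime` ([DD-]hh:mm:ss) and `etime` ([[DD-]hh:]mm:ss).

```cpp
#include <cassert>
#include <string>

// Parses one ps time field into seconds. Handles "[DD-]hh:mm:ss" (cputime)
// and "[[DD-]hh:]mm:ss" (etime): an optional "DD-" day prefix followed by
// colon-separated fields, most significant first. Leading padding is skipped
// by std::stol.
long ps_time_to_secs(const std::string& field) {
  long days = 0;
  std::string rest = field;
  const std::string::size_type dash = field.find('-');
  if (dash != std::string::npos) {
    days = std::stol(field.substr(0, dash));
    rest = field.substr(dash + 1);
  }
  long secs = 0;
  std::string::size_type start = 0;
  while (true) {
    const std::string::size_type colon = rest.find(':', start);
    // When no colon remains, substr() clamps the length to the end of the string.
    secs = secs * 60 + std::stol(rest.substr(start, colon - start));
    if (colon == std::string::npos) break;
    start = colon + 1;
  }
  return days * 86400 + secs;
}

// Average CPU utilization over the process lifetime, as a percentage of one CPU
// (200.0 means two CPUs kept busy on average).
double cpu_utilization_percent(const std::string& cputime, const std::string& etime) {
  const long elapsed = ps_time_to_secs(etime);
  assert(elapsed > 0);
  return 100.0 * ps_time_to_secs(cputime) / elapsed;
}
```

For example, 06:15:00 of CPU time over 25:00 elapsed gives 1500%, which means about 15 CPUs were busy on average.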

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29039#issuecomment-3710878539
PR Comment: https://git.openjdk.org/jdk/pull/29039#issuecomment-3711727740