RFR: 8361520: Stabilize SystemGC benchmarks

Tue Jul 8 09:37:50 UTC 2025

On Tue, 8 Jul 2025 09:02:37 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> Noticed this while working on a related bug ([JDK-8359960](https://bugs.openjdk.org/browse/JDK-8359960)):
> 
> First, I see the benchmark executes a single shot per fork. As such, I believe the benchmark really tests the cost of the initial GC, which probably drags a lot of (potentially non-benchmark-related) objects through new (possibly awkwardly wired, despite +AlwaysPreTouch) memory. The first iteration is 80 ms/op for me here, and the second one is -- whoosh -- only 3 ms/op! Second, the benchmark is really, really noisy. Part of it is due to the first iteration being noisy, but we also want more samples to shrink the estimated errors.
> 
> Additional testing:
>  - [x] Linux x86_64 server fastdebug, `gc.systemgc` benchmark runs

Both benchmark scores and the errors improve. So we made the test more accurate and more precise at the same time.

Benchmark                       Mode  Cnt    Score   Error  Units

# Mainline
AllDead.gc                        ss   25    9.439 ± 0.117  ms/op
AllLive.gc                        ss   25   17.979 ± 0.126  ms/op
DifferentObjectSizesArray.gc      ss   25   72.549 ± 0.363  ms/op
DifferentObjectSizesHashMap.gc    ss   25   75.390 ± 0.336  ms/op
DifferentObjectSizesTreeMap.gc    ss   25   86.146 ± 1.813  ms/op
HalfDeadFirstPart.gc              ss   25   13.558 ± 0.127  ms/op
HalfDeadInterleaved.gc            ss   25   64.643 ± 0.347  ms/op
HalfDeadInterleavedChunks.gc      ss   25   59.416 ± 0.218  ms/op
HalfDeadSecondPart.gc             ss   25   13.405 ± 0.150  ms/op
HalfHashedHalfDead.gc             ss   25   68.121 ± 0.328  ms/op
NoObjects.gc                      ss   25    7.879 ± 0.099  ms/op
OneBigObject.gc                   ss   25  118.887 ± 0.697  ms/op

# This PR
AllDead.gc                        ss  125    7.352 ± 0.060  ms/op
AllLive.gc                        ss  125   14.964 ± 0.069  ms/op
DifferentObjectSizesArray.gc      ss  125   70.373 ± 0.125  ms/op
DifferentObjectSizesHashMap.gc    ss  125   73.404 ± 0.137  ms/op
DifferentObjectSizesTreeMap.gc    ss  125   81.512 ± 1.044  ms/op
HalfDeadFirstPart.gc              ss  125   10.806 ± 0.047  ms/op
HalfDeadInterleaved.gc            ss  125   61.393 ± 0.097  ms/op
HalfDeadInterleavedChunks.gc      ss  125   56.400 ± 0.094  ms/op
HalfDeadSecondPart.gc             ss  125   10.748 ± 0.056  ms/op
HalfHashedHalfDead.gc             ss  125   64.909 ± 0.091  ms/op
NoObjects.gc                      ss  125    5.616 ± 0.063  ms/op
OneBigObject.gc                   ss  125  116.459 ± 0.324  ms/op
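
As a rough sanity check on the error columns: if the samples were independent and their variance unchanged, going from 25 to 125 samples would shrink the error by about sqrt(125/25), or roughly 2.24x. The sketch below compares that idealized factor with the observed ratios for a few rows copied from the tables above. The class and method names are hypothetical, and the model is an assumption: it ignores the small change in the Student-t quantile and the fact that this PR also changes what is being measured, so the observed ratios are only expected to land in the same ballpark.

```java
public class ErrorShrink {
    // Idealized shrink factor of a confidence-interval half-width when the
    // sample count grows from nBefore to nAfter (error ~ 1/sqrt(n)).
    // Assumption: independent samples with unchanged variance.
    static double expectedShrink(int nBefore, int nAfter) {
        return Math.sqrt((double) nAfter / nBefore);
    }

    // Observed shrink factor: reported error before divided by error after.
    static double observedShrink(double errBefore, double errAfter) {
        return errBefore / errAfter;
    }

    public static void main(String[] args) {
        System.out.println("expected (25 -> 125 samples): " + expectedShrink(25, 125));
        // Error values (ms/op) copied from the Mainline and This PR tables.
        System.out.println("AllDead:                   " + observedShrink(0.117, 0.060));
        System.out.println("DifferentObjectSizesArray: " + observedShrink(0.363, 0.125));
        System.out.println("OneBigObject:              " + observedShrink(0.697, 0.324));
    }
}
```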

The drawback is that we do more work, so run times and CPU consumption are about 2..3x worse. That's the price we pay for more precision/accuracy.

# Mainline
real	5m46.048s
user	9m42.887s
sys	37m20.507s

# This PR
real	12m3.271s
user	67m55.798s
sys	34m46.397s
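
The ratios behind that estimate can be recomputed from the `time` output above. This is a minimal sketch with hypothetical names (`RunCost`, `seconds`); it takes total CPU consumption to be user plus sys, which is an assumption about how the figure is meant to be read. On these numbers the wall-clock time grows by roughly 2.1x and user+sys by roughly 2.2x, although user time alone grows far more while sys time stays about flat.

```java
public class RunCost {
    // Converts a `time`-style value such as "5m46.048s" into seconds.
    static double seconds(String t) {
        int m = t.indexOf('m');
        double minutes = Double.parseDouble(t.substring(0, m));
        double secs = Double.parseDouble(t.substring(m + 1, t.length() - 1));
        return minutes * 60.0 + secs;
    }

    // Wall-clock ratio, This PR over Mainline.
    static double realRatio() {
        return seconds("12m3.271s") / seconds("5m46.048s");
    }

    // CPU ratio, taking CPU consumption as user + sys (an assumption).
    static double cpuRatio() {
        double before = seconds("9m42.887s") + seconds("37m20.507s");
        double after = seconds("67m55.798s") + seconds("34m46.397s");
        return after / before;
    }

    public static void main(String[] args) {
        System.out.println("wall-clock ratio:     " + realRatio());
        System.out.println("CPU (user+sys) ratio: " + cpuRatio());
    }
}
```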

In [JDK-8359960](https://bugs.openjdk.org/browse/JDK-8359960), where I was chasing a regression, we are now in a much better place, focusing on the more stable GC cost.

# java -jar benchmarks.jar AllDead.gc --jvmArgsAppend "-XX:+UseZGC"
Mainline: 80.596 ± 3.801  ms/op
This PR:   6.057 ± 0.097  ms/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26182#issuecomment-3048014191
PR Comment: https://git.openjdk.org/jdk/pull/26182#issuecomment-3048112617