RFR: 8361520: Stabilize SystemGC benchmarks
Aleksey Shipilev
shade at openjdk.org
Tue Jul 8 09:37:50 UTC 2025
On Tue, 8 Jul 2025 09:02:37 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
> Noticed this while working on a related bug ([JDK-8359960](https://bugs.openjdk.org/browse/JDK-8359960)):
>
> First, I see the benchmark executes a single shot per fork. As such, I believe the benchmark really tests the cost of the initial GC, which probably drags a lot of (potentially non-benchmark-related) objects through new (possibly awkwardly wired, despite +AlwaysPreTouch) memory. The first iteration is 80 ms/op for me here, and the second one is -- whoosh -- only 3 ms/op! Second, the benchmark is really, really noisy. Part of it is due to the first iteration being noisy, but we also want more samples to shrink the estimated errors.
>
> Additional testing:
> - [x] Linux x86_64 server fastdebug, `gc.systemgc` benchmark runs
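The first-shot effect described above can be eyeballed outside of JMH. What follows is only a minimal sketch, not the benchmark from the PR: the class name `FirstGcCost` and the helper `timeGcShots` are made up for illustration, and whether the first call is actually the slowest depends on the collector, heap flags, and machine.

```java
public class FirstGcCost {
    // Times `shots` consecutive System.gc() calls in the same JVM
    // and returns each duration in milliseconds.
    // Hypothetical helper, not part of the gc.systemgc benchmarks.
    static double[] timeGcShots(int shots) {
        double[] ms = new double[shots];
        for (int i = 0; i < shots; i++) {
            long start = System.nanoTime();
            System.gc();
            ms[i] = (System.nanoTime() - start) / 1e6;
        }
        return ms;
    }

    public static void main(String[] args) {
        double[] ms = timeGcShots(5);
        for (int i = 0; i < ms.length; i++) {
            System.out.printf("shot %d: %.3f ms%n", i + 1, ms[i]);
        }
    }
}
```

If the first shot stands out from the rest, that is the initial-GC cost a one-shot-per-fork setup ends up measuring.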
Both benchmark scores and the errors improve. So we made the test more accurate and more precise at the same time.
Benchmark                       Mode  Cnt    Score   Error  Units

# Mainline
AllDead.gc                        ss   25    9.439 ± 0.117  ms/op
AllLive.gc                        ss   25   17.979 ± 0.126  ms/op
DifferentObjectSizesArray.gc      ss   25   72.549 ± 0.363  ms/op
DifferentObjectSizesHashMap.gc    ss   25   75.390 ± 0.336  ms/op
DifferentObjectSizesTreeMap.gc    ss   25   86.146 ± 1.813  ms/op
HalfDeadFirstPart.gc              ss   25   13.558 ± 0.127  ms/op
HalfDeadInterleaved.gc            ss   25   64.643 ± 0.347  ms/op
HalfDeadInterleavedChunks.gc      ss   25   59.416 ± 0.218  ms/op
HalfDeadSecondPart.gc             ss   25   13.405 ± 0.150  ms/op
HalfHashedHalfDead.gc             ss   25   68.121 ± 0.328  ms/op
NoObjects.gc                      ss   25    7.879 ± 0.099  ms/op
OneBigObject.gc                   ss   25  118.887 ± 0.697  ms/op

# This PR
AllDead.gc                        ss  125    7.352 ± 0.060  ms/op
AllLive.gc                        ss  125   14.964 ± 0.069  ms/op
DifferentObjectSizesArray.gc      ss  125   70.373 ± 0.125  ms/op
DifferentObjectSizesHashMap.gc    ss  125   73.404 ± 0.137  ms/op
DifferentObjectSizesTreeMap.gc    ss  125   81.512 ± 1.044  ms/op
HalfDeadFirstPart.gc              ss  125   10.806 ± 0.047  ms/op
HalfDeadInterleaved.gc            ss  125   61.393 ± 0.097  ms/op
HalfDeadInterleavedChunks.gc      ss  125   56.400 ± 0.094  ms/op
HalfDeadSecondPart.gc             ss  125   10.748 ± 0.056  ms/op
HalfHashedHalfDead.gc             ss  125   64.909 ± 0.091  ms/op
NoObjects.gc                      ss  125    5.616 ± 0.063  ms/op
OneBigObject.gc                   ss  125  116.459 ± 0.324  ms/op
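As a rough sanity check on the error columns: with 5x the samples (Cnt 25 vs 125), an idealized error would shrink by about sqrt(5), assuming independent samples with unchanged variance. That assumption is mine and is only approximate here, since samples from the same fork are likely correlated and the noisy first shot is no longer the measured one. The class and method names below are hypothetical; the inputs are taken from the two tables.

```java
import java.util.Locale;

public class ErrorShrink {
    // Ideal shrink factor of the error when going from n1 to n2
    // independent samples with the same variance (assumption).
    static double idealShrink(int n1, int n2) {
        return Math.sqrt((double) n2 / n1);
    }

    // Observed shrink factor between two reported errors.
    static double observedShrink(double errBefore, double errAfter) {
        return errBefore / errAfter;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "ideal (25 -> 125): %.2fx%n", idealShrink(25, 125));
        System.out.printf(Locale.ROOT, "AllDead.gc:        %.2fx%n", observedShrink(0.117, 0.060));
        System.out.printf(Locale.ROOT, "OneBigObject.gc:   %.2fx%n", observedShrink(0.697, 0.324));
    }
}
```

The observed factors land near the ideal one for these two rows, but not exactly on it, which is expected given the caveats above.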
The drawback is that we do more work, so run times and CPU consumption are about 2-3x worse. That's the price we pay for more precision/accuracy.
# Mainline
real 5m46.048s
user 9m42.887s
sys 37m20.507s
# This PR
real 12m3.271s
user 67m55.798s
sys 34m46.397s
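To put numbers on the overhead, the ratios can be computed directly from the `time` output above. This is just arithmetic on the reported figures; the class `RunCostRatio` and the helper `toSeconds` are illustrative names, and summing user and sys as "CPU consumption" is my reading of the claim.

```java
import java.util.Locale;

public class RunCostRatio {
    // Converts a `time`-style duration such as "5m46.048s" into seconds.
    static double toSeconds(String t) {
        int m = t.indexOf('m');
        double minutes = Double.parseDouble(t.substring(0, m));
        double seconds = Double.parseDouble(t.substring(m + 1, t.length() - 1));
        return minutes * 60.0 + seconds;
    }

    public static void main(String[] args) {
        // Wall-clock time: This PR vs Mainline.
        double realRatio = toSeconds("12m3.271s") / toSeconds("5m46.048s");
        // CPU time taken as user + sys (assumption).
        double cpuMainline = toSeconds("9m42.887s") + toSeconds("37m20.507s");
        double cpuPr = toSeconds("67m55.798s") + toSeconds("34m46.397s");
        System.out.printf(Locale.ROOT, "wall-clock ratio: %.2fx%n", realRatio);
        System.out.printf(Locale.ROOT, "user+sys ratio:   %.2fx%n", cpuPr / cpuMainline);
    }
}
```

This prints a wall-clock ratio of 2.09x and a user+sys ratio of 2.18x, so both sit at the low end of the 2-3x estimate.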
In [JDK-8359960](https://bugs.openjdk.org/browse/JDK-8359960), where I was chasing a regression, we are now in a much better place, focusing on the more stable GC cost.
# java -jar benchmarks.jar AllDead.gc --jvmArgsAppend "-XX:+UseZGC"
Mainline: 80.596 ± 3.801 ms/op
This PR:   6.057 ± 0.097 ms/op
-------------
PR Comment: https://git.openjdk.org/jdk/pull/26182#issuecomment-3048014191
PR Comment: https://git.openjdk.org/jdk/pull/26182#issuecomment-3048112617