Perf: SATB and WB coalescing
Aleksey Shipilev
shade at redhat.com
Wed Jan 10 09:45:26 UTC 2018
If you do a few back-to-back reference stores, like this:
http://icedtea.classpath.org/hg/gc-bench/file/6ec38e1bea7a/src/main/java/org/openjdk/gcbench/wip/BarriersMultiple.java
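If you don't have the linked source handy, the pattern looks roughly like this. It is a hypothetical plain-Java sketch with made-up class and field names, not the contents of BarriersMultiple.java. Several reference fields of one object are written back to back, so the compiled store sequence carries a GC barrier for every store.

```java
// Hypothetical illustration of back-to-back reference stores. This is NOT
// the linked BarriersMultiple.java (a JMH benchmark); names are made up.
public class BackToBackStores {
    // A holder object with several reference fields.
    static final class Holder {
        Object f1, f2, f3, f4;
    }

    final Holder holder = new Holder();
    final Object value = new Object();

    // Four reference stores into the same object, one after another.
    // With a barrier-based collector, every store carries a GC barrier in the
    // compiled code. Whether those barriers can be merged is the question here.
    void test() {
        Holder h = holder;
        h.f1 = value;
        h.f2 = value;
        h.f3 = value;
        h.f4 = value;
    }

    public static void main(String[] args) {
        BackToBackStores b = new BackToBackStores();
        b.test();
        System.out.println(b.holder.f1 == b.value && b.holder.f4 == b.value); // true
    }
}
```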
Then you will find that WB coalescing breaks because of the SATB barriers in between. See:
*) No WB, no SATB -> back-to-back stores:
http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/noWB-noSATB.perfasm
*) WB, but no SATB -> initial evac-in-progress check, then back-to-back stores with RBs:
http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/WB-noSATB.perfasm
*) WB with SATB -> interleaved evac-in-progress and conc-mark-in-progress checks:
http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/WB-SATB.perfasm
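The three shapes can be modeled in plain Java, purely to count flag checks. This is a hypothetical sketch. The flag variables and method names are stand-ins, and the WB+SATB case assumes exactly one mark check and one evac check per store, which simplifies what the perfasm dumps show.

```java
// Hypothetical model of the three barrier shapes in the perfasm dumps.
// It emits no machine code; it only counts how often the compiled store
// sequence would test a GC phase flag.
public class BarrierShapes {
    static boolean evacInProgress = false;     // stand-in for the evac-in-progress flag
    static boolean concMarkInProgress = false; // stand-in for the conc-mark-in-progress flag
    static int flagChecks = 0;

    static boolean evacCheck() { flagChecks++; return evacInProgress; }
    static boolean markCheck() { flagChecks++; return concMarkInProgress; }

    // noWB-noSATB: plain back-to-back stores, no checks at all.
    static void noWbNoSatb(int stores) {
        for (int i = 0; i < stores; i++) {
            // store only
        }
    }

    // WB-noSATB: all stores target the same object, so one evac check up
    // front covers the whole sequence (the WBs are coalesced).
    static void wbNoSatb(int stores) {
        if (evacCheck()) {
            // slow path: evacuate/resolve the target object once
        }
        for (int i = 0; i < stores; i++) {
            // store (with an RB in the real code, per the perfasm)
        }
    }

    // WB-SATB: the SATB pre-barrier sits between stores, so each store gets
    // its own mark check and its own, no longer coalesced, evac check.
    static void wbSatb(int stores) {
        for (int i = 0; i < stores; i++) {
            if (markCheck()) {
                // slow path: enqueue the field's previous value
            }
            if (evacCheck()) {
                // slow path: evacuate/resolve the target object
            }
            // store
        }
    }

    // Runs one shape and returns the number of flag checks it performed.
    static int count(Runnable r) {
        flagChecks = 0;
        r.run();
        return flagChecks;
    }

    public static void main(String[] args) {
        System.out.println(count(() -> noWbNoSatb(4))); // 0
        System.out.println(count(() -> wbNoSatb(4)));   // 1
        System.out.println(count(() -> wbSatb(4)));     // 8
    }
}
```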
It seems the non-coalesced SATB barriers alone are the culprit, and the broken WB coalescing is a
second-order effect (the // annotations give deltas against Base):
Benchmark                               Mode  Cnt   Score   Error  Units
# Base
BarriersMultiple.test                   avgt   15   2.739 ± 0.003  ns/op
BarriersMultiple.test:L1-dcache-loads   avgt    3  13.128 ± 0.475   #/op
BarriersMultiple.test:L1-dcache-stores  avgt    3   8.103 ± 0.133   #/op
BarriersMultiple.test:branches          avgt    3   4.039 ± 0.213   #/op
BarriersMultiple.test:cycles            avgt    3  10.344 ± 0.413   #/op
BarriersMultiple.test:instructions      avgt    3  30.273 ± 1.280   #/op
# +WB
BarriersMultiple.test                   avgt   15   3.459 ± 0.011  ns/op
BarriersMultiple.test:L1-dcache-loads   avgt    3  19.195 ± 0.638   #/op  // +6
BarriersMultiple.test:L1-dcache-stores  avgt    3   8.080 ± 0.539   #/op
BarriersMultiple.test:branches          avgt    3   4.045 ± 0.118   #/op
BarriersMultiple.test:cycles            avgt    3  13.031 ± 0.324   #/op  // +3
BarriersMultiple.test:instructions      avgt    3  40.426 ± 1.133   #/op
# +SATB
BarriersMultiple.test                   avgt   15   3.620 ± 0.005  ns/op
BarriersMultiple.test:L1-dcache-loads   avgt    3  18.148 ± 0.519   #/op  // +5
BarriersMultiple.test:L1-dcache-stores  avgt    3   8.065 ± 0.409   #/op
BarriersMultiple.test:branches          avgt    3  13.115 ± 0.423   #/op
BarriersMultiple.test:cycles            avgt    3  13.628 ± 0.471   #/op  // +3.5
BarriersMultiple.test:instructions      avgt    3  49.421 ± 1.880   #/op
# +SATB +WB
BarriersMultiple.test                   avgt   15   4.923 ± 0.040  ns/op
BarriersMultiple.test:L1-dcache-loads   avgt    3  28.269 ± 1.519   #/op  // +15 (should be +11)
BarriersMultiple.test:L1-dcache-stores  avgt    3   8.112 ± 1.161   #/op
BarriersMultiple.test:branches          avgt    3  13.134 ± 1.134   #/op
BarriersMultiple.test:cycles            avgt    3  18.561 ± 1.198   #/op  // +8 (should be +6.5)
BarriersMultiple.test:instructions      avgt    3  56.577 ± 4.024   #/op
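The "should be" figures assume the WB and SATB costs simply add up. Under that assumption, the +SATB +WB score would equal Base plus both individual deltas. The small calculation below (a sketch with a made-up class name) applies that check to the Score column. The table annotations are rounded, so the exact excess differs slightly from what they imply.

```java
// Measures how far the combined +SATB +WB cost departs from the sum of the
// separate +WB and +SATB costs, using Score values from the table.
public class BarrierAdditivity {
    // Returns (both - base) - [(wb - base) + (satb - base)].
    static double excess(double base, double wb, double satb, double both) {
        double expected = (wb - base) + (satb - base);
        double measured = both - base;
        return measured - expected;
    }

    public static void main(String[] args) {
        System.out.printf("ns/op excess:    %.3f%n", excess(2.739, 3.459, 3.620, 4.923));     // 0.583
        System.out.printf("cycles excess:   %.3f%n", excess(10.344, 13.031, 13.628, 18.561)); // 2.246
        System.out.printf("L1 loads excess: %.3f%n", excess(13.128, 19.195, 18.148, 28.269)); // 4.054
    }
}
```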
I wonder if that means we need to go forward with tracking the GC state in a single flag, polling
it with different masks, and then coalescing the paths when the masks are similar?
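One way to picture that idea is a Java model with a single state word. This is a hypothetical sketch. The bit values and names are made up and are not Shenandoah's encoding. It also assumes the state cannot change between the stores (no safepoint in the sequence). The model contrasts polling the word once per mask per store with reading it once and testing the union of masks.

```java
// Hypothetical sketch: one GC state word polled with different masks.
// Bit names and values are illustrative, not Shenandoah's actual encoding.
public class GcStateMasks {
    static final int MARKING    = 1;      // concurrent mark in progress (SATB active)
    static final int EVACUATION = 1 << 1; // evacuation in progress (WB active)

    static int gcState = 0;    // the single word all barriers poll
    static int stateLoads = 0; // how many times barrier code read gcState
    static int slowPaths = 0;  // how many barrier slow paths were taken

    static int loadState() {
        stateLoads++;
        return gcState;
    }

    // Uncoalesced: every store polls the word twice, once per mask,
    // mirroring the interleaved SATB and WB checks.
    static void interleaved(int stores) {
        for (int i = 0; i < stores; i++) {
            if ((loadState() & MARKING) != 0) slowPaths++;    // SATB: enqueue old value
            if ((loadState() & EVACUATION) != 0) slowPaths++; // WB: resolve target
            // ... reference store here (not modeled)
        }
    }

    // Coalesced: read the word once and test the union of masks first.
    // Assumes no safepoint between the stores, so the state cannot change.
    static void coalesced(int stores) {
        int s = loadState();
        if ((s & (MARKING | EVACUATION)) == 0) {
            return; // fast path: no barrier active; the stores are not modeled
        }
        if ((s & EVACUATION) != 0) slowPaths++; // one WB for the shared target
        for (int i = 0; i < stores; i++) {
            if ((s & MARKING) != 0) slowPaths++; // SATB still needs each old value
            // ... reference store here (not modeled)
        }
    }

    public static void main(String[] args) {
        interleaved(4);
        System.out.println("interleaved loads: " + stateLoads); // 8
        stateLoads = 0;
        coalesced(4);
        System.out.println("coalesced loads: " + stateLoads);   // 1
    }
}
```

With the state at zero, the interleaved form reads the word eight times for four stores, while the coalesced form reads it once. When marking is active, the SATB slow path still runs once per store, because each overwritten field has its own previous value.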
Thanks,
-Aleksey
More information about the shenandoah-dev mailing list