RFR: Common TLS access to GC state, where possible
Roman Kennke
rkennke at redhat.com
Mon Jan 15 13:25:07 UTC 2018
On 15.01.2018 at 13:23, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/c2-common-gc-state/webrev.01/
> (The initial version of this patch was drafted by Roland)
>
> This patch builds on the single GC state flag patch. This enables us to match that load at once, and
> common all the loads of GC state between safepoints, thus avoiding excess L1 cache accesses.
> This covers the cases where we cannot move the barriers themselves, and thus improves the
> worst-case scenario.
>
> It sure helps targeted back-to-back store benchmarks:
>
> Benchmark                                Mode  Cnt   Score   Error  Units
>
> # default
> BarriersMultiple.test                    avgt   15   5.935 ± 0.003  ns/op
> BarriersMultiple.test:L1-dcache-loads    avgt    3  35.420 ± 2.116   #/op
> BarriersMultiple.test:L1-dcache-stores   avgt    3   9.082 ± 0.603   #/op
> BarriersMultiple.test:branches           avgt    3  18.187 ± 1.005   #/op
> BarriersMultiple.test:cycles             avgt    3  22.401 ± 1.249   #/op
> BarriersMultiple.test:instructions       avgt    3  83.810 ± 4.297   #/op
>
> # -XX:+ShenandoahCommonGCStateLoads
> BarriersMultiple.test                    avgt   15   5.392 ± 0.116  ns/op
> BarriersMultiple.test:L1-dcache-loads    avgt    3  26.302 ± 0.456   #/op  // -9!
> BarriersMultiple.test:L1-dcache-stores   avgt    3   9.078 ± 1.174   #/op
> BarriersMultiple.test:branches           avgt    3  18.218 ± 0.092   #/op
> BarriersMultiple.test:cycles             avgt    3  20.368 ± 3.023   #/op  // -2
> BarriersMultiple.test:instructions       avgt    3  86.984 ± 1.127   #/op
>
> ...but it comes with a caveat: the increased register pressure (?) seems to penalize some of the
> bigger workloads. To avoid bitrot, and to get the matchers for GC state loads into our codebase, I
> propose pushing this under a disabled experimental flag. A new test validates that the feature is not
> completely broken.
>
> Testing: hotspot_gc_shenandoah
>
> Thanks,
> -Aleksey
>
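To make the commoning idea concrete, here is a toy Java model of it. This is only a sketch under assumptions: it is not HotSpot or C2 code, and every name in it (GcStateCommoning, GcState, MARKING, barrier) is made up for illustration. It models back-to-back stores whose barriers each read the GC state, next to a variant that reads the state once and reuses it, which is only valid because no safepoint sits between the stores.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only; none of these names exist in HotSpot or Shenandoah.
final class GcStateCommoning {
    static final int MARKING = 1; // assumed bit in the single GC state flag

    // Stand-in for the thread-local GC state; counts how often it is read.
    static final class GcState {
        private final int flags;
        int loads;
        GcState(int flags) { this.flags = flags; }
        int load() { loads++; return flags; }
    }

    static final class Node { Object a, b, c; }

    // Stand-in for barrier work: remember the overwritten value while marking.
    static void barrier(int state, Object previous, List<Object> queue) {
        if ((state & MARKING) != 0 && previous != null) queue.add(previous);
    }

    // Default shape: every store barrier re-reads the GC state.
    static void storeReloading(GcState gc, Node n, Object x, List<Object> queue) {
        barrier(gc.load(), n.a, queue); n.a = x;
        barrier(gc.load(), n.b, queue); n.b = x;
        barrier(gc.load(), n.c, queue); n.c = x;
    }

    // Commoned shape: one read, reused by all three barriers.
    static void storeCommoned(GcState gc, Node n, Object x, List<Object> queue) {
        int state = gc.load();
        barrier(state, n.a, queue); n.a = x;
        barrier(state, n.b, queue); n.b = x;
        barrier(state, n.c, queue); n.c = x;
    }
}
```

In this model the reloading variant reads the state three times and the commoned variant once, with identical stores and queue contents. The counts describe the toy model only, not the benchmark numbers quoted above.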
Also, I am not sure whether the patch already does this: what about also
moving up the actual tests, and thus creating longer paths with and
without barriers? I suspect that would be slightly trickier now because
of the different masks that need to be checked. It might not be very
useful with the default heuristics, because we tend to interleave
different barriers (SATB vs. evac), but it may be tremendously useful for
traversal GC, where we have only one phase and can thus group all the
barriers into one path (enqueue, WBs, *hopefully* even RBs and acmp
barriers) and remain barrier-free in the other.
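
Roughly what I have in mind, as a toy Java model rather than anything resembling the C2 code. The bit values and all names (BarrierPathSplit, MARKING, EVACUATION, interleaved, split) are assumptions for illustration. The point is the single hoisted test against the union of the masks, which selects either a barrier-free path or a path that carries all the barriers:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only; the bit values and names are assumptions.
final class BarrierPathSplit {
    static final int MARKING = 1;    // assumed bit guarding the SATB barrier
    static final int EVACUATION = 2; // assumed bit guarding the evac barrier

    static final class Node { Object a, b; }

    // Interleaved shape: each barrier tests its own mask next to its store.
    static void interleaved(int state, Node n, Object x, Object y, List<String> log) {
        if ((state & MARKING) != 0) log.add("satb a");
        n.a = x;
        if ((state & EVACUATION) != 0) log.add("evac b");
        n.b = y;
    }

    // Split shape: one hoisted test against the union of the masks.
    static void split(int state, Node n, Object x, Object y, List<String> log) {
        if ((state & (MARKING | EVACUATION)) == 0) {
            n.a = x; // barrier-free path: plain stores only
            n.b = y;
        } else {
            // Barrier path: with different masks, each barrier still
            // has to check its own bit here.
            interleaved(state, n, x, y, log);
        }
    }
}
```

With two different masks the barrier path still has to test each bit, which is the trickier part. With a single phase, as in traversal GC, the union collapses to one mask and the barrier path could run all its barriers unconditionally.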
Roman