Perf: SATB and WB coalescing

Aleksey Shipilev shade at redhat.com
Thu Jan 11 10:51:24 UTC 2018


On 01/10/2018 09:29 PM, Aleksey Shipilev wrote:
> Okay, so the dirty patch for the idea:
>   http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.00/
> 
> perfasm for the offending test:
>   http://cr.openjdk.java.net/~shade/shenandoah/single-flag/single-flag.perfasm
> 
>  *) Can we instruct compiler to trust the value of 0x3d8(%r15) until the next safepoint poll? I
> think that would eliminate excessive L1 accesses for that TLS field at expense of wasting a register
> -- which might be the lesser evil;

Hey, this one works with a dirty hack like this:
  http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/common-single-flag.patch

It now commons up the GC state loads (keeping the value in a register):
  http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/WB-SATB-commonTLS.perfasm
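
To illustrate what commoning means here, a minimal plain-Java sketch follows. It is
not the patch above and not how C2 implements it. The names gcState, loadGcState and
countLoads are hypothetical, the static field only stands in for the TLS flag at
0x3d8(%r15), and the barrier count of 4 is arbitrary.

```java
public class TlsCommoningSketch {
    // Hypothetical stand-in for the per-thread GC state flag.
    static int gcState = 0;
    // Counts simulated flag loads (a stand-in for L1-dcache-loads).
    static int flagLoads = 0;

    static int loadGcState() {
        flagLoads++;
        return gcState;
    }

    // Without commoning: every barrier re-reads the flag.
    static int uncommoned(int barriers) {
        int slowPaths = 0;
        for (int i = 0; i < barriers; i++) {
            if (loadGcState() != 0) slowPaths++;
        }
        return slowPaths;
    }

    // With commoning: the flag is read once and reused from a local,
    // which would live in a register in compiled code.
    static int commoned(int barriers) {
        int state = loadGcState();
        int slowPaths = 0;
        for (int i = 0; i < barriers; i++) {
            if (state != 0) slowPaths++;
        }
        return slowPaths;
    }

    // Returns how many flag loads one run performed.
    static int countLoads(boolean common, int barriers) {
        flagLoads = 0;
        if (common) commoned(barriers); else uncommoned(barriers);
        return flagLoads;
    }

    public static void main(String[] args) {
        System.out.println("uncommoned loads: " + countLoads(false, 4));
        System.out.println("commoned loads:   " + countLoads(true, 4));
    }
}
```

The reuse is only valid while the flag cannot change, which is why the question in
the quoted text ties it to the next safepoint poll.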

The commoning eliminates around 8 L1 reads per op, which recovers about 50% of the barrier overhead:

Benchmark                               Mode  Cnt   Score    Error  Units

# -WB -SATB
BarriersMultiple.test                   avgt   15   2.760 ±  0.081  ns/op
BarriersMultiple.test:L1-dcache-loads   avgt    3  13.121 ±  0.444   #/op
BarriersMultiple.test:L1-dcache-stores  avgt    3   8.089 ±  0.141   #/op
BarriersMultiple.test:branches          avgt    3   4.039 ±  0.220   #/op
BarriersMultiple.test:cycles            avgt    3  10.429 ±  2.041   #/op
BarriersMultiple.test:instructions      avgt    3  30.306 ±  2.414   #/op

# +WB +SATB
BarriersMultiple.test                   avgt   15   4.897 ±  0.003  ns/op
BarriersMultiple.test:L1-dcache-loads   avgt    3  28.195 ±  0.838   #/op
BarriersMultiple.test:L1-dcache-stores  avgt    3   8.102 ±  0.274   #/op
BarriersMultiple.test:branches          avgt    3  13.074 ±  0.344   #/op
BarriersMultiple.test:cycles            avgt    3  18.492 ±  2.365   #/op
BarriersMultiple.test:instructions      avgt    3  56.423 ±  1.681   #/op

# +WB +SATB +TLS commoning
BarriersMultiple.test                   avgt   15   3.884 ±  0.003  ns/op
BarriersMultiple.test:L1-dcache-loads   avgt    3  20.221 ±  0.602   #/op  // -8!
BarriersMultiple.test:L1-dcache-stores  avgt    3   8.093 ±  0.264   #/op
BarriersMultiple.test:branches          avgt    3  13.133 ±  0.395   #/op
BarriersMultiple.test:cycles            avgt    3  14.668 ±  0.771   #/op  // -4!
BarriersMultiple.test:instructions      avgt    3  58.636 ±  2.368   #/op
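
The "about 50%" figure can be checked from the scores above, ignoring the error
margins. The sketch below takes the share recovered as
(barriers - commoned) / (barriers - baseline), which comes out to roughly 47% for
both ns/op and cycles. The class and method names are made up for illustration.

```java
import java.util.Locale;

public class OverheadRecovered {
    // Share of the barrier overhead that commoning wins back:
    // (withBarriers - commoned) / (withBarriers - baseline).
    static double recovered(double baseline, double withBarriers, double commoned) {
        return (withBarriers - commoned) / (withBarriers - baseline);
    }

    static String percent(double share) {
        return String.format(Locale.ROOT, "%.1f%%", 100.0 * share);
    }

    public static void main(String[] args) {
        // Mean scores copied from the three runs above.
        System.out.println("ns/op:    " + percent(recovered(2.760, 4.897, 3.884)));
        System.out.println("cycles:   " + percent(recovered(10.429, 18.492, 14.668)));
        System.out.println("L1 loads: " + percent(recovered(13.121, 28.195, 20.221)));
    }
}
```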


Thanks,
-Aleksey
