Perf: SATB and WB coalescing

Roman Kennke rkennke at redhat.com
Thu Jan 11 11:19:50 UTC 2018


On 11.01.2018 at 11:51, Aleksey Shipilev wrote:
> On 01/10/2018 09:29 PM, Aleksey Shipilev wrote:
>> Okay, so the dirty patch for the idea:
>>    http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.00/
>>
>> perfasm for the offending test:
>>    http://cr.openjdk.java.net/~shade/shenandoah/single-flag/single-flag.perfasm
>>
>>   *) Can we instruct the compiler to trust the value of 0x3d8(%r15) until the next safepoint
>> poll? I think that would eliminate the excessive L1 accesses for that TLS field at the expense of
>> wasting a register -- which might be the lesser evil;
> 
> Hey, this one works with a dirty hack like this:
>    http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/common-single-flag.patch
> 
> It now commons the GC state loads (and keeps the value in a register):
>    http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/WB-SATB-commonTLS.perfasm
> 
> ...and this eliminates around 8 L1 reads, which recovers about 50% of the overhead:
> 
> Benchmark                               Mode  Cnt   Score    Error  Units
> 
> # -WB -SATB
> BarriersMultiple.test                   avgt   15   2.760 ±  0.081  ns/op
> BarriersMultiple.test:L1-dcache-loads   avgt    3  13.121 ±  0.444   #/op
> BarriersMultiple.test:L1-dcache-stores  avgt    3   8.089 ±  0.141   #/op
> BarriersMultiple.test:branches          avgt    3   4.039 ±  0.220   #/op
> BarriersMultiple.test:cycles            avgt    3  10.429 ±  2.041   #/op
> BarriersMultiple.test:instructions      avgt    3  30.306 ±  2.414   #/op
> 
> # +WB +SATB
> BarriersMultiple.test                   avgt   15   4.897 ±  0.003  ns/op
> BarriersMultiple.test:L1-dcache-loads   avgt    3  28.195 ±  0.838   #/op
> BarriersMultiple.test:L1-dcache-stores  avgt    3   8.102 ±  0.274   #/op
> BarriersMultiple.test:branches          avgt    3  13.074 ±  0.344   #/op
> BarriersMultiple.test:cycles            avgt    3  18.492 ±  2.365   #/op
> BarriersMultiple.test:instructions      avgt    3  56.423 ±  1.681   #/op
> 
> # +WB +SATB +TLS commoning
> BarriersMultiple.test                   avgt   15   3.884 ±  0.003  ns/op
> BarriersMultiple.test:L1-dcache-loads   avgt    3  20.221 ±  0.602   #/op  // -8!
> BarriersMultiple.test:L1-dcache-stores  avgt    3   8.093 ±  0.264   #/op
> BarriersMultiple.test:branches          avgt    3  13.133 ±  0.395   #/op
> BarriersMultiple.test:cycles            avgt    3  14.668 ±  0.771   #/op  // -4!
> BarriersMultiple.test:instructions      avgt    3  58.636 ±  2.368   #/op
> 
> 
> Thanks,
> -Aleksey
> 
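
For reference, the deltas marked in the quoted table can be recomputed
with a small sketch like the one below. The class and method names are
hypothetical and only exist for this illustration; the numbers are
copied from the table above.

```java
// Recomputes the overhead deltas from the quoted JMH table.
// BarrierOverhead and recoveredFraction are hypothetical names;
// only the numeric inputs come from the table.
final class BarrierOverhead {

    /** Fraction of the barrier overhead that an optimization removes. */
    static double recoveredFraction(double baseline, double withBarriers, double optimized) {
        double overhead = withBarriers - baseline; // cost added by the barriers
        double saved = withBarriers - optimized;   // cost removed again by commoning
        return saved / overhead;
    }

    public static void main(String[] args) {
        // Inputs, in order: -WB -SATB, +WB +SATB, +WB +SATB +TLS commoning
        System.out.printf("time overhead recovered:   %.1f%%%n",
                100 * recoveredFraction(2.760, 4.897, 3.884));
        System.out.printf("cycle overhead recovered:  %.1f%%%n",
                100 * recoveredFraction(10.429, 18.492, 14.668));
        System.out.printf("L1-dcache-loads saved/op:  %.3f%n", 28.195 - 20.221);
        System.out.printf("cycles saved/op:           %.3f%n", 18.492 - 14.668);
    }
}
```

With the table's numbers this works out to a bit under half of the
time overhead, which matches the "around 50%" estimate, and to just
under 8 loads and 4 cycles per operation.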

OK, this makes the load of the flag look like an access to immutable
memory, so it can now float freely above or below safepoints. We need
to ensure that this cannot happen, otherwise we'll see the wrong flag
state. But it seems to be step #1. Maybe restoring the control input
on the LoadUBNode is enough to keep it on the right side of safepoints?
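
To make the concern concrete, here is a toy model of the three
options. It is only a sketch, not HotSpot code: the class, the enum
and the trace format are made up for illustration, and it assumes the
flag only changes at safepoints. 'B' stands for a barrier that checks
the flag, 'S' for a safepoint at which the GC flips it.

```java
// Toy model (hypothetical, not HotSpot code): counts flag loads and
// stale flag observations for different load-commoning strategies.
final class FlagCommoning {

    enum Strategy { RELOAD_ALWAYS, COMMON_UNTIL_SAFEPOINT, TREAT_AS_IMMUTABLE }

    /** Loads issued and barriers that saw a stale flag value. */
    static final class Result {
        final int loads;
        final int staleReads;

        Result(int loads, int staleReads) {
            this.loads = loads;
            this.staleReads = staleReads;
        }
    }

    /** Runs a trace of 'B' (barrier) and 'S' (safepoint) events. */
    static Result run(String trace, Strategy strategy) {
        boolean flag = false;       // value in memory (the TLS GC state)
        boolean cached = false;     // value held in a register
        boolean cacheValid = false; // whether the register holds a loaded value
        int loads = 0;
        int stale = 0;
        for (char op : trace.toCharArray()) {
            if (op == 'S') {
                flag = !flag; // assumption: the GC flips the state at the safepoint
                if (strategy == Strategy.COMMON_UNTIL_SAFEPOINT) {
                    cacheValid = false; // the load is pinned below the safepoint
                }
            } else if (op == 'B') {
                if (strategy == Strategy.RELOAD_ALWAYS || !cacheValid) {
                    cached = flag; // actual memory load
                    cacheValid = true;
                    loads++;
                }
                if (cached != flag) {
                    stale++; // barrier acted on the wrong flag state
                }
            }
        }
        return new Result(loads, stale);
    }

    public static void main(String[] args) {
        for (Strategy s : Strategy.values()) {
            Result r = run("BBBBSBBBB", s);
            System.out.println(s + ": loads=" + r.loads + " stale=" + r.staleReads);
        }
    }
}
```

In this model, commoning until the next safepoint keeps the saving in
loads without ever reading a stale flag, while treating the flag as
immutable saves one more load but gets every barrier after the
safepoint wrong.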

Roman


