Perf: SATB and WB coalescing
Roman Kennke
rkennke at redhat.com
Thu Jan 11 11:19:50 UTC 2018
On 11.01.2018 at 11:51, Aleksey Shipilev wrote:
> On 01/10/2018 09:29 PM, Aleksey Shipilev wrote:
>> Okay, so the dirty patch for the idea:
>> http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.00/
>>
>> perfasm for the offending test:
>> http://cr.openjdk.java.net/~shade/shenandoah/single-flag/single-flag.perfasm
>>
>> *) Can we instruct the compiler to trust the value of 0x3d8(%r15) until the next safepoint poll? I
>> think that would eliminate excessive L1 accesses for that TLS field at the expense of wasting a register
>> -- which might be the lesser evil;
>
> Hey, this one works with a dirty hack like this:
> http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/common-single-flag.patch
>
> It now commons the GC state loads (and keeps the value in a register):
> http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/WB-SATB-commonTLS.perfasm
>
> ...and this eliminates around 8 L1 reads, which recovers about 50% of the overhead:
>
> Benchmark                                Mode  Cnt   Score    Error  Units
>
> # -WB -SATB
> BarriersMultiple.test                    avgt   15   2.760 ±  0.081  ns/op
> BarriersMultiple.test:L1-dcache-loads    avgt    3  13.121 ±  0.444   #/op
> BarriersMultiple.test:L1-dcache-stores   avgt    3   8.089 ±  0.141   #/op
> BarriersMultiple.test:branches           avgt    3   4.039 ±  0.220   #/op
> BarriersMultiple.test:cycles             avgt    3  10.429 ±  2.041   #/op
> BarriersMultiple.test:instructions       avgt    3  30.306 ±  2.414   #/op
>
> # +WB +SATB
> BarriersMultiple.test                    avgt   15   4.897 ±  0.003  ns/op
> BarriersMultiple.test:L1-dcache-loads    avgt    3  28.195 ±  0.838   #/op
> BarriersMultiple.test:L1-dcache-stores   avgt    3   8.102 ±  0.274   #/op
> BarriersMultiple.test:branches           avgt    3  13.074 ±  0.344   #/op
> BarriersMultiple.test:cycles             avgt    3  18.492 ±  2.365   #/op
> BarriersMultiple.test:instructions       avgt    3  56.423 ±  1.681   #/op
>
> # +WB +SATB +TLS commoning
> BarriersMultiple.test                    avgt   15   3.884 ±  0.003  ns/op
> BarriersMultiple.test:L1-dcache-loads    avgt    3  20.221 ±  0.602   #/op  // -8!
> BarriersMultiple.test:L1-dcache-stores   avgt    3   8.093 ±  0.264   #/op
> BarriersMultiple.test:branches           avgt    3  13.133 ±  0.395   #/op
> BarriersMultiple.test:cycles             avgt    3  14.668 ±  0.771   #/op  // -4!
> BarriersMultiple.test:instructions       avgt    3  58.636 ±  2.368   #/op
>
>
> Thanks,
> -Aleksey
>
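As a quick sanity check of the "around 50%" figure, the recovered share of the barrier overhead can be recomputed from the quoted scores. This is only a sketch: the class and method names are made up for illustration, and the only inputs are the mean values from the table above (error bars are ignored).

```java
// Hypothetical helper, not part of any benchmark harness: recomputes how much
// of the barrier overhead the TLS commoning recovers, from the quoted means.
final class OverheadCheck {
    // Fraction of the (+WB +SATB) overhead over the baseline that is recovered
    // by the commoned variant.
    static double recoveredFraction(double baseline, double barriers, double commoned) {
        return (barriers - commoned) / (barriers - baseline);
    }

    public static void main(String[] args) {
        // ns/op: -WB -SATB, +WB +SATB, +WB +SATB +TLS commoning
        double time = recoveredFraction(2.760, 4.897, 3.884);
        // cycles per op, same three configurations
        double cycles = recoveredFraction(10.429, 18.492, 14.668);
        // absolute drop in L1-dcache-loads per op
        double l1LoadsSaved = 28.195 - 20.221;

        System.out.printf("time overhead recovered:   %.1f%%%n", time * 100);
        System.out.printf("cycle overhead recovered:  %.1f%%%n", cycles * 100);
        System.out.printf("L1 loads saved per op:     %.3f%n", l1LoadsSaved);
    }
}
```

Both the time and the cycle counts come out a little under one half, which matches the rough 50% claim.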
OK, this makes the load of the flag appear to access immutable memory, so
it can now float freely above or below safepoints. We need to ensure that
this cannot happen, otherwise we'll see the wrong flag state. But it seems
to be step #1. Maybe restoring the control input of the LoadUBNode is
enough to keep it on the right side of safepoints?
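To make the concern concrete, here is a toy model in plain Java. It is not HotSpot code: `gcState`, `safepointPoll` and both methods are hypothetical stand-ins, and the "GC" is simulated by flipping the flag inside the poll. It only shows the difference between trusting one load across every poll and reloading once per poll-to-poll region.

```java
// Toy model only; none of these names exist in HotSpot.
final class SafepointFlagSketch {
    // Stand-in for the per-thread GC state byte that the barriers test.
    static int gcState;

    // Stand-in for a safepoint poll: the simulated GC may change the state
    // here, and only here.
    static void safepointPoll(int iteration, int flipAt) {
        if (iteration == flipAt) {
            gcState = 1;
        }
    }

    // Unsafe variant: the flag is loaded once and trusted across all polls,
    // as if the load had floated above the safepoints.
    static int barriersTakenHoisted(int iterations, int flipAt) {
        gcState = 0;
        int taken = 0;
        int cached = gcState;
        for (int i = 0; i < iterations; i++) {
            safepointPoll(i, flipAt);
            if (cached != 0) taken++; // first barrier check
            if (cached != 0) taken++; // second barrier check
        }
        return taken;
    }

    // Intended variant: reload after each poll, reuse the value between polls.
    static int barriersTakenReloaded(int iterations, int flipAt) {
        gcState = 0;
        int taken = 0;
        for (int i = 0; i < iterations; i++) {
            safepointPoll(i, flipAt);
            int cached = gcState;     // one load per poll-to-poll region
            if (cached != 0) taken++; // first barrier check
            if (cached != 0) taken++; // second check reuses the cached value
        }
        return taken;
    }

    public static void main(String[] args) {
        System.out.println("hoisted:  " + barriersTakenHoisted(10, 4));
        System.out.println("reloaded: " + barriersTakenReloaded(10, 4));
    }
}
```

With 10 iterations and the flag flipping at iteration 4, the hoisted variant never takes a barrier, while the reloaded variant takes both barriers in each of the last 6 iterations. The second variant still saves one load per region, which is the effect the patch is after.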
Roman