[aarch64-port-dev ] AARCH64 optimization: using TBZ instruction for bit check
Boris Ulasevich
boris.ulasevich at bell-sw.com
Fri Jun 19 16:49:43 UTC 2020
Hi Andrew,
I added the expression canonicalization in the BoolNode::Ideal method:
http://cr.openjdk.java.net/~bulasevich/8247408/webrev.02b
The change reduces the number of generated machine instructions on all
ARM/x86/PPC architectures. The benchmark shows positive results on ARM64
and ARM32 with the given change.
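For illustration only, here is a minimal Java sketch of the identity that
such a canonicalization relies on: for a single-bit mask, "(var & mask) == mask"
and "(var & mask) != 0" always give the same answer. The class and method
names are hypothetical and are not taken from the webrev; the sketch shows
the source-level equivalence, not the C2 transformation itself.

```java
// Hypothetical demo class; names are illustrative and not taken from the webrev.
final class BitCheckDemo {
    // The form from the original report: compare the masked value to the mask.
    static boolean checkEqMask(int var, int mask) {
        return (var & mask) == mask;
    }

    // The canonical form: compare the masked value to zero.
    static boolean checkNeZero(int var, int mask) {
        return (var & mask) != 0;
    }

    // Returns true if both forms agree on 'var' for every single-bit int mask.
    static boolean agreeOnSingleBitMasks(int var) {
        for (int bit = 0; bit < 32; bit++) {
            int mask = 1 << bit;
            if (checkEqMask(var, mask) != checkNeZero(var, mask)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 16, 17, -1, Integer.MIN_VALUE, 0x55555555};
        for (int v : samples) {
            System.out.println(v + " -> " + agreeOnSingleBitMasks(v));
        }
        // With a multi-bit mask the two forms are no longer interchangeable.
        System.out.println(checkEqMask(1, 3) + " vs " + checkNeZero(1, 3));
    }
}
```

Note that the equivalence holds only when the mask has exactly one bit set;
with a mask such as 3 the two forms differ, so a rewrite of this kind has to
be restricted to power-of-two constants.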
On x86, benchmark performance improves by +1% to +13% depending on the
CPU generation, except on machines affected by the Intel Erratum
(JDK-8234160) issue. The maximum decrease observed is -11%. This does not
look like a problem with the proposed benchmark, though, but rather like
an issue with the Erratum mitigation.
On PowerPC the result of the micro-benchmark is also positive. I changed
the micro-benchmark to make it a little bulkier so that we don't hit the
limitations of architectures with a less elaborate branch prediction
mechanism. The original application's performance does not change on
PowerPC.
thanks,
Boris
Cascade Lake     817.639 ±  15.806 ->  810.058 ±   3.128 ns/op
Whiskey Lake     751.560 ±  29.690 ->  751.390 ±  24.406 ns/op
Whiskey Lake*    803.742 ±  14.280 ->  746.670 ±   5.626 ns/op
Ivy Bridge      1021.523 ± 166.719 ->  903.092 ±  81.799 ns/op
Skylake          690.554 ±   4.839 ->  769.115 ±  18.775 ns/op --- this is the only case where we see a regression
Skylake*         734.354 ±   8.136 ->  712.512 ±  10.301 ns/op
ARM32          11760.804 ± 335.050 -> 7133.137 ±  17.058 ns/op
ARM64            896.789 ±   3.524 ->  758.096 ±   3.367 ns/op
PowerPC8        5313.218 ± 248.753 -> 1919.234 ± 605.326 ns/op
PowerPC9        6174.107 ±  26.885 -> 1435.108 ±  48.447 ns/op
* = -XX:-IntelJccErratumMitigation
On 15.06.2020 12:28, Andrew Haley wrote:
> On 12/06/2020 19:10, Boris Ulasevich wrote:
>> Please review the new AARCH64 instruction selection rules.
>> The change applies TBZ instruction for bit checks: "if ((var&16) == 16)".
>> This makes 17% performance improvement on the benchmark and 5% on a real
>> application.
> Please forgive me if I am misunderstanding, but...
>
> This is strange Java for anyone to write. The expression "((var&16) == 16)"
> is, I think, equivalent to "((var&16) != 0)". Do you believe that it
> is wise to add new patterns to do this to (potentially) every HotSpot
> back end rather than canonicalize the expression during the
> machine-independent part of C2? This would have the same improvement
> on all targets.
>