[aarch64-port-dev ] AARCH64 optimization: using TBZ instruction for bit check

Boris Ulasevich boris.ulasevich at bell-sw.com
Fri Jun 19 16:49:43 UTC 2020


Hi Andrew,

I added the expression canonicalization in the BoolNode::Ideal method:
http://cr.openjdk.java.net/~bulasevich/8247408/webrev.02b
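
For readers of the archive, here is a minimal Java sketch (not part of the
webrev) of why the two forms discussed in the quoted message below can be
treated as the same test. It assumes the mask has exactly one bit set, as in
"(var&16) == 16". The class and method names are hypothetical and chosen only
for illustration; the actual C2 transformation is in the webrev.

```java
public class BitCheckEquivalence {
    // Form from the original request: compare the masked value with the mask.
    static boolean viaEquals(int var, int mask) {
        return (var & mask) == mask;
    }

    // Canonical form: compare the masked value with zero.
    static boolean viaNonZero(int var, int mask) {
        return (var & mask) != 0;
    }

    // True when exactly one bit of mask is set.
    static boolean isSingleBit(int mask) {
        return Integer.bitCount(mask) == 1;
    }

    public static void main(String[] args) {
        int mask = 16; // single-bit mask, as in the thread
        for (int var = -1024; var <= 1024; var++) {
            if (viaEquals(var, mask) != viaNonZero(var, mask)) {
                throw new AssertionError("forms differ for var=" + var);
            }
        }
        // With a multi-bit mask the two forms are not interchangeable:
        // 16 & 24 is 16, which is non-zero but not equal to 24.
        System.out.println(viaEquals(16, 24) + " " + viaNonZero(16, 24)); // prints "false true"
    }
}
```

The loop only spot-checks a range of values; the equivalence holds for any
int when the mask is a single bit, because the masked value can then only be
0 or the mask itself.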

The change reduces the number of generated machine instructions on all
ARM/x86/PPC architectures. The benchmark shows positive results on ARM64
and ARM32 with the given change.

On x86, benchmark performance improves by 1% to 13% depending on the
CPU generation, except on machines affected by the Intel JCC Erratum
(JDK-8234160). The maximum decrease observed is -11%. This does not look
like a problem with the proposed benchmark, though, but rather like an
issue with the Erratum mitigation.

On PowerPC, the result of the micro-benchmark is also positive. I changed
the micro-benchmark to make it a little bulkier so that we don't hit the
limitations of architectures with a less elaborate branch prediction
mechanism. The original application's performance does not change on
PowerPC.

thanks,
Boris

Cascade Lake    817.639 ±  15.806 ->  810.058 ±   3.128 ns/op
Whiskey Lake    751.560 ±  29.690 ->  751.390 ±  24.406 ns/op
Whiskey Lake*   803.742 ±  14.280 ->  746.670 ±   5.626 ns/op
Ivy Bridge     1021.523 ± 166.719 ->  903.092 ±  81.799 ns/op
Skylake         690.554 ±   4.839 ->  769.115 ±  18.775 ns/op --- this is the only case where we see a regression
Skylake*        734.354 ±   8.136 ->  712.512 ±  10.301 ns/op
ARM32         11760.804 ± 335.050 -> 7133.137 ±  17.058 ns/op
ARM64           896.789 ±   3.524 ->  758.096 ±   3.367 ns/op
PowerPC8       5313.218 ± 248.753 -> 1919.234 ± 605.326 ns/op
PowerPC9       6174.107 ±  26.885 -> 1435.108 ±  48.447 ns/op

* = -XX:-IntelJccErratumMitigation

On 15.06.2020 12:28, Andrew Haley wrote:
> On 12/06/2020 19:10, Boris Ulasevich wrote:
>> Please review the new AARCH64 instruction selection rules.
>> The change applies TBZ instruction for bit checks: "if ((var&16) == 16)".
>> This makes 17% performance improvement on the benchmark and 5% on a real
>> application.
> Please forgive me if I am misunderstanding, but...
>
> This is strange Java for anyone to write. The expression "((var&16) == 16)"
> is, I think, equivalent to "((var&16) != 0)". Do you believe that it
> is wise to add new patterns to do this to (potentially) every HotSpot
> back end rather than canonicalize the expression during the
> machine-independent part of C2? This would have the same improvement
> on all targets.
>


