[aarch64-port-dev ] RFR(S): 8243597: AArch64: Add support for integer vector abs
Andrew Haley
aph at redhat.com
Fri Jun 5 15:01:48 UTC 2020
On 05/06/2020 11:35, Yang Zhang wrote:
> Hi Andrew
>
> Please check this Java program:
> http://cr.openjdk.java.net/~yzhang/8243597/TestAbs.java
> absvs is used to generate the AbsVS node.
> abss is used to generate the AbsI node.
>
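TestAbs.java is only linked here, so the following is a hypothetical sketch of the two kinds of code the description refers to. The first is a counted loop over a short[] that C2's superword (auto-vectorization) pass may turn into an AbsVS node. The second is a single scalar Math.abs call, which C2 intrinsifies into an AbsI node. The class and method names (AbsKernels, absVS, absS) are made up for illustration and are not taken from the linked test.

```java
// Hypothetical kernels: names and loop shapes are illustrative assumptions,
// not copied from TestAbs.java.
final class AbsKernels {
    private AbsKernels() {}

    // Element-wise abs over a short[] (dst must be at least as long as src).
    // A simple counted loop like this is what C2's superword pass looks for;
    // with the patch it may be compiled to an AbsVS node on AArch64.
    static void absVS(short[] src, short[] dst) {
        for (int i = 0; i < src.length; i++) {
            // Math.abs widens the short to int; the cast narrows it back.
            dst[i] = (short) Math.abs(src[i]);
        }
    }

    // A single scalar call: C2 intrinsifies Math.abs(int) as an AbsI node.
    static short absS(short x) {
        return (short) Math.abs(x);
    }
}
```

Note the edge case: Math.abs(Short.MIN_VALUE) is 32768, which does not fit in a short, so the narrowing cast wraps it back to -32768. Any vectorized form of the loop has to preserve that result.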
> I updated the JMH benchmarks to align them with absvs and abss above. The new results are as follows:
> New vector JMH benchmark:
> http://cr.openjdk.java.net/~yzhang/8243597/TestVectNew.java
> New scalar JMH benchmark:
> http://cr.openjdk.java.net/~yzhang/8243597/TestScalarNew.java
>
> Before:
> Benchmark                  (size)  Mode  Cnt     Score    Error  Units
> TestVectNew.testVectAbsVB    1024  avgt    5  1221.852 ±  3.336  us/op
> TestVectNew.testVectAbsVI    1024  avgt    5  1450.422 ±  6.344  us/op
> TestVectNew.testVectAbsVL    1024  avgt    5  1429.934 ±  4.901  us/op
> TestVectNew.testVectAbsVS    1024  avgt    5  1227.134 ±  2.901  us/op
> TestScalarNew.testAbsI       1024  avgt    5  3777.007 ± 10.067  us/op
> TestScalarNew.testAbsL       1024  avgt    5  3776.717 ± 13.776  us/op
> TestScalarNew.testAbsS       1024  avgt    5  3153.195 ± 10.175  us/op
>
> After:
> Benchmark                  (size)  Mode  Cnt     Score    Error  Units
> TestVectNew.testVectAbsVB    1024  avgt    5   147.389 ±  0.921  us/op
> TestVectNew.testVectAbsVI    1024  avgt    5   444.318 ± 14.107  us/op
> TestVectNew.testVectAbsVL    1024  avgt    5   874.074 ±  2.224  us/op
> TestVectNew.testVectAbsVS    1024  avgt    5   224.559 ±  0.902  us/op
> TestScalarNew.testAbsI       1024  avgt    5  3087.172 ± 62.372  us/op
> TestScalarNew.testAbsL       1024  avgt    5  3113.322 ± 10.237  us/op
> TestScalarNew.testAbsS       1024  avgt    5  2723.048 ±  8.338  us/op
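The gains in these tables can be summarized as before/after ratios of the avgt scores. The sketch below only divides the quoted numbers; AbsSpeedup is a hypothetical helper, and the ratios inherit the error bars of the underlying measurements.

```java
// Speedup = before score / after score, using the avgt scores (us/op,
// size = 1024) copied from the Before/After tables above.
final class AbsSpeedup {
    private AbsSpeedup() {}

    static double speedup(double beforeScore, double afterScore) {
        return beforeScore / afterScore;
    }

    public static void main(String[] args) {
        String[] names = {"testVectAbsVB", "testVectAbsVS", "testVectAbsVI", "testVectAbsVL",
                          "testAbsS", "testAbsI", "testAbsL"};
        double[] before = {1221.852, 1227.134, 1450.422, 1429.934, 3153.195, 3777.007, 3776.717};
        double[] after  = { 147.389,  224.559,  444.318,  874.074, 2723.048, 3087.172, 3113.322};
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-14s %5.2fx%n", names[i], speedup(before[i], after[i]));
        }
    }
}
```

By this arithmetic the vector kernels speed up by roughly 8.3x (byte), 5.5x (short), 3.3x (int) and 1.6x (long), while the scalar kernels gain only about 1.16x to 1.22x.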
I tried TestAbs with a ThunderX2, and it certainly looks nice: great
improvement across the board.
Before:
Benchmark      Mode  Cnt     Score    Error  Units
TestAbs.absvb  avgt    8   971.100 ±  1.544  ns/op
TestAbs.absvs  avgt    8   983.061 ±  1.626  ns/op
TestAbs.absvi  avgt    8  1170.826 ± 11.055  ns/op
TestAbs.absvl  avgt    8  1159.936 ±  3.747  ns/op

After:
Benchmark      Mode  Cnt     Score    Error  Units
TestAbs.absvb  avgt    8   117.981 ±  1.048  ns/op
TestAbs.absvs  avgt    8   174.949 ±  4.158  ns/op
TestAbs.absvi  avgt    8   352.012 ±  0.884  ns/op
TestAbs.absvl  avgt    8   702.076 ±  0.116  ns/op
OK, we're good to go. Thanks, approved.
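One way to read the ThunderX2 "after" numbers: a 128-bit AArch64 SIMD register holds 16 bytes, 8 shorts, 4 ints or 2 longs, so if the loop cost is dominated by vector instructions, the score multiplied by the lane count should stay roughly constant. The sketch below (hypothetical class LaneScaling) does that arithmetic on the quoted scores; it is an interpretation of the data, not a new measurement.

```java
// Scaling check on the ThunderX2 "after" scores (ns/op) quoted above.
// Assumption: 128-bit vectors, i.e. 128 / elementBits lanes per register.
final class LaneScaling {
    private LaneScaling() {}

    static final int VECTOR_BITS = 128;

    static int lanes(int elementBits) {
        return VECTOR_BITS / elementBits;
    }

    public static void main(String[] args) {
        String[] names = {"absvb", "absvs", "absvi", "absvl"};
        int[] bits = {Byte.SIZE, Short.SIZE, Integer.SIZE, Long.SIZE};
        double[] afterNs = {117.981, 174.949, 352.012, 702.076};
        for (int i = 0; i < names.length; i++) {
            int n = lanes(bits[i]);
            // If time per vector op is constant, score * lanes stays flat.
            System.out.printf("%s: %2d lanes, score * lanes = %7.1f%n",
                              names[i], n, afterNs[i] * n);
        }
    }
}
```

For these scores the product is about 1400 for absvs, absvi and absvl, but about 1888 for absvb. That suggests the short, int and long loops scale almost exactly with lane count, while the byte loop does not fully benefit from its 16 lanes.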
> The improvement for scalar abs is less pronounced than for vector abs because the scalar sequence is only one instruction shorter than before.
> Before:
> 0x0000ffff80b763d8: cmp w12, #0x0
> 0x0000ffff80b763dc: neg w11, w12
> 0x0000ffff80b763e0: csel w11, w11, w12, lt // lt = tstop
>
> After:
> 0x0000ffffa0bd7a38: cmp w12, wzr
> 0x0000ffffa0bd7a3c: cneg w13, w12, lt // lt = tstop
That's interesting, too: we don't have a cneg pattern, which I guess is
an omission.
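The two scalar sequences can be modeled in Java to check that they compute the same result. This is an illustrative model with hypothetical method names, not code produced by C2. After `cmp x, #0` the `lt` condition holds exactly when x is negative (subtracting zero cannot overflow), so both `csel` of a precomputed negation and `cneg` give Math.abs semantics, including the Integer.MIN_VALUE case where negation wraps.

```java
// Java model of the two AArch64 sequences quoted above (illustration only).
final class AbsSequences {
    private AbsSequences() {}

    // Before: cmp w12, #0x0 ; neg w11, w12 ; csel w11, w11, w12, lt
    static int absCmpNegCsel(int x) {
        boolean lt = x < 0;      // cmp against zero: lt means x is negative
        int negated = -x;        // neg is computed unconditionally
        return lt ? negated : x; // csel selects the negation when lt holds
    }

    // After: cmp w12, wzr ; cneg w13, w12, lt
    static int absCmpCneg(int x) {
        return (x < 0) ? -x : x; // cneg negates only when lt holds
    }
}
```

The only difference is the instruction count: cneg fuses the negate and the select, so each scalar abs costs one instruction fewer, which fits the modest scalar gains.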
> PS: The generated assembly files are also attached.
> Before this patch:
> http://cr.openjdk.java.net/~yzhang/8243597/TestAbs.java.aarch64.ori.asm
> After this patch:
> http://cr.openjdk.java.net/~yzhang/8243597/TestAbs.java.aarch64.asm
Great. Again, sorry for the slow response.
--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671