Vector API: blend() performance on AArch64
Ningsheng Jian
ningsheng.jian at arm.com
Tue Mar 9 05:59:14 UTC 2021
Hi Gunnar,
Thanks for trying the Vector API on AArch64. I see you were using the
IntVector.SPECIES_256 species in your benchmarks. However, on AArch64
NEON the maximum hardware vector size is 128 bits, so operations on a
256-bit species cannot be intrinsified to SIMD instructions directly;
they fall back to the Java implementation of those APIs (blend(), for
example). You can use the -XX:+PrintIntrinsics option (a diagnostic flag,
so it also needs -XX:+UnlockDiagnosticVMOptions) to see some details.
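To check which vector width a given machine actually prefers, a small program along these lines may help. It only uses the public SPECIES_PREFERRED, vectorBitSize() and length() API; the class name SpeciesCheck and the helper widerThanPreferred() are hypothetical, and treating "wider than the preferred species" as "likely not intrinsified" is a heuristic assumption, not a guarantee made by the API.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorSpecies;

// Requires the incubator module: javac/java --add-modules jdk.incubator.vector
public class SpeciesCheck {

    // Heuristic (assumption): a species wider than the platform's preferred
    // species does not fit in a single hardware register, so C2 is likely
    // to fall back to the Java implementation for operations on it.
    static boolean widerThanPreferred(VectorSpecies<Integer> species) {
        return species.vectorBitSize() > IntVector.SPECIES_PREFERRED.vectorBitSize();
    }

    public static void main(String[] args) {
        VectorSpecies<Integer> preferred = IntVector.SPECIES_PREFERRED;
        System.out.println("Preferred int species: " + preferred
                + " (" + preferred.vectorBitSize() + " bits, "
                + preferred.length() + " lanes)");
        System.out.println("SPECIES_256 wider than preferred: "
                + widerThanPreferred(IntVector.SPECIES_256));
    }
}
```

On a NEON-only machine such as Graviton2 or M1 the preferred int species should report 128 bits, so SPECIES_256 comes out as wider than preferred. Running the benchmark itself with --add-modules jdk.incubator.vector -XX:+UnlockDiagnosticVMOptions -XX:+PrintIntrinsics then shows which Vector API calls C2 did or did not intrinsify.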
For the benchmarks, I would suggest writing them in a more (performance-)
portable way, e.g. use IntVector.SPECIES_PREFERRED and do not assume the
actual vector length in the code logic.
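As an illustration of such a species-agnostic loop, here is a hypothetical blend()-based FizzBuzz sketch; it is not the implementation from the linked repository. The lane count is taken from SPECIES_PREFERRED, loopBound() limits the vectorized part, and a scalar loop handles the remainder, so no vector length is hard-coded. The class name, the negative-marker output encoding (-1/-2/-3 for Fizz/Buzz/FizzBuzz) and the remainder computation n - (n / d) * d are assumptions made for this example.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Requires the incubator module: javac/java --add-modules jdk.incubator.vector
public class PortableFizzBuzz {

    // Lane count comes from the platform (e.g. 4 ints with 128-bit NEON),
    // never from a hard-coded 256-bit assumption.
    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    // Hypothetical output encoding: negative markers, otherwise the number itself.
    static final int FIZZ = -1, BUZZ = -2, FIZZ_BUZZ = -3;

    static void fizzBuzz(int[] numbers, int[] out) {
        IntVector fizz = IntVector.broadcast(SPECIES, FIZZ);
        IntVector buzz = IntVector.broadcast(SPECIES, BUZZ);
        IntVector fizzBuzz = IntVector.broadcast(SPECIES, FIZZ_BUZZ);

        int i = 0;
        int upper = SPECIES.loopBound(numbers.length);
        for (; i < upper; i += SPECIES.length()) {
            IntVector v = IntVector.fromArray(SPECIES, numbers, i);
            // Lanewise remainder as n - (n / d) * d, then test for zero.
            VectorMask<Integer> by3 = v.sub(v.div(3).mul(3)).compare(VectorOperators.EQ, 0);
            VectorMask<Integer> by5 = v.sub(v.div(5).mul(5)).compare(VectorOperators.EQ, 0);
            // Later blends override earlier ones, so FizzBuzz wins where both masks are set.
            v.blend(fizz, by3)
             .blend(buzz, by5)
             .blend(fizzBuzz, by3.and(by5))
             .intoArray(out, i);
        }
        // Scalar tail for the elements that do not fill a whole vector.
        for (; i < numbers.length; i++) {
            int n = numbers[i];
            out[i] = n % 15 == 0 ? FIZZ_BUZZ : n % 5 == 0 ? BUZZ : n % 3 == 0 ? FIZZ : n;
        }
    }
}
```

The broadcast constants are created once outside the loop, so the loop body consists only of loads, arithmetic, compares, blends and a store, all on the preferred species.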
Thanks,
Ningsheng
On 3/9/21 4:25 AM, Gunnar Morling wrote:
> Hi,
>
> I was exploring the Vector API a bit [1] and noticed that the performance
> of my vectorized FizzBuzz implementation is pretty poor on AArch64. I first
> thought this may be specific to the Apple M1 chip on which I was running
> this; but numbers don't look better with Linux (AWS Graviton2, see the repo
> [2] for all numbers) either. My implementation is using the blend() API
> method, is this not (yet) supported on AArch64 perhaps?
>
> Thanks for any hints,
>
> --Gunnar
>
> [1] https://www.morling.dev/blog/fizzbuzz-simd-style/
> [2] https://github.com/gunnarmorling/simd-fizzbuzz
>