Vector API: blend() performance on AArch64

Tue Mar 9 05:59:14 UTC 2021

Hi Gunnar,

Thanks for trying the Vector API on AArch64. I see you were using the
IntVector.SPECIES_256 species in your benchmarks. However, on AArch64
NEON the maximum hardware vector size is 128 bits. For 256-bit vectors
we therefore cannot intrinsify the operations into SIMD instructions, so
those APIs, blend() for example, fall back to their Java implementation.
You can use the -XX:+PrintIntrinsics option to see some details.
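
To double-check which vector shapes are involved, a small sketch like
the following prints the bit size and lane count of both species. It
assumes a JDK where the incubating module is enabled with
--add-modules jdk.incubator.vector; the class name SpeciesInfo and the
describe() helper are made up for illustration.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical helper class, not part of the Vector API.
class SpeciesInfo {
    // Formats a species as "<bits> bits, <lanes> int lanes".
    static String describe(VectorSpecies<Integer> species) {
        return species.vectorBitSize() + " bits, " + species.length() + " int lanes";
    }

    public static void main(String[] args) {
        // The fixed-size species always reports 256 bits, whatever the hardware offers.
        System.out.println("SPECIES_256:       " + describe(IntVector.SPECIES_256));
        // The preferred species reflects what the running platform supports best.
        System.out.println("SPECIES_PREFERRED: " + describe(IntVector.SPECIES_PREFERRED));
    }
}
```

If the preferred species reports fewer bits than SPECIES_256, as I
would expect on a NEON-only machine, the 256-bit code path is the one
that is likely not being intrinsified.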

For the benchmarks, I would suggest writing the code in a more
(performance-)portable way, e.g. use IntVector.SPECIES_PREFERRED and do
not assume the actual vector length in the code logic.
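
As a rough illustration of that style, here is a minimal sketch of a
length-agnostic loop that still uses blend(). It is not taken from your
benchmark: the class PortableBlend and the replaceNegatives() task are
invented for the example, and it assumes the incubating module is
enabled with --add-modules jdk.incubator.vector.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class, not part of the Vector API or the benchmark.
class PortableBlend {
    // Let the runtime pick the widest shape the hardware supports.
    private static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    // Returns a copy of `values` in which every negative element is
    // replaced by `replacement`.
    static int[] replaceNegatives(int[] values, int replacement) {
        int[] result = new int[values.length];
        // Largest multiple of the lane count that fits into the array.
        int upperBound = SPECIES.loopBound(values.length);
        int i = 0;
        // Step by the species length instead of a hard-coded lane count.
        for (; i < upperBound; i += SPECIES.length()) {
            IntVector v = IntVector.fromArray(SPECIES, values, i);
            VectorMask<Integer> negative = v.compare(VectorOperators.LT, 0);
            // blend() takes `replacement` in the lanes where the mask is set.
            v.blend(replacement, negative).intoArray(result, i);
        }
        // Scalar tail for the elements that do not fill a whole vector.
        for (; i < values.length; i++) {
            result[i] = values[i] < 0 ? replacement : values[i];
        }
        return result;
    }
}
```

Because the loop bound and the stride both come from the species, the
same code should work unchanged whether the preferred shape turns out
to be 128, 256 or 512 bits wide.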

Thanks,
Ningsheng

On 3/9/21 4:25 AM, Gunnar Morling wrote:
> Hi,
> 
> I was exploring the Vector API a bit [1] and noticed that the performance
> of my vectorized FizzBuzz implementation is pretty poor on AArch64. I first
> thought this may be specific to the Apple M1 chip on which I was running
> this; but numbers don't look better with Linux (AWS Graviton2, see the repo
> [2] for all numbers) either. My implementation is using the blend() API
> method, is this not (yet) supported on AArch64 perhaps?
> 
> Thanks for any hints,
> 
> --Gunnar
> 
> [1] https://www.morling.dev/blog/fizzbuzz-simd-style/
> [2] https://github.com/gunnarmorling/simd-fizzbuzz
>