Vector API: blend() performance on AArch64

Tue Mar 9 08:08:06 UTC 2021

Paul, Ningsheng,

> you can use JMH with profiling, on the Mac using the command line "-prof
> dtraceasm"

Ah, yes, good point. I had shied away from this so far, as it requires
system integrity protection to be disabled, but I'll look into it.

> However, on AArch64 NEON, the max hardware vector size is 128 bits

I see, that makes sense.
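
To make the lane-count point concrete, here is a minimal plain-Java sketch of
what blend() does per lane: a.blend(b, mask) takes the lane from b where the
mask is set and keeps the lane from a otherwise. This is only a scalar model,
not the JDK's actual fallback code, and the names BlendModel, broadcast and
fizzBuzz as well as the -1/-2/-3 result encoding are hypothetical, chosen for
illustration. The lane count is a parameter here; with the real API it would
presumably come from IntVector.SPECIES_PREFERRED.length() rather than being
fixed at 8.

```java
import java.util.Arrays;

// Plain-Java model of a lane-wise blend; not the JDK's actual fallback code.
final class BlendModel {

    // Hypothetical result encoding, assumed for this sketch only.
    static final int FIZZ = -1, BUZZ = -2, FIZZ_BUZZ = -3;

    // Per lane: take b[i] where mask[i] is set, otherwise keep a[i].
    static int[] blend(int[] a, int[] b, boolean[] mask) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = mask[i] ? b[i] : a[i];
        }
        return r;
    }

    // Array of n lanes that all hold the same value.
    static int[] broadcast(int n, int value) {
        int[] r = new int[n];
        Arrays.fill(r, value);
        return r;
    }

    // Processes `values` in chunks of `lanes` elements. The lane count is a
    // parameter, so nothing here depends on a fixed width such as 8 ints.
    static int[] fizzBuzz(int[] values, int lanes) {
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i += lanes) {
            int n = Math.min(lanes, values.length - i); // last chunk may be short
            int[] chunk = Arrays.copyOfRange(values, i, i + n);
            boolean[] by3 = new boolean[n];
            boolean[] by5 = new boolean[n];
            boolean[] by15 = new boolean[n];
            for (int l = 0; l < n; l++) {
                by3[l] = chunk[l] % 3 == 0;
                by5[l] = chunk[l] % 5 == 0;
                by15[l] = by3[l] && by5[l];
            }
            // Later blends overwrite earlier ones, so FIZZ_BUZZ goes last.
            int[] r = blend(chunk, broadcast(n, FIZZ), by3);
            r = blend(r, broadcast(n, BUZZ), by5);
            r = blend(r, broadcast(n, FIZZ_BUZZ), by15);
            System.arraycopy(r, 0, out, i, n);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] input = new int[15];
        for (int i = 0; i < input.length; i++) {
            input[i] = i + 1;
        }
        // 4 int lanes correspond to a 128-bit vector, 8 to a 256-bit one.
        System.out.println(Arrays.toString(fizzBuzz(input, 4)));
        System.out.println(Arrays.equals(fizzBuzz(input, 4), fizzBuzz(input, 8)));
    }
}
```

Calling fizzBuzz with 4 or with 8 lanes gives the same result, which is the
property a species-agnostic version of the benchmark would need as well.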

Thanks a lot for taking the time to look into this, and for your replies! I've
updated the relevant part of the post based on that info.

--Gunnar

On Tue, Mar 9, 2021 at 06:59, Ningsheng Jian <
ningsheng.jian at arm.com> wrote:

> Hi Gunnar,
>
> Thanks for trying the Vector API on AArch64. I see you were using the
> IntVector.SPECIES_256 species in your benchmarks. However, on AArch64
> NEON, the max hardware vector size is 128 bits. So for 256-bit vectors
> we are not able to intrinsify the operations to use SIMD directly, and
> they fall back to the Java implementation of those APIs, blend() for
> example. You can use the -XX:+PrintIntrinsics option to see some details.
>
> For the benchmarks, I would suggest writing them in a more (performance)
> portable way, e.g. use IntVector.SPECIES_PREFERRED and do not assume the
> actual vector length in the code logic.
>
> Thanks,
> Ningsheng
>
> On 3/9/21 4:25 AM, Gunnar Morling wrote:
> > Hi,
> >
> > I was exploring the Vector API a bit [1] and noticed that the performance
> > of my vectorized FizzBuzz implementation is pretty poor on AArch64. I first
> > thought this may be specific to the Apple M1 chip on which I was running
> > this, but the numbers don't look better on Linux (AWS Graviton2, see the
> > repo [2] for all numbers) either. My implementation is using the blend() API
> > method; is this not (yet) supported on AArch64 perhaps?
> >
> > Thanks for any hints,
> >
> > --Gunnar
> >
> > [1] https://www.morling.dev/blog/fizzbuzz-simd-style/
> > [2] https://github.com/gunnarmorling/simd-fizzbuzz
> >
>
>