Vector API: blend() performance on AArch64

Mon Mar 8 22:18:03 UTC 2021

Hi Gunnar,

I enjoyed reading your article; it's nicely written. If I may make one suggestion: you can use JMH with profiling (on the Mac, the command-line option "-prof dtraceasm") to present the hotspots of the generated assembly code, rather than having to trawl through all the generated code (including code that is discarded after recompilation). Prior to JMH version 1.28 dtraceasm was unreliable, but it has now been fixed.

At the moment I have found that using JMH with profiling is the best way to determine what is being vectorized. We don't have a mechanism to query which vector operations compile to vector hardware instructions, or whether a method with Vector API operations compiles without boxing the vectors (i.e. whether they end up represented as arrays at runtime rather than held in vector registers). Ideally you should not have to know, since the API should leverage the available hardware support, but this is still a work in progress.
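
To make the blend() question concrete, here is a minimal sketch of a FizzBuzz kernel built on IntVector.blend() with comparison masks. It is an illustration under stated assumptions, not necessarily how the linked implementation works: the class name FizzBuzzBlend, the integer encoding (-1 = Fizz, -2 = Buzz, -3 = FizzBuzz) and the trick of tracking remainders with add/compare/blend instead of lane-wise division are all hypothetical choices. It assumes JDK 16+ with the incubator module enabled, e.g. "java --add-modules jdk.incubator.vector FizzBuzzBlend.java". A method like fizzBuzz() is the kind of thing one would wrap in a JMH benchmark and run with -prof dtraceasm (or -prof perfasm on Linux) to see whether the blends turn into vector instructions or into boxed, scalar code.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

import java.util.Arrays;

public class FizzBuzzBlend {

    // Lane count depends on the hardware (e.g. 4 int lanes with 128-bit NEON).
    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    // Hypothetical encoding: -1 = Fizz, -2 = Buzz, -3 = FizzBuzz, otherwise the number.
    static final int FIZZ = -1, BUZZ = -2, FIZZ_BUZZ = -3;

    // Scalar reference, also used for the loop tail.
    static int scalar(int x) {
        return x % 15 == 0 ? FIZZ_BUZZ : x % 3 == 0 ? FIZZ : x % 5 == 0 ? BUZZ : x;
    }

    static int[] fizzBuzz(int n) {
        int lanes = SPECIES.length();
        int[] result = new int[n];

        // Per-lane start values: numbers 1..lanes and their remainders mod 3 and mod 5.
        int[] start = new int[lanes], rem3 = new int[lanes], rem5 = new int[lanes];
        for (int l = 0; l < lanes; l++) {
            start[l] = l + 1;
            rem3[l] = (l + 1) % 3;
            rem5[l] = (l + 1) % 5;
        }
        IntVector num = IntVector.fromArray(SPECIES, start, 0);
        IntVector r3 = IntVector.fromArray(SPECIES, rem3, 0);
        IntVector r5 = IntVector.fromArray(SPECIES, rem5, 0);

        int i = 0;
        int bound = SPECIES.loopBound(n);
        for (; i < bound; i += lanes) {
            VectorMask<Integer> div3 = r3.compare(VectorOperators.EQ, 0);
            VectorMask<Integer> div5 = r5.compare(VectorOperators.EQ, 0);
            // Each blend() overwrites only the lanes selected by its mask;
            // the last one wins for numbers divisible by both 3 and 5.
            num.blend(FIZZ, div3)
               .blend(BUZZ, div5)
               .blend(FIZZ_BUZZ, div3.and(div5))
               .intoArray(result, i);

            // Advance all lanes by `lanes` and wrap the remainders with blend(),
            // avoiding lane-wise integer division (NEON has no vector integer divide).
            num = num.add(lanes);
            r3 = r3.add(lanes % 3);
            r3 = r3.blend(r3.sub(3), r3.compare(VectorOperators.GE, 3));
            r5 = r5.add(lanes % 5);
            r5 = r5.blend(r5.sub(5), r5.compare(VectorOperators.GE, 5));
        }
        // Scalar tail for the remaining n % lanes elements.
        for (; i < n; i++) {
            result[i] = scalar(i + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fizzBuzz(15)));
    }
}
```

Running main() should print [1, 2, -1, 4, -2, -1, 7, 8, -1, -2, 11, -1, 13, 14, -3] regardless of the lane count, since any elements not covered by the vector loop fall through to the scalar tail.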

I shall let the ARM folks answer more specifically.

Paul.

> On Mar 8, 2021, at 12:25 PM, Gunnar Morling <gunnar at hibernate.org> wrote:
> 
> Hi,
> 
> I was exploring the Vector API a bit [1] and noticed that the performance
> of my vectorized FizzBuzz implementation is pretty poor on AArch64. I first
> thought this may be specific to the Apple M1 chip on which I was running
> this, but the numbers don't look better on Linux (AWS Graviton2, see the repo
> [2] for all numbers) either. My implementation is using the blend() API
> method, is this not (yet) supported on AArch64 perhaps?
> 
> Thanks for any hints,
> 
> --Gunnar
> 
> [1] https://www.morling.dev/blog/fizzbuzz-simd-style/
> [2] https://github.com/gunnarmorling/simd-fizzbuzz