Vector API benefits for small matrices/vectors?

Paul Sandoz paul.sandoz at oracle.com
Fri Dec 17 18:42:46 UTC 2021


You could improve the scalar implementation using Math.fma.
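
As a rough sketch of what that could look like (assuming a row-major double[16] layout; the class and method names here are hypothetical and not taken from your library):

```java
public final class ScalarFma4x4 {
    private ScalarFma4x4() {}

    /** Returns a * b for two row-major 4x4 matrices stored as double[16]. */
    public static double[] multiply(double[] a, double[] b) {
        double[] r = new double[16];
        for (int row = 0; row < 4; row++) {
            int o = row * 4;
            for (int col = 0; col < 4; col++) {
                // The first product seeds the accumulator; the remaining
                // three terms are added with fused multiply-adds.
                double acc = a[o] * b[col];
                acc = Math.fma(a[o + 1], b[4 + col], acc);
                acc = Math.fma(a[o + 2], b[8 + col], acc);
                acc = Math.fma(a[o + 3], b[12 + col], acc);
                r[o + col] = acc;
            }
        }
        return r;
    }
}
```

Math.fma (available since Java 9) rounds once per fused operation, so results can differ in the last bits from a plain multiply followed by an add. It is worth measuring on your target hardware, since it is only likely to pay off where the CPU has FMA instructions.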

Hopefully, with a clever arrangement you should be able to use vectorized fmas, similar to a kernel in a linear algebra library, e.g. accumulating into 4 vectors representing the resulting rows or columns.
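
Something along these lines, as an untested-for-performance sketch only. It assumes a row-major double[16] layout and a fixed 256-bit species (4 double lanes, one matrix row per vector); the class name is hypothetical. It needs --add-modules jdk.incubator.vector to compile and run:

```java
import jdk.incubator.vector.DoubleVector;
import jdk.incubator.vector.VectorSpecies;

public final class VectorFma4x4 {
    // Assumption: 256-bit vectors, so one vector holds one 4-element row.
    private static final VectorSpecies<Double> S = DoubleVector.SPECIES_256;

    private VectorFma4x4() {}

    /** Computes r = a * b for row-major 4x4 matrices stored as double[16]. */
    public static void multiply(double[] a, double[] b, double[] r) {
        // Load the four rows of b once.
        DoubleVector b0 = DoubleVector.fromArray(S, b, 0);
        DoubleVector b1 = DoubleVector.fromArray(S, b, 4);
        DoubleVector b2 = DoubleVector.fromArray(S, b, 8);
        DoubleVector b3 = DoubleVector.fromArray(S, b, 12);

        // Each iteration accumulates one result row:
        // r[row] = a[row][0]*b0 + a[row][1]*b1 + a[row][2]*b2 + a[row][3]*b3
        for (int row = 0; row < 4; row++) {
            int o = row * 4;
            DoubleVector acc = b0.mul(a[o]);
            acc = b1.fma(DoubleVector.broadcast(S, a[o + 1]), acc);
            acc = b2.fma(DoubleVector.broadcast(S, a[o + 2]), acc);
            acc = b3.fma(DoubleVector.broadcast(S, a[o + 3]), acc);
            acc.intoArray(r, o);
        }
    }
}
```

The loop is written for brevity; unrolling it by hand gives the four independent accumulator vectors explicitly. Whether this beats the scalar code will depend on C2 inlining the method and on the cost of getting the data into arrays in the first place.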

Paul.

> On Dec 17, 2021, at 9:30 AM, Mark Raynsford <org.openjdk at io7m.com> wrote:
> 
> Hello!
> 
> On 2021-12-17T17:08:07 +0000
> Paul Sandoz <paul.sandoz at oracle.com> wrote:
> 
>> Hi Mark,
>> 
>> I think your use-case is very relevant to see if the Vector API can be used effectively.
>> 
>> In the Matrix4x4D example you mention I think a different layout or access will be required, namely double[] or ByteBuffer holding the 16 elements rather than 16 independent fields, so the values can be loaded into vectors. 
> 
> Yeah, I thought that might be the case. I've avoided a backing array up
> until now precisely to avoid that extra pointer. In terms of API,
> there's also the issue that I'd need to expose the backing array in
> order to operate on it using the vector API, but arrays are obviously
> mutable and so users shouldn't have access to it. I think the first
> step will be to try writing some versions of all of these functions
> that just mindlessly move the 16 fields to/from a local array in each
> static method. This will at least tell me if the API *could* be faster
> even in the presence of all of that extra moving around of data.
> 
>> You might wanna start assuming, say, AVX2 and a vector size of 256 bits, then afterwards see if it can be generalized. Writing shape (as in vector bit-size) independent code can be trickier, so that experience would be very useful as a further experiment.
>> 
>> For, say, Matrix4x4D multiplication, it will require that C2 inline the multiply. We don’t yet have vector calling convention support.
> 
> For reference, here's what the heavily warmed up C2 code looks like
> right now, without any use of the vector API:
> 
>  https://gist.github.com/io7m/27de1e5068f953cba228ac3a4323d7ea
> 
> There appear to be 16 scalar multiplications in there, which is at
> least better than the 64 multiplications that appear in the original
> program text. :)
> 
>> Using 17 and the second incubator is a good place to start. It will allow you to slot in an 18-ea or a Panama build if necessary (the API has not changed), since we are making continual improvements to C2. Use-cases are really helpful in that regard.
> 
> Right!
> 
> -- 
> Mark Raynsford | https://www.io7m.com
> 



More information about the panama-dev mailing list