Using the Vector API to access SIMD instructions

Fri Oct 8 15:16:41 UTC 2021

Hello,

I'm implementing two decimal floating-point formats, as defined by the 
IEEE 754 spec. I'm anticipating that they will become primitive classes 
(JEP 401) if I can make them perform reasonably well.

In Decimal128 I find myself coding something like

     int c0, c1, c2, c3;
     long m;
     int f, ph;

     ...

     int d0 = (int) ((m * c0) >>> f);
     int d1 = (int) ((m * c1) >>> f);
     int d2 = (int) ((m * c2) >>> f);
     int d3 = (int) ((m * c3) >>> f);

     int e0 = c0 - ph * d0;
     int e1 = c1 - ph * d1;
     int e2 = c2 - ph * d2;
     int e3 = c3 - ph * d3;

I would like to code these repetitive 4 + 4 lines as SIMD operations 
using the Vector API.

However, it seems to me that I would have to code them three times, 
depending on the preferred size of the underlying SIMD registers, and 
hope that C2 constant-folds the preferred size and eliminates the dead 
code.
Also, I would first need to fill a long[4] array with the scalar ci 
values, hoping that C2 elides the array allocation on the heap.
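
To make the array-based layout concrete, here is a minimal sketch of the 
same eight lines written lane by lane over small arrays. It uses plain 
Java loops rather than the Vector API itself, so it runs without the 
incubator module; each loop body stands for what would be one lane-wise 
vector operation. The class name LaneSketch, the method names computeD 
and computeE, and the sample values are assumptions made for 
illustration only, not part of Decimal128.

```java
import java.util.Arrays;

// Hypothetical illustration: the scalar c0..c3 packed into an array,
// with each group of four statements expressed as one loop over the lanes.
public class LaneSketch {

    // d[i] = (int) ((m * c[i]) >>> f), the first group of four lines.
    // m * c[i] is a long multiplication because m is a long.
    public static int[] computeD(int[] c, long m, int f) {
        int[] d = new int[c.length];
        for (int i = 0; i < c.length; i++) {
            d[i] = (int) ((m * c[i]) >>> f);
        }
        return d;
    }

    // e[i] = c[i] - ph * d[i], the second group of four lines.
    public static int[] computeE(int[] c, int[] d, int ph) {
        int[] e = new int[c.length];
        for (int i = 0; i < c.length; i++) {
            e[i] = c[i] - ph * d[i];
        }
        return e;
    }

    public static void main(String[] args) {
        // Sample values chosen arbitrarily for the demonstration.
        int[] c = {3, 5, 7, 11};
        int[] d = computeD(c, 1000L, 4);
        int[] e = computeE(c, d, 2);
        System.out.println(Arrays.toString(d));
        System.out.println(Arrays.toString(e));
    }
}
```

Whether C2 turns such short loops into SIMD instructions, or elides the 
temporary arrays, is not something this sketch can show; it only fixes 
the shape of the data that a vectorized version would operate on.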

Is my understanding correct?
Does it make sense to use the Vector API in the first place for such 
small, fixed-size uses?

Thanks for any suggestions

Greetings
Raffaello