[vectorIntrinsics] Processing interleaved data formats
Paul Sandoz
paul.sandoz at oracle.com
Thu Apr 15 22:19:28 UTC 2021
Hi Peter,
If you sent your code as an attachment, it was removed by the email server. Could you share it as a gist?
A shuffle can be used to rearrange elements in a vector. A shuffle redirects lane elements.
In the case of a matrix of complex elements it should be possible to use this method:
Vector<E> rearrange(VectorShuffle<E> s, Vector<E> v);
For example, load two vectors from the matrix buffer and rearrange them to produce one vector containing only the real elements. However, there is some cost to that.
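To make the two-input rearrange concrete, here is a minimal scalar model in plain Java (not Vector API code). The class and method names are hypothetical, and the index convention (an index past the end of the first input selects from the second) is a simplification chosen for this sketch; the API encodes the choice of input in its own way, so check the `rearrange` and `VectorShuffle` Javadoc before relying on it.

```java
import java.util.Arrays;

public class DeinterleaveModel {
    // Scalar model of a two-input rearrange (hypothetical helper):
    // result lane n takes element idx[n] from the logical concatenation
    // of a and b, so idx[n] < a.length selects from a, otherwise from b.
    static double[] rearrange2(double[] a, double[] b, int[] idx) {
        double[] out = new double[a.length];
        for (int n = 0; n < out.length; n++) {
            int s = idx[n];
            out[n] = s < a.length ? a[s] : b[s - a.length];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] a = {1, 10, 2, 20};   // R1 I1 R2 I2
        double[] b = {3, 30, 4, 40};   // R3 I3 R4 I4
        int[] evens = {0, 2, 4, 6};    // picks the real elements
        int[] odds  = {1, 3, 5, 7};    // picks the imaginary elements
        System.out.println(Arrays.toString(rearrange2(a, b, evens)));
        System.out.println(Arrays.toString(rearrange2(a, b, odds)));
    }
}
```

Two such rearranges (one per index pattern) turn two interleaved inputs into one vector of reals and one of imaginaries, which is where the extra cost comes from.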
There are also load/store operations that accept an indexMap e.g.:
IntVector fromArray(VectorSpecies<Integer> species,
int[] a, int offset,
int[] indexMap, int mapOffset)
These may be more appropriate for your needs. Again, there is a cost to those. We don’t have loads/stores for an iota-like pattern, which might be more efficient for linear structures.
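As a rough illustration of what an index-mapped load does, here is a scalar model in plain Java (not Vector API code). It assumes lane n reads `a[offset + indexMap[mapOffset + n]]`, and that a `double` variant behaves like the `int` signature quoted above. The names `GatherModel` and `gather` are hypothetical.

```java
import java.util.Arrays;

public class GatherModel {
    // Scalar model of an index-mapped load (hypothetical helper):
    // lane n reads a[offset + indexMap[mapOffset + n]].
    static double[] gather(int lanes, double[] a, int offset,
                           int[] indexMap, int mapOffset) {
        double[] out = new double[lanes];
        for (int n = 0; n < lanes; n++) {
            out[n] = a[offset + indexMap[mapOffset + n]];
        }
        return out;
    }

    public static void main(String[] args) {
        // Interleaved complex data: R1 I1 R2 I2 R3 I3 R4 I4
        double[] data = {1, 10, 2, 20, 3, 30, 4, 40};
        // One shared map: first four entries pick reals, next four imaginaries.
        int[] indexMap = {0, 2, 4, 6, 1, 3, 5, 7};
        System.out.println(Arrays.toString(gather(4, data, 0, indexMap, 0)));
        System.out.println(Arrays.toString(gather(4, data, 0, indexMap, 4)));
    }
}
```

The same index map can be reused for every block of the buffer by advancing `offset`, with `mapOffset` choosing between the real and imaginary patterns.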
It would be interesting to look at some C/asm vector code for inspiration. There is probably some clever way to do this with a shuffle/rotate/blend combination (with some redundant calculations) [*].
Paul.
[*]
Off the top of my head:
R1 I1 R2 I2 R3 I3 R4 I4
*
r1 i1 r2 i2 r3 i3 r4 i4
=
R1*r1 I1*i1 R2*r2 I2*i2 R3*r3 I3*i3 R4*r4 I4*i4
-
I1*i1 R2*r2 I2*i2 R3*r3 I3*i3 R4*r4 I4*i4 R1*r1 // rotate by -1
=
v1
R1 I1 R2 I2 R3 I3 R4 I4
*
i1 r1 i2 r2 i3 r3 i4 r4 // swap r and i using a shuffle
=
R1*i1 I1*r1 R2*i2 I2*r2 R3*i3 I3*r3 R4*i4 I4*r4
+
I4*r4 R1*i1 I1*r1 R2*i2 I2*r2 R3*i3 I3*r3 R4*i4 // rotate by 1
=
v2
v = v1.blend(v2, mask(01010101))
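The sketch above can be checked with a scalar lane model in plain Java (not Vector API code). Each helper stands in for one lane operation, and all names are hypothetical. `rotate(v, k)` is assumed to move lane n to lane n + k, so -1 and 1 match the two rotations in the sketch, and the blend takes v2 in the odd lanes.

```java
import java.util.Arrays;

public class ComplexMulLanes {
    // rotate by k: result lane n holds v[(n - k) mod len]
    static double[] rotate(double[] v, int k) {
        double[] out = new double[v.length];
        for (int n = 0; n < v.length; n++) out[n] = v[Math.floorMod(n - k, v.length)];
        return out;
    }

    // swap each (real, imag) pair: lane n takes lane n ^ 1
    static double[] swapPairs(double[] v) {
        double[] out = new double[v.length];
        for (int n = 0; n < v.length; n++) out[n] = v[n ^ 1];
        return out;
    }

    // Lane-wise model of the multiply / rotate / blend scheme.
    static double[] multiply(double[] a, double[] b) {
        int len = a.length;
        double[] p = new double[len], q = new double[len], sb = swapPairs(b);
        for (int n = 0; n < len; n++) { p[n] = a[n] * b[n]; q[n] = a[n] * sb[n]; }
        double[] pr = rotate(p, -1), qr = rotate(q, 1), out = new double[len];
        for (int n = 0; n < len; n++) {
            double v1 = p[n] - pr[n];          // even lanes: R*r - I*i
            double v2 = q[n] + qr[n];          // odd lanes:  I*r + R*i
            out[n] = (n & 1) == 1 ? v2 : v1;   // blend with the odd-lane mask
        }
        return out;
    }

    // Straightforward pairwise complex multiply, used as a reference.
    static double[] reference(double[] a, double[] b) {
        double[] out = new double[a.length];
        for (int n = 0; n < a.length; n += 2) {
            out[n]     = a[n] * b[n] - a[n + 1] * b[n + 1];
            out[n + 1] = a[n] * b[n + 1] + a[n + 1] * b[n];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4, 5, 6, 7, 8};
        double[] b = {2, 1, 0, 1, 1, 1, 3, -1};
        System.out.println(Arrays.toString(multiply(a, b)));
        System.out.println(Arrays.equals(multiply(a, b), reference(a, b)));
    }
}
```

The lanes thrown away by the blend (the odd lanes of v1 and the even lanes of v2) are the redundant calculations mentioned earlier.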
> On Apr 14, 2021, at 6:40 PM, Peter A <peter.abeles at gmail.com> wrote:
>
> I'm attempting to vectorize interleaved data formats. For example, complex
> matrices are often stored in an array where elements alternate between real
> and imaginary values. This also comes up in low level image processing,
> e.g. YUV to RGB and debayer.
>
> Here's what the actual math looks like for complex multiplication of a
> scalar value against a vector:
>
> double realA = ...
> double imagA = ...
>
> while (indexB < end) {
> double realB = B.data[indexB++];
> double imagB = B.data[indexB++];
>
> C.data[indexC++] = realA * realB - imagA * imagB;
> C.data[indexC++] = realA * imagB + imagA * realB;
> }
>
> My attempts so far have failed since at some point I need to address the
> memory not being "continuous" and I resort to non vectorized code. I did a
> quick search and it seems like I need to use a "shuffle" command. I did
> find a shuffle in the Vector API but it wasn't obvious how to apply it
> here, to me at least.
>
> I suspect that some others here know exactly how to approach this problem.
>
> Thanks,
> - Peter
>
> P.S. I'm sharing this code so others can learn from it too.
> --
> "Now, now my good man, this is no time for making enemies." — Voltaire
> (1694-1778), on his deathbed in response to a priest asking that he
> renounce Satan.