[vector] Sparse load (simple gather) of vector from array?
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Wed Mar 20 23:07:59 UTC 2019
Hi Lev,
> I have (long) array of floats: `float[] data`. I have Species of
> FloatArray, `S`. I need to create `FloatVector` with even elements (0,
> 2, 4, ...) set to elements from `data` array and odd elements set to
> zero (`0.0f`).
>
> So, if we use 256 bit vectors shape, it should be like this:
>
> {data[offset+0], 0.0f, data[offset+1], 0.0f, data[offset+2], 0.0f,
> data[offset+3], 0.0f}
>
> I could load vector from array and blend it with zero vector, but it
> will have wrong float data from array (like `offset+0`, `offset+2`, etc).
>
> What is simplest way to perform such "gather" load?
Strictly speaking, it's not a "gather" operation - the input data being
accessed is still laid out contiguously in memory.
You can achieve your goal by adapting the vector contents after they are
loaded from memory. At the API level, that can be done with reshape +
rearrange + blend, which I see as the idiomatic way to implement such a
transformation:
(d_i = data[offset+i])
{d_0, d_1, d_2, d_3} // Float128Vector
=reshape=>
{d_0, d_1, d_2, d_3, 0, 0, 0, 0} // Float256Vector
{ 0, 0, 1, 1, 2, 2, 3, 3} // Float256Shuffle
=rearrange=>
{d_0, d_0, d_1, d_1, d_2, d_2, d_3, d_3} // Float256Vector
{ 0, 0, 0, 0, 0, 0, 0, 0} // Float256Vector
{ T, F, T, F, T, F, T, F} // Float256Mask
=blend=>
{d_0, 0, d_1, 0, d_2, 0, d_3, 0} // Float256Vector
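To make the three steps in the diagram concrete, here is a minimal scalar sketch in plain Java. It deliberately does not use the Vector API (whose method names and signatures have changed between prototypes); the class `InterleaveWithZeros` and its helpers `reshape`, `rearrange`, and `blend` are hypothetical names that only model the semantics shown above on `float[]` arrays. The mask polarity follows the diagram (T keeps the rearranged lane), which may differ from a particular API version.

```java
import java.util.Arrays;

public class InterleaveWithZeros {
    // Model of "reshape": copy a narrow vector into the low lanes of a
    // wider one; the remaining lanes are zero (Arrays.copyOf pads with 0.0f).
    static float[] reshape(float[] v, int newLength) {
        return Arrays.copyOf(v, newLength);
    }

    // Model of "rearrange": lane i of the result takes lane shuffle[i] of v.
    static float[] rearrange(float[] v, int[] shuffle) {
        float[] r = new float[v.length];
        for (int i = 0; i < r.length; i++) {
            r[i] = v[shuffle[i]];
        }
        return r;
    }

    // Model of "blend" as drawn in the diagram: lane i comes from a where
    // mask[i] is true, and from b otherwise.
    static float[] blend(float[] a, float[] b, boolean[] mask) {
        float[] r = new float[a.length];
        for (int i = 0; i < r.length; i++) {
            r[i] = mask[i] ? a[i] : b[i];
        }
        return r;
    }

    // Loads `count` contiguous floats starting at `offset` and returns
    // {d_0, 0, d_1, 0, ...} with 2 * count lanes.
    static float[] interleaveWithZeros(float[] data, int offset, int count) {
        float[] narrow = Arrays.copyOfRange(data, offset, offset + count);
        float[] wide = reshape(narrow, 2 * count);

        int[] shuffle = new int[2 * count];       // {0, 0, 1, 1, 2, 2, ...}
        boolean[] mask = new boolean[2 * count];  // {T, F, T, F, ...}
        for (int i = 0; i < shuffle.length; i++) {
            shuffle[i] = i / 2;
            mask[i] = (i % 2 == 0);
        }

        float[] duplicated = rearrange(wide, shuffle);
        float[] zero = new float[2 * count];
        return blend(duplicated, zero, mask);
    }

    public static void main(String[] args) {
        float[] data = {1f, 2f, 3f, 4f, 5f, 6f};
        // prints [2.0, 0.0, 3.0, 0.0, 4.0, 0.0, 5.0, 0.0]
        System.out.println(Arrays.toString(interleaveWithZeros(data, 1, 4)));
    }
}
```

With `count = 4` this corresponds to the Float128Vector to Float256Vector case in the diagram; a real implementation would express the same steps with the vector, shuffle, and mask types of whichever Vector API version is in use.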
Depending on the platform, there may be a more efficient way available
to implement it. For example, on x86 it's possible to implement a
similar transformation using a single instruction:
* VPSHUFB ymm1, ymm2, ymm3/m256
* VPSHUFD ymm1, ymm2/m256, imm8
The current vision is that it should be the back end's responsibility to
choose the optimal implementation, but the x86 back end isn't powerful
enough yet to perform such optimizations.
For example, rearrange + blend can be fused into a single VPSHUFB by
setting the MSB in the shuffle control mask for the elements being zeroed,
and reshape becomes a no-op when the upper part of the source register is
known to be zeroed (e.g., EVEX/VEX-encoded vector `mov`s zero the remaining
part of the vector register).
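The fusion idea can be illustrated with a small scalar model of the byte-shuffle semantics. The sketch below is plain Java, not generated code and not the Vector API; `ByteShuffleModel` and its methods are hypothetical names. It models a single 128-bit lane only (16 bytes, four floats): each control byte either selects a source byte by its low 4 bits or, when its MSB is set, produces zero. The 256-bit form of VPSHUFB shuffles each 128-bit lane independently, so this model covers just the {d_0, 0, d_1, 0} half of the result.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteShuffleModel {
    // Scalar model of a 16-byte shuffle with PSHUFB-style semantics:
    // if the MSB of control[i] is set, dst[i] = 0;
    // otherwise dst[i] = src[control[i] & 0x0F].
    static byte[] shuffleBytes(byte[] src, byte[] control) {
        byte[] dst = new byte[16];
        for (int i = 0; i < 16; i++) {
            if ((control[i] & 0x80) != 0) {
                dst[i] = 0;
            } else {
                dst[i] = src[control[i] & 0x0F];
            }
        }
        return dst;
    }

    // Control mask that does "rearrange + blend with zero" in one step:
    // even float lanes copy the bytes of source float (lane / 2),
    // odd float lanes have the MSB set and are therefore zeroed.
    static byte[] interleaveControl() {
        byte[] control = new byte[16];
        for (int i = 0; i < 16; i++) {
            int floatLane = i / 4;
            int byteInLane = i % 4;
            if (floatLane % 2 == 0) {
                control[i] = (byte) ((floatLane / 2) * 4 + byteInLane);
            } else {
                control[i] = (byte) 0x80;
            }
        }
        return control;
    }

    // Takes {d_0, d_1, d_2, d_3} and returns {d_0, 0, d_1, 0}.
    static float[] interleaveLowHalf(float[] four) {
        ByteBuffer in = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        for (float f : four) {
            in.putFloat(f);
        }
        byte[] shuffled = shuffleBytes(in.array(), interleaveControl());
        ByteBuffer out = ByteBuffer.wrap(shuffled).order(ByteOrder.LITTLE_ENDIAN);
        float[] result = new float[4];
        for (int i = 0; i < 4; i++) {
            result[i] = out.getFloat();
        }
        return result;
    }

    public static void main(String[] args) {
        float[] r = interleaveLowHalf(new float[]{1f, 2f, 3f, 4f});
        // prints [1.0, 0.0, 2.0, 0.0]
        System.out.println(java.util.Arrays.toString(r));
    }
}
```

The point of the model is that the zeroing comes from the control mask itself, so no separate zero vector or blend mask is needed.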
(There were some explorations into simplifying complex vector
transformations [1] [2], but there hasn't been much progress on that
front yet.)
Best regards,
Vladimir Ivanov
[1] http://mail.openjdk.java.net/pipermail/panama-dev/2018-July/002374.html
[2] http://mail.openjdk.java.net/pipermail/panama-dev/2018-August/002440.html