Replicating __mm256_shuffle_epi8 Intrinsic

Radosław Smogura mail at smogura.eu
Tue Jul 6 15:23:15 UTC 2021


Hi Michael,

Shuffling can be problematic sometimes.

I wonder if you have tried something like this:


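byteSwap = VectorShuffle.fromArray(BYTE_VECTOR_SPECIES, shuffleArr, 0);

// toVector() is declared to return Vector<Byte>; casting to ByteVector makes
// selectFrom(...) return a ByteVector, which has intoArray(byte[], int).
final var byteSwapVector = (ByteVector) byteSwap.toVector();

final var srcVector = ByteVector.fromArray(BYTE_VECTOR_SPECIES, src, i);
// selectFrom uses the lanes of byteSwapVector as indices into srcVector,
// i.e. the same result as srcVector.rearrange(byteSwap).
final var dstVector = byteSwapVector.selectFrom(srcVector);

dstVector.intoArray(dst, i);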
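The snippet leaves `shuffleArr`, `src`, `dst` and `i` to the reader. As a minimal sketch, assuming the goal is the SHA-256 big-endian load (reversing the four bytes of every 32-bit word, which matches the numbers quoted further down in the thread), the index array could be built with `i ^ 3`. The class and method names are hypothetical; the resulting array is what would be passed as `shuffleArr` to `VectorShuffle.fromArray(ByteVector.SPECIES_256, shuffleArr, 0)`. The plain-Java `apply` method only models the documented lane semantics of `rearrange` (output lane `i` takes source lane `idx[i]`), so the indices can be checked without the incubator module.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: builds byte indices for a per-word byte swap.
final class ByteSwapIndices {

    // Lane i takes source byte i ^ 3, i.e. 3,2,1,0, 7,6,5,4, ...
    // which reverses the bytes of each 4-byte word.
    static int[] bswap32(int lanes) {
        int[] idx = new int[lanes];
        for (int i = 0; i < lanes; i++) {
            idx[i] = i ^ 3;
        }
        return idx;
    }

    // Scalar model of rearrange: output lane i takes source lane idx[i].
    static byte[] apply(byte[] src, int[] idx) {
        byte[] out = new byte[idx.length];
        for (int i = 0; i < idx.length; i++) {
            out[i] = src[idx[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        // One 32-bit word from the thread, stored little-endian.
        byte[] word = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN)
                .putInt(1684234849).array();
        byte[] swapped = apply(word, bswap32(4));
        int result = ByteBuffer.wrap(swapped).order(ByteOrder.LITTLE_ENDIAN).getInt();
        System.out.println(result); // 1633837924
    }
}
```

The XOR form works because flipping the two low bits of an index mirrors it inside its aligned group of four bytes, so the same formula covers any vector length that is a multiple of 4.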

Kind regards,
Rado

________________________________
From: panama-dev <panama-dev-retn at openjdk.java.net> on behalf of Michael Ennen <mike.ennen at gmail.com>
Sent: Tuesday, July 6, 2021 06:54
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
Cc: panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
Subject: Re: Replicating __mm256_shuffle_epi8 Intrinsic

I understand that representing the input and output as 32-bit integers is
kind of confusing, but the point is that the shuffle as written isn't
doing anything. I have also tried:

    var shuffle = VectorShuffle.fromArray(ByteVector.SPECIES_256, new int[]{
            12, 13, 14, 15,   8,  9, 10, 11,
             4,  5,  6,  7,   0,  1,  2,  3,
            12, 13, 14, 15,   8,  9, 10, 11,
             4,  5,  6,  7,   0,  1,  2,  3 }, 0);

But the returned vector is still the same.
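One possible reason this array does not reproduce the C result: `_mm256_set_epi32` takes its eight arguments from the highest element (e7) down to the lowest (e0), and x86 stores each 32-bit element little-endian, so a control constant such as 0x0C0D0E0F contributes the byte indices 15, 14, 13, 12 (lowest byte first), not 12, 13, 14, 15. The array above looks like the constants 0x0C0D0E0F, 0x08090A0B, 0x04050607, 0x00010203 read left to right, most significant byte first. Assuming the SHA-256 code builds its mask from those constants (the thread does not quote them), the sketch below decodes them into the byte order that the instruction actually uses. The class name is hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper: turns _mm256_set_epi32-style arguments into the
// 32 control bytes in memory (lane) order, as _mm256_shuffle_epi8 reads them.
final class SetEpi32Decoder {

    // argsHighFirst[0] is element 7 (most significant) and argsHighFirst[7]
    // is element 0, matching _mm256_set_epi32(e7, ..., e0).
    static int[] controlBytes(int... argsHighFirst) {
        if (argsHighFirst.length != 8) {
            throw new IllegalArgumentException("expected 8 elements");
        }
        int[] bytes = new int[32];
        for (int element = 0; element < 8; element++) {
            int value = argsHighFirst[7 - element];
            for (int b = 0; b < 4; b++) {
                // Little-endian: byte b of the element is bits 8*b .. 8*b+7.
                bytes[element * 4 + b] = (value >>> (8 * b)) & 0xFF;
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        int[] ctrl = controlBytes(
                0x0C0D0E0F, 0x08090A0B, 0x04050607, 0x00010203,
                0x0C0D0E0F, 0x08090A0B, 0x04050607, 0x00010203);
        // Each 16-byte half decodes to 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12
        System.out.println(Arrays.toString(ctrl));
    }
}
```

The upper 16 decoded values are still in the range 0..15, because the instruction indexes within each 128-bit half; a Vector API shuffle indexes the whole vector, so those entries would need 16 added before being passed to `VectorShuffle.fromArray`.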

On Sun, Jul 4, 2021 at 10:14 PM Michael Ennen <mike.ennen at gmail.com> wrote:

> Thanks for your assistance. I am having trouble reproducing the result of
> the first shuffle done in Bitcoin's SHA-256 AVX2 code:
>
> The following 8 integers are read in:
>
> Read8: 1684234849, 1886350957, 1684234849, 1886350957, 1684234849,
> 1886350957, 1684234849, 1886350957
>
> These 8 integers are shuffled with _mm256_shuffle_epi8 and the result is:
>
> 1835954032, 1633837924, 1835954032, 1633837924, 1835954032, 1633837924,
> 1835954032, 1633837924
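The values are easier to compare in hexadecimal. A quick plain-Java check with `Integer.reverseBytes` suggests that each output value is one of the two input values with the four bytes of its 32-bit word reversed, i.e. a big-endian load of the kind SHA-256 needs; the bytes themselves are the ASCII codes for "abcd" and "mnop". The class and method names are hypothetical.

```java
// Plain-Java check that the output values in the thread are input values with
// the four bytes of each 32-bit word reversed.
final class ByteReversalCheck {

    static int[] reverseEach(int[] words) {
        int[] out = new int[words.length];
        for (int i = 0; i < words.length; i++) {
            out[i] = Integer.reverseBytes(words[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] inputs = {1684234849, 1886350957};
        int[] swapped = reverseEach(inputs);
        // Prints 1684234849 = 0x64636261 -> 0x61626364 = 1633837924
        //    and 1886350957 = 0x706F6E6D -> 0x6D6E6F70 = 1835954032
        for (int i = 0; i < inputs.length; i++) {
            System.out.printf("%d = 0x%08X -> 0x%08X = %d%n",
                    inputs[i], inputs[i], swapped[i], swapped[i]);
        }
    }
}
```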
>
> But using your suggested code:
>
> var shuffle = VectorShuffle.fromOp(ByteVector.SPECIES_256, (i -> ((8+i)%16)));
> ByteVector shuffled = ret.reinterpretAsBytes().rearrange(shuffle, shuffle.laneIsValid());
> return IntVector.fromByteArray(SPECIES_256, shuffled.toArray(), 0, ByteOrder.LITTLE_ENDIAN);
>
> I get:
>
> 1684234849, 1886350957, 1684234849, 1886350957, 1684234849, 1886350957,
> 1684234849, 1886350957
>
> That is, the numbers don't seem to change.
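A likely explanation for the unchanged result: `(8+i)%16` swaps the two 8-byte halves of each 16-byte block (and, for 32 lanes, only ever reads bytes 0..15), while this test input repeats every 8 bytes because the words 1684234849 and 1886350957 alternate. Moving whole 8-byte blocks around therefore maps the data onto itself. The plain-Java sketch below models `rearrange` as "output lane i takes source lane idx[i]" and checks this on the thread's input; it does not call the Vector API, and the class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Models the effect of the index function (8+i)%16 on the 32 input bytes.
final class HalfSwapOnPeriodicData {

    // Scalar model of rearrange: output lane i takes source lane idx[i].
    static byte[] rearrange(byte[] src, int[] idx) {
        byte[] out = new byte[idx.length];
        for (int i = 0; i < idx.length; i++) {
            out[i] = src[idx[i]];
        }
        return out;
    }

    // The eight input words from the thread, laid out little-endian.
    static byte[] inputBytes() {
        int[] words = {1684234849, 1886350957, 1684234849, 1886350957,
                       1684234849, 1886350957, 1684234849, 1886350957};
        ByteBuffer buf = ByteBuffer.allocate(32).order(ByteOrder.LITTLE_ENDIAN);
        for (int w : words) {
            buf.putInt(w);
        }
        return buf.array();
    }

    static int[] halfSwapIndices(int lanes) {
        int[] idx = new int[lanes];
        for (int i = 0; i < lanes; i++) {
            idx[i] = (8 + i) % 16; // same formula as the suggested shuffle
        }
        return idx;
    }

    public static void main(String[] args) {
        byte[] src = inputBytes();
        byte[] out = rearrange(src, halfSwapIndices(32));
        System.out.println(Arrays.equals(src, out)); // true
    }
}
```

Input whose two 8-byte halves differ would make the half swap visible, which is a simple way to confirm that a shuffle is being applied at all.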
>
> Thanks for your help.
>
> On Thu, Jul 1, 2021 at 11:18 AM Viswanathan, Sandhya <
> sandhya.viswanathan at intel.com> wrote:
>
>> Hi Michael,
>>
>> The rearrange() API should generate pshufb.
>>
>> For example, for the following Java code:
>>
>>    static final int SIZE = 1024;
>>    static byte[] a = new byte[SIZE];
>>    static byte[] r = new byte[SIZE];
>>
>>    static final VectorSpecies<Byte> SPECIES = ByteVector.SPECIES_128;
>>    // For 16 byte lanes, (8+i)%16 gives 8..15 then 0..7: swap the 8-byte halves.
>>    static final VectorShuffle<Byte> HIGHTOLOW =
>>        VectorShuffle.fromOp(SPECIES, (i -> ((8+i)%16)));
>>
>>    static void workload() {
>>        VectorShuffle<Byte> vshuf = HIGHTOLOW;
>>
>>        for (int i = 0; i <= a.length - SPECIES.length(); i += SPECIES.length()) {
>>            var av = ByteVector.fromArray(SPECIES, a, i);
>>            var bv = av.rearrange(vshuf);
>>            bv.intoArray(r, i);
>>        }
>>    }
>>
>> We generate the following code for the loop:
>> 0x00007fc388fa3180:   vmovdqu 0x10(%rsi),%xmm1
>> 0x00007fc388fa3185:   vmovdqu 0x10(%r14),%xmm2
>> 0x00007fc388fa318b:   movslq %eax,%r10
>> 0x00007fc388fa318e:   vmovdqu 0x10(%rbp,%r10,1),%xmm3
>> 0x00007fc388fa3195:   vpcmpgtb %xmm2,%xmm1,%xmm1
>> 0x00007fc388fa3199:   vptest %xmm0,%xmm1
>> 0x00007fc388fa319e:   setne  %r13b
>> 0x00007fc388fa31a2:   movzbl %r13b,%r13d
>> 0x00007fc388fa31a6:   test   %r13d,%r13d
>> 0x00007fc388fa31a9:   jne    0x00007fc388fa31e2
>> 0x00007fc388fa31ab:   vpshufb %xmm2,%xmm3,%xmm3
>> 0x00007fc388fa31b0:   vmovdqu %xmm3,0x10(%r8,%r10,1)
>> 0x00007fc388fa31b7:   add    $0x10,%eax
>> 0x00007fc388fa31ba:   cmp    %ebx,%eax
>> 0x00007fc388fa31bc:   jl     0x00007fc388fa3180
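When checking what the vectorized loop produces, a scalar reference to compare against can help. A minimal sketch, assuming a 16-byte species as in the example above (the class name is hypothetical): each 16-byte block of `a` is copied to `r` with its two 8-byte halves swapped, which is what the HIGHTOLOW shuffle does per vector. Like the original loop, it skips any tail shorter than one vector.

```java
import java.util.Arrays;

// Hypothetical scalar reference for workload() with a 16-lane byte species.
final class HighToLowReference {

    static final int LANES = 16; // assumed to equal ByteVector.SPECIES_128.length()

    static void workload(byte[] a, byte[] r) {
        // Same loop bound as the vector version: a tail shorter than LANES is skipped.
        for (int i = 0; i <= a.length - LANES; i += LANES) {
            for (int lane = 0; lane < LANES; lane++) {
                r[i + lane] = a[i + (8 + lane) % 16];
            }
        }
    }

    public static void main(String[] args) {
        byte[] a = new byte[32];
        for (int i = 0; i < a.length; i++) {
            a[i] = (byte) i;
        }
        byte[] r = new byte[32];
        workload(a, r);
        // First block: 8..15 then 0..7; second block: 24..31 then 16..23.
        System.out.println(Arrays.toString(r));
    }
}
```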
>>
>> Best Regards,
>> Sandhya
>>
>>
>> -----Original Message-----
>> From: panama-dev <panama-dev-retn at openjdk.java.net> On Behalf Of Michael
>> Ennen
>> Sent: Tuesday, June 29, 2021 11:20 PM
>> To: panama-dev at openjdk.java.net
>> Subject: Replicating __mm256_shuffle_epi8 Intrinsic
>>
>>  I am trying to implement SHA-256 using the new Java Vector API.
>>
>> I have read the API docs, but as someone who knows very little about SIMD,
>> I have found the mental gap between SIMD instructions and the API
>> insurmountable.
>>
>> I have asked this question on Stack Overflow:
>>
>>
>> https://stackoverflow.com/questions/68135596/replicating-mm256-shuffle-epi8-intrinsic-with-java-vector-api-shuffle
>>
>> It is a simple question (to ask, anyway): how can the _mm256_shuffle_epi8
>> intrinsic be replicated with the Java Vector API?
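A minimal sketch of one way to approach this, assuming a 256-bit byte species and ignoring performance: `_mm256_shuffle_epi8` treats each control byte as an index within its own 128-bit half (using only the low 4 bits) and writes zero when the control byte's top bit is set, whereas a Vector API shuffle indexes the whole vector. Converting the control bytes into whole-vector indices plus a keep/zero mask bridges the two. The index array could then be passed to `VectorShuffle.fromArray(ByteVector.SPECIES_256, indices, 0)` and the mask to `VectorMask.fromArray(ByteVector.SPECIES_256, keep, 0)` for use with `rearrange(shuffle, mask)`, whose documentation says unset lanes become zero. The class and method names are hypothetical, and the sketch says nothing about whether the JIT emits a single vpshufb for the result.

```java
import java.util.Arrays;

// Hypothetical helper: converts the 32 control bytes of _mm256_shuffle_epi8
// (in memory order) into whole-vector indices for a 32-lane byte shuffle,
// plus a mask that is false where the instruction would write zero.
final class PshufbToShuffle {

    static int[] indices(int[] control) {
        check(control);
        int[] idx = new int[32];
        for (int i = 0; i < 32; i++) {
            int laneBase = i & ~15;                  // 0 in the low half, 16 in the high half
            idx[i] = laneBase | (control[i] & 0x0F); // only the low 4 bits select a byte
        }
        return idx;
    }

    static boolean[] keepMask(int[] control) {
        check(control);
        boolean[] keep = new boolean[32];
        for (int i = 0; i < 32; i++) {
            keep[i] = (control[i] & 0x80) == 0;      // top bit set means "write zero"
        }
        return keep;
    }

    // Scalar model of the intended result: out[i] = keep[i] ? src[idx[i]] : 0.
    static byte[] emulate(byte[] src, int[] control) {
        int[] idx = indices(control);
        boolean[] keep = keepMask(control);
        byte[] out = new byte[32];
        for (int i = 0; i < 32; i++) {
            out[i] = keep[i] ? src[idx[i]] : 0;
        }
        return out;
    }

    private static void check(int[] control) {
        if (control.length != 32) {
            throw new IllegalArgumentException("expected 32 control bytes");
        }
    }

    public static void main(String[] args) {
        // Per-half byte swap of each 32-bit word, repeated for both halves.
        int[] perHalf = {3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12};
        int[] control = new int[32];
        for (int i = 0; i < 32; i++) {
            control[i] = perHalf[i % 16];
        }
        // Low half: 3, 2, 1, 0, ...; high half: 19, 18, 17, 16, ...
        System.out.println(Arrays.toString(indices(control)));
    }
}
```

Every produced index stays in 0..31 even for zeroed lanes, so the shuffle is always valid and the mask alone decides which lanes end up as zero.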
>>
>> Thanks very much.
>>
>> --
>> Michael Ennen
>>
>
>
> --
> Michael Ennen
>


--
Michael Ennen

