Replicating __mm256_shuffle_epi8 Intrinsic

Michael Ennen mike.ennen at gmail.com
Mon Jul 5 05:14:58 UTC 2021


Thanks for your assistance. I am having trouble reproducing the result
of the first shuffle done in Bitcoin's SHA-256 AVX2 code:

The following 8 integers are read in:

Read8: 1684234849, 1886350957, 1684234849, 1886350957, 1684234849,
1886350957, 1684234849, 1886350957

These 8 integers are shuffled with _mm256_shuffle_epi8 and the result is:

1835954032, 1633837924, 1835954032, 1633837924, 1835954032, 1633837924,
1835954032, 1633837924

But using your suggested code:

var shuffle = VectorShuffle.fromOp(ByteVector.SPECIES_256,
        (i -> ((8+i)%16)));
ByteVector shuffled = ret.reinterpretAsBytes()
        .rearrange(shuffle, shuffle.laneIsValid());
return IntVector.fromByteArray(SPECIES_256, shuffled.toArray(), 0,
        ByteOrder.LITTLE_ENDIAN);

I get:

1684234849, 1886350957, 1684234849, 1886350957, 1684234849, 1886350957,
1684234849, 1886350957

That is, the numbers don't seem to be changed.
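
One possible explanation, sketched below in plain Java as a scalar model
rather than with the Vector API itself: the two input integers repeat
every 8 bytes, so an index function that swaps the 8-byte halves of each
16-byte group, such as (8+i)%16, reproduces the input. The values
reported for _mm256_shuffle_epi8 are the byte-reversed forms of the two
inputs, which suggests the original shuffle reverses the bytes inside
each 32-bit element. The class name ShuffleModel and the index function
i ^ 3 are illustrative assumptions, not code taken from Bitcoin or from
the Vector API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

// Scalar model of a byte shuffle: out[i] = in[indexOf(i)].
// Hypothetical helper for illustration only.
class ShuffleModel {

    static int[] rearrange(int[] ints, IntUnaryOperator indexOf) {
        // Lay the ints out as little-endian bytes, as they sit in a vector register.
        ByteBuffer src = ByteBuffer.allocate(ints.length * 4).order(ByteOrder.LITTLE_ENDIAN);
        for (int v : ints) {
            src.putInt(v);
        }
        byte[] in = src.array();

        // Apply the shuffle one byte lane at a time.
        byte[] out = new byte[in.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = in[indexOf.applyAsInt(i)];
        }

        // Reassemble the shuffled bytes into little-endian ints.
        ByteBuffer dst = ByteBuffer.wrap(out).order(ByteOrder.LITTLE_ENDIAN);
        int[] result = new int[ints.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = dst.getInt();
        }
        return result;
    }

    public static void main(String[] args) {
        int[] in = {1684234849, 1886350957, 1684234849, 1886350957,
                    1684234849, 1886350957, 1684234849, 1886350957};

        // Swapping 8-byte halves of an input that repeats every 8 bytes
        // leaves it unchanged.
        System.out.println(Arrays.toString(rearrange(in, i -> (8 + i) % 16)));

        // Reversing the 4 bytes of each int (flip the low two index bits)
        // yields 1633837924 and 1835954032 alternately.
        System.out.println(Arrays.toString(rearrange(in, i -> i ^ 3)));
    }
}
```

The second shuffle produces the same two values as the
_mm256_shuffle_epi8 result above, though in the opposite element order;
that ordering may come from how the original code loads or prints the
elements, which is not shown here. If the byte-reversal reading is right,
passing an equivalent index function to VectorShuffle.fromOp would be one
way to try it with the Vector API.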

Thanks for your help.

On Thu, Jul 1, 2021 at 11:18 AM Viswanathan, Sandhya <
sandhya.viswanathan at intel.com> wrote:

> Hi Michael,
>
> The rearrange() api should generate pshufb.
>
> e.g. for the following Java code:
>
>    static final int SIZE = 1024;
>    static byte[] a = new byte[SIZE];
>    static byte[] r = new byte[SIZE];
>
>    static final VectorSpecies<Byte> SPECIES = ByteVector.SPECIES_128;
>    static final VectorShuffle<Byte> HIGHTOLOW =
>        VectorShuffle.fromOp(SPECIES, (i -> ((8+i)%16)));
>
>    static void workload() {
>        VectorShuffle<Byte> vshuf = HIGHTOLOW;
>
>        for (int i = 0; i <= a.length - SPECIES.length();
>                i += SPECIES.length()) {
>            var av = ByteVector.fromArray(SPECIES, a, i);
>            var bv = av.rearrange(vshuf);
>            bv.intoArray(r, i);
>        }
>    }
>
> We generate the following code for the loop:
> 0x00007fc388fa3180:   vmovdqu 0x10(%rsi),%xmm1
> 0x00007fc388fa3185:   vmovdqu 0x10(%r14),%xmm2
> 0x00007fc388fa318b:   movslq %eax,%r10
> 0x00007fc388fa318e:   vmovdqu 0x10(%rbp,%r10,1),%xmm3
> 0x00007fc388fa3195:   vpcmpgtb %xmm2,%xmm1,%xmm1
> 0x00007fc388fa3199:   vptest %xmm0,%xmm1
> 0x00007fc388fa319e:   setne  %r13b
> 0x00007fc388fa31a2:   movzbl %r13b,%r13d
> 0x00007fc388fa31a6:   test   %r13d,%r13d
> 0x00007fc388fa31a9:   jne    0x00007fc388fa31e2
> 0x00007fc388fa31ab:   vpshufb %xmm2,%xmm3,%xmm3
> 0x00007fc388fa31b0:   vmovdqu %xmm3,0x10(%r8,%r10,1)
> 0x00007fc388fa31b7:   add    $0x10,%eax
> 0x00007fc388fa31ba:   cmp    %ebx,%eax
> 0x00007fc388fa31bc:   jl     0x00007fc388fa3180
>
> Best Regards,
> Sandhya
>
>
> -----Original Message-----
> From: panama-dev <panama-dev-retn at openjdk.java.net> On Behalf Of Michael
> Ennen
> Sent: Tuesday, June 29, 2021 11:20 PM
> To: panama-dev at openjdk.java.net
> Subject: Replicating __mm256_shuffle_epi8 Intrinsic
>
>  I am trying to implement SHA-256 using the new Java Vector API.
>
> I have read the API docs, but as someone who knows very little SIMD,
> bridging the large mental gap between SIMD instructions and the API has
> been insurmountable for me.
>
> My question has been asked on Stack Overflow:
>
>
> https://stackoverflow.com/questions/68135596/replicating-mm256-shuffle-epi8-intrinsic-with-java-vector-api-shuffle
>
> It is quite a simple (to ask anyway) question, which is, how to replicate
> the _mm256_shuffle_epi8 intrinsic with the Java Vector API?
>
> Thanks very much.
>
> --
> Michael Ennen
>


-- 
Michael Ennen

