Replicating __mm256_shuffle_epi8 Intrinsic

Michael Ennen mike.ennen at gmail.com
Wed Jul 7 21:47:45 UTC 2021


So far I have tried to copy the upstream code verbatim until I get it to
match the results - however I am interested in what you're suggesting. How
would that be done?
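
For checking results against the upstream code, a scalar model of the
intrinsic seems like a useful reference point. Below is a minimal sketch,
assuming the per-128-bit-lane behaviour documented for _mm256_shuffle_epi8
(vpshufb): a result byte is zero if the high bit of its mask byte is set,
otherwise it is the source byte in the same 16-byte lane selected by the low
four mask bits. The class and method names are hypothetical; they are not
part of the upstream code or of the Vector API.

```java
import java.util.Arrays;

// Hypothetical helper: a scalar reference model, not a Vector API class.
final class ShuffleEpi8Model {

    /**
     * Models _mm256_shuffle_epi8 on two 32-byte arrays.
     * Assumption: each 16-byte half is shuffled independently, and a mask
     * byte with its high bit set produces a zero byte.
     */
    static byte[] shuffleEpi8(byte[] src, byte[] mask) {
        if (src.length != 32 || mask.length != 32) {
            throw new IllegalArgumentException("expected 32-byte inputs");
        }
        byte[] dst = new byte[32];
        for (int i = 0; i < 32; i++) {
            int laneBase = i & 0x10;              // 0 for bytes 0..15, 16 for bytes 16..31
            if ((mask[i] & 0x80) != 0) {
                dst[i] = 0;                       // high bit set: zero the byte
            } else {
                dst[i] = src[laneBase + (mask[i] & 0x0F)];
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] src = new byte[32];
        byte[] mask = new byte[32];
        for (int i = 0; i < 32; i++) {
            src[i] = (byte) i;
            mask[i] = (byte) (15 - (i & 15));     // reverse the bytes within each lane
        }
        System.out.println(Arrays.toString(shuffleEpi8(src, mask)));
    }
}
```

Feeding the same 32 bytes to this model and to a rearrange() call should make
mismatches easy to spot. One difference to keep in mind: as far as I
understand, rearrange() indexes across the whole vector, so for a 256-bit
ByteVector the upper 16 shuffle entries would need 16 added to stay within
the upper lane.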

On Wed, Jul 7, 2021 at 3:03 AM Radosław Smogura <mail at smogura.eu> wrote:

> Michael,
>
> I also wonder whether you considered using shuffles and vector operations
> to load the int vector instead of using bytesToIntLE. Loading into two
> vectors first and then permuting them with a shuffle might be better.
>
> Kind regards,
> Rado
> ------------------------------
> *From:* Michael Ennen <mike.ennen at gmail.com>
> *Sent:* Wednesday, July 7, 2021 08:17
> *To:* Radosław Smogura <mail at smogura.eu>
> *Cc:* Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
> panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
> *Subject:* Re: Replicating __mm256_shuffle_epi8 Intrinsic
>
> I figured it out. It was a mismatch of little and big endian. I can
> reproduce the same shuffle result with:
>
> IntVector read8(byte[] chunk, int offset) {
>     System.out.println("read8, offset = " + offset);
>     IntVector ret = IntVector.fromArray(SPECIES_256, new int[] {
>             bytesToIntLE(chunk, 0 + offset),
>             bytesToIntLE(chunk, 64 + offset),
>             bytesToIntLE(chunk, 128 + offset),
>             bytesToIntLE(chunk, 192 + offset),
>             bytesToIntLE(chunk, 256 + offset),
>             bytesToIntLE(chunk, 320 + offset),
>             bytesToIntLE(chunk, 384 + offset),
>             bytesToIntLE(chunk, 448 + offset)}, 0);
>     System.out.println("read8 in: " + bytesToIntLE(chunk, 0 + offset) + ", "
>             + bytesToIntLE(chunk, 64 + offset) + ", "
>             + bytesToIntLE(chunk, 128 + offset) + ", "
>             + bytesToIntLE(chunk, 192 + offset) + ", "
>             + bytesToIntLE(chunk, 256 + offset) + ", "
>             + bytesToIntLE(chunk, 320 + offset) + ", "
>             + bytesToIntLE(chunk, 384 + offset) + ", "
>             + bytesToIntLE(chunk, 448 + offset));
>
>     var shuffle = VectorShuffle.fromArray(ByteVector.SPECIES_256, new int[] {
>             12, 13, 14, 15,    8,  9, 10, 11,
>              4,  5,  6,  7,    0,  1,  2,  3,
>             12, 13, 14, 15,    8,  9, 10, 11,
>              4,  5,  6,  7,    0,  1,  2,  3 }, 0);
>
>     ByteVector shuffled = ret.reinterpretAsBytes()
>             .rearrange(shuffle, shuffle.laneIsValid());
>
>     System.out.println("read8 after shuffle: "
>             + IntVector.fromByteArray(SPECIES_256, shuffled.toArray(), 0, ByteOrder.BIG_ENDIAN));
>     return IntVector.fromByteArray(SPECIES_256, shuffled.toArray(), 0, ByteOrder.BIG_ENDIAN);
> }
>
> Thanks for all your help.
>
> On Tue, Jul 6, 2021 at 11:10 PM Michael Ennen <mike.ennen at gmail.com>
> wrote:
>
> Oh my gosh how embarrassing! I have been tweaking things so much in this
> code I really needed to step back and take a closer look.
>
> I still don't get the right result (matching this:
> https://github.com/brcolow/bitcoin-sha256/blob/master/src/sha256_avx2.cpp#L70
> ).
>
> I will keep trying, though.
>
> On Tue, Jul 6, 2021 at 2:48 PM Radosław Smogura <mail at smogura.eu> wrote:
>
> Hi Michael,
>
> intoArray is not only for VectorShuffle, and I think it's the preferred way
> to load and store data (as Sandhya used).
>
> Maybe this sounds too simple, but are you absolutely sure that this line
> should look like this, rather than printing the shuffled vector?
> System.out.println("read8 returns: " + ret); :)
>
> Kind regards,
> Rado
> ------------------------------
> *From:* Michael Ennen <mike.ennen at gmail.com>
> *Sent:* Tuesday, July 6, 2021 21:29
> *To:* Radosław Smogura <mail at smogura.eu>
> *Cc:* Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
> panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
> *Subject:* Re: Replicating __mm256_shuffle_epi8 Intrinsic
>
> I am a bit confused by your example code. selectFrom returns a `Vector`,
> but then you are calling `intoArray` which is a method only for
> `VectorShuffle`.
>
> In addition to that - do you think you could use variable names from the
> example for clarity:
>
>
> https://github.com/brcolow/vector-sha256/blob/master/src/main/java/com/brcolow/vectorsha256/VectorSHA256.java#L450
>
> Thank you very much.
>
> On Tue, Jul 6, 2021 at 8:23 AM Radosław Smogura <mail at smogura.eu> wrote:
>
> Hi Michael,
>
> Shuffling can be problematic sometimes.
>
> I wonder if you tried something like this
>
> final var byteSwap = VectorShuffle.fromArray(BYTE_VECTOR_SPECIES, shuffleArr, 0);
>
> final var byteSwapVector = byteSwap.toVector();
>
> final var srcVector =  ByteVector.fromArray(BYTE_VECTOR_SPECIES, src, i);
> final var dstVector = byteSwapVector.selectFrom(srcVector);
>
> dstVector.intoArray(dst, i);
>
> Kind regards,
> Rado
>
> ------------------------------
> *From:* panama-dev <panama-dev-retn at openjdk.java.net> on behalf of
> Michael Ennen <mike.ennen at gmail.com>
> *Sent:* Tuesday, July 6, 2021 06:54
> *To:* Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
> *Cc:* panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
> *Subject:* Re: Replicating __mm256_shuffle_epi8 Intrinsic
>
> I understand that representing the input and output as 32-bit integers is
> kind of confusing, but the point is that the shuffle as written isn't
> doing anything. I have also tried:
>
>             var shuffle = VectorShuffle.fromArray(ByteVector.SPECIES_256, new int[] {
>                     12, 13, 14, 15,    8,  9, 10, 11,
>                      4,  5,  6,  7,    0,  1,  2,  3,
>                     12, 13, 14, 15,    8,  9, 10, 11,
>                      4,  5,  6,  7,    0,  1,  2,  3 }, 0);
>
> But still the returned vector is the same.
>
> On Sun, Jul 4, 2021 at 10:14 PM Michael Ennen <mike.ennen at gmail.com>
> wrote:
>
> > Thanks for your assistance. I am having trouble replicating the same
> > results from the first shuffle done in Bitcoin's SHA-AVX2:
> >
> > The following 8 integers are read in:
> >
> > Read8: 1684234849, 1886350957, 1684234849, 1886350957, 1684234849,
> > 1886350957, 1684234849, 1886350957
> >
> > These 8 integers are shuffled with _mm256_shuffle_epi8 and the result is:
> >
> > 1835954032, 1633837924, 1835954032, 1633837924, 1835954032, 1633837924,
> > 1835954032, 1633837924
> >
> > But using your suggested code:
> >
> > var shuffle = VectorShuffle.fromOp(ByteVector.SPECIES_256, (i -> ((8+i)%16)));
> > ByteVector shuffled = ret.reinterpretAsBytes().rearrange(shuffle, shuffle.laneIsValid());
> > return IntVector.fromByteArray(SPECIES_256, shuffled.toArray(), 0, ByteOrder.LITTLE_ENDIAN);
> >
> > I get:
> >
> > 1684234849, 1886350957, 1684234849, 1886350957, 1684234849, 1886350957,
> > 1684234849, 1886350957
> >
> > That is, the numbers don't seem to be changed.
> >
> > Thanks for your help.
> >
> > On Thu, Jul 1, 2021 at 11:18 AM Viswanathan, Sandhya <
> > sandhya.viswanathan at intel.com> wrote:
> >
> >> Hi Michael,
> >>
> >> The rearrange() api should generate pshufb.
> >>
> >> e.g. for the following Java code:
> >>
> >>    static final int SIZE = 1024;
> >>    static byte[] a = new byte[SIZE];
> >>    static byte[] r = new byte[SIZE];
> >>
> >>    static final VectorSpecies<Byte> SPECIES = ByteVector.SPECIES_128;
> >>    static final VectorShuffle<Byte> HIGHTOLOW = VectorShuffle.fromOp(SPECIES, (i -> ((8+i)%16)));
> >>
> >>    static void workload() {
> >>        VectorShuffle<Byte> vshuf = HIGHTOLOW;
> >>
> >>        for (int i = 0; i <= a.length - SPECIES.length(); i += SPECIES.length()) {
> >>            var av = ByteVector.fromArray(SPECIES, a, i);
> >>            var bv = av.rearrange(vshuf);
> >>            bv.intoArray(r, i);
> >>        }
> >>    }
> >>
> >> We generate the following code for the loop:
> >> 0x00007fc388fa3180:   vmovdqu 0x10(%rsi),%xmm1
> >> 0x00007fc388fa3185:   vmovdqu 0x10(%r14),%xmm2
> >> 0x00007fc388fa318b:   movslq %eax,%r10
> >> 0x00007fc388fa318e:   vmovdqu 0x10(%rbp,%r10,1),%xmm3
> >> 0x00007fc388fa3195:   vpcmpgtb %xmm2,%xmm1,%xmm1
> >> 0x00007fc388fa3199:   vptest %xmm0,%xmm1
> >> 0x00007fc388fa319e:   setne  %r13b
> >> 0x00007fc388fa31a2:   movzbl %r13b,%r13d
> >> 0x00007fc388fa31a6:   test   %r13d,%r13d
> >> 0x00007fc388fa31a9:   jne    0x00007fc388fa31e2
> >> 0x00007fc388fa31ab:   vpshufb %xmm2,%xmm3,%xmm3
> >> 0x00007fc388fa31b0:   vmovdqu %xmm3,0x10(%r8,%r10,1)
> >> 0x00007fc388fa31b7:   add    $0x10,%eax
> >> 0x00007fc388fa31ba:   cmp    %ebx,%eax
> >> 0x00007fc388fa31bc:   jl     0x00007fc388fa3180
> >>
> >> Best Regards,
> >> Sandhya
> >>
> >>
> >> -----Original Message-----
> >> From: panama-dev <panama-dev-retn at openjdk.java.net> On Behalf Of Michael Ennen
> >> Sent: Tuesday, June 29, 2021 11:20 PM
> >> To: panama-dev at openjdk.java.net
> >> Subject: Replicating __mm256_shuffle_epi8 Intrinsic
> >>
> >>  I am trying to implement SHA-256 using the new Java Vector API.
> >>
> >> I have read the API docs but crossing the large mental gap of SIMD
> >> instructions to the API for someone who knows very little SIMD has been
> >> insurmountable for me.
> >>
> >> My question has been asked on Stack Overflow:
> >>
> >> https://stackoverflow.com/questions/68135596/replicating-mm256-shuffle-epi8-intrinsic-with-java-vector-api-shuffle
> >>
> >> It is quite a simple question (to ask, anyway): how to replicate the
> >> _mm256_shuffle_epi8 intrinsic with the Java Vector API?
> >>
> >> Thanks very much.
> >>
> >> --
> >> Michael Ennen
> >>

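The little/big-endian mismatch discussed in the thread can be reproduced
without the Vector API. The following is a minimal sketch with hypothetical
class and method names. It stores the eight input integers quoted above in
little-endian order, applies the byte indexes 12..15, 8..11, 4..7, 0..3
within each 16-byte lane, and reads the result back as big-endian integers.
It assumes per-lane indexing as in the intrinsic, which gives the same
answer as whole-vector indexing here only because the input repeats every
16 bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the upstream code.
final class EndianShuffleCheck {

    // The 16 per-lane byte indexes used by the shuffle in the thread.
    static final int[] LANE_INDEXES = {
            12, 13, 14, 15,   8,  9, 10, 11,
             4,  5,  6,  7,   0,  1,  2,  3 };

    static int[] shuffleAndReadBigEndian(int[] in) {
        if (in.length != 8) {
            throw new IllegalArgumentException("expected 8 ints");
        }
        // Lay the integers out in memory as a little-endian load would.
        ByteBuffer le = ByteBuffer.allocate(32).order(ByteOrder.LITTLE_ENDIAN);
        for (int v : in) {
            le.putInt(v);
        }
        byte[] src = le.array();

        // Permute the bytes within each 16-byte lane.
        byte[] dst = new byte[32];
        for (int i = 0; i < 32; i++) {
            int laneBase = i & 0x10;
            dst[i] = src[laneBase + LANE_INDEXES[i & 15]];
        }

        // Reinterpret the shuffled bytes as big-endian integers.
        ByteBuffer be = ByteBuffer.wrap(dst).order(ByteOrder.BIG_ENDIAN);
        int[] out = new int[8];
        for (int i = 0; i < 8; i++) {
            out[i] = be.getInt();
        }
        return out;
    }

    public static void main(String[] args) {
        int[] in = { 1684234849, 1886350957, 1684234849, 1886350957,
                     1684234849, 1886350957, 1684234849, 1886350957 };
        // Prints 1835954032, 1633837924 repeated four times.
        System.out.println(Arrays.toString(shuffleAndReadBigEndian(in)));
    }
}
```

The two input values are byte-reversed images of the two output values:
Integer.reverseBytes(1684234849) is 1633837924 and
Integer.reverseBytes(1886350957) is 1835954032, so reading the shuffled bytes
back with the wrong byte order is enough to hide or undo the effect of the
shuffle.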

-- 
Michael Ennen

