Replicating __mm256_shuffle_epi8 Intrinsic
Viswanathan, Sandhya
sandhya.viswanathan at intel.com
Thu Jul 1 18:18:05 UTC 2021
Hi Michael,
The rearrange() API should generate pshufb.
For example, given the following Java code:
    import jdk.incubator.vector.ByteVector;
    import jdk.incubator.vector.VectorShuffle;
    import jdk.incubator.vector.VectorSpecies;

    static final int SIZE = 1024;
    static byte[] a = new byte[SIZE];
    static byte[] r = new byte[SIZE];
    static final VectorSpecies<Byte> SPECIES = ByteVector.SPECIES_128;
    // Result lane i takes source lane (8 + i) % 16, i.e. the two 8-byte halves are swapped.
    static final VectorShuffle<Byte> HIGHTOLOW = VectorShuffle.fromOp(SPECIES, i -> (8 + i) % 16);

    static void workload() {
        VectorShuffle<Byte> vshuf = HIGHTOLOW;
        for (int i = 0; i <= a.length - SPECIES.length(); i += SPECIES.length()) {
            var av = ByteVector.fromArray(SPECIES, a, i);
            var bv = av.rearrange(vshuf);
            bv.intoArray(r, i);
        }
    }
We generate the following code for the loop:
0x00007fc388fa3180: vmovdqu 0x10(%rsi),%xmm1
0x00007fc388fa3185: vmovdqu 0x10(%r14),%xmm2
0x00007fc388fa318b: movslq %eax,%r10
0x00007fc388fa318e: vmovdqu 0x10(%rbp,%r10,1),%xmm3
0x00007fc388fa3195: vpcmpgtb %xmm2,%xmm1,%xmm1
0x00007fc388fa3199: vptest %xmm0,%xmm1
0x00007fc388fa319e: setne %r13b
0x00007fc388fa31a2: movzbl %r13b,%r13d
0x00007fc388fa31a6: test %r13d,%r13d
0x00007fc388fa31a9: jne 0x00007fc388fa31e2
0x00007fc388fa31ab: vpshufb %xmm2,%xmm3,%xmm3
0x00007fc388fa31b0: vmovdqu %xmm3,0x10(%r8,%r10,1)
0x00007fc388fa31b7: add $0x10,%eax
0x00007fc388fa31ba: cmp %ebx,%eax
0x00007fc388fa31bc: jl 0x00007fc388fa3180
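For comparison, here is a plain scalar sketch of what the two operations compute. It is not Vector API code and not how the JIT implements anything; the class and method names (ShuffleEpi8Model, shuffleEpi8, highToLow) are made up for illustration. The first method models the AVX2 intrinsic from the subject line, assuming the usual pshufb semantics: each 128-bit lane is shuffled independently using the low 4 bits of the control byte, and a control byte with bit 7 set writes zero. The second models the HIGHTOLOW rearrange from the example on one 16-byte block.

```java
import java.util.Arrays;

public class ShuffleEpi8Model {
    /**
     * Scalar model of _mm256_shuffle_epi8 (hypothetical helper).
     * src and ctrl are 32 bytes each; the result is 32 bytes.
     */
    static byte[] shuffleEpi8(byte[] src, byte[] ctrl) {
        byte[] dst = new byte[32];
        for (int i = 0; i < 32; i++) {
            // Base of the 128-bit lane this byte lives in: 0 or 16.
            int laneBase = i & 0x10;
            if ((ctrl[i] & 0x80) != 0) {
                // Bit 7 of the control byte set: output byte is zeroed.
                dst[i] = 0;
            } else {
                // Low 4 bits select a byte within the same lane.
                dst[i] = src[laneBase + (ctrl[i] & 0x0F)];
            }
        }
        return dst;
    }

    /** Scalar model of rearrange(HIGHTOLOW) on one 16-byte block. */
    static byte[] highToLow(byte[] block) {
        byte[] out = new byte[16];
        for (int i = 0; i < 16; i++) {
            out[i] = block[(8 + i) % 16];
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] block = new byte[16];
        for (int i = 0; i < 16; i++) {
            block[i] = (byte) i;
        }
        System.out.println(Arrays.toString(highToLow(block)));
    }
}
```

One difference to keep in mind when moving to a 256-bit species: rearrange() indexes across the whole vector (0..31 for bytes), whereas the intrinsic never crosses a 128-bit lane, so the shuffle indexes for the upper half need the lane base (16) added to match the model above.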
Best Regards,
Sandhya
-----Original Message-----
From: panama-dev <panama-dev-retn at openjdk.java.net> On Behalf Of Michael Ennen
Sent: Tuesday, June 29, 2021 11:20 PM
To: panama-dev at openjdk.java.net
Subject: Replicating __mm256_shuffle_epi8 Intrinsic
I am trying to implement SHA-256 using the new Java Vector API.
I have read the API docs, but as someone who knows very little about SIMD, I have not been able to bridge the gap between SIMD instructions and the API.
My question has been asked on Stack Overflow:
https://stackoverflow.com/questions/68135596/replicating-mm256-shuffle-epi8-intrinsic-with-java-vector-api-shuffle
It is quite a simple question (to ask, anyway): how do I replicate the _mm256_shuffle_epi8 intrinsic with the Java Vector API?
Thanks very much.
--
Michael Ennen