[aarch64-port-dev ] RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic
Yangfei (Felix)
felix.yang at huawei.com
Mon Aug 31 09:46:58 UTC 2020
> -----Original Message-----
> From: Andrew Haley [mailto:aph at redhat.com]
> Sent: Monday, August 31, 2020 4:41 PM
> To: Yangfei (Felix) <felix.yang at huawei.com>; hotspot-compiler-
> dev at openjdk.java.net; core-libs-dev at openjdk.java.net
> Cc: aarch64-port-dev at openjdk.java.net
> Subject: Re: [aarch64-port-dev ] RFR: 8252204: AArch64: Implement SHA3
> accelerator/intrinsic
>
> On 31/08/2020 07:50, Yangfei (Felix) wrote:
> >
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8252204
> > Webrev: http://cr.openjdk.java.net/~fyang/8252204/webrev.00/
> >
> > This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto
> > Extensions.
> > Reference implementation for core SHA-3 transform using ARMv8.2
> > Crypto Extensions:
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-cecore.S?h=v5.4.52
> > A trivial adaptation in SHA3.implCompress is needed for the purpose
> > of adding the intrinsic. For SHA3, we need to pass one extra
> > parameter "digestLength" to the stub for the calculation of block
> > size. "digestLength" is also used for the EOR loop before
> > keccak to differentiate different SHA3 variants.
> >
> > We added jtreg tests for SHA3 and used QEMU system emulator
> > which supports SHA3 instructions to test the functionality.
> > Patch passed jtreg tier1-3 tests with QEMU system emulator.
> > Also verified with jtreg tier1-3 tests without SHA3 instructions
> > on aarch64-linux-gnu and x86_64-linux-gnu, to make sure that
> > there's no regression.
> >
> > We used one existing JMH test for performance test:
> > test/micro/org/openjdk/bench/java/security/MessageDigests.java
> > We measured the performance benefit with an aarch64
> > cycle-accurate simulator.
> > Patch delivers 20% - 40% performance improvement depending on
> > specific SHA3 digest length and size of the message.
> > For now, this feature will not be enabled automatically for
> > aarch64. We can auto-enable this when it is fully tested on
> > real hardware.
> > But for the above testing purposes, this is auto-enabled when
> > the corresponding hardware feature is detected.
> >
> > Comments?
>
> This looks like a direct copy of the sha3-cecore.S file. You'll need Linaro to
> contribute it. I don't imagine they'll have any problem with that: they are
> OCA signatories.
Since the code in sha3-cecore.S runs in kernel space, we need several modifications to make it work in HotSpot.
First, we need to add callee save and restore for d8 - d15, as required by the AArch64 ABI.
Also, the following code snippet is not needed for user-space:
    if_will_cond_yield_neon
    add     x8, x19, #32
    st1     { v0.1d- v3.1d}, [x19]
    st1     { v4.1d- v7.1d}, [x8], #32
    st1     { v8.1d-v11.1d}, [x8], #32
    st1     {v12.1d-v15.1d}, [x8], #32
    st1     {v16.1d-v19.1d}, [x8], #32
    st1     {v20.1d-v23.1d}, [x8], #32
    st1     {v24.1d}, [x8]
    do_cond_yield_neon
    b       0b
    endif_yield_neon
And we need to handle the multi-block case differently for StubRoutines::sha3_implCompressMB:
    if (multi_block) {
      // block_size = 200 - 2 * digest_length, ofs += block_size
      __ add(ofs, ofs, 200);
      __ sub(ofs, ofs, digest_length, Assembler::LSL, 1);

      __ cmp(ofs, limit);
      __ br(Assembler::LE, sha3_loop);
      __ mov(c_rarg0, ofs); // return ofs
    }
And StubRoutines::sha3_implCompress does not even need this multi-block check logic.
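To make the arithmetic in the multi-block case concrete, here is a minimal plain C++ sketch of the control flow. It is only a model, not the stub itself: the names sha3_block_size and model_compress_mb are hypothetical, digest_length is assumed to be in bytes, and the EOR loop plus keccak rounds are replaced by a simple counter.

```cpp
#include <cassert>

// The Keccak-f[1600] state is 200 bytes wide.
constexpr int kStateBytes = 200;

// Block size in bytes for a SHA3 variant, following the comment in the
// stub: block_size = 200 - 2 * digest_length (digest_length in bytes).
constexpr int sha3_block_size(int digest_length) {
    return kStateBytes - 2 * digest_length;
}

// Hypothetical model of the multi-block path: absorb one block, advance
// ofs by block_size, and loop while ofs <= limit. The final ofs is
// returned, mirroring "mov(c_rarg0, ofs)" in the stub.
int model_compress_mb(int ofs, int limit, int digest_length,
                      int* blocks_absorbed) {
    const int block_size = sha3_block_size(digest_length);
    int count = 0;
    do {
        ++count;            // stands in for the EOR loop and keccak rounds
        ofs += block_size;  // add(ofs, ofs, 200); sub(ofs, ofs, digest_length << 1)
    } while (ofs <= limit); // cmp(ofs, limit); br(LE, sha3_loop)
    if (blocks_absorbed != nullptr) {
        *blocks_absorbed = count;
    }
    return ofs;
}
```

Under these assumptions the block sizes come out as 144, 136, 104 and 72 bytes for SHA3-224, SHA3-256, SHA3-384 and SHA3-512, and a call such as model_compress_mb(0, 272, 32, &n) absorbs three blocks and returns 408. The single-block entry point corresponds to the loop body executed once, with no offset update or comparison.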
> Also, given that we've got the assembly source file, why not just copy that
> into OpenJDK? I can't see the point rewriting it into the HotSpot assembler.
Actually, we followed the approach taken by the existing intrinsic implementations. It would look strange to have one intrinsic that is implemented differently.
Also, we would not be able to emit this code on demand if we went that way. Some CPUs do not support these special SHA3 instructions and thus do not need this code at all.
I think that's one advantage of using a stub.
Thanks,
Felix