RFR: 8295698: AArch64: test/jdk/sun/security/ec/ed/EdDSATest.java failed with -XX:+UseSHA3Intrinsics

Dong Bo dongbo at openjdk.org
Mon Jul 24 08:53:08 UTC 2023


On Thu, 20 Jul 2023 07:04:10 GMT, Yudi Zheng <yzheng at openjdk.org> wrote:

>> In JDK-8252204, when implemented SHA3 intrinsics, we use `digest_length` to differentiate SHA3-224, SHA3-256, SHA3-384, SHA3-512 and calculate `block_size` with `block_size = 200 - 2 * digest_length`.
>> However, there are two extra SHA3 instances, SHAKE256 and SHAKE128, allowing an arbitrary `digest_length`:
>> 
>> 	digest_length	block_size
>> SHA3-224	28	144
>> SHA3-256	32	136
>> SHA3-384	48	104
>> SHA3-512	64	72
>> SHAKE128	variable	168
>> SHAKE256	variable	136
>> 
>> 
>> This causes SIGSEGV crash or hash code mismatch with `test/jdk/sun/security/ec/ed/EdDSATest.java`. The test calls `SHAKE256` in `Ed448`.
>> 
>> The main idea of the patch is to pass the `block_size` to differentiate SHA3 instances.
>> Tests `test/jdk/sun/security/ec/ed/EdDSATest.java` and `./test/jdk/sun/security/provider/MessageDigest/SHA3.java` both passed.
>> And tier1~3 passed on SHA3 supported hardware.
>> 
>> The SHA3 intrinsics still deliver 20%~40% performance improvement on our pre-silicon simulated platform.
>> The latency and throughput of crypto SHA3 ops are designed to be 1 cpu cycle and 2 execution pipes respectively.
>> 
>> Compared with the main stream code, the performance change with this patch are negligible on real hardware and simulation platform.
>> Based on the JMH results of SHA3 intirinsics, performance can be improved by ~50% on some hardware, while some hardware have ~30% regression.
>> These performance details are available in the comments of the issue page.
>> I guess the performance benefit of SHA3 intrinsics is dependent on the micro architecture, it should be switched on/off based on the running platform.
>
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 3910:
> 
>> 3908: 
>> 3909:     // block_size == 136, bit4 == 0 and bit5 == 0, SHA3-256 or SHAKE256
>> 3910:     __ andw(c_rarg5, block_size, 48);
> 
> does `c_rarg5` serve as a temporary register here?

Yes, it is a save-on-call register. If it was active, it should been saved.
Similar usages can be found at other intrinsics, such as https://github.com/openjdk/jdk/blob/ab821aa24f248e042d367ccd908fc1f68ebe8333/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L3468.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10939#discussion_r1271944466


More information about the hotspot-compiler-dev mailing list