RFR: 8295698: AArch64: test/jdk/sun/security/ec/ed/EdDSATest.java failed with -XX:+UseSHA3Intrinsics
Dong Bo
dongbo at openjdk.org
Mon Jul 24 08:53:08 UTC 2023
On Thu, 20 Jul 2023 07:04:10 GMT, Yudi Zheng <yzheng at openjdk.org> wrote:
>> In JDK-8252204, when implemented SHA3 intrinsics, we use `digest_length` to differentiate SHA3-224, SHA3-256, SHA3-384, SHA3-512 and calculate `block_size` with `block_size = 200 - 2 * digest_length`.
>> However, there are two extra SHA3 instances, SHAKE256 and SHAKE128, allowing an arbitrary `digest_length`:
>>
>> digest_length block_size
>> SHA3-224 28 144
>> SHA3-256 32 136
>> SHA3-384 48 104
>> SHA3-512 64 72
>> SHAKE128 variable 168
>> SHAKE256 variable 136
>>
>>
>> This causes SIGSEGV crash or hash code mismatch with `test/jdk/sun/security/ec/ed/EdDSATest.java`. The test calls `SHAKE256` in `Ed448`.
>>
>> The main idea of the patch is to pass the `block_size` to differentiate SHA3 instances.
>> Tests `test/jdk/sun/security/ec/ed/EdDSATest.java` and `./test/jdk/sun/security/provider/MessageDigest/SHA3.java` both passed.
>> And tier1~3 passed on SHA3 supported hardware.
>>
>> The SHA3 intrinsics still deliver 20%~40% performance improvement on our pre-silicon simulated platform.
>> The latency and throughput of crypto SHA3 ops are designed to be 1 cpu cycle and 2 execution pipes respectively.
>>
>> Compared with the main stream code, the performance change with this patch are negligible on real hardware and simulation platform.
>> Based on the JMH results of SHA3 intirinsics, performance can be improved by ~50% on some hardware, while some hardware have ~30% regression.
>> These performance details are available in the comments of the issue page.
>> I guess the performance benefit of SHA3 intrinsics is dependent on the micro architecture, it should be switched on/off based on the running platform.
>
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 3910:
>
>> 3908:
>> 3909: // block_size == 136, bit4 == 0 and bit5 == 0, SHA3-256 or SHAKE256
>> 3910: __ andw(c_rarg5, block_size, 48);
>
> does `c_rarg5` serve as a temporary register here?
Yes, it is a save-on-call register. If it was active, it should been saved.
Similar usages can be found at other intrinsics, such as https://github.com/openjdk/jdk/blob/ab821aa24f248e042d367ccd908fc1f68ebe8333/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L3468.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/10939#discussion_r1271944466
More information about the hotspot-compiler-dev
mailing list