RFR: 8322179: RISC-V: Implement SHA-1 intrinsic [v10]
Fei Yang
fyang at openjdk.org
Mon Feb 19 03:37:57 UTC 2024
On Tue, 6 Feb 2024 15:36:12 GMT, Hamlin Li <mli at openjdk.org> wrote:
>> Hi,
>> Can you review this patch to implement SHA-1 intrinsic for riscv?
>> Thanks!
>>
>>
>> ## Test
>>
>> ### Functionality
>>
>> tests under `test/hotspot/jtreg/compiler/intrinsics/sha`
>> tests found via `find test/jdk -iname "*SHA1*.java"`
>>
>> ### Performance
>>
>> tested on `T-HEAD Light Lichee Pi 4A`
>>
>> JMH_PARAMS="-f 1 -wi 10 -i 20" // for every loop of jmh test
>>
>> benchmark tests `MessageDigests.java GetMessageDigest.java MessageDigestBench.java MacBench.java` which are under `test/micro/org/openjdk/bench/`, more spcifically `TESTS="MessageDigests.digest MessageDigests.getAndDigest MessageDigestBench.digest"`
>>
>>
>> // After
>> o.o.b.java.security.MessageDigests.digest N/A N/A SHA-1 64 DEFAULT avgt 20 1845.446 ? 27.052 ns/op
>> o.o.b.java.security.MessageDigests.digest N/A N/A SHA-1 16384 DEFAULT avgt 20 181455.350 ? 532.258 ns/op
>> o.o.b.java.security.MessageDigests.getAndDigest N/A N/A SHA-1 64 DEFAULT avgt 20 2447.674 ? 10.239 ns/op
>> o.o.b.java.security.MessageDigests.getAndDigest N/A N/A SHA-1 16384 DEFAULT avgt 20 182896.083 ? 1242.774 ns/op
>> o.o.b.javax.crypto.small.MessageDigestBench.digest SHA1 1048576 N/A N/A avgt 20 11599227.792 ? 121442.390 ns/op
>> // Before
>> o.o.b.java.security.MessageDigests.digest N/A N/A SHA-1 64 DEFAULT avgt 20 2352.475 ? 11.198 ns/op
>> o.o.b.java.security.MessageDigests.digest N/A N/A SHA-1 16384 DEFAULT avgt 20 188495.684 ? 1467.942 ns/op
>> o.o.b.java.security.MessageDigests.getAndDigest N/A N/A SHA-1 64 DEFAULT avgt 20 2437.347 ? 6.398 ns/op
>> o.o.b.java.security.MessageDigests.getAndDigest N/A N/A SHA-1 16384 DEFAULT avgt 20 196086.570 ? 1140.998 ns/op
>> o.o.b.javax.crypto.small.MessageDigestBench.digest SHA1 1048576 N/A N/A avgt 20 12362160.119 ? 38788.109 ns/op
>>
>>
>> **getAndDigest when size == 64**
>> The data is not stable for test getAndDigest when size == 64, which I think is introduced by j.s.MessageDigest.getInstance itself, which we don't touch in this patch.
>> Check more details at [1](ht...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
> refine register usage and comment
Hi, Thanks for the quick update. A couple of minor comments remain after another look.
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4820:
> 4818: // 8f1bbcdc, 40 <= t <= 59
> 4819: // ca62c1d6, 60 <= t <= 79
> 4820: void sha1_prepare_k(int round, Register cur_k) {
Personally, I prefer to move `int round` as the last formal param. Similar for the other several assember funtions which also have this same param.
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4835:
> 4833: assert(round >= 0 && round < 80, "must be");
> 4834:
> 4835: if (round < 16) {
Could you please also add a code comment about how the valid w't values are laid out in the w't registers for the first 16 rounds?
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4946:
> 4944: {
> 4945: // reuse e as a temporary register, as we will mv new value into it later
> 4946: Register t = e;
For consistency, I would suggest register alias names like `tmp1`, `tmp2`, `tmp3`, instead of `t`. Could you please also create an alias for `cur_w`? I see it is used as a temp register here too.
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4953:
> 4951: __ rolw_imm(cur_w, a, 5, t0);
> 4952: // as pointed above, we can use cur_w as temporary register here.
> 4953: sha1_f(round, tmp, b, c, d);
Could we also reuse `cur_k` in this function where input param `tmp` is used? I see `cur_k` will also be recalculated at the beginning of each round like `cur_w`. Hope this could help eliminate `tmp` param and finally free `t2`.
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5060:
> 5058: // put two adjecent w's in one register:
> 5059: // one at high word part, another at low word part
> 5060: // at different round (even or odd), w't value resdie in different items in ws[].
Typo: s/resdie/reside/
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5123:
> 5121:
> 5122: if (multi_block) {
> 5123: __ bge(limit, buf, L_sha1_loop, true);
A small question: Is it OK to continue the loop when `limit` equals `buf`?
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5129:
> 5127: __ mv(mask32, 0xffffffff);
> 5128: // store back the state.
> 5129: __ andr(a, a, mask32);
Why not emit one `__ zero_extend(a, a, 32)` here instead like you do in `sha1_preserve_prev_abcde`? This will save us one instruction (the preceding mov to mask32 instruction) when the `zext_w` from UseZba is available and will also help free `t2`.
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5133:
> 5131: __ orr(a, a, b);
> 5132: __ sd(a, Address(state, 0));
> 5133: __ andr(c, c, mask32);
Similar here.
-------------
PR Review: https://git.openjdk.org/jdk/pull/17130#pullrequestreview-1873563008
PR Review Comment: https://git.openjdk.org/jdk/pull/17130#discussion_r1485015419
PR Review Comment: https://git.openjdk.org/jdk/pull/17130#discussion_r1493967123
PR Review Comment: https://git.openjdk.org/jdk/pull/17130#discussion_r1485013608
PR Review Comment: https://git.openjdk.org/jdk/pull/17130#discussion_r1493337316
PR Review Comment: https://git.openjdk.org/jdk/pull/17130#discussion_r1493341768
PR Review Comment: https://git.openjdk.org/jdk/pull/17130#discussion_r1493159778
PR Review Comment: https://git.openjdk.org/jdk/pull/17130#discussion_r1491927454
PR Review Comment: https://git.openjdk.org/jdk/pull/17130#discussion_r1491927796
More information about the hotspot-dev
mailing list