RFR: 8322179: RISC-V: Implement SHA-1 intrinsic [v2]
Antonios Printezis
tonyp at openjdk.org
Mon Dec 18 16:43:46 UTC 2023
On Mon, 18 Dec 2023 14:04:52 GMT, Hamlin Li <mli at openjdk.org> wrote:
>> Hi,
>> Can you review this patch to implement SHA-1 intrinsic for riscv?
>> Thanks!
>>
>>
>> ## Test
>>
>> ### Functionality
>>
>> tests under `test/hotspot/jtreg/compiler/intrinsics/sha`
>> tests found via `find test/jdk -iname "*SHA1*.java"`
>>
>> ### Performance
>>
>> tested on `T-HEAD Light Lichee Pi 4A`
>>
>> benchmark tests `MessageDigests.java GetMessageDigest.java MessageDigestBench.java MacBench.java` which are under `test/micro/org/openjdk/bench/`.
>>
>> **when intrinsic is enabled**
>>
>> o.o.b.java.security.GetMessageDigest.cloneInstance N/A N/A SHA-1 N/A N/A avgt 10 489.860 ? 6.277 ns/op
>> o.o.b.java.security.GetMessageDigest.getInstance N/A N/A SHA-1 N/A N/A avgt 10 3477.197 ? 204.203 ns/op
>> o.o.b.java.security.GetMessageDigest.getInstanceWithProvider N/A N/A SHA-1 N/A N/A avgt 10 4111.164 ? 108.861 ns/op
>> o.o.b.java.security.MessageDigests.digest N/A N/A SHA-1 64 DEFAULT avgt 10 3454.207 ? 53.924 ns/op
>> o.o.b.java.security.MessageDigests.digest N/A N/A SHA-1 16384 DEFAULT avgt 10 184063.834 ? 677.635 ns/op
>> o.o.b.java.security.MessageDigests.getAndDigest N/A N/A SHA-1 64 DEFAULT avgt 10 8260.011 ? 150.045 ns/op
>> o.o.b.java.security.MessageDigests.getAndDigest N/A N/A SHA-1 16384 DEFAULT avgt 10 191325.246 ? 3298.882 ns/op
>> o.o.b.javax.crypto.full.MacBench.mac HmacSHA1 128 N/A N/A avgt 10 8220.886 ? 53.684 ns/op
>> o.o.b.javax.crypto.full.MacBench.mac HmacSHA1 1024 N/A N/A avgt 10 18006.955 ? 92.432 ns/op
>> o.o.b.javax.crypto.small.MessageDigestBench.digest SHA1 1048576 N/A N/A avgt 10 11688843.558 ? 34924.678 ns/op
>>
>>
>> **when intrinsic is disabled**
>>
>> o.o.b.java.security.GetMessageDigest.cloneInstance N/A N/A SHA-1 N/A N/A avgt 10 496.890 ? 6.695 ns/op
>> o.o.b.java.security.GetMessageDigest.getInstance ...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
> Add some comments
I'll finish the review tomorrow. But posting the comments I have so far.
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4442:
> 4440: // M't, 0 <= t <= 15
> 4441: // ROTL'1(W't-3 ^ W't-8 ^ W't-14 ^ W't-16), 16 <= t <= 79
> 4442: void sha1_prepare_w(int round, Register cur_w, Register ws[], Register buf, Register tmp) {
I'd just use `t0` directly instead of passing the temp register as an arg.
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4448:
> 4446:
> 4447: if (round%2 == 0) {
> 4448: __ ld(ws[round/2], Address(buf, 0));
Instead of incrementing `buf` 8 times, could you just increment the offset (0, 8, 16, etc.) and only increment `buf` once per loop iteration?
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4585:
> 4583: }
> 4584:
> 4585: void sha1_reserve_prev_abcde(Register a, Register b, Register c, Register d, Register e,
I think it's safe to just use t0 and t1 for intermediate results without passing them as args. I used to pass them as args too, but I changed that for md5.
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4592:
> 4590:
> 4591: __ slli(tmp1, b, 32);
> 4592: __ andi(prev_ab, a, mask32, tmp2);
I think this will materialize `mask32` in `tmp2` twice, once per `andi`, given that the value won't work as an intermediate. I'd do `__ mv(tmp2, mask32)` and use `__ andr(prev_ab, a, tmp2)` and `__ andr(prev_cd, c, tmp2)`. I think it will save 2-3 instructions here. No idea how performance-critical this section is, though! I assume not much?
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4627:
> 4625:
> 4626: // c_rarg0 - c_rarg3: x10 - x13
> 4627: Register buf = c_rarg0;
You could copy the four arguments to a different set of registers and use a0 -> a3 for some of the other values to see if you can increase the number of compressed instructions that can be used. Unclear whether it's worth it or not.
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4664:
> 4662:
> 4663: RegSet saved_regs = RegSet::range(x18, x27);
> 4664: saved_regs += RegSet::of(t2);
Do you need to save t2?
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4673:
> 4671: __ srli(d, c, 32);
> 4672: __ lw(e, Address(state, 16));
> 4673:
(nit) extra whitespace
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4713:
> 4711: __ sd(c, Address(state, 8));
> 4712: __ sw(e, Address(state, 16));
> 4713:
(nit) extra whitespace
-------------
PR Review: https://git.openjdk.org/jdk/pull/17130#pullrequestreview-1786898719
PR Review Comment: https://git.openjdk.org/jdk/pull/17130#discussion_r1430181504
PR Review Comment: https://git.openjdk.org/jdk/pull/17130#discussion_r1430361118
PR Review Comment: https://git.openjdk.org/jdk/pull/17130#discussion_r1430162672
PR Review Comment: https://git.openjdk.org/jdk/pull/17130#discussion_r1430175000
PR Review Comment: https://git.openjdk.org/jdk/pull/17130#discussion_r1430156869
PR Review Comment: https://git.openjdk.org/jdk/pull/17130#discussion_r1430146051
PR Review Comment: https://git.openjdk.org/jdk/pull/17130#discussion_r1430147125
PR Review Comment: https://git.openjdk.org/jdk/pull/17130#discussion_r1430148290
More information about the hotspot-dev
mailing list