RFR: 8319716: RISC-V: Add SHA-2
Fei Yang
fyang at openjdk.org
Wed Nov 15 07:45:37 UTC 2023
On Wed, 8 Nov 2023 14:47:07 GMT, Robbin Ehn <rehn at openjdk.org> wrote:
> Hi, please consider.
>
> Main author is @luhenry, I only fixed some minor things and tested it.
>
> Such as:
> test/hotspot/jtreg/compiler/intrinsics/sha/
> test/jdk/java/security/MessageDigest/
> test/jdk/jdk/security/
> tier1
>
> And still running some test.
Changes requested by fyang (Reviewer).
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 3700:
> 3698: Register ofs = c_rarg2;
> 3699: Register limit = c_rarg3;
> 3700: Register consts = t0;
I would suggest choose a different temporary register for `consts`, maybe `t2`. Using x5 (t0) / x6 (t1) to keep some long-lived values like `consts` can be error prone. Those two are reserved scratch registers which could be explictly / implicitly clobberred by various assembler functions.
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 3769:
> 3767: __ vslidedown_vi(v16, v27, 2); // v16 = {_,_,e,f}
> 3768: // Merge elements [3..2] of v26 ({a,b}) into elements [3..2] of v16
> 3769: __ vmerge_vvm(v16, v26, v16); // v16 = {a,b,e,f}
I see the openssl version makes use of index-load to get {f,e,b,a},{h,g,d,c} pre-loop and index-store to put {f,e,b,a},{h,g,d,c} back to {a,b,c,d},{e,f,g,h} post-loop, which is much simpler than this code. Please consider.
[1] https://github.com/openssl/openssl/blob/master/crypto/sha/asm/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl#L124-L142
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 3926:
> 3924: //--------------------------------------------------------------------------------
> 3925: // Quad-round 1 (+1, v11->v12->v13->v10)
> 3926: __ vl1re32_v(v15, consts);
I am still worried about the load latency if we do one `vl1re3_v` to get the consts for each round even for single pass.
Preloading the constants into vectors is less likely to have this issue, right? We should have enough vector registers for that purpose.
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4146:
> 4144: Register ofs = c_rarg2;
> 4145: Register limit = c_rarg3;
> 4146: Register consts = t0;
Similar here. Please consider using `t1` instead for `consts`.
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4215:
> 4213: __ vslidedown_vi(v16, v27, 2); // v16 = {_,_,e,f}
> 4214: // Merge elements [3..2] of v26 ({a,b}) into elements [3..2] of v16
> 4215: __ vmerge_vvm(v16, v26, v16); // v16 = {a,b,e,f}
Simlar here. Can we make use of index-load and index-store to simplify the code for the 512 case too?
-------------
PR Review: https://git.openjdk.org/jdk/pull/16562#pullrequestreview-1731361891
PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1393737654
PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1393767555
PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1393774633
PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1393769966
PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1393769327
More information about the hotspot-dev
mailing list