RFR: 8313779: RISC-V: use andn / orn in the MD5 intrinsic

Ludovic Henry luhenry at openjdk.org
Fri Aug 4 14:35:37 UTC 2023


On Fri, 4 Aug 2023 13:10:06 GMT, Antonios Printezis <tonyp at openjdk.org> wrote:

> Small improvement of the MD5 intrinsic when the Zbb extension is available. Performance comparison (with thanks again to @robehn for this!):
> 
> before:
> 
> 
> MessageDigests.digest                   md5        64     DEFAULT  avgt    6    1835.246 ±  252.071  ns/op
> MessageDigests.digest                   md5     16384     DEFAULT  avgt    6  145386.522 ±  444.446  ns/op
> MessageDigests.getAndDigest             md5        64     DEFAULT  avgt    6    2555.515 ±  639.491  ns/op
> MessageDigests.getAndDigest             md5     16384     DEFAULT  avgt    6  149045.631 ± 6658.545  ns/op
> 
> 
> after:
> 
> 
> MessageDigests.digest                   md5        64     DEFAULT  avgt    6    1779.637 ±  207.869  ns/op
> MessageDigests.digest                   md5     16384     DEFAULT  avgt    6  137147.179 ±  706.396  ns/op
> MessageDigests.getAndDigest             md5        64     DEFAULT  avgt    6    2645.354 ± 1245.318  ns/op
> MessageDigests.getAndDigest             md5     16384     DEFAULT  avgt    6  141306.966 ± 7000.576  ns/op
> 
> 
> (only the third row, `getAndDigest` at 64 bytes, is not an improvement, but it also has higher variation)
> 
> It seems to save around 5% of executed instructions.

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1658:

> 1656: 
> 1657: // Rd = Rs1 & (~Rs2)
> 1658: void MacroAssembler::andnr(Register Rd, Register Rs1, Register Rs2) {

Keep the name `andn` here, and at https://github.com/openjdk/jdk/pull/15156/files#diff-7a5c3ed05b6f3f06ed1c59f5fc2a14ec566a6a5bd1d09606115767daa99115bdR1660 use `Assembler::andn`.
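
To illustrate why the explicit `Assembler::` qualification matters when the macro keeps the instruction's name, here is a minimal sketch using toy classes. `ToyAssembler`, `ToyMacroAssembler`, the `emitted` counter and the two-instruction fallback are all assumptions made for illustration; they model the idea on plain integers and are not the HotSpot classes or the code in the PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the raw instruction emitter. Each call models
// one emitted instruction and returns the value the destination would hold.
struct ToyAssembler {
  int emitted = 0;  // number of "instructions" emitted so far

  // Models the Zbb instruction: rd = rs1 & ~rs2
  uint64_t andn(uint64_t rs1, uint64_t rs2) { emitted += 1; return rs1 & ~rs2; }
  // Models a bitwise NOT (one base-ISA instruction)
  uint64_t notr(uint64_t rs) { emitted += 1; return ~rs; }
  // Models a bitwise AND (one base-ISA instruction)
  uint64_t andr(uint64_t rs1, uint64_t rs2) { emitted += 1; return rs1 & rs2; }
};

// Hypothetical stand-in for the macro layer. It reuses the name andn, so the
// derived-class function hides the base-class one.
struct ToyMacroAssembler : ToyAssembler {
  bool use_zbb;  // plays the role of the UseZbb flag
  explicit ToyMacroAssembler(bool zbb) : use_zbb(zbb) {}

  uint64_t andn(uint64_t rs1, uint64_t rs2) {
    if (use_zbb) {
      // An unqualified andn(rs1, rs2) here would call this function again
      // and recurse forever; the qualified call reaches the raw instruction.
      return ToyAssembler::andn(rs1, rs2);
    }
    // Assumed fallback without Zbb: NOT followed by AND.
    return andr(rs1, notr(rs2));
  }
};

// Small self-check: both paths compute the same value.
inline void check_andn_paths() {
  ToyMacroAssembler with_zbb(true), without_zbb(false);
  assert(with_zbb.andn(0xF0F0, 0xFF00) == without_zbb.andn(0xF0F0, 0xFF00));
}
```

In this toy model the Zbb path emits one instruction and the fallback emits two, which is the kind of saving the PR description is after.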

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1671:

> 1669: void MacroAssembler::ornr(Register Rd, Register Rs1, Register Rs2) {
> 1670:   if (UseZbb) {
> 1671:     orn(Rd, Rs1, Rs2);

Same as above.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4033:

> 4031:               Register rtmp1, Register rtmp2, Register rmask32) {
> 4032:     // rtmp1 = c ^ (b | (~d))
> 4033:     __ ornr(rtmp2, b, d);

Nit: you could use `rtmp1` here. The next instruction depends on this result, so the CPU cannot execute the two out of order anyway, and the second temporary buys nothing.
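
A rough sketch of the point, on plain integers rather than registers. The function names are hypothetical, and the sketch assumes the instruction following the `orn` is an XOR with `c`, as the quoted comment `rtmp1 = c ^ (b | (~d))` suggests. It only shows that reusing one temporary gives the same result as using two.

```cpp
#include <cassert>
#include <cstdint>

// Models Zbb orn on a 64-bit register: rd = rs1 | ~rs2
inline uint64_t orn(uint64_t rs1, uint64_t rs2) { return rs1 | ~rs2; }

// Reference: MD5 round function I(b, c, d) = c ^ (b | ~d) on 32-bit words.
inline uint32_t md5_i_reference(uint32_t b, uint32_t c, uint32_t d) {
  return c ^ (b | ~d);
}

// Shape of the quoted code: the orn result goes to a second temporary.
inline uint32_t md5_i_two_temps(uint64_t b, uint64_t c, uint64_t d) {
  uint64_t rtmp2 = orn(b, d);   // rtmp2 = b | ~d
  uint64_t rtmp1 = c ^ rtmp2;   // rtmp1 = c ^ rtmp2 (depends on the line above)
  return static_cast<uint32_t>(rtmp1);  // keep only the low 32 bits
}

// Suggested shape: the same temporary is written and then read.
inline uint32_t md5_i_one_temp(uint64_t b, uint64_t c, uint64_t d) {
  uint64_t rtmp1 = orn(b, d);   // rtmp1 = b | ~d
  rtmp1 = c ^ rtmp1;            // rtmp1 = c ^ rtmp1 (same dependency chain)
  return static_cast<uint32_t>(rtmp1);  // keep only the low 32 bits
}
```

Either way the XOR has to wait for the `orn`, so the dependency chain is identical; the one-temporary form just leaves `rtmp2` untouched.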

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/15156#discussion_r1284499778
PR Review Comment: https://git.openjdk.org/jdk/pull/15156#discussion_r1284502509
PR Review Comment: https://git.openjdk.org/jdk/pull/15156#discussion_r1284504036

