RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2]
Hamlin Li
mli at openjdk.org
Tue Nov 14 15:57:41 UTC 2023
On Mon, 13 Nov 2023 17:34:10 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:
>> Hello All,
>>
>> Please review these changes to support _vectorizedHashCode intrinsic on
>> RISC-V platform. The patch adds the "scalar" code for the intrinsic without
>> usage of any RVV instruction but provides manual unrolling of the appropriate
>> loop. The code with usage of RVV instruction could be added as follow-up of
>> the patch or independently.
>>
>> Thanks,
>> -Yuri Gaevsky
>>
>> P.S. My OCA has been accepted recently (ygaevsky).
>>
>> ### Correctness checks
>>
>> Testing: tier1 tests successfully passed on a RISC-V StarFive JH7110 board with Linux.
>>
>> ### Performance results (the numbers for non-ints are similar)
>>
>> #### StarFive JH7110 board:
>>
>>
>> ArraysHashCode: without intrinsic with intrinsic
>> -------------------------------------------------------------------------------
>> Benchmark (size) Mode Cnt Score Error Score Error Units
>> -------------------------------------------------------------------------------
>> multiints 0 avgt 30 2.658 ? 0.001 2.661 ? 0.004 ns/op
>> multiints 1 avgt 30 4.881 ? 0.011 4.892 ? 0.015 ns/op
>> multiints 2 avgt 30 16.109 ? 0.041 10.451 ? 0.075 ns/op
>> multiints 3 avgt 30 14.873 ? 0.068 11.753 ? 0.024 ns/op
>> multiints 4 avgt 30 17.283 ? 0.078 13.176 ? 0.044 ns/op
>> multiints 5 avgt 30 19.691 ? 0.136 14.723 ? 0.046 ns/op
>> multiints 6 avgt 30 21.727 ? 0.166 15.463 ? 0.124 ns/op
>> multiints 7 avgt 30 23.790 ? 0.126 18.298 ? 0.059 ns/op
>> multiints 8 avgt 30 23.527 ? 0.116 18.267 ? 0.046 ns/op
>> multiints 9 avgt 30 27.981 ? 0.303 20.453 ? 0.069 ns/op
>> multiints 10 avgt 30 26.947 ? 0.215 20.541 ? 0.051 ns/op
>> multiints 50 avgt 30 95.373 ? 0.588 69.238 ? 0.208 ns/op
>> multiints 100 avgt 30 177.109 ? 0.525 137.852 ? 0.417 ns/op
>> multiints 200 avgt 30 341.074 ? 1.363 296.832 ? 0.725 ns/op
>> multiints 500 avgt 30 847.993 ? 1.713 752.415 ? 1.918 ns/op
>> multiints 1000 avgt 30 1610.199 ? 5.424 1426.112 ? 3.407 ns/op
>> multiints 10000 avgt 30 16234.260 ? 26.789 14447.936 ? 26.345 ns/op
>> multiints 100000 avgt 30 170726.025 ? 184.003 152587.649 ? 381.964 ns/op
>> ---------------------------------------...
>
> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:
>
> Minor cosmetic fixes.
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1477:
> 1475: case T_SHORT: BLOCK_COMMENT("arrays_hashcode(short) {"); break;
> 1476: case T_INT: BLOCK_COMMENT("arrays_hashcode(int) {"); break;
> 1477: default: BLOCK_COMMENT("arrays_hashcode {"); break;
In `C2_MacroAssembler::arrays_hashcode_elsize`, default action is `ShouldNotReachHere();`, should it be consistent here?
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1491:
> 1489: beqz(cnt, DONE);
> 1490:
> 1491: ld(pow31_1_2, ExternalAddress(StubRoutines::riscv::arrays_hashcode_powers_of_31()
Does `mv` of the power values to pow31_1_2 do the same effect as the `ld` here? If it does, mv might be better than ld.
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1508:
> 1506: } \
> 1507:
> 1508: ld(pow31_3_4, ExternalAddress(StubRoutines::riscv::arrays_hashcode_powers_of_31()
same comment as the above `ld` of pow31_1_2.
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1512:
> 1510:
> 1511: bind(WIDE_LOOP);
> 1512: DO_ELEMENT_LOAD(tmp1, 0)
Can you add `;` at the end of the statement? similar comments for other DO_ELEMENT_LOAD's
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1513:
> 1511: bind(WIDE_LOOP);
> 1512: DO_ELEMENT_LOAD(tmp1, 0)
> 1513: DO_ELEMENT_LOAD(tmp3, 1)
Would it help to optimize the perf by moving `DO_ELEMENT_LOAD(tmp3, 1)` after `srli(tmp2, pow31_3_4, 32);`?
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1523:
> 1521: DO_ELEMENT_LOAD(tmp3, 3)
> 1522: srli(tmp2, pow31_1_2, 32);
> 1523: mulw(tmp1, tmp1, tmp2); // 31^^1 * ary[i+2]
Could this line be optimized as `x<<5-x`? Just as TAIL_LOOP below.
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1527:
> 1525: addw(result, result, tmp3); // 31^^4 * h + 31^^3 * ary[i+0] + 31^^2 * ary[i+1]
> 1526: // + 31^^1 * ary[i+2] + 31^^0 * ary[i+3]
> 1527: subw(chunk, chunk, stride);
Could chunk and ary be merged into one variable? so we don't need one sub and one add, but only one add here.
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1540:
> 1538: addw(result, result, tmp1); // result = result + ary[i]
> 1539: subw(cnt, cnt, 1);
> 1540: add(ary, ary, elsize);
Similar comment for cnt and ary as chunk and ary above.
src/hotspot/cpu/riscv/riscv.ad line 10306:
> 10304:
> 10305:
> 10306: instruct arrays_hashcode(iRegP_R11 ary, iRegI_R12 cnt, iRegI_R10 result, immI basic_type,
Is it necessary to specify the regs(r11/12/10) here?
src/hotspot/cpu/riscv/riscv.ad line 10312:
> 10310: match(Set result (VectorizedHashCode (Binary ary cnt) (Binary result basic_type)));
> 10311: effect(TEMP tmp1, TEMP tmp2, TEMP tmp3, TEMP tmp4, TEMP tmp5, TEMP tmp6,
> 10312: USE_KILL ary, USE_KILL cnt, USE basic_type, KILL cr);
should `TEMP_DEF result` be added here?
src/hotspot/cpu/riscv/stubRoutines_riscv.cpp line 58:
> 56: address StubRoutines::riscv::_method_entry_barrier = nullptr;
> 57:
> 58: ATTRIBUTE_ALIGNED(64) const jint StubRoutines::riscv::_arrays_hashcode_powers_of_31[] =
If we use `mv` to replace the `ld` of power values above, these related code could be removed here.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1392818497
PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1392805158
PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1392805457
PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1392805546
PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1392805782
PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1392805847
PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1392805967
PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1392806033
PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1392806124
PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1392809666
PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1392820954
More information about the hotspot-dev
mailing list