RFR: 8318217: RISC-V: C2 VectorizedHashCode [v10]
Fei Yang
fyang at openjdk.org
Fri Dec 8 04:00:22 UTC 2023
On Wed, 6 Dec 2023 21:58:55 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:
>> Hello All,
>>
>> Please review these changes to support _vectorizedHashCode intrinsic on
>> RISC-V platform. The patch adds the "scalar" code for the intrinsic without
>> usage of any RVV instruction but provides manual unrolling of the appropriate
>> loop. The code with usage of RVV instruction could be added as follow-up of
>> the patch or independently.
>>
>> Thanks,
>> -Yuri Gaevsky
>>
>> P.S. My OCA has been accepted recently (ygaevsky).
>>
>> ### Correctness checks
>>
>> Testing: tier1 tests successfully passed on a RISC-V StarFive JH7110 board with Linux.
>>
>> ### Performance results (the numbers for non-ints are similar)
>>
>> #### StarFive JH7110 board:
>>
>>
>> ArraysHashCode: without intrinsic with intrinsic
>> -------------------------------------------------------------------------------
>> Benchmark (size) Mode Cnt Score Error Score Error Units
>> -------------------------------------------------------------------------------
>> multiints 0 avgt 30 2.658 ? 0.001 2.661 ? 0.004 ns/op
>> multiints 1 avgt 30 4.881 ? 0.011 4.892 ? 0.015 ns/op
>> multiints 2 avgt 30 16.109 ? 0.041 10.451 ? 0.075 ns/op
>> multiints 3 avgt 30 14.873 ? 0.068 11.753 ? 0.024 ns/op
>> multiints 4 avgt 30 17.283 ? 0.078 13.176 ? 0.044 ns/op
>> multiints 5 avgt 30 19.691 ? 0.136 14.723 ? 0.046 ns/op
>> multiints 6 avgt 30 21.727 ? 0.166 15.463 ? 0.124 ns/op
>> multiints 7 avgt 30 23.790 ? 0.126 18.298 ? 0.059 ns/op
>> multiints 8 avgt 30 23.527 ? 0.116 18.267 ? 0.046 ns/op
>> multiints 9 avgt 30 27.981 ? 0.303 20.453 ? 0.069 ns/op
>> multiints 10 avgt 30 26.947 ? 0.215 20.541 ? 0.051 ns/op
>> multiints 50 avgt 30 95.373 ? 0.588 69.238 ? 0.208 ns/op
>> multiints 100 avgt 30 177.109 ? 0.525 137.852 ? 0.417 ns/op
>> multiints 200 avgt 30 341.074 ? 1.363 296.832 ? 0.725 ns/op
>> multiints 500 avgt 30 847.993 ? 1.713 752.415 ? 1.918 ns/op
>> multiints 1000 avgt 30 1610.199 ? 5.424 1426.112 ? 3.407 ns/op
>> multiints 10000 avgt 30 16234.260 ? 26.789 14447.936 ? 26.345 ns/op
>> multiints 100000 avgt 30 170726.025 ? 184.003 152587.649 ? 381.964 ns/op
>> ---------------------------------------...
>
> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:
>
> Added two temp registers for loads; all loads in wide loop has been moved to the start of the loop.
Hi, glad to see the performance numbers are back to normal. Would you mind two more tweaks? Thanks.
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1504:
> 1502: andi(cnt, cnt, stride-1); // don't forget about tail!
> 1503:
> 1504: #define DO_ELEMENT_LOAD(reg, idx) \
Why not turn `DO_ELEMENT_LOAD` macro into a small function? Say `C2_MacroAssembler::arrays_hashcode_elload`. We can put it after `C2_MacroAssembler::arrays_hashcode_elsize`.
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1541:
> 1539:
> 1540: bind(TAIL);
> 1541: beqz(cnt, DONE);
`cnt` is non-zero we reach here from L1498, so this `beqz` check seems redundant in that case. Maybe move this `beqz` check immediate after L1538?
-------------
PR Review: https://git.openjdk.org/jdk/pull/16629#pullrequestreview-1771480770
PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1419899410
PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1419891737
More information about the hotspot-dev
mailing list