RFR: 8306966: RISC-V: Support vector cast node for Vector API [v8]

Fei Yang fyang at openjdk.org
Sat May 6 03:08:18 UTC 2023


On Fri, 5 May 2023 06:18:23 GMT, Gui Cao <gcao at openjdk.org> wrote:

>> Hi, 
>> 
>> We have added implementations of the vector cast nodes, written with reference to RVV v1.0 [1]. Please take a look and review. Thanks a lot.
>> 
>> We can use VectorReshapeTests.java [2] to print the compilation log and to verify and observe the generated nodes.
>> 
>> For example, we can use the following command to print the compilation log of a jtreg test case:
>> 
>> 
>> /home/zifeihan/jdk-tools/jtreg/bin/jtreg \
>> -v:default \
>> -concurrency:16 -timeout:50 \
>> -javaoption:-XX:+UnlockExperimentalVMOptions \
>> -javaoption:-XX:+UseRVV \
>> -javaoption:-XX:+PrintOptoAssembly \
>> -javaoption:-XX:LogFile=/home/zifeihan/jdk-rvv/VectorReshapeTests_PrintOptoAssembly_20230426.log \
>> -jdk:/home/zifeihan/jdk-rvv/build/linux-riscv64-server-fastdebug/jdk \
>> -compilejdk:/home/zifeihan/jdk-rvv/build/linux-x86_64-server-release/images/jdk \
>> /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/VectorReshapeTests.java
>> 
>> 
>> #### VectorCast/VectorCastB2X/VectorCastD2X/VectorCastF2X/VectorCastI2X/VectorCastL2X/VectorCastS2X
>> There are too many nodes to show them all, so the following is the log for the `VectorCastB2X` node only:
>> 
>>    ```
>>    1ba0    ld  R28, [R23, #280]	# ptr, #@loadP
>>    1ba4    addi  R29, R7, #32	# ptr, #@addP_reg_imm
>>    1ba8    reinterpretResize V1, V5
>>    1bb0    vcvtBtoX V4, V1
>>    1bb8    far_bgeu  R29, R28, B465	#@far_cmpP_branch  P=0.000100 C=-1.000000
>>    ```
>> 
>> #### VectorRearrange
>> 
>> When the original vector is converted to the target vector, if the actual number of elements of the original vector is greater than the number of elements of the target vector, a slicing action is performed to provide data for subsequent cast nodes. The slicing action depends on the VectorRearrange node.
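To make the slice-then-cast step concrete, here is a minimal scalar sketch in plain Java. It is neither the HotSpot implementation nor the Vector API itself: the class `SliceThenCastSketch`, the method `castPart`, and the `part` parameter are hypothetical names chosen for illustration. It assumes an int-to-long cast where 8 int lanes feed a 4-lane long vector, so only one slice of the source can be converted at a time.

```java
import java.util.Arrays;

class SliceThenCastSketch {
    // Hypothetical helper: select dstLanes source lanes starting at
    // part * dstLanes (the "slice"), then widen each int lane to long (the "cast").
    static long[] castPart(int[] src, int dstLanes, int part) {
        int origin = part * dstLanes;
        if (part < 0 || origin + dstLanes > src.length) {
            throw new IllegalArgumentException("part out of range: " + part);
        }
        long[] dst = new long[dstLanes];
        for (int i = 0; i < dstLanes; i++) {
            dst[i] = src[origin + i]; // int -> long conversion sign-extends
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {1, -2, 3, -4, 5, -6, 7, -8}; // 8 int lanes
        System.out.println(Arrays.toString(castPart(src, 4, 0))); // lower slice
        System.out.println(Arrays.toString(castPart(src, 4, 1))); // upper slice
    }
}
```

In this model the slice is a plain index offset; in the compiled code described here, that selection is what the VectorRearrange node provides.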
>> 
>> The compilation log for the `VectorRearrange` node:
>> 
>>    ```
>> 1f6     spill R7 -> [sp, #320]	# spill size = 64
>> 1f8     spill [sp, #128] -> V1	# vector spill size = 256
>> 200     spill [sp, #160] -> V2	# vector spill size = 256
>> 208     rearrange V3, V1, V2
>> 210     spill V3 -> [sp, #96]	# vector spill size = 256
>> 218     li R11, #4	# int, #@loadConI
>>    ```
>> 
>> #### VectorReinterpret
>> If num_elem_from and num_elem_to are not equal, a VectorReinterpret node is needed to reset the element count to the correct number.
>> https://github.com/openjdk/jdk/blob/3554e7a3ffb879c7e5ef7547eb053e484d09d12b/src/hotspot/share/opto/vectorIntrinsics.cpp#L2374-L2376
>> The compilation log for the `VectorReinterpret` node:
>> 
>> 
>> 1218    spill [sp, #32] -> V4	# vector spill size = 256
>> 1220    spill [sp, #176] -> V3	# vector spill size = 256
>> 1228    rearrange V2, V4, V3
>> 1230    spill [sp, #72] -> V0	# vmask spill size = 32
>> 123c    vmerge_vvm V1, V1, V2, v0	#@vector blend
>> 1244    reinterpretResize V2, V1
>> 124c    vcvtStoX_extend V5, V2
>> 1254    bgeu  R28, R7, B169	#@cmpP_branch  P=0.000100 C=-1.000000
>> 
>> 
>> #### LShiftCntV/RShiftCntV
>> 
>> We have merged the `LShiftCntV` and `RShiftCntV` nodes and added support for boolean types.
>> 
>> The compilation log for the LShiftCntV/RShiftCntV node:
>> 
>> 
>> 24c     vasrB V3, V1, V2
>> 260     storeV [R19], V3	# vector (rvv)
>> 268     lbu  R19, [R29, #48]	# byte, #@loadUB
>> 26c     andi  R19, R19, #7	#@andI_reg_imm
>> 270     loadV V1, [R25]	# vector (rvv)
>> 278     vshiftcnt V2, R19
>> 280     vasrB V3, V1, V2
>> 294     storeV [R26], V3	# vector (rvv)
>> 29c     lbu  R19, [R29, #80]	# byte, #@loadUB
>> 2a0     andi  R19, R19, #7	#@andI_reg_imm
>> 2a4     loadV V1, [R22]	# vector (rvv)
>> 2ac     vshiftcnt V2, R19
>> 
>> 
>> [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc
>> [2] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/VectorReshapeTests.java
>> Testing:
>> qemu with UseRVV:
>> 
>> - [x] Tier1 tests (release)
>> - [x] Tier2 tests (release)
>> - [x] Tier3 tests (release)
>> - [x] test/jdk/jdk/incubator/vector (fastdebug)
>
> Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:
> 
>  - Merge remote-tracking branch 'upstream/master' into JDK-8306966
>  - rename rvv_vsetvli to vsetvli_helper
>  - Fix round mode and optimize widen/narrow vcast
>  - Small refactoring of rvv_vsetvli
>  - Fix VectorCastF2X
>  - During the conversion, specify the number of vectors
>  - Use zr register instead of x0
>  - 8306966: RISC-V: Support vector cast node for Vector API

Thanks for the update. Would you mind a few more tweaks?

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1823:

> 1821: // High part of dst vector will be filled with zero.
> 1822: void C2_MacroAssembler::integer_narrow_v(VectorRegister dst, BasicType dst_bt, int vector_length,
> 1823:                                          VectorRegister src, BasicType src_bt, VectorRegister tmp) {

If you allocate different vector registers for 'dst' and 'src' at the call site, then we should be able to eliminate the 'tmp' register parameter of this function. That is, save the intermediate result in 'dst' instead.
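As a rough illustration of the idea, here is a scalar model in plain Java, not the macro-assembler code. It assumes the narrowing is done by repeatedly halving the element width, and it does not model the RVV register-overlap rules for narrowing instructions, which the real sequence would still have to respect. The names `NarrowInDstSketch`, `narrowStep` and `integerNarrow` are made up for this sketch.

```java
import java.util.Arrays;

class NarrowInDstSketch {
    // One narrowing step: keep only the low toBits bits of each lane.
    // Lanes are held in long containers to keep the model simple.
    static void narrowStep(long[] dst, long[] src, int toBits) {
        long mask = (1L << toBits) - 1;
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i] & mask;
        }
    }

    // Narrow 64-bit lanes down to dstBits (32, 16 or 8) with no tmp array:
    // every intermediate result lives in dst, so dst must not alias src.
    static void integerNarrow(long[] dst, int dstBits, long[] src) {
        if (dst == src) {
            throw new IllegalArgumentException("dst and src must be distinct");
        }
        if (dstBits != 8 && dstBits != 16 && dstBits != 32) {
            throw new IllegalArgumentException("unsupported width: " + dstBits);
        }
        int bits = 32;
        narrowStep(dst, src, bits);     // first step reads src, writes dst
        while (bits > dstBits) {
            bits /= 2;
            narrowStep(dst, dst, bits); // later steps work on dst only
        }
    }

    public static void main(String[] args) {
        long[] src = {0x1122334455667788L, -1L};
        long[] dst = new long[src.length];
        integerNarrow(dst, 8, src);
        System.out.println(Arrays.toString(dst)); // narrowed lanes
        System.out.println(Arrays.toString(src)); // src is left untouched
    }
}
```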

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.hpp line 239:

> 237:                         VectorRegister src, BasicType src_bt, VectorRegister tmp);
> 238: 
> 239:   void vfcvt_rtz_xu_f_v_safe(VectorRegister dst, VectorRegister src);

I don't think we need the unsigned version. Could you please remove it?

-------------

Changes requested by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/13684#pullrequestreview-1415641580
PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1186604752
PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1186605732

