RFR: 8306966: RISC-V: Support vector cast node for Vector API [v5]

Mon May 1 03:41:24 UTC 2023

On Sun, 30 Apr 2023 16:13:53 GMT, Gui Cao <gcao at openjdk.org> wrote:

>> Hi, 
>> 
>> we have added some implementations related to vector cast, It was implemented by referring to RVV v1.0 [1]. please take a look and have some reviews. Thanks a lot.
>> 
>> We can use the VectorReshapeTests.java[2]  to print the compilation log, verify and observe the generation of nodes.
>> 
>> For example, we can use the following command to print the compilation log of a jtreg test case:
>> 
>> 
>> /home/zifeihan/jdk-tools/jtreg/bin/jtreg \
>> -v:default \
>> -concurrency:16 -timeout:50 \
>> -javaoption:-XX:+UnlockExperimentalVMOptions \
>> -javaoption:-XX:+UseRVV \
>> -javaoption:-XX:+PrintOptoAssembly \
>> -javaoption:-XX:LogFile=/home/zifeihan/jdk-rvv/VectorReshapeTests_PrintOptoAssembly_20230426.log \
>> -jdk:/home/zifeihan/jdk-rvv/build/linux-riscv64-server-fastdebug/jdk \
>> -compilejdk:/home/zifeihan/jdk-rvv/build/linux-x86_64-server-release/images/jdk \
>> /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/VectorReshapeTests.java
>> 
>> 
>> #### VectorCast/VectorCastB2X/VectorCastD2X/VectorCastF2X/VectorCastI2X/VectorCastL2X/VectorCastS2X
>> There are too many nodes here, and the following shows the log of `VectorCastB2X` nodes:
>> 
>>    ```
>>    1ba0    ld  R28, [R23, #280]	# ptr, #@loadP
>>    1ba4    addi  R29, R7, #32	# ptr, #@addP_reg_imm
>>    1ba8    reinterpretResize V1, V5
>>    1bb0    vcvtBtoX V4, V1
>>    1bb8    far_bgeu  R29, R28, B465	#@far_cmpP_branch  P=0.000100 C=-1.000000
>>    ```
>> 
>> #### VectorRearrange
>> 
>> When the original vector is converted to the target vector, if the actual number of elements of the original vector is greater than the number of elements of the target vector, a slicing action is performed to provide data for subsequent cast nodes. The slicing action depends on the VectorRearrange node.
>> 
>> The compilation log for the `VectorRearrange` node:
>> 
>>    ```
>> 1f6     spill R7 -> [sp, #320]	# spill size = 64
>> 1f8     spill [sp, #128] -> V1	# vector spill size = 256
>> 200     spill [sp, #160] -> V2	# vector spill size = 256
>> 208     rearrange V3, V1, V2
>> 210     spill V3 -> [sp, #96]	# vector spill size = 256
>> 218     li R11, #4	# int, #@loadConI
>>    ```
>> 
>> #### VectorReinterpret
>> If num_elem_from and num_elem_to are not equal, Reinterpret is needed to reset the correct number.
>> https://github.com/openjdk/jdk/blob/3554e7a3ffb879c7e5ef7547eb053e484d09d12b/src/hotspot/share/opto/vectorIntrinsics.cpp#L2374-L2376
>> The compilation log for the `VectorReinterpret` node:
>> 
>> 
>> 1218    spill [sp, #32] -> V4	# vector spill size = 256
>> 1220    spill [sp, #176] -> V3	# vector spill size = 256
>> 1228    rearrange V2, V4, V3
>> 1230    spill [sp, #72] -> V0	# vmask spill size = 32
>> 123c    vmerge_vvm V1, V1, V2, v0	#@vector blend
>> 1244    reinterpretResize V2, V1
>> 124c    vcvtStoX_extend V5, V2
>> 1254    bgeu  R28, R7, B169	#@cmpP_branch  P=0.000100 C=-1.000000
>> 
>> 
>> ####  LShiftCntV/RShiftCntV
>> 
>> We have merged `LShiftCntV`, `RShiftCntV` nodes and support boolean types
>> 
>> The compilation log for the LShiftCntV/RShiftCntV node:
>> 
>> 
>> 24c     vasrB V3, V1, V2
>> 260     storeV [R19], V3	# vector (rvv)
>> 268     lbu  R19, [R29, #48]	# byte, #@loadUB
>> 26c     andi  R19, R19, #7	#@andI_reg_imm
>> 270     loadV V1, [R25]	# vector (rvv)
>> 278     vshiftcnt V2, R19
>> 280     vasrB V3, V1, V2
>> 294     storeV [R26], V3	# vector (rvv)
>> 29c     lbu  R19, [R29, #80]	# byte, #@loadUB
>> 2a0     andi  R19, R19, #7	#@andI_reg_imm
>> 2a4     loadV V1, [R22]	# vector (rvv)
>> 2ac     vshiftcnt V2, R19
>> 
>> 
>> [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc
>> [2] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/VectorReshapeTests.java
>> Testing:
>> qemu with UseRVV:
>> 
>> - [ ] Tier1 tests (release)
>> - [ ] Tier2 tests (release)
>> - [ ] Tier3 tests (release)
>> - [x] test/jdk/jdk/incubator/vector (fastdebug)
>
> Gui Cao has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Small refactoring of rvv_vsetvli

Thanks for the update. A few more questions follow.

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1787:

> 1785:                                               VectorRegister src, BasicType src_bt) {
> 1786:   assert(type2aelembytes(dst_bt) > type2aelembytes(src_bt) && type2aelembytes(dst_bt) <= 8 && type2aelembytes(src_bt) <= 4, "invalid element size");
> 1787:   assert(dst_bt != T_FLOAT && dst_bt != T_DOUBLE && src_bt != T_FLOAT && src_bt != T_DOUBLE, "should be integer element");

Suggestion: s/"should be integer element"/"unsupported element type"/

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1825:

> 1823:                                               VectorRegister src, BasicType src_bt, VectorRegister tmp) {
> 1824:   assert(type2aelembytes(dst_bt) < type2aelembytes(src_bt) && type2aelembytes(dst_bt) <= 4 && type2aelembytes(src_bt) <= 8, "invalid element size");
> 1825:   assert(dst_bt != T_FLOAT && dst_bt != T_DOUBLE && src_bt != T_FLOAT && src_bt != T_DOUBLE, "should be integer element");

Suggestion: s/"should be integer element"/"unsupported element type"/

src/hotspot/cpu/riscv/riscv_v.ad line 2660:

> 2658:       __ vector_integer_extend(as_VectorRegister($dst$$reg), bt == T_FLOAT ? T_INT : T_LONG,
> 2659:                                Matcher::vector_length(this), as_VectorRegister($src$$reg), T_BYTE);
> 2660:       __ vfcvt_f_x_v(as_VectorRegister($dst$$reg), as_VectorRegister($dst$$reg));

The single-witdh vector conversion instructions when converting to floating-point values use the dynamic rounding mode in 'frm', so I think you should set 'frm' to the correct rounding mode first. You might also want to check the exceptional conditions possibly set by those instrucions.

src/hotspot/cpu/riscv/riscv_v.ad line 2718:

> 2716: %}
> 2717: 
> 2718: instruct vcvtItoX(vReg dst, vReg src) %{

I think the 'TEMP_DEF dst' effect is only needed for the T_DOUBLE case. It still works if 'dst' and 'src' are allocated the same vector register for the T_FLOAT case. So you might want to further break down this into two separate ones, say 'vcvtItoF' and 'vcvtItoD'.

src/hotspot/cpu/riscv/riscv_v.ad line 2804:

> 2802:   ins_encode %{
> 2803:     __ rvv_vsetvli(T_FLOAT, Matcher::vector_length(this, $src));
> 2804:     __ vfcvt_x_f_v(as_VectorRegister($dst$$reg), as_VectorRegister($src$$reg));

The language spec [1] specfies: "The round toward zero rounding policy applies to (i) conversion of a floating-point value to an integer value ([§5.1.3](https://docs.oracle.com/javase/specs/jls/se20/html/jls-5.html#jls-5.1.3)), and (ii) floating-point remainder ([§15.17.3](https://docs.oracle.com/javase/specs/jls/se20/html/jls-15.html#jls-15.17.3))."

So it looks to me that we should use the 'rtz' variant (vfcvt.rtz.x.f.v) here to do the conversion to integer instead here. Please also check other places where we do conversion from float-point value to integer value.

[1] https://docs.oracle.com/javase/specs/jls/se20/html/jls-15.html#jls-15.4

-------------

Changes requested by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/13684#pullrequestreview-1407243306
PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1181365812
PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1181365844
PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1181364523
PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1181331244
PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1181332597