RFR: 8306966: RISC-V: Support vector cast node for Vector API [v5]
Fei Yang
fyang at openjdk.org
Mon May 1 03:41:24 UTC 2023
On Sun, 30 Apr 2023 16:13:53 GMT, Gui Cao <gcao at openjdk.org> wrote:
>> Hi,
>>
>> we have added some implementations related to vector cast, It was implemented by referring to RVV v1.0 [1]. please take a look and have some reviews. Thanks a lot.
>>
>> We can use the VectorReshapeTests.java[2] to print the compilation log, verify and observe the generation of nodes.
>>
>> For example, we can use the following command to print the compilation log of a jtreg test case:
>>
>>
>> /home/zifeihan/jdk-tools/jtreg/bin/jtreg \
>> -v:default \
>> -concurrency:16 -timeout:50 \
>> -javaoption:-XX:+UnlockExperimentalVMOptions \
>> -javaoption:-XX:+UseRVV \
>> -javaoption:-XX:+PrintOptoAssembly \
>> -javaoption:-XX:LogFile=/home/zifeihan/jdk-rvv/VectorReshapeTests_PrintOptoAssembly_20230426.log \
>> -jdk:/home/zifeihan/jdk-rvv/build/linux-riscv64-server-fastdebug/jdk \
>> -compilejdk:/home/zifeihan/jdk-rvv/build/linux-x86_64-server-release/images/jdk \
>> /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/VectorReshapeTests.java
>>
>>
>> #### VectorCast/VectorCastB2X/VectorCastD2X/VectorCastF2X/VectorCastI2X/VectorCastL2X/VectorCastS2X
>> There are too many nodes here, and the following shows the log of `VectorCastB2X` nodes:
>>
>> ```
>> 1ba0 ld R28, [R23, #280] # ptr, #@loadP
>> 1ba4 addi R29, R7, #32 # ptr, #@addP_reg_imm
>> 1ba8 reinterpretResize V1, V5
>> 1bb0 vcvtBtoX V4, V1
>> 1bb8 far_bgeu R29, R28, B465 #@far_cmpP_branch P=0.000100 C=-1.000000
>> ```
>>
>> #### VectorRearrange
>>
>> When the original vector is converted to the target vector, if the actual number of elements of the original vector is greater than the number of elements of the target vector, a slicing action is performed to provide data for subsequent cast nodes. The slicing action depends on the VectorRearrange node.
>>
>> The compilation log for the `VectorRearrange` node:
>>
>> ```
>> 1f6 spill R7 -> [sp, #320] # spill size = 64
>> 1f8 spill [sp, #128] -> V1 # vector spill size = 256
>> 200 spill [sp, #160] -> V2 # vector spill size = 256
>> 208 rearrange V3, V1, V2
>> 210 spill V3 -> [sp, #96] # vector spill size = 256
>> 218 li R11, #4 # int, #@loadConI
>> ```
>>
>> #### VectorReinterpret
>> If num_elem_from and num_elem_to are not equal, Reinterpret is needed to reset the correct number.
>> https://github.com/openjdk/jdk/blob/3554e7a3ffb879c7e5ef7547eb053e484d09d12b/src/hotspot/share/opto/vectorIntrinsics.cpp#L2374-L2376
>> The compilation log for the `VectorReinterpret` node:
>>
>>
>> 1218 spill [sp, #32] -> V4 # vector spill size = 256
>> 1220 spill [sp, #176] -> V3 # vector spill size = 256
>> 1228 rearrange V2, V4, V3
>> 1230 spill [sp, #72] -> V0 # vmask spill size = 32
>> 123c vmerge_vvm V1, V1, V2, v0 #@vector blend
>> 1244 reinterpretResize V2, V1
>> 124c vcvtStoX_extend V5, V2
>> 1254 bgeu R28, R7, B169 #@cmpP_branch P=0.000100 C=-1.000000
>>
>>
>> #### LShiftCntV/RShiftCntV
>>
>> We have merged `LShiftCntV`, `RShiftCntV` nodes and support boolean types
>>
>> The compilation log for the LShiftCntV/RShiftCntV node:
>>
>>
>> 24c vasrB V3, V1, V2
>> 260 storeV [R19], V3 # vector (rvv)
>> 268 lbu R19, [R29, #48] # byte, #@loadUB
>> 26c andi R19, R19, #7 #@andI_reg_imm
>> 270 loadV V1, [R25] # vector (rvv)
>> 278 vshiftcnt V2, R19
>> 280 vasrB V3, V1, V2
>> 294 storeV [R26], V3 # vector (rvv)
>> 29c lbu R19, [R29, #80] # byte, #@loadUB
>> 2a0 andi R19, R19, #7 #@andI_reg_imm
>> 2a4 loadV V1, [R22] # vector (rvv)
>> 2ac vshiftcnt V2, R19
>>
>>
>> [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc
>> [2] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/VectorReshapeTests.java
>> Testing:
>> qemu with UseRVV:
>>
>> - [ ] Tier1 tests (release)
>> - [ ] Tier2 tests (release)
>> - [ ] Tier3 tests (release)
>> - [x] test/jdk/jdk/incubator/vector (fastdebug)
>
> Gui Cao has updated the pull request incrementally with one additional commit since the last revision:
>
> Small refactoring of rvv_vsetvli
Thanks for the update. A few more questions follow.
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1787:
> 1785: VectorRegister src, BasicType src_bt) {
> 1786: assert(type2aelembytes(dst_bt) > type2aelembytes(src_bt) && type2aelembytes(dst_bt) <= 8 && type2aelembytes(src_bt) <= 4, "invalid element size");
> 1787: assert(dst_bt != T_FLOAT && dst_bt != T_DOUBLE && src_bt != T_FLOAT && src_bt != T_DOUBLE, "should be integer element");
Suggestion: s/"should be integer element"/"unsupported element type"/
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1825:
> 1823: VectorRegister src, BasicType src_bt, VectorRegister tmp) {
> 1824: assert(type2aelembytes(dst_bt) < type2aelembytes(src_bt) && type2aelembytes(dst_bt) <= 4 && type2aelembytes(src_bt) <= 8, "invalid element size");
> 1825: assert(dst_bt != T_FLOAT && dst_bt != T_DOUBLE && src_bt != T_FLOAT && src_bt != T_DOUBLE, "should be integer element");
Suggestion: s/"should be integer element"/"unsupported element type"/
src/hotspot/cpu/riscv/riscv_v.ad line 2660:
> 2658: __ vector_integer_extend(as_VectorRegister($dst$$reg), bt == T_FLOAT ? T_INT : T_LONG,
> 2659: Matcher::vector_length(this), as_VectorRegister($src$$reg), T_BYTE);
> 2660: __ vfcvt_f_x_v(as_VectorRegister($dst$$reg), as_VectorRegister($dst$$reg));
The single-witdh vector conversion instructions when converting to floating-point values use the dynamic rounding mode in 'frm', so I think you should set 'frm' to the correct rounding mode first. You might also want to check the exceptional conditions possibly set by those instrucions.
src/hotspot/cpu/riscv/riscv_v.ad line 2718:
> 2716: %}
> 2717:
> 2718: instruct vcvtItoX(vReg dst, vReg src) %{
I think the 'TEMP_DEF dst' effect is only needed for the T_DOUBLE case. It still works if 'dst' and 'src' are allocated the same vector register for the T_FLOAT case. So you might want to further break down this into two separate ones, say 'vcvtItoF' and 'vcvtItoD'.
src/hotspot/cpu/riscv/riscv_v.ad line 2804:
> 2802: ins_encode %{
> 2803: __ rvv_vsetvli(T_FLOAT, Matcher::vector_length(this, $src));
> 2804: __ vfcvt_x_f_v(as_VectorRegister($dst$$reg), as_VectorRegister($src$$reg));
The language spec [1] specfies: "The round toward zero rounding policy applies to (i) conversion of a floating-point value to an integer value ([§5.1.3](https://docs.oracle.com/javase/specs/jls/se20/html/jls-5.html#jls-5.1.3)), and (ii) floating-point remainder ([§15.17.3](https://docs.oracle.com/javase/specs/jls/se20/html/jls-15.html#jls-15.17.3))."
So it looks to me that we should use the 'rtz' variant (vfcvt.rtz.x.f.v) here to do the conversion to integer instead here. Please also check other places where we do conversion from float-point value to integer value.
[1] https://docs.oracle.com/javase/specs/jls/se20/html/jls-15.html#jls-15.4
-------------
Changes requested by fyang (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13684#pullrequestreview-1407243306
PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1181365812
PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1181365844
PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1181364523
PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1181331244
PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1181332597
More information about the hotspot-compiler-dev
mailing list