RFR: 8306966: RISC-V: Support vector cast node for Vector API [v3]
Fei Yang
fyang at openjdk.org
Fri Apr 28 02:44:53 UTC 2023
On Thu, 27 Apr 2023 14:03:58 GMT, Gui Cao <gcao at openjdk.org> wrote:
>> Hi,
>>
>> We have added implementations for the vector cast nodes, following RVV v1.0 [1]. Please take a look and review. Thanks a lot.
>>
>> We can use VectorReshapeTests.java [2] to print the compilation log and verify that the nodes are generated.
>>
>> For example, we can use the following command to print the compilation log of a jtreg test case:
>>
>> ```
>> /home/zifeihan/jdk-tools/jtreg/bin/jtreg \
>> -v:default \
>> -concurrency:16 -timeout:50 \
>> -javaoption:-XX:+UnlockExperimentalVMOptions \
>> -javaoption:-XX:+UseRVV \
>> -javaoption:-XX:+PrintOptoAssembly \
>> -javaoption:-XX:LogFile=/home/zifeihan/jdk-rvv/VectorReshapeTests_PrintOptoAssembly_20230426.log \
>> -jdk:/home/zifeihan/jdk-rvv/build/linux-riscv64-server-fastdebug/jdk \
>> -compilejdk:/home/zifeihan/jdk-rvv/build/linux-x86_64-server-release/images/jdk \
>> /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/VectorReshapeTests.java
>> ```
>>
>> #### VectorCast/VectorCastB2X/VectorCastD2X/VectorCastF2X/VectorCastI2X/VectorCastL2X/VectorCastS2X
>> There are too many nodes to show them all, so the following shows only the log for the `VectorCastB2X` node:
>>
>> ```
>> 1ba0 ld R28, [R23, #280] # ptr, #@loadP
>> 1ba4 addi R29, R7, #32 # ptr, #@addP_reg_imm
>> 1ba8 reinterpretResize V1, V5
>> 1bb0 vcvtBtoX V4, V1
>> 1bb8 far_bgeu R29, R28, B465 #@far_cmpP_branch P=0.000100 C=-1.000000
>> ```
>>
>> #### VectorRearrange/VectorReinterpret
>>
>> When the source vector is converted to the target vector and the source holds more elements than the target, a slice is performed to provide data for the subsequent cast nodes. The slice depends on the `VectorRearrange` and `VectorReinterpret` nodes.
>>
>> The compilation log for the `VectorRearrange` node:
>>
>> ```
>> 1f6 spill R7 -> [sp, #320] # spill size = 64
>> 1f8 spill [sp, #128] -> V1 # vector spill size = 256
>> 200 spill [sp, #160] -> V2 # vector spill size = 256
>> 208 rearrange V3, V1, V2
>> 210 spill V3 -> [sp, #96] # vector spill size = 256
>> 218 li R11, #4 # int, #@loadConI
>> ```
>>
>> The compilation log for the `VectorReinterpret` node:
>>
>> ```
>> 1218 spill [sp, #32] -> V4 # vector spill size = 256
>> 1220 spill [sp, #176] -> V3 # vector spill size = 256
>> 1228 rearrange V2, V4, V3
>> 1230 spill [sp, #72] -> V0 # vmask spill size = 32
>> 123c vmerge_vvm V1, V1, V2, v0 #@vector blend
>> 1244 reinterpretResize V2, V1
>> 124c vcvtStoX_extend V5, V2
>> 1254 bgeu R28, R7, B169 #@cmpP_branch P=0.000100 C=-1.000000
>> ```
>>
>> #### LShiftCntV/RShiftCntV/MaskAll
>>
>> We have merged the `LShiftCntV` and `RShiftCntV` nodes and added support for boolean types.
>>
>> The compilation log for the `LShiftCntV`/`RShiftCntV` nodes:
>>
>> ```
>> 24c vasrB V3, V1, V2
>> 260 storeV [R19], V3 # vector (rvv)
>> 268 lbu R19, [R29, #48] # byte, #@loadUB
>> 26c andi R19, R19, #7 #@andI_reg_imm
>> 270 loadV V1, [R25] # vector (rvv)
>> 278 vshiftcnt V2, R19
>> 280 vasrB V3, V1, V2
>> 294 storeV [R26], V3 # vector (rvv)
>> 29c lbu R19, [R29, #80] # byte, #@loadUB
>> 2a0 andi R19, R19, #7 #@andI_reg_imm
>> 2a4 loadV V1, [R22] # vector (rvv)
>> 2ac vshiftcnt V2, R19
>> ```
>>
>> By the way, the mask version of MaskAll is supported.
>>
>> [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc
>> [2] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/VectorReshapeTests.java
>>
>> Testing:
>> qemu with UseRVV:
>>
>> - [ ] Tier1 tests (release)
>> - [ ] Tier2 tests (release)
>> - [ ] Tier3 tests (release)
>> - [x] test/jdk/jdk/incubator/vector (fastdebug)
>
> Gui Cao has updated the pull request incrementally with one additional commit since the last revision:
>
> During the conversion, specify the number of vectors
Changes requested by fyang (Reviewer).
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1797:
> 1795: assert_different_registers(dst, src);
> 1796:
> 1797: rvv_vsetvli(dst_bt, length_in_bytes);
I think we should use the actual AVL instead of 'length_in_bytes' for rvv_vsetvli?
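
To make the distinction concrete: in RVV v1.0, AVL is a count of elements, while 'length_in_bytes' is a count of bytes, so the two only coincide for byte-sized elements. The following is a minimal standalone sketch, not HotSpot code; `avl_for` is a hypothetical helper introduced only for illustration, and it assumes the byte length is a multiple of the element size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of HotSpot): the number of elements, i.e.
// the AVL, for a vector of the given byte length and element size.
// Assumes length_in_bytes is a multiple of elem_bytes.
constexpr std::size_t avl_for(std::size_t length_in_bytes,
                              std::size_t elem_bytes) {
  return length_in_bytes / elem_bytes;
}

// A 32-byte vector of 8-byte longs holds 4 elements, not 32.
static_assert(avl_for(32, 8) == 4, "long vector: 4 lanes");
// Only for 1-byte elements do the byte length and the AVL coincide.
static_assert(avl_for(32, 1) == 32, "byte vector: 32 lanes");
// A cast keeps the lane count: 4 bytes widened to 4 longs fills 32 bytes.
static_assert(avl_for(4, 1) == avl_for(32, 8), "cast preserves lane count");
```

Under these assumptions, passing a byte count where an element count is expected would request 8 times too many lanes for a long vector.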
src/hotspot/cpu/riscv/riscv_v.ad line 2837:
> 2835: if (bt == T_LONG) {
> 2836: __ vector_integer_extend(as_VectorRegister($dst$$reg), T_LONG,
> 2837: Matcher::vector_length_in_bytes(this), as_VectorRegister($dst$$reg), T_INT);
Will this work? I see you are asserting that the 'dst' and 'src' vector registers are different in vector_integer_extend, but the same vector register is passed for these two parameters here.
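
As an illustration of why a widening operation may want distinct source and destination, here is a minimal sketch that widens int32 elements to int64 inside a plain byte buffer. It is a memory model only, not a model of RVV hardware or of vector_integer_extend (the RVV specification has its own overlap constraints for widening instructions); `widen_i32_to_i64` and `widen_demo` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Widen n int32 elements read at src_off into int64 elements written at
// dst_off, both inside the same byte buffer, in ascending element order.
inline void widen_i32_to_i64(std::vector<unsigned char>& buf,
                             std::size_t dst_off, std::size_t src_off,
                             std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    std::int32_t s;
    std::memcpy(&s, buf.data() + src_off + i * sizeof s, sizeof s);
    std::int64_t d = s;  // sign-extend to 64 bits
    std::memcpy(buf.data() + dst_off + i * sizeof d, &d, sizeof d);
  }
}

// Runs the widening either into a separate area or in place over the source.
inline std::vector<std::int64_t> widen_demo(bool in_place) {
  const std::int32_t in[4] = {1, 2, 3, 4};
  std::vector<unsigned char> buf(64, 0);
  std::memcpy(buf.data(), in, sizeof in);  // source occupies bytes 0..15
  const std::size_t dst_off = in_place ? 0 : 32;
  widen_i32_to_i64(buf, dst_off, 0, 4);
  std::vector<std::int64_t> out(4);
  std::memcpy(out.data(), buf.data() + dst_off, 4 * sizeof(std::int64_t));
  return out;
}
```

With a separate destination, `widen_demo(false)` yields the expected values 1, 2, 3, 4. With `widen_demo(true)`, each 8-byte store overwrites source elements that have not been read yet, so the result no longer matches the input.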
src/hotspot/cpu/riscv/riscv_v.ad line 2885:
> 2883: %}
> 2884:
> 2885: instruct vcvtDtoF(vReg dst_src1, vReg tmp) %{
Why not break down 'dst_src1' into two separate 'dst' and 'src' inputs like you do for 'vcvtFtoD'?
-------------
PR Review: https://git.openjdk.org/jdk/pull/13684#pullrequestreview-1405146050
PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1179873004
PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1179875372
PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1179873881