RFR: 8333006: RISC-V: C2: Support vector-scalar and vector-immediate arithmetic instructions [v2]

Wed May 29 02:27:02 UTC 2024

On Tue, 28 May 2024 10:49:26 GMT, Gui Cao <gcao at openjdk.org> wrote:

>> Hi, this change adds support for vector-scalar and vector-immediate arithmetic instructions in C2, implemented with reference to RVV v1.0 [1]. Please take a look and review. Thanks a lot.
>> We can use Byte256VectorTests.java [2] to print the Opto JIT code and verify that the new nodes are generated.
>> 
>> For example, the following command prints the Opto JIT code for a jtreg test case:
>> 
>> 
>> /home/zifeihan/jtreg/bin/jtreg \
>> -v:default \
>> -concurrency:16 -timeout:50 \
>> -javaoption:-XX:+UnlockExperimentalVMOptions \
>> -javaoption:-XX:+UseRVV \
>> -javaoption:-XX:+PrintOptoAssembly \
>> -javaoption:-XX:LogFile=/home/zifeihan/jdk/Byte256VectorTests_PrintOptoAssembly.log \
>> -jdk:/home/zifeihan/jdk/build/linux-riscv64-server-fastdebug/jdk \
>> /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/Byte256VectorTests.java
>> 
>> 
>> 
>> The resulting compilation log `Byte256VectorTests_PrintOptoAssembly.log` contains the vector-scalar and vector-immediate arithmetic instructions added by this PR.
>> 
>> vadd_immI Node
>> 
>> 16c     addw  R11, R10, zr	#@convI2L_reg_reg
>> 170     add R9, R31, R11	# ptr, #@addP_reg_reg
>> 174     addi  R9, R9, #16	# ptr, #@addP_reg_imm
>> 176     loadV V1, [R9]	# vector (rvv)
>> 17e     vadd_immI V1, V1, #7
>> 186     add R11, R15, R11	# ptr, #@addP_reg_reg
>> 188     addi  R11, R11, #16	# ptr, #@addP_reg_imm
>> 18a     storeV [R11], V1	# vector (rvv)
>> 
>> 
>> vadd_immI_masked Node
>> 
>> 1e8     B31: #	out( B37 B32 ) <- in( B30 )  Freq: 76.2281
>> 1e8     loadV V2, [R31]	# vector (rvv)
>> 1f0     vloadmask V0, V1
>> 1f8     vadd_immI_masked V2, V2, #7
>> 200     addi  R31, R10, #48	# ptr, #@addP_reg_imm
>> 204     bgeu  R30, R7, B37	#@cmpU_branch  P=0.000001 C=-1.000000
>> 
>> 
>> vadd_regI Node
>> 
>> 0c4     B4: #	out( B9 B5 ) <- in( B8 B3 )  Freq: 1
>> 0c4     vloadcon V1	# generate iota indices
>> 0cc     spill [sp, #4] -> R30	# spill size = 32
>> 0ce     vmul_regI V1, V1, R30
>> 0d6     spill [sp, #0] -> R29	# spill size = 32
>> 0d8     vadd_regI V1, V1, R29
>> 
>> 
>> vadd_regI_masked Node
>> 
>> 244     B36: #	out( B33 B37 ) <- in( B35 )  Freq: 7427.81
>> 244     # castII of R30, #@castII
>> 244     addw  R31, R30, zr	#@convI2L_reg_reg
>> 248     spill [sp, #32] -> R10	# spill size = 64
>> 24a     add R10, R10, R31	# ptr, #@addP_reg_reg
>> 24c     addi  R10, R10, #16	# ptr, #@addP_reg_imm
>> 24e     loadV V2, [R10]	# vector (rvv)
>> 256     vloadmask V0, V1
>> 25e     vadd_regI_masked V2, V2, R29
>> 
>> 
>> ...
>
> Gui Cao has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Code Format

Updated change looks good. Thanks.

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19415#pullrequestreview-2084137672
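
For readers unfamiliar with the node names in the quoted log, here is a minimal Java sketch of the three operand shapes involved: vector plus immediate (vadd_immI), vector plus scalar register (vadd_regI), and the masked variant. The class and method names are hypothetical and are not taken from the PR or from Byte256VectorTests.java. The quoted logs came from Vector API tests, whereas this sketch uses plain scalar loops so it runs without extra modules; whether C2 actually matches such loops to these nodes depends on the platform, flags and auto-vectorization, so treat it as an illustration of the semantics only.

```java
import java.util.Arrays;

public class VectorScalarShapes {
    // Vector-immediate shape: the second operand is a small compile-time
    // constant (the quoted log shows "#7").
    static void addImmediate(byte[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) (src[i] + 7);
        }
    }

    // Vector-scalar shape: the second operand is a loop-invariant value
    // that is only known at run time and lives in a scalar register.
    static void addScalar(int[] src, int[] dst, int scalar) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i] + scalar;
        }
    }

    // Masked shape: only lanes whose mask bit is set receive the sum;
    // the other lanes keep the value of the first operand.
    static void addScalarMasked(int[] src, int[] dst, boolean[] mask, int scalar) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = mask[i] ? src[i] + scalar : src[i];
        }
    }

    public static void main(String[] args) {
        byte[] b = {1, 2, 3};
        byte[] bOut = new byte[b.length];
        addImmediate(b, bOut);
        System.out.println(Arrays.toString(bOut));

        int[] v = {10, 20, 30};
        int[] vOut = new int[v.length];
        addScalarMasked(v, vOut, new boolean[] {true, false, true}, 5);
        System.out.println(Arrays.toString(vOut));
    }
}
```

In all three cases the second operand is the same for every lane, which is why it can be supplied as an immediate or a scalar register instead of a full vector.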