RFR: 8297549: RISC-V: Add support for Vector API vector load const operation

Tue Nov 29 03:15:37 UTC 2022

On Mon, 28 Nov 2022 01:34:15 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> The instruction matched for `VectorLoadConst` generates an index vector that starts at 0 and increases by 1. In detail, it sets the first element of the destination vector to 0 and each subsequent element to one more than its predecessor.
>> 
>> We can support `VectorLoadConst` on RISC-V with `vid.v`. The implementation follows the RVV v1.0 specification [1].
>> 
>> We can use the JMH test from https://github.com/openjdk/jdk/pull/10332 [2]. The tests were run on QEMU with the parameter `-cpu rv64,v=true,vlen=256,vext_spec=v1.0`. With `-XX:+PrintAssembly` added, the compilation log of `floatIndexVector` is as follows:
>> 
>> 
>> 120     vloadcon V2	# generate iota indices
>> 12c     vfmul.vv V1, V2, V1	#@vmulF
>> 134     vfmv.v.f  V2, F8	#@replicateF
>> 13c     vfadd.vv V1, V2, V1	#@vaddF
>> 
>> The above nodes match the logic of `Compute indexes with "vec + iota * scale"` in https://github.com/openjdk/jdk/pull/10332, which is the operation corresponding to `addIndex` in the benchmark:
>> https://github.com/openjdk/jdk/blob/d6102110e1b48c065292db83744245a33e269cc2/test/micro/org/openjdk/bench/jdk/incubator/vector/IndexVectorBenchmark.java#L92-L97
>> 
>> In addition, running the `floatIndexVector` case generates the following machine code, which has one more instruction than `intIndexVector`:
>> 
>>  0x000000401443cc9c:   .4byte	0x10072d7
>>  0x000000401443cca0:   .4byte	0x5208a157
>>  0x000000401443cca4:   .4byte	0x4a219157
>> 
>> `0x10072d7` and `0x5208a157` are the machine code for `vsetvli` and `vid.v`, and `0x4a219157` is the additional machine code for `vfcvt.f.x.v`, which is emitted only when `is_floating_point_type(bt)` is true:
>> 
>>     if (is_floating_point_type(bt)) {
>>       __ vfcvt_f_x_v(as_VectorRegister($dst$$reg), as_VectorRegister($dst$$reg));
>>     }
>> 
>> 
>> With these nodes implemented and `-XX:+UseRVV` enabled, the number of assembly instructions is reduced by about 50%, because execution takes a different path with a different number of loop iterations, similar to `AddTest` [3].
>> 
>> [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc
>> [2] https://github.com/openjdk/jdk/blob/857b0f9b05bc711f3282a0da85fcff131fffab91/test/micro/org/openjdk/bench/jdk/incubator/vector/IndexVectorBenchmark.java
>> [3] https://github.com/zifeihan/vector-api-test-rvv/blob/master/vector-api-rvv-performance.md
>> 
>> Please take a look and review. Thanks a lot.
>> 
>> ## Testing:
>> 
>> - hotspot and jdk tier1 without new failures (release with UseRVV on QEMU)
>> - test/jdk/jdk/incubator/vector/* (fastdebug/release with UseRVV on QEMU)
>
> src/hotspot/cpu/riscv/riscv_v.ad line 2091:
> 
>> 2089:     __ vid_v(as_VectorRegister($dst$$reg));
>> 2090:     if (is_floating_point_type(bt)) {
>> 2091:       __ vfcvt_f_x_v(as_VectorRegister($dst$$reg), as_VectorRegister($dst$$reg));
> 
> You might want to distinguish between float and double for 'bt' here, since vfcvt.f.x.v only converts signed integers to float.

Hi @RealFYang, thanks for the review! Since `vid.v` generates a sequence of integers, each element of the vector has to be converted to floating point when `bt` is a floating-point type.
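
To illustrate the point, here is a minimal scalar sketch in Java of the two steps: an integer iota as produced by `vid.v`, followed by an element-wise signed-integer-to-floating-point conversion as done by `vfcvt.f.x.v`. The class and method names (`IotaSketch`, `vid32`, `vid64`, `toFloat`, `toDouble`) are hypothetical and not part of the patch. The sketch assumes, per the RVV v1.0 spec, that `vfcvt.f.x.v` is a single-width conversion operating at the element width selected by the preceding `vsetvli`, so 32-bit integers would become floats and 64-bit integers would become doubles.

```java
import java.util.Arrays;

// Hypothetical scalar model; it does not use the Vector API or any HotSpot code.
public class IotaSketch {
    // Models vid.v at SEW=32: element i receives its own index i.
    public static int[] vid32(int vl) {
        int[] dst = new int[vl];
        for (int i = 0; i < vl; i++) {
            dst[i] = i;
        }
        return dst;
    }

    // Models vid.v at SEW=64.
    public static long[] vid64(int vl) {
        long[] dst = new long[vl];
        for (int i = 0; i < vl; i++) {
            dst[i] = i;
        }
        return dst;
    }

    // Models vfcvt.f.x.v at SEW=32 (assumption): signed 32-bit int -> float.
    public static float[] toFloat(int[] src) {
        float[] dst = new float[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = (float) src[i];
        }
        return dst;
    }

    // Models vfcvt.f.x.v at SEW=64 (assumption): signed 64-bit int -> double.
    public static double[] toDouble(long[] src) {
        double[] dst = new double[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = (double) src[i];
        }
        return dst;
    }

    public static void main(String[] args) {
        // vlen=256 with 32-bit elements gives 8 lanes; with 64-bit elements, 4 lanes.
        System.out.println(Arrays.toString(toFloat(vid32(8))));
        System.out.println(Arrays.toString(toDouble(vid64(4))));
    }
}
```

In this model the only difference between the float and double cases is the element width, which mirrors the assumption that the conversion instruction follows the current SEW.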

-------------

PR: https://git.openjdk.org/jdk/pull/11344