RFR: 8302453: RISCV: Add support for small width vector operations [v3]

Fei Yang fyang at openjdk.org
Wed Feb 15 13:14:52 UTC 2023


On Wed, 15 Feb 2023 10:22:22 GMT, Gui Cao <gcao at openjdk.org> wrote:

>> Hi,
>> 
>>    We have added support for small-width vector operations. Please take a look and review. Thanks a lot.
>> 
>> Add256Test case:
>> 
>> import jdk.incubator.vector.IntVector;
>> import jdk.incubator.vector.VectorSpecies;
>> 
>> public class Add256Test {
>>     static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_256;
>> 
>>     static final int SIZE = 1024;
>>     static int[] a = new int[SIZE];
>>     static int[] b = new int[SIZE];
>>     static int[] c = new int[SIZE];
>> 
>>     static {
>>         for (int i = 0; i < SIZE; i++) {
>>             a[i] = 1;
>>             b[i] = 2;
>>         }
>>     }
>> 
>>     static void workload() {
>>         for (int i = 0; i < a.length; i += SPECIES.length()) {
>>             IntVector av = IntVector.fromArray(SPECIES, a, i);
>>             IntVector bv = IntVector.fromArray(SPECIES, b, i);
>>             av.add(bv).intoArray(c,i);
>>         }
>>     }
>> 
>>     public static void main(String[] args) {
>>         for (int i = 0; i < 300_000; i++) {
>>             workload();
>>         }
>>     }
>> }
>> 
>> 
>> 
>> Add128Test case:
>> 
>> import jdk.incubator.vector.IntVector;
>> import jdk.incubator.vector.VectorSpecies;
>> 
>> public class Add128Test {
>>     static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_128;
>> 
>>     static final int SIZE = 1024;
>>     static int[] a = new int[SIZE];
>>     static int[] b = new int[SIZE];
>>     static int[] c = new int[SIZE];
>> 
>>     static {
>>         for (int i = 0; i < SIZE; i++) {
>>             a[i] = 1;
>>             b[i] = 2;
>>         }
>>     }
>> 
>>     static void workload() {
>>         for (int i = 0; i < a.length; i += SPECIES.length()) {
>>             IntVector av = IntVector.fromArray(SPECIES, a, i);
>>             IntVector bv = IntVector.fromArray(SPECIES, b, i);
>>             av.add(bv).intoArray(c,i);
>>         }
>>     }
>> 
>>     public static void main(String[] args) {
>>         for (int i = 0; i < 300_000; i++) {
>>             workload();
>>         }
>>     }
>> }
>> 
>> The same results can be reproduced with OpenJDK's own test cases, for example Int256VectorTests.java [1] and Int128VectorTests.java [2].
>> Note that incubating Java features require an extra option to expose the module, in this case --add-modules jdk.incubator.vector, for example: `javac --add-modules jdk.incubator.vector Add128Test.java`
>> Before this fix, only the compilation log of the Add256Test case showed RVV instructions; after the fix, the compilation log of the Add128Test case shows them as well.
>> A fragment of the Add128Test compilation log after the fix:
>> 
>> ----------------------- MetaData before Compile_id = 464 ------------------------
>> {method}
>>  - this oop:          0x0000004057701c60
>>  - method holder:     'Add128Test'
>>  - constants:         0x0000004057701848 constant pool [64] {0x0000004057701848} for 'Add128Test' cache=0x0000004057702000
>>  - access:            0xc1000008  static 
>>  - name:              'workload'
>>  - signature:         '()V'
>> ....
>> 
>> ------------------------ OptoAssembly for Compile_id = 464 -----------------------
>> #
>> #  void ( rawptr:BotPTR )
>> #
>> 000     N192: #	out( B1 ) <- BLOCK HEAD IS JUNK  Freq: 1
>> 000     BREAKPOINT
>>         nop 	# 7 bytes pad for loops and calls
>> ....
>> 
>> 0d0     B8: #	out( B9 ) <- in( B12 ) top-of-loop Freq: 253.388
>> 0d0     add R28, R12, R28	# ptr, #@addP_reg_reg
>> 0d2     addi  R28, R28, #16	# ptr, #@addP_reg_imm
>> 0d4     storeV [R28], V1	# vector (rvv)
>> 0de     ld  R28, [R23, #960]	# ptr, #@loadP
>> 0e2     addiw  R19, R19, #4	#@addI_reg_imm
>> ....
>> 100     addi  R31, R31, #16	# ptr, #@addP_reg_imm
>> 102     loadV V2, [R30]	# vector (rvv)
>> 10c     bgeu  R19, R29, B16	#@cmpU_branch  P=0.000001 C=-1.000000
>> 
>> 110     B12: #	out( B8 B13 ) <- in( B11 )  Freq: 253.389
>> 110     loadV V1, [R31]	# vector (rvv)
>> 11a     vadd.vv V1, V2, V1	#@vaddI
>> 122     bltu  R19, R8, B8	#@cmpU_branch  P=0.999999 C=-1.000000
>> 
>> 
>> #### The first part modifies Matcher::max_vector_size, Matcher::min_vector_size
>> When testing the Java Vector API with RVV enabled under QEMU, RVV instructions are currently not used if the RVV vlen and the Vector API's VectorSpecies differ; the scalar fallback implementation is used instead. For example, with RVV vlen = 256 and SPECIES_256, as in the Add256Test case, the program runs normally and the compilation log shows RVV instructions. With RVV vlen = 256 and SPECIES_64 or SPECIES_128, as in the Add128Test case, the program still runs normally, but the compilation log contains no RVV instructions. The reason is that Matcher::vector_size_supported returns false during compilation, so the operation is not vectorized. This function is implemented as follows.
>> 
>>   static const bool vector_size_supported(const BasicType bt, int size) {
>>     return (Matcher::max_vector_size(bt) >= size &&
>>             Matcher::min_vector_size(bt) <= size);
>>   }
>> 
>> The maximum and minimum values are implemented in the individual architecture AD files, and the current RISC-V implementation is as follows.
>> 
>> // Vector width in bytes.
>> const int Matcher::vector_width_in_bytes(BasicType bt) {
>>   if (UseRVV) {
>>     // The MaxVectorSize should have been set by detecting RVV max vector register size when check UseRVV.
>>     // MaxVectorSize == VM_Version::_initial_vector_length
>>     return MaxVectorSize;
>>   }
>>   return 0;
>> }
>> // Limits on vector size (number of elements) loaded into vector.
>> const int Matcher::max_vector_size(const BasicType bt) {
>>   return vector_width_in_bytes(bt) / type2aelembytes(bt);
>> }
>> const int Matcher::min_vector_size(const BasicType bt) {
>>   return max_vector_size(bt);
>> }
>> 
>> In the above implementation, Matcher::max_vector_size and Matcher::min_vector_size both evaluate to the maximum (i.e. the maximum number of elements of that type that the vector register can process at one time). When the RVV vlen and the Vector API's VectorSpecies are not equal, the number of elements processed is not that maximum, so Matcher::vector_size_supported returns false during compilation; no vectorization takes place and no RVV instructions are used.
>> 
>> #### The second part modifies LoadVector, StoreVector node implementation
>> A VectorSpecies in the Java Vector API effectively specifies how much memory a vector operation covers: data must be loaded from memory into a register before the operation and stored from the register back to memory afterwards. However, the current RISC-V vector loads and stores always use the full register width. Assuming RVV vlen = 256, a load brings all 256 bits into the vector register and a store writes all 256 bits of the register to memory. If the VectorSpecies is SPECIES_128, only 128 bits actually need to be stored, so the store writes 128 extra bits and overwrites adjacent data.
>> 
>> [1] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Int256VectorTests.java
>> [2] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Int128VectorTests.java
>> 
>> ### Testing:
>> Let me share some of the testing results carried out on qemu with UseRVV:
>> - [ ] Tier1-3 tests (release)
>> - [ ] Tier4 test in progress (release)
>> - [x] test/jdk/jdk/incubator/vector (fastdebug)
>
> Gui Cao has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add function rvv_vsetvli to simplify the code
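
For readers following the quoted description, the size check can be worked through with concrete numbers. The following is a minimal standalone sketch, not HotSpot code: the constants and helper names are stand-ins, RVV vlen = 256 is assumed, and the relaxed lower bound of 2 elements is a hypothetical value picked only for illustration, not the one used by the patch.

```cpp
#include <cassert>

// Stand-ins for HotSpot state; these names are illustrative only.
constexpr int kMaxVectorSizeBytes = 32;  // assumed RVV vlen = 256 bits
constexpr int kIntBytes = 4;             // size of one Java int lane

// Mirrors the quoted Matcher::max_vector_size: lanes that fill one register.
constexpr int max_vector_size(int elem_bytes) {
  return kMaxVectorSizeBytes / elem_bytes;
}

// Mirrors the quoted Matcher::min_vector_size: identical to the maximum.
constexpr int min_vector_size_current(int elem_bytes) {
  return max_vector_size(elem_bytes);
}

// Hypothetical relaxed lower bound, NOT the value chosen by the patch.
constexpr int min_vector_size_relaxed(int /*elem_bytes*/) {
  return 2;
}

// Same shape as the quoted Matcher::vector_size_supported, with the
// bounds passed in so both variants can be compared.
constexpr bool vector_size_supported(int size, int min_size, int max_size) {
  return max_size >= size && min_size <= size;
}
```

With these numbers, a 4-lane int vector (SPECIES_128) is rejected while the minimum equals the maximum of 8 lanes, and accepted once the lower bound drops to 4 or below.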

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1709:

> 1707: }
> 1708: 
> 1709: void C2_MacroAssembler::rvv_vsetvli(BasicType bt, int length_in_bytes) {

It would be better to add another parameter that takes a temporary register, with 't0' as the default.
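
One possible reading of this suggestion, as a sketch only: the function gains a trailing register parameter with a default argument, so existing two-argument callers keep working. The types below are simplified stand-ins for HotSpot's Register and BasicType, the struct name is hypothetical, and the body only records which register was selected instead of emitting any instruction.

```cpp
#include <cassert>

// Simplified stand-ins; HotSpot's real Register and BasicType differ.
enum class Register { t0, t1, t2 };
enum class BasicType { T_BYTE, T_SHORT, T_INT, T_LONG };

// Hypothetical miniature of the assembler class, for illustration only.
struct C2MacroAssemblerSketch {
  Register last_tmp = Register::t0;

  // 'tmp' defaults to t0, so callers that do not care about the scratch
  // register can keep passing only the element type and length.
  void rvv_vsetvli(BasicType bt, int length_in_bytes,
                   Register tmp = Register::t0) {
    (void)bt;
    (void)length_in_bytes;
    last_tmp = tmp;  // record only; real code would use tmp as scratch
  }
};
```

A caller that must preserve t0 across the call could then pass a different scratch register explicitly, for example `masm.rvv_vsetvli(BasicType::T_INT, 16, Register::t1)`.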

-------------

PR: https://git.openjdk.org/jdk/pull/12553


More information about the hotspot-compiler-dev mailing list