RFR: 8302453: RISCV: Add support for small width vector operations [v8]
Fei Yang
fyang at openjdk.org
Mon Feb 20 03:09:25 UTC 2023
On Mon, 20 Feb 2023 02:58:53 GMT, Gui Cao <gcao at openjdk.org> wrote:
>> Hi,
>>
>> This change adds support for small-width vector operations on RISC-V. Please take a look and review. Thanks a lot.
>>
>> Add256Test case:
>>
>> import jdk.incubator.vector.IntVector;
>> import jdk.incubator.vector.VectorSpecies;
>>
>> public class Add256Test {
>>     static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_256;
>>
>>     static final int SIZE = 1024;
>>     static int[] a = new int[SIZE];
>>     static int[] b = new int[SIZE];
>>     static int[] c = new int[SIZE];
>>
>>     static {
>>         for (int i = 0; i < SIZE; i++) {
>>             a[i] = 1;
>>             b[i] = 2;
>>         }
>>     }
>>
>>     static void workload() {
>>         for (int i = 0; i < a.length; i += SPECIES.length()) {
>>             IntVector av = IntVector.fromArray(SPECIES, a, i);
>>             IntVector bv = IntVector.fromArray(SPECIES, b, i);
>>             av.add(bv).intoArray(c, i);
>>         }
>>     }
>>
>>     public static void main(String[] args) {
>>         for (int i = 0; i < 300_000; i++) {
>>             workload();
>>         }
>>     }
>> }
>>
>>
>>
>> Add128Test case:
>>
>> import jdk.incubator.vector.IntVector;
>> import jdk.incubator.vector.VectorSpecies;
>>
>> public class Add128Test {
>>     static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_128;
>>
>>     static final int SIZE = 1024;
>>     static int[] a = new int[SIZE];
>>     static int[] b = new int[SIZE];
>>     static int[] c = new int[SIZE];
>>
>>     static {
>>         for (int i = 0; i < SIZE; i++) {
>>             a[i] = 1;
>>             b[i] = 2;
>>         }
>>     }
>>
>>     static void workload() {
>>         for (int i = 0; i < a.length; i += SPECIES.length()) {
>>             IntVector av = IntVector.fromArray(SPECIES, a, i);
>>             IntVector bv = IntVector.fromArray(SPECIES, b, i);
>>             av.add(bv).intoArray(c, i);
>>         }
>>     }
>>
>>     public static void main(String[] args) {
>>         for (int i = 0; i < 300_000; i++) {
>>             workload();
>>         }
>>     }
>> }
>>
>> These two test cases are reduced from the existing jtreg vector tests Int256VectorTests.java [1] and Int128VectorTests.java [2].
>> Note that incubator modules are not resolved by default, so the module must be added explicitly with --add-modules jdk.incubator.vector, for example: `javac --add-modules jdk.incubator.vector Add128Test.java`
>> Before this fix, only the compilation log of Add256Test showed RVV instructions. After the fix, the compilation log of Add128Test shows RVV instructions as well.
>> A fragment of the compilation log for Add128Test after the fix is as follows:
>>
>> ----------------------- MetaData before Compile_id = 464 ------------------------
>> {method}
>> - this oop: 0x0000004057701c60
>> - method holder: 'Add128Test'
>> - constants: 0x0000004057701848 constant pool [64] {0x0000004057701848} for 'Add128Test' cache=0x0000004057702000
>> - access: 0xc1000008 static
>> - name: 'workload'
>> - signature: '()V'
>> ....
>>
>> ------------------------ OptoAssembly for Compile_id = 464 -----------------------
>> #
>> # void ( rawptr:BotPTR )
>> #
>> 000 N192: # out( B1 ) <- BLOCK HEAD IS JUNK Freq: 1
>> 000 BREAKPOINT
>> nop # 7 bytes pad for loops and calls
>> ....
>>
>> 0d0 B8: # out( B9 ) <- in( B12 ) top-of-loop Freq: 253.388
>> 0d0 add R28, R12, R28 # ptr, #@addP_reg_reg
>> 0d2 addi R28, R28, #16 # ptr, #@addP_reg_imm
>> 0d4 storeV [R28], V1 # vector (rvv)
>> 0de ld R28, [R23, #960] # ptr, #@loadP
>> 0e2 addiw R19, R19, #4 #@addI_reg_imm
>> ....
>> 100 addi R31, R31, #16 # ptr, #@addP_reg_imm
>> 102 loadV V2, [R30] # vector (rvv)
>> 10c bgeu R19, R29, B16 #@cmpU_branch P=0.000001 C=-1.000000
>>
>> 110 B12: # out( B8 B13 ) <- in( B11 ) Freq: 253.389
>> 110 loadV V1, [R31] # vector (rvv)
>> 11a vadd.vv V1, V2, V1 #@vaddI
>> 122 bltu R19, R8, B8 #@cmpU_branch P=0.999999 C=-1.000000
>>
>>
>> #### The first part modifies Matcher::max_vector_size, Matcher::min_vector_size
>> While testing the Vector API on QEMU with RVV enabled, we found that when the RVV vlen and the size of the Java Vector API's VectorSpecies are not equal, no RVV instructions are used and the operation falls back to the scalar implementation. For example, with RVV vlen = 256 and a VectorSpecies of SPECIES_256, as in the Add256Test case, the Java program executes normally and the compilation log shows that RVV instructions are used. However, with RVV vlen = 256 and a VectorSpecies of SPECIES_64 or SPECIES_128, as in the Add128Test case, the Java program still executes normally, but no RVV instructions appear in the compilation log. The reason is that Matcher::vector_size_supported returns false during compilation, so no vector code is generated. This function is implemented as follows.
>>
>> static const bool vector_size_supported(const BasicType bt, int size) {
>>   return (Matcher::max_vector_size(bt) >= size &&
>>           Matcher::min_vector_size(bt) <= size);
>> }
>>
>> The maximum and minimum values are implemented in the individual architecture AD files, and the current RISC-V implementation is as follows.
>>
>> // Vector width in bytes.
>> const int Matcher::vector_width_in_bytes(BasicType bt) {
>>   if (UseRVV) {
>>     // The MaxVectorSize should have been set by detecting RVV max vector register size when check UseRVV.
>>     // MaxVectorSize == VM_Version::_initial_vector_length
>>     return MaxVectorSize;
>>   }
>>   return 0;
>> }
>> // Limits on vector size (number of elements) loaded into vector.
>> const int Matcher::max_vector_size(const BasicType bt) {
>>   return vector_width_in_bytes(bt) / type2aelembytes(bt);
>> }
>> const int Matcher::min_vector_size(const BasicType bt) {
>>   return max_vector_size(bt);
>> }
>>
>> In the above implementation, Matcher::max_vector_size and Matcher::min_vector_size both evaluate to the same value: the maximum number of elements of the given type that fit in one vector register. When the RVV vlen and the VectorSpecies size are not equal, the number of elements to process is smaller than that maximum, so Matcher::vector_size_supported returns false during compilation. As a result, no vector code is generated and RVV instructions are not used.
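
To make the size check above concrete, here is a minimal standalone Java model of it. This is only a sketch, not HotSpot code: the class and method names are hypothetical, and the relaxed lower bound of 2 elements is an assumed value chosen purely for illustration, since the description does not state the new minimum.

```java
public class VectorSizeCheck {
    // Mirrors the shape of Matcher::max_vector_size: how many elements of a
    // given byte size fit in a vector register of the given byte width.
    static int maxVectorSize(int vectorWidthInBytes, int elemBytes) {
        return vectorWidthInBytes / elemBytes;
    }

    // Mirrors the shape of Matcher::vector_size_supported for explicit bounds.
    static boolean vectorSizeSupported(int min, int max, int size) {
        return max >= size && min <= size;
    }

    public static void main(String[] args) {
        // RVV vlen = 256 bits = 32 bytes, int elements are 4 bytes each.
        int max = maxVectorSize(32, Integer.BYTES);
        // Behavior described above: the minimum equals the maximum.
        int min = max;

        // SPECIES_256 holds 8 ints, SPECIES_128 holds 4 ints.
        System.out.println(vectorSizeSupported(min, max, 8)); // true
        System.out.println(vectorSizeSupported(min, max, 4)); // false

        // Hypothetical relaxed minimum (assumption, not taken from the patch).
        int relaxedMin = 2;
        System.out.println(vectorSizeSupported(relaxedMin, max, 4)); // true
    }
}
```

With the minimum pinned to the maximum, only one element count passes the check for each element type, which matches the observation that Add128Test gets no RVV instructions when vlen = 256.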
>>
>> #### The second part modifies LoadVector, StoreVector node implementation
>> A VectorSpecies in the Java Vector API effectively specifies the size of the memory region that a vector operation touches: data is loaded from memory into a register before the operation and stored from the register back to memory afterwards. However, the current RISC-V vector loads and stores always operate on the full register width. Assuming RVV vlen = 256, a load brings all 256 bits of data into the vector register, and a store writes all 256 bits of the register to memory. If the VectorSpecies is SPECIES_128, only 128 bits actually need to be stored, so the store writes 128 extra bits and corrupts adjacent data.
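
The overwrite problem described above can be illustrated with plain arrays. The following is a rough simulation only, assuming a 256-bit register modeled as 8 ints with the upper four lanes holding zeros (in practice their contents are whatever happens to be in the register). The class and method names are hypothetical.

```java
import java.util.Arrays;

public class FullWidthStoreDemo {
    // Simulates a vector store that writes the first `lanesWritten` lanes of
    // the register to `dst` starting at `offset`.
    static void store(int[] reg, int[] dst, int offset, int lanesWritten) {
        System.arraycopy(reg, 0, dst, offset, lanesWritten);
    }

    // Runs one simulated store into a sentinel-filled array and returns it.
    static int[] run(int lanesWritten) {
        int[] dst = new int[8];
        Arrays.fill(dst, 7); // sentinel marking data that must not be touched
        // Only the low 4 lanes (128 bits) carry the result of a SPECIES_128 add.
        int[] reg = {3, 3, 3, 3, 0, 0, 0, 0};
        store(reg, dst, 0, lanesWritten);
        return dst;
    }

    public static void main(String[] args) {
        // Full-width store: lanes 4..7 of dst are clobbered.
        System.out.println(Arrays.toString(run(8))); // [3, 3, 3, 3, 0, 0, 0, 0]
        // Store limited to the species width: neighbors are preserved.
        System.out.println(Arrays.toString(run(4))); // [3, 3, 3, 3, 7, 7, 7, 7]
    }
}
```

The second call shows the behavior one would want from StoreVector for a 128-bit species: only the lanes that belong to the species are written.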
>>
>> [1] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Int256VectorTests.java
>> [2] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Int128VectorTests.java
>> ### Testing:
>> Let me share some of the testing results carried out on qemu with UseRVV:
>> - [x] Tier1 tests (release)
>> - [x] Tier2 tests (release)
>> - [x] Tier3 tests (release)
>> - [x] test/jdk/jdk/incubator/vector (fastdebug)
>
> Gui Cao has updated the pull request incrementally with one additional commit since the last revision:
>
> Add code comment
Updated change looks good. Thanks.
-------------
Marked as reviewed by fyang (Reviewer).
PR: https://git.openjdk.org/jdk/pull/12553