RFR: 8302908: RISC-V: Support masked vector arithmetic instructions for Vector API [v3]
Dingli Zhang
dzhang at openjdk.org
Wed Mar 29 08:08:55 UTC 2023
On Wed, 22 Feb 2023 00:37:06 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:
>> Hi,
>>
>> We have added support for masked vector arithmetic instructions. Please take a look and review. Thanks a lot!
>> This patch adds masked versions of vector add/sub/mul/div. The implementation follows RVV v1.0 [1].
>>
>> ## Load/Store/Cmp Mask
>> `VectorLoadMask`, `VectorMaskCmp`, and `VectorStoreMask` implement the mask datapath. The compilation log for `jdk/incubator/vector/Byte128VectorTests.java` shows how the data is passed:
>>
>> 218 loadV V1, [R7] # vector (rvv)
>> 220 vloadmask V0, V1
>> ...
>> 23c vmaskcmp_rvv_masked V0, V4, V5, V0, V1, #0
>> 24c vstoremask V1, V0
>> 258 storeV [R7], V1 # vector (rvv)
>>
>>
>> The corresponding generated JIT assembly:
>>
>> # loadV
>> 0x000000400c8ef958: vsetvli t0,zero,e8,m1,tu,mu
>> 0x000000400c8ef95c: vle8.v v1,(t2)
>>
>> # vloadmask
>> 0x000000400c8ef960: vsetvli t0,zero,e8,m1,tu,mu
>> 0x000000400c8ef964: vmsne.vx v0,v1,zero
>>
>> # vmaskcmp_rvv_masked
>> 0x000000400c8ef97c: vsetvli t0,zero,e8,m1,tu,mu
>> 0x000000400c8ef980: vmclr.m v1
>> 0x000000400c8ef984: vmseq.vv v1,v4,v5,v0.t
>> 0x000000400c8ef988: vmv1r.v v0,v1
>>
>> # vstoremask
>> 0x000000400c8ef98c: vsetvli t0,zero,e8,m1,tu,mu
>> 0x000000400c8ef990: vmv.v.x v1,zero
>> 0x000000400c8ef994: vmerge.vim v1,v1,1,v0
>>
>>
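To make the datapath above easier to follow, here is a minimal scalar sketch of what the three nodes compute per lane, read off the assembly shown. The class and method names (`MaskDatapathModel`, `loadMask`, `maskedEq`, `storeMask`) are hypothetical and only illustrate the semantics; this is not the HotSpot implementation.

```java
import java.util.Arrays;

public class MaskDatapathModel {
    // Models vloadmask (vmsne.vx v0,v1,zero): a lane is set when its byte is non-zero.
    static boolean[] loadMask(byte[] bytes) {
        boolean[] m = new boolean[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            m[i] = bytes[i] != 0;
        }
        return m;
    }

    // Models vmaskcmp_rvv_masked (vmclr.m, then vmseq.vv ...,v0.t):
    // the result starts cleared, and only active lanes receive a[i] == b[i].
    static boolean[] maskedEq(byte[] a, byte[] b, boolean[] mask) {
        boolean[] r = new boolean[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = mask[i] && a[i] == b[i];
        }
        return r;
    }

    // Models vstoremask (vmv.v.x v1,zero, then vmerge.vim v1,v1,1,v0):
    // set lanes become 1, all other lanes become 0.
    static byte[] storeMask(boolean[] mask) {
        byte[] out = new byte[mask.length];
        for (int i = 0; i < mask.length; i++) {
            out[i] = (byte) (mask[i] ? 1 : 0);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] flags = {1, 0, 1, 1};
        byte[] a = {5, 6, 7, 8};
        byte[] b = {5, 6, 0, 8};
        byte[] out = storeMask(maskedEq(a, b, loadMask(flags)));
        System.out.println(Arrays.toString(out)); // prints [1, 0, 0, 1]
    }
}
```

Lane 1 is equal in `a` and `b` but inactive in the mask, so it stays 0; lane 2 is active but unequal, so it is 0 as well.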
>> ## Masked vector arithmetic instructions (e.g. vadd)
>> AddMaskTestMerge case:
>>
>> import jdk.incubator.vector.IntVector;
>> import jdk.incubator.vector.VectorMask;
>> import jdk.incubator.vector.VectorOperators;
>> import jdk.incubator.vector.VectorSpecies;
>>
>> public class AddMaskTestMerge {
>>
>>     static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_128;
>>     static final int SIZE = 1024;
>>     static int[] a = new int[SIZE];
>>     static int[] b = new int[SIZE];
>>     static int[] r = new int[SIZE];
>>     static boolean[] c = new boolean[]{true,false,true,false,true,false,true,false};
>>     static {
>>         for (int i = 0; i < SIZE; i++) {
>>             a[i] = i;
>>             b[i] = i;
>>         }
>>     }
>>
>>     static void workload(int idx) {
>>         VectorMask<Integer> vmask = VectorMask.fromArray(SPECIES, c, 0);
>>         IntVector av = IntVector.fromArray(SPECIES, a, idx);
>>         IntVector bv = IntVector.fromArray(SPECIES, b, idx);
>>         av.lanewise(VectorOperators.ADD, bv, vmask).intoArray(r, idx);
>>     }
>>
>>     public static void main(String[] args) {
>>         for (int i = 0; i < 30_0000; i++) {
>>             for (int j = 0; j < SIZE; j += SPECIES.length()) {
>>                 workload(j);
>>             }
>>         }
>>     }
>> }
>>
>>
>> This test case is reduced from the existing jtreg vector test Int128VectorTests.java [2]. It exercises the masked vector add instruction; the other masked instructions are handled similarly.
>>
>> Before this patch, the compilation log did not contain any RVV-related instructions. With the patch, the compilation log is as follows:
>>
>>
>> 0ae B10: # out( B25 B11 ) <- in( B9 ) Freq: 0.999991
>> 0ae loadV V1, [R31] # vector (rvv)
>> 0b6 vloadmask V0, V2
>> 0be vadd.vv V3, V1, V0 #@vaddI_masked
>> 0c6 lwu R28, [R7, #124] # loadN, compressed ptr, #@loadN ! Field: AddMaskTestMerge.r
>> 0ca decode_heap_oop R28, R28 #@decodeHeapOop
>> 0cc lwu R7, [R28, #12] # range, #@loadRange
>> 0d0 NullCheck R28
>>
>>
>> And the JIT code is as follows:
>>
>>
>> 0x000000400c823cee: vsetvli t0,zero,e32,m1,tu,mu
>> 0x000000400c823cf2: vle32.v v1,(t6) ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0}
>> ; - jdk.incubator.vector.IntVector::intoArray at 43 (line 3228)
>> ; - AddMaskTestMerge::workload at 46 (line 25)
>> 0x000000400c823cf6: vsetvli t0,zero,e8,m1,tu,mu
>> 0x000000400c823cfa: vmsne.vx v0,v2,zero ;*invokestatic load {reexecute=0 rethrow=0 return_oop=0}
>> ; - jdk.incubator.vector.VectorMask::fromArray at 47 (line 208)
>> ; - AddMaskTestMerge::workload at 7 (line 22)
>> 0x000000400c823cfe: vsetvli t0,zero,e32,m1,tu,mu
>> 0x000000400c823d02: vadd.vv v3,v3,v1,v0.t ;*invokestatic binaryOp {reexecute=0 rethrow=0 return_oop=0}
>> ; - jdk.incubator.vector.IntVector::lanewiseTemplate at 192 (line 834)
>> ; - jdk.incubator.vector.Int128Vector::lanewise at 9 (line 291)
>> ; - jdk.incubator.vector.Int128Vector::lanewise at 4 (line 41)
>> ; - AddMaskTestMerge::workload at 39 (line 25)
>>
>>
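For readers checking the expected result of `AddMaskTestMerge`, here is a minimal scalar sketch of the merge semantics of a masked lanewise ADD: active lanes get the sum, inactive lanes keep the value of the first operand. This matches the `mu` (mask-undisturbed) setting in the `vsetvli` above. The class and method names (`MaskedAddModel`, `maskedAdd`) are hypothetical and not part of the patch.

```java
import java.util.Arrays;

public class MaskedAddModel {
    // Scalar reference for av.lanewise(VectorOperators.ADD, bv, vmask):
    // lanes with the mask set receive a[i] + b[i], the others keep a[i].
    static int[] maskedAdd(int[] a, int[] b, boolean[] mask) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = mask[i] ? a[i] + b[i] : a[i];
        }
        return r;
    }

    public static void main(String[] args) {
        // First four lanes of the test case: a[i] = b[i] = i, mask = T,F,T,F.
        int[] a = {0, 1, 2, 3};
        int[] b = {0, 1, 2, 3};
        boolean[] mask = {true, false, true, false};
        System.out.println(Arrays.toString(maskedAdd(a, b, mask))); // prints [0, 1, 4, 3]
    }
}
```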
>> ## Mask register allocation & mask bit operation
>> Since the spec [1] designates v0 as the mask register, vector mask logical ops such as `AndVMask`, `OrVMask`, and `XorVMask` sometimes need two mask values at once. If only v0 and v31 are defined as mask registers, the corresponding C2 nodes are not generated correctly because of register pressure [3], so v30 and v31 are defined as mask registers too.
>>
>> `AndVMask` emits C2 JIT code like:
>>
>> vloadmask V0, V1
>> vloadmask V30, V2
>> vmask_and V0, V30, V0
>>
>> We also modified the implementation of `spill_copy_vector_stack_to_stack` so that it no longer occupies the v0 register. In addition, we changed some nodes that use v0 internally, such as `vasr/vlsl/vlsr/vstring_x/varray_x/vclearArray_x`, so that C2 is aware that they use v0.
>>
>> By the way, the current implementation of `VectorMaskCast` only covers the case where the parameter data have equal width; other cases depend on the subsequent cast node.
>>
>> [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc
>> [2] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Int128VectorTests.java
>> [3] https://github.com/openjdk/jdk/blob/0deb648985b018653ccdaf193dc13b3cf21c088a/src/hotspot/share/opto/chaitin.cpp#L526
>>
>> ### Testing:
>>
>> qemu with UseRVV:
>> - [x] Tier1 tests (release)
>> - [x] Tier2 tests (release)
>> - [ ] Tier3 tests (release)
>> - [x] test/jdk/jdk/incubator/vector (release/fastdebug)
>
> Dingli Zhang has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available.
Because operations such as AndVMask require more than one mask register, we are discussing a more rational approach to register allocation.
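
As a rough illustration of why one mask register is not enough, here is a scalar sketch of the `vloadmask` / `vloadmask` / `vmask_and` sequence quoted above: both loaded masks must be live at the same time before they can be combined. The class and method names (`MaskAndModel`, `loadMask`, `and`) are hypothetical; in the Vector API the same pattern would come from `VectorMask.and`.

```java
import java.util.Arrays;

public class MaskAndModel {
    // Models vloadmask: a lane is set when the boolean byte is non-zero.
    static boolean[] loadMask(byte[] bytes) {
        boolean[] m = new boolean[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            m[i] = bytes[i] != 0;
        }
        return m;
    }

    // Models vmask_and: needs both input masks at once, hence two mask registers.
    static boolean[] and(boolean[] m1, boolean[] m2) {
        boolean[] r = new boolean[m1.length];
        for (int i = 0; i < m1.length; i++) {
            r[i] = m1[i] && m2[i];
        }
        return r;
    }

    public static void main(String[] args) {
        boolean[] first = loadMask(new byte[]{1, 1, 0, 0});  // held in V0 in the log above
        boolean[] second = loadMask(new byte[]{1, 0, 1, 0}); // held in V30 in the log above
        System.out.println(Arrays.toString(and(second, first))); // prints [true, false, false, false]
    }
}
```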
-------------
PR Comment: https://git.openjdk.org/jdk/pull/12682#issuecomment-1447518509