RFR: 8295261: RISC-V: Support ReductionV instructions for Vector API
Gui Cao
gcao at openjdk.org
Thu Oct 27 13:08:04 UTC 2022
Currently, certain vector-specific instructions in C2 are not implemented for RISC-V. This patch adds support for `AndReductionV`, `OrReductionV`, and `XorReductionV` on RISC-V. The implementation follows the AArch64 SVE version and riscv-v-spec v1.0 [1].
For example, AndReductionV is implemented as follows:
diff --git a/src/hotspot/cpu/riscv/riscv_v.ad b/src/hotspot/cpu/riscv/riscv_v.ad
index 0ef36fdb292..c04962993c0 100644
--- a/src/hotspot/cpu/riscv/riscv_v.ad
+++ b/src/hotspot/cpu/riscv/riscv_v.ad
@@ -63,7 +63,6 @@ source %{
case Op_ExtractS:
case Op_ExtractUB:
// Vector API specific
- case Op_AndReductionV:
case Op_OrReductionV:
case Op_XorReductionV:
case Op_LoadVectorGather:
@@ -785,6 +784,120 @@ instruct vnegD(vReg dst, vReg src) %{
ins_pipe(pipe_slow);
%}
+// vector and reduction
+
+instruct reduce_andI(iRegINoSp dst, iRegIorL2I src1, vReg src2, vReg tmp) %{
+ predicate(n->in(2)->bottom_type()->is_vect()->element_basic_type() == T_INT);
+ match(Set dst (AndReductionV src1 src2));
+ effect(TEMP tmp);
+ ins_cost(VEC_COST);
+ format %{ "vmv.s.x $tmp, $src1\t#@reduce_andI\n\t"
+ "vredand.vs $tmp, $src2, $tmp\n\t"
+ "vmv.x.s $dst, $tmp" %}
+ ins_encode %{
+ __ vsetvli(t0, x0, Assembler::e32);
+ __ vmv_s_x(as_VectorRegister($tmp$$reg), $src1$$Register);
+ __ vredand_vs(as_VectorRegister($tmp$$reg), as_VectorRegister($src2$$reg),
+ as_VectorRegister($tmp$$reg));
+ __ vmv_x_s($dst$$Register, as_VectorRegister($tmp$$reg));
+ %}
+ ins_pipe(pipe_slow);
+%}
+
+instruct reduce_andL(iRegLNoSp dst, iRegL src1, vReg src2, vReg tmp) %{
+ predicate(n->in(2)->bottom_type()->is_vect()->element_basic_type() == T_LONG);
+ match(Set dst (AndReductionV src1 src2));
+ effect(TEMP tmp);
+ ins_cost(VEC_COST);
+ format %{ "vmv.s.x $tmp, $src1\t#@reduce_andL\n\t"
+ "vredand.vs $tmp, $src2, $tmp\n\t"
+ "vmv.x.s $dst, $tmp" %}
+ ins_encode %{
+ __ vsetvli(t0, x0, Assembler::e64);
+ __ vmv_s_x(as_VectorRegister($tmp$$reg), $src1$$Register);
+ __ vredand_vs(as_VectorRegister($tmp$$reg), as_VectorRegister($src2$$reg),
+ as_VectorRegister($tmp$$reg));
+ __ vmv_x_s($dst$$Register, as_VectorRegister($tmp$$reg));
+ %}
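
As a reading aid, the three-instruction sequence in the diff above (`vmv.s.x`, `vredand.vs`, `vmv.x.s`) can be modelled in plain scalar Java. This is only a sketch of the intended semantics, assuming all lanes are active (no masking). The class and method names (`ReduceAndModel`, `reduceAndInt`, `reduceAndLong`) are hypothetical and not part of the patch.

```java
public class ReduceAndModel {
    // Scalar model of reduce_andI: src1 is the scalar input, src2 holds the vector lanes.
    static int reduceAndInt(int src1, int[] src2) {
        int tmp0 = src1;            // vmv.s.x tmp, src1: element 0 of tmp <- src1
        for (int lane : src2) {     // vredand.vs tmp, src2, tmp:
            tmp0 &= lane;           //   tmp[0] <- tmp[0] & src2[0] & ... & src2[n-1]
        }
        return tmp0;                // vmv.x.s dst, tmp: dst <- element 0 of tmp
    }

    // Same model for reduce_andL (64-bit elements, e64).
    static long reduceAndLong(long src1, long[] src2) {
        long tmp0 = src1;
        for (long lane : src2) {
            tmp0 &= lane;
        }
        return tmp0;
    }

    public static void main(String[] args) {
        // Eight int lanes, as with vlen=256 and e32; -1 is the identity for AND.
        int[] lanes = {0xFF0F, 0x0FFF, 0xF0FF, 0xFFFF, 0xFF0F, 0x0FFF, 0xF0FF, 0xFFFF};
        System.out.println(Integer.toHexString(reduceAndInt(-1, lanes)));
    }
}
```

Seeding the accumulator with `-1` mirrors the `li R7, #-1` that precedes `vmv.s.x` in the OptoAssembly output: all-ones is the identity for AND, so the result depends only on the vector lanes.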
After this patch, the Vector API can use RVV for these reductions when Java programs are run with `-XX:+UseRVV` on a RISC-V RVV 1.0 platform. Tests [2] and [3] exercise the implementation of this node, and both pass.
The test cases were run with `-XX:+PrintAssembly -Xcomp -XX:-TieredCompilation -XX:+LogCompilation -XX:LogFile=compile.log`. Because hsdis currently cannot disassemble RVV instructions, the relevant OptoAssembly output from the compilation log is shown instead:
2a8 B22: # out( B14 B23 ) <- in( B21 B31 ) Freq: 32.1131
2a8 lwu R28, [R9, #8] # loadNKlass, compressed class ptr, #@loadNKlass
2ac decode_klass_not_null R14, R28 #@decodeKlass_not_null
2b8 ld R30, [R14, #40] # class, #@loadKlass
2bc li R7, #-1 # int, #@loadConI
2c0 vmv.s.x V1, R7 #@reduce_andI
vredand.vs V1, V2, V1
vmv.x.s R28, V1
2d0 mv R7, precise jdk/internal/vm/vector/VectorSupport$ReductionOperation: 0x000000408c4f6220:Constant:exact * # ptr, #@loadConP
2e8 beq R30, R7, B14 #@cmpP_branch P=0.830000 C=-1.000000
There is currently no hardware implementation of RISC-V RVV 1.0, so the tests were run on qemu with `-cpu rv64,v=true,vlen=256,vext_spec=v1.0`. Under qemu with `-XX:+UseRVV` enabled, the `ANDReduceInt256VectorTests` and `ANDReduceLong256VectorTests` test cases take about 50.7% less time to execute than with the RVV version that does not implement this node. With this node implemented, comparing the C2-generated assembly with and without `-XX:+UseRVV` shows that enabling `-XX:+UseRVV` reduces the number of assembly instructions by about 50% [4].
[1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc#vector-reduction-operations
[2] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Int256VectorTests.java#ANDReduceInt256VectorTests
[3] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Long256VectorTests.java#ANDReduceLong256VectorTests
[4] https://github.com/zifeihan/vector-api-test-rvv/blob/master/vector-api-rvv-performance.md
## Testing:
- hotspot and jdk tier1 on a HiFive Unmatched board with no new failures
- test/jdk/jdk/incubator/vector/Int256VectorTests.java with fastdebug on qemu
- test/jdk/jdk/incubator/vector/Long256VectorTests.java with fastdebug on qemu
-------------
Commit messages:
- Add Reduction C2 instructions for Vector api
Changes: https://git.openjdk.org/jdk/pull/10691/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10691&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8295261
Stats: 117 lines in 1 file changed: 114 ins; 3 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/10691.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/10691/head:pull/10691
PR: https://git.openjdk.org/jdk/pull/10691