RFR: 8295261: RISC-V: Support ReductionV instructions for Vector API
Yadong Wang
yadongwang at openjdk.org
Fri Nov 4 02:27:35 UTC 2022
On Thu, 13 Oct 2022 07:54:47 GMT, Gui Cao <gcao at openjdk.org> wrote:
> Currently, some vector-specific C2 nodes are not implemented on RISC-V. This patch adds support for `AndReductionV`, `OrReductionV` and `XorReductionV` on RISC-V. It was implemented with reference to the AArch64 SVE version and the RISC-V V extension specification v1.0 [1].
>
> For example, `AndReductionV` is implemented as follows:
>
>
> diff --git a/src/hotspot/cpu/riscv/riscv_v.ad b/src/hotspot/cpu/riscv/riscv_v.ad
> index 0ef36fdb292..c04962993c0 100644
> --- a/src/hotspot/cpu/riscv/riscv_v.ad
> +++ b/src/hotspot/cpu/riscv/riscv_v.ad
> @@ -63,7 +63,6 @@ source %{
> case Op_ExtractS:
> case Op_ExtractUB:
> // Vector API specific
> - case Op_AndReductionV:
> case Op_OrReductionV:
> case Op_XorReductionV:
> case Op_LoadVectorGather:
> @@ -785,6 +784,120 @@ instruct vnegD(vReg dst, vReg src) %{
> ins_pipe(pipe_slow);
> %}
>
> +// vector and reduction
> +
> +instruct reduce_andI(iRegINoSp dst, iRegIorL2I src1, vReg src2, vReg tmp) %{
> + predicate(n->in(2)->bottom_type()->is_vect()->element_basic_type() == T_INT);
> + match(Set dst (AndReductionV src1 src2));
> + effect(TEMP tmp);
> + ins_cost(VEC_COST);
> + format %{ "vmv.s.x $tmp, $src1\t#@reduce_andI\n\t"
> + "vredand.vs $tmp, $src2, $tmp\n\t"
> + "vmv.x.s $dst, $tmp" %}
> + ins_encode %{
> + __ vsetvli(t0, x0, Assembler::e32);
> + __ vmv_s_x(as_VectorRegister($tmp$$reg), $src1$$Register);
> + __ vredand_vs(as_VectorRegister($tmp$$reg), as_VectorRegister($src2$$reg),
> + as_VectorRegister($tmp$$reg));
> + __ vmv_x_s($dst$$Register, as_VectorRegister($tmp$$reg));
> + %}
> + ins_pipe(pipe_slow);
> +%}
> +
> +instruct reduce_andL(iRegLNoSp dst, iRegL src1, vReg src2, vReg tmp) %{
> + predicate(n->in(2)->bottom_type()->is_vect()->element_basic_type() == T_LONG);
> + match(Set dst (AndReductionV src1 src2));
> + effect(TEMP tmp);
> + ins_cost(VEC_COST);
> + format %{ "vmv.s.x $tmp, $src1\t#@reduce_andL\n\t"
> + "vredand.vs $tmp, $src2, $tmp\n\t"
> + "vmv.x.s $dst, $tmp" %}
> + ins_encode %{
> + __ vsetvli(t0, x0, Assembler::e64);
> + __ vmv_s_x(as_VectorRegister($tmp$$reg), $src1$$Register);
> + __ vredand_vs(as_VectorRegister($tmp$$reg), as_VectorRegister($src2$$reg),
> + as_VectorRegister($tmp$$reg));
> + __ vmv_x_s($dst$$Register, as_VectorRegister($tmp$$reg));
> + %}
> + ins_pipe(pipe_slow);
> +%}
>
>
>
> After this patch, the Vector API can use RVV on RISC-V platforms that implement RVV 1.0 when Java programs are run with `-XX:+UseRVV`. Tests [2] and [3] exercise this node, and the implementation passes both.
>
> hsdis currently cannot disassemble RVV instructions, so the test case was instead run with `-XX:+PrintAssembly -Xcomp -XX:-TieredCompilation -XX:+LogCompilation -XX:LogFile=compile.log`. The relevant OptoAssembly output from the compilation log is as follows:
>
>
> 2a8 B22: # out( B14 B23 ) <- in( B21 B31 ) Freq: 32.1131
> 2a8 lwu R28, [R9, #8] # loadNKlass, compressed class ptr, #@loadNKlass
> 2ac decode_klass_not_null R14, R28 #@decodeKlass_not_null
> 2b8 ld R30, [R14, #40] # class, #@loadKlass
> 2bc li R7, #-1 # int, #@loadConI
> 2c0 vmv.s.x V1, R7 #@reduce_andI
> vredand.vs V1, V2, V1
> vmv.x.s R28, V1
> 2d0 mv R7, precise jdk/internal/vm/vector/VectorSupport$ReductionOperation: 0x000000408c4f6220:Constant:exact * # ptr, #@loadConP
> 2e8 beq R30, R7, B14 #@cmpP_branch P=0.830000 C=-1.000000
>
>
> There is currently no hardware implementing RISC-V RVV 1.0, so the tests were run on QEMU with `-cpu rv64,v=true,vlen=256,vext_spec=v1.0`. Under QEMU with `-XX:+UseRVV`, this node reduces the execution time of the `ANDReduceInt256VectorTests` and `ANDReduceLong256VectorTests` test methods by about 50.7% compared with an RVV build that lacks it. Comparing the C2-generated code with and without `-XX:+UseRVV` after this change, enabling `-XX:+UseRVV` reduces the number of assembly instructions by about 50% [4].
>
> [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc#vector-reduction-operations
> [2] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Int256VectorTests.java#ANDReduceInt256VectorTests
> [3] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Long256VectorTests.java#ANDReduceLong256VectorTests
> [4] https://github.com/zifeihan/vector-api-test-rvv/blob/master/vector-api-rvv-performance.md
>
> ## Testing:
> - HotSpot and JDK tier1 on a HiFive Unmatched board, with no new failures
> - test/jdk/jdk/incubator/vector/Int256VectorTests.java with a fastdebug build on QEMU
> - test/jdk/jdk/incubator/vector/Long256VectorTests.java with a fastdebug build on QEMU
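For readers without an RVV toolchain, the three-instruction sequence emitted by `reduce_andI`/`reduce_andL` can be modelled in plain C++. This is an illustrative sketch, not HotSpot code: `rvv_reduce_and` and `vlmax` are hypothetical names, and it assumes LMUL = 1 and that `vsetvli(t0, x0, ...)` selects vl = VLMAX, i.e. VLEN/SEW (8 ints or 4 longs for the `vlen=256` QEMU configuration above). The scalar `src1` seeds element 0 of the temporary, so passing -1 (the identity for AND, presumably what `li R7, #-1` in the OptoAssembly log loads) yields the AND of the vector lanes alone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// VLMAX for LMUL = 1: how many SEW-bit elements fit in one VLEN-bit register.
constexpr std::size_t vlmax(std::size_t vlen_bits, std::size_t sew_bits) {
  return vlen_bits / sew_bits;
}

// Models the sequence emitted by reduce_andI (T = int32_t, e32) and
// reduce_andL (T = int64_t, e64):
//   vmv.s.x    tmp, src1        tmp[0] = src1
//   vredand.vs tmp, src2, tmp   tmp[0] = tmp[0] & src2[0] & ... & src2[vl-1]
//   vmv.x.s    dst, tmp         dst = tmp[0]
template <typename T>
T rvv_reduce_and(T src1, const std::vector<T>& src2, std::size_t vl) {
  assert(src2.size() >= vl);  // the source register must hold vl active lanes
  T tmp0 = src1;              // vmv.s.x: scalar goes into element 0
  for (std::size_t i = 0; i < vl; ++i) {
    tmp0 &= src2[i];          // vredand.vs: fold every active lane into element 0
  }
  return tmp0;                // vmv.x.s: move element 0 back to a scalar register
}
```

For example, `rvv_reduce_and<int32_t>(-1, lanes, vlmax(256, 32))` with eight int lanes returns the AND of those eight lanes, which is what the Vector API's AND `reduceLanes` is expected to produce.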
src/hotspot/cpu/riscv/riscv_v.ad line 814:
> 812: "vmv.x.s $dst, $tmp" %}
> 813: ins_encode %{
> 814: __ vsetvli(t0, x0, Assembler::e64);
The two rules differ only in the element basic type. Could you use `Matcher::vector_element_basic_type()` to merge them and simplify the code?
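The suggestion amounts to deriving the element width passed to `vsetvli` from the node's element basic type, so that a single rule covers both `T_INT` and `T_LONG`. In the .ad file that type would come from `Matcher::vector_element_basic_type()`, presumably called as `Matcher::vector_element_basic_type(this, $src2)`; the exact form is an assumption. The standalone C++ model below only illustrates the dispatch. `BasicType`, `Sew`, `sew_for` and `reduce_and` are hypothetical stand-ins, not HotSpot's declarations, and LMUL = 1 with vl = VLMAX is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for HotSpot's BasicType and the assembler's SEW argument.
enum class BasicType { T_INT, T_LONG };
enum class Sew { e32 = 32, e64 = 64 };

// The only point where reduce_andI and reduce_andL differ:
// choose the selected element width from the element basic type.
Sew sew_for(BasicType bt) {
  switch (bt) {
    case BasicType::T_INT:  return Sew::e32;
    case BasicType::T_LONG: return Sew::e64;
  }
  throw std::invalid_argument("unsupported element type");
}

// One merged reduction for both types. Lanes are passed as int64_t and
// narrowed to 32 bits for T_INT; the 32-bit result is sign-extended on
// return, mirroring what vmv.x.s does on RV64.
int64_t reduce_and(BasicType bt, int64_t src1, const std::vector<int64_t>& src2,
                   std::size_t vlen_bits) {
  const Sew sew = sew_for(bt);
  const std::size_t vl = vlen_bits / static_cast<std::size_t>(sew);  // VLMAX, LMUL = 1
  assert(src2.size() >= vl);
  if (sew == Sew::e32) {
    int32_t acc = static_cast<int32_t>(src1);
    for (std::size_t i = 0; i < vl; ++i) acc &= static_cast<int32_t>(src2[i]);
    return acc;  // int32_t -> int64_t conversion sign-extends
  }
  int64_t acc = src1;
  for (std::size_t i = 0; i < vl; ++i) acc &= src2[i];
  return acc;
}
```

With this shape, only `sew_for` needs to change if further element types (for example `T_BYTE` or `T_SHORT`) are added, which is the duplication the review comment aims to avoid.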
-------------
PR: https://git.openjdk.org/jdk/pull/10691