RFR: 8309419: RISC-V: Relax register constraint for AddReductionVF & AddReductionVD nodes

Mon Jun 5 06:17:23 UTC 2023

Hi,

In the C2 AddReductionVF and AddReductionVD nodes for RISC-V, the src1 and dst operands are currently constrained to be the same register. This is not required, so this patch relaxes the register constraint for both nodes.

For reference, other CPUs such as x86 and Arm NEON do not require the same register either [1]. AArch64 SVE does constrain them to the same register because it uses the FADDA instruction [2], a strictly-ordered floating-point add reduction that adds the SIMD&FP scalar source and all active elements of the vector source, and places the result destructively in the SIMD&FP scalar source register. So for AArch64 SVE the two registers must be the same.

[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2897-L2907 
[2] https://developer.arm.com/documentation/ddi0596/2020-12/SVE-Instructions/FADDA--Floating-point-add-strictly-ordered-reduction--accumulating-in-scalar-
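
To illustrate why src1 and dst do not need to share a register, here is a minimal scalar model of an ordered add reduction. It is only a sketch: the class and method names are hypothetical and are not part of HotSpot. It assumes the node computes dst = src1 + (the lanes of src2 added in order), with the accumulation done in a temporary, as the `KILL V2` in the logs below suggests.

```java
// Scalar model of an ordered floating-point add reduction.
// Illustrative only: OrderedReduceModel and its methods are hypothetical
// names and do not exist in HotSpot.
final class OrderedReduceModel {

    // Models "reduce_addF dst, src1, src2": src1 is only read, so the result
    // may be produced in a different register than the one holding src1.
    static float reduceAddF(float src1, float[] src2) {
        float acc = src1;          // stands in for the temporary accumulator
        for (float lane : src2) {
            acc += lane;           // strictly ordered: lane 0 is added first
        }
        return acc;                // dst
    }

    // Same model for "reduce_addD dst, src1, src2".
    static double reduceAddD(double src1, double[] src2) {
        double acc = src1;
        for (double lane : src2) {
            acc += lane;
        }
        return acc;
    }

    public static void main(String[] args) {
        float f0 = 1.0f;                                       // src1
        float f1 = reduceAddF(f0, new float[] {2.0f, 3.0f, 4.0f}); // dst
        System.out.println(f1);    // prints 10.0
        System.out.println(f0);    // prints 1.0, src1 is left untouched
    }
}
```

Because the model never writes to src1, nothing forces the caller to copy src1 into dst first. That copy is what the `spill F0 -> F1` line in the "before" logs corresponds to.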

### AddReductionVF/AddReductionVD
We can use Float256VectorTests.java and Double256VectorTests.java to
emit these nodes. The compilation log is as follows:
#### AddReductionVF
Before this patch:

0f6     B15: #	out( B61 B16 ) <- in( B14 )  Freq: 55.8033
0f6     # castII of R19, #@castII
0f6     addw  R10, R19, zr	#@convI2L_reg_reg
0fa     slli  R10, R10, (#2 & 0x3f)	#@lShiftL_reg_imm
0fc     add R11, R31, R10	# ptr, #@addP_reg_reg
100     addi  R11, R11, #16	# ptr, #@addP_reg_imm
102     loadV V1, [R11]	# vector (rvv)
10a     spill F0 -> F1	# spill size = 32
10e     reduce_addF F1, F1, V1	# KILL V2
11e     bgeu  R19, R29, B61	#@cmpU_branch  P=0.000001 C=-1.000000

After this patch (saving a spill operation):

0f6     B15: #	out( B61 B16 ) <- in( B14 )  Freq: 55.8033
0f6     # castII of R19, #@castII
0f6     addw  R10, R19, zr	#@convI2L_reg_reg
0fa     slli  R10, R10, (#2 & 0x3f)	#@lShiftL_reg_imm
0fc     add R11, R31, R10	# ptr, #@addP_reg_reg
100     addi  R11, R11, #16	# ptr, #@addP_reg_imm
102     loadV V1, [R11]	# vector (rvv)
10a     reduce_addF F1, F0, V1	# KILL V2
11a     bgeu  R19, R29, B61	#@cmpU_branch  P=0.000001 C=-1.000000

#### AddReductionVD
Before this patch:

0f4     B15: #	out( B61 B16 ) <- in( B14 )  Freq: 55.8033
0f4     # castII of R9, #@castII
0f4     addw  R10, R9, zr	#@convI2L_reg_reg
0f8     slli  R10, R10, (#3 & 0x3f)	#@lShiftL_reg_imm
0fa     add R11, R30, R10	# ptr, #@addP_reg_reg
0fe     addi  R11, R11, #16	# ptr, #@addP_reg_imm
100     loadV V1, [R11]	# vector (rvv)
108     spill F0 -> F1	# spill size = 64
10c     reduce_addD F1, F1, V1	# KILL V2
11c     bgeu  R9, R31, B61	#@cmpU_branch  P=0.000001 C=-1.000000

After this patch (saving a spill operation):

0f4     B15: #	out( B61 B16 ) <- in( B14 )  Freq: 55.8033
0f4     # castII of R9, #@castII
0f4     addw  R10, R9, zr	#@convI2L_reg_reg
0f8     slli  R10, R10, (#3 & 0x3f)	#@lShiftL_reg_imm
0fa     add R11, R30, R10	# ptr, #@addP_reg_reg
0fe     addi  R11, R11, #16	# ptr, #@addP_reg_imm
100     loadV V1, [R11]	# vector (rvv)
108     reduce_addD F1, F0, V1	# KILL V2
118     bgeu  R9, R31, B61	#@cmpU_branch  P=0.000001 C=-1.000000

- [x] Tier1 tests (release)
- [x] Tier2 tests (release)
- [ ] Tier3 tests (release)
- [x] test/jdk/jdk/incubator/vector (fastdebug)

-------------

Commit messages:
 - RISC-V: Relax register constraint for AddReductionVF & AddReductionVD nodes

Changes: https://git.openjdk.org/jdk/pull/14308/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14308&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8309419
  Stats: 20 lines in 1 file changed: 0 ins; 0 del; 20 mod
  Patch: https://git.openjdk.org/jdk/pull/14308.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14308/head:pull/14308

PR: https://git.openjdk.org/jdk/pull/14308