RFR: 8333964: RISC-V: C2: Check "requires_strict_order" flag for floating-point add reduction [v2]

Wed Jun 12 14:04:15 UTC 2024

On Wed, 12 Jun 2024 11:27:42 GMT, Gui Cao <gcao at openjdk.org> wrote:

>> Hi, we want to support non-strictly-ordered floating-point add reduction. It is implemented with reference to RVV v1.0 [1]. Please take a look and review. Thanks a lot.
>> 
>> We can use the Float256VectorTests.java[2] to print the Opto JIT Code, verify and observe the generation of nodes.
>> 
>> For example, we can use the following command to print the Opto JIT Code of a jtreg test case:
>> 
>> /home/zifeihan/jtreg/bin/jtreg \
>> -v:default \
>> -concurrency:16 -timeout:50 \
>> -javaoption:-XX:+UnlockExperimentalVMOptions \
>> -javaoption:-XX:+UseRVV \
>> -javaoption:-XX:+PrintOptoAssembly \
>> -javaoption:-XX:LogFile=/home/zifeihan/jdk/Float256VectorTests_PrintOptoAssembly.log \
>> -jdk:/home/zifeihan/jdk/build/linux-riscv64-server-fastdebug/jdk \
>> /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/Float256VectorTests.java
>> 
>> The resulting log file, Float256VectorTests_PrintOptoAssembly.log, contains the reduce_addF_unordered instruction added by this PR:
>> 
>> 1e4     B28: #	out( B28 B29 ) <- in( B41 B28 ) Loop( B28-B28 inner post of N2310) Freq: 98.8164
>> 1e4     shadd  R17, R15, R10, #2	# ptr, #@shaddP_reg_reg_ext_b
>> 1e8     addi  R17, R17, #16	# ptr, #@addP_reg_imm
>> 1ea     loadV V1, [R17]	# vector (rvv)
>> 1f2     reduce_addF_unordered F2, F0, V1	# KILL V2
>> 202     fadd.s  F1, F1, F2	#@addF_reg_reg
>> 206     addiw  R15, R15, #8	#@addI_reg_imm
>> 208     blt  R15, R31, B28	#@cmpI_loop  P=0.500000 C=19400.000000
>> 
>> Similarly, for the `reduce_addD_unordered` instruction, we can use the `test/jdk/jdk/incubator/vector/Double256VectorTests.java` test case.
>> 
>> ### Performance testing:
>> FloatMaxVector.ADDLanes [2] measures the performance of add reduction for floating-point types.
>> Without Patch:
>> 
>> Benchmark                (size)   Mode  Cnt    Score   Error   Units
>> FloatMaxVector.ADDLanes    1024  thrpt    5  394.558 ± 0.044  ops/ms
>> 
>> 
>> With Patch:
>> 
>> Benchmark                (size)   Mode  Cnt    Score   Error   Units
>> FloatMaxVector.ADDLanes    1024  thrpt    5  627.510 ± 1.095  ops/ms
>> 
>> 
>> ### Correctness testing:
>>  - [x] test/jdk/jdk/incubator/vector (fastdebug) qemu 8.1.50 with UseRVV
>>  - [x] Run tier1-3 tests on SOPHON SG2042 (release)
>>  - [ ] Run tier1-3 tests (release) on qemu 8.1.50 with UseRVV
>> 
>> [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc
>> [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/Floa...
>
> Gui Cao has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add some code comment

Looks reasonable to me.

src/hotspot/cpu/riscv/riscv_v.ad line 2016:

> 2014: // 2. Strictly-ordered AddReductionVF/D. For example, AddReductionVF/D
> 2015: //    generated by auto-vectorization. Must do an ordered FP reduction sum
> 2016: //    (vfredosum.vs).

Nit: Can you leave an empty line after the code comment?
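
For anyone following along, here is a minimal Java sketch of why the strictly-ordered case in that comment needs its own rule: floating-point addition is not associative, so a reduction that regroups the lanes can produce a different sum than a left-to-right scalar loop. The class and method names are hypothetical and not taken from the PR, and the halving scheme is only one possible association order; the order actually used by an unordered hardware reduction (vfredusum.vs) is implementation-defined.

```java
// Illustrative only: names are hypothetical and not part of the PR.
final class ReductionOrderDemo {

    // Strictly-ordered reduction: lanes are added left to right,
    // exactly as a scalar loop would do it.
    static float orderedSum(float[] lanes) {
        float acc = 0.0f;
        for (float lane : lanes) {
            acc += lane;
        }
        return acc;
    }

    // One possible unordered reduction: repeatedly fold the upper half
    // of the lanes onto the lower half.
    // Assumes lanes.length is a non-zero power of two.
    static float unorderedSum(float[] lanes) {
        float[] buf = lanes.clone();
        for (int n = buf.length; n > 1; n /= 2) {
            int half = n / 2;
            for (int i = 0; i < half; i++) {
                buf[i] += buf[i + half];
            }
        }
        return buf[0];
    }

    public static void main(String[] args) {
        // 1e8f + 1.0f rounds back to 1e8f, so the grouping changes the result.
        float[] lanes = {1e8f, 1.0f, -1e8f, 1.0f};
        System.out.println("ordered   = " + orderedSum(lanes));
        System.out.println("unordered = " + unorderedSum(lanes));
    }
}
```

With these four lanes the ordered sum is 1.0 and the halving sum is 2.0. That difference is why a reduction produced by auto-vectorizing a scalar loop has to keep the strict order, while a reduction that does not require strict order is free to use the faster unordered form.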

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19649#pullrequestreview-2113123922
PR Review Comment: https://git.openjdk.org/jdk/pull/19649#discussion_r1636522850