RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v9]
Jatin Bhateja
jbhateja at openjdk.org
Wed May 17 18:56:00 UTC 2023
On Wed, 17 May 2023 13:12:34 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171
>>
>> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations.
>>
>> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`.
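To illustrate why the quoted description separates reorderable reductions from `float/double add/mul`, here is a minimal Java sketch. It is not HotSpot code: the class `ReductionOrderSketch`, its method names, and the `lanes` parameter are hypothetical and chosen only for illustration. It emulates keeping one partial accumulator per vector lane inside the loop and reducing across the lanes once after the loop.

```java
public class ReductionOrderSketch {
    // In-order reduction: a single scalar accumulator, elements added in array order.
    static int sumIntsInOrder(int[] a) {
        int s = 0;
        for (int x : a) {
            s += x;
        }
        return s;
    }

    // Emulated "reduce after the loop": one partial sum per lane inside the loop,
    // a single cross-lane reduction afterwards, then a scalar tail for leftovers.
    // 'lanes' is a hypothetical stand-in for the vector width.
    static int sumIntsLaneWise(int[] a, int lanes) {
        int[] acc = new int[lanes];
        int i = 0;
        for (; i + lanes <= a.length; i += lanes) {
            for (int l = 0; l < lanes; l++) {
                acc[l] += a[i + l];
            }
        }
        int s = 0;
        for (int l = 0; l < lanes; l++) {
            s += acc[l];
        }
        for (; i < a.length; i++) {
            s += a[i];
        }
        return s;
    }

    // Same two strategies for float, where addition rounds and is not associative.
    static float sumFloatsInOrder(float[] a) {
        float s = 0f;
        for (float x : a) {
            s += x;
        }
        return s;
    }

    static float sumFloatsLaneWise(float[] a, int lanes) {
        float[] acc = new float[lanes];
        int i = 0;
        for (; i + lanes <= a.length; i += lanes) {
            for (int l = 0; l < lanes; l++) {
                acc[l] += a[i + l];
            }
        }
        float s = 0f;
        for (int l = 0; l < lanes; l++) {
            s += acc[l];
        }
        for (; i < a.length; i++) {
            s += a[i];
        }
        return s;
    }
}
```

For `int` input the two strategies always agree, even on overflow, because wrapping integer addition is associative and commutative. For the `float` input `{1e8f, 1f, -1e8f, 1f}` with `lanes = 2`, the in-order sum is `1.0` while the lane-wise sum is `2.0`, which is the kind of difference that rules out reordering for floating-point add and mul.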
>>
>> **Performance results**
>> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`.
>>
>> I disabled `turbo-boost`.
>> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`.
>> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>>
>>
>> operation   M-N-2 M-N-3   M-2   M-3   P-2   P-3 | note |
>> --------------------------------------------------------
>> int add      2063  2085   660   530   415   283 |      |
>> int mul      2272  2257  1189   733   908   439 |      |
>> int min      2527  2520  2516  2579  2585  2542 | 1    |
>> int max      2548  2525  2551  2516  2515  2517 | 1    |
>> int and      2410  2414   602   480   353   263 |      |
>> int or       2149  2151   597   498   354   262 |      |
>> int xor      2059  2062   605   476   364   263 |      |
>> long add     1776  1790  2000  1000  1683   591 | 2    |
>> long mul     2135  2199  2185  2001  2176  1307 | 2    |
>> long min     1439  1424  1421  1420  1430  1427 | 3    |
>> long max     2299  2287  2303  2305  1433  1425 | 3    |
>> long and     1657  1667  2015  1003  1679   568 | 4    |
>> long or      1776  1783  2032  1009  1680   569 | 4    |
>> long xor     1834  1783  2012  1024  1679   570 | 4    |
>> float add    2779  2644  2633  2648  2632  2639 | 5    |
>> float mul    2779  2871  2810  2776  2732  2791 | 5    |
>> float min    2294  2620  1725  1286   872   672 |      |
>> float max    2371  2519  1697  1265   841   468 |      |
>> double add   2634  2636  2635  2650  2635  2648 | 5    |
>> double mul   3053  2955  2881  3030  2979  2927 | 5    |
>> double min   2364  2400  2439  2399  2486  2398 | 6    |
>> double max   2488  2459  2501 ...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
> added missing float/double cases to VectorNode::scalar_opcode
src/hotspot/share/opto/loopopts.cpp line 4192:
> 4190:
> 4191: // Convert opcode from vector-reduction -> scalar -> normal-vector-op
> 4192: const int sopc = VectorNode::scalar_opcode(last_ur->Opcode(), bt);
Other changes look good to me. Can you rename _VectorNode::scalar_opcode_ to _ReductionNode::scalar_opcode_, and, if needed, move the vector opcode cases out into a separate vector-to-scalar mapping routine?
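
As a rough illustration of the split suggested above, the following Java sketch models the two-step conversion from the quoted comment (vector-reduction -> scalar -> normal-vector-op) as two separate lookup routines. This is only an assumption about the shape of such a refactoring, not the actual HotSpot C++ code: the class `OpcodeMappingSketch` and its methods are hypothetical, and the opcode strings are labels modeled on HotSpot node naming, limited to two cases.

```java
import java.util.Map;

public class OpcodeMappingSketch {
    // Hypothetical table: reduction opcode label -> scalar opcode label.
    static final Map<String, String> REDUCTION_TO_SCALAR = Map.of(
        "AddReductionVI", "AddI",
        "MulReductionVL", "MulL");

    // Hypothetical table: scalar opcode label -> element-wise vector opcode label.
    static final Map<String, String> SCALAR_TO_VECTOR = Map.of(
        "AddI", "AddVI",
        "MulL", "MulVL");

    // First routine: only knows about reduction opcodes.
    static String reductionToScalar(String reductionOpcode) {
        String scalar = REDUCTION_TO_SCALAR.get(reductionOpcode);
        if (scalar == null) {
            throw new IllegalArgumentException("not a known reduction: " + reductionOpcode);
        }
        return scalar;
    }

    // Second routine: only knows about scalar opcodes.
    static String scalarToVector(String scalarOpcode) {
        String vector = SCALAR_TO_VECTOR.get(scalarOpcode);
        if (vector == null) {
            throw new IllegalArgumentException("no vector form for: " + scalarOpcode);
        }
        return vector;
    }

    // Composition of the two steps named in the quoted source comment.
    static String reductionToVector(String reductionOpcode) {
        return scalarToVector(reductionToScalar(reductionOpcode));
    }
}
```

Keeping the two tables apart means each routine rejects inputs outside its own domain, which is the separation of concerns the rename is meant to make visible.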
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1196935662