RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v7]
Jatin Bhateja
jbhateja at openjdk.org
Wed May 17 06:13:56 UTC 2023
On Mon, 15 May 2023 11:05:06 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171
>>
>> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations.
>>
>> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`.
>>
>> **Performance results**
>> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`.
>>
>> I disabled `turbo-boost`.
>> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`.
>> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>>
>>
>> operation M-N-2 M-N-3 M-2 M-3 P-2 P-3 | note |
>> ---------------------------------------------------------------
>> int add 2063 2085 660 530 415 283 | |
>> int mul 2272 2257 1189 733 908 439 | |
>> int min 2527 2520 2516 2579 2585 2542 | 1 |
>> int max 2548 2525 2551 2516 2515 2517 | 1 |
>> int and 2410 2414 602 480 353 263 | |
>> int or 2149 2151 597 498 354 262 | |
>> int xor 2059 2062 605 476 364 263 | |
>> long add 1776 1790 2000 1000 1683 591 | 2 |
>> long mul 2135 2199 2185 2001 2176 1307 | 2 |
>> long min 1439 1424 1421 1420 1430 1427 | 3 |
>> long max 2299 2287 2303 2305 1433 1425 | 3 |
>> long and 1657 1667 2015 1003 1679 568 | 4 |
>> long or 1776 1783 2032 1009 1680 569 | 4 |
>> long xor 1834 1783 2012 1024 1679 570 | 4 |
>> float add 2779 2644 2633 2648 2632 2639 | 5 |
>> float mul 2779 2871 2810 2776 2732 2791 | 5 |
>> float min 2294 2620 1725 1286 872 672 | |
>> float max 2371 2519 1697 1265 841 468 | |
>> double add 2634 2636 2635 2650 2635 2648 | 5 |
>> double mul 3053 2955 2881 3030 2979 2927 | 5 |
>> double min 2364 2400 2439 2399 2486 2398 | 6 |
>> double max 2488 2459 2501 ...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
> whitespace fix
src/hotspot/share/opto/vectornode.hpp line 438:
> 436: virtual bool make_normal_vector_op_implemented(const TypeVect* vt) {
> 437: return Matcher::match_rule_supported_vector(Op_MulVI, vt->length(), vt->element_basic_type());
> 438: }
I agree with Vladimir's comments, we can remove explicit calls from each reduction node class and introduce factory method _VectorNode::make_from_ropc_ in vectornode.cpp similar to _ReductionNode::make_from_vopc_ which accept reduction opcode and returns equivalent vector node.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1195969474
More information about the hotspot-compiler-dev
mailing list