RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v9]
Emanuel Peter
epeter at openjdk.org
Wed May 17 13:12:34 UTC 2023
> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171
>
> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All reductions that can be reordered extend this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` are excluded, because their operations must not be reordered.
>
> The optimization is part of loop-opts, and is called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`.
>
> **Performance results**
> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`.
>
> I disabled `turbo-boost`.
> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`.
> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>
>
> operation   M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
> -------------------------------------------------------------
> int add      2063   2085    660    530    415    283 |      |
> int mul      2272   2257   1189    733    908    439 |      |
> int min      2527   2520   2516   2579   2585   2542 |    1 |
> int max      2548   2525   2551   2516   2515   2517 |    1 |
> int and      2410   2414    602    480    353    263 |      |
> int or       2149   2151    597    498    354    262 |      |
> int xor      2059   2062    605    476    364    263 |      |
> long add     1776   1790   2000   1000   1683    591 |    2 |
> long mul     2135   2199   2185   2001   2176   1307 |    2 |
> long min     1439   1424   1421   1420   1430   1427 |    3 |
> long max     2299   2287   2303   2305   1433   1425 |    3 |
> long and     1657   1667   2015   1003   1679    568 |    4 |
> long or      1776   1783   2032   1009   1680    569 |    4 |
> long xor     1834   1783   2012   1024   1679    570 |    4 |
> float add    2779   2644   2633   2648   2632   2639 |    5 |
> float mul    2779   2871   2810   2776   2732   2791 |    5 |
> float min    2294   2620   1725   1286    872    672 |      |
> float max    2371   2519   1697   1265    841    468 |      |
> double add   2634   2636   2635   2650   2635   2648 |    5 |
> double mul   3053   2955   2881   3030   2979   2927 |    5 |
> double min   2364   2400   2439   2399   2486   2398 |    6 |
> double max   2488   2459   2501   2451   2493   2498 |    6 |
>
> Legen...
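
To illustrate the quoted description, here is a minimal scalar sketch in Java of what "reduction after the loop" means and why only some reductions qualify. It is not the C2 implementation: the class `ReductionSketch`, its method names, and the lane count `LANES = 4` are hypothetical and chosen only for illustration. The loop keeps one partial accumulator per lane and combines the lanes once after the loop. For `int add` this gives the same result as the in-order sum, even on overflow. For `float add` the result can differ, which is why such reductions cannot be reordered.

```java
// Hypothetical illustration; not taken from the PR.
class ReductionSketch {
    static final int LANES = 4; // assumed vector width, for illustration only

    // Reference: a single accumulator, strictly in source order.
    static int sumInOrder(int[] a) {
        int s = 0;
        for (int x : a) s += x;
        return s;
    }

    // Lane-wise partial sums inside the loop, one reduction after the loop.
    static int sumReducedAfterLoop(int[] a) {
        int[] acc = new int[LANES];
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            for (int l = 0; l < LANES; l++) acc[l] += a[i + l]; // element-wise add
        }
        int s = 0;
        for (int l = 0; l < LANES; l++) s += acc[l]; // reduction, done once
        for (; i < a.length; i++) s += a[i];         // scalar tail
        return s;
    }

    // Same two shapes for float, where addition is not associative.
    static float floatSumInOrder(float[] a) {
        float s = 0f;
        for (float x : a) s += x;
        return s;
    }

    static float floatSumReducedAfterLoop(float[] a) {
        float[] acc = new float[LANES];
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            for (int l = 0; l < LANES; l++) acc[l] += a[i + l];
        }
        float s = 0f;
        for (int l = 0; l < LANES; l++) s += acc[l];
        for (; i < a.length; i++) s += a[i];
        return s;
    }

    public static void main(String[] args) {
        int[] ints = {Integer.MAX_VALUE, 7, -3, 42, 5, 6, 1, 2, 9};
        System.out.println(sumInOrder(ints) == sumReducedAfterLoop(ints));

        // Large and small magnitudes mixed, so rounding depends on the order.
        float[] floats = {1e8f, 1f, 1f, 1f, -1e8f, 1f, 1f, 1f};
        System.out.println(floatSumInOrder(floats) + " vs " + floatSumReducedAfterLoop(floats));
    }
}
```

In the float case the small addends are absorbed by the large value in one order but not in the other, so the two methods return different sums for this input.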
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
added missing float/double cases to VectorNode::scalar_opcode
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/13056/files
- new: https://git.openjdk.org/jdk/pull/13056/files/e1af0966..e3d99c95
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=13056&range=08
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=13056&range=07-08
Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/13056.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/13056/head:pull/13056
PR: https://git.openjdk.org/jdk/pull/13056