RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v9]

Wed May 17 13:12:34 UTC 2023

> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171
> 
> I introduced a new abstract node type `UnorderedReductionNode` (a subtype of `ReductionNode`). All reductions that can be reordered extend this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow reordering of operations, since floating-point rounding makes them non-associative.
> 
> The optimization is part of loop-opts, and is called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`.
> 
> **Performance results**
> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`.
> 
> I disabled `turbo-boost`.
> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`.
> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
> 
> 
> operation     M-N-2  M-N-3  M-2    M-3    P-2    P-3   | note |
> ---------------------------------------------------------------
> int add       2063   2085   660    530    415    283   |      |
> int mul       2272   2257   1189   733    908    439   |      |
> int min       2527   2520   2516   2579   2585   2542  | 1    |
> int max       2548   2525   2551   2516   2515   2517  | 1    |
> int and       2410   2414   602    480    353    263   |      |
> int or        2149   2151   597    498    354    262   |      |
> int xor       2059   2062   605    476    364    263   |      |
> long add      1776   1790   2000   1000   1683   591   | 2    |
> long mul      2135   2199   2185   2001   2176   1307  | 2    |
> long min      1439   1424   1421   1420   1430   1427  | 3    |
> long max      2299   2287   2303   2305   1433   1425  | 3    |
> long and      1657   1667   2015   1003   1679   568   | 4    |
> long or       1776   1783   2032   1009   1680   569   | 4    |
> long xor      1834   1783   2012   1024   1679   570   | 4    |
> float add     2779   2644   2633   2648   2632   2639  | 5    |
> float mul     2779   2871   2810   2776   2732   2791  | 5    |
> float min     2294   2620   1725   1286   872    672   |      |
> float max     2371   2519   1697   1265   841    468   |      |
> double add    2634   2636   2635   2650   2635   2648  | 5    |
> double mul    3053   2955   2881   3030   2979   2927  | 5    |
> double min    2364   2400   2439   2399   2486   2398  | 6    |
> double max    2488   2459   2501   2451   2493   2498  | 6    |
> 
> Legen...
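
The distinction drawn in the quoted description between reorderable and non-reorderable reductions can be illustrated with a minimal Java sketch. This is not HotSpot code: the class `ReductionSketch`, the constant `LANES`, and the method names are hypothetical, and vector lanes are modeled as a plain array. It only shows why accumulating per lane and reducing once after the loop gives the same result for `int add`, but can change the result for `float add`.

```java
public class ReductionSketch {
    // Hypothetical vector width, chosen only for illustration.
    static final int LANES = 4;

    // Reduction inside the loop: every element is folded into one scalar in order.
    static int sumOrdered(int[] a) {
        int sum = 0;
        for (int x : a) sum += x;
        return sum;
    }

    // Reduction after the loop: keep one partial sum per lane in the loop,
    // then combine the lanes once after the loop. This reorders the additions.
    static int sumLaneWise(int[] a) {
        int[] acc = new int[LANES];
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            for (int l = 0; l < LANES; l++) acc[l] += a[i + l];
        }
        int sum = 0;
        for (int l = 0; l < LANES; l++) sum += acc[l];
        for (; i < a.length; i++) sum += a[i]; // leftover elements
        return sum;
    }

    // Same two strategies for float, where addition is not associative.
    static float sumOrdered(float[] a) {
        float sum = 0f;
        for (float x : a) sum += x;
        return sum;
    }

    static float sumLaneWise(float[] a) {
        float[] acc = new float[LANES];
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            for (int l = 0; l < LANES; l++) acc[l] += a[i + l];
        }
        float sum = 0f;
        for (int l = 0; l < LANES; l++) sum += acc[l];
        for (; i < a.length; i++) sum += a[i];
        return sum;
    }

    public static void main(String[] args) {
        // int add wraps on overflow, so any order of additions gives the same result.
        int[] ints = {Integer.MAX_VALUE, 1, -7, 42, 5, 6, 7, 8, 9};
        System.out.println(sumOrdered(ints) == sumLaneWise(ints));

        // float add rounds after each step, so the order of additions matters here.
        float[] floats = {1e8f, 1f, 1f, 1f, -1e8f, 1f, 1f, 1f};
        System.out.println(sumOrdered(floats) + " vs " + sumLaneWise(floats));
    }
}
```

In the float input, each `1f` added directly to `1e8f` is lost to rounding in the ordered version, while the lane-wise version cancels `1e8f` against `-1e8f` in one lane and keeps the small values in the others.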

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  added missing float/double cases to VectorNode::scalar_opcode

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13056/files
  - new: https://git.openjdk.org/jdk/pull/13056/files/e1af0966..e3d99c95

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13056&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13056&range=07-08

  Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/13056.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13056/head:pull/13056

PR: https://git.openjdk.org/jdk/pull/13056