Integrated: 8302652: [SuperWord] Reduction should happen after loop, when possible

Emanuel Peter epeter at openjdk.org
Tue May 23 08:09:23 UTC 2023


On Thu, 16 Mar 2023 07:29:44 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171
> 
> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All reductions whose operations can be reordered extend this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow reordering, because floating-point addition and multiplication are not associative and a different order can change the rounded result.
> 
> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`.
> 
> **Performance results**
> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java` with `2_000` warmup iterations and `100_000` measured iterations. I also increased the array length to `RANGE = 16*1024`.
> 
> I disabled `turbo-boost`.
> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`.
> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
> 
> 
> operation     M-N-2  M-N-3  M-2    M-3    P-2    P-3   | note |
> ---------------------------------------------------------------
> int add       2063   2085   660    530    415    283   |      |
> int mul       2272   2257   1189   733    908    439   |      |
> int min       2527   2520   2516   2579   2585   2542  | 1    |
> int max       2548   2525   2551   2516   2515   2517  | 1    |
> int and       2410   2414   602    480    353    263   |      |
> int or        2149   2151   597    498    354    262   |      |
> int xor       2059   2062   605    476    364    263   |      |
> long add      1776   1790   2000   1000   1683   591   | 2    |
> long mul      2135   2199   2185   2001   2176   1307  | 2    |
> long min      1439   1424   1421   1420   1430   1427  | 3    |
> long max      2299   2287   2303   2305   1433   1425  | 3    |
> long and      1657   1667   2015   1003   1679   568   | 4    |
> long or       1776   1783   2032   1009   1680   569   | 4    |
> long xor      1834   1783   2012   1024   1679   570   | 4    |
> float add     2779   2644   2633   2648   2632   2639  | 5    |
> float mul     2779   2871   2810   2776   2732   2791  | 5    |
> float min     2294   2620   1725   1286   872    672   |      |
> float max     2371   2519   1697   1265   841    468   |      |
> double add    2634   2636   2635   2650   2635   2648  | 5    |
> double mul    3053   2955   2881   3030   2979   2927  | 5    |
> double min    2364   2400   2439   2399   2486   2398  | 6    |
> double max    2488   2459   2501   2451   2493   2498  | 6    |
> 
> Legen...
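
To illustrate the distinction the quoted description draws between reorderable and non-reorderable reductions, here is a minimal sketch in plain Java. It is not HotSpot code and not the transformation itself: the class name, the method names, and the `lanes` parameter are hypothetical, and the inner lane loop only models what a vector add would do. It shows the loop shape one gets when the reduction is done once after the loop (element-wise accumulation inside, a single cross-lane reduction afterwards), and why that reordering is safe for `int add` but not for `float add`.

```java
final class UnorderedReductionSketch {

    // Scalar reference: strict left-to-right reduction.
    static int sumSequential(int[] a) {
        int sum = 0;
        for (int x : a) {
            sum += x;
        }
        return sum;
    }

    // Models the reduction moved out of the loop: element-wise adds into a
    // "vector" accumulator inside the loop, one cross-lane reduction after it.
    // `lanes` is a hypothetical vector width (assumption for illustration).
    static int sumLaneWise(int[] a, int lanes) {
        int[] acc = new int[lanes];
        int i = 0;
        for (; i + lanes <= a.length; i += lanes) {
            for (int l = 0; l < lanes; l++) {
                acc[l] += a[i + l];   // stands in for one vector add
            }
        }
        int sum = 0;
        for (int l = 0; l < lanes; l++) {
            sum += acc[l];            // single reduction after the loop
        }
        for (; i < a.length; i++) {
            sum += a[i];              // scalar tail
        }
        return sum;
    }

    // Same two shapes for float, where the order of additions matters.
    static float sumSequential(float[] a) {
        float sum = 0f;
        for (float x : a) {
            sum += x;
        }
        return sum;
    }

    static float sumLaneWise(float[] a, int lanes) {
        float[] acc = new float[lanes];
        int i = 0;
        for (; i + lanes <= a.length; i += lanes) {
            for (int l = 0; l < lanes; l++) {
                acc[l] += a[i + l];
            }
        }
        float sum = 0f;
        for (int l = 0; l < lanes; l++) {
            sum += acc[l];
        }
        for (; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // int add wraps modulo 2^32, so any order gives the same result.
        int[] ints = {Integer.MAX_VALUE, 1, -7, 42, Integer.MIN_VALUE, 5, 9, 13, 21};
        System.out.println(sumSequential(ints) == sumLaneWise(ints, 4)); // true

        // float add rounds after every step, so reordering changes the result.
        float[] floats = {1e8f, 1f, -1e8f, 1f};
        System.out.println(sumSequential(floats));  // 1.0
        System.out.println(sumLaneWise(floats, 2)); // 2.0
    }
}
```

In the float case the sequential sum loses the first `1f` when it is added to `1e8f`, while the lane-wise version adds the two small values together first. That difference is why a reduction such as `float add` has to keep its original order.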

This pull request has now been integrated.

Changeset: 06b0a5e0
Author:    Emanuel Peter <epeter at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/06b0a5e03852dfed9f1dee4791fc71b4e4e1eeda
Stats:     1084 lines in 16 files changed: 845 ins; 52 del; 187 mod

8302652: [SuperWord] Reduction should happen after loop, when possible

Reviewed-by: kvn, pli, jbhateja, sviswanathan

-------------

PR: https://git.openjdk.org/jdk/pull/13056

