RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5]

Sandhya Viswanathan sviswanathan at openjdk.org
Fri May 12 01:13:49 UTC 2023


On Wed, 10 May 2023 11:45:38 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171
>> 
>> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations.
>> 
>> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`.
>> 
>> **Performance results**
>> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`.
>> 
>> I disabled `turbo-boost`.
>> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`.
>> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>> 
>> 
>> operation     M-N-2  M-N-3  M-2    M-3    P-2    P-3   | note |
>> ---------------------------------------------------------------
>> int add       2063   2085   660    530    415    283   |      |
>> int mul       2272   2257   1189   733    908    439   |      |
>> int min       2527   2520   2516   2579   2585   2542  | 1    |
>> int max       2548   2525   2551   2516   2515   2517  | 1    |
>> int and       2410   2414   602    480    353    263   |      |
>> int or        2149   2151   597    498    354    262   |      |
>> int xor       2059   2062   605    476    364    263   |      |
>> long add      1776   1790   2000   1000   1683   591   | 2    |
>> long mul      2135   2199   2185   2001   2176   1307  | 2    |
>> long min      1439   1424   1421   1420   1430   1427  | 3    |
>> long max      2299   2287   2303   2305   1433   1425  | 3    |
>> long and      1657   1667   2015   1003   1679   568   | 4    |
>> long or       1776   1783   2032   1009   1680   569   | 4    |
>> long xor      1834   1783   2012   1024   1679   570   | 4    |
>> float add     2779   2644   2633   2648   2632   2639  | 5    |
>> float mul     2779   2871   2810   2776   2732   2791  | 5    |
>> float min     2294   2620   1725   1286   872    672   |      |
>> float max     2371   2519   1697   1265   841    468   |      |
>> double add    2634   2636   2635   2650   2635   2648  | 5    |
>> double mul    3053   2955   2881   3030   2979   2927  | 5    |
>> double min    2364   2400   2439   2399   2486   2398  | 6    |
>> double max    2488   2459   2501 ...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   use is_counted and is_innermost

src/hotspot/share/opto/loopopts.cpp line 4210:

> 4208:           if (use != phi && ctrl_or_self(use) == cl) {
> 4209:             DEBUG_ONLY( current->dump(-1); )
> 4210:             assert(false, "reduction has use inside loop");

I have been wondering, it is right to bailout here from the optimization but why do we assert here? It is perfectly legal (if not very meaningful) to have a scalar use of the last unordered reduction within the loop. This will still auto vectorize as the reduction is to a scalar. e.g. a slight modification of the SumRed_Int.java still auto vectorizes and has a use of the last unordered reduction within the loop: 
   public static int sumReductionImplement(
            int[] a,
            int[] b,
            int[] c,
            int total) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            total += (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]);
            sum = total + i;
        }
        return total + sum;
    }
Do you think this is a valid concern?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1191814579


More information about the hotspot-compiler-dev mailing list