RFR: 8325495: C2: implement optimization for series of Add of unique value [v2]

Kangcheng Xu kxu at openjdk.org
Mon Sep 16 21:07:07 UTC 2024


On Mon, 16 Sep 2024 20:51:49 GMT, Kangcheng Xu <kxu at openjdk.org> wrote:

>> This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`.
>> 
>> As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible).  
>> 
>> Some notable examples of this transformation include:
>> - `a + a + a` => `a*3` => `(a<<1) + a`
>> - `a + a + a + a` => `a*4` => `a<<2`
>> - `a*3 + a` => `a*4` => `a<<2`
>> - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2`
>> 
>> See included IR unit tests for more.
>
> Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision:
> 
>  - Merge branch 'openjdk:master' into arithmetic-canonicalization
>  - Merge pull request #1 from tabjy/arithmetic-canonicalization-v2
>    
>    Arithmetic canonicalization v2
>  - remove dead code
>  - fix potential void type const nodes
>  - refactor and cleanup
>  - add more test cases
>  - re-implement depth limit on recursion
>  - passes TestIRLShiftIdeal_XPlusX_LShiftC
>  - passes AddI[L]NodeIdealizationTests
>  - revert depth limits
>  - ... and 12 more: https://git.openjdk.org/jdk/compare/b7b4f7c0...c8fdb74c

> If this is your intention, then please ignore this message.

Yes, this is my intention.

--- 

My previous approach of identifying optimized `Mul->shift + add/sub` (e.g., `a*6` becomes `(a<<1) + (a<<2)` by `MulNode::Ideal()`) was inherently flawed. I was solely determining this with the number of terms. It is not reliable. In the `TestLargeTreeOfSubNodes` example, it replaces already optimized Mul nodes and a new Mul node and repeats the process, causing performance regression (and timeouts).

The new approach matches the exact patterns of optimized `MulNode`s. Additionally, a recursion depth limit of 5 (a rather arbitrary number) is in effect during *iterative* GVN to mitigate the risk of exhausting resources. Untransformed nodes are added to the worklist and will be eventually transformed.

Please note, in the case of `TestLargeTreeOfSubNodes` with flags mentioned above, the compilation is skipped without a large enough `-XX:MaxLabelRootDepth`. This is the same behaviour as the current master. 

Please re-review once GHA is confirmed passing. Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20754#issuecomment-2354031910


More information about the hotspot-compiler-dev mailing list