RFR: 8325495: C2: implement optimization for series of Add of unique value [v2]
Kangcheng Xu
kxu at openjdk.org
Mon Sep 16 21:07:07 UTC 2024
On Mon, 16 Sep 2024 20:51:49 GMT, Kangcheng Xu <kxu at openjdk.org> wrote:
>> This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`.
>>
>> As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible).
>>
>> Some notable examples of this transformation include:
>> - `a + a + a` => `a*3` => `(a<<1) + a`
>> - `a + a + a + a` => `a*4` => `a<<2`
>> - `a*3 + a` => `a*4` => `a<<2`
>> - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2`
>>
>> See included IR unit tests for more.
>
> Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision:
>
> - Merge branch 'openjdk:master' into arithmetic-canonicalization
> - Merge pull request #1 from tabjy/arithmetic-canonicalization-v2
>
> Arithmetic canonicalization v2
> - remove dead code
> - fix potential void type const nodes
> - refactor and cleanup
> - add more test cases
> - re-implement depth limit on recursion
> - passes TestIRLShiftIdeal_XPlusX_LShiftC
> - passes AddI[L]NodeIdealizationTests
> - revert depth limits
> - ... and 12 more: https://git.openjdk.org/jdk/compare/b7b4f7c0...c8fdb74c
> If this is your intention, then please ignore this message.
Yes, this is my intention.
---
My previous approach of identifying optimized `Mul->shift + add/sub` (e.g., `a*6` becomes `(a<<1) + (a<<2)` by `MulNode::Ideal()`) was inherently flawed. I was solely determining this with the number of terms. It is not reliable. In the `TestLargeTreeOfSubNodes` example, it replaces already optimized Mul nodes and a new Mul node and repeats the process, causing performance regression (and timeouts).
The new approach matches the exact patterns of optimized `MulNode`s. Additionally, a recursion depth limit of 5 (a rather arbitrary number) is in effect during *iterative* GVN to mitigate the risk of exhausting resources. Untransformed nodes are added to the worklist and will be eventually transformed.
Please note, in the case of `TestLargeTreeOfSubNodes` with flags mentioned above, the compilation is skipped without a large enough `-XX:MaxLabelRootDepth`. This is the same behaviour as the current master.
Please re-review once GHA is confirmed passing. Thanks!
-------------
PR Comment: https://git.openjdk.org/jdk/pull/20754#issuecomment-2354031910
More information about the hotspot-compiler-dev
mailing list