RFR: 8328865: [c2] No need to convert "(x+1)+y" into "(x+y)+1" when y is a CallNode [v2]

Mon Apr 1 09:21:35 UTC 2024

On Wed, 27 Mar 2024 09:32:29 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> SUN Guoyun has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>> 
>>  - 8328865: [c2] No need to convert "(x+1)+y" into "(x+y)+1" when y is a CallNode
>>  - 8328865: [c2] No need to convert "(x+1)+y" into "(x+y)+1" when y is a CallNode
>
> A possible counter-example:
> 
> 
> x1 = something
> y1 = someCall
> 
> for (int i = 0; i < a.length; i++) {
>   a[i] = (x + 1) + y) + ((x + 2) + y) + ((x + 2) + y) + ((x + 3) + y) + ((x + 4) + y)
> }
> 
> The call is outside the loop, so folding would not be costly at all. And I fear that the 4 terms would not common up, and so be slower after your change. And I think there are probably other examples. But I have not benchmarked anything, so I could be quite wrong.
> 
> What exactly is it that gives you the speedup in your benchmark? Spilling? Fewer add instructions? Would be nice to understand that better, and see what are potential examples where we would have regressions with your patch.

> Why only (x+1)+y and not also x+(y+1)? I agree with @eme64 about breaking other optimizations if we don't move constants to the right. (x+1)+(y+1) no longer becomes (x+y)+2. 

Previously, I didn't have a deep understanding of constant folding, but now it seems that this PR will cause performance regression. 

> To reduce spills, I would think we would want to move Calls to the left. (x+1)+y and x+(y+1) both become y+x+1.

This is a good idea. But my idea is, can we delete add1 after `add1->clone()` ? https://github.com/openjdk/jdk/pull/18482/files#diff-c59303cb42c3e35f20bc530628dc611003e21819e84cedb2279e69cde0345410L183

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18482#issuecomment-2029463266