RFR: 8328865: [c2] No need to convert "(x+1)+y" into "(x+y)+1" when y is a CallNode [v2]

Wed Mar 27 09:35:21 UTC 2024

On Wed, 27 Mar 2024 08:45:55 GMT, SUN Guoyun <duke at openjdk.org> wrote:

>> This patch prohibits the conversion from "(x+1)+y" into "(x+y)+1" when y is a CallNode, to avoid unnecessary spill code and AddNodes.
>> 
>> Testing: tier1-3 in x86_64 and LoongArch64
>> 
>> JMH in x86_64:
>> <pre>
>> before:
>> Benchmark           Mode  Cnt      Score   Error  Units
>> CallNode.test      thrpt    2  26397.733          ops/s
>> 
>> after:
>> Benchmark           Mode  Cnt      Score   Error  Units
>> CallNode.test      thrpt    2  27839.337          ops/s
>> </pre>
>
> SUN Guoyun has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
> 
>  - 8328865: [c2] No need to convert "(x+1)+y" into "(x+y)+1" when y is a CallNode
>  - 8328865: [c2] No need to convert "(x+1)+y" into "(x+y)+1" when y is a CallNode

A possible counter-example:

x = something
y = someCall

for (int i = 0; i < a.length; i++) {
  a[i] = ((x + 1) + y) + ((x + 2) + y) + ((x + 3) + y) + ((x + 4) + y)
}

The call is outside the loop, so folding would not be costly at all. I fear that with your change the 4 terms would no longer common up on the shared "x + y", and so the loop would be slower. I think there are probably other examples. But I have not benchmarked anything, so I could be quite wrong.
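
To make the concern concrete, here is a minimal Java sketch of the two shapes at source level. It is only an illustration, not a benchmark and not a statement about what C2 actually emits: the class name, the body of someCall, and the input values are all hypothetical. The first method keeps each term as "(x + c) + y"; the second writes out by hand what the terms look like after conversion to "(x + y) + c", where all four share a single "x + y".

```java
class ReassociationSketch {
    // Hypothetical stand-in for "someCall"; any call result would do.
    static int someCall(int seed) {
        return seed * 31 + 7;
    }

    // Shape without the conversion: every term is (x + c) + y,
    // so no subexpression is shared between the four terms.
    static int unreassociated(int x, int y) {
        return ((x + 1) + y) + ((x + 2) + y) + ((x + 3) + y) + ((x + 4) + y);
    }

    // Shape after converting (x + c) + y into (x + y) + c, written by hand:
    // all four terms reuse t = x + y.
    static int reassociated(int x, int y) {
        int t = x + y;
        return (t + 1) + (t + 2) + (t + 3) + (t + 4);
    }

    public static void main(String[] args) {
        int x = 5;            // "something"
        int y = someCall(3);  // the call sits outside the loop
        int[] a = new int[8];
        for (int i = 0; i < a.length; i++) {
            a[i] = unreassociated(x, y);
        }
        // int addition wraps, so both shapes always agree on the value.
        System.out.println(a[0] == reassociated(x, y));
    }
}
```

Counting adds in the source, the first shape needs 11 per iteration and the second 8. Whether that difference survives in the compiled loop is exactly what would need to be measured.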

What exactly gives you the speedup in your benchmark? Less spilling? Fewer add instructions? It would be nice to understand that better, and to see potential examples where your patch would cause regressions.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18482#issuecomment-2022308731