RFR: 8328865: [c2] No need to convert "(x+1)+y" into "(x+y)+1" when y is a CallNode [v2]
Emanuel Peter
epeter at openjdk.org
Wed Mar 27 09:35:21 UTC 2024
On Wed, 27 Mar 2024 08:45:55 GMT, SUN Guoyun <duke at openjdk.org> wrote:
>> This patch prohibits the conversion from "(x+1)+y" into "(x+y)+1" when y is a CallNode, to reduce unnecessary spill code and AddNodes.
>>
>> Testing: tier1-3 in x86_64 and LoongArch64
>>
>> JMH in x86_64:
>> <pre>
>> before:
>> Benchmark       Mode  Cnt      Score  Error  Units
>> CallNode.test  thrpt    2  26397.733         ops/s
>>
>> after:
>> Benchmark       Mode  Cnt      Score  Error  Units
>> CallNode.test  thrpt    2  27839.337         ops/s
>> </pre>
>
> SUN Guoyun has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>
> - 8328865: [c2] No need to convert "(x+1)+y" into "(x+y)+1" when y is a CallNode
> - 8328865: [c2] No need to convert "(x+1)+y" into "(x+y)+1" when y is a CallNode
A possible counter-example:
x = something
y = someCall()
for (int i = 0; i < a.length; i++) {
    a[i] = ((x + 1) + y) + ((x + 2) + y) + ((x + 3) + y) + ((x + 4) + y)
}
The call is outside the loop, so folding would not be costly at all. I fear that the 4 terms would no longer common up, and so the loop would be slower after your change. There are probably other examples as well. But I have not benchmarked anything, so I could be quite wrong.
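To make the concern concrete, here is a rough Java sketch of the two shapes. It is only an illustration: the class and method names (`ReassociationSketch`, `someCall`, `asWritten`, `reassociated`) are hypothetical, the second method is reassociated by hand rather than taken from C2 output, and nothing here has been benchmarked.

```java
public class ReassociationSketch {
    // Hypothetical stand-in for "someCall"; the value is arbitrary.
    static int someCall() {
        return 7;
    }

    // The expression as written in the source: each term is (x + c) + y.
    static int asWritten(int x, int y) {
        return ((x + 1) + y) + ((x + 2) + y) + ((x + 3) + y) + ((x + 4) + y);
    }

    // The same value with each term rewritten by hand to (x + y) + c,
    // which exposes the shared subexpression x + y.
    static int reassociated(int x, int y) {
        int s = x + y;
        return (s + 1) + (s + 2) + (s + 3) + (s + 4);
    }

    public static void main(String[] args) {
        int x = 3;
        int y = someCall(); // the call happens once, outside the loop
        int[] a = new int[4];
        for (int i = 0; i < a.length; i++) {
            a[i] = asWritten(x, y);
        }
        // Both forms compute the same int value (int addition wraps,
        // so reassociation does not change the result).
        System.out.println(a[0] == reassociated(x, y));
    }
}
```

In the second form the four terms share a single `x + y`, which is the kind of commoning that might be lost if the reassociation is skipped whenever `y` comes from a call.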
What exactly is it that gives you the speedup in your benchmark? Less spilling? Fewer add instructions? It would be nice to understand that better, and to see potential examples where your patch would cause regressions.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/18482#issuecomment-2022308731