RFR: 8328865: [c2] No need to convert "(x+1)+y" into "(x+y)+1" when y is a CallNode [v2]
Emanuel Peter
epeter at openjdk.org
Tue Apr 2 06:04:59 UTC 2024
On Mon, 1 Apr 2024 09:19:28 GMT, SUN Guoyun <duke at openjdk.org> wrote:
>> A possible counter-example:
>>
>>
>> x1 = something
>> y1 = someCall
>>
>> for (int i = 0; i < a.length; i++) {
>> a[i] = (x + 1) + y) + ((x + 2) + y) + ((x + 2) + y) + ((x + 3) + y) + ((x + 4) + y)
>> }
>>
>> The call is outside the loop, so folding would not be costly at all. And I fear that the 4 terms would not common up, and so be slower after your change. And I think there are probably other examples. But I have not benchmarked anything, so I could be quite wrong.
>>
>> What exactly is it that gives you the speedup in your benchmark? Spilling? Fewer add instructions? Would be nice to understand that better, and see what are potential examples where we would have regressions with your patch.
>
>> Why only (x+1)+y and not also x+(y+1)? I agree with @eme64 about breaking other optimizations if we don't move constants to the right. (x+1)+(y+1) no longer becomes (x+y)+2.
>
> Previously, I didn't have a deep understanding of constant folding, but now it seems that this PR will cause performance regression.
>
>> To reduce spills, I would think we would want to move Calls to the left. (x+1)+y and x+(y+1) both become y+x+1.
>
> This is a good idea. But my idea is, can we delete add1 after `add1->clone()` ? https://github.com/openjdk/jdk/pull/18482/files#diff-c59303cb42c3e35f20bc530628dc611003e21819e84cedb2279e69cde0345410L183
@sunny868 @dean-long I also like the idea of moving calls to the left. Though even that may not always be an improvement. It would be nice to have a series of benchmarks with all sorts of combinations of calls inside the loop, or outside the loop, adding together with variables inside the loop, or invariants from outside the loop. What we would like is that:
1. everything loop invariant is added up outside the loop, and only loop variant things are added inside the loop.
2. Constants should always go to the right.
3. And calls should go to the left.
These 3 ideas seem to contradict each other at times, so I wonder if there is a more general heuristic here?
Examples for contradiction (`invar`: loop invariant, `var`: loop variant, changes in each loop iteration):
1. `invar + var + con`: `(invar + con) + var` would take everything loop invariant out of the loop, but the constant would not be on the very right.
2. `invar1 + invar2 + call`: assume the call is inside the loop, but `invar1` and `invar2` are before/outside the loop. Moving the `call` to the very left `(call + invar1) + invar2` would mean we have to do more addition work inside the loop (2 additions per iteration), but `(invar1 + invar2) + call` would have the invariants added up before the loop, and only the call value added to that in every loop iteration.
I'm not sure how significant all of this is. You really would need some benchmarks to get more insight. And writing a good benchmark is not easy.
@sunny868 You will have to change the title anyway. Can you write `C2` instead of `[c2]`, please? ;)
-------------
PR Comment: https://git.openjdk.org/jdk/pull/18482#issuecomment-2031139128
PR Comment: https://git.openjdk.org/jdk/pull/18482#issuecomment-2031140331
More information about the hotspot-compiler-dev
mailing list