RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account
Emanuel Peter
epeter at openjdk.org
Thu Jun 12 15:25:29 UTC 2025
On Thu, 12 Jun 2025 15:11:05 GMT, Roland Westrelin <roland at openjdk.org> wrote:
>> Thanks for working on this, Roland. I just submitted some testing, will come back with the results in a day or two.
>>
>> Generally, I agree with your proposed approach of handling the case at expansion time as a low-risk fix for JDK 25. But as future work, would it be feasible to maintain regular SSA form for outer strip-mined loops (adding memory and data phi nodes at both loop levels) rather than omitting phi nodes for the outer loops and "repairing" SSA on macro expansion, or is there any fundamental obstacle in doing the former? It would have prevented issues like this, and feels like a more principled and robust approach in general.
>
>> Thanks for working on this, Roland. I just submitted some testing, will come back with the results in a day or two.
>>
>> Generally, I agree with your proposed approach of handling the case at expansion time as a low-risk fix for JDK 25. But as future work, would it be feasible to maintain regular SSA form for outer strip-mined loops (adding memory and data phi nodes at both loop levels) rather than omitting phi nodes for the outer loops and "repairing" SSA on macro expansion, or is there any fundamental obstacle in doing the former? It would have prevented issues like this, and feels like a more principled and robust approach in general.
>
> I don't think there's a fundamental obstacle. The reason I implemented loop strip mining that way is that I was concerned it would be complicated for existing transformations to be supported without major code change and that there was a chance subtle issues would creep in (such as some optimizations not happening any more). So I tried to have a minimal set of extra nodes for strip mined loops initially so existing transformations would simply need to skip over the `OuterStripMinedLoop`.
>
> It's quite possible that having the full outer strip mined loop early on works fine and that there's no need for changes all over the loop optimizations. I suppose someone would need to give it a try. This said, I still think keeping the graph simple when crucial transformations happen has some merit.
@rwestrel Dropping the `Phi` means there are fewer nodes, and probably it is easier to let things float out from the inner loop. That probably works fine for data nodes, and loads that float up. But for sunk stores it doesn't work... at least not without your cleanup now.
In some way, not having the `Phi` violates the assumption of the C2 IR. Namely that if you can change the value/memory during the loop, then you need a `Phi` to model that. That is a bit scary, to handle outer strip mined loops different... sure we want the old optimizations to still work, but we have no idea what "new" things now badly optimize, like in this case here. Any new optimization also has to be aware that in the case of strip mined loops, the absence of a phi does not mean there cannot be a mutation on that data/memory. Not great :/
Adding the `Phi`s in all cases would probably break some optimizations, as you say. Maybe we would have to add dedicated `skip_strip_mining` logic all over the place. It would be difficult to know where we are missing them. That's not great either :/
It's a difficult trade-off.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/25717#issuecomment-2967280652
More information about the hotspot-compiler-dev
mailing list