RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account
Roland Westrelin
roland at openjdk.org
Mon Jun 16 15:26:03 UTC 2025
On Fri, 13 Jun 2025 08:01:52 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:
>>> Thanks for working on this, Roland. I just submitted some testing, will come back with the results in a day or two.
>>>
>>> Generally, I agree with your proposed approach of handling the case at expansion time as a low-risk fix for JDK 25. But as future work, would it be feasible to maintain regular SSA form for outer strip-mined loops (adding memory and data phi nodes at both loop levels) rather than omitting phi nodes for the outer loops and "repairing" SSA on macro expansion, or is there any fundamental obstacle in doing the former? It would have prevented issues like this, and feels like a more principled and robust approach in general.
>>
>> I don't think there's a fundamental obstacle. The reason I implemented loop strip mining that way is that I was concerned it would be complicated for existing transformations to be supported without major code change and that there was a chance subtle issues would creep in (such as some optimizations not happening any more). So I tried to have a minimal set of extra nodes for strip mined loops initially so existing transformations would simply need to skip over the `OuterStripMinedLoop`.
>>
>> It's quite possible that having the full outer strip mined loop early on works fine and that there's no need for changes all over the loop optimizations. I suppose someone would need to give it a try. This said, I still think keeping the graph simple when crucial transformations happen has some merit.
>
>> > Thanks for working on this, Roland. I just submitted some testing, will come back with the results in a day or two.
>> > Generally, I agree with your proposed approach of handling the case at expansion time as a low-risk fix for JDK 25. But as future work, would it be feasible to maintain regular SSA form for outer strip-mined loops (adding memory and data phi nodes at both loop levels) rather than omitting phi nodes for the outer loops and "repairing" SSA on macro expansion, or is there any fundamental obstacle in doing the former? It would have prevented issues like this, and feels like a more principled and robust approach in general.
>>
>> I don't think there's a fundamental obstacle. The reason I implemented loop strip mining that way is that I was concerned it would be complicated for existing transformations to be supported without major code change and that there was a chance subtle issues would creep in (such as some optimizations not happening any more). So I tried to have a minimal set of extra nodes for strip mined loops initially so existing transformations would simply need to skip over the `OuterStripMinedLoop`.
>>
>> It's quite possible that having the full outer strip mined loop early on works fine and that there's no need for changes all over the loop optimizations. I suppose someone would need to give it a try. This said, I still think keeping the graph simple when crucial transformations happen has some merit.
>
> Thanks for the background, Roland!
>
> I think it would be worth exploring this, but I agree that there is a risk of silently affecting other loop optimizations. Luckily, the IR test framework gives us now a means to improve our confidence that changes in this area do not affect expected optimizations. Unfortunately, our current IR test coverage of loop optimizations is incomplete, so a pre-condition to exploring full SSA for strip-mined loops (and something worth doing in any case IMO) would be adding more IR tests checking that at least basic optimizations like peeling, unswitching, unrolling, range check elimination, etc. happen as expected.
Thanks for the review @robcasloz
-------------
PR Comment: https://git.openjdk.org/jdk/pull/25717#issuecomment-2977070215
More information about the hotspot-compiler-dev
mailing list