RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v8]
Emanuel Peter
epeter at openjdk.org
Wed Nov 20 08:08:17 UTC 2024
On Wed, 20 Nov 2024 07:39:20 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:
>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>> fix benchmark
>
> I have not looked too deeply but may I ask why the result of `offset == 12` is worse than that of `offset == 4`, similarly for other multiples such as `24`, `40` and `8`? I would assume the former should not be worse than the latter since a vectorization for the latter would also work for the former.
@merykitty
> I have not looked too deeply but may I ask why the result of offset == 12 is worse than that of offset == 4
`offset=4` -> take 4-element vectors. Forwarding works perfectly.
`offset=12` -> round down to the nearest power of 2 -> take 8-element vectors -> forwarding failures.
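To make the two cases concrete, here is a small simulation under a simplified model. The model is my assumption, not C2's actual heuristic. The loop shape `a[i] = a[i - offset] + 1` is hypothetical. The vector length is the largest power of 2 that is at most `offset`, as described above. A vector load counts as forwarding cleanly only if it covers exactly the elements of one earlier vector store, and any partial overlap counts as a failure. The class and method names are made up for illustration, and the model ignores the hardware's maximum vector width and store-buffer timing.

```java
// Simplified store-to-load-forwarding model (assumption, not C2's real logic)
// for the hypothetical loop: for (i = offset; i < n; i++) a[i] = a[i - offset] + 1;
final class ForwardingModel {

    /** Largest power of two <= n, for n >= 1 (e.g. 12 -> 8). */
    static int roundDownToPowerOfTwo(int n) {
        return Integer.highestOneBit(n);
    }

    /**
     * Vector iteration k loads [k*vl - offset, k*vl - offset + vl) and stores
     * [k*vl, k*vl + vl). Returns true if every load either matches one earlier
     * store exactly or does not overlap it; a partial overlap is treated as a
     * forwarding failure.
     */
    static boolean forwardsCleanly(int offset, int vl) {
        int warmup = offset / vl + 2; // skip the first iterations, which have no earlier store to read
        for (int k = warmup; k < warmup + 4; k++) {
            int loadStart = k * vl - offset;
            int loadEnd = loadStart + vl; // exclusive
            for (int j = 0; j < k; j++) {
                int storeStart = j * vl;
                int storeEnd = storeStart + vl; // exclusive
                boolean overlaps = loadStart < storeEnd && storeStart < loadEnd;
                boolean exact = loadStart == storeStart;
                if (overlaps && !exact) {
                    return false; // load straddles a store boundary
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int offset : new int[] {4, 8, 12, 24, 40}) {
            int vl = roundDownToPowerOfTwo(offset);
            System.out.printf("offset=%d -> %d-element vectors -> %s%n",
                    offset, vl, forwardsCleanly(offset, vl) ? "forwarding OK" : "forwarding failure");
        }
        // Prints:
        // offset=4 -> 4-element vectors -> forwarding OK
        // offset=8 -> 8-element vectors -> forwarding OK
        // offset=12 -> 8-element vectors -> forwarding failure
        // offset=24 -> 16-element vectors -> forwarding failure
        // offset=40 -> 32-element vectors -> forwarding failure
    }
}
```

In this model, with `offset=12` and 8-element vectors, the load of elements `[12, 20)` straddles the earlier stores to `[8, 16)` and `[16, 24)`, so neither store can forward its data. With `offset=4` and 4-element vectors, each load reads back exactly one earlier store.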
I suppose there could be an alternative solution here:
If we detect a forwarding failure, we could split the vectors in half and try again. For `offset==12`, we would first try 8-element vectors and detect a failure, then retry with 4-element vectors and detect no failures.
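The "split in half and retry" idea can be sketched as a simple loop. It uses a simplified model, which is an assumption: vector iteration `k` loads elements starting at `k*vl - offset`, and earlier stores start at multiples of `vl`. Under that model, a load lines up exactly with one earlier store if and only if `offset % vl == 0`. Otherwise it straddles two stores. The names `HalvingRetry` and `chooseVectorLength` are hypothetical. This sketch models only the control flow. The real mechanism would need the scheduled VTransform graph to detect failures.

```java
// Hypothetical "retry with shorter vectors" sketch under a simplified model (assumption).
final class HalvingRetry {

    // Model assumption: a vector load forwards cleanly iff it starts on a store
    // boundary, i.e. iff offset is a multiple of the vector length.
    static boolean forwardsCleanly(int offset, int vl) {
        return offset % vl == 0;
    }

    // Start with the largest power of two <= offset and halve on each detected failure.
    // In C2, each retry would mean changing the packs and building a new VTransform,
    // which is the compile-time cost that makes this approach expensive.
    static int chooseVectorLength(int offset) {
        if (offset < 1) {
            throw new IllegalArgumentException("offset must be >= 1");
        }
        int vl = Integer.highestOneBit(offset);
        while (vl > 1 && !forwardsCleanly(offset, vl)) {
            vl /= 2; // split the vectors in half and try again
        }
        return vl;
    }

    public static void main(String[] args) {
        for (int offset : new int[] {4, 12, 24, 40}) {
            System.out.println("offset=" + offset + " -> " + chooseVectorLength(offset) + "-element vectors");
        }
    }
}
```

In this toy model, the loop always stops at `Integer.lowestOneBit(offset)`, the largest power of 2 that divides `offset`, so it could be computed directly. The loop form is kept only to mirror the proposed retry control flow. In the compiler, each iteration of that loop would correspond to another SuperWord/VTransform pass.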
The issue with that solution: we would first have to schedule the VTransform graph to get all the store-load dependencies between the vectors. Then we would have to go back to SuperWord, change the packs, and build a new VTransform. Maybe there are other ways, but they all seem more complicated. Maybe in the future I will add such a "retry with shorter vectors" mechanism. But it would mean that SuperWord/VTransform may run multiple times, which could be expensive at compile time. We would have to do this carefully and with plenty of performance testing.
At any rate: this is a simple solution, and it works (surely not optimally, but it works). It fixes the regression I introduced earlier in JDK24 with [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). We are close to the JDK24/JDK25 fork, and it would be nice to get this fix for the JDK24 regression integrated.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/21521#issuecomment-2487823977