RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v2]
Christian Hagedorn
chagedorn at openjdk.org
Tue Nov 19 15:32:12 UTC 2024
On Tue, 19 Nov 2024 15:18:18 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> src/hotspot/share/opto/vtransform.cpp line 235:
>>
>>> 233: // faster. However, this optimization comes with some restrictions, depending on the CPU.
>>> 234: // Generally, Store-to-load forwarding works if the load and store memory regions match
>>> 235: // exactly (same start and width). Generally problematic are partial overlaps - though
>>
>> Should we also mention here that it also works when the loaded data is fully contained in the stored data (taken from your blog post)? Maybe you could also add some examples from your blog post; they helped me understand this optimization better when I first read about it.
>
>> Should we also mention here that it also works when the loaded data is fully contained in the stored data.
>
> fully contained, as in `strict subset`? I mentioned that already... and sadly it works on some platforms, but not others... quite complex. That is why I make the "conservative assumption".
Ah, I thought that as long as the starting addresses match, all platforms would perform the optimization when we store more bytes than we load. So that's not the case? In any case, for the analysis we do in SuperWord, we only assume that exact matches will work.
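
To make the cases in this thread concrete, here is a minimal sketch that classifies how a load's byte range relates to an earlier store's byte range, and then applies the conservative assumption that only exact matches forward. Everything in it (`ForwardingOverlap`, `Kind`, `classify`, `treatedAsPotentialFailure`) is a hypothetical name made up for illustration. It is not the code in `vtransform.cpp`, and it does not model any particular CPU.

```java
// Hypothetical illustration only: not HotSpot code and not a CPU model.
final class ForwardingOverlap {
    enum Kind {
        EXACT_MATCH,               // same start and same width
        SAME_START_NARROWER_LOAD,  // same start, load reads fewer bytes than were stored
        CONTAINED_WITH_OFFSET,     // load lies inside the store but starts later
        PARTIAL_OVERLAP,           // load also needs bytes the store did not write
        DISJOINT                   // no shared bytes, so nothing to forward
    }

    // Ranges are [start, start + width) in bytes.
    static Kind classify(long storeStart, int storeWidth, long loadStart, int loadWidth) {
        long storeEnd = storeStart + storeWidth; // exclusive
        long loadEnd = loadStart + loadWidth;    // exclusive
        if (loadEnd <= storeStart || storeEnd <= loadStart) {
            return Kind.DISJOINT;
        }
        if (loadStart == storeStart && loadWidth == storeWidth) {
            return Kind.EXACT_MATCH;
        }
        if (loadStart >= storeStart && loadEnd <= storeEnd) {
            return loadStart == storeStart
                    ? Kind.SAME_START_NARROWER_LOAD
                    : Kind.CONTAINED_WITH_OFFSET;
        }
        return Kind.PARTIAL_OVERLAP;
    }

    // Conservative assumption from the discussion: among overlapping accesses,
    // only an exact match is assumed to forward. The contained cases work on
    // some platforms but not others, so they are treated like a failure here.
    static boolean treatedAsPotentialFailure(Kind kind) {
        return kind != Kind.EXACT_MATCH && kind != Kind.DISJOINT;
    }
}
```

For example, a 16-byte store at offset 0 followed by a 4-byte load at offset 0 is classified as `SAME_START_NARROWER_LOAD`. That is the case asked about above, and under the conservative assumption it is still treated as a potential forwarding failure.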
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848579487