RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4]
Emanuel Peter
epeter at openjdk.org
Thu Jun 19 12:37:59 UTC 2025
On Thu, 19 Jun 2025 08:10:47 GMT, Kuai Wei <kwei at openjdk.org> wrote:
>> @kuaiwei I see. If there are multiple groups, then things look more difficult.
>>
>> @merykitty once proposed the idea of not doing MergeStores / MergeLoads as IGVN optimizations, but rather having a separate and dedicated phase. At the time, I was against it, because I had already implemented `MergeStores` quite far. But now I'm starting to see it as a possibly better alternative.
>>
>> That would allow you to take a global view, collect all loads (and stores), put them in a big list, and then make groups that belong together. And then see which groups could be legally replaced with a single load / store. In a way, that is a global vectorizer. And we could handle other patterns than just merging loads and stores: we could also merge copy patterns, for example. That could be much more powerful than the current approach. And it would avoid the issue with having to determine if the current node in IGVN is the best "candidate", or if we should look for another node further down.
>>
>> I don't know what you think about this complete "rethink" of the approach. But I do think it would be more powerful, and also avoid having to cache results during IGVN. All the "cached" results are local to that dedicated "MergeMemopsPhase" or whatever we would call it.
>>
>> What do you think?
>
> @eme64, that sounds like a good idea. I think one benefit is that we can put the 'merge memory' optimization before auto vectorization, so it can expose more opportunities for it. I'm not clear about the copy pattern you mentioned. Can you give an example as reference?
@kuaiwei I'm glad to hear you are for it too :)
> I think one benefit is that we can put the 'merge memory' optimization before auto vectorization, so it can expose more opportunities for it.
I'm not sure that is the best idea. I've tried that with MergeStores, and it had some bad effects for some "fill" loops: we would use MergeStores, and then SuperWord would work differently and fail some tests. You could investigate, but I'm not sure it is worth it. I would leave SuperWord to deal with loops, and use MergeMemory / MergeLoads / MergeStores to deal with the remaining code, be it in loops or in straight-line code.
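To make the split concrete, here is a minimal Java sketch of the two kinds of code. The class and method names are made up for illustration. Whether C2 actually vectorizes the loop or merges the stores depends on the JVM version and flags, so treat this as the shape of the patterns, not a guarantee of what gets optimized.

```java
final class FillVersusStraightLine {
    // A "fill" loop: the kind of loop I would leave to SuperWord.
    static void fill(byte[] a, byte v) {
        for (int i = 0; i < a.length; i++) {
            a[i] = v;
        }
    }

    // Straight-line adjacent byte stores: the kind of pattern that
    // MergeStores is meant to turn into a single wider store.
    static void storeIntLittleEndian(byte[] a, int off, int v) {
        a[off]     = (byte) v;
        a[off + 1] = (byte) (v >> 8);
        a[off + 2] = (byte) (v >> 16);
        a[off + 3] = (byte) (v >> 24);
    }
}
```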
> I'm not clear about the copy pattern you mentioned. Can you give an example as reference?
a[0 + i] = b[0 + i]
a[1 + i] = b[1 + i]
Though that may require aliasing analysis in some cases, and I'm not sure the complexity is worth it in general. Probably not. It is probably also not profitable to merge the copy pattern and then add a very expensive aliasing check: you would lose more than you gain.
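
Here is a rough, hand-written Java sketch of why aliasing matters for the copy pattern. It is not what C2 would generate; `CopyPatternSketch` and its methods are hypothetical names, and the "merged" version just emulates a single 2-byte load plus a single 2-byte store with a `VarHandle` view over the byte array.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

final class CopyPatternSketch {
    // Views a byte[] as (possibly unaligned) shorts. The same byte order is
    // used for the load and the store, so the bytes are copied unchanged.
    private static final VarHandle SHORT_VIEW =
        MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.nativeOrder());

    // The scalar copy pattern: two byte loads and two byte stores, in order.
    static void copyScalar(byte[] a, int aOff, byte[] b, int bOff) {
        a[aOff]     = b[bOff];
        a[aOff + 1] = b[bOff + 1];
    }

    // Hand-merged version: one 2-byte load, then one 2-byte store.
    // Only equivalent to copyScalar if the two ranges do not overlap
    // in a way that makes the second load see the first store.
    static void copyMerged(byte[] a, int aOff, byte[] b, int bOff) {
        short v = (short) SHORT_VIEW.get(b, bOff);
        SHORT_VIEW.set(a, aOff, v);
    }

    public static void main(String[] args) {
        byte[] x = {1, 2, 3, 4};
        byte[] y = {1, 2, 3, 4};
        // Same array, ranges shifted by one element: the ranges overlap.
        copyScalar(x, 1, x, 0);
        copyMerged(y, 1, y, 0);
        System.out.println(Arrays.toString(x)); // [1, 1, 1, 4]
        System.out.println(Arrays.toString(y)); // [1, 1, 2, 4]
    }
}
```

With distinct arrays the two versions agree. With the same array and shifted offsets they do not, which is exactly the case a runtime aliasing check would have to rule out before merging.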
-------------
PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2987946996