RFR: 8325155: C2 SuperWord: remove alignment boundaries [v6]

Thu May 30 06:33:35 UTC 2024

On Wed, 29 May 2024 07:16:44 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I have tried for a very long time to get rid of all the `alignment(n)` code that is all over the SuperWord code. With lots of previous work, I am now finally ready to remove it.
>> 
>> I was able to remove lots of VM code, about 300 lines. The removed code is, I think, much more complicated than the new code.
>> 
>> This is what I did in this PR:
>> - Removal of `_node_info`: used to have many fields, which I refactored out to the `VLoopAnalyzer` modules. `alignment` is the last component, which I now remove.
>> - Changed the implementation of `SuperWord::find_adjacent_refs`, now `SuperWord::find_adjacent_memop_pairs`, completely:
>>   - It used to be an algorithm that would scan over all `memops` repeatedly, try to find some `mem_ref` and see which other memops were comparable, and then pack pairs for all of those, by comparing all-vs-all memops. This algorithm is at least quadratic, if not much worse.
>>   - I now add all `memops` into a single array, sort them by group (memops in a group are comparable with each other and could be packed into vectors), and within each group by ascending offset. This lets me split off the groups much more efficiently, and the sorting by offset lets me find adjacent pairs much more efficiently. In most cases this reduces the cost to `O(n log n)` for the sort, plus a linear scan to find adjacent memops.
>> - I removed the "alignment boundaries" created in `SuperWord::memory_alignment` by `int off_rem = offset % vw;`.
>>   - This used to have the effect that all offsets were computed modulo the vector width. Hence, pairs could not be packed across this boundary (e.g. we have nodes with offsets `31, 32`, which are adjacent in theory, but if we have a `vw = 32`, then the modulo-offsets are `31, 0`, and they are not detected as adjacent).
>>   - These "alignment boundaries" used to be required for correctness about a year ago, before I fixed and relaxed much of the alignment code.
>> - The `alignment` used to have another important task: Ensuring compatibility of the input-size of a use node, with the output-size of the def-node.
>>   - This was done by giving all nodes an `alignment`, even the non-memop nodes. This `alignment` was then scaled up and down at type casts (e.g. int `0, 4, 8, 12` -> long `0, 8, 16, 24`). If the output-size of the def-node did not match the input-size of the use-node, then the `alignment` would not match up, and we would not pack.
>>   - This is why we used to have checks like `alignment(s1) + da...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update src/hotspot/share/opto/superword.cpp
>   
>   Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>

FYI: I ran performance benchmarking, and there was no significant difference.
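
For readers skimming the quoted description, here is a minimal standalone sketch of the sort-then-scan idea and of the `31, 32` boundary example. It is not the HotSpot code: `MemOp`, `find_adjacent_pairs` and `with_modulo_offsets` are hypothetical names invented for illustration, and the sketch assumes that offsets within a group are distinct and that "adjacent" means the first memop ends exactly where the second begins.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a memory operation; not the HotSpot data structure.
struct MemOp {
    int id;      // label used only in this example
    int group;   // memops in the same group are comparable (could be packed together)
    int offset;  // byte offset
    int size;    // element size in bytes
};

// Sort by (group, offset), then scan neighbours once to collect adjacent pairs.
// Assumption: offsets within a group are distinct, so checking i and i + 1 suffices.
std::vector<std::pair<int, int>> find_adjacent_pairs(std::vector<MemOp> ops) {
    std::sort(ops.begin(), ops.end(), [](const MemOp& a, const MemOp& b) {
        if (a.group != b.group) return a.group < b.group;
        return a.offset < b.offset;
    });
    std::vector<std::pair<int, int>> pairs;
    for (std::size_t i = 0; i + 1 < ops.size(); ++i) {
        const MemOp& a = ops[i];
        const MemOp& b = ops[i + 1];
        // Adjacent: same group, and b starts exactly where a ends.
        if (a.group == b.group && a.offset + a.size == b.offset) {
            pairs.emplace_back(a.id, b.id);
        }
    }
    return pairs;
}

// Models the old behaviour as described above: offsets reduced modulo the
// vector width vw, which creates an "alignment boundary" at every multiple of vw.
std::vector<MemOp> with_modulo_offsets(std::vector<MemOp> ops, int vw) {
    assert(vw > 0);
    for (MemOp& op : ops) {
        op.offset %= vw;
    }
    return ops;
}
```

With one-byte memops at offsets `30, 31, 32` in one group, the scan over raw offsets finds both adjacent pairs. After reducing the offsets modulo `vw = 32`, the `31, 32` pair becomes `31, 0` and is no longer found, which is the boundary effect the description refers to.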

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18822#issuecomment-2138777893