RFR: 8325155: C2 SuperWord: remove alignment boundaries [v4]

Tue May 28 17:36:46 UTC 2024

> I have tried for a very long time to get rid of all the `alignment(n)` code that is all over the SuperWord code. With lots of previous work, I am now finally ready to remove it.
> 
> I was able to remove lots of VM code, about 300 lines. And the removed code is I think much more complicated than the new code.
> 
> This is what I did in this PR:
> - Removal of `_node_info`: used to have many fields, which I refactored out to the `VLoopAnalyzer` modules. `alignment` is the last component, which I now remove.
> - Changed the implementation of `SuperWord::find_adjacent_refs`, now `SuperWord::find_adjacent_memop_pairs`, completely:
>   - It used to be an algorithm that would scan over all `memops` repeatedly, try to find some `mem_ref` and see which other memops were comparable, and then pack pairs for all of those, by comparing all-vs-all memops. This algorithm is at least quadratic, if not much worse.
>   - I now add all `memops` into a single array, sort them by groups (those that are comparable with each other and could be packed into vectors), and inside the groups by ascending offset. This allows me to split off the groups much more efficiently, and also the sorting by offset allows me finding adjacent pairs much more efficiently. In the most cases this reduces the cost to `O(n log n)` for sort, and a linear scan for finding adjacent memops.
> - I removed the "alignment boundaries" created in `SuperWord::memory_alignment` by `int off_rem = offset % vw;`.
>   - This used to have the effect that all offsets were computed modulo the vector width. Hence, pairs could not be packed across this boundary (e.g. we have nodes with offsets `31, 32`, which are adjacent in theory, but if we have a `vw = 32`, then the modulo-offsets are `31, 0`, and they are not detected as adjacent).
>   - These "alignment boundaries" used to be required for correctness about a year ago, before I fixed and relaxed much of the alignment code.
> - The `alignment` used to have another important task: Ensuring compatibility of the input-size of a use node, with the output-size of the def-node.
>   - This was done by giving all nodes an `alignment`, even the non-memop nodes. This `alignment` was then scaled up and down at type casts (e.g. int `0, 4, 8, 12` -> long `0, 8, 16, 24`). If the output-size of the def-node did not match the input-size of the use-node, then the `alignment` would not match up, and we would not pack.
>   - This is why we used to have checks like `alignment(s1) + data_size(s1) == alignment(s2)` ...

Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits:

 - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries
 - merge
 - Apply suggestions from code review by Christian

   Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
 - more updates for Christian
 - Merge branch 'master' into JDK-8325155-rm-alignment-boundaries
 - rm TODO
 - manual merge
 - revert a line, need to fix it different
 - improve comments
 - fix alignment
 - ... and 19 more: https://git.openjdk.org/jdk/compare/da6aa2a8...41fc1eb3

-------------

Changes: https://git.openjdk.org/jdk/pull/18822/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18822&range=03
  Stats: 1072 lines in 7 files changed: 608 ins; 358 del; 106 mod
  Patch: https://git.openjdk.org/jdk/pull/18822.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18822/head:pull/18822

PR: https://git.openjdk.org/jdk/pull/18822