RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer
Emanuel Peter
epeter at openjdk.org
Mon Dec 2 13:42:02 UTC 2024
I know this one is large, but it consists of a lot of renamings, and new tests. On the whole, the new `VPointer` code is less than the old!
**Goal**
Replace old `VPointer` with a new version that relies on `MemPointer` - which then is a shared utility for both `MergeStores` and `SuperWord / AutoVectorization`. `MemPointer` generally parses pointers, and `VPointer` specializes this facility for the use in loops (`VLoop`).
The old `VPointer` implementation with its recursive pattern matching was quite complicated and difficult to reason about for correctness. The approach in `MemPointer` is much simpler: iteratively decomposing sub-expressions. Further: the new implementation is more powerful at detecting equivalent invariants.
**Future**: with the `MemPointer` implementation of `VPointer`, it should be easier to implement speculative runtime-checks for Aliasing-Analysis [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). The pressing need for this has come from the FFM / MemorySegment folks, like @mcimadamore and @minborg .
**Details**
This looks like a rather big patch, so let me explain the parts.
- Refactor of `MemPointer` in `mepointer.hpp/cpp`:
- Added concept of `Base` to `MemPointer`. This is required for the aliasing computation in `VPointer`.
- `sub_expression_has_native_base_candidate`: add special case to parse through `CastX2P` if we find a native memory base `MemorySegment.address()`, i.e. `jdk.internal.foreign.NativeMemorySegmentImpl.min`. This helps some native memory segment cases to vectorize that did not before.
- So far `MemPointer` could only answer adjacency queries. But VPointer also needs overlap queries, see the old `VPointer::not_equal` (i.e. can we prove that the two `VPointer` never overlap?). So I had to add a new case to aliasing computation: `NotOrAtDistance`. It is useful to answer the new and better named `MemPointer::never_overlaps_with`.
- Collapsed together `MemPointerDecomposedForm` and `MemPointer`. It was an unnecessary and unhelpful split.
- Re-write of `VPointer` based on `MemPointer`:
- Old pattern:
- `VPointer[mem: 847 StoreI, base: 37, adr: 37, base[ 37] + offset( 16) + invar( 0) + scale( 4) * iv]`
- `VPointer[mem: 3189 LoadB, base: 1, adr: 2273, base[ 1] + offset( 0) + invar( 0) + scale( 1) * iv]` -> `adr = CastX2P`, the addition of `11 Param`, `1819 Phi` and `629 LoadL`.
- New pattern:
- `VPointer[size: 4, object, base(110 CastPP) + con( 16) + iv_scale( 4) * iv + invar(0)]`
- `VPointer[size: 1, native, base(629 LoadL) + con( 0) + iv_scale( 1) * iv + invar(1 * [11 Parm] + 1 * [1819 Phi])]` -> parse through the `CastX2P`, and detect the more helpful base `629 LoadL` (`MemorySegment.address()`), take `11 Param` and `1819 Phi` as invariants.
- Move `invariant` check to `VLoop::is_pre_loop_invariant`.
- Re-work the aliasing computation, and give it better names:
- Before, it was done via `VPointer::cmp`, which returned quite cryptic return codes. This allowed us to do adjacency and overlap queries. Now I have explicit queries `is_adjacent_to_and_before` and `never_overlaps_with`.
- `VPointer::make_with_size` and `VPointer::make_with_iv_offset`: one can now create a copy of a `VPointer` with a changed size (e.g. when going from smaller scalar to larger vector), or with a shifted offset (when simulating a `VPointer` in the next iteration). Now we can do aliasing computations with such modified `VPointers`, which is quite powerful and I will probably use for other things in the future.
- Refactor many uses of `VPointer`:
- since we now do not have a single `invar`, but rather a set of `invar_summands`, I had to change some code in `AlignmentSolution / AlignmentSolver`, `VMemoryRegion`, `create_adjacent_memop_pairs` and `VTransform::adjust_pre_loop_limit_to_align_main_loop_vectors`. This unfortunately makes the change quite large, but a lot of it is just renaming of variables (e.g. `scale` -> `iv_scale` to avoid confusion with scales of other summands).
- `VTransformMemVectorNode`: it now carries the `VPointer` corresponding to the vector load/store, adjusted for its size. We can now do aliasing checks with these nodes. This is how I refactored away `overlap_possible_with_any_in`, and replaced it with a single call to `never_overlaps_with`.
- Fix existing tests: more cases now vectorize:
- Non-power-of-2 stride: the issue used to be that e.g. `stride=3` would make IGVN split `3 * iv -> iv + iv << 1`. This was not properly parsed by the old `VPointer`, as it found the `iv` twice. But the `MemPointerParser` simply collects all usages and adds them up again, back to `3 * iv`.
- Some cases in `TestMemorySegment.java` now vectorize, because `VPointer` is better at detecting the invariants and detecting that they are equivalent.
- Add new tests:
- A while ago, I had worked on: [JDK-8330274](https://bugs.openjdk.org/browse/JDK-8330274) / https://github.com/openjdk/jdk/pull/18795, but then gave it up, in favour of this refactoring here. I had previously noticed, that the old VPointer does not deal very well with multiple invariants, because if the order of addition is off, then a different `invar` node was generated, and then the aliasing computation gets confused, i.e. thinks the aliasing is unknown. I added the tests from there:`TestEquivalentInvariants.java`.
**Future work:**
[JDK-8330274](https://bugs.openjdk.org/browse/JDK-8330274): investigate why some cases with multiple invariants still do not vectorize, and what can be done.
[JDK-8331576](https://bugs.openjdk.org/browse/JDK-8331576): Investigate case where native memory does not vectorize because of `CastX2P`.
[JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751): Aliasing-Analysis with speculative runtime-checks
-------------
Commit messages:
- rename
- fix up print
- add TestEquivalentInvariants.java
- improve documentation
- hide parser via delegation
- Merge branch 'master' into JDK-8343685-VPointer-MemPointer
- make sort stable
- some comment and naming improvements
- check if field not found
- update comments
- ... and 102 more: https://git.openjdk.org/jdk/compare/28ae281b...4b3c7d29
Changes: https://git.openjdk.org/jdk/pull/21926/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21926&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8343685
Stats: 4032 lines in 18 files changed: 1849 ins; 1538 del; 645 mod
Patch: https://git.openjdk.org/jdk/pull/21926.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/21926/head:pull/21926
PR: https://git.openjdk.org/jdk/pull/21926
More information about the hotspot-compiler-dev
mailing list