RFR: 8330274: C2 SuperWord: VPointer invar: same sum with different addition order should be equal
Emanuel Peter
epeter at openjdk.org
Fri Apr 19 10:00:27 UTC 2024
This is an enhancement for AutoVectorization.
I want to improve the detection of `invar`s that are equivalent (guaranteed to compute the same value), but don't have the identical node (the computation is in a different order).
Note: only about 100 lines are real changes, the rest is tests. These are the first tests that check vectorization for MemorySegments.
**Solution Sketch: "canonicalize" the invar**
- Extract all summands of the `invar`: make a list.
- Parse through `AddL`, `SubL`, `AddI`, `SubI`, to get summands.
- Bypass `CastLL` and `CastII`
- Recursively treat `ConvI2L`, `LShiftI` and `LShiftL`: i.e. canonicalize their input.
- Sort all extracted summands by node idx.
- Add up all summands in new order.
If two `invar`s use the same summands, then we know that after canonicalization the new nodes representing the `invar`s must be the same.
**Example**
invar1 = b + c + d + a
invar2 = d + b + a + c
-> equivalent but not identical nodes
Sort, and add up again:
invar1 = a + b + c + d
invar2 = a + b + c + d
-> now the nodes are identical
**Motivation: MemorySegment with invar**
One might think that this is a big of a special case: why would anybody write indices to an Array or MemorySegment where the invar has a different addition order for its summands?
This example did not vectorize, even though it should:
https://github.com/openjdk/jdk/blob/78e42d6e311c33548d16c6c74493388d9850238e/test/hotspot/jtreg/compiler/loopopts/superword/TestEquivalentInvariants.java#L425-L441
Both the `get` and the `set` look like they have the same address, and the address increases by a byte in each iteration.
Upon inspection, I saw that the `invar` that `VPointer` produces for the two operations are not identical: the order of addition of the `invar`'s summands is different, and thus the `invar` nodes are different.
The consequence: Only if we can prove that the two `invar` are identical can we know that the addresses are identical, and that there is no aliasing for loop carried dependencies. Since we have different `invar`, we don't know how the two addresses alias, and that prevents vectorization.
Why does this happen? After parsing, the graph looks like this:
![image](https://github.com/openjdk/jdk/assets/32593061/f768d0b0-0b2f-48f0-bfdc-61e93e62bb4f)
We already see that the two addresses are different only by a `CastLL`, with type `long:>=0`. Somehow, that was only deduced for the load, and not the store.
load_adr = base + memory_segment_offset + CastLL(invar + iv)
store_adr = base + memory_segment_offset + invar + iv
And when we run SuperWord, the graph looks like this:
![image](https://github.com/openjdk/jdk/assets/32593061/c6b37919-39e9-419b-a23d-e480a39b3e51)
The invar now has 3 summands:
- `530 LoadL`: the offset that the MemorySegment has internally, to indicate its offset to the base of the beginning of the backing array.
- `11 Para`: the input `invar` parameter from our test method.
- `1460 Phi` pre-loop iv increment (value unknown at compile time)
We can see that the addition of the summands is quite different for the load and store.
An alternative solution would have been to add a corresponding `CastLL` for both the `load` and `store`, and hope that this means that the final address would look identical, though I don't know how difficult that would be.
**Future Work**
There are still a few relevant `MemorySegment` address patterns that are not properly recognized by `VPointer`.
In some cases, the `RangeCheck` is not eliminated from the loop: [JDK-8327209](https://bugs.openjdk.org/browse/JDK-8327209)
I also have seen a case where the `If` from the `VarHandleSegmentAsInts::offsetPlain` did not fold away.
![image](https://github.com/openjdk/jdk/assets/32593061/9c064631-30ce-4c5f-b938-f0b9b5afa7d4)
The check looks like `((2271 AddL) + 4) & 3 == 0`. Maybe this could be fixed with IGVN optimizations, but I'm not sure.
I am also thinking about refactoring `VPointer` completely. It has grown over time, and its pattern-matching is a bit nasty.
-------------
Commit messages:
- IR rules for test only on 64 bit
- more tests, more comments, rm trace code
- more int/long tests: where offsetPlain moves away
- add long tests
- verify cfg case
- test: handle AlignVector
- some int tests
- allow LShift for scaling
- better comments
- allow add and sub in invar
- ... and 5 more: https://git.openjdk.org/jdk/compare/6bc6392d...687611a0
Changes: https://git.openjdk.org/jdk/pull/18795/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18795&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8330274
Stats: 1062 lines in 3 files changed: 1062 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/18795.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/18795/head:pull/18795
PR: https://git.openjdk.org/jdk/pull/18795
More information about the hotspot-compiler-dev
mailing list