RFR: 8330274: C2 SuperWord: VPointer invar: same sum with different addition order should be equal
Emanuel Peter
epeter at openjdk.org
Fri Apr 19 10:00:28 UTC 2024
On Tue, 16 Apr 2024 12:00:51 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
> This is an enhancement for AutoVectorization.
>
> I want to improve the detection of `invar`s that are equivalent (guaranteed to compute the same value), but don't have the identical node (the computation is in a different order).
>
> Note: only about 100 lines are real changes, the rest is tests. These are the first tests that check vectorization for MemorySegments.
>
> **Solution Sketch: "canonicalize" the invar**
>
> - Extract all summands of the `invar`: make a list.
> - Parse through `AddL`, `SubL`, `AddI`, `SubI`, to get summands.
> - Bypass `CastLL` and `CastII`
> - Recursively treat `ConvI2L`, `LShiftI` and `LShiftL`: i.e. canonicalize their input.
>
> - Sort all extracted summands by node idx.
> - Add up all summands in new order.
>
> If two `invar`s use the same summands, then we know that after canonicalization the new nodes representing the `invar`s must be the same.
>
> **Example**
>
>
> invar1 = b + c + d + a
> invar2 = d + b + a + c
>
> -> equivalent but not identical nodes
>
> Sort, and add up again:
>
> invar1 = a + b + c + d
> invar2 = a + b + c + d
>
> -> now the nodes are identical
>
> **Motivation: MemorySegment with invar**
>
> One might think that this is a bit of a special case: why would anybody write indices to an Array or MemorySegment where the invar has a different addition order for its summands?
>
> This example did not vectorize, even though it should:
> https://github.com/openjdk/jdk/blob/78e42d6e311c33548d16c6c74493388d9850238e/test/hotspot/jtreg/compiler/loopopts/superword/TestEquivalentInvariants.java#L425-L441
>
> Both the `get` and the `set` look like they have the same address, and the address increases by a byte in each iteration.
>
> Upon inspection, I saw that the `invar`s that `VPointer` produces for the two operations are not identical: the order of addition of the `invar`'s summands is different, and thus the `invar` nodes are different.
>
> The consequence: Only if we can prove that the two `invar`s are identical can we know that the addresses are identical, and that there is no aliasing for loop carried dependencies. Since we have different `invar`s, we don't know how the two addresses alias, and that prevents vectorization.
>
> Why does this happen? After parsing, the graph looks like this:
> ![image](https://github.com/openjdk/jdk/assets/32593061/f768d0b0-0b2f-48f0-bfdc-61e93e62bb4f)
>
> We already see that the two addresses differ only by a `CastLL`, with type `long:>=0`. Somehow, that was only deduced for the load, and not the store.
>
> load_adr = base + memory_segment_offs...
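
To illustrate the "canonicalize the invar" idea from the quoted description, here is a minimal sketch in Java. It is not the HotSpot code (which is C++ in `vectorization.cpp`): the `Node` class, `collectSummands` and `canonicalize` are hypothetical names for a toy model. It only handles addition (no `SubL`/`SubI`, casts, conversions or shifts), and it compares printed structure where C2 would compare node identity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class InvarCanonicalizer {
    // Toy stand-in for a C2 node (hypothetical): a leaf with an idx, or an addition.
    static final class Node {
        final int idx;       // only meaningful for leaves in this sketch
        final String name;   // null for additions
        final Node left;     // null for leaves
        final Node right;    // null for leaves

        Node(int idx, String name) {
            this.idx = idx; this.name = name; this.left = null; this.right = null;
        }

        Node(Node left, Node right) {
            this.idx = -1; this.name = null; this.left = left; this.right = right;
        }

        boolean isAdd() { return left != null; }

        @Override
        public String toString() {
            return isAdd() ? "(" + left + " + " + right + ")" : name;
        }
    }

    static Node add(Node l, Node r) { return new Node(l, r); }

    // Step 1: walk through the additions and collect the summands into a list.
    static void collectSummands(Node n, List<Node> out) {
        if (n.isAdd()) {
            collectSummands(n.left, out);
            collectSummands(n.right, out);
        } else {
            out.add(n);
        }
    }

    // Steps 2 and 3: sort the summands by idx, then add them up in that order.
    static Node canonicalize(Node invar) {
        List<Node> summands = new ArrayList<>();
        collectSummands(invar, summands);
        summands.sort(Comparator.comparingInt(s -> s.idx));
        Node sum = summands.get(0);
        for (int i = 1; i < summands.size(); i++) {
            sum = add(sum, summands.get(i));
        }
        return sum;
    }

    public static void main(String[] args) {
        Node a = new Node(1, "a"), b = new Node(2, "b");
        Node c = new Node(3, "c"), d = new Node(4, "d");
        Node invar1 = add(add(add(b, c), d), a); // b + c + d + a
        Node invar2 = add(add(add(d, b), a), c); // d + b + a + c
        System.out.println("before: " + invar1 + " vs " + invar2);
        System.out.println("after:  " + canonicalize(invar1) + " vs " + canonicalize(invar2));
    }
}
```

In this toy model both inputs rebuild to the same left-leaning sum over `a`, `b`, `c`, `d`, which mirrors the claim in the description that two `invar`s with the same summands end up as the same node after canonicalization.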
src/hotspot/share/opto/vectorization.cpp line 101:
> 99: tty->print(" lpt->_head %d", _cl->_idx); _cl->dump();
> 100: _lpt->dump_head();
> 101: _cl->dump_bfs(100, _cl_exit, "c-");
Note: Simply makes it easier to see what kind of CFG is between the loop head and end. Is it a `RangeCheck`, or some `If`?
src/hotspot/share/opto/vectorization.cpp line 503:
> 501: #ifdef ASSERT
> 502: // We are changing the invar, and the debug info may no longer be accurate.
> 503: if (new_invar != _invar) { _debug_invar = NodeSentinel; }
Note: Roland inserted this `_debug_invar` verification code a year ago. Setting it to `NodeSentinel` basically just disables the verification; he already does that elsewhere.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18795#discussion_r1568819915
PR Review Comment: https://git.openjdk.org/jdk/pull/18795#discussion_r1568821470