RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v13]
Christian Hagedorn
chagedorn at openjdk.org
Thu Jan 16 15:48:03 UTC 2025
On Thu, 16 Jan 2025 08:47:07 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> **This is a required step towards adding runtime-checks for Aliasing Analysis, especially Important for FFM / MemorySegments.**
>> See: https://eme64.github.io/blog/2025/01/01/AutoVectorization-Status.html
>>
>> I know this one is large, but it consists of a lot of renamings, and new tests. On the whole, the new `VPointer` code is less than the old!
>>
>> **Goal**
>>
>> Replace old `VPointer` with a new version that relies on `MemPointer` - which then is a shared utility for both `MergeStores` and `SuperWord / AutoVectorization`. `MemPointer` generally parses pointers, and `VPointer` specializes this facility for the use in loops (`VLoop`).
>>
>> The old `VPointer` implementation with its recursive pattern matching was quite complicated and difficult to reason about for correctness. The approach in `MemPointer` is much simpler: iteratively decomposing sub-expressions. Further: the new implementation is more powerful at detecting equivalent invariants.
>>
>> **Future**: with the `MemPointer` implementation of `VPointer`, it should be easier to implement speculative runtime-checks for Aliasing-Analysis [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). The pressing need for this has come from the FFM / MemorySegment folks, like @mcimadamore and @minborg .
>>
>> **Details**
>>
>> This looks like a rather big patch, so let me explain the parts.
>> - Refactor of `MemPointer` in `mepointer.hpp/cpp`:
>> - Added concept of `Base` to `MemPointer`. This is required for the aliasing computation in `VPointer`.
>> - `sub_expression_has_native_base_candidate`: add special case to parse through `CastX2P` if we find a native memory base `MemorySegment.address()`, i.e. `jdk.internal.foreign.NativeMemorySegmentImpl.min`. This helps some native memory segment cases to vectorize that did not before.
>> - So far `MemPointer` could only answer adjacency queries. But VPointer also needs overlap queries, see the old `VPointer::not_equal` (i.e. can we prove that the two `VPointer` never overlap?). So I had to add a new case to aliasing computation: `NotOrAtDistance`. It is useful to answer the new and better named `MemPointer::never_overlaps_with`.
>> - Collapsed together `MemPointerDecomposedForm` and `MemPointer`. It was an unnecessary and unhelpful split.
>> - Re-write of `VPointer` based on `MemPointer`:
>> - Old pattern:
>> - `VPointer[mem: 847 StoreI, base: 37, adr: 37, base[ 37] + offset( 16) + invar( 0) + scale( 4) * iv]`
>> - `VPointer[mem: 31...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
> refactor to has_invar_summands and print invar and fix non-product guards
Thanks for addressing my comments! I did a pass on your updates only and left some more comments but mostly minor. Will continue tomorrow with the other files.
src/hotspot/share/opto/mempointer.cpp line 596:
> 594: // p2 = MemPointer[size=4, base2 + 4L * i2 + 20]
> 595: // -> Have differing summands, distance is unknown
> 596: // -> Unknown if overlap at runtime -> return false
Nice examples! That helps to get a quick grasp of these essential checks.
src/hotspot/share/opto/mempointer.hpp line 573:
> 571: // Where SUM() adds all "scale_i * variable_i" for each i together.
> 572: //
> 573: // Node: if the base is known, then it is in the 0th summand. A base can be:
Note?
Suggestion:
// Note: if the base is known, then it is in the 0th summand. A base can be:
src/hotspot/share/opto/vectorization.hpp line 701:
> 699: };
> 700:
> 701: // Reminder: MemPointer have the form:
I like this improved description! 👍
src/hotspot/share/opto/vectorization.hpp line 714:
> 712: // pointer = base + invar + iv_scale * iv + con
> 713: //
> 714: // invar = SUM(invar_summands)
Would it make sense to add `iv_summand` here as well since you mention it below? I.e. something like:
// pointer = base + invar + iv_summand + con
// with
// invar = SUM(invar_summands)
// iv_summand = iv_scale * iv
src/hotspot/share/opto/vectorization.hpp line 723:
> 721: // corresponding scale. If there is no such summand, then we know that the
> 722: // pointer does not depend on the iv, since otherwise there would have to be
> 723: // a summand where its variable is main-loop variant.
Could we find multiple summands with iv?
src/hotspot/share/opto/vectorization.hpp line 728:
> 726: // to memory align a pointer using the pre-loop limit.
> 727: //
> 728: // A VPointer can be marked "invalid", if some of these conditions are not met, or
You mention here "some of these conditions" which one could guess that you mean that it does not conform to this form. But further down you present the actual cases where a VPointer becomes invalid. Should we maybe first add these invalid cases and only then follow up with your explanation what this means for the aliasing queries? (just swap the two sections)
src/hotspot/share/opto/vectorization.hpp line 733:
> 731: // optimize in these cases. For example:
> 732: // - is_adjacent_to_and_before: returning true would allow optimizations such as
> 733: // packing into vectors. So for "invalid" VPointers
Suggestion:
// packing into vectors. So for "invalid" VPointers,
src/hotspot/share/opto/vectorization.hpp line 736:
> 734: // we always return false (i.e. unknown).
> 735: // - never_overlaps_with: returning true would allow optimizations such as
> 736: // swapping the order of memops. So for "invalid" VPointers
Suggestion:
// swapping the order of memops. So for "invalid" VPointers,
src/hotspot/share/opto/vectorization.hpp line 926:
> 924:
> 925: // VPointer needs to know if it is native (off-heap) or object (on-heap).
> 926: // We may for example have failed to fully decompose the MemPointer, possibly
Suggestion:
// We may, for example, have failed to fully decompose the MemPointer, possibly
src/hotspot/share/opto/vectorization.hpp line 939:
> 937: }
> 938:
> 939: // All summands, except the iv-summand must be pre-loop invariant. This is necessary
Suggestion:
// All summands, except the iv-summand, must be pre-loop invariant. This is necessary
-------------
PR Comment: https://git.openjdk.org/jdk/pull/21926#issuecomment-2596068724
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1918774734
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1918770143
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1918736648
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1918717753
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1918718444
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1918722945
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1918723702
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1918723989
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1918776625
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1918777506
More information about the hotspot-compiler-dev
mailing list