RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v6]
Christian Hagedorn
chagedorn at openjdk.org
Wed Jan 15 13:08:57 UTC 2025
On Wed, 15 Jan 2025 06:43:23 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> **This is a required step towards adding runtime-checks for Aliasing Analysis, especially Important for FFM / MemorySegments.**
>> See: https://eme64.github.io/blog/2025/01/01/AutoVectorization-Status.html
>>
>> I know this one is large, but it consists of a lot of renamings, and new tests. On the whole, the new `VPointer` code is less than the old!
>>
>> **Goal**
>>
>> Replace old `VPointer` with a new version that relies on `MemPointer` - which then is a shared utility for both `MergeStores` and `SuperWord / AutoVectorization`. `MemPointer` generally parses pointers, and `VPointer` specializes this facility for the use in loops (`VLoop`).
>>
>> The old `VPointer` implementation with its recursive pattern matching was quite complicated and difficult to reason about for correctness. The approach in `MemPointer` is much simpler: iteratively decomposing sub-expressions. Further: the new implementation is more powerful at detecting equivalent invariants.
>>
>> **Future**: with the `MemPointer` implementation of `VPointer`, it should be easier to implement speculative runtime-checks for Aliasing-Analysis [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). The pressing need for this has come from the FFM / MemorySegment folks, like @mcimadamore and @minborg .
>>
>> **Details**
>>
>> This looks like a rather big patch, so let me explain the parts.
>> - Refactor of `MemPointer` in `mepointer.hpp/cpp`:
>> - Added concept of `Base` to `MemPointer`. This is required for the aliasing computation in `VPointer`.
>> - `sub_expression_has_native_base_candidate`: add special case to parse through `CastX2P` if we find a native memory base `MemorySegment.address()`, i.e. `jdk.internal.foreign.NativeMemorySegmentImpl.min`. This helps some native memory segment cases to vectorize that did not before.
>> - So far `MemPointer` could only answer adjacency queries. But VPointer also needs overlap queries, see the old `VPointer::not_equal` (i.e. can we prove that the two `VPointer` never overlap?). So I had to add a new case to aliasing computation: `NotOrAtDistance`. It is useful to answer the new and better named `MemPointer::never_overlaps_with`.
>> - Collapsed together `MemPointerDecomposedForm` and `MemPointer`. It was an unnecessary and unhelpful split.
>> - Re-write of `VPointer` based on `MemPointer`:
>> - Old pattern:
>> - `VPointer[mem: 847 StoreI, base: 37, adr: 37, base[ 37] + offset( 16) + invar( 0) + scale( 4) * iv]`
>> - `VPointer[mem: 31...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
> More fixes for vnkozlov
Nice clean-up! It's somewhat difficult to review, so bear with me :-)
I did a pass on `vectorization.hpp` and left some comments (and one stray one in `mempointer.hpp` but I have not looked at that file in more detail).
I will continue to review this later.
src/hotspot/share/opto/mempointer.hpp line 576:
> 574: static const int SUMMANDS_SIZE = 10;
> 575:
> 576: class Base : public StackObj {
Could add the comment from above here as well as a description for the sub class:
// A base can be:
// - Known:
// - On-heap: Object
// - Off-heap: Native
// - Unknown
src/hotspot/share/opto/vectorization.hpp line 89:
> 87:
> 88: NOT_PRODUCT(VTrace _vtrace;)
> 89: NOT_PRODUCT(TraceMemPointer _mptrace; )
Suggestion:
NOT_PRODUCT(TraceMemPointer _mptrace;)
src/hotspot/share/opto/vectorization.hpp line 190:
> 188: // Some nodes must be pre-loop invariant, so that they can be used for conditions
> 189: // before or inside the pre-loop. For example, alignment of main-loop vector
> 190: // memops must be acheived in the pre-loop, via the exit check in the pre-loop.
Suggestion:
// memops must be achieved in the pre-loop, via the exit check in the pre-loop.
src/hotspot/share/opto/vectorization.hpp line 701:
> 699: };
> 700:
> 701: // VPointer wraps the MemPointer to the use in a loop:
Suggestion:
// VPointer wraps the MemPointer for the use in a loop:
src/hotspot/share/opto/vectorization.hpp line 722:
> 720: // corresponding scale. If there is no such summand, then we know that the
> 721: // pointer does not depend on the iv, since otherwise there would have to be
> 722: // a summand where its variable it main-loop variant.
Suggestion:
// a summand where its variable is main-loop variant.
src/hotspot/share/opto/vectorization.hpp line 732:
> 730: const jint _iv_scale;
> 731:
> 732: const bool _is_valid;
Can you add a comment here what it means when a `VPointer` is invalid and how this is possible to end up with an invalid `VPointer`?
src/hotspot/share/opto/vectorization.hpp line 806:
> 804: const MemPointer& mem_pointer() const { assert(_is_valid, ""); return _mem_pointer; }
> 805: jint size() const { assert(_is_valid, ""); return mem_pointer().size(); }
> 806: jint iv_scale() const { assert(_is_valid, ""); return _iv_scale; }
For good measure, I suggest to add an assert message:
Suggestion:
const MemPointer& mem_pointer() const { assert(_is_valid, "must be valid"); return _mem_pointer; }
jint size() const { assert(_is_valid, "must be valid"); return mem_pointer().size(); }
jint iv_scale() const { assert(_is_valid, "must be valid"); return _iv_scale; }
src/hotspot/share/opto/vectorization.hpp line 810:
> 808:
> 809: template<typename Callback>
> 810: void for_each_invar_summand(Callback callback) const {
Even though you talk about `invar_summand` in the descriptive comments, I guess you could be explicit here since it's an API method:
Suggestion:
void for_each_invariant_summand(Callback callback) const {
src/hotspot/share/opto/vectorization.hpp line 822:
> 820: // Greatest common factor among the scales of the invar_summands.
> 821: // Out of simplicity, we only factor out positive powers-of-2,
> 822: // between 1 and ObjectAlignmentInBytes. If the invar is empty,
This include 1 and `ObjectAlignmentInBytes`, right? Between X and Y is often ambiguous. Maybe you want to rephrase this to: "between (inclusive) 1 and ObjectAlignmentInBytes"?
src/hotspot/share/opto/vectorization.hpp line 836:
> 834: }
> 835:
> 836: int count_invar_summands() const {
Same here, you could be explicit:
Suggestion:
int count_invariant_summands() const {
src/hotspot/share/opto/vectorization.hpp line 844:
> 842: }
> 843:
> 844: bool has_same_invar_and_iv_scale_as(const VPointer& other) const {
Okay, I see that we should probably use either "invariant" or "invar" consistently in method names. I guess it's still fine for temporary variables to use "invar". But when reading code it could be more clear when using "invariant". But I leave it up to you to decide - having a better big picture if this is improving the readability or not.
src/hotspot/share/opto/vectorization.hpp line 846:
> 844: bool has_same_invar_and_iv_scale_as(const VPointer& other) const {
> 845: // If we have the same invar_summands, and the same iv summand with the same iv_scale,
> 846: // then all summands except the base must be the same.
Comment could be moved up as a method comment.
src/hotspot/share/opto/vectorization.hpp line 850:
> 848: }
> 849:
> 850: bool is_adjacent_to_and_before(const VPointer& other) const {
Maybe you can add an example as a method comment to illustrate this property with two `VPointers` (could also be added at `MemPointer::is_adjacent_to_and_before()` or only there).
src/hotspot/share/opto/vectorization.hpp line 862:
> 860: }
> 861:
> 862: bool never_overlaps_with(const VPointer& other) const {
Same here, a simple visual example might be helpful.
src/hotspot/share/opto/vectorization.hpp line 890:
> 888:
> 889: // Check that all variables are either the iv, or else invariants.
> 890: bool init_is_valid() const {
You could extract the code to three methods to reduce the size of this method and make it easier to see what's going on:
bool init_is_valid() const {
return is_base_known() &&
are_non_iv_summands_pre_loop_invariant() &&
scale_and_stride_not_too_large()
}
src/hotspot/share/opto/vectorization.hpp line 921:
> 919: }
> 920:
> 921: // In the pointer analysis, and especially the AlignVector, analysis we assume that
Suggestion:
// In the pointer analysis, and especially the AlignVector analysis, we assume that
src/hotspot/share/opto/vectorization.hpp line 949:
> 947:
> 948: // Vector element size statistics for loop vectorization with vector masks
> 949: class VectorElementSizeStats {
Just noticed this when scrolling through the class in my IDE: All methods of this class could be made `const`. You could probably squeeze this in here as well as part of this bigger refactoring - probably not worth a separate RFE.
src/hotspot/share/opto/vectorization.hpp line 1167:
> 1165: const VPointer& p1 = s1->vpointer();
> 1166: const VPointer& p2 = s2->vpointer();
> 1167: bool both_no_invar = p1.count_invar_summands() == 0 &&
Since you are using `pointer.count_invar_summands()` multiple times in this class, you could provide a `ConstrainedAlignmentSolution::count_invar_summands()` method directly instead of first fetching the `VPointer` from the solution.
src/hotspot/share/opto/vectorization.hpp line 1212:
> 1210: tty->print("m * q(%d) + r(%d)", _q, _r);
> 1211: if (_vpointer.count_invar_summands() > 0) {
> 1212: tty->print(" - invar / (iv_scale(%d) * pre_stride)", _vpointer.iv_scale());
Would it make sense to print all invariant summands here as well?
-------------
PR Review: https://git.openjdk.org/jdk/pull/21926#pullrequestreview-2551900437
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916422040
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916157917
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916159970
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916398753
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916400672
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916445680
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916449698
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916456227
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916479629
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916482950
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916499407
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916521656
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916536664
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916540476
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916564214
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916555622
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916579807
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916610507
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916601819
More information about the hotspot-compiler-dev
mailing list