RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v6]

Christian Hagedorn chagedorn at openjdk.org
Wed Jan 15 13:08:57 UTC 2025


On Wed, 15 Jan 2025 06:43:23 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> **This is a required step towards adding runtime-checks for Aliasing Analysis, especially Important for FFM / MemorySegments.**
>> See: https://eme64.github.io/blog/2025/01/01/AutoVectorization-Status.html
>> 
>> I know this one is large, but it consists of a lot of renamings, and new tests. On the whole, the new `VPointer` code is less than the old!
>> 
>> **Goal**
>> 
>> Replace old `VPointer` with a new version that relies on `MemPointer` - which then is a shared utility for both `MergeStores` and `SuperWord / AutoVectorization`. `MemPointer` generally parses pointers, and `VPointer` specializes this facility for the use in loops (`VLoop`).
>> 
>> The old `VPointer` implementation with its recursive pattern matching was quite complicated and difficult to reason about for correctness. The approach in `MemPointer` is much simpler: iteratively decomposing sub-expressions. Further: the new implementation is more powerful at detecting equivalent invariants.
>> 
>> **Future**:  with the `MemPointer` implementation of `VPointer`, it should be easier to implement speculative runtime-checks for Aliasing-Analysis [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). The pressing need for this has come from the FFM / MemorySegment folks, like @mcimadamore and @minborg .
>> 
>> **Details**
>> 
>> This looks like a rather big patch, so let me explain the parts.
>> - Refactor of `MemPointer` in `mepointer.hpp/cpp`:
>>   - Added concept of `Base` to `MemPointer`. This is required for the aliasing computation in `VPointer`.
>>   - `sub_expression_has_native_base_candidate`: add special case to parse through `CastX2P` if we find a native memory base `MemorySegment.address()`, i.e. `jdk.internal.foreign.NativeMemorySegmentImpl.min`. This helps some native memory segment cases to vectorize that did not before.
>>   - So far `MemPointer` could only answer adjacency queries. But VPointer also needs overlap queries, see the old `VPointer::not_equal` (i.e. can we prove that the two `VPointer` never overlap?). So I had to add a new case to aliasing computation: `NotOrAtDistance`. It is useful to answer the new and better named `MemPointer::never_overlaps_with`.
>>   - Collapsed together `MemPointerDecomposedForm` and `MemPointer`. It was an unnecessary and unhelpful split.
>> - Re-write of `VPointer` based on `MemPointer`:
>>   - Old pattern:
>>     - `VPointer[mem:  847     StoreI, base:   37, adr:   37,  base[  37] + offset(  16) + invar(   0) + scale(   4) * iv]`
>>     - `VPointer[mem: 31...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   More fixes for vnkozlov

Nice clean-up! It's somewhat difficult to review, so bear with me :-)

I did a pass on `vectorization.hpp` and left some comments (and one stray one in `mempointer.hpp` but I have not looked at that file in more detail). 

I will continue to review this later.

src/hotspot/share/opto/mempointer.hpp line 576:

> 574:   static const int SUMMANDS_SIZE = 10;
> 575: 
> 576:   class Base : public StackObj {

Could add the comment from above here as well as a description for the sub class:

// A base can be:
// - Known:
//   - On-heap: Object
//   - Off-heap: Native
// - Unknown

src/hotspot/share/opto/vectorization.hpp line 89:

> 87: 
> 88:   NOT_PRODUCT(VTrace _vtrace;)
> 89:   NOT_PRODUCT(TraceMemPointer _mptrace; )

Suggestion:

  NOT_PRODUCT(TraceMemPointer _mptrace;)

src/hotspot/share/opto/vectorization.hpp line 190:

> 188:   // Some nodes must be pre-loop invariant, so that they can be used for conditions
> 189:   // before or inside the pre-loop. For example, alignment of main-loop vector
> 190:   // memops must be acheived in the pre-loop, via the exit check in the pre-loop.

Suggestion:

  // memops must be achieved in the pre-loop, via the exit check in the pre-loop.

src/hotspot/share/opto/vectorization.hpp line 701:

> 699: };
> 700: 
> 701: // VPointer wraps the MemPointer to the use in a loop:

Suggestion:

// VPointer wraps the MemPointer for the use in a loop:

src/hotspot/share/opto/vectorization.hpp line 722:

> 720: //       corresponding scale. If there is no such summand, then we know that the
> 721: //       pointer does not depend on the iv, since otherwise there would have to be
> 722: //       a summand where its variable it main-loop variant.

Suggestion:

//       a summand where its variable is main-loop variant.

src/hotspot/share/opto/vectorization.hpp line 732:

> 730:   const jint  _iv_scale;
> 731: 
> 732:   const bool _is_valid;

Can you add a comment here what it means when a `VPointer` is invalid and how this is possible to end up with an invalid `VPointer`?

src/hotspot/share/opto/vectorization.hpp line 806:

> 804:   const MemPointer& mem_pointer() const { assert(_is_valid, ""); return _mem_pointer; }
> 805:   jint size()                     const { assert(_is_valid, ""); return mem_pointer().size(); }
> 806:   jint iv_scale()                 const { assert(_is_valid, ""); return _iv_scale; }

For good measure, I suggest to add an assert message:
Suggestion:

  const MemPointer& mem_pointer() const { assert(_is_valid, "must be valid"); return _mem_pointer; }
  jint size()                     const { assert(_is_valid, "must be valid"); return mem_pointer().size(); }
  jint iv_scale()                 const { assert(_is_valid, "must be valid"); return _iv_scale; }

src/hotspot/share/opto/vectorization.hpp line 810:

> 808: 
> 809:   template<typename Callback>
> 810:   void for_each_invar_summand(Callback callback) const {

Even though you talk about `invar_summand` in the descriptive comments, I guess you could be explicit here since it's an API method:
Suggestion:

  void for_each_invariant_summand(Callback callback) const {

src/hotspot/share/opto/vectorization.hpp line 822:

> 820:   // Greatest common factor among the scales of the invar_summands.
> 821:   // Out of simplicity, we only factor out positive powers-of-2,
> 822:   // between 1 and ObjectAlignmentInBytes. If the invar is empty,

This include 1 and `ObjectAlignmentInBytes`, right? Between X and Y is often ambiguous. Maybe you want to rephrase this to: "between (inclusive) 1 and ObjectAlignmentInBytes"?

src/hotspot/share/opto/vectorization.hpp line 836:

> 834:   }
> 835: 
> 836:   int count_invar_summands() const {

Same here, you could be explicit:
Suggestion:

  int count_invariant_summands() const {

src/hotspot/share/opto/vectorization.hpp line 844:

> 842:   }
> 843: 
> 844:   bool has_same_invar_and_iv_scale_as(const VPointer& other) const {

Okay, I see that we should probably use either "invariant" or "invar" consistently in method names. I guess it's still fine for temporary variables to use "invar". But when reading code it could be more clear when using "invariant". But I leave it up to you to decide - having a better big picture if this is improving the readability or not.

src/hotspot/share/opto/vectorization.hpp line 846:

> 844:   bool has_same_invar_and_iv_scale_as(const VPointer& other) const {
> 845:     // If we have the same invar_summands, and the same iv summand with the same iv_scale,
> 846:     // then all summands except the base must be the same.

Comment could be moved up as a method comment.

src/hotspot/share/opto/vectorization.hpp line 850:

> 848:   }
> 849: 
> 850:   bool is_adjacent_to_and_before(const VPointer& other) const {

Maybe you can add an example as a method comment to illustrate this property with two `VPointers` (could also be added at `MemPointer::is_adjacent_to_and_before()` or only there).

src/hotspot/share/opto/vectorization.hpp line 862:

> 860:   }
> 861: 
> 862:   bool never_overlaps_with(const VPointer& other) const {

Same here, a simple visual example might be helpful.

src/hotspot/share/opto/vectorization.hpp line 890:

> 888: 
> 889:   // Check that all variables are either the iv, or else invariants.
> 890:   bool init_is_valid() const {

You could extract the code to three methods to reduce the size of this method and make it easier to see what's going on:

bool init_is_valid() const {
  return is_base_known() && 
         are_non_iv_summands_pre_loop_invariant() && 
         scale_and_stride_not_too_large() 
}

src/hotspot/share/opto/vectorization.hpp line 921:

> 919:     }
> 920: 
> 921:     // In the pointer analysis, and especially the AlignVector, analysis we assume that

Suggestion:

    // In the pointer analysis, and especially the AlignVector analysis, we assume that

src/hotspot/share/opto/vectorization.hpp line 949:

> 947: 
> 948: // Vector element size statistics for loop vectorization with vector masks
> 949: class VectorElementSizeStats {

Just noticed this when scrolling through the class in my IDE: All methods of this class could be made `const`. You could probably squeeze this in here as well as part of this bigger refactoring - probably not worth a separate RFE.

src/hotspot/share/opto/vectorization.hpp line 1167:

> 1165:     const VPointer& p1 = s1->vpointer();
> 1166:     const VPointer& p2 = s2->vpointer();
> 1167:     bool both_no_invar = p1.count_invar_summands() == 0 &&

Since you are using `pointer.count_invar_summands()` multiple times in this class, you could provide a `ConstrainedAlignmentSolution::count_invar_summands()` method directly instead of first fetching the `VPointer` from the solution.

src/hotspot/share/opto/vectorization.hpp line 1212:

> 1210:     tty->print("m * q(%d) + r(%d)", _q, _r);
> 1211:     if (_vpointer.count_invar_summands() > 0) {
> 1212:       tty->print(" - invar / (iv_scale(%d) * pre_stride)", _vpointer.iv_scale());

Would it make sense to print all invariant summands here as well?

-------------

PR Review: https://git.openjdk.org/jdk/pull/21926#pullrequestreview-2551900437
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916422040
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916157917
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916159970
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916398753
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916400672
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916445680
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916449698
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916456227
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916479629
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916482950
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916499407
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916521656
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916536664
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916540476
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916564214
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916555622
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916579807
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916610507
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1916601819


More information about the hotspot-compiler-dev mailing list