RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer

Emanuel Peter epeter at openjdk.org
Mon Dec 2 13:42:06 UTC 2024


On Wed, 6 Nov 2024 13:07:13 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> I know this one is large, but it consists of a lot of renamings, and new tests. On the whole, the new `VPointer` code is less than the old!
> 
> **Goal**
> 
> Replace old `VPointer` with a new version that relies on `MemPointer` - which then is a shared utility for both `MergeStores` and `SuperWord / AutoVectorization`. `MemPointer` generally parses pointers, and `VPointer` specializes this facility for the use in loops (`VLoop`).
> 
> The old `VPointer` implementation with its recursive pattern matching was quite complicated and difficult to reason about for correctness. The approach in `MemPointer` is much simpler: iteratively decomposing sub-expressions. Further: the new implementation is more powerful at detecting equivalent invariants.
> 
> **Future**:  with the `MemPointer` implementation of `VPointer`, it should be easier to implement speculative runtime-checks for Aliasing-Analysis [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). The pressing need for this has come from the FFM / MemorySegment folks, like @mcimadamore and @minborg .
> 
> **Details**
> 
> This looks like a rather big patch, so let me explain the parts.
> - Refactor of `MemPointer` in `mepointer.hpp/cpp`:
>   - Added concept of `Base` to `MemPointer`. This is required for the aliasing computation in `VPointer`.
>   - `sub_expression_has_native_base_candidate`: add special case to parse through `CastX2P` if we find a native memory base `MemorySegment.address()`, i.e. `jdk.internal.foreign.NativeMemorySegmentImpl.min`. This helps some native memory segment cases to vectorize that did not before.
>   - So far `MemPointer` could only answer adjacency queries. But VPointer also needs overlap queries, see the old `VPointer::not_equal` (i.e. can we prove that the two `VPointer` never overlap?). So I had to add a new case to aliasing computation: `NotOrAtDistance`. It is useful to answer the new and better named `MemPointer::never_overlaps_with`.
>   - Collapsed together `MemPointerDecomposedForm` and `MemPointer`. It was an unnecessary and unhelpful split.
> - Re-write of `VPointer` based on `MemPointer`:
>   - Old pattern:
>     - `VPointer[mem:  847     StoreI, base:   37, adr:   37,  base[  37] + offset(  16) + invar(   0) + scale(   4) * iv]`
>     - `VPointer[mem: 3189      LoadB, base:    1, adr: 2273,  base[   1] + offset(   0) + invar(   0) + scale(   1) * iv]` -> `adr = CastX2P`, the addition of `11 Param`, `1819 Phi` and `629 LoadL`.
>   - New pattern:
>     - `VPointer[size:  4, object, base(110 CastPP) + con( 16)...

src/hotspot/share/opto/mempointer.cpp line 188:

> 186:         }
> 187:         // Fall-through: we can find a more precise native-memory "base". We further decompose
> 188:         // the CastX2P to find this "base" and any other offsets from it.

Note: special handling for new `Base` detection, required for `VPointer`.

src/hotspot/share/opto/mempointer.cpp line 206:

> 204:           callback.callback(n);
> 205:           return;
> 206:         }

Note: indentation fix

src/hotspot/share/opto/mempointer.cpp line 458:

> 456:     // (S2) holds. If all summands are the same, also the base must be the same.
> 457:     has_same_base = true;
> 458:   } else {

Note: refactored into `has_same_summands_as` (-> `make_always_at_distance`), and added new case `has_different_object_base_but_otherwise_same_summands_as` (-> `make_not_or_at_distance`). This is necessary for the overlap query needed in `VPointer`.

src/hotspot/share/opto/mempointer.cpp line 541:

> 539: bool MemPointer::is_adjacent_to_and_before(const MemPointer& other) const {
> 540:   const MemPointerAliasing aliasing = get_aliasing_with(other NOT_PRODUCT( COMMA _trace ));
> 541:   const bool is_adjacent = aliasing.is_always_at_distance(_size);

Note: collapsed `MemPointerDecomposedForm` and `MemPointer` into one.

src/hotspot/share/opto/mempointer.cpp line 582:

> 580:   return is_never_overlap;
> 581: }
> 582: 

Note: replaces the old `VPointer::not_equal`, which was used for overlap queries.

src/hotspot/share/opto/mempointer.hpp line 174:

> 172: //   and determine if they are adjacent.
> 173: //
> 174: // MemPointerDecomposedFormParser:

Note: moved down

src/hotspot/share/opto/mempointer.hpp line 209:

> 207: //
> 208: //   then we can easily compute the distance between the pointers (distance = con2 - con1),
> 209: //   and determine if they are adjacent.

Note: moved down from higher up.

src/hotspot/share/opto/mempointer.hpp line 244:

> 242: //      (2) LoadL from field jdk.internal.foreign.NativeMemorySegmentImpl.min
> 243: //          This would be preferrable over CastX2P, because it holds the address() of a native
> 244: //          MemorySegment, i.e. we know it points to the beginning of that MemorySegment.

Note: description about new `Base` detection needed in `VPointer`.

src/hotspot/share/opto/mempointer.hpp line 430:

> 428:                       //            e.g. "array1[i] vs array2[i+4]":
> 429:                       //                 if "array1 == array2": distance = 4.
> 430:                       //                 if "array1 != array2": different memory objects.

Note: added `NotOrAtDistance`, needed for overlap query.

src/hotspot/share/opto/mempointer.hpp line 521:

> 519:     if (cmp != 0) { return cmp; }
> 520: 
> 521:     return NoOverflowInt::cmp(p1.scale(), p2.scale());

Note: `cmp` is used for sorting.

src/hotspot/share/opto/mempointer.hpp line 559:

> 557:   // Singleton for default arguments.
> 558:   static MemPointerParserCallback& empty() { return _empty; }
> 559: };

Note: used for `VPointer`. In `SuperWord::unrolling_analysis` to track all the "ignore" nodes, which are the internal nodes in the address expressions.

src/hotspot/share/opto/mempointer.hpp line 694:

> 692:       _summands[i] = old.summands_at(i);
> 693:     }
> 694:   }

Note: mutated copy constructor: used for `make_with_size` and `make_with_con`, i.e. modifying the size (scalar -> vector), and modifying the `con`, so that we can shift a `VPointer` and simulate the `VPointer` in a next iteration.

src/hotspot/share/opto/mempointer.hpp line 734:

> 732:   bool has_same_summands_as(const MemPointer& other, uint start) const;
> 733:   bool has_same_summands_as(const MemPointer& other) const { return has_same_summands_as(other, 0); }
> 734:   bool has_different_object_base_but_otherwise_same_summands_as(const MemPointer& other) const;

Note: refactored from `get_aliasing_with` and adding new case `has_different_object_base_but_otherwise_same_summands_as` for overlap query.

src/hotspot/share/opto/mempointer.hpp line 776:

> 774: 
> 775:   bool is_adjacent_to_and_before(const MemPointer& other) const;
> 776:   bool never_overlaps_with(const MemPointer& other) const;

Note: these are the two basic queries: `adjacent` and `overlap`.

src/hotspot/share/opto/noOverflowInt.hpp line 113:

> 111:     if (a.value() > b.value()) { return  1; }
> 112:     return 0;
> 113:   }

Note: needed to compare `MemPointerSummand` (when variable equal), used by `MemPointer` compare, used to sort `VPointer`s.

src/hotspot/share/opto/superword.cpp line 97:

> 95:   }
> 96: };
> 97: 

Note: refactoring: instead of passing in the old `ignored_loop_nodes` to the `VPointer` during parsing, we pass in a `callback`, and track the nodes this way.

src/hotspot/share/opto/superword.cpp line 2880:

> 2878:     // The base is only aligned with ObjectAlignmentInBytes with arrays.
> 2879:     // When the base() is top, we have no alignment guarantee at all.
> 2880:     // Hence, we must now take the base into account for the calculation.

Note: `align_to_ref_p.base()->is_top()` meant it was a native memory access, i.e. off-heap, and not related to any java object.

src/hotspot/share/opto/superword.hpp line 587:

> 585:     static int cmp_by_group(MemOp* a, MemOp* b);
> 586:     static int cmp_by_group_and_con_and_original_index(MemOp* a, MemOp* b);
> 587:   };

Note: we used to put `VPointer` directly in a `GrowableArray` to sort them. Now that the `VPointer` does not have a reference to the `MemNode*`, we have to create a "tuple" of `(VPointer, mem)`. The `original_index` is needed to keep the sort stable in `qsort`. Before this was done with the `mem->_idx`, which is a bit less well behaved.

src/hotspot/share/opto/superwordVTransformBuilder.cpp line 148:

> 146:     const VPointer& scalar_p = _vloop_analyzer.vpointers().vpointer(p0->as_Store());
> 147:     const VPointer vector_p(scalar_p.make_with_size(scalar_p.size() * pack_size));
> 148:     vtn = new (_vtransform.arena()) VTransformStoreVectorNode(_vtransform, pack_size, vector_p);

Note: we now pass in the `VPointer` of the size of the vector directly. Before, we referenced the `VPointer`s of the packed nodes. But that reliance will be in the way in the future, and it is more clean to carry the modified `VPointer` like this.

src/hotspot/share/opto/traceAutoVectorizationTag.hpp line 35:

> 33:   flags(POINTER_ALIASING,     "Trace VPointer/MemPointer aliasing") \
> 34:   flags(POINTER_ADJACENCY,    "Trace VPointer/MemPointer adjacency") \
> 35:   flags(POINTER_OVERLAP,      "Trace VPointer/MemPointer overlap") \

Note: naming is now more consistent with `TraceMergeStore`.

src/hotspot/share/opto/traceMergeStoresTag.hpp line 36:

> 34:     flags(POINTER_PARSING,      "Trace pointer IR") \
> 35:     flags(POINTER_ALIASING,     "Trace MemPointerSimpleForm::get_aliasing_with") \
> 36:     flags(POINTER_ADJACENCY,    "Trace adjacency") \

Note: naming is now more consistent with `TraceAutoVectorization`.

src/hotspot/share/opto/vectorization.hpp line 207:

> 205:     // Is it before the pre-loop?
> 206:     return phase()->is_dominator(ctrl, pre_loop_head());
> 207:   }

Note: replaces the old `VPointer::invariant`

src/hotspot/share/opto/vectorization.hpp line 737:

> 735:   Node_Stack* _nstack;       // stack used to record a vpointer trace of variants
> 736:   bool        _analyze_only; // Used in loop unrolling only for vpointer trace
> 737:   uint        _stack_idx;    // Used in loop unrolling only for vpointer trace

Note: this is now replaced with the `MemPointerParserCallback`.

src/hotspot/share/opto/vectorization.hpp line 836:

> 834:       jlong max_diff = (jlong)1 << 31;
> 835:       if (difference >= max_diff) {
> 836:         return NotComparable;

Note: this used to be the basis of `adjacency` and `offset` queries. Now we do this in `MemPointer`, and with better naming ;)

src/hotspot/share/opto/vectorization.hpp line 854:

> 852:       // the two memory regions.
> 853:       if (!not_equal(p_mem)) {
> 854:         return true;

Note: refactored away. We used to pass in all nodes in a pack/vector, find all their `VPointer`. Now the vector has its own size-adjusted `VPointer`, and we can simply do a `never_overlaps_with` query directly on that.

src/hotspot/share/opto/vectorization.hpp line 940:

> 938:         tty->print_cr("VPointer::init_is_valid: scale or stride too large.");
> 939:       }
> 940: #endif

Note: this was already present in `VPointer::VPointer` before.

src/hotspot/share/opto/vtransform.cpp line 352:

> 350:             // care too much about those edge cases.
> 351:             memory_regions.push(new VMemoryRegion(iv_offset_p, is_load, schedule_order++));
> 352:           }

Note: instead of doing the costum logic with `iv_offset` all in `VMemoryRegion`, we now create a modified copy of the `VPointer` that simulates the `VPointer` at `iv = iv + iv_offset`.

src/hotspot/share/opto/vtransform.cpp line 579:

> 577:       mem = mem->in(MemNode::Memory);
> 578:     } else {
> 579:       break;

Note: We now have access to the size-adjusted `VPointer` for the vector.

src/hotspot/share/opto/vtransform.hpp line 488:

> 486: class VTransformMemVectorNode : public VTransformVectorNode {
> 487: private:
> 488:   const VPointer _vpointer; // with size of the vector

Note: make sure we can store the size-adjusted `VPointer` for load/store vectors.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865401201
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865400704
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865403574
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865404708
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865405191
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865405473
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865447415
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865448032
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865448653
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865450303
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865453293
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865455958
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865457779
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865458817
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865464174
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865465505
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865469704
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865472623
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865475878
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865473660
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865474120
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865478374
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865479782
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865481593
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865483659
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865485213
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865488580
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865489622
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865490285


More information about the hotspot-compiler-dev mailing list