RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer
Emanuel Peter
epeter at openjdk.org
Mon Dec 2 13:42:06 UTC 2024
On Wed, 6 Nov 2024 13:07:13 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
> I know this one is large, but it consists of a lot of renamings, and new tests. On the whole, the new `VPointer` code is less than the old!
>
> **Goal**
>
> Replace old `VPointer` with a new version that relies on `MemPointer` - which then is a shared utility for both `MergeStores` and `SuperWord / AutoVectorization`. `MemPointer` generally parses pointers, and `VPointer` specializes this facility for the use in loops (`VLoop`).
>
> The old `VPointer` implementation with its recursive pattern matching was quite complicated and difficult to reason about for correctness. The approach in `MemPointer` is much simpler: iteratively decomposing sub-expressions. Further: the new implementation is more powerful at detecting equivalent invariants.
>
> **Future**: with the `MemPointer` implementation of `VPointer`, it should be easier to implement speculative runtime-checks for Aliasing-Analysis [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). The pressing need for this has come from the FFM / MemorySegment folks, like @mcimadamore and @minborg .
>
> **Details**
>
> This looks like a rather big patch, so let me explain the parts.
> - Refactor of `MemPointer` in `mepointer.hpp/cpp`:
> - Added concept of `Base` to `MemPointer`. This is required for the aliasing computation in `VPointer`.
> - `sub_expression_has_native_base_candidate`: add special case to parse through `CastX2P` if we find a native memory base `MemorySegment.address()`, i.e. `jdk.internal.foreign.NativeMemorySegmentImpl.min`. This helps some native memory segment cases to vectorize that did not before.
> - So far `MemPointer` could only answer adjacency queries. But VPointer also needs overlap queries, see the old `VPointer::not_equal` (i.e. can we prove that the two `VPointer` never overlap?). So I had to add a new case to aliasing computation: `NotOrAtDistance`. It is useful to answer the new and better named `MemPointer::never_overlaps_with`.
> - Collapsed together `MemPointerDecomposedForm` and `MemPointer`. It was an unnecessary and unhelpful split.
> - Re-write of `VPointer` based on `MemPointer`:
> - Old pattern:
> - `VPointer[mem: 847 StoreI, base: 37, adr: 37, base[ 37] + offset( 16) + invar( 0) + scale( 4) * iv]`
> - `VPointer[mem: 3189 LoadB, base: 1, adr: 2273, base[ 1] + offset( 0) + invar( 0) + scale( 1) * iv]` -> `adr = CastX2P`, the addition of `11 Param`, `1819 Phi` and `629 LoadL`.
> - New pattern:
> - `VPointer[size: 4, object, base(110 CastPP) + con( 16)...
src/hotspot/share/opto/mempointer.cpp line 188:
> 186: }
> 187: // Fall-through: we can find a more precise native-memory "base". We further decompose
> 188: // the CastX2P to find this "base" and any other offsets from it.
Note: special handling for new `Base` detection, required for `VPointer`.
src/hotspot/share/opto/mempointer.cpp line 206:
> 204: callback.callback(n);
> 205: return;
> 206: }
Note: indentation fix
src/hotspot/share/opto/mempointer.cpp line 458:
> 456: // (S2) holds. If all summands are the same, also the base must be the same.
> 457: has_same_base = true;
> 458: } else {
Note: refactored into `has_same_summands_as` (-> `make_always_at_distance`), and added new case `has_different_object_base_but_otherwise_same_summands_as` (-> `make_not_or_at_distance`). This is necessary for the overlap query needed in `VPointer`.
src/hotspot/share/opto/mempointer.cpp line 541:
> 539: bool MemPointer::is_adjacent_to_and_before(const MemPointer& other) const {
> 540: const MemPointerAliasing aliasing = get_aliasing_with(other NOT_PRODUCT( COMMA _trace ));
> 541: const bool is_adjacent = aliasing.is_always_at_distance(_size);
Note: collapsed `MemPointerDecomposedForm` and `MemPointer` into one.
src/hotspot/share/opto/mempointer.cpp line 582:
> 580: return is_never_overlap;
> 581: }
> 582:
Note: replaces the old `VPointer::not_equal`, which was used for overlap queries.
src/hotspot/share/opto/mempointer.hpp line 174:
> 172: // and determine if they are adjacent.
> 173: //
> 174: // MemPointerDecomposedFormParser:
Note: moved down
src/hotspot/share/opto/mempointer.hpp line 209:
> 207: //
> 208: // then we can easily compute the distance between the pointers (distance = con2 - con1),
> 209: // and determine if they are adjacent.
Note: moved down from higher up.
src/hotspot/share/opto/mempointer.hpp line 244:
> 242: // (2) LoadL from field jdk.internal.foreign.NativeMemorySegmentImpl.min
> 243: // This would be preferrable over CastX2P, because it holds the address() of a native
> 244: // MemorySegment, i.e. we know it points to the beginning of that MemorySegment.
Note: description about new `Base` detection needed in `VPointer`.
src/hotspot/share/opto/mempointer.hpp line 430:
> 428: // e.g. "array1[i] vs array2[i+4]":
> 429: // if "array1 == array2": distance = 4.
> 430: // if "array1 != array2": different memory objects.
Note: added `NotOrAtDistance`, needed for overlap query.
src/hotspot/share/opto/mempointer.hpp line 521:
> 519: if (cmp != 0) { return cmp; }
> 520:
> 521: return NoOverflowInt::cmp(p1.scale(), p2.scale());
Note: `cmp` is used for sorting.
src/hotspot/share/opto/mempointer.hpp line 559:
> 557: // Singleton for default arguments.
> 558: static MemPointerParserCallback& empty() { return _empty; }
> 559: };
Note: used for `VPointer`. In `SuperWord::unrolling_analysis` to track all the "ignore" nodes, which are the internal nodes in the address expressions.
src/hotspot/share/opto/mempointer.hpp line 694:
> 692: _summands[i] = old.summands_at(i);
> 693: }
> 694: }
Note: mutated copy constructor: used for `make_with_size` and `make_with_con`, i.e. modifying the size (scalar -> vector), and modifying the `con`, so that we can shift a `VPointer` and simulate the `VPointer` in a next iteration.
src/hotspot/share/opto/mempointer.hpp line 734:
> 732: bool has_same_summands_as(const MemPointer& other, uint start) const;
> 733: bool has_same_summands_as(const MemPointer& other) const { return has_same_summands_as(other, 0); }
> 734: bool has_different_object_base_but_otherwise_same_summands_as(const MemPointer& other) const;
Note: refactored from `get_aliasing_with` and adding new case `has_different_object_base_but_otherwise_same_summands_as` for overlap query.
src/hotspot/share/opto/mempointer.hpp line 776:
> 774:
> 775: bool is_adjacent_to_and_before(const MemPointer& other) const;
> 776: bool never_overlaps_with(const MemPointer& other) const;
Note: these are the two basic queries: `adjacent` and `overlap`.
src/hotspot/share/opto/noOverflowInt.hpp line 113:
> 111: if (a.value() > b.value()) { return 1; }
> 112: return 0;
> 113: }
Note: needed to compare `MemPointerSummand` (when variable equal), used by `MemPointer` compare, used to sort `VPointer`s.
src/hotspot/share/opto/superword.cpp line 97:
> 95: }
> 96: };
> 97:
Note: refactoring: instead of passing in the old `ignored_loop_nodes` to the `VPointer` during parsing, we pass in a `callback`, and track the nodes this way.
src/hotspot/share/opto/superword.cpp line 2880:
> 2878: // The base is only aligned with ObjectAlignmentInBytes with arrays.
> 2879: // When the base() is top, we have no alignment guarantee at all.
> 2880: // Hence, we must now take the base into account for the calculation.
Note: `align_to_ref_p.base()->is_top()` meant it was a native memory access, i.e. off-heap, and not related to any java object.
src/hotspot/share/opto/superword.hpp line 587:
> 585: static int cmp_by_group(MemOp* a, MemOp* b);
> 586: static int cmp_by_group_and_con_and_original_index(MemOp* a, MemOp* b);
> 587: };
Note: we used to put `VPointer` directly in a `GrowableArray` to sort them. Now that the `VPointer` does not have a reference to the `MemNode*`, we have to create a "tuple" of `(VPointer, mem)`. The `original_index` is needed to keep the sort stable in `qsort`. Before this was done with the `mem->_idx`, which is a bit less well behaved.
src/hotspot/share/opto/superwordVTransformBuilder.cpp line 148:
> 146: const VPointer& scalar_p = _vloop_analyzer.vpointers().vpointer(p0->as_Store());
> 147: const VPointer vector_p(scalar_p.make_with_size(scalar_p.size() * pack_size));
> 148: vtn = new (_vtransform.arena()) VTransformStoreVectorNode(_vtransform, pack_size, vector_p);
Note: we now pass in the `VPointer` of the size of the vector directly. Before, we referenced the `VPointer`s of the packed nodes. But that reliance will be in the way in the future, and it is more clean to carry the modified `VPointer` like this.
src/hotspot/share/opto/traceAutoVectorizationTag.hpp line 35:
> 33: flags(POINTER_ALIASING, "Trace VPointer/MemPointer aliasing") \
> 34: flags(POINTER_ADJACENCY, "Trace VPointer/MemPointer adjacency") \
> 35: flags(POINTER_OVERLAP, "Trace VPointer/MemPointer overlap") \
Note: naming is now more consistent with `TraceMergeStore`.
src/hotspot/share/opto/traceMergeStoresTag.hpp line 36:
> 34: flags(POINTER_PARSING, "Trace pointer IR") \
> 35: flags(POINTER_ALIASING, "Trace MemPointerSimpleForm::get_aliasing_with") \
> 36: flags(POINTER_ADJACENCY, "Trace adjacency") \
Note: naming is now more consistent with `TraceAutoVectorization`.
src/hotspot/share/opto/vectorization.hpp line 207:
> 205: // Is it before the pre-loop?
> 206: return phase()->is_dominator(ctrl, pre_loop_head());
> 207: }
Note: replaces the old `VPointer::invariant`
src/hotspot/share/opto/vectorization.hpp line 737:
> 735: Node_Stack* _nstack; // stack used to record a vpointer trace of variants
> 736: bool _analyze_only; // Used in loop unrolling only for vpointer trace
> 737: uint _stack_idx; // Used in loop unrolling only for vpointer trace
Note: this is now replaced with the `MemPointerParserCallback`.
src/hotspot/share/opto/vectorization.hpp line 836:
> 834: jlong max_diff = (jlong)1 << 31;
> 835: if (difference >= max_diff) {
> 836: return NotComparable;
Note: this used to be the basis of `adjacency` and `offset` queries. Now we do this in `MemPointer`, and with better naming ;)
src/hotspot/share/opto/vectorization.hpp line 854:
> 852: // the two memory regions.
> 853: if (!not_equal(p_mem)) {
> 854: return true;
Note: refactored away. We used to pass in all nodes in a pack/vector, find all their `VPointer`. Now the vector has its own size-adjusted `VPointer`, and we can simply do a `never_overlaps_with` query directly on that.
src/hotspot/share/opto/vectorization.hpp line 940:
> 938: tty->print_cr("VPointer::init_is_valid: scale or stride too large.");
> 939: }
> 940: #endif
Note: this was already present in `VPointer::VPointer` before.
src/hotspot/share/opto/vtransform.cpp line 352:
> 350: // care too much about those edge cases.
> 351: memory_regions.push(new VMemoryRegion(iv_offset_p, is_load, schedule_order++));
> 352: }
Note: instead of doing the costum logic with `iv_offset` all in `VMemoryRegion`, we now create a modified copy of the `VPointer` that simulates the `VPointer` at `iv = iv + iv_offset`.
src/hotspot/share/opto/vtransform.cpp line 579:
> 577: mem = mem->in(MemNode::Memory);
> 578: } else {
> 579: break;
Note: We now have access to the size-adjusted `VPointer` for the vector.
src/hotspot/share/opto/vtransform.hpp line 488:
> 486: class VTransformMemVectorNode : public VTransformVectorNode {
> 487: private:
> 488: const VPointer _vpointer; // with size of the vector
Note: make sure we can store the size-adjusted `VPointer` for load/store vectors.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865401201
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865400704
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865403574
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865404708
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865405191
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865405473
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865447415
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865448032
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865448653
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865450303
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865453293
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865455958
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865457779
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865458817
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865464174
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865465505
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865469704
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865472623
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865475878
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865473660
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865474120
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865478374
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865479782
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865481593
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865483659
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865485213
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865488580
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865489622
PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865490285
More information about the hotspot-compiler-dev
mailing list