RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check
Manuel Hässig
mhaessig at openjdk.org
Mon Jul 28 13:40:05 UTC 2025
On Thu, 27 Mar 2025 13:00:20 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
> This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs.
>
> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016:
> - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate.
> - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization.
>
> --------------------------
>
> **Where to start reviewing**
>
> - `src/hotspot/share/opto/mempointer.hpp`:
> - Read the class comment for `MemPointerRawSummand`.
> - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks.
>
> - `src/hotspot/share/opto/vectorization.cpp`:
> - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works.
>
> - `src/hotspot/share/opto/vtransform.hpp`:
> - Understand the difference between weak and strong edges.
>
> If you need to see some examples, then look at the tests:
> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases if we used multiversioning.
> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases.
> - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases.
> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but currently mostly only for array cases; MemorySegment cases have some issues (see comments).
> --------------------------
>
> **Details**
>
> Most fundamentally:
> - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSummand`s.
> - These raw summands allow us to reconstruct the `VPointer` at any `iv` value with `VPointer::make_pointer_expression(Node* iv_value)`.
> - With the raw summands, a pointer may look like this: `p = base + ConvI2L(x + 2) + ConvI2L(y + 2)`
> - With "regular" summands, this gets simplified to `p = base + 4L + ConvI2L(x) + ConvI2L(y)`
> - For aliasing analysis (adjacency and overlap), the "regu...
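As a rough illustration of the multiversioning scheme in the quoted description, here is a minimal Java sketch. It is not how C2 implements the check: `mayAlias`, `fastLoop`, `slowLoop` and `copyPlusOne` are hypothetical names, the lane count of 4 is an assumption, and the chunked loop only imitates what a vector loop does (load a whole chunk, then store it). The point is that the chunked version gives a different result when the two ranges overlap, so a runtime check has to select the scalar loop in that case.

```java
import java.util.Arrays;

final class MultiversionSketch {
    // Hypothetical runtime aliasing check: true if both ranges lie in the
    // same array and [aPos, aPos+len) overlaps [bPos, bPos+len).
    static boolean mayAlias(int[] a, int aPos, int[] b, int bPos, int len) {
        return a == b && aPos < bPos + len && bPos < aPos + len;
    }

    // "slow_loop": scalar, one element at a time, correct even with overlap.
    static void slowLoop(int[] a, int aPos, int[] b, int bPos, int len) {
        for (int i = 0; i < len; i++) {
            b[bPos + i] = a[aPos + i] + 1;
        }
    }

    // "fast_loop": imitates vectorized code with an assumed width of 4 lanes.
    // Only equivalent to slowLoop when the ranges do not overlap.
    static void fastLoop(int[] a, int aPos, int[] b, int bPos, int len) {
        final int lanes = 4;
        int i = 0;
        for (; i + lanes <= len; i += lanes) {
            // "Vector load": read the whole chunk before storing anything.
            int[] chunk = Arrays.copyOfRange(a, aPos + i, aPos + i + lanes);
            // "Vector store".
            for (int l = 0; l < lanes; l++) {
                b[bPos + i + l] = chunk[l] + 1;
            }
        }
        for (; i < len; i++) { // scalar tail
            b[bPos + i] = a[aPos + i] + 1;
        }
    }

    // Multiversioned entry point: check once, then pick a loop version.
    static void copyPlusOne(int[] a, int aPos, int[] b, int bPos, int len) {
        if (mayAlias(a, aPos, b, bPos, len)) {
            slowLoop(a, aPos, b, bPos, len);
        } else {
            fastLoop(a, aPos, b, bPos, len);
        }
    }

    public static void main(String[] args) {
        int[] x = new int[9];
        copyPlusOne(x, 0, x, 1, 8); // overlapping ranges: slow loop is chosen
        System.out.println(Arrays.toString(x)); // [0, 1, 2, 3, 4, 5, 6, 7, 8]

        int[] y = new int[9];
        fastLoop(y, 0, y, 1, 8); // forcing the fast loop gives a different result
        System.out.println(Arrays.toString(y)); // [0, 1, 1, 1, 1, 2, 1, 1, 1]
    }
}
```

With the predicate variant described in the quote, the failing check would lead to a trap and recompilation instead of a branch to a second loop; this sketch only models the multiversioning variant.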
Thank you, @eme64, for this good work! I left some comments below.
src/hotspot/share/opto/mempointer.cpp line 732:
> 730: // -> Unknown if overlap at runtime -> return false
> 731: bool MemPointer::always_overlaps_with(const MemPointer& other) const {
> 732: const MemPointerAliasing aliasing = get_aliasing_with(other NOT_PRODUCT( COMMA _trace ));
Suggestion:
const MemPointerAliasing aliasing = get_aliasing_with(other NOT_PRODUCT(COMMA _trace));
Nit: You used this without spaces already above.
src/hotspot/share/opto/mempointer.hpp line 411:
> 409: // Both p and mp have a linear form for v in r:
> 410: // p(v) = p(lo) - lo * scale_v + iv * scale_v (Corrolary P)
> 411: // mp(v) = mp(lo) - lo * scale_v + iv * scale_v (Corrolary MP)
Where does `iv` come from? Is `v == iv`?
src/hotspot/share/opto/mempointer.hpp line 444:
> 442: // = summand_rest + scale_v * (v0 + stride_v) + con
> 443: // = summand_rest + scale_v * v0 + scale_v * stride_v * con
> 444: // = summand_rest + scale_v * v0 + scale_v * stride_v * con
Suggestion:
// = summand_rest + scale_v * v0 + scale_v * stride_v + con
// = summand_rest + scale_v * v0 + scale_v * stride_v + con
These ought to be plusses.
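A quick numeric check of the expansion, as a sketch: the parameter names mirror the symbols in the comment (`summand_rest`, `scale_v`, `v0`, `stride_v`, `con`), while the class name, method names and concrete values are arbitrary and chosen only for illustration.

```java
final class ExpansionCheck {
    // First line of the derivation: summand_rest + scale_v * (v0 + stride_v) + con
    static long original(long rest, long scale, long v0, long stride, long con) {
        return rest + scale * (v0 + stride) + con;
    }

    // Expansion with "+ con", as in the suggestion.
    static long expandedWithPlus(long rest, long scale, long v0, long stride, long con) {
        return rest + scale * v0 + scale * stride + con;
    }

    // Expansion with "* con", as currently written in the comment.
    static long expandedWithTimes(long rest, long scale, long v0, long stride, long con) {
        return rest + scale * v0 + scale * stride * con;
    }

    public static void main(String[] args) {
        long rest = 16, scale = 4, v0 = 3, stride = 2, con = 8;
        System.out.println(original(rest, scale, v0, stride, con));          // 44
        System.out.println(expandedWithPlus(rest, scale, v0, stride, con));  // 44
        System.out.println(expandedWithTimes(rest, scale, v0, stride, con)); // 92
    }
}
```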
src/hotspot/share/opto/mempointer.hpp line 663:
> 661: };
> 662:
> 663: // The MemPointerSummand is designed to allow the simplification of
Shouldn't this be `MemPointerRawSummand`?
src/hotspot/share/opto/mempointer.hpp line 706:
> 704: // Note: we also need to track constants as separate raw summands. For
> 705: // this, we say that a raw summand tracks a constant iff _variable == null,
> 706: // and we store the constant value in _scaleI.
This contradicts the `con2` example above.
src/hotspot/share/opto/mempointer.hpp line 731:
> 729: }
> 730:
> 731: bool is_valid() const { return _int_group >= 0; }
Why is _int_group not a `uint` if it is always positive or 0?
src/hotspot/share/opto/superword.cpp line 836:
> 834:
> 835: // If we cannot speculate (aliasing analysis runtime checks), we need to respect all edges.
> 836: bool with_weak_memory_edges = !_vloop.use_speculative_aliasing_checks();
Edges that always have to be respected are strong edges. So, if we cannot speculate, we only have strong edges. With this comment and understanding, I would write the expression as
bool with_weak_memory_edges = _vloop.use_speculative_aliasing_checks();
or
bool with_strong_memory_edges = !_vloop.use_speculative_aliasing_checks();
src/hotspot/share/opto/superword.cpp line 878:
> 876:
> 877: // If we cannot speculate (aliasing analysis runtime checks), we need to respect all edges.
> 878: bool with_weak_memory_edges = !_vloop.use_speculative_aliasing_checks();
Same as above.
src/hotspot/share/opto/vectorization.hpp line 240:
> 238: }
> 239:
> 240: // But in some cases, we ctrl of n is between the pre and
Suggestion:
// But in some cases, the ctrl of n is between the pre and
Nit: spelling
src/hotspot/share/opto/vtransform.hpp line 286:
> 284: // dependency chain. Instead, we model the memory edges between all memory nodes, which
> 285: // could be quadratic in the worst case. For vectorization, we must essencially reorder the
> 286: // instructions in the graph. For this we must model all memory dependencies.
Suggestion:
// The C2 IR Node memory edges essentially define a linear order of all memory operations
// (only Loads with the same memory input can be executed in an arbitrary order). This is
// efficient, because it means every Load and Store has exactly one input memory edge,
// which keeps the memory edge count linear. This approach is too restrictive for
// vectorization, for example, we could never vectorize stores, since they are all in a
// dependency chain. Instead, we model the memory edges between all memory nodes, which
// could be quadratic in the worst case. For vectorization, we must essentially reorder the
// instructions in the graph. For this we must model all memory dependencies.
Spelling
test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java line 176:
> 174: long t0 = System.nanoTime();
> 175: // Add a java source file.
> 176: comp.addJavaSourceCode("p.xyz.InnerTest", generate(comp));
Nit: perhaps a package related to the test would be nicer in the logs, like `compiler.loopopts.superword.templated.AliasingFuzzer`.
test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java line 270:
> 268: //
> 269: // The idea is that invarRest is always close to zero, with some small range [-err .. err].
> 270: // The invar variables for invarRest must be in the range [-1, 1, 1], so that we can
Suggestion:
// The invar variables for invarRest must be in the range [-1, 0, 1], so that we can
-------------
Changes requested by mhaessig (Committer).
PR Review: https://git.openjdk.org/jdk/pull/24278#pullrequestreview-3061496983
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2235976063
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2235538590
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2235767065
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2235785157
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2235881210
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2235887862
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2236162728
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2236175460
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2236187788
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2236082499
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2236409739
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2236425351