RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v16]
Christian Hagedorn
chagedorn at openjdk.org
Mon Nov 4 10:17:45 UTC 2024
On Fri, 1 Nov 2024 14:49:07 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> **Background**
>> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather.
>>
>> **Details**
>>
>> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset.
>>
>> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them.
>>
>> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing!
>>
>> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegement`.
>>
>> **What this change enables**
>>
>> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`).
>>
>> Now we can do:
>> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`.
>> - Merging `Unsafe` stores to native memory.
>> - Merging `MemorySegment`: with array, native, ByteBuffer backing types.
>> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduce `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better.
>>
>> **Dealing with Overflows**
>>
>> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
> more review applications
Some final last mostly minor comments but otherwise, it looks good to me now! I like the summaries and how you worked out the proofs. They are now easy to understand.
src/hotspot/share/opto/memnode.cpp line 2943:
> 2941: return false;
> 2942: }
> 2943: return true;
You could directly return (I cannot create a code suggestion as it says "Applying suggestions on deleted lines is not supported"):
return pointer_def.is_adjacent_to_and_before(pointer_use);
src/hotspot/share/opto/mempointer.cpp line 45:
> 43: while (_worklist.is_nonempty()) {
> 44: // Bail out if the graph is too complex.
> 45: if (traversal_count++ > 1000) { return MemPointerDecomposedForm(pointer); }
Might be easier to read/understand when we also have `MemPointerDecomposedForm::make_trivial(pointer)` method. What do you think? Then `MemPointerDecomposedForm(pointer)` can also be made private.
src/hotspot/share/opto/mempointer.cpp line 199:
> 197: #else
> 198:
> 199: switch(opc) {
Suggestion:
switch (opc) {
src/hotspot/share/opto/mempointer.cpp line 265:
> 263: //
> 264: // Thus, for AddI and SubI, we get:
> 265: // summand = new_summand1 + new_summand2 + scale * y * 2^32
Took me a moment to understand `new_summands`. Maybe we can give a hint like that?
Suggestion:
// scale * ConvI2L(a << con) = scale * (1 << con) * ConvI2L(a) + scale * y * 2^32
// _______________________/ _____________________________________/ ______________/
// before decomposition after decomposition ("new_summands") overflow correction
//
// Thus, for AddI and SubI, we get:
// summand = new_summand1 + new_summand2 + scale * y * 2^32
src/hotspot/share/opto/mempointer.cpp line 283:
> 281: // z * array_element_size_in_bytes = scale
> 282: //
> 283: // And hence, with "x = y * z":
Maybe add here:
Suggestion:
// And hence, with "x = y * z", the decomposition is (SAFE2) under assumed condition:
src/hotspot/share/opto/mempointer.cpp line 318:
> 316: #endif
> 317:
> 318: // "MemPointer Lemma" condition S2: check if all summands are the same:
Suggestion:
// "MemPointer Lemma" condition (S2): check if all summands are the same:
src/hotspot/share/opto/mempointer.cpp line 332:
> 330: }
> 331:
> 332: // "MemPointer Lemma" condition S3: check that the constants do not differ too much:
Suggestion:
// "MemPointer Lemma" condition (S3): check that the constants do not differ too much:
src/hotspot/share/opto/mempointer.cpp line 347:
> 345: }
> 346:
> 347: // "MemPointer Lemma" condition S1:
Suggestion:
// "MemPointer Lemma" condition (S1):
src/hotspot/share/opto/mempointer.cpp line 352:
> 350: // bounds of that same memory object.
> 351:
> 352: // Hence, all 3 conditions of the "MemoryPointer Lemma" are established, and hence
Since we also have added `(S0)` recently, we might need to add a word here about it and then update this to "all 4 conditions".
src/hotspot/share/opto/mempointer.cpp line 382:
> 380: return is_adjacent;
> 381: }
> 382:
Two new lines:
Suggestion:
src/hotspot/share/opto/mempointer.hpp line 46:
> 44: // compile-time variables (C2 nodes).
> 45: //
> 46: // For the MemPointer, we do not explicitly track base address. For Java heap pointers, the
Suggestion:
// For the MemPointer, we do not explicitly track the base address. For Java heap pointers, the
src/hotspot/share/opto/mempointer.hpp line 232:
> 230: // We decompose summand in:
> 231: // mp_i = con + summand + SUM(other_summands)
> 232: // Resulting in: +-------------------------+
Suggestion:
// resulting in: +-------------------------+
src/hotspot/share/opto/mempointer.hpp line 258:
> 256: // (S3) All summands of mp1 and mp2 are identical (i.e. only the constants are possibly different).
> 257: //
> 258: // Then the pointer difference between p1 and p2 is identical to the difference between
Suggestion:
// then the pointer difference between p1 and p2 is identical to the difference between
src/hotspot/share/opto/mempointer.hpp line 332:
> 330: // -- apply x != 0 --
> 331: // >= array_element_size_in_bytes * 2^32 - abs(mp1 - mp2)
> 332: // -- apply (S3) --
Suggestion:
// -- apply (S3) --
src/hotspot/share/opto/mempointer.hpp line 334:
> 332: // -- apply (S3) --
> 333: // = array_element_size_in_bytes * 2^32 - abs(mp1.con - mp2.con)
> 334: // -- apply (S2) --
Suggestion:
// -- apply (S2) --
test/hotspot/jtreg/compiler/c2/TestMergeStoresMemorySegment.java line 114:
> 112: */
> 113:
> 114: // FAILS: mixed providers currently do not merge stores. Maybe there is some inlining issue.
Is there a tracking bug/RFE to make this work?
test/hotspot/jtreg/compiler/c2/TestMergeStoresMemorySegment.java line 196:
> 194: Map<String, TestFunction> tests = new HashMap<>();
> 195:
> 196: // List of gold, the results from the first run before compilation
Sounds funny :-) Maybe:
Suggestion:
// List of golden values, the results from the first run before compilation
test/hotspot/jtreg/compiler/c2/TestMergeStoresMemorySegment.java line 375:
> 373: applyIf = {"UseUnalignedAccesses", "true"})
> 374: static Object[] test_xxx(MemorySegment a, int xI, int yI, int zI) {
> 375: // All RangeChecks remain -> RC smearing not good enough?
Is there a tracking bug to further investigate at some point?
-------------
PR Review: https://git.openjdk.org/jdk/pull/19970#pullrequestreview-2412305011
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827276306
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827447237
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827412672
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827463435
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827469254
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827482695
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827483621
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827484278
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827487031
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827488991
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827338243
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827381411
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827382642
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827390691
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827390901
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827300978
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827302506
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827310975
More information about the hotspot-compiler-dev
mailing list