RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v6]

Christian Hagedorn chagedorn at openjdk.org
Tue Oct 29 13:59:31 UTC 2024


On Tue, 22 Oct 2024 07:19:54 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> **Background**
>> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather.
>> 
>> **Details**
>> 
>> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset.
>> 
>> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them.
>> 
>> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing!
>> 
>> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegement`.
>> 
>> **What this change enables**
>> 
>> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`).
>> 
>> Now we can do:
>> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`.
>> - Merging `Unsafe` stores to native memory.
>> - Merging `MemorySegment`: with array, native, ByteBuffer backing types.
>>   - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduce `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better.
>> 
>> **Dealing with Overflows**
>> 
>> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   changes to NoOverflowInt for Dean

src/hotspot/share/opto/memnode.cpp line 2781:

> 2779: private:
> 2780:   PhaseGVN* _phase;
> 2781:   StoreNode* _store;

Was like that before but maybe you can make them `const` with this change:
Suggestion:

  PhaseGVN* const _phase;
  StoreNode* const _store;

src/hotspot/share/opto/memnode.cpp line 2881:

> 2879:   }
> 2880: 
> 2881:   NOT_PRODUCT( if(is_trace_basic()) { tty->print("[TraceMergeStores] MergePrimitiveStores::run: "); _store->dump(); })

Suggestion:

  NOT_PRODUCT( if (is_trace_basic()) { tty->print("[TraceMergeStores] MergePrimitiveStores::run: "); _store->dump(); })

src/hotspot/share/opto/memnode.cpp line 2886:

> 2884:   // then that use or a store further down is the "last" store.
> 2885:   Status status_use = find_adjacent_use_store(_store);
> 2886:   NOT_PRODUCT( if(is_trace_basic()) { tty->print("[TraceMergeStores] expect no use: "); status_use.print_on(tty); })

Suggestion:

  NOT_PRODUCT( if (is_trace_basic()) { tty->print("[TraceMergeStores] expect no use: "); status_use.print_on(tty); })

src/hotspot/share/opto/memnode.cpp line 2893:

> 2891:   // Check if we can merge with at least one def, so that we have at least 2 stores to merge.
> 2892:   Status status_def = find_adjacent_def_store(_store);
> 2893:   NOT_PRODUCT( if(is_trace_basic()) { tty->print("[TraceMergeStores] expect def: "); status_def.print_on(tty); })

Suggestion:

  NOT_PRODUCT( if (is_trace_basic()) { tty->print("[TraceMergeStores] expect def: "); status_def.print_on(tty); })

src/hotspot/share/opto/memnode.cpp line 2907:

> 2905:   StoreNode* merged_store = make_merged_store(merge_list, merged_input_value);
> 2906: 
> 2907:   NOT_PRODUCT( if(is_trace_success()) { trace(merge_list, merged_input_value, merged_store); } )

Suggestion:

  NOT_PRODUCT( if (is_trace_success()) { trace(merge_list, merged_input_value, merged_store); } )

src/hotspot/share/opto/memnode.cpp line 3143:

> 3141:   while (current != nullptr && merge_list.size() < merge_list_max_size) {
> 3142:     Status status = find_adjacent_def_store(current);
> 3143:     NOT_PRODUCT( if(is_trace_basic()) { tty->print("[TraceMergeStores] find def: "); status.print_on(tty); })

Suggestion:

    NOT_PRODUCT( if (is_trace_basic()) { tty->print("[TraceMergeStores] find def: "); status.print_on(tty); })

src/hotspot/share/opto/memnode.cpp line 3151:

> 3149:       // We can have at most one RangeCheck.
> 3150:       if (status.found_range_check()) {
> 3151:         NOT_PRODUCT( if(is_trace_basic()) { tty->print_cr("[TraceMergeStores] found RangeCheck, stop traversal."); })

Suggestion:

        NOT_PRODUCT( if (is_trace_basic()) { tty->print_cr("[TraceMergeStores] found RangeCheck, stop traversal."); })

src/hotspot/share/opto/memnode.cpp line 3157:

> 3155:   }
> 3156: 
> 3157:   NOT_PRODUCT( if(is_trace_basic()) { tty->print_cr("[TraceMergeStores] found:"); merge_list.dump(); })

Suggestion:

  NOT_PRODUCT( if (is_trace_basic()) { tty->print_cr("[TraceMergeStores] found:"); merge_list.dump(); })

src/hotspot/share/opto/memnode.cpp line 3164:

> 3162:   while (merge_list.size() > pow2size) { merge_list.pop(); }
> 3163: 
> 3164:   NOT_PRODUCT( if(is_trace_basic()) { tty->print_cr("[TraceMergeStores] truncated:"); merge_list.dump(); })

Suggestion:

  NOT_PRODUCT( if (is_trace_basic()) { tty->print_cr("[TraceMergeStores] truncated:"); merge_list.dump(); })

src/hotspot/share/opto/mempointer.hpp line 49:

> 47: //
> 48: //
> 49: //   Example1: byte array access:

I suggest to add spaces for the example numbers, same below.
Suggestion:

//   Example 1: byte array access:

src/hotspot/share/opto/mempointer.hpp line 64:

> 62: //
> 63: //     pointer =           array_base + ARRAY_INT_BASE_OFFSET + 4 * 5 + 4       * j          + 4       * 3 * j
> 64: //             = 1       * array_base + ARRAY_INT_BASE_OFFSET + 20    + 4       * j          + 12      * j

Suggestion:

//     pointer =           array_base + ARRAY_INT_BASE_OFFSET + 4 * 5 + 4       * i          + 4       * 3 * j
//             = 1       * array_base + ARRAY_INT_BASE_OFFSET + 20    + 4       * i          + 12      * j

src/hotspot/share/opto/mempointer.hpp line 75:

> 73: //     pointer =           array_base + ARRAY_INT_BASE_OFFSET + 4       * i
> 74: //             = 1       * array_base + ARRAY_INT_BASE_OFFSET + 4       * i
> 75: //             = scale_0 * variable_0 + con                   + scale_1 * variable_1

Not sure if it was intentionally or just forgotten to add but I found the dashes useful. Maybe you want to add them here as well as for the other examples:
Suggestion:

//             = 1       * array_base + ARRAY_INT_BASE_OFFSET + 4       * i
//               --------------------   ---------------------   --------------------
//             = scale_0 * variable_0 + con                   + scale_1 * variable_1

src/hotspot/share/opto/mempointer.hpp line 85:

> 83: //     pointer =           address          + 4       * i
> 84: //             = 1       * address    + 0   + 4       * i
> 85: //             = scale_0 * variable_0 + con + scale_1 * variable_1

Same here:
Suggestion:

//             = 1       * address    + 0   + 4       * i
//               --------------------   ---   --------------------
//             = scale_0 * variable_0 + con + scale_1 * variable_1

src/hotspot/share/opto/mempointer.hpp line 110:

> 108: //
> 109: //     pointer = ms.heapBase() +           ms.address() +           i
> 110: //             = 0             + 1       * ms.address() + 1       * i

For variation, do you want to change this to `short` to also have an example with a `scale` other than 1 for `MemorySegment`?

src/hotspot/share/opto/mempointer.hpp line 120:

> 118: //
> 119: //     pointer =           array_base + ARRAY_INT_BASE_OFFSET + 4 * 5 + 4       * j          + 4       * j * k
> 120: //             = 1       * array_base + ARRAY_INT_BASE_OFFSET + 20    + 4       * j          + 4       * j * k

Suggestion:

//     pointer =           array_base + ARRAY_INT_BASE_OFFSET + 4 * 5 + 4       * i          + 4       * j * k
//             = 1       * array_base + ARRAY_INT_BASE_OFFSET + 20    + 4       * i          + 4       * j * k

src/hotspot/share/opto/mempointer.hpp line 143:

> 141: //
> 142: // MemPointerDecomposedForm:
> 143: //   When the pointer is parsed, it is decomposed into sum of summands plus a constant:

Suggestion:

//   When the pointer is parsed, it is decomposed into a sum of summands plus a constant:

src/hotspot/share/opto/mempointer.hpp line 153:

> 151: //   Hence, the full decomposed form is:
> 152: //
> 153: //     pointer = sum_i(scale_i * variable_i) + con

`sum_i` is a little bit confusing as it suggests to be another variable. What about the following?

Suggestion:

//   When the pointer is parsed, it is decomposed into a SUM of summands plus a constant:
//
//     pointer = SUM(summands) + con
//
//   Where each summand_i in summands has the form:
//
//     summand_i = scale_i * variable_i
//
//   Hence, the full decomposed form is:
//
//     pointer = SUM(scale_i * variable_i) + con

src/hotspot/share/opto/mempointer.hpp line 157:

> 155: //   Note: the scale_i are compile-time constants (NoOverflowInt), and the variable_i are
> 156: //         compile-time variables (C2 nodes).
> 157: //   On 64bit systems, this decomposed form is computed with long-add/mul, on 32bit systems

Suggestion:

//   On 64-bit systems, this decomposed form is computed with long-add/mul, on 32-bit systems

src/hotspot/share/opto/mempointer.hpp line 165:

> 163: //
> 164: //     pointer1 = sum(summands) + con1
> 165: //     pointer2 = sum(summands) + con2

To match my comment above:
Suggestion:

//     pointer1 = SUM(summands) + con1
//     pointer2 = SUM(summands) + con2

src/hotspot/share/opto/mempointer.hpp line 185:

> 183: //
> 184: //     At first, computing aliasing is difficult because the distance is hidden inside the
> 185: //     ConvI2L. we can convert this (with array_int_base_offset = 16) into these decomposed forms:

As discussed offline:
Suggestion:

//     At first, computing the aliasing is not immediately straight-forward in the general case because 
//     the distance is hidden inside the ConvI2L. We can convert this (with array_int_base_offset = 16) 
//     into these decomposed forms:

src/hotspot/share/opto/mempointer.hpp line 200:

> 198: // -----------------------------------------------------------------------------------------
> 199: //
> 200: //   We have to be careful on 64bit systems with ConvI2L: decomposing its input is not

Suggestion:

//   We have to be careful on 64-bit systems with ConvI2L: decomposing its input is not

src/hotspot/share/opto/mempointer.hpp line 225:

> 223: //    Resulting in:      +-------------------------+
> 224: //      mp_{i+1} = con + dec_con + sum(dec_summands) + sum(other_summands)
> 225: //               = new_con + sum(new_summands)

Suggestion:

//      mp_i     = con + summand                     + SUM(other_summands)
//    Resulting in:      +-------------------------+
//      mp_{i+1} = con + dec_con + SUM(dec_summands) + SUM(other_summands)
//               = new_con + SUM(new_summands)

src/hotspot/share/opto/mempointer.hpp line 250:

> 248: //      S3) All summands of mp1 and mp2 are identical.
> 249: //
> 250: //    Then the ponter difference between p1 and p2 is identical to the difference between

Suggestion:

//    Then the pointer difference between p1 and p2 is identical to the difference between

src/hotspot/share/opto/mempointer.hpp line 267:

> 265: //    Case 1: only decompositions of type (SAFE1) were used:
> 266: //      We make an induction proof over the decompositions from p1 to mp1, starting with
> 267: //      the trivial decompoisition:

Suggestion:

//      the trivial decompoisition:

Suggestion:

//      the trivial decomposition:

src/hotspot/share/opto/mempointer.hpp line 302:

> 300: //
> 301: //      And hence, there must be an x, such that:
> 302: //        p1 - p2 = mp1 - mp2 + x * array_element_size_in_bytes * 2^32

Maybe for completeness:
Suggestion:

//      And hence, there must be an x, such that:
//        p1 - p2 = mp1 - mp2 + x * array_element_size_in_bytes * 2^32
//      where
//        x = x1 - x2

src/hotspot/share/opto/mempointer.hpp line 314:

> 312: //                                                               -- apply S2 and S3 --
> 313: //                     >  array_element_size_in_bytes * 2^32          - 2^31
> 314: //                     >= array_element_size_in_bytes * 2^31

Should be obvious, be for completeness, maybe add:
Suggestion:

//                     >  array_element_size_in_bytes * 2^32          - 2^31
//                        -- apply array_element_size_in_bytes > 0 --
//                     >= array_element_size_in_bytes * 2^31

src/hotspot/share/opto/mempointer.hpp line 365:

> 363:   {
> 364:     const jint max_distance = 1 << 30;
> 365:     assert(_distance < max_distance && _distance > -max_distance, "safe distance");

The variable name "max_distance" suggests that the assert should use `>=` and `<=`. Would that still be correct? Maybe you should add a comment about the max distance and why it has this value.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819274089
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1820457663
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1820457902
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1820458057
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1820458329
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1820459729
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1820459884
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1820460140
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1820460278
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819047762
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819046083
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819052480
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819052958
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819061173
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819062236
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819063570
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819088073
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819089446
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819091390
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819125396
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819109444
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819131087
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819142055
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819152480
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819206340
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819218044
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819224524


More information about the hotspot-compiler-dev mailing list