RFR: 8325252: C2 SuperWord: refactor the packset

Wed Mar 20 15:08:55 UTC 2024

On Wed, 13 Mar 2024 14:25:57 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> I'm refactoring the packset, separating the details of packset-manupulation from the SuperWord algorithm.
> 
> Most importantly: I split it into two classes: `PairSet` and `PackSet`.
> `combine_pairs_to_longer_packs` converts the first into the second.
> 
> I was able to simplify the combining, and remove the pack-sorting.
> I now walk "pair-chains" directly with `PairSetIterator`. One such pair-chain is equivalent to a pack.
> 
> I moved all the `filter / split` functionality to the `PackSet`, which allows hiding a lot of packset-manipulation from the SuperWord algorithm.
> 
> I ran into some issues when I was extending the pairset in `extend_pairset_with_more_pairs_by_following_use_and_def`:
> Using the PairSetIterator changed the order of extension, and that messed with the packing heuristic, and quite a few examples did not vectorize, because we would pack up the wrong 2 nodes out of a choice of 4 (e.g. we would pack `ac bd` instead of `ab cd`). Hence, I now still have to keep the insertion order for the pairs, and this basically means we are extending with a BFS order. Maybe this issue can be removed, if I improve the packing heuristic with some look-ahead expansion approach (but that is for another day [JDK-8309908](https://bugs.openjdk.org/browse/JDK-8309908)).
> 
> But since I already spent some time on some of the packing heuristic (reordering and cost estimate), I did a light refactoring, and added extra tests for MulAddS2I.
> 
> More details are described in the annotations in the code.

src/hotspot/share/opto/superword.cpp line 49:

> 47:   _clone_map(phase()->C->clone_map()),                      // map of nodes created in cloning
> 48:   _align_to_ref(nullptr),                                   // memory reference to align vectors to
> 49:   _race_possible(false),                                    // cases where SDMU is true

Note: removed it. Will explain where it was used.

src/hotspot/share/opto/superword.cpp line 471:

> 469:   combine_pairs_to_longer_packs();
> 470: 
> 471:   construct_my_pack_map();

Note: `my_pack` is now `packset.pack`, and the map is directly created in `packset.add_pack` in `combine_pairs_to_longer_packs` above.

src/hotspot/share/opto/superword.cpp line 873:

> 871:   return false;
> 872: }
> 873: 

Note: this was rather inefficient. Now we use `packset.has_left(n)` or `packset.has_right(n)`, which are simple constant time lookups.

src/hotspot/share/opto/superword.cpp line 1056:

> 1054:   bool changed;
> 1055:   do {
> 1056:     packset_sort(_packset.length());

Note: sorting used to be necessary, because the old `combine_pairs_to_longer_packs` required the pairs to all be ordered by `alignment`, such that `(s1, s2)` always comes before `(s2, s3)`. With the new combination method, sorting is no longer necessary.

src/hotspot/share/opto/superword.cpp line 1058:

> 1056:     Node* s1 = pair.left();
> 1057:     Node* s2 = pair.right();
> 1058:     order_inputs_of_all_use_pairs_to_match_def_pair(s1, s2);

Note: `_race_possible` basically is on exactly iff there was some pair that had multiple uses, for which we needed to do `order_def_uses` (now called `order_inputs_of_all_use_pairs_to_match_def_pair`). I think that this happens rather often, and it is not worth having some global flag for this. I now just always enable it.

src/hotspot/share/opto/superword.cpp line 1208:

> 1206:       break;
> 1207:     }
> 1208:     Node* use2 = _pairset.get_right_for(use1);

Note: another case where we used to search through all packs, and now can simply do a constant-time lookup.

src/hotspot/share/opto/superword.cpp line 1208:

> 1206:     // Find pair (use1, use2)
> 1207:     Node* use2 = _pairset.get_right_or_null_for(use1);
> 1208:     if (use2 == nullptr) { break; }

Note: instead of searching the whole packset, we now have a constant-time lookup to find the pair.

src/hotspot/share/opto/superword.cpp line 1210:

> 1208:     if (use2 == nullptr) { break; }
> 1209: 
> 1210:     order_inputs_of_uses_to_match_def_pair(def1, def2, use1, use2);

Note: renamed from `opnd_positions_match`

src/hotspot/share/opto/superword.cpp line 1253:

> 1251:   // 1. Reduction
> 1252:   if (is_marked_reduction(use1) && is_marked_reduction(use2)) {
> 1253:     Node* first = use1->in(2);

Note: while this refactoring here was not strictly necessary, I still took the time to improve the comments a bit. I needed to understand this anyway for the `_race_possible` removal.

src/hotspot/share/opto/superword.cpp line 1330:

> 1328:     // Check if we have a pair (use1, use2)
> 1329:     if (!_pairset.has_left(use1)) { continue; }
> 1330:     Node* use2 = _pairset.get_right_for(use1);

Note: another constant-time lookup instead of iterating all packs/pairs.

src/hotspot/share/opto/superword.cpp line 1340:

> 1338:         ct++;
> 1339:         if (are_adjacent_refs(use1, use2)) {
> 1340:           save_use += adjacent_profit(use1, use2);

Note: we can find the pair directly, no need to search the packset.

src/hotspot/share/opto/superword.cpp line 1358:

> 1356: int SuperWord::adjacent_profit(Node* s1, Node* s2) { return 2; }
> 1357: int SuperWord::pack_cost(int ct)   { return ct; }
> 1358: int SuperWord::unpack_cost(int ct) { return ct; }

Note: I am now using lambda methods.

src/hotspot/share/opto/superword.cpp line 1369:

> 1367:         left = right;
> 1368:       }
> 1369:       _packset.add_pack(pack);

Note: I replaced a quadratic loop, which basically checked all-with-all pairs, if they can be combined.
An additional benefit: we don't need the packs sorted (this also removes the need to have all nodes annotated with alignment, but that will be more useful in a future RFE).

src/hotspot/share/opto/superword.cpp line 1393:

> 1391: 
> 1392:   // Remove all nullptr from packset
> 1393:   compress_packset();

Note: the old method left `nullptr` in the packset, the new method does not.

src/hotspot/share/opto/superword.cpp line 1587:

> 1585:     }
> 1586:   };
> 1587:   split_packs(filter_name, split_strategy);

Note: I can get rid of a lot of code by just implementing `filter` with a `split`.

src/hotspot/share/opto/superword.cpp line 1750:

> 1748:   };
> 1749:   _packset.filter_packs("Superword::filter_packs_for_profitable",
> 1750:                         "not profitable", filter);

Note: now that I use `split_packs` to implement `filter_packs`, the repetitions are already taken care of (i.e. repeating while there are changes).

src/hotspot/share/opto/superword.cpp line 1774:

> 1772:   }
> 1773:   _packset.trunc_to(j);
> 1774: }

Note: last use was removed.

src/hotspot/share/opto/superword.cpp line 1795:

> 1793:     }
> 1794:   }
> 1795: }

Note: now done for each pack directly inside `combine_pairs_to_longer_packs` with `packset.add_pack`.

src/hotspot/share/opto/superword.cpp line 1970:

> 1968:       _packset.print_pack(pack);
> 1969:       assert(false, "pack not profitable");
> 1970:     }

Just added the `implemented` and `profitable` checks for good measure.

src/hotspot/share/opto/superword.cpp line 3441:

> 3439:   }
> 3440:   return false;
> 3441: }

Note: replaced with `pairset.has_pair`. Used to be search over packset, now constant time lookup.

src/hotspot/share/opto/superword.cpp line 3452:

> 3450:   }
> 3451:   _packset.at_put(pos, nullptr);
> 3452: }

Note: last use was removed.

src/hotspot/share/opto/superword.cpp line 3471:

> 3469:     n = max_swap_index;
> 3470:   } while (n > 1);
> 3471: }

Note: sort not needed any more with new combine method, see explanation above.

src/hotspot/share/opto/superword.cpp line 3734:

> 3732:     }
> 3733:   }
> 3734: }

Note: walking the chains allows us to nicely display the pairs that belong together without need for sort.
Printing all pairs individually was extremely verbose, and it was hard to figure out what pairs were later chained.
Printing pack-chains now looks like printing packsets, and the transition is smoother that way.

src/hotspot/share/opto/superword.cpp line 3868:

> 3866: #endif
> 3867: }
> 3868: 

Note: this used to display the `alignment` next to all nodes in the packs. I plan to remove `alignment` for all nodes soon, and since I now am changing the printing code, I now print the position in the pack instead.

src/hotspot/share/opto/superword.hpp line 62:

> 60: // The PairSet is a set of pairs. These are later combined to packs,
> 61: // and stored in the PackSet.
> 62: class PairSet : public StackObj {

Note: new class.

src/hotspot/share/opto/superword.hpp line 67:

> 65:  public:
> 66:   int         _alignment; // memory alignment for a node
> 67:   Node_List*  _my_pack;   // pack containing this node

Note: is now in `PackSet`.

src/hotspot/share/opto/superword.hpp line 72:

> 70: 
> 71:   // List of all left elements bb_idx, in the order of pair addition.
> 72:   GrowableArray<int> _lefts_in_insertion_order;

Note: It turns out that the pair-extension is quite sensitive to the order in which pairs are found. Keeping the insertion order does something similar to a BFS. If this order is changed, then some of our examples don't pack properly. Well, they pack correctly on a local level, but then that prevents the proper packing further on.
Maybe I can remove the need for this order in the future, but for now I'll keep the order, the same order as the `_packset`'s pairs had.

src/hotspot/share/opto/superword.hpp line 106:

> 104: };
> 105: 
> 106: class PairSetIterator : public StackObj {

Note: new class. used to easily iterate over `PairSet`.

src/hotspot/share/opto/superword.hpp line 137:

> 135: };
> 136: 
> 137: class SplitTask {

Note: moved it out of SuperWord. No changes made.

src/hotspot/share/opto/superword.hpp line 182:

> 180: };
> 181: 
> 182: class SplitStatus {

Note: moved it out of SuperWord. No changes made.

src/hotspot/share/opto/superword.hpp line 224:

> 222:   const GrowableArray<Node_List*>& packset() const { return _packset; }
> 223:  private:
> 224:   bool           _race_possible;   // In cases where SDMU is true

Note: removed it. Explanation elsewhere.

src/hotspot/share/opto/superword.hpp line 226:

> 224: };
> 225: 
> 226: class PackSet : public StackObj {

Note: new class.

src/hotspot/share/opto/superword.hpp line 298:

> 296: 
> 297:   // Combine packs A and B with A.last == B.first into A.first..,A.last,B.second,..B.last
> 298:   void combine_pairs_to_longer_packs();

Note: moved and renamed some.

src/hotspot/share/opto/superword.hpp line 300:

> 298:   void combine_pairs_to_longer_packs();
> 299: 
> 300:   class SplitTask {

Note: class moved out of SuperWord.

src/hotspot/share/opto/superword.hpp line 345:

> 343:   };
> 344: 
> 345:   class SplitStatus {

Note: class moved out of SuperWord.

src/hotspot/share/opto/superword.hpp line 391:

> 389:   SplitStatus split_pack(const char* split_name, Node_List* pack, SplitTask task);
> 390:   template <typename SplitStrategy>
> 391:   void split_packs(const char* split_name, SplitStrategy strategy);

Note: moved to `PackSet`

src/hotspot/share/opto/superword.hpp line 401:

> 399:   void filter_packs(const char* filter_name,
> 400:                     const char* error_message,
> 401:                     FilterPredicate filter);

Note: moved to `PackSet`

src/hotspot/share/opto/superword.hpp line 411:

> 409:   void compress_packset();
> 410:   // Construct the map from nodes to packs.
> 411:   void construct_my_pack_map();

Note: removed

src/hotspot/share/opto/superword.hpp line 449:

> 447:   void initialize_node_info();
> 448:   // Compute max depth for expressions from beginning of block
> 449:   void compute_max_depth();

Note: removed.

src/hotspot/share/opto/superword.hpp line 457:

> 455:   bool in_packset(Node* s1, Node* s2);
> 456:   // Remove the pack at position pos in the packset
> 457:   void remove_pack_at(int pos);

Note: removed

src/hotspot/share/opto/superword.hpp line 464:

> 462:   void adjust_pre_loop_limit_to_align_main_loop_vectors();
> 463:   // Is the use of d1 in u1 at the same operand position as d2 in u2?
> 464:   bool opnd_positions_match(Node* d1, Node* u1, Node* d2, Node* u2);

Note: renamed, and moved up: `order_inputs_of_uses_to_match_def_pair`

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1524988862
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1524990361
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1524991799
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1524995044
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1524997880
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525000012
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1526142885
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1526142923
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525002107
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525003343
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1531978254
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525003978
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525007880
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525008557
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525010591
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525015661
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525011783
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525012913
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525016402
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525018069
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525018510
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525019238
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525021056
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525023674
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525025490
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525031457
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1531982329
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525030409
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525024562
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525024785
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525032262
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525025278
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525040328
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525033432
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525033588
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525037708
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525037094
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525036685
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525036284
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525036109
PR Review Comment: https://git.openjdk.org/jdk/pull/18276#discussion_r1525035777