RFR: 8326139: C2 SuperWord: split packs (match use/def packs, implemented, mutual independence)

Emanuel Peter epeter at openjdk.org
Thu Feb 22 13:57:09 UTC 2024


On Wed, 14 Feb 2024 15:10:18 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> After `combine_pairs_to_longer_packs`, we sometimes create packs that are too long and cannot be vectorized.
> There are multiple reason for that:
> 
> - A pack does not "match" with a use of def pack, and we need to split it. Example: split Z:
> 
> X X X X Y Y Y Y
> Z Z Z Z Z Z Z Z
> 
> 
> - A pack is not implemented in the current size, but would be implemented for a smaller size. Some operations are not implemented at max vector size, and we just need to split every pack to the smaller size that is implemented. Or we have a pack of a non-power-of-2 size, and need to split it down into smaller sizes that are power-of-2.
> 
> - Packs can have pack internal dependence. This dependence happens at a certain "distance". If we split a pack to be smaller than that "distance", then the internal dependence disappears, and we have the desired mutual independence. Example:
> https://github.com/openjdk/jdk/blob/9ee274b0480a6e8e399830fd40c34d99c5621c1b/test/hotspot/jtreg/compiler/loopopts/superword/TestSplitPacks.java#L657-L669
> 
> Note: Soon I will refactor the packset into a new `PackSet` class, and move the split / filter code there.
> 
> **Further Work**
> 
> [JDK-8309267](https://bugs.openjdk.org/browse/JDK-8309267) C2 SuperWord: some tests fail on KNL machines - fail to vectorize
> The issue on KNL machines is that some operations only support vector widths that are smaller than the maximal vector length. Hence, we must split all other vectors that are uses/defs. This change here does exactly that, and so I will be able to put the `UseKNLSetting` in the IRFramework whitelist. I will do that in a separate RFE.

src/hotspot/share/opto/superword.cpp line 527:

> 525: 
> 526:   // Now we only remove packs:
> 527:   construct_my_pack_map();

Note: as with the `filter_packs` methods, I will for now just leave all `split_packs` methods here. I can clean this up in a future RFE. Things around here may still change, be added and removed. For now it is most convenient to keep it all here in a simple list.

Also: I now moved `construct_my_pack_map` before the filtering. This allows the filters to have access to `my_pack`. I need that for the `use_def_boundaries` check.

src/hotspot/share/opto/superword.cpp line 969:

> 967:   }
> 968: 
> 969:   if (isomorphic(s1, s2) && !is_populate_index(s1, s2)) {

Note: As explained in `test7a`:
If we pack the `iv + 1`, `iv + 2`, `iv + 3` ... `AddI` nodes, then we get a pack that is non-power-of-2 in most cases, since `iv + 0` has collapsed to `iv`, and is not an `AddI` but a `Phi` node. That then causes `split_packs_at_use_def_boundaries` to cut the use-packs, and things unravel from there.

So I had 2 options:
1. Create `Phi, AddI, AddI, AddI, ...` packs, and deal with the consequences of having nodes of different types in a pack. Plus, we would have the `iv` in a pack, which also has consequences all over the place... Not great.
2. Forbid anything that looks like a `PopulateIndex` to be packed. It does not need to be packed, and will still be vectorized, see `SuperWord::vector_opd`. I went with this option.

src/hotspot/share/opto/superword.cpp line 1529:

> 1527: }
> 1528: 
> 1529: void SuperWord::split_packs_longer_than_max_vector_size() {

Note: This is not covered by `split_packs_only_implemented_with_smaller_size`. No vector larger than max vector size is implemented, so we automatically split it down.

src/hotspot/share/opto/superword.cpp line 1944:

> 1942:     int old_packset_length = _packset.length();
> 1943:     filter_packs(nullptr, // don't dump each time
> 1944:                  "not profitable",

Note: this was a mistake of a previous PR. Only relevant for debug printing, so I just fix it here.

src/hotspot/share/opto/superword.hpp line 570:

> 568:   SplitStatus split_pack(const char* split_name, Node_List* pack, SplitTask task);
> 569:   template <typename SplitStrategy>
> 570:   void split_packs(const char* split_name, SplitStrategy strategy);

Note: I plan to move all of the methods for splitting and filtering to the `PackSet` class in a future refactoring.

test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java line 114:

> 112:         int[] out2 = new int[ITER];
> 113:         for (int i = 0; i < ITER; i++) {
> 114:             out[i] += ((sArr1[2*i] * sArr2[2*i]) + (sArr1[2*i+1] * sArr2[2*i+1]));

Note: `testa` only uses `sArr1`, and I wanted to be sure it works also with this.

test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java line 132:

> 130:         int[] out = new int[ITER];
> 131:         for (int i = 0; i < ITER; i++) {
> 132:             out[i] += ((sArr1[2*i] * sArr2[2*i]) + (sArr1[2*i+1] * sArr2[2*i+1]));

And here I even removed the seemingly unrelated/useful line: `out2[i] += out[i];`

test/hotspot/jtreg/compiler/loopopts/superword/TestSplitPacks.java line 37:

> 35: /*
> 36:  * @test
> 37:  * @bug 8309267

Suggestion:

 * @bug 8326139

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17848#discussion_r1494196842
PR Review Comment: https://git.openjdk.org/jdk/pull/17848#discussion_r1497098355
PR Review Comment: https://git.openjdk.org/jdk/pull/17848#discussion_r1494198201
PR Review Comment: https://git.openjdk.org/jdk/pull/17848#discussion_r1494199585
PR Review Comment: https://git.openjdk.org/jdk/pull/17848#discussion_r1494201714
PR Review Comment: https://git.openjdk.org/jdk/pull/17848#discussion_r1497695149
PR Review Comment: https://git.openjdk.org/jdk/pull/17848#discussion_r1497727264
PR Review Comment: https://git.openjdk.org/jdk/pull/17848#discussion_r1494204627


More information about the hotspot-compiler-dev mailing list