RFR: 8352587: C2 SuperWord: we must avoid Multiversioning for PeelMainPost loops
Christian Hagedorn
chagedorn at openjdk.org
Mon Mar 24 20:47:16 UTC 2025
On Mon, 24 Mar 2025 08:22:57 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
> This was a fuzzer failure, which hit an assert in SuperWord:
>
> `# assert(_cl->is_multiversion_fast_loop() == (_multiversioning_fast_proj != nullptr)) failed: must find the multiversion selector IFF loop is a multiversion fast loop`
>
> We had a fast main loop, but it could not find the `multiversion_if`. The reason was that the loop was a `PeelMainPost` loop, i.e. there is no pre-loop but only a single peeled iteration. This makes the pattern matching from main-loop via pre-loop to `multiversion_if` impossible.
>
> I'm proposing two changes in this PR:
> - We must check `peel_only`, to see if we are in a `PeelMainPost` or `PreMainPost` case, and only do multiversioning if we know that there will be a pre-loop.
> - In `eliminate_useless_multiversion_if` we should already detect that a main-loop that is marked as multiversioned should be able to find its `multiversion_if`. I'm removing its multiversioning marking if we cannot find the `multiversion_if`.
>
> I added 2 tests:
> - The fuzzer generated test that hits the assert before this patch.
> - An IR test that checks that we do not multiversion in a `PeelMainPost` loop case.
>
> ---------------
>
> **FYI**: I tried to add an assert in `eliminate_useless_multiversion_if` that we must always find the `multiversion_if` from a multiversioned main loop. But there are cases where this can fail. Here an example:
>
> `test/hotspot/jtreg/compiler/locks/TestSynchronizeWithEmptyBlock.java`
>
> With flags: `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation`
>
>
> Counted Loop: N537/N176 counted [int,100),+1 (-1 iters)
> Loop: N0/N0 has_sfpt
> Loop: N307/N361 limit_check profile_predicated predicated sfpts={ 182 495 }
> Loop: N536/N535
> Loop: N537/N176 counted [int,100),+1 (-1 iters) has_sfpt strip_mined
> Loop: N379/N383 limit_check profile_predicated predicated counted [int,int),+1 (4 iters) pre rc has_sfpt
> Loop: N353/N357 counted [int,1000),+1 (4 iters) post rc has_sfpt
> Multiversion Loop: N537/N176 counted [int,100),+1 (100 iters) has_sfpt strip_mined
> PreMainPost Loop: N537/N176 counted [int,100),+1 (100 iters) multiversion_fast has_sfpt strip_mined
> Unroll 2 Loop: N537/N176 counted [int,100),+1 (100 iters) main multiversion_fast has_sfpt strip_mined
> Poor node estimate: 306 >> 92
> Loop: N0/N0 has_sfpt
> Loop: N307/N361 limit_check profile_predicated predicated sfpts={ 182 }
> Loop: N556/N557 sfpts={ 559 }
> Loop: N552/N554 counted...
Looks good!
> It seems that we are able to detect some loops as empty loops, including the pre-loop. But somhow the main-loop is not removed by "empty loop", and now this main-loop cannot traverse through the pre-loop to the multiversion_if.
There are some bailouts in `IdealLoopTree::remove_main_post_loops()` (called from the empty loop removal optimization). So it seems that we cannot always eliminate the main loop when the pre loop is removed. But maybe the main loop could still be eliminated in your case somehow - might be worth investigating further.
src/hotspot/share/opto/loopTransform.cpp line 3435:
> 3433: if (!peel_only) {
> 3434: // We are going to add pre-loop and post-loop (PreMainPost).
> 3435: // But should we also multi-version for auto-vectorization speculative
Just a side note: It seems that we sometimes say "multiversion" and sometimes "multi-version". I find the latter more readable. Maybe we can make it consistent at some point in a separate task. This could also mean that we have `*_multi_version_*` in method names instead of `*_multiversion*_`.
test/hotspot/jtreg/compiler/loopopts/superword/TestMultiversionWithPeelMainPost.java line 31:
> 29: * @summary Test case where we used to Multiversion a PeelMainPost loop,
> 30: * which is useless and triggered an assert later on.
> 31: * @run driver compiler.loopopts.superword.TestMultiversionWithPeelMainPost
Should be `main` to be able to add flags in the CI:
Suggestion:
* @run main compiler.loopopts.superword.TestMultiversionWithPeelMainPost
test/hotspot/jtreg/compiler/loopopts/superword/TestPeelMainPostNoMultiversioning.java line 63:
> 61: long y = multiplicator;
> 62: for (int i = 0; i < 10_000; i++) {
> 63: x *= y; // No memory load/stroe -> PeelMainPost
Suggestion:
x *= y; // No memory load/store -> PeelMainPost
-------------
Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24183#pullrequestreview-2711610938
PR Review Comment: https://git.openjdk.org/jdk/pull/24183#discussion_r2010888155
PR Review Comment: https://git.openjdk.org/jdk/pull/24183#discussion_r2010883383
PR Review Comment: https://git.openjdk.org/jdk/pull/24183#discussion_r2010884684
More information about the hotspot-compiler-dev
mailing list