RFR: 8352587: C2 SuperWord: we must avoid Multiversioning for PeelMainPost loops

Emanuel Peter epeter at openjdk.org
Mon Mar 24 14:11:51 UTC 2025


This was a fuzzer failure, which hit an assert in SuperWord:

`# assert(_cl->is_multiversion_fast_loop() == (_multiversioning_fast_proj != nullptr)) failed: must find the multiversion selector IFF loop is a multiversion fast loop`

We had a fast main loop, but it could not find the `multiversion_if`. The reason was that the loop was a `PeelMainPost` loop, i.e. there is no pre-loop but only a single peeled iteration. This makes the pattern matching from main-loop via pre-loop to `multiversion_if` impossible.

I'm proposing two changes in this PR:
- We must check `peel_only`, to see if we are in a `PeelMainPost` or `PreMainPost` case, and only do multiversioning if we know that there will be a pre-loop.
- In `eliminate_useless_multiversion_if` we should already check whether a main-loop that is marked as multiversioned can actually find its `multiversion_if`. If it cannot, I'm removing its multiversioning marking.
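
To make the two changes concrete, here is a minimal toy model in C++. It is only a sketch under assumptions: `ToyLoop`, `MultiversionIf`, `may_multiversion`, `find_multiversion_if` and `drop_stale_multiversion_mark` are hypothetical names invented for illustration and are not the HotSpot classes or functions touched by the patch. The model only captures the idea that the `multiversion_if` is reached from the main-loop via the pre-loop, so a missing pre-loop makes the lookup fail.

```cpp
#include <cassert>

// Toy stand-ins for illustration only; these are NOT the HotSpot types.
struct MultiversionIf {};

struct ToyLoop {
  bool multiversion_fast = false;      // models the "multiversion_fast" marking
  ToyLoop* pre_loop = nullptr;         // stays null in the PeelMainPost case
  MultiversionIf* selector = nullptr;  // set on the pre-loop: the if reachable from it
};

// Change 1 (modeled): only multiversion when a pre-loop will exist,
// i.e. when we are not in the peel-only (PeelMainPost) case.
bool may_multiversion(bool peel_only) {
  return !peel_only;
}

// Models the pattern match main-loop -> pre-loop -> multiversion_if.
// Without a pre-loop there is nothing to traverse, so the lookup fails.
MultiversionIf* find_multiversion_if(const ToyLoop& main_loop) {
  if (main_loop.pre_loop == nullptr) {
    return nullptr;
  }
  return main_loop.pre_loop->selector;
}

// Change 2 (modeled): if a loop is marked as multiversioned but cannot
// find its multiversion_if, remove the marking instead of asserting later.
void drop_stale_multiversion_mark(ToyLoop& main_loop) {
  if (main_loop.multiversion_fast && find_multiversion_if(main_loop) == nullptr) {
    main_loop.multiversion_fast = false;
  }
}
```

In this model, a main-loop with `multiversion_fast` set and no `pre_loop` loses its marking after `drop_stale_multiversion_mark`, while a main-loop whose pre-loop still leads to a selector keeps it.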

I added 2 tests:
- The fuzzer generated test that hits the assert before this patch.
- An IR test that checks that we do not multiversion in a `PeelMainPost` loop case.

---------------

**FYI**: I tried to add an assert in `eliminate_useless_multiversion_if` that we must always find the `multiversion_if` from a multiversioned main loop. But there are cases where this can fail. Here is an example:

`test/hotspot/jtreg/compiler/locks/TestSynchronizeWithEmptyBlock.java`

With flags: `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation`


Counted            Loop: N537/N176  counted [int,100),+1 (-1 iters) 
Loop: N0/N0  has_sfpt
  Loop: N307/N361  limit_check profile_predicated predicated sfpts={ 182 495 }
    Loop: N536/N535 
      Loop: N537/N176  counted [int,100),+1 (-1 iters)  has_sfpt strip_mined
    Loop: N379/N383  limit_check profile_predicated predicated counted [int,int),+1 (4 iters)  pre rc  has_sfpt
    Loop: N353/N357  counted [int,1000),+1 (4 iters)  post rc  has_sfpt
Multiversion       Loop: N537/N176  counted [int,100),+1 (100 iters)  has_sfpt strip_mined
PreMainPost        Loop: N537/N176  counted [int,100),+1 (100 iters)  multiversion_fast has_sfpt strip_mined
Unroll 2           Loop: N537/N176  counted [int,100),+1 (100 iters)  main multiversion_fast has_sfpt strip_mined
Poor node estimate: 306 >> 92
Loop: N0/N0  has_sfpt
  Loop: N307/N361  limit_check profile_predicated predicated sfpts={ 182 }
    Loop: N556/N557  sfpts={ 559 }
      Loop: N552/N554  counted [int,100),+1 (100 iters)  multiversion_delayed_slow has_sfpt strip_mined
    Loop: N599/N601  counted [int,int),+1 (4 iters)  pre multiversion_fast has_sfpt
    Loop: N536/N535  sfpts={ 538 }
      Loop: N629/N176  counted [int,99),+2 (100 iters)  main multiversion_fast has_sfpt strip_mined
    Loop: N575/N577  counted [int,100),+1 (4 iters)  post multiversion_fast has_sfpt
    Loop: N379/N383  limit_check profile_predicated predicated counted [int,int),+1 (4 iters)  pre rc  has_sfpt
    Loop: N353/N357  counted [int,1000),+1 (4 iters)  post rc  has_sfpt
Parallel IV: 643       Loop: N552/N554  counted [int,100),+1 (100 iters)  multiversion_delayed_slow has_sfpt strip_mined
Parallel IV: 646     Loop: N599/N601  counted [int,int),+1 (4 iters)  pre multiversion_fast has_sfpt
Parallel IV: 652       Loop: N629/N176  counted [int,99),+2 (100 iters)  main multiversion_fast has_sfpt strip_mined
Parallel IV: 649     Loop: N575/N577  counted [int,100),+1 (4 iters)  post multiversion_fast has_sfpt
Loop: N0/N0  has_sfpt
  Loop: N307/N361  limit_check profile_predicated predicated sfpts={ 182 }
    Loop: N556/N557  sfpts={ 559 }
      Loop: N552/N554  counted [int,100),+1 (100 iters)  multiversion_delayed_slow has_sfpt strip_mined
    Loop: N599/N601  counted [int,int),+1 (4 iters)  pre multiversion_fast has_sfpt
    Loop: N536/N535  sfpts={ 538 }
      Loop: N629/N176  counted [int,99),+2 (100 iters)  main multiversion_fast has_sfpt strip_mined
    Loop: N575/N577  counted [int,100),+1 (4 iters)  post multiversion_fast has_sfpt
    Loop: N379/N383  limit_check profile_predicated predicated counted [int,int),+1 (4 iters)  pre rc  has_sfpt
    Loop: N353/N357  counted [int,1000),+1 (4 iters)  post rc  has_sfpt
Empty without zero trip guard         Loop: N552/N554  counted [int,100),+1 (100 iters)  multiversion_delayed_slow has_sfpt strip_mined
Peel               Loop: N552/N554  counted [int,100),+1 (100 iters)  multiversion_delayed_slow has_sfpt strip_mined
Empty without zero trip guard       Loop: N599/N601  counted [int,int),+1 (4 iters)  pre multiversion_fast has_sfpt
Peel             Loop: N599/N601  counted [int,int),+1 (4 iters)  pre multiversion_fast has_sfpt
Unroll 4           Loop: N629/N176  counted [int,99),+2 (100 iters)  main multiversion_fast has_sfpt strip_mined


It seems that we are able to detect some loops as empty loops, including the pre-loop. But somehow the main-loop is not removed by "empty loop", so this main-loop can no longer traverse through the pre-loop to the `multiversion_if`.

If reviewers think this really should be investigated, I could file a follow-up RFE.

-------------

Commit messages:
 - Merge branch 'master' into JDK-8352587-Multiversion-PeelMainPost
 - rm assert
 - peel-main-post IR test
 - the fix
 - JDK-8352587

Changes: https://git.openjdk.org/jdk/pull/24183/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24183&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8352587
  Stats: 138 lines in 4 files changed: 134 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/24183.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24183/head:pull/24183

PR: https://git.openjdk.org/jdk/pull/24183

