RFR: 8352587: C2 SuperWord: we must avoid Multiversioning for PeelMainPost loops

Tue Mar 25 07:33:26 UTC 2025

On Mon, 24 Mar 2025 20:34:30 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> This was a fuzzer failure, which hit an assert in SuperWord:
>> 
>> `# assert(_cl->is_multiversion_fast_loop() == (_multiversioning_fast_proj != nullptr)) failed: must find the multiversion selector IFF loop is a multiversion fast loop`
>> 
>> We had a fast main loop, but it could not find the `multiversion_if`. The reason was that the loop was a `PeelMainPost` loop, i.e. there is no pre-loop but only a single peeled iteration. This makes the pattern matching from main-loop via pre-loop to `multiversion_if` impossible.
>> 
>> I'm proposing two changes in this PR:
>> - We must check `peel_only`, to see if we are in a `PeelMainPost` or `PreMainPost` case, and only do multiversioning if we know that there will be a pre-loop.
>> - In `eliminate_useless_multiversion_if` we should already detect that a main-loop that is marked as multiversioned should be able to find its `multiversion_if`. I'm removing its multiversioning marking if we cannot find the `multiversion_if`.
>> 
>> I added 2 tests:
>> - The fuzzer generated test that hits the assert before this patch.
>> - An IR test that checks that we do not multiversion in a  `PeelMainPost` loop case.
>> 
>> ---------------
>> 
>> **FYI**: I tried to add an assert in `eliminate_useless_multiversion_if` that we must always find the `multiversion_if` from a multiversioned main loop. But there are cases where this can fail. Here an example:
>> 
>> `test/hotspot/jtreg/compiler/locks/TestSynchronizeWithEmptyBlock.java`
>> 
>> With flags: `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation`
>> 
>> 
>> Counted            Loop: N537/N176  counted [int,100),+1 (-1 iters) 
>> Loop: N0/N0  has_sfpt
>>   Loop: N307/N361  limit_check profile_predicated predicated sfpts={ 182 495 }
>>     Loop: N536/N535 
>>       Loop: N537/N176  counted [int,100),+1 (-1 iters)  has_sfpt strip_mined
>>     Loop: N379/N383  limit_check profile_predicated predicated counted [int,int),+1 (4 iters)  pre rc  has_sfpt
>>     Loop: N353/N357  counted [int,1000),+1 (4 iters)  post rc  has_sfpt
>> Multiversion       Loop: N537/N176  counted [int,100),+1 (100 iters)  has_sfpt strip_mined
>> PreMainPost        Loop: N537/N176  counted [int,100),+1 (100 iters)  multiversion_fast has_sfpt strip_mined
>> Unroll 2           Loop: N537/N176  counted [int,100),+1 (100 iters)  main multiversion_fast has_sfpt strip_mined
>> Poor node estimate: 306 >> 92
>> Loop: N0/N0  has_sfpt
>>   Loop: N307/N361  limit_check profile_predicated predicated sfpts={...
>
> src/hotspot/share/opto/loopTransform.cpp line 3435:
> 
>> 3433:       if (!peel_only) {
>> 3434:         // We are going to add pre-loop and post-loop (PreMainPost).
>> 3435:         // But should we also multi-version for auto-vectorization speculative
> 
> Just a side note: It seems that we sometimes say "multiversion" and sometimes "multi-version". I find the latter more readable. Maybe we can make it consistent at some point in a separate task. This could also mean that we have `*_multi_version_*` in method names instead of `*_multiversion*_`.

Hmm... good catch. Personally, I prefer "multiversion" as a single word. But this is very subjective 😅

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24183#discussion_r2011481992