RFR: 8279888: Local variable independently used by multiple loops can interfere with loop optimizations [v4]

Tobias Hartmann thartmann at openjdk.java.net
Wed Mar 23 12:02:38 UTC 2022


On Tue, 22 Mar 2022 12:45:12 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> The bytecode of the 2 methods of the benchmark is structured
>> differently: loopsWithSharedLocal(), the slowest one, has multiple
>> backedges with a single head while loopsWithScopedLocal() has a single
>> backedge and all the paths in the loop body merge before the
>> backedge. loopsWithSharedLocal() has its head cloned which results in
>> a 2 loops loop nest.
>> 
>> loopsWithSharedLocal() is slow when 2 of the backedges are most
>> commonly taken with one taken only 3 times as often as the other
>> one. So a thread executing that code only runs the inner loop for a
>> few iterations before exiting it and executing the outer loop. I think
>> what happens is that any time the inner loop is entered, some
>> predicates are executed and the overhead of the setup of loop strip
>> mining (if it's enabled) has to be paid. Also, if iteration
>> splitting/unrolling was applied, the main loop is likely never
>> executed and all time is spent in the pre/post loops where potentially
>> some range checks remain.
>> 
>> The fix I propose is that ciTypeFlow, when it clone heads, not only
>> rewires the most frequent loop but also all this other frequent loops
>> that share the same head. loopsWithSharedLocal() and
>> loopsWithScopedLocal() are then fairly similar once c2 parses them.
>> 
>> Without the patch I measure:
>> 
>> LoopLocals.loopsWithScopedLocal      mixed  avgt    5  1108.874 ± 250.463  ns/op
>> LoopLocals.loopsWithSharedLocal      mixed  avgt    5  1575.665 ±  70.372  ns/op
>> 
>> with it:
>> 
>> LoopLocals.loopsWithScopedLocal      mixed  avgt    5  1108.180 ± 245.873  ns/op
>> LoopLocals.loopsWithSharedLocal      mixed  avgt    5  1234.665 ± 157.912  ns/op
>> 
>> But this patch also causes a regression when running one of the
>> benchmarks added by 8278518. From:
>> 
>> SharedLoopHeader.sharedHeader  avgt    5  505.993 ± 44.126  ns/op
>> 
>> to:
>> 
>> SharedLoopHeader.sharedHeader  avgt    5  724.253 ± 1.664  ns/op
>> 
>> The hot method of this benchmark used to be compiled with 2 loops, the
>> inner one a counted loop. With the patch, it's now compiled with a
>> single one which can't be converted into a counted loop because the
>> loop variable is incremented by a different amount along the 2 paths
>> in the loop body. What I propose to fix this is to add a new loop
>> transformation that detects that, because of a merge point, a loop
>> can't be turned into a counted loop and transforms it into 2
>> loops. The benchmark performs better with this:
>> 
>> SharedLoopHeader.sharedHeader  avgt    5  567.150 ± 6.120  ns/op
>> 
>> Not quite on par with the previous score but AFAICT this is due to
>> code generation not being as good (the loop head can't be aligned in
>> particular).
>> 
>> In short, I propose:
>> 
>> - changing ciTypeFlow so that, when it pays off, a loop with
>> multiple backedges is compiled as a single loop with a merge point in
>> the loop body
>> 
>> - adding a new loop transformation so that, when it pays off, a loop
>> with a merge point in the loop body is converted into a 2 loops loop
>> nest, essentially the opposite transformation.
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits:
> 
>  - Merge branch 'master' into JDK-8279888
>  - Merge commit '44327da170c57ccd31bfad379738c7f00e67f4c1' into JDK-8279888
>  - Update src/hotspot/share/opto/loopopts.cpp
>    
>    Co-authored-by: Tobias Hartmann <tobias.hartmann at oracle.com>
>  - Update src/hotspot/share/opto/loopopts.cpp
>    
>    Co-authored-by: Tobias Hartmann <tobias.hartmann at oracle.com>
>  - Merge branch 'master' into JDK-8279888
>  - Merge branch 'master' into JDK-8279888
>  - fix
>  - Merge branch 'master' into JDK-8279888
>  - whitespaces
>  - fix & test

`compiler/intrinsics/unsafe/HeapByteBufferTest.java` fails with `-XX:+StressDuplicateBackedge`:


# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (...workspace/open/src/hotspot/share/opto/macroArrayCopy.cpp:1277), pid=23931, tid=23946
#  assert(alloc != __null) failed: expect alloc
#
# JRE version: Java(TM) SE Runtime Environment (19.0) (fastdebug build 19-internal-2022-03-23-1109324.tobias.hartmann.jdk2)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 19-internal-2022-03-23-1109324.tobias.hartmann.jdk2, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x13f86f5]  PhaseMacroExpand::expand_arraycopy_node(ArrayCopyNode*)+0x10a

-------------

PR: https://git.openjdk.java.net/jdk/pull/7352


More information about the hotspot-compiler-dev mailing list