RFR: 8279888: Local variable independently used by multiple loops can interfere with loop optimizations [v5]
Roland Westrelin
roland at openjdk.java.net
Wed Mar 23 15:57:51 UTC 2022
> The bytecode of the 2 methods of the benchmark is structured
> differently: loopsWithSharedLocal(), the slowest one, has multiple
> backedges with a single head while loopsWithScopedLocal() has a single
> backedge and all the paths in the loop body merge before the
> backedge. loopsWithSharedLocal() has its head cloned which results in
> a 2 loops loop nest.
>
> loopsWithSharedLocal() is slow when 2 of the backedges are most
> commonly taken with one taken only 3 times as often as the other
> one. So a thread executing that code only runs the inner loop for a
> few iterations before exiting it and executing the outer loop. I think
> what happens is that any time the inner loop is entered, some
> predicates are executed and the overhead of the setup of loop strip
> mining (if it's enabled) has to be paid. Also, if iteration
> splitting/unrolling was applied, the main loop is likely never
> executed and all time is spent in the pre/post loops where potentially
> some range checks remain.
>
> The fix I propose is that ciTypeFlow, when it clone heads, not only
> rewires the most frequent loop but also all this other frequent loops
> that share the same head. loopsWithSharedLocal() and
> loopsWithScopedLocal() are then fairly similar once c2 parses them.
>
> Without the patch I measure:
>
> LoopLocals.loopsWithScopedLocal mixed avgt 5 1108.874 ± 250.463 ns/op
> LoopLocals.loopsWithSharedLocal mixed avgt 5 1575.665 ± 70.372 ns/op
>
> with it:
>
> LoopLocals.loopsWithScopedLocal mixed avgt 5 1108.180 ± 245.873 ns/op
> LoopLocals.loopsWithSharedLocal mixed avgt 5 1234.665 ± 157.912 ns/op
>
> But this patch also causes a regression when running one of the
> benchmarks added by 8278518. From:
>
> SharedLoopHeader.sharedHeader avgt 5 505.993 ± 44.126 ns/op
>
> to:
>
> SharedLoopHeader.sharedHeader avgt 5 724.253 ± 1.664 ns/op
>
> The hot method of this benchmark used to be compiled with 2 loops, the
> inner one a counted loop. With the patch, it's now compiled with a
> single one which can't be converted into a counted loop because the
> loop variable is incremented by a different amount along the 2 paths
> in the loop body. What I propose to fix this is to add a new loop
> transformation that detects that, because of a merge point, a loop
> can't be turned into a counted loop and transforms it into 2
> loops. The benchmark performs better with this:
>
> SharedLoopHeader.sharedHeader avgt 5 567.150 ± 6.120 ns/op
>
> Not quite on par with the previous score but AFAICT this is due to
> code generation not being as good (the loop head can't be aligned in
> particular).
>
> In short, I propose:
>
> - changing ciTypeFlow so that, when it pays off, a loop with
> multiple backedges is compiled as a single loop with a merge point in
> the loop body
>
> - adding a new loop transformation so that, when it pays off, a loop
> with a merge point in the loop body is converted into a 2 loops loop
> nest, essentially the opposite transformation.
Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits:
- review
- Merge branch 'master' into JDK-8279888
- Merge branch 'master' into JDK-8279888
- Merge commit '44327da170c57ccd31bfad379738c7f00e67f4c1' into JDK-8279888
- Update src/hotspot/share/opto/loopopts.cpp
Co-authored-by: Tobias Hartmann <tobias.hartmann at oracle.com>
- Update src/hotspot/share/opto/loopopts.cpp
Co-authored-by: Tobias Hartmann <tobias.hartmann at oracle.com>
- Merge branch 'master' into JDK-8279888
- Merge branch 'master' into JDK-8279888
- fix
- Merge branch 'master' into JDK-8279888
- ... and 2 more: https://git.openjdk.java.net/jdk/compare/91fab6ad...8b20e0cc
-------------
Changes: https://git.openjdk.java.net/jdk/pull/7352/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7352&range=04
Stats: 1087 lines in 9 files changed: 787 ins; 132 del; 168 mod
Patch: https://git.openjdk.java.net/jdk/pull/7352.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/7352/head:pull/7352
PR: https://git.openjdk.java.net/jdk/pull/7352
More information about the hotspot-compiler-dev
mailing list