RFR: 8300256: C2: vectorization is sometimes skipped on loops where it would succeed [v3]

Roland Westrelin roland at openjdk.org
Mon Jan 30 14:53:55 UTC 2023


On Fri, 27 Jan 2023 16:31:41 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> Vectorization for a counted loop cl only proceeds if
>> cl->range_checks_present() returns false. The result of that method is
>> computed lazily, cached in the CountedLoopNode, and never
>> re-computed. If PhaseIdealLoop::do_range_check() returns 0 then the
>> result of that computation is overwritten (no range checks
>> present). PhaseIdealLoop::do_range_check() counts the number of tests
>> present in the loop body (which is really what range_checks_present()
>> is about) and decrements that count for every check it eliminates,
>> unless the check is not a comparison with a LoadRange (for a reason
>> that I don't understand). In the case of the test (a pattern from a
>> ByteBuffer benchmark), not all tests are with a LoadRange. As a
>> result, PhaseIdealLoop::do_range_check() returns nonzero even though
>> it eliminates all tests, so vectorization is never attempted.
>> 
>> There doesn't seem to be any value in caching the result of
>> range_checks_present() in CountedLoopNode. It's not that expensive to
>> compute, it's only used during loop opts, and it's really hard to keep
>> in sync with whether the loop still has tests: several different
>> transformations could remove a test. What I propose instead is to keep
>> roughly the same approach (compute the result lazily and cache it so
>> it doesn't have to be re-computed) but to store it on the
>> IdealLoopTree instead (so it's recomputed on every loop opts pass and
>> there's no risk that it becomes out of sync).
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:
> 
>  - @bug in test
>  - Merge branch 'master' into JDK-8300256
>  - review
>  - Merge branch 'master' into JDK-8300256
>  - more
>  - maybe more
>  - more
>  - vectorization not run

FTR, the test fails on 32-bit x86 for some reason, so I changed the test to not run on that platform.
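
For anyone skimming the quoted description, the accounting problem can be illustrated with a small toy model. This is only a sketch: the class and method names are hypothetical and nothing here is HotSpot code. It assumes each test in the loop body is either a comparison with a LoadRange or some other comparison, and that all of them get eliminated.

```java
// Toy model only: the class and method names are hypothetical, not HotSpot code.
final class RangeCheckCountSketch {

    // Each array entry describes one test in the loop body:
    //   true  = comparison against a LoadRange (an array length),
    //   false = some other comparison.
    // Models the accounting described in the quoted text: every test is
    // counted and every test is eliminated, but only the LoadRange tests
    // decrement the count.
    static int countAfterElimination(boolean[] testIsAgainstLoadRange) {
        int count = testIsAgainstLoadRange.length;
        for (boolean againstLoadRange : testIsAgainstLoadRange) {
            if (againstLoadRange) {
                count--;
            }
        }
        return count;
    }

    // Models the proposed alternative: answer the question from the tests
    // actually left in the body on the current pass, not from an old count.
    static boolean rangeChecksPresent(int testsLeftInBody) {
        return testsLeftInBody > 0;
    }

    public static void main(String[] args) {
        boolean[] tests = { true, false }; // one LoadRange test, one other test
        System.out.println("count reported after elimination: "
                + countAfterElimination(tests));
        System.out.println("recomputed with no tests left: "
                + rangeChecksPresent(0));
    }
}
```

With one LoadRange test and one other test, the reported count stays at 1 even though no test is left in the body, whereas recomputing from the body says no range checks are present.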

-------------

PR: https://git.openjdk.org/jdk/pull/12116

