RFR: 8300257: C2: vectorization fails on some simple Memory Segment loops [v3]
Roland Westrelin
roland at openjdk.org
Fri Mar 17 16:48:00 UTC 2023
> In the test case `testByteLong1` (that's extracted from a memory
> segment micro benchmark), the address of the store is initially:
>
>
> (AddP#204 base#195 base#195 (AddL#164 (ConvI2L#158 (CastII#157 (LShiftI#107 iv#101))) invar#163))
>
>
> (The `#` numbers are node numbers, included to help the discussion.)
>
> `iv#101` is the `Phi` of a counted loop. `invar#163` is the
> `baseOffset` load.
>
> To eliminate the range check, the loop is transformed into a loop nest
> and as a consequence the address above becomes:
>
>
> (AddP#204 base#195 base#195 (AddL#164 (ConvI2L#158 (CastII#157 (LShiftI#107 (AddI#326 invar#308 iv#321)))) invar#163))
>
>
> `invar#308` is some expression from a `Phi` of the outer loop.
>
> That `AddP` is transformed multiple times to push the invariants out of the loop:
>
>
> (AddP#568 base#195 (AddP#556 base#195 base#195 invar#163) (ConvI2L#158 (CastII#157 (AddI#566 (LShiftI#565 iv#321) invar#577))))
>
>
> then:
>
>
> (AddP#568 base#195 (AddP#847 (AddP#556 base#195 base#195 invar#163) (AddL#838 (ConvI2L#793 (LShiftI#760 iv#767)) (ConvI2L#818 (CastII#779 invar#577)))))
>
>
> and finally:
>
>
> (AddP#568 base#195 (AddP#949 base#195 (AddP#855 base#195 (AddP#556 base#195 base#195 invar#163) (ConvI2L#818 (CastII#809 invar#577))) (ConvI2L#938 (LShiftI#896 iv#908))))
>
>
> `AddP#855` is out of the inner loop.
>
> This doesn't vectorize because:
>
> - there are 2 invariants in the address expression but superword only
> supports one (tracked by `_invar` in `SWPointer`)
>
> - there are more levels of `AddP` (4) than superword supports (3)
>
> To fix that, I propose to no longer track the address elements in
> `_invar`, `_negate_invar` and `_invar_scale`, but instead to have a
> single `_invar` that is an expression built by superword as it
> follows chains of `AddP` nodes. I kept the previous `_invar`,
> `_negate_invar` and `_invar_scale` for debugging only, and use them to
> check that whatever vectorized with the previous scheme still does.
>
> I also propose lifting the restriction on 3 levels of `AddP` entirely.
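
The proposal above can be illustrated with a toy model. This is only a sketch in Java, not the C++ code from the patch: the `Node` and `Decomposition` classes and the `decompose` method are hypothetical names invented for illustration, and offsets are kept as plain strings rather than real C2 nodes. It walks the final address shape quoted above, follows the `AddP` chain without a depth limit, and gathers every loop-invariant offset into one collection instead of a single slot.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are not C2's Node or SWPointer.
class MultiInvarSketch {

    /** A leaf (base or offset expression) or an AddP(address, offset). */
    static final class Node {
        final String name;
        final boolean invariant; // leaf only: true if independent of the inner-loop iv
        final Node address;      // AddP only: the nested address input
        final Node offset;       // AddP only: null when the description shows no offset

        private Node(String name, boolean invariant, Node address, Node offset) {
            this.name = name;
            this.invariant = invariant;
            this.address = address;
            this.offset = offset;
        }

        static Node leaf(String name, boolean invariant) {
            return new Node(name, invariant, null, null);
        }

        static Node addP(String name, Node address, Node offset) {
            return new Node(name, false, address, offset);
        }

        boolean isAddP() {
            return address != null;
        }
    }

    /** What the walk found: the base, the AddP depth, and the two kinds of offsets. */
    static final class Decomposition {
        String base;
        int addPLevels;
        final List<String> invariants = new ArrayList<>();
        final List<String> variants = new ArrayList<>();
    }

    /** Follows the chain of AddP nodes with no depth limit, collecting every offset. */
    static Decomposition decompose(Node adr) {
        Decomposition d = new Decomposition();
        Node n = adr;
        while (n.isAddP()) {
            d.addPLevels++;
            if (n.offset != null) {
                (n.offset.invariant ? d.invariants : d.variants).add(n.offset.name);
            }
            n = n.address;
        }
        d.base = n.name;
        return d;
    }

    /** The final address shape from the description, with node numbers kept as names. */
    static Node finalAddress() {
        Node base = Node.leaf("base#195", true);
        Node a556 = Node.addP("AddP#556", base, Node.leaf("invar#163", true));
        Node a855 = Node.addP("AddP#855", a556,
                Node.leaf("(ConvI2L#818 (CastII#809 invar#577))", true));
        Node a949 = Node.addP("AddP#949", a855,
                Node.leaf("(ConvI2L#938 (LShiftI#896 iv#908))", false));
        return Node.addP("AddP#568", a949, null); // no offset is shown for AddP#568
    }

    public static void main(String[] args) {
        Decomposition d = decompose(finalAddress());
        System.out.println("base = " + d.base + ", AddP levels = " + d.addPLevels);
        System.out.println("invariant part = " + String.join(" + ", d.invariants));
        System.out.println("iv part = " + String.join(" + ", d.variants));
    }
}
```

In this model the walk reports four `AddP` levels and two invariant offsets, which are the two properties the description identifies as blocking vectorization under the previous scheme.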
Roland Westrelin has updated the pull request incrementally with three additional commits since the last revision:
- Update test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMultiInvar.java
Co-authored-by: Tobias Hartmann <tobias.hartmann at oracle.com>
- Update test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMultiInvar.java
Co-authored-by: Tobias Hartmann <tobias.hartmann at oracle.com>
- Update src/hotspot/share/opto/superword.hpp
Co-authored-by: Tobias Hartmann <tobias.hartmann at oracle.com>
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/12942/files
- new: https://git.openjdk.org/jdk/pull/12942/files/cdcc181c..d4d07656
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=12942&range=02
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=12942&range=01-02
Stats: 13 lines in 2 files changed: 0 ins; 2 del; 11 mod
Patch: https://git.openjdk.org/jdk/pull/12942.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/12942/head:pull/12942
PR: https://git.openjdk.org/jdk/pull/12942