RFR: 8300257: C2: vectorization fails on some simple Memory Segment loops [v2]

Tobias Hartmann thartmann at openjdk.org
Thu Mar 16 08:56:29 UTC 2023


On Tue, 14 Mar 2023 10:58:11 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> In the test case `testByteLong1` (that's extracted from a memory
>> segment micro benchmark), the address of the store is initially:
>> 
>> 
>> (AddP#204 base#195 base#195 (AddL#164 (ConvI2L#158 (CastII#157 (LshiftI#107 iv#101))) invar#163))
>> 
>> 
>> (#numbers are node numbers to help the discussion).
>> 
>> `iv#101` is the `Phi` of a counted loop. `invar#163` is the
>> `baseOffset` load.
>> 
>> To eliminate the range check, the loop is transformed into a loop nest
>> and as a consequence the address above becomes:
>> 
>> 
>> (AddP#204 base#195 base#195 (AddL#164 (ConvI2L#158 (CastII#157 (LShiftI#107 (AddI#326 invar#308 iv#321)))) invar#163))
>> 
>> 
>> `invar#308` is some expression from a `Phi` of the outer loop.
>> 
>> That `AddP` is transformed multiple times to push the invariants out of loop:
>> 
>> 
>> (AddP#568 base#195 (AddP#556 base#195 base#195 invar#163) (ConvI2L#158 (CastII#157 (AddI#566 (LShiftI#565 iv#321) invar#577))))
>> 
>> 
>> then:
>> 
>> 
>> (AddP#568 base#195 (AddP#847 (AddP#556 base#195 base#195 invar#163) (AddL#838 (ConvI2L#793 (LShiftL#760 iv#767)) (ConvI2L#818 (CastII#779 invar#577)))))
>> 
>> 
>> and finally:
>> 
>> 
>> (AddP#568 base#195 (AddP#949 base#195 (AddP#855 base#195 (AddP#556 base#195 base#195 invar#163) (ConvI2L#818 (CastII#809 invar#577))) (ConvI2L#938 (LShiftI#896 iv#908))))
>> 
>> 
>> `AddP#855` is out of the inner loop. 
>> 
>> This doesn't vectorize because:
>> 
>> - there are 2 invariants in the address expression but superword only
>>   support one (tracked by `_invar` in `SWPointer`)
>> 
>> - there are more levels of `AddP` (4) than superword supports (3)
>> 
>> To fix that, I propose to no longer track the address elements in
>> `_invar`, `_negate_invar` and `_invar_scale` but instead to have a
>> single `_invar` which is an expression built by superword as it
>> follows chains of `addP` nodes. I kept the previous `_invar`,
>> `_negate_invar` and `_invar_scale` as debugging and use them to check
>> that what vectorized with the previous scheme still does.
>> 
>> I also propose lifting the restriction on 3 levels of `AddP` entirely.
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits:
> 
>  - NULL -> nullptr
>  - Merge branch 'master' into JDK-8300257
>  - fix & test

Looks good to me. I'll run some perf testing and report back.

src/hotspot/share/opto/superword.cpp line 4318:

> 4316:   if (opc == Op_AddI) {
> 4317:     if (n->in(2)->is_Con() && invariant(n->in(1))) {
> 4318:       maybe_add_to_invar(maybe_negate_invar(negate, n->in(1)));

It feels like `maybe_negate_invar` should be moved into `maybe_add_to_invar` and be controlled by a `negate` argument.

src/hotspot/share/opto/superword.hpp line 695:

> 693:                                     _debug_invar_scale == q._debug_invar_scale &&
> 694:                                     _debug_negate_invar == q._debug_negate_invar), "");
> 695:     return _invar        == q._invar;

Suggestion:

    return _invar == q._invar;

test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMultiInvar.java line 78:

> 76:                                     long start4, long stop4,
> 77:                                     long start5, long stop5
> 78:                                     ) {

Suggestion:

    public static void testLoopNest1(byte[] dest, byte[] src,
                                     long start1, long stop1,
                                     long start2, long stop2,
                                     long start3, long stop3,
                                     long start4, long stop4,
                                     long start5, long stop5) {

test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMultiInvar.java line 110:

> 108:                                     long start4, long stop4,
> 109:                                     long start5, long stop5
> 110:                                     ) {

Suggestion:

    public static void testLoopNest2(int[] dest, int[] src,
                                     long start1, long stop1,
                                     long start2, long stop2,
                                     long start3, long stop3,
                                     long start4, long stop4,
                                     long start5, long stop5) {

-------------

Marked as reviewed by thartmann (Reviewer).

PR: https://git.openjdk.org/jdk/pull/12942


More information about the hotspot-compiler-dev mailing list