RFR: 8286197: C2: Optimize MemorySegment shape in int loop

Vladimir Kozlov kvn at openjdk.java.net
Thu May 5 16:01:18 UTC 2022


On Thu, 5 May 2022 14:57:11 GMT, Roland Westrelin <roland at openjdk.org> wrote:

> This is another small enhancement for a code shape that showed up in a
> MemorySegment micro benchmark. The shape to optimize is the one from test1:
> 
> 
> for (int i = 0; i < size; i++) {
>     long j = i * UNSAFE.ARRAY_INT_INDEX_SCALE;
> 
>     j = Objects.checkIndex(j, size * 4);
> 
>     if (((base + j) & 3) != 0) {
>         throw new RuntimeException();
>     }
> 
>     v += UNSAFE.getInt(base + j);
> }
> 
> 
> In that code shape, the loop iv is first scaled, result is then casted
> to long, range checked and finally address of memory location is
> computed.
> 
> The alignment check is transformed so the loop body has no check In
> order to eliminate the range check, that loop is transformed into:
> 
> 
> for (int i1 = ..) {
>     for (int i2 = ..) {
>         long j = (i1 + i2) * UNSAFE.ARRAY_INT_INDEX_SCALE;
> 
>         j = Objects.checkIndex(j, size * 4);
> 
>         v += UNSAFE.getInt(base + j);
>     }
> }
> 
> 
> The address shape is (AddP base (CastLL (ConvI2L (LShiftI (AddI ...
> 
> In this case, the type of the ConvI2L is [min_jint, max_jint] and type
> of CastLL is [0, max_jint] (the CastLL has a narrower type).
> 
> I propose transforming (CastLL (ConvI2L into (ConvI2L (CastII in that
> case. The convI2L and CastII types can be set to [0, max_jint]. The
> new address shape is then:
> 
> (AddP base (ConvI2L (CastII (LShiftI (AddI ...
> 
> which optimize well.
> 
> (LShiftI (AddI ...
> is transformed into
> (AddI (LShiftI ...
> because one of the AddI input is loop invariant (i2) and we have:
> 
> (AddP base (ConvI2L (CastII (AddI (LShiftI ...
> 
> Then because the ConvI2L and CastII types are [0, max_jint], the AddI
> is pushed through the ConvI2L and CastII:
> 
> (AddP base (AddL (ConvI2L (CastII (LShiftI ...
> 
> base and one of the inputs of the AddL are loop invariant so this
> transformed into:
> 
> (AddP (AddP ...) (ConvI2L (CastII (LShiftI ...
> 
> The (AddP ...) is loop invariant so computed before entry. The
> (ConvI2L ...) only depends on the loop iv.
> 
> The resulting address is a shift + an add. The address before
> transformation requires 2 adds + a shift. Also after unrolling, the
> adress of the second access in the loop is cheaper to compute as it
> can be derived from the address of the first access.
> 
> For all of this to work:
> 1) I added a CastLL::Ideal transformation:
> (CastLL (ConvI2L into (ConvI2l (CastII
> 
> 2) I also had to prevent split if to transform (LShiftI (Phi for the
> iv Phi of a counted loop.
> 
> 
> test2 and test3 test 1) and 2) separately.

Good suggestion. I have comments.

src/hotspot/share/opto/castnode.cpp line 385:

> 383:     const Type* t = Value(phase);
> 384:     const Type* t_in = phase->type(in1);
> 385:     if (t != Type::TOP && t_in != Type::TOP && t != t_in) {

`t != t_in` does not mean that type is narrower in general case. I think we need to check ranges (types meet?).

src/hotspot/share/opto/loopopts.cpp line 1084:

> 1082:   }
> 1083: 
> 1084: // Check for having no control input; not pinned.  Allow

Wrong removed space.

-------------

PR: https://git.openjdk.java.net/jdk/pull/8555


More information about the hotspot-compiler-dev mailing list