RFR: 8286197: C2: Optimize MemorySegment shape in int loop
Vladimir Kozlov
kvn at openjdk.java.net
Thu May 5 16:01:18 UTC 2022
On Thu, 5 May 2022 14:57:11 GMT, Roland Westrelin <roland at openjdk.org> wrote:
> This is another small enhancement for a code shape that showed up in a
> MemorySegment micro benchmark. The shape to optimize is the one from test1:
>
>
> for (int i = 0; i < size; i++) {
> long j = i * UNSAFE.ARRAY_INT_INDEX_SCALE;
>
> j = Objects.checkIndex(j, size * 4);
>
> if (((base + j) & 3) != 0) {
> throw new RuntimeException();
> }
>
> v += UNSAFE.getInt(base + j);
> }
>
>
> In that code shape, the loop iv is first scaled, result is then casted
> to long, range checked and finally address of memory location is
> computed.
>
> The alignment check is transformed so the loop body has no check In
> order to eliminate the range check, that loop is transformed into:
>
>
> for (int i1 = ..) {
> for (int i2 = ..) {
> long j = (i1 + i2) * UNSAFE.ARRAY_INT_INDEX_SCALE;
>
> j = Objects.checkIndex(j, size * 4);
>
> v += UNSAFE.getInt(base + j);
> }
> }
>
>
> The address shape is (AddP base (CastLL (ConvI2L (LShiftI (AddI ...
>
> In this case, the type of the ConvI2L is [min_jint, max_jint] and type
> of CastLL is [0, max_jint] (the CastLL has a narrower type).
>
> I propose transforming (CastLL (ConvI2L into (ConvI2L (CastII in that
> case. The convI2L and CastII types can be set to [0, max_jint]. The
> new address shape is then:
>
> (AddP base (ConvI2L (CastII (LShiftI (AddI ...
>
> which optimize well.
>
> (LShiftI (AddI ...
> is transformed into
> (AddI (LShiftI ...
> because one of the AddI input is loop invariant (i2) and we have:
>
> (AddP base (ConvI2L (CastII (AddI (LShiftI ...
>
> Then because the ConvI2L and CastII types are [0, max_jint], the AddI
> is pushed through the ConvI2L and CastII:
>
> (AddP base (AddL (ConvI2L (CastII (LShiftI ...
>
> base and one of the inputs of the AddL are loop invariant so this
> transformed into:
>
> (AddP (AddP ...) (ConvI2L (CastII (LShiftI ...
>
> The (AddP ...) is loop invariant so computed before entry. The
> (ConvI2L ...) only depends on the loop iv.
>
> The resulting address is a shift + an add. The address before
> transformation requires 2 adds + a shift. Also after unrolling, the
> adress of the second access in the loop is cheaper to compute as it
> can be derived from the address of the first access.
>
> For all of this to work:
> 1) I added a CastLL::Ideal transformation:
> (CastLL (ConvI2L into (ConvI2l (CastII
>
> 2) I also had to prevent split if to transform (LShiftI (Phi for the
> iv Phi of a counted loop.
>
>
> test2 and test3 test 1) and 2) separately.
Good suggestion. I have comments.
src/hotspot/share/opto/castnode.cpp line 385:
> 383: const Type* t = Value(phase);
> 384: const Type* t_in = phase->type(in1);
> 385: if (t != Type::TOP && t_in != Type::TOP && t != t_in) {
`t != t_in` does not mean that type is narrower in general case. I think we need to check ranges (types meet?).
src/hotspot/share/opto/loopopts.cpp line 1084:
> 1082: }
> 1083:
> 1084: // Check for having no control input; not pinned. Allow
Wrong removed space.
-------------
PR: https://git.openjdk.java.net/jdk/pull/8555
More information about the hotspot-compiler-dev
mailing list