RFR: 8286197: C2: Optimize MemorySegment shape in int loop [v2]
Roland Westrelin
roland at openjdk.java.net
Fri May 6 09:18:27 UTC 2022
> This is another small enhancement for a code shape that showed up in a
> MemorySegment micro benchmark. The shape to optimize is the one from test1:
>
>
> for (int i = 0; i < size; i++) {
> long j = i * UNSAFE.ARRAY_INT_INDEX_SCALE;
>
> j = Objects.checkIndex(j, size * 4);
>
> if (((base + j) & 3) != 0) {
> throw new RuntimeException();
> }
>
> v += UNSAFE.getInt(base + j);
> }
>
>
> In that code shape, the loop iv is first scaled, result is then casted
> to long, range checked and finally address of memory location is
> computed.
>
> The alignment check is transformed so the loop body has no check In
> order to eliminate the range check, that loop is transformed into:
>
>
> for (int i1 = ..) {
> for (int i2 = ..) {
> long j = (i1 + i2) * UNSAFE.ARRAY_INT_INDEX_SCALE;
>
> j = Objects.checkIndex(j, size * 4);
>
> v += UNSAFE.getInt(base + j);
> }
> }
>
>
> The address shape is (AddP base (CastLL (ConvI2L (LShiftI (AddI ...
>
> In this case, the type of the ConvI2L is [min_jint, max_jint] and type
> of CastLL is [0, max_jint] (the CastLL has a narrower type).
>
> I propose transforming (CastLL (ConvI2L into (ConvI2L (CastII in that
> case. The convI2L and CastII types can be set to [0, max_jint]. The
> new address shape is then:
>
> (AddP base (ConvI2L (CastII (LShiftI (AddI ...
>
> which optimize well.
>
> (LShiftI (AddI ...
> is transformed into
> (AddI (LShiftI ...
> because one of the AddI input is loop invariant (i2) and we have:
>
> (AddP base (ConvI2L (CastII (AddI (LShiftI ...
>
> Then because the ConvI2L and CastII types are [0, max_jint], the AddI
> is pushed through the ConvI2L and CastII:
>
> (AddP base (AddL (ConvI2L (CastII (LShiftI ...
>
> base and one of the inputs of the AddL are loop invariant so this
> transformed into:
>
> (AddP (AddP ...) (ConvI2L (CastII (LShiftI ...
>
> The (AddP ...) is loop invariant so computed before entry. The
> (ConvI2L ...) only depends on the loop iv.
>
> The resulting address is a shift + an add. The address before
> transformation requires 2 adds + a shift. Also after unrolling, the
> adress of the second access in the loop is cheaper to compute as it
> can be derived from the address of the first access.
>
> For all of this to work:
> 1) I added a CastLL::Ideal transformation:
> (CastLL (ConvI2L into (ConvI2l (CastII
>
> 2) I also had to prevent split if to transform (LShiftI (Phi for the
> iv Phi of a counted loop.
>
>
> test2 and test3 test 1) and 2) separately.
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
review
-------------
Changes:
- all: https://git.openjdk.java.net/jdk/pull/8555/files
- new: https://git.openjdk.java.net/jdk/pull/8555/files/6493429c..a122f0cf
Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8555&range=01
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8555&range=00-01
Stats: 12 lines in 2 files changed: 5 ins; 0 del; 7 mod
Patch: https://git.openjdk.java.net/jdk/pull/8555.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8555/head:pull/8555
PR: https://git.openjdk.java.net/jdk/pull/8555
More information about the hotspot-compiler-dev
mailing list