RFR: 8286197: C2: Optimize MemorySegment shape in int loop
Roland Westrelin
roland at openjdk.java.net
Thu May 5 15:05:48 UTC 2022
This is another small enhancement for a code shape that showed up in a
MemorySegment micro benchmark. The shape to optimize is the one from test1:
for (int i = 0; i < size; i++) {
long j = i * UNSAFE.ARRAY_INT_INDEX_SCALE;
j = Objects.checkIndex(j, size * 4);
if (((base + j) & 3) != 0) {
throw new RuntimeException();
}
v += UNSAFE.getInt(base + j);
}
In that code shape, the loop iv is first scaled, result is then casted
to long, range checked and finally address of memory location is
computed.
The alignment check is transformed so the loop body has no check In
order to eliminate the range check, that loop is transformed into:
for (int i1 = ..) {
for (int i2 = ..) {
long j = (i1 + i2) * UNSAFE.ARRAY_INT_INDEX_SCALE;
j = Objects.checkIndex(j, size * 4);
v += UNSAFE.getInt(base + j);
}
}
The address shape is (AddP base (CastLL (ConvI2L (LShiftI (AddI ...
In this case, the type of the ConvI2L is [min_jint, max_jint] and type
of CastLL is [0, max_jint] (the CastLL has a narrower type).
I propose transforming (CastLL (ConvI2L into (ConvI2L (CastII in that
case. The convI2L and CastII types can be set to [0, max_jint]. The
new address shape is then:
(AddP base (ConvI2L (CastII (LShiftI (AddI ...
which optimize well.
(LShiftI (AddI ...
is transformed into
(AddI (LShiftI ...
because one of the AddI input is loop invariant (i2) and we have:
(AddP base (ConvI2L (CastII (AddI (LShiftI ...
Then because the ConvI2L and CastII types are [0, max_jint], the AddI
is pushed through the ConvI2L and CastII:
(AddP base (AddL (ConvI2L (CastII (LShiftI ...
base and one of the inputs of the AddL are loop invariant so this
transformed into:
(AddP (AddP ...) (ConvI2L (CastII (LShiftI ...
The (AddP ...) is loop invariant so computed before entry. The
(ConvI2L ...) only depends on the loop iv.
The resulting address is a shift + an add. The address before
transformation requires 2 adds + a shift. Also after unrolling, the
adress of the second access in the loop is cheaper to compute as it
can be derived from the address of the first access.
For all of this to work:
1) I added a CastLL::Ideal transformation:
(CastLL (ConvI2L into (ConvI2l (CastII
2) I also had to prevent split if to transform (LShiftI (Phi for the
iv Phi of a counted loop.
test2 and test3 test 1) and 2) separately.
-------------
Commit messages:
- whitespaces
- extra test comment
- test & fix
Changes: https://git.openjdk.java.net/jdk/pull/8555/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8555&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8286197
Stats: 162 lines in 5 files changed: 161 ins; 0 del; 1 mod
Patch: https://git.openjdk.java.net/jdk/pull/8555.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8555/head:pull/8555
PR: https://git.openjdk.java.net/jdk/pull/8555
More information about the hotspot-compiler-dev
mailing list