RFR: 8286197: C2: Optimize MemorySegment shape in int loop [v2]
Vladimir Kozlov
kvn at openjdk.java.net
Fri May 6 16:16:05 UTC 2022
On Fri, 6 May 2022 09:18:27 GMT, Roland Westrelin <roland at openjdk.org> wrote:
>> This is another small enhancement for a code shape that showed up in a
>> MemorySegment micro benchmark. The shape to optimize is the one from test1:
>>
>>
>> for (int i = 0; i < size; i++) {
>> long j = i * UNSAFE.ARRAY_INT_INDEX_SCALE;
>>
>> j = Objects.checkIndex(j, size * 4);
>>
>> if (((base + j) & 3) != 0) {
>> throw new RuntimeException();
>> }
>>
>> v += UNSAFE.getInt(base + j);
>> }
>>
>>
>> In that code shape, the loop iv is first scaled, result is then casted
>> to long, range checked and finally address of memory location is
>> computed.
>>
>> The alignment check is transformed so the loop body has no check In
>> order to eliminate the range check, that loop is transformed into:
>>
>>
>> for (int i1 = ..) {
>> for (int i2 = ..) {
>> long j = (i1 + i2) * UNSAFE.ARRAY_INT_INDEX_SCALE;
>>
>> j = Objects.checkIndex(j, size * 4);
>>
>> v += UNSAFE.getInt(base + j);
>> }
>> }
>>
>>
>> The address shape is (AddP base (CastLL (ConvI2L (LShiftI (AddI ...
>>
>> In this case, the type of the ConvI2L is [min_jint, max_jint] and type
>> of CastLL is [0, max_jint] (the CastLL has a narrower type).
>>
>> I propose transforming (CastLL (ConvI2L into (ConvI2L (CastII in that
>> case. The convI2L and CastII types can be set to [0, max_jint]. The
>> new address shape is then:
>>
>> (AddP base (ConvI2L (CastII (LShiftI (AddI ...
>>
>> which optimize well.
>>
>> (LShiftI (AddI ...
>> is transformed into
>> (AddI (LShiftI ...
>> because one of the AddI input is loop invariant (i2) and we have:
>>
>> (AddP base (ConvI2L (CastII (AddI (LShiftI ...
>>
>> Then because the ConvI2L and CastII types are [0, max_jint], the AddI
>> is pushed through the ConvI2L and CastII:
>>
>> (AddP base (AddL (ConvI2L (CastII (LShiftI ...
>>
>> base and one of the inputs of the AddL are loop invariant so this
>> transformed into:
>>
>> (AddP (AddP ...) (ConvI2L (CastII (LShiftI ...
>>
>> The (AddP ...) is loop invariant so computed before entry. The
>> (ConvI2L ...) only depends on the loop iv.
>>
>> The resulting address is a shift + an add. The address before
>> transformation requires 2 adds + a shift. Also after unrolling, the
>> adress of the second access in the loop is cheaper to compute as it
>> can be derived from the address of the first access.
>>
>> For all of this to work:
>> 1) I added a CastLL::Ideal transformation:
>> (CastLL (ConvI2L into (ConvI2l (CastII
>>
>> 2) I also had to prevent split if to transform (LShiftI (Phi for the
>> iv Phi of a counted loop.
>>
>>
>> test2 and test3 test 1) and 2) separately.
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
> review
Good. I will start testing.
-------------
Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/8555
More information about the hotspot-compiler-dev
mailing list