RFR: 8286197: C2: Optimize MemorySegment shape in int loop [v2]

Fri May 6 16:16:05 UTC 2022

On Fri, 6 May 2022 09:18:27 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> This is another small enhancement for a code shape that showed up in a
>> MemorySegment micro benchmark. The shape to optimize is the one from test1:
>> 
>> 
>> for (int i = 0; i < size; i++) {
>>     long j = i * UNSAFE.ARRAY_INT_INDEX_SCALE;
>> 
>>     j = Objects.checkIndex(j, size * 4);
>> 
>>     if (((base + j) & 3) != 0) {
>>         throw new RuntimeException();
>>     }
>> 
>>     v += UNSAFE.getInt(base + j);
>> }
>> 
>> 
>> In that code shape, the loop iv is first scaled, result is then casted
>> to long, range checked and finally address of memory location is
>> computed.
>> 
>> The alignment check is transformed so the loop body has no check In
>> order to eliminate the range check, that loop is transformed into:
>> 
>> 
>> for (int i1 = ..) {
>>     for (int i2 = ..) {
>>         long j = (i1 + i2) * UNSAFE.ARRAY_INT_INDEX_SCALE;
>> 
>>         j = Objects.checkIndex(j, size * 4);
>> 
>>         v += UNSAFE.getInt(base + j);
>>     }
>> }
>> 
>> 
>> The address shape is (AddP base (CastLL (ConvI2L (LShiftI (AddI ...
>> 
>> In this case, the type of the ConvI2L is [min_jint, max_jint] and type
>> of CastLL is [0, max_jint] (the CastLL has a narrower type).
>> 
>> I propose transforming (CastLL (ConvI2L into (ConvI2L (CastII in that
>> case. The convI2L and CastII types can be set to [0, max_jint]. The
>> new address shape is then:
>> 
>> (AddP base (ConvI2L (CastII (LShiftI (AddI ...
>> 
>> which optimize well.
>> 
>> (LShiftI (AddI ...
>> is transformed into
>> (AddI (LShiftI ...
>> because one of the AddI input is loop invariant (i2) and we have:
>> 
>> (AddP base (ConvI2L (CastII (AddI (LShiftI ...
>> 
>> Then because the ConvI2L and CastII types are [0, max_jint], the AddI
>> is pushed through the ConvI2L and CastII:
>> 
>> (AddP base (AddL (ConvI2L (CastII (LShiftI ...
>> 
>> base and one of the inputs of the AddL are loop invariant so this
>> transformed into:
>> 
>> (AddP (AddP ...) (ConvI2L (CastII (LShiftI ...
>> 
>> The (AddP ...) is loop invariant so computed before entry. The
>> (ConvI2L ...) only depends on the loop iv.
>> 
>> The resulting address is a shift + an add. The address before
>> transformation requires 2 adds + a shift. Also after unrolling, the
>> adress of the second access in the loop is cheaper to compute as it
>> can be derived from the address of the first access.
>> 
>> For all of this to work:
>> 1) I added a CastLL::Ideal transformation:
>> (CastLL (ConvI2L into (ConvI2l (CastII
>> 
>> 2) I also had to prevent split if to transform (LShiftI (Phi for the
>> iv Phi of a counted loop.
>> 
>> 
>> test2 and test3 test 1) and 2) separately.
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
> 
>   review

Good. I will start testing.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/8555