RFR: 8286197: C2: Optimize MemorySegment shape in int loop [v2]
Tobias Hartmann
thartmann at openjdk.java.net
Tue Jun 7 09:24:10 UTC 2022
On Fri, 6 May 2022 09:18:27 GMT, Roland Westrelin <roland at openjdk.org> wrote:
>> This is another small enhancement for a code shape that showed up in a
>> MemorySegment micro benchmark. The shape to optimize is the one from test1:
>>
>>
>> for (int i = 0; i < size; i++) {
>> long j = i * UNSAFE.ARRAY_INT_INDEX_SCALE;
>>
>> j = Objects.checkIndex(j, size * 4);
>>
>> if (((base + j) & 3) != 0) {
>> throw new RuntimeException();
>> }
>>
>> v += UNSAFE.getInt(base + j);
>> }
>>
>>
>> In that code shape, the loop iv is first scaled, result is then casted
>> to long, range checked and finally address of memory location is
>> computed.
>>
>> The alignment check is transformed so the loop body has no check In
>> order to eliminate the range check, that loop is transformed into:
>>
>>
>> for (int i1 = ..) {
>> for (int i2 = ..) {
>> long j = (i1 + i2) * UNSAFE.ARRAY_INT_INDEX_SCALE;
>>
>> j = Objects.checkIndex(j, size * 4);
>>
>> v += UNSAFE.getInt(base + j);
>> }
>> }
>>
>>
>> The address shape is (AddP base (CastLL (ConvI2L (LShiftI (AddI ...
>>
>> In this case, the type of the ConvI2L is [min_jint, max_jint] and type
>> of CastLL is [0, max_jint] (the CastLL has a narrower type).
>>
>> I propose transforming (CastLL (ConvI2L into (ConvI2L (CastII in that
>> case. The convI2L and CastII types can be set to [0, max_jint]. The
>> new address shape is then:
>>
>> (AddP base (ConvI2L (CastII (LShiftI (AddI ...
>>
>> which optimize well.
>>
>> (LShiftI (AddI ...
>> is transformed into
>> (AddI (LShiftI ...
>> because one of the AddI input is loop invariant (i2) and we have:
>>
>> (AddP base (ConvI2L (CastII (AddI (LShiftI ...
>>
>> Then because the ConvI2L and CastII types are [0, max_jint], the AddI
>> is pushed through the ConvI2L and CastII:
>>
>> (AddP base (AddL (ConvI2L (CastII (LShiftI ...
>>
>> base and one of the inputs of the AddL are loop invariant so this
>> transformed into:
>>
>> (AddP (AddP ...) (ConvI2L (CastII (LShiftI ...
>>
>> The (AddP ...) is loop invariant so computed before entry. The
>> (ConvI2L ...) only depends on the loop iv.
>>
>> The resulting address is a shift + an add. The address before
>> transformation requires 2 adds + a shift. Also after unrolling, the
>> adress of the second access in the loop is cheaper to compute as it
>> can be derived from the address of the first access.
>>
>> For all of this to work:
>> 1) I added a CastLL::Ideal transformation:
>> (CastLL (ConvI2L into (ConvI2l (CastII
>>
>> 2) I also had to prevent split if to transform (LShiftI (Phi for the
>> iv Phi of a counted loop.
>>
>>
>> test2 and test3 test 1) and 2) separately.
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
> review
Looks good to me but I'm wondering if we should delay this to JDK 20 as we are late for the JDK 19 release and had many issues with Cast/ConvNodes before.
src/hotspot/share/opto/castnode.cpp line 374:
> 372: #endif
> 373:
> 374: Node *CastLLNode::Ideal(PhaseGVN *phase, bool can_reshape) {
Suggestion:
Node* CastLLNode::Ideal(PhaseGVN* phase, bool can_reshape) {
src/hotspot/share/opto/castnode.hpp line 121:
> 119: }
> 120:
> 121: virtual Node *Ideal(PhaseGVN *phase, bool can_reshape);
Suggestion:
virtual Node* Ideal(PhaseGVN* phase, bool can_reshape);
test/hotspot/jtreg/compiler/c2/irTests/TestConvI2LCastLongLoop.java line 34:
> 32: /*
> 33: * @test
> 34: * @bug 8286197
Suggestion:
* @bug 8286197
* @key randomness
-------------
Marked as reviewed by thartmann (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/8555
More information about the hotspot-compiler-dev
mailing list