RFR: 8289996: Fix array range check hoisting for some scaled loop iv
Pengfei Li
pli at openjdk.org
Sun Jul 17 08:32:01 UTC 2022
On Sun, 17 Jul 2022 06:54:30 GMT, Quan Anh Mai <duke at openjdk.org> wrote:
>> Recently we found some array range checks in loops are not hoisted by
>> C2's loop predication phase as expected. Below is a typical case.
>>
>> for (int i = 0; i < size; i++) {
>> b[3 * i] = a[3 * i];
>> }
>>
>> Ideally, C2 can hoist the range check of an array access in loop if the
>> array index is a linear function of the loop's induction variable (iv).
>> Say, range check in `arr[exp]` can be hoisted if
>>
>> exp = k1 * iv + k2 + inv
>>
>> where `k1` and `k2` are compile-time constants, and `inv` is an optional
>> loop invariant. But in above case, C2 igvn does some strength reduction
>> on the `MulINode` used to compute `3 * i`. It results in the linear index
>> expression not being recognized. So far we found 2 ideal transformations
>> that may affect linear expression recognition. They are
>>
>> - `k * iv` --> `iv << m + iv << n` if k is the sum of 2 pow-of-2 values
>> - `k * iv` --> `iv << m - iv` if k+1 is a pow-of-2 value
>>
>> To avoid range check hoisting and further optimizations being broken, we
>> have tried improving the linear recognition. But after some experiments,
>> we found complex and recursive pattern match does not always work well.
>> In this patch we propose to defer these 2 ideal transformations to the
>> phase of post loop igvn. In other words, these 2 strength reductions can
>> only be done after all loop optimizations are over.
>>
>> Tested hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1.
>> We also tested the performance via JMH and see obvious improvement.
>>
>> Benchmark Improvement
>> RangeCheckHoisting.ivScaled3 +21.2%
>> RangeCheckHoisting.ivScaled7 +6.6%
>
> src/hotspot/share/opto/mulnode.cpp line 277:
>
>> 275: phase->C->record_for_post_loop_opts_igvn(this);
>> 276: return NULL;
>> 277: }
>
> This defers the whole idealisation, should we only skip this particular transformation instead? Also should this be applied to `MulLNode`, too? Thanks.
I think this only defers the transformation if scale value is a constant and has exactly 2 bits set in binary. Could you elaborate how to make it more particular?
AFAIK, all normal Java array accesses are using 32-bit indices. When running on a 64-bit platform, we use a `ConvI2L` in element address computing but it's done after whole index expression computing. Is there any special array accesses that may use `MulL`?
-------------
PR: https://git.openjdk.org/jdk/pull/9508
More information about the hotspot-compiler-dev
mailing list