RFR: 8289996: Fix array range check hoisting for some scaled loop iv

Sun Jul 17 08:32:01 UTC 2022

On Sun, 17 Jul 2022 06:54:30 GMT, Quan Anh Mai <duke at openjdk.org> wrote:

>> Recently we found some array range checks in loops are not hoisted by
>> C2's loop predication phase as expected. Below is a typical case.
>> 
>>   for (int i = 0; i < size; i++) {
>>     b[3 * i] = a[3 * i];
>>   }
>> 
>> Ideally, C2 can hoist the range check of an array access in loop if the
>> array index is a linear function of the loop's induction variable (iv).
>> Say, range check in `arr[exp]` can be hoisted if
>> 
>>   exp = k1 * iv + k2 + inv
>> 
>> where `k1` and `k2` are compile-time constants, and `inv` is an optional
>> loop invariant. But in above case, C2 igvn does some strength reduction
>> on the `MulINode` used to compute `3 * i`. It results in the linear index
>> expression not being recognized. So far we found 2 ideal transformations
>> that may affect linear expression recognition. They are
>> 
>> - `k * iv` --> `iv << m + iv << n` if k is the sum of 2 pow-of-2 values
>> - `k * iv` --> `iv << m - iv` if k+1 is a pow-of-2 value
>> 
>> To avoid range check hoisting and further optimizations being broken, we
>> have tried improving the linear recognition. But after some experiments,
>> we found complex and recursive pattern match does not always work well.
>> In this patch we propose to defer these 2 ideal transformations to the
>> phase of post loop igvn. In other words, these 2 strength reductions can
>> only be done after all loop optimizations are over.
>> 
>> Tested hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1.
>> We also tested the performance via JMH and see obvious improvement.
>> 
>> Benchmark                        Improvement
>> RangeCheckHoisting.ivScaled3          +21.2%
>> RangeCheckHoisting.ivScaled7           +6.6%
>
> src/hotspot/share/opto/mulnode.cpp line 277:
> 
>> 275:         phase->C->record_for_post_loop_opts_igvn(this);
>> 276:         return NULL;
>> 277:       }
> 
> This defers the whole idealisation, should we only skip this particular transformation instead? Also should this be applied to `MulLNode`, too? Thanks.

I think this only defers the transformation if scale value is a constant and has exactly 2 bits set in binary. Could you elaborate how to make it more particular?

AFAIK, all normal Java array accesses are using 32-bit indices. When running on a 64-bit platform, we use a `ConvI2L` in element address computing but it's done after whole index expression computing. Is there any special array accesses that may use `MulL`?

-------------

PR: https://git.openjdk.org/jdk/pull/9508