RFR: 8289996: Fix array range check hoisting for some scaled loop iv [v2]
John R Rose
jrose at openjdk.org
Wed Jul 20 22:40:02 UTC 2022
On Tue, 19 Jul 2022 02:06:17 GMT, Pengfei Li <pli at openjdk.org> wrote:
>> Recently we found some array range checks in loops are not hoisted by
>> C2's loop predication phase as expected. Below is a typical case.
>>
>> for (int i = 0; i < size; i++) {
>> b[3 * i] = a[3 * i];
>> }
>>
>> Ideally, C2 can hoist the range check of an array access in loop if the
>> array index is a linear function of the loop's induction variable (iv).
>> Say, range check in `arr[exp]` can be hoisted if
>>
>> exp = k1 * iv + k2 + inv
>>
>> where `k1` and `k2` are compile-time constants, and `inv` is an optional
>> loop invariant. But in above case, C2 igvn does some strength reduction
>> on the `MulINode` used to compute `3 * i`. It results in the linear index
>> expression not being recognized. So far we found 2 ideal transformations
>> that may affect linear expression recognition. They are
>>
>> - `k * iv` --> `iv << m + iv << n` if k is the sum of 2 pow-of-2 values
>> - `k * iv` --> `iv << m - iv` if k+1 is a pow-of-2 value
>>
>> To avoid range check hoisting and further optimizations being broken, we
>> have tried improving the linear recognition. But after some experiments,
>> we found complex and recursive pattern match does not always work well.
>> In this patch we propose to defer these 2 ideal transformations to the
>> phase of post loop igvn. In other words, these 2 strength reductions can
>> only be done after all loop optimizations are over.
>>
>> Tested hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1.
>> We also tested the performance via JMH and see obvious improvement.
>>
>> Benchmark Improvement
>> RangeCheckHoisting.ivScaled3 +21.2%
>> RangeCheckHoisting.ivScaled7 +6.6%
>
> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision:
>
> Address comment from merykitty
src/hotspot/share/opto/mulnode.cpp line 369:
> 367: Node *res = NULL;
> 368: julong bit1 = abs_con & (0-abs_con); // Extract low bit
> 369: if (bit1 == abs_con) { // Found a power of 2?
May I suggest, since you are working on this code, maybe factoring out the idiom `x & -x`? (Sadly, I think there may be multiple opinions about exactly where to name this idiom, which might prevent us from starting the fix. If that's the case I suggest a follow-up bug.)
diff --git a/src/hotspot/share/utilities/powerOfTwo.hpp b/src/hotspot/share/utilities/powerOfTwo.hpp
index a98b81e8037..83713d373ee 100644
--- a/src/hotspot/share/utilities/powerOfTwo.hpp
+++ b/src/hotspot/share/utilities/powerOfTwo.hpp
@@ -119,4 +119,14 @@ inline T next_power_of_2(T value) {
return round_up_power_of_2(value + 1);
}
+// Return the largest power of two that is a submultiple of the given value.
+// This is the same as the numeric value of the least-significant set bit.
+// For unsigned values, it replaces the old trick of (value & -value).
+// precondition: value > 0.
+template<typename T, ENABLE_IF(std::is_integral<T>::value)>
+inline T submultiple_power_of_2(T value) {
+ assert(value > 0, "Invalid value");
+ return value & -value;
+}
+
#endif // SHARE_UTILITIES_POWEROFTWO_HPP
src/hotspot/share/opto/mulnode.cpp line 376:
> 374: bit2 = bit2 & (0-bit2); // Extract 2nd bit
> 375: if (bit2 + bit1 == abs_con) { // Found all bits in con?
> 376: if (!phase->C->post_loop_opts_phase()) {
Thanks or adding the MulL case. Nowadays the range check optimizations don't apply only to arrays, and apply to both int and long indexes, mainly because of Panama.
-------------
PR: https://git.openjdk.org/jdk/pull/9508
More information about the hotspot-compiler-dev
mailing list