RFR: 8276116: C2: optimize long range checks in int counted loops [v3]
Vladimir Kozlov
kvn at openjdk.java.net
Mon Dec 6 21:25:13 UTC 2021
On Mon, 29 Nov 2021 10:26:40 GMT, Roland Westrelin <roland at openjdk.org> wrote:
>> Maurizio noticed that some of his panama micro benchmarks don't
>> perform better with 8259609 (C2: optimize long range checks in long
>> counted loops). The reason is that 8259609 optimizes long range checks
>> in long counted loops but some of his benchmarks include long range
>> checks in int counted loops:
>>
>> for (int i = start; i < stop; i += inc) {
>>     Objects.checkIndex(scale * ((long)i) + offset, length);
>> }
>>
>> This change applies the transformation from 8259609 for long counted
>> loop/long range checks to int counted loop/long range checks. That
>> includes creating a loop nest and transforming the long range check to
>> an int range check that's subject to range elimination in the inner
>> loop.
>>
>> The reason it's required to create a loop nest is that the long range
>> check transformation logic depends on no overflow of scale * i for the
>> range of values that the transformed range check is applied to.
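A source-level analogy may help picture why the nest matters. This is only a hand-written sketch, not what C2 emits: the class name, method names, and the `CHUNK` trip-count limit are hypothetical, and it assumes `inc > 0`, `scale > 0`, and an `int` scale. The outer loop computes a 64-bit base once per chunk; inside the chunk, `scale * inc * j` is small enough to stay in `int`, so the long range check can be replaced by an int range check against a clamped limit.

```java
import java.util.Objects;

class LoopNestSketch {
    // Reference shape: one long range check per iteration of an int counted loop.
    static long checkedSum(int start, int stop, int inc, int scale, long offset, long length) {
        long sum = 0;
        for (int i = start; i < stop; i += inc) {
            sum += Objects.checkIndex(scale * ((long) i) + offset, length);
        }
        return sum;
    }

    // Hand-written loop nest: the inner loop only performs an int range check.
    static long nestedSum(int start, int stop, int inc, int scale, long offset, long length) {
        final int CHUNK = 1024; // hypothetical limit on inner loop iterations
        if (inc <= 0 || scale <= 0 || (long) scale * inc * CHUNK > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("sketch assumes small positive inc and scale");
        }
        final int step = scale * inc; // fits in an int because of the guard above
        long sum = 0;
        for (long outer = start; outer < stop; outer += (long) inc * CHUNK) {
            // 64-bit part of the index, computed once per chunk.
            long base = scale * outer + offset;
            // Number of iterations of the original loop covered by this chunk.
            long remaining = (stop - outer + inc - 1) / inc;
            int n = (int) Math.min(CHUNK, remaining);
            // 0 <= base + step * j < length  becomes  0 <= step * j < innerLen.
            // A negative base fails on the first iteration, so the limit is 0 then.
            int innerLen = base < 0
                    ? 0
                    : (int) Math.max(0L, Math.min(length - base, (long) Integer.MAX_VALUE));
            for (int j = 0; j < n; j++) {
                // step * j cannot overflow: j < CHUNK and step * CHUNK fits in an int.
                int idx = Objects.checkIndex(step * j, innerLen);
                sum += base + idx;
            }
        }
        return sum;
    }
}
```

Both methods visit the same indices and fail at the same iteration when an index is out of bounds; only the exception message differs, because the nested version reports the chunk-relative index.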
>>
>> As a consequence, this change is mostly refactoring to make the loop
>> nest creation and range check transformation parameterized by the type
>> of the transformed loop.
>>
>> I think this transformation needs to be applied as late as possible
>> but, in the case of an int counted loop, before pre/main/post loops
>> are created. I had to move it to IdealLoopTree::iteration_split_impl()
>> because of that.
>>
>> There's an alternate shape for a long range check in an int counted
>> loop that Maurizio insisted needs to be supported:
>>
>> for (int i = start; i < stop; i += inc) {
>>     Objects.checkIndex(((long)(scale * i)) + offset, length);
>> }
>>
>> scale * i can overflow in that case. This is also supported but as a
>> corner case of the previous one. The code in
>> PhaseIdealLoop::transform_long_range_checks() has a comment about
>> that.
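To make the difference between the two shapes concrete, here is a minimal sketch assuming `scale` and `i` are both `int`. The class name, method names, and the input values are hypothetical and chosen only to force a 32-bit overflow. In the first shape the multiplication happens in 64 bits; in the alternate shape it happens in 32 bits and wraps before the result is widened.

```java
class ScaleOverflowDemo {
    // First shape: widen i before multiplying, so the product is computed as a long.
    static long widenThenMultiply(int scale, int i, long offset) {
        return scale * ((long) i) + offset;
    }

    // Alternate shape: multiply as int (may wrap around), then widen the result.
    static long multiplyThenWiden(int scale, int i, long offset) {
        return ((long) (scale * i)) + offset;
    }

    public static void main(String[] args) {
        int scale = 4;
        int i = 1 << 30; // 2^30, so scale * i = 2^32 does not fit in an int
        System.out.println(widenThenMultiply(scale, i, 0L)); // prints 4294967296
        System.out.println(multiplyThenWiden(scale, i, 0L)); // prints 0
    }
}
```

The two expressions agree whenever `scale * i` fits in an `int`, which is consistent with the alternate shape being handled as a corner case of the first one.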
>>
>> Note also that this transformation works best if loop strip mining is
>> enabled (that is for G1, ZGC, Shenandoah by default). The reason is
>> that it needs a safepoint and when loop strip mining is enabled, the
>> outer loop contains one that's always available. A way to have this
>> work as well for all GCs would be to always construct the loop strip
>> mining loop nest (whether loop strip mining is enabled or not) and
>> then only once loop opts are over remove the outer loop when loop
>> strip mining is disabled. I'm looking for feedback on this.
>>
>> BTW, something doesn't seem right in IdealLoopTree::iteration_split_impl():
>>
>> https://github.com/rwestrel/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L3475
>>
>> should_peel causes transformations to be skipped but peeling is never
>> applied AFAICT. Does it make sense to anyone?
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
> test fix
In general looks good to me.
src/hotspot/cpu/x86/x86_32.ad line 13132:
> 13130: %}
> 13131:
> 13132: instruct cmovLL_reg_LTGE_U(cmpOpU cmp, flagsReg_ulong_LTGE flags, eRegL dst, eRegL src) %{
How is this related to these changes? It seems like an addition to the [8277324](https://github.com/openjdk/jdk/pull/6427) changes and could be pushed separately.
-------------
PR: https://git.openjdk.java.net/jdk/pull/6576