RFR: 8259609: C2: optimize long range checks in long counted loops [v12]
Tobias Hartmann
thartmann at openjdk.java.net
Fri Oct 22 07:07:08 UTC 2021
On Thu, 21 Oct 2021 06:50:41 GMT, Roland Westrelin <roland at openjdk.org> wrote:
>> JDK-8255150 makes it possible for Java code to explicitly perform a
>> range check on long values. JDK-8223051 provides a transformation of
>> long counted loops into loop nests with an inner int counted
>> loop. With this change I propose transforming long range checks that
>> operate on the iv of a long counted loop into range checks that
>> operate on the iv of the int inner loop once it has been
>> created. Existing range check eliminations can then kick in.
>>
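
For readers following the description above, here is a rough source-level sketch of the idea in Java. It is only an illustration, not what C2 actually emits: the class name, method names, and the MAX_INNER_ITERS cap are hypothetical, and the sketch assumes 0 <= start <= stop and scale > 0 so that none of the arithmetic wraps.

```java
import java.util.Objects;

class LoopNestSketch {
    // Hypothetical cap on the inner loop trip count (an assumption for this sketch).
    static final int MAX_INNER_ITERS = 1 << 20;

    // Original shape: a long counted loop with a long range check per iteration.
    static long original(long start, long stop, long scale, long offset, long range) {
        long sum = 0;
        for (long i = start; i < stop; i++) {
            sum += Objects.checkIndex(scale * i + offset, range);
        }
        return sum;
    }

    // Loop nest shape: the outer loop advances a long base, the inner loop
    // runs on an int induction variable, and the range check is expressed
    // in terms of that int variable.
    static long loopNest(long start, long stop, long scale, long offset, long range) {
        long sum = 0;
        long outer = start;
        while (outer < stop) {
            int innerLimit = (int) Math.min(stop - outer, (long) MAX_INNER_ITERS);
            long base = scale * outer + offset; // loop invariant for the inner loop
            for (int j = 0; j < innerLimit; j++) {
                sum += Objects.checkIndex(base + scale * j, range);
            }
            outer += innerLimit;
        }
        return sum;
    }
}
```

Once the check depends only on the int variable j plus a loop-invariant base, it has the scale * iv + offset form that the existing int range check elimination looks for.
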
>> Transformation of range checks is piggybacked on the loop nest
>> creation for 2 reasons:
>>
>> - pattern matching range checks is easier right before the loop nest
>> is created
>>
>> - the number of iterations of the inner loop is adjusted so scale *
>> inner_iv doesn't overflow
>>
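
The second point can be illustrated with a small Java sketch. It assumes the product scale * inner_iv is computed in int arithmetic and picks the simplest possible cap; the formula C2 actually uses is not given in this thread, so the helper name and the computation are hypothetical.

```java
class InnerTripCountSketch {
    // Largest trip count n such that scale * j stays within int range
    // for every inner induction value j in [0, n).
    static int maxInnerIterations(int scale) {
        if (scale == 0) {
            return Integer.MAX_VALUE; // the product is always 0
        }
        long absScale = Math.abs((long) scale); // widen first: abs(Integer.MIN_VALUE) overflows as an int
        return (int) (Integer.MAX_VALUE / absScale);
    }
}
```

With such a cap in place, the inner loop can use the product directly without a per-iteration overflow check.
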
>> C2 has logic to delay some split-if transformations so they don't
>> break the scale * iv + offset pattern. I reused that logic for long
>> range checks and had to relax what's considered a range check because
>> initially a range check from Object.checkIndex() may include a test
>> for range > 0 that needs a round of loop opts to be hoisted. I realize
>> there's some code duplication but I didn't see a way to share logic
>> between IdealLoopTree::may_have_range_check() and
>> IdealLoopTree::policy_range_check() that would feel right.
>>
>> I realize the comment in PhaseIdealLoop::transform_long_range_checks()
>> is scary. FWIW, it's not as complicated as it looks. I found drawing
>> the range covered by the entire long loop and the range covered by the
>> inner loop helps to see how range checks can be transformed. Then the
>> comment helps make sure all cases are covered and verify the generated
>> code actually covers all of them.
>>
>> One issue is overflow. I think the fact that inner_iv * scale doesn't
>> overflow helps simplify things. One possible overflow is that of scale
>> * upper + offset which is handled by forcing all range checks in that
>> case to deoptimize. I don't think other cases of overflow need special
>> handling.
>>
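
The overflow handling described above could be modelled in Java roughly as follows. This is a sketch under stated assumptions, not the code in the patch: the class and method names are hypothetical, a false return value stands in for the deoptimizing path, and the sketch assumes scale > 0 and 0 <= i <= upper.

```java
class OverflowGuardSketch {
    // True when scale * upper + offset cannot be represented in a long.
    static boolean boundOverflows(long scale, long upper, long offset) {
        try {
            Math.addExact(Math.multiplyExact(scale, upper), offset);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // Models one guarded range check: true means the check passes,
    // false models the failing (deoptimizing) path.
    static boolean checkPasses(long scale, long i, long offset, long range, long upper) {
        if (boundOverflows(scale, upper, offset)) {
            return false; // overflow at the upper bound: every check is forced to fail
        }
        long index = scale * i + offset;
        return index >= 0 && index < range;
    }
}
```

Forcing the failing path when the bound overflows is conservative: the loop falls back to the slow path rather than reasoning about wrapped values.
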
>> This was tested with a Memory Segment micro benchmark (and patched
>> Memory Segment support to take advantage of the new checkIndex
>> intrinsic, both provided by Maurizio). Range checks in the micro
>> benchmark are properly optimized (and performance increases
>> significantly).
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
> build fix
Looks good to me overall but I did not verify the `transform_long_range_checks` logic in detail.
I gave this a good amount of testing in our infra (tier 1-6). All green.
src/hotspot/share/opto/loopnode.hpp line 1657:
> 1655: void try_sink_out_of_loop(Node* n);
> 1656:
> 1657: Node* clamp(Node* pNode, Node* pNode1, Node* pNode2);
Argument naming is not consistent with the implementation.
-------------
Marked as reviewed by thartmann (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/2045