RFR: 8259609: C2: optimize long range checks in long counted loops
Tobias Hartmann
thartmann at openjdk.java.net
Wed Jan 13 10:47:55 UTC 2021
On Tue, 12 Jan 2021 10:15:01 GMT, Roland Westrelin <roland at openjdk.org> wrote:
> JDK-8255150 makes it possible for java code to explicitly perform a
> range check on long values. JDK-8223051 provides a transformation of
> long counted loops into loop nests with an inner int counted
> loop. With this change I propose transforming long range checks that
> operate on the iv of a long counted loop into range checks that
> operate on the iv of the int inner loop once it has been
> created. Existing range check eliminations can then kick in.
>
> Transformation of range checks is piggy backed on the loop nest
> creation for 2 reasons:
>
> - pattern matching range checks is easier right before the loop nest
> is created
>
> - the number of iterations of the inner loop is adjusted so scale *
> inner_iv doesn't overflow
>
> C2 has logic to delay some split if transformations so they don't
> break the scale * iv + offset pattern. I reused that logic for long
> range checks and had to relax what's considered a range check because
> initially a range check from Object.checkIndex() may include a test
> for range > 0 that needs a round of loop opts to be hoisted. I realize
> there's some code duplication but I didn't see a way to share logic
> between IdealLoopTree::may_have_range_check()
> IdealLoopTree::policy_range_check() that would feel right.
>
> I realize the comment in PhaseIdealLoop::transform_long_range_checks()
> is scary. FWIW, it's not as complicated as it looks. I found drawing
> the range covered by the entire long loop and the range covered by the
> inner loop help see how range checks can be transformed. Then the
> comment helps make sure all cases are covered and verify the generated
> code actually covers all of them.
>
> One issue is overflow. I think the fact that inner_iv * scale doesn't
> overflow helps simplify thing. One possible overflow is that of scale
> * upper + offset which is handled by forcing all range checks in that
> case to deoptimize. I don't think other case of overflow needs special
> handling.
>
> This was tested with a Memory Segment micro benchmark (and patched
> Memory Segment support to take advantage of the new checkIndex
> intrinsic, both provided by Maurizio). Range checks in the micro
> benchmark are properly optimized (and performance increases
> significantly).
Hi Roland,
I haven't looked at the code in detail yet but submitted some quick testing. Unfortunately, there are failures with an internal test that I'm unable to share. Some details:
# Internal Error (src/hotspot/share/utilities/globalDefinitions.hpp:460), pid=27250, tid=27265
# assert(static_cast<T1>(result) == thing) failed: must be
Stack: [0x00007fca9cdfa000,0x00007fca9cefb000], sp=0x00007fca9cef5360, free space=1004k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x12f5a3e] IdealLoopTree::policy_range_check(PhaseIdealLoop*) const [clone .part.0]+0x33e
V [libjvm.so+0x1301e4f] IdealLoopTree::iteration_split_impl(PhaseIdealLoop*, Node_List&) [clone .part.0]+0x6ef
V [libjvm.so+0x13021bb] IdealLoopTree::iteration_split(PhaseIdealLoop*, Node_List&)+0x13b
V [libjvm.so+0x13020b5] IdealLoopTree::iteration_split(PhaseIdealLoop*, Node_List&)+0x35
V [libjvm.so+0x1330633] PhaseIdealLoop::build_and_optimize(LoopOptsMode)+0xc83
V [libjvm.so+0xa1cbdb] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x32b
V [libjvm.so+0xa18636] Compile::Optimize()+0x586
V [libjvm.so+0xa1b370] Compile::Compile(ciEnv*, ciMethod*, int, bool, bool, bool, bool, DirectiveSet*)+0x1840
V [libjvm.so+0x84b1bc] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1dc
V [libjvm.so+0xa2b3a8] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xe88
V [libjvm.so+0xa2bff8] CompileBroker::compiler_thread_loop()+0x5a8
V [libjvm.so+0x18b06c6] JavaThread::thread_main_inner()+0x256
V [libjvm.so+0x18b70d0] Thread::call_run()+0x100
V [libjvm.so+0x1599dd6] thread_native_entry(Thread*)+0x116
Hope that helps.
-------------
PR: https://git.openjdk.java.net/jdk/pull/2045
More information about the hotspot-compiler-dev
mailing list