RFR: 8259609: C2: optimize long range checks in long counted loops

Tobias Hartmann thartmann at openjdk.java.net
Wed Jan 13 10:47:55 UTC 2021


On Tue, 12 Jan 2021 10:15:01 GMT, Roland Westrelin <roland at openjdk.org> wrote:

> JDK-8255150 makes it possible for java code to explicitly perform a
> range check on long values. JDK-8223051 provides a transformation of
> long counted loops into loop nests with an inner int counted
> loop. With this change I propose transforming long range checks that
> operate on the iv of a long counted loop into range checks that
> operate on the iv of the int inner loop once it has been
> created. Existing range check eliminations can then kick in.
> 
> Transformation of range checks is piggy backed on the loop nest
> creation for 2 reasons:
> 
> - pattern matching range checks is easier right before the loop nest
>   is created
> 
> - the number of iterations of the inner loop is adjusted so scale *
>   inner_iv doesn't overflow
> 
> C2 has logic to delay some split if transformations so they don't
> break the scale * iv + offset pattern. I reused that logic for long
> range checks and had to relax what's considered a range check because
> initially a range check from Object.checkIndex() may include a test
> for range > 0 that needs a round of loop opts to be hoisted. I realize
> there's some code duplication but I didn't see a way to share logic
> between IdealLoopTree::may_have_range_check()
> IdealLoopTree::policy_range_check() that would feel right.
> 
> I realize the comment in PhaseIdealLoop::transform_long_range_checks()
> is scary. FWIW, it's not as complicated as it looks. I found drawing
> the range covered by the entire long loop and the range covered by the
> inner loop help see how range checks can be transformed. Then the
> comment helps make sure all cases are covered and verify the generated
> code actually covers all of them.
> 
> One issue is overflow. I think the fact that inner_iv * scale doesn't
> overflow helps simplify thing. One possible overflow is that of scale
> * upper + offset which is handled by forcing all range checks in that
> case to deoptimize. I don't think other case of overflow needs special
> handling.
> 
> This was tested with a Memory Segment micro benchmark (and patched
> Memory Segment support to take advantage of the new checkIndex
> intrinsic, both provided by Maurizio). Range checks in the micro
> benchmark are properly optimized (and performance increases
> significantly).

Hi Roland,

I haven't looked at the code in detail yet but submitted some quick testing. Unfortunately, there are failures with an internal test that I'm unable to share. Some details:

#  Internal Error (src/hotspot/share/utilities/globalDefinitions.hpp:460), pid=27250, tid=27265
#  assert(static_cast<T1>(result) == thing) failed: must be

Stack: [0x00007fca9cdfa000,0x00007fca9cefb000],  sp=0x00007fca9cef5360,  free space=1004k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x12f5a3e]  IdealLoopTree::policy_range_check(PhaseIdealLoop*) const [clone .part.0]+0x33e
V  [libjvm.so+0x1301e4f]  IdealLoopTree::iteration_split_impl(PhaseIdealLoop*, Node_List&) [clone .part.0]+0x6ef
V  [libjvm.so+0x13021bb]  IdealLoopTree::iteration_split(PhaseIdealLoop*, Node_List&)+0x13b
V  [libjvm.so+0x13020b5]  IdealLoopTree::iteration_split(PhaseIdealLoop*, Node_List&)+0x35
V  [libjvm.so+0x1330633]  PhaseIdealLoop::build_and_optimize(LoopOptsMode)+0xc83
V  [libjvm.so+0xa1cbdb]  PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x32b
V  [libjvm.so+0xa18636]  Compile::Optimize()+0x586
V  [libjvm.so+0xa1b370]  Compile::Compile(ciEnv*, ciMethod*, int, bool, bool, bool, bool, DirectiveSet*)+0x1840
V  [libjvm.so+0x84b1bc]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1dc
V  [libjvm.so+0xa2b3a8]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0xe88
V  [libjvm.so+0xa2bff8]  CompileBroker::compiler_thread_loop()+0x5a8
V  [libjvm.so+0x18b06c6]  JavaThread::thread_main_inner()+0x256
V  [libjvm.so+0x18b70d0]  Thread::call_run()+0x100
V  [libjvm.so+0x1599dd6]  thread_native_entry(Thread*)+0x116

Hope that helps.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2045


More information about the hotspot-compiler-dev mailing list