RFR: 8293491: Avoid unexpected deoptimization in loop exit due to incorrect branch profiling
Tobias Hartmann
thartmann at openjdk.org
Thu Sep 8 10:58:59 UTC 2022
On Wed, 7 Sep 2022 15:06:18 GMT, Jie Fu <jiefu at openjdk.org> wrote:
> Hi all,
>
> Please review this patch which fixes the unexpected deoptimizations in loop exit due to incorrect branch profiling.
>
> # Background
>
> While analyzing our big data Apps, we observed unexpected deoptimizations in loop exit due to incorrect branch profiling.
>
> Here is a reproducer.
>
> public class UnexpectedLoopExitDeopt {
> public static final int N = 20000000;
>
> public static int d1[] = new int[N];
> public static int d2[] = new int[N];
>
> public static void main(String[] args) {
> System.out.println(test(d1));
> System.out.println(test(d2));
> }
>
> public static int test(int[] a) {
> int sum = 0;
> for(int i = 0; i < a.length; i++) {
> sum += a[i];
> }
> return sum;
> }
> }
>
>
> The following is the compilation sequence.
>
> 77 1 3 java.lang.Object::<init> (1 bytes)
> 83 2 3 java.lang.String::isLatin1 (19 bytes)
> 84 6 3 jdk.internal.util.Preconditions::checkIndex (18 bytes)
> 84 3 3 java.lang.String::charAt (25 bytes)
> 85 4 3 java.lang.StringLatin1::charAt (15 bytes)
> 86 7 3 java.lang.String::coder (15 bytes)
> 86 8 3 java.lang.String::hashCode (60 bytes)
> 87 5 3 java.lang.String::checkIndex (10 bytes)
> 87 9 3 java.lang.String::length (11 bytes)
> 93 10 n 0 java.lang.invoke.MethodHandle::linkToStatic(LLLLLLL)L (native) (static)
> 96 11 n 0 java.lang.invoke.MethodHandle::linkToSpecial(LLLL)L (native) (static)
> 96 12 n 0 java.lang.Object::hashCode (native)
> 97 13 n 0 java.lang.invoke.MethodHandle::invokeBasic(LLLLLL)L (native)
> 98 14 3 java.util.Objects::requireNonNull (14 bytes)
> 98 15 n 0 java.lang.invoke.MethodHandle::linkToSpecial(LLLLLLLL)L (native) (static)
> 98 16 1 java.lang.Enum::ordinal (5 bytes)
> 101 17 n 0 java.lang.invoke.MethodHandle::linkToSpecial(LLLL)V (native) (static)
> 102 18 n 0 java.lang.invoke.MethodHandle::invokeBasic(LL)L (native)
> 212 19 % 3 UnexpectedLoopExitDeopt::test @ 4 (24 bytes)
> 213 20 % 4 UnexpectedLoopExitDeopt::test @ 4 (24 bytes)
> 221 19 % 3 UnexpectedLoopExitDeopt::test @ 4 (24 bytes) made not entrant
> 221 21 4 UnexpectedLoopExitDeopt::test (24 bytes)
> 230 20 % 4 UnexpectedLoopExitDeopt::test @ 4 (24 bytes) made not entrant <--- Unexpected deopt
> 0
> 242 21 4 UnexpectedLoopExitDeopt::test (24 bytes) made not entrant <--- Unexpected deopt
> 0
>
>
> The last two deopts (made not entrant) happened in the loop exit which are unexpected.
>
>
> # Reason
>
> The unexpected deopts were caused by the incorrect branch profiling count (0 taken count for loop predicate).
>
> Here is the profiling data for `UnexpectedLoopExitDeopt::test`.
> We can see that for `if_icmpge` @ bci=7, the count for `not taken` is 264957, while 0 for `taken`.
> The profile count for zero taken is obvious incorrect since the loop will finally exit (when `i >= a.length`).
> So the taken count should be at least 1 for `if_icmpge` @ bci=7.
>
> 0 iconst_0
> 1 istore_1
> 2 iconst_0
> 3 istore_2
>
> 4 iload_2
> 5 fast_aload_0
> 6 arraylength
> 7 if_icmpge 22
> 0 bci: 7 BranchData taken(0) displacement(56)
> not taken(264957)
>
> 10 iload_1
> 11 fast_aload_0
> 12 iload_2
> 13 iaload
> 14 iadd
> 15 istore_1
> 16 iinc #2 1
> 19 goto 4
> 32 bci: 19 JumpData taken(266667) displacement(-32)
>
> 22 iload_1
> 23 ireturn
>
>
> # Fix
>
> The main idea is to detect if the branch taken target is a loop exit.
> If so, set the taken count to be at least 1.
> This is fine because most loops should be finite and would execute the loop exit code at lease once.
> For infinite loops like `while (true) {...}`, the patch won't change the original behaviour since there is no loop exit.
>
> # Testing
>
> tier1~3 on Linux/x64, no regression
>
> Thanks.
> Best regards,
> Jie
Do these deoptimizations really affect performance of your program or did you just spot them when looking at the logs?
Such surprising deopts are actually expected with optimistic, profile guided optimizations and happen in many other scenarios as well. They are usually harmless. Also, the profile information is not necessarily incorrect but might just be outdated because we stop profiling once we reach C2. Marking all loop exits as taken seems hacky and might have unexpected side effects.
Also, wouldn't C2 still insert a `Deoptimization::Reason_unreached` or `Deoptimization::Reason_unstable_if` trap for subsequent instructions after the loop exit for which profiling also suggests that they were never executed?
src/hotspot/share/ci/ciMethodBlocks.cpp line 166:
> 164: if (dest_bci < bci) {
> 165: next_block->set_is_loop_exit();
> 166: }
I think this loop detection logic is both wrong and incomplete. For example, javac generates no `goto` for the following loop:
int i = 0;
do {
} while (i++ < 10);
And in the following case, the block after the `goto` corresponding to the `continue` statement is not the loop exit:
int i = 0;
label:
while (true) {
i++;
if (i == 1)
continue label;
if (i == 2)
break;
}
I just quickly hacked this, there are probably better examples.
-------------
Changes requested by thartmann (Reviewer).
PR: https://git.openjdk.org/jdk/pull/10200
More information about the hotspot-compiler-dev
mailing list