C2 produces redundant (?) assembly for while-loop in certain cases (JDK-8278518)
Сергей Цыпанов
sergei.tsypanov at yandex.ru
Fri Dec 17 08:27:25 UTC 2021
Hello,
a week ago I filed the issue called JDK-8278518 String(byte[], int, int, Charset) constructor
and String.translateEscapes() miss bounds check elimination
for the case originally discovered by Amir Hadadi in
https://stackoverflow.com/questions/70272651/missing-bounds-checking-elimination-in-string-constructor
Consider the part of the code of String(byte[], int, int, Charset):
-----------------------------------------------------------------------------------
while (offset < sl) {
int b1 = bytes[offset];
if (b1 >= 0) {
dst[dp++] = (byte)b1;
offset++; // <---
continue;
}
if ((b1 == (byte)0xc2 || b1 == (byte)0xc3) &&
offset + 1 < sl) {
int b2 = bytes[offset + 1];
if (!isNotContinuation(b2)) {
dst[dp++] = (byte)decode2(b1, b2);
offset += 2;
continue;
}
}
// anything not a latin1, including the repl
// we have to go with the utf16
break;
}
-----------------------------------------------------------------------------------
Originally we were sure, that compiler doesn't eliminate bounds check for
accessing byte[] regardless of predictable value of 'offset',
see a part of LinuxPerfAsmProfiler output:
-----------------------------------------------------------------------------------
3.62% ││ │ 0x00007fed70eb4c1c: mov %ebx,%ecx
2.29% ││ │ 0x00007fed70eb4c1e: mov %edx,%r9d
2.22% ││ │ 0x00007fed70eb4c21: mov (%rsp),%r8 ;*iload_2 {reexecute=0 rethrow=0 return_oop=0}
││ │ ; - java.lang.String::<init>@107 (line 537)
2.32% ↘│ │ 0x00007fed70eb4c25: cmp %r13d,%ecx
│ │ 0x00007fed70eb4c28: jge 0x00007fed70eb5388 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
│ │ ; - java.lang.String::<init>@110 (line 537)
3.05% │ │ 0x00007fed70eb4c2e: cmp 0x8(%rsp),%ecx
│ │ 0x00007fed70eb4c32: jae 0x00007fed70eb5319
2.38% │ │ 0x00007fed70eb4c38: mov %r8,(%rsp)
2.64% │ │ 0x00007fed70eb4c3c: movslq %ecx,%r8
2.46% │ │ 0x00007fed70eb4c3f: mov %rax,%rbx
3.44% │ │ 0x00007fed70eb4c42: sub %r8,%rbx
2.62% │ │ 0x00007fed70eb4c45: add $0x1,%rbx
2.64% │ │ 0x00007fed70eb4c49: and $0xfffffffffffffffe,%rbx
2.30% │ │ 0x00007fed70eb4c4d: mov %ebx,%r8d
3.08% │ │ 0x00007fed70eb4c50: add %ecx,%r8d
2.55% │ │ 0x00007fed70eb4c53: movslq %r8d,%r8
2.45% │ │ 0x00007fed70eb4c56: add $0xfffffffffffffffe,%r8
2.13% │ │ 0x00007fed70eb4c5a: cmp (%rsp),%r8
│ │ 0x00007fed70eb4c5e: jae 0x00007fed70eb5319
3.36% │ │ 0x00007fed70eb4c64: mov %ecx,%edi ;*aload_1 {reexecute=0 rethrow=0 return_oop=0}
│ │ ; - java.lang.String::<init>@113 (line 538)
2.86% │ ↗│ 0x00007fed70eb4c66: movsbl 0x10(%r14,%rdi,1),%r8d ;*baload {reexecute=0 rethrow=0 return_oop=0}
│ ││ ; - java.lang.String::<init>@115 (line 538)
2.48% │ ││ 0x00007fed70eb4c6c: mov %r9d,%edx
2.26% │ ││ 0x00007fed70eb4c6f: inc %edx ;*iinc {reexecute=0 rethrow=0 return_oop=0}
│ ││ ; - java.lang.String::<init>@127 (line 540)
3.28% │ ││ 0x00007fed70eb4c71: mov %edi,%ebx
2.44% │ ││ 0x00007fed70eb4c73: inc %ebx ;*iinc {reexecute=0 rethrow=0 return_oop=0}
│ ││ ; - java.lang.String::<init>@134 (line 541)
2.35% │ ││ 0x00007fed70eb4c75: test %r8d,%r8d
╰ ││ 0x00007fed70eb4c78: jge 0x00007fed70eb4c04 ;*iflt {reexecute=0 rethrow=0 return_oop=0}
││ ; - java.lang.String::<init>@120 (line 539)
-----------------------------------------------------------------------------------
If we change while-loop condition from
while (offset < sl)
to
while (offset >= 0 && offset < sl)
the part of the instructions between if_icmpge and aload_1 in baseline disappears:
-----------------------------------------------------------------------------------
17.28% ││ 0x00007f6b88eb6061: mov %edx,%r10d ;*iload_2 {reexecute=0 rethrow=0 return_oop=0}
││ ; - java.lang.String::<init>@107 (line 537)
0.11% ↘│ 0x00007f6b88eb6064: test %r10d,%r10d
│ 0x00007f6b88eb6067: jl 0x00007f6b88eb669c ;*iflt {reexecute=0 rethrow=0 return_oop=0}
│ ; - java.lang.String::<init>@108 (line 537)
0.39% │ 0x00007f6b88eb606d: cmp %r13d,%r10d
│ 0x00007f6b88eb6070: jge 0x00007f6b88eb66d0 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
│ ; - java.lang.String::<init>@114 (line 537)
0.66% │ 0x00007f6b88eb6076: mov %ebx,%r9d
13.70% │ 0x00007f6b88eb6079: cmp 0x8(%rsp),%r10d
0.01% │ 0x00007f6b88eb607e: jae 0x00007f6b88eb6671
0.14% │ 0x00007f6b88eb6084: movsbl 0x10(%r14,%r10,1),%edi ;*baload {reexecute=0 rethrow=0 return_oop=0}
│ ; - java.lang.String::<init>@119 (line 538)
0.37% │ 0x00007f6b88eb608a: mov %r9d,%ebx
0.99% │ 0x00007f6b88eb608d: inc %ebx ;*iinc {reexecute=0 rethrow=0 return_oop=0}
│ ; - java.lang.String::<init>@131 (line 540)
12.88% │ 0x00007f6b88eb608f: movslq %r9d,%rsi ;*bastore {reexecute=0 rethrow=0 return_oop=0}
│ ; - java.lang.String::<init>@196 (line 548)
0.17% │ 0x00007f6b88eb6092: mov %r10d,%edx
0.39% │ 0x00007f6b88eb6095: inc %edx ;*iinc {reexecute=0 rethrow=0 return_oop=0}
│ ; - java.lang.String::<init>@138 (line 541)
0.96% │ 0x00007f6b88eb6097: test %edi,%edi
0.02% │ 0x00007f6b88eb6099: jl 0x00007f6b88eb60dc ;*iflt {reexecute=0 rethrow=0 return_oop=0}
-----------------------------------------------------------------------------------
While this can be fixed on Java side (https://github.com/openjdk/jdk/pull/6812),
the author of SO question pointed out (and I agree) that this is likely to be
a HotSpot compiler issue and should be fixed on JVM side.
Moreover, in the comments to my PR it was mentioned, that my original speculation
is wrong, and bounds check is still here:
13.70% │ 0x00007f6b88eb6079: cmp 0x8(%rsp),%r10d
0.01% │ 0x00007f6b88eb607e: jae 0x00007f6b88eb6671
so the strange assembly code is something redundant.
For this I've got two questions:
1) Is my original theory about bounds check wrong and if it indeed is,
then what is that disappearing assembly part about?
2) Should this be fixed on JVM or Java side?
If we choose JVM then the issue needs to be reassigned
to someone else, because I'm not able
to elaborate and test the proper fix.
Regards,
Sergey Tsypanov
More information about the hotspot-compiler-dev
mailing list