C2 produces redundant (?) assembly for while-loop in certain cases (JDK-8278518)

Сергей Цыпанов sergei.tsypanov at yandex.ru
Fri Dec 17 08:27:25 UTC 2021


Hello,

a week ago I filed the issue called JDK-8278518 String(byte[], int, int, Charset) constructor
 and String.translateEscapes() miss bounds check elimination
for the case originally discovered by Amir Hadadi in
https://stackoverflow.com/questions/70272651/missing-bounds-checking-elimination-in-string-constructor

Consider the part of the code of String(byte[], int, int, Charset):
-----------------------------------------------------------------------------------
while (offset < sl) {
    int b1 = bytes[offset];
    if (b1 >= 0) {
        dst[dp++] = (byte)b1;
        offset++; // <---
        continue;
    }
    if ((b1 == (byte)0xc2 || b1 == (byte)0xc3) &&
            offset + 1 < sl) {
        int b2 = bytes[offset + 1];
        if (!isNotContinuation(b2)) {
            dst[dp++] = (byte)decode2(b1, b2);
            offset += 2;
            continue;
        }
    }
    // anything not a latin1, including the repl
    // we have to go with the utf16
    break;
}
-----------------------------------------------------------------------------------
Originally we were sure, that compiler doesn't eliminate bounds check for
accessing byte[] regardless of predictable value of 'offset',
see a part of LinuxPerfAsmProfiler output:
-----------------------------------------------------------------------------------
  3.62%           ││            │  0x00007fed70eb4c1c:   mov    %ebx,%ecx
  2.29%           ││            │  0x00007fed70eb4c1e:   mov    %edx,%r9d
  2.22%           ││            │  0x00007fed70eb4c21:   mov    (%rsp),%r8                   ;*iload_2 {reexecute=0 rethrow=0 return_oop=0}
                  ││            │                                                            ; - java.lang.String::<init>@107 (line 537)
  2.32%           ↘│            │  0x00007fed70eb4c25:   cmp    %r13d,%ecx
                   │            │  0x00007fed70eb4c28:   jge    0x00007fed70eb5388           ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                   │            │                                                            ; - java.lang.String::<init>@110 (line 537)
  3.05%            │            │  0x00007fed70eb4c2e:   cmp    0x8(%rsp),%ecx
                   │            │  0x00007fed70eb4c32:   jae    0x00007fed70eb5319
  2.38%            │            │  0x00007fed70eb4c38:   mov    %r8,(%rsp)
  2.64%            │            │  0x00007fed70eb4c3c:   movslq %ecx,%r8
  2.46%            │            │  0x00007fed70eb4c3f:   mov    %rax,%rbx
  3.44%            │            │  0x00007fed70eb4c42:   sub    %r8,%rbx
  2.62%            │            │  0x00007fed70eb4c45:   add    $0x1,%rbx
  2.64%            │            │  0x00007fed70eb4c49:   and    $0xfffffffffffffffe,%rbx
  2.30%            │            │  0x00007fed70eb4c4d:   mov    %ebx,%r8d
  3.08%            │            │  0x00007fed70eb4c50:   add    %ecx,%r8d
  2.55%            │            │  0x00007fed70eb4c53:   movslq %r8d,%r8
  2.45%            │            │  0x00007fed70eb4c56:   add    $0xfffffffffffffffe,%r8
  2.13%            │            │  0x00007fed70eb4c5a:   cmp    (%rsp),%r8
                   │            │  0x00007fed70eb4c5e:   jae    0x00007fed70eb5319
  3.36%            │            │  0x00007fed70eb4c64:   mov    %ecx,%edi                    ;*aload_1 {reexecute=0 rethrow=0 return_oop=0}
                   │            │                                                            ; - java.lang.String::<init>@113 (line 538)
  2.86%            │           ↗│  0x00007fed70eb4c66:   movsbl 0x10(%r14,%rdi,1),%r8d       ;*baload {reexecute=0 rethrow=0 return_oop=0}
                   │           ││                                                            ; - java.lang.String::<init>@115 (line 538)
  2.48%            │           ││  0x00007fed70eb4c6c:   mov    %r9d,%edx
  2.26%            │           ││  0x00007fed70eb4c6f:   inc    %edx                         ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                   │           ││                                                            ; - java.lang.String::<init>@127 (line 540)
  3.28%            │           ││  0x00007fed70eb4c71:   mov    %edi,%ebx
  2.44%            │           ││  0x00007fed70eb4c73:   inc    %ebx                         ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                   │           ││                                                            ; - java.lang.String::<init>@134 (line 541)
  2.35%            │           ││  0x00007fed70eb4c75:   test   %r8d,%r8d
                   ╰           ││  0x00007fed70eb4c78:   jge    0x00007fed70eb4c04           ;*iflt {reexecute=0 rethrow=0 return_oop=0}
                               ││                                                            ; - java.lang.String::<init>@120 (line 539)
-----------------------------------------------------------------------------------
If we change while-loop condition from

while (offset < sl)

to

while (offset >= 0 && offset < sl)

the part of the instructions between if_icmpge and aload_1 in baseline disappears:
-----------------------------------------------------------------------------------
 17.28%           ││  0x00007f6b88eb6061:   mov    %edx,%r10d                   ;*iload_2 {reexecute=0 rethrow=0 return_oop=0}
                  ││                                                            ; - java.lang.String::<init>@107 (line 537)
  0.11%           ↘│  0x00007f6b88eb6064:   test   %r10d,%r10d
                   │  0x00007f6b88eb6067:   jl     0x00007f6b88eb669c           ;*iflt {reexecute=0 rethrow=0 return_oop=0}
                   │                                                            ; - java.lang.String::<init>@108 (line 537)
  0.39%            │  0x00007f6b88eb606d:   cmp    %r13d,%r10d
                   │  0x00007f6b88eb6070:   jge    0x00007f6b88eb66d0           ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                   │                                                            ; - java.lang.String::<init>@114 (line 537)
  0.66%            │  0x00007f6b88eb6076:   mov    %ebx,%r9d
 13.70%            │  0x00007f6b88eb6079:   cmp    0x8(%rsp),%r10d
  0.01%            │  0x00007f6b88eb607e:   jae    0x00007f6b88eb6671
  0.14%            │  0x00007f6b88eb6084:   movsbl 0x10(%r14,%r10,1),%edi       ;*baload {reexecute=0 rethrow=0 return_oop=0}
                   │                                                            ; - java.lang.String::<init>@119 (line 538)
  0.37%            │  0x00007f6b88eb608a:   mov    %r9d,%ebx
  0.99%            │  0x00007f6b88eb608d:   inc    %ebx                         ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                   │                                                            ; - java.lang.String::<init>@131 (line 540)
 12.88%            │  0x00007f6b88eb608f:   movslq %r9d,%rsi                    ;*bastore {reexecute=0 rethrow=0 return_oop=0}
                   │                                                            ; - java.lang.String::<init>@196 (line 548)
  0.17%            │  0x00007f6b88eb6092:   mov    %r10d,%edx
  0.39%            │  0x00007f6b88eb6095:   inc    %edx                         ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                   │                                                            ; - java.lang.String::<init>@138 (line 541)
  0.96%            │  0x00007f6b88eb6097:   test   %edi,%edi
  0.02%            │  0x00007f6b88eb6099:   jl     0x00007f6b88eb60dc           ;*iflt {reexecute=0 rethrow=0 return_oop=0}
-----------------------------------------------------------------------------------
While this can be fixed on Java side (https://github.com/openjdk/jdk/pull/6812),
the author of SO question pointed out (and I agree) that this is likely to be
a HotSpot compiler issue and should be fixed on JVM side.

Moreover, in the comments to my PR it was mentioned, that my original speculation
is wrong, and bounds check is still here:

13.70%            │  0x00007f6b88eb6079:   cmp    0x8(%rsp),%r10d
 0.01%            │  0x00007f6b88eb607e:   jae    0x00007f6b88eb6671

so the strange assembly code is something redundant.

For this I've got two questions:

1) Is my original theory about bounds check wrong and if it indeed is,
then what is that disappearing assembly part about?

2) Should this be fixed on JVM or Java side?
If we choose JVM then the issue needs to be reassigned
to someone else, because I'm not able
to elaborate and test the proper fix.

Regards,
Sergey Tsypanov


More information about the hotspot-compiler-dev mailing list