RFR: 8278518: String(byte[], int, int, Charset) constructor and String.translateEscapes() miss bounds check elimination
Alan Bateman
alanb at openjdk.java.net
Tue Dec 14 13:24:06 UTC 2021
On Mon, 13 Dec 2021 09:55:36 GMT, Alan Bateman <alanb at openjdk.org> wrote:
>> Originally this was spotted by by Amir Hadadi in https://stackoverflow.com/questions/70272651/missing-bounds-checking-elimination-in-string-constructor
>>
>> It looks like in the following code in `String(byte[], int, int, Charset)`
>>
>> while (offset < sl) {
>> int b1 = bytes[offset];
>> if (b1 >= 0) {
>> dst[dp++] = (byte)b1;
>> offset++; // <---
>> continue;
>> }
>> if ((b1 == (byte)0xc2 || b1 == (byte)0xc3) &&
>> offset + 1 < sl) {
>> int b2 = bytes[offset + 1];
>> if (!isNotContinuation(b2)) {
>> dst[dp++] = (byte)decode2(b1, b2);
>> offset += 2;
>> continue;
>> }
>> }
>> // anything not a latin1, including the repl
>> // we have to go with the utf16
>> break;
>> }
>>
>> bounds check elimination is not executed when accessing byte array via `bytes[offset].
>>
>> The reason, I guess, is that offset variable is modified within the loop (marked with arrow).
>>
>> Possible fix for this could be changing:
>>
>> `while (offset < sl)` ---> `while (offset >= 0 && offset < sl)`
>>
>> However the best is to invest in C2 optimization to handle all such cases.
>>
>> The following benchmark demonstrates good improvement:
>>
>> @State(Scope.Thread)
>> @BenchmarkMode(Mode.AverageTime)
>> @OutputTimeUnit(TimeUnit.NANOSECONDS)
>> public class StringConstructorBenchmark {
>> private byte[] array;
>> private String str;
>>
>> @Setup
>> public void setup() {
>> str = "Quizdeltagerne spiste jordbær med fløde, mens cirkusklovnen. Я";//Latin1 ending with Russian
>> array = str.getBytes(StandardCharsets.UTF_8);
>> }
>>
>> @Benchmark
>> public String newString() {
>> return new String(array, 0, array.length, StandardCharsets.UTF_8);
>> }
>>
>> @Benchmark
>> public String translateEscapes() {
>> return str.translateEscapes();
>> }
>> }
>>
>> Results:
>>
>> //baseline
>> Benchmark Mode Cnt Score Error Units
>> StringConstructorBenchmark.newString avgt 50 173,092 ± 3,048 ns/op
>>
>> //patched
>> Benchmark Mode Cnt Score Error Units
>> StringConstructorBenchmark.newString avgt 50 126,908 ± 2,355 ns/op
>>
>> The same is observed in String.translateEscapes() for the same String as in the benchmark above:
>>
>> //baseline
>> Benchmark Mode Cnt Score Error Units
>> StringConstructorBenchmark.translateEscapes avgt 100 53,627 ± 0,850 ns/op
>>
>> //patched
>> Benchmark Mode Cnt Score Error Units
>> StringConstructorBenchmark.translateEscapes avgt 100 48,087 ± 1,129 ns/op
>>
>> Also I've looked into this with `LinuxPerfAsmProfiler`, full output for baseline is available here https://gist.github.com/stsypanov/d2524f98477d633fb1d4a2510fedeea6 and for patched code here https://gist.github.com/stsypanov/16c787e4f9fa3dd122522f16331b68b7.
>>
>> Here's the part of baseline assembly responsible for `while` loop:
>>
>> 3.62% ││ │ 0x00007fed70eb4c1c: mov %ebx,%ecx
>> 2.29% ││ │ 0x00007fed70eb4c1e: mov %edx,%r9d
>> 2.22% ││ │ 0x00007fed70eb4c21: mov (%rsp),%r8 ;*iload_2 {reexecute=0 rethrow=0 return_oop=0}
>> ││ │ ; - java.lang.String::<init>@107 (line 537)
>> 2.32% ↘│ │ 0x00007fed70eb4c25: cmp %r13d,%ecx
>> │ │ 0x00007fed70eb4c28: jge 0x00007fed70eb5388 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
>> │ │ ; - java.lang.String::<init>@110 (line 537)
>> 3.05% │ │ 0x00007fed70eb4c2e: cmp 0x8(%rsp),%ecx
>> │ │ 0x00007fed70eb4c32: jae 0x00007fed70eb5319
>> 2.38% │ │ 0x00007fed70eb4c38: mov %r8,(%rsp)
>> 2.64% │ │ 0x00007fed70eb4c3c: movslq %ecx,%r8
>> 2.46% │ │ 0x00007fed70eb4c3f: mov %rax,%rbx
>> 3.44% │ │ 0x00007fed70eb4c42: sub %r8,%rbx
>> 2.62% │ │ 0x00007fed70eb4c45: add $0x1,%rbx
>> 2.64% │ │ 0x00007fed70eb4c49: and $0xfffffffffffffffe,%rbx
>> 2.30% │ │ 0x00007fed70eb4c4d: mov %ebx,%r8d
>> 3.08% │ │ 0x00007fed70eb4c50: add %ecx,%r8d
>> 2.55% │ │ 0x00007fed70eb4c53: movslq %r8d,%r8
>> 2.45% │ │ 0x00007fed70eb4c56: add $0xfffffffffffffffe,%r8
>> 2.13% │ │ 0x00007fed70eb4c5a: cmp (%rsp),%r8
>> │ │ 0x00007fed70eb4c5e: jae 0x00007fed70eb5319
>> 3.36% │ │ 0x00007fed70eb4c64: mov %ecx,%edi ;*aload_1 {reexecute=0 rethrow=0 return_oop=0}
>> │ │ ; - java.lang.String::<init>@113 (line 538)
>> 2.86% │ ↗│ 0x00007fed70eb4c66: movsbl 0x10(%r14,%rdi,1),%r8d ;*baload {reexecute=0 rethrow=0 return_oop=0}
>> │ ││ ; - java.lang.String::<init>@115 (line 538)
>> 2.48% │ ││ 0x00007fed70eb4c6c: mov %r9d,%edx
>> 2.26% │ ││ 0x00007fed70eb4c6f: inc %edx ;*iinc {reexecute=0 rethrow=0 return_oop=0}
>> │ ││ ; - java.lang.String::<init>@127 (line 540)
>> 3.28% │ ││ 0x00007fed70eb4c71: mov %edi,%ebx
>> 2.44% │ ││ 0x00007fed70eb4c73: inc %ebx ;*iinc {reexecute=0 rethrow=0 return_oop=0}
>> │ ││ ; - java.lang.String::<init>@134 (line 541)
>> 2.35% │ ││ 0x00007fed70eb4c75: test %r8d,%r8d
>> ╰ ││ 0x00007fed70eb4c78: jge 0x00007fed70eb4c04 ;*iflt {reexecute=0 rethrow=0 return_oop=0}
>> ││ ; - java.lang.String::<init>@120 (line 539)
>>
>> and this one is for patched code:
>>
>> 17.28% ││ 0x00007f6b88eb6061: mov %edx,%r10d ;*iload_2 {reexecute=0 rethrow=0 return_oop=0}
>> ││ ; - java.lang.String::<init>@107 (line 537)
>> 0.11% ↘│ 0x00007f6b88eb6064: test %r10d,%r10d
>> │ 0x00007f6b88eb6067: jl 0x00007f6b88eb669c ;*iflt {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - java.lang.String::<init>@108 (line 537)
>> 0.39% │ 0x00007f6b88eb606d: cmp %r13d,%r10d
>> │ 0x00007f6b88eb6070: jge 0x00007f6b88eb66d0 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - java.lang.String::<init>@114 (line 537)
>> 0.66% │ 0x00007f6b88eb6076: mov %ebx,%r9d
>> 13.70% │ 0x00007f6b88eb6079: cmp 0x8(%rsp),%r10d
>> 0.01% │ 0x00007f6b88eb607e: jae 0x00007f6b88eb6671
>> 0.14% │ 0x00007f6b88eb6084: movsbl 0x10(%r14,%r10,1),%edi ;*baload {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - java.lang.String::<init>@119 (line 538)
>> 0.37% │ 0x00007f6b88eb608a: mov %r9d,%ebx
>> 0.99% │ 0x00007f6b88eb608d: inc %ebx ;*iinc {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - java.lang.String::<init>@131 (line 540)
>> 12.88% │ 0x00007f6b88eb608f: movslq %r9d,%rsi ;*bastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - java.lang.String::<init>@196 (line 548)
>> 0.17% │ 0x00007f6b88eb6092: mov %r10d,%edx
>> 0.39% │ 0x00007f6b88eb6095: inc %edx ;*iinc {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - java.lang.String::<init>@138 (line 541)
>> 0.96% │ 0x00007f6b88eb6097: test %edi,%edi
>> 0.02% │ 0x00007f6b88eb6099: jl 0x00007f6b88eb60dc ;*iflt {reexecute=0 rethrow=0 return_oop=0}
>>
>> Between instructions mapped to `if_icmpge` and `aload_1` in baseline we have bounds check which is missing from patched code.
>
>> Originally this was spotted by by Amir Hadadi in https://stackoverflow.com/questions/70272651/missing-bounds-checking-elimination-in-string-constructor
>
> Before anyone looks at this, can you confirm that the patch does not include any code or tests/benchmarks that were taken from SO?
> @AlanBateman the benchmark is mine along with the changes for `translateEscapes` and `newStringUTF8NoRepl`, the change for constructor is from SO.
I don't know how we can progress this PR if the patch includes code copied from SO. Maybe the PR should be closed, the JBS issue unassigned, and leave it to someone else to start again? Maybe you could get Amit to sign the OCA and you co-contribute/author the PR? I can't look at the patch so I don't know how significant the changes, maybe there are other options.
-------------
PR: https://git.openjdk.java.net/jdk/pull/6812
More information about the core-libs-dev
mailing list