[PATCH] Performance improvement of StringBuilder.append(CharSerquence, int, int)
Tagir Valeev
amaembo at gmail.com
Fri Jan 11 15:05:25 UTC 2019
Hello!
Your benchmark tests only one of many possible cases. What if string
contains non-latin1 characters? What if many strings are appended to the
same string builder? What if you append only one character from the long
string? Performance fixes in such widely used class could be introduced
only after investigating all possible use scenarios. Also not only cpu
time, but also memory allocations should be measured.
With best regards,
Tagir Valeev.
пт, 11 янв. 2019 г., 2:35 Сергей Цыпанов sergei.tsypanov at yandex.ru:
> Hi,
>
> while investigating an issue I've found out that performance of expression
> (1) can be significantly improved by turning it into (2)
>
>
> ----------------------------------------------------------------------------------------------------
>
> //str is an instance of java.lang.String
>
> (1) return new StringBuilder().append(str, beginIndex,
> endIndex).toString();
>
> ---->
>
> (2) String substring = str.substring(beginIndex, endIndex);
>
> return new StringBuilder().append(substring).toString();
>
>
> ----------------------------------------------------------------------------------------------------
>
> Using benchmark [1] I've got the following results:
>
> Benchmark
> (length) Mode Cnt Score Error Units
> StringBuilderAppendBenchmark.appendBounds
> 10 avgt 20 21,161 ± 0,545 ns/op
> StringBuilderAppendBenchmark.appendBounds
> 100 avgt 20 72,230 ± 3,452 ns/op
> StringBuilderAppendBenchmark.appendBounds
> 1000 avgt 20 476,256 ± 12,044 ns/op
>
> StringBuilderAppendBenchmark.appendSubString
> 10 avgt 20 26,023 ± 0,871 ns/op
> StringBuilderAppendBenchmark.appendSubString
> 100 avgt 20 38,824 ± 1,185 ns/op
> StringBuilderAppendBenchmark.appendSubString
> 1000 avgt 20 214,072 ± 3,372 ns/op
>
> StringBuilderAppendBenchmark.appendBounds:·gc.alloc.rate.norm
> 10 avgt 20 80,000 ± 0,001 B/op
> StringBuilderAppendBenchmark.appendBounds:·gc.alloc.rate.norm
> 100 avgt 20 320,000 ± 0,001 B/op
> StringBuilderAppendBenchmark.appendBounds:·gc.alloc.rate.norm
> 1000 avgt 20 2120,001 ± 0,001 B/op
>
> StringBuilderAppendBenchmark.appendSubString:·gc.alloc.rate.norm
> 10 avgt 20 96,000 ± 0,001 B/op
> StringBuilderAppendBenchmark.appendSubString:·gc.alloc.rate.norm
> 100 avgt 20 192,000 ± 0,001 B/op
> StringBuilderAppendBenchmark.appendSubString:·gc.alloc.rate.norm
> 1000 avgt 20 1088,000 ± 0,001 B/op
>
>
> The reason for such a difference seems to be @HotSpotIntrinsicCandidate
> over StringBuilder.append(String),
> while StringBuilder.append(CharSerquence, int, int) copies chars/bytes one
> by one.
>
> I've prepared a patch [2] and applied it to fresh sources of JDK 11.
> After rebuild the same benchmark demonstrated severe performance
> degradation:
>
> - before patch
>
> Benchmark
> (length) Mode Cnt Score Error Units
> StringBuilderAppendBenchmark.appendBounds
> 10 avgt 20 21,592 ± 0,763 ns/op
> StringBuilderAppendBenchmark.appendBounds
> 100 avgt 20 72,241 ± 2,799 ns/op
> StringBuilderAppendBenchmark.appendBounds
> 1000 avgt 20 488,106 ± 43,042 ns/op
>
> StringBuilderAppendBenchmark.appendBounds:·gc.alloc.rate.norm
> 10 avgt 20 80,000 ± 0,001 B/op
> StringBuilderAppendBenchmark.appendBounds:·gc.alloc.rate.norm
> 100 avgt 20 320,000 ± 0,001 B/op
> StringBuilderAppendBenchmark.appendBounds:·gc.alloc.rate.norm
> 1000 avgt 20 2132,001 ± 10,691 B/op
>
> - after patch
>
> Benchmark
> (length) Mode Cnt Score Error Units
> StringBuilderAppendBenchmark.appendBounds
> 10 avgt 20 40,186 ± 1,828 ns/op
> StringBuilderAppendBenchmark.appendBounds
> 100 avgt 20 89,031 ± 6,664 ns/op
> StringBuilderAppendBenchmark.appendBounds
> 1000 avgt 20 669,934 ± 76,888 ns/op
>
> StringBuilderAppendBenchmark.appendBounds:·gc.alloc.rate.norm
> 10 avgt 20 152,000 ± 0,001 B/op
> StringBuilderAppendBenchmark.appendBounds:·gc.alloc.rate.norm
> 100 avgt 20 440,000 ± 0,001 B/op
> StringBuilderAppendBenchmark.appendBounds:·gc.alloc.rate.norm
> 1000 avgt 20 2688,001 ± 0,002 B/op
>
> As I see the reason of degradation is that patched code now dodges
> intrinsified StringBuilder.append(String) and calls directly
> AbstractStringBuilder.append(String) which in spite of utilizing
> System::arraycopy performes worse than intrinsic.
>
> So my question is should I rework the patch somehow to make it perform
> better or I need to apply the transformation mentioned above ad hoc?
>
>
> Appendix:
>
> 1) benchmark
>
> @BenchmarkMode(Mode.AverageTime)
> @OutputTimeUnit(TimeUnit.NANOSECONDS)
> @Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"})
> public class StringBuilderAppendBenchmark {
>
> @Benchmark
> @SuppressWarnings("StringBufferReplaceableByString")
> public String appendSubString(Data data) {
> String str = data.str;
> int beginIndex = data.beginIndex;
> int endIndex = data.endIndex;
>
> String substring = str.substring(beginIndex, endIndex);
>
> return new
> StringBuilder().append('L').append(substring).append(';').toString();
> }
>
> @Benchmark
> public String appendBounds(Data data) {
> String str = data.str;
> int beginIndex = data.beginIndex;
> int endIndex = data.endIndex;
>
> return new StringBuilder().append('L').append(str, beginIndex,
> endIndex).append(';').toString();
> }
>
> @State(Scope.Thread)
> public static class Data {
> String str;
>
> @Param({"10", "100", "1000"})
> private int length;
>
> private int beginIndex;
> private int endIndex;
>
> @Setup
> public void setup() {
> str = randomString();
> beginIndex = length / 4;
> endIndex = length / 4* 3;
> }
>
> private String randomString() {
> ThreadLocalRandom random = ThreadLocalRandom.current();
> char[] chars = "abcdefghijklmnopqrstuvwxyz".toCharArray();
>
> StringBuilder sb = new StringBuilder(length);
> for (int i = 0; i < length; i++) {
> char c = chars[random.nextInt(chars.length)];
> sb.append(c);
> }
> return sb.toString();
> }
> }
>
> }
>
>
> 2) patch
>
> diff --git
> a/src/java.base/share/classes/java/lang/AbstractStringBuilder.java
> b/src/java.base/share/classes/java/lang/AbstractStringBuilder.java
> --- a/src/java.base/share/classes/java/lang/AbstractStringBuilder.java
> +++ b/src/java.base/share/classes/java/lang/AbstractStringBuilder.java
> @@ -626,10 +626,7 @@
> s = "null";
> }
> checkRange(start, end, s.length());
> - int len = end - start;
> - ensureCapacityInternal(count + len);
> - appendChars(s, start, end);
> - return this;
> + return append(s.subSequence(start, end).toString());
> }
>
> /**
> @@ -1686,27 +1683,6 @@
> this.count = count + end - off;
> }
>
> - private final void appendChars(CharSequence s, int off, int end) {
> - if (isLatin1()) {
> - byte[] val = this.value;
> - for (int i = off, j = count; i < end; i++) {
> - char c = s.charAt(i);
> - if (StringLatin1.canEncode(c)) {
> - val[j++] = (byte)c;
> - } else {
> - count = j;
> - inflate();
> - StringUTF16.putCharsSB(this.value, j, s, i, end);
> - count += end - i;
> - return;
> - }
> - }
> - } else {
> - StringUTF16.putCharsSB(this.value, count, s, off, end);
> - }
> - count += end - off;
> - }
> -
> /* IndexOutOfBoundsException, if out of bounds */
> private static void checkRange(int start, int end, int len) {
> if (start < 0 || start > end || end > len) {
>
>
>
>
More information about the core-libs-dev
mailing list