RFR: 8254082: AbstractStringBuilder.insert(int dstOffset, CharSequence s, int start, int end) is missing fast-path for String [v2]
Claes Redestad
redestad at openjdk.java.net
Wed Nov 25 10:18:07 UTC 2020
On Wed, 25 Nov 2020 08:58:21 GMT, Сергей Цыпанов <github.com+10835776+stsypanov at openjdk.org> wrote:
>> Original mail: https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-September/069197.html
>>
>> Hello,
>>
>> while working with `StringBuilder.insert()` I've spotted that its delegate `AbstractStringBuilder.insert()` is missing
>> a fast-path for the most frequent case when its argument is `String`.
>>
>> Previously they did similart optimization for `StirngBuilder.append(CharSequence, int, int)`,
>> see https://bugs.openjdk.java.net/browse/JDK-8224986
>>
>> I'd like to contribute a trivial patch that brings improvement for the case when SB's content is Latin1
>> and inserted String is Latin1 as well.
>>
>> To measure improvement I've used simple benchmark:
>> @BenchmarkMode(Mode.AverageTime)
>> @OutputTimeUnit(TimeUnit.NANOSECONDS)
>> @Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"})
>> public class StringBuilderInsertBenchmark {
>>
>> @Benchmark
>> public StringBuilder insert(Data data) {
>> String string = data.string;
>> return new StringBuilder().append("ABC").insert(1, string, 1, data.length + 1);
>> }
>>
>> @State(Scope.Thread)
>> public static class Data {
>> String string;
>>
>> @Param({"true", "false"})
>> private boolean latin;
>>
>> @Param({"8", "64", "128", "1024"})
>> private int length;
>>
>> @Setup
>> public void setup() {
>> String alphabet = latin
>> ? "abcdefghijklmnopqrstuvwxyz" // English
>> : "абвгдеёжзиклмнопрстуфхцчшщьыъэюя"; // Russian
>>
>> string = new RandomStringGenerator().randomString(alphabet, length + 2);
>> }
>> }
>> }
>>
>> public final class RandomStringGenerator {
>>
>> public String randomString(String alphabet, int length) {
>> char[] chars = alphabet.toCharArray();
>>
>> ThreadLocalRandom random = ThreadLocalRandom.current();
>>
>> char[] array = new char[length];
>> for (int i = 0; i < length; i++) {
>> array[i] = chars[random.nextInt(chars.length)];
>> }
>>
>> return new String(array);
>> }
>> }
>> Which gives
>>
>> (latin) (length) original patched Units
>> insert true 8 24.2 ± 0.1 22.2 ± 0.0 ns/op
>> insert true 64 53.8 ± 0.2 36.1 ± 0.1 ns/op
>> insert true 128 80.9 ± 0.2 44.6 ± 0.0 ns/op
>> insert true 1024 365.4 ± 0.5 109.8 ± 3.9 ns/op
>>
>> insert false 8 33.5 ± 0.5 32.3 ± 0.2 ns/op
>> insert false 64 73.2 ± 0.3 73.2 ± 0.2 ns/op
>> insert false 128 103.9 ± 0.6 103.3 ± 0.1 ns/op
>> insert false 1024 576.5 ± 4.8 569.5 ± 2.0 ns/op
>> Patch is attached. As of tests tier1 and tier2 are ok.
>>
>> With best regards,
>> Sergey Tsypanov
>
> Сергей Цыпанов has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
>
> - Merge branch 'master' into asb
> - 8254082: Add fast-path for String into AbstractStringBuilder.insert(int, CharSequence, int, int)
Hi Sergey,
this is interesting! I think `StringBuilder.insert` wasn't given much love in JEP 254 since it's likely (much?) less commonly used than `append`. It'd of course be nice to get it up to speed.
Looking at your patch and existing code I think there might be ways to improve this further and get an improvement on most non-latin1 cases too. Possibly cleaning things up a bit, too.
(`putCharsAt(int, String, int, int)` should probably be called `putStringAt`?)
src/java.base/share/classes/java/lang/AbstractStringBuilder.java line 1716:
> 1714: }
> 1715:
> 1716: private void putCharsAt(int index, String s, int off, int end) {
Comparing this with `putStringAt(int index, String str)` below I think begs for some consolidation here. I think we either should add a `getBytes(value, index, coder, length)`, or - perhaps preferably? - factor out the package private `String.getBytes` and implement it here in ASB using `s.value()`
src/java.base/share/classes/java/lang/AbstractStringBuilder.java line 1721:
> 1719: return;
> 1720: }
> 1721: inflate();
Like in `String.getBytes(byte[], int, byte)` I think we could do an `arraycopy` if both are `false` too, just need to carefully adjust the `index` et.c. In fact the only case that can't use an `arraycopy` in the end is when `s.isLatin1()` and the current sb is already inflated (that's what the `StringLatin1.inflate` branch does in `getBytes`).
I think if you consolidate/merge this with the logic in `String.getBytes(byte[], int, byte)` as suggested you'll end up with a sizeable improvement on non-latin1 cases too.
-------------
Changes requested by redestad (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/402
More information about the core-libs-dev
mailing list