RFR: 8254082: AbstractStringBuilder.insert(int dstOffset, CharSequence s, int start, int end) is missing fast-path for String [v3]

Claes Redestad redestad at openjdk.java.net
Fri Nov 27 21:14:01 UTC 2020


On Fri, 27 Nov 2020 20:26:17 GMT, Сергей Цыпанов <github.com+10835776+stsypanov at openjdk.org> wrote:

>> Original mail: https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-September/069197.html
>> 
>> Hello,
>> 
>> while working with `StringBuilder.insert()` I've spotted that its delegate `AbstractStringBuilder.insert()` is missing
>> a fast-path for the most frequent case when its argument is `String`.
>> 
>> Previously they did similart optimization for `StirngBuilder.append(CharSequence, int, int)`,
>> see https://bugs.openjdk.java.net/browse/JDK-8224986
>> 
>> I'd like to contribute a trivial patch that brings improvement for the case when SB's content is Latin1
>> and inserted String is Latin1 as well.
>> 
>> To measure improvement I've used simple benchmark:
>> @BenchmarkMode(Mode.AverageTime)
>> @OutputTimeUnit(TimeUnit.NANOSECONDS)
>> @Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"})
>> public class StringBuilderInsertBenchmark {
>> 
>>   @Benchmark
>>   public StringBuilder insert(Data data) {
>>     String string = data.string;
>>     return new StringBuilder().append("ABC").insert(1, string, 1, data.length + 1);
>>   }
>> 
>>   @State(Scope.Thread)
>>   public static class Data {
>>     String string;
>> 
>>     @Param({"true", "false"})
>>     private boolean latin;
>> 
>>     @Param({"8", "64", "128", "1024"})
>>     private int length;
>> 
>>     @Setup
>>     public void setup() {
>>       String alphabet = latin
>>         ? "abcdefghijklmnopqrstuvwxyz"        // English
>>         : "абвгдеёжзиклмнопрстуфхцчшщьыъэюя"; // Russian
>> 
>>       string = new RandomStringGenerator().randomString(alphabet, length + 2);
>>     }
>>   }
>> }
>> 
>> public final class RandomStringGenerator {
>> 
>>   public String randomString(String alphabet, int length) {
>>     char[] chars = alphabet.toCharArray();
>> 
>>     ThreadLocalRandom random = ThreadLocalRandom.current();
>> 
>>     char[] array = new char[length];
>>     for (int i = 0; i < length; i++) {
>>       array[i] = chars[random.nextInt(chars.length)];
>>     }
>> 
>>     return new String(array);
>>   }
>> }
>> Which gives
>> 
>>             (latin)  (length)       original       patched    Units
>> insert         true         8    24.2 ±  0.1   22.2 ±  0.0    ns/op
>> insert         true        64    53.8 ±  0.2   36.1 ±  0.1    ns/op
>> insert         true       128    80.9 ±  0.2   44.6 ±  0.0    ns/op
>> insert         true      1024   365.4 ±  0.5  109.8 ±  3.9    ns/op
>> 
>> insert        false         8    33.5 ±  0.5   32.3 ±  0.2    ns/op
>> insert        false        64    73.2 ±  0.3   73.2 ±  0.2    ns/op
>> insert        false       128   103.9 ±  0.6  103.3 ±  0.1    ns/op
>> insert        false      1024   576.5 ±  4.8  569.5 ±  2.0    ns/op
>> Patch is attached. As of tests tier1 and tier2 are ok.
>> 
>> With best regards,
>> Sergey Tsypanov
>
> Сергей Цыпанов has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
> 
>  - 8254082: Consolidate putStringAt() methods
>  - Merge branch 'master' into asb
>  - Merge branch 'master' into asb
>  - 8254082: Add fast-path for String into AbstractStringBuilder.insert(int, CharSequence, int, int)

src/java.base/share/classes/java/lang/String.java line 3620:

> 3618:     void getBytes(byte[] dst, int srcPos, int dstBegin, byte coder, int length) {
> 3619:         if (coder() == coder) {
> 3620:             System.arraycopy(value, srcPos, dst, dstBegin << coder, length);

Would it be more consistent and easier to follow if we took the length in chars, not in bytes, and adjusted it here with the coder? JITs should do short work of the need to adjust by calling `length()` in `getBytes(byte[], int, byte)` above.

If we're worried about unexpected performance effects we can separate these two `getBytes` methods to independent methods rather than delegate from one to the other.

-------------

PR: https://git.openjdk.java.net/jdk/pull/402


More information about the core-libs-dev mailing list