[PATCH] Enhancement proposal for java.util.StringJoiner

Сергей Цыпанов sergei.tsypanov at yandex.ru
Mon Feb 3 22:38:58 UTC 2020


Hello,

as of JDK14 java.util.StringJoiner still uses char[] as a storage of glued Strings.

This applies for the cases when all joined Strings as well as delimiter, prefix and suffix contain only ASCII symbols.

As a result when StringJoiner.toString() is invoked, byte[] stored in String is inflated in order to fill in char[] and
finally char[] is compressed when constructor of String is called:

String delimiter = this.delimiter;
char[] chars = new char[this.len + addLen];
int k = getChars(this.prefix, chars, 0);
if (size > 0) {
    k += getChars(elts[0], chars, k);        // inflate byte[] -> char[]

    for(int i = 1; i < size; ++i) {
        k += getChars(delimiter, chars, k);
        k += getChars(elts[i], chars, k);
    }
}

k += getChars(this.suffix, chars, k);
return new String(chars);                    // compress char[] -> byte[]

This can be improved by detecting cases when String.isLatin1() returns true for all involved Strings.

I've prepared a patch along with benchmark proving that this change is correct and brings improvement.
The only concern I have is about String.isLatin1(): as far as String belongs to java.lang and StringJoiner to java.util
package-private String.isLatin1() cannot be directly accessed, we need to make it public for successful compilation.

Another solution is to create an intermediate utility class located in java.lang which delegates the call to String.isLatin1():

package java.lang;

public class StringHelper {
    public static boolean isLatin1(String str) {
        return str.isLatin1();
    }
}

This allows to keep java.lang.String intact and have access to it's package-private method outside of java.lang package.

Below I've added results of benchmarking for specified case (all Strings are Latin1). The other case (at least one String is UTF-8) uses existing code so there will be only a tiny regression due to several if-checks.

With best regards,
Sergey Tsypanov



                                          (count)  (length)         Original             Patched            Units
stringJoiner                                    1         1     26.7 ±   1.3        38.2 ±   1.1            ns/op
stringJoiner                                    1         5     27.4 ±   0.0        40.5 ±   2.2            ns/op
stringJoiner                                    1        10     29.6 ±   1.9        38.4 ±   1.9            ns/op
stringJoiner                                    1       100     61.1 ±   6.9        47.6 ±   0.6            ns/op
stringJoiner                                    5         1     91.1 ±   6.7        83.6 ±   2.0            ns/op
stringJoiner                                    5         5     96.1 ±  10.7        85.6 ±   1.1            ns/op
stringJoiner                                    5        10    105.5 ±  14.3        84.7 ±   1.1            ns/op
stringJoiner                                    5       100    266.6 ±  30.1       139.6 ±  14.0            ns/op
stringJoiner                                   10         1    190.7 ±  23.0       162.0 ±   2.9            ns/op
stringJoiner                                   10         5    200.0 ±  16.9       167.5 ±  11.0            ns/op
stringJoiner                                   10        10    216.4 ±  12.4       164.8 ±   1.7            ns/op
stringJoiner                                   10       100    545.3 ±  49.7       282.2 ±  12.0            ns/op
stringJoiner                                  100         1   1467.0 ±  90.3      1302.0 ±  18.5            ns/op
stringJoiner                                  100         5   1491.8 ± 166.2      1493.0 ± 135.4            ns/op
stringJoiner                                  100        10   1768.8 ± 160.6      1760.8 ± 111.4            ns/op
stringJoiner                                  100       100   3654.3 ± 113.1      3120.9 ± 175.9            ns/op

stringJoiner:·gc.alloc.rate.norm                1         1    120.0 ±   0.0       120.0 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm                1         5    128.0 ±   0.0       120.0 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm                1        10    144.0 ±   0.0       136.0 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm                1       100    416.0 ±   0.0       312.0 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm                5         1    144.0 ±   0.0       136.0 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm                5         5    200.0 ±   0.0       168.0 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm                5        10    272.0 ±   0.0       216.0 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm                5       100   1632.0 ±   0.0      1128.0 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm               10         1    256.0 ±   0.0       232.0 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm               10         5    376.0 ±   0.0       312.0 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm               10        10    520.0 ±   0.0       408.0 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm               10       100   3224.1 ±   0.0      2216.1 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm              100         1   1760.2 ±  14.9      1544.2 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm              100         5   2960.3 ±  14.9      2344.2 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm              100        10   4440.4 ±   0.0      3336.3 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm              100       100  31449.3 ±  12.2     21346.7 ±  14.7             B/op





-------------- next part --------------
A non-text attachment was scrubbed...
Name: sj.patch
Type: text/x-diff
Size: 5066 bytes
Desc: not available
URL: <https://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20200204/89e67945/sj-0001.patch>


More information about the core-libs-dev mailing list