[PATCH] Enhancement proposal for java.util.StringJoiner

Сергей Цыпанов sergei.tsypanov at yandex.ru
Thu Feb 13 09:23:40 UTC 2020


Hello, 

I've reworked the patch in order to decide about char[] / byte[] lazily.

This allows to dodge performance impact of reflective calls to String.isLatin1() for non-latin Strings
and at the same time keep the benefits of discrimination between char[] / byte[].

I've collected resutls for all the versions of my patch into the table below. As you can see 
StringBuilder reduces complexity but significantly slows down the case then non-latin Strings are joined.

Exclamation marks denote the cases where the last implementation clearly looses in performance
agains original StringJoiner.

Regards,
Sergey Tsypanov

P.S.

Also I think the further improvement is possible here: Tagir mentioned decoding of byte[] happening in
constructor of String(byte[]). The constructor is not aware about how the bytes are encoded and
has to decode them. However in our case when came to `new String(byte[])` we are sure that the bytes
are of Latin1. So we can skip decoding by calling StringLatin1.newString(). However, it copies incoming
byte[] which is again redundant in our particular case.


Benchmark                              (count)  (latin)  (length)          Original           Patched1                 SB          Patched2        Units
stringJoiner                                 1     true         1      26.9 ±   0.7       48.8 ±   2.2       33.4 ±   0.2      42.6 ±   1.0        ns/op   !
stringJoiner                                 1     true         5      30.5 ±   1.0       46.1 ±   2.1       33.0 ±   0.1      42.3 ±   1.1        ns/op   !
stringJoiner                                 1     true        10      31.2 ±   0.6       47.3 ±   1.3       34.3 ±   0.3      46.6 ±   1.4        ns/op   !
stringJoiner                                 1     true       100      62.5 ±   3.3       79.9 ±   4.8       44.4 ±   0.1      63.5 ±   2.1        ns/op
stringJoiner                                 5     true         1      78.2 ±   1.6      110.3 ±   2.9       87.8 ±   0.8      93.4 ±   1.7        ns/op   !
stringJoiner                                 5     true         5      94.2 ±   8.7      116.6 ±   0.7       88.2 ±   0.9      91.4 ±   0.9        ns/op
stringJoiner                                 5     true        10      95.3 ±   6.9      100.1 ±   0.4       91.6 ±   0.6      93.7 ±   0.4        ns/op
stringJoiner                                 5     true       100     188.0 ±  10.2      136.0 ±   0.4      126.1 ±   0.7     135.3 ±   0.9        ns/op
stringJoiner                                10     true         1     160.3 ±   4.5      172.9 ±   0.8      177.6 ±   0.8     172.1 ±   0.6        ns/op   !
stringJoiner                                10     true         5     169.0 ±   4.7      180.2 ±   9.1      179.4 ±   1.0     171.7 ±   0.6        ns/op
stringJoiner                                10     true        10     205.7 ±  16.4      182.7 ±   1.1      189.5 ±   1.2     178.3 ±   0.5        ns/op
stringJoiner                                10     true       100     366.5 ±  17.0      284.5 ±   3.1      290.0 ±   0.8     282.1 ±   0.9        ns/op
stringJoiner                               100     true         1    1117.6 ±  11.1     2123.7 ±  11.1     1563.8 ±   2.8    1379.5 ±   4.4        ns/op   !
stringJoiner                               100     true         5    1270.7 ±  40.2     2163.6 ±  12.4     1592.4 ±   4.0    1426.1 ±   4.4        ns/op   !
stringJoiner                               100     true        10    1364.4 ±  14.0     2283.8 ±  16.1     1773.7 ±  57.7    1525.0 ±   3.2        ns/op   !
stringJoiner                               100     true       100    3592.9 ± 164.8     3535.2 ±  29.9     2899.2 ±  51.0    3043.9 ±  85.6        ns/op

stringJoiner                                 1    false         1      35.6 ±   1.2       59.1 ±   3.0       52.7 ±   1.2      44.9 ±   0.9        ns/op   !
stringJoiner                                 1    false         5      39.3 ±   1.2       52.6 ±   2.5       54.4 ±   1.6      37.4 ±   0.9        ns/op
stringJoiner                                 1    false        10      42.2 ±   1.6       53.6 ±   0.3       52.2 ±   1.0      43.2 ±   1.0        ns/op
stringJoiner                                 1    false       100      70.5 ±   1.8       86.4 ±   0.4       78.6 ±   1.2      75.8 ±   2.2        ns/op   !
stringJoiner                                 5    false         1      89.0 ±   3.5      102.2 ±   1.0      116.3 ±   3.8      85.3 ±   0.1        ns/op
stringJoiner                                 5    false         5      87.6 ±   0.7      106.5 ±   1.2      115.2 ±   2.9      90.7 ±   0.6        ns/op   !
stringJoiner                                 5    false        10     109.0 ±   5.6      116.5 ±   1.2      126.5 ±   0.5      97.2 ±   0.3        ns/op
stringJoiner                                 5    false       100     324.0 ±  16.5      221.9 ±   0.5      288.9 ±   0.5     227.3 ±   0.7        ns/op
stringJoiner                                10    false         1     183.9 ±   5.9      204.7 ±   5.5      261.2 ±   7.7     168.4 ±   0.7        ns/op
stringJoiner                                10    false         5     198.7 ±   9.7      202.4 ±   1.5      253.3 ±   6.7     178.1 ±   0.6        ns/op
stringJoiner                                10    false        10     196.7 ±   6.9      226.7 ±   6.4      274.3 ±   7.0     191.8 ±   0.5        ns/op
stringJoiner                                10    false       100     535.8 ±   2.3      553.0 ±   5.6      677.3 ±   6.4     538.8 ±   2.3        ns/op
stringJoiner                               100    false         1    1674.6 ± 122.1     1940.8 ±  16.2     2212.5 ±  32.2    1418.9 ±   2.9        ns/op
stringJoiner                               100    false         5    1791.9 ±  58.1     2158.1 ±  12.0     2492.8 ±  30.9    1583.9 ±   3.9        ns/op
stringJoiner                               100    false        10    2124.1 ± 193.3     2364.0 ±  25.2     2611.7 ±  17.5    1861.8 ±   5.9        ns/op
stringJoiner                               100    false       100    4323.4 ±  29.2     4675.5 ±  11.8     5501.3 ±  21.1    4172.3 ±  12.7        ns/op

stringJoiner:·gc.alloc.rate.norm             1     true         1     120.0 ±   0.0      120.0 ±   0.0      144.0 ±   0.0     120.0 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm             1     true         5     128.0 ±   0.0      120.0 ±   0.0      144.0 ±   0.0     120.0 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm             1     true        10     144.0 ±   0.0      136.0 ±   0.0      160.0 ±   0.0     136.0 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm             1     true       100     416.0 ±   0.0      312.0 ±   0.0      336.0 ±   0.0     312.0 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm             5     true         1     144.0 ±   0.0      136.0 ±   0.0      160.0 ±   0.0     136.0 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm             5     true         5     200.0 ±   0.0      168.0 ±   0.0      192.0 ±   0.0     168.0 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm             5     true        10     272.0 ±   0.0      216.0 ±   0.0      240.0 ±   0.0     216.0 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm             5     true       100    1632.0 ±   0.0     1128.0 ±   0.0     1152.0 ±   0.0    1128.0 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm            10     true         1     256.0 ±   0.0      232.0 ±   0.0      256.0 ±   0.0     232.0 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm            10     true         5     376.0 ±   0.0      316.8 ±   4.9      336.0 ±   0.0     312.0 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm            10     true        10     520.0 ±   0.0      408.0 ±   0.0      432.0 ±   0.0     408.0 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm            10     true       100    3224.1 ±   0.0     2236.9 ±  21.2     2240.1 ±   0.0    2216.1 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm           100     true         1    1748.1 ±   4.0     1592.2 ±   0.0     1568.2 ±   0.0    1544.2 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm           100     true         5    2948.2 ±   4.0     2392.3 ±   0.0     2368.2 ±   0.0    2344.2 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm           100     true        10    4444.3 ±   4.0     3384.3 ±   0.0     3364.3 ±   4.0    3336.3 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm           100     true       100   31441.4 ±   0.0    21385.4 ±   0.0    21365.1 ±   4.1   21353.2 ±   6.6        B/op

stringJoiner:·gc.alloc.rate.norm             1    false         1     144.0 ±   0.0      144.0 ±   0.0      192.0 ±   0.0     144.0 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm             1    false         5     160.0 ±   0.0      160.0 ±   0.0      208.0 ±   0.0     160.0 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm             1    false        10     184.0 ±   0.0      184.0 ±   0.0      240.0 ±   0.0     184.0 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm             1    false       100     640.0 ±   0.0      640.0 ±   0.0      784.0 ±   0.0     640.0 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm             5    false         1     184.0 ±   0.0      184.0 ±   0.0      240.0 ±   0.0     184.0 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm             5    false         5     280.0 ±   0.0      280.0 ±   0.0      349.6 ±   2.4     280.0 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm             5    false        10     400.0 ±   0.0      400.0 ±   0.0      496.0 ±   0.0     400.0 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm             5    false       100    2664.1 ±   0.0     2664.1 ±   0.0     3216.1 ±   0.0    2664.1 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm            10    false         1     320.0 ±   0.0      334.4 ±   7.4      384.0 ±   0.0     320.0 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm            10    false         5     520.0 ±   0.0      520.0 ±   0.0      624.0 ±   0.0     520.0 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm            10    false        10     760.0 ±   0.0      769.6 ±   6.5      912.0 ±   0.0     760.0 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm            10    false       100    5264.2 ±   0.0     5273.8 ±   6.5     6320.3 ±   0.0    5264.2 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm           100    false         1    2204.2 ±   4.0     2216.3 ±   0.0     2436.3 ±   6.8    2168.2 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm           100    false         5    4196.3 ±   6.2     4216.4 ±   0.0     4832.4 ±   6.6    4168.3 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm           100    false        10    6696.5 ±   5.4     6712.6 ±   0.0     7844.7 ±   4.0    6664.5 ±   0.0        B/op
stringJoiner:·gc.alloc.rate.norm           100    false       100   51702.0 ±   4.1    51714.4 ±   0.0    61838.6 ±   6.2   51666.1 ±   0.0        B/op


10.02.2020, 14:08, "Tagir Valeev" <amaembo at gmail.com>:
> Hello!
>
> In many tests, I see little or no performance improvements. E.g.:
> stringJoiner 100 10 1768.8 ±
> 160.6 1760.8 ± 111.4 ns/op
>
> How would you explain that this code change doesn't improve the
> performance for given count and length?
>
> Also, you are using `new String(bytes)` assuming that the platform
> charset is latin1-compatible. This is not always true, so your code
> would produce incorrect results depending on this setting. In general,
> decoding via charset is tons of extra work. Instead, I would
> experiment with pre-sized StringBuilder inside compactElts, instead of
> char[] array. While it does some extra checks, StringBuilder already
> can use the compact string representation, so if it's properly
> pre-sized, no extra memory will be allocated.
>
> Finally, if you optimize for the case when X is true you should always
> benchmark what happens when X is false. It's great that in some cases
> we see a speedup for latin1 strings. But what if not all of the
> strings are latin1? Is there any performance degradation? If yes, can
> we tolerate it?
>
> With best regards,
> Tagir Valeev.
>
> On Mon, Feb 10, 2020 at 3:22 PM Сергей Цыпанов
> <sergei.tsypanov at yandex.ru> wrote:
>>  Hello,
>>
>>  I've reworked the code, patch is attached. Could you please review my solution regarding usage of SahredSecrets?
>>
>>  P.S. After I've created the patch it came to my mind that instead of checking all Strings when calling StringJoiner.add()
>>  we can check them in toString() method and fail-fast in case at least one of them is non-latin. This likely to reduce
>>  regression related to usage of reflection.
>>
>>  Regards,
>>  Sergey Tsypanov
>>
>>  05.02.2020, 23:21, "forax at univ-mlv.fr" <forax at univ-mlv.fr>:
>>  > ----- Mail original -----
>>  >> De: "Сергей Цыпанов" <sergei.tsypanov at yandex.ru>
>>  >> À: "Remi Forax" <forax at univ-mlv.fr>, "core-libs-dev" <core-libs-dev at openjdk.java.net>
>>  >> Envoyé: Mercredi 5 Février 2020 22:12:34
>>  >> Objet: Re: [PATCH] Enhancement proposal for java.util.StringJoiner
>>  >
>>  >> Hello,
>>  >>
>>  >>> If you want to optimize StringJoiner, the best way to do it is to use the shared
>>  >>> secret mechanism so a java.util class can see implementation details of a
>>  >>> java.lang class without exposing those details publicly.
>>  >>> As an example, take a look to EnumSet and its implementations.
>>  >>
>>  >> I've looked into SharedSecrets, it seems there's no ready-to-use method for
>>  >> accessing package-private method. Do you mean it's necessary to add particular
>>  >> functionality to JavaLangReflectionAccess as they did for JavaLangAccess in
>>  >> order to deal with EnumSet?
>>  >
>>  > yes !
>>  > crossing package boundary in a non public way is not free,
>>  > but given that StringJoiner is used quite often (directly or indirectly using Collectors.joining()), it may worth the cost.
>>  >
>>  >> Regards,
>>  >> Sergey
>>  >
>>  > Regards,
>>  > Rémi
>>  >
>>  >> 04.02.2020, 12:12, "Remi Forax" <forax at univ-mlv.fr>:
>>  >>> ----- Mail original -----
>>  >>>> De: "Сергей Цыпанов" <sergei.tsypanov at yandex.ru>
>>  >>>> À: "jonathan gibbons" <jonathan.gibbons at oracle.com>, "core-libs-dev"
>>  >>>> <core-libs-dev at openjdk.java.net>
>>  >>>> Envoyé: Mardi 4 Février 2020 08:53:31
>>  >>>> Objet: Re: [PATCH] Enhancement proposal for java.util.StringJoiner
>>  >>>
>>  >>>> Hello,
>>  >>>
>>  >>> Hi Sergey,
>>  >>>
>>  >>>> I'd probably agree about a new class in java.lang, but what is wrong about
>>  >>>> exposing package-private method
>>  >>>> which doesn't modify the state of the object and has no side effects?
>>  >>>
>>  >>> You can not change the implementation anymore,
>>  >>> by example if instead of having a split between latin1 and non latin1, we decide
>>  >>> in the future to split between utf8 and non utf8.
>>  >>>
>>  >>> If you want to optimize StringJoiner, the best way to do it is to use the shared
>>  >>> secret mechanism so a java.util class can see implementation details of a
>>  >>> java.lang class without exposing those details publicly.
>>  >>> As an example, take a look to EnumSet and its implementations.
>>  >>>
>>  >>> regards,
>>  >>> Rémi
>>  >>>
>>  >>>> 04.02.2020, 00:58, "Jonathan Gibbons" <jonathan.gibbons at oracle.com>:
>>  >>>>> Sergey,
>>  >>>>>
>>  >>>>> It is equally bad to create a new class in the java.lang package as it
>>  >>>>> is to add a new public method to java.lang.String.
>>  >>>>>
>>  >>>>> -- Jon
>>  >>>>>
>>  >>>>> On 2/3/20 2:38 PM, Сергей Цыпанов wrote:
>>  >>>>>> Hello,
>>  >>>>>>
>>  >>>>>> as of JDK14 java.util.StringJoiner still uses char[] as a storage of glued
>>  >>>>>> Strings.
>>  >>>>>>
>>  >>>>>> This applies for the cases when all joined Strings as well as delimiter, prefix
>>  >>>>>> and suffix contain only ASCII symbols.
>>  >>>>>>
>>  >>>>>> As a result when StringJoiner.toString() is invoked, byte[] stored in String is
>>  >>>>>> inflated in order to fill in char[] and
>>  >>>>>> finally char[] is compressed when constructor of String is called:
>>  >>>>>>
>>  >>>>>> String delimiter = this.delimiter;
>>  >>>>>> char[] chars = new char[this.len + addLen];
>>  >>>>>> int k = getChars(this.prefix, chars, 0);
>>  >>>>>> if (size > 0) {
>>  >>>>>> k += getChars(elts[0], chars, k); // inflate byte[] -> char[]
>>  >>>>>>
>>  >>>>>> for(int i = 1; i < size; ++i) {
>>  >>>>>> k += getChars(delimiter, chars, k);
>>  >>>>>> k += getChars(elts[i], chars, k);
>>  >>>>>> }
>>  >>>>>> }
>>  >>>>>>
>>  >>>>>> k += getChars(this.suffix, chars, k);
>>  >>>>>> return new String(chars); // compress char[] -> byte[]
>>  >>>>>>
>>  >>>>>> This can be improved by detecting cases when String.isLatin1() returns true for
>>  >>>>>> all involved Strings.
>>  >>>>>>
>>  >>>>>> I've prepared a patch along with benchmark proving that this change is correct
>>  >>>>>> and brings improvement.
>>  >>>>>> The only concern I have is about String.isLatin1(): as far as String belongs to
>>  >>>>>> java.lang and StringJoiner to java.util
>>  >>>>>> package-private String.isLatin1() cannot be directly accessed, we need to make
>>  >>>>>> it public for successful compilation.
>>  >>>>>>
>>  >>>>>> Another solution is to create an intermediate utility class located in java.lang
>>  >>>>>> which delegates the call to String.isLatin1():
>>  >>>>>>
>>  >>>>>> package java.lang;
>>  >>>>>>
>>  >>>>>> public class StringHelper {
>>  >>>>>> public static boolean isLatin1(String str) {
>>  >>>>>> return str.isLatin1();
>>  >>>>>> }
>>  >>>>>> }
>>  >>>>>>
>>  >>>>>> This allows to keep java.lang.String intact and have access to it's
>>  >>>>>> package-private method outside of java.lang package.
>>  >>>>>>
>>  >>>>>> Below I've added results of benchmarking for specified case (all Strings are
>>  >>>>>> Latin1). The other case (at least one String is UTF-8) uses existing code so
>>  >>>>>> there will be only a tiny regression due to several if-checks.
>>  >>>>>>
>>  >>>>>> With best regards,
>>  >>>>>> Sergey Tsypanov
>>  >>>>>>
>>  >>>>>> (count) (length) Original Patched Units
>>  >>>>>> stringJoiner 1 1 26.7 ± 1.3 38.2 ± 1.1 ns/op
>>  >>>>>> stringJoiner 1 5 27.4 ± 0.0 40.5 ± 2.2 ns/op
>>  >>>>>> stringJoiner 1 10 29.6 ± 1.9 38.4 ± 1.9 ns/op
>>  >>>>>> stringJoiner 1 100 61.1 ± 6.9 47.6 ± 0.6 ns/op
>>  >>>>>> stringJoiner 5 1 91.1 ± 6.7 83.6 ± 2.0 ns/op
>>  >>>>>> stringJoiner 5 5 96.1 ± 10.7 85.6 ± 1.1 ns/op
>>  >>>>>> stringJoiner 5 10 105.5 ± 14.3 84.7 ± 1.1 ns/op
>>  >>>>>> stringJoiner 5 100 266.6 ± 30.1 139.6 ± 14.0 ns/op
>>  >>>>>> stringJoiner 10 1 190.7 ± 23.0 162.0 ± 2.9 ns/op
>>  >>>>>> stringJoiner 10 5 200.0 ± 16.9 167.5 ± 11.0 ns/op
>>  >>>>>> stringJoiner 10 10 216.4 ± 12.4 164.8 ± 1.7 ns/op
>>  >>>>>> stringJoiner 10 100 545.3 ± 49.7 282.2 ± 12.0 ns/op
>>  >>>>>> stringJoiner 100 1 1467.0 ± 90.3 1302.0 ± 18.5 ns/op
>>  >>>>>> stringJoiner 100 5 1491.8 ± 166.2 1493.0 ± 135.4 ns/op
>>  >>>>>> stringJoiner 100 10 1768.8 ± 160.6 1760.8 ± 111.4 ns/op
>>  >>>>>> stringJoiner 100 100 3654.3 ± 113.1 3120.9 ± 175.9 ns/op
>>  >>>>>>
>>  >>>>>> stringJoiner:·gc.alloc.rate.norm 1 1 120.0 ± 0.0 120.0 ± 0.0 B/op
>>  >>>>>> stringJoiner:·gc.alloc.rate.norm 1 5 128.0 ± 0.0 120.0 ± 0.0 B/op
>>  >>>>>> stringJoiner:·gc.alloc.rate.norm 1 10 144.0 ± 0.0 136.0 ± 0.0 B/op
>>  >>>>>> stringJoiner:·gc.alloc.rate.norm 1 100 416.0 ± 0.0 312.0 ± 0.0 B/op
>>  >>>>>> stringJoiner:·gc.alloc.rate.norm 5 1 144.0 ± 0.0 136.0 ± 0.0 B/op
>>  >>>>>> stringJoiner:·gc.alloc.rate.norm 5 5 200.0 ± 0.0 168.0 ± 0.0 B/op
>>  >>>>>> stringJoiner:·gc.alloc.rate.norm 5 10 272.0 ± 0.0 216.0 ± 0.0 B/op
>>  >>>>>> stringJoiner:·gc.alloc.rate.norm 5 100 1632.0 ± 0.0 1128.0 ± 0.0 B/op
>>  >>>>>> stringJoiner:·gc.alloc.rate.norm 10 1 256.0 ± 0.0 232.0 ± 0.0 B/op
>>  >>>>>> stringJoiner:·gc.alloc.rate.norm 10 5 376.0 ± 0.0 312.0 ± 0.0 B/op
>>  >>>>>> stringJoiner:·gc.alloc.rate.norm 10 10 520.0 ± 0.0 408.0 ± 0.0 B/op
>>  >>>>>> stringJoiner:·gc.alloc.rate.norm 10 100 3224.1 ± 0.0 2216.1 ± 0.0 B/op
>>  >>>>>> stringJoiner:·gc.alloc.rate.norm 100 1 1760.2 ± 14.9 1544.2 ± 0.0 B/op
>>  >>>>>> stringJoiner:·gc.alloc.rate.norm 100 5 2960.3 ± 14.9 2344.2 ± 0.0 B/op
>>  >>>>>> stringJoiner:·gc.alloc.rate.norm 100 10 4440.4 ± 0.0 3336.3 ± 0.0 B/op
>>  >> >> >> stringJoiner:·gc.alloc.rate.norm 100 100 31449.3 ± 12.2 21346.7 ± 14.7 B/op
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sj_2.patch
Type: text/x-diff
Size: 12049 bytes
Desc: not available
URL: <https://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20200213/bc276e35/sj_2-0001.patch>


More information about the core-libs-dev mailing list