[PATCH] Enhancement proposal for java.util.StringJoiner

Tagir Valeev amaembo at gmail.com
Mon Feb 10 12:08:45 UTC 2020


Hello!

In many tests, I see little or no performance improvements. E.g.:
stringJoiner                                  100        10   1768.8 ±
160.6      1760.8 ± 111.4            ns/op

How would you explain that this code change doesn't improve the
performance for given count and length?

Also, you are using `new String(bytes)` assuming that the platform
charset is latin1-compatible. This is not always true, so your code
would produce incorrect results depending on this setting. In general,
decoding via charset is tons of extra work. Instead, I would
experiment with pre-sized StringBuilder inside compactElts, instead of
char[] array. While it does some extra checks, StringBuilder already
can use the compact string representation, so if it's properly
pre-sized, no extra memory will be allocated.

Finally, if you optimize for the case when X is true you should always
benchmark what happens when X is false. It's great that in some cases
we see a speedup for latin1 strings. But what if not all of the
strings are latin1? Is there any performance degradation? If yes, can
we tolerate it?

With best regards,
Tagir Valeev.

On Mon, Feb 10, 2020 at 3:22 PM Сергей Цыпанов
<sergei.tsypanov at yandex.ru> wrote:
>
> Hello,
>
> I've reworked the code, patch is attached. Could you please review my solution regarding usage of SahredSecrets?
>
> P.S. After I've created the patch it came to my mind that instead of checking all Strings when calling StringJoiner.add()
> we can check them in toString() method and fail-fast in case at least one of them is non-latin. This likely to reduce
> regression related to usage of reflection.
>
> Regards,
> Sergey Tsypanov
>
> 05.02.2020, 23:21, "forax at univ-mlv.fr" <forax at univ-mlv.fr>:
> > ----- Mail original -----
> >>  De: "Сергей Цыпанов" <sergei.tsypanov at yandex.ru>
> >>  À: "Remi Forax" <forax at univ-mlv.fr>, "core-libs-dev" <core-libs-dev at openjdk.java.net>
> >>  Envoyé: Mercredi 5 Février 2020 22:12:34
> >>  Objet: Re: [PATCH] Enhancement proposal for java.util.StringJoiner
> >
> >>  Hello,
> >>
> >>>  If you want to optimize StringJoiner, the best way to do it is to use the shared
> >>>  secret mechanism so a java.util class can see implementation details of a
> >>>  java.lang class without exposing those details publicly.
> >>>  As an example, take a look to EnumSet and its implementations.
> >>
> >>  I've looked into SharedSecrets, it seems there's no ready-to-use method for
> >>  accessing package-private method. Do you mean it's necessary to add particular
> >>  functionality to JavaLangReflectionAccess as they did for JavaLangAccess in
> >>  order to deal with EnumSet?
> >
> > yes !
> > crossing package boundary in a non public way is not free,
> > but given that StringJoiner is used quite often (directly or indirectly using Collectors.joining()), it may worth the cost.
> >
> >>  Regards,
> >>  Sergey
> >
> > Regards,
> > Rémi
> >
> >>  04.02.2020, 12:12, "Remi Forax" <forax at univ-mlv.fr>:
> >>>  ----- Mail original -----
> >>>>   De: "Сергей Цыпанов" <sergei.tsypanov at yandex.ru>
> >>>>   À: "jonathan gibbons" <jonathan.gibbons at oracle.com>, "core-libs-dev"
> >>>>   <core-libs-dev at openjdk.java.net>
> >>>>   Envoyé: Mardi 4 Février 2020 08:53:31
> >>>>   Objet: Re: [PATCH] Enhancement proposal for java.util.StringJoiner
> >>>
> >>>>   Hello,
> >>>
> >>>  Hi Sergey,
> >>>
> >>>>   I'd probably agree about a new class in java.lang, but what is wrong about
> >>>>   exposing package-private method
> >>>>   which doesn't modify the state of the object and has no side effects?
> >>>
> >>>  You can not change the implementation anymore,
> >>>  by example if instead of having a split between latin1 and non latin1, we decide
> >>>  in the future to split between utf8 and non utf8.
> >>>
> >>>  If you want to optimize StringJoiner, the best way to do it is to use the shared
> >>>  secret mechanism so a java.util class can see implementation details of a
> >>>  java.lang class without exposing those details publicly.
> >>>  As an example, take a look to EnumSet and its implementations.
> >>>
> >>>  regards,
> >>>  Rémi
> >>>
> >>>>   04.02.2020, 00:58, "Jonathan Gibbons" <jonathan.gibbons at oracle.com>:
> >>>>>   Sergey,
> >>>>>
> >>>>>   It is equally bad to create a new class in the java.lang package as it
> >>>>>   is to add a new public method to java.lang.String.
> >>>>>
> >>>>>   -- Jon
> >>>>>
> >>>>>   On 2/3/20 2:38 PM, Сергей Цыпанов wrote:
> >>>>>>    Hello,
> >>>>>>
> >>>>>>    as of JDK14 java.util.StringJoiner still uses char[] as a storage of glued
> >>>>>>    Strings.
> >>>>>>
> >>>>>>    This applies for the cases when all joined Strings as well as delimiter, prefix
> >>>>>>    and suffix contain only ASCII symbols.
> >>>>>>
> >>>>>>    As a result when StringJoiner.toString() is invoked, byte[] stored in String is
> >>>>>>    inflated in order to fill in char[] and
> >>>>>>    finally char[] is compressed when constructor of String is called:
> >>>>>>
> >>>>>>    String delimiter = this.delimiter;
> >>>>>>    char[] chars = new char[this.len + addLen];
> >>>>>>    int k = getChars(this.prefix, chars, 0);
> >>>>>>    if (size > 0) {
> >>>>>>         k += getChars(elts[0], chars, k); // inflate byte[] -> char[]
> >>>>>>
> >>>>>>         for(int i = 1; i < size; ++i) {
> >>>>>>             k += getChars(delimiter, chars, k);
> >>>>>>             k += getChars(elts[i], chars, k);
> >>>>>>         }
> >>>>>>    }
> >>>>>>
> >>>>>>    k += getChars(this.suffix, chars, k);
> >>>>>>    return new String(chars); // compress char[] -> byte[]
> >>>>>>
> >>>>>>    This can be improved by detecting cases when String.isLatin1() returns true for
> >>>>>>    all involved Strings.
> >>>>>>
> >>>>>>    I've prepared a patch along with benchmark proving that this change is correct
> >>>>>>    and brings improvement.
> >>>>>>    The only concern I have is about String.isLatin1(): as far as String belongs to
> >>>>>>    java.lang and StringJoiner to java.util
> >>>>>>    package-private String.isLatin1() cannot be directly accessed, we need to make
> >>>>>>    it public for successful compilation.
> >>>>>>
> >>>>>>    Another solution is to create an intermediate utility class located in java.lang
> >>>>>>    which delegates the call to String.isLatin1():
> >>>>>>
> >>>>>>    package java.lang;
> >>>>>>
> >>>>>>    public class StringHelper {
> >>>>>>         public static boolean isLatin1(String str) {
> >>>>>>             return str.isLatin1();
> >>>>>>         }
> >>>>>>    }
> >>>>>>
> >>>>>>    This allows to keep java.lang.String intact and have access to it's
> >>>>>>    package-private method outside of java.lang package.
> >>>>>>
> >>>>>>    Below I've added results of benchmarking for specified case (all Strings are
> >>>>>>    Latin1). The other case (at least one String is UTF-8) uses existing code so
> >>>>>>    there will be only a tiny regression due to several if-checks.
> >>>>>>
> >>>>>>    With best regards,
> >>>>>>    Sergey Tsypanov
> >>>>>>
> >>>>>>                                               (count) (length) Original Patched Units
> >>>>>>    stringJoiner 1 1 26.7 ± 1.3 38.2 ± 1.1 ns/op
> >>>>>>    stringJoiner 1 5 27.4 ± 0.0 40.5 ± 2.2 ns/op
> >>>>>>    stringJoiner 1 10 29.6 ± 1.9 38.4 ± 1.9 ns/op
> >>>>>>    stringJoiner 1 100 61.1 ± 6.9 47.6 ± 0.6 ns/op
> >>>>>>    stringJoiner 5 1 91.1 ± 6.7 83.6 ± 2.0 ns/op
> >>>>>>    stringJoiner 5 5 96.1 ± 10.7 85.6 ± 1.1 ns/op
> >>>>>>    stringJoiner 5 10 105.5 ± 14.3 84.7 ± 1.1 ns/op
> >>>>>>    stringJoiner 5 100 266.6 ± 30.1 139.6 ± 14.0 ns/op
> >>>>>>    stringJoiner 10 1 190.7 ± 23.0 162.0 ± 2.9 ns/op
> >>>>>>    stringJoiner 10 5 200.0 ± 16.9 167.5 ± 11.0 ns/op
> >>>>>>    stringJoiner 10 10 216.4 ± 12.4 164.8 ± 1.7 ns/op
> >>>>>>    stringJoiner 10 100 545.3 ± 49.7 282.2 ± 12.0 ns/op
> >>>>>>    stringJoiner 100 1 1467.0 ± 90.3 1302.0 ± 18.5 ns/op
> >>>>>>    stringJoiner 100 5 1491.8 ± 166.2 1493.0 ± 135.4 ns/op
> >>>>>>    stringJoiner 100 10 1768.8 ± 160.6 1760.8 ± 111.4 ns/op
> >>>>>>    stringJoiner 100 100 3654.3 ± 113.1 3120.9 ± 175.9 ns/op
> >>>>>>
> >>>>>>    stringJoiner:·gc.alloc.rate.norm 1 1 120.0 ± 0.0 120.0 ± 0.0 B/op
> >>>>>>    stringJoiner:·gc.alloc.rate.norm 1 5 128.0 ± 0.0 120.0 ± 0.0 B/op
> >>>>>>    stringJoiner:·gc.alloc.rate.norm 1 10 144.0 ± 0.0 136.0 ± 0.0 B/op
> >>>>>>    stringJoiner:·gc.alloc.rate.norm 1 100 416.0 ± 0.0 312.0 ± 0.0 B/op
> >>>>>>    stringJoiner:·gc.alloc.rate.norm 5 1 144.0 ± 0.0 136.0 ± 0.0 B/op
> >>>>>>    stringJoiner:·gc.alloc.rate.norm 5 5 200.0 ± 0.0 168.0 ± 0.0 B/op
> >>>>>>    stringJoiner:·gc.alloc.rate.norm 5 10 272.0 ± 0.0 216.0 ± 0.0 B/op
> >>>>>>    stringJoiner:·gc.alloc.rate.norm 5 100 1632.0 ± 0.0 1128.0 ± 0.0 B/op
> >>>>>>    stringJoiner:·gc.alloc.rate.norm 10 1 256.0 ± 0.0 232.0 ± 0.0 B/op
> >>>>>>    stringJoiner:·gc.alloc.rate.norm 10 5 376.0 ± 0.0 312.0 ± 0.0 B/op
> >>>>>>    stringJoiner:·gc.alloc.rate.norm 10 10 520.0 ± 0.0 408.0 ± 0.0 B/op
> >>>>>>    stringJoiner:·gc.alloc.rate.norm 10 100 3224.1 ± 0.0 2216.1 ± 0.0 B/op
> >>>>>>    stringJoiner:·gc.alloc.rate.norm 100 1 1760.2 ± 14.9 1544.2 ± 0.0 B/op
> >>>>>>    stringJoiner:·gc.alloc.rate.norm 100 5 2960.3 ± 14.9 2344.2 ± 0.0 B/op
> >>>>>>    stringJoiner:·gc.alloc.rate.norm 100 10 4440.4 ± 0.0 3336.3 ± 0.0 B/op
> >>  >>  >>  stringJoiner:·gc.alloc.rate.norm 100 100 31449.3 ± 12.2 21346.7 ± 14.7 B/op


More information about the core-libs-dev mailing list