[PATCH] Enhancement proposal for java.util.StringJoiner

forax at univ-mlv.fr forax at univ-mlv.fr
Wed Feb 5 21:21:30 UTC 2020


----- Mail original -----
> De: "Сергей Цыпанов" <sergei.tsypanov at yandex.ru>
> À: "Remi Forax" <forax at univ-mlv.fr>, "core-libs-dev" <core-libs-dev at openjdk.java.net>
> Envoyé: Mercredi 5 Février 2020 22:12:34
> Objet: Re: [PATCH] Enhancement proposal for java.util.StringJoiner

> Hello,
> 
>> If you want to optimize StringJoiner, the best way to do it is to use the shared
>> secret mechanism so a java.util class can see implementation details of a
>> java.lang class without exposing those details publicly.
>> As an example, take a look to EnumSet and its implementations.
> 
> I've looked into SharedSecrets, it seems there's no ready-to-use method for
> accessing package-private method. Do you mean it's necessary to add particular
> functionality to JavaLangReflectionAccess as they did for JavaLangAccess in
> order to deal with EnumSet?

yes !
crossing package boundary in a non public way is not free,
but given that StringJoiner is used quite often (directly or indirectly using Collectors.joining()), it may worth the cost.

> 
> Regards,
> Sergey

Regards,
Rémi

> 
> 04.02.2020, 12:12, "Remi Forax" <forax at univ-mlv.fr>:
>> ----- Mail original -----
>>>  De: "Сергей Цыпанов" <sergei.tsypanov at yandex.ru>
>>>  À: "jonathan gibbons" <jonathan.gibbons at oracle.com>, "core-libs-dev"
>>>  <core-libs-dev at openjdk.java.net>
>>>  Envoyé: Mardi 4 Février 2020 08:53:31
>>>  Objet: Re: [PATCH] Enhancement proposal for java.util.StringJoiner
>>
>>>  Hello,
>>
>> Hi Sergey,
>>
>>>  I'd probably agree about a new class in java.lang, but what is wrong about
>>>  exposing package-private method
>>>  which doesn't modify the state of the object and has no side effects?
>>
>> You can not change the implementation anymore,
>> by example if instead of having a split between latin1 and non latin1, we decide
>> in the future to split between utf8 and non utf8.
>>
>> If you want to optimize StringJoiner, the best way to do it is to use the shared
>> secret mechanism so a java.util class can see implementation details of a
>> java.lang class without exposing those details publicly.
>> As an example, take a look to EnumSet and its implementations.
>>
>> regards,
>> Rémi
>>
>>>  04.02.2020, 00:58, "Jonathan Gibbons" <jonathan.gibbons at oracle.com>:
>>>>  Sergey,
>>>>
>>>>  It is equally bad to create a new class in the java.lang package as it
>>>>  is to add a new public method to java.lang.String.
>>>>
>>>>  -- Jon
>>>>
>>>>  On 2/3/20 2:38 PM, Сергей Цыпанов wrote:
>>>>>   Hello,
>>>>>
>>>>>   as of JDK14 java.util.StringJoiner still uses char[] as a storage of glued
>>>>>   Strings.
>>>>>
>>>>>   This applies for the cases when all joined Strings as well as delimiter, prefix
>>>>>   and suffix contain only ASCII symbols.
>>>>>
>>>>>   As a result when StringJoiner.toString() is invoked, byte[] stored in String is
>>>>>   inflated in order to fill in char[] and
>>>>>   finally char[] is compressed when constructor of String is called:
>>>>>
>>>>>   String delimiter = this.delimiter;
>>>>>   char[] chars = new char[this.len + addLen];
>>>>>   int k = getChars(this.prefix, chars, 0);
>>>>>   if (size > 0) {
>>>>>        k += getChars(elts[0], chars, k); // inflate byte[] -> char[]
>>>>>
>>>>>        for(int i = 1; i < size; ++i) {
>>>>>            k += getChars(delimiter, chars, k);
>>>>>            k += getChars(elts[i], chars, k);
>>>>>        }
>>>>>   }
>>>>>
>>>>>   k += getChars(this.suffix, chars, k);
>>>>>   return new String(chars); // compress char[] -> byte[]
>>>>>
>>>>>   This can be improved by detecting cases when String.isLatin1() returns true for
>>>>>   all involved Strings.
>>>>>
>>>>>   I've prepared a patch along with benchmark proving that this change is correct
>>>>>   and brings improvement.
>>>>>   The only concern I have is about String.isLatin1(): as far as String belongs to
>>>>>   java.lang and StringJoiner to java.util
>>>>>   package-private String.isLatin1() cannot be directly accessed, we need to make
>>>>>   it public for successful compilation.
>>>>>
>>>>>   Another solution is to create an intermediate utility class located in java.lang
>>>>>   which delegates the call to String.isLatin1():
>>>>>
>>>>>   package java.lang;
>>>>>
>>>>>   public class StringHelper {
>>>>>        public static boolean isLatin1(String str) {
>>>>>            return str.isLatin1();
>>>>>        }
>>>>>   }
>>>>>
>>>>>   This allows to keep java.lang.String intact and have access to it's
>>>>>   package-private method outside of java.lang package.
>>>>>
>>>>>   Below I've added results of benchmarking for specified case (all Strings are
>>>>>   Latin1). The other case (at least one String is UTF-8) uses existing code so
>>>>>   there will be only a tiny regression due to several if-checks.
>>>>>
>>>>>   With best regards,
>>>>>   Sergey Tsypanov
>>>>>
>>>>>                                              (count) (length) Original Patched Units
>>>>>   stringJoiner 1 1 26.7 ± 1.3 38.2 ± 1.1 ns/op
>>>>>   stringJoiner 1 5 27.4 ± 0.0 40.5 ± 2.2 ns/op
>>>>>   stringJoiner 1 10 29.6 ± 1.9 38.4 ± 1.9 ns/op
>>>>>   stringJoiner 1 100 61.1 ± 6.9 47.6 ± 0.6 ns/op
>>>>>   stringJoiner 5 1 91.1 ± 6.7 83.6 ± 2.0 ns/op
>>>>>   stringJoiner 5 5 96.1 ± 10.7 85.6 ± 1.1 ns/op
>>>>>   stringJoiner 5 10 105.5 ± 14.3 84.7 ± 1.1 ns/op
>>>>>   stringJoiner 5 100 266.6 ± 30.1 139.6 ± 14.0 ns/op
>>>>>   stringJoiner 10 1 190.7 ± 23.0 162.0 ± 2.9 ns/op
>>>>>   stringJoiner 10 5 200.0 ± 16.9 167.5 ± 11.0 ns/op
>>>>>   stringJoiner 10 10 216.4 ± 12.4 164.8 ± 1.7 ns/op
>>>>>   stringJoiner 10 100 545.3 ± 49.7 282.2 ± 12.0 ns/op
>>>>>   stringJoiner 100 1 1467.0 ± 90.3 1302.0 ± 18.5 ns/op
>>>>>   stringJoiner 100 5 1491.8 ± 166.2 1493.0 ± 135.4 ns/op
>>>>>   stringJoiner 100 10 1768.8 ± 160.6 1760.8 ± 111.4 ns/op
>>>>>   stringJoiner 100 100 3654.3 ± 113.1 3120.9 ± 175.9 ns/op
>>>>>
>>>>>   stringJoiner:·gc.alloc.rate.norm 1 1 120.0 ± 0.0 120.0 ± 0.0 B/op
>>>>>   stringJoiner:·gc.alloc.rate.norm 1 5 128.0 ± 0.0 120.0 ± 0.0 B/op
>>>>>   stringJoiner:·gc.alloc.rate.norm 1 10 144.0 ± 0.0 136.0 ± 0.0 B/op
>>>>>   stringJoiner:·gc.alloc.rate.norm 1 100 416.0 ± 0.0 312.0 ± 0.0 B/op
>>>>>   stringJoiner:·gc.alloc.rate.norm 5 1 144.0 ± 0.0 136.0 ± 0.0 B/op
>>>>>   stringJoiner:·gc.alloc.rate.norm 5 5 200.0 ± 0.0 168.0 ± 0.0 B/op
>>>>>   stringJoiner:·gc.alloc.rate.norm 5 10 272.0 ± 0.0 216.0 ± 0.0 B/op
>>>>>   stringJoiner:·gc.alloc.rate.norm 5 100 1632.0 ± 0.0 1128.0 ± 0.0 B/op
>>>>>   stringJoiner:·gc.alloc.rate.norm 10 1 256.0 ± 0.0 232.0 ± 0.0 B/op
>>>>>   stringJoiner:·gc.alloc.rate.norm 10 5 376.0 ± 0.0 312.0 ± 0.0 B/op
>>>>>   stringJoiner:·gc.alloc.rate.norm 10 10 520.0 ± 0.0 408.0 ± 0.0 B/op
>>>>>   stringJoiner:·gc.alloc.rate.norm 10 100 3224.1 ± 0.0 2216.1 ± 0.0 B/op
>>>>>   stringJoiner:·gc.alloc.rate.norm 100 1 1760.2 ± 14.9 1544.2 ± 0.0 B/op
>>>>>   stringJoiner:·gc.alloc.rate.norm 100 5 2960.3 ± 14.9 2344.2 ± 0.0 B/op
>>>>>   stringJoiner:·gc.alloc.rate.norm 100 10 4440.4 ± 0.0 3336.3 ± 0.0 B/op
> >>  >>  stringJoiner:·gc.alloc.rate.norm 100 100 31449.3 ± 12.2 21346.7 ± 14.7 B/op


More information about the core-libs-dev mailing list