Optimize StringConcatHelper::simpleConcat for empty strings

Claes Redestad claes.redestad at oracle.com
Sun Jun 14 18:15:28 UTC 2020


Hi Tagir,

seems reasonable if there's evidence this is likely to happen in
practice. Cache locality might suffer a bit down the line since
String and byte[] becomes less likely to reside in the same slice
of heap memory.

You could simplify as "return new String(s{2,1});".

/Claes

On 2020-06-13 07:08, Tagir Valeev wrote:
> Hello!
> 
> It's quite possible that when we concatenate two strings, one of them
> appears to be empty. We cannot simply return another string in this
> case, as JLS 15.18.1 explicitly says (for unknown to me reason) about
> the result of the string concatenation expression that 'The String
> object is newly created'. However, it's still possible to reuse the
> internal array of another string to reduce allocations:
> 
> --- src/java.base/share/classes/java/lang/StringConcatHelper.java
> (revision 58998:04e3d254c76be87788a40cbd66d013140ea951d8)
> +++ src/java.base/share/classes/java/lang/StringConcatHelper.java
> (revision 58998+:04e3d254c76b+)
> @@ -420,6 +420,12 @@
>       static String simpleConcat(Object first, Object second) {
>           String s1 = stringOf(first);
>           String s2 = stringOf(second);
> +        if (s1.isEmpty()) {
> +            return new String(s2.value(), s2.coder());
> +        }
> +        if (s2.isEmpty()) {
> +            return new String(s1.value(), s1.coder());
> +        }
>           // start "mixing" in length and coder or arguments, order is not
>           // important
>           long indexCoder = mix(initialCoder(), s2);
> 
> Very simple benchmark like this validates that the concatenation
> became faster if one of the strings is empty:
> 
>    @Param({"", "longlonglongline"})
>    String data;
> 
>    @Param({"", "longlonglongline"})
>    String data2;
> 
>    @Benchmark
>    public String plus() {
>      return data + data2;
>    }
> 
> Without patch I observe on VM 15-ea+20-899:
> 
> Benchmark       (data)           (data2)  Mode  Cnt   Score   Error  Units
> plus                                      avgt   30  15,335 ± 0,186  ns/op
> plus                    longlonglongline  avgt   30  19,867 ± 0,109  ns/op
> plus  longlonglongline                    avgt   30  20,283 ± 0,230  ns/op
> plus  longlonglongline  longlonglongline  avgt   30  26,047 ± 0,230  ns/op
> 
> With patch:
> Benchmark       (data)           (data2)  Mode  Cnt   Score   Error  Units
> plus                                      avgt   30   6,668 ± 0,055  ns/op
> plus                    longlonglongline  avgt   30   6,708 ± 0,114  ns/op
> plus  longlonglongline                    avgt   30   7,003 ± 0,064  ns/op
> plus  longlonglongline  longlonglongline  avgt   30  25,126 ± 0,392  ns/op
> 
> There could be an added cost of up to two branches for the normal case
> (I believe, if one of the strings is constant, then decent JIT can
> eliminate one of the branches). However, I believe, the benefit could
> outweigh it, as empty strings are not that uncommon and in this case,
> we reduce O(N) time and memory consumption to O(1).
> 
> What do you think? Is this a reasonable thing to do? I can file an
> issue and submit a proper webrev if it looks like a useful patch.
> 
> With best regards,
> Tagir Valeev.
> 


More information about the core-libs-dev mailing list