Optimize StringConcatHelper::simpleConcat for empty strings

Tagir Valeev amaembo at gmail.com
Sat Jun 13 05:08:57 UTC 2020


Hello!

It's quite possible that when we concatenate two strings, one of them
appears to be empty. We cannot simply return another string in this
case, as JLS 15.18.1 explicitly says (for unknown to me reason) about
the result of the string concatenation expression that 'The String
object is newly created'. However, it's still possible to reuse the
internal array of another string to reduce allocations:

--- src/java.base/share/classes/java/lang/StringConcatHelper.java
(revision 58998:04e3d254c76be87788a40cbd66d013140ea951d8)
+++ src/java.base/share/classes/java/lang/StringConcatHelper.java
(revision 58998+:04e3d254c76b+)
@@ -420,6 +420,12 @@
     static String simpleConcat(Object first, Object second) {
         String s1 = stringOf(first);
         String s2 = stringOf(second);
+        if (s1.isEmpty()) {
+            return new String(s2.value(), s2.coder());
+        }
+        if (s2.isEmpty()) {
+            return new String(s1.value(), s1.coder());
+        }
         // start "mixing" in length and coder or arguments, order is not
         // important
         long indexCoder = mix(initialCoder(), s2);

Very simple benchmark like this validates that the concatenation
became faster if one of the strings is empty:

  @Param({"", "longlonglongline"})
  String data;

  @Param({"", "longlonglongline"})
  String data2;

  @Benchmark
  public String plus() {
    return data + data2;
  }

Without patch I observe on VM 15-ea+20-899:

Benchmark       (data)           (data2)  Mode  Cnt   Score   Error  Units
plus                                      avgt   30  15,335 ± 0,186  ns/op
plus                    longlonglongline  avgt   30  19,867 ± 0,109  ns/op
plus  longlonglongline                    avgt   30  20,283 ± 0,230  ns/op
plus  longlonglongline  longlonglongline  avgt   30  26,047 ± 0,230  ns/op

With patch:
Benchmark       (data)           (data2)  Mode  Cnt   Score   Error  Units
plus                                      avgt   30   6,668 ± 0,055  ns/op
plus                    longlonglongline  avgt   30   6,708 ± 0,114  ns/op
plus  longlonglongline                    avgt   30   7,003 ± 0,064  ns/op
plus  longlonglongline  longlonglongline  avgt   30  25,126 ± 0,392  ns/op

There could be an added cost of up to two branches for the normal case
(I believe, if one of the strings is constant, then decent JIT can
eliminate one of the branches). However, I believe, the benefit could
outweigh it, as empty strings are not that uncommon and in this case,
we reduce O(N) time and memory consumption to O(1).

What do you think? Is this a reasonable thing to do? I can file an
issue and submit a proper webrev if it looks like a useful patch.

With best regards,
Tagir Valeev.


More information about the core-libs-dev mailing list