String concatenation tweaks

Remi Forax forax at univ-mlv.fr
Thu Mar 12 07:03:23 UTC 2015


Hi Louis,

On 03/11/2015 10:01 PM, Louis Wasserman wrote:
> OpenJDK's implementation of String concatenation compiles
>
>    "foo" + bar + "quux" + baz
>
> into essentially the same bytecode as
>
>   new StringBuilder()
>     .append("foo")
>     .append(bar)
>     .append("quux")
>     .append(baz)
>     .toString()
>
> We've been successfully experimenting at Google with presizing the 
> StringBuilder to avoid the need for rebuffering, with extensive 
> consultation with martinrb@ and cushon@.  I have not yet ported the 
> patch to head, but wanted to bounce the idea off this list before 
> doing so.  Some key points of interest:
>
>   * It suffices to provide an upper bound on the size, if that's not
>     too much bigger than the real length.  For example, for
>     primitives, we use the bound of the maximum length of the toString
>     of that primitive type: for example, a boolean is treated as
>     having length bounded at 5.
>   * Nonconstant Objects, including CharSequences, have their toString
>     stored in a local.  For example, "foo" + myStringBuilder would be
>     compiled to approximately
>
>     String myStringBuilderToString = myStringBuilder.toString();
>     return new StringBuilder(3 + myStringBuilderToString.length())
>       .append("foo")
>       .append(myStringBuilderToString)
>       .toString();
>
>     This is necessary to deal with the possibility of mutation
>     midexpression.
>

Interesting,
here you have two optimizations: one is to call toString() and store the 
result in a local variable for each object to append, the second one is 
to try to pre-calculate the size of the resulting String.
Have you measured the former without combining it with 
the latter?

I ask because I think that the OptimizeStringConcat code only 
works if HotSpot is able to determine that all the objects to append are 
Strings.
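
To make the question concrete, here is a minimal sketch of the two variants 
for "foo" + cs. The class and method names are hypothetical and this is 
hand-written source, not what the patched compiler emits: the first method 
applies only the toString() snapshot, the second adds the presizing on top.

```java
// Hypothetical illustration; names are not taken from the patch.
final class ConcatSketch {
    // Optimization 1 alone: snapshot the operand as a String in a local,
    // but keep the default StringBuilder capacity.
    static String snapshotOnly(CharSequence cs) {
        String csStr = String.valueOf(cs); // yields "null" for a null reference
        return new StringBuilder()
            .append("foo")
            .append(csStr)
            .toString();
    }

    // Optimizations 1 and 2: snapshot, then presize from the snapshot's length.
    static String snapshotAndPresize(CharSequence cs) {
        String csStr = String.valueOf(cs);
        return new StringBuilder(3 + csStr.length()) // 3 = length of "foo"
            .append("foo")
            .append(csStr)
            .toString();
    }
}
```

Both methods return the same String; only the allocation behaviour differs, 
so benchmarking them separately would show what each step contributes.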

>   *  (Nonconstant primitives are also stored in a local to preserve
>     evaluation order and avoid mutation, but not converted to
>     Strings.  There might be some room for optimization here for
>     primitive values coming from final fields or locals.)
>   * Some mostly-redundant null checking is necessary to deal with the
>     evil edge case where toString() returns null.
>

valueOf(valueOf(x)) is quite ugly, but I don't see how to do better :(
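
For readers wondering why one valueOf is not enough, a small sketch (the 
Evil class and safeToString helper are made up for illustration): the inner 
call maps a null reference to "null", the outer call maps a null result of 
toString() to "null", so calling length() on the result is always safe.

```java
// Hypothetical illustration of the double null check.
final class NullToStringDemo {
    // An "evil" class whose toString() returns null.
    static final class Evil {
        @Override
        public String toString() {
            return null;
        }
    }

    // Inner valueOf handles o == null; outer valueOf handles toString() == null.
    static String safeToString(Object o) {
        return String.valueOf(String.valueOf(o));
    }

    public static void main(String[] args) {
        // A single String.valueOf(new Evil()) would return a null reference,
        // so calling length() on it would throw NullPointerException.
        System.out.println(safeToString(new Evil()).length());
        System.out.println(safeToString(null).length());
    }
}
```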

>   * Taking all the above into account, our benchmarks showed 15% CPU
>     improvements and 25% fewer bytes allocated relative to the status
>     quo, independent of -XX:+OptimizeStringConcat.
>   * While we were at it, in the case of two arguments that are
>     statically known to be Strings, our benchmarks show String.concat
>     to be firmly more efficient than the StringBuilder, even in the
>     presence of flags like -XX:+OptimizeStringConcat.  This is
>     arguably a separate optimization, but nonetheless effective; our
>     benchmarks at the time suggested 40% CPU improvements and 60%
>     fewer bytes allocated relative to the status quo.
>
> So for example, "foo" + myInt + myString + "bar" + myObj would be 
> compiled to the equivalent of
>
> int myIntTmp = myInt;
> String myStringTmp = String.valueOf(myString); // defend against null
> // defend against evil toString implementations returning null
> String myObjTmp = String.valueOf(String.valueOf(myObj));
>
> return new StringBuilder(
>      // length of "foo" (3) + max length of myInt (11) + length of "bar" (3)
>      17
>      + myStringTmp.length()
>      + myObjTmp.length())
>    .append("foo")
>    .append(myIntTmp)
>    .append(myStringTmp)
>    .append("bar")
>    .append(myObjTmp)
>    .toString();
>
> As far as language constraints go, the JLS is (apparently 
> deliberately) vague about how string concatenation is implemented. 
>  "An implementation may choose to perform conversion and concatenation 
> in one step to avoid creating and then discarding an intermediate 
> String object. To increase the performance of repeated string 
> concatenation, a Java compiler may use the StringBuffer class or a 
> similar technique to reduce the number of intermediate String objects 
> that are created by evaluation of an expression."  We see no reason 
> this approach would not qualify as a "similar technique."
>
> If these suggestions (and performance numbers) are of interest, I can 
> port our patch for upstream use.

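For reference, the quoted expansion of "foo" + myInt + myString + "bar" + myObj 
can be written as a self-contained method. This is only a sketch under the 
assumptions stated in the quoted mail; the class, method, and constant names 
are hypothetical, and the real patch works at the bytecode level.

```java
// Hypothetical hand-written equivalent of the presized expansion.
final class PresizedConcat {
    // Upper bound on the decimal length of an int: "-2147483648" has 11 characters.
    static final int MAX_INT_LENGTH = 11;

    static String concat(int myInt, String myString, Object myObj) {
        int myIntTmp = myInt;                                    // preserve evaluation order
        String myStringTmp = String.valueOf(myString);           // defend against null
        String myObjTmp = String.valueOf(String.valueOf(myObj)); // defend against null toString()

        // 3 for "foo", 11 for the int bound, 3 for "bar", plus the known lengths.
        int capacity = 3 + MAX_INT_LENGTH + 3
            + myStringTmp.length()
            + myObjTmp.length();

        return new StringBuilder(capacity)
            .append("foo")
            .append(myIntTmp)
            .append(myStringTmp)
            .append("bar")
            .append(myObjTmp)
            .toString();
    }
}
```

The capacity is an upper bound rather than the exact length, so the builder 
never has to grow, at the cost of a few unused chars when myInt is short.
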
cheers,
Rémi
