Optimize StringConcatHelper::simpleConcat for empty strings

Claes Redestad claes.redestad at oracle.com
Sun Jun 14 23:36:45 UTC 2020


Hi Peter, Tagir,

On 2020-06-14 23:04, Peter Levart wrote:
> Hi Tagir,
> 
> 
> I think that there might be cases where one of the arguments of 
> concatenation is constant empty String. I found myself sometimes doing 
> "" + someNonStringValue (instead of more readable 
> String.valueOf(someNonStringValue)) simply because of laziness. So does 
> this optimization help in that case as much as with non-constant "" + 
> "longlonglongline" ?

perhaps we should consider adding unary concat specializations:

http://cr.openjdk.java.net/~redestad/scratch/unary_concat.00/

For primitive arguments we just return a handle to String.valueOf,
which helps with bootstrap overheads (a Hello World that does "" +
args.length drops from 66ms to 51ms on my machine).

  "" + stringArg can't be reduced to the objectStringifier, since we
need that new String wrapper.

Results on the added concatEmpty micros - before:

Benchmark                       (intValue)  Mode  Cnt   Score   Error  Units
StringConcat.concatEmptyInt           4711  avgt    5  15.461 ± 1.266  ns/op
StringConcat.concatEmptyString        4711  avgt    5   8.173 ± 0.515  ns/op

- after:

Benchmark                       (intValue)  Mode  Cnt   Score   Error  Units
StringConcat.concatEmptyInt           4711  avgt    5  15.470 ± 1.235  ns/op
StringConcat.concatEmptyString        4711  avgt    5  18.420 ± 1.572  ns/op

This doesn't help the case where the arguments are both arguments andone
happen to be an empty string. Tagir's patch would still make sense
there.

/Claes

> 
> 
> Also, for cases where one argument evaluates to empty string and the 
> other is not a String, I think you could simply return the seconds 
> string (not creating another instance with reused value/coder):
> 
> 
> if (s1.isEmpty()) {
> 
>    return s2 == second ? new String(s2.value(), s2.coder()) : s2;
> 
> }
> 
> 
> Regards, Peter
> 
> 
> On 6/13/20 7:08 AM, Tagir Valeev wrote:
>> Hello!
>>
>> It's quite possible that when we concatenate two strings, one of them
>> appears to be empty. We cannot simply return another string in this
>> case, as JLS 15.18.1 explicitly says (for unknown to me reason) about
>> the result of the string concatenation expression that 'The String
>> object is newly created'. However, it's still possible to reuse the
>> internal array of another string to reduce allocations:
>>
>> --- src/java.base/share/classes/java/lang/StringConcatHelper.java
>> (revision 58998:04e3d254c76be87788a40cbd66d013140ea951d8)
>> +++ src/java.base/share/classes/java/lang/StringConcatHelper.java
>> (revision 58998+:04e3d254c76b+)
>> @@ -420,6 +420,12 @@
>>       static String simpleConcat(Object first, Object second) {
>>           String s1 = stringOf(first);
>>           String s2 = stringOf(second);
>> +        if (s1.isEmpty()) {
>> +            return new String(s2.value(), s2.coder());
>> +        }
>> +        if (s2.isEmpty()) {
>> +            return new String(s1.value(), s1.coder());
>> +        }
>>           // start "mixing" in length and coder or arguments, order is 
>> not
>>           // important
>>           long indexCoder = mix(initialCoder(), s2);
>>
>> Very simple benchmark like this validates that the concatenation
>> became faster if one of the strings is empty:
>>
>>    @Param({"", "longlonglongline"})
>>    String data;
>>
>>    @Param({"", "longlonglongline"})
>>    String data2;
>>
>>    @Benchmark
>>    public String plus() {
>>      return data + data2;
>>    }
>>
>> Without patch I observe on VM 15-ea+20-899:
>>
>> Benchmark       (data)           (data2)  Mode  Cnt   Score   Error  
>> Units
>> plus                                      avgt   30  15,335 ± 0,186  
>> ns/op
>> plus                    longlonglongline  avgt   30  19,867 ± 0,109  
>> ns/op
>> plus  longlonglongline                    avgt   30  20,283 ± 0,230  
>> ns/op
>> plus  longlonglongline  longlonglongline  avgt   30  26,047 ± 0,230  
>> ns/op
>>
>> With patch:
>> Benchmark       (data)           (data2)  Mode  Cnt   Score   Error  
>> Units
>> plus                                      avgt   30   6,668 ± 0,055  
>> ns/op
>> plus                    longlonglongline  avgt   30   6,708 ± 0,114  
>> ns/op
>> plus  longlonglongline                    avgt   30   7,003 ± 0,064  
>> ns/op
>> plus  longlonglongline  longlonglongline  avgt   30  25,126 ± 0,392  
>> ns/op
>>
>> There could be an added cost of up to two branches for the normal case
>> (I believe, if one of the strings is constant, then decent JIT can
>> eliminate one of the branches). However, I believe, the benefit could
>> outweigh it, as empty strings are not that uncommon and in this case,
>> we reduce O(N) time and memory consumption to O(1).
>>
>> What do you think? Is this a reasonable thing to do? I can file an
>> issue and submit a proper webrev if it looks like a useful patch.
>>
>> With best regards,
>> Tagir Valeev.


More information about the core-libs-dev mailing list