Optimize StringConcatHelper::simpleConcat for empty strings
Peter Levart
peter.levart at gmail.com
Sun Jun 14 21:04:03 UTC 2020
Hi Tagir,
I think that there might be cases where one of the arguments of
concatenation is constant empty String. I found myself sometimes doing
"" + someNonStringValue (instead of more readable
String.valueOf(someNonStringValue)) simply because of laziness. So does
this optimization help in that case as much as with non-constant "" +
"longlonglongline" ?
Also, for cases where one argument evaluates to empty string and the
other is not a String, I think you could simply return the seconds
string (not creating another instance with reused value/coder):
if (s1.isEmpty()) {
return s2 == second ? new String(s2.value(), s2.coder()) : s2;
}
Regards, Peter
On 6/13/20 7:08 AM, Tagir Valeev wrote:
> Hello!
>
> It's quite possible that when we concatenate two strings, one of them
> appears to be empty. We cannot simply return another string in this
> case, as JLS 15.18.1 explicitly says (for unknown to me reason) about
> the result of the string concatenation expression that 'The String
> object is newly created'. However, it's still possible to reuse the
> internal array of another string to reduce allocations:
>
> --- src/java.base/share/classes/java/lang/StringConcatHelper.java
> (revision 58998:04e3d254c76be87788a40cbd66d013140ea951d8)
> +++ src/java.base/share/classes/java/lang/StringConcatHelper.java
> (revision 58998+:04e3d254c76b+)
> @@ -420,6 +420,12 @@
> static String simpleConcat(Object first, Object second) {
> String s1 = stringOf(first);
> String s2 = stringOf(second);
> + if (s1.isEmpty()) {
> + return new String(s2.value(), s2.coder());
> + }
> + if (s2.isEmpty()) {
> + return new String(s1.value(), s1.coder());
> + }
> // start "mixing" in length and coder or arguments, order is not
> // important
> long indexCoder = mix(initialCoder(), s2);
>
> Very simple benchmark like this validates that the concatenation
> became faster if one of the strings is empty:
>
> @Param({"", "longlonglongline"})
> String data;
>
> @Param({"", "longlonglongline"})
> String data2;
>
> @Benchmark
> public String plus() {
> return data + data2;
> }
>
> Without patch I observe on VM 15-ea+20-899:
>
> Benchmark (data) (data2) Mode Cnt Score Error Units
> plus avgt 30 15,335 ± 0,186 ns/op
> plus longlonglongline avgt 30 19,867 ± 0,109 ns/op
> plus longlonglongline avgt 30 20,283 ± 0,230 ns/op
> plus longlonglongline longlonglongline avgt 30 26,047 ± 0,230 ns/op
>
> With patch:
> Benchmark (data) (data2) Mode Cnt Score Error Units
> plus avgt 30 6,668 ± 0,055 ns/op
> plus longlonglongline avgt 30 6,708 ± 0,114 ns/op
> plus longlonglongline avgt 30 7,003 ± 0,064 ns/op
> plus longlonglongline longlonglongline avgt 30 25,126 ± 0,392 ns/op
>
> There could be an added cost of up to two branches for the normal case
> (I believe, if one of the strings is constant, then decent JIT can
> eliminate one of the branches). However, I believe, the benefit could
> outweigh it, as empty strings are not that uncommon and in this case,
> we reduce O(N) time and memory consumption to O(1).
>
> What do you think? Is this a reasonable thing to do? I can file an
> issue and submit a proper webrev if it looks like a useful patch.
>
> With best regards,
> Tagir Valeev.
More information about the core-libs-dev
mailing list