Optimize StringConcatHelper::simpleConcat for empty strings
Claes Redestad
claes.redestad at oracle.com
Mon Jun 15 08:10:10 UTC 2020
Hi Tagir,
On 2020-06-15 05:57, Tagir Valeev wrote:
> Hello, Peter and Claes!
>
>> Cache locality might suffer a bit down the line since
>> String and byte[] becomes less likely to reside in the same slice
>> of heap memory.
>
> There's a JEP 192 String Deduplication in G1 that does similar
> optimization inside the GC (currently it's off by default).
> The 'Risk and assumptions' section reads as:
>
>> Normally a String object and its corresponding character array will be placed next to each other in memory, leading to good cache locality. After deduplicating a String object its character array will be further away, which might introduce a slight performance overhead. Initial measurements have, however, shown that this does not seem to be a problem. In fact, after deduplication the total amount of accessed memory is reduced which tends to result in improved cache hit rates.
>
> So I believe it would not be a big problem in our case either.
The moral difference is that String deduplication is an opt-in feature,
so users can decide if the trade-offs inherent is right for them. Here
we are making that choice for users, albeit for a very small subset of
all possible Strings. This saves a lot of cycles and memory up front,
which I think weighs in favor here (many Strings are short-lived).
>
>> You could simplify as "return new String(s{2,1});".
>
> Thanks! Totally forgot about that String constructor.
>
> Peter,
>> Also, for cases where one argument evaluates to empty string and the
>> other is not a String, I think you could simply return the seconds
>> string (not creating another instance with reused value/coder)\
>
> I believe, this may still violate the spec. The resulting object must
> be a 'newly created' while the custom toString() may return cached or
> constant string. So we still need to copy it. This also applies to the
> Claes' unaryConcat:
>
> + @ForceInline
> + static String unaryConcat(Object first) {
> + if (first instanceof String) {
> + // Must produce a new String
> + return new String((String)first);
> + } else {
> + return stringOf(first);
> + }
> + }
>
> It should be just
> // Must produce a new String
> return new String(stringOf(first));
True.
>
> So should I file a separate JBS issue for my optimization or both
> cases (constant empty string and non-constant empty string) could be
> handled under a single issue?
Up to you. I can fold the patches into one or do them separately, with
my unaryConcat as a follow-up.
/Claes
>
> With best regards,
> Tagir Valeev.
>
> On Mon, Jun 15, 2020 at 4:04 AM Peter Levart <peter.levart at gmail.com> wrote:
>>
>> Hi Tagir,
>>
>>
>> I think that there might be cases where one of the arguments of
>> concatenation is constant empty String. I found myself sometimes doing
>> "" + someNonStringValue (instead of more readable
>> String.valueOf(someNonStringValue)) simply because of laziness. So does
>> this optimization help in that case as much as with non-constant "" +
>> "longlonglongline" ?
>>
>>
>> Also, for cases where one argument evaluates to empty string and the
>> other is not a String, I think you could simply return the seconds
>> string (not creating another instance with reused value/coder):
>>
>>
>> if (s1.isEmpty()) {
>>
>> return s2 == second ? new String(s2.value(), s2.coder()) : s2;
>>
>> }
>>
>>
>> Regards, Peter
>>
>>
>> On 6/13/20 7:08 AM, Tagir Valeev wrote:
>>> Hello!
>>>
>>> It's quite possible that when we concatenate two strings, one of them
>>> appears to be empty. We cannot simply return another string in this
>>> case, as JLS 15.18.1 explicitly says (for unknown to me reason) about
>>> the result of the string concatenation expression that 'The String
>>> object is newly created'. However, it's still possible to reuse the
>>> internal array of another string to reduce allocations:
>>>
>>> --- src/java.base/share/classes/java/lang/StringConcatHelper.java
>>> (revision 58998:04e3d254c76be87788a40cbd66d013140ea951d8)
>>> +++ src/java.base/share/classes/java/lang/StringConcatHelper.java
>>> (revision 58998+:04e3d254c76b+)
>>> @@ -420,6 +420,12 @@
>>> static String simpleConcat(Object first, Object second) {
>>> String s1 = stringOf(first);
>>> String s2 = stringOf(second);
>>> + if (s1.isEmpty()) {
>>> + return new String(s2.value(), s2.coder());
>>> + }
>>> + if (s2.isEmpty()) {
>>> + return new String(s1.value(), s1.coder());
>>> + }
>>> // start "mixing" in length and coder or arguments, order is not
>>> // important
>>> long indexCoder = mix(initialCoder(), s2);
>>>
>>> Very simple benchmark like this validates that the concatenation
>>> became faster if one of the strings is empty:
>>>
>>> @Param({"", "longlonglongline"})
>>> String data;
>>>
>>> @Param({"", "longlonglongline"})
>>> String data2;
>>>
>>> @Benchmark
>>> public String plus() {
>>> return data + data2;
>>> }
>>>
>>> Without patch I observe on VM 15-ea+20-899:
>>>
>>> Benchmark (data) (data2) Mode Cnt Score Error Units
>>> plus avgt 30 15,335 ± 0,186 ns/op
>>> plus longlonglongline avgt 30 19,867 ± 0,109 ns/op
>>> plus longlonglongline avgt 30 20,283 ± 0,230 ns/op
>>> plus longlonglongline longlonglongline avgt 30 26,047 ± 0,230 ns/op
>>>
>>> With patch:
>>> Benchmark (data) (data2) Mode Cnt Score Error Units
>>> plus avgt 30 6,668 ± 0,055 ns/op
>>> plus longlonglongline avgt 30 6,708 ± 0,114 ns/op
>>> plus longlonglongline avgt 30 7,003 ± 0,064 ns/op
>>> plus longlonglongline longlonglongline avgt 30 25,126 ± 0,392 ns/op
>>>
>>> There could be an added cost of up to two branches for the normal case
>>> (I believe, if one of the strings is constant, then decent JIT can
>>> eliminate one of the branches). However, I believe, the benefit could
>>> outweigh it, as empty strings are not that uncommon and in this case,
>>> we reduce O(N) time and memory consumption to O(1).
>>>
>>> What do you think? Is this a reasonable thing to do? I can file an
>>> issue and submit a proper webrev if it looks like a useful patch.
>>>
>>> With best regards,
>>> Tagir Valeev.
More information about the core-libs-dev
mailing list