Feedback on the implementation of StringJoiner
Remi Forax
forax at univ-mlv.fr
Tue Dec 11 00:08:03 PST 2012
On 12/11/2012 03:16 AM, Henry Jen wrote:
> Hi Remi,
Hi Henry,
>
> First of all, thanks for review. Comments inline.
>
> On 12/10/2012 04:35 PM, Remi Forax wrote:
>> On 12/10/2012 05:26 PM, Henry Jen wrote:
>>> Hi Roel,
>>>
>>> http://cr.openjdk.java.net/~henryjen/lambda/StringJoiner.1/webrev/
>> A StringJoiner is something which is able to concatenate String but
>> wait, it's also a CharSequence too.
>> Why using inheritance over composition here, if you want a CharSequence,
>> why not having a method toCharSequence()
>> I don't see myself why StringJoiner need to return a CharSequence given
>> that String is a CharSequence.
>>
> I am not sure why it was implemented this way, I am assuming
> StringJoiner would like to implement CharSequence API without actually
> call toString() for reasons you mentioned regrading performance.
>
> I have no issue to remove CharSequence interface, also, I think there
> are some more methods can be removed.
>
>> CharSequence is interresting when you take it as parameter but when you
>> return it (or worst implement it),
>> what is the gain ?
>>
>> So the conception is awful, and the implementation is awful too.
>>
>> Let say I just want to add a String and call toString, add will put the
>> String into the StringBuilder, Ok and toString(), surprise, toString
>> execute value.toString() + suffix, + is implemented by creating a new
>> StringBuilder and adding the two strings, so the code take the
>> StringBuilder value, create a new String from it (the .toString()),
>> create a new StringBuilder (the +) append the freshly created String
>> into the StringBuilder, append the suffix into the StringBuilder and
>> recreate a new String. If you count, you have all the chars of the
>> StringBuilder that are copied 3 times.
>>
> So the complain is the extra instance of StringBuilder with
> concatenation of suffix involved in toString(). I doubt we can implement
> a better version just do allocation and copy, and I am not sure that
> worth the effort. I am assuming that will be optimized by VM to concat
> two immutable string.
toString() load the StringBuilder (value) from a field, the VM will be
able to do something only if it can prove that the StringJoiner doesn't
escape.
This mean that the VM is able to inline the whole stream pipeline. While
it's doable in the future, we are far from that point.
The optimization the VM does is to avoid to create the intermediary
StringBuilder but not the intermediary array of chars.
>
> Noted that, we will have to add suffix only when toString() is called,
> but keeps the state of StringJoiner for more item to be added. So
> toString() will make the concatenation of suffix.
yes, that the intended semantics.
>
> Now, for each add, we are using the same StringBuilder for appending, so
> I don't think there is much problems.
so you ask to copy the chars of the String in the StringBuilder, which
is not necessary see below.
>
>> The right way to implement it is to use an ArrayList (or a dedicated
>> list as the one recently puhed in the lambda repo) to store all the
>> CharSequence that are added, and to also maintain the total number of
>> chars, when toString is called, allocate one new char array with the
>> size (total number of chars + (list.size() - 1) * delimiterSize +
>> prefix and suffix size and then crawle the ArrayList and store each
>> CharSequence at the right place.
>>
> The add will make copy(optimized by StringBuilder.append) of the string,
> so using a dedicate list, to my understanding, is no better than
> StringBuilder.append.
>
> If we don't make a copy, the behavior become nondeterministic depending
> on what CharSequence is passed as argument, which is confusing and
> inconsistent, thus I don't think that was the goal.
yes, you have to call toString() before storing it in the ArrayList of
String but a stream of CharSequence has the very same issue so I don't
expect people will use a stream of CharSequence that are not String very
often so the copy is not needed in more than 80% of the use cases.
so to summarize, using an ArrayList is better than a StringBuilder
because most of the time we will join strings.
>
> Let me know ideas on improvement, I am all ears.
>
> Cheers,
> Henry
>
cheers,
Rémi
More information about the lambda-dev
mailing list