String.contains(CharSequence) calls toString on argument

Tomasz Kowalczewski tomasz.kowalczewski at gmail.com
Thu May 28 10:57:26 UTC 2015


Hi,

thank you for taking time to read my proposal. Is this a matter of some
OpenJDK coding conventions? I myself never use ternary operator and always
use if with braces. My experience is that it improves code readability. If
I have to apply one of proposed changes I would choose first or third.

Regards,
Tomasz

On Thu, May 28, 2015 at 12:35 PM, Ulf Zibis <Ulf.Zibis at cosoco.de> wrote:

> Hi,
>
> I more would like:
>
> 2199         return (s instanceof String)
> 2200                 ? indexOf((String) s) >= 0;
> 2201                 : indexOf(value, 0, value.length, s, 0, s.length(),
> 0) >= 0;
>
> or:
>
> 2199         return (s instanceof String
> 2200                 ? indexOf((String) s)
> 2201                 : indexOf(value, 0, value.length, s, 0, s.length(),
> 0))
> 2202                 >= 0;
>
> or:
>
> 2199         int index = (s instanceof String)
> 2200                 ? indexOf((String) s)
> 2201                 : indexOf(value, 0, value.length, s, 0, s.length(),
> 0);
> 2202return  index  >= 0;
>
> -Ulf
>
>
>
> Am 28.05.2015 um 12:06 schrieb Tomasz Kowalczewski:
>
>> Hello all,
>>
>> encouraged by your responses I did a simple implementation and performed
>> some JMH experiments. Details below.
>>
>> Webrev [1] contains my changes. I deliberately wanted to do minimal code
>> changes. The gist of it is implementing indexOf that works on CharSequence
>> instead of String's internal char array. Both versions are almost
>> identical
>> (I did not change existing implementation in place to not impact all other
>> String-only paths that use it).
>>
>> In my first attempt I just inlined (and simplified) indexOf implementation
>> into String::contains, but in the end decided to create general purpose
>> (package private) version. This might prove useful for others[2]
>>
>> JMH test project is here [3]. All benchmarks were run using "java -jar
>> benchmarks.jar -wi 10 -i 10 -f -prof gc" on Amazon EC2 instance
>> (c4.xlarge,
>> 4 virtual cores). I have run it against three java builds:
>>
>> a. stock java 8u40
>> b. my own build from clean jdk9 sources
>> c. my own build from modified jdk9 sources
>>
>> Results with gc profiler enabled are included in benchmark repo. This gist
>> contains summary results [4].
>>
>> I know that this test covers only very narrow space of possible
>> performance
>> regressions (ideas form more comprehensive tests are welcome). The results
>> show that String only path is not impacted by my changes. When using
>> actual
>> CharSequence (StringBuilder in this case) we are mostly winning, exception
>> is case when CharSequence is of medium size and has many partial matches
>> in
>> searched string.
>>
>> I hope this exercise will be useful to someone and that this change might
>> be considered for inclusion in OpenJDK. If not then maybe at least tests
>> for String::indexOf?
>>
>> [1] http://openjdk.s3-website-us-east-1.amazonaws.com/
>> [2] Changeset from
>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-May/033678.html
>> uses
>> indexOf that could accept CharSequence instead of String.
>> [3] https://github.com/tkowalcz/openjdk-jmh
>> [4] https://gist.github.com/tkowalcz/d8f5b9435e184f65fd2a
>>
>> Regards,
>> Tomasz Kowalczewski
>>
>> On Sat, Mar 21, 2015 at 9:21 AM, Tomasz Kowalczewski <
>> tomasz.kowalczewski at gmail.com> wrote:
>>
>>  There are other places which accept CharSequence and iterate over it
>>> not caring about possible concurrent modification (the only sane way
>>> IMHO). Examples are String.contentEquals and String.join that uses
>>> StringJoiner which iterates its argument char by char.
>>>
>>> It looks like contains is the exception in this regard.
>>>
>>> On 20 mar 2015, at 17:05, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>>>
>>>  If CharSequence is mutable and thread-safe, I would expect toString()
>>>>> implementation to provide the atomic snapshot (e.g. do the
>>>>> synchronization). Therefore, I find Sherman's argument interesting.
>>>>>
>>>>
>>>> That's my point about "it ought to be up to the caller" -- they're in a
>>>> position to make this call, but String.contains() is playing defensive
>>>> because it has no clue of the context.
>>>>
>>>> On Fri, Mar 20, 2015 at 11:55 AM, Aleksey Shipilev <
>>>> aleksey.shipilev at oracle.com> wrote:
>>>>
>>>>  If CharSequence is mutable and thread-safe, I would expect toString()
>>>>> implementation to provide the atomic snapshot (e.g. do the
>>>>> synchronization). Therefore, I find Sherman's argument interesting.
>>>>>
>>>>> On the other hand, we don't seem to be having any details/requirements
>>>>> for contains(CharSequence) w.r.t. it's own argument. What if
>>>>> String.contains pulls the values one charAt at a time? The concurrency
>>>>> requirements are not enforceable in CharSequence alone then, unless you
>>>>> call the supposed-to-be-atomic toString() first.
>>>>>
>>>>> -Aleksey.
>>>>>
>>>>>  On 03/20/2015 06:48 PM, Vitaly Davidovich wrote:
>>>>>> Yes, but that ought to be for the caller to decide.  Also, although
>>>>>> the
>>>>>> resulting String is immutable, toString() itself may observe mutation.
>>>>>>
>>>>>> On Fri, Mar 20, 2015 at 11:40 AM, Xueming Shen <
>>>>>>
>>>>> xueming.shen at oracle.com>
>>>
>>>> wrote:
>>>>>>
>>>>>>  On 03/20/2015 02:34 AM, Tomasz Kowalczewski wrote:
>>>>>>>>
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> Current implementation of String.contains that accepts CharSequence
>>>>>>>>
>>>>>>> calls
>>>>>
>>>>>> toString on it and passes resulting string to indexOf(String). This
>>>>>>>>
>>>>>>> IMO
>>>
>>>> defeats the purpose of using CharSequences (that is to have a mutable
>>>>>>>> character buffer and not allocate unnecessary objects).
>>>>>>>>
>>>>>>> It is arguable that cs.toString() may serve the purpose of taking a
>>>>>>> snapshot of an otherwise
>>>>>>> "mutable" character buffer?
>>>>>>>
>>>>>>> -Sherman
>>>>>>>
>>>>>>>
>>>>>>> Is changing this a desirable development? It seems pretty
>>>>>>>
>>>>>> straightforward
>>>>>
>>>>>> to port indexOf(String) to use CharSequence.
>>>>>>>>
>>>>>>>> If all you need is patch then I can work on it (I have signed OCA)
>>>>>>>>
>>>>>>> just
>>>
>>>> wanted to make sure it is not a futile work.
>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>
>>
>


-- 
Tomasz Kowalczewski



More information about the core-libs-dev mailing list