String.contains(CharSequence) calls toString on argument

Ulf Zibis Ulf.Zibis at CoSoCo.de
Thu May 28 10:35:41 UTC 2015


Hi,

I more would like:

2199         return (s instanceof String)
2200                 ? indexOf((String) s) >= 0;
2201                 : indexOf(value, 0, value.length, s, 0, s.length(), 0) >= 0;

or:

2199         return (s instanceof String
2200                 ? indexOf((String) s)
2201                 : indexOf(value, 0, value.length, s, 0, s.length(), 0))
2202                 >= 0;

or:

2199         int index = (s instanceof String)
2200                 ? indexOf((String) s)
2201                 : indexOf(value, 0, value.length, s, 0, s.length(), 0);
2202return  index  >= 0;
  

-Ulf


Am 28.05.2015 um 12:06 schrieb Tomasz Kowalczewski:
> Hello all,
>
> encouraged by your responses I did a simple implementation and performed
> some JMH experiments. Details below.
>
> Webrev [1] contains my changes. I deliberately wanted to do minimal code
> changes. The gist of it is implementing indexOf that works on CharSequence
> instead of String's internal char array. Both versions are almost identical
> (I did not change existing implementation in place to not impact all other
> String-only paths that use it).
>
> In my first attempt I just inlined (and simplified) indexOf implementation
> into String::contains, but in the end decided to create general purpose
> (package private) version. This might prove useful for others[2]
>
> JMH test project is here [3]. All benchmarks were run using "java -jar
> benchmarks.jar -wi 10 -i 10 -f -prof gc" on Amazon EC2 instance (c4.xlarge,
> 4 virtual cores). I have run it against three java builds:
>
> a. stock java 8u40
> b. my own build from clean jdk9 sources
> c. my own build from modified jdk9 sources
>
> Results with gc profiler enabled are included in benchmark repo. This gist
> contains summary results [4].
>
> I know that this test covers only very narrow space of possible performance
> regressions (ideas form more comprehensive tests are welcome). The results
> show that String only path is not impacted by my changes. When using actual
> CharSequence (StringBuilder in this case) we are mostly winning, exception
> is case when CharSequence is of medium size and has many partial matches in
> searched string.
>
> I hope this exercise will be useful to someone and that this change might
> be considered for inclusion in OpenJDK. If not then maybe at least tests
> for String::indexOf?
>
> [1] http://openjdk.s3-website-us-east-1.amazonaws.com/
> [2] Changeset from
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-May/033678.html uses
> indexOf that could accept CharSequence instead of String.
> [3] https://github.com/tkowalcz/openjdk-jmh
> [4] https://gist.github.com/tkowalcz/d8f5b9435e184f65fd2a
>
> Regards,
> Tomasz Kowalczewski
>
> On Sat, Mar 21, 2015 at 9:21 AM, Tomasz Kowalczewski <
> tomasz.kowalczewski at gmail.com> wrote:
>
>> There are other places which accept CharSequence and iterate over it
>> not caring about possible concurrent modification (the only sane way
>> IMHO). Examples are String.contentEquals and String.join that uses
>> StringJoiner which iterates its argument char by char.
>>
>> It looks like contains is the exception in this regard.
>>
>> On 20 mar 2015, at 17:05, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>>
>>>> If CharSequence is mutable and thread-safe, I would expect toString()
>>>> implementation to provide the atomic snapshot (e.g. do the
>>>> synchronization). Therefore, I find Sherman's argument interesting.
>>>
>>> That's my point about "it ought to be up to the caller" -- they're in a
>>> position to make this call, but String.contains() is playing defensive
>>> because it has no clue of the context.
>>>
>>> On Fri, Mar 20, 2015 at 11:55 AM, Aleksey Shipilev <
>>> aleksey.shipilev at oracle.com> wrote:
>>>
>>>> If CharSequence is mutable and thread-safe, I would expect toString()
>>>> implementation to provide the atomic snapshot (e.g. do the
>>>> synchronization). Therefore, I find Sherman's argument interesting.
>>>>
>>>> On the other hand, we don't seem to be having any details/requirements
>>>> for contains(CharSequence) w.r.t. it's own argument. What if
>>>> String.contains pulls the values one charAt at a time? The concurrency
>>>> requirements are not enforceable in CharSequence alone then, unless you
>>>> call the supposed-to-be-atomic toString() first.
>>>>
>>>> -Aleksey.
>>>>
>>>>> On 03/20/2015 06:48 PM, Vitaly Davidovich wrote:
>>>>> Yes, but that ought to be for the caller to decide.  Also, although the
>>>>> resulting String is immutable, toString() itself may observe mutation.
>>>>>
>>>>> On Fri, Mar 20, 2015 at 11:40 AM, Xueming Shen <
>> xueming.shen at oracle.com>
>>>>> wrote:
>>>>>
>>>>>>> On 03/20/2015 02:34 AM, Tomasz Kowalczewski wrote:
>>>>>>>
>>>>>>> Hello!
>>>>>>>
>>>>>>> Current implementation of String.contains that accepts CharSequence
>>>> calls
>>>>>>> toString on it and passes resulting string to indexOf(String). This
>> IMO
>>>>>>> defeats the purpose of using CharSequences (that is to have a mutable
>>>>>>> character buffer and not allocate unnecessary objects).
>>>>>> It is arguable that cs.toString() may serve the purpose of taking a
>>>>>> snapshot of an otherwise
>>>>>> "mutable" character buffer?
>>>>>>
>>>>>> -Sherman
>>>>>>
>>>>>>
>>>>>> Is changing this a desirable development? It seems pretty
>>>> straightforward
>>>>>>> to port indexOf(String) to use CharSequence.
>>>>>>>
>>>>>>> If all you need is patch then I can work on it (I have signed OCA)
>> just
>>>>>>> wanted to make sure it is not a futile work.
>>>>
>>>>
>
>




More information about the core-libs-dev mailing list