String.indexOf optimization

Remi Forax forax at univ-mlv.fr
Fri Apr 4 15:47:38 UTC 2014


On 04/04/2014 05:18 PM, Alan Bateman wrote:
> On 04/04/2014 15:49, Zoltan Sziladi wrote:
>> Hi,
>>
>> I am new to this mailing list so please forgive me if this has been
>> discussed before.
>>
>> I was looking at the implementation of String.indexOf and I see that
>> it uses the O(n^2) naive implementation. I have been trying to find
>> out why it does not use some kind of a modern, sophisticated O(n)
>> algorithm but I have no clear answer as of now.
>>
>> My guess is that the average case should be quite good for this
>> algorithm because in practice the partial matches are actually quite
>> rare, so it should work well... usually. Also, I saw that this code
>> was last touched about 6 years ago, so maybe it was just left like
>> this?
>> My concern is actually the worst case scenario. If we compared two
>> long strings with lots of partial matches, then this would perform
>> quite poorly. Wouldn't it be worth having an O(n) implementation here
>> then? Modern O(n) pattern matching algorithms don't use much extra
>> space either.
>> The Collections.sort method also uses an algorithm that prepares for
>> worst case. Maybe a highly optimized quicksort could outperform the
>> current mergesort implementation but I suppose one of the design
>> principles behind that was also to prepare for the worst case. (Even
>> though a nicely implemented quicksort has an expected average case
>> runtime of O(nlogn) regardless of the input).
>>
>> If anyone knows why it is implemented this way or if it were possible
>> to change the implementation, I'd be happy to hear your opinion.
>> Thanks!
>>
> Here's a previous thread where someone else asked about String.indexOf
>
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-June/018062.html 
>
>
> If you are interested in this topic and can do better then go ahead, 
> I'm sure there would be a lot of people here would be interested in 
> space and time numbers.
>
> -Alan.

You may also want to test against the couple Pattern + Matcher,
which use BoyerMoore (at least the last time i have read the source)

Rémi







More information about the core-libs-dev mailing list