String.indexOf optimization

Fri Apr 4 15:18:13 UTC 2014

On 04/04/2014 15:49, Zoltan Sziladi wrote:
> Hi,
>
> I am new to this mailing list so please forgive me if this has been
> discussed before.
>
> I was looking at the implementation of String.indexOf and I see that
> it uses the O(n^2) naive implementation. I have been trying to find
> out why it does not use some kind of a modern, sophisticated O(n)
> algorithm but I have no clear answer as of now.
>
> My guess is that the average case should be quite good for this
> algorithm because in practice the partial matches are actually quite
> rare, so it should work well... usually. Also, I saw that this code
> was last touched about 6 years ago, so maybe it was just left like
> this?
> My concern is actually the worst case scenario. If we compared two
> long strings with lots of partial matches, then this would perform
> quite poorly. Wouldn't it be worth having an O(n) implementation here
> then? Modern O(n) pattern matching algorithms don't use much extra
> space either.
> The Collections.sort method also uses an algorithm that prepares for
> worst case. Maybe a highly optimized quicksort could outperform the
> current mergesort implementation but I suppose one of the design
> principles behind that was also to prepare for the worst case. (Even
> though a nicely implemented quicksort has an expected average case
> runtime of O(nlogn) regardless of the input).
>
> If anyone knows why it is implemented this way or if it were possible
> to change the implementation, I'd be happy to hear your opinion.
> Thanks!
>
Here's a previous thread where someone else asked about String.indexOf

http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-June/018062.html

If you are interested in this topic and can do better then go ahead, I'm 
sure there would be a lot of people here would be interested in space 
and time numbers.

-Alan.