String.indexOf optimization

Zoltan Sziladi kissziszi at gmail.com
Fri Apr 4 14:49:58 UTC 2014


Hi,

I am new to this mailing list so please forgive me if this has been
discussed before.

I was looking at the implementation of String.indexOf and I see that
it uses the O(n^2) naive implementation. I have been trying to find
out why it does not use some kind of a modern, sophisticated O(n)
algorithm but I have no clear answer as of now.

My guess is that the average case should be quite good for this
algorithm because in practice the partial matches are actually quite
rare, so it should work well... usually. Also, I saw that this code
was last touched about 6 years ago, so maybe it was just left like
this?
My concern is actually the worst case scenario. If we compared two
long strings with lots of partial matches, then this would perform
quite poorly. Wouldn't it be worth having an O(n) implementation here
then? Modern O(n) pattern matching algorithms don't use much extra
space either.
The Collections.sort method also uses an algorithm that prepares for
worst case. Maybe a highly optimized quicksort could outperform the
current mergesort implementation but I suppose one of the design
principles behind that was also to prepare for the worst case. (Even
though a nicely implemented quicksort has an expected average case
runtime of O(nlogn) regardless of the input).

If anyone knows why it is implemented this way or if it were possible
to change the implementation, I'd be happy to hear your opinion.
Thanks!

Regards,
Zoltan



More information about the core-libs-dev mailing list