Question on String#indexOf(String)

Tue Apr 28 13:09:12 UTC 2009

2009/4/28 Rémi Forax <forax at univ-mlv.fr>

> Jim Andreou a écrit :
>
>> Answering my own question, probably most (all?) faster algorithms seem to
>> need memory proportional to the size of the alphabet, which is kind of huge
>> for Unicode, so that could be the reason.
>>
> No see:
> http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm
>
> And this algorithmm is currently implemented by the regex package
> java.util.regex.
>
> Rémi

Thanks. I have been trying to verify whether repeated pattern searching
would be more performant with regex, but I only manage regex to outperform
indexOf in very pathological cases (where the pattern contains very many
prefixes of it). I'm not sure whether this is due to overhead imposed by
java.util.regex or its really a characteristic of that algorithm; if it is,
then having it as a general indexOf implementation would more likely cause a
slowdown than speed-up.

Regards,
Dimitris

>
>
>> 2009/4/28 Jim Andreou <jim.andreou at gmail.com <mailto:
>> jim.andreou at gmail.com>>
>>
>>    Hi,
>>
>>    I wonder why String#indexOf(String) is implemented as it is.
>>    Apparently, when a character mismatch with the searched pattern is
>>    found, the pattern is only shifted by one character, but there are
>>    faster algorithms, for example
>>    see
>> http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/index.html
>> .
>>    Was anything smarter tried out but had significant disadvantages
>>    for general use? What advantages does the current implementation
>>    have? It looks very pessimistic.
>>
>>    Regards,
>>    Dimitris Andreou
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20090428/043d5642/attachment.html>