String.indexOf optimization

Vitaly Davidovich vitalyd at gmail.com
Sat Jan 24 21:19:08 UTC 2015


Try -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic:_indexOf

sent from my phone
On Jan 24, 2015 4:13 PM, "Martin Buchholz" <martinrb at google.com> wrote:

> I took another look.  It seems to me that the java code implementing
> String.indexOf still assumes the old String layout and so is full of dead
> code, which should be excised just to be clean.  (I continue to think the
> change to String layout came too late in the life of Java and is incomplete
> in ways such as this one).  Is there an easy way to disable a hotspot
> intrinsic for testing?
>
> Here's a start at such a change:
>
> diff --git
> a/src/java.base/share/classes/java/lang/AbstractStringBuilder.java
> b/src/java.base/share/classes/java/lang/AbstractStringBuilder.java
> --- a/src/java.base/share/classes/java/lang/AbstractStringBuilder.java
> +++ b/src/java.base/share/classes/java/lang/AbstractStringBuilder.java
> @@ -1319,7 +1319,7 @@
>       *          or {@code -1} if there is no such occurrence.
>       */
>      public int indexOf(String str, int fromIndex) {
> -        return String.indexOf(value, 0, count, str, fromIndex);
> +        return String.indexOf(value, str, fromIndex);
>      }
>
>      /**
> diff --git a/src/java.base/share/classes/java/lang/String.java
> b/src/java.base/share/classes/java/lang/String.java
> --- a/src/java.base/share/classes/java/lang/String.java
> +++ b/src/java.base/share/classes/java/lang/String.java
> @@ -1703,8 +1703,7 @@
>       *          or {@code -1} if there is no such occurrence.
>       */
>      public int indexOf(String str, int fromIndex) {
> -        return indexOf(value, 0, value.length,
> -                str.value, 0, str.value.length, fromIndex);
> +        return indexOf(value, str.value, fromIndex);
>      }
>
>      /**
> @@ -1713,16 +1712,11 @@
>       * is the string being searched for.
>       *
>       * @param   source       the characters being searched.
> -     * @param   sourceOffset offset of the source string.
> -     * @param   sourceCount  count of the source string.
>       * @param   target       the characters being searched for.
>       * @param   fromIndex    the index to begin searching from.
>       */
> -    static int indexOf(char[] source, int sourceOffset, int sourceCount,
> -            String target, int fromIndex) {
> -        return indexOf(source, sourceOffset, sourceCount,
> -                       target.value, 0, target.value.length,
> -                       fromIndex);
> +    static int indexOf(char[] source, String target, int fromIndex) {
> +        return indexOf(source, target.value, fromIndex);
>      }
>
>      /**
> @@ -1731,47 +1725,38 @@
>       * is the string being searched for.
>       *
>       * @param   source       the characters being searched.
> -     * @param   sourceOffset offset of the source string.
> -     * @param   sourceCount  count of the source string.
>       * @param   target       the characters being searched for.
> -     * @param   targetOffset offset of the target string.
> -     * @param   targetCount  count of the target string.
>       * @param   fromIndex    the index to begin searching from.
>       */
> -    static int indexOf(char[] source, int sourceOffset, int sourceCount,
> -            char[] target, int targetOffset, int targetCount,
> -            int fromIndex) {
> -        if (fromIndex >= sourceCount) {
> -            return (targetCount == 0 ? sourceCount : -1);
> +    static int indexOf(char[] source, char[] target, int fromIndex) {
> +        final int slen = source.length;
> +        final int tlen = target.length;
> +        if (fromIndex >= slen) {
> +            return (tlen == 0 ? slen : -1);
>          }
>          if (fromIndex < 0) {
>              fromIndex = 0;
>          }
> -        if (targetCount == 0) {
> +        if (tlen == 0) {
>              return fromIndex;
>          }
>
> -        char first = target[targetOffset];
> -        int max = sourceOffset + (sourceCount - targetCount);
> +        char first = target[0];
>
> -        for (int i = sourceOffset + fromIndex; i <= max; i++) {
> +        outer: for (int i = fromIndex, max = slen - tlen;; i++) {
>              /* Look for first character. */
> -            if (source[i] != first) {
> -                while (++i <= max && source[i] != first);
> +            inner: for (;; i++) {
> +                if (i >= max)
> +                    break outer;
> +                if (source[i] == first)
> +                    break inner;
>              }
>
> -            /* Found first character, now look at the rest of v2 */
> -            if (i <= max) {
> -                int j = i + 1;
> -                int end = j + targetCount - 1;
> -                for (int k = targetOffset + 1; j < end && source[j]
> -                        == target[k]; j++, k++);
> -
> -                if (j == end) {
> -                    /* Found whole string. */
> -                    return i - sourceOffset;
> -                }
> +            for (int k = 1, j = i + 1, end = i + tlen; j < end; j++, k++)
> {
> +                if (source[j] != target[k])
> +                    continue outer;
>              }
> +            return i;
>          }
>          return -1;
>      }
>
>
> On Mon, Jan 19, 2015 at 11:13 AM, Zoltan Sziladi <kissziszi at gmail.com>
> wrote:
>
>> Martin, yes, we're talking about that method.
>>
>> Other than tightening that code some... if we find some algorithm that
>> outperforms the naive implementation under certain conditions (let's say
>> when the searched pattern is longer than 10 characters), would it be worth
>> including it as a special case? (For example naive would run normally but
>> if pattern is longer than 10 characters then the other algorithm would
>> run). Or do you think that would just make the code too complicated without
>> enough benefits?
>>
>> Regards,
>> Zoltan
>>
>> On Mon, Jan 12, 2015 at 12:11 PM, Martin Buchholz <martinrb at google.com>
>> wrote:
>>
>>> If there's a clean improvement in the java code, it's worth putting in.
>>> You can try benchmarking with -Xint.
>>> Are we talking about this method?
>>>
>>>     static int indexOf(char[] source, int sourceOffset, int sourceCount,
>>>             char[] target, int targetOffset, int targetCount,
>>>             int fromIndex) {
>>>
>>> It does look like we can tighten the code up a little...
>>>
>>>
>>> On Thu, Jan 8, 2015 at 3:05 PM, Zoltan Sziladi <kissziszi at gmail.com>
>>> wrote:
>>>
>>>> Thanks for the info.
>>>> So that basically means we have 2 implementations of indexOf currently,
>>>> one
>>>> is in HotSpot, the other is in the JDK itself, which rarely gets
>>>> executed.
>>>> Aside from this later fact, isn't it still worth improving the JDK
>>>> implementation if it is possible? I know that the intrinsified method
>>>> gets
>>>> executed most of the time, but still, if we can improve the JDK
>>>> implementation also, why not? I don't know much about other JVMs but
>>>> maybe
>>>> a few don't intrinsify it?
>>>>
>>>> Is there any existing test suite which is considered conclusive enough
>>>> that
>>>> if an implementation beats the naive algorithm in those testcases then
>>>> it
>>>> could be considered as a replacement in the JDK?
>>>>
>>>> Regards,
>>>> Zoltan
>>>>
>>>>
>>>> On Thu, Jan 8, 2015 at 12:42 PM, Vitaly Davidovich <vitalyd at gmail.com>
>>>> wrote:
>>>>
>>>> > The java impl you saw would be called by (a) interpreter, (b) if you
>>>> > explicitly disable intrinsification of this function, or (c) some
>>>> other JVM
>>>> > that doesn't intrinsify this method (or any method).
>>>> >
>>>> > People don't usually disable intrinsics; if they do, it's because
>>>> they hit
>>>> > some JIT bug and may disable it.
>>>> >
>>>> > On Thu, Jan 8, 2015 at 3:34 PM, Zoltan Sziladi <kissziszi at gmail.com>
>>>> > wrote:
>>>> >
>>>> >> Hi,
>>>> >>
>>>> >> Thanks everyone for all the info.
>>>> >> So, just to go step by step in understanding this.
>>>> >> Andrew said HotSpot would ignore my implementation. So why is there
>>>> an
>>>> >> implementation of indexOf at all in the JDK, if that's not the code
>>>> that's
>>>> >> executed? Is it just a default fallback? When is the indexOf
>>>> function not
>>>> >> intrinsified? When do people usually disable intrinsification?
>>>> >> Sorry if these are newbie questions, I'm new to this part of Java.
>>>> >>
>>>> >> Regards,
>>>> >> Zoltan
>>>> >>
>>>> >> On Tue, Jan 6, 2015 at 1:28 AM, Andrew Haley <aph at redhat.com> wrote:
>>>> >>
>>>> >> > Hi,
>>>> >> >
>>>> >> > On 05/01/15 18:59, Zoltan Sziladi wrote:
>>>> >> >
>>>> >> > > This discussion was a long time ago, I was just reading through
>>>> it to
>>>> >> > check
>>>> >> > > again what was the last state of the discussion about the
>>>> >> String.indexOf.
>>>> >> > > There is one part which I still do not understand, hopefully
>>>> someone
>>>> >> > could
>>>> >> > > shed some light on it. A few emails ago Martin mentioned
>>>> >> > >
>>>> >> > > "Hotspot seems to have some intrinsification of String.indexOf,
>>>> which
>>>> >> > > confuses me.
>>>> >> > > Hotspot seems the right place to provide more optimizations for
>>>> this,
>>>> >> > since
>>>> >> > > there has been a fair amount of work creating high-performance
>>>> >> low-level
>>>> >> > > implementations of this idea in C."
>>>> >> > >
>>>> >> > > Then Ivan asked what that actually meant, whether hotspot
>>>> actually
>>>> >> > replaced
>>>> >> > > the jdk implementation with a low level optimized C
>>>> implementation,
>>>> >> but I
>>>> >> > > never saw an answer to that.
>>>> >> >
>>>> >> > You can have a look at an implementation of
>>>> >> MacroAssembler::string_indexof
>>>> >> > in
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >>
>>>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/b6b89b8f8531/src/cpu/x86/vm/macroAssembler_x86.cpp
>>>> >> >
>>>> >> > > Can someone please explain this? If we somehow found an
>>>> algorithm that
>>>> >> > beat
>>>> >> > > the naive implementation in the average case, would it be
>>>> possible to
>>>> >> > just
>>>> >> > > implement it in the JDK?
>>>> >> >
>>>> >> > No, because HotSpot would ignore it.  Any speed improvements have
>>>> to be
>>>> >> > done in the architecture-dependent files.
>>>> >> >
>>>> >> > Andrew.
>>>> >> >
>>>> >> >
>>>> >>
>>>> >
>>>> >
>>>>
>>>
>>>
>>
>



More information about the core-libs-dev mailing list