String.indexOf optimization

Zoltan Sziladi

4 Apr 2014 4 Apr '14

2:49 p.m.

Hi, I am new to this mailing list so please forgive me if this has been discussed before. I was looking at the implementation of String.indexOf and I see that it uses the O(n^2) naive implementation. I have been trying to find out why it does not use some kind of a modern, sophisticated O(n) algorithm but I have no clear answer as of now. My guess is that the average case should be quite good for this algorithm because in practice the partial matches are actually quite rare, so it should work well... usually. Also, I saw that this code was last touched about 6 years ago, so maybe it was just left like this? My concern is actually the worst case scenario. If we compared two long strings with lots of partial matches, then this would perform quite poorly. Wouldn't it be worth having an O(n) implementation here then? Modern O(n) pattern matching algorithms don't use much extra space either. The Collections.sort method also uses an algorithm that prepares for worst case. Maybe a highly optimized quicksort could outperform the current mergesort implementation but I suppose one of the design principles behind that was also to prepare for the worst case. (Even though a nicely implemented quicksort has an expected average case runtime of O(nlogn) regardless of the input). If anyone knows why it is implemented this way or if it were possible to change the implementation, I'd be happy to hear your opinion. Thanks! Regards, Zoltan

Show replies by date

Alan Bateman

4 Apr 4 Apr

3:18 p.m.

On 04/04/2014 15:49, Zoltan Sziladi wrote:

...

Hi,

I am new to this mailing list so please forgive me if this has been discussed before.

I was looking at the implementation of String.indexOf and I see that it uses the O(n^2) naive implementation. I have been trying to find out why it does not use some kind of a modern, sophisticated O(n) algorithm but I have no clear answer as of now.

My guess is that the average case should be quite good for this algorithm because in practice the partial matches are actually quite rare, so it should work well... usually. Also, I saw that this code was last touched about 6 years ago, so maybe it was just left like this? My concern is actually the worst case scenario. If we compared two long strings with lots of partial matches, then this would perform quite poorly. Wouldn't it be worth having an O(n) implementation here then? Modern O(n) pattern matching algorithms don't use much extra space either. The Collections.sort method also uses an algorithm that prepares for worst case. Maybe a highly optimized quicksort could outperform the current mergesort implementation but I suppose one of the design principles behind that was also to prepare for the worst case. (Even though a nicely implemented quicksort has an expected average case runtime of O(nlogn) regardless of the input).

If anyone knows why it is implemented this way or if it were possible to change the implementation, I'd be happy to hear your opinion. Thanks!

Here's a previous thread where someone else asked about String.indexOf http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-June/018062.html If you are interested in this topic and can do better then go ahead, I'm sure there would be a lot of people here would be interested in space and time numbers. -Alan.

Remi Forax

3:47 p.m.

On 04/04/2014 05:18 PM, Alan Bateman wrote:

...

On 04/04/2014 15:49, Zoltan Sziladi wrote:

...
Hi,

I am new to this mailing list so please forgive me if this has been discussed before.

I was looking at the implementation of String.indexOf and I see that it uses the O(n^2) naive implementation. I have been trying to find out why it does not use some kind of a modern, sophisticated O(n) algorithm but I have no clear answer as of now.

My guess is that the average case should be quite good for this algorithm because in practice the partial matches are actually quite rare, so it should work well... usually. Also, I saw that this code was last touched about 6 years ago, so maybe it was just left like this? My concern is actually the worst case scenario. If we compared two long strings with lots of partial matches, then this would perform quite poorly. Wouldn't it be worth having an O(n) implementation here then? Modern O(n) pattern matching algorithms don't use much extra space either. The Collections.sort method also uses an algorithm that prepares for worst case. Maybe a highly optimized quicksort could outperform the current mergesort implementation but I suppose one of the design principles behind that was also to prepare for the worst case. (Even though a nicely implemented quicksort has an expected average case runtime of O(nlogn) regardless of the input).

If anyone knows why it is implemented this way or if it were possible to change the implementation, I'd be happy to hear your opinion. Thanks!

Here's a previous thread where someone else asked about String.indexOf

http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-June/018062.html

If you are interested in this topic and can do better then go ahead, I'm sure there would be a lot of people here would be interested in space and time numbers.

-Alan.

You may also want to test against the couple Pattern + Matcher, which use BoyerMoore (at least the last time i have read the source) Rémi

Martin Buchholz

5:13 p.m.

Summary: Many people (myself included) have looked at this problem. It's unlikely that String.indexOf will change. It's hard to beat the naive implementation in the typical case. The 64k size of the character set will make Boyer-Moore harder to implement efficiently. On Fri, Apr 4, 2014 at 7:49 AM, Zoltan Sziladi <kissziszi@gmail.com> wrote:

...

Hi,

I am new to this mailing list so please forgive me if this has been discussed before.

I was looking at the implementation of String.indexOf and I see that it uses the O(n^2) naive implementation. I have been trying to find out why it does not use some kind of a modern, sophisticated O(n) algorithm but I have no clear answer as of now.

My guess is that the average case should be quite good for this algorithm because in practice the partial matches are actually quite rare, so it should work well... usually. Also, I saw that this code was last touched about 6 years ago, so maybe it was just left like this? My concern is actually the worst case scenario. If we compared two long strings with lots of partial matches, then this would perform quite poorly. Wouldn't it be worth having an O(n) implementation here then? Modern O(n) pattern matching algorithms don't use much extra space either. The Collections.sort method also uses an algorithm that prepares for worst case. Maybe a highly optimized quicksort could outperform the current mergesort implementation but I suppose one of the design principles behind that was also to prepare for the worst case. (Even though a nicely implemented quicksort has an expected average case runtime of O(nlogn) regardless of the input).

If anyone knows why it is implemented this way or if it were possible to change the implementation, I'd be happy to hear your opinion. Thanks!

Regards, Zoltan

Zoltan Sziladi

7 Apr 7 Apr

7:56 a.m.

Thanks everyone for the input. Even though many people who are way smarter than me already tried to beat the naive implementation, I'll try too, just for the fun of it. I'll post my updates here if I find something worthy of mentioning. Zoltan On Fri, Apr 4, 2014 at 7:13 PM, Martin Buchholz <martinrb@google.com> wrote:

...

Summary:

Many people (myself included) have looked at this problem. It's unlikely that String.indexOf will change. It's hard to beat the naive implementation in the typical case. The 64k size of the character set will make Boyer-Moore harder to implement efficiently.

On Fri, Apr 4, 2014 at 7:49 AM, Zoltan Sziladi <kissziszi@gmail.com> wrote:

...
Hi,

I am new to this mailing list so please forgive me if this has been discussed before.

I was looking at the implementation of String.indexOf and I see that it uses the O(n^2) naive implementation. I have been trying to find out why it does not use some kind of a modern, sophisticated O(n) algorithm but I have no clear answer as of now.

My guess is that the average case should be quite good for this algorithm because in practice the partial matches are actually quite rare, so it should work well... usually. Also, I saw that this code was last touched about 6 years ago, so maybe it was just left like this? My concern is actually the worst case scenario. If we compared two long strings with lots of partial matches, then this would perform quite poorly. Wouldn't it be worth having an O(n) implementation here then? Modern O(n) pattern matching algorithms don't use much extra space either. The Collections.sort method also uses an algorithm that prepares for worst case. Maybe a highly optimized quicksort could outperform the current mergesort implementation but I suppose one of the design principles behind that was also to prepare for the worst case. (Even though a nicely implemented quicksort has an expected average case runtime of O(nlogn) regardless of the input).

If anyone knows why it is implemented this way or if it were possible to change the implementation, I'd be happy to hear your opinion. Thanks!

Regards, Zoltan

Ivan Gerasimov

15 Apr 15 Apr

5:13 p.m.

Hi everyone! On 04.04.2014 21:13, Martin Buchholz wrote:

...

Summary:

Many people (myself included) have looked at this problem. It's unlikely that String.indexOf will change. It's hard to beat the naive implementation in the typical case. But we can try to speed up this naive implementation a little bit.

We can separate the special case: When the substring's length == 1. This special case can be done fast, and in the general case we can now assume substring's length is at least 2. Here's the webrev with the implementation of this idea: http://cr.openjdk.java.net/~igerasim/indexof/0/webrev/ I've done some benchmarking with JMH and found no performance degradation on my machine. Of course, more testcases should be created and they should be tried on different machines to be treated as reliable. Benchmark Mode Thr Cnt Sec Mean Mean error Units o.b.IndexOfBench.benchIndexOf_1_A avgt 1 20 5 704.739 9.104 nsec/op o.b.IndexOfBench.benchIndexOf_1_B avgt 1 20 5 *573.879* 9.820 nsec/op o.b.IndexOfBench.benchIndexOf_2_A avgt 1 20 5 668.455 9.882 nsec/op o.b.IndexOfBench.benchIndexOf_2_B avgt 1 20 5 *476.062* 6.063 nsec/op o.b.IndexOfBench.benchIndexOf_3_A avgt 1 20 5 155.227 1.796 nsec/op o.b.IndexOfBench.benchIndexOf_3_B avgt 1 20 5 *152.850 * 1.512 nsec/op o.b.IndexOfBench.benchIndexOf_4_A avgt 1 20 5 656.183 5.904 nsec/op o.b.IndexOfBench.benchIndexOf_4_B avgt 1 20 5 *515.178* 9.107 nsec/op o.b.IndexOfBench.benchIndexOf_5_A avgt 1 20 5 140.609 7.305 nsec/op o.b.IndexOfBench.benchIndexOf_5_B avgt 1 20 5 *129.603* 1.654 nsec/op o.b.IndexOfBench.benchIndexOf_6_A avgt 1 20 5 127.713 1.497 nsec/op o.b.IndexOfBench.benchIndexOf_6_B avgt 1 20 5 *122.177* 1.186 nsec/op o.b.IndexOfBench.benchIndexOf_7_A avgt 1 20 5 430.148 4.875 nsec/op o.b.IndexOfBench.benchIndexOf_7_B avgt 1 20 5 *387.338* 10.904 nsec/op o.b.IndexOfBench.benchIndexOf_8_A avgt 1 20 5 2064.563 28.885 nsec/op o.b.IndexOfBench.benchIndexOf_8_B avgt 1 20 5 *1858.669* 24.343 nsec/op Benchmarks ending with A use the current indexOf implementation, with B use the modified version. These numbers show from 1.4% up to 28% performance increase. The full listing is below. Suggestions about what else to test are welcome! Sincerely yours, Ivan --------------------- /** * Copyright (c) 2014, Oracle and/or its affiliates. All rights reserved. * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. * * This code is free software; you can redistribute it and/or modify it * under the terms of the GNU General Public License version 2 only, as * published by the Free Software Foundation. Oracle designates this * particular file as subject to the "Classpath" exception as provided * by Oracle in the LICENSE file that accompanied this code. * * This code is distributed in the hope that it will be useful, but WITHOUT * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License * version 2 for more details (a copy is included in the LICENSE file that * accompanied this code). * * You should have received a copy of the GNU General Public License version * 2 along with this work; if not, write to the Free Software Foundation, * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. * * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA * or visit www.oracle.com if you need additional information or have any * questions. */ package org.benches; import org.openjdk.jmh.annotations.*; import org.openjdk.jmh.logic.BlackHole; import java.util.concurrent.TimeUnit; @BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @State(Scope.Benchmark) public class IndexOfBench { // ||| final static char[] source1 = "abababcabcacabcabcabcabcaccbcabcacbcabcacbcbcabcbcbacbcabcbabacbcbacbcabcabcabcabcabcabcabcacbacbacbacabcabcacbacbcabcbcbcaabdacbacabcabacbacabca".toCharArray(); final static char[] source2 = "acfacafacfacfacfacafcafcacadcacdacaccacacdacacfcafcafcfacdacadcadcadcdacfacfacdacadcacdcfacfacdacdacdcfacdacdacdacgshgshasdabdahghjgwdshacavcavsc".toCharArray(); final static char[] source3 = "tyrtytfytytuytfytuytggfghgdytyftytfdytdshfgjhdfytsfuythgsfhgjhfghtuysdfthgfsdhgystfjhgsfguysthgfgjhgdfjhgsjdghfythgsdfjhggfabduikjhfjhkjhfkjhgkjh".toCharArray(); final static char[] target1 = "abd".toCharArray(); final static char[] source4 = "ahhhahahahahhahahahahhahahahhhahahahahahahahahahahhahahahahahahahahallallalalalalalkakakakakakakakakkakmamamamabammamamamamamaakaklalalaoalalalao".toCharArray(); final static char[] source5 = "hgjkhjhjkdghkjhdfkjhgkjhdkjdhgkjdfhgkjdhfgkjdfhgkjhdfkjghkdjghkdjfhgkjhkdjhgkjdfhjkghkdjfhgkjdfhgkjdfhgkjdfhkgabhfkjghdkfjhgkjdfhgkjdfhgkjdfhgkhh".toCharArray(); final static char[] target2 = "ab".toCharArray(); final static char[] source6 = "lskgjsklfjgskldfjgklsfjdlgkjsdflkgjskldfgjsdklfgjsl;kdfgjskldfjglksdfjglksfjglksdfjgklsfdjgslkdfjglksjdflkgsjfalksjdflkfgjsdklfjglskdfjglksdfjghh".toCharArray(); final static char[] target3 = "a".toCharArray(); final static char[] source7 = "lskgajabfagskldfjgklsabclgkjsdflkabsabcdgjsdklfabclbkdfgjskabfjglksdfjabcdfjglabcfjgklsfabgslkdfjglksjdabcdsjfabcdedflabcjsdklfjglskdfabcksdfjghh".toCharArray(); final static char[] target4 = "abcde".toCharArray(); final static char[] source8 = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa".toCharArray(); final static char[] target5 = "aaaab".toCharArray(); @GenerateMicroBenchmark public void benchIndexOf_1_A(BlackHole bh) { bh.consume(indexOfA(source1, 0, source1.length, target1, 0, target1.length, 0)); } @GenerateMicroBenchmark public void benchIndexOf_1_B(BlackHole bh) { bh.consume(indexOfB(source1, 0, source1.length, target1, 0, target1.length, 0)); } @GenerateMicroBenchmark public void benchIndexOf_2_A(BlackHole bh) { bh.consume(indexOfA(source2, 0, source2.length, target1, 0, target1.length, 0)); } @GenerateMicroBenchmark public void benchIndexOf_2_B(BlackHole bh) { bh.consume(indexOfB(source2, 0, source2.length, target1, 0, target1.length, 0)); } @GenerateMicroBenchmark public void benchIndexOf_3_A(BlackHole bh) { bh.consume(indexOfA(source3, 0, source3.length, target1, 0, target1.length, 0)); } @GenerateMicroBenchmark public void benchIndexOf_3_B(BlackHole bh) { bh.consume(indexOfB(source3, 0, source3.length, target1, 0, target1.length, 0)); } @GenerateMicroBenchmark public void benchIndexOf_4_A(BlackHole bh) { bh.consume(indexOfA(source4, 0, source4.length, target2, 0, target2.length, 0)); } @GenerateMicroBenchmark public void benchIndexOf_4_B(BlackHole bh) { bh.consume(indexOfB(source4, 0, source4.length, target2, 0, target2.length, 0)); } @GenerateMicroBenchmark public void benchIndexOf_5_A(BlackHole bh) { bh.consume(indexOfA(source5, 0, source5.length, target2, 0, target2.length, 0)); } @GenerateMicroBenchmark public void benchIndexOf_5_B(BlackHole bh) { bh.consume(indexOfB(source5, 0, source5.length, target2, 0, target2.length, 0)); } @GenerateMicroBenchmark public void benchIndexOf_6_A(BlackHole bh) { bh.consume(indexOfA(source6, 0, source6.length, target3, 0, target3.length, 0)); } @GenerateMicroBenchmark public void benchIndexOf_6_B(BlackHole bh) { bh.consume(indexOfB(source6, 0, source6.length, target3, 0, target3.length, 0)); } @GenerateMicroBenchmark public void benchIndexOf_7_A(BlackHole bh) { bh.consume(indexOfA(source7, 0, source7.length, target4, 0, target4.length, 0)); } @GenerateMicroBenchmark public void benchIndexOf_7_B(BlackHole bh) { bh.consume(indexOfB(source7, 0, source7.length, target4, 0, target4.length, 0)); } @GenerateMicroBenchmark public void benchIndexOf_8_A(BlackHole bh) { bh.consume(indexOfA(source8, 0, source8.length, target5, 0, target5.length, 0)); } @GenerateMicroBenchmark public void benchIndexOf_8_B(BlackHole bh) { bh.consume(indexOfB(source8, 0, source8.length, target5, 0, target5.length, 0)); } // implementations static int indexOfA(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex) { if (fromIndex >= sourceCount) { return (targetCount == 0 ? sourceCount : -1); } if (fromIndex < 0) { fromIndex = 0; } if (targetCount == 0) { return fromIndex; } char first = target[targetOffset]; int max = sourceOffset + (sourceCount - targetCount); for (int i = sourceOffset + fromIndex; i <= max; i++) { /* Look for first character. */ if (source[i] != first) { while (++i <= max && source[i] != first); } /* Found first character, now look at the rest of v2 */ if (i <= max) { int j = i + 1; int end = j + targetCount - 1; for (int k = targetOffset + 1; j < end && source[j] == target[k]; j++, k++); if (j == end) { /* Found whole string. */ return i - sourceOffset; } } } return -1; } static int indexOfB(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex) { if (fromIndex >= sourceCount) { return (targetCount == 0 ? sourceCount : -1); } if (fromIndex < 0) { fromIndex = 0; } if (targetCount == 0) { return fromIndex; } char first = target[targetOffset]; char second = 0; if (targetCount != 1) { second = target[targetOffset + 1]; } int max = sourceOffset + (sourceCount - targetCount); for (int i = sourceOffset + fromIndex; i <= max; i++) { /* Look for first character. */ if (source[i] != first) { while (++i <= max && source[i] != first); } if (targetCount == 1) { return (i <= max) ? i - sourceOffset : -1; } /* Found first character, now look at the rest of v2 */ if (i <= max && source[i + 1] == second) { int j = i + 2; int end = i + targetCount; for (int k = targetOffset + 2; j < end && source[j] == target[k]; j++, k++); if (j == end) { /* Found whole string. */ return i - sourceOffset; } } } return -1; } }

Martin Buchholz

10:53 p.m.

Hi Ivan, There's already an indexOf(int, int) that allows a user to explicitly search for a char (or character). Hotspot seems to have some intrinsification of String.indexOf, which confuses me. Hotspot seems the right place to provide more optimizations for this, since there has been a fair amount of work creating high-performance low-level implementations of this idea in C. On Tue, Apr 15, 2014 at 10:13 AM, Ivan Gerasimov <ivan.gerasimov@oracle.com>wrote:

...

Hi everyone!

On 04.04.2014 21:13, Martin Buchholz wrote:

Summary:

Many people (myself included) have looked at this problem. It's unlikely that String.indexOf will change. It's hard to beat the naive implementation in the typical case.

But we can try to speed up this naive implementation a little bit.

We can separate the special case: When the substring's length == 1. This special case can be done fast, and in the general case we can now assume substring's length is at least 2.

Here's the webrev with the implementation of this idea: http://cr.openjdk.java.net/~igerasim/indexof/0/webrev/

I've done some benchmarking with JMH and found no performance degradation on my machine. Of course, more testcases should be created and they should be tried on different machines to be treated as reliable.

Benchmark Mode Thr Cnt Sec Mean Mean error Units o.b.IndexOfBench.benchIndexOf_1_A avgt 1 20 5 704.739 9.104 nsec/op o.b.IndexOfBench.benchIndexOf_1_B avgt 1 20 5 * 573.879* 9.820 nsec/op o.b.IndexOfBench.benchIndexOf_2_A avgt 1 20 5 668.455 9.882 nsec/op o.b.IndexOfBench.benchIndexOf_2_B avgt 1 20 5 *476.062* 6.063 nsec/op o.b.IndexOfBench.benchIndexOf_3_A avgt 1 20 5 155.227 1.796 nsec/op o.b.IndexOfBench.benchIndexOf_3_B avgt 1 20 5 * 152.850 * 1.512 nsec/op o.b.IndexOfBench.benchIndexOf_4_A avgt 1 20 5 656.183 5.904 nsec/op o.b.IndexOfBench.benchIndexOf_4_B avgt 1 20 5 *515.178* 9.107 nsec/op o.b.IndexOfBench.benchIndexOf_5_A avgt 1 20 5 140.609 7.305 nsec/op o.b.IndexOfBench.benchIndexOf_5_B avgt 1 20 5 *129.603* 1.654 nsec/op o.b.IndexOfBench.benchIndexOf_6_A avgt 1 20 5 127.713 1.497 nsec/op o.b.IndexOfBench.benchIndexOf_6_B avgt 1 20 5 *122.177* 1.186 nsec/op o.b.IndexOfBench.benchIndexOf_7_A avgt 1 20 5 430.148 4.875 nsec/op o.b.IndexOfBench.benchIndexOf_7_B avgt 1 20 5 * 387.338* 10.904 nsec/op o.b.IndexOfBench.benchIndexOf_8_A avgt 1 20 5 2064.563 28.885 nsec/op o.b.IndexOfBench.benchIndexOf_8_B avgt 1 20 5 *1858.669* 24.343 nsec/op

Benchmarks ending with A use the current indexOf implementation, with B use the modified version. These numbers show from 1.4% up to 28% performance increase.

The full listing is below.

Suggestions about what else to test are welcome!

Sincerely yours, Ivan

---------------------

/** * Copyright (c) 2014, Oracle and/or its affiliates. All rights reserved. * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. * * This code is free software; you can redistribute it and/or modify it * under the terms of the GNU General Public License version 2 only, as * published by the Free Software Foundation. Oracle designates this * particular file as subject to the "Classpath" exception as provided * by Oracle in the LICENSE file that accompanied this code. * * This code is distributed in the hope that it will be useful, but WITHOUT * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License * version 2 for more details (a copy is included in the LICENSE file that * accompanied this code). * * You should have received a copy of the GNU General Public License version * 2 along with this work; if not, write to the Free Software Foundation, * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. * * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA * or visit www.oracle.com if you need additional information or have any * questions. */ package org.benches;

import org.openjdk.jmh.annotations.*; import org.openjdk.jmh.logic.BlackHole; import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @State(Scope.Benchmark) public class IndexOfBench {

// ||| final static char[] source1 = "abababcabcacabcabcabcabcaccbcabcacbcabcacbcbcabcbcbacbcabcbabacbcbacbcabcabcabcabcabcabcabcacbacbacbacabcabcacbacbcabcbcbcaabdacbacabcabacbacabca".toCharArray(); final static char[] source2 = "acfacafacfacfacfacafcafcacadcacdacaccacacdacacfcafcafcfacdacadcadcadcdacfacfacdacadcacdcfacfacdacdacdcfacdacdacdacgshgshasdabdahghjgwdshacavcavsc".toCharArray(); final static char[] source3 = "tyrtytfytytuytfytuytggfghgdytyftytfdytdshfgjhdfytsfuythgsfhgjhfghtuysdfthgfsdhgystfjhgsfguysthgfgjhgdfjhgsjdghfythgsdfjhggfabduikjhfjhkjhfkjhgkjh".toCharArray(); final static char[] target1 = "abd".toCharArray();

final static char[] source4 = "ahhhahahahahhahahahahhahahahhhahahahahahahahahahahhahahahahahahahahallallalalalalalkakakakakakakakakkakmamamamabammamamamamamaakaklalalaoalalalao".toCharArray(); final static char[] source5 = "hgjkhjhjkdghkjhdfkjhgkjhdkjdhgkjdfhgkjdhfgkjdfhgkjhdfkjghkdjghkdjfhgkjhkdjhgkjdfhjkghkdjfhgkjdfhgkjdfhgkjdfhkgabhfkjghdkfjhgkjdfhgkjdfhgkjdfhgkhh".toCharArray(); final static char[] target2 = "ab".toCharArray();

final static char[] source6 = "lskgjsklfjgskldfjgklsfjdlgkjsdflkgjskldfgjsdklfgjsl;kdfgjskldfjglksdfjglksfjglksdfjgklsfdjgslkdfjglksjdflkgsjfalksjdflkfgjsdklfjglskdfjglksdfjghh".toCharArray(); final static char[] target3 = "a".toCharArray();

final static char[] source7 = "lskgajabfagskldfjgklsabclgkjsdflkabsabcdgjsdklfabclbkdfgjskabfjglksdfjabcdfjglabcfjgklsfabgslkdfjglksjdabcdsjfabcdedflabcjsdklfjglskdfabcksdfjghh".toCharArray(); final static char[] target4 = "abcde".toCharArray();

final static char[] source8 = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa".toCharArray(); final static char[] target5 = "aaaab".toCharArray();

@GenerateMicroBenchmark public void benchIndexOf_1_A(BlackHole bh) { bh.consume(indexOfA(source1, 0, source1.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_1_B(BlackHole bh) { bh.consume(indexOfB(source1, 0, source1.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_2_A(BlackHole bh) { bh.consume(indexOfA(source2, 0, source2.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_2_B(BlackHole bh) { bh.consume(indexOfB(source2, 0, source2.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_3_A(BlackHole bh) { bh.consume(indexOfA(source3, 0, source3.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_3_B(BlackHole bh) { bh.consume(indexOfB(source3, 0, source3.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_4_A(BlackHole bh) { bh.consume(indexOfA(source4, 0, source4.length, target2, 0, target2.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_4_B(BlackHole bh) { bh.consume(indexOfB(source4, 0, source4.length, target2, 0, target2.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_5_A(BlackHole bh) { bh.consume(indexOfA(source5, 0, source5.length, target2, 0, target2.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_5_B(BlackHole bh) { bh.consume(indexOfB(source5, 0, source5.length, target2, 0, target2.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_6_A(BlackHole bh) { bh.consume(indexOfA(source6, 0, source6.length, target3, 0, target3.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_6_B(BlackHole bh) { bh.consume(indexOfB(source6, 0, source6.length, target3, 0, target3.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_7_A(BlackHole bh) { bh.consume(indexOfA(source7, 0, source7.length, target4, 0, target4.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_7_B(BlackHole bh) { bh.consume(indexOfB(source7, 0, source7.length, target4, 0, target4.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_8_A(BlackHole bh) { bh.consume(indexOfA(source8, 0, source8.length, target5, 0, target5.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_8_B(BlackHole bh) { bh.consume(indexOfB(source8, 0, source8.length, target5, 0, target5.length, 0)); }

// implementations

static int indexOfA(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex) { if (fromIndex >= sourceCount) { return (targetCount == 0 ? sourceCount : -1); } if (fromIndex < 0) { fromIndex = 0; } if (targetCount == 0) { return fromIndex; }

char first = target[targetOffset]; int max = sourceOffset + (sourceCount - targetCount);

for (int i = sourceOffset + fromIndex; i <= max; i++) { /* Look for first character. */ if (source[i] != first) { while (++i <= max && source[i] != first); }

/* Found first character, now look at the rest of v2 */ if (i <= max) { int j = i + 1; int end = j + targetCount - 1; for (int k = targetOffset + 1; j < end && source[j] == target[k]; j++, k++);

if (j == end) { /* Found whole string. */ return i - sourceOffset; } } } return -1; }

static int indexOfB(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex) { if (fromIndex >= sourceCount) { return (targetCount == 0 ? sourceCount : -1); } if (fromIndex < 0) { fromIndex = 0; } if (targetCount == 0) { return fromIndex; } char first = target[targetOffset]; char second = 0; if (targetCount != 1) { second = target[targetOffset + 1]; } int max = sourceOffset + (sourceCount - targetCount); for (int i = sourceOffset + fromIndex; i <= max; i++) { /* Look for first character. */ if (source[i] != first) { while (++i <= max && source[i] != first); } if (targetCount == 1) { return (i <= max) ? i - sourceOffset : -1; } /* Found first character, now look at the rest of v2 */ if (i <= max && source[i + 1] == second) { int j = i + 2; int end = i + targetCount; for (int k = targetOffset + 2; j < end && source[j] == target[k]; j++, k++);

if (j == end) { /* Found whole string. */ return i - sourceOffset; } } } return -1; } }

Ivan Gerasimov

18 Apr 18 Apr

8:32 a.m.

On 16.04.2014 2:53, Martin Buchholz wrote:

...

Hi Ivan,

There's already an indexOf(int, int) that allows a user to explicitly search for a char (or character).

Sure. What I meant was not to introduce a special function for searching a single character, but to restrict the general case loop to search for a substring of at least 2 characters. Having this condition hold, we can enter the loop only when two starting characters match, and this can save us a few nanoseconds in many cases.

...

Hotspot seems to have some intrinsification of String.indexOf, which confuses me.

Does it mean that that the java implementation of indexOf is never compiled? When hotspot replaces the jdk implementation with its own one? Is it ever worth to try to optimize the java implementation?

...

Hotspot seems the right place to provide more optimizations for this, since there has been a fair amount of work creating high-performance low-level implementations of this idea in C.

The hotspot's intrinsic is already optimized for searching substrings of different length. Sincerely yours, Ivan

...

On Tue, Apr 15, 2014 at 10:13 AM, Ivan Gerasimov <ivan.gerasimov@oracle.com <mailto:ivan.gerasimov@oracle.com>> wrote:

Hi everyone!

On 04.04.2014 21:13, Martin Buchholz wrote:

...
Summary:

Many people (myself included) have looked at this problem. It's unlikely that String.indexOf will change. It's hard to beat the naive implementation in the typical case.

But we can try to speed up this naive implementation a little bit.

We can separate the special case: When the substring's length == 1. This special case can be done fast, and in the general case we can now assume substring's length is at least 2.

Here's the webrev with the implementation of this idea: http://cr.openjdk.java.net/~igerasim/indexof/0/webrev/ <http://cr.openjdk.java.net/%7Eigerasim/indexof/0/webrev/>

I've done some benchmarking with JMH and found no performance degradation on my machine. Of course, more testcases should be created and they should be tried on different machines to be treated as reliable.

Benchmark Mode Thr Cnt Sec Mean Mean error Units o.b.IndexOfBench.benchIndexOf_1_A avgt 1 20 5 704.739 9.104 nsec/op o.b.IndexOfBench.benchIndexOf_1_B avgt 1 20 5 *573.879*9.820 nsec/op o.b.IndexOfBench.benchIndexOf_2_A avgt 1 20 5 668.455 9.882 nsec/op o.b.IndexOfBench.benchIndexOf_2_B avgt 1 20 5 *476.062*6.063 nsec/op o.b.IndexOfBench.benchIndexOf_3_A avgt 1 20 5 155.227 1.796 nsec/op o.b.IndexOfBench.benchIndexOf_3_B avgt 1 20 5 *152.850 *1.512 nsec/op o.b.IndexOfBench.benchIndexOf_4_A avgt 1 20 5 656.183 5.904 nsec/op o.b.IndexOfBench.benchIndexOf_4_B avgt 1 20 5 *515.178*9.107 nsec/op o.b.IndexOfBench.benchIndexOf_5_A avgt 1 20 5 140.609 7.305 nsec/op o.b.IndexOfBench.benchIndexOf_5_B avgt 1 20 5 *129.603*1.654 nsec/op o.b.IndexOfBench.benchIndexOf_6_A avgt 1 20 5 127.713 1.497 nsec/op o.b.IndexOfBench.benchIndexOf_6_B avgt 1 20 5 *122.177*1.186 nsec/op o.b.IndexOfBench.benchIndexOf_7_A avgt 1 20 5 430.148 4.875 nsec/op o.b.IndexOfBench.benchIndexOf_7_B avgt 1 20 5 *387.338*10.904 nsec/op o.b.IndexOfBench.benchIndexOf_8_A avgt 1 20 5 2064.563 28.885 nsec/op o.b.IndexOfBench.benchIndexOf_8_B avgt 1 20 5 *1858.669*24.343 nsec/op

Benchmarks ending with A use the current indexOf implementation, with B use the modified version. These numbers show from 1.4% up to 28% performance increase.

The full listing is below.

Suggestions about what else to test are welcome!

Sincerely yours, Ivan

---------------------

/** * Copyright (c) 2014, Oracle and/or its affiliates. All rights reserved. * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. * * This code is free software; you can redistribute it and/or modify it * under the terms of the GNU General Public License version 2 only, as * published by the Free Software Foundation. Oracle designates this * particular file as subject to the "Classpath" exception as provided * by Oracle in the LICENSE file that accompanied this code. * * This code is distributed in the hope that it will be useful, but WITHOUT * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License * version 2 for more details (a copy is included in the LICENSE file that * accompanied this code). * * You should have received a copy of the GNU General Public License version * 2 along with this work; if not, write to the Free Software Foundation, * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. * * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA * or visit www.oracle.com <http://www.oracle.com> if you need additional information or have any * questions. */ package org.benches;

import org.openjdk.jmh.annotations.*; import org.openjdk.jmh.logic.BlackHole; import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @State(Scope.Benchmark) public class IndexOfBench {

// ||| final static char[] source1 = "abababcabcacabcabcabcabcaccbcabcacbcabcacbcbcabcbcbacbcabcbabacbcbacbcabcabcabcabcabcabcabcacbacbacbacabcabcacbacbcabcbcbcaabdacbacabcabacbacabca".toCharArray(); final static char[] source2 = "acfacafacfacfacfacafcafcacadcacdacaccacacdacacfcafcafcfacdacadcadcadcdacfacfacdacadcacdcfacfacdacdacdcfacdacdacdacgshgshasdabdahghjgwdshacavcavsc".toCharArray(); final static char[] source3 = "tyrtytfytytuytfytuytggfghgdytyftytfdytdshfgjhdfytsfuythgsfhgjhfghtuysdfthgfsdhgystfjhgsfguysthgfgjhgdfjhgsjdghfythgsdfjhggfabduikjhfjhkjhfkjhgkjh".toCharArray(); final static char[] target1 = "abd".toCharArray();

final static char[] source4 = "ahhhahahahahhahahahahhahahahhhahahahahahahahahahahhahahahahahahahahallallalalalalalkakakakakakakakakkakmamamamabammamamamamamaakaklalalaoalalalao".toCharArray(); final static char[] source5 = "hgjkhjhjkdghkjhdfkjhgkjhdkjdhgkjdfhgkjdhfgkjdfhgkjhdfkjghkdjghkdjfhgkjhkdjhgkjdfhjkghkdjfhgkjdfhgkjdfhgkjdfhkgabhfkjghdkfjhgkjdfhgkjdfhgkjdfhgkhh".toCharArray(); final static char[] target2 = "ab".toCharArray();

final static char[] source6 = "lskgjsklfjgskldfjgklsfjdlgkjsdflkgjskldfgjsdklfgjsl;kdfgjskldfjglksdfjglksfjglksdfjgklsfdjgslkdfjglksjdflkgsjfalksjdflkfgjsdklfjglskdfjglksdfjghh".toCharArray(); final static char[] target3 = "a".toCharArray();

final static char[] source7 = "lskgajabfagskldfjgklsabclgkjsdflkabsabcdgjsdklfabclbkdfgjskabfjglksdfjabcdfjglabcfjgklsfabgslkdfjglksjdabcdsjfabcdedflabcjsdklfjglskdfabcksdfjghh".toCharArray(); final static char[] target4 = "abcde".toCharArray();

final static char[] source8 = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa".toCharArray(); final static char[] target5 = "aaaab".toCharArray();

@GenerateMicroBenchmark public void benchIndexOf_1_A(BlackHole bh) { bh.consume(indexOfA(source1, 0, source1.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_1_B(BlackHole bh) { bh.consume(indexOfB(source1, 0, source1.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_2_A(BlackHole bh) { bh.consume(indexOfA(source2, 0, source2.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_2_B(BlackHole bh) { bh.consume(indexOfB(source2, 0, source2.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchm

Martin Buchholz

4:43 p.m.

I remain skeptical that modifying the implementation of static package private String.indexOf is the right approach. If we can produce high-performance implementations of these, perhaps making them public in Arrays.java is the right way. 1766 if (targetCount == 1) {1767 return (i <= max) ? i - sourceOffset : -1; If you're going to special case targetCount == 1, you shouldn't have a test for it in the main loop, since you slow down the general case. Instead, you can sequester the special cases like this: if (targetCount <= 1) { if (targetCount == 0) ... else ... } // now assume targetCount >= 2 On Fri, Apr 18, 2014 at 1:32 AM, Ivan Gerasimov <ivan.gerasimov@oracle.com>wrote:

...

On 16.04.2014 2:53, Martin Buchholz wrote:

Hi Ivan,

There's already an indexOf(int, int) that allows a user to explicitly search for a char (or character).

Sure. What I meant was not to introduce a special function for searching a single character, but to restrict the general case loop to search for a substring of at least 2 characters. Having this condition hold, we can enter the loop only when two starting characters match, and this can save us a few nanoseconds in many cases.

Hotspot seems to have some intrinsification of String.indexOf, which confuses me.

Does it mean that that the java implementation of indexOf is never compiled? When hotspot replaces the jdk implementation with its own one? Is it ever worth to try to optimize the java implementation?

Hotspot seems the right place to provide more optimizations for this, since there has been a fair amount of work creating high-performance low-level implementations of this idea in C.

The hotspot's intrinsic is already optimized for searching substrings of different length.

Sincerely yours, Ivan

On Tue, Apr 15, 2014 at 10:13 AM, Ivan Gerasimov < ivan.gerasimov@oracle.com> wrote:

...
Hi everyone!

On 04.04.2014 21:13, Martin Buchholz wrote:

Summary:

Many people (myself included) have looked at this problem. It's unlikely that String.indexOf will change. It's hard to beat the naive implementation in the typical case.

But we can try to speed up this naive implementation a little bit.

We can separate the special case: When the substring's length == 1. This special case can be done fast, and in the general case we can now assume substring's length is at least 2.

Here's the webrev with the implementation of this idea: http://cr.openjdk.java.net/~igerasim/indexof/0/webrev/

I've done some benchmarking with JMH and found no performance degradation on my machine. Of course, more testcases should be created and they should be tried on different machines to be treated as reliable.

Benchmark Mode Thr Cnt Sec Mean Mean error Units o.b.IndexOfBench.benchIndexOf_1_A avgt 1 20 5 704.739 9.104 nsec/op o.b.IndexOfBench.benchIndexOf_1_B avgt 1 20 5 * 573.879* 9.820 nsec/op o.b.IndexOfBench.benchIndexOf_2_A avgt 1 20 5 668.455 9.882 nsec/op o.b.IndexOfBench.benchIndexOf_2_B avgt 1 20 5 *476.062* 6.063 nsec/op o.b.IndexOfBench.benchIndexOf_3_A avgt 1 20 5 155.227 1.796 nsec/op o.b.IndexOfBench.benchIndexOf_3_B avgt 1 20 5 * 152.850 * 1.512 nsec/op o.b.IndexOfBench.benchIndexOf_4_A avgt 1 20 5 656.183 5.904 nsec/op o.b.IndexOfBench.benchIndexOf_4_B avgt 1 20 5 *515.178* 9.107 nsec/op o.b.IndexOfBench.benchIndexOf_5_A avgt 1 20 5 140.609 7.305 nsec/op o.b.IndexOfBench.benchIndexOf_5_B avgt 1 20 5 *129.603* 1.654 nsec/op o.b.IndexOfBench.benchIndexOf_6_A avgt 1 20 5 127.713 1.497 nsec/op o.b.IndexOfBench.benchIndexOf_6_B avgt 1 20 5 *122.177* 1.186 nsec/op o.b.IndexOfBench.benchIndexOf_7_A avgt 1 20 5 430.148 4.875 nsec/op o.b.IndexOfBench.benchIndexOf_7_B avgt 1 20 5 * 387.338* 10.904 nsec/op o.b.IndexOfBench.benchIndexOf_8_A avgt 1 20 5 2064.563 28.885 nsec/op o.b.IndexOfBench.benchIndexOf_8_B avgt 1 20 5 *1858.669* 24.343 nsec/op

Benchmarks ending with A use the current indexOf implementation, with B use the modified version. These numbers show from 1.4% up to 28% performance increase.

The full listing is below.

Suggestions about what else to test are welcome!

Sincerely yours, Ivan

---------------------

/** * Copyright (c) 2014, Oracle and/or its affiliates. All rights reserved. * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. * * This code is free software; you can redistribute it and/or modify it * under the terms of the GNU General Public License version 2 only, as * published by the Free Software Foundation. Oracle designates this * particular file as subject to the "Classpath" exception as provided * by Oracle in the LICENSE file that accompanied this code. * * This code is distributed in the hope that it will be useful, but WITHOUT * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License * version 2 for more details (a copy is included in the LICENSE file that * accompanied this code). * * You should have received a copy of the GNU General Public License version * 2 along with this work; if not, write to the Free Software Foundation, * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. * * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA * or visit www.oracle.com if you need additional information or have any * questions. */ package org.benches;

import org.openjdk.jmh.annotations.*; import org.openjdk.jmh.logic.BlackHole; import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @State(Scope.Benchmark) public class IndexOfBench {

// ||| final static char[] source1 = "abababcabcacabcabcabcabcaccbcabcacbcabcacbcbcabcbcbacbcabcbabacbcbacbcabcabcabcabcabcabcabcacbacbacbacabcabcacbacbcabcbcbcaabdacbacabcabacbacabca".toCharArray(); final static char[] source2 = "acfacafacfacfacfacafcafcacadcacdacaccacacdacacfcafcafcfacdacadcadcadcdacfacfacdacadcacdcfacfacdacdacdcfacdacdacdacgshgshasdabdahghjgwdshacavcavsc".toCharArray(); final static char[] source3 = "tyrtytfytytuytfytuytggfghgdytyftytfdytdshfgjhdfytsfuythgsfhgjhfghtuysdfthgfsdhgystfjhgsfguysthgfgjhgdfjhgsjdghfythgsdfjhggfabduikjhfjhkjhfkjhgkjh".toCharArray(); final static char[] target1 = "abd".toCharArray();

final static char[] source4 = "ahhhahahahahhahahahahhahahahhhahahahahahahahahahahhahahahahahahahahallallalalalalalkakakakakakakakakkakmamamamabammamamamamamaakaklalalaoalalalao".toCharArray(); final static char[] source5 = "hgjkhjhjkdghkjhdfkjhgkjhdkjdhgkjdfhgkjdhfgkjdfhgkjhdfkjghkdjghkdjfhgkjhkdjhgkjdfhjkghkdjfhgkjdfhgkjdfhgkjdfhkgabhfkjghdkfjhgkjdfhgkjdfhgkjdfhgkhh".toCharArray(); final static char[] target2 = "ab".toCharArray();

final static char[] source6 = "lskgjsklfjgskldfjgklsfjdlgkjsdflkgjskldfgjsdklfgjsl;kdfgjskldfjglksdfjglksfjglksdfjgklsfdjgslkdfjglksjdflkgsjfalksjdflkfgjsdklfjglskdfjglksdfjghh".toCharArray(); final static char[] target3 = "a".toCharArray();

final static char[] source7 = "lskgajabfagskldfjgklsabclgkjsdflkabsabcdgjsdklfabclbkdfgjskabfjglksdfjabcdfjglabcfjgklsfabgslkdfjglksjdabcdsjfabcdedflabcjsdklfjglskdfabcksdfjghh".toCharArray(); final static char[] target4 = "abcde".toCharArray();

final static char[] source8 = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa".toCharArray(); final static char[] target5 = "aaaab".toCharArray();

@GenerateMicroBenchmark public void benchIndexOf_1_A(BlackHole bh) { bh.consume(indexOfA(source1, 0, source1.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_1_B(BlackHole bh) { bh.consume(indexOfB(source1, 0, source1.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_2_A(BlackHole bh) { bh.consume(indexOfA(source2, 0, source2.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_2_B(BlackHole bh) { bh.consume(indexOfB(source2, 0, source2.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchm

Zoltan Sziladi

5 Jan 5 Jan

6:59 p.m.

Hi, This discussion was a long time ago, I was just reading through it to check again what was the last state of the discussion about the String.indexOf. There is one part which I still do not understand, hopefully someone could shed some light on it. A few emails ago Martin mentioned "Hotspot seems to have some intrinsification of String.indexOf, which confuses me. Hotspot seems the right place to provide more optimizations for this, since there has been a fair amount of work creating high-performance low-level implementations of this idea in C." Then Ivan asked what that actually meant, whether hotspot actually replaced the jdk implementation with a low level optimized C implementation, but I never saw an answer to that. Can someone please explain this? If we somehow found an algorithm that beat the naive implementation in the average case, would it be possible to just implement it in the JDK? Also, is there a test set which we could consider conclusive enough to actually change the implementation based on results from that? (For example if I create an implementation that beats the naive algorithm in those testcases, then we could consider it faster in average case) Thanks! Zoltan On Fri, Apr 18, 2014 at 9:43 AM, Martin Buchholz <martinrb@google.com> wrote:

...

I remain skeptical that modifying the implementation of static package private String.indexOf is the right approach.

If we can produce high-performance implementations of these, perhaps making them public in Arrays.java is the right way.

1766 if (targetCount == 1) {1767 return (i <= max) ? i - sourceOffset : -1;

If you're going to special case targetCount == 1, you shouldn't have a test for it in the main loop, since you slow down the general case. Instead, you can sequester the special cases like this:

if (targetCount <= 1) { if (targetCount == 0) ... else ... }

// now assume targetCount >= 2

On Fri, Apr 18, 2014 at 1:32 AM, Ivan Gerasimov <ivan.gerasimov@oracle.com>wrote:

...
On 16.04.2014 2:53, Martin Buchholz wrote:

Hi Ivan,

There's already an indexOf(int, int) that allows a user to explicitly search for a char (or character).

Sure. What I meant was not to introduce a special function for searching a single character, but to restrict the general case loop to search for a substring of at least 2 characters. Having this condition hold, we can enter the loop only when two starting characters match, and this can save us a few nanoseconds in many cases.

Hotspot seems to have some intrinsification of String.indexOf, which confuses me.

Does it mean that that the java implementation of indexOf is never compiled? When hotspot replaces the jdk implementation with its own one? Is it ever worth to try to optimize the java implementation?

Hotspot seems the right place to provide more optimizations for this, since there has been a fair amount of work creating high-performance low-level implementations of this idea in C.

The hotspot's intrinsic is already optimized for searching substrings of different length.

Sincerely yours, Ivan

On Tue, Apr 15, 2014 at 10:13 AM, Ivan Gerasimov < ivan.gerasimov@oracle.com> wrote:

...
Hi everyone!

On 04.04.2014 21:13, Martin Buchholz wrote:

Summary:

Many people (myself included) have looked at this problem. It's

unlikely

...
...
that String.indexOf will change. It's hard to beat the naive implementation in the typical case.

But we can try to speed up this naive implementation a little bit.

We can separate the special case: When the substring's length == 1. This special case can be done fast, and in the general case we can now assume substring's length is at least 2.

Here's the webrev with the implementation of this idea: http://cr.openjdk.java.net/~igerasim/indexof/0/webrev/

I've done some benchmarking with JMH and found no performance degradation on my machine. Of course, more testcases should be created and they should be tried on different machines to be treated as reliable.

Benchmark Mode Thr Cnt Sec Mean Mean error Units o.b.IndexOfBench.benchIndexOf_1_A avgt 1 20 5 704.739 9.104 nsec/op o.b.IndexOfBench.benchIndexOf_1_B avgt 1 20 5 * 573.879* 9.820 nsec/op o.b.IndexOfBench.benchIndexOf_2_A avgt 1 20 5 668.455 9.882 nsec/op o.b.IndexOfBench.benchIndexOf_2_B avgt 1 20 5 *476.062* 6.063 nsec/op o.b.IndexOfBench.benchIndexOf_3_A avgt 1 20 5 155.227 1.796 nsec/op o.b.IndexOfBench.benchIndexOf_3_B avgt 1 20 5 * 152.850 * 1.512 nsec/op o.b.IndexOfBench.benchIndexOf_4_A avgt 1 20 5 656.183 5.904 nsec/op o.b.IndexOfBench.benchIndexOf_4_B avgt 1 20 5 *515.178* 9.107 nsec/op o.b.IndexOfBench.benchIndexOf_5_A avgt 1 20 5 140.609 7.305 nsec/op o.b.IndexOfBench.benchIndexOf_5_B avgt 1 20 5 *129.603* 1.654 nsec/op o.b.IndexOfBench.benchIndexOf_6_A avgt 1 20 5 127.713 1.497 nsec/op o.b.IndexOfBench.benchIndexOf_6_B avgt 1 20 5 *122.177* 1.186 nsec/op o.b.IndexOfBench.benchIndexOf_7_A avgt 1 20 5 430.148 4.875 nsec/op o.b.IndexOfBench.benchIndexOf_7_B avgt 1 20 5 * 387.338* 10.904 nsec/op o.b.IndexOfBench.benchIndexOf_8_A avgt 1 20 5 2064.563 28.885 nsec/op o.b.IndexOfBench.benchIndexOf_8_B avgt 1 20 5 *1858.669* 24.343 nsec/op

Benchmarks ending with A use the current indexOf implementation, with B use the modified version. These numbers show from 1.4% up to 28% performance increase.

The full listing is below.

Suggestions about what else to test are welcome!

Sincerely yours, Ivan

---------------------

/** * Copyright (c) 2014, Oracle and/or its affiliates. All rights reserved. * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. * * This code is free software; you can redistribute it and/or modify it * under the terms of the GNU General Public License version 2 only, as * published by the Free Software Foundation. Oracle designates this * particular file as subject to the "Classpath" exception as provided * by Oracle in the LICENSE file that accompanied this code. * * This code is distributed in the hope that it will be useful, but WITHOUT * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License * version 2 for more details (a copy is included in the LICENSE file that * accompanied this code). * * You should have received a copy of the GNU General Public License version * 2 along with this work; if not, write to the Free Software Foundation, * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. * * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA * or visit www.oracle.com if you need additional information or have any * questions. */ package org.benches;

import org.openjdk.jmh.annotations.*; import org.openjdk.jmh.logic.BlackHole; import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @State(Scope.Benchmark) public class IndexOfBench {

// ||| final static char[] source1 =

"abababcabcacabcabcabcabcaccbcabcacbcabcacbcbcabcbcbacbcabcbabacbcbacbcabcabcabcabcabcabcabcacbacbacbacabcabcacbacbcabcbcbcaabdacbacabcabacbacabca".toCharArray();

...
final static char[] source2 =

"acfacafacfacfacfacafcafcacadcacdacaccacacdacacfcafcafcfacdacadcadcadcdacfacfacdacadcacdcfacfacdacdacdcfacdacdacdacgshgshasdabdahghjgwdshacavcavsc".toCharArray();

...
final static char[] source3 =

"tyrtytfytytuytfytuytggfghgdytyftytfdytdshfgjhdfytsfuythgsfhgjhfghtuysdfthgfsdhgystfjhgsfguysthgfgjhgdfjhgsjdghfythgsdfjhggfabduikjhfjhkjhfkjhgkjh".toCharArray();

...
final static char[] target1 = "abd".toCharArray();

final static char[] source4 =

"ahhhahahahahhahahahahhahahahhhahahahahahahahahahahhahahahahahahahahallallalalalalalkakakakakakakakakkakmamamamabammamamamamamaakaklalalaoalalalao".toCharArray();

...
final static char[] source5 =

"hgjkhjhjkdghkjhdfkjhgkjhdkjdhgkjdfhgkjdhfgkjdfhgkjhdfkjghkdjghkdjfhgkjhkdjhgkjdfhjkghkdjfhgkjdfhgkjdfhgkjdfhkgabhfkjghdkfjhgkjdfhgkjdfhgkjdfhgkhh".toCharArray();

...
final static char[] target2 = "ab".toCharArray();

final static char[] source6 =

"lskgjsklfjgskldfjgklsfjdlgkjsdflkgjskldfgjsdklfgjsl;kdfgjskldfjglksdfjglksfjglksdfjgklsfdjgslkdfjglksjdflkgsjfalksjdflkfgjsdklfjglskdfjglksdfjghh".toCharArray();

...
final static char[] target3 = "a".toCharArray();

final static char[] source7 =

"lskgajabfagskldfjgklsabclgkjsdflkabsabcdgjsdklfabclbkdfgjskabfjglksdfjabcdfjglabcfjgklsfabgslkdfjglksjdabcdsjfabcdedflabcjsdklfjglskdfabcksdfjghh".toCharArray();

...
final static char[] target4 = "abcde".toCharArray();

final static char[] source8 =

"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa".toCharArray();

...
final static char[] target5 = "aaaab".toCharArray();

@GenerateMicroBenchmark public void benchIndexOf_1_A(BlackHole bh) { bh.consume(indexOfA(source1, 0, source1.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_1_B(BlackHole bh) { bh.consume(indexOfB(source1, 0, source1.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_2_A(BlackHole bh) { bh.consume(indexOfA(source2, 0, source2.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_2_B(BlackHole bh) { bh.consume(indexOfB(source2, 0, source2.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchm

Martin Buchholz

7:31 p.m.

Evidence that hotspot tries to intrinsify String.indexOf: do_intrinsic(_indexOf, java_lang_String, indexOf_name, string_int_signature, F_R) \ do_name( indexOf_name, "indexOf") \ So work would have to be done at the hotspot intrinsics level (not easy!) Also, the problem is that Boyer-Moore is a fundamental improvement in string searching, but its overhead is high enough that it's unlikely to help with typical input strings found in the Real World. I think we would want to split into two implementations and only do Boyer-Moore if it looks profitable. Similarly for j.u.r.Pattern's regex compiler. I still think it's sufficiently difficult that effort is best applied elsewhere. On Mon, Jan 5, 2015 at 10:59 AM, Zoltan Sziladi <kissziszi@gmail.com> wrote:

...

Hi,

This discussion was a long time ago, I was just reading through it to check again what was the last state of the discussion about the String.indexOf. There is one part which I still do not understand, hopefully someone could shed some light on it. A few emails ago Martin mentioned

"Hotspot seems to have some intrinsification of String.indexOf, which confuses me. Hotspot seems the right place to provide more optimizations for this, since there has been a fair amount of work creating high-performance low-level implementations of this idea in C."

Then Ivan asked what that actually meant, whether hotspot actually replaced the jdk implementation with a low level optimized C implementation, but I never saw an answer to that.

Can someone please explain this? If we somehow found an algorithm that beat the naive implementation in the average case, would it be possible to just implement it in the JDK? Also, is there a test set which we could consider conclusive enough to actually change the implementation based on results from that? (For example if I create an implementation that beats the naive algorithm in those testcases, then we could consider it faster in average case)

Thanks!

Zoltan

On Fri, Apr 18, 2014 at 9:43 AM, Martin Buchholz <martinrb@google.com> wrote:

...
I remain skeptical that modifying the implementation of static package private String.indexOf is the right approach.

If we can produce high-performance implementations of these, perhaps making them public in Arrays.java is the right way.

1766 if (targetCount == 1) {1767 return (i <= max) ? i - sourceOffset : -1;

If you're going to special case targetCount == 1, you shouldn't have a test for it in the main loop, since you slow down the general case. Instead, you can sequester the special cases like this:

if (targetCount <= 1) { if (targetCount == 0) ... else ... }

// now assume targetCount >= 2

On Fri, Apr 18, 2014 at 1:32 AM, Ivan Gerasimov <ivan.gerasimov@oracle.com>wrote:

...
On 16.04.2014 2:53, Martin Buchholz wrote:

Hi Ivan,

There's already an indexOf(int, int) that allows a user to explicitly search for a char (or character).

Sure. What I meant was not to introduce a special function for searching a single character, but to restrict the general case loop to search for a substring of at least 2 characters. Having this condition hold, we can enter the loop only when two starting characters match, and this can save us a few nanoseconds in many cases.

Hotspot seems to have some intrinsification of String.indexOf, which confuses me.

Does it mean that that the java implementation of indexOf is never compiled? When hotspot replaces the jdk implementation with its own one? Is it ever worth to try to optimize the java implementation?

Hotspot seems the right place to provide more optimizations for this, since there has been a fair amount of work creating high-performance low-level implementations of this idea in C.

The hotspot's intrinsic is already optimized for searching substrings

of

...
different length.

Sincerely yours, Ivan

On Tue, Apr 15, 2014 at 10:13 AM, Ivan Gerasimov < ivan.gerasimov@oracle.com> wrote:

...
Hi everyone!

On 04.04.2014 21:13, Martin Buchholz wrote:

Summary:

Many people (myself included) have looked at this problem. It's unlikely that String.indexOf will change. It's hard to beat the naive implementation in the typical case.

But we can try to speed up this naive implementation a little bit.

We can separate the special case: When the substring's length == 1. This special case can be done fast, and in the general case we can now assume substring's length is at least 2.

Here's the webrev with the implementation of this idea: http://cr.openjdk.java.net/~igerasim/indexof/0/webrev/

I've done some benchmarking with JMH and found no performance degradation on my machine. Of course, more testcases should be created and they should be tried on different machines to be treated as reliable.

Benchmark Mode Thr Cnt Sec Mean Mean error Units o.b.IndexOfBench.benchIndexOf_1_A avgt 1 20 5 704.739 9.104 nsec/op o.b.IndexOfBench.benchIndexOf_1_B avgt 1 20 5 * 573.879* 9.820 nsec/op o.b.IndexOfBench.benchIndexOf_2_A avgt 1 20 5 668.455 9.882 nsec/op o.b.IndexOfBench.benchIndexOf_2_B avgt 1 20 5 *476.062* 6.063 nsec/op o.b.IndexOfBench.benchIndexOf_3_A avgt 1 20 5 155.227 1.796 nsec/op o.b.IndexOfBench.benchIndexOf_3_B avgt 1 20 5 * 152.850 * 1.512 nsec/op o.b.IndexOfBench.benchIndexOf_4_A avgt 1 20 5 656.183 5.904 nsec/op o.b.IndexOfBench.benchIndexOf_4_B avgt 1 20 5 *515.178* 9.107 nsec/op o.b.IndexOfBench.benchIndexOf_5_A avgt 1 20 5 140.609 7.305 nsec/op o.b.IndexOfBench.benchIndexOf_5_B avgt 1 20 5 *129.603* 1.654 nsec/op o.b.IndexOfBench.benchIndexOf_6_A avgt 1 20 5 127.713 1.497 nsec/op o.b.IndexOfBench.benchIndexOf_6_B avgt 1 20 5 *122.177* 1.186 nsec/op o.b.IndexOfBench.benchIndexOf_7_A avgt 1 20 5 430.148 4.875 nsec/op o.b.IndexOfBench.benchIndexOf_7_B avgt 1 20 5 * 387.338* 10.904 nsec/op o.b.IndexOfBench.benchIndexOf_8_A avgt 1 20 5 2064.563 28.885 nsec/op o.b.IndexOfBench.benchIndexOf_8_B avgt 1 20 5 *1858.669* 24.343 nsec/op

...
...
Benchmarks ending with A use the current indexOf implementation, with B use the modified version. These numbers show from 1.4% up to 28% performance increase.

The full listing is below.

Suggestions about what else to test are welcome!

Sincerely yours, Ivan

---------------------

/** * Copyright (c) 2014, Oracle and/or its affiliates. All rights

reserved.

...
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. * * This code is free software; you can redistribute it and/or modify it * under the terms of the GNU General Public License version 2 only, as * published by the Free Software Foundation. Oracle designates this * particular file as subject to the "Classpath" exception as provided * by Oracle in the LICENSE file that accompanied this code. * * This code is distributed in the hope that it will be useful, but WITHOUT * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License * version 2 for more details (a copy is included in the LICENSE file that * accompanied this code). * * You should have received a copy of the GNU General Public License version * 2 along with this work; if not, write to the Free Software Foundation, * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. * * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA * or visit www.oracle.com if you need additional information or have any * questions. */ package org.benches;

import org.openjdk.jmh.annotations.*; import org.openjdk.jmh.logic.BlackHole; import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @State(Scope.Benchmark) public class IndexOfBench {

// ||| final static char[] source1 =

"abababcabcacabcabcabcabcaccbcabcacbcabcacbcbcabcbcbacbcabcbabacbcbacbcabcabcabcabcabcabcabcacbacbacbacabcabcacbacbcabcbcbcaabdacbacabcabacbacabca".toCharArray();

...
final static char[] source2 =

"acfacafacfacfacfacafcafcacadcacdacaccacacdacacfcafcafcfacdacadcadcadcdacfacfacdacadcacdcfacfacdacdacdcfacdacdacdacgshgshasdabdahghjgwdshacavcavsc".toCharArray();

...
final static char[] source3 =

"tyrtytfytytuytfytuytggfghgdytyftytfdytdshfgjhdfytsfuythgsfhgjhfghtuysdfthgfsdhgystfjhgsfguysthgfgjhgdfjhgsjdghfythgsdfjhggfabduikjhfjhkjhfkjhgkjh".toCharArray();

...
final static char[] target1 = "abd".toCharArray();

final static char[] source4 =

"ahhhahahahahhahahahahhahahahhhahahahahahahahahahahhahahahahahahahahallallalalalalalkakakakakakakakakkakmamamamabammamamamamamaakaklalalaoalalalao".toCharArray();

...
final static char[] source5 =

"hgjkhjhjkdghkjhdfkjhgkjhdkjdhgkjdfhgkjdhfgkjdfhgkjhdfkjghkdjghkdjfhgkjhkdjhgkjdfhjkghkdjfhgkjdfhgkjdfhgkjdfhkgabhfkjghdkfjhgkjdfhgkjdfhgkjdfhgkhh".toCharArray();

...
final static char[] target2 = "ab".toCharArray();

final static char[] source6 =

"lskgjsklfjgskldfjgklsfjdlgkjsdflkgjskldfgjsdklfgjsl;kdfgjskldfjglksdfjglksfjglksdfjgklsfdjgslkdfjglksjdflkgsjfalksjdflkfgjsdklfjglskdfjglksdfjghh".toCharArray();

...
final static char[] target3 = "a".toCharArray();

final static char[] source7 =

"lskgajabfagskldfjgklsabclgkjsdflkabsabcdgjsdklfabclbkdfgjskabfjglksdfjabcdfjglabcfjgklsfabgslkdfjglksjdabcdsjfabcdedflabcjsdklfjglskdfabcksdfjghh".toCharArray();

...
final static char[] target4 = "abcde".toCharArray();

final static char[] source8 =

"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa".toCharArray();

...
final static char[] target5 = "aaaab".toCharArray();

@GenerateMicroBenchmark public void benchIndexOf_1_A(BlackHole bh) { bh.consume(indexOfA(source1, 0, source1.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_1_B(BlackHole bh) { bh.consume(indexOfB(source1, 0, source1.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_2_A(BlackHole bh) { bh.consume(indexOfA(source2, 0, source2.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_2_B(BlackHole bh) { bh.consume(indexOfB(source2, 0, source2.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchm

Jonathan Yu

10:46 p.m.

Hi, Sharing in case you haven't seen it... Aleksey Shipilëv has a talk about String optimizations, which discusses these questions and specifically the Boyer-Moore algorithm. http://shipilev.net/talks/joker-Oct2014-string-catechism.pdf Page 85 talks about the intrinsification of indexOf and compares performance of the builtin vs intrinsified versions by using: -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=::_indexOf There might also be a video recording somewhere. Jonathan On Mon, Jan 5, 2015 at 2:31 PM, Martin Buchholz <martinrb@google.com> wrote:

...

Evidence that hotspot tries to intrinsify String.indexOf:

do_intrinsic(_indexOf, java_lang_String, indexOf_name, string_int_signature, F_R) \ do_name( indexOf_name, "indexOf") \

So work would have to be done at the hotspot intrinsics level (not easy!)

Also, the problem is that Boyer-Moore is a fundamental improvement in string searching, but its overhead is high enough that it's unlikely to help with typical input strings found in the Real World. I think we would want to split into two implementations and only do Boyer-Moore if it looks profitable. Similarly for j.u.r.Pattern's regex compiler.

I still think it's sufficiently difficult that effort is best applied elsewhere.

On Mon, Jan 5, 2015 at 10:59 AM, Zoltan Sziladi <kissziszi@gmail.com> wrote:

...
Hi,

This discussion was a long time ago, I was just reading through it to check again what was the last state of the discussion about the String.indexOf. There is one part which I still do not understand, hopefully someone could shed some light on it. A few emails ago Martin mentioned

"Hotspot seems to have some intrinsification of String.indexOf, which confuses me. Hotspot seems the right place to provide more optimizations for this, since there has been a fair amount of work creating high-performance low-level implementations of this idea in C."

Then Ivan asked what that actually meant, whether hotspot actually replaced the jdk implementation with a low level optimized C implementation, but I never saw an answer to that.

Can someone please explain this? If we somehow found an algorithm that beat the naive implementation in the average case, would it be possible to just implement it in the JDK? Also, is there a test set which we could consider conclusive enough to actually change the implementation based on results from that? (For example if I create an implementation that beats the naive algorithm in those testcases, then we could consider it faster in average case)

Thanks!

Zoltan

On Fri, Apr 18, 2014 at 9:43 AM, Martin Buchholz <martinrb@google.com> wrote:

...
I remain skeptical that modifying the implementation of static package private String.indexOf is the right approach.

If we can produce high-performance implementations of these, perhaps making them public in Arrays.java is the right way.

1766 if (targetCount == 1) {1767 return (i <= max) ? i - sourceOffset : -1;

If you're going to special case targetCount == 1, you shouldn't have a test for it in the main loop, since you slow down the general case. Instead, you can sequester the special cases like this:

if (targetCount <= 1) { if (targetCount == 0) ... else ... }

// now assume targetCount >= 2

On Fri, Apr 18, 2014 at 1:32 AM, Ivan Gerasimov <ivan.gerasimov@oracle.com>wrote:

...
On 16.04.2014 2:53, Martin Buchholz wrote:

Hi Ivan,

There's already an indexOf(int, int) that allows a user to explicitly search for a char (or character).

Sure. What I meant was not to introduce a special function for searching a single character, but to restrict the general case loop to search for

a

...
substring of at least 2 characters. Having this condition hold, we can enter the loop only when two starting characters match, and this can save us a few nanoseconds in many cases.

Hotspot seems to have some intrinsification of String.indexOf, which confuses me.

Does it mean that that the java implementation of indexOf is never compiled? When hotspot replaces the jdk implementation with its own one? Is it ever worth to try to optimize the java implementation?

Hotspot seems the right place to provide more optimizations for this, since there has been a fair amount of work creating high-performance low-level implementations of this idea in C.

The hotspot's intrinsic is already optimized for searching substrings of different length.

Sincerely yours, Ivan

On Tue, Apr 15, 2014 at 10:13 AM, Ivan Gerasimov < ivan.gerasimov@oracle.com> wrote:

...
Hi everyone!

On 04.04.2014 21:13, Martin Buchholz wrote:

Summary:

Many people (myself included) have looked at this problem. It's unlikely that String.indexOf will change. It's hard to beat the naive implementation in the typical case.

But we can try to speed up this naive implementation a little bit.

We can separate the special case: When the substring's length == 1. This special case can be done fast, and in the general case we can now assume substring's length is at least 2.

Here's the webrev with the implementation of this idea: http://cr.openjdk.java.net/~igerasim/indexof/0/webrev/

I've done some benchmarking with JMH and found no performance degradation on my machine. Of course, more testcases should be created and they should be tried on different machines to be treated as reliable.

Benchmark Mode Thr Cnt Sec Mean Mean error Units o.b.IndexOfBench.benchIndexOf_1_A avgt 1 20 5 704.739 9.104 nsec/op o.b.IndexOfBench.benchIndexOf_1_B avgt 1 20 5 * 573.879* 9.820 nsec/op o.b.IndexOfBench.benchIndexOf_2_A avgt 1 20 5 668.455 9.882 nsec/op o.b.IndexOfBench.benchIndexOf_2_B avgt 1 20 5 *476.062* 6.063 nsec/op o.b.IndexOfBench.benchIndexOf_3_A avgt 1 20 5 155.227 1.796 nsec/op o.b.IndexOfBench.benchIndexOf_3_B avgt 1 20 5 * 152.850 * 1.512 nsec/op o.b.IndexOfBench.benchIndexOf_4_A avgt 1 20 5 656.183 5.904 nsec/op o.b.IndexOfBench.benchIndexOf_4_B avgt 1 20 5 *515.178* 9.107 nsec/op o.b.IndexOfBench.benchIndexOf_5_A avgt 1 20 5 140.609 7.305 nsec/op o.b.IndexOfBench.benchIndexOf_5_B avgt 1 20 5 *129.603* 1.654 nsec/op o.b.IndexOfBench.benchIndexOf_6_A avgt 1 20 5 127.713 1.497 nsec/op o.b.IndexOfBench.benchIndexOf_6_B avgt 1 20 5 *122.177* 1.186 nsec/op o.b.IndexOfBench.benchIndexOf_7_A avgt 1 20 5 430.148 4.875 nsec/op o.b.IndexOfBench.benchIndexOf_7_B avgt 1 20 5 * 387.338* 10.904 nsec/op o.b.IndexOfBench.benchIndexOf_8_A avgt 1 20 5 2064.563 28.885 nsec/op o.b.IndexOfBench.benchIndexOf_8_B avgt 1 20 5 *1858.669* 24.343 nsec/op

...
...
Benchmarks ending with A use the current indexOf implementation,

with B

...
use the modified version. These numbers show from 1.4% up to 28% performance increase.

The full listing is below.

Suggestions about what else to test are welcome!

Sincerely yours, Ivan

---------------------

/** * Copyright (c) 2014, Oracle and/or its affiliates. All rights reserved. * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. * * This code is free software; you can redistribute it and/or modify it * under the terms of the GNU General Public License version 2 only, as * published by the Free Software Foundation. Oracle designates this * particular file as subject to the "Classpath" exception as provided * by Oracle in the LICENSE file that accompanied this code. * * This code is distributed in the hope that it will be useful, but WITHOUT * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License * version 2 for more details (a copy is included in the LICENSE file that * accompanied this code). * * You should have received a copy of the GNU General Public License version * 2 along with this work; if not, write to the Free Software Foundation, * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. * * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA * or visit www.oracle.com if you need additional information or have any * questions. */ package org.benches;

import org.openjdk.jmh.annotations.*; import org.openjdk.jmh.logic.BlackHole; import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @State(Scope.Benchmark) public class IndexOfBench {

// ||| final static char[] source1 =

"abababcabcacabcabcabcabcaccbcabcacbcabcacbcbcabcbcbacbcabcbabacbcbacbcabcabcabcabcabcabcabcacbacbacbacabcabcacbacbcabcbcbcaabdacbacabcabacbacabca".toCharArray();

...
...
...
final static char[] source2 =

"acfacafacfacfacfacafcafcacadcacdacaccacacdacacfcafcafcfacdacadcadcadcdacfacfacdacadcacdcfacfacdacdacdcfacdacdacdacgshgshasdabdahghjgwdshacavcavsc".toCharArray();

...
...
...
final static char[] source3 =

"tyrtytfytytuytfytuytggfghgdytyftytfdytdshfgjhdfytsfuythgsfhgjhfghtuysdfthgfsdhgystfjhgsfguysthgfgjhgdfjhgsjdghfythgsdfjhggfabduikjhfjhkjhfkjhgkjh".toCharArray();

...
...
...
final static char[] target1 = "abd".toCharArray();

final static char[] source4 =

"ahhhahahahahhahahahahhahahahhhahahahahahahahahahahhahahahahahahahahallallalalalalalkakakakakakakakakkakmamamamabammamamamamamaakaklalalaoalalalao".toCharArray();

...
...
...
final static char[] source5 =

"hgjkhjhjkdghkjhdfkjhgkjhdkjdhgkjdfhgkjdhfgkjdfhgkjhdfkjghkdjghkdjfhgkjhkdjhgkjdfhjkghkdjfhgkjdfhgkjdfhgkjdfhkgabhfkjghdkfjhgkjdfhgkjdfhgkjdfhgkhh".toCharArray();

...
...
...
final static char[] target2 = "ab".toCharArray();

final static char[] source6 =

"lskgjsklfjgskldfjgklsfjdlgkjsdflkgjskldfgjsdklfgjsl;kdfgjskldfjglksdfjglksfjglksdfjgklsfdjgslkdfjglksjdflkgsjfalksjdflkfgjsdklfjglskdfjglksdfjghh".toCharArray();

...
...
...
final static char[] target3 = "a".toCharArray();

final static char[] source7 =

"lskgajabfagskldfjgklsabclgkjsdflkabsabcdgjsdklfabclbkdfgjskabfjglksdfjabcdfjglabcfjgklsfabgslkdfjglksjdabcdsjfabcdedflabcjsdklfjglskdfabcksdfjghh".toCharArray();

...
...
...
final static char[] target4 = "abcde".toCharArray();

final static char[] source8 =

"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa".toCharArray();

...
...
...
final static char[] target5 = "aaaab".toCharArray();

@GenerateMicroBenchmark public void benchIndexOf_1_A(BlackHole bh) { bh.consume(indexOfA(source1, 0, source1.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_1_B(BlackHole bh) { bh.consume(indexOfB(source1, 0, source1.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_2_A(BlackHole bh) { bh.consume(indexOfA(source2, 0, source2.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchmark public void benchIndexOf_2_B(BlackHole bh) { bh.consume(indexOfB(source2, 0, source2.length, target1, 0, target1.length, 0)); }

@GenerateMicroBenchm

Andrew Haley

6 Jan 6 Jan

9:28 a.m.

Hi, On 05/01/15 18:59, Zoltan Sziladi wrote:

...

This discussion was a long time ago, I was just reading through it to check again what was the last state of the discussion about the String.indexOf. There is one part which I still do not understand, hopefully someone could shed some light on it. A few emails ago Martin mentioned

"Hotspot seems to have some intrinsification of String.indexOf, which confuses me. Hotspot seems the right place to provide more optimizations for this, since there has been a fair amount of work creating high-performance low-level implementations of this idea in C."

Then Ivan asked what that actually meant, whether hotspot actually replaced the jdk implementation with a low level optimized C implementation, but I never saw an answer to that.

You can have a look at an implementation of MacroAssembler::string_indexof in http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/b6b89b8f8531/src/cpu/x86/v...

...

Can someone please explain this? If we somehow found an algorithm that beat the naive implementation in the average case, would it be possible to just implement it in the JDK?

No, because HotSpot would ignore it. Any speed improvements have to be done in the architecture-dependent files. Andrew.

Zoltan Sziladi

8 Jan 8 Jan

8:34 p.m.

Hi, Thanks everyone for all the info. So, just to go step by step in understanding this. Andrew said HotSpot would ignore my implementation. So why is there an implementation of indexOf at all in the JDK, if that's not the code that's executed? Is it just a default fallback? When is the indexOf function not intrinsified? When do people usually disable intrinsification? Sorry if these are newbie questions, I'm new to this part of Java. Regards, Zoltan On Tue, Jan 6, 2015 at 1:28 AM, Andrew Haley <aph@redhat.com> wrote:

...

Hi,

On 05/01/15 18:59, Zoltan Sziladi wrote:

...
This discussion was a long time ago, I was just reading through it to check again what was the last state of the discussion about the String.indexOf. There is one part which I still do not understand, hopefully someone could shed some light on it. A few emails ago Martin mentioned

"Hotspot seems to have some intrinsification of String.indexOf, which confuses me. Hotspot seems the right place to provide more optimizations for this, since there has been a fair amount of work creating high-performance low-level implementations of this idea in C."

Then Ivan asked what that actually meant, whether hotspot actually replaced the jdk implementation with a low level optimized C implementation, but I never saw an answer to that.

You can have a look at an implementation of MacroAssembler::string_indexof in

http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/b6b89b8f8531/src/cpu/x86/v...

...
Can someone please explain this? If we somehow found an algorithm that beat the naive implementation in the average case, would it be possible to just implement it in the JDK?

No, because HotSpot would ignore it. Any speed improvements have to be done in the architecture-dependent files.

Andrew.

Vitaly Davidovich

8:42 p.m.

The java impl you saw would be called by (a) interpreter, (b) if you explicitly disable intrinsification of this function, or (c) some other JVM that doesn't intrinsify this method (or any method). People don't usually disable intrinsics; if they do, it's because they hit some JIT bug and may disable it. On Thu, Jan 8, 2015 at 3:34 PM, Zoltan Sziladi <kissziszi@gmail.com> wrote:

...

Hi,

Thanks everyone for all the info. So, just to go step by step in understanding this. Andrew said HotSpot would ignore my implementation. So why is there an implementation of indexOf at all in the JDK, if that's not the code that's executed? Is it just a default fallback? When is the indexOf function not intrinsified? When do people usually disable intrinsification? Sorry if these are newbie questions, I'm new to this part of Java.

Regards, Zoltan

On Tue, Jan 6, 2015 at 1:28 AM, Andrew Haley <aph@redhat.com> wrote:

...
Hi,

On 05/01/15 18:59, Zoltan Sziladi wrote:

...
This discussion was a long time ago, I was just reading through it to check again what was the last state of the discussion about the String.indexOf. There is one part which I still do not understand, hopefully someone could shed some light on it. A few emails ago Martin mentioned

"Hotspot seems to have some intrinsification of String.indexOf, which confuses me. Hotspot seems the right place to provide more optimizations for this, since there has been a fair amount of work creating high-performance low-level implementations of this idea in C."

Then Ivan asked what that actually meant, whether hotspot actually replaced the jdk implementation with a low level optimized C implementation, but I never saw an answer to that.

You can have a look at an implementation of MacroAssembler::string_indexof in

http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/b6b89b8f8531/src/cpu/x86/v...

...
...
Can someone please explain this? If we somehow found an algorithm that beat the naive implementation in the average case, would it be possible to just implement it in the JDK?

No, because HotSpot would ignore it. Any speed improvements have to be done in the architecture-dependent files.

Andrew.

Zoltan Sziladi

11:05 p.m.

Thanks for the info. So that basically means we have 2 implementations of indexOf currently, one is in HotSpot, the other is in the JDK itself, which rarely gets executed. Aside from this later fact, isn't it still worth improving the JDK implementation if it is possible? I know that the intrinsified method gets executed most of the time, but still, if we can improve the JDK implementation also, why not? I don't know much about other JVMs but maybe a few don't intrinsify it? Is there any existing test suite which is considered conclusive enough that if an implementation beats the naive algorithm in those testcases then it could be considered as a replacement in the JDK? Regards, Zoltan On Thu, Jan 8, 2015 at 12:42 PM, Vitaly Davidovich <vitalyd@gmail.com> wrote:

...

The java impl you saw would be called by (a) interpreter, (b) if you explicitly disable intrinsification of this function, or (c) some other JVM that doesn't intrinsify this method (or any method).

People don't usually disable intrinsics; if they do, it's because they hit some JIT bug and may disable it.

On Thu, Jan 8, 2015 at 3:34 PM, Zoltan Sziladi <kissziszi@gmail.com> wrote:

...
Hi,

Thanks everyone for all the info. So, just to go step by step in understanding this. Andrew said HotSpot would ignore my implementation. So why is there an implementation of indexOf at all in the JDK, if that's not the code that's executed? Is it just a default fallback? When is the indexOf function not intrinsified? When do people usually disable intrinsification? Sorry if these are newbie questions, I'm new to this part of Java.

Regards, Zoltan

On Tue, Jan 6, 2015 at 1:28 AM, Andrew Haley <aph@redhat.com> wrote:

...
Hi,

On 05/01/15 18:59, Zoltan Sziladi wrote:

...
This discussion was a long time ago, I was just reading through it to check again what was the last state of the discussion about the String.indexOf. There is one part which I still do not understand, hopefully someone could shed some light on it. A few emails ago Martin mentioned

"Hotspot seems to have some intrinsification of String.indexOf, which confuses me. Hotspot seems the right place to provide more optimizations for this, since there has been a fair amount of work creating high-performance low-level implementations of this idea in C."

Then Ivan asked what that actually meant, whether hotspot actually replaced the jdk implementation with a low level optimized C implementation, but I never saw an answer to that.

You can have a look at an implementation of MacroAssembler::string_indexof in

http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/b6b89b8f8531/src/cpu/x86/v...

...
...
Can someone please explain this? If we somehow found an algorithm that beat the naive implementation in the average case, would it be possible to just implement it in the JDK?

No, because HotSpot would ignore it. Any speed improvements have to be done in the architecture-dependent files.

Andrew.

Vitaly Davidovich

11:32 p.m.

I'm fairly certain that nobody would object if you send a patch with reproducible benchmark code demonstrating the gain. I think this mostly comes down to cost/benefit of doing this work, and the benefit is marginalized out of the gate by virtue of the intrinsic. You'd probably also have to demonstrate no loss in performance for the common case of a short string. I don't have an answer to your test suite question, unfortunately. Moreover, any such benchmarks would probably be measuring peak performance, which means the JIT intrinsic code. Sent from my phone On Jan 8, 2015 6:05 PM, "Zoltan Sziladi" <kissziszi@gmail.com> wrote:

...

Thanks for the info. So that basically means we have 2 implementations of indexOf currently, one is in HotSpot, the other is in the JDK itself, which rarely gets executed. Aside from this later fact, isn't it still worth improving the JDK implementation if it is possible? I know that the intrinsified method gets executed most of the time, but still, if we can improve the JDK implementation also, why not? I don't know much about other JVMs but maybe a few don't intrinsify it?

Is there any existing test suite which is considered conclusive enough that if an implementation beats the naive algorithm in those testcases then it could be considered as a replacement in the JDK?

Regards, Zoltan

On Thu, Jan 8, 2015 at 12:42 PM, Vitaly Davidovich <vitalyd@gmail.com> wrote:

...
The java impl you saw would be called by (a) interpreter, (b) if you explicitly disable intrinsification of this function, or (c) some other JVM that doesn't intrinsify this method (or any method).

People don't usually disable intrinsics; if they do, it's because they hit some JIT bug and may disable it.

On Thu, Jan 8, 2015 at 3:34 PM, Zoltan Sziladi <kissziszi@gmail.com> wrote:

...
Hi,

Thanks everyone for all the info. So, just to go step by step in understanding this. Andrew said HotSpot would ignore my implementation. So why is there an implementation of indexOf at all in the JDK, if that's not the code that's executed? Is it just a default fallback? When is the indexOf function not intrinsified? When do people usually disable intrinsification? Sorry if these are newbie questions, I'm new to this part of Java.

Regards, Zoltan

On Tue, Jan 6, 2015 at 1:28 AM, Andrew Haley <aph@redhat.com> wrote:

...
Hi,

On 05/01/15 18:59, Zoltan Sziladi wrote:

...
This discussion was a long time ago, I was just reading through it to check again what was the last state of the discussion about the String.indexOf. There is one part which I still do not understand, hopefully someone could shed some light on it. A few emails ago Martin mentioned

"Hotspot seems to have some intrinsification of String.indexOf, which confuses me. Hotspot seems the right place to provide more optimizations for this, since there has been a fair amount of work creating high-performance low-level implementations of this idea in C."

Then Ivan asked what that actually meant, whether hotspot actually replaced the jdk implementation with a low level optimized C implementation, but I never saw an answer to that.

You can have a look at an implementation of MacroAssembler::string_indexof in

http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/b6b89b8f8531/src/cpu/x86/v...

...
...
Can someone please explain this? If we somehow found an algorithm

that

...
beat

...
the naive implementation in the average case, would it be possible to just implement it in the JDK?

No, because HotSpot would ignore it. Any speed improvements have to be done in the architecture-dependent files.

Andrew.

Martin Buchholz

12 Jan 12 Jan

8:11 p.m.

If there's a clean improvement in the java code, it's worth putting in. You can try benchmarking with -Xint. Are we talking about this method? static int indexOf(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex) { It does look like we can tighten the code up a little... On Thu, Jan 8, 2015 at 3:05 PM, Zoltan Sziladi <kissziszi@gmail.com> wrote:

...

Thanks for the info. So that basically means we have 2 implementations of indexOf currently, one is in HotSpot, the other is in the JDK itself, which rarely gets executed. Aside from this later fact, isn't it still worth improving the JDK implementation if it is possible? I know that the intrinsified method gets executed most of the time, but still, if we can improve the JDK implementation also, why not? I don't know much about other JVMs but maybe a few don't intrinsify it?

Is there any existing test suite which is considered conclusive enough that if an implementation beats the naive algorithm in those testcases then it could be considered as a replacement in the JDK?

Regards, Zoltan

On Thu, Jan 8, 2015 at 12:42 PM, Vitaly Davidovich <vitalyd@gmail.com> wrote:

...
The java impl you saw would be called by (a) interpreter, (b) if you explicitly disable intrinsification of this function, or (c) some other JVM that doesn't intrinsify this method (or any method).

People don't usually disable intrinsics; if they do, it's because they hit some JIT bug and may disable it.

On Thu, Jan 8, 2015 at 3:34 PM, Zoltan Sziladi <kissziszi@gmail.com> wrote:

...
Hi,

Thanks everyone for all the info. So, just to go step by step in understanding this. Andrew said HotSpot would ignore my implementation. So why is there an implementation of indexOf at all in the JDK, if that's not the code that's executed? Is it just a default fallback? When is the indexOf function not intrinsified? When do people usually disable intrinsification? Sorry if these are newbie questions, I'm new to this part of Java.

Regards, Zoltan

On Tue, Jan 6, 2015 at 1:28 AM, Andrew Haley <aph@redhat.com> wrote:

...
Hi,

On 05/01/15 18:59, Zoltan Sziladi wrote:

...
This discussion was a long time ago, I was just reading through it to check again what was the last state of the discussion about the String.indexOf. There is one part which I still do not understand, hopefully someone could shed some light on it. A few emails ago Martin mentioned

"Hotspot seems to have some intrinsification of String.indexOf, which confuses me. Hotspot seems the right place to provide more optimizations for this, since there has been a fair amount of work creating high-performance low-level implementations of this idea in C."

Then Ivan asked what that actually meant, whether hotspot actually replaced the jdk implementation with a low level optimized C implementation, but I never saw an answer to that.

You can have a look at an implementation of MacroAssembler::string_indexof in

http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/b6b89b8f8531/src/cpu/x86/v...

...
...
...
Can someone please explain this? If we somehow found an algorithm

that

...
beat

...
the naive implementation in the average case, would it be possible to just implement it in the JDK?

No, because HotSpot would ignore it. Any speed improvements have to be done in the architecture-dependent files.

Andrew.

Zoltan Sziladi

19 Jan 19 Jan

7:13 p.m.

Martin, yes, we're talking about that method. Other than tightening that code some... if we find some algorithm that outperforms the naive implementation under certain conditions (let's say when the searched pattern is longer than 10 characters), would it be worth including it as a special case? (For example naive would run normally but if pattern is longer than 10 characters then the other algorithm would run). Or do you think that would just make the code too complicated without enough benefits? Regards, Zoltan On Mon, Jan 12, 2015 at 12:11 PM, Martin Buchholz <martinrb@google.com> wrote:

...

If there's a clean improvement in the java code, it's worth putting in. You can try benchmarking with -Xint. Are we talking about this method?

static int indexOf(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex) {

It does look like we can tighten the code up a little...

On Thu, Jan 8, 2015 at 3:05 PM, Zoltan Sziladi <kissziszi@gmail.com> wrote:

...
Thanks for the info. So that basically means we have 2 implementations of indexOf currently, one is in HotSpot, the other is in the JDK itself, which rarely gets executed. Aside from this later fact, isn't it still worth improving the JDK implementation if it is possible? I know that the intrinsified method gets executed most of the time, but still, if we can improve the JDK implementation also, why not? I don't know much about other JVMs but maybe a few don't intrinsify it?

Is there any existing test suite which is considered conclusive enough that if an implementation beats the naive algorithm in those testcases then it could be considered as a replacement in the JDK?

Regards, Zoltan

On Thu, Jan 8, 2015 at 12:42 PM, Vitaly Davidovich <vitalyd@gmail.com> wrote:

...
The java impl you saw would be called by (a) interpreter, (b) if you explicitly disable intrinsification of this function, or (c) some other JVM that doesn't intrinsify this method (or any method).

People don't usually disable intrinsics; if they do, it's because they hit some JIT bug and may disable it.

On Thu, Jan 8, 2015 at 3:34 PM, Zoltan Sziladi <kissziszi@gmail.com> wrote:

...
Hi,

Thanks everyone for all the info. So, just to go step by step in understanding this. Andrew said HotSpot would ignore my implementation. So why is there an implementation of indexOf at all in the JDK, if that's not the code that's executed? Is it just a default fallback? When is the indexOf function not intrinsified? When do people usually disable intrinsification? Sorry if these are newbie questions, I'm new to this part of Java.

Regards, Zoltan

On Tue, Jan 6, 2015 at 1:28 AM, Andrew Haley <aph@redhat.com> wrote:

...
Hi,

On 05/01/15 18:59, Zoltan Sziladi wrote:

...
This discussion was a long time ago, I was just reading through it to check again what was the last state of the discussion about the String.indexOf. There is one part which I still do not understand, hopefully someone could shed some light on it. A few emails ago Martin mentioned

"Hotspot seems to have some intrinsification of String.indexOf, which confuses me. Hotspot seems the right place to provide more optimizations for this, since there has been a fair amount of work creating high-performance low-level implementations of this idea in C."

Then Ivan asked what that actually meant, whether hotspot actually replaced the jdk implementation with a low level optimized C implementation, but I never saw an answer to that.

You can have a look at an implementation of MacroAssembler::string_indexof in

http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/b6b89b8f8531/src/cpu/x86/v...

...
...
...
Can someone please explain this? If we somehow found an algorithm

that

...
beat

...
the naive implementation in the average case, would it be possible to just implement it in the JDK?

No, because HotSpot would ignore it. Any speed improvements have to be done in the architecture-dependent files.

Andrew.

Martin Buchholz

24 Jan 24 Jan

9:13 p.m.

I took another look. It seems to me that the java code implementing String.indexOf still assumes the old String layout and so is full of dead code, which should be excised just to be clean. (I continue to think the change to String layout came too late in the life of Java and is incomplete in ways such as this one). Is there an easy way to disable a hotspot intrinsic for testing? Here's a start at such a change: diff --git a/src/java.base/share/classes/java/lang/AbstractStringBuilder.java b/src/java.base/share/classes/java/lang/AbstractStringBuilder.java --- a/src/java.base/share/classes/java/lang/AbstractStringBuilder.java +++ b/src/java.base/share/classes/java/lang/AbstractStringBuilder.java @@ -1319,7 +1319,7 @@ * or {@code -1} if there is no such occurrence. */ public int indexOf(String str, int fromIndex) { - return String.indexOf(value, 0, count, str, fromIndex); + return String.indexOf(value, str, fromIndex); } /** diff --git a/src/java.base/share/classes/java/lang/String.java b/src/java.base/share/classes/java/lang/String.java --- a/src/java.base/share/classes/java/lang/String.java +++ b/src/java.base/share/classes/java/lang/String.java @@ -1703,8 +1703,7 @@ * or {@code -1} if there is no such occurrence. */ public int indexOf(String str, int fromIndex) { - return indexOf(value, 0, value.length, - str.value, 0, str.value.length, fromIndex); + return indexOf(value, str.value, fromIndex); } /** @@ -1713,16 +1712,11 @@ * is the string being searched for. * * @param source the characters being searched. - * @param sourceOffset offset of the source string. - * @param sourceCount count of the source string. * @param target the characters being searched for. * @param fromIndex the index to begin searching from. */ - static int indexOf(char[] source, int sourceOffset, int sourceCount, - String target, int fromIndex) { - return indexOf(source, sourceOffset, sourceCount, - target.value, 0, target.value.length, - fromIndex); + static int indexOf(char[] source, String target, int fromIndex) { + return indexOf(source, target.value, fromIndex); } /** @@ -1731,47 +1725,38 @@ * is the string being searched for. * * @param source the characters being searched. - * @param sourceOffset offset of the source string. - * @param sourceCount count of the source string. * @param target the characters being searched for. - * @param targetOffset offset of the target string. - * @param targetCount count of the target string. * @param fromIndex the index to begin searching from. */ - static int indexOf(char[] source, int sourceOffset, int sourceCount, - char[] target, int targetOffset, int targetCount, - int fromIndex) { - if (fromIndex >= sourceCount) { - return (targetCount == 0 ? sourceCount : -1); + static int indexOf(char[] source, char[] target, int fromIndex) { + final int slen = source.length; + final int tlen = target.length; + if (fromIndex >= slen) { + return (tlen == 0 ? slen : -1); } if (fromIndex < 0) { fromIndex = 0; } - if (targetCount == 0) { + if (tlen == 0) { return fromIndex; } - char first = target[targetOffset]; - int max = sourceOffset + (sourceCount - targetCount); + char first = target[0]; - for (int i = sourceOffset + fromIndex; i <= max; i++) { + outer: for (int i = fromIndex, max = slen - tlen;; i++) { /* Look for first character. */ - if (source[i] != first) { - while (++i <= max && source[i] != first); + inner: for (;; i++) { + if (i >= max) + break outer; + if (source[i] == first) + break inner; } - /* Found first character, now look at the rest of v2 */ - if (i <= max) { - int j = i + 1; - int end = j + targetCount - 1; - for (int k = targetOffset + 1; j < end && source[j] - == target[k]; j++, k++); - - if (j == end) { - /* Found whole string. */ - return i - sourceOffset; - } + for (int k = 1, j = i + 1, end = i + tlen; j < end; j++, k++) { + if (source[j] != target[k]) + continue outer; } + return i; } return -1; } On Mon, Jan 19, 2015 at 11:13 AM, Zoltan Sziladi <kissziszi@gmail.com> wrote:

...

Martin, yes, we're talking about that method.

Other than tightening that code some... if we find some algorithm that outperforms the naive implementation under certain conditions (let's say when the searched pattern is longer than 10 characters), would it be worth including it as a special case? (For example naive would run normally but if pattern is longer than 10 characters then the other algorithm would run). Or do you think that would just make the code too complicated without enough benefits?

Regards, Zoltan

On Mon, Jan 12, 2015 at 12:11 PM, Martin Buchholz <martinrb@google.com> wrote:

...
If there's a clean improvement in the java code, it's worth putting in. You can try benchmarking with -Xint. Are we talking about this method?

static int indexOf(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex) {

It does look like we can tighten the code up a little...

On Thu, Jan 8, 2015 at 3:05 PM, Zoltan Sziladi <kissziszi@gmail.com> wrote:

...
Thanks for the info. So that basically means we have 2 implementations of indexOf currently, one is in HotSpot, the other is in the JDK itself, which rarely gets executed. Aside from this later fact, isn't it still worth improving the JDK implementation if it is possible? I know that the intrinsified method gets executed most of the time, but still, if we can improve the JDK implementation also, why not? I don't know much about other JVMs but maybe a few don't intrinsify it?

Is there any existing test suite which is considered conclusive enough that if an implementation beats the naive algorithm in those testcases then it could be considered as a replacement in the JDK?

Regards, Zoltan

On Thu, Jan 8, 2015 at 12:42 PM, Vitaly Davidovich <vitalyd@gmail.com> wrote:

...
The java impl you saw would be called by (a) interpreter, (b) if you explicitly disable intrinsification of this function, or (c) some other JVM that doesn't intrinsify this method (or any method).

People don't usually disable intrinsics; if they do, it's because they hit some JIT bug and may disable it.

On Thu, Jan 8, 2015 at 3:34 PM, Zoltan Sziladi <kissziszi@gmail.com> wrote:

...
Hi,

Thanks everyone for all the info. So, just to go step by step in understanding this. Andrew said HotSpot would ignore my implementation. So why is there an implementation of indexOf at all in the JDK, if that's not the code that's executed? Is it just a default fallback? When is the indexOf function not intrinsified? When do people usually disable intrinsification? Sorry if these are newbie questions, I'm new to this part of Java.

Regards, Zoltan

On Tue, Jan 6, 2015 at 1:28 AM, Andrew Haley <aph@redhat.com> wrote:

...
Hi,

On 05/01/15 18:59, Zoltan Sziladi wrote:

> This discussion was a long time ago, I was just reading through it to check > again what was the last state of the discussion about the String.indexOf. > There is one part which I still do not understand, hopefully someone could > shed some light on it. A few emails ago Martin mentioned > > "Hotspot seems to have some intrinsification of String.indexOf, which > confuses me. > Hotspot seems the right place to provide more optimizations for this, since > there has been a fair amount of work creating high-performance low-level > implementations of this idea in C." > > Then Ivan asked what that actually meant, whether hotspot actually replaced > the jdk implementation with a low level optimized C implementation, but I > never saw an answer to that.

You can have a look at an implementation of MacroAssembler::string_indexof in

http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/b6b89b8f8531/src/cpu/x86/v...

...
...
> Can someone please explain this? If we somehow found an algorithm

that

...
beat > the naive implementation in the average case, would it be possible to just > implement it in the JDK?

No, because HotSpot would ignore it. Any speed improvements have to be done in the architecture-dependent files.

Andrew.

Vitaly Davidovich

9:19 p.m.

Try -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic:_indexOf sent from my phone On Jan 24, 2015 4:13 PM, "Martin Buchholz" <martinrb@google.com> wrote:

...

I took another look. It seems to me that the java code implementing String.indexOf still assumes the old String layout and so is full of dead code, which should be excised just to be clean. (I continue to think the change to String layout came too late in the life of Java and is incomplete in ways such as this one). Is there an easy way to disable a hotspot intrinsic for testing?

Here's a start at such a change:

diff --git a/src/java.base/share/classes/java/lang/AbstractStringBuilder.java b/src/java.base/share/classes/java/lang/AbstractStringBuilder.java --- a/src/java.base/share/classes/java/lang/AbstractStringBuilder.java +++ b/src/java.base/share/classes/java/lang/AbstractStringBuilder.java @@ -1319,7 +1319,7 @@ * or {@code -1} if there is no such occurrence. */ public int indexOf(String str, int fromIndex) { - return String.indexOf(value, 0, count, str, fromIndex); + return String.indexOf(value, str, fromIndex); }

/** diff --git a/src/java.base/share/classes/java/lang/String.java b/src/java.base/share/classes/java/lang/String.java --- a/src/java.base/share/classes/java/lang/String.java +++ b/src/java.base/share/classes/java/lang/String.java @@ -1703,8 +1703,7 @@ * or {@code -1} if there is no such occurrence. */ public int indexOf(String str, int fromIndex) { - return indexOf(value, 0, value.length, - str.value, 0, str.value.length, fromIndex); + return indexOf(value, str.value, fromIndex); }

/** @@ -1713,16 +1712,11 @@ * is the string being searched for. * * @param source the characters being searched. - * @param sourceOffset offset of the source string. - * @param sourceCount count of the source string. * @param target the characters being searched for. * @param fromIndex the index to begin searching from. */ - static int indexOf(char[] source, int sourceOffset, int sourceCount, - String target, int fromIndex) { - return indexOf(source, sourceOffset, sourceCount, - target.value, 0, target.value.length, - fromIndex); + static int indexOf(char[] source, String target, int fromIndex) { + return indexOf(source, target.value, fromIndex); }

/** @@ -1731,47 +1725,38 @@ * is the string being searched for. * * @param source the characters being searched. - * @param sourceOffset offset of the source string. - * @param sourceCount count of the source string. * @param target the characters being searched for. - * @param targetOffset offset of the target string. - * @param targetCount count of the target string. * @param fromIndex the index to begin searching from. */ - static int indexOf(char[] source, int sourceOffset, int sourceCount, - char[] target, int targetOffset, int targetCount, - int fromIndex) { - if (fromIndex >= sourceCount) { - return (targetCount == 0 ? sourceCount : -1); + static int indexOf(char[] source, char[] target, int fromIndex) { + final int slen = source.length; + final int tlen = target.length; + if (fromIndex >= slen) { + return (tlen == 0 ? slen : -1); } if (fromIndex < 0) { fromIndex = 0; } - if (targetCount == 0) { + if (tlen == 0) { return fromIndex; }

- char first = target[targetOffset]; - int max = sourceOffset + (sourceCount - targetCount); + char first = target[0];

- for (int i = sourceOffset + fromIndex; i <= max; i++) { + outer: for (int i = fromIndex, max = slen - tlen;; i++) { /* Look for first character. */ - if (source[i] != first) { - while (++i <= max && source[i] != first); + inner: for (;; i++) { + if (i >= max) + break outer; + if (source[i] == first) + break inner; }

- /* Found first character, now look at the rest of v2 */ - if (i <= max) { - int j = i + 1; - int end = j + targetCount - 1; - for (int k = targetOffset + 1; j < end && source[j] - == target[k]; j++, k++); - - if (j == end) { - /* Found whole string. */ - return i - sourceOffset; - } + for (int k = 1, j = i + 1, end = i + tlen; j < end; j++, k++) { + if (source[j] != target[k]) + continue outer; } + return i; } return -1; }

On Mon, Jan 19, 2015 at 11:13 AM, Zoltan Sziladi <kissziszi@gmail.com> wrote:

...
Martin, yes, we're talking about that method.

Other than tightening that code some... if we find some algorithm that outperforms the naive implementation under certain conditions (let's say when the searched pattern is longer than 10 characters), would it be worth including it as a special case? (For example naive would run normally but if pattern is longer than 10 characters then the other algorithm would run). Or do you think that would just make the code too complicated without enough benefits?

Regards, Zoltan

On Mon, Jan 12, 2015 at 12:11 PM, Martin Buchholz <martinrb@google.com> wrote:

...
If there's a clean improvement in the java code, it's worth putting in. You can try benchmarking with -Xint. Are we talking about this method?

static int indexOf(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex) {

It does look like we can tighten the code up a little...

On Thu, Jan 8, 2015 at 3:05 PM, Zoltan Sziladi <kissziszi@gmail.com> wrote:

...
Thanks for the info. So that basically means we have 2 implementations of indexOf currently, one is in HotSpot, the other is in the JDK itself, which rarely gets executed. Aside from this later fact, isn't it still worth improving the JDK implementation if it is possible? I know that the intrinsified method gets executed most of the time, but still, if we can improve the JDK implementation also, why not? I don't know much about other JVMs but maybe a few don't intrinsify it?

Is there any existing test suite which is considered conclusive enough that if an implementation beats the naive algorithm in those testcases then it could be considered as a replacement in the JDK?

Regards, Zoltan

On Thu, Jan 8, 2015 at 12:42 PM, Vitaly Davidovich <vitalyd@gmail.com> wrote:

...
The java impl you saw would be called by (a) interpreter, (b) if you explicitly disable intrinsification of this function, or (c) some other JVM that doesn't intrinsify this method (or any method).

People don't usually disable intrinsics; if they do, it's because they hit some JIT bug and may disable it.

On Thu, Jan 8, 2015 at 3:34 PM, Zoltan Sziladi <kissziszi@gmail.com> wrote:

...
Hi,

Thanks everyone for all the info. So, just to go step by step in understanding this. Andrew said HotSpot would ignore my implementation. So why is there an implementation of indexOf at all in the JDK, if that's not the code that's executed? Is it just a default fallback? When is the indexOf function not intrinsified? When do people usually disable intrinsification? Sorry if these are newbie questions, I'm new to this part of Java.

Regards, Zoltan

On Tue, Jan 6, 2015 at 1:28 AM, Andrew Haley <aph@redhat.com> wrote:

> Hi, > > On 05/01/15 18:59, Zoltan Sziladi wrote: > > > This discussion was a long time ago, I was just reading through it to > check > > again what was the last state of the discussion about the String.indexOf. > > There is one part which I still do not understand, hopefully someone > could > > shed some light on it. A few emails ago Martin mentioned > > > > "Hotspot seems to have some intrinsification of String.indexOf, which > > confuses me. > > Hotspot seems the right place to provide more optimizations for this, > since > > there has been a fair amount of work creating high-performance low-level > > implementations of this idea in C." > > > > Then Ivan asked what that actually meant, whether hotspot actually > replaced > > the jdk implementation with a low level optimized C implementation, but I > > never saw an answer to that. > > You can have a look at an implementation of MacroAssembler::string_indexof > in > > >

http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/b6b89b8f8531/src/cpu/x86/v...

...
> > > Can someone please explain this? If we somehow found an algorithm that > beat > > the naive implementation in the average case, would it be possible to > just > > implement it in the JDK? > > No, because HotSpot would ignore it. Any speed improvements have to be > done in the architecture-dependent files. > > Andrew. > >

Martin Buchholz

9:29 p.m.

On Sat, Jan 24, 2015 at 1:19 PM, Vitaly Davidovich <vitalyd@gmail.com> wrote:

...

Try -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic:_indexOf

Thank you very much! Hard to find because -XX:+PrintFlagsFinal is insufficient - also needs -XX:+UnlockDiagnosticVMOptions

Zoltan Sziladi

27 Jan 27 Jan

6:50 a.m.

Hi Martin, Nice catches on the cleanup! By the way, can you tell me why you used named loops in your code? Isn't it considered bad practice as it is almost like a goto statement? Couldn't we refactor it in a way that we do not use named loops? Also, if there is a for loop that has no start statement and no condition (like this: for (;; i++) ), then is a for loop a good choice for that code? I'm just wondering, maybe there are some points of view that I'm not considering... Regards, Zoltan On Sat, Jan 24, 2015 at 1:29 PM, Martin Buchholz <martinrb@google.com> wrote:

...

On Sat, Jan 24, 2015 at 1:19 PM, Vitaly Davidovich <vitalyd@gmail.com> wrote:

...
Try -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic:_indexOf

Thank you very much! Hard to find because -XX:+PrintFlagsFinal is insufficient - also needs -XX:+UnlockDiagnosticVMOptions

Martin Buchholz

24 Mar 24 Mar

12:27 a.m.

On Mon, Jan 26, 2015 at 10:50 PM, Zoltan Sziladi <kissziszi@gmail.com> wrote:

...

By the way, can you tell me why you used named loops in your code? Isn't it considered bad practice as it is almost like a goto statement? Couldn't we refactor it in a way that we do not use named loops?

Here in core-libs-land, we like to do goto-oriented programming!

4002

Age (days ago)

4356

Last active (days ago)

List overview

Download

23 comments

8 participants

participants (8)

Alan Bateman
Andrew Haley
Ivan Gerasimov
Jonathan Yu
Martin Buchholz
Remi Forax
Vitaly Davidovich
Zoltan Sziladi

String.indexOf optimization

tags

participants (8)