String.indexOf optimization

Jonathan Yu jawnsy at cpan.org
Mon Jan 5 22:46:24 UTC 2015


Hi,

Sharing in case you haven't seen it... Aleksey Shipilëv has a talk about
String optimizations, which discusses these questions and specifically the
Boyer-Moore algorithm.

http://shipilev.net/talks/joker-Oct2014-string-catechism.pdf

Page 85 talks about the intrinsification of indexOf and compares
performance of the builtin vs intrinsified versions by
using: -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=::_indexOf

There might also be a video recording somewhere.

Jonathan

On Mon, Jan 5, 2015 at 2:31 PM, Martin Buchholz <martinrb at google.com> wrote:

> Evidence that hotspot tries to intrinsify String.indexOf:
>
>                     do_intrinsic(_indexOf,
>  java_lang_String,       indexOf_name, string_int_signature,
>  F_R)   \
>    do_name(     indexOf_name,                                    "indexOf")
>                                             \
>
> So work would have to be done at the hotspot intrinsics level (not easy!)
>
> Also, the problem is that Boyer-Moore is a fundamental improvement in
> string searching, but its overhead is high enough that it's unlikely to
> help with typical input strings found in the Real World.  I think we would
> want to split into two implementations and only do Boyer-Moore if it looks
> profitable.  Similarly for j.u.r.Pattern's regex compiler.
>
> I still think it's sufficiently difficult that effort is best applied
> elsewhere.
>
>
> On Mon, Jan 5, 2015 at 10:59 AM, Zoltan Sziladi <kissziszi at gmail.com>
> wrote:
>
> > Hi,
> >
> > This discussion was a long time ago, I was just reading through it to
> > check again what was the last state of the discussion about the
> > String.indexOf.
> > There is one part which I still do not understand, hopefully someone
> could
> > shed some light on it. A few emails ago Martin mentioned
> >
> > "Hotspot seems to have some intrinsification of String.indexOf, which
> > confuses me.
> > Hotspot seems the right place to provide more optimizations for this,
> > since there has been a fair amount of work creating high-performance
> > low-level implementations of this idea in C."
> >
> > Then Ivan asked what that actually meant, whether hotspot actually
> > replaced the jdk implementation with a low level optimized C
> > implementation, but I never saw an answer to that.
> >
> > Can someone please explain this? If we somehow found an algorithm that
> > beat the naive implementation in the average case, would it be possible
> to
> > just implement it in the JDK? Also, is there a test set which we could
> > consider conclusive enough to actually change the implementation based on
> > results from that? (For example if I create an implementation that beats
> > the naive algorithm in those testcases, then we could consider it faster
> in
> > average case)
> >
> > Thanks!
> >
> > Zoltan
> >
> > On Fri, Apr 18, 2014 at 9:43 AM, Martin Buchholz <martinrb at google.com>
> > wrote:
> >
> >> I remain skeptical that modifying the implementation of static package
> >> private String.indexOf is the right approach.
> >>
> >> If we can produce high-performance implementations of these, perhaps
> >> making
> >> them public in Arrays.java is the right way.
> >>
> >> 1766             if (targetCount == 1) {1767                 return (i
> >> <= max) ? i - sourceOffset : -1;
> >>
> >>
> >> If you're going to special case targetCount == 1, you shouldn't have a
> >> test
> >> for it in the main loop, since you slow down the general case.  Instead,
> >> you can sequester the special cases like this:
> >>
> >> if (targetCount <= 1) {
> >>   if (targetCount == 0) ...
> >>   else ...
> >> }
> >>
> >> // now assume targetCount >= 2
> >>
> >>
> >> On Fri, Apr 18, 2014 at 1:32 AM, Ivan Gerasimov
> >> <ivan.gerasimov at oracle.com>wrote:
> >>
> >> >
> >> > On 16.04.2014 2:53, Martin Buchholz wrote:
> >> >
> >> > Hi Ivan,
> >> >
> >> >  There's already an indexOf(int, int) that allows a user to explicitly
> >> > search for a char (or character).
> >> >
> >> >   Sure.
> >> > What I meant was not to introduce a special function for searching a
> >> > single character, but to restrict the general case loop to search for
> a
> >> > substring of at least 2 characters.
> >> > Having this condition hold, we can enter the loop only when two
> starting
> >> > characters match, and this can save us a few nanoseconds in many
> cases.
> >> >
> >> >
> >> >  Hotspot seems to have some intrinsification of String.indexOf, which
> >> > confuses me.
> >> >
> >> >
> >> > Does it mean that that the java implementation of indexOf is never
> >> > compiled?
> >> > When hotspot replaces the jdk implementation with its own one?
> >> > Is it ever worth to try to optimize the java implementation?
> >> >
> >> >
> >> >  Hotspot seems the right place to provide more optimizations for this,
> >> > since there has been a fair amount of work creating high-performance
> >> > low-level implementations of this idea in C.
> >> >
> >> >  The hotspot's intrinsic is already optimized for searching substrings
> >> of
> >> > different length.
> >> >
> >> > Sincerely yours,
> >> > Ivan
> >> >
> >> >
> >> >
> >> > On Tue, Apr 15, 2014 at 10:13 AM, Ivan Gerasimov <
> >> > ivan.gerasimov at oracle.com> wrote:
> >> >
> >> >>  Hi everyone!
> >> >>
> >> >>
> >> >> On 04.04.2014 21:13, Martin Buchholz wrote:
> >> >>
> >> >> Summary:
> >> >>
> >> >> Many people (myself included) have looked at this problem.  It's
> >> unlikely
> >> >> that String.indexOf will change.  It's hard to beat the naive
> >> >> implementation in the typical case.
> >> >>
> >> >>  But we can try to speed up this naive implementation a little bit.
> >> >>
> >> >> We can separate the special case: When the substring's length == 1.
> >> >> This special case can be done fast, and in the general case we can
> now
> >> >> assume substring's length is at least 2.
> >> >>
> >> >> Here's the webrev with the implementation of this idea:
> >> >> http://cr.openjdk.java.net/~igerasim/indexof/0/webrev/
> >> >>
> >> >> I've done some benchmarking with JMH and found no performance
> >> degradation
> >> >> on my machine.
> >> >> Of course, more testcases should be created and they should be tried
> on
> >> >> different machines to be treated as reliable.
> >> >>
> >> >> Benchmark                             Mode Thr    Cnt  Sec
>  Mean
> >> >> Mean error    Units
> >> >>  o.b.IndexOfBench.benchIndexOf_1_A     avgt   1     20    5
> >> >> 704.739        9.104  nsec/op
> >> >>  o.b.IndexOfBench.benchIndexOf_1_B     avgt   1     20    5     *
> >> >> 573.879*        9.820  nsec/op
> >> >>  o.b.IndexOfBench.benchIndexOf_2_A     avgt   1     20    5
> >> >> 668.455        9.882  nsec/op
> >> >>  o.b.IndexOfBench.benchIndexOf_2_B     avgt   1     20    5
> >> >> *476.062*        6.063  nsec/op
> >> >>  o.b.IndexOfBench.benchIndexOf_3_A     avgt   1     20    5
> >> >> 155.227        1.796  nsec/op
> >> >>  o.b.IndexOfBench.benchIndexOf_3_B     avgt   1     20    5      *
> >> >> 152.850 *       1.512  nsec/op
> >> >>  o.b.IndexOfBench.benchIndexOf_4_A     avgt   1     20    5
> >> >> 656.183        5.904  nsec/op
> >> >>  o.b.IndexOfBench.benchIndexOf_4_B     avgt   1     20    5
> >> >> *515.178*        9.107  nsec/op
> >> >>  o.b.IndexOfBench.benchIndexOf_5_A     avgt   1     20    5
> >> >> 140.609        7.305  nsec/op
> >> >>  o.b.IndexOfBench.benchIndexOf_5_B     avgt   1     20    5
> >> >> *129.603*        1.654  nsec/op
> >> >>  o.b.IndexOfBench.benchIndexOf_6_A     avgt   1     20    5
> >> >> 127.713        1.497  nsec/op
> >> >>  o.b.IndexOfBench.benchIndexOf_6_B     avgt   1     20    5
> >> >> *122.177*        1.186  nsec/op
> >> >>  o.b.IndexOfBench.benchIndexOf_7_A     avgt   1     20    5
> >> >> 430.148        4.875  nsec/op
> >> >>  o.b.IndexOfBench.benchIndexOf_7_B     avgt   1     20    5      *
> >> >> 387.338*       10.904  nsec/op
> >> >>  o.b.IndexOfBench.benchIndexOf_8_A     avgt   1     20    5
> >> >> 2064.563       28.885  nsec/op
> >> >>  o.b.IndexOfBench.benchIndexOf_8_B     avgt   1     20    5
> >> >> *1858.669*       24.343  nsec/op
> >>
> >> >>
> >> >> Benchmarks ending with A use the current indexOf implementation,
> with B
> >> >> use the modified version.
> >> >> These numbers show from 1.4% up to 28% performance increase.
> >> >>
> >> >> The full listing is below.
> >> >>
> >> >> Suggestions about what else to test are welcome!
> >> >>
> >> >> Sincerely yours,
> >> >> Ivan
> >> >>
> >> >> ---------------------
> >> >>
> >> >> /**
> >> >>  * Copyright (c) 2014, Oracle and/or its affiliates. All rights
> >> reserved.
> >> >>  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
> >> >>  *
> >> >>  * This code is free software; you can redistribute it and/or modify
> it
> >> >>  * under the terms of the GNU General Public License version 2 only,
> as
> >> >>  * published by the Free Software Foundation.  Oracle designates this
> >> >>  * particular file as subject to the "Classpath" exception as
> provided
> >> >>  * by Oracle in the LICENSE file that accompanied this code.
> >> >>  *
> >> >>  * This code is distributed in the hope that it will be useful, but
> >> >> WITHOUT
> >> >>  * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> >> or
> >> >>  * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> >> License
> >> >>  * version 2 for more details (a copy is included in the LICENSE file
> >> that
> >> >>  * accompanied this code).
> >> >>  *
> >> >>  * You should have received a copy of the GNU General Public License
> >> >> version
> >> >>  * 2 along with this work; if not, write to the Free Software
> >> Foundation,
> >> >>  * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
> >> >>  *
> >> >>  * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA
> 94065
> >> USA
> >> >>  * or visit www.oracle.com if you need additional information or
> have
> >> any
> >> >>  * questions.
> >> >>  */
> >> >> package org.benches;
> >> >>
> >> >> import org.openjdk.jmh.annotations.*;
> >> >> import org.openjdk.jmh.logic.BlackHole;
> >> >> import java.util.concurrent.TimeUnit;
> >> >>
> >> >> @BenchmarkMode(Mode.AverageTime)
> >> >> @OutputTimeUnit(TimeUnit.NANOSECONDS)
> >> >> @State(Scope.Benchmark)
> >> >> public class IndexOfBench {
> >> >>
> >> >>
> >> >> //
> >> >> |||
> >> >>     final static char[] source1 =
> >> >>
> >>
> "abababcabcacabcabcabcabcaccbcabcacbcabcacbcbcabcbcbacbcabcbabacbcbacbcabcabcabcabcabcabcabcacbacbacbacabcabcacbacbcabcbcbcaabdacbacabcabacbacabca".toCharArray();
> >> >>     final static char[] source2 =
> >> >>
> >>
> "acfacafacfacfacfacafcafcacadcacdacaccacacdacacfcafcafcfacdacadcadcadcdacfacfacdacadcacdcfacfacdacdacdcfacdacdacdacgshgshasdabdahghjgwdshacavcavsc".toCharArray();
> >> >>     final static char[] source3 =
> >> >>
> >>
> "tyrtytfytytuytfytuytggfghgdytyftytfdytdshfgjhdfytsfuythgsfhgjhfghtuysdfthgfsdhgystfjhgsfguysthgfgjhgdfjhgsjdghfythgsdfjhggfabduikjhfjhkjhfkjhgkjh".toCharArray();
> >> >>     final static char[] target1 = "abd".toCharArray();
> >> >>
> >> >>     final static char[] source4 =
> >> >>
> >>
> "ahhhahahahahhahahahahhahahahhhahahahahahahahahahahhahahahahahahahahallallalalalalalkakakakakakakakakkakmamamamabammamamamamamaakaklalalaoalalalao".toCharArray();
> >> >>     final static char[] source5 =
> >> >>
> >>
> "hgjkhjhjkdghkjhdfkjhgkjhdkjdhgkjdfhgkjdhfgkjdfhgkjhdfkjghkdjghkdjfhgkjhkdjhgkjdfhjkghkdjfhgkjdfhgkjdfhgkjdfhkgabhfkjghdkfjhgkjdfhgkjdfhgkjdfhgkhh".toCharArray();
> >> >>     final static char[] target2 = "ab".toCharArray();
> >> >>
> >> >>     final static char[] source6 =
> >> >>
> >>
> "lskgjsklfjgskldfjgklsfjdlgkjsdflkgjskldfgjsdklfgjsl;kdfgjskldfjglksdfjglksfjglksdfjgklsfdjgslkdfjglksjdflkgsjfalksjdflkfgjsdklfjglskdfjglksdfjghh".toCharArray();
> >> >>     final static char[] target3 = "a".toCharArray();
> >> >>
> >> >>     final static char[] source7 =
> >> >>
> >>
> "lskgajabfagskldfjgklsabclgkjsdflkabsabcdgjsdklfabclbkdfgjskabfjglksdfjabcdfjglabcfjgklsfabgslkdfjglksjdabcdsjfabcdedflabcjsdklfjglskdfabcksdfjghh".toCharArray();
> >> >>     final static char[] target4 = "abcde".toCharArray();
> >> >>
> >> >>     final static char[] source8 =
> >> >>
> >>
> "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa".toCharArray();
> >> >>     final static char[] target5 = "aaaab".toCharArray();
> >> >>
> >> >>     @GenerateMicroBenchmark
> >> >>     public void benchIndexOf_1_A(BlackHole bh) {
> >> >>         bh.consume(indexOfA(source1, 0, source1.length, target1, 0,
> >> >> target1.length, 0));
> >> >>     }
> >> >>
> >> >>     @GenerateMicroBenchmark
> >> >>     public void benchIndexOf_1_B(BlackHole bh) {
> >> >>         bh.consume(indexOfB(source1, 0, source1.length, target1, 0,
> >> >> target1.length, 0));
> >> >>     }
> >> >>
> >> >>     @GenerateMicroBenchmark
> >> >>     public void benchIndexOf_2_A(BlackHole bh) {
> >> >>         bh.consume(indexOfA(source2, 0, source2.length, target1, 0,
> >> >> target1.length, 0));
> >> >>     }
> >> >>
> >> >>     @GenerateMicroBenchmark
> >> >>     public void benchIndexOf_2_B(BlackHole bh) {
> >> >>         bh.consume(indexOfB(source2, 0, source2.length, target1, 0,
> >> >> target1.length, 0));
> >> >>     }
> >> >>
> >> >>     @GenerateMicroBenchm
> >> >>
> >> >
> >> >
> >>
> >
> >
>



More information about the core-libs-dev mailing list