String.indexOf optimization
Vitaly Davidovich
vitalyd at gmail.com
Sat Jan 24 21:19:08 UTC 2015
Try -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic:_indexOf
sent from my phone
On Jan 24, 2015 4:13 PM, "Martin Buchholz" <martinrb at google.com> wrote:
> I took another look. It seems to me that the java code implementing
> String.indexOf still assumes the old String layout and so is full of dead
> code, which should be excised just to be clean. (I continue to think the
> change to String layout came too late in the life of Java and is incomplete
> in ways such as this one). Is there an easy way to disable a hotspot
> intrinsic for testing?
>
> Here's a start at such a change:
>
> diff --git
> a/src/java.base/share/classes/java/lang/AbstractStringBuilder.java
> b/src/java.base/share/classes/java/lang/AbstractStringBuilder.java
> --- a/src/java.base/share/classes/java/lang/AbstractStringBuilder.java
> +++ b/src/java.base/share/classes/java/lang/AbstractStringBuilder.java
> @@ -1319,7 +1319,7 @@
> * or {@code -1} if there is no such occurrence.
> */
> public int indexOf(String str, int fromIndex) {
> - return String.indexOf(value, 0, count, str, fromIndex);
> + return String.indexOf(value, str, fromIndex);
> }
>
> /**
> diff --git a/src/java.base/share/classes/java/lang/String.java
> b/src/java.base/share/classes/java/lang/String.java
> --- a/src/java.base/share/classes/java/lang/String.java
> +++ b/src/java.base/share/classes/java/lang/String.java
> @@ -1703,8 +1703,7 @@
> * or {@code -1} if there is no such occurrence.
> */
> public int indexOf(String str, int fromIndex) {
> - return indexOf(value, 0, value.length,
> - str.value, 0, str.value.length, fromIndex);
> + return indexOf(value, str.value, fromIndex);
> }
>
> /**
> @@ -1713,16 +1712,11 @@
> * is the string being searched for.
> *
> * @param source the characters being searched.
> - * @param sourceOffset offset of the source string.
> - * @param sourceCount count of the source string.
> * @param target the characters being searched for.
> * @param fromIndex the index to begin searching from.
> */
> - static int indexOf(char[] source, int sourceOffset, int sourceCount,
> - String target, int fromIndex) {
> - return indexOf(source, sourceOffset, sourceCount,
> - target.value, 0, target.value.length,
> - fromIndex);
> + static int indexOf(char[] source, String target, int fromIndex) {
> + return indexOf(source, target.value, fromIndex);
> }
>
> /**
> @@ -1731,47 +1725,38 @@
> * is the string being searched for.
> *
> * @param source the characters being searched.
> - * @param sourceOffset offset of the source string.
> - * @param sourceCount count of the source string.
> * @param target the characters being searched for.
> - * @param targetOffset offset of the target string.
> - * @param targetCount count of the target string.
> * @param fromIndex the index to begin searching from.
> */
> - static int indexOf(char[] source, int sourceOffset, int sourceCount,
> - char[] target, int targetOffset, int targetCount,
> - int fromIndex) {
> - if (fromIndex >= sourceCount) {
> - return (targetCount == 0 ? sourceCount : -1);
> + static int indexOf(char[] source, char[] target, int fromIndex) {
> + final int slen = source.length;
> + final int tlen = target.length;
> + if (fromIndex >= slen) {
> + return (tlen == 0 ? slen : -1);
> }
> if (fromIndex < 0) {
> fromIndex = 0;
> }
> - if (targetCount == 0) {
> + if (tlen == 0) {
> return fromIndex;
> }
>
> - char first = target[targetOffset];
> - int max = sourceOffset + (sourceCount - targetCount);
> + char first = target[0];
>
> - for (int i = sourceOffset + fromIndex; i <= max; i++) {
> + outer: for (int i = fromIndex, max = slen - tlen;; i++) {
> /* Look for first character. */
> - if (source[i] != first) {
> - while (++i <= max && source[i] != first);
> + inner: for (;; i++) {
> + if (i >= max)
> + break outer;
> + if (source[i] == first)
> + break inner;
> }
>
> - /* Found first character, now look at the rest of v2 */
> - if (i <= max) {
> - int j = i + 1;
> - int end = j + targetCount - 1;
> - for (int k = targetOffset + 1; j < end && source[j]
> - == target[k]; j++, k++);
> -
> - if (j == end) {
> - /* Found whole string. */
> - return i - sourceOffset;
> - }
> + for (int k = 1, j = i + 1, end = i + tlen; j < end; j++, k++)
> {
> + if (source[j] != target[k])
> + continue outer;
> }
> + return i;
> }
> return -1;
> }
>
>
> On Mon, Jan 19, 2015 at 11:13 AM, Zoltan Sziladi <kissziszi at gmail.com>
> wrote:
>
>> Martin, yes, we're talking about that method.
>>
>> Other than tightening that code some... if we find some algorithm that
>> outperforms the naive implementation under certain conditions (let's say
>> when the searched pattern is longer than 10 characters), would it be worth
>> including it as a special case? (For example naive would run normally but
>> if pattern is longer than 10 characters then the other algorithm would
>> run). Or do you think that would just make the code too complicated without
>> enough benefits?
>>
>> Regards,
>> Zoltan
>>
>> On Mon, Jan 12, 2015 at 12:11 PM, Martin Buchholz <martinrb at google.com>
>> wrote:
>>
>>> If there's a clean improvement in the java code, it's worth putting in.
>>> You can try benchmarking with -Xint.
>>> Are we talking about this method?
>>>
>>> static int indexOf(char[] source, int sourceOffset, int sourceCount,
>>> char[] target, int targetOffset, int targetCount,
>>> int fromIndex) {
>>>
>>> It does look like we can tighten the code up a little...
>>>
>>>
>>> On Thu, Jan 8, 2015 at 3:05 PM, Zoltan Sziladi <kissziszi at gmail.com>
>>> wrote:
>>>
>>>> Thanks for the info.
>>>> So that basically means we have 2 implementations of indexOf currently,
>>>> one
>>>> is in HotSpot, the other is in the JDK itself, which rarely gets
>>>> executed.
>>>> Aside from this later fact, isn't it still worth improving the JDK
>>>> implementation if it is possible? I know that the intrinsified method
>>>> gets
>>>> executed most of the time, but still, if we can improve the JDK
>>>> implementation also, why not? I don't know much about other JVMs but
>>>> maybe
>>>> a few don't intrinsify it?
>>>>
>>>> Is there any existing test suite which is considered conclusive enough
>>>> that
>>>> if an implementation beats the naive algorithm in those testcases then
>>>> it
>>>> could be considered as a replacement in the JDK?
>>>>
>>>> Regards,
>>>> Zoltan
>>>>
>>>>
>>>> On Thu, Jan 8, 2015 at 12:42 PM, Vitaly Davidovich <vitalyd at gmail.com>
>>>> wrote:
>>>>
>>>> > The java impl you saw would be called by (a) interpreter, (b) if you
>>>> > explicitly disable intrinsification of this function, or (c) some
>>>> other JVM
>>>> > that doesn't intrinsify this method (or any method).
>>>> >
>>>> > People don't usually disable intrinsics; if they do, it's because
>>>> they hit
>>>> > some JIT bug and may disable it.
>>>> >
>>>> > On Thu, Jan 8, 2015 at 3:34 PM, Zoltan Sziladi <kissziszi at gmail.com>
>>>> > wrote:
>>>> >
>>>> >> Hi,
>>>> >>
>>>> >> Thanks everyone for all the info.
>>>> >> So, just to go step by step in understanding this.
>>>> >> Andrew said HotSpot would ignore my implementation. So why is there
>>>> an
>>>> >> implementation of indexOf at all in the JDK, if that's not the code
>>>> that's
>>>> >> executed? Is it just a default fallback? When is the indexOf
>>>> function not
>>>> >> intrinsified? When do people usually disable intrinsification?
>>>> >> Sorry if these are newbie questions, I'm new to this part of Java.
>>>> >>
>>>> >> Regards,
>>>> >> Zoltan
>>>> >>
>>>> >> On Tue, Jan 6, 2015 at 1:28 AM, Andrew Haley <aph at redhat.com> wrote:
>>>> >>
>>>> >> > Hi,
>>>> >> >
>>>> >> > On 05/01/15 18:59, Zoltan Sziladi wrote:
>>>> >> >
>>>> >> > > This discussion was a long time ago, I was just reading through
>>>> it to
>>>> >> > check
>>>> >> > > again what was the last state of the discussion about the
>>>> >> String.indexOf.
>>>> >> > > There is one part which I still do not understand, hopefully
>>>> someone
>>>> >> > could
>>>> >> > > shed some light on it. A few emails ago Martin mentioned
>>>> >> > >
>>>> >> > > "Hotspot seems to have some intrinsification of String.indexOf,
>>>> which
>>>> >> > > confuses me.
>>>> >> > > Hotspot seems the right place to provide more optimizations for
>>>> this,
>>>> >> > since
>>>> >> > > there has been a fair amount of work creating high-performance
>>>> >> low-level
>>>> >> > > implementations of this idea in C."
>>>> >> > >
>>>> >> > > Then Ivan asked what that actually meant, whether hotspot
>>>> actually
>>>> >> > replaced
>>>> >> > > the jdk implementation with a low level optimized C
>>>> implementation,
>>>> >> but I
>>>> >> > > never saw an answer to that.
>>>> >> >
>>>> >> > You can have a look at an implementation of
>>>> >> MacroAssembler::string_indexof
>>>> >> > in
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >>
>>>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/b6b89b8f8531/src/cpu/x86/vm/macroAssembler_x86.cpp
>>>> >> >
>>>> >> > > Can someone please explain this? If we somehow found an
>>>> algorithm that
>>>> >> > beat
>>>> >> > > the naive implementation in the average case, would it be
>>>> possible to
>>>> >> > just
>>>> >> > > implement it in the JDK?
>>>> >> >
>>>> >> > No, because HotSpot would ignore it. Any speed improvements have
>>>> to be
>>>> >> > done in the architecture-dependent files.
>>>> >> >
>>>> >> > Andrew.
>>>> >> >
>>>> >> >
>>>> >>
>>>> >
>>>> >
>>>>
>>>
>>>
>>
>
More information about the core-libs-dev
mailing list