RFR: 8189103 - AARCH64: optimize String indexOf intrinsic

Tue Apr 24 18:22:17 UTC 2018

Hi,

I measured modified benchmark version(LL case only, because we're 
talking about latin1-related encoding)

I reduced execution matrix to save time by removing large string cases, 
because Boyer-Moore-Horspool is used for strings < 256 only

http://cr.openjdk.java.net/~dpochepk/8189103/str_indexof_modified_result.xls

Summary: results seems good too


Thanks,

Dmitrij


On 24.04.2018 20:14, Andrew Haley wrote:
> On 04/20/2018 11:30 AM, Dmitrij Pochepko wrote:
>> On 19.04.2018 20:08, Andrew Haley wrote:
>>> On 04/18/2018 05:09 PM, Dmitrij Pochepko wrote:
>>>> 2) large update of Boyer Moore algorithm implementation by increasing
>>>> search table size from 128 to 256 to remove few branches for Latin1
>>>> encoding cases and also improve performance for cases when Latin1
>>>> symbols with values >128 are met. This patch also significantly improves
>>>> search of Latin1 string inside UTF string by upgrading algorithm
>>>> logic(the idea is to skip <pattern length> symbols in case pure UTF
>>>> symbol is met in source string).
>>> I don't see a change to the comment which describes the algorithm.
>>>
>>> Does that comment need to be changed?
>>>
>> Do you mean C pseudo-code which describes algorithm in general? Well,
>> algorithm main idea wasn't changed, so, I left pseudo-code untouched to
>> keep it more readable(my patch adds few "if" operators depending on
>> LL/UU/UL cases).
> It does a bunch of other things differently.
>
> Pseudocode must correspond to the actual code, otherwise it is
> misleading, and it costs the maintenance programmer because they waste
> time having to figure out where the code is different from the
> pseudocode.
>
>> I can update this comment to reflect exact changes it if you think
>> it'll make things clearer.
> Thanks.  Your version is closer to the "classic" Boyer-Moore-Horspool
> I've seen.
>
> The really important differences are
>
>   //       /* Preprocessing */
>   //       for (i = 0; i < ASIZE; ++i)
> -//          bc[i] = 0;
> +//          bc[i] = m;
>
> and
>
> -//          if (c < ASIZE) bc[c] = i;
> +//          // c < 256 for Latin1 string, so, no need for branch
> +//          #ifdef PATTERN_STRING_IS_LATIN1
> +//          bc[c] = m - i;
> +//          #else
> +//          if (c < ASIZE) bc[c] = m - i;
> +//          #endif
>
> and
>
> +//          #ifdef PATTERN_IS_LATIN1_AND_SOURCE_IS_UTF
> +//          // UL case: need if (c<ASIZE) check. Skip <pattern length> if not.
>   //          if (c < ASIZE)
> -//            j = j - bc[y[j+m-1]] + m;
> +//            j += bc[y[j+m-1]];
>   //          else
> -//            j += 1; // Advance by 1 only if char >= ASIZE
> +//            j += m
> +//          #endif
>
> I'll grant you that the effect is the same, but without this
> explanation it's very hard to understand what is going on.
>
> I modified your benchmark to compare ASCII-only strings.  This patch
> shows what I did.
>
> Please show another spreadsheet run of tests so that we can see if
> there are ASCII-only regressions.  Thank you
>
>
> --- ./src/main/java/com/bellsw/simple/IndexOfBench.java	2018-04-17 21:13:37.000000000 +0100
> +++ ./src/main/java/com/bellsw/simple/IndexOfBench2.java	2018-04-24 18:11:02.886810345 +0100
> @@ -21,10 +21,12 @@
>   @OutputTimeUnit(TimeUnit.NANOSECONDS)
>   @Fork(3)
>   @State(Scope.Thread)
> -public class IndexOfBench {
> +public class IndexOfBench2 {
>       String PATTERN_L;
>       String PATTERN_U;
> -    static final int ALL = 1000000; // more than L1
> +
> +    @Param({"1000000"})
> +    int ALL;
>
>       @Param({"8", "16", "256", "10000"})
>       int sourceSize;
> @@ -45,7 +47,7 @@
>           char[] patternL = new char[patternSize];
>           char[] patternU = new char[patternSize];
>           for (int i = 0; i < patternSize; i++) {
> -          char c = (char)(255 - i % 255);
> +            char c = (char)(127 - i % 127);
>             patternL[i] = c;
>             // UTF pattern is a mix of UTF/Latin1 characters(even non
>             // latin-alphabet-based text has spaces, commas, points e.t.c.)
> @@ -61,7 +63,7 @@
>           char[] sourceL = new char[sourceSize - patternSize];
>           char[] sourceU = new char[sourceSize - patternSize];
>           for (int i = 0; i < sourceSize - patternSize; i++) {
> -            char c = (char)(2 + i % 254);
> +            char c = (char)(2 + i % 126);
>               sourceL[i] = c;
>               // UTF string is a mix of UTF/Latin1 characters
>               sourceU[i] =  ((i & 1) == 0) ? (char)(0xCAFE + (i % 255)) : c;
>
>