[aarch64-port-dev ] RFR: Add support for String.indexOf

Tue Aug 19 12:29:28 UTC 2014

On 08/19/2014 11:31 AM, Edward Nevill wrote:
> On Tue, 2014-08-19 at 10:18 +0100, Andrew Haley wrote:
>> Hi,
>>
>> On 18/08/14 17:27, Edward Nevill wrote:
>>>
>>> The following patch adds support for the String.indexOf intrinsic.
>>>
>>> It uses a combination of 2 algorithms, a simplified Boyer Moore, and
>>> a straightforward linear scan.
>>
>> In what way is this algorithm simplified from standard Boyer-Moore?
>> How can we have confidence it is correct?
> 
> OK. I based the algorithm on the following
> 
> http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm
> 
> This describes an algorithm with 2 shift rules. The 'Bad Character'
> rule and the 'Good Suffix' rule.
> 
> These rules are essentially heuristics for how far we can shift the
> pattern along the search string.
> 
> I have implemented it using the 'Bad Character' rule only because as
> the entry says "The Good Suffix rule is markedly more complex in
> both concept and implementation than the Bad Character rule.'

OK.  This deserves a comment.  And a pointer to Wikipedia.

>> I'm a bit concerned that we always inline this code.  AFAICS if
>> icnt1 == -1 we might as well branch to a stub.
> 
> It is 600 bytes which is on the limit of what is sensible. On a code
> heap size of 128M I think this is OK. It isn't inlined that often,
> but when it is we would like it to be as fast as possible.

Right, but if this routine is called twice in the same method it'll
read exactly the same routine into the icache twice.  And that's
slowing you down as well as wasting memory.

>>> +      stp(zr, zr, pre(sp, -128));
>>> +      stp(zr, zr, Address(sp, 1*16));
>>> +      stp(zr, zr, Address(sp, 2*16));
>>> +      stp(zr, zr, Address(sp, 3*16));
>>> +      stp(zr, zr, Address(sp, 4*16));
>>> +      stp(zr, zr, Address(sp, 5*16));
>>> +      stp(zr, zr, Address(sp, 6*16));
>>> +      stp(zr, zr, Address(sp, 7*16));
>>
>> Why is this not a loop?
> 
> Mistaken sense of efficiency.
> 
> Something like this better?
> 
>   sub sp, sp, #128
>   mov tmp1, #128
> loop
>   subs tmp1, tmp1, #16
>   stp zr, zr, [sp, tmp]
>   bne loop
> 
> or this?
> 
>   mov tmp1, #8
> loop
>   subs tmp1, tmp1, #1
>   stp zr, zr, [sp, #-16]!
>   bne loop
> 
> which is one instruction shorter but I am concerned about the
> efficiency of repeated stores with pre-decrement writeback to the
> base register. I think this could take 1 additional cycle per store
> on some implementations.

No, that's not what I meant, sorry.  Why not

  for (int i = 1; i < 8; i++)
      stp(zr, zr, Address(sp, i*16));

?

It's just a stylistic thing.

Andrew.