[aarch64-port-dev ] RFR: Add support for String.indexOf

Tue Aug 19 10:31:11 UTC 2014

On Tue, 2014-08-19 at 10:18 +0100, Andrew Haley wrote:
> Hi,
> 
> On 18/08/14 17:27, Edward Nevill wrote:
> > 
> > The following patch adds support for the String.indexOf intrinsic.
> > 
> > It uses a combination of 2 algorithms, a simplified Boyer Moore, and
> > a straightforward linear scan.
> 
> In what way is this algorithm simplified from standard Boyer-Moore?
> How can we have confidence it is correct?

OK. I based the algorithm on the following

http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

This describes an algorithm with 2 shift rules. The 'Bad Character' rule and the 'Good Suffix' rule.

These rules are essentially heuristics for how far we can shift the pattern along the search string.

I have implemented it using the 'Bad Character' rule only because as the entry says "The Good Suffix rule is markedly more complex in both concept and implementation than the Bad Character rule.'

> 
> I'm a bit concerned that we always inline this code.  AFAICS if
> icnt1 == -1 we might as well branch to a stub.

It is 600 bytes which is on the limit of what is sensible. On a code heap size of 128M I think this is OK. It isn't inlined that often, but when it is we would like it to be as fast as possible.

> > +  if (icnt1 == -1) {
> > +    cmp(cnt1, 256);             // Use Linear Scan if cnt1 < 8 || cnt1 >= 256
> > +    ccmp(cnt1, 8, 0b0000, CC);  // Can't handle skip >= 256 because we use
> 
> Can this CC be LO, please.  This is really important because the ARM
> carry flag is inverted compared with e.g. Intel.

Yes. For those of us brought up in the 6502 world there is only one way the carry flag should ever be.

There are other instances of CC and CS throughout. I will change those as well.

> 
> > +    br(CC, LS);                 // a byte array.
> > +    cmp(cnt1, cnt2, LSR, 2);    // Source must be 4 * pattern for BM
> > +    br(CS, LS);
> 
> No, you may not give a label the same name as one of the condition
> codes.  This is an unexploded bomb.

What could possible go wrong? Especially when I change the CS to HS, then we have br(HS, LS) for max confusion.

I'll change it.

> > +      stp(zr, zr, pre(sp, -128));
> > +      stp(zr, zr, Address(sp, 1*16));
> > +      stp(zr, zr, Address(sp, 2*16));
> > +      stp(zr, zr, Address(sp, 3*16));
> > +      stp(zr, zr, Address(sp, 4*16));
> > +      stp(zr, zr, Address(sp, 5*16));
> > +      stp(zr, zr, Address(sp, 6*16));
> > +      stp(zr, zr, Address(sp, 7*16));
> 
> Why is this not a loop?

Mistaken sense of efficiency.

Something like this better?

  sub sp, sp, #128
  mov tmp1, #128
loop
  subs tmp1, tmp1, #16
  stp zr, zr, [sp, tmp]
  bne loop

or this?

  mov tmp1, #8
loop
  subs tmp1, tmp1, #1
  stp zr, zr, [sp, #-16]!
  bne loop

which is one instruction shorter but I am concerned about the efficiency of repeated stores with pre-decrement writeback to the base register. I think this could take 1 additional cycle per store on some implementations.

Regards,
Ed.