RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals

Andrew Haley aph at redhat.com
Tue Jun 29 14:51:22 UTC 2021


I had to make some changes to the benchmark to get accurate timing, because
for very small strings the measurement is swamped by JMH overhead.

It should be clear from my patch what I did. The most important part is
to run the test code in a loop, or you won't see small effects. We're
trying to measure something that only takes a few nanoseconds.
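
To illustrate the point about looping, here is a minimal plain-Java sketch. It is
not the change in my patch and it does not use JMH; the class LoopedTiming, the
method nanosPerCall and the repetition counts are hypothetical names and values
chosen for the example. It times many calls as one measurement and divides, so
the fixed cost of taking a timestamp is spread over all the iterations.

```java
final class LoopedTiming {
    // Written once per measurement so the results are not trivially dead code.
    static volatile boolean sink;

    // Time `reps` calls to String.equals as a single measurement and return
    // the average cost per call in nanoseconds. Timing one call at a time
    // would mostly measure System.nanoTime() itself.
    static double nanosPerCall(String a, String b, int reps) {
        boolean acc = true;
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            acc &= a.equals(b);
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return (double) elapsed / reps;
    }

    public static void main(String[] args) {
        String a = "abcdefgh";
        String b = new String(a.toCharArray()); // distinct object, same contents
        // Warm up so the JIT-compiled code is what gets measured.
        for (int warm = 0; warm < 20; warm++) {
            nanosPerCall(a, b, 1_000_000);
        }
        System.out.printf("%.3f ns/call%n", nanosPerCall(a, b, 1_000_000));
    }
}
```

A hand-rolled loop like this is only a rough guide: the JIT is free to hoist or
fold a loop-invariant call, which is one reason to keep the loop inside a JMH
benchmark rather than replace JMH with it.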

This is what I see, Apple M1, two equal strings:

Old:

Benchmark           (size)  Mode  Cnt   Score   Error  Units
StringEquals.equal       8  avgt    5   0.948 ± 0.001  us/op
StringEquals.equal      11  avgt    5   0.948 ± 0.004  us/op
StringEquals.equal      16  avgt    5   0.948 ± 0.001  us/op
StringEquals.equal      22  avgt    5   1.260 ± 0.002  us/op
StringEquals.equal      32  avgt    5   1.886 ± 0.001  us/op
StringEquals.equal      45  avgt    5   2.514 ± 0.001  us/op
StringEquals.equal      64  avgt    5   3.141 ± 0.003  us/op
StringEquals.equal      91  avgt    5   4.395 ± 0.002  us/op
StringEquals.equal     121  avgt    5   5.653 ± 0.014  us/op
StringEquals.equal     181  avgt    5   8.011 ± 0.010  us/op
StringEquals.equal     256  avgt    5  11.433 ± 0.014  us/op
StringEquals.equal     512  avgt    5  23.005 ± 0.124  us/op
StringEquals.equal    1024  avgt    5  49.185 ± 0.032  us/op

Your patch:

Benchmark           (size)  Mode  Cnt   Score   Error  Units
StringEquals.equal       8  avgt    5   1.574 ± 0.001  us/op
StringEquals.equal      11  avgt    5   1.734 ± 0.004  us/op
StringEquals.equal      16  avgt    5   1.888 ± 0.002  us/op
StringEquals.equal      22  avgt    5   1.892 ± 0.003  us/op
StringEquals.equal      32  avgt    5   2.517 ± 0.003  us/op
StringEquals.equal      45  avgt    5   2.988 ± 0.002  us/op
StringEquals.equal      64  avgt    5   2.517 ± 0.003  us/op
StringEquals.equal      91  avgt    5   8.659 ± 0.007  us/op
StringEquals.equal     121  avgt    5   5.649 ± 0.007  us/op
StringEquals.equal     181  avgt    5   6.050 ± 0.009  us/op
StringEquals.equal     256  avgt    5   7.088 ± 0.016  us/op
StringEquals.equal     512  avgt    5  14.163 ± 0.018  us/op
StringEquals.equal    1024  avgt    5  29.998 ± 0.052  us/op


As you can see, we're looking at regressions all the way up to size=45,
with something very odd happening at size=91 (about twice as slow as the
old code). The vectorized code only starts to pull ahead consistently at
size=181.

A few things:

You should never be executing the TAIL unless the string is really
short. Just do one pair of unaligned loads at the end to finish.
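
A sketch of what I mean by finishing with an overlapping load, in Java rather
than in the stub's assembly. The class OverlappingTail and the method sameBytes
are hypothetical names for this example, and it compares plain byte arrays 8
bytes at a time; it is not the code in the attached patch. After the main loop,
one load from each array at offset length-8 covers whatever is left, re-checking
a few bytes that were already compared. The byte-by-byte tail runs only when the
input is shorter than one word.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class OverlappingTail {
    // View a byte[] as 8-byte words. Plain get() on this kind of view is
    // allowed at unaligned indices.
    private static final VarHandle WORD =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.nativeOrder());

    static boolean sameBytes(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return false;
        }
        int n = a.length;
        if (n < 8) {
            // Really short input: the only case that needs a byte-wise tail.
            for (int i = 0; i < n; i++) {
                if (a[i] != b[i]) {
                    return false;
                }
            }
            return true;
        }
        int i = 0;
        // Main loop: whole 8-byte words.
        for (; i + 8 <= n; i += 8) {
            if ((long) WORD.get(a, i) != (long) WORD.get(b, i)) {
                return false;
            }
        }
        // Finish with one overlapping load from each array, ending exactly at
        // the last byte. Bytes before index i are compared a second time,
        // which is harmless.
        return i == n || (long) WORD.get(a, n - 8) == (long) WORD.get(b, n - 8);
    }
}
```

For a 13-byte input the loop compares bytes 0..7 and the final loads compare
bytes 5..12, so no per-byte loop is needed.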

Please don't use aliases for rscratch1 and rscratch2. Calling them tmp1
and tmp2 doesn't help the reader.

So: please make sure the smaller strings are at least as good as
they are now. Remember strings are usually short, so we can tolerate
no regressions with the smaller sizes.

I don't think that Neon does any good here. This is what I get by rewriting
(just) the stub with scalar registers, in the attached patch:

Benchmark           (size)  Mode  Cnt   Score   Error  Units
StringEquals.equal       8  avgt    5   1.574 ± 0.004  us/op
StringEquals.equal      11  avgt    5   1.734 ± 0.003  us/op
StringEquals.equal      16  avgt    5   1.888 ± 0.002  us/op
StringEquals.equal      22  avgt    5   1.891 ± 0.003  us/op
StringEquals.equal      32  avgt    5   2.517 ± 0.001  us/op
StringEquals.equal      45  avgt    5   2.988 ± 0.002  us/op
StringEquals.equal      64  avgt    5   2.595 ± 0.004  us/op
StringEquals.equal      91  avgt    5   4.083 ± 0.006  us/op
StringEquals.equal     121  avgt    5   5.432 ± 0.006  us/op
StringEquals.equal     181  avgt    5   6.292 ± 0.009  us/op
StringEquals.equal     256  avgt    5   7.232 ± 0.008  us/op
StringEquals.equal     512  avgt    5  13.304 ± 0.012  us/op
StringEquals.equal    1024  avgt    5  25.537 ± 0.012  us/op

I use an editor with automatic indentation, as do many people, so
I inserted brackets in the right places in the assembly code.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 8268229.patch
Type: text/x-patch
Size: 12464 bytes
Desc: not available
URL: <https://mail.openjdk.java.net/pipermail/hotspot-dev/attachments/20210629/61ccd20c/8268229-0001.patch>

