RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals
Andrew Haley
aph at redhat.com
Tue Jun 29 14:51:22 UTC 2021
I had to make some changes to the benchmark to get accurate timing, because
the measurement is swamped by JMH overhead for very small strings.
It should be clear from my patch what I did. The most important part is
to run the test code in a loop, or you won't see small effects. We're
trying to measure something that only takes a few nanoseconds.
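As a rough illustration of what "run the test code in a loop" means (this
is not the benchmark change itself, which is in the attached patch), here
is a plain-Java sketch. The class name, method name and the repeat count
INNER_REPS are all assumptions made for this example; the message does not
state the loop count that was used.

```java
final class EqualsLoopSketch {
    // Hypothetical repeat count; not taken from the original benchmark.
    static final int INNER_REPS = 1000;

    // Calls String.equals INNER_REPS times per invocation so that the cost
    // of a single few-nanosecond comparison is not drowned out by the
    // per-invocation overhead of the harness. The match count is returned
    // so the comparisons have an observable result.
    static int equalsLoop(String a, String b) {
        int matches = 0;
        for (int i = 0; i < INNER_REPS; i++) {
            if (a.equals(b)) {
                matches++;
            }
        }
        return matches;
    }
}
```

In a real JMH benchmark the body of the @Benchmark method would play the
role of equalsLoop, and the reported score would then be the time for
INNER_REPS comparisons rather than for one.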
This is what I see, Apple M1, two equal strings:
Old:
Benchmark (size) Mode Cnt Score Error Units
StringEquals.equal 8 avgt 5 0.948 ± 0.001 us/op
StringEquals.equal 11 avgt 5 0.948 ± 0.004 us/op
StringEquals.equal 16 avgt 5 0.948 ± 0.001 us/op
StringEquals.equal 22 avgt 5 1.260 ± 0.002 us/op
StringEquals.equal 32 avgt 5 1.886 ± 0.001 us/op
StringEquals.equal 45 avgt 5 2.514 ± 0.001 us/op
StringEquals.equal 64 avgt 5 3.141 ± 0.003 us/op
StringEquals.equal 91 avgt 5 4.395 ± 0.002 us/op
StringEquals.equal 121 avgt 5 5.653 ± 0.014 us/op
StringEquals.equal 181 avgt 5 8.011 ± 0.010 us/op
StringEquals.equal 256 avgt 5 11.433 ± 0.014 us/op
StringEquals.equal 512 avgt 5 23.005 ± 0.124 us/op
StringEquals.equal 1024 avgt 5 49.185 ± 0.032 us/op
Your patch:
Benchmark (size) Mode Cnt Score Error Units
StringEquals.equal 8 avgt 5 1.574 ± 0.001 us/op
StringEquals.equal 11 avgt 5 1.734 ± 0.004 us/op
StringEquals.equal 16 avgt 5 1.888 ± 0.002 us/op
StringEquals.equal 22 avgt 5 1.892 ± 0.003 us/op
StringEquals.equal 32 avgt 5 2.517 ± 0.003 us/op
StringEquals.equal 45 avgt 5 2.988 ± 0.002 us/op
StringEquals.equal 64 avgt 5 2.517 ± 0.003 us/op
StringEquals.equal 91 avgt 5 8.659 ± 0.007 us/op
StringEquals.equal 121 avgt 5 5.649 ± 0.007 us/op
StringEquals.equal 181 avgt 5 6.050 ± 0.009 us/op
StringEquals.equal 256 avgt 5 7.088 ± 0.016 us/op
StringEquals.equal 512 avgt 5 14.163 ± 0.018 us/op
StringEquals.equal 1024 avgt 5 29.998 ± 0.052 us/op
As you can see, we're looking at regressions all the way up to size=45,
with something very odd happening at size=91 (8.659 vs. 4.395 us/op). The
vectorized code only starts to pull ahead consistently at size=181.
A few things:
You should never be executing the TAIL unless the string is really
short. Just do one pair of unaligned loads at the end to finish.
Please don't use aliases for rscratch1 and rscratch2. Calling them tmp1
and tmp2 doesn't help the reader.
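To make the point about finishing with one pair of unaligned loads
concrete, here is a sketch of the idea in Java rather than AArch64
assembly. It is not the code from either patch: the class and method names
are made up for this example, and it compares byte arrays 8 bytes at a
time using a byte-array view VarHandle. The byte-wise tail runs only when
the input is shorter than one word; otherwise the remainder is handled by
a single pair of loads that overlaps bytes already compared.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class OverlappingTailSketch {
    // Reads 8 bytes from a byte[] at any index, aligned or not.
    private static final VarHandle LONG_VIEW =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    static boolean bytesEqual(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return false;
        }
        int n = a.length;
        if (n < 8) {
            // Really short input: the only case that takes the byte-wise tail.
            for (int i = 0; i < n; i++) {
                if (a[i] != b[i]) {
                    return false;
                }
            }
            return true;
        }
        // Main loop over whole 8-byte words.
        for (int i = 0; i + 8 <= n; i += 8) {
            if ((long) LONG_VIEW.get(a, i) != (long) LONG_VIEW.get(b, i)) {
                return false;
            }
        }
        // Finish with one pair of (possibly unaligned) loads covering the
        // last 8 bytes. They overlap bytes the loop already checked, which
        // is harmless for an equality test.
        return (long) LONG_VIEW.get(a, n - 8) == (long) LONG_VIEW.get(b, n - 8);
    }
}
```

When the length is a multiple of 8 the final pair of loads simply repeats
the last word; a tuned stub would presumably skip that, but the sketch
keeps it to stay short.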
So: please make sure the smaller strings are at least as good as
they are now. Remember strings are usually short, so we can tolerate
no regressions with the smaller sizes.
I don't think that Neon does any good here. This is what I get by rewriting
(just) the stub with scalar registers, in the attached patch:
Benchmark (size) Mode Cnt Score Error Units
StringEquals.equal 8 avgt 5 1.574 ± 0.004 us/op
StringEquals.equal 11 avgt 5 1.734 ± 0.003 us/op
StringEquals.equal 16 avgt 5 1.888 ± 0.002 us/op
StringEquals.equal 22 avgt 5 1.891 ± 0.003 us/op
StringEquals.equal 32 avgt 5 2.517 ± 0.001 us/op
StringEquals.equal 45 avgt 5 2.988 ± 0.002 us/op
StringEquals.equal 64 avgt 5 2.595 ± 0.004 us/op
StringEquals.equal 91 avgt 5 4.083 ± 0.006 us/op
StringEquals.equal 121 avgt 5 5.432 ± 0.006 us/op
StringEquals.equal 181 avgt 5 6.292 ± 0.009 us/op
StringEquals.equal 256 avgt 5 7.232 ± 0.008 us/op
StringEquals.equal 512 avgt 5 13.304 ± 0.012 us/op
StringEquals.equal 1024 avgt 5 25.537 ± 0.012 us/op
I use an editor with automatic indentation, as do many people, so
I inserted brackets in the right places in the assembly code.
--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 8268229.patch
Type: text/x-patch
Size: 12464 bytes
Desc: not available
URL: <https://mail.openjdk.java.net/pipermail/hotspot-dev/attachments/20210629/61ccd20c/8268229-0001.patch>