RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v3]

Wang Huang whuang at openjdk.java.net
Mon Jul 5 06:57:54 UTC 2021


On Fri, 2 Jul 2021 14:30:18 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4670:
>> 
>>> 4668:       __ cbnz(rscratch1, NOT_EQUAL);
>>> 4669:       __ br(__ GE, LOOP);
>>> 4670: 
>> 
>> As I said before, we gain nothing by using Neon here.
>
> Much better:
>
> ```diff
> +  __ ldp(r5, r6, Address(__ post(a1, wordSize * 2)));
> +  __ ldp(rscratch1, rscratch2, Address(__ post(a2, wordSize * 2)));
> +  __ cmp(r5, rscratch1);
> +  __ ccmp(r6, rscratch2, 0, Assembler::EQ);
> +  __ br(__ NE, NOT_EQUAL);
> ```
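The suggested sequence can be modeled in portable C++ to show what it computes. This is a hypothetical scalar sketch, not HotSpot code: the name `equal_ldp_style` is made up, and it assumes the length is a multiple of 16 bytes because the stub's tail handling is not part of this snippet. Each iteration does what the two `ldp` instructions do (two 64-bit words from each array). `ccmp` with condition `EQ` and flags `0` makes the second comparison count only when the first one was equal, so a single `br NE` catches a mismatch in either word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical scalar model of the ldp/cmp/ccmp loop (not the real stub).
// Assumes len_bytes is a multiple of 16; the stub handles tails separately.
bool equal_ldp_style(const unsigned char* a1, const unsigned char* a2,
                     std::size_t len_bytes) {
  assert(len_bytes % 16 == 0 && "tail handling is not modeled");
  for (std::size_t off = 0; off < len_bytes; off += 16) {
    std::uint64_t r5, r6, s1, s2;
    std::memcpy(&r5, a1 + off, 8);      // ldp r5, r6, [a1], #16
    std::memcpy(&r6, a1 + off + 8, 8);
    std::memcpy(&s1, a2 + off, 8);      // ldp rscratch1, rscratch2, [a2], #16
    std::memcpy(&s2, a2 + off + 8, 8);
    // cmp r5, rscratch1; ccmp r6, rscratch2, #0, EQ:
    // the second compare only sets EQ if the first one was already equal.
    bool both_equal = (r5 == s1) && (r6 == s2);
    if (!both_equal) return false;      // br NE, NOT_EQUAL
  }
  return true;
}
```

The point of `ccmp` is that the two word comparisons share one flag result, so the loop needs one branch per 16 bytes instead of two.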

We replaced `ld1` with `ldp` and got the following results:

simple:
Benchmark          | (size) | Mode | Cnt | Score   | Error    | Units
-------------------|--------|------|-----|---------|----------|------
StringEquals.equal | 45     | avgt | 5   | 6.105   | ± 0.635  | us/op
StringEquals.equal | 64     | avgt | 5   | 7.226   | ± 0.056  | us/op
StringEquals.equal | 91     | avgt | 5   | 12.010  | ± 0.375  | us/op
StringEquals.equal | 121    | avgt | 5   | 14.772  | ± 0.114  | us/op
StringEquals.equal | 181    | avgt | 5   | 21.468  | ± 0.676  | us/op
StringEquals.equal | 256    | avgt | 5   | 28.942  | ± 4.806  | us/op
StringEquals.equal | 512    | avgt | 5   | 58.479  | ± 5.918  | us/op
StringEquals.equal | 1024   | avgt | 5   | 119.313 | ± 16.661 | us/op
 
ldp:
Benchmark          | (size) | Mode | Cnt | Score   | Error    | Units
-------------------|--------|------|-----|---------|----------|------
StringEquals.equal | 45     | avgt | 5   | 6.449   | ± 0.202  | us/op
StringEquals.equal | 64     | avgt | 5   | 7.367   | ± 0.055  | us/op
StringEquals.equal | 91     | avgt | 5   | 9.984   | ± 0.065  | us/op
StringEquals.equal | 121    | avgt | 5   | 12.540  | ± 0.545  | us/op
StringEquals.equal | 181    | avgt | 5   | 15.614  | ± 0.280  | us/op
StringEquals.equal | 256    | avgt | 5   | 19.346  | ± 0.243  | us/op
StringEquals.equal | 512    | avgt | 5   | 35.718  | ± 0.599  | us/op
StringEquals.equal | 1024   | avgt | 5   | 67.846  | ± 0.439  | us/op

neon:
Benchmark          | (size) | Mode | Cnt | Score   | Error    | Units
-------------------|--------|------|-----|---------|----------|------
StringEquals.equal | 45     | avgt | 5   | 5.883   | ± 0.173  | us/op
StringEquals.equal | 64     | avgt | 5   | 6.737   | ± 0.035  | us/op
StringEquals.equal | 91     | avgt | 5   | 8.997   | ± 0.215  | us/op
StringEquals.equal | 121    | avgt | 5   | 10.789  | ± 0.386  | us/op
StringEquals.equal | 181    | avgt | 5   | 14.063  | ± 0.253  | us/op
StringEquals.equal | 256    | avgt | 5   | 19.679  | ± 1.419  | us/op
StringEquals.equal | 512    | avgt | 5   | 38.813  | ± 1.378  | us/op
StringEquals.equal | 1024   | avgt | 5   | 77.769  | ± 3.082  | us/op
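The three tables are easier to compare as ratios. The sketch below only redoes the arithmetic on the scores reported in the tables (copied verbatim; the array and function names are illustrative). A speedup above 1 means the variant is faster than `simple`, since the scores are average times in us/op where lower is better.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Average-time scores (us/op, lower is better) copied from the tables.
constexpr std::array<int, 8> kSizes = {45, 64, 91, 121, 181, 256, 512, 1024};
constexpr std::array<double, 8> kSimple = {6.105, 7.226, 12.010, 14.772,
                                           21.468, 28.942, 58.479, 119.313};
constexpr std::array<double, 8> kLdp = {6.449, 7.367, 9.984, 12.540,
                                        15.614, 19.346, 35.718, 67.846};
constexpr std::array<double, 8> kNeon = {5.883, 6.737, 8.997, 10.789,
                                         14.063, 19.679, 38.813, 77.769};

// Speedup of a variant over the simple version at index i (> 1 is faster).
double speedup(const std::array<double, 8>& variant, std::size_t i) {
  assert(i < kSizes.size());
  return kSimple[i] / variant[i];
}

// Smallest benchmarked size at which ldp scores lower than neon, or -1.
int ldp_crossover_size() {
  for (std::size_t i = 0; i < kSizes.size(); ++i) {
    if (kLdp[i] < kNeon[i]) return kSizes[i];
  }
  return -1;
}
```

On these numbers `ldp_crossover_size()` returns 256, and at size 1024 the speedup over `simple` is about 1.76x for `ldp` and 1.53x for `neon`.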

From the results, we can see that:

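* for small sizes (45 to 181), the `ldp` version is slower than the `neon`/`ld1` version;
* for large sizes (256 and up), the `ldp` version is faster than the `neon`/`ld1` version, although at 256 the two scores overlap within the reported error;
* the `ld1` version is faster than the old `simple` version at every size, and the `ldp` version is faster from size 91 upward (at 45 and 64 it is no faster than `simple`);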
* I agree with you that the `ldp` version is better than the `ld1` version in the **last patch**, because there I used

  ```
  __ ldr(v0, __ Q, Address(__ post(a1, wordSize * 2)));
  __ ldr(v1, __ Q, Address(__ post(a2, wordSize * 2)));
  ```

  However, in the recent patch I use

  ```
  __ ld1(v0, v1, __ T2D, Address(__ post(a1, loopThreshold)));
  __ ld1(v2, v3, __ T2D, Address(__ post(a2, loopThreshold)));
  ```

  which loads two Q registers (32 bytes) from each string per instruction instead of one (16 bytes). I think this change fixes the problem.
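The recent `ld1` loop can be modeled in scalar C++ as a loop that consumes 32 bytes per array per iteration. This is a hypothetical sketch: it assumes `loopThreshold` is 32 (the size of two Q registers), and because the thread does not show how the vector differences are folded into `rscratch1` before the `cbnz`, the XOR/OR reduction below is an assumption. The byte-wise tail is likewise a stand-in for the stub's real tail handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical scalar model of the ld1-based loop (not the real stub).
bool equal_ld1_style(const unsigned char* a1, const unsigned char* a2,
                     std::size_t len_bytes) {
  assert(a1 != nullptr && a2 != nullptr);  // memcmp needs valid pointers
  const std::size_t kStep = 32;  // assumed value of loopThreshold
  std::size_t off = 0;
  for (; off + kStep <= len_bytes; off += kStep) {
    std::uint64_t x[4], y[4];            // four 64-bit lanes = two Q registers
    std::memcpy(x, a1 + off, kStep);     // ld1 {v0.2D, v1.2D}, [a1], #32
    std::memcpy(y, a2 + off, kStep);     // ld1 {v2.2D, v3.2D}, [a2], #32
    std::uint64_t diff = 0;
    for (int lane = 0; lane < 4; ++lane) {
      diff |= x[lane] ^ y[lane];         // any differing bit survives the OR
    }
    if (diff != 0) return false;         // cbnz rscratch1, NOT_EQUAL
  }
  // Tail shorter than 32 bytes: plain byte-wise compare.
  return std::memcmp(a1 + off, a2 + off, len_bytes - off) == 0;
}
```

Folding all four lanes into one value keeps the loop at a single branch per 32 bytes, which is the same idea as the `ccmp` trick but with twice the data per iteration.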

-------------

PR: https://git.openjdk.java.net/jdk/pull/4423


More information about the hotspot-dev mailing list