RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v2]

Wang Huang whuang at openjdk.java.net
Thu Jun 24 09:26:30 UTC 2021


On Thu, 17 Jun 2021 09:28:19 GMT, Andrew Haley <aph at openjdk.org> wrote:

>>> With this change the size of the `string_equals` intrinsic increases by ~60% from 120 bytes to 196 bytes and this gets expanded at every `String.equals` call site. It looks good on a micro-benchmark but I wonder if on a larger program this improvement is outweighed by the negative effects of methods taking up more space in the icache.
>> 
>> That's an excellent point. There's no need at all for the Neon part to be expanded inline: it could be a subroutine. We'd have to use fixed Neon registers at the call site.
> 
> Thinking some more, we could use this opportunity to move as much of the bulk comparison code as we can out of line, hopefully achieving a reduction in footprint as well as an improvement in performance.

Dear @theRealAph @dgbo @nick-arm @mdinacci,
   I have pushed my latest patch. In this commit:
   * I tested some cases as @theRealAph suggested and found the following:
      1)  We varied the position of the first difference in the strings and measured the results with Neon used in all cases:
![image](https://user-images.githubusercontent.com/73928571/123235128-2c095680-d50e-11eb-95cf-c32d2b58a634.png)
       Based on this result, we use the old implementation when the string is small.
     2) The `8:64` result in this figure looked like a bug, and I fixed it by unrolling the loop:
       ```c++
       // Main 8-byte comparison loop, unrolled twice.
       bind(LOOP); {
         ldr(tmp1, Address(post(a1, wordSize)));
         ldr(tmp2, Address(post(a2, wordSize)));
         subs(cnt1, cnt1, wordSize);   // sets the flags tested by br(LT, SHORT)
         eor(tmp1, tmp1, tmp2);        // nonzero if the two words differ
         cbnz(tmp1, DONE);
         br(LT, SHORT);

         ldr(tmp1, Address(post(a1, wordSize)));
         ldr(tmp2, Address(post(a2, wordSize)));
         subs(cnt1, cnt1, wordSize);   // sets the flags tested by br(GE, LOOP)
         eor(tmp1, tmp1, tmp2);
         cbnz(tmp1, DONE);
       } br(GE, LOOP);
       ```
      3)  `UseSimpleStringEquals` is added in this patch. If the option is `true`, we use the old implementation.
 * The results of my JMH run are listed here:
 
 **Diff position is in the LAST 2/3 of the whole string**

Benchmark                      |(size) |Mode |Cnt|  Score|  Error |Units
-------------------------------|-------|-----|---|-------|--------|-----
StringEquals.equalsLenT        |     8 |avgt | 10|  7.869|± 0.063 |ns/op
StringEquals.equalsLenT        |    16 |avgt | 10|  8.651|± 0.201 |ns/op
StringEquals.equalsLenT        |    32 |avgt | 10|  9.869|± 0.049 |ns/op
StringEquals.equalsLenT        |    64 |avgt | 10| 11.379|± 0.134 |ns/op
StringEquals.equalsLenT        |   128 |avgt | 10| 17.312|± 0.274 |ns/op
StringEquals.equalsLenT_simple |     8 |avgt | 10|  7.912|± 0.439 |ns/op
StringEquals.equalsLenT_simple |    16 |avgt | 10|  8.764|± 0.061 |ns/op
StringEquals.equalsLenT_simple |    32 |avgt | 10| 30.452|± 0.065 |ns/op
StringEquals.equalsLenT_simple |    64 |avgt | 10| 14.550|± 0.199 |ns/op
StringEquals.equalsLenT_simple |   128 |avgt | 10| 20.071|± 2.465 |ns/op

 **Diff position is in the FIRST 1/3 of the whole string**

Benchmark                     | (size) |Mode |Cnt | Score|  Error |Units
------------------------------|--------|-----|----|------|--------|-----
StringEquals.equalsLenH       |      8 |avgt | 10 | 7.822|± 0.148 |ns/op
StringEquals.equalsLenH       |     16 |avgt | 10 | 7.631|± 0.179 |ns/op
StringEquals.equalsLenH       |     32 |avgt | 10 | 8.553|± 0.064 |ns/op
StringEquals.equalsLenH       |     64 |avgt | 10 |11.944|± 0.554 |ns/op
StringEquals.equalsLenH       |    128 |avgt | 10 |12.691|± 0.091 |ns/op
StringEquals.equalsLenH_simple|      8 |avgt | 10 | 7.873|± 0.141 |ns/op
StringEquals.equalsLenH_simple|     16 |avgt | 10 | 7.972|± 0.556 |ns/op
StringEquals.equalsLenH_simple|     32 |avgt | 10 | 8.383|± 0.100 |ns/op
StringEquals.equalsLenH_simple|     64 |avgt | 10 |29.364|± 0.344 |ns/op
StringEquals.equalsLenH_simple|    128 |avgt | 10 |14.748|± 0.354 |ns/op
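
For readers who do not follow the assembler, the three points above (old implementation for small strings, a twice-unrolled 8-byte loop, and the `UseSimpleStringEquals` switch) can be modelled in plain C++. This is only a rough scalar sketch, not the code in the patch: `kBulkThreshold`, `use_simple_string_equals`, `bulk_equals` and the other function names are hypothetical, the real cut-off is not stated in this mail, and a 16-byte `memcmp` stands in for the Neon compare.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the UseSimpleStringEquals option.
static bool use_simple_string_equals = false;
// Hypothetical cut-off in bytes; the real threshold is not given in this mail.
static const std::size_t kBulkThreshold = 16;

// Load 8 bytes without assuming any alignment.
static std::uint64_t load_word(const unsigned char* p) {
  std::uint64_t w;
  std::memcpy(&w, p, sizeof w);
  return w;
}

// Compare the last 0..7 bytes one at a time (roughly the SHORT path).
static bool tail_equals(const unsigned char* a, const unsigned char* b,
                        std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    if (a[i] != b[i]) return false;
  }
  return true;
}

// Model of the "old" path: an 8-byte word loop, unrolled twice.
static bool word_loop_equals(const unsigned char* a, const unsigned char* b,
                             std::size_t n) {
  while (n >= 16) {
    if (load_word(a) ^ load_word(b)) return false;          // first word
    if (load_word(a + 8) ^ load_word(b + 8)) return false;  // second word
    a += 16; b += 16; n -= 16;
  }
  if (n >= 8) {  // one full word may remain after the unrolled loop
    if (load_word(a) ^ load_word(b)) return false;
    a += 8; b += 8; n -= 8;
  }
  return tail_equals(a, b, n);
}

// Model of the bulk path: 16 bytes per step, standing in for one
// 128-bit Neon compare; the remainder falls back to the word loop.
static bool bulk_equals(const unsigned char* a, const unsigned char* b,
                        std::size_t n) {
  while (n >= 16) {
    if (std::memcmp(a, b, 16) != 0) return false;
    a += 16; b += 16; n -= 16;
  }
  return word_loop_equals(a, b, n);
}

// Dispatch: small inputs, or the option set, use the old implementation.
bool string_equals_model(const unsigned char* a, const unsigned char* b,
                         std::size_t n) {
  assert(a != nullptr && b != nullptr);
  if (use_simple_string_equals || n < kBulkThreshold) {
    return word_loop_equals(a, b, n);
  }
  return bulk_equals(a, b, n);
}
```

Calling `string_equals_model(a, b, n)` on two byte arrays of equal length `n` returns whether they match; setting `use_simple_string_equals` to `true` forces the word-loop path for every length, which is how the `_simple` rows in the tables above would correspond to this model.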

-------------

PR: https://git.openjdk.java.net/jdk/pull/4423

