RFR: AARCH64: optimize string compare intrinsic
Paul Sandoz
paul.sandoz at oracle.com
Thu May 3 19:41:32 UTC 2018
Hi Dmitrij,
> On May 3, 2018, at 11:58 AM, Dmitrij Pochepko <dmitrij.pochepko at bell-sw.com> wrote:
>
> Hi Paul,
>
> Actually, vectorizedMismatch has more in common with array equals, which is a more generic version of the same algorithm.
>
Since vectorizedMismatch is also used for lexicographical array comparison it still might be applicable for string comparison *if* the character encodings of the two strings are the same.
Opportunistically, my hope was that the string comparison intrinsics code could be reduced to focus on strings of different encodings, thereby potentially simplifying HotSpot code. That could apply across all platforms that support the vectorizedMismatch intrinsic.
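To make the same-encoding idea concrete, here is a minimal sketch. It is not HotSpot or JDK code. It uses the public java.util.Arrays.mismatch (Java 9+) as a stand-in for the internal ArraysSupport.vectorizedMismatch. It also assumes, for illustration only, that Latin-1 string contents are modeled as ISO-8859-1 byte arrays and UTF-16 contents as char arrays, because String's internal storage is not accessible from user code. Once the first mismatch index is known, the comparison reduces to one element difference or a length difference.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MismatchCompare {
    // Same-encoding comparison for Latin-1 data, modeled here (an assumption)
    // as ISO-8859-1 byte arrays. Bytes are compared unsigned, as chars would be.
    static int compareLatin1(byte[] a, byte[] b) {
        int i = Arrays.mismatch(a, b);          // -1 if equal length and content
        if (i == -1) return 0;
        if (i < Math.min(a.length, b.length)) { // a real mismatch at index i
            return (a[i] & 0xff) - (b[i] & 0xff);
        }
        return a.length - b.length;             // one array is a proper prefix
    }

    // Same-encoding comparison for UTF-16 data, modeled here as char arrays.
    static int compareUtf16(char[] a, char[] b) {
        int i = Arrays.mismatch(a, b);
        if (i == -1) return 0;
        if (i < Math.min(a.length, b.length)) {
            return a[i] - b[i];
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"apple", "apricot"}, {"abc", "abcd"}, {"same", "same"}, {"\u00e9t\u00e9", "ete"}
        };
        for (String[] p : pairs) {
            int expected = p[0].compareTo(p[1]);
            int latin1 = compareLatin1(p[0].getBytes(StandardCharsets.ISO_8859_1),
                                       p[1].getBytes(StandardCharsets.ISO_8859_1));
            int utf16 = compareUtf16(p[0].toCharArray(), p[1].toCharArray());
            System.out.printf("%s vs %s: compareTo=%d latin1=%d utf16=%d%n",
                              p[0], p[1], expected, latin1, utf16);
        }
    }
}
```

Each result should equal String.compareTo for the same pair. compareTo is specified as the difference of the first differing chars or, if one string is a prefix of the other, the difference of the lengths. Only the mixed-encoding case would still need dedicated intrinsic code.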
Paul.
> Unfortunately, the vectorizedMismatch intrinsic is not yet implemented for AARCH64 (we're working on it as well and will try to reuse the code, assuming there is no significant performance impact).
>
> CC'ing Boris, who is working on vectorizedMismatch.
>
>
> Thanks,
>
> Dmitrij
>
>
> On 01.05.2018 01:21, Paul Sandoz wrote:
>> Hi Dmitrij,
>>
>> Here is a somewhat lateral thought, it might have some legs...
>>
>> For the case when the encodings of the compared strings are the same, have you considered changing the string compare implementations to use the array mismatch functionality (see jdk.internal.util.ArraysSupport.vectorizedMismatch) and then optimizing that for AARCH64, if that has not already been done? It may simplify things in some respects, and it would also broaden the performance impact to arrays and buffers.
>>
>> Paul.
>>
>>> On Apr 28, 2018, at 11:29 AM, Dmitrij Pochepko <dmitrij.pochepko at bell-sw.com> wrote:
>>>
>>>
>>>
>>> Hi all,
>>>
>>> please review patch for 8202326: AARCH64: optimize string compare intrinsic
>>>
>>> This patch introduces a String compareTo stub that uses large loops with prefetch instructions. The stub is called for long strings and improves String::compareTo by up to 4x on systems without hardware prefetching (ThunderX) and by up to 2x on systems with hardware prefetching (ThunderX2). The inlined code is also rearranged for better pipelining, which helps in-order cores, so short strings also improve slightly.
>>> There are no noticeable regressions according to benchmark results.
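The patch itself is AArch64 assembly, and its actual loop structure is in the webrev. As a rough, hypothetical illustration of the general word-at-a-time technique that such compare loops commonly build on, here is a Java sketch that is not the patch's code. It compares Latin-1 data 8 bytes per step and locates the first differing byte from the XOR of the two words. Prefetching has no direct Java equivalent and is omitted. Modeling Latin-1 strings as ISO-8859-1 byte arrays and the class and method names are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class WordCompare {
    // Compares two Latin-1 byte arrays with String.compareTo semantics,
    // reading one 64-bit word (8 chars) per iteration of the main loop.
    static int compareLatin1(byte[] a, byte[] b) {
        ByteBuffer ba = ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer bb = ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN);
        int min = Math.min(a.length, b.length);
        int i = 0;
        for (; i + Long.BYTES <= min; i += Long.BYTES) {
            long diff = ba.getLong(i) ^ bb.getLong(i);
            if (diff != 0) {
                // Little-endian load: the lowest-addressed byte sits in the low
                // bits, so trailing zeros / 8 is the first differing byte offset.
                int k = i + (Long.numberOfTrailingZeros(diff) >>> 3);
                return (a[k] & 0xff) - (b[k] & 0xff);
            }
        }
        // Tail: fewer than 8 bytes remain.
        for (; i < min; i++) {
            if (a[i] != b[i]) return (a[i] & 0xff) - (b[i] & 0xff);
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        String[] xs = {"", "a", "abcdefgh", "abcdefghi", "abcdefghijklmnopq", "abcdefghijklmXopq"};
        for (String x : xs) {
            for (String y : xs) {
                int got = compareLatin1(x.getBytes(StandardCharsets.ISO_8859_1),
                                        y.getBytes(StandardCharsets.ISO_8859_1));
                if (got != x.compareTo(y)) {
                    throw new AssertionError(x + " vs " + y + ": " + got);
                }
            }
        }
        System.out.println("word-at-a-time compare agrees with String.compareTo");
    }
}
```

Handling many characters per step is what makes long strings the profitable case for an out-of-line stub. Short strings are dominated by setup and tail handling, which is plausibly why they are kept inline.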
>>>
>>> I created a benchmark to measure the improvement: http://cr.openjdk.java.net/~dpochepk/8202326/StringCompareBench.java
>>>
>>> The execution matrix is large; it can be seen here: http://cr.openjdk.java.net/~dpochepk/8202326/str_compare.xls
>>>
>>> Raw results are in the *.txt files here: http://cr.openjdk.java.net/~dpochepk/8202326/
>>>
>>> webrev: http://cr.openjdk.java.net/~dpochepk/8202326/webrev.01/
>>>
>>> CR: https://bugs.openjdk.java.net/browse/JDK-8202326
>>>
>>> testing: I ran the jtreg hotspot tests compiler/* gc/* runtime/* using a fastdebug build and found no new failures. I also ran a long "brute-force" test that checks all combinations of differing character index for all strings up to length 512: http://cr.openjdk.java.net/~dpochepk/8202326/StrCmpTest.java
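The linked StrCmpTest.java is the actual test. Below is only a hypothetical reconstruction of the brute-force idea: for every length up to a limit and every index, build two strings that differ at exactly that index, then check String.compareTo against a plain reference loop in both directions. It also checks the prefix case. The choice of '\u0100' and '\u0101' assumes compact strings are enabled, so that those characters force UTF-16 storage and the loop exercises both same-encoding and mixed-encoding comparisons. The intrinsic is only exercised once compareTo is JIT-compiled, which the repetition here is likely, but not guaranteed, to trigger.

```java
import java.util.Arrays;

public class BruteForceCompare {
    // Straightforward reference implementation of String.compareTo's contract.
    static int reference(String a, String b) {
        int min = Math.min(a.length(), b.length());
        for (int i = 0; i < min; i++) {
            char x = a.charAt(i), y = b.charAt(i);
            if (x != y) return x - y;
        }
        return a.length() - b.length();
    }

    static void check(String a, String b) {
        int expected = reference(a, b);
        int actual = a.compareTo(b);
        if (actual != expected) {
            throw new AssertionError("len " + a.length() + "/" + b.length()
                    + ": expected " + expected + ", got " + actual);
        }
    }

    // Returns the number of compareTo calls checked.
    static long run(int maxLen) {
        // 'a' keeps a string Latin-1; U+0100 requires UTF-16 storage.
        char[] bases = {'a', '\u0100'};
        long checks = 0;
        for (int len = 1; len <= maxLen; len++) {
            for (char base : bases) {
                char[] chars = new char[len];
                Arrays.fill(chars, base);
                String s1 = new String(chars);
                // Differing char: same encoding as base, or forced UTF-16.
                char[] diffs = {(char) (base + 1), '\u0101'};
                for (int i = 0; i < len; i++) {
                    for (char d : diffs) {
                        chars[i] = d;                  // differ only at index i
                        String s2 = new String(chars);
                        chars[i] = base;               // restore
                        check(s1, s2);
                        check(s2, s1);
                        checks += 2;
                    }
                }
                // Prefix case: equal contents, different lengths.
                String longer = s1 + base;
                check(s1, longer);
                check(longer, s1);
                checks += 2;
            }
        }
        return checks;
    }

    public static void main(String[] args) {
        int maxLen = args.length > 0 ? Integer.parseInt(args[0]) : 512;
        System.out.println("checked " + run(maxLen) + " comparisons");
    }
}
```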
>>>
>>>
>>> Additional note: this patch depends on the zip2 instruction encoding fix: JDK-8202395
>>>
>>> Thanks,
>>>
>>> Dmitrij
>>>
>