[aarch64-port-dev ] [10] RFR(S): JDK-8184943: AARCH64: Intrinsify hasNegatives

Mon Aug 14 10:47:54 UTC 2017

Thanks Dmitrij,
  I'll look at what you've done and try your patch on my machines.

BR,
   Stuart

On 11 August 2017 at 18:30, Dmitrij Pochepko
<dmitrij.pochepko at bell-sw.com> wrote:
> Hi,
>
> please review a new version of this RFR [1] which is significantly
> re-worked.
>
>
> Changes compared to original posting:
>
> - The 2 versions of the hasNegatives intrinsic were merged, which results in
> good performance for both small and large arrays.
>
> - The large array case and the "at-the-end-of-mem-page" case were moved to a
> stub to save code cache space and help the register allocator.
>
>
> Raw performance numbers for the original hasNegativesBench.loopingFastMethod
> [2] are here [3], accompanied by updated comparison charts for Raspberry
> Pi 3 [4] and ThunderX T88 [5]. In short, the intrinsified hasNegatives is 4x
> faster on T88 and 2.5x faster on R-Pi for a 31-byte array, and up to 8x
> faster on large arrays.
>
> I've also created a small and simple benchmark [6] which demonstrates the
> performance difference for the String constructor for strings without
> negative byte values.  Raw results [7] show significantly increased
> performance on ThunderX T88. The results can also be seen on the comparison
> charts [8]. Due to the large amount of allocation and GC, this benchmark is
> not applicable to R-Pi, which has 1GB of system memory and an SD card as its
> main drive.
>
>
> This patch should be considered a patch with 2 contributors
> (stuart.monteith at linaro.org and dmitrij.pochepko at bell-sw.com (openjdk login
> dpochepk)). Also I'd like to thank Andrew Haley for early reviews and
> consulting.
>
> No regressions were found via jtreg tests.
>
> Thanks,
>
> Dmitrij
>
>
> [1] Webrev: http://cr.openjdk.java.net/~dpochepk/8184943/webrev.02/
> [2] http://cr.openjdk.java.net/~aph/HasNegativesBench/
> [3] http://cr.openjdk.java.net/~dpochepk/8184943/perf_numbers.txt
> [4] http://cr.openjdk.java.net/~dpochepk/8184943/Cortex_A53_comparison.png
> [5] http://cr.openjdk.java.net/~dpochepk/8184943/ThunderX_comparison.png
> [6]
> http://cr.openjdk.java.net/~dpochepk/8184943/StringConstructorBench.java
> [7] http://cr.openjdk.java.net/~dpochepk/8184943/StringConstructorBench.txt
> [8]
> http://cr.openjdk.java.net/~dpochepk/8184943/ThunderX-StringConstructor.png
>
>
> On 21.07.2017 11:26, Andrew Haley wrote:
>>
>> On 20/07/17 19:27, Dmitrij Pochepko wrote:
>>>
>>> Probably the best way would be to merge the large data loads from my patch
>>> and Stuart's lightning-fast small-array handling.
>>
>> Yes.
>>
>>> I'll be happy to merge these ideas in one intrinsic that works fastest
>>> on small and large arrays if Stuart does not mind. I could use some help
>>> testing the final solution on some of the HW we don't have. I don't mind
>>> if Stuart wants to merge it, then we'll help him with testing on h/w he
>>> doesn't have.
>>
>> Have fun!  The performance to care about is small strings (< 31 bytes)
>> and, less commonly, very long ones.  Super-fast handling of small strings
>> is very important.
>>
>
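
For readers following the thread, here is a rough Java-level sketch of the
general idea behind checking several bytes per load, as discussed above for
the large-array case. It is not the code in the webrev: the actual intrinsic
is AArch64 assembly, and the class and method names below
(HasNegativesSketch, hasNegativesScalar, hasNegativesWide) are made up for
illustration. The only assumption carried over from the JDK is the
(byte[] ba, int off, int len) shape of hasNegatives.

```java
import java.nio.ByteBuffer;

public class HasNegativesSketch {
    // The sign bit of each of the 8 bytes packed into a long.
    private static final long HIGH_BITS = 0x8080808080808080L;

    // Scalar reference: test one byte at a time.
    static boolean hasNegativesScalar(byte[] ba, int off, int len) {
        for (int i = off; i < off + len; i++) {
            if (ba[i] < 0) {
                return true;
            }
        }
        return false;
    }

    // Wide variant (illustrative only): test 8 bytes per load, then
    // finish the remaining 0..7 bytes one at a time.
    static boolean hasNegativesWide(byte[] ba, int off, int len) {
        ByteBuffer buf = ByteBuffer.wrap(ba, off, len);
        while (buf.remaining() >= 8) {
            // Any set sign bit means at least one byte is negative.
            // Byte order does not matter because the mask is symmetric.
            if ((buf.getLong() & HIGH_BITS) != 0) {
                return true;
            }
        }
        while (buf.hasRemaining()) {
            if (buf.get() < 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] data = new byte[64];
        System.out.println(hasNegativesWide(data, 0, data.length));
        data[50] = (byte) 0xC3; // first byte of a UTF-8 multi-byte sequence
        System.out.println(hasNegativesWide(data, 0, data.length));
    }
}
```

A pure-Java loop like this mainly shows the bit trick; the gains quoted in
the thread come from the hand-written stub, not from this sketch.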