[aarch64-port-dev ] [10] RFR(S): JDK-8184943: AARCH64: Intrinsify hasNegatives
Dmitrij Pochepko
dmitrij.pochepko at bell-sw.com
Fri Aug 11 17:30:20 UTC 2017
Hi,
please review a new version of this RFR [1], which has been
significantly reworked.
Changes compared to original posting:
- The two versions of the hasNegatives intrinsic were merged, which
results in good performance for both small and large arrays.
- The large-array case and the "at-the-end-of-mem-page" case were moved
to a stub to save code cache space and help the register allocator.
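For readers unfamiliar with what the intrinsic computes, here is a minimal
Java sketch of the check, assuming the usual (byte[] ba, int off, int len)
shape of hasNegatives. The class name HasNegativesSketch, the method names
scalar and wide, and the use of ByteBuffer are illustrative assumptions and
are not taken from the webrev; the real intrinsic is AArch64 assembly with
wider loads. The sketch only shows the idea of testing the sign bits of
several bytes per load, with a byte-wise tail.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not code from the patch under review.
final class HasNegativesSketch {
    // Mask with the top (sign) bit of each of the 8 bytes in a long set.
    private static final long SIGN_BITS = 0x8080808080808080L;

    // Byte-at-a-time reference: true if any byte in [off, off+len) is negative.
    static boolean scalar(byte[] ba, int off, int len) {
        for (int i = off; i < off + len; i++) {
            if (ba[i] < 0) {
                return true;
            }
        }
        return false;
    }

    // Word-at-a-time variant: tests 8 bytes per iteration, then the tail.
    static boolean wide(byte[] ba, int off, int len) {
        ByteBuffer bb = ByteBuffer.wrap(ba, off, len);
        while (bb.remaining() >= Long.BYTES) {
            // Byte order is irrelevant here: only the per-byte sign bits matter.
            if ((bb.getLong() & SIGN_BITS) != 0) {
                return true;
            }
        }
        while (bb.hasRemaining()) {
            if (bb.get() < 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] ascii = "plain ASCII text".getBytes(StandardCharsets.US_ASCII);
        byte[] mixed = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(wide(ascii, 0, ascii.length));
        System.out.println(wide(mixed, 0, mixed.length));
    }
}
```

Both methods should agree on every input; the wide variant simply does
fewer iterations on long arrays, which is the effect the large-array path
is after.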
Raw performance numbers for the original
hasNegativesBench.loopingFastMethod [2] are here [3], accompanied by
updated comparison charts for Raspberry Pi 3 [4] and ThunderX T88 [5].
In short, the intrinsified hasNegatives is 4x faster on T88 and 2.5x
faster on R-Pi for a 31-byte array, and up to 8 times faster on large
arrays.
I've also created a small, simple benchmark [6] which demonstrates the
performance difference in the String constructor for strings without
negative byte values. Raw results [7] show significantly increased
performance on ThunderX T88. The results can also be seen on the
comparison chart [8]. Due to the large number of allocations and GC
activity, this benchmark is not suitable for R-Pi, which has 1GB of
system memory and an SD card as its main drive.
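To give a rough idea of the kind of workload involved, here is a crude
sketch that repeatedly constructs a String from bytes without negative
values. This is not the benchmark linked at [6]; the class name
StringCtorSketch, the helper names, the array length and the iteration
counts are all assumptions made for illustration, and a plain timing loop
like this is far less reliable than a proper harness.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in; the real benchmark is the one linked at [6].
final class StringCtorSketch {
    // Builds a byte array of the given length holding only 7-bit ASCII values.
    static byte[] asciiBytes(int length) {
        byte[] data = new byte[length];
        for (int i = 0; i < length; i++) {
            data[i] = (byte) ('a' + (i % 26));
        }
        return data;
    }

    // Constructs `iterations` strings and sums their lengths so the
    // JIT cannot discard the allocations as dead code.
    static long construct(byte[] data, int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            total += new String(data, StandardCharsets.UTF_8).length();
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = asciiBytes(31);
        construct(data, 100_000); // crude warm-up
        long start = System.nanoTime();
        long total = construct(data, 1_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(total + " chars in " + elapsedMs + " ms");
    }
}
```

Every iteration allocates a new String, which is why a loop of this kind
puts heavy pressure on the allocator and GC.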
This patch should be considered a patch with two contributors
(stuart.monteith at linaro.org and dmitrij.pochepko at bell-sw.com (openjdk
login dpochepk)).
Also, I'd like to thank Andrew Haley for his early reviews and advice.
No regressions were found via jtreg tests.
Thanks,
Dmitrij
[1] Webrev: http://cr.openjdk.java.net/~dpochepk/8184943/webrev.02/
[2] http://cr.openjdk.java.net/~aph/HasNegativesBench/
[3] http://cr.openjdk.java.net/~dpochepk/8184943/perf_numbers.txt
[4] http://cr.openjdk.java.net/~dpochepk/8184943/Cortex_A53_comparison.png
[5] http://cr.openjdk.java.net/~dpochepk/8184943/ThunderX_comparison.png
[6] http://cr.openjdk.java.net/~dpochepk/8184943/StringConstructorBench.java
[7] http://cr.openjdk.java.net/~dpochepk/8184943/StringConstructorBench.txt
[8] http://cr.openjdk.java.net/~dpochepk/8184943/ThunderX-StringConstructor.png
On 21.07.2017 11:26, Andrew Haley wrote:
> On 20/07/17 19:27, Dmitrij Pochepko wrote:
>> Probably best way would be to merge large data loads from my patch and
>> Stuart's lightning-fast small arrays handling.
> Yes.
>
>> I'll be happy to merge these ideas in one intrinsic that works fastest
>> on small and large arrays if Stuart does not mind. I could use some help
>> testing the final solution on some of the HW we don't have. I don't mind
>> if Stuart want to merge it, then we'll help him with testing on h/w he
>> doesn't have.
> Have fun! The performance to care about is small strings (< 31 bytes) and,
> less commonly, very long ones. Super-fast handling of small strings is
> very important.
>