RFR(M): 8189112 - AARCH64: optimize StringUTF16 compress intrinsic
Dmitrij Pochepko
dmitrij.pochepko at bell-sw.com
Tue May 8 13:26:43 UTC 2018
Hi all,
please review the patch for 8189112 - AARCH64: optimize StringUTF16
compress intrinsic.
This patch is based on three improvement ideas:
- an additional large loop with a prefetch instruction for long strings
- a different compression implementation that uses the uzp1 and uzp2
instructions instead of the more expensive uqxtn and uqxtn2. This also
makes it possible to drop direct FPSR register operations, which are
very slow on some CPUs.
- a slightly different code shape, which mostly executes branches and
independent operations while loads and stores are in flight (this helps
"in-order" CPUs)
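To make the second idea concrete, here is a rough scalar model in Java; it is not the intrinsic itself. It assumes compress returns len when every char fits in one byte and 0 otherwise, and that uzp1/uzp2 are used to separate the low and high bytes of each little-endian 16-bit char, so that a single check of the OR-ed high bytes per block replaces the saturation check. The class name, method names and the block size of 8 are hypothetical and chosen only for illustration.

```java
// Illustrative sketch only: a scalar Java model, not the AArch64 stub code.
final class CompressSketch {
    // Hypothetical block size; the real stub works on vector registers.
    private static final int BLOCK = 8;

    // Straightforward reference: stop at the first char that needs two bytes.
    static int compressScalar(char[] src, int srcOff, byte[] dst, int dstOff, int len) {
        for (int i = 0; i < len; i++) {
            char c = src[srcOff + i];
            if (c > 0xFF) {
                return 0; // not compressible
            }
            dst[dstOff + i] = (byte) c;
        }
        return len;
    }

    // Block-wise model: store the low bytes unconditionally (the lanes uzp1
    // would select) and OR the high bytes together (the lanes uzp2 would
    // select), then test the accumulated high bytes once per block.
    static int compressBlocked(char[] src, int srcOff, byte[] dst, int dstOff, int len) {
        int i = 0;
        for (; i + BLOCK <= len; i += BLOCK) {
            int high = 0;
            for (int j = 0; j < BLOCK; j++) {
                char c = src[srcOff + i + j];
                dst[dstOff + i + j] = (byte) c; // low byte of the char
                high |= c >>> 8;                // high byte of the char
            }
            if (high != 0) {
                return 0; // some char in this block was above 0xFF
            }
        }
        // Tail shorter than one block: fall back to the per-char check.
        for (; i < len; i++) {
            char c = src[srcOff + i];
            if (c > 0xFF) {
                return 0;
            }
            dst[dstOff + i] = (byte) c;
        }
        return len;
    }
}
```

Both methods return the same value for the same input. On failure the block-wise version may already have written a whole block to dst, which does not matter in this model because the caller discards dst when 0 is returned.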
Benchmarks: I created a JMH benchmark that calls the method directly via
reflection:
http://cr.openjdk.java.net/~dpochepk/8189112/StrCompressBench.java
Tested CPUs: ThunderX, ThunderX2, Cortex A73.
Performance results summary:
ThunderX: 3-5% improvement on small strings on average, x1.65 (40%) on
large strings
ThunderX2: same results on strings with length <8, up to x1.65 (40%) for
size 8..64, about x4 (80%) improvement for large strings
Cortex A73: up to 8% on small strings, up to x1.65 (40%) on large strings
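A note on reading the "xN (P%)" pairs above: they appear to give the speedup factor and the matching reduction in execution time, which is an assumption based on x1.65 corresponding to roughly 40% less time. A minimal sketch of that conversion, with a hypothetical class and method name:

```java
import java.util.Locale;

// Illustrative helper, assuming P% means "percent less execution time".
final class SpeedupMath {
    // A speedup of s means the new time is 1/s of the old time,
    // so the time saved is (1 - 1/s), expressed here in percent.
    static double timeReductionPercent(double speedup) {
        return (1.0 - 1.0 / speedup) * 100.0;
    }

    public static void main(String[] args) {
        // prints "39.4"
        System.out.println(String.format(Locale.ROOT, "%.1f", timeReductionPercent(1.65)));
    }
}
```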
The detailed results table can be found here:
http://cr.openjdk.java.net/~dpochepk/8189112/str-compress.xls
webrev: http://cr.openjdk.java.net/~dpochepk/8189112/webrev.01/
CR: https://bugs.openjdk.java.net/browse/JDK-8189112
Testing:
- hotspot jtreg tests using release build: ./compiler/*, ./gc/* and
./runtime/*
- hotspot jtreg tests using fastdebug build: ./compiler/*
No new failures found
Thanks,
Dmitrij