RFR(M): 8189112 - AARCH64: optimize StringUTF16 compress intrinsic

Tue May 8 13:26:43 UTC 2018

Hi all,

Please review the patch for 8189112 - AARCH64: optimize StringUTF16 compress
intrinsic.

This patch is based on 3 improvement ideas:

- an additional large loop with a prefetch instruction for long strings
- a different compression implementation that uses the uzp1 and uzp2
instructions instead of uqxtn and uqxtn2, which are more expensive. This
also allows dropping direct FPSR register operations, which are very slow
on some CPUs.
- a slightly different code shape, which mostly executes branches and
independent operations while loads and stores are in flight (helps
"in-order" CPUs)
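
For readers unfamiliar with the uzp1/uzp2 idea, here is a rough scalar
sketch in Java of what the second point amounts to. It is not the code
from the webrev: the class name CompressSketch, the 16-char block size
and the "return len or 0" contract are assumptions made for illustration.
For each block the low bytes are stored (what uzp1 would select) and the
high bytes are OR-ed together (what uzp2 would select), so a single test
per block replaces the saturation check that uqxtn needs.

```java
final class CompressSketch {
    // Assumed block size: 16 chars, i.e. two 128-bit registers of 16-bit lanes.
    static final int BLOCK = 16;

    /**
     * Hypothetical scalar model of the compress operation.
     * Returns len if every char fits in one byte, otherwise 0
     * (the contents of dst are then unspecified).
     */
    static int compress(char[] src, int srcOff, byte[] dst, int dstOff, int len) {
        int i = 0;
        // Block loop: store low bytes, accumulate high bytes, test once per block.
        for (; i + BLOCK <= len; i += BLOCK) {
            int high = 0;
            for (int j = 0; j < BLOCK; j++) {
                char c = src[srcOff + i + j];
                dst[dstOff + i + j] = (byte) c; // low byte, as uzp1 would select
                high |= c >>> 8;                // high byte, as uzp2 would select
            }
            if (high != 0) {
                return 0; // some char in this block does not fit in one byte
            }
        }
        // Scalar tail for the remaining (fewer than BLOCK) chars.
        for (; i < len; i++) {
            char c = src[srcOff + i];
            if (c > 0xFF) {
                return 0;
            }
            dst[dstOff + i] = (byte) c;
        }
        return len;
    }
}
```

In this model the stores of a block do not depend on the check, which is
only evaluated once per block; the real intrinsic does the same with
vector registers rather than a Java loop.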

Benchmarks: I created a JMH benchmark that calls the method directly via reflection:
http://cr.openjdk.java.net/~dpochepk/8189112/StrCompressBench.java

Tested CPUs: ThunderX, ThunderX2, Cortex A73.

Performance results summary:
ThunderX: 3-5% improvement on small strings on average, x1.65 (40%) on 
large strings
ThunderX2: same results on strings with length <8, up to x1.65 (40%) for 
size 8..64, about x4 (80%) improvement for large strings
Cortex A73: up to 8% on small strings, up to x1.65 (40%) on large strings

The detailed results table can be found here:
http://cr.openjdk.java.net/~dpochepk/8189112/str-compress.xls

webrev: http://cr.openjdk.java.net/~dpochepk/8189112/webrev.01/

CR: https://bugs.openjdk.java.net/browse/JDK-8189112

Testing:

- hotspot jtreg tests using release build: ./compiler/*, ./gc/* and 
./runtime/*
- hotspot jtreg tests using fastdebug build: ./compiler/*

No new failures found.

Thanks,

Dmitrij