<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>
</p>
<div class="moz-text-html" lang="x-unicode">
<p> </p>
<div class="moz-text-flowed" style="font-family: -moz-fixed;
font-size: 12px;" lang="x-unicode">Hi all, <br>
<br>
please review patch for JDK-8189103: AARCH64: optimize String
indexOf intrinsic <br>
<br>
indexOf instrinsic initially had 6 different algorithms(selected
depending on strings length): pattern length 1, 2, 3, 4,
<generic linear search>, Boyer Moore algorithm. <br>
<br>
This patch consists of few several improvement ideas: <br>
<br>
1) minimally updated algorithms for pattern length 1,2,3,4,
<generic linear search> by removing 1 unnecessary
dependent instruction and moving 1 common instruction into
common part for length 2,3,4, <generic linear search>. It
slightly improves performance for these cases(about 1%). <br>
<br>
2) large update of Boyer Moore algorithm implementation by
increasing search table size from 128 to 256 to remove few
branches for Latin1 encoding cases and also improve performance
for cases when Latin1 symbols with values >128 are met. This
patch also significantly improves search of Latin1 string inside
UTF string by upgrading algorithm logic(the idea is to skip
<pattern length> symbols in case pure UTF symbol is met in
source string). <br>
<br>
3) separate implementation for generic linear search is
added(moved to stub). It requires more time for initialization,
but is faster after initialization is done, which allows to
improve performance for long strings<br>
<br>
4) removed dead code in .ad file. LU case (search of UTF string
inside Latin1 string) always return -1 and handled in java
before calling this instrinsic <br>
<br>
<br>
Benchmarking and results(used Cavium ThunderX (T88), Cavium
ThunderX2 (T99) and Cortex A73): <br>
<br>
I created microbenchmark, which measures different pattern size
search in different source size string: <a
class="moz-txt-link-freetext"
href="http://cr.openjdk.java.net/%7Edpochepk/8189103/IndexOfBench.java">http://cr.openjdk.java.net/~dpochepk/8189103/IndexOfBench.java</a>
<br>
<br>
Overall results: <br>
<br>
- about same results for minimally updated algorithm paths<br>
<br>
- large difference for Boyer Moore algorithm: up to x40
improvement for large string with variety of different
characters<br>
<br>
- significant difference for sizes affected by new linear search
in stub: up to 30% improvement <br>
<br>
<br>
Results matrix is large:
<a class="moz-txt-link-freetext" href="http://cr.openjdk.java.net/~dpochepk/8189103/str_indexof_result.xls">http://cr.openjdk.java.net/~dpochepk/8189103/str_indexof_result.xls</a> <br>
<br>
Raw results: *.txt files in <a class="moz-txt-link-freetext"
href="http://cr.openjdk.java.net/%7Edpochepk/8189103/">http://cr.openjdk.java.net/~dpochepk/8189103/</a><br>
<br>
<br>
webrev: <a class="moz-txt-link-freetext"
href="http://cr.openjdk.java.net/%7Edpochepk/8189103/webrev.01/">http://cr.openjdk.java.net/%7Edpochepk/8189103/webrev.01/</a><br>
<br>
CR: <a class="moz-txt-link-freetext" href="https://bugs.openjdk.java.net/browse/JDK-8189103">https://bugs.openjdk.java.net/browse/JDK-8189103</a><br>
<br>
Testing: <br>
<br>
- run hotspot jtreg tests(compiler, runtime, gc) over patched
release build and found no new failures. <br>
<br>
- run additional "brute force" tests which checks all index
combinations for specified lengths(takes much time to run): <a
class="moz-txt-link-freetext"
href="http://cr.openjdk.java.net/%7Edpochepk/8189103/IndexOfTest.java">http://cr.openjdk.java.net/~dpochepk/8189103/IndexOfTest.java</a>
using both release and fastdebug builds<br>
Thanks, <br>
<br>
Dmitrij <br>
<br>
</div>
</div>
</body>
</html>