RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops
Claes Redestad
redestad at openjdk.java.net
Fri Mar 4 17:18:06 UTC 2022
On Fri, 4 Mar 2022 15:54:14 GMT, Ludovic Henry <luhenry at openjdk.org> wrote:
> Despite the hash value being cached for Strings, computing the hash still represents a significant CPU usage for applications handling lots of text.
>
> Even though it would be generally better to do it through an enhancement to the autovectorizer, the complexity of doing it by hand is trivial and the gain is sizable (2x speedup) even without the Vector API. The algorithm has been proposed by Richard Startin and Paul Sandoz [1].
>
> Speedup are as follows on a `Intel(R) Xeon(R) E-2276G CPU @ 3.80GHz`
>
>
> Benchmark (size) Mode Cnt Score Error Units
> StringHashCode.Algorithm.scalar 0 avgt 1.799 ns/op
> StringHashCode.Algorithm.scalar 1 avgt 3.233 ns/op
> StringHashCode.Algorithm.scalar 10 avgt 6.617 ns/op
> StringHashCode.Algorithm.scalar 100 avgt 61.481 ns/op
> StringHashCode.Algorithm.scalar 1000 avgt 631.690 ns/op
> StringHashCode.Algorithm.scalar 10000 avgt 6293.611 ns/op
> StringHashCode.Algorithm.scalarUnrolled8 0 avgt 1.890 ns/op
> StringHashCode.Algorithm.scalarUnrolled8 1 avgt 3.494 ns/op
> StringHashCode.Algorithm.scalarUnrolled8 10 avgt 9.050 ns/op
> StringHashCode.Algorithm.scalarUnrolled8 100 avgt 31.725 ns/op
> StringHashCode.Algorithm.scalarUnrolled8 1000 avgt 321.031 ns/op
> StringHashCode.Algorithm.scalarUnrolled8 10000 avgt 3203.838 ns/op
> StringHashCode.Algorithm.scalarUnrolled16 0 avgt 1.953 ns/op
> StringHashCode.Algorithm.scalarUnrolled16 1 avgt 3.485 ns/op
> StringHashCode.Algorithm.scalarUnrolled16 10 avgt 7.041 ns/op
> StringHashCode.Algorithm.scalarUnrolled16 100 avgt 30.975 ns/op
> StringHashCode.Algorithm.scalarUnrolled16 1000 avgt 316.616 ns/op
> StringHashCode.Algorithm.scalarUnrolled16 10000 avgt 3208.658 ns/op
>
>
> At Datadog, we handle a great amount of text (through logs management for example), and hashing String represents a large part of our CPU usage. It's very unlikely that we are the only one as String.hashCode is such a core feature of the JVM-based languages with its use in HashMap for example. Having even only a 2x speedup would allow us to save thousands of CPU cores per month and improve correspondingly the energy/carbon impact.
>
> [1] https://static.rainfocus.com/oracle/oow18/sess/1525822677955001tLqU/PF/codeone18-vector-API-DEV5081_1540354883936001Q3Sv.pdf
My only comment is that I'd like to see some benchmark verifying also the UTF-16 code path. It should see a similar speed-up, but adding plenty of calls might mess up compilation and inlining.
-------------
PR: https://git.openjdk.java.net/jdk/pull/7700
More information about the core-libs-dev
mailing list