RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops
Ludovic Henry
luhenry at openjdk.java.net
Fri Mar 4 17:33:23 UTC 2022
On Fri, 4 Mar 2022 17:01:05 GMT, Roger Riggs <rriggs at openjdk.org> wrote:
>> Despite the hash value being cached for Strings, computing the hash still represents a significant CPU usage for applications handling lots of text.
>>
>> Even though it would be generally better to do it through an enhancement to the autovectorizer, the complexity of doing it by hand is trivial and the gain is sizable (2x speedup) even without the Vector API. The algorithm has been proposed by Richard Startin and Paul Sandoz [1].
>>
>> Speedup are as follows on a `Intel(R) Xeon(R) E-2276G CPU @ 3.80GHz`
>>
>>
>> Benchmark (size) Mode Cnt Score Error Units
>> StringHashCode.Algorithm.scalar 0 avgt 1.799 ns/op
>> StringHashCode.Algorithm.scalar 1 avgt 3.233 ns/op
>> StringHashCode.Algorithm.scalar 10 avgt 6.617 ns/op
>> StringHashCode.Algorithm.scalar 100 avgt 61.481 ns/op
>> StringHashCode.Algorithm.scalar 1000 avgt 631.690 ns/op
>> StringHashCode.Algorithm.scalar 10000 avgt 6293.611 ns/op
>> StringHashCode.Algorithm.scalarUnrolled8 0 avgt 1.890 ns/op
>> StringHashCode.Algorithm.scalarUnrolled8 1 avgt 3.494 ns/op
>> StringHashCode.Algorithm.scalarUnrolled8 10 avgt 9.050 ns/op
>> StringHashCode.Algorithm.scalarUnrolled8 100 avgt 31.725 ns/op
>> StringHashCode.Algorithm.scalarUnrolled8 1000 avgt 321.031 ns/op
>> StringHashCode.Algorithm.scalarUnrolled8 10000 avgt 3203.838 ns/op
>> StringHashCode.Algorithm.scalarUnrolled16 0 avgt 1.953 ns/op
>> StringHashCode.Algorithm.scalarUnrolled16 1 avgt 3.485 ns/op
>> StringHashCode.Algorithm.scalarUnrolled16 10 avgt 7.041 ns/op
>> StringHashCode.Algorithm.scalarUnrolled16 100 avgt 30.975 ns/op
>> StringHashCode.Algorithm.scalarUnrolled16 1000 avgt 316.616 ns/op
>> StringHashCode.Algorithm.scalarUnrolled16 10000 avgt 3208.658 ns/op
>>
>>
>> At Datadog, we handle a great amount of text (through logs management for example), and hashing String represents a large part of our CPU usage. It's very unlikely that we are the only one as String.hashCode is such a core feature of the JVM-based languages with its use in HashMap for example. Having even only a 2x speedup would allow us to save thousands of CPU cores per month and improve correspondingly the energy/carbon impact.
>>
>> [1] https://static.rainfocus.com/oracle/oow18/sess/1525822677955001tLqU/PF/codeone18-vector-API-DEV5081_1540354883936001Q3Sv.pdf
>
> This straight forward enough. But I wonder if this should be a hotspot intrinsic and be able to take advantage vector machine instructions.
> I'd also expect there is an existing test that checks the value of String.hashCode.
@RogerRiggs I would be happy to do such work. As it would be a bigger change, are you suggesting that it could be done or that it should be done as an intrinsic?
@cl4es adding that right now.
-------------
PR: https://git.openjdk.java.net/jdk/pull/7700
More information about the core-libs-dev
mailing list