RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v2]

Tue Mar 8 16:47:02 UTC 2022

On Tue, 8 Mar 2022 15:13:46 GMT, Claes Redestad <redestad at openjdk.org> wrote:

>> Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Add UTF-16 benchmarks
>
> An awkward effect of this implementation is that it perturbs results on small Strings a bit. Adding a branch in the trivial case, but also regressing on certain lengths (try size=7). The added complexity seem to be no issue for JITs in these microbenchmarks, but I worry that the increased code size might play tricks with inlining heuristics in real applications.
> 
> After chatting a bit with @richardstartin regarding the observation that preventing a strength reduction on the constant 31 value being part of the improvement I devised an experiment which simply makes the 31 non-constant as to disable the strength reduction:
> 
>         private static int base = 31;
>         @Benchmark
>         public int scalarLatin1_NoStrengthReduction() {
>             int h = 0;
>             int i = 0, len = latin1.length;
>             for (; i < len; i++) {
>                 h = base * h + (latin1[i] & 0xff);
>             }
>             return h;
>         }
> 
> Interestingly results of that get planted in the middle of the baseline on large inputs, while avoiding most of the irregularities on small inputs compared to manually unrolled versions:
> 
> Benchmark                                                  (size)  Mode  Cnt   Score   Error  Units
> StringHashCode.Algorithm.scalarLatin1                           1  avgt    3   2.910 ? 0.068  ns/op
> StringHashCode.Algorithm.scalarLatin1                           7  avgt    3   6.530 ? 0.065  ns/op
> StringHashCode.Algorithm.scalarLatin1                           8  avgt    3   7.472 ? 0.034  ns/op
> StringHashCode.Algorithm.scalarLatin1                          15  avgt    3  12.750 ? 0.028  ns/op
> StringHashCode.Algorithm.scalarLatin1                         100  avgt    3  99.190 ? 0.618  ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled16                 1  avgt    3   3.119 ? 0.015  ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled16                 7  avgt    3  11.556 ? 4.690  ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled16                 8  avgt    3   7.740 ? 0.005  ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled16                15  avgt    3  13.030 ? 0.124  ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled16               100  avgt    3  46.470 ? 0.496  ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled8                  1  avgt    3   3.123 ? 0.057  ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled8                  7  avgt    3  11.380 ? 0.085  ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled8                  8  avgt    3   5.849 ? 0.583  ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled8                 15  avgt    3  12.312 ? 0.025  ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled8                100  avgt    3  45.751 ? 0.146  ns/op
> StringHashCode.Algorithm.scalarLatin1_NoStrengthReduction       1  avgt    3   3.173 ? 0.015  ns/op
> StringHashCode.Algorithm.scalarLatin1_NoStrengthReduction       7  avgt    3   5.229 ? 0.455  ns/op
> StringHashCode.Algorithm.scalarLatin1_NoStrengthReduction       8  avgt    3   5.679 ? 0.012  ns/op
> StringHashCode.Algorithm.scalarLatin1_NoStrengthReduction      15  avgt    3   8.731 ? 0.103  ns/op
> StringHashCode.Algorithm.scalarLatin1_NoStrengthReduction     100  avgt    3  70.954 ? 3.386  ns/op
> 
> I wonder if this might be a safer play while we investigate intrinsification and other possible enhancements?

@cl4es Yes, we would need to carefully measure the impact for small array sizes (similar to what we had to do when the array mismatch intrinsic was implemented and applied to array equals). My sense is to focus on the intrinsic and also look for potential opportunities like @merykitty points out, as that is where the larger impact is, although it is more work!

-------------

PR: https://git.openjdk.java.net/jdk/pull/7700