RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v2]
Paul Sandoz
psandoz at openjdk.java.net
Tue Mar 8 16:47:02 UTC 2022
On Tue, 8 Mar 2022 15:13:46 GMT, Claes Redestad <redestad at openjdk.org> wrote:
>> Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Add UTF-16 benchmarks
>
> An awkward effect of this implementation is that it perturbs results on small Strings a bit. Adding a branch in the trivial case, but also regressing on certain lengths (try size=7). The added complexity seem to be no issue for JITs in these microbenchmarks, but I worry that the increased code size might play tricks with inlining heuristics in real applications.
>
> After chatting a bit with @richardstartin regarding the observation that preventing a strength reduction on the constant 31 value being part of the improvement I devised an experiment which simply makes the 31 non-constant as to disable the strength reduction:
>
> private static int base = 31;
> @Benchmark
> public int scalarLatin1_NoStrengthReduction() {
> int h = 0;
> int i = 0, len = latin1.length;
> for (; i < len; i++) {
> h = base * h + (latin1[i] & 0xff);
> }
> return h;
> }
>
> Interestingly results of that get planted in the middle of the baseline on large inputs, while avoiding most of the irregularities on small inputs compared to manually unrolled versions:
>
> Benchmark (size) Mode Cnt Score Error Units
> StringHashCode.Algorithm.scalarLatin1 1 avgt 3 2.910 ? 0.068 ns/op
> StringHashCode.Algorithm.scalarLatin1 7 avgt 3 6.530 ? 0.065 ns/op
> StringHashCode.Algorithm.scalarLatin1 8 avgt 3 7.472 ? 0.034 ns/op
> StringHashCode.Algorithm.scalarLatin1 15 avgt 3 12.750 ? 0.028 ns/op
> StringHashCode.Algorithm.scalarLatin1 100 avgt 3 99.190 ? 0.618 ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled16 1 avgt 3 3.119 ? 0.015 ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled16 7 avgt 3 11.556 ? 4.690 ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled16 8 avgt 3 7.740 ? 0.005 ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled16 15 avgt 3 13.030 ? 0.124 ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled16 100 avgt 3 46.470 ? 0.496 ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled8 1 avgt 3 3.123 ? 0.057 ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled8 7 avgt 3 11.380 ? 0.085 ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled8 8 avgt 3 5.849 ? 0.583 ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled8 15 avgt 3 12.312 ? 0.025 ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled8 100 avgt 3 45.751 ? 0.146 ns/op
> StringHashCode.Algorithm.scalarLatin1_NoStrengthReduction 1 avgt 3 3.173 ? 0.015 ns/op
> StringHashCode.Algorithm.scalarLatin1_NoStrengthReduction 7 avgt 3 5.229 ? 0.455 ns/op
> StringHashCode.Algorithm.scalarLatin1_NoStrengthReduction 8 avgt 3 5.679 ? 0.012 ns/op
> StringHashCode.Algorithm.scalarLatin1_NoStrengthReduction 15 avgt 3 8.731 ? 0.103 ns/op
> StringHashCode.Algorithm.scalarLatin1_NoStrengthReduction 100 avgt 3 70.954 ? 3.386 ns/op
>
> I wonder if this might be a safer play while we investigate intrinsification and other possible enhancements?
@cl4es Yes, we would need to carefully measure the impact for small array sizes (similar to what we had to do when the array mismatch intrinsic was implemented and applied to array equals). My sense is to focus on the intrinsic and also look for potential opportunities like @merykitty points out, as that is where the larger impact is, although it is more work!
-------------
PR: https://git.openjdk.java.net/jdk/pull/7700
More information about the core-libs-dev
mailing list