RFR: 8321283: Reuse StringLatin1::equals in regionMatches

Francesco Nigro duke at openjdk.org
Wed Nov 26 20:14:08 UTC 2025


On Sat, 2 Dec 2023 16:56:22 GMT, Francesco Nigro <duke at openjdk.org> wrote:

> This improvement has been found on https://github.com/vert-x3/vertx-web/pull/2526.
> 
> It can potentially affect the performance of the existing ArraysSupport.mismatch caller code path, i.e. this requires investigation.

@schlosna 

Running `TEST="micro:java.lang.StringComparisons.regionMatches"` on an AMD 7950X at 4.5 GHz, with the tuned network-latency profile on and turbo boost disabled, using `numactl --localalloc -N 0` to avoid weird NUMA-like effects on heap objects.

baseline at 25f9af99be1c906fc85b8192df8fa50cced3474f:

Benchmark                             (size)  (utf16)  Mode  Cnt    Score   Error  Units
StringComparisons.regionMatches            6     true  avgt    5    4.380 ± 0.030  ns/op
StringComparisons.regionMatches            6    false  avgt    5    5.772 ± 0.056  ns/op
StringComparisons.regionMatches           15     true  avgt    5    4.005 ± 0.104  ns/op
StringComparisons.regionMatches           15    false  avgt    5    4.030 ± 0.055  ns/op
StringComparisons.regionMatches         1024     true  avgt    5   30.037 ± 0.089  ns/op
StringComparisons.regionMatches         1024    false  avgt    5   17.734 ± 0.092  ns/op
StringComparisons.regionMatchesRange       6     true  avgt    5    4.825 ± 0.067  ns/op
StringComparisons.regionMatchesRange       6    false  avgt    5    5.878 ± 0.056  ns/op
StringComparisons.regionMatchesRange      15     true  avgt    5    5.736 ± 0.069  ns/op
StringComparisons.regionMatchesRange      15    false  avgt    5    5.447 ± 0.028  ns/op
StringComparisons.regionMatchesRange    1024     true  avgt    5   31.169 ± 0.009  ns/op
StringComparisons.regionMatchesRange    1024    false  avgt    5   16.614 ± 0.168  ns/op

With 1bd619a5bd2faa8057cb85105b2c9b4997fbf2ac :

Benchmark                             (size)  (utf16)  Mode  Cnt    Score   Error  Units
StringComparisons.regionMatches            6     true  avgt    5    3.535 ± 0.022  ns/op
StringComparisons.regionMatches            6    false  avgt    5    3.134 ± 0.022  ns/op
StringComparisons.regionMatches           15     true  avgt    5    2.568 ± 0.022  ns/op
StringComparisons.regionMatches           15    false  avgt    5    3.415 ± 0.017  ns/op
StringComparisons.regionMatches         1024     true  avgt    5   30.052 ± 0.070  ns/op
StringComparisons.regionMatches         1024    false  avgt    5   17.024 ± 0.111  ns/op
StringComparisons.regionMatchesRange       6     true  avgt    5    4.819 ± 0.010  ns/op
StringComparisons.regionMatchesRange       6    false  avgt    5    5.888 ± 0.083  ns/op
StringComparisons.regionMatchesRange      15     true  avgt    5    5.849 ± 0.106  ns/op
StringComparisons.regionMatchesRange      15    false  avgt    5    5.466 ± 0.069  ns/op
StringComparisons.regionMatchesRange    1024     true  avgt    5   31.177 ± 0.015  ns/op
StringComparisons.regionMatchesRange    1024    false  avgt    5   16.872 ± 0.387  ns/op

This translates into a speedup of up to ~1.8x for small strings (size 6, utf16 = false: 5.772 to 3.134 ns/op), which is still a fairly common use case, while bigger ones seem unchanged.
I'm adding a better benchmark to show the improvement in the positive (matching) test case as well.
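
For reference, the speedup figure can be recomputed from the `regionMatches` scores in the two tables above. This is only a small sketch: the class and method names are made up for illustration, and the numbers are copied verbatim from the JMH output (error margins are ignored).

```java
import java.util.Locale;

public class SpeedupCheck {
    // Ratio of baseline time to patched time; values above 1.0 mean the patch is faster.
    static double speedup(double baselineNs, double patchedNs) {
        return baselineNs / patchedNs;
    }

    public static void main(String[] args) {
        // Columns: size, utf16 (1 = true, 0 = false), baseline ns/op, patched ns/op.
        // Scores are the StringComparisons.regionMatches rows from the tables above.
        double[][] rows = {
            {6, 1, 4.380, 3.535},
            {6, 0, 5.772, 3.134},
            {15, 1, 4.005, 2.568},
            {15, 0, 4.030, 3.415},
            {1024, 1, 30.037, 30.052},
            {1024, 0, 17.734, 17.024},
        };
        for (double[] r : rows) {
            System.out.printf(Locale.ROOT, "size=%4d utf16=%-5b speedup=%.2fx%n",
                    (int) r[0], r[1] == 1, speedup(r[2], r[3]));
        }
    }
}
```

The size 6, utf16 = false row is the one that gives the ~1.8x figure; the 1024 rows stay close to 1.0x.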

The new commit introduces the full positive use case (maybe relevant for strings with few characters) and adds
a comparison against `String::equals` (which will likely perform the bare minimum amount of checks compared to `regionMatches`).
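
To make the comparison concrete, here is a minimal sketch of why a full-length `regionMatches` call can be measured against `String::equals`: for two strings of the same length, both answer the same question. The helper name `sameViaRegionMatches` is hypothetical and is not taken from the benchmark source; the actual benchmark methods may be written differently.

```java
public class RegionMatchesVsEquals {
    // Hypothetical helper: whole-string comparison expressed through regionMatches.
    // regionMatches validates offsets and length before comparing, which is the
    // extra work that equals does not need to do.
    static boolean sameViaRegionMatches(String a, String b) {
        return a.length() == b.length() && a.regionMatches(0, b, 0, b.length());
    }

    public static void main(String[] args) {
        String latin1 = "abcdef";        // 6 characters, all representable in Latin-1
        String latin1Copy = new String(latin1.toCharArray());
        String utf16 = "\u4f60\u597d";   // characters outside Latin-1
        String utf16Copy = new String(utf16.toCharArray());

        System.out.println(latin1.equals(latin1Copy) == sameViaRegionMatches(latin1, latin1Copy));
        System.out.println(utf16.equals(utf16Copy) == sameViaRegionMatches(utf16, utf16Copy));
        System.out.println(latin1.equals("abcdeX") == sameViaRegionMatches(latin1, "abcdeX"));
    }
}
```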

baseline at 25f9af99be1c906fc85b8192df8fa50cced3474f:

Benchmark                            (size)  (utf16)  Mode  Cnt   Score   Error  Units
StringComparisons.same                    6     true  avgt    5   2.402 ± 0.028  ns/op
StringComparisons.same                    6    false  avgt    5   2.056 ± 0.056  ns/op
StringComparisons.same                   15     true  avgt    5   3.733 ± 0.161  ns/op
StringComparisons.same                   15    false  avgt    5   2.807 ± 0.214  ns/op
StringComparisons.same                 1024     true  avgt    5  23.485 ± 0.150  ns/op
StringComparisons.same                 1024    false  avgt    5  15.302 ± 0.232  ns/op

StringComparisons.sameRegionMatches       6     true  avgt    5   4.410 ± 0.078  ns/op
StringComparisons.sameRegionMatches       6    false  avgt    5   5.414 ± 0.028  ns/op
StringComparisons.sameRegionMatches      15     true  avgt    5   5.770 ± 0.021  ns/op
StringComparisons.sameRegionMatches      15    false  avgt    5   5.771 ± 0.035  ns/op
StringComparisons.sameRegionMatches    1024     true  avgt    5  30.964 ± 0.023  ns/op
StringComparisons.sameRegionMatches    1024    false  avgt    5  16.807 ± 0.181  ns/op

with 1bd619a5bd2faa8057cb85105b2c9b4997fbf2ac:

Benchmark                            (size)  (utf16)  Mode  Cnt   Score   Error  Units
StringComparisons.sameRegionMatches       6     true  avgt    5   3.442 ± 0.016  ns/op
StringComparisons.sameRegionMatches       6    false  avgt    5   3.117 ± 0.002  ns/op
StringComparisons.sameRegionMatches      15     true  avgt    5   4.759 ± 0.075  ns/op
StringComparisons.sameRegionMatches      15    false  avgt    5   3.813 ± 0.026  ns/op
StringComparisons.sameRegionMatches    1024     true  avgt    5  28.308 ± 0.058  ns/op
StringComparisons.sameRegionMatches    1024    false  avgt    5  16.774 ± 0.220  ns/op

As the previous results confirm, this PR mostly benefits small strings, but it is yet to be verified whether:
- this is due to better tail handling in the `equals` intrinsic
- larger strings are limited by the amount of cache activity

Running `perfnorm` against 25f9af99be1c906fc85b8192df8fa50cced3474f with `same` vs `sameRegionMatches` reveals that
both have a high IPC with a high number of branches and L1 cache loads (with nearly no misses), which means that branches and loads dominate the cost of computation, even though `sameRegionMatches` executes more instructions.
In short, with bigger strings the difference between the two intrinsics fades away, and whichever executes fewer instructions performs better.

@cl4es I can undraft this, but I have no powers to create a JDK issue myself :"/

Keep it alive!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16933#issuecomment-1838571952
PR Comment: https://git.openjdk.org/jdk/pull/16933#issuecomment-1838744137
PR Comment: https://git.openjdk.org/jdk/pull/16933#issuecomment-1838950133
PR Comment: https://git.openjdk.org/jdk/pull/16933#issuecomment-2048905394

