<i18n dev> RFR: 8302871: Speed up StringLatin1.regionMatchesCI

Eirik Bjorsnos duke at openjdk.org
Mon Feb 20 13:19:17 UTC 2023


On Sat, 18 Feb 2023 09:21:25 GMT, Eirik Bjorsnos <duke at openjdk.org> wrote:

> This PR suggests we can speed up `StringLatin1.regionMatchesCI` by applying 'the oldest ASCII trick in the book'.
> 
> The new static method `CharacterDataLatin1.equalsIgnoreCase` compares two latin1 bytes for equality, ignoring case. `StringLatin1.regionMatchesCI` is updated to use `equalsIgnoreCase`.
> 
> To verify the correctness of `equalsIgnoreCase`, a new test is added to `EqualsIgnoreCase`. It exhaustively verifies that, for all 256x256 latin1 code point pairs, the result of `equalsIgnoreCase` is consistent with `Character.toUpperCase` and `Character.toLowerCase`.
> 
> Performance is tested for matching and mismatching cases of code point pairs picked from the ASCII letter, ASCII number and latin1 letter ranges. Results are in the first comment below.
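
For readers unfamiliar with the trick mentioned in the quoted description, here is a minimal sketch of how it can work. It is not the code from the PR: the class `Latin1CaseSketch`, the helpers `reference` and `countMismatches`, and the method bodies are all hypothetical, written under the assumption that two latin1 bytes match ignoring case exactly when they are equal or are the two cases of the same letter. In ASCII and latin1, the two cases of a letter differ only in bit 0x20, so clearing that bit and range-checking the result replaces two case conversions per byte.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; names and bodies are assumptions, not the PR's code.
final class Latin1CaseSketch {

    // Compares two latin1 bytes for equality, ignoring case.
    static boolean equalsIgnoreCase(byte b1, byte b2) {
        if (b1 == b2) {
            return true;
        }
        // Clearing bit 0x20 maps a-z onto A-Z and 0xE0-0xFE onto 0xC0-0xDE.
        int upper = b1 & 0xDF;
        boolean asciiLetter = upper >= 'A' && upper <= 'Z';
        // 0xD7 (multiplication sign) pairs with 0xF7 (division sign); neither is a letter.
        boolean latin1Letter = upper >= 0xC0 && upper <= 0xDE && upper != 0xD7;
        return (asciiLetter || latin1Letter) && upper == (b2 & 0xDF);
    }

    // Hypothetical region comparison loop built on the byte comparison above.
    static boolean regionMatchesCI(byte[] value, int toffset,
                                   byte[] other, int ooffset, int len) {
        for (int i = 0; i < len; i++) {
            if (!equalsIgnoreCase(value[toffset + i], other[ooffset + i])) {
                return false;
            }
        }
        return true;
    }

    // Reference answer derived from the Character case-mapping methods.
    static boolean reference(int c1, int c2) {
        return c1 == c2
                || Character.toUpperCase(c1) == Character.toUpperCase(c2)
                || Character.toLowerCase(c1) == Character.toLowerCase(c2);
    }

    // Exhaustive check over all 256x256 latin1 code point pairs.
    static int countMismatches() {
        int mismatches = 0;
        for (int c1 = 0; c1 < 256; c1++) {
            for (int c2 = 0; c2 < 256; c2++) {
                if (equalsIgnoreCase((byte) c1, (byte) c2) != reference(c1, c2)) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        byte[] a = "Blåbær 123".getBytes(StandardCharsets.ISO_8859_1);
        byte[] b = "BLÅBÆR 123".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("region matches: " + regionMatchesCI(a, 0, b, 0, a.length));
        System.out.println("mismatches vs Character methods: " + countMismatches());
    }
}
```

The exhaustive loop mirrors the kind of 256x256 verification the description mentions. The range checks matter: without them, non-letter pairs that differ only in bit 0x20, such as `@` and the backtick, or 0xD7 and 0xF7, would wrongly compare as equal.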

Benchmark results:

Baseline:


Benchmark                                  (codePoints)  (size)  Mode  Cnt     Score    Error  Units
RegionMatchesIC.Latin1.regionMatchesIC      ascii-match    1024  avgt   15  2216.525 ± 79.626  ns/op
RegionMatchesIC.Latin1.regionMatchesIC   ascii-mismatch    1024  avgt   15     5.049 ±  0.044  ns/op
RegionMatchesIC.Latin1.regionMatchesIC     number-match    1024  avgt   15   708.977 ± 19.381  ns/op
RegionMatchesIC.Latin1.regionMatchesIC  number-mismatch    1024  avgt   15     3.726 ±  0.036  ns/op
RegionMatchesIC.Latin1.regionMatchesIC       lat1-match    1024  avgt   15  2134.499 ± 23.064  ns/op
RegionMatchesIC.Latin1.regionMatchesIC    lat1-mismatch    1024  avgt   15     4.227 ±  0.070  ns/op


Patch:


Benchmark                                  (codePoints)  (size)  Mode  Cnt     Score    Error  Units
RegionMatchesIC.Latin1.regionMatchesIC      ascii-match    1024  avgt   15   809.729 ± 40.257  ns/op
RegionMatchesIC.Latin1.regionMatchesIC   ascii-mismatch    1024  avgt   15     4.334 ±  0.031  ns/op
RegionMatchesIC.Latin1.regionMatchesIC     number-match    1024  avgt   15   370.814 ± 39.790  ns/op
RegionMatchesIC.Latin1.regionMatchesIC  number-mismatch    1024  avgt   15     3.766 ±  0.072  ns/op
RegionMatchesIC.Latin1.regionMatchesIC       lat1-match    1024  avgt   15  1247.979 ±  7.826  ns/op
RegionMatchesIC.Latin1.regionMatchesIC    lat1-mismatch    1024  avgt   15     4.819 ±  0.026  ns/op

-------------

PR: https://git.openjdk.org/jdk/pull/12632

