<i18n dev> RFR: 8302871: Speed up StringLatin1.regionMatchesCI [v10]

Martin Buchholz martin at openjdk.org
Wed Feb 22 16:33:39 UTC 2023


On Wed, 22 Feb 2023 07:11:16 GMT, Eirik Bjorsnos <duke at openjdk.org> wrote:

>> This PR suggests we can speed up `StringLatin1.regionMatchesCI` by applying 'the oldest ASCII trick in the book'.
>> 
>> The new static method `CharacterDataLatin1.equalsIgnoreCase` compares two latin1 bytes for equality ignoring case. `StringLatin1.regionMatchesCI` is updated to use `equalsIgnoreCase`.
>> 
>> To verify the correctness of `equalsIgnoreCase`, a new test is added to `EqualsIgnoreCase` that exhaustively checks that, for all 256x256 latin1 code point pairs, `equalsIgnoreCase` is consistent with `Character.toUpperCase` and `Character.toLowerCase`.
>> 
>> Performance is tested for matching and mismatching cases of code point pairs picked from the ASCII letter, ASCII number and latin1 letter ranges. Results in the first comment below.
>
> Eirik Bjorsnos has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Replace 'oldest ASCII trick in the book' use in toUpperCase, toLowerCase with "by removing (setting) a single bit"
>  - Align local variable naming in toLowerCase, toUpperCase with equalsIgnoreCase by using 'lower' and 'upper'
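
For readers following along, the single-bit idea described in the quoted text can be sketched roughly as below. This is a minimal illustration under stated assumptions, not necessarily the code in the PR: the class name `Latin1CaseSketch` is hypothetical, and the range checks assume the standard ISO 8859-1 layout, in which the upper and lower case forms of a letter differ only in bit 0x20.

```java
// Hypothetical sketch of a case-insensitive comparison of two Latin-1 bytes.
final class Latin1CaseSketch {

    static boolean equalsIgnoreCase(byte b1, byte b2) {
        if (b1 == b2) {
            return true;
        }
        // Clearing bit 0x20 maps a lowercase letter onto its uppercase form.
        // The mask also discards the sign extension of the byte.
        int upper = b1 & 0xDF;
        if (upper < 'A') {
            return false; // below the letter ranges
        }
        // Only genuine letters may match across case: A-Z, or the Latin-1
        // range 0xC0-0xDE excluding the multiplication sign 0xD7.
        boolean isLetter = upper <= 'Z'
                || (upper >= 0xC0 && upper <= 0xDE && upper != 0xD7);
        return isLetter && upper == (b2 & 0xDF);
    }

    public static void main(String[] args) {
        System.out.println(equalsIgnoreCase((byte) 'a', (byte) 'A'));
        System.out.println(equalsIgnoreCase((byte) '@', (byte) '`'));
    }
}
```

The letter-range check is what keeps non-letter pairs that happen to differ only in bit 0x20, such as '@' and '`', from comparing equal.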

Marked as reviewed by martin (Reviewer).

test/jdk/java/lang/String/CompactString/EqualsIgnoreCase.java line 89:

> 87:         for (int ab = 0; ab < 256; ab++) {
> 88:             for (int bb = 0; bb < 256; bb++) {
> 89:                 char a = (char) ab, b = (char) bb;

`char` is an unsigned numeric type, so it would be cleaner to declare the loop variables as `char` directly and drop the casts:

    for (char a = 0; a < 256; a++) {
        for (char b = 0; b < 256; b++) {
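
A self-contained sketch of what such an exhaustive check might look like with `char` loop variables is given below. The names `ExhaustiveLatin1Check`, `CharPairPredicate`, `reference` and `countMismatches` are hypothetical and do not come from the actual test; the reference predicate is one plausible way to express consistency with `Character.toUpperCase` and `Character.toLowerCase`.

```java
// Hypothetical sketch: compare a candidate predicate against a reference
// for every pair of Latin-1 code points, using char loop variables.
final class ExhaustiveLatin1Check {

    interface CharPairPredicate {
        boolean test(char a, char b);
    }

    // Reference: equal, or equal after upper-casing, or equal after
    // lower-casing the upper-cased characters.
    static boolean reference(char a, char b) {
        char ua = Character.toUpperCase(a);
        char ub = Character.toUpperCase(b);
        return a == b || ua == ub
                || Character.toLowerCase(ua) == Character.toLowerCase(ub);
    }

    static int countMismatches(CharPairPredicate candidate) {
        int mismatches = 0;
        // char ranges up to 65535, so the 'a < 256' bound terminates
        // without overflow and no int-to-char cast is needed.
        for (char a = 0; a < 256; a++) {
            for (char b = 0; b < 256; b++) {
                if (candidate.test(a, b) != reference(a, b)) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Candidate under test here: String.equalsIgnoreCase on 1-char strings.
        int n = countMismatches((a, b) ->
                String.valueOf(a).equalsIgnoreCase(String.valueOf(b)));
        System.out.println("mismatches: " + n);
    }
}
```

Note that the same loop shape would not work with `byte` loop variables, since a `byte` can never reach 256.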

-------------

PR: https://git.openjdk.org/jdk/pull/12632

