Withdrawn: 8302872: Speed up StringLatin1.regionMatchesCI_UTF16
duke
duke at openjdk.org
Tue May 9 23:22:28 UTC 2023
On Sat, 18 Feb 2023 18:22:49 GMT, Eirik Bjorsnos <duke at openjdk.org> wrote:
> This PR continues the efforts from #12632 to speed up case-insensitive string matching.
>
> We now tackle case-insensitive comparison of mixed-coder strings (one Latin1-coded, one UTF16-coded), implemented in `StringLatin1.regionMatchesCI_UTF16`.
>
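For context, here is a minimal sketch of the baseline being optimized. It applies the per-character rule documented for `String.regionMatches(true, ...)` to a Latin1 region against a UTF16 region: two chars match if they are equal, if they are equal after `Character.toUpperCase`, or if they are equal after `toUpperCase` followed by `toLowerCase`. The sketch assumes a simplified representation, `byte[]` for Latin1 and `char[]` for UTF16, rather than the JDK's internal layout, where both coders use `byte[]`. `MixedCoderBaseline` and its method are hypothetical names, and this is not the JDK source.

```java
/** Hypothetical baseline: compare a Latin1 region with a UTF16 region, ignoring case. */
final class MixedCoderBaseline {

    static boolean regionMatchesCI(byte[] latin1, int toffset,
                                   char[] utf16, int ooffset, int len) {
        for (int i = 0; i < len; i++) {
            // Latin1 bytes are unsigned: mask to get a char in 0x00..0xFF
            char c1 = (char) (latin1[toffset + i] & 0xFF);
            char c2 = utf16[ooffset + i];
            if (c1 == c2) {
                continue;
            }
            // Rule step 1: equal after uppercasing
            char u1 = Character.toUpperCase(c1);
            char u2 = Character.toUpperCase(c2);
            if (u1 == u2) {
                continue;
            }
            // Rule step 2: equal after uppercasing, then lowercasing
            if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                continue;
            }
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] a = {'k', 'a', 'y'};
        char[] b = {'\u212A', 'A', 'Y'}; // KELVIN SIGN, then ASCII 'A' and 'Y'
        System.out.println(regionMatchesCI(a, 0, b, 0, 3));
    }
}
```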
> Key insights:
>
> - If the UTF16 code point is also in the latin1 range, we can leverage the improvements from #12632 directly by calling `CharacterDataLatin1.equalsIgnoreCase`.
> - There are exactly 7 non-latin1 Unicode code points that case-fold into the latin1 range. We can special-case the comparison of these code points by adding the method `CharacterDataLatin1.latin1CaseFold`.
> - To avoid checking `a == b` twice, this check is lifted out of `CharacterDataLatin1.equalsIgnoreCase`, and its two callers are updated to check that `a != b` before calling the method.
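The three insights can be sketched as follows, under stated assumptions. The sketch reduces the comparison rule to a single fold, `toLowerCase(toUpperCase(c))`, because two chars match ignoring case exactly when their folds are equal. Latin1 folds come from a 256-entry table, standing in for the fast `CharacterDataLatin1` path. The few non-Latin1 chars that can match a Latin1 char are precomputed into a small map, standing in for `latin1CaseFold`. All names here (`Latin1FoldSketch`, `LATIN1_FOLD`, `NON_LATIN1_FOLD`, `charsMatchIgnoreCase`) are hypothetical, and the table-plus-map technique is a stand-in, not the PR's actual code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: Latin1 fast path plus a small map for non-Latin1 chars. */
final class Latin1FoldSketch {

    // Two chars match ignoring case iff their folds are equal
    static char fold(char c) {
        return Character.toLowerCase(Character.toUpperCase(c));
    }

    // Precomputed fold of every Latin1 char (index 0x00..0xFF)
    static final char[] LATIN1_FOLD = new char[256];
    // Non-Latin1 chars whose fold equals the fold of some Latin1 char, mapped to that fold
    static final Map<Character, Character> NON_LATIN1_FOLD = new HashMap<>();

    static {
        Set<Character> latin1Folds = new HashSet<>();
        for (int c = 0; c < 256; c++) {
            LATIN1_FOLD[c] = fold((char) c);
            latin1Folds.add(LATIN1_FOLD[c]);
        }
        for (int c = 0x100; c <= 0xFFFF; c++) {
            char f = fold((char) c);
            if (latin1Folds.contains(f)) {
                NON_LATIN1_FOLD.put((char) c, f);
            }
        }
    }

    /** Latin1 char a vs UTF16 char b, ignoring case. Callers check a != b first. */
    static boolean charsMatchIgnoreCase(char a, char b) {
        if (b <= 0xFF) {
            // Fast path: both chars are Latin1, so this is a pure table lookup
            return LATIN1_FOLD[a] == LATIN1_FOLD[b];
        }
        // Rare path: only chars present in the small map can match a Latin1 char
        Character f = NON_LATIN1_FOLD.get(b);
        return f != null && f.charValue() == LATIN1_FOLD[a];
    }

    static boolean regionMatchesCI(byte[] latin1, int toffset,
                                   char[] utf16, int ooffset, int len) {
        for (int i = 0; i < len; i++) {
            char a = (char) (latin1[toffset + i] & 0xFF);
            char b = utf16[ooffset + i];
            // The a != b check lives here, so charsMatchIgnoreCase does not repeat it
            if (a != b && !charsMatchIgnoreCase(a, b)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] a = {'k'};
        char[] b = {'\u212A'}; // KELVIN SIGN
        System.out.println(regionMatchesCI(a, 0, b, 0, 1));
    }
}
```

Any UTF16 char above 0xFF that is absent from the map cannot match a Latin1 char, so a non-Latin1 mismatch costs one map lookup instead of two case conversions.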
>
> For completeness, the RegionMatches test is updated to also compare the Turkic dotted and dotless 'i' against the uppercase ASCII 'I', not just the lowercase 'i'. This is not strictly related to the purpose of this PR, but it did help catch a regression introduced in an earlier iteration of the PR.
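The following is a quick way to see the four pairs now covered by the test, using only the public `String.regionMatches` API. The class and method names are hypothetical; this is not the JDK's `RegionMatches` test. Each pair is expected to match. The dotless 'ı' (U+0131) uppercases to 'I'. The dotted 'İ' (U+0130) has no uppercase mapping, so it only matches after the lowercase step, because its simple lowercase mapping is 'i'.

```java
/** Hypothetical check of Turkic 'i' variants against ASCII 'i' and 'I'. */
final class TurkicIDemo {

    static final char[] TURKIC = {'\u0130', '\u0131'}; // dotted capital I, dotless small i
    static final char[] ASCII = {'i', 'I'};

    // Latin1 string vs UTF16 string, ignoring case, via the public API
    static boolean matchesIgnoreCase(char latin1, char utf16) {
        return String.valueOf(latin1).regionMatches(true, 0, String.valueOf(utf16), 0, 1);
    }

    public static void main(String[] args) {
        for (char t : TURKIC) {
            for (char a : ASCII) {
                System.out.printf("U+%04X vs '%c': %b%n", (int) t, a, matchesIgnoreCase(a, t));
            }
        }
    }
}
```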
>
> To guard against regressions caused by future changes to the set of Unicode code points folding into latin1, a new test is added to `EqualsIgnoreCase` that identifies all such code points and verifies they are compared correctly.
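Here is a minimal sketch of such an exhaustive check, assuming the uppercase-then-lowercase rule as the reference. Rather than hard-coding the code points, it derives the set from `Character` case mappings. It then checks `String.regionMatches(true, ...)` against the rule for every Latin1 partner, so a Unicode update that changes the set is picked up automatically. `FoldsIntoLatin1` and its methods are hypothetical names, not the actual `EqualsIgnoreCase` test.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical exhaustive check for non-Latin1 chars that fold into Latin1. */
final class FoldsIntoLatin1 {

    // Reference rule, independent of String: toUpperCase, then toLowerCase
    static char fold(char c) {
        return Character.toLowerCase(Character.toUpperCase(c));
    }

    /** Identifies every non-Latin1 BMP char whose fold equals the fold of some Latin1 char. */
    static List<Character> find() {
        Set<Character> latin1Folds = new HashSet<>();
        for (int l = 0; l <= 0xFF; l++) {
            latin1Folds.add(fold((char) l));
        }
        List<Character> found = new ArrayList<>();
        for (int c = 0x100; c <= 0xFFFF; c++) {
            if (latin1Folds.contains(fold((char) c))) {
                found.add((char) c);
            }
        }
        return found;
    }

    /** Verifies String.regionMatches(true, ...) agrees with the rule for each found char. */
    static void verify(List<Character> found) {
        for (char c : found) {
            String utf16 = String.valueOf(c);
            for (int l = 0; l <= 0xFF; l++) {
                char lc = (char) l;
                boolean expected = fold(lc) == fold(c);
                boolean actual = String.valueOf(lc).regionMatches(true, 0, utf16, 0, 1);
                if (expected != actual) {
                    throw new AssertionError(String.format("U+%04X vs U+%04X", l, (int) c));
                }
            }
        }
    }

    public static void main(String[] args) {
        List<Character> found = find();
        verify(found);
        for (char c : found) {
            System.out.printf("U+%04X folds into Latin1%n", (int) c);
        }
    }
}
```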
>
> Performance is tested for matching and mismatching cases of selected code point pairs picked from the ASCII letter, ASCII digit, latin1 letter and non-latin1 Unicode letter ranges. Results are in the first comment below.
This pull request has been closed without being integrated.
-------------
PR: https://git.openjdk.org/jdk/pull/12637