<i18n dev> RFR: 8302871: Speed up StringLatin1.regionMatchesCI
David Schlosnagle
duke at openjdk.org
Mon Feb 20 13:19:21 UTC 2023
On Sat, 18 Feb 2023 19:45:34 GMT, Eirik Bjorsnos <duke at openjdk.org> wrote:
>> src/java.base/share/classes/java/lang/CharacterDataLatin1.java.template line 181:
>>
>>> 179: return ( U <= 'Z' // In range A-Z
>>> 180: || (U >= 0xC0 && U <= 0XDE && U != 0xD7)) // ..or A-grave-Thorn, excl. multiplication
>>> 181: && U == (b2 & 0xDF); // b2 has same uppercase
>>
>> I'm curious whether the order of comparisons could alter performance to a small degree. For example, it might be interesting to compare various permutations, like the one below, that short-circuit by rejecting an unequal uppercased b2 first.
>>
>> Suggestion:
>>
>> // uppercase b1 using 'the oldest ASCII trick in the book'
>> int U = b1 & 0xDF;
>> return (U == (b2 & 0xDF))
>> && ((U >= 'A' && U <= 'Z') // In range A-Z
>> || (U >= 0xC0 && U <= 0XDE && U != 0xD7)); // ..or A-grave-Thorn, excl. multiplication
>
> Yeah, as you noticed, this code is tricky and sensitive to the order of operations. I did some quite extensive exploration before settling on the current structure. This particular ordering seems to improve rejection somewhat, at the cost of slower matches.
>
> Since rejection is relatively speaking already very fast, I think we should favour fast matching here.
>
> Results:
>
>
> Benchmark                                  (codePoints)  (size)  Mode  Cnt     Score    Error  Units
> RegionMatchesIC.Latin1.regionMatchesIC      ascii-match    1024  avgt   15   917.796 ± 20.285  ns/op
> RegionMatchesIC.Latin1.regionMatchesIC   ascii-mismatch    1024  avgt   15     4.367 ±  0.348  ns/op
> RegionMatchesIC.Latin1.regionMatchesIC     number-match    1024  avgt   15   399.656 ± 10.703  ns/op
> RegionMatchesIC.Latin1.regionMatchesIC  number-mismatch    1024  avgt   15     4.361 ±  0.664  ns/op
> RegionMatchesIC.Latin1.regionMatchesIC       lat1-match    1024  avgt   15  1384.443 ± 22.199  ns/op
> RegionMatchesIC.Latin1.regionMatchesIC    lat1-mismatch    1024  avgt   15     4.119 ±  0.451  ns/op
Thanks for confirming.
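
For anyone following along, here is a minimal, self-contained sketch of the two orderings discussed above. The class and method names are hypothetical, and the `b1 == b2` fast path and the `< 'A'` early exit are assumptions about the surrounding code rather than lines copied from the PR. The sketch only checks that both orderings agree on every pair of Latin-1 bytes. It says nothing about performance, which needs a JMH run like the one quoted above.

```java
// Hypothetical sketch: compares two orderings of the Latin-1
// case-insensitive byte comparison discussed in this thread.
final class CaseCompareSketch {

    // Range check first, uppercase equality last (the "current" shape).
    static boolean rangeFirst(byte b1, byte b2) {
        if (b1 == b2) {
            return true; // assumed fast path for identical bytes
        }
        int upper = b1 & 0xDF; // clear bit 5 to uppercase b1
        if (upper < 'A') {
            return false; // assumed early exit for low ASCII
        }
        return (upper <= 'Z' // in range A-Z
                || (upper >= 0xC0 && upper <= 0xDE && upper != 0xD7)) // A-grave..Thorn, excl. multiplication
                && upper == (b2 & 0xDF); // b2 has the same uppercase
    }

    // Uppercase equality first, range check last (the suggested shape).
    static boolean compareFirst(byte b1, byte b2) {
        if (b1 == b2) {
            return true; // assumed fast path for identical bytes
        }
        int upper = b1 & 0xDF;
        return upper == (b2 & 0xDF) // reject unequal uppercase early
                && ((upper >= 'A' && upper <= 'Z')
                    || (upper >= 0xC0 && upper <= 0xDE && upper != 0xD7));
    }

    // Counts byte pairs on which the two orderings return different results.
    static int countDisagreements() {
        int disagreements = 0;
        for (int i = 0; i < 256; i++) {
            for (int j = 0; j < 256; j++) {
                if (rangeFirst((byte) i, (byte) j) != compareFirst((byte) i, (byte) j)) {
                    disagreements++;
                }
            }
        }
        return disagreements;
    }

    public static void main(String[] args) {
        System.out.println("disagreements: " + countDisagreements());
    }
}
```

Since the two variants are functionally interchangeable under these assumptions, the choice between them comes down to the match versus mismatch timings in the benchmark.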
-------------
PR: https://git.openjdk.org/jdk/pull/12632