<i18n dev> RFR: 8302877: Speed up latin1 case conversions [v2]
Eirik Bjorsnos
duke at openjdk.org
Tue Feb 21 09:37:28 UTC 2023
On Tue, 21 Feb 2023 06:59:47 GMT, Eirik Bjorsnos <duke at openjdk.org> wrote:
>> This PR suggests we speed up Character.toUpperCase and Character.toLowerCase for latin1 code points by applying the 'oldest ASCII trick in the book'.
>>
>> This takes advantage of the fact that latin1 uppercase code points are always 0x20 lower than their lowercase (with the exception of two code points which uppercase out of latin1).
>>
>> To verify the correctness of the new implementation, the test `Latin1CaseConversion` is added with an exhaustive verification of toUpperCase/toLowerCase for all latin1 code points.
>>
>> The implementation needs to balance the performance of the various ranges in latin1. An effort has been made to favour operations on ASCII code points, without causing excessive regression for higher code points.
>>
>> Performance is benchmarked for 7 chosen sample code points, each representing a range or a special-case. Results in the first comment.
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional commit since the last revision:
>
> Spell fix for 'exhaustive' in comments in sun/text/resources
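For readers who have not seen it, here is a minimal sketch of the trick described in the quoted PR text. It is not the code from the pull request: the class name Latin1CaseSketch and the exact shape of the range checks are assumptions made for illustration. It flips bit 0x20 for the two latin1 letter ranges, skips the multiplication and division signs (0xD7, 0xF7), and special-cases the two code points that uppercase out of latin1 (0xB5 and 0xFF).

```java
// Illustrative sketch only. Latin1CaseSketch is a hypothetical name,
// not the implementation from the pull request.
final class Latin1CaseSketch {

    // Lowercase a latin1 code point (0x00..0xFF) by setting bit 0x20.
    static int toLowerCase(int cp) {
        // A-Z, and 0xC0..0xDE except the multiplication sign 0xD7
        if ((cp >= 'A' && cp <= 'Z') || (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7)) {
            return cp | 0x20;
        }
        return cp;
    }

    // Uppercase a latin1 code point (0x00..0xFF) by clearing bit 0x20.
    static int toUpperCase(int cp) {
        // a-z, and 0xE0..0xFE except the division sign 0xF7
        if ((cp >= 'a' && cp <= 'z') || (cp >= 0xE0 && cp <= 0xFE && cp != 0xF7)) {
            return cp & 0xDF;
        }
        // The two code points whose uppercase form lies outside latin1
        if (cp == 0xB5) return 0x39C; // micro sign -> Greek capital mu
        if (cp == 0xFF) return 0x178; // y with diaeresis -> capital form
        return cp;
    }

    public static void main(String[] args) {
        // Exhaustive comparison against java.lang.Character for all of latin1
        for (int cp = 0; cp <= 0xFF; cp++) {
            if (toLowerCase(cp) != Character.toLowerCase(cp)
                    || toUpperCase(cp) != Character.toUpperCase(cp)) {
                throw new AssertionError("mismatch at 0x" + Integer.toHexString(cp));
            }
        }
        System.out.println("all latin1 code points agree with java.lang.Character");
    }
}
```

The main method mirrors the idea behind the exhaustive test mentioned in the PR description: with only 256 inputs, checking every one against Character is cheap.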
A side note: Early and crude experiments using the Vector API indicate that the 'oldest ASCII trick in the book' vectorizes pretty well.
Here's a benchmark comparing the strings "helloworld" and "HelloWorld" repeated 1024 times, followed by either 'A' or 'B' (to force an expensive mismatch):
Benchmark                    (size)  Mode  Cnt     Score    Error  Units
EqualsIgnoreCase.scalar        1024  avgt   15  6225.624 ± 89.182  ns/op
EqualsIgnoreCase.vectorized    1024  avgt   15  1246.110 ± 14.767  ns/op
That is roughly a 5x difference on this input, but I have the feeling that most case-insensitive comparisons are pretty short, so I'm not sure how useful this is in practice.
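To make the comparison concrete, here is a rough scalar sketch of a latin1 equalsIgnoreCase built on the same trick, together with inputs shaped like the ones described above. This is not the benchmark code (which is not included in this mail) and it does not use the Vector API. The class name, the byte[] representation, and the helper isUpperLetter are assumptions for illustration only. Two bytes count as equal ignoring case when they differ at most in bit 0x20 and the form with that bit cleared is an uppercase latin1 letter.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only. Names here are hypothetical, and this is the
// scalar form of the comparison, not the vectorized experiment.
final class Latin1EqualsIgnoreCaseSketch {

    // True for latin1 code points that have a lowercase form at cp | 0x20.
    static boolean isUpperLetter(int cp) {
        return (cp >= 'A' && cp <= 'Z') || (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7);
    }

    // Case-insensitive comparison of two latin1-encoded byte arrays.
    static boolean equalsIgnoreCase(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x == y) {
                continue;
            }
            // Clear bit 0x20 on both sides; a case pair collapses to the same
            // value, and that value must be an uppercase letter.
            int upper = x & 0xDF;
            if (upper != (y & 0xDF) || !isUpperLetter(upper)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Inputs shaped like the benchmark description: equal ignoring case
        // for 10240 characters, then a mismatch in the very last position.
        byte[] left = ("helloworld".repeat(1024) + "A").getBytes(StandardCharsets.ISO_8859_1);
        byte[] right = ("HelloWorld".repeat(1024) + "B").getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(equalsIgnoreCase(left, right));
    }
}
```

Because the mismatch sits at the end, the loop has to walk the whole input before it can return, which is what makes this case expensive for a scalar implementation.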
-------------
PR: https://git.openjdk.org/jdk/pull/12623