<i18n dev> RFR: 8302877: Speed up latin1 case conversions

Eirik Bjorsnos duke at openjdk.org
Mon Feb 20 14:47:33 UTC 2023


This PR suggests we speed up Character.toUpperCase and Character.toLowerCase for latin1 code points by applying the 'oldest ASCII trick in the book'.

This takes advantage of the fact that latin1 uppercase code points are always 0x20 lower than their lowercase counterparts. The two exceptions are 'Micro Sign' (0xB5) and 'y with Diaeresis' (0xFF), whose uppercase forms lie outside latin1.
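
For readers unfamiliar with the trick, here is a minimal sketch of the idea. It is not the code in the PR: the class and method names are hypothetical, and the range checks are written for readability rather than tuned for the ASCII fast path. It relies only on the layout of latin1, where bit 0x20 separates each uppercase letter from its lowercase form, and where 0xD7 (multiplication sign) and 0xF7 (division sign) sit inside the letter ranges without being letters.

```java
// Illustrative sketch only; names are hypothetical and this is not the JDK implementation.
final class Latin1CaseSketch {

    /** Lowercases a latin1 code point; any other value is returned unchanged. */
    static int toLowerCase(int cp) {
        boolean asciiUpper = cp >= 'A' && cp <= 'Z';
        // 0xC0-0xDE are uppercase letters, except 0xD7 (multiplication sign).
        boolean highUpper = cp >= 0xC0 && cp <= 0xDE && cp != 0xD7;
        // Setting bit 0x20 moves an uppercase letter to its lowercase form.
        return (asciiUpper || highUpper) ? (cp | 0x20) : cp;
    }

    /** Uppercases a latin1 code point; any other value is returned unchanged. */
    static int toUpperCase(int cp) {
        boolean asciiLower = cp >= 'a' && cp <= 'z';
        // 0xE0-0xFE are lowercase letters, except 0xF7 (division sign).
        boolean highLower = cp >= 0xE0 && cp <= 0xFE && cp != 0xF7;
        if (asciiLower || highLower) {
            // Clearing bit 0x20 moves a lowercase letter to its uppercase form.
            return cp & ~0x20;
        }
        // The two code points whose uppercase form is outside latin1.
        if (cp == 0xB5) return 0x39C; // Micro Sign -> Greek Capital Letter Mu
        if (cp == 0xFF) return 0x178; // y with Diaeresis -> Y with Diaeresis
        return cp;
    }

    public static void main(String[] args) {
        System.out.printf("0x%02X -> 0x%02X%n", (int) 'q', toUpperCase('q'));
        System.out.printf("0x%02X -> 0x%02X%n", 0xE9, toUpperCase(0xE9));
        System.out.printf("0x%02X -> 0x%02X%n", 0xC9, toLowerCase(0xC9));
        System.out.printf("0x%02X -> 0x%03X%n", 0xB5, toUpperCase(0xB5));
    }
}
```

Note that 0xDF (sharp s) falls outside both ranges and is returned unchanged, which matches what `Character.toUpperCase` does for a single code point.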

To verify the correctness of the new implementation, the test `Latin1CaseConversion` is added with an exhaustive verification of toUpperCase/toLowerCase for all latin1 code points.
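
Because latin1 has only 256 code points, exhaustive checking is cheap. The sketch below shows the general shape of such a check; it is not the `Latin1CaseConversion` test from the PR. The class name, the `mismatches` helper and the deliberately incomplete `asciiOnly` candidate are all hypothetical, and using `Character::toUpperCase` as the reference assumes it is independent of the code under test, which would not hold when testing `Character` itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Illustrative sketch only; names are hypothetical and this is not the test added by the PR.
final class Latin1ExhaustiveCheck {

    /** Returns every latin1 code point where candidate and reference disagree. */
    static List<Integer> mismatches(IntUnaryOperator candidate, IntUnaryOperator reference) {
        List<Integer> bad = new ArrayList<>();
        for (int cp = 0; cp <= 0xFF; cp++) {
            if (candidate.applyAsInt(cp) != reference.applyAsInt(cp)) {
                bad.add(cp);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // A deliberately incomplete candidate that only handles ASCII letters.
        IntUnaryOperator asciiOnly = cp -> (cp >= 'a' && cp <= 'z') ? cp - 0x20 : cp;

        List<Integer> bad = mismatches(asciiOnly, Character::toUpperCase);
        System.out.println(bad.size() + " mismatching code points:");
        for (int cp : bad) {
            System.out.printf(" 0x%02X", cp);
        }
        System.out.println();
    }
}
```

Run against the ASCII-only candidate, the check reports the non-ASCII lowercase letters and the two special cases, which is the kind of gap an exhaustive loop catches and a handful of spot checks might miss.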

The implementation needs to balance the performance of the various ranges in latin1. An effort has been made to favour operations on ASCII code points, without causing excessive regression for higher code points.

Performance is benchmarked for 7 chosen sample code points, each representing a range or a special case. Results are in the first comment.

-------------

Commit messages:
 - Add @bug tag to test
 - Improved whitespace alignment for Param and switch values
 - Correct spelling for "exhaustive"
 - Prefer the term "case conversion" over "case folding". Refer to 0xB5 as 'Micro Sign' ("Mu" is the Unicode code point it uppercases to)
 - Improve comments for the two special-cased uppercase code points 'Micro Sign' and 'y with Diaeresis'
 - Adjust whitespace
 - Speed up Character.toUpperCase and Character.toLowerCase by applying the 'oldest ASCII trick in the book'

Changes: https://git.openjdk.org/jdk/pull/12623/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12623&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8302877
  Stats: 164 lines in 3 files changed: 143 ins; 0 del; 21 mod
  Patch: https://git.openjdk.org/jdk/pull/12623.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12623/head:pull/12623

PR: https://git.openjdk.org/jdk/pull/12623

