<i18n dev> RFR: 8365675: Add String Unicode Case-Folding Support [v2]

Xueming Shen sherman at openjdk.org
Wed Oct 8 00:33:20 UTC 2025


> ### Summary
> 
> Case folding is a key operation for case-insensitive matching (e.g., string equality, regex matching), where the goal is to eliminate case distinctions without applying locale or language specific conversions.
> 
> Currently, the JDK does not expose a direct API for Unicode-compliant case folding. Developers instead rely on methods such as:
> 
> **String.equalsIgnoreCase(String)**
> 
> - Unicode-aware, locale-independent.
> - Implementation uses Character.toLowerCase(Character.toUpperCase(int)) per code point.
> - Limited: does not support the 1:M mappings defined by Unicode case folding.
> 
> **Character.toLowerCase(int) / Character.toUpperCase(int)**
> 
> - Locale-independent, single code point only.
> - No support for 1:M mappings.
> 
> **String.toLowerCase(Locale.ROOT) / String.toUpperCase(Locale.ROOT)**
> 
> - Based on Unicode SpecialCasing.txt, supports 1:M mappings.
> - Intended primarily for presentation/display, not structural case-insensitive matching.
> - Requires converting both strings in full before comparison, which allocates new strings and is less efficient than comparing in place.
> 
> **1:M mapping example, U+00DF (ß)**
> 
> - "ß".toUpperCase(Locale.ROOT) → "SS"
> - Case folding produces "ss", matching Unicode caseless comparison rules.
> 
> 
> jshell> "\u00df".equalsIgnoreCase("ss")
> $22 ==> false
> 
> jshell> "\u00df".toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT).equals("ss")
> $24 ==> true
> 
> 
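The behavior of the three existing approaches on U+00DF can be reproduced with a small standalone program. This is only a sketch of the workaround described above; the class and method names (`ExistingCaseApis`, `viaEqualsIgnoreCase`, `viaFullConversion`) are made up for illustration and are not part of the PR.

```java
import java.util.Locale;

class ExistingCaseApis {
    // Per-code-point comparison with 1:1 mappings only, so the sharp s never matches "ss".
    static boolean viaEqualsIgnoreCase(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    // The workaround from the jshell session: convert both strings in full, then compare.
    // Each call allocates intermediate strings.
    static boolean viaFullConversion(String a, String b) {
        String left = a.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
        String right = b.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
        return left.equals(right);
    }

    public static void main(String[] args) {
        String sharpS = "\u00df";
        System.out.println(viaEqualsIgnoreCase(sharpS, "ss"));          // false
        System.out.println(Character.toUpperCase(0x00DF) == 0x00DF);    // true: no 1:1 uppercase mapping
        System.out.println(sharpS.toUpperCase(Locale.ROOT));            // SS
        System.out.println(viaFullConversion(sharpS, "ss"));            // true
    }
}
```

The round trip through `toUpperCase`/`toLowerCase` happens to give the expected answer here, but it relies on presentation-oriented case conversion rather than on the Unicode case folding data.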
> ### Motivation & Direction
> 
> Add Unicode standard-compliant case-less comparison methods to the String class, enabling reliable and efficient Unicode-compliant case-insensitive matching.
> 
> - Unicode-compliant **full** case folding.
> - Simpler, stable and more efficient case-less matching without workarounds.
> - Brings Java's string comparison handling in line with other programming languages/libraries.
> 
> This PR proposes to introduce the following comparison methods in the `String` class:
> 
> - boolean equalsFoldCase(String anotherString)
> - int compareToFoldCase(String anotherString)
> - Comparator<String> UNICODE_CASEFOLD_ORDER
> 
> These methods are intended to be the preferred choice when Unicode-compliant case-less matching is required.
> 
> *Note:* An early draft also proposed a String.toCaseFold() method returning a new case-folded string.
> However, during review this was considered error-prone, as the resulting string could easily be mistaken for a general transformation like toLowerCase() and then passed into APIs where case-folding semantics are not appropriate.
> 
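To make the intended semantics concrete, here is a rough sketch of what a full (1:M) case-folding comparison does. It is not the implementation in this PR: `FoldSketch`, `fold`, `equalsFold` and `FOLD_ORDER` are hypothetical names, the folding table is a hand-picked three-entry subset of Unicode's CaseFolding.txt, and code points outside that table fall back to `Character.toLowerCase(Character.toUpperCase(cp))` as an approximation of simple folding.

```java
import java.util.Comparator;
import java.util.Map;

class FoldSketch {
    // Hypothetical subset of the full (status F) entries in CaseFolding.txt.
    private static final Map<Integer, String> FULL_FOLDS = Map.of(
            0x00DF, "ss",   // LATIN SMALL LETTER SHARP S
            0x1E9E, "ss",   // LATIN CAPITAL LETTER SHARP S
            0xFB01, "fi");  // LATIN SMALL LIGATURE FI

    // Fold code point by code point: 1:M entries come from the table,
    // everything else uses an approximate 1:1 fold.
    static String fold(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            String full = FULL_FOLDS.get(cp);
            if (full != null) {
                sb.append(full);
            } else {
                sb.appendCodePoint(Character.toLowerCase(Character.toUpperCase(cp)));
            }
        });
        return sb.toString();
    }

    // Stand-in for the equality check: compare the folded forms.
    static boolean equalsFold(String a, String b) {
        return fold(a).equals(fold(b));
    }

    // Stand-in for the ordering: String.compareTo on folded forms (UTF-16 code unit order).
    static final Comparator<String> FOLD_ORDER = Comparator.comparing(FoldSketch::fold);
}
```

With this sketch, `FoldSketch.equalsFold("Stra\u00dfe", "STRASSE")` is true while `"Stra\u00dfe".equalsIgnoreCase("STRASSE")` is false. The sketch materializes folded strings only to keep the example short; a comparison method can fold on the fly and never hand a folded string to the caller, which is consistent with the note above about dropping toCaseFold().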
> ### The New API
> 
> See CSR https://bugs.openjd...

Xueming Shen has updated the pull request incrementally with one additional commit since the last revision:

  minor api doc updates

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27628/files
  - new: https://git.openjdk.org/jdk/pull/27628/files/1abb0228..9d9997dc

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27628&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27628&range=00-01

  Stats: 18 lines in 1 file changed: 5 ins; 2 del; 11 mod
  Patch: https://git.openjdk.org/jdk/pull/27628.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27628/head:pull/27628

PR: https://git.openjdk.org/jdk/pull/27628