<i18n dev> Case insensitive regexes and collators

Naoto Sato naoto.sato at oracle.com
Fri Sep 11 16:33:57 UTC 2020


Hi David,

Glad to hear that you are delighted with the recent fix (JDK-8248655). 
The scope of that fix is limited to the String class, so it may or may 
not affect the case-insensitive RegEx and Collator operations you 
describe. I created the following two issues to track your observations:

https://bugs.openjdk.java.net/browse/JDK-8253058
https://bugs.openjdk.java.net/browse/JDK-8253059

I'm happy to take a look at them.

PS. "jdk-dev" is for technical discussion related to the "JDK 
Project", so I'd recommend using the core-libs or i18n-dev mailing 
list for further discussion.

Naoto

On 9/10/20 3:52 PM, Dai Conrad wrote:
> I was delighted to hear the longstanding problem with
> case-insensitive comparisons of strings with astral
> characters (ones outside the basic multilingual plane)
> was fixed in JDK 16 build 8. Methods equalsIgnoreCase,
> regionMatches, and compareToIgnoreCase all work
> correctly now.
> 
> I had assumed this would also fix case-insensitive regular
> expressions and java.text.Collator, since I guessed they
> boiled down to a call to regionMatches somewhere under
> the covers. But this appears not to be the... case.
> 
> For scripts Deseret, Osage, Old Hungarian, Warang Citi,
> Medefaidrin, and Adlam, for strings with upper- and
> lowercase variants of the same letter, the following
> code fails:
> 
> Pattern pattern = Pattern.compile(lower, Pattern.CASE_INSENSITIVE);
> Matcher matcher = pattern.matcher(upper);
> assertThat(matcher.matches()).isTrue();
> 
> Collator collator = Collator.getInstance();
> collator.setStrength(Collator.PRIMARY);
> assertThat(collator.compare(lower, upper)).isEqualTo(0);
> 
> I'm not sure why the fix didn't fix these, but it would be
> a shame to overlook them while fixing it in other places.
> 
> David
> 
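
For reference, the checks described in the quoted message can be put
into one self-contained program. This is only a sketch under some
assumptions: the Deseret pair U+10400/U+10428 is picked as a sample,
the class name AstralCaseCheck and the helper
equalsIgnoreCaseByCodePoint are hypothetical (they are not JDK APIs),
and the printed results depend on the JDK build in use, so no expected
output is shown. Note that the Pattern documentation says
CASE_INSENSITIVE alone assumes US-ASCII and that UNICODE_CASE enables
Unicode-aware matching, so the sketch tries both flag combinations.

```java
import java.text.Collator;
import java.util.regex.Pattern;

public class AstralCaseCheck {

    // Hypothetical helper, not part of the JDK: fold a code point by
    // upper-casing and then lower-casing it with the int overloads,
    // which accept supplementary (astral) code points.
    static int fold(int codePoint) {
        return Character.toLowerCase(Character.toUpperCase(codePoint));
    }

    // Hypothetical helper: case-insensitive equality that walks both
    // strings code point by code point instead of char by char.
    static boolean equalsIgnoreCaseByCodePoint(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (fold(ca) != fold(cb)) {
                return false;
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // Equal only if both strings were consumed completely.
        return i == a.length() && j == b.length();
    }

    public static void main(String[] args) {
        // Sample pair (an assumption of this sketch):
        // U+10400 DESERET CAPITAL LETTER LONG I, U+10428 DESERET SMALL LETTER LONG I
        String upper = new String(Character.toChars(0x10400));
        String lower = new String(Character.toChars(0x10428));

        // String methods covered by JDK-8248655.
        System.out.println("equalsIgnoreCase:    " + lower.equalsIgnoreCase(upper));
        System.out.println("compareToIgnoreCase: " + lower.compareToIgnoreCase(upper));
        System.out.println("regionMatches:       "
                + lower.regionMatches(true, 0, upper, 0, upper.length()));

        // Regex, first as in the quoted message, then with UNICODE_CASE added.
        Pattern asciiOnly = Pattern.compile(lower, Pattern.CASE_INSENSITIVE);
        Pattern unicodeAware = Pattern.compile(lower,
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        System.out.println("regex CASE_INSENSITIVE:                "
                + asciiOnly.matcher(upper).matches());
        System.out.println("regex CASE_INSENSITIVE | UNICODE_CASE: "
                + unicodeAware.matcher(upper).matches());

        // Collator at PRIMARY strength, as in the quoted message.
        Collator collator = Collator.getInstance();
        collator.setStrength(Collator.PRIMARY);
        System.out.println("collator compare:    " + collator.compare(lower, upper));

        // Code-point-wise comparison that does not rely on the String fix.
        System.out.println("by code point:       "
                + equalsIgnoreCaseByCodePoint(lower, upper));
    }
}
```

The code-point helper is included only as a point of comparison: it
relies on Character.toUpperCase(int) and Character.toLowerCase(int),
so it does not depend on which build contains the String fix.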

