Case insensitive regexes and collators

Dai Conrad dai.conrad at gmail.com
Thu Sep 10 22:52:29 UTC 2020


I was delighted to hear that the longstanding problem with
case-insensitive comparisons of strings containing astral
characters (ones outside the Basic Multilingual Plane)
was fixed in JDK 16 build 8. The String methods
equalsIgnoreCase, regionMatches, and compareToIgnoreCase
all work correctly now.

I had assumed this would also fix case-insensitive regular
expressions and java.text.Collator, since I guessed they
boiled down to a call to regionMatches somewhere under
the covers. But this appears not to be the... case.

For the Deseret, Osage, Old Hungarian, Warang Citi,
Medefaidrin, and Adlam scripts, where the strings lower
and upper hold the lowercase and uppercase variants of
the same letter, the assertions in the following code fail:

// Regex: the lowercase pattern should match the uppercase input.
Pattern pattern = Pattern.compile(lower, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(upper);
assertThat(matcher.matches()).isTrue();

// Collator: at PRIMARY strength, case differences should be ignored.
Collator collator = Collator.getInstance();
collator.setStrength(Collator.PRIMARY);
assertThat(collator.compare(lower, upper)).isEqualTo(0);
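
For concreteness, here is a minimal, self-contained sketch of the regex half of the check, using a single Deseret pair (U+10400 and U+10428) picked purely for illustration. The class name AstralCaseDemo and the helper matchesIgnoringCase are made up for this sketch. One thing worth ruling out: the Pattern Javadoc says CASE_INSENSITIVE on its own assumes only US-ASCII characters are being matched, and that UNICODE_CASE must be specified alongside it for Unicode-aware matching, so the sketch tries both flag combinations. It makes no attempt to cover the Collator case.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class AstralCaseDemo {

    // Returns true if `input` fully matches `regex` case-insensitively.
    // With unicodeCase == false, only US-ASCII letters are case-folded
    // (the documented default for Pattern.CASE_INSENSITIVE).
    static boolean matchesIgnoringCase(String regex, String input, boolean unicodeCase) {
        int flags = Pattern.CASE_INSENSITIVE;
        if (unicodeCase) {
            flags |= Pattern.UNICODE_CASE;
        }
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // One Deseret pair, chosen for illustration:
        // U+10400 DESERET CAPITAL LETTER LONG I
        // U+10428 DESERET SMALL LETTER LONG I
        String upper = new String(Character.toChars(0x10400));
        String lower = new String(Character.toChars(0x10428));

        System.out.println("CASE_INSENSITIVE only: "
                + matchesIgnoringCase(lower, upper, false));
        System.out.println("with UNICODE_CASE:     "
                + matchesIgnoringCase(lower, upper, true));
    }
}
```

If the second line prints true while the first prints false, the regex failure may come down to the missing UNICODE_CASE flag rather than to the astral-character bug itself; the other scripts listed above would need to be checked separately, since their support depends on the Unicode version the JDK ships with.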

I'm not sure why the fix didn't cover these, but it would be
a shame to overlook them while the problem is being fixed
elsewhere.

David


More information about the jdk-dev mailing list