<i18n dev> RFR: 8248655: Support supplementary characters in String case insensitive operations

Joe Wang huizhe.wang at oracle.com
Wed Jul 15 17:57:17 UTC 2020


Hi Naoto,

In StringUTF16.java, if one char is a high surrogate and the other is 
not, you could return early instead of going through the rest of the 
process. It's probably not significant, since cp1 and cp2 (and/or u1 
and u2) won't be equal anyway, but it would skip a couple of 
toCodePoint/toUpperCase/toLowerCase calls.

-Joe
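
A minimal sketch of the early-return idea above, not the actual 
StringUTF16 code. The class and method names are hypothetical, and it 
works on String rather than the internal byte[] representation. It 
assumes that no case mapping pairs a supplementary character with a 
BMP character, which is what makes the shortcut safe.

```java
final class SurrogateFastPath {
    // Hypothetical helper: case-insensitive equality by code point,
    // with the suggested shortcut when only one side starts a surrogate pair.
    static boolean equalsIgnoreCase(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            char c1 = a.charAt(i);
            char c2 = b.charAt(j);
            // Shortcut (assumption: supplementary and BMP characters never
            // case-map to each other): bail out before any toCodePoint or
            // case-conversion calls.
            if (Character.isHighSurrogate(c1) != Character.isHighSurrogate(c2)) {
                return false;
            }
            int cp1 = a.codePointAt(i);
            int cp2 = b.codePointAt(j);
            i += Character.charCount(cp1);
            j += Character.charCount(cp2);
            if (cp1 == cp2) {
                continue;
            }
            int u1 = Character.toUpperCase(cp1);
            int u2 = Character.toUpperCase(cp2);
            if (u1 == u2) {
                continue;
            }
            // Some characters only compare equal after a lower-case pass.
            if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                continue;
            }
            return false;
        }
        // Equal only if both strings were consumed completely.
        return i == a.length() && j == b.length();
    }
}
```

The shortcut only saves work on the mismatch path, so any gain is 
likely small, as noted above.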

On 7/15/20 9:00 AM, naoto.sato at oracle.com wrote:
> Hello,
>
> Please review the fix to the following issues:
>
> https://bugs.openjdk.java.net/browse/JDK-8248655
> https://bugs.openjdk.java.net/browse/JDK-8248434
>
> The proposed changeset and its CSR are located at:
>
> https://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8248664
>
> A bug was filed against SimpleDateFormat (8248434) where 
> case-insensitive date format/parse failed in some of the new locales 
> in JDK15. The root cause was that the case-insensitive 
> String.regionMatches() method did not work with supplementary 
> characters. The problem is that the method's spec does not account 
> for case mappings of supplementary characters, possibly because this 
> was overlooked in the original JSR 204 ("Unicode Supplementary 
> Character support") work. Similar behavior is observed in the other 
> two case-insensitive methods, i.e., compareToIgnoreCase() and 
> equalsIgnoreCase().
>
> The fix is straightforward: compare strings on a code point basis 
> instead of a code unit (16-bit "char") basis. Technically this change 
> will introduce a backward incompatibility, but I believe it is an 
> incompatibility only with wrong behavior that was not true to what 
> those methods are expected to do.
>
> Naoto
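
The difference between code unit and code point comparison described 
in the quoted message can be illustrated with a small sketch. This is 
not the webrev's code. The class and method names are hypothetical, 
and the Deseret letters U+10400 (capital) and U+10428 (small) are 
chosen here only as an example of a supplementary case pair.

```java
final class CaseCompareDemo {
    // Compares 16-bit chars one at a time. Surrogates have no case
    // mapping of their own, so supplementary case pairs never match.
    static boolean equalsIgnoreCaseByCodeUnit(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            char c1 = a.charAt(i);
            char c2 = b.charAt(i);
            if (c1 == c2) {
                continue;
            }
            char u1 = Character.toUpperCase(c1);
            char u2 = Character.toUpperCase(c2);
            if (u1 == u2 || Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                continue;
            }
            return false;
        }
        return true;
    }

    // Decodes surrogate pairs first, then applies the int-based case
    // mappings, which do cover supplementary characters.
    static boolean equalsIgnoreCaseByCodePoint(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int cp1 = a.codePointAt(i);
            int cp2 = b.codePointAt(j);
            i += Character.charCount(cp1);
            j += Character.charCount(cp2);
            if (cp1 == cp2) {
                continue;
            }
            int u1 = Character.toUpperCase(cp1);
            int u2 = Character.toUpperCase(cp2);
            if (u1 == u2 || Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                continue;
            }
            return false;
        }
        return i == a.length() && j == b.length();
    }

    public static void main(String[] args) {
        String capital = "\uD801\uDC00"; // U+10400
        String small = "\uD801\uDC28";   // U+10428
        System.out.println(equalsIgnoreCaseByCodeUnit(capital, small));
        System.out.println(equalsIgnoreCaseByCodePoint(capital, small));
    }
}
```

For this pair the code unit version reports a mismatch and the code 
point version reports a match, while both agree on BMP-only input 
such as "abc" and "ABC".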


