<i18n dev> RFR: 8248655: Support supplementary characters in String case insensitive operations

naoto.sato at oracle.com
Wed Jul 15 16:39:26 UTC 2020


Thank you, Jim, for the quick review!

On 7/15/20 9:26 AM, Jim Laskey wrote:
> I think I'm good with this. +1
> 
> Asides:
> 
>   325             int cp1 = (int)getChar(value, k1);
>   326             int cp2 = (int)getChar(other, k2);
> 
> I would be tempted to short-circuit by exiting when they are not equal, but I think we agreed we need to allow for upper/lower case mappings on different planes.
> 
> In the UTF-16 code I was trying to think of how you could exhaust the first string and not the second while their lengths are still the same. I think I have convinced myself that it's not possible as long as surrogate pairs always map upper/lower to surrogate pairs (two chars each).

Right. As of JDK 15/16, the case mappings of all code points stay within
the same plane, thus the lengths won't change. I was trying to create a
test case for that hypothetical situation, but gave up because the
character case mappings are embedded in the Unicode Character Database,
which cannot be modified.
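
For illustration only, here is a minimal sketch of how one might check the
"lengths won't change" observation on a given JDK. The class and method
names (CasePlaneCheck, keepsCharCount, countLengthChangingMappings) are
made up for this note and are not part of the webrev. It only uses
Character.toUpperCase(int), Character.toLowerCase(int) and
Character.charCount(int), and the count it prints depends on the Unicode
data shipped with the JDK it runs on.

```java
final class CasePlaneCheck {

    // True if the simple upper/lower case mappings of cp need the same
    // number of UTF-16 code units (1 for BMP, 2 for supplementary) as cp.
    static boolean keepsCharCount(int cp) {
        int n = Character.charCount(cp);
        return Character.charCount(Character.toUpperCase(cp)) == n
            && Character.charCount(Character.toLowerCase(cp)) == n;
    }

    // Counts code points whose case mapping would change the UTF-16 length.
    static int countLengthChangingMappings() {
        int count = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (!keepsCharCount(cp)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // DESERET CAPITAL LETTER LONG I (U+10400) lowercases to U+10428,
        // which is also a supplementary code point.
        System.out.printf("U+10400 -> U+%X%n", Character.toLowerCase(0x10400));
        System.out.println("length-changing mappings: " + countLengthChangingMappings());
    }
}
```

If the count is zero on the JDK in use, two strings that are equal ignoring
case must have the same char length, so neither can be exhausted before the
other.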

Naoto

> 
> Cheers,
> 
> -- Jim
> 
> 
> 
> 
> 
>> On Jul 15, 2020, at 1:00 PM, naoto.sato at oracle.com wrote:
>>
>> Hello,
>>
>> Please review the fix to the following issues:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8248655
>> https://bugs.openjdk.java.net/browse/JDK-8248434
>>
>> The proposed changeset and its CSR are located at:
>>
>> https://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8248664
>>
>> A bug was filed against SimpleDateFormat (8248434) where case-insensitive date format/parse failed in some of the new locales in JDK15. The root cause was that the case-insensitive String.regionMatches() method did not work with supplementary characters. The problem is that the method's spec does not account for case mappings of supplementary characters, possibly because this was overlooked when supplementary character support was first added under JSR 204, "Unicode Supplementary Character Support". Similar behavior is observed in the other two case-insensitive methods, i.e., compareToIgnoreCase() and equalsIgnoreCase().
>>
>> The fix is straightforward: compare strings on a code point basis instead of a code unit (16-bit "char") basis. Technically this change introduces a backward incompatibility, but I believe it is incompatible only with wrong behavior that was never true to what those methods are expected to do.
>>
>> Naoto
> 
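
The difference between code unit and code point comparison described in
the quoted proposal can be sketched roughly as below. This is an
illustration, not the code in the webrev: the class CodePointIgnoreCase
and both helper methods are hypothetical names, and the upper-then-lower
folding simply follows the approach the String case-insensitive methods
document. The example uses DESERET CAPITAL LETTER LONG I (U+10400) and
its lower case form U+10428.

```java
final class CodePointIgnoreCase {

    // Per-char (code unit) comparison: surrogates have no case mapping,
    // so supplementary characters that differ only in case never match.
    static boolean equalsIgnoreCaseByChar(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            char u1 = Character.toUpperCase(a.charAt(i));
            char u2 = Character.toUpperCase(b.charAt(i));
            if (u1 != u2 && Character.toLowerCase(u1) != Character.toLowerCase(u2)) {
                return false;
            }
        }
        return true;
    }

    // Per-code-point comparison: each surrogate pair is decoded first,
    // then folded with the int overloads of toUpperCase/toLowerCase.
    static boolean equalsIgnoreCaseByCodePoint(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int cp1 = a.codePointAt(i);
            int cp2 = b.codePointAt(j);
            i += Character.charCount(cp1);
            j += Character.charCount(cp2);
            if (cp1 == cp2) {
                continue;
            }
            int u1 = Character.toUpperCase(cp1);
            int u2 = Character.toUpperCase(cp2);
            if (u1 != u2 && Character.toLowerCase(u1) != Character.toLowerCase(u2)) {
                return false;
            }
        }
        // Equal only if both strings were consumed completely.
        return i == a.length() && j == b.length();
    }

    public static void main(String[] args) {
        String upper = new String(Character.toChars(0x10400));
        String lower = new String(Character.toChars(0x10428));
        System.out.println("by char:       " + equalsIgnoreCaseByChar(upper, lower));
        System.out.println("by code point: " + equalsIgnoreCaseByCodePoint(upper, lower));
    }
}
```

The final check that both indices reached the end covers the hypothetical
case discussed in the review, where one string would be exhausted before
the other.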

