Possible optimization in StringLatin1.regionMatchesCI

Claes Redestad claes.redestad at oracle.com
Tue May 26 22:03:10 UTC 2020


So to try and clarify:

if (Character.toLowerCase(u1) == Character.toLowerCase(u2))

... can never be true when it is reached (that is, when u1 != u2) in
today's StringLatin1 version of regionMatchesCI (I did a quick check),
and a test that exhaustively verifies this property should ensure that
future Unicode updates don't trip us up (unlikely -- but not
theoretically impossible).
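
For illustration only, an exhaustive check of that property might look
roughly like the sketch below. The class and method names
(Latin1CaseFoldCheck, countViolations) are hypothetical and are not taken
from the JDK sources or from the actual test; the sketch just assumes the
existing code compares the upper-cased chars first and falls back to the
lower-case comparison only when they differ.

```java
public class Latin1CaseFoldCheck {

    // Hypothetical helper: counts Latin-1 pairs whose upper-case forms
    // differ but whose lower-cased upper-case forms are equal. If this
    // is 0, the extra toLowerCase comparison is redundant for Latin-1.
    static int countViolations() {
        int violations = 0;
        for (int i = 0; i < 256; i++) {
            char u1 = Character.toUpperCase((char) i);
            for (int j = 0; j < 256; j++) {
                char u2 = Character.toUpperCase((char) j);
                // The lower-case check is only reached when u1 != u2.
                if (u1 != u2
                        && Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                    violations++;
                }
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        System.out.println("violations: " + countViolations());
    }
}
```

A non-zero count after a Unicode data update would mean the lower-case
comparison has become necessary again for Latin-1 strings.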

I think we can go ahead with this.

/Claes

On 2020-05-26 18:27, Martin Buchholz wrote:
> On Tue, May 26, 2020 at 4:07 AM Christoph Dreis
> <christoph.dreis at freenet.de> wrote:
>>
>> Hi Martin,
>>
>>> Not a review, but:
>>> Compare with the variant of this code in StringUTF16.
>>> StringLatin1 only ever needs to support the first 256 chars in Unicode
>>
>> Does it really? That makes me wonder even more about the additional lowercase check.
>>
>>> which can never change, unlike StringUTF16,
>>
>> What do you mean by "can never change"?
> 
> When we discover sentient life on Titan, their script needs to get
> added to Unicode.  But the first 256 chars are already fully
> allocated; the Titans will be given empty space elsewhere.  Hopefully
> Unicode won't be clogged by a million emojis at that point.
> 
> There's a real fear of eszett capitalization changing. After centuries
> of debate the German Sprachbund will finally decide to (wisely!)
> abolish eszett, but Liechtenstein will be the only holdout insisting
> that eszett be capitalized to
> https://en.wikipedia.org/wiki/Capital_%E1%BA%9E
> 
> Fortunately the code we are reviewing here is Locale-independent, and
> so is hopefully immune to the future politics of Liechtenstein.
> 

