Possible optimization in StringLatin1.regionMatchesCI
Claes Redestad
claes.redestad at oracle.com
Tue May 26 22:03:10 UTC 2020
So to try and clarify:
if (Character.toLowerCase(u1) == Character.toLowerCase(u2))
... can never be true when u1 != u2 today in the context of the
StringLatin1 version of regionMatchesCI (I did a quick check), and a
test that exhaustively verifies this property should ensure that any
future Unicode updates don't trip us up (unlikely, but not
theoretically impossible).
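A minimal sketch of such an exhaustive check, assuming the loop first
compares the upper-cased chars and only then falls back to the lower-case
comparison. The class and method names are hypothetical, and this is not
the actual JDK test:

```java
// Hypothetical stand-alone check; not the actual JDK regression test.
final class Latin1CaseFoldCheck {

    // Counts pairs of Latin-1 chars whose upper-case forms differ but
    // whose lower-cased upper-case forms are nevertheless equal.
    static int countViolations() {
        int violations = 0;
        for (int i = 0; i < 256; i++) {
            for (int j = 0; j < 256; j++) {
                char u1 = Character.toUpperCase((char) i);
                char u2 = Character.toUpperCase((char) j);
                if (u1 != u2
                        && Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                    violations++;
                }
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        int violations = countViolations();
        if (violations != 0) {
            throw new AssertionError(
                    violations + " Latin-1 pairs still need the lower-case check");
        }
        System.out.println("lower-case check is redundant for all Latin-1 pairs");
    }
}
```

A non-zero count after a Unicode update would be the signal that the
second comparison has to come back.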
I think we can go ahead with this.
/Claes
On 2020-05-26 18:27, Martin Buchholz wrote:
> On Tue, May 26, 2020 at 4:07 AM Christoph Dreis
> <christoph.dreis at freenet.de> wrote:
>>
>> Hi Martin,
>>
>>> Not a review, but:
>>> Compare with the variant of this code in StringUTF16.
>>> StringLatin1 only ever needs to support the first 256 chars in Unicode
>>
>> Does it really? That makes me wonder even more about the additional lowercase check.
>>
>>> which can never change, unlike StringUTF16,
>>
>> What do you mean by "can never change"?
>
> When we discover sentient life on Titan, their script needs to get
> added to Unicode. But the first 256 chars are already fully
> allocated; the Titans will be given empty space elsewhere. Hopefully
> Unicode won't be clogged by a million emojis at that point.
>
> There's a real fear of eszett capitalization changing. After centuries
> of debate the German Sprachbund will finally decide to (wisely!)
> abolish eszett, but Liechtenstein will be the only holdout insisting
> that eszett be capitalized to
> https://en.wikipedia.org/wiki/Capital_%E1%BA%9E
>
> Fortunately the code we are reviewing here is Locale-independent, and
> so is hopefully immune to the future politics of Liechtenstein.
>