Possible optimization in StringLatin1.regionMatchesCI

Martin Buchholz martinrb at google.com
Tue May 26 16:27:55 UTC 2020


On Tue, May 26, 2020 at 4:07 AM Christoph Dreis
<christoph.dreis at freenet.de> wrote:
>
> Hi Martin,
>
> > Not a review, but:
> > Compare with the variant of this code in StringUTF16.
> > StringLatin1 only ever needs to support the first 256 chars in Unicode
>
> Does it really? That makes me wonder even more about the additional lowercase check.
>
> > which can never change, unlike StringUTF16,
>
> What do you mean by "can never change"?

When we discover sentient life on Titan, their script will need to be
added to Unicode.  But the first 256 chars are already fully
allocated; the Titans will be given empty space elsewhere.  Hopefully
Unicode won't be clogged by a million emojis by then.

There's a real fear of eszett capitalization changing. After centuries
of debate the German Sprachbund will finally decide to (wisely!)
abolish eszett, but Liechtenstein will be the only holdout insisting
that eszett be capitalized to
https://en.wikipedia.org/wiki/Capital_%E1%BA%9E

Fortunately the code we are reviewing here is Locale-independent, and
so is hopefully immune to the future politics of Liechtenstein.
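Because the Latin-1 domain is a fixed 256 code points, the question of whether the extra lower-case check in the Latin-1 path ever matters can be settled by brute force over all 256 x 256 pairs. The sketch below is illustrative only and is not the JDK code: the class and method names (Latin1CaseCheck, pairsNeedingLowerCaseCheck, upperCaseOutsideLatin1) are hypothetical, and it uses the public, locale-independent Character.toUpperCase(int) / Character.toLowerCase(int) rather than the JDK-internal CharacterDataLatin1. It counts pairs where the upper-case comparison fails but comparing the lower-cased upper-case forms succeeds, and lists the Latin-1 chars whose upper-case form leaves Latin-1 (U+00B5 MICRO SIGN and U+00FF). The numbers it prints come from the Unicode tables of whichever JDK runs it.

```java
import java.util.ArrayList;
import java.util.List;

public class Latin1CaseCheck {

    // Counts ordered pairs (c1, c2) of Latin-1 code points where the
    // upper-case comparison fails but comparing the lower-cased
    // upper-case forms succeeds, i.e. pairs for which a second,
    // lower-case check would change the outcome.
    public static int pairsNeedingLowerCaseCheck() {
        int count = 0;
        for (int c1 = 0; c1 <= 0xFF; c1++) {
            int u1 = Character.toUpperCase(c1);
            for (int c2 = 0; c2 <= 0xFF; c2++) {
                int u2 = Character.toUpperCase(c2);
                if (u1 != u2
                        && Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                    count++;
                }
            }
        }
        return count;
    }

    // Lists Latin-1 code points whose simple upper-case mapping lies
    // outside U+0000..U+00FF.
    public static List<Integer> upperCaseOutsideLatin1() {
        List<Integer> result = new ArrayList<>();
        for (int c = 0; c <= 0xFF; c++) {
            if (Character.toUpperCase(c) > 0xFF) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("pairs needing lower-case check: "
                + pairsNeedingLowerCaseCheck());
        for (int c : upperCaseOutsideLatin1()) {
            System.out.printf("U+%04X -> U+%04X%n", c, Character.toUpperCase(c));
        }
        // U+00DF (sharp s) has no single-char upper case ("SS" is two
        // chars), so the simple mapping leaves it unchanged.
        System.out.printf("U+00DF -> U+%04X%n", Character.toUpperCase(0xDF));
    }
}
```

If the count comes out 0 on the running JDK, the lower-case check never changes the result for Latin-1 input, which is the substance of the proposed optimization. The same exhaustive argument doesn't carry over to the StringUTF16 variant, whose characters include ones whose case mappings Unicode can still add or revise.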


More information about the core-libs-dev mailing list