RFR: 8311220: Optimization for StringLatin UpperLower [v4]

Sun Sep 3 17:37:40 UTC 2023

On Sun, 3 Sep 2023 12:33:18 GMT, Claes Redestad <redestad at openjdk.org> wrote:

> The two odd codepoints I was curious about are `0xaa` and `0xba`, both of which are lower-case according to `Character.isLowerCase(..)` but do not actually have an uppercase form. The Unicode data categorizes these two as `Lo` (Letter, other), so I'm a little confused how they got tagged as lowercase.
> 
> `Character.toUpperCaseEx` is specified as adhering to the definition of the Unicode data (unlike some legacy Java character definitions that might differ subtly), so perhaps it's reasonable to specify this newly invented `isLowerCaseEx` as strictly adhering to the Unicode data, in which case I think `0xaa` and `0xba` should not be considered lower case. I am not a domain expert and would like someone like @naotoj to weigh in here. But either way we should think about how to specify this kind of method to keep it precise, even if it's only internal code.
> 
> I suggested `hasUpperCase` (or maybe `hasUpperCaseEx`) as a way out of this particular conundrum, since it makes perfect sense to define a method named like that to be equivalent to `return cp != CharacterDataLatin1.instance.toUpperCaseEx(cp);`

I have renamed `isLowerCaseEx` to `hasNotUpperCaseEx`. Is this OK?
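
For reference, here is a minimal sketch of the semantics being discussed, using only public `java.lang.Character` methods. The class name and the `hasUpperCase` helper are hypothetical, and `Character.toUpperCase(int)` is only a stand-in for the internal `CharacterDataLatin1.instance.toUpperCaseEx(cp)`, which may behave differently for special cases such as `0xdf`.

```java
public class HasUpperCaseSketch {

    // Hypothetical helper mirroring the suggested definition:
    // a code point "has an upper case" if upper-casing changes it.
    // Assumption: Character.toUpperCase(int) stands in for the
    // JDK-internal CharacterDataLatin1.instance.toUpperCaseEx(int).
    public static boolean hasUpperCase(int cp) {
        return cp != Character.toUpperCase(cp);
    }

    public static void main(String[] args) {
        // 0xaa and 0xba are the two ordinal indicators from the discussion.
        int[] samples = {'a', 'A', 0xaa, 0xba, 0xe9};
        for (int cp : samples) {
            System.out.printf("0x%02x isLowerCase=%b isOtherLetter=%b hasUpperCase=%b%n",
                    cp,
                    Character.isLowerCase(cp),
                    Character.getType(cp) == Character.OTHER_LETTER,
                    hasUpperCase(cp));
        }
    }
}
```

With this definition, `0xaa` and `0xba` are reported as lower case by `Character.isLowerCase` but not as having an upper case, which is the discrepancy raised above.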

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14751#issuecomment-1704360024