RFR: 8365675: Add String Unicode Case-Folding Support [v11]
Naoto Sato
naoto at openjdk.org
Tue Dec 2 18:49:44 UTC 2025
On Tue, 2 Dec 2025 18:35:40 GMT, Xueming Shen <sherman at openjdk.org> wrote:
>> src/java.base/share/classes/jdk/internal/lang/CaseFolding.java.template line 69:
>>
>>> 67: * | 1:2 mapping | 0002 | 0000 | xxxx | xxxx | FB02 => 0066 006C
>>> 68: * +---+---------+--------+---------+--------+--------+
>>> 69: * | 1:3 mapping | 0003 | xxxx | xxxx | xxxx | FB03 => 0066 0066 0069
>>
>> What if 1:2 or 1:3 mappings included non-BMP case-folded forms?
>
> 1:2 should be fine; we still have enough bits available. 1:3 will be more challenging, but in theory 21 bits × 3 = 63 bits, so we still have the MSB to indicate that the entry holds 3 non-BMP code points. That said, I assume we might simply fall back to the char[]/int[] mode when the 'flag' byte indicates, for example, 0004 for a 1:2 non-BMP mapping or 0006 for a 1:3 non-BMP mapping. I don't think we need to worry much about performance for those 'special' cases, if the standard ever adds such mappings.
Yeah, such mappings do not exist as of now, so performance would not be an issue even if they were introduced in the future.
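To make the bit budget discussed above concrete, here is a minimal Java sketch. It assumes the `long` layout implied by the quoted table: a 16-bit flag in bits 63..48 holding 2 or 3, followed by up to three right-aligned 16-bit BMP chars. For the hypothetical non-BMP 1:3 case it uses the 3 × 21-bit layout with the MSB as the marker. The class and method names (`FoldPacking`, `packBmp`, `packThreeCodePoints`, `unpack`) are illustrative only and are not taken from `CaseFolding.java.template`. The flag-based fallback to a char[]/int[] side table, the other option mentioned in the thread, is not shown.

```java
// Hypothetical sketch of packing 1:2 / 1:3 case-folding results into one long.
// The layout is an assumption based on the quoted javadoc table, not the JDK code.
final class FoldPacking {

    // Mask for one 21-bit field; 21 bits cover every code point up to U+10FFFF.
    static final long MASK21 = (1L << 21) - 1;

    // Assumed BMP layout: bits 63..48 hold the flag (2 or 3 = number of chars),
    // and the folded chars are right-aligned in the 16-bit fields below it,
    // e.g. U+FB02 -> 0x0002_0000_0066_006CL.
    static long packBmp(char... folded) {
        if (folded.length < 2 || folded.length > 3) {
            throw new IllegalArgumentException("expected a 1:2 or 1:3 mapping");
        }
        long v = (long) folded.length << 48;
        for (int i = 0; i < folded.length; i++) {
            v |= (long) folded[i] << (16 * (folded.length - 1 - i));
        }
        return v;
    }

    // Hypothetical non-BMP 1:3 layout from the thread: 3 x 21 = 63 bits of
    // code points, with bit 63 (the sign bit) marking this form.
    static long packThreeCodePoints(int cp0, int cp1, int cp2) {
        for (int cp : new int[] {cp0, cp1, cp2}) {
            if (!Character.isValidCodePoint(cp)) {
                throw new IllegalArgumentException("invalid code point: " + cp);
            }
        }
        return Long.MIN_VALUE | ((long) cp0 << 42) | ((long) cp1 << 21) | (long) cp2;
    }

    // Decodes either layout back into the folded string.
    static String unpack(long v) {
        if (v < 0) { // MSB set: three 21-bit code points
            int[] cps = {
                (int) ((v >>> 42) & MASK21),
                (int) ((v >>> 21) & MASK21),
                (int) (v & MASK21)
            };
            return new String(cps, 0, cps.length);
        }
        int n = (int) (v >>> 48); // flag field: number of BMP chars
        if (n < 2 || n > 3) {
            throw new IllegalArgumentException("flag not handled by this sketch: " + n);
        }
        char[] out = new char[n];
        for (int i = 0; i < n; i++) {
            out[i] = (char) (v >>> (16 * (n - 1 - i)));
        }
        return new String(out);
    }

    public static void main(String[] args) {
        long fl = packBmp('f', 'l');        // U+FB02 folds to "fl"
        long ffi = packBmp('f', 'f', 'i');  // U+FB03 folds to "ffi"
        System.out.printf("%016X %s%n", fl, unpack(fl));   // 000200000066006C fl
        System.out.printf("%016X %s%n", ffi, unpack(ffi)); // 0003006600660069 ffi

        // Arbitrary supplementary code points, not a real Unicode mapping.
        int[] cps = {0x10400, 0x10428, 0x1E900};
        long wide = packThreeCodePoints(cps[0], cps[1], cps[2]);
        System.out.println(unpack(wide).equals(new String(cps, 0, cps.length))); // true
    }
}
```

Because the BMP flags (0x0002, 0x0003) leave bit 63 clear, a single sign check is enough to tell the two layouts apart in this sketch.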
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/27628#discussion_r2582428584
More information about the core-libs-dev mailing list