RFR: 8365675: Add String Unicode Case-Folding Support [v11]

Naoto Sato naoto at openjdk.org
Tue Dec 2 18:49:44 UTC 2025


On Tue, 2 Dec 2025 18:35:40 GMT, Xueming Shen <sherman at openjdk.org> wrote:

>> src/java.base/share/classes/jdk/internal/lang/CaseFolding.java.template line 69:
>> 
>>> 67:     *  | 1:2 mapping |  0002  |   0000  |  xxxx  |  xxxx  |  FB02 => 0066 006C
>>> 68:     *  +---+---------+--------+---------+--------+--------+
>>> 69:     *  | 1:3 mapping |  0003  |   xxxx  |  xxxx  |  xxxx  |  FB03 => 0066 0066 0069
>> 
>> What if 1:2/3 mappings included non-BMP case folded forms?
>
> 1:2 should be fine; we still have enough bits available. 1:3 will be more challenging, but in theory 21 bits x 3 = 63, so we still have the MSB to indicate that it is three non-BMP code points. That said, I assume we might simply fall back to the char/int[] mode when the 'flag' byte indicates, for example, 0004 for 1:2 non-BMP or 0006 for 1:3 non-BMP. I don't think we need to worry much about performance for those 'special' cases, if the standard does add such mappings.

Yeah, no such mappings exist as of now, so performance would not be an issue even if those cases were introduced in the future.
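
For illustration only, the "21 bits x 3 = 63" arithmetic from the quoted comment can be sketched as below. This is not the layout used in CaseFolding.java.template; the class name PackedFoldSketch, the methods pack3/unpack3, and the choice of bit 63 as the marker are all hypothetical, chosen just to show that three supplementary code points plus one flag bit fit in a single long.

```java
public class PackedFoldSketch {
    // Hypothetical layout (an assumption, not the JDK's):
    //   bit 63      = marker: "this long holds three code points"
    //   bits 62..42 = first code point
    //   bits 41..21 = second code point
    //   bits 20..0  = third code point
    private static final long THREE_CP_FLAG = 1L << 63;
    private static final int CP_BITS = 21;                   // 0x10FFFF fits in 21 bits
    private static final long CP_MASK = (1L << CP_BITS) - 1; // 0x1FFFFF

    public static long pack3(int cp0, int cp1, int cp2) {
        for (int cp : new int[] {cp0, cp1, cp2}) {
            if (!Character.isValidCodePoint(cp)) {
                throw new IllegalArgumentException("not a code point: " + cp);
            }
        }
        return THREE_CP_FLAG
                | ((long) cp0 << (2 * CP_BITS))
                | ((long) cp1 << CP_BITS)
                | cp2;
    }

    public static int[] unpack3(long packed) {
        if ((packed & THREE_CP_FLAG) == 0) {
            throw new IllegalArgumentException("marker bit not set");
        }
        return new int[] {
            (int) ((packed >>> (2 * CP_BITS)) & CP_MASK),
            (int) ((packed >>> CP_BITS) & CP_MASK),
            (int) (packed & CP_MASK)
        };
    }

    public static void main(String[] args) {
        // FB03 => 0066 0066 0069, taken from the table quoted above
        long packed = pack3(0x0066, 0x0066, 0x0069);
        System.out.println(new String(unpack3(packed), 0, 3));
    }
}
```

The same round trip works for supplementary code points such as U+10400, which is the hypothetical case the question is about; the fallback to a char/int[] table mentioned in the quoted comment would avoid the packing entirely.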

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27628#discussion_r2582428584

