<i18n dev> RFR: 8303018: Unicode Emoji Properties [v2]

Roger Riggs rriggs at openjdk.org
Tue Mar 14 21:46:46 UTC 2023


On Tue, 14 Mar 2023 15:49:56 GMT, Naoto Sato <naoto at openjdk.org> wrote:

>> Proposing accessor methods to Emoji properties defined in [Unicode Technical Standard #51](https://unicode.org/reports/tr51/) in `java.lang.Character` class. This is per a request from the client group, as well as refining the currently existing ad-hoc emoji implementation in regex. A CSR has also been drafted, and I would appreciate reviews for it too.
>
> Naoto Sato has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fixed method descriptions

Looks good.  
There are opportunities to modernize the code, but maybe separately.

make/jdk/src/classes/build/tools/generatecharacter/EmojiData.java line 99:

> 97:             case "Emoji_Component" -> EMOJI_COMPONENT;
> 98:             case "Extended_Pictographic" -> EXTENDED_PICTOGRAPHIC;
> 99:             default -> throw new InternalError();

It would be useful to include the "type" as the exception argument. It gives some idea of the corrupt input or missing case.
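
As a rough sketch of that suggestion (not the actual EmojiData.java code): the class name, the `maskFor` method, and the constant values below are hypothetical stand-ins; only the two case labels and the `InternalError` default come from the quoted lines.

```java
// Hypothetical stand-in for the switch in EmojiData.java.
final class EmojiTypeDemo {
    // Placeholder values, chosen only so the example compiles.
    static final long EMOJI_COMPONENT = 0x040000000000L;
    static final long EXTENDED_PICTOGRAPHIC = 0x080000000000L;

    static long maskFor(String type) {
        return switch (type) {
            case "Emoji_Component" -> EMOJI_COMPONENT;
            case "Extended_Pictographic" -> EXTENDED_PICTOGRAPHIC;
            // Carry the offending value so a corrupt data file or a
            // missing case is visible directly in the stack trace.
            default -> throw new InternalError(
                    "Unexpected emoji property type: " + type);
        };
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(maskFor("Emoji_Component")));
        try {
            maskFor("Emoji_Bogus");
        } catch (InternalError e) {
            System.out.println(e.getMessage());
        }
    }
}
```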

make/jdk/src/classes/build/tools/generatecharacter/GenerateCharacter.java line 214:

> 212:         maskEmojiModifierBase = 0x020000000000L,
> 213:         maskEmojiComponent  = 0x040000000000L,
> 214:         maskExtendedPictographic = 0x080000000000L;

It would be good to leverage a common definition (perhaps a bit number) here and in EmojiData.java
and build the constants with << shifts.
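
One possible shape for this, as a sketch only: the `EmojiBits` holder class and its constant names are hypothetical, and the bit numbers 41 to 43 are simply what the three quoted mask values work out to.

```java
// Hypothetical shared holder that both GenerateCharacter.java and
// EmojiData.java could reference; the names are illustrative only.
final class EmojiBits {
    // Bit positions derived from the quoted masks
    // (0x020000000000L == 1L << 41, and so on).
    static final int EMOJI_MODIFIER_BASE_BIT = 41;
    static final int EMOJI_COMPONENT_BIT = 42;
    static final int EXTENDED_PICTOGRAPHIC_BIT = 43;

    // 1L (not 1) is required: an int shift would wrap at 32 bits.
    static final long MASK_EMOJI_MODIFIER_BASE = 1L << EMOJI_MODIFIER_BASE_BIT;
    static final long MASK_EMOJI_COMPONENT = 1L << EMOJI_COMPONENT_BIT;
    static final long MASK_EXTENDED_PICTOGRAPHIC = 1L << EXTENDED_PICTOGRAPHIC_BIT;

    public static void main(String[] args) {
        System.out.printf("0x%012X%n", MASK_EMOJI_MODIFIER_BASE);
        System.out.printf("0x%012X%n", MASK_EMOJI_COMPONENT);
        System.out.printf("0x%012X%n", MASK_EXTENDED_PICTOGRAPHIC);
    }
}
```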

make/jdk/src/classes/build/tools/generatecharacter/GenerateCharacter.java line 810:

> 808:         if (x.equals("maskEmojiModifierBase")) return "0x" + hex4(maskEmojiModifierBase >> 32);
> 809:         if (x.equals("maskEmojiComponent")) return "0x" + hex4(maskEmojiComponent >> 32);
> 810:         if (x.equals("maskExtendedPictographic")) return "0x" + hex4(maskExtendedPictographic >> 32);

An upgrade would be to modify hex4(), hexNN() to use `HexFormat.of().withUpperCase().toHexDigits((short)xxx)`
The HexFormat instance is reusable and would avoid creating extra strings.
Perhaps also create a method that combines the repetitive shift and prefixing.

This if...then... sequence could be an expression switch (x) {...}.
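
Both ideas could be combined roughly as follows. This is a sketch, not the GenerateCharacter.java code: the class, the `hexHigh` helper, and the `lookup` method are hypothetical, the `default` branch is an assumption (the quoted lines do not show what happens for other keys), and it assumes hex4() yields four upper-case hex digits. `HexFormat` requires JDK 17 or later.

```java
import java.util.HexFormat;

final class MaskHexDemo {
    // One shared, immutable formatter instead of per-call string building.
    private static final HexFormat HEX = HexFormat.of().withUpperCase();

    static final long
        maskEmojiModifierBase = 0x020000000000L,
        maskEmojiComponent = 0x040000000000L,
        maskExtendedPictographic = 0x080000000000L;

    // Hypothetical helper combining the repeated ">> 32" shift and the
    // "0x" prefix; toHexDigits(short) always yields four hex digits.
    static String hexHigh(long mask) {
        return "0x" + HEX.toHexDigits((short) (mask >> 32));
    }

    // The if-chain rewritten as a switch expression over the key.
    static String lookup(String x) {
        return switch (x) {
            case "maskEmojiModifierBase" -> hexHigh(maskEmojiModifierBase);
            case "maskEmojiComponent" -> hexHigh(maskEmojiComponent);
            case "maskExtendedPictographic" -> hexHigh(maskExtendedPictographic);
            // Assumed fallback for this sketch only.
            default -> throw new IllegalArgumentException("Unknown key: " + x);
        };
    }

    public static void main(String[] args) {
        System.out.println(lookup("maskEmojiModifierBase"));
        System.out.println(lookup("maskExtendedPictographic"));
    }
}
```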

-------------

Marked as reviewed by rriggs (Reviewer).

PR: https://git.openjdk.org/jdk/pull/13006

