RFR: 8296246: Update Unicode Data Files to Version 15.1.0 [v3]

Joe Wang joehw at openjdk.org
Tue Sep 19 16:35:54 UTC 2023


On Mon, 18 Sep 2023 18:33:20 GMT, Naoto Sato <naoto at openjdk.org> wrote:

>> This PR is to incorporate the latest Unicode 15.1, which was released yesterday. Besides the usual character data update, an upgraded implementation of RegEx which reflects the Indic Conjunct Break specified in the latest [Unicode Annex #29 ("Unicode Text Segmentation")](https://unicode.org/reports/tr29/) is included. A corresponding CSR has also been drafted.
>
> Naoto Sato has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision:
> 
>  - Fix GensrcRegex.gmk
>  - Merge branch 'master' into JDK-8296246-Unicode15.1
>  - Update make/jdk/src/classes/build/tools/generateextraproperties/GenerateExtraProperties.java
>    
>    Co-authored-by: Andrey Turbanov <turbanoff at gmail.com>
>  - Update make/jdk/src/classes/build/tools/generateextraproperties/GenerateExtraProperties.java
>    
>    Co-authored-by: Andrey Turbanov <turbanoff at gmail.com>
>  - TR29 final version
>  - .md file update
>  - Final 8/28
>  - Draft 8/11
>  - GenerateExtraProperties tool
>  - initial commit

LGTM. A couple of minor suggestions.

make/jdk/src/classes/build/tools/generateextraproperties/GenerateExtraProperties.java line 41:

> 39: /**
> 40:  * Parses extra properties files of UCD, and replaces the placeholders in
> 41:  * the given template source file with the generated conditions, then emit

s/emit/emits?

make/jdk/src/classes/build/tools/generateextraproperties/GenerateExtraProperties.java line 51:

> 49:  * <blockquote>
> 50:  *     %%%Type(=Value)%%%
> 51:  * </blockquote>

Nice example to explain how it works! 
I wonder if it might be helpful to clarify that the () indicate optional. It could be slightly confused with being a part of the template.

make/modules/java.base/gensrc/GensrcRegex.gmk line 35:

> 33: INDICCONJUNCTBREAKTEMP := $(MODULE_SRC)/share/classes/jdk/internal/util/regex/IndicConjunctBreak.java.template
> 34: INDICCONJUNCTBREAKPROPS := $(MODULE_SRC)/share/data/unicodedata/DerivedCoreProperties.txt
> 35: INDICCONJUNCTBREAKPARAMS := InCB=Linker InCB=Extend InCB=Consonant

DerivedCoreProperties.txt is very large. Would it be helpful to reference the section, that is "Derived Property: Indic_Conjunct_Break", in this file or the template?

-------------

Marked as reviewed by joehw (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/15728#pullrequestreview-1633706039
PR Review Comment: https://git.openjdk.org/jdk/pull/15728#discussion_r1330390702
PR Review Comment: https://git.openjdk.org/jdk/pull/15728#discussion_r1330408135
PR Review Comment: https://git.openjdk.org/jdk/pull/15728#discussion_r1330410613


More information about the core-libs-dev mailing list