<i18n dev> RFR: 8291660: Grapheme support in BreakIterator

Naoto Sato naoto at openjdk.org
Thu Aug 25 20:40:02 UTC 2022


On Thu, 25 Aug 2022 03:52:48 GMT, Stuart Marks <smarks at openjdk.org> wrote:

>> This is to enhance the character break analysis in `java.text.BreakIterator` to conform to the extended grapheme cluster boundaries defined in https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries. A corresponding CSR has also been drafted, as there will be behavioral changes with this modification.
>
> src/java.base/share/classes/sun/util/locale/provider/BreakIteratorProviderImpl.java line 258:
> 
>> 256:                     .filter(i -> boundaries.get(i) > offset)
>> 257:                     .findFirst()
>> 258:                     .orElse(boundaries.size() - 1);
> 
> Is it worth trying to use Collections.binarySearch() here? I think the boundaries list is in ascending sorted order, so you might be able to drop in a binarySearch() call directly. (Need to be a bit careful with negative return values.)

Will replace this with `Collections.binarySearch()`.
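
As a rough sketch of what that replacement might look like (not the actual patch): `BoundarySearch` and `firstBoundaryAfter` are hypothetical names, and the code assumes `boundaries` is a strictly ascending `List<Integer>`. On a miss, `Collections.binarySearch()` returns `-(insertion point) - 1`, and the insertion point is already the index of the first element greater than the key.

```java
import java.util.Collections;
import java.util.List;

public class BoundarySearch {
    // Hypothetical helper: returns the index of the first boundary strictly
    // greater than offset, or the last index if there is no such boundary.
    // Assumes boundaries is sorted in strictly ascending order.
    static int firstBoundaryAfter(List<Integer> boundaries, int offset) {
        int idx = Collections.binarySearch(boundaries, offset);
        // Hit: the next element is the first one greater than offset.
        // Miss: decode the insertion point from the negative return value.
        int next = idx >= 0 ? idx + 1 : -idx - 1;
        // Mirror the original orElse(boundaries.size() - 1) fallback.
        return Math.min(next, boundaries.size() - 1);
    }

    public static void main(String[] args) {
        List<Integer> boundaries = List.of(0, 2, 5, 9);
        System.out.println(firstBoundaryAfter(boundaries, 3));
    }
}
```

This keeps the same fallback as the stream version while doing the lookup in O(log n) instead of O(n).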

> test/jdk/java/util/regex/whitebox/GraphemeTest.java line 30:
> 
>> 28:  * @library /lib/testlibrary/java/lang
>> 29:  * @compile --add-exports java.base/jdk.internal.util.regex=ALL-UNNAMED GraphemeTest.java
>> 30:  * @run testng/othervm --add-exports java.base/jdk.internal.util.regex=ALL-UNNAMED --add-opens java.base/jdk.internal.util.regex=ALL-UNNAMED GraphemeTest
> 
> Can you use the `@modules` directive to export+open the internal module to the test?

Good point. Will use the `@modules` tag.
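
For illustration only, the test header might end up looking roughly like the following. This is a sketch, not the final change: it assumes jtreg's `module/package:+open` form, which both exports and opens the package, so the explicit `--add-exports` and `--add-opens` options should no longer be needed.

```java
/*
 * @test
 * @library /lib/testlibrary/java/lang
 * @modules java.base/jdk.internal.util.regex:+open
 * @run testng/othervm GraphemeTest
 */
```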

-------------

PR: https://git.openjdk.org/jdk/pull/9991

