<i18n dev> RFR: 8041488: Locale-Dependent List Patterns [v8]
Roger Riggs
rriggs at openjdk.org
Fri Aug 18 20:06:38 UTC 2023
On Thu, 10 Aug 2023 17:31:28 GMT, Naoto Sato <naoto at openjdk.org> wrote:
>> Introducing a new formatting class for locale-dependent list patterns. The class is to provide the functionality from the Unicode Consortium's LDML specification for [list patterns](https://www.unicode.org/reports/tr35/tr35-general.html#ListPatterns). For example, given a list of String as "Monday", "Wednesday", "Friday", its `format` method would produce "Monday, Wednesday, and Friday" in US English. A CSR has also been drafted, and its draft javadoc can be viewed here: https://cr.openjdk.org/~naoto/JDK-8041488-ListPatterns-PR/api.00/java.base/java/text/ListFormat.html
>
> Naoto Sato has updated the pull request incrementally with one additional commit since the last revision:
>
> Reflecting review comments
src/java.base/share/classes/java/text/ListFormat.java line 96:
> 94: * round-trip with the corresponding formatting. For example, a String list
> 95: * "a, b,", "c" will be formatted as "a, b, and c", but may be parsed as
> 96: * "a", "b", "c".
Suggestion:
* round-trip with the corresponding formatting. For example, a two element String list
* "a, b,", "c" will be formatted as "a, b, and c", but may be parsed as three elements
* "a", "b", "c".
src/java.base/share/classes/java/text/ListFormat.java line 240:
> 238: * is thrown.
> 239: * <p>
> 240: * Each pattern string is first parsed as follows. Patterns in parentheses are optional:
The characters in parentheses are arbitrary and literal strings; not really "patterns" in the regex sense.
src/java.base/share/classes/java/text/ListFormat.java line 246:
> 244: * end := {0}end_between{1}(end_after)
> 245: * two := (two_before){0}two_between{1}(two_after)
> 246: * three := (three_before){0}three_between{1}three_between{2}(three_after)
The use of "three_between" in 2 positions may mislead the developer to think they need to be the same string.
Perhaps "three_between1" and "three_between2"? or something to differentiate them.
src/java.base/share/classes/java/text/ListFormat.java line 250:
> 248: * If parsing of the pattern string for start/middle/end fails, it throws an
> 249: * {@code IllegalArgumentException}. If two/three pattern string is empty, or
> 250: * fails on parsing, it falls back to
The IllegalArgumentException is thrown for a parse failure on any of the start, middle, end, two, or three patterns.
The fallback is only if they are null or empty.
A more readable form of "start/middle/end" would be "start, middle, or end". Here and elsewhere in the javadoc.
src/java.base/share/classes/java/text/ListFormat.java line 262:
> 260: * n > 3: (start_before){0}start_between{1}middle_between{2} ... middle_between{m}end_between{n}(end_after)
> 261: * </pre></blockquote>
> 262: * As an example, the following table shows patterns array which is equivalent to
Suggestion:
* As an example, the following table shows a pattern array which is equivalent to
src/java.base/share/classes/java/text/ListFormat.java line 279:
> 277: * <tr><th scope="row" style="text-align:left">2</th>
> 278: * <td>"{0} and {1}"</td>
> 279: * <tr><th scope="row" style="text-align:left">3</th>
The use of different terms to identify pattern kinds of 2/two and 3/three may be confusing.
Unless it is referring to LDML, use "two" and "three" be used consistently.
src/java.base/share/classes/java/text/ListFormat.java line 304:
> 302: * @param patterns array of patterns, not null
> 303: * @throws IllegalArgumentException if the length {@code patterns} array is not 5, or
> 304: * any of {@code start}, {@code middle}, {@code end} patterns cannot be parsed.
Also throws if two and three patterns can not be parsed.
src/java.base/share/classes/java/text/ListFormat.java line 353:
> 351: return generateMessageFormat(a).format(a, toAppendTo, DontCareFieldPosition.INSTANCE);
> 352: } else {
> 353: throw new IllegalArgumentException("The object to format should be an Object list");
Suggestion:
throw new IllegalArgumentException("The object to format should be a List<Object> or List[] array");
src/java.base/share/classes/java/text/ListFormat.java line 364:
> 362: * delimiters. For example, a String list {@code "a, b,", "c"} will be
> 363: * formatted as {@code "a, b, and c"}, but may be parsed as
> 364: * {@code "a", "b", "c"}.
Suggestion:
* delimiters. For example, a two element String list {@code "a, b,", "c"} will be
* formatted as {@code "a, b, and c"}, but may be parsed as three elements
* {@code "a", "b", "c"}.
src/java.base/share/classes/java/text/ListFormat.java line 389:
> 387: * use all characters up to the end of the string), and the parsed
> 388: * object is returned. The updated {@code parsePos} can be used to
> 389: * indicate the starting point for the next call to this method.
The "this method" seems too specific to ListFormat.
Suggestion:
* indicate the starting point for the next call to parse additional text.
src/java.base/share/classes/java/text/ListFormat.java line 429:
> 427: // now try exact number patterns
> 428: parsed = new MessageFormat(patterns[TWO], locale).parseObject(source, parsePos);
> 429: if (parsed == null && !patterns[THREE].isEmpty()) {
patterns[THREE] should never be null/empty.
src/java.base/share/classes/java/text/ListFormat.java line 451:
> 449: @Override
> 450: public AttributedCharacterIterator formatToCharacterIterator(Object arguments) {
> 451: if (arguments instanceof List<?> objs) {
Should this method also support Object[].
And update the exception message below.
src/java.base/share/classes/java/text/ListFormat.java line 510:
> 508: } else {
> 509: yield new MessageFormat(createMessageFormatString(len), locale);
> 510: }
TWO and THREE patterns are always generated, so this if/else switch should not be needed.
src/java.base/share/classes/java/text/ListFormat.java line 591:
> 589: * Suitable for elements, such as, "M", "T", "W", etc.
> 590: */
> 591: NARROW
The examples here are misleading (without explaining the source of the values).
The ListFormat does NOT change the strings supplied, it can only affect the text and punctuation literals that appear before, between, or after the elements of the list.
Suggestion:
* The {@code FULL} list format style. This is the default style, which typically is the
* full description of the text and punctuation that appear between the list elements.
* Suitable for elements, such as, "Monday", "Tuesday", "Wednesday", etc.
*/
FULL,
/**
* The {@code SHORT} list format style. This style is typically an abbreviation
* of the text and punctuation that appear between the list elements.
* Suitable for elements, such as, "Mon", "Tue", "Wed", etc.
*/
SHORT,
/**
* The {@code NARROW} list format style. This style is typically the shortest description
* of the text and punctuation that appear between the list elements.
* Suitable for elements, such as, "M", "T", "W", etc.
*/
NARROW
test/jdk/java/text/Format/ListFormat/TestListFormat.java line 261:
> 259: assertEquals(input, f.parse(result));
> 260: }
> 261: }
There should be a test of `formatToCharacterIterator` method.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/15130#discussion_r1298790942
PR Review Comment: https://git.openjdk.org/jdk/pull/15130#discussion_r1298783724
PR Review Comment: https://git.openjdk.org/jdk/pull/15130#discussion_r1298781795
PR Review Comment: https://git.openjdk.org/jdk/pull/15130#discussion_r1298628165
PR Review Comment: https://git.openjdk.org/jdk/pull/15130#discussion_r1298784506
PR Review Comment: https://git.openjdk.org/jdk/pull/15130#discussion_r1298787272
PR Review Comment: https://git.openjdk.org/jdk/pull/15130#discussion_r1298791762
PR Review Comment: https://git.openjdk.org/jdk/pull/15130#discussion_r1298793675
PR Review Comment: https://git.openjdk.org/jdk/pull/15130#discussion_r1298794252
PR Review Comment: https://git.openjdk.org/jdk/pull/15130#discussion_r1298673908
PR Review Comment: https://git.openjdk.org/jdk/pull/15130#discussion_r1298798096
PR Review Comment: https://git.openjdk.org/jdk/pull/15130#discussion_r1298813568
PR Review Comment: https://git.openjdk.org/jdk/pull/15130#discussion_r1298642614
PR Review Comment: https://git.openjdk.org/jdk/pull/15130#discussion_r1298821495
PR Review Comment: https://git.openjdk.org/jdk/pull/15130#discussion_r1298824618
More information about the i18n-dev
mailing list