<i18n dev> RFR: JDK-8325898: ChoiceFormat returns erroneous result when formatting bad pattern

Thu Feb 22 17:41:53 UTC 2024

On Tue, 20 Feb 2024 23:14:26 GMT, Naoto Sato <naoto at openjdk.org> wrote:

>> Please review this PR which handles an edge case pattern bug with ChoiceFormat.
>> 
>> 
>> var d = new ChoiceFormat("0#foo|1#bar|baz|") // creates cFmt equivalent to "0.0#foo|1.0#bar|1.0#"
>> d.format(1) // unexpectedly returns ""
>> 
>> 
>> Not only does this lead to faulty formatting results, but breaks the ChoiceFormat class invariant of duplicate limits.
>> 
>> It would have been preferable if either an exception was thrown when the ChoiceFormat was initially created, or the ChoiceFormat formatting 1 returned a value of "bar".
>> 
>> For comparison,
>> 
>> var d = new ChoiceFormat("0#foo|1#bar|baz") // creates cFmt equivalent to "0.0#foo|1.0#bar"
>> d.format(1) // returns "bar"
>> 
>> 
>> After this change, the first code snippet now returns "bar" when formatting 1 and discards the "baz" as the second code snippet does.
>> 
>> 
>> This PR includes a lot of cleanup to the applyPattern implementation, the specific fix for this bug is addressed with the following, on line 305,
>> 
>> if (seg != Segment.FORMAT) {
>>     // Discard incorrect portion and finish building cFmt
>>     break;
>> }
>> 
>> This prevents a limit/format entry from being built when the current parsing mode has not yet processed the limit and relation. The previous implementation would build an additional limit/format of 1 with an empty string. The `break` allows for the discarding of the incorrect portion and continuing. Alternatively an exception could have been thrown. For consistency with the second code snippet, the former was picked.
>> 
>> https://github.com/openjdk/jdk/pull/17856 may be of interest, which is a semi-related specification fix.
>
> src/java.base/share/classes/java/text/ChoiceFormat.java line 332:
> 
>> 330: 
>> 331:     // Used to explicitly define the segment mode while applying a pattern
>> 332:     private enum Segment {
> 
> Do we need a new enum? Would introducing a new enum solve something that cannot be done with the existing implementation?

No, we don't _need_ a new enum, although I prefer it for readability.

For example, I think the following is more clear:
- `Segment.LIMIT.bldr.toString()`over `segments[1].toString();`.
- `seg = Segment.LIMIT;` over `part = 0;`

Can also no longer do something such as `part = 2;` even if unlikely.

Although I can see the argument of not wanting an enum as `part` is either only `0` or `1`. Either is OK with me.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17883#discussion_r1499626656