<i18n dev> RFR: 8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat pattern [v7]
Justin Lu
jlu at openjdk.org
Thu Feb 1 23:30:04 UTC 2024
On Tue, 30 Jan 2024 18:28:32 GMT, Archie Cobbs <acobbs at openjdk.org> wrote:
>> Please consider this fix to ensure that going from `MessageFormat` to pattern string via `toPattern()` and then back via `new MessageFormat()` results in a format that is equivalent to the original.
>>
>> The quoting and escaping rules for `MessageFormat` pattern strings are really tricky. I admit not completely understanding them. At a high level, they work like this: The normal way one would "nest" strings containing special characters is with straightforward recursive escaping like with the `bash` command line. For example, if you want to echo `a "quoted string" example` then you enter `echo "a "quoted string" example"`. With this scheme it's always the "outer" layer's job to (un)escape special characters as needed. That is, the echo command never sees the backslash characters.
>>
>> In contrast, with `MessageFormat` and friends, nested subformat pattern strings are always provided "pre-escaped". So to build an "outer" string (e.g., for `ChoiceFormat`) the "inner" subformat pattern strings are more or less just concatenated, and then only the `ChoiceFormat` option separator characters (e.g., `<`, `#`, `|`, etc.) are escaped.
>>
>> The "pre-escape" escaping algorithm escapes `{` characters, because `{` indicates the beginning of a format argument. However, it doesn't escape `}` characters. This is OK because the format string parser treats any "extra" closing braces (where "extra" means not matching an opening brace) as plain characters.
>>
>> So far, so good... at least, until a format string containing an extra closing brace is nested inside a larger format string, where the extra closing brace, which was previously "extra", can now suddenly match an opening brace in the outer pattern containing it, thus truncating it by "stealing" the match from some subsequent closing brace.
>>
>> An example is the `MessageFormat` string `"{0,choice,0.0#option A: {1}|1.0#option B: {1}'}'}"`. Note the second option format string has a trailing closing brace in plain text. If you create a `MessageFormat` with this string, you see a trailing `}` only with the second option.
>>
>> However, if you then invoke `toPattern()`, the result is `"{0,choice,0.0#option A: {1}|1.0#option B: {1}}}"`. Oops, now because the "extra" closing brace is no longer quoted, it matches the opening brace at the beginning of the string, and the following closing brace, which was the previous match, is now just plain text in the outer `MessageFormat` string.
>>
>> As a result, invoking `f.format(new ...
>
> Archie Cobbs has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix misspelling in comment.
Left some minor comments, but LGTM, I also ran this change against internal tests and all is good there. Others may have suggestions, in any case, you will still need a review from someone with 'reviewer' status.
test/jdk/java/text/Format/MessageFormat/MessageFormatToPatternTest.java line 26:
> 24: /*
> 25: * @test
> 26: * @summary Verify MessageFormat.toPattern() is equivalent to original pattern
Could make the summary a little more specific for future readers
test/jdk/java/text/Format/MessageFormat/MessageFormatToPatternTest.java line 70:
> 68: @BeforeAll
> 69: public static void setup() {
> 70: savedLocale = Locale.getDefault();
I'm not sure we need to save the default locale and restore it, unless I'm missing something.
test/jdk/java/text/Format/MessageFormat/MessageFormatToPatternTest.java line 104:
> 102: Arguments.of("{0,choice,0.0#option A: {0}|1.0#option B: {0}'}'}", "option B: 1.23}"),
> 103: Arguments.of("{0,choice,0.0#option A: {0}|2.0#option B: {0}'}'}", "option A: 1.23"),
> 104:
Suggestion:
// Absurd double quote examples
Arguments.of("Foo '}''''''''}' {0,number,bar'}' '}' } baz ", "Foo }''''} bar} } 1 baz "),
Arguments.of("'''}''{'''}''''}'"), "'}'{'}''}"),
test/jdk/java/text/Format/MessageFormat/MessageFormatsByArgumentIndex.java line 35:
> 33: import java.text.NumberFormat;
> 34:
> 35: public class MessageFormatsByArgumentIndex {
Can you bump the latter copyright year for this file
test/jdk/java/text/Format/MessageFormat/MessageRegression.java line 114:
> 112: * MessageFormat.toPattern has weird rounding behavior.
> 113: */
> 114: @Test
This file as well
-------------
Marked as reviewed by jlu (Committer).
PR Review: https://git.openjdk.org/jdk/pull/17416#pullrequestreview-1857862268
PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1475280475
PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1475279603
PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1475294193
PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1475277825
PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1475278147
More information about the i18n-dev
mailing list