RFR: JDK-8263261 Extend String::translateEscapes to support unicode escapes [v12]
Roger Riggs
rriggs at openjdk.org
Fri Jan 26 17:04:38 UTC 2024
On Fri, 26 Jan 2024 15:06:50 GMT, Jim Laskey <jlaskey at openjdk.org> wrote:
>> Currently String::translateEscapes does not support Unicode escapes; they are reported as an IllegalArgumentException("Invalid escape sequence: ..."). String::translateEscapes should translate Unicode escape sequences to provide full coverage.
>
> Jim Laskey has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision:
>
> - Merge remote-tracking branch 'upstream/master' into 8263261
> - Update unicode to Unicode
> - Requested changes
> - Update String.java
> - Requested changes
> - Update Copyright
> - Update copyright year of test
> - Add JLS Unicode Escapes reference
> - Update comment
> - Update copyright year
> - ... and 2 more: https://git.openjdk.org/jdk/compare/b94b04ff...040bda82
src/java.base/share/classes/java/lang/String.java line 4229:
> 4227: * <th scope="row">{@code \u005Cu...uXXXX}</th>
> 4228: * <td>Unicode escape</td>
> 4229: * <td>single UTF-16 code unit equivalent</td>
The `...` makes it less clear what is being shown. It might be clearer to include the XXXX in the resulting value and drop the multiple `u` case.
src/java.base/share/classes/java/lang/String.java line 4245:
> 4243: * escape sequences and Unicode escapes are translated as encountered in one pass and
> 4244: * <strong>not</strong> done as an Unicode escapes pass followed by an escape sequences
> 4245: * pass.
I would move the description of the compiler behavior to the end and remove "also". For example,
Suggestion:
* @implNote As a convenience for use with constructed
* strings, this method translates Unicode escapes. For example, this
* method could be used when ASCII encoded text files need to maintain Unicode
* content. The translation is done in a single pass and is non-recursive. That is,
* escape sequences and Unicode escapes are translated as encountered in one pass and
* <strong>not</strong> done as a Unicode escapes pass followed by an escape sequences
* pass. By comparison, the compiler translates all Unicode escapes before string
* literals are translated.
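
To make the single-pass wording concrete, here is a minimal sketch of the behavior the note describes. It is not the code in the PR: the class `SinglePassSketch` and the method `translate` are hypothetical names, and only `\n`, `\t`, `\\` and Unicode escapes are handled. It assumes that a decoded Unicode escape is appended to the output and never rescanned, so the seven-character input `\u005Cn` yields a backslash followed by `n` rather than a newline.

```java
class SinglePassSketch {
    // Hypothetical helper, not the JDK implementation: handles only
    // backslash-n, backslash-t, backslash-backslash and Unicode escapes.
    static String translate(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i++);
            if (c != '\\') {
                out.append(c); // ordinary character, copied as-is
                continue;
            }
            if (i >= s.length()) {
                throw new IllegalArgumentException("Invalid escape sequence: trailing backslash");
            }
            char e = s.charAt(i++);
            switch (e) {
                case 'n':  out.append('\n'); break;
                case 't':  out.append('\t'); break;
                case '\\': out.append('\\'); break;
                case 'u':
                    // Skip any additional 'u' characters before the hex digits.
                    while (i < s.length() && s.charAt(i) == 'u') {
                        i++;
                    }
                    int code = 0;
                    for (int k = 0; k < 4; k++) {
                        int d = (i < s.length()) ? Character.digit(s.charAt(i++), 16) : -1;
                        if (d < 0) {
                            throw new IllegalArgumentException("Invalid Unicode escape");
                        }
                        code = (code << 4) | d;
                    }
                    // The decoded code unit is appended directly and never rescanned.
                    out.append((char) code);
                    break;
                default:
                    throw new IllegalArgumentException("Invalid escape sequence: \\" + e);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Seven characters: backslash, 'u', '0', '0', '5', 'C', 'n'.
        String once = translate("\\u005Cn");
        System.out.println(once.length());        // prints 2
        System.out.println(once.equals("\\n"));   // prints true
        // Multiple 'u' characters are accepted in this sketch.
        System.out.println(translate("\\uuuuu2022").equals("\u2022")); // prints true
    }
}
```

Calling `translate` a second time on the result would turn the backslash and `n` into a newline, which roughly corresponds to the two-pass ordering the compiler uses.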
test/jdk/java/lang/String/TranslateEscapes.java line 97:
> 95: verifyUnicodeEscape("\\u2022", "\u2022");
> 96: verifyUnicodeEscape("\\ud83c\\udf09", "\ud83c\udf09");
> 97: verifyUnicodeEscape("\\uuuuu2022", "\uuuuu2022");
Include the code from the example as a test case too.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/17491#discussion_r1467892757
PR Review Comment: https://git.openjdk.org/jdk/pull/17491#discussion_r1467895901
PR Review Comment: https://git.openjdk.org/jdk/pull/17491#discussion_r1467900516