RFR: JDK-8263261 Extend String::translateEscapes to support unicode escapes [v12]

Jim Laskey jlaskey at openjdk.org
Fri Jan 26 17:36:56 UTC 2024


On Fri, 26 Jan 2024 16:54:14 GMT, Roger Riggs <rriggs at openjdk.org> wrote:

>> Jim Laskey has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision:
>> 
>>  - Merge remote-tracking branch 'upstream/master' into 8263261
>>  - Update unicode to Unicode
>>  - Requested changes
>>  - Update String.java
>>  - Requested changes
>>  - Update Copyright
>>  - Update copyright year of test
>>  - Add JLS Unicode Escapes reference
>>  - Update comment
>>  - Update copyright year
>>  - ... and 2 more: https://git.openjdk.org/jdk/compare/af9bfd62...040bda82
>
> src/java.base/share/classes/java/lang/String.java line 4229:
> 
>> 4227:      *     <th scope="row">{@code \u005Cu...uXXXX}</th>
>> 4228:      *     <td>Unicode escape</td>
>> 4229:      *     <td>single UTF-16 code unit equivalent</td>
> 
> The `...` makes it less clear what is being shown.  It might be clearer to include the XXXX in the resulting value and drop the multiple `u` case.

Changed

> src/java.base/share/classes/java/lang/String.java line 4245:
> 
>> 4243:      * escape sequences and Unicode escapes are translated as encountered in one pass and
>> 4244:      * <strong>not</strong> done as an Unicode escapes pass followed by an escape sequences
>> 4245:      * pass.
> 
> I would move the description of the compiler behavior to the end and remove "also". For example, 
> Suggestion:
> 
>      * @implNote As a convenience for use with constructed
>      * strings, this method translates Unicode escapes. For example, this
>      * method could be used when ASCII encoded text files need to maintain Unicode
>      * content. The translation is done in a single pass and is non-recursive. That is,
>      * escape sequences and Unicode escapes are translated as encountered in one pass and
>      * <strong>not</strong> done as a Unicode escapes pass followed by an escape sequences
>      * pass. By comparison, the compiler translates all Unicode escapes before string
>      * literals are translated.

Changed
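
For readers following the thread, the difference between the single-pass behavior described in the suggested @implNote and the compiler's two-phase behavior can be shown with a small sketch. This is an illustration under stated assumptions, not the code in the pull request: the class name SinglePassEscapes and its translate method are hypothetical, and only a handful of escape sequences are handled.

```java
/**
 * Hypothetical illustration of a single-pass, non-recursive translator.
 * This is NOT the JDK implementation of String::translateEscapes.
 */
public final class SinglePassEscapes {
    private SinglePassEscapes() {}

    /** Translates a small subset of escapes plus Unicode escapes in one pass. */
    public static String translate(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i++);
            if (c != '\\') {
                out.append(c);          // ordinary character, copied as-is
                continue;
            }
            if (i == s.length()) {
                throw new IllegalArgumentException("dangling backslash");
            }
            char e = s.charAt(i++);
            switch (e) {
                case 'n':  out.append('\n'); break;
                case 't':  out.append('\t'); break;
                case '\\': out.append('\\'); break;
                case 'u':
                    // Any number of extra 'u' characters may follow the first.
                    while (i < s.length() && s.charAt(i) == 'u') {
                        i++;
                    }
                    if (i + 4 > s.length()) {
                        throw new IllegalArgumentException("truncated Unicode escape");
                    }
                    int code = 0;
                    for (int k = 0; k < 4; k++) {
                        int d = Character.digit(s.charAt(i++), 16);
                        if (d < 0) {
                            throw new IllegalArgumentException("invalid hex digit");
                        }
                        code = (code << 4) | d;
                    }
                    // The resulting code unit is emitted directly and is
                    // never re-examined as the start of another escape.
                    out.append((char) code);
                    break;
                default:
                    throw new IllegalArgumentException("unsupported escape: " + e);
            }
        }
        return out.toString();
    }
}
```

With this sketch, the seven-character input consisting of a backslash followed by u005Cn yields two characters, a backslash and the letter n, because the backslash produced by the Unicode escape is not rescanned. A two-pass approach like the compiler's would instead produce a single newline.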

> test/jdk/java/lang/String/TranslateEscapes.java line 97:
> 
>> 95:         verifyUnicodeEscape("\\u2022", "\u2022");
>> 96:         verifyUnicodeEscape("\\ud83c\\udf09", "\ud83c\udf09");
>> 97:         verifyUnicodeEscape("\\uuuuu2022", "\uuuuu2022");
> 
> Include the code from the example as a test case too.

None present. Was a mis-paste.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17491#discussion_r1467926349
PR Review Comment: https://git.openjdk.org/jdk/pull/17491#discussion_r1467926483
PR Review Comment: https://git.openjdk.org/jdk/pull/17491#discussion_r1467929023

