<i18n dev> RFR: 8356978: Convert unicode sequences in Java source code to UTF-8 [v3]

Mon Jun 9 14:25:55 UTC 2025

On Mon, 9 Jun 2025 13:41:10 GMT, Magnus Ihse Bursie <ihse at openjdk.org> wrote:

>> After we converted the source base to be fully UTF-8, we do not need to use Unicode escape sequences (like \u0123) in string literals. Sometimes, that might still make sense, as for control characters, non-breaking space, etc. But for strings that are supposed to be coherent text in a language that needs non-ASCII parts of Unicode, this is not so. Instead, having the sequences just makes the text harder to read and edit. We have already removed several such sequences before, but some remain.
>
> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Reverting fixes in java.xml and jdk.jdi
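
For context, a minimal sketch of why the conversion is behavior-neutral: a Unicode escape and the corresponding UTF-8 literal should denote the same string once compiled. The class and method names here are hypothetical, and the sketch assumes the file is compiled as UTF-8 (e.g. `javac -encoding UTF-8`).

```java
public class EscapeEquivalence {
    // Hypothetical helper: true when the escaped and the literal
    // spelling produce the same string contents.
    static boolean sameString(String escaped, String literal) {
        return escaped.equals(literal);
    }

    public static void main(String[] args) {
        // U+5143 is the kanji used for the first-year name in FormatData_ja.java.
        System.out.println(sameString("\u5143", "元"));
        // U+0391 is GREEK CAPITAL LETTER ALPHA.
        System.out.println(sameString("\u0391", "Α"));
    }
}
```

If the source file were read with a different encoding, the literal form could be misdecoded while the escaped form would not, which is the main thing to watch for with a change like this.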

src/demo/share/jfc/Stylepad/HelloWorld.java line 196:

> 194:             + "ασπαζον"
> 195:             + "ται υμα"
> 196:             + "ς!")

Maybe we can merge the string now:
Suggestion:

            new Run("none", "Αθηναι ασπαζονται υμας!") // Greek

At least two words seem to be split between the wrapped lines.
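
A quick sanity check for the merge, as a sketch only: it joins just the three fragments visible in the hunk above (the earlier fragments are not shown here, so they are not reconstructed) and compares the result with the tail of the suggested literal. The class name is hypothetical; UTF-8 compilation is assumed.

```java
public class MergedLiteralCheck {
    // The three fragments visible in the quoted hunk (lines 194-196).
    static String joinedFragments() {
        return "ασπαζον"
             + "ται υμα"
             + "ς!";
    }

    public static void main(String[] args) {
        // The suggested single literal.
        String merged = "Αθηναι ασπαζονται υμας!";
        // The joined fragments should read as two whole words.
        System.out.println(joinedFragments().equals("ασπαζονται υμας!"));
        // The suggestion should end with exactly the joined fragments.
        System.out.println(merged.endsWith(joinedFragments()));
    }
}
```

Since both forms are compile-time constant expressions, merging them should not change the resulting string, only its readability in the source.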

src/demo/share/jfc/Stylepad/HelloWorld.java line 203:

> 201:         new Paragraph("title", new Run[] {
> 202:             new Run("none", "שלום מירו"
> 203:             + "שלים")

Suggestion:

            new Run("none", "שלום מירושלים")

src/jdk.localedata/share/classes/sun/text/resources/ext/FormatData_ja.java line 90:

> 88:             { "japanese.FirstYear",
> 89:                 new String[] {  // first year name
> 90:                     "元",   // "Gan"-nen

Suggestion:

                    "元",       // "Gan"-nen

Preserve comment alignment?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25229#discussion_r2135817539
PR Review Comment: https://git.openjdk.org/jdk/pull/25229#discussion_r2135819248
PR Review Comment: https://git.openjdk.org/jdk/pull/25229#discussion_r2135821435