<i18n dev> RFR: 8356980: Better handling of non-breaking space

Justin Lu jlu at openjdk.org
Wed May 14 17:21:54 UTC 2025


On Wed, 14 May 2025 16:59:23 GMT, Phil Race <prr at openjdk.org> wrote:

>> Non-breaking space characters are problematic. They look identical to the normal space character, but are not the same character. For that reason, they should never be typed as UTF-8 literals, only written as Unicode escape sequences.
>> 
>> I have checked:
>> * U+00A0 NO-BREAK SPACE (NBSP)
>> * U+202F NARROW NO-BREAK SPACE (NNBSP)
>> * U+2007 FIGURE SPACE
>> * U+2060 WORD JOINER
>> 
>> In some places, these characters were used where an ordinary space should have been used. I replaced those with a normal space. In other places, they were correct, but written as literals instead of Unicode escape sequences. I replaced those instances with escape sequences.
>
> src/java.desktop/share/classes/com/sun/java/swing/plaf/gtk/resources/gtk_fr.properties line 39:
> 
>> 37: GTKColorChooserPanel.hue.textAndMnemonic=&Teinte :
>> 38: 
>> 39: GTKColorChooserPanel.red.textAndMnemonic=Roug&e\u00a0:
> 
> So, this exactly reverses what was done in the fix for https://bugs.openjdk.org/browse/JDK-8301991
> But I think you know that, since you commented on the PR.
> 
> The fix was done by @justin-curtis-lu and reviewed by
> @naotoj so I think I'd like to get their opinion on this

The l10n files are synced by the translation team, and we don't edit them. IMO, it's fine to leave those as is, especially because language rules can call for different spacing and punctuation characters, so we generally don't ensure that translations match the original file's value in that regard. (So viewing them as a Unicode escape sequence vs. a UTF-8 literal may not bring much benefit.)
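
For reference, the literal-to-escape conversion discussed in this thread could be sketched roughly as below. This is only an illustration, not the tooling used in the PR: the class name NoBreakSpaceEscaper and the method escape are hypothetical, and the set of code points is simply the four listed in the PR description.

```java
import java.util.Set;

// Hypothetical helper, for illustration only; not part of the JDK or the PR.
public class NoBreakSpaceEscaper {
    // The four code points listed in the PR description:
    // NO-BREAK SPACE, NARROW NO-BREAK SPACE, FIGURE SPACE, WORD JOINER.
    private static final Set<Integer> CHECKED = Set.of(0x00A0, 0x202F, 0x2007, 0x2060);

    /** Returns the input with each checked character replaced by a backslash-u escape. */
    public static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (CHECKED.contains((int) c)) {
                // All four code points are in the BMP, so one char maps to one escape.
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Build the no-break space from its code point so this source file
        // contains no invisible literal itself.
        String line = "Roug&e" + (char) 0x00A0 + ":";
        System.out.println(escape(line));
    }
}
```

Run on a value like the one at line 39 of gtk_fr.properties, this turns the invisible literal into the visible escape form shown in the diff. Whether that is desirable for translated files is the open question above.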

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25234#discussion_r2089399946

