Allowing apps to force sun.jnu.encoding = "UTF-8" on Windows

Naoto Sato naoto.sato at oracle.com
Tue Nov 5 17:48:54 UTC 2024


Hi Fabian,

On 11/5/24 12:52 AM, Fabian Meumertzheim wrote:
> On Mon, Nov 4, 2024 at 8:46 PM Naoto Sato <naoto.sato at oracle.com> wrote:
>> I am afraid that the risk that would be involved in configuring
>> sun.jnu.encoding exceeds the benefit it would bring, as the encoding is
>> so baked in the basis of the Windows Java runtime. Since Microsoft
>> itself now recommends users choose UTF-8 as the ANSI code page (over
>> changing apps to use -W APIs)[1], I think we would want to wait for that
>> glorious day.
>>
>> Naoto
>>
>> [1]
>> https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page#-a-vs--w-apis
> 
> My understanding of that page is that Microsoft recommends
> *application developers* to choose UTF-8 as the code page for their
> apps by adding a directive to their app manifest. While this works
> well for native applications, it doesn't directly apply to Java
> applications as the manifest is that of the java.exe launcher binary,
> which is necessarily static (and currently doesn't set the
> `activeCodePage` directive).

Yes, the article is for app developers, but my intention in quoting that 
specific section (-A vs. -W) was to point out Microsoft's change of 
direction:

```
Until recently, Windows has emphasized "Unicode" -W variants over -A 
APIs. However, recent releases have used the ANSI code page and -A APIs 
as a means to introduce UTF-8 support to apps. If the ANSI code page is 
configured for UTF-8, then -A APIs typically operate in UTF-8. This 
model has the benefit of supporting existing code built with -A APIs 
without any code changes.
```

This is a 180-degree change of direction, which lets ANSI-based apps 
(including the Java launcher) work without any changes on the app side.
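
To see what a given launcher and code page configuration produce in 
practice, the encoding-related system properties can be printed at 
startup. The following is only a minimal sketch: the class name 
EncodingReport and the helper isUtf8 are made up for illustration, 
native.encoding is only available on JDK 17 and later, and the values 
actually reported under a UTF-8 ANSI code page should be verified on the 
target system rather than assumed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the JDK.
class EncodingReport {

    // Formats one system property, or "(not set)" if it is absent
    // (for example, native.encoding on JDKs older than 17).
    static String describe(String key) {
        return key + " = " + System.getProperty(key, "(not set)");
    }

    // Returns true if the given charset name resolves to UTF-8.
    // Unknown, illegal, or null names are treated as "not UTF-8".
    static boolean isUtf8(String encodingName) {
        if (encodingName == null) {
            return false;
        }
        try {
            return Charset.forName(encodingName).equals(StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException.
            return false;
        }
    }

    public static void main(String[] args) {
        // sun.jnu.encoding: encoding the runtime uses for platform strings
        // such as file names and command-line arguments.
        System.out.println(describe("sun.jnu.encoding"));
        System.out.println(describe("native.encoding"));
        System.out.println(describe("file.encoding"));
        System.out.println("sun.jnu.encoding is UTF-8: "
                + isUtf8(System.getProperty("sun.jnu.encoding")));
    }
}
```

Running this once with the stock launcher and once with the system-wide 
UTF-8 setting enabled should show whether sun.jnu.encoding follows the 
ANSI code page on that machine.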

> 
> We could choose to rely on users switching to the UTF-8 codepage
> system-wide. This is possible as of the 1809 build of Windows 10, but
> is not the default, still marked as Beta in the latest version,
> requires admin privileges to enable, and can break other applications,
> even of other users. This may become the default some day, but it's
> unclear whether this will happen in the foreseeable future, especially
> since there is a backwards compatible alternative for native
> applications.

I cannot speak for Microsoft, but I read the article as indicating that 
the day will still come when UTF-8 becomes the default on Windows.

> 
> I understand that incrementally refactoring the Windows Java runtime
> until its encoding becomes configurable is too risky. Taking that into
> account, what do you think of offering an additional entrypoint for
> the Java launcher on Windows, say java-utf8.exe, that is identical to
> java.exe except that it specifies
> `<activeCodePage>UTF-8</activeCodePage>` in its app manifest? This
> would give users the desired opt-in behavior with no changes to the
> actual implementation of the Java runtime. (In fact, in my concrete
> use case, we are relying on this as a workaround by patching the
> manifest in java.exe with a tool [1].)

Yes, it would be possible if two launchers were provided. However, 
please note that it would also double the maintenance cost. Some JDK 
distributors may be interested, but I am not sure it would be 
implemented in the OpenJDK Windows reference implementation.

Naoto


More information about the core-libs-dev mailing list