Allowing apps to force sun.jnu.encoding = "UTF-8" on Windows
Naoto Sato
naoto.sato at oracle.com
Tue Nov 5 17:48:54 UTC 2024
Hi Fabian,
On 11/5/24 12:52 AM, Fabian Meumertzheim wrote:
> On Mon, Nov 4, 2024 at 8:46 PM Naoto Sato <naoto.sato at oracle.com> wrote:
>> I am afraid that the risk that would be involved in configuring
>> sun.jnu.encoding exceeds the benefit it would bring, as the encoding is
>> so deeply baked into the Windows Java runtime. Since Microsoft
>> itself now recommends users choose UTF-8 as the ANSI code page (over
>> changing apps to use -W APIs)[1], I think we would want to wait for that
>> glorious day.
>>
>> Naoto
>>
>> [1]
>> https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page#-a-vs--w-apis
>
> My understanding of that page is that Microsoft recommends
> *application developers* to choose UTF-8 as the code page for their
> apps by adding a directive to their app manifest. While this works
> well for native applications, it doesn't directly apply to Java
> applications as the manifest is that of the java.exe launcher binary,
> which is necessarily static (and currently doesn't set the
> `activeCodePage` directive).
Yes, the article is for app developers, but my intention in quoting that
specific paragraph (-A vs -W) was to point out Microsoft's change of
direction:
```
Until recently, Windows has emphasized "Unicode" -W variants over -A
APIs. However, recent releases have used the ANSI code page and -A APIs
as a means to introduce UTF-8 support to apps. If the ANSI code page is
configured for UTF-8, then -A APIs typically operate in UTF-8. This
model has the benefit of supporting existing code built with -A APIs
without any code changes.
```
This was a 180-degree change of direction, which lets ANSI-based apps
(including the Java launcher) work without any changes on the app side.
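To see which encoding a given JVM actually derived from the ANSI code page, a small diagnostic like the one below can help. It is only a sketch: `sun.jnu.encoding` is an internal, unspecified property, `native.encoding` exists only on JDK 17 and later, and the class name `ShowEncodings` is hypothetical. Running it once under the stock `java.exe` and once with a UTF-8 active code page would be expected to show whether `sun.jnu.encoding` changed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic: prints the encoding-related properties of the running JVM.
// sun.jnu.encoding is internal and unspecified; it is read here only for diagnostics.
public class ShowEncodings {

    // native.encoding exists since JDK 17; on older JDKs it prints "null".
    static final String[] KEYS = {"sun.jnu.encoding", "native.encoding", "file.encoding"};

    // True if sun.jnu.encoding names a supported charset that is UTF-8.
    static boolean jnuIsUtf8() {
        String jnu = System.getProperty("sun.jnu.encoding");
        return jnu != null
                && Charset.isSupported(jnu)
                && Charset.forName(jnu).equals(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        for (String key : KEYS) {
            System.out.println(key + " = " + System.getProperty(key));
        }
        System.out.println("default charset = " + Charset.defaultCharset());
        System.out.println("sun.jnu.encoding is UTF-8: " + jnuIsUtf8());
    }
}
```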
>
> We could choose to rely on users switching to the UTF-8 codepage
> system-wide. This is possible as of the 1809 build of Windows 10, but
> is not the default, still marked as Beta in the latest version,
> requires admin privileges to enable, and can break other applications,
> even of other users. This may become the default some day, but it's
> unclear whether this will happen in the foreseeable future, especially
> since there is a backwards compatible alternative for native
> applications.
I cannot speak for MS, but I read the article as saying that the day will
still come when UTF-8 becomes the default on Windows.
>
> I understand that incrementally refactoring the Windows Java runtime
> until its encoding becomes configurable is too risky. Taking that into
> account, what do you think of offering an additional entrypoint for
> the Java launcher on Windows, say java-utf8.exe, that is identical to
> java.exe except that it specifies
> `<activeCodePage>UTF-8</activeCodePage>` in its app manifest? This
> would give users the desired opt-in behavior with no changes to the
> actual implementation of the Java runtime. (In fact, in my concrete
> use case, we are relying on this as a workaround by patching the
> manifest in java.exe with a tool [1].)
Yes, it would be possible if two launchers were provided. However,
please note that it would also double the maintenance cost.
Some JDK distributors may be interested, but I am not sure it would be
adopted in the OpenJDK Windows reference implementation.
Naoto