core-libs-dev Digest, Vol 210, Issue 718

Fabian Meumertzheim fabian at buildbuddy.io
Thu Oct 31 16:12:45 UTC 2024


> This has been discussed when we did JEP 400: UTF-8 by Default and
> decided not to do it, mainly because it affects filename/path encoding.
> Changing `sun.jnu.encoding` apart from the Windows system encoding will make
> apps unable to access those files/directories (e.g. the home
> directory) if the path/name contains characters with different encodings.

Based on grepping the source, it looks like the JDK (almost?)
exclusively uses the wide-character (`-W`) Windows APIs to interface
with the file system. The active code page then seems to matter only
for the internal conversion between Java strings and platform UTF-16
strings through `MultiByteToWideChar` and `WideCharToMultiByte` (with
`CP_ACP`).

If my understanding is correct, wouldn't it be an option to
conditionally replace all usages of `CP_ACP` with `CP_UTF8` while
simultaneously setting `sun.jnu.encoding` to UTF-8? I envision this as
being equivalent to adding `<activeCodePage>UTF-8</activeCodePage>` to
the Java launcher's application manifest, but toggleable, so that it
doesn't break applications that can't handle charsets other than the
code page they have been running with so far.
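The toggle described above can be modelled in a few lines of Java. This is a sketch under stated assumptions, not the JDK's implementation: the boolean switch and `codePageFor` are hypothetical, `windows-1252` again stands in for `CP_ACP`, and Java's charset conversion stands in for the native calls. In this model it shows why `CP_UTF8` and `sun.jnu.encoding` would have to change together: if only one side switches, bytes produced by one encoding are decoded with another and the path no longer round-trips.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class JnuEncodingToggle {
    // Java side: encode the path with sun.jnu.encoding.
    // Native side: decode the bytes with the code page passed to MultiByteToWideChar.
    static String javaToWide(String path, Charset jnuEncoding, Charset nativeCodePage) {
        byte[] platformBytes = path.getBytes(jnuEncoding);
        return new String(platformBytes, nativeCodePage);
    }

    // Hypothetical toggle: UTF-8 mode selects UTF-8 for BOTH halves at once.
    static Charset codePageFor(boolean utf8Mode, Charset acp) {
        return utf8Mode ? StandardCharsets.UTF_8 : acp;
    }

    public static void main(String[] args) {
        Charset acp = Charset.forName("windows-1252"); // assumed stand-in for CP_ACP
        String path = "C:\\Users\\" + "\u65e5\u672c\u8a9e"; // C:\Users\日本語

        for (boolean utf8Mode : new boolean[] {false, true}) {
            Charset cs = codePageFor(utf8Mode, acp);
            String label = utf8Mode ? "CP_UTF8" : "CP_ACP";
            System.out.println(label + " lossless: " + javaToWide(path, cs, cs).equals(path));
        }

        // Half-applied switch: UTF-8 on the Java side, CP_ACP still used natively.
        System.out.println("mismatch lossless: "
                + javaToWide(path, StandardCharsets.UTF_8, acp).equals(path));
    }
}
```

Running it prints `CP_ACP lossless: false`, `CP_UTF8 lossless: true` and `mismatch lossless: false`: only the consistent UTF-8 configuration preserves the path.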

I'm still trying to wrap my head around all the implications this may
have, so apologies if I'm missing any. I mostly want to make sure that
I understand what exactly could go wrong if anyone were to try this.
The upside is definitely there.

