<i18n dev> RFR: 8337077: Java uses wrong Charset in System.out when running on MINGW
Naoto Sato
naoto.sato at oracle.com
Tue Aug 20 16:34:04 UTC 2024
Hello,
As I commented in the bug report, I closed the issue as "not an issue",
as LC_* environment variables on Windows have never been supported (or
even considered) as a means to set the locale/encoding the way POSIX
does. Supporting them would create a state inconsistent with Windows'
own locale settings, which may cause unexpected behavior in applications.
> The "Use Unicode UTF-8 for worldwide language support" is in beta state
> for a very long time and for several major versions of Windows and
> Microsoft doesn't seem to have any plan to make it production ready and
> enabled by default.
I cannot speak for Microsoft, but they seem to have shifted direction
away from the -W APIs toward the -A APIs with the UTF-8 code page, and
recommend setting the code page to UTF-8 for Unix-like applications:
https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page#-a-vs--w-apis
---
Use UTF-8 character encoding for optimal compatibility between web apps
and other *nix-based platforms (Unix, Linux, and variants), minimize
localization bugs, and reduce testing overhead.
---
For these two reasons, I am not sure it's worth enhancing the charset
detection to support LC_* environment variables on Windows.
Naoto
On 8/20/24 5:59 AM, Rostislav Krasny wrote:
> Hello,
>
> I'm the original author of the JDK-8337077 bug report. I reported it
> through your web site and have no account to comment on it at
> https://bugs.openjdk.org/browse/JDK-8337077
>
> This bug report was closed by Naoto Sato as "Not an Issue" about a month
> ago without any discussion. I disagree with the closing reasons Naoto
> has written in his comment in that bug report.
>
> The "Use Unicode UTF-8 for worldwide language support" is in beta state
> for a very long time and for several major versions of Windows and
> Microsoft doesn't seem to have any plan to make it production ready and
> enabled by default. Also this Windows capability has nothing in common
> with my bug report and could be used as a workaround only.
>
> When you enable that beta UTF-8 support you enable it in the windows
> console and not in the MINGW console. The MINGW console supports UTF-8
> by default regardless of that option.
>
> The right solution/fix should be as follows:
>
> 1. JRE should check the OSTYPE environment variable to identify that it
> is running inside an MSYS2 console/environment.
> 2. In case the OSTYPE equals "msys" JVM is running under MSYS2 and the
> right encoding of the current console should be retrieved from the LC_*
> environment variables, for example from LC_CTYPE. In my case
> LC_CTYPE="en_GB.UTF-8" meaning the right encoding is UTF-8.
> 3. That retrieved encoding should be used during initialization of both
> System.out and System.err instead of the usually different encoding that
> is reported by Windows directly.
>
> Currently the JVM uses a method of console encoding identification
> that is not relevant to the MINGW console. In most cases it picks up
> the wrong encoding from an unrelated Windows configuration.
>
> Please reopen the JDK-8337077 bug report and make a real fix, i.e. add
> support for MSYS2/MINGW consoles.
>
> I'm almost sure there is the same issue when Cygwin is used. In the case
> of Cygwin you should check that OSTYPE equals "cygwin" and the rest is
> the same.
>
> By default Windows has no OSTYPE environment variable defined.
>
> MSYS2 also defines an MSYSTEM environment variable that identifies a
> sub-type of MSYS2 (MINGW32, MINGW64, UCRT64, MSYS2, etc.) but the
> console is the same and configured similarly in all sub-types of MSYS2.
> Windows itself also doesn't have the MSYSTEM environment variable
> defined by default and MSYS2 always defines both OSTYPE and MSYSTEM by
> itself.
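
For reference, the detection proposed in steps 1-3 of the quoted message
could be sketched roughly as below. This is only an illustration of the
proposal, not JDK code and not something the JDK does today. The class
name MsysConsoleCharset and the method detect are hypothetical. The
lookup order LC_ALL, LC_CTYPE, LANG follows the usual POSIX precedence
and is an assumption added here (the quoted message only mentions
LC_CTYPE as an example). The sketch also assumes OSTYPE is actually
visible in the JVM's environment.

```java
import java.nio.charset.Charset;
import java.util.Map;

public class MsysConsoleCharset {

    /**
     * Hypothetical helper: returns the charset named by the POSIX-style
     * locale variables when OSTYPE is "msys" or "cygwin", otherwise the
     * supplied fallback (e.g. whatever Windows reports).
     */
    static Charset detect(Map<String, String> env, Charset fallback) {
        String ostype = env.get("OSTYPE");
        if (!"msys".equals(ostype) && !"cygwin".equals(ostype)) {
            return fallback; // step 1: not an MSYS2/Cygwin environment
        }
        // Step 2: assumed POSIX precedence, first non-empty variable wins.
        for (String name : new String[] {"LC_ALL", "LC_CTYPE", "LANG"}) {
            String value = env.get(name);
            if (value == null || value.isEmpty()) {
                continue;
            }
            int dot = value.indexOf('.');
            if (dot < 0) {
                return fallback; // no codeset part, e.g. "C"
            }
            String codeset = value.substring(dot + 1);
            int at = codeset.indexOf('@'); // strip a modifier such as "@euro"
            if (at >= 0) {
                codeset = codeset.substring(0, at);
            }
            try {
                return Charset.forName(codeset);
            } catch (IllegalArgumentException e) {
                // illegal or unsupported charset name
                return fallback;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Step 3 would use this charset when creating System.out/System.err.
        Charset cs = detect(System.getenv(), Charset.defaultCharset());
        System.out.println("Console charset candidate: " + cs);
    }
}
```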