<i18n dev> RFR: 8337077: Java uses wrong Charset in System.out when running on MINGW

Tue Aug 20 16:34:04 UTC 2024

Hello,

As I commented in the bug report, I closed the issue as "not an issue", 
because LC_* environment variables have never been supported (or even 
considered) on Windows as a means of setting the locale/encoding the way 
POSIX does. Honoring them would create a state inconsistent with 
Windows' own locale settings, which may cause unexpected behavior in 
applications.

 > The "Use Unicode UTF-8 for worldwide language support" option has been in
 > beta for a very long time, across several major versions of Windows, and
 > Microsoft doesn't seem to have any plan to make it production ready and
 > enabled by default.

I cannot speak for Microsoft, but they seem to have switched direction, 
away from the -W APIs and toward the -A APIs with the UTF-8 code page, and 
recommend setting the code page to UTF-8 for Unix-like applications:
https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page#-a-vs--w-apis

---
Use UTF-8 character encoding for optimal compatibility between web apps 
and other *nix-based platforms (Unix, Linux, and variants), minimize 
localization bugs, and reduce testing overhead.
---

For these two reasons, I am not sure it's worth enhancing the charset 
detection to support LC_* environment variables on Windows.

Naoto

On 8/20/24 5:59 AM, Rostislav Krasny wrote:
> Hello,
> 
> I'm the original author of the JDK-8337077 bug report. I reported it 
> through your web site and have no account to comment on it at 
> https://bugs.openjdk.org/browse/JDK-8337077
> 
> This bug report was closed by Naoto Sato as "Not an Issue" about a month 
> ago without any discussion. I disagree with the closing reasons Naoto 
> has written in his comment in that bug report.
> 
> The "Use Unicode UTF-8 for worldwide language support" option has been in 
> beta for a very long time, across several major versions of Windows, and 
> Microsoft doesn't seem to have any plan to make it production ready and 
> enabled by default. Also, this Windows capability has nothing in common 
> with my bug report and could be used as a workaround only.
> 
> When you enable that beta UTF-8 support, you enable it in the Windows 
> console and not in the MINGW console. The MINGW console supports UTF-8 
> by default regardless of that option.
> 
> The right solution/fix should be as follows:
> 
> 1. The JRE should check the OSTYPE environment variable to identify that 
> it is running inside an MSYS2 console/environment.
> 2. If OSTYPE equals "msys", the JVM is running under MSYS2, and the 
> right encoding of the current console should be retrieved from the LC_* 
> environment variables, for example from LC_CTYPE. In my case 
> LC_CTYPE="en_GB.UTF-8", meaning the right encoding is UTF-8.
> 3. That retrieved encoding should be used during initialization of both 
> System.out and System.err instead of the usually different encoding that 
> is reported by Windows directly.
> 
> Currently the JVM identifies the console encoding with a method that is 
> not relevant to the MINGW console. In most cases it picks up the wrong 
> encoding from an unrelated Windows configuration.
> 
> Please reopen the JDK-8337077 bug report and make a real fix, i.e. add 
> support for MSYS2/MINGW consoles.
> 
> I'm almost sure there is the same issue when Cygwin is used. In the case 
> of Cygwin you should check that OSTYPE equals "cygwin" and the rest is 
> the same.
> 
> By default Windows has no OSTYPE environment variable defined.
> 
> MSYS2 also defines an MSYSTEM environment variable that identifies a 
> sub-type of MSYS2 (MINGW32, MINGW64, UCRT64, MSYS2, etc.) but the 
> console is the same and configured similarly in all sub-types of MSYS2. 
> Windows itself also doesn't have the MSYSTEM environment variable 
> defined by default and MSYS2 always defines both OSTYPE and MSYSTEM by 
> itself.
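
For reference, the three steps proposed in the quoted message could be sketched roughly as below. This is only an illustration of the proposal, not anything the JDK does today. The class and method names are hypothetical, and the sketch assumes that OSTYPE is actually visible to the JVM process as an environment variable (in some shells it may be an unexported shell variable). The LC_ALL > LC_CTYPE > LANG lookup order follows the usual POSIX precedence and is not spelled out in the original report.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper; none of these names exist in the JDK.
final class MsysConsoleCharset {

    // Step 1: the report says OSTYPE is "msys" under MSYS2 and "cygwin" under Cygwin.
    static boolean isPosixLikeShell(Map<String, String> env) {
        String osType = env.get("OSTYPE");
        return "msys".equals(osType) || "cygwin".equals(osType);
    }

    // Step 2: take the codeset part of a POSIX locale string such as "en_GB.UTF-8".
    static Optional<Charset> charsetFromLocale(String locale) {
        if (locale == null) {
            return Optional.empty();
        }
        int dot = locale.indexOf('.');
        if (dot < 0) {
            return Optional.empty(); // e.g. "C" or "en_GB": no codeset given
        }
        String codeset = locale.substring(dot + 1);
        int at = codeset.indexOf('@'); // drop a modifier such as "@euro"
        if (at >= 0) {
            codeset = codeset.substring(0, at);
        }
        try {
            return Optional.of(Charset.forName(codeset));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return Optional.empty();
        }
    }

    // Assumed POSIX precedence: the first non-empty of LC_ALL, LC_CTYPE, LANG wins.
    static Optional<Charset> detect(Map<String, String> env) {
        if (!isPosixLikeShell(env)) {
            return Optional.empty();
        }
        for (String name : new String[] {"LC_ALL", "LC_CTYPE", "LANG"}) {
            String value = env.get(name);
            if (value != null && !value.isEmpty()) {
                return charsetFromLocale(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Step 3: use the detected charset for console output, else the platform default.
        Charset cs = detect(System.getenv()).orElse(Charset.defaultCharset());
        PrintStream out = new PrintStream(new FileOutputStream(FileDescriptor.out), true, cs);
        out.println("console charset: " + cs);
    }
}
```

Outside MSYS2/Cygwin the sketch falls back to Charset.defaultCharset(), which is a simplification; the JDK's real console encoding detection on Windows is different and is exactly the behavior under discussion in this thread.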