[EXTERNAL] Re: Error if argument passed to java.exe contains non-ASCII character

Stephanie Crater scrater at microsoft.com
Wed Jun 22 18:32:31 UTC 2022


Thanks for pointing me to that recent fix! I’ve confirmed it solves the issue. As the problem is still present on jdk17 and jdk11, I’ll seek a backport to solve the issue there as well.

Thanks again,
Stephanie

From: Naoto Sato <naoto.sato at oracle.com>
Date: Friday, June 17, 2022 at 8:52 AM
To: Alan Bateman <Alan.Bateman at oracle.com>, Stephanie Crater <scrater at microsoft.com>, jdk-dev at openjdk.org <jdk-dev at openjdk.org>
Subject: [EXTERNAL] Re: Error if argument passed to java.exe contains non-ASCII character

IIRC, the issue relates to the JVM invocation interface, which takes CLI
options as `char *` in the platform's encoding. So even if the launcher is
UNICODE enabled, it ends up passing the arguments to the created JVM in
platform characters, i.e., they are turned back into `???`s.
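
To illustrate the lossy step, here is a minimal Java sketch that round-trips an argument through a byte encoding. It is not the launcher code; the class name, the method name and the sample path are made up for illustration, and ISO-8859-1 merely stands in for a non-UTF-8 ANSI code page.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyArgumentDemo {
    // Hypothetical helper: encodes the argument into bytes of the given
    // charset and decodes it again, mimicking an argument that is handed
    // over as a `char *` in the platform encoding.
    static String roundTrip(String arg, Charset platformCharset) {
        // String.getBytes(Charset) replaces unmappable characters with the
        // charset's replacement bytes, which is '?' for ISO-8859-1.
        byte[] platformBytes = arg.getBytes(platformCharset);
        return new String(platformBytes, platformCharset);
    }

    public static void main(String[] args) {
        // Made-up sample path containing two Chinese characters (U+4E2D, U+6587).
        String path = "C:\\Android\\SDK\u4e2d\u6587\\android.jar";

        // Stand-in for a legacy ANSI code page: the Chinese characters are lost.
        System.out.println(roundTrip(path, StandardCharsets.ISO_8859_1));

        // With UTF-8 as the byte encoding the argument survives unchanged.
        System.out.println(roundTrip(path, StandardCharsets.UTF_8).equals(path));
    }
}
```

With the ISO-8859-1 stand-in each Chinese character comes back as a single `?`, while the UTF-8 round trip preserves the string.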

Having said that, Microsoft has recently changed its direction on
supporting Unicode apps:

https://docs.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page

where it reads:

```
Until recently, Windows has emphasized "Unicode" -W variants over -A
APIs. However, recent releases have used the ANSI code page and -A APIs
as a means to introduce UTF-8 support to apps. If the ANSI code page is
configured for UTF-8, -A APIs typically operate in UTF-8. This model has
the benefit of supporting existing code built with -A APIs without any
code changes.
```

So they are now recommending the ANSI interface, with UTF-8 as the
system default encoding. This perfectly fits our situation. With the fix
recently made (https://bugs.openjdk.org/browse/JDK-8272352), the JDK is
now capable of dealing with non-ASCII paths on Windows with UTF-8 system
encoding.
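
As a rough way to check a given machine, the Java sketch below asks whether an argument is representable in the native encoding. The class and method names are hypothetical. It reads the `native.encoding` system property (available since JDK 17) and falls back to `sun.jnu.encoding`, which is an internal property and may not be present everywhere, so treat that fallback as an assumption.

```java
import java.nio.charset.Charset;

public class NativeEncodingCheck {
    // Hypothetical helper: resolves the charset the platform uses for
    // native strings, falling back to the default charset if unknown.
    static Charset nativeCharset() {
        String name = System.getProperty("native.encoding"); // null before JDK 17
        if (name == null) {
            name = System.getProperty("sun.jnu.encoding");   // internal, unsupported property
        }
        if (name != null && Charset.isSupported(name)) {
            return Charset.forName(name);
        }
        return Charset.defaultCharset();
    }

    // True if every character of the argument can be encoded in the charset,
    // i.e. nothing would be replaced by '?'.
    static boolean representable(String arg, Charset charset) {
        return charset.newEncoder().canEncode(arg);
    }

    public static void main(String[] args) {
        Charset cs = nativeCharset();
        String sample = "SDK\u4e2d\u6587"; // made-up argument with Chinese characters
        System.out.println("native charset: " + cs);
        System.out.println("representable:  " + representable(sample, cs));
    }
}
```

On a Windows system configured with UTF-8 as the system encoding one would expect the check to report the sample as representable; on a system with a legacy code page that lacks these characters it would not.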

HTH,
Naoto


On 6/17/22 7:55 AM, Alan Bateman wrote:
> On 17/06/2022 00:53, Stephanie Crater wrote:
>>
>> Hi,
>>
>> I've been investigating an error on Windows in which compilation fails
>> when the SDK file path includes a Chinese character (and more broadly,
>> if any string argument passed to java.exe contains a non-ASCII
>> character). This happens because command line arguments are read using
>> GetCommandLine() [1]. In the Windows file processenv.h, this resolves
>> to GetCommandLineW [2] if UNICODE is defined and GetCommandLineA [3]
>> otherwise. As UNICODE is not defined, GetCommandLineA is used and
>> Chinese characters on the command line are converted to "?", causing
>> the following:
>>
>> Compilation failed with an internal error.
>>
>> Exception in thread "main" java.nio.file.InvalidPathException: Illegal
>> char <?> at index 34: C:\Program Files
>> (x86)\Android\SDK????\platforms\android-31\android.jar
>>
>> This error has been reported before, including JDK-8124977 [4]
>> (describes command line encoding challenges on Windows, created in
>> 2015 and still unresolved)
>>
>
> Someone in Microsoft did propose a patch in 2015 on this. It led to an
> 8 month discussion on the issues/implications (the core-libs-dev archive
> from 2015 and 2016). Several things have changed since then, including
> moving to UTF-8 by default and defining system properties for the native
> and console encoding. I don't disagree that it may be time to look at
> this again. The core-libs-dev mailing list is the right place rather
> than jdk-dev.
>
> -Alan