JDK-8226810: An other case and a small change suggestion

naoto.sato at oracle.com naoto.sato at oracle.com
Fri May 8 21:19:13 UTC 2020


Hi Johannes,

On 5/8/20 1:37 PM, Johannes Kuhn wrote:
> Thanks.
> 
> I think strcpy(ret+2, "1252") vs. strcpy(ret, "Cp1252") is just a 
> matter of style. I would prefer the latter, as it makes the intent clear.
> But not my call.

I thought the former was clearer because, at that point, the Cp/MS/GB 
prefix is not yet initialized in the normal case. The former follows 
that pattern.
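
To illustrate the pattern, here is a minimal, portable sketch. It is not 
the code in java_props_md.c: query_codepage() is a hypothetical stand-in 
for the GetLocaleInfo call, encoding_name() is a made-up name, and only 
one non-default case of the switch is shown. The digits are written at 
ret+2 and the two-character prefix is filled in afterwards.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the GetLocaleInfo call: copies the code page
 * digits into buf and returns nonzero on success. Passing NULL simulates
 * a failed call, which returns 0 and leaves buf untouched. */
static int query_codepage(const char *digits, char *buf, size_t size) {
    if (digits == NULL || strlen(digits) >= size) {
        return 0;
    }
    strcpy(buf, digits);
    return 1;
}

/* Builds an encoding name such as "Cp1252" or "MS932".
 * The caller owns the returned buffer and must free() it. */
static char *encoding_name(const char *digits) {
    char *ret = malloc(16);
    int codepage;

    if (ret == NULL) {
        return NULL;
    }

    if (query_codepage(digits, ret + 2, 14) == 0) {
        codepage = 1252;
        /* Without this line ret+2 would stay uninitialized on failure. */
        strcpy(ret + 2, "1252");
    } else {
        codepage = atoi(ret + 2);
    }

    /* The prefix is only filled in here, after the digits are in place. */
    switch (codepage) {
    case 932:
        memcpy(ret, "MS", 2);
        break;
    default:
        memcpy(ret, "Cp", 2);
        break;
    }
    return ret;
}
```

In this sketch, encoding_name(NULL) takes the failure path and yields 
"Cp1252", while encoding_name("932") yields "MS932".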

> 
> Do you have any info on how I can change the detected codepage there? I 
> wrote a small C program that basically just does that part and prints 
> the result.
> In my limited tests (Windows likes to require a restart after each 
> configuration change) I did not find a way to influence that.

Have you tried changing the Windows System Locale to Japanese? I am 
pretty sure the code will return MS932.

> 
> Another thing to consider is whether Cp65001 should be treated as UTF-8 
> in that function.
> (As said before, locale is not my expertise. Can that function with that 
> LCID even return 65001?)
> I can see how things go wrong if it returns 65001 as locale, so... could 
> be a safe change? (I'm sure that things break if that function returns 
> 65001.)

Yes, it should return UTF-8, but that is not implemented yet. If the code 
page is 65001, the switch that follows should select UTF-8 as the 
default charset.
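
A rough sketch of what such a case could look like, written as a 
standalone function so it can be compiled anywhere. The name 
charset_for_codepage() and its signature are assumptions made for 
illustration. The real switch has more cases, which are not reproduced 
here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a charset name for the given Windows code
 * page into out. Only three branches are shown for illustration. */
static void charset_for_codepage(int codepage, char *out, size_t size) {
    switch (codepage) {
    case 65001:
        /* Code page 65001 is UTF-8, so "Cp65001" is never produced. */
        snprintf(out, size, "UTF-8");
        break;
    case 932:
        snprintf(out, size, "MS%d", codepage);
        break;
    default:
        snprintf(out, size, "Cp%d", codepage);
        break;
    }
}
```

With this, a code page of 65001 maps to "UTF-8" and 1252 still maps to 
"Cp1252".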

> 
> Then there is the other part:
> The mismatch between the comment in jni_util.c/newSizedStringJava and 
> the implementation on the Java side.
> There is no fallback to iso-8859-1. If new String(byte[]) is called 
> before the system properties are installed, then this will lead to a 
> NullPointerException.
> And there is a code path that leads to exactly that - newPlatformString 
> is called from the initialization of the properties. [1]
> And if the encoding returned by the Windows function is not supported, 
> then it will call new String(byte[]) - during system property 
> initialization.

I would expect that there is no mismatch, i.e., every default system 
locale in Windows should map to a *known* default charset. 
Returning UTF-8 in java_props_md.c should resolve this.

Naoto

> 
> - Johannes
> 
> [1]: 
> https://hg.openjdk.java.net/jdk/jdk/file/d40d865753fb/src/java.base/share/native/libjava/System.c#l207
> 
> On 08-May-20 18:27, naoto.sato at oracle.com wrote:
>> Ditto. Good catch!
>>
>> I am not sure the fix would address the issue in 8226810 (I cannot 
>> confirm it either, as my Windows box is at my office, which I cannot 
>> enter at the moment :-), but this definitely looks like a bug. I would 
>> change the additional line to "strcpy(ret+2, "1252");" as Cp is copied 
>> in the following switch.
>>
>> Naoto
>>
>>
>>
>> On 5/7/20 5:50 AM, Alan Bateman wrote:
>>> On 07/05/2020 12:37, Johannes Kuhn wrote:
>>>> :
>>>>
>>>> In the end, I don't know what causes the bug, or how I can replicate 
>>>> it.
>>>> I think I did find a likely suspect.
>>> Good sleuthing. I don't know what the conditions are for GetLocaleInfo 
>>> to fail but it does look like that would return possibly non-terminated 
>>> garbage starting with "CP" so we should at least fix that.
>>>
>>> The issue in JDK-8226810 might be something else. One of the 
>>> submitters to that issue did engage and provided enough information 
>>> to learn that the locale is zh_CN and also reported that it was 
>>> failing for GB18030. GB18030 is not in java.base so that at least 
>>> explained that report.
>>>
>>> -Alan
> 
> 

