JDK-8226810: An other case and a small change suggestion
Johannes Kuhn
info at j-kuhn.de
Mon May 11 11:10:55 UTC 2020
On 11-May-20 10:59, Alan Bateman wrote:
>
>
> On 10/05/2020 15:47, Johannes Kuhn wrote:
>> :
>>
>> After the discussion with Naoto, I would like to change the one line
>> to strcpy(ret + 2, "1252")
>>
>> diff --git a/src/java.base/windows/native/libjava/java_props_md.c
>> b/src/java.base/windows/native/libjava/java_props_md.c
>> --- a/src/java.base/windows/native/libjava/java_props_md.c
>> +++ b/src/java.base/windows/native/libjava/java_props_md.c
>> @@ -73,6 +73,7 @@
>> LOCALE_IDEFAULTANSICODEPAGE,
>> ret+2, 14) == 0) {
>> codepage = 1252;
>> + strcpy(ret + 2, "1252");
>> } else {
>> codepage = atoi(ret+2);
>> }
>>
>> I have already signed the OCA.
>> This would be my first patch. Is there anything else you need?
>>
>> I will take a further look into GetLocaleInfo, and try to find a
>> model that works for me (currently thinking about blackbox, can
>> return anything from [1]),
>> and then discussing how to handle all those cases.
> It's not clear how conditions are necessary for GetLocaleInfo to fail
> but good to fix this code path. We should probably create a new issue
> in JIRA for this as it's not clear that it is the same issue as
> JDK-8226810. We've been chasing JDK-8226810 for some time but have not
> been able to find a system where it duplicates. The reports so far
> suggests is Windows Chinese but most of the people that report the
> issue to not respond to questions so we can't be sure. One of the
> reports (see my comment in the bug from June 2019) suggests the code
> page is 54936, useful information as GB18030 is not in java.base. Once
> your fix is in then I think we should try to improve the exception
> message rather than NPE. There may be some of these cases where it
> should default to UTF-8.
>
> -Alan.
>
I agree.
Looking through the JDK code, the IMHO best way to create a hook for
this is Charset.defaultCharset().
Instead of using GetPropertyAction, it could retrieve
System.getProperties and do a null check there. This has probably the
least impact on performance as this code path is only executed once.
Indeed, it could (theoretically) emit a warning/debug message there with
debug information and return sun.nio.cs.UTF_8.INSTANCE as fallback
without setting defaultCharset.
But that is a hard problem. I don't really like adding a warning to java
startup, even if it would not work before just to get debug information.
On the other hand, just continuing as if nothing did happen could hide
further bugs. (Also, how would I get the debug info from the Java side,
as properties are not yet initialized.)
I wrote a small C program which just outputs the result of
GetLocaleInfo(GetSystemDefaultLCID(), LOCALE_IDEFAULTANSICODEPAGE, ...).
I was not able to influence that result - on my machine it's always 1252.
My current understanding of codepages is basically: They are like encodings.
They come in two variants (?): ANSI and non-ANSI. The ansi codepages use
ANSI for bytes between 0 and 127.
Each system, application or thread (wtf? see CP_THREAD_ACP) can have a
different current codepage.
Some windows API functions (the *A variants) use the current codepage
(as returned by GetACP) to translate the string from/to unicode.
GetACP returns the codepage identifier for the operating system. (No
mention of thread in the doc for that, so why is there even CP_THREAD_ACP?)
It's a rabbit hole.
If someone knows about a way to influence the result of
GetLocaleInfo(GetSystemDefaultLCID(), LOCALE_IDEFAULTANSICODEPAGE, ...),
then at least reproducing the error would become possible.
- Johannes
More information about the core-libs-dev
mailing list