Is setting -Dsun.jnu.encoding from the command line supposed to work?
Xueming Shen
xueming.shen at oracle.com
Fri Dec 11 18:14:54 UTC 2015
On 12/11/2015 09:53 AM, Xueming Shen wrote:
> Don't do it, that's all I would suggest :-) same as "file.encoding", they are
> both supposed to be informative read-only system property.
>
> Here is the history of sun.jnu.encoding
> http://ccc.us.oracle.com/4958170
>
My apologies, I forgot that "ccc" is still an internal site. Here is a copy/paste of
the history. It is mainly there to solve the Windows user/system locale setting issue.
It is supposed to be an informative system property that helps the JVM
communicate with the underlying OS using the appropriate encoding. Its value
should always be obtained from the system settings, not set from the command
line. While it might be desirable in some use scenarios to have file.encoding set
to a different value, I'm not convinced that sun.jnu.encoding needs to be
customized.
-sherman
--------------------------------------------------------------------------------------------------------------
Problem
This request addresses the following 3 bugs.
4958170: javaw does not retrieve the user locale
4891531: javaw and java get different default locale from OS
4989534: Regional / Language setting improperly detected with a multi-user PC
Windows has 2 locale settings called User Locale and System Locale.
According to Microsoft's documentation, System Locale is a legacy
compatibility mode rather than a true locale, and User Locale is what
Windows really uses for formatting dates, times, currency and large
numbers. JDK's documentation has expressly stated that these
two must be the same to be supported, but users do not always follow
this, and the JDK is internally inconsistent if these settings are
different, depending on how java is launched.
Also, an incorrect change to javaw in 1.4.2, made to fix one
observed inconsistent case of different System Locale and User
Locale settings, created yet another internal inconsistency.
These JDK bugs and inconsistencies need to be resolved.
Background:
(1)From the very beginning we have had 2 groups of end users on W2K/XP who
have different (opposite) opinions on how the java default locale
should be set, based on their different use scenarios. One group
believes the default java locale should be the same as the Windows
System Locale, and the other prefers that it be the same as the
Windows User Locale.
Until recently we have purposely avoided facing this issue
directly (because of the reality that the java runtime is NOT a real win32
Unicode app) by insisting that the SUPPORTED use scenario is to set both
UserLocale and SystemLocale to the same locale/language on W2K/XP.
The related doc is at
http://java.sun.com/j2se/1.4.2/docs/guide/intl/locale.doc.html#jfc
However, we've started to see more and more bugs filed to complain about
this restriction (my guess is that this is mostly because more and more apps/
users have been migrating to W2K/XP, which is a real multilingual
environment compared to previous OSs), and we believe we need to make
it clear in Tiger that the java default locale should be set solely
based on the Windows User Locale setting in ControlPanel's RegionSetting,
given how User Locale is defined on Windows and
how the java default Locale is defined on the Java platform.
You can find the official MS definition of System Locale, User Locale
and Thread Locale at
http://www.microsoft.com/globaldev/reference/localetable.mspx
The highlights of the concepts we care about are:
(a)Windows Locale
A locale is either a language or a language in combination with
a country that a user wants to use for formatting dates, times,
currency, and large numbers.
(b)Windows User Locale
The user locale determines which default settings a user wants
to use for formatting dates, times, currency, and large numbers.
(c)Windows System Locale
The system locale is not really a locale. It determines which
codepages (ANSI, DOS, and Macintosh) are used on the system by
default.
...
If it weren't so long, the system locale should be called
legacy applications compatibility setting, because that is
really what it is.
(d)Windows Thread Locale
The thread locale defaults to the user locale and determines the
formatting of dates, times, currency, and large numbers for the
thread. It can be changed programmatically using the API
SetThreadLocale. In most cases the thread locale should not be
overwritten.
And "java locale" is defined as
A Locale object represents a specific geographical, political, or
cultural region. An operation that requires a Locale to perform its
task is called locale-sensitive and uses the Locale to tailor
information for the user. For example, displaying a number is a
locale-sensitive operation--the number should be formatted according
to the customs/conventions of the user's native country, region, or
culture.
So it's clear that the Windows System Locale actually has nothing to do
with locales or the java locale at all; the java locale definitely should
not be derived from the Windows System Locale setting.
Having said that, the fact that the java runtime is an "Ansi" non-Unicode app
forces the java runtime to still depend heavily on the Windows
System Locale setting. More specifically, this concerns the definition of
the system property "file.encoding" and some of its usages in j2se's
implementation (jni_util.c). Currently this system property is derived from the
same setting that the default java locale comes from. It's not
a problem when System Locale and User Locale are the same, but when
these 2 settings are incompatible, some usages of "file.encoding"
will be problematic (consider the case where your system locale is Japanese
and your user locale is English and you, as an Ansi app, try to speak
with the system in English).
The list of some of these user locale/System Locale setting related
bugs is at
http://javaweb.sfbay/~sherman/Win32_locale_setting_bugs.html
(2)The "inconsistency" between the Windows API doc and the Windows implementation
of GetThreadLocale() when launching an app from the Dos/Command Prompt as
a console app (in java's case, launching java from "java")
Based on the Windows documentation, GetThreadLocale() should return the
locale of this thread, and it should be the "User Locale". That is
true in most cases. But in a scenario like
-System Locale is set to a DBCS locale (like Japanese or Chinese)
-User Locale is set to a SBCS locale (like English)
-Start the app from the Dos Prompt (as a console app)
GetThreadLocale() will not return the value set in User Locale
(English) but what is set in System Locale (Japanese). My guess is
that this is because the Dos Prompt/Command Prompt itself is an
Ansi/non-Unicode app and it is doing something special when the
system locale is a DBCS locale.
(3)Starting from JDK1.4.2 the default locale of the java runtime is
set to different values when launched from "javaw" or "java" on
W2K and XP, if the User Locale and System Locale settings of the
underlying Windows system are different. More specifically, if the java
runtime is launched from "javaw", the default locale will be set
based on the Windows System Locale setting, and if launched from "java"
the java default locale will be set based on the Windows User Locale
setting. This causes big confusion for developers and end users who
work in a multilingual W2K/XP environment. The direct trigger of
#4891531 is the fix for bug #4629351, in which we added one line of code
to WinMain in the launcher's win32 java_md.c
SetThreadLocale(GetSystemDefaultLCID());
to force the java runtime's thread locale to be the Windows System
Locale. The result of this change is that the java default locale
is set to whatever is set in Windows System Locale. But this line
of code is never executed if we launch the java runtime through the
"java" command; this is where the inconsistency comes from.
Solution
To the fullest extent possible, make the User Locale the default
locale used by the JRE. The basic fix for this is to call the correct
Windows API and use that as the basis for setting the Java locale and
system property file.encoding. This will make the windows behavior
consistent with the experience on Solaris and Linux.
However, since some parts of the JRE are still a "windows legacy
application" that cannot always use the user locale (i.e. not all
of the JRE is a Unicode application in the Windows meaning of the
term), some additional measures need to be taken, so the proposed
fixes are
(1)Remove the inappropriate fix of #4629351 (from java_md.c). The
java default locale (language, country) and the file.encoding
system property will be solely based on what the User Locale is
(use GetUserDefaultLCID() in java_props.md).
(2)We will not change our public position in Tiger to claim official
support for runtime environments with different (incompatible)
System Locale and User Locale settings on XP/W2K, because we are
still running the java runtime as a non-Unicode app, and as a consequence
there are still issues left in some places that cannot be
solved before we migrate the vm and launcher code to a pure
Unicode-based implementation.
(3)Introduce a new "internal use only" system property
sun.jnu.encoding
to be used in jni_util.c to replace file.encoding in the method
"initializeEncoding(JNIEnv *env)". This change will affect 2
jni_util.c methods on Windows
NewStringPlatform
GetPlatformStringChar
Unlike file.encoding, the sun.jnu.encoding property will be
set based on what the Windows System Locale is on Windows platforms.
On Solaris/Linux this property will remain the same as
file.encoding.
The reasons why we need to introduce this new property and use it
in jni_util.c are
(1)NewStringPlatform and GetPlatformStringChar are 2 methods used
internally by the j2se impl to do text encoding conversion when it
needs to talk with the underlying platform (again, this is because our
runtime is not a win32 Unicode app). This "platform encoding" must
be the encoding that matches the System Locale setting. It
does not make any sense (and it does not work either) to use the
encoding derived from the User Locale setting; the Windows system
will simply not understand it if the System Locale and User Locale
settings are not compatible.
(2)We want file.encoding to be derived from the Windows User
Locale setting (when the System Locale and User Locale settings
differ). file.encoding, which is mostly used to set
the default encoding for the "contents" of input/output streams, should
always match the default java locale (which now comes from the User
Locale setting).
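As a rough illustration of point (2): file.encoding only supplies the default that applies when code does not name a charset for stream contents. The sketch below is not part of the proposed fix. StreamDecodeDemo and its decode method are hypothetical names made up for this example; it only shows that naming the charset explicitly makes the result independent of whatever default the locale settings produce.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class StreamDecodeDemo {
    // Decodes stream contents with an explicitly named charset instead of
    // relying on the default derived from file.encoding.
    static String decode(byte[] data, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(data), cs)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u65E5\u672C\u8A9E"; // three Japanese characters
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // Explicit charset: the round trip does not depend on the environment.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text));
        // Default charset: the outcome depends on the host settings.
        System.out.println(decode(utf8, Charset.defaultCharset()).equals(text));
    }
}
```

The second println is deliberately left without an expected value, since it varies with the environment.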
You can find all usages of NewStringPlatform and GetPlatformString
inside j2se at
http://javaweb.sfbay/~sherman/NewStringPlatform_GetStringPlatformChar
and the usage of "file.encoding" at
http://javaweb.sfbay/~sherman/file_encoding
The webrev for proposed fix for your reference is at
http://javaweb.sfbay/~sherman/Webrevs/webrev_4891531_4958170
(regression test cases will be added)
Interface summary
exported  private   property  file.encoding will be solely derived from User Locale setting on Windows
          internal  property  sun.jnu.encoding will be solely derived from System Locale setting on Windows
imported  external  other     GetUserDefaultLCID, GetSystemDefaultLCID, GetThreadLocale
exported  external  method    java.util.Locale.getDefault() and java.nio.charset.Charset.defaultCharset()
Specification
(1)sun.jnu.encoding:
An internal-use-only system property that is derived from the Windows
System Locale setting on the Windows platform. It has the same value
as file.encoding on the Solaris and Linux platforms.
(2)java.util.Locale.getDefault()
java.nio.charset.Charset.defaultCharset()
We never publicly documented how the values returned by these 2
methods are derived from the host env setting and still are not
going to document it. However, the return values of these 2 methods
are now consistently derived on Windows from the user locale
as reported by GetUserDefaultLCID.
(3)file.encoding
We never publicly documented how this property value is derived from the
host environment settings and still are not going to document it. However,
this value is now consistently derived on Windows from the user locale as
reported by GetUserDefaultLCID.
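To see what a given runtime actually derived for these values, they can simply be read back. This is a minimal sketch and not anything from the change request itself. EncodingReport and describe are hypothetical names, and sun.jnu.encoding is treated as possibly absent, since it is an internal property.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper, for illustration only.
class EncodingReport {
    // Formats one system property, tolerating a missing value.
    static String describe(String key) {
        String value = System.getProperty(key);
        return key + " = " + (value == null ? "(not set)" : value);
    }

    public static void main(String[] args) {
        // Read-only inspection; nothing here tries to change the values.
        System.out.println(describe("file.encoding"));
        System.out.println(describe("sun.jnu.encoding"));
        System.out.println("Charset.defaultCharset() = " + Charset.defaultCharset());
        System.out.println("Locale.getDefault() = " + Locale.getDefault());
    }
}
```

Running it under both "java" and "javaw" (with output redirected to a file for the latter) would be one way to compare the launchers on a machine where the two Windows locale settings differ.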
Compatibility risk: low
First of all, the proposed solution should NOT have any compatibility
impact on our "officially supported" use scenario, in which the Windows
System Locale and User Locale are set to the same language. What we
are trying to do is make the java runtime behave more consistently
in scenarios that we don't officially support but that more and more end
users are running into. In these scenarios our launchers "java" and
"javaw" behave in severely inconsistent ways, which I think we have to
pay a price to fix. The following are the impacts that we know we will
have with the proposed solution.
(1)file.encoding will always be derived from the User Locale setting.
Compared to the current impl, file.encoding in the proposed change will
always be derived from the user locale setting. This will cause
incompatible behavior for some apps (if they depend on the
assumption that file.encoding will always be derived from the System
Locale setting) when the System Locale and User Locale settings
are not "compatible" (even though this is not an officially supported
scenario according to the Java/JDK i18n document mentioned before). But given
the fact that our current impl already breaks this assumption in
some env scenarios and has severe inconsistencies here and there,
I believe this is a decision/choice we must make and a risk we
have to face if we want to address the inconsistency.
(2)Concern about adding sun.jnu.encoding and using it instead of
file.encoding in jni_util.c
The win32 FileSystem and awt have been migrated to MSLU, so there
is no compatibility issue in these 2 big chunks. Other than that,
j2se currently does not depend heavily on these 2
methods. I've scanned all places (see url above) where these 2
jni_util methods are being used, and my conclusion is that it makes
more sense to use the encoding from the "System Locale" setting
instead of the "User Locale" setting in all of those places. The
only places where I would have a little concern about compatibility are
(1)in System.c, when converting system properties such as
user.name and user.dir from the platform encoding to Unicode, and
(2)in the launcher code, when converting the command line args to
Unicode-encoded args for the java main class, in a scenario like
-System Locale is set to English
-User Locale is set to Japanese
-Someone tries to pass in a command line option/property/flag
in Japanese like -Dxxx=yyyy (where yyyy is in Japanese)
Since "sun.jnu.encoding" is now "English"/cp1252, this Japanese
option/flag/property will not be converted correctly into the java
runtime.
But given the fact that the Windows System Locale is now "English
/cp1252", even if you are able to input the Japanese from the command
line (in this case the "dos prompt app" is in English/cp1252 mode;
it does not work well with Japanese, though you can use
copy/paste...), this Japanese text has been corrupted already even
before java gets its hands on it, meaning the text in args from C main
and from getenv is corrupted already. So this should not be an
issue. It does not work anyway, even in the current impl.
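The loss described in this scenario can be approximated in pure Java. This is only an analogy, under the assumption that pushing text through a cp1252 byte encoding behaves like the Ansi code page path; the real conversion happens in Windows and in the launcher, not in String.getBytes. LossyArgDemo and roundTrip are hypothetical names for this sketch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class LossyArgDemo {
    // Encodes text to bytes in the given charset and decodes it back.
    // String.getBytes substitutes the charset's replacement byte for
    // characters the charset cannot represent.
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String japanese = "\u65E5\u672C\u8A9E"; // three Japanese characters
        Charset cp1252 = Charset.forName("windows-1252");

        // cp1252 has no mapping for these characters, so each becomes '?'.
        System.out.println(roundTrip(japanese, cp1252)); // prints "???"

        // A charset that covers the characters round-trips cleanly.
        System.out.println(roundTrip(japanese, StandardCharsets.UTF_8).equals(japanese)); // prints "true"
    }
}
```

Once the characters have been replaced, no later choice of encoding on the Java side can recover them, which matches the point that the text is already damaged before java sees it.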
(3)What we have in 1.4.1, 1.4.2 and in the proposed 1.5 when the Windows
settings are different.
a)UserLocale=Japanese/SystemLocale=English
java          Java     file.encoding   sun.jnu.encoding
launcher      Locale   property        property
1.4.1/java    ja       MS932
1.4.1/javaw   ja       MS932
1.4.2/java    ja       MS932
1.4.2/javaw   en       Cp1252
1.5.0/java    ja       MS932           Cp1252
1.5.0/javaw   ja       MS932           Cp1252
b)UserLocale=English/SystemLocale=Japanese
java          Java     file.encoding   sun.jnu.encoding
launcher      Locale   property        property
1.4.1/java    ja       MS932
1.4.1/javaw   en       Cp1252
1.4.2/java    ja       MS932
1.4.2/javaw   ja       MS932
1.5.0/java    en       Cp1252          MS932
1.5.0/javaw   en       Cp1252          MS932