Is setting -Dsun.jnu.encoding from the command line supposed to work?

Xueming Shen xueming.shen at oracle.com
Fri Dec 11 18:14:54 UTC 2015


On 12/11/2015 09:53 AM, Xueming Shen wrote:
> Don't do it, that's all I would suggest :-)  same as "file.encoding", they are
> both supposed to be informative read-only system property.
>
> Here is the history of sun.jnu.encoding
> http://ccc.us.oracle.com/4958170
>

My apologies, I forgot that "ccc" is still an internal site. Here is a copy/paste
of the history. The property mainly exists to solve the Windows user/system locale
setting issue. It is supposed to be an informative system property that helps the
jvm communicate with the underlying os using the appropriate encoding. Its value
should always be obtained from the system setting, not set from the command
line. While it might be desirable in some use scenarios to set file.encoding
to a different value, I'm not convinced that sun.jnu.encoding needs to be
customized.
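
As a minimal sketch of treating both properties as read-only information, a
program can simply read them at startup instead of setting them. The class
name EncodingInfo is made up for illustration, and sun.jnu.encoding is an
internal property, so the sketch does not assume it is always present:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the JDK: reads the two encoding
// properties and never modifies them.
final class EncodingInfo {
    static Map<String, String> read() {
        Map<String, String> info = new LinkedHashMap<>();
        // sun.jnu.encoding is internal and may be absent on some runtimes,
        // so a missing value is reported rather than assumed.
        for (String key : new String[] {"file.encoding", "sun.jnu.encoding"}) {
            info.put(key, System.getProperty(key, "(not set)"));
        }
        return info;
    }

    public static void main(String[] args) {
        read().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```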

-sherman

--------------------------------------------------------------------------------------------------------------
Problem
This request addresses the following 3 bugs.

4958170: javaw does not retrieve the user locale
4891531: javaw and java get different default locale from OS
4989534: Regional / Language setting improperly detected with a multi-user PC

Windows has 2 locale settings called User Locale and System Locale.
According to Microsoft's documentation, System Locale is a legacy
compatibility mode rather than a true locale, and User Locale is what
Windows really uses for formatting dates, times, currency and large
numbers. JDK's documentation has expressly stated that these
two must be the same to be supported, but users do not always follow
this, and the JDK is internally inconsistent when these settings
differ, depending on how java is launched.

Also, an incorrect change to javaw in 1.4.2, which attempted to fix one
observed case of inconsistency between different System Locale and User
Locale settings, created yet another internal inconsistency.

These JDK bugs and inconsistencies need to be resolved.

Background:

(1)From the very beginning we have had 2 groups of end users on W2K/XP
who have different (opposite) opinions on how the java default locale
should be set, based on their different use scenarios. One group
believes the default java locale should be the same as the Windows
System Locale, and the other prefers that it be the same as the
Windows User Locale.
Until recently we have purposely avoided facing this issue
directly (because of the reality that the java runtime is NOT a real
win32 Unicode app) by insisting that the SUPPORTED use scenario is to
set both UserLocale and SystemLocale to the same locale/language on
W2K/XP.

The related doc is at
http://java.sun.com/j2se/1.4.2/docs/guide/intl/locale.doc.html#jfc

However, we've started to see more and more bugs filed to complain
about this restriction (I guess this is mostly because more and more
apps/users have been migrating to W2K/XP, which is a real multilingual
environment compared to previous OSs), and we believe we need to make
it clear in Tiger that the java default locale should be set solely
based on the Windows User Locale setting in ControlPanel's RegionSetting,
given how User Locale is defined on Windows and how the java default
Locale is defined on the Java platform.

You can find the official MS definition of System Locale, User Locale
and Thread Locale at
http://www.microsoft.com/globaldev/reference/localetable.mspx

The highlights of the concepts we care about are:
   (a)Windows Locale
      A locale is either a language or a language in combination with
      a country that a user wants to use for formatting dates, times,
      currency, and large numbers.

   (b)Windows User Locale
      The user locale determines which default settings a user wants
      to use for formatting dates, times, currency, and large numbers.

   (c)Windows System Locale
      The system locale is not really a locale. It determines which
      codepages (ANSI, DOS, and Macintosh) are used on the system by
      default.
      ...
      If it weren't so long, the system locale should be called the
      legacy applications compatibility setting, because that is
      really what it is.

   (d)Windows Thread Locale
      The thread locale defaults to the user locale and determines the
      formatting of dates, times, currency, and large numbers for the
      thread. It can be changed programmatically using the API
      SetThreadLocale. In most cases the thread locale should not be
      overwritten.

And "java locale" is defined as

A Locale object represents a specific geographical, political, or
cultural region. An operation that requires a Locale to perform its
task is called locale-sensitive and uses the Locale to tailor
information for the user. For example, displaying a number is a
locale-sensitive operation--the number should be formatted according
to the customs/conventions of the user's native country, region, or
culture.

So it's clear that the Windows System Locale actually has nothing to do
with locales, or with the java locale, at all. The java locale definitely
should not be derived from the Windows System Locale setting.

Having said that, the fact that the java runtime is an "Ansi" non-Unicode
app forces it to still depend heavily on the Windows System Locale
setting. More specifically, this concerns the definition of the system
property "file.encoding" and some of its usage in j2se's implementation
(jni_util.c). Currently this system property is derived from the
same setting that the default java locale comes from. That is not
a problem when System Locale and User Locale are the same, but when
these 2 settings are incompatible, some usages of "file.encoding"
will be problematic (consider the case where your System Locale is
Japanese and your User Locale is English, and you, as an Ansi app, try
to speak with the system using English).
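
The mismatch described above can be sketched in Java as text encoded with
one charset and decoded with another. This is only an illustration: the
class name and the sample string are made up, and the charset names
"windows-1252" and "windows-31j" (the latter standing in for MS932) are
assumed to be available in the runtime, which the code checks first.

```java
import java.nio.charset.Charset;

// Illustrative only: shows what happens when the writer and the reader
// of a byte sequence disagree about the encoding.
final class EncodingMismatch {
    // Encode text with one charset and decode the bytes with another.
    static String crossDecode(String text, Charset writer, Charset reader) {
        return new String(text.getBytes(writer), reader);
    }

    public static void main(String[] args) {
        // Assumed charset names; extended charsets may be missing from
        // trimmed-down runtimes, so bail out instead of failing.
        if (!Charset.isSupported("windows-1252") || !Charset.isSupported("windows-31j")) {
            System.out.println("Required charsets are not available in this runtime");
            return;
        }
        Charset user = Charset.forName("windows-1252");  // User Locale: English
        Charset system = Charset.forName("windows-31j"); // System Locale: Japanese
        String text = "caf\u00e9 \u00fcber";
        System.out.println("same encoding:  " + crossDecode(text, user, user));
        System.out.println("mixed encoding: " + crossDecode(text, user, system));
    }
}
```

Plain ASCII text survives either way, which is one reason the problem only
shows up with incompatible locale pairs and non-ASCII data.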

The list of some of these user locale/System Locale setting related
bugs is at
http://javaweb.sfbay/~sherman/Win32_locale_setting_bugs.html

(2)The "inconsistency" between the Windows API doc and the Windows
implementation of GetThreadLocale() when launching an app from the
Dos/Command Prompt as a console app (in java's case, launching the
runtime via "java")

Based on the Windows documentation, GetThreadLocale() should return the
locale of this thread, and that should be the "User Locale". This is
true in most cases. But in a scenario like

  -System Locale is set to a DBCS locale (like Japanese or Chinese)
  -User Locale is set to a SBCS locale (like English)
  -Start the app from the Dos Prompt (as a console app)

GetThreadLocale() will not return the value set in User Locale
(English) but the one set in System Locale (Japanese). My guess is
that this is because the Dos Prompt/Command Prompt itself is an
Ansi/non-Unicode app and it does something special when the
system locale is a DBCS locale.

(3)Starting from JDK1.4.2 the default locale of the java runtime will be
set to different values when launched from "javaw" or "java" on
W2K and XP, if the User Locale and System Locale settings of the
underlying Windows system are different. More specifically, if the java
runtime is launched from "javaw", the default locale will be set
based on the Windows System Locale setting, and if launched from "java",
the java default locale will be set based on the Windows User Locale
setting. This causes big confusion for developers and end users who
work in a multilingual W2K/XP environment. The direct trigger of
#4891531 is the fix for bug #4629351, in which we added one line of
code to WinMain in the launcher's win32 java_md.c

      SetThreadLocale(GetSystemDefaultLCID());

to force the java runtime's thread locale to be the Windows System
Locale. The result of this change is that the java default locale
will be set to whatever is set in the Windows System Locale. But this
line of code will never be executed if we launch the java runtime through
the "java" command, and this is where the inconsistency comes from.


Solution
To the fullest extent possible, make the User Locale the default
locale used by the JRE. The basic fix for this is to call the correct
Windows API and use that as the basis for setting the Java locale and
system property file.encoding. This will make the windows behavior
consistent with the experience on Solaris and Linux.

However, since some parts of the JRE are still a "windows legacy
application" that cannot always use the user locale (i.e. not all
of the JRE is a unicode application in the windows meaning of the
term), some additional measures need to be taken, so the proposed
fixes are

(1)Remove the inappropriate fix of #4629351 (from java_md.c). The
    java default locale (language, country) and the file.encoding
    system property will be based solely on what the User Locale is
    (use GetUserDefaultLCID() in java_props_md.c).

(2)We will not change our public position in Tiger to claim official
    support for runtime environments with different (incompatible)
    System Locale and User Locale settings on XP/W2K, because we are
    still running the java runtime as a non-Unicode app, and the
    consequence is that there are still issues left in some places that
    cannot be solved before we migrate the vm and launcher code to a
    pure unicode-based implementation.

(3)Introduce a new "internal use only" system property

    sun.jnu.encoding

    to be used in jni_util.c to replace file.encoding in the function
    "initializeEncoding(JNIEnv *env)". This change will affect 2
    jni_util.c functions on Windows

    NewStringPlatform
    GetStringPlatformChars

    In contrast to file.encoding, the sun.jnu.encoding property will be
    set based on what the Windows System Locale is on Windows platforms.
    On Solaris/Linux this property will remain the same as
    file.encoding.

    The reasons why we need to introduce this new property and use it
    in jni_util.c are

    (1)NewStringPlatform and GetStringPlatformChars are 2 functions used
    internally by the j2se impl to do the text encoding conversion when
    it needs to talk with the underlying platform (again, this is because
    our runtime is not a win32 Unicode app). This "platform encoding" must
    be the encoding that matches the System Locale setting. It
    does not make any sense (and it does not work either) to use the
    encoding derived from the User Locale setting: the Windows system
    will simply not understand it if the System Locale and User Locale
    settings are not compatible.

    (2)We want to have file.encoding derived from the Windows User
    Locale setting (when the System Locale and User Locale settings
    differ). file.encoding, which will mostly be used to set
    the default encoding for the "contents" of input/output streams,
    should always match the default java locale (which now comes from
    the User Locale setting).

    You can find all usages of NewStringPlatform and
    GetStringPlatformChars inside j2se at
    http://javaweb.sfbay/~sherman/NewStringPlatform_GetStringPlatformChar
    and the usage of "file.encoding" at
    http://javaweb.sfbay/~sherman/file_encoding
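
The role of the two jni_util.c helpers can be pictured with a Java-level
analogue. The real helpers are C functions, so this is not their
implementation. The class PlatformStrings and its method names are
hypothetical and only mirror the idea that strings exchanged with the
platform are converted with the charset named by sun.jnu.encoding. The
fallback to the default charset is also an assumption of this sketch.

```java
import java.nio.charset.Charset;

// Java-level analogue of the platform string helpers; illustrative only.
final class PlatformStrings {
    private final Charset jnu;

    PlatformStrings(Charset jnu) {
        this.jnu = jnu;
    }

    // Resolve the charset from sun.jnu.encoding. Falling back to the
    // default charset is a choice made for this sketch.
    static PlatformStrings fromSystemProperty() {
        String name = System.getProperty("sun.jnu.encoding");
        Charset cs = Charset.defaultCharset();
        try {
            if (name != null && Charset.isSupported(name)) {
                cs = Charset.forName(name);
            }
        } catch (IllegalArgumentException e) {
            // Illegal charset name: keep the fallback.
        }
        return new PlatformStrings(cs);
    }

    Charset charset() {
        return jnu;
    }

    // Platform bytes -> Java String (the direction NewStringPlatform covers).
    String newStringPlatform(byte[] platformBytes) {
        return new String(platformBytes, jnu);
    }

    // Java String -> platform bytes (the direction GetStringPlatformChars covers).
    byte[] getStringPlatformChars(String s) {
        return s.getBytes(jnu);
    }

    public static void main(String[] args) {
        PlatformStrings ps = fromSystemProperty();
        byte[] bytes = ps.getStringPlatformChars("C:\\Users\\demo");
        System.out.println(ps.charset() + " -> " + ps.newStringPlatform(bytes));
    }
}
```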

The webrev for proposed fix for your reference is at
http://javaweb.sfbay/~sherman/Webrevs/webrev_4891531_4958170
(regression test cases will be added)

Interface summary
exported  private   property  file.encoding will be solely derived from the User Locale setting on Windows
          internal  property  sun.jnu.encoding will be solely derived from the System Locale setting on Windows
imported  external  other     GetUserDefaultLCID, GetSystemDefaultLCID, GetThreadLocale
exported  external  method    java.util.Locale.getDefault() and java.nio.charset.Charset.defaultCharset()

Specification
(1)sun.jnu.encoding:
An internal-use-only system property that is derived from the Windows
System Locale setting on the Windows platform. It has the same value
as file.encoding on the Solaris and Linux platforms.

(2)java.util.Locale.getDefault()
    java.nio.charset.Charset.defaultCharset()

We never publicly documented how the values returned by these 2
methods are derived from the host env setting and still are not
going to document it. However, the return values of these 2 methods
are now consistently derived on Windows from the user locale
as reported by GetUserDefaultLCID.

(3)file.encoding

We never publicly documented how this property value is derived from the
host environment settings and still are not going to document it. However,
this value is now consistently derived on Windows from the user locale as
reported by GetUserDefaultLCID.
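
A minimal sketch for inspecting these values on a given machine follows.
The class name DefaultsReport is made up, and the sketch makes no claim
about how the three values relate to each other on any particular JDK
release; it only reports them.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Illustrative only: reports the default locale, the default charset and
// the file.encoding property as seen by the running JVM.
final class DefaultsReport {
    static String describe() {
        Locale locale = Locale.getDefault();
        Charset charset = Charset.defaultCharset();
        return "locale=" + locale.toLanguageTag()
                + " charset=" + charset.name()
                + " file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```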

Compatibility risk:     low
First of all, the proposed solution should NOT have any compatibility
impact on our "officially supported" use scenario, in which the Windows
System Locale and User Locale are set to the same language. What we
are trying to do is to make the java runtime behave more consistently
in scenarios that we don't officially support but that more and more
end users are running into. In these scenarios our launchers "java" and
"javaw" behave with severe inconsistency, which I think we have to
pay a price to fix. The following are the impacts that we know we will
have with the proposed solution.


(1)file.encoding will always be derived from User Locale setting.

    Compared to the current impl, file.encoding in the proposed change
    will always be derived from the User Locale setting. This will cause
    incompatible behavior for some apps (if they depend on the
    assumption that file.encoding will always be derived from the System
    Locale setting) when the System Locale and User Locale settings
    are not "compatible" (even though this is not an officially supported
    scenario according to the Java/JDK i18n document mentioned before).
    But given the fact that our current impl already breaks this
    assumption in some env scenarios and has severe inconsistencies
    here and there, I believe this is a choice we must make and a risk
    we have to face if we want to address the inconsistency.

(2)Concern about adding sun.jnu.encoding and using it instead of
file.encoding in jni_util.c

    The win32 FileSystem and awt have been migrated to MSLU, so there
    is no compatibility issue in these 2 big chunks. Other than that,
    j2se currently does not have a heavy dependency on these 2
    functions. I've scanned all places (see url above) where these 2
    jni_util functions are being used, and my conclusion is that it makes
    more sense to use the encoding from the "System Locale" setting
    instead of the "User Locale" setting in all of these places. The
    only places where I would have a little compatibility concern are
    (1)in System.c when converting system properties such as
    user.name and user.dir from the platform encoding to Unicode, and
    (2)in the launcher code when converting the command line args to
    Unicode-encoded args for the java main class, in a scenario like

    -System Locale is set to English
    -User Locale is set to Japanese
    -Someone tries to pass in a command line option/property/flag
     in Japanese like -Dxxx=yyyy  (where yyyy is in Japanese)

    Since "sun.jnu.encoding" is now "English"/cp1252, this Japanese
    option/flag/property will not be converted correctly for the java
    runtime.

    But given that the Windows System Locale is now "English
    /cp1252", even if you are able to input the Japanese text from the
    command line (in this case the "dos prompt app" is in English/cp1252
    mode and does not work well with Japanese, though you can use
    copy/paste...), this Japanese text has already been corrupted
    before java gets its hands on it, meaning the text in the args from
    C main and from getenv is corrupted already. So this should not be
    an issue. It does not work anyway, even in the current impl.
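
The scenario above can be checked with a small sketch that asks whether a
value is representable in a given platform charset at all. The class and
method names are hypothetical. The sketch uses "windows-1252" for the
English system locale when the runtime provides it and otherwise falls
back to ISO-8859-1, which is an assumption made only to keep the example
runnable everywhere.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: a value can pass through the platform encoding
// intact only if every character is encodable in that charset.
final class ArgEncodingCheck {
    static boolean survives(String value, Charset platform) {
        return platform.newEncoder().canEncode(value);
    }

    public static void main(String[] args) {
        Charset english = Charset.isSupported("windows-1252")
                ? Charset.forName("windows-1252")
                : StandardCharsets.ISO_8859_1; // stand-in, see lead-in
        String japanese = "\u65e5\u672c\u8a9e"; // sample Japanese text
        System.out.println("ASCII value survives:    " + survives("yyyy", english));
        System.out.println("Japanese value survives: " + survives(japanese, english));
    }
}
```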


(3)What we have in 1.4.1 and 1.4.2, and in the proposed 1.5, when the
Windows settings are different.


  a)UserLocale=Japanese/SystemLocale=English

java               Java     file.encoding    sun.jnu.encoding
launcher           Locale   property         property
1.4.1/java         ja       MS932
1.4.1/javaw        ja       MS932
1.4.2/java         ja       MS932
1.4.2/javaw        en       Cp1252
1.5.0/java         ja       MS932            Cp1252
1.5.0/javaw        ja       MS932            Cp1252


  b)UserLocale=English/SystemLocale=Japanese

java               Java     file.encoding    sun.jnu.encoding
launcher           Locale   property         property
1.4.1/java         ja       MS932
1.4.1/javaw        en       Cp1252
1.4.2/java         ja       MS932
1.4.2/javaw        ja       MS932
1.5.0/java         en       Cp1252           MS932
1.5.0/javaw        en       Cp1252           MS932
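
The 1.5.0 rows of both tables follow one rule regardless of launcher:
locale and file.encoding come from the User Locale, sun.jnu.encoding from
the System Locale. A small model of that rule is sketched below. The class
Tiger15Defaults and its Setting type are made up for illustration and do
not correspond to any JDK code.

```java
import java.util.Arrays;

// Illustrative model of the proposed 1.5.0 derivation rule.
final class Tiger15Defaults {
    // One Windows locale setting: a language plus its code page name.
    static final class Setting {
        final String language;
        final String codePage;

        Setting(String language, String codePage) {
            this.language = language;
            this.codePage = codePage;
        }
    }

    // Returns { java locale, file.encoding, sun.jnu.encoding }.
    static String[] derive(Setting user, Setting system) {
        return new String[] {user.language, user.codePage, system.codePage};
    }

    public static void main(String[] args) {
        Setting japanese = new Setting("ja", "MS932");
        Setting english = new Setting("en", "Cp1252");
        // a) UserLocale=Japanese/SystemLocale=English
        System.out.println(Arrays.toString(derive(japanese, english)));
        // b) UserLocale=English/SystemLocale=Japanese
        System.out.println(Arrays.toString(derive(english, japanese)));
    }
}
```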


