Is setting -Dsun.jnu.encoding from the command line supposed to work?
Volker Simonis
volker.simonis at gmail.com
Mon Dec 14 10:23:40 UTC 2015
Hi Sherman,
thanks for providing the detailed history of this issue.
At least for testing purposes it would definitely be nice to be able to set
sun.jnu.encoding. It may also be useful on platforms where setlocale()
returns bogus values. But I won't insist on changing this as long as we
don't have a real problem with it.
Regards,
Volker
On Fri, Dec 11, 2015 at 7:14 PM, Xueming Shen <xueming.shen at oracle.com> wrote:
> On 12/11/2015 09:53 AM, Xueming Shen wrote:
>>
>> Don't do it, that's all I would suggest :-) Same as "file.encoding", they
>> are both supposed to be informative, read-only system properties.
>>
>> Here is the history of sun.jnu.encoding
>> http://ccc.us.oracle.com/4958170
>>
>
> My apologies, I forgot that "ccc" is still an internal site. Here is a
> copy/paste of the history. It was mainly introduced to solve the Windows
> user/system locale setting issue. It is supposed to be an informative
> system property that helps the JVM communicate with the underlying OS
> using the appropriate encoding. Its value should always be obtained from
> the system settings, not set from the command line. While it might be
> desirable in some use scenarios to set file.encoding to a different
> value, I'm not convinced that sun.jnu.encoding needs to be customized.
>
> -sherman
>
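Since both properties are meant to be read rather than set, the simplest way to see what a particular JVM derived from the host is to read the values back. The sketch below uses only standard calls (System.getProperty, Locale.getDefault(), Charset.defaultCharset()); the class name is made up, and sun.jnu.encoding is an internal property that may be absent or behave differently across releases. Note also that on JDK 18 and later (JEP 400) file.encoding and the default charset are UTF-8 by default, so they no longer track the User Locale as described for 1.5 below.

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class ShowPlatformEncodings {

    // Returns the property value, or "<not set>" when the property is absent.
    // sun.jnu.encoding is internal and is not guaranteed to exist on every JVM.
    static String prop(String key) {
        return System.getProperty(key, "<not set>");
    }

    public static void main(String[] args) {
        // Read-only inspection: nothing here sets or overrides a property.
        System.out.println("file.encoding            = " + prop("file.encoding"));
        System.out.println("sun.jnu.encoding         = " + prop("sun.jnu.encoding"));
        System.out.println("Locale.getDefault()      = " + Locale.getDefault());
        System.out.println("Charset.defaultCharset() = " + Charset.defaultCharset());
    }
}
```

Running it on a Windows machine whose User Locale and System Locale differ is a quick way to compare the two encodings on a given release.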
> --------------------------------------------------------------------------------------------------------------
> Problem
> This request addresses the following 3 bugs.
>
> 4958170: javaw does not retrieve the user locale
> 4891531: javaw and java get different default locale from OS
> 4989534: Regional / Language setting improperly detected with a multi-user PC
>
> Windows has 2 locale settings called User Locale and System Locale.
> According to Microsoft's documentation, the System Locale is a legacy
> compatibility mode rather than a true locale, and the User Locale is
> what Windows really uses for formatting dates, times, currency and large
> numbers. The JDK's documentation has expressly stated that these
> two must be the same to be supported, but users do not always follow
> this, and when the settings differ the JDK behaves inconsistently
> depending on how java is launched.
>
> Also, an incorrect change to javaw in 1.4.2, which attempted to fix one
> observed inconsistency caused by different System Locale and User
> Locale settings, created yet another internal inconsistency.
>
> These JDK bugs and inconsistencies need to be resolved.
>
> Background:
>
> (1)From the very beginning we have had 2 groups of end users on W2K/XP
> who have different (opposite) opinions on how the java default locale
> should be set, based on their different use scenarios. One group
> believes the default java locale should be the same as the Windows
> System Locale and the other prefers that it be the same as the
> Windows User Locale.
> Until recently we purposely avoided facing this issue directly
> (because the java runtime is NOT a real win32 Unicode app) by
> insisting that the SUPPORTED use scenario is to set both the
> User Locale and the System Locale to the same locale/language on W2K/XP.
>
> The related doc is at
> http://java.sun.com/j2se/1.4.2/docs/guide/intl/locale.doc.html#jfc
>
> However, we've started to see more and more bugs filed complaining
> about this restriction (I guess this is mostly because more and more
> apps and users have been migrating to W2K/XP, which is a truly
> multilingual environment compared to previous OSs), and we believe we
> need to make it clear in Tiger that the java default locale should be
> set solely based on the Windows User Locale setting in the Control
> Panel's regional settings, given how the User Locale is defined on
> Windows and how the java default Locale is defined on the Java platform.
>
> You can find the official MS definition of System Locale, User Locale
> and Thread Locale at
> http://www.microsoft.com/globaldev/reference/localetable.mspx
>
> The highlights of the concepts we care about are:
> (a)Windows Locale
> A locale is either a language or a language in combination with
> a country that a user wants to use for formatting dates, times,
> currency, and large numbers.
>
> (b)Windows User Locale
> The user locale determines which default settings a user wants
> to use for formatting dates, times, currency, and large numbers.
>
> (c)Windows System Locale
> The system locale is not really a locale. It determines which
> codepages (ANSI, DOS, and Macintosh) are used on the system by
> default.
> ...
> If it weren't so long, the system locale should be called the
> "legacy applications compatibility setting", because that is
> really what it is.
>
> (d)Windows Thread Locale
> The thread locale defaults to the user locale and determines the
> formatting of dates, times, currency, and large numbers for the
> thread. It can be changed programmatically using the API
> SetThreadLocale. In most cases the thread locale should not be
> overwritten.
>
> And "java locale" is defined as
>
> A Locale object represents a specific geographical, political, or
> cultural region. An operation that requires a Locale to perform its
> task is called locale-sensitive and uses the Locale to tailor
> information for the user. For example, displaying a number is a
> locale-sensitive operation--the number should be formatted according
> to the customs/conventions of the user's native country, region, or
> culture.
>
> So it's clear that the Windows System Locale actually has nothing to do
> with locales, nor with the java locale at all; the java locale definitely
> should not be derived from the Windows System Locale setting.
>
> Having said that, the fact that the java runtime is an "ANSI" non-Unicode
> app forces it to still depend heavily on the Windows System Locale
> setting. More specifically, this affects the definition of the system
> property "file.encoding" and some of its usages in the j2se
> implementation (jni_util.c): currently this system property is derived
> from the same setting that the default java locale comes from. This is
> not a problem when the System Locale and User Locale are the same, but
> when these 2 settings are incompatible, some usages of "file.encoding"
> will be problematic (consider the case where your System Locale is
> Japanese, your User Locale is English, and you, as an ANSI app, try to
> talk to the system using the English encoding).
>
> A list of some of these User Locale/System Locale related
> bugs is at
> http://javaweb.sfbay/~sherman/Win32_locale_setting_bugs.html
>
> (2)The "inconsistency" between the Windows API documentation and the
> Windows implementation of GetThreadLocale() when an app is launched from
> the DOS/Command Prompt as a console app (in java's case, when launched
> with "java")
>
> According to the Windows documentation, GetThreadLocale() should return
> the locale of the current thread, which should be the User Locale, and
> this is true in most cases. But in a scenario like
>
> -System Locale is set to a DBCS locale (like Japanese or Chinese)
> -User Locale is set to a SBCS locale (like English)
> -Start the app from the DOS Prompt (as a console app)
>
> GetThreadLocale() will not return the value set in the User Locale
> (English) but the value set in the System Locale (Japanese). My guess is
> that this is because the DOS Prompt/Command Prompt itself is an
> ANSI/non-Unicode app and does something special when the
> system locale is a DBCS locale.
>
> (3)Starting with JDK 1.4.2, the default locale of the java runtime is
> set to different values when launched from "javaw" or "java" on
> W2K and XP if the User Locale and System Locale settings of the
> underlying Windows system are different. More specifically, if the java
> runtime is launched from "javaw", the default locale is set based on
> the Windows System Locale setting, and if it is launched from "java",
> the java default locale is set based on the Windows User Locale
> setting. This causes a lot of confusion for developers and end users who
> work in multilingual W2K/XP environments. The direct trigger of
> #4891531 is the fix for bug #4629351, which added one line of code to
> WinMain in the launcher's win32 java_md.c:
>
> SetThreadLocale(GetSystemDefaultLCID());
>
> to force the java runtime's thread locale to be the Windows System
> Locale. As a result, the java default locale is set to whatever the
> Windows System Locale is. But this line of code is never executed
> when the java runtime is launched through the "java" command, which
> is where the inconsistency comes from.
>
>
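The byte-level problem behind item (1) above can be modeled in plain Java: text written with the User Locale's code page (English, windows-1252) is read back with the System Locale's code page (Japanese, MS932). This is only a sketch of the mismatch under those assumed code pages; it calls no Windows API, the class and method names are hypothetical, and it assumes a JDK that ships the MS932 charset (standard JDK builds include it in the jdk.charsets module).

```java
import java.nio.charset.Charset;

// Models an ANSI app whose User Locale is English (windows-1252) handing bytes
// to a system whose System Locale is Japanese (MS932).
public class CodePageMismatch {

    static final Charset CP1252 = Charset.forName("windows-1252");
    static final Charset MS932 = Charset.forName("MS932");

    // Encode with the sender's code page, decode with the receiver's.
    // String(byte[], Charset) substitutes malformed input instead of failing.
    static String handOver(String text, Charset sender, Charset receiver) {
        return new String(text.getBytes(sender), receiver);
    }

    public static void main(String[] args) {
        String ascii = "cafe";
        String accented = "caf\u00E9"; // "café": the é is byte 0xE9 in windows-1252
        System.out.println(ascii.equals(handOver(ascii, CP1252, MS932)));       // true
        System.out.println(accented.equals(handOver(accented, CP1252, MS932))); // false
    }
}
```

Plain ASCII survives because both code pages agree on it; byte 0xE9 is a double-byte lead byte in MS932, so the receiver sees a malformed sequence instead of the intended character.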
> Solution
> To the fullest extent possible, make the User Locale the default
> locale used by the JRE. The basic fix for this is to call the correct
> Windows API and use its result as the basis for setting the Java locale
> and the system property file.encoding. This will make the Windows
> behavior consistent with the experience on Solaris and Linux.
>
> However, since some parts of the JRE are still a "Windows legacy
> application" that cannot always use the user locale (i.e. not all
> of the JRE is a Unicode application in the Windows sense of the
> term), some additional measures are needed. The proposed
> fixes are:
>
> (1)Remove the inappropriate fix for #4629351 (from java_md.c). The
> java default locale (language, country) and the file.encoding
> system property will be based solely on the User Locale
> (using GetUserDefaultLCID() in java_props_md.c).
>
> (2)We will not change our public position in Tiger to claim official
> support for runtime environments with different (incompatible)
> System Locale and User Locale settings on XP/W2K, because we are
> still running the java runtime as a non-Unicode app, and consequently
> there are still issues left in some places that cannot be solved
> until we migrate the VM and launcher code to a purely Unicode-based
> implementation.
>
> (3)Introduce a new "internal use only" system property
>
> sun.jnu.encoding
>
> to be used in jni_util.c in place of file.encoding in the method
> "initializeEncoding(JNIEnv *env)". This change will affect 2
> jni_util.c methods on Windows:
>
> NewStringPlatform
> GetStringPlatformChars
>
> Unlike file.encoding, the sun.jnu.encoding property will be set
> based on the Windows System Locale on Windows platforms.
> On Solaris/Linux this property will have the same value as
> file.encoding.
>
> The reasons why we need to introduce this new property and use it
> in jni_util.c are
>
> (1)NewStringPlatform and GetStringPlatformChars are 2 methods used
> internally by the j2se implementation to do text encoding conversion
> when it needs to talk to the underlying platform (again, this is because
> our runtime is not a win32 Unicode app). This "platform encoding" must
> be the encoding that matches the System Locale setting. It does not
> make any sense (and does not work either) to use the encoding derived
> from the User Locale setting; the Windows system will simply not
> understand it if the System Locale and User Locale settings are not
> compatible.
>
> (2)We want file.encoding to be derived from the Windows User Locale
> setting (when the System Locale and User Locale settings differ).
> file.encoding, which will mostly be used to set the default encoding
> for the "contents" of input/output streams, should always match the
> default java locale (which now comes from the User Locale setting).
>
> You can find all usages of NewStringPlatform and GetStringPlatformChars
> inside j2se at
> http://javaweb.sfbay/~sherman/NewStringPlatform_GetStringPlatformChar
> and the usage of "file.encoding" at
> http://javaweb.sfbay/~sherman/file_encoding
>
> The webrev for proposed fix for your reference is at
> http://javaweb.sfbay/~sherman/Webrevs/webrev_4891531_4958170
> (regression test cases will be added)
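Reason (2) above notes that file.encoding mostly matters as the default encoding for stream contents. A minimal sketch of that dependency, with hypothetical class and method names: an InputStreamReader created without a charset decodes with Charset.defaultCharset(), while one given an explicit charset decodes the same way on every host.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamContents {

    // Uses the JVM default charset (derived from file.encoding), so the
    // result depends on how the host was configured when the JVM started.
    static String readWithDefault(byte[] data) {
        return readAll(new InputStreamReader(new ByteArrayInputStream(data)));
    }

    // Names the charset explicitly, so the result is the same on every host.
    static String readWith(byte[] data, Charset cs) {
        return readAll(new InputStreamReader(new ByteArrayInputStream(data), cs));
    }

    // Drains a Reader into a String and closes it.
    private static String readAll(Reader reader) {
        try (Reader r = reader) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[256];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00E9".getBytes(StandardCharsets.UTF_8);
        System.out.println("explicit UTF-8: " + readWith(utf8, StandardCharsets.UTF_8));
        System.out.println("default (" + Charset.defaultCharset() + "): " + readWithDefault(utf8));
    }
}
```

Naming the charset is the usual way to keep stream contents independent of whatever the host's locale settings happen to be.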
>
> Interface summary
> exported   private    property   file.encoding: will be solely derived
>                                  from the User Locale setting on Windows
>            internal   property   sun.jnu.encoding: will be solely derived
>                                  from the System Locale setting on Windows
> imported   external   other      GetUserDefaultLCID, GetSystemDefaultLCID,
>                                  GetThreadLocale
> exported   external   method     java.util.Locale.getDefault() and
>                                  java.nio.charset.Charset.defaultCharset()
>
> Specification
> (1)sun.jnu.encoding:
> An internal-use-only system property that is derived from the Windows
> System Locale setting on Windows platforms. It has the same value
> as file.encoding on Solaris and Linux platforms.
>
> (2)java.util.Locale.getDefault()
> java.nio.charset.Charset.defaultCharset()
>
> We never publicly documented how the values returned by these 2
> methods are derived from the host environment settings and are still not
> going to document it. However, the return values of these 2 methods
> are now consistently derived on Windows from the user locale
> as reported by GetUserDefaultLCID.
>
> (3)file.encoding
>
> We never publicly documented how this property value is derived from the
> host environment settings and still are not going to document it. However,
> this value is now consistently derived on Windows from the user locale as
> reported by GetUserDefaultLCID.
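The division of labor between the two properties can be summarized as a small model. Everything here is hypothetical: the class, the record and especially the language-to-code-page table are illustrative stand-ins for the real native lookup, and the record syntax needs Java 16 or later. The printed values match the 1.5.0 rows of the tables at the end of this message.

```java
import java.util.Locale;
import java.util.Map;

public class EncodingRuleModel {

    // Illustrative subset only (an assumption): maps a language to its Windows
    // ANSI code page. The real lookup is done in native code and is far richer.
    private static final Map<String, String> ANSI_CODE_PAGE = Map.of(
            "en", "Cp1252",
            "ja", "MS932",
            "ko", "MS949");

    record Encodings(String fileEncoding, String jnuEncoding) {}

    static String codePageFor(Locale locale) {
        // Simplification: unknown languages fall back to Cp1252.
        return ANSI_CODE_PAGE.getOrDefault(locale.getLanguage(), "Cp1252");
    }

    // file.encoding follows the User Locale; on Windows sun.jnu.encoding
    // follows the System Locale, elsewhere it simply mirrors file.encoding.
    static Encodings derive(Locale userLocale, Locale systemLocale, boolean windows) {
        String fileEncoding = codePageFor(userLocale);
        String jnuEncoding = windows ? codePageFor(systemLocale) : fileEncoding;
        return new Encodings(fileEncoding, jnuEncoding);
    }

    public static void main(String[] args) {
        // Case a) UserLocale=Japanese, SystemLocale=English
        System.out.println(derive(Locale.JAPANESE, Locale.ENGLISH, true));
        // prints Encodings[fileEncoding=MS932, jnuEncoding=Cp1252]
        // Case b) UserLocale=English, SystemLocale=Japanese
        System.out.println(derive(Locale.ENGLISH, Locale.JAPANESE, true));
        // prints Encodings[fileEncoding=Cp1252, jnuEncoding=MS932]
    }
}
```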
>
> Compatibility risk: low
> First of all, the proposed solution should NOT have any compatibility
> impact on our "officially supported" use scenario in which the Windows
> System Locale and User Locale are set to the same language. What we
> are trying to do is make the java runtime behave more consistently
> in scenarios that we don't officially support but that more and more
> end users are running into. In these scenarios our launchers "java"
> and "javaw" behave severely inconsistently, which I think we have to
> pay a price to fix. The following are the impacts that we know the
> proposed solution will have.
>
>
> (1)file.encoding will always be derived from User Locale setting.
>
> Compared to the current implementation, file.encoding in the proposed
> change will always be derived from the User Locale setting. This will
> cause incompatible behavior for some apps (if they depend on the
> assumption that file.encoding is always derived from the System
> Locale setting) when the System Locale and User Locale settings
> are not "compatible" (even though this is not an officially supported
> scenario according to the Java/JDK i18n document mentioned before).
> But given that our current implementation already breaks this
> assumption in some environments and is severely inconsistent here and
> there, I believe this is a decision we must make and a risk we have to
> face if we want to address the inconsistency.
>
> (2)Concern about adding sun.jnu.encoding and using it instead of
> file.encoding in jni_util.c
>
> The win32 FileSystem and AWT have been migrated to MSLU, so there
> is no compatibility issue in these 2 big chunks. Other than that,
> j2se currently does not depend heavily on these 2 methods. I've
> scanned all the places (see URL above) where these 2 jni_util
> methods are used, and my conclusion is that it makes more sense to
> use the encoding from the "System Locale" setting instead of the
> "User Locale" setting in all of them. The only places where I have
> a little compatibility concern are
> (1)in System.c, when converting system properties such as
> user.name and user.dir from the platform encoding to Unicode, and
> (2)in the launcher code, when parsing the command line args into
> Unicode-encoded args for the java main class, in a scenario like
>
> -System Locale is set to English
> -User Locale is set to Japanese
> -Someone tries to pass in a command line option/property/flag
> in Japanese like -Dxxx=yyyy (where yyyy is Japanese text)
>
> Since sun.jnu.encoding is now "English"/Cp1252, this Japanese
> option/flag/property will not be converted correctly into the java
> runtime.
>
> But given that the Windows System Locale is now "English/Cp1252",
> even if you are able to input Japanese on the command line (in this
> case the "DOS prompt app" is in English/Cp1252 mode and does not work
> well with Japanese, though you can use copy/paste...), the Japanese
> text has already been collapsed before java gets hold of it, meaning
> the text in args from the C main and from getenv is already collapsed.
> So this should not be an issue; it does not work anyway, even in the
> current implementation.
>
>
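The "collapsed" command-line text in (2) above can be modeled in Java as well: once Japanese text exists only as windows-1252 bytes, every character that code page cannot represent has already been replaced. This is a hypothetical sketch of the effect, not of the launcher's actual conversion code; the class and method names are made up.

```java
import java.nio.charset.Charset;

public class CollapsedArgs {

    static final Charset CP1252 = Charset.forName("windows-1252");

    // Models text that only ever existed as bytes in the given code page.
    // String.getBytes(Charset) replaces characters the charset cannot encode
    // with its default replacement, which is '?' for windows-1252.
    static String throughCodePage(String text, Charset codePage) {
        return new String(text.getBytes(codePage), codePage);
    }

    public static void main(String[] args) {
        String yyyy = "\u65E5\u672C\u8A9E"; // "日本語" (Japanese)
        System.out.println("-Dxxx=" + throughCodePage(yyyy, CP1252)); // prints -Dxxx=???
    }
}
```

No later conversion can recover the original text, which is why the choice of sun.jnu.encoding makes no difference in this scenario.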
> (3)What we have in 1.4.1, 1.4.2 and the proposed 1.5 when the Windows
> settings are different.
>
>
> a)UserLocale=Japanese/SystemLocale=English
>
> java launcher   Java locale   file.encoding   sun.jnu.encoding
> 1.4.1/java      ja            MS932           -
> 1.4.1/javaw     ja            MS932           -
> 1.4.2/java      ja            MS932           -
> 1.4.2/javaw     en            Cp1252          -
> 1.5.0/java      ja            MS932           Cp1252
> 1.5.0/javaw     ja            MS932           Cp1252
>
>
> b)UserLocale=English/SystemLocale=Japanese
>
> java launcher   Java locale   file.encoding   sun.jnu.encoding
> 1.4.1/java      ja            MS932           -
> 1.4.1/javaw     en            Cp1252          -
> 1.4.2/java      ja            MS932           -
> 1.4.2/javaw     ja            MS932           -
> 1.5.0/java      en            Cp1252          MS932
> 1.5.0/javaw     en            Cp1252          MS932