[loc-en-dev] Comments on the locale enhancement proposal

Yoshito Umaoka y.umaoka at gmail.com
Mon Feb 2 19:33:53 PST 2009


Masayoshi Okutsu wrote:

>> Let's assume an instance of Locale is created from language tag 
>> "zh-Hans-CN".  The proposal suggests that Locale#toString() return 
>> "zh_Hans_CN".  Do you think this behavior is problematic?  Are you 
>> suggesting that we add a new method, for example Locale#getID(), to 
>> return "zh_Hans_CN", but not put the script "Hans" and the extra 
>> separator "_" in the result of #toString()?
> 
> I think returning "zh_Hans_CN" may cause a problem. Let's think about 
> the following scenario.
> 
> (1) Application A and B communicate through RMI (i.e., serialization).
> (2) A is script-aware, while B may or may not be.
> (3) B uses 3rd party class library L which isn't script-aware.
> 
> Suppose both A and B are running on JDK 7, and that A sends a Locale 
> created from "zh-Hans-CN" to B. B passes the given Locale to L. In this 
> case, L might be confused by "zh_Hans_CN" from toString().
> 
> We could say, "Don't do that." But if someone complains it's an 
> incompatible change in JDK 7, we will need to give up the new behavior 
> of toString(). If the complaint comes after the JDK 7 release, it will 
> be a tragedy...

I do not understand what you wrote above.  Locale has three member fields: 
language, country and variant.  When an instance of Locale is 
serialized, these fields are preserved in the serialized form.  Even if we 
internally add extra fields or change the internal representation of 
these fields, we have to write out these three separate fields to keep 
serialization compatibility.  In the scenario above, I would 
expect Locale("zh", "CN") at the other end (pre-JDK7).  Of course, it 
loses the script information, which is not ideal, but at least the 
problem you mentioned above should not happen.
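
To make that expectation concrete, here is a minimal sketch of the fallback. 
It assumes a script-aware Locale that offers forLanguageTag() and getScript() 
(neither exists before JDK 7), and toLegacyFields() is a hypothetical helper 
that only stands in for what a three-field serialized form would carry.  It 
is not the actual serialization code.

```java
import java.util.Locale;

class LegacySerializedForm {
    // Hypothetical helper: keeps only the three fields the existing
    // serialized form carries (language, country, variant); the script
    // is dropped, as it would be for a receiver that only knows those fields.
    @SuppressWarnings("deprecation") // the constructor is deprecated in recent JDKs
    static Locale toLegacyFields(Locale loc) {
        return new Locale(loc.getLanguage(), loc.getCountry(), loc.getVariant());
    }

    public static void main(String[] args) {
        Locale scriptAware = Locale.forLanguageTag("zh-Hans-CN");
        Locale legacy = toLegacyFields(scriptAware);
        System.out.println(legacy);                       // prints "zh_CN"
        System.out.println(legacy.getScript().isEmpty()); // prints "true"
    }
}
```

The receiving side sees an ordinary language/country Locale, so a library 
that is not script-aware is never handed a string with an extra field.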

It is true that an existing application might depend on the 
String representation and assume that a locale string consists 
of up to 3 fields delimited by "_": the 1st one is language, the 2nd one is 
country and the rest is variant.  If we need to avoid breaking this, we could -

1. By the Java convention, toString() should still write out the 
information in all fields, including script, extensions...  If we append this 
information to the end of the variant, I would expect the impact to be minimal.

2. With the change above, we want another method to return the formal 
"programmatic name".  Probably we need to add getID() to do so.  If we 
decide to go this way, we should update the documentation to encourage 
people to use getID() instead of toString() to get a canonical string 
representation of a Locale.
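
The string-format concern and the two options above can be sketched as 
follows.  Everything here is illustrative: naiveSplit() models the assumption 
that existing code might make, legacyString() is one hypothetical way to 
append the script after the variant (the exact separator is an assumption, 
not a proposal), and id() stands in for the suggested getID().  The sketch 
also assumes language and country are both present.

```java
import java.util.Locale;

class LocaleStringSketch {
    // Models existing code that assumes "language_country_variant":
    // the first two "_"-separated fields are language and country,
    // and everything after that is the variant.
    static String[] naiveSplit(String s) {
        String[] parts = s.split("_", 3);
        String[] fields = {"", "", ""};
        System.arraycopy(parts, 0, fields, 0, parts.length);
        return fields;
    }

    // Option 1 (hypothetical format): language and country stay in their
    // old positions; the script is appended after the variant.
    static String legacyString(Locale loc) {
        String tail = loc.getVariant();
        if (!loc.getScript().isEmpty()) {
            tail = tail.isEmpty() ? loc.getScript() : tail + "_" + loc.getScript();
        }
        String head = loc.getLanguage() + "_" + loc.getCountry();
        return tail.isEmpty() ? head : head + "_" + tail;
    }

    // Option 2 (stand-in for getID()): language, script, country, variant
    // in that order, skipping empty fields.
    static String id(Locale loc) {
        StringBuilder sb = new StringBuilder(loc.getLanguage());
        for (String f : new String[] {loc.getScript(), loc.getCountry(), loc.getVariant()}) {
            if (!f.isEmpty()) {
                sb.append('_').append(f);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Locale loc = Locale.forLanguageTag("zh-Hans-CN");
        System.out.println(id(loc));                              // prints "zh_Hans_CN"
        System.out.println(naiveSplit(id(loc))[1]);               // prints "Hans"
        System.out.println(legacyString(loc));                    // prints "zh_CN_Hans"
        System.out.println(naiveSplit(legacyString(loc))[1]);     // prints "CN"
    }
}
```

With the "zh_Hans_CN" form, the naive parser reads "Hans" as the country.  
With the script moved behind the variant, the same parser still finds "CN", 
at the cost of a toString() result that is no longer the canonical ID.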

Although we could do such things to support full backward 
compatibility, I prefer not to do so.

Am I missing anything?

Thanks,
Yoshito





