[loc-en-dev] Comments on the locale enhancement proposal

Naoto Sato Naoto.Sato at Sun.COM
Tue Feb 3 11:53:34 PST 2009


Yoshito Umaoka wrote:
> Masayoshi Okutsu wrote:
> 
>>> Let's assume an instance of Locale is created from language tag 
>>> "zh-Hans-CN".  The proposal suggests that Locale#toString() return 
>>> "zh_Hans_CN".  Do you think this behavior is problematic?  Are you 
>>> suggesting that we add a new method, for example Locale#getID(), to 
>>> return "zh_Hans_CN", but not put the script "Hans" and the extra 
>>> separator "_" in the result of #toString()?
>>
>> I think returning "zh_Hans_CN" may cause a problem. Let's think about 
>> the following scenario.
>>
>> (1) Applications A and B communicate through RMI (i.e., serialization).
>> (2) A is script-aware, while B may or may not be.
>> (3) B uses a third-party class library L which isn't script-aware.
>>
>> Suppose both A and B are running in JDK 7, and that A sends a Locale 
>> created from "zh-Hans-CN" to B. B passes the given Locale to L. In this 
>> case, L might be confused by "zh_Hans_CN" from toString().
>>
>> We could say, "Don't do that." But if someone complains it's an 
>> incompatible change in JDK 7, we will need to give up the new behavior 
>> of toString(). If the complaint comes after the JDK 7 release, it will 
>> be a tragedy...
> 
> I do not understand what you wrote above.  Locale has 3 member fields - 
> language, country and variant.  When an instance of Locale is 
> serialized, these fields are preserved in the serialized form.  Even if 
> we internally add extra fields or change the internal representation of 
> these fields, we have to write out these 3 separate fields to preserve 
> serialization compatibility.  In the scenario above, I would 
> expect Locale("zh", "CN") at the other end (pre-JDK7).  Of course, it 
> loses the script information, which is not ideal, but at least the 
> problem you mentioned above should not happen.
> 
> It is true that there might be existing applications that depend on the 
> String representation and assume that a locale string consists of up to 
> 3 fields delimited by "_" - the 1st is language, the 2nd is country and 
> the rest is variant.  If we need to avoid breaking this, we could -
> 
> 1. By Java convention, toString should still write out all of the field 
> information, including script, extensions...  If we append this 
> information to the end of variant, I would expect the impact to be minimal.
> 
> 2. With the change above, we want another method to return the formal 
> "programmatic name".  Probably we need to add getID() to do so.  If we 
> decide to go this way, we should update the documentation to encourage 
> people to use getID() instead of toString() to get a canonical string 
> representation of a Locale.

I thought toString(IDType) in the draft spec was supposed to do this, 
wasn't it?

Probably we should give this method a more descriptive name, like 
toCanonicalName() (I removed the "IDType" argument as we may end up 
supporting BCP47 only).
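
As a rough illustration of the parsing assumption quoted above, here is a 
minimal Java sketch.  The class and method names (LegacyLocaleStringDemo, 
naiveParse, legacyFriendlyString, canonicalName) are hypothetical and are 
not part of java.util.Locale or the draft spec.  Placing the script after 
the variant with a "_" separator is only one possible reading of option 1, 
and canonicalName does no BCP47 validation.

```java
import java.util.Arrays;

// Hypothetical illustration only; not java.util.Locale and not the draft spec.
class LegacyLocaleStringDemo {

    // Legacy assumption: up to 3 fields delimited by "_" -
    // language, then country, then everything else as variant.
    static String[] naiveParse(String localeString) {
        String[] parts = localeString.split("_", 3);
        String[] fields = {"", "", ""};
        System.arraycopy(parts, 0, fields, 0, parts.length);
        return fields;
    }

    // One possible reading of option 1 (assumption): keep language and
    // country in their old positions and append the script after the variant.
    static String legacyFriendlyString(String language, String script,
                                       String country, String variant) {
        String tail = variant.isEmpty() ? script
                : (script.isEmpty() ? variant : variant + "_" + script);
        StringBuilder sb = new StringBuilder(language);
        if (!country.isEmpty() || !tail.isEmpty()) {
            sb.append('_').append(country);
        }
        if (!tail.isEmpty()) {
            sb.append('_').append(tail);
        }
        return sb.toString();
    }

    // Stand-in for a getID()/toCanonicalName()-style method: joins the
    // non-empty subtags with "-" in language-script-region order.
    static String canonicalName(String language, String script, String region) {
        StringBuilder sb = new StringBuilder();
        for (String subtag : new String[] {language, script, region}) {
            if (subtag.isEmpty()) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append('-');
            }
            sb.append(subtag);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The script lands in the "country" slot of a legacy parser.
        System.out.println(Arrays.toString(naiveParse("zh_Hans_CN")));
        // Language and country stay where a legacy parser expects them.
        String s = legacyFriendlyString("zh", "Hans", "CN", "");
        System.out.println(s + " -> " + Arrays.toString(naiveParse(s)));
        System.out.println(canonicalName("zh", "Hans", "CN"));
    }
}
```

Under these assumptions, a legacy parser reading "zh_Hans_CN" takes "Hans" 
as the country and "CN" as the variant, while "zh_CN_Hans" still yields 
language "zh" and country "CN", with the script appearing only in the 
variant slot.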

Naoto

> 
> Although we could do such things to support full backward 
> compatibility, I prefer not to do so.
> 
> Am I missing anything?
> 
> Thanks,
> Yoshito
> 
> 
> 


-- 
Naoto Sato


