[loc-en-dev] Comments on the locale enhancement proposal
Yoshito Umaoka
y.umaoka at gmail.com
Mon Feb 2 19:33:53 PST 2009
Masayoshi Okutsu wrote:
>> Let's assume an instance of Locale is created from language tag
>> "zh-Hans-CN". The proposal suggests that Locale#toString() return
>> "zh_Hans_CN". Do you think this behavior is problematic? Are you
>> suggesting to add a new method, for example, Locale#getID() to return
>> "zh_Hans_CN", but not to put the script "Hans" and extra separator "_"
>> in the result of #toString()?
>
> I think returning "zh_Hans_CN" may cause a problem. Let's think about
> the following scenario.
>
> (1) Application A and B communicate through RMI (i.e., serialization).
> (2) A is script-aware, while B may be or may not.
> (3) B uses 3rd party class library L which isn't script-aware.
>
> Suppose both A and B are running in JDK 7, and that A sends a Locale
> from "zh-Hans-CN" to B. B passes the given Locale to L. In this case, L
> might be confused with "zh_Hans_CN" from toString().
>
> We could say, "Don't do that." But if someone complains it's an
> incompatible change in JDK 7, we will need to give up the new behavior
> of toString(). If the complaint comes after the JDK 7 release, it will
> be a tragedy...

I do not understand what you wrote above. Locale has 3 member fields:
language, country and variant. When an instance of Locale is
serialized, these fields are preserved in the serialized form. Even if
we internally add extra fields or change the internal representation of
these fields, we have to write out these 3 separate fields to
maintain serialization compatibility. In the scenario above, I would
expect Locale("zh", "CN") at the other end (pre-JDK7). Of course, this
loses the script information, which is not ideal, but at least the
problem you mentioned above should not happen.
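
To make the point about the three serialized fields concrete, here is a
minimal sketch. It assumes a JDK where Locale.forLanguageTag() and
Locale#getScript() exist (they are not available at the time of this
message), and the class and method names are hypothetical. It does not
perform real serialization; it only rebuilds a Locale from the three
legacy fields, which is what a reader that knows nothing about script
would be left with.

```java
import java.util.Locale;

// Hypothetical illustration, not the actual serialization code.
class LegacyFieldsSketch {

    // Rebuild a Locale from language, country and variant only,
    // ignoring script and extensions, as a script-unaware reader would.
    static Locale legacyView(Locale source) {
        return new Locale(source.getLanguage(),
                          source.getCountry(),
                          source.getVariant());
    }

    public static void main(String[] args) {
        // Assumes a script-aware JDK on the sending side.
        Locale scriptAware = Locale.forLanguageTag("zh-Hans-CN");
        Locale seenByOldReader = legacyView(scriptAware);

        // The script is lost, but the result is an ordinary zh/CN locale.
        System.out.println(seenByOldReader.equals(new Locale("zh", "CN")));
        System.out.println(seenByOldReader.getScript().isEmpty());
    }
}
```

Under these assumptions the receiver ends up with a locale that a
script-unaware library can handle, at the cost of dropping "Hans".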

It is true that an existing application might depend on the String
representation and assume that a locale string consists of up to 3
fields delimited by "_": the first is the language, the second is the
country and the rest is the variant. If we need to avoid breaking such
code, we could:

1. By Java convention, toString() should still write out all of the
field information, including script and extensions. If we append this
information to the end of the variant, I would expect the impact to be
minimal.
2. With the change above, we want another method to return the formal
"programmatic name". We would probably need to add getID() to do so. If
we decide to go this way, we should update the documentation to
encourage people to use getID() instead of toString() to get a canonical
string representation of a Locale.
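
The kind of assumption described above can be sketched as follows. The
parser is hypothetical: it stands in for third-party code that splits a
locale string on "_" into language, country and variant. The string
"zh_CN_Hans" is also only an assumed illustration of "script appended
after the variant position", not a format defined by the proposal.

```java
// Hypothetical legacy code that assumes language_country_variant.
class LegacyLocaleStringParser {

    // Split into at most 3 fields; missing fields become "".
    static String[] parse(String localeString) {
        String[] parts = localeString.split("_", 3);
        String[] fields = {"", "", ""};
        System.arraycopy(parts, 0, fields, 0, parts.length);
        return fields;
    }

    public static void main(String[] args) {
        // Script in the second position is mistaken for the country.
        String[] a = parse("zh_Hans_CN");
        System.out.println("country=" + a[1] + " variant=" + a[2]);

        // Script appended at the end leaves language and country intact.
        String[] b = parse("zh_CN_Hans");
        System.out.println("country=" + b[1] + " variant=" + b[2]);
    }
}
```

With "zh_Hans_CN" such a parser reads "Hans" as the country, while the
appended form only adds unexpected text to the variant field.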

Although we could do this to support full backward compatibility, I
would prefer not to.

Am I missing anything?

Thanks,
Yoshito