[loc-en-dev] Comments on the locale enhancement proposal
Masayoshi Okutsu
Masayoshi.Okutsu at Sun.COM
Tue Feb 3 01:01:48 PST 2009
I believe neither is perfect. My point is that the system should provide
simple mechanisms. If it's too inconvenient to put everything in
sr_Latn_RS, we could treat sr_Latn as an exception, like sr_Latn_RS ->
sr_Latn -> (root). JDK already has some exceptions, like
zh_HK->zh_TW->zh, anyway.
Thanks,
Masayoshi
On 2/3/2009 5:02 AM, Yoshito Umaoka wrote:
> Masayoshi Okutsu wrote:
> > On 1/21/2009 9:13 AM, Doug Felt wrote:
> >>
> >>
> >> On Tue, Jan 20, 2009 at 4:04 PM, Masayoshi Okutsu
> >> <Masayoshi.Okutsu at sun.com> wrote:
> >>
> >> I think it's obvious that we can't support old data with new
> >> identifiers perfectly, like zh_Hans_CN and zh_Hant_CN. When we
> >> can't support both, I prefer to define a simple algorithm to
> >> produce a look-up sequence with minimum exceptions. [...]
> >>
> >>
> >> Can you define one so we can understand what cases you intend to
> >> handle and how?
> >
> > My preference is:
> >
> > (1) Treat language+script as a writingsystem, which produces the
> > sequence language_script -> language.
> >
>
> If we forget about legacy RB organization, it makes sense. However..
> see my comments for the next item.
>
> > (2) Apply the traditional sequence production rule to
> > writingsystem_country_variant:
> >
> > writingsystem_country_variant
> > writingsystem_country
> > writingsystem
> >
> > each of which produces language_script -> language. Therefore, the
> > entire sequence is:
> >
> > language_script_country_variant
> > language_country_variant
> > language_script_country
> > language_country
> > language_script
> > language
> >
> > For example, the sequence for zh_Hans_CN is:
> >
> > zh_Hans_CN
> > zh_CN
> > zh_Hans
> > zh
> >
> > while the proposed one is:
> >
> > zh_Hans_CN
> > zh_Hans
> > zh_CN
> > zh
> >
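
The rule in items (1) to (3) above can be sketched in Java roughly as follows. This is only an illustration of the ordering described in the message, not JDK code: the class name `ScriptAfterCountryLookup` and its methods are hypothetical, empty strings stand for missing fields, and a variant without a country is not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "script dropped before country" ordering.
class ScriptAfterCountryLookup {

    /** Joins the non-empty parts with '_'. */
    static String join(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            if (p.isEmpty()) continue;
            if (sb.length() > 0) sb.append('_');
            sb.append(p);
        }
        return sb.toString();
    }

    /** Adds a candidate unless it is already in the list. */
    static void add(List<String> out, String name) {
        if (!out.contains(name)) out.add(name);
    }

    static List<String> candidates(String lang, String script,
                                   String country, String variant) {
        List<String> out = new ArrayList<>();
        // Traditional truncation of the tail: country_variant, country, nothing.
        String[][] tails = { {country, variant}, {country, ""}, {"", ""} };
        for (String[] t : tails) {
            // Each step yields language_script... and then language...
            add(out, join(lang, script, t[0], t[1]));
            add(out, join(lang, t[0], t[1]));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [zh_Hans_CN, zh_CN, zh_Hans, zh]
        System.out.println(candidates("zh", "Hans", "CN", ""));
        // With no script this degrades to the traditional sequence:
        // prints [zh_HK, zh]
        System.out.println(candidates("zh", "", "HK", ""));
    }
}
```
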
>
> zh_Hans_CN -> zh_CN -> zh_Hans -> zh may work OK for this specific
> case. However, when a country has two commonly used scripts, this
> order may not work as we expect. For example, let's look at sr_Latn_RS.
> With your suggestion, the lookup order will be:
>
> sr_Latn_RS
> sr_RS
> sr_Latn
> sr
>
> In general, the writing system is more important than the country
> variant. For Serbian as used in Serbia, Cyrillic is likely the
> default script. Therefore, an existing sr_RS resource likely has
> Cyrillic contents. Some may want to add a Latin variant alongside
> sr_RS, tag it sr_Latn_RS, and add its parent sr_Latn. With this
> lookup order, sr_Latn may be hidden by sr_RS.
>
> I think we're talking about which one matches better for sr_Latn_RS:
> sr_RS or sr_Latn. And, in this case, I think sr_Latn is the answer.
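
The hiding effect described here can be shown with a small Java sketch. It assumes a hypothetical set of available resources (sr_RS with Cyrillic contents, plus sr_Latn and sr, but no sr_Latn_RS) and compares the two candidate orders; the script-first order for sr_Latn_RS is written by analogy with the zh_Hans_CN example earlier in the thread. The class and method names are made up for illustration.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical demo: which resource wins for a request of sr_Latn_RS?
class LookupShadowingDemo {

    /** Returns the first candidate for which a resource exists. */
    static Optional<String> firstAvailable(List<String> candidates,
                                           Set<String> available) {
        return candidates.stream().filter(available::contains).findFirst();
    }

    public static void main(String[] args) {
        // Assumed resources: sr_RS is Cyrillic, sr_Latn is Latin.
        Set<String> available = Set.of("sr_RS", "sr_Latn", "sr");

        List<String> countryFirst =
                List.of("sr_Latn_RS", "sr_RS", "sr_Latn", "sr");
        List<String> scriptFirst =
                List.of("sr_Latn_RS", "sr_Latn", "sr_RS", "sr");

        // prints sr_RS: the Cyrillic resource hides sr_Latn
        System.out.println(firstAvailable(countryFirst, available).get());
        // prints sr_Latn
        System.out.println(firstAvailable(scriptFirst, available).get());
    }
}
```
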
>
>
> > (3) If no script is given, the sequence is the same as the
> > traditional one.
> >
> > language_country_variant
> > language_country
> > language
> >
>
> This does not work well unless we supply a default script for
> languages which have two or more script variants. For a request for
> zh_HK, this suggestion produces the following candidates:
>
> zh_HK
> zh
>
> But people who want to distinguish scripts with the new framework
> may have resources for zh_Hant_HK, but not zh_HK. When a language has
> multiple commonly used script variants and one of them is dominant in
> a country, expanding the writing system without a script to the
> writing system with a script is desired.
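
One way to picture this expansion is a small default-script table consulted before candidates are generated. The Java sketch below is only an assumption about how that could look: the table contents, the class name `DefaultScriptSketch`, and the method `withDefaultScript` are all hypothetical, and the thread does not say where the expanded name should sit in the final candidate list.

```java
import java.util.Map;

// Hypothetical sketch: fill in a dominant script for language_country.
class DefaultScriptSketch {

    // Illustrative entries only; a real table would come from locale data.
    static final Map<String, String> DEFAULT_SCRIPT = Map.of(
            "zh_HK", "Hant",
            "zh_CN", "Hans",
            "sr_RS", "Cyrl");

    /** Returns language_script_country if a default script is known. */
    static String withDefaultScript(String lang, String country) {
        String key = lang + "_" + country;
        String script = DEFAULT_SCRIPT.get(key);
        return script == null ? key : lang + "_" + script + "_" + country;
    }

    public static void main(String[] args) {
        // prints zh_Hant_HK
        System.out.println(withDefaultScript("zh", "HK"));
        // No entry in the table, so the name is unchanged: prints fr_FR
        System.out.println(withDefaultScript("fr", "FR"));
    }
}
```
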
>
>
> > (4) Exceptions are Norwegian and Hebrew.
> >
> > no_NO -> nb_NO -> no -> nb
> > no_NO_NY -> nn_NO -> no_NO -> nn -> no
> > nn_NO -> no_NO_NY -> nn -> no
> > nb_NO -> no_NO -> nb -> no
> >
> > he_IL -> iw_IL -> he -> iw
> > iw_IL -> he_IL -> iw -> he
>
> I think we should distinguish the Norwegian case from the Hebrew case.
> For Hebrew, he is exactly equal to iw. For Norwegian, strictly
> speaking, no could be nb or nn. I'm fine with the order for Hebrew
> above. But I think the Norwegian case should be handled differently.
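
For reference, the sequences listed in item (4) could be held in a plain exception table that is checked before the general rule runs. The Java sketch below copies those six sequences as quoted; it does not reflect the suggestion that Norwegian be handled differently, and the names `ExceptionSequences` and `lookup` are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: exception table for item (4), copied as quoted.
class ExceptionSequences {

    static final Map<String, List<String>> EXCEPTIONS = Map.of(
            "no_NO",    List.of("no_NO", "nb_NO", "no", "nb"),
            "no_NO_NY", List.of("no_NO_NY", "nn_NO", "no_NO", "nn", "no"),
            "nn_NO",    List.of("nn_NO", "no_NO_NY", "nn", "no"),
            "nb_NO",    List.of("nb_NO", "no_NO", "nb", "no"),
            "he_IL",    List.of("he_IL", "iw_IL", "he", "iw"),
            "iw_IL",    List.of("iw_IL", "he_IL", "iw", "he"));

    /** Uses the exception sequence if one exists, else the general one. */
    static List<String> lookup(String requested, List<String> general) {
        return EXCEPTIONS.getOrDefault(requested, general);
    }

    public static void main(String[] args) {
        // prints [he_IL, iw_IL, he, iw]
        System.out.println(lookup("he_IL", List.of("he_IL", "he")));
        // No exception entry, so the general sequence is used:
        // prints [fr_FR, fr]
        System.out.println(lookup("fr_FR", List.of("fr_FR", "fr")));
    }
}
```
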
>
> -Yoshito