[loc-en-dev] Comments on the locale enhancement proposal

Masayoshi Okutsu Masayoshi.Okutsu at Sun.COM
Tue Feb 3 01:01:48 PST 2009


I believe neither is perfect. My point is that the system should provide 
simple mechanisms. If it's too inconvenient to put everything in 
sr_Latn_RS, we could treat sr_Latn as an exception, like sr_Latn_RS -> 
sr_Latn -> (root). The JDK already has some exceptions anyway, such as
zh_HK -> zh_TW -> zh.

Thanks,
Masayoshi

On 2/3/2009 5:02 AM, Yoshito Umaoka wrote:
> Masayoshi Okutsu wrote:
> > On 1/21/2009 9:13 AM, Doug Felt wrote:
> >>
> >>
> >> On Tue, Jan 20, 2009 at 4:04 PM, Masayoshi Okutsu
> >> <Masayoshi.Okutsu at sun.com> wrote:
> >>
> >>     I think it's obvious that we can't support old data with new
> >>     identifiers perfectly, like zh_Hans_CN and zh_Hant_CN. When we
> >>     can't support both, I prefer to define a simple algorithm that
> >>     produces look-up sequences with minimal exceptions. [...]
> >>
> >>
> >> Can you define one so we can understand what cases you intend to
> >> handle and how?
> >
> > My preference is:
> >
> > (1) Treat language+script as a writingsystem, which produces the
> > sequence language_script -> language.
> >
>
> If we set aside the legacy resource bundle (RB) organization, this
> makes sense.  However, see my comments on the next item.
>
> > (2) Apply the traditional sequence production rule to
> > writingsystem_country_variant:
> >
> > writingsystem_country_variant
> > writingsystem_country
> > writingsystem
> >
> > each of which produces language_script -> language. Therefore, the
> > entire sequence is:
> >
> > language_script_country_variant
> > language_country_variant
> > language_script_country
> > language_country
> > language_script
> > language
> >
> > For example, the sequence for zh_Hans_CN is:
> >
> > zh_Hans_CN
> > zh_CN
> > zh_Hans
> > zh
> >
> > while the proposed one is:
> >
> > zh_Hans_CN
> > zh_Hans
> > zh_CN
> > zh
> >
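For illustration, the two orderings contrasted above can be sketched in Java. This is a minimal sketch, not JDK code: the class LookupOrderSketch and its methods are hypothetical names, subtags are passed as plain strings, and the general form of the script-first ordering is an assumption extrapolated from the single zh_Hans_CN example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class LookupOrderSketch {

    // Joins the non-empty subtags with '_', e.g. ("zh", "Hans", "CN", "") -> "zh_Hans_CN".
    // Simplification: assumes a variant is only ever given together with a country.
    static String join(String... subtags) {
        StringBuilder sb = new StringBuilder();
        for (String s : subtags) {
            if (s.isEmpty()) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append('_');
            }
            sb.append(s);
        }
        return sb.toString();
    }

    // Appends a candidate unless it is already in the list.
    static void addOnce(List<String> out, String candidate) {
        if (!out.contains(candidate)) {
            out.add(candidate);
        }
    }

    // The three traditional levels: country+variant, country only, neither.
    static String[][] levels(String country, String variant) {
        return new String[][] { { country, variant }, { country, "" }, { "", "" } };
    }

    // Rules (1)-(3): at each level, try language_script first, then the script-less form.
    static List<String> countryFirst(String lang, String script, String country, String variant) {
        List<String> out = new ArrayList<>();
        for (String[] level : levels(country, variant)) {
            addOnce(out, join(lang, script, level[0], level[1]));
            addOnce(out, join(lang, level[0], level[1]));
        }
        return out;
    }

    // Assumed generalization of the proposed order: all candidates that carry
    // the script come before any script-less candidate.
    static List<String> scriptFirst(String lang, String script, String country, String variant) {
        List<String> out = new ArrayList<>();
        for (String[] level : levels(country, variant)) {
            addOnce(out, join(lang, script, level[0], level[1]));
        }
        for (String[] level : levels(country, variant)) {
            addOnce(out, join(lang, level[0], level[1]));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(countryFirst("zh", "Hans", "CN", ""));
        System.out.println(scriptFirst("zh", "Hans", "CN", ""));
    }
}
```

With these definitions, countryFirst("sr", "Latn", "RS", "") yields sr_Latn_RS, sr_RS, sr_Latn, sr, while scriptFirst yields sr_Latn_RS, sr_Latn, sr_RS, sr. When the script is empty, both methods reduce to the traditional language_country_variant, language_country, language sequence.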
>
> zh_Hans_CN -> zh_CN -> zh_Hans -> zh may work OK for this specific
> case.  However, when a country has two commonly used scripts, this
> order may not work as we expect.  For example, consider sr_Latn_RS.
> With your suggestion, the look-up order will be:
>
> sr_Latn_RS
> sr_RS
> sr_Latn
> sr
>
> In general, the writing system is more important than the country
> variant.  For Serbian as used in Serbia, Cyrillic is likely the
> default script, so an existing sr_RS resource likely has Cyrillic
> contents.  Someone may want to add a Latin variant alongside sr_RS,
> tag it sr_Latn_RS, and add its parent sr_Latn.  With this look-up
> order, sr_Latn may be hidden by sr_RS.
>
> I think the question is which one is the better match for
> sr_Latn_RS: sr_RS or sr_Latn.  In this case, I think sr_Latn is the
> answer.
>
>
> > (3) If no script is given, the sequence is the same as the
> > traditional one.
> >
> > language_country_variant
> > language_country
> > language
> >
>
> This does not work well unless we supply a default script for
> languages that have two or more script variants.  For a zh_HK
> request, this suggestion produces the following candidates:
>
> zh_HK
> zh
>
> However, people who want to distinguish scripts under the new
> framework may have a zh_Hant_HK resource but no zh_HK.  When a
> language has multiple commonly used script variants and one of them
> is dominant in a country, it is desirable to expand the writing
> system without a script into the writing system with that script.
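The expansion described above can be sketched as a table look-up. This is only an illustration under stated assumptions: the class DefaultScriptSketch, its method, and the table contents are hypothetical, and the sketch deliberately says nothing about where the expanded identifier should sit in the candidate sequence, since the thread leaves that open.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class DefaultScriptSketch {

    // Illustrative table: language_country -> script assumed to be dominant there.
    // A real implementation would need an authoritative data source.
    private static final Map<String, String> DEFAULT_SCRIPT = new HashMap<>();
    static {
        DEFAULT_SCRIPT.put("zh_HK", "Hant");
        DEFAULT_SCRIPT.put("zh_TW", "Hant");
        DEFAULT_SCRIPT.put("zh_CN", "Hans");
        DEFAULT_SCRIPT.put("sr_RS", "Cyrl");
    }

    // Expands language_country to language_script_country when a default
    // script is known; otherwise returns language_country unchanged.
    static String expand(String lang, String country) {
        String key = lang + "_" + country;
        String script = DEFAULT_SCRIPT.get(key);
        return script == null ? key : lang + "_" + script + "_" + country;
    }

    public static void main(String[] args) {
        System.out.println(expand("zh", "HK"));
        System.out.println(expand("ja", "JP"));
    }
}
```

Here a request for zh_HK expands to zh_Hant_HK, so a bundle tagged with the script can be found, while a language with no table entry, such as ja_JP, passes through unchanged.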
>
>
> > (4) Exceptions are Norwegian and Hebrew.
> >
> > no_NO -> nb_NO -> no -> nb
> > no_NO_NY -> nn_NO -> no_NO -> nn -> no
> > nn_NO -> no_NO_NY -> nn -> no
> > nb_NO -> no_NO -> nb -> no
> >
> > he_IL -> iw_IL -> he -> iw
> > iw_IL -> he_IL -> iw -> he
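One simple way to hold exceptions like those in item (4) is a fixed table consulted before the general rule. The sketch below transcribes the six chains exactly as listed; the class ExceptionChainSketch and its method are hypothetical names, and plain strings are used instead of java.util.Locale so that no identifier is normalized behind the scenes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class ExceptionChainSketch {

    // Each key maps to its full look-up chain, starting with the key itself.
    private static final Map<String, List<String>> CHAINS = new HashMap<>();
    static {
        CHAINS.put("no_NO", Arrays.asList("no_NO", "nb_NO", "no", "nb"));
        CHAINS.put("no_NO_NY", Arrays.asList("no_NO_NY", "nn_NO", "no_NO", "nn", "no"));
        CHAINS.put("nn_NO", Arrays.asList("nn_NO", "no_NO_NY", "nn", "no"));
        CHAINS.put("nb_NO", Arrays.asList("nb_NO", "no_NO", "nb", "no"));
        CHAINS.put("he_IL", Arrays.asList("he_IL", "iw_IL", "he", "iw"));
        CHAINS.put("iw_IL", Arrays.asList("iw_IL", "he_IL", "iw", "he"));
    }

    // Returns the hard-coded chain for an exceptional identifier, or an empty
    // list when the identifier should go through the general rule instead.
    static List<String> chainFor(String id) {
        List<String> chain = CHAINS.get(id);
        return chain == null ? Collections.<String>emptyList() : chain;
    }

    public static void main(String[] args) {
        System.out.println(chainFor("he_IL"));
        System.out.println(chainFor("zh_CN"));
    }
}
```

A caller would check chainFor first and fall back to the general sequence production rule only when the result is empty.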
>
> I think we should distinguish the Norwegian case from the Hebrew
> case.  For Hebrew, he is exactly equal to iw.  For Norwegian, strictly
> speaking, no could be nb or nn.  I'm fine with the Hebrew order above,
> but I think the Norwegian case should be handled differently.
>
> -Yoshito


