[loc-en-dev] Comments on the locale enhancement proposal
Yoshito Umaoka
y.umaoka at gmail.com
Mon Feb 2 12:02:07 PST 2009
Masayoshi Okutsu wrote:
> On 1/21/2009 9:13 AM, Doug Felt wrote:
>>
>>
>> On Tue, Jan 20, 2009 at 4:04 PM, Masayoshi Okutsu <Masayoshi.Okutsu at sun.com> wrote:
>>
>> I think it's obvious that we can't support old data with new
>> identifiers perfectly, like zh_Hans_CN and zh_Hant_CN. When we
>> can't support both, I prefer to define a simple algorithm to
>> produce a look-up sequence with minimum exceptions. [...]
>>
>>
>> Can you define one so we can understand which cases you intend to
>> handle, and how?
>
> My preference is:
>
> (1) Treat language+script as a writing system, which produces the
> sequence language_script -> language.
>
If we ignore the legacy resource bundle (RB) organization, this makes
sense. However, see my comments on the next item.
> (2) Apply the traditional sequence production rule to
> writingsystem_country_variant:
>
> writingsystem_country_variant
> writingsystem_country
> writingsystem
>
> each of which produces language_script -> language. Therefore, the
> entire sequence is:
>
> language_script_country_variant
> language_country_variant
> language_script_country
> language_country
> language_script
> language
>
> For example, the sequence for zh_Hans_CN is:
>
> zh_Hans_CN
> zh_CN
> zh_Hans
> zh
>
> while the proposed one is:
>
> zh_Hans_CN
> zh_Hans
> zh_CN
> zh
>
zh_Hans_CN -> zh_CN -> zh_Hans -> zh may work OK for this specific case.
However, when a country has two commonly used scripts, this order may
not work as we expect. For example, consider sr_Latn_RS. With your
suggestion, the lookup order will be:
sr_Latn_RS
sr_RS
sr_Latn
sr
In general, the writing system matters more than the country variant.
For Serbian as used in Serbia, Cyrillic is likely the default script,
so an existing sr_RS resource most likely has Cyrillic content. If
someone adds a Latin-script variant alongside sr_RS, tagged sr_Latn_RS,
together with its parent sr_Latn, this lookup order lets sr_RS hide
sr_Latn.
The question is which of sr_RS and sr_Latn is the better match for
sr_Latn_RS, and in this case I think sr_Latn is the answer.
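To make the difference concrete, here is a minimal Java sketch that
generates both orders from language, script, country, and variant. The
class and method names are hypothetical. okutsuOrder follows the rule
quoted above (at each country/variant level, language_script before
language). scriptFirstOrder assumes the proposal drops country and
variant before dropping the script; that reproduces the zh_Hans_CN
example, but it is an assumed generalization, not a quote from the
proposal.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: compares two candidate lookup orders from this thread.
// Locale IDs are plain strings; empty parts are skipped when joining.
class FallbackOrders {

    // Joins the non-empty parts with '_' (e.g. "sr", "", "RS" -> "sr_RS").
    static String join(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            if (p.isEmpty()) continue;
            if (sb.length() > 0) sb.append('_');
            sb.append(p);
        }
        return sb.toString();
    }

    // Region levels from most to least specific: country_variant, country, none.
    // A variant without a country is ignored in this sketch.
    static List<String[]> regionLevels(String country, String variant) {
        List<String[]> levels = new ArrayList<>();
        if (!country.isEmpty() && !variant.isEmpty()) levels.add(new String[] {country, variant});
        if (!country.isEmpty()) levels.add(new String[] {country, ""});
        levels.add(new String[] {"", ""});
        return levels;
    }

    // Okutsu's order: at each region level, try language_script, then language.
    static List<String> okutsuOrder(String lang, String script, String country, String variant) {
        List<String> out = new ArrayList<>();
        for (String[] lv : regionLevels(country, variant)) {
            if (!script.isEmpty()) out.add(join(lang, script, lv[0], lv[1]));
            out.add(join(lang, lv[0], lv[1]));
        }
        return out;
    }

    // Assumed script-first order: exhaust all region levels with the script,
    // then all region levels without it.
    static List<String> scriptFirstOrder(String lang, String script, String country, String variant) {
        List<String> out = new ArrayList<>();
        if (!script.isEmpty()) {
            for (String[] lv : regionLevels(country, variant)) out.add(join(lang, script, lv[0], lv[1]));
        }
        for (String[] lv : regionLevels(country, variant)) out.add(join(lang, lv[0], lv[1]));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(okutsuOrder("sr", "Latn", "RS", ""));      // [sr_Latn_RS, sr_RS, sr_Latn, sr]
        System.out.println(scriptFirstOrder("sr", "Latn", "RS", "")); // [sr_Latn_RS, sr_Latn, sr_RS, sr]
    }
}
```

With okutsuOrder, a Cyrillic sr_RS bundle is found before sr_Latn; with
scriptFirstOrder, the Latin-script parent wins, which is the behavior
argued for here.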
> (3) If no script is given, the sequence is the same as the
> traditional one.
>
> language_country_variant
> language_country
> language
>
This does not work well unless we supply a default script for languages
that have two or more script variants. For a request for zh_HK, this
suggestion produces the following candidates:
zh_HK
zh
However, people who want to distinguish scripts with the new framework
may provide a zh_Hant_HK resource but not zh_HK. When a language has
multiple commonly used script variants and one of them is dominant in a
country, expanding the writing system without a script into the writing
system with that script is desirable.
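A minimal sketch of that expansion, assuming a hypothetical table of the
dominant script per language and country (the entries follow common
usage, e.g. Traditional Chinese in Hong Kong). Placing the script-tagged
candidates before the script-less ones is an assumption here, not
something settled in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch only: expands a script-less request using a default-script table.
class DefaultScriptExpansion {

    // Hypothetical table of the dominant script per language_country.
    // A real table would come from data such as CLDR likely subtags.
    static final Map<String, String> DEFAULT_SCRIPT = Map.of(
            "zh_CN", "Hans",
            "zh_SG", "Hans",
            "zh_HK", "Hant",
            "zh_TW", "Hant",
            "sr_RS", "Cyrl");

    // Candidates for a request of the form language or language_country.
    // Assumed order: script-tagged candidates first, then the traditional ones.
    static List<String> candidates(String lang, String country) {
        List<String> out = new ArrayList<>();
        String script = country.isEmpty() ? null : DEFAULT_SCRIPT.get(lang + "_" + country);
        if (script != null) {
            out.add(lang + "_" + script + "_" + country);
            out.add(lang + "_" + script);
        }
        if (!country.isEmpty()) out.add(lang + "_" + country);
        out.add(lang);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(candidates("zh", "HK")); // [zh_Hant_HK, zh_Hant, zh_HK, zh]
        System.out.println(candidates("fr", "FR")); // [fr_FR, fr]
    }
}
```

Languages without an entry fall back to the traditional sequence
unchanged, so existing bundles keep working.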
> (4) Exceptions are Norwegian and Hebrew.
>
> no_NO -> nb_NO -> no -> nb
> no_NO_NY -> nn_NO -> no_NO -> nn -> no
> nn_NO -> no_NO_NY -> nn -> no
> nb_NO -> no_NO -> nb -> no
>
> he_IL -> iw_IL -> he -> iw
> iw_IL -> he_IL -> iw -> he
I think we should distinguish the Norwegian case from the Hebrew case.
For Hebrew, he is exactly equivalent to iw. For Norwegian, strictly
speaking, no could mean either nb or nn. I'm fine with the Hebrew order
above, but I think the Norwegian case should be handled differently.
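For the Hebrew case, interleaving an exactly equivalent alias at each
level can be sketched as below. The alias table and names are
hypothetical; Norwegian is intentionally left out because no does not
map one-to-one to nb or nn.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Sketch only: interleaves an exactly equivalent language alias at each level.
class AliasFallback {

    // Only one-to-one aliases belong here; "no" is deliberately absent
    // because it may mean either "nb" or "nn".
    static final Map<String, String> EXACT_ALIASES = Map.of("he", "iw", "iw", "he");

    // After each candidate, add the same candidate with its language code aliased.
    static List<String> withAliases(List<String> candidates) {
        List<String> out = new ArrayList<>();
        for (String c : candidates) {
            out.add(c);
            int sep = c.indexOf('_');
            String lang = sep < 0 ? c : c.substring(0, sep);
            String rest = sep < 0 ? "" : c.substring(sep);
            String alias = EXACT_ALIASES.get(lang);
            if (alias != null) out.add(alias + rest);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(withAliases(Arrays.asList("he_IL", "he"))); // [he_IL, iw_IL, he, iw]
        System.out.println(withAliases(Arrays.asList("iw_IL", "iw"))); // [iw_IL, he_IL, iw, he]
    }
}
```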
-Yoshito