[loc-en-dev] Comments on the locale enhancement proposal

Tue Jan 20 13:46:46 PST 2009

I added my comments

On Tue, Jan 20, 2009 at 4:26 PM, Doug Felt <dougfelt at google.com> wrote:

> Comments inline.
>
> On Tue, Jan 20, 2009 at 12:25 AM, Masayoshi Okutsu <
> Masayoshi.Okutsu at sun.com> wrote:
>
>>  Folks,
>>
>> I've finally spent some time reviewing the locale enhancement proposal.
>> Here are my comments on the proposal.
>>
>> - Compatibility Strategy
>>
>> I think we should have a cleaner and simpler strategy to support the
>> enhancements while keeping compatibility, so that it's easier for non-i18n
>> people to learn how to use the new APIs for the new features.
>>
>> My proposal is:
>>
>>    - the existing interfaces should be kept fully compatible in both
>>    binaries and source code.
>>
> Can you define more precisely what you mean?  Do you mean no API additions
> to the Locale class?
>

I agree that the existing interfaces should be kept fully compatible.  They
should also be binary compatible (I mean that existing code should not need
to be recompiled, i.e., the existing APIs keep exactly the same signatures).
But I don't understand what "source code" means here.  Can you explain?  I
guess internal implementation changes should not affect existing Java users.

>
>
>>    - the enhancements should be available only through new interfaces
>>    given by the builder pattern design which is flexible and extensible.
>>    - the factory method for creating a Locale should be eliminated
>>    because having both factory methods and a builder is redundant.
>>
> Depends on the common use case.
>
>>
>>    - enum IDType should be eliminated, and methods named after the ID types
>>    added instead (e.g., toBCP47String() instead of toString(IDType.BCP47))
>>
> I lean this way too.
>

I have no problems with this.
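For comparison, JDK 7 and later took the named-method route: Locale.toLanguageTag() returns the BCP 47 form, while toString() keeps the legacy underscore form. The sketch below is only an illustration of that design choice; the wrapper names toBcp47String and toJavaString are hypothetical and simply delegate to the real JDK methods.

```java
import java.util.Locale;

public class IdFormats {
    // Hypothetical named accessors, one per identifier format,
    // instead of a single toString(IDType) switch.
    static String toBcp47String(Locale locale) {
        return locale.toLanguageTag();   // BCP 47 form (JDK 7+)
    }

    static String toJavaString(Locale locale) {
        return locale.toString();        // legacy Java form
    }

    public static void main(String[] args) {
        // Locale.forLanguageTag parses a BCP 47 tag (JDK 7+).
        Locale zhHantTw = Locale.forLanguageTag("zh-Hant-TW");
        System.out.println(toBcp47String(zhHantTw)); // zh-Hant-TW
        System.out.println(toJavaString(zhHantTw));  // zh_TW_#Hant
    }
}
```

Named methods make each format discoverable in the API docs without a separate enum to learn.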

>
>
>>    - support a method that converts between old and new Locales on a
>>    best-effort basis, e.g., zh_TW (old) -> zh_Hant_TW (new), zh_Hant
>>    (new) -> zh_TW (old)
>>
>> - Development
>>
>> We should define a minimal set of changes to go into JDK 7. I'm
>> considering support for ISO 639 (the new 2-letter codes, plus parts 2 and 3)
>> and script as the minimal set. We can add more if the schedule allows.
>>
>
> I think that's always been the assumption.
>

I'm not sure whether we can narrow the scope without introducing
inconsistency (or future incompatible changes).  I think we need further
discussion to define the scope for JDK 7.

>
>
>>
>> - 6.1 Script
>>
>> The 4-arg Locale constructor examples in the table should be removed.
>>
>
> Yes.
>

I assume this means that we will encourage Java users to move to the
factory/builder.

>
>
>
>>
>>
>> new Locale("", "", "US").toString() should return "" for compatibility,
>> assuming that new Locale("", "", US") is a typo for new Locale("", "", "US").
>>
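The compatibility requirement above matches what java.util.Locale already does: toString() returns an empty string when both language and country are empty, even if a variant is set. A quick check using only existing JDK calls (the class and helper names are illustrative; the three-argument constructor is deprecated since JDK 19 but still behaves this way):

```java
import java.util.Locale;

public class EmptyLocaleCompat {
    // Existing Locale behavior: no language and no country means
    // toString() yields "", regardless of the variant.
    static String legacyToString(String language, String country, String variant) {
        return new Locale(language, country, variant).toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + legacyToString("", "", "US") + "]"); // []
        // The variant itself is still stored.
        System.out.println(new Locale("", "", "US").getVariant());    // US
    }
}
```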
>> - 7. Locale Resource/Service Lookup
>>
>
> I can think of two situations we might want to handle
> automatically:
>
> 1) old data, new identifiers
> 2) new data, old identifiers
>
> The proposal tries to deal with both, by defining what we think the old
> data and old identifiers mean in the new system.  So for example we assume
> that old data tagged as 'zh_TW' is in fact in traditional Han and should be
> presented as 'zh_Hant_TW' data; similarly that a request for 'zh_TW' data is
> implicitly a request for traditional Han (as well as Taiwan) data since that
> is what (we assume) it returns for existing applications/JVMs, and so should
> be handled as a request for 'zh_Hant_TW' data.
>
> We don't necessarily have to handle both situations (or either).
>
> If we don't redefine old data as being locatable under the new identifiers,
> this would be inconvenient for people shipping existing applications that
> users might run on JVMs handling new locale identifiers-- they'd have to
> tell such users to ensure, for example, that their system locale was set to
> 'zh_TW' and not 'zh_Hant_TW', otherwise an unmodified fallback system would
> miss the old 'zh_TW' data.
>
> If we don't allow old identifiers to locate new data, we'll have to tell
> people to make their data available under both the new and old identifiers.
> It's likely many clients will be dealing with old identifiers for some time
> (e.g. handling an HTTP request that specifies a 'zh_TW' locale).  We could
> tell them to copy their data, use an aliasing mechanism under the covers (as
> ICU does), or convert the identifiers on the boundaries themselves
> (converting 'zh_TW' to 'zh_Hant_TW' at the point they receive the HTTP
> request), but they might not be happy with this.
>
> I think the approach here (Yoshito, tell me if I'm wrong) is to specify
> explicitly the remappings necessary to support legacy data.
>

Correct.  We'd like to give Java users options: keep their existing
resources/codes unchanged while adopting the new (and, we assume, better)
locale identifiers without any problems.
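A minimal sketch of the explicit, best-effort remapping discussed here, using only the zh_TW / zh_Hant pairs from Masayoshi's conversion item. The class, method names, and plain-string IDs are hypothetical; a real table would need many more entries and an agreed policy for unmapped IDs (this sketch returns them unchanged).

```java
import java.util.Map;

public class LegacyLocaleMapper {
    // Hypothetical old -> new table (only the thread's example).
    private static final Map<String, String> OLD_TO_NEW = Map.of(
            "zh_TW", "zh_Hant_TW");

    // Hypothetical new -> old table.
    private static final Map<String, String> NEW_TO_OLD = Map.of(
            "zh_Hant", "zh_TW",
            "zh_Hant_TW", "zh_TW");

    // Returns the mapped ID, or the input unchanged when no mapping is known.
    static String toNew(String oldId) {
        return OLD_TO_NEW.getOrDefault(oldId, oldId);
    }

    static String toOld(String newId) {
        return NEW_TO_OLD.getOrDefault(newId, newId);
    }

    public static void main(String[] args) {
        System.out.println(toNew("zh_TW"));   // zh_Hant_TW
        System.out.println(toOld("zh_Hant")); // zh_TW
        System.out.println(toNew("fr_FR"));   // fr_FR (no mapping)
    }
}
```

Converting at the boundary like this is one of the options Doug lists; the same table could instead back an aliasing step inside the lookup.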

>
>
>
>> Should the script part be treated as additional information to the
>> language to disambiguate the writing system? So the zh_Hans_CN look-up
>> sequence should be zh_Hans_CN -> zh_CN -> zh_Hans -> zh? (zh_CN and zh
>> shouldn't exist to avoid conflicts with zh_Hant_CN and other zh_* sequences,
>> though.)
>>
>
> The zh_CN data need not exist if there is zh_Hans_CN data, but it might
> exist as legacy, so it needs to be part of the search.
>
>
>> Should country really be added to the lang_script? (e.g., zh_Hant ->
>> zh_Hant_TW)
>>
>
> This does seem odd.  I guess the issue addressed in section 7.3 is that
> some clients might have followed the suggestion to make data available under
> 'zh_Hant_TW', but did not provide any under 'zh_Hant', even though 'zh_Hant'
> was the intended semantics of the data.  I think it might be better to not
> support this, unless there are clients who are in this situation and cannot
> adapt.
>
>

Many existing Java users tag Traditional Chinese content as "zh_TW", whether
or not it is actually specific to Taiwan.  At the same time, others use
"zh_TW" specifically for Taiwan.  When these two use cases are mixed with
new locale IDs that carry a script, I think the zh_Hant -> zh_Hant_TW
fallback would be safer.  But I agree that we should review whether we
really need it.
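Masayoshi's suggested order for zh_Hans_CN treats the script as extra information and drops it before the region. A minimal sketch of that ordering, assuming plain underscore-separated string IDs and hypothetical names; it deliberately leaves out the legacy remappings and the zh_Hant -> zh_Hant_TW region insertion discussed above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ScriptFallback {
    // Candidates for a (language, script, region) request:
    // lang_script_region -> lang_region -> lang_script -> lang.
    // Empty parts are skipped; duplicates are removed in order.
    static List<String> candidates(String lang, String script, String region) {
        Set<String> out = new LinkedHashSet<>();
        out.add(join(lang, script, region));
        out.add(join(lang, "", region));
        out.add(join(lang, script, ""));
        out.add(lang);
        return new ArrayList<>(out);
    }

    private static String join(String lang, String script, String region) {
        StringBuilder sb = new StringBuilder(lang);
        if (!script.isEmpty()) sb.append('_').append(script);
        if (!region.isEmpty()) sb.append('_').append(region);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(candidates("zh", "Hans", "CN"));
        // [zh_Hans_CN, zh_CN, zh_Hans, zh]
    }
}
```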

>
>> Should script be added to language? (e.g., pa -> pa_Guru)
>>
>
> I wonder about this too.  Perhaps Yoshito can describe a case in which this
> behavior is needed.  I didn't notice a motivating case.
>
>

This makes sense when you do not want to fall back from one script to
another.  For example, some users may want to supply pa_Guru and pa_Arab
resources, but no pa resource.  Anyway, I think Mark has some thoughts on this.
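One possible reading of pa -> pa_Guru, sketched under stated assumptions: a bare-language request also tries language_DefaultScript, so a request for "pa" can still find resources shipped only as pa_Guru and pa_Arab. The default-script table, the class, and the method are hypothetical, and the step is applied only to bare-language requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class DefaultScriptFallback {
    // Hypothetical table of each language's default script.
    private static final Map<String, String> DEFAULT_SCRIPT = Map.of(
            "pa", "Guru");

    // For a language-only request (the only input this method accepts),
    // try the bare language first, then language_DefaultScript if known.
    static List<String> candidates(String lang) {
        List<String> out = new ArrayList<>();
        out.add(lang);
        String script = DEFAULT_SCRIPT.get(lang);
        if (script != null) {
            out.add(lang + "_" + script);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(candidates("pa")); // [pa, pa_Guru]
        System.out.println(candidates("fr")); // [fr]
    }
}
```

Keeping the step out of script-qualified chains avoids routing a pa_Arab request through pa to pa_Guru, which is the cross-script fallback Yoshito wants to prevent.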

>
>
>> The *_NO look-up sequences should be consistent with the no and nb ones.
>> So the sequences should be:
>>
>> no_NO -> nb_NO -> no -> nb
>> no_NO_NY -> nn_NO -> no_NO -> nn -> no
>> nn_NO -> no_NO_NY -> nn -> no
>> nb_NO -> no_NO -> nb -> no
>>
>
> I don't know enough about this.
>

The existing Java documentation says no_NO is Bokmal (Norway) and no_NO_NY is
Nynorsk (Norway).  This design starts by mapping to the most preferable
form (no_NO -> nb_NO, no_NO_NY -> nn_NO), follows the fallback chain from
there, and then visits the chain from no_*.  The philosophy behind this is
that no is not exactly equivalent to nb or nn.

1. no_NO = nb_NO (by the existing Java's definition)
2. no_NO_NY = nn_NO (by the existing Java's definition)

I assume these are "aliases" in the Java world (though not in BCP 47 /
Unicode CLDR).

Thus, I prefer

no_NO (alias of nb_NO) -> nb_NO -> nb (-> (macro language mapping) no_NO) -> no
no_NO_NY (alias of nn_NO) -> nn_NO -> nn -> (macro language mapping) no_NO -> no

Anyway, we should discuss this too.
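The preferred chains above can be produced mechanically from two small tables: the Java-only aliases (no_NO -> nb_NO, no_NO_NY -> nn_NO) and a macrolanguage mapping (nb, nn -> no). This sketch is one reading of them, under two assumptions: the aliased request is replaced by its target rather than searched first, and IDs have at most a language and a region after alias resolution. Class and method names are hypothetical. Unlike Masayoshi's sequences, this chain never visits legacy no_NO_NY data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class NorwegianFallback {
    // Java-world aliases (not aliases in BCP 47 / CLDR).
    private static final Map<String, String> JAVA_ALIASES = Map.of(
            "no_NO", "nb_NO",
            "no_NO_NY", "nn_NO");

    // nb and nn both belong to the macrolanguage "no".
    private static final Map<String, String> MACRO_LANGUAGE = Map.of(
            "nb", "no",
            "nn", "no");

    // Chain: alias target -> language -> macro_REGION -> macro.
    // Assumes "lang" or "lang_REGION" after alias resolution.
    static List<String> chain(String requested) {
        String id = JAVA_ALIASES.getOrDefault(requested, requested);
        String[] parts = id.split("_");
        String lang = parts[0];
        String region = parts.length > 1 ? parts[1] : "";

        List<String> out = new ArrayList<>();
        out.add(id);
        if (!region.isEmpty()) {
            out.add(lang);
        }
        String macro = MACRO_LANGUAGE.get(lang);
        if (macro != null) {
            if (!region.isEmpty()) {
                out.add(macro + "_" + region);
            }
            out.add(macro);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(chain("no_NO"));    // [nb_NO, nb, no_NO, no]
        System.out.println(chain("no_NO_NY")); // [nn_NO, nn, no_NO, no]
    }
}
```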

>
>
>
>>
>> - 5.3. Locale Builder
>> - 8. Summary Proposed API Changes
>> - LocaleBuilder API
>>
>> The builder class should be a nested class of Locale. (LocaleBuilder
>> should be Locale.Builder.)
>>
>
> I agree.
>

I agree

>
>
>> The builder methods should return the builder object so that it can
>> support a fluent interface.
>>
>
> I agree, this is typical for builders.
>

I agree.

>
>
>> I prefer build() instead of getLocale().
>>
>
> Me too.
>

No objection.  I don't have a strong preference about the method name.
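For reference, JDK 7 and later ship a nested java.util.Locale.Builder whose setters return the builder and whose terminal method is build(), which is the shape agreed on above. A short usage sketch (the class and helper names are illustrative; the setters throw IllformedLocaleException on invalid input):

```java
import java.util.Locale;

public class BuilderDemo {
    // Each setter returns the same Locale.Builder, so calls chain;
    // build() produces the immutable Locale.
    static Locale traditionalChineseTaiwan() {
        return new Locale.Builder()
                .setLanguage("zh")
                .setScript("Hant")
                .setRegion("TW")
                .build();
    }

    public static void main(String[] args) {
        Locale locale = traditionalChineseTaiwan();
        System.out.println(locale.toLanguageTag()); // zh-Hant-TW
        System.out.println(locale.getScript());     // Hant
    }
}
```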

>
>
>>
>>
>> That's all for today.
>>
>> Thanks,
>> Masayoshi
>>
>>
>>
>>
>