[loc-en-dev] Comments on the locale enhancement proposal

Tue Jan 20 12:26:57 PST 2009

Comments inline.

On Tue, Jan 20, 2009 at 12:25 AM, Masayoshi Okutsu <Masayoshi.Okutsu at sun.com> wrote:

>  Folks,
>
> I've finally found some time to review the locale enhancement proposal.
> Here are my comments on the proposal.
>
> - Compatibility Strategy
>
> I think we should have a cleaner and simpler strategy to support the
> enhancements while keeping compatibility, so that it's easier for non-i18n
> people to learn how to use the new features.
>
> My proposal is:
>
>    - the existing interfaces should be kept fully compatible in both
>    binaries and source code.
>

Can you define more precisely what you mean?  Do you mean no API additions
to the Locale class?

>
>    - the enhancements should be available only through new interfaces
>    given by the builder pattern design which is flexible and extensible.
>    - the factory method for creating a Locale should be eliminated because
>    having both factory methods and a builder is redundant.
>

Depends on the common use case.

>
>    - enum IDType should be eliminated, and methods named after the ID types
>    added instead (e.g., toBCP47String() instead of toString(IDType.BCP47))
>

I lean this way too.

>
>    - support a method which performs conversions between old and new
>    Locales. (best effort basis) e.g., zh_TW (old) -> zh_Hant_TW (new), zh_Hant
>    (new) -> zh_TW (old)
>
> - Development
>
> We should define a minimal set of changes that should go to JDK 7. I'm
> considering support of ISO 639 (new 2-letter and part 2-3) and script as the
> minimal set. We will be able to add more if the schedule allows.
>

I think that's always been the assumption.

>
>
> - 6.1 Script
>
> The 4-arg Locale constructor examples in the table should be removed.
>

Yes.

>
>
> new Locale("", "", "US").toString() should return "" for compatibility,
> assuming that new Locale("", "", US") is a typo of new Locale("", "", "US").
>
> - 7. Locale Resource/Service Lookup
>

I think of there being two situations we might want to handle automatically:

1) old data, new identifiers
2) new data, old identifiers

The proposal tries to deal with both, by defining what we think the old data
and old identifiers mean in the new system.  So for example we assume that
old data tagged as 'zh_TW' is in fact in traditional Han and should be
presented as 'zh_Hant_TW' data; similarly that a request for 'zh_TW' data is
implicitly a request for traditional Han (as well as Taiwan) data since that
is what (we assume) it returns for existing applications/JVMs, and so should
be handled as a request for 'zh_Hant_TW' data.

We don't necessarily have to handle both situations (or either).

If we don't redefine old data as being locatable under the new identifiers,
this would be inconvenient for people shipping existing applications that
users might run on JVMs handling new locale identifiers: they'd have to
tell such users to ensure, for example, that their system locale was set to
'zh_TW' and not 'zh_Hant_TW', otherwise an unmodified fallback system would
miss the old 'zh_TW' data.

If we don't allow old identifiers to locate new data, we'll have to tell
people to make their data available under both the new and old identifiers.
It's likely many clients will be dealing with old identifiers for some time
(e.g. handling an HTTP request that specifies a 'zh_TW' locale).  We could
tell them to copy their data, use an aliasing mechanism under the covers (as
ICU does), or convert the identifiers on the boundaries themselves
(converting 'zh_TW' to 'zh_Hant_TW' at the point they receive the HTTP
request), but they might not be happy with this.

I think the approach here (Yoshito, tell me if I'm wrong) is to specify
explicitly the remappings necessary to support legacy data.
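
To make "explicit remappings" concrete, here is a rough sketch of what I have
in mind. It is only an illustration, not the proposal's mechanism: the class
and method names are hypothetical, identifiers are plain strings, and only the
zh_TW -> zh_Hant_TW entry comes from this thread (the zh_CN entry is my
assumption by analogy).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of any proposed API.
final class LegacyLocaleRemapper {
    // Explicit table of legacy identifiers and the new identifiers
    // we assume they were always meant to denote.
    private static final Map<String, String> LEGACY_TO_NEW = new HashMap<>();
    static {
        LEGACY_TO_NEW.put("zh_TW", "zh_Hant_TW"); // from the discussion above
        LEGACY_TO_NEW.put("zh_CN", "zh_Hans_CN"); // assumed by analogy
    }

    // Returns the remapped identifier, or the input unchanged
    // when no remapping is defined for it.
    static String toNewIdentifier(String id) {
        String mapped = LEGACY_TO_NEW.get(id);
        return mapped != null ? mapped : id;
    }
}
```

A client converting at the boundary would call
LegacyLocaleRemapper.toNewIdentifier("zh_TW") when it receives the HTTP
request and use the result, "zh_Hant_TW", from then on. An identifier with no
table entry, such as "fr_FR", passes through unchanged.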

> Should the script part be treated as additional information to the language
> to disambiguate the writing system? So the zh_Hans_CN look-up sequence
> should be zh_Hans_CN -> zh_CN -> zh_Hans -> zh? (zh_CN and zh shouldn't
> exist to avoid conflicts with zh_Hant_CN and other zh_* sequences, though.)
>

The zh_CN data need not exist if there is zh_Hans_CN data, but it might
exist as legacy, so it needs to be part of the search.
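
For reference, the search order in your question could be generated roughly
like this. It is a sketch only: the names are hypothetical, identifiers are
plain strings, and it encodes the order you quoted (script dropped before
region), which is not necessarily the order the proposal specifies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any proposed API.
final class ScriptAwareLookup {
    // Builds the candidate list from most to least specific.
    // Pass "" for an absent script or region.
    static List<String> candidates(String language, String script, String region) {
        List<String> result = new ArrayList<>();
        boolean hasScript = !script.isEmpty();
        boolean hasRegion = !region.isEmpty();
        if (hasScript && hasRegion) {
            result.add(language + "_" + script + "_" + region);
        }
        if (hasRegion) {
            // Legacy-style entry without script, e.g. zh_CN.
            result.add(language + "_" + region);
        }
        if (hasScript) {
            result.add(language + "_" + script);
        }
        result.add(language);
        return result;
    }
}
```

ScriptAwareLookup.candidates("zh", "Hans", "CN") yields zh_Hans_CN, zh_CN,
zh_Hans, zh, so legacy zh_CN data is still found when no zh_Hans_CN data
exists.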

> Should country really be added to the lang_script? (e.g., zh_Hant ->
> zh_Hant_TW)
>

This does seem odd.  I guess the issue addressed in section 7.3 is that some
clients might have followed the suggestion to make data available under
'zh_Hant_TW', but did not provide any under 'zh_Hant', even though 'zh_Hant'
was the intended semantics of the data.  I think it might be better to not
support this, unless there are clients who are in this situation and cannot
adapt.

>
> Should script be added to language? (e.g., pa -> pa_Guru)
>

I wonder about this too.  Perhaps Yoshito can describe a case in which this
behavior is needed.  I didn't notice a motivating case.

>
> The *_NO look-up sequences should be consistent with the no and nb ones. So
> the sequences should be:
>
> no_NO -> nb_NO -> no -> nb
> no_NO_NY -> nn_NO -> no_NO -> nn -> no
> nn_NO -> no_NO_NY -> nn -> no
> nb_NO -> no_NO -> nb -> no
>

I don't know enough about this.

>
> - 5.3. Locale Builder
> - 8. Summary Proposed API Changes
> - LocaleBuilder API
>
> The builder class should be a nested class of Locale. (LocaleBuilder should
> be Locale.Builder.)
>

I agree.

>
> The builder methods should return the builder object so that it can support
> a fluent interface.
>

I agree, this is typical for builders.

>
> I prefer build() instead of getLocale().
>

Me too.
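
Putting the three builder points together (nested class, setters returning
the builder, build() as the terminal call), the shape would be something like
the sketch below. LocaleSketch is a stand-in I made up so the example is
self-contained; the setter names are my assumptions, and toBCP47String() is
the method name floated earlier in this thread, not a settled API.

```java
// Hypothetical stand-in for Locale, for illustration only.
final class LocaleSketch {
    private final String language;
    private final String script;
    private final String region;

    private LocaleSketch(Builder b) {
        this.language = b.language;
        this.script = b.script;
        this.region = b.region;
    }

    // Joins the non-empty fields with '-', e.g. "zh-Hant-TW".
    String toBCP47String() {
        StringBuilder sb = new StringBuilder(language);
        if (!script.isEmpty()) sb.append('-').append(script);
        if (!region.isEmpty()) sb.append('-').append(region);
        return sb.toString();
    }

    // Nested builder, analogous to the suggested Locale.Builder.
    static final class Builder {
        private String language = "";
        private String script = "";
        private String region = "";

        // Each setter returns this so that calls can be chained.
        Builder setLanguage(String language) { this.language = language; return this; }
        Builder setScript(String script) { this.script = script; return this; }
        Builder setRegion(String region) { this.region = region; return this; }

        // Terminal call: build() rather than getLocale().
        LocaleSketch build() { return new LocaleSketch(this); }
    }
}
```

Client code then reads as a single expression:
new LocaleSketch.Builder().setLanguage("zh").setScript("Hant").setRegion("TW").build()
and calling toBCP47String() on the result gives "zh-Hant-TW".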

>
>
>
> That's all for today.
>
> Thanks,
> Masayoshi
>
>
>
>