[loc-en-dev] Builder / Locale API detailed review
Yoshito Umaoka
y.umaoka at gmail.com
Tue Jul 6 14:55:24 PDT 2010
Hello,
In last Friday's project conference call, we reviewed the Builder /
Locale API behavior. I updated the two documents to reflect our conclusions:
http://sites.google.com/site/openjdklocale/design-notes/builder
http://sites.google.com/site/openjdklocale/design-notes/locale
Sections with an orange background are either items not covered in the
call or items on which we did not reach a conclusion.
I think there are some remaining design questions:
1) new Locale("no", "NO", "NY").toLanguageTag()
There are two possible options:
a. "nn-NO"
b. "no-NO-x-jvariant-NY"
Semantically, "nn-NO" is preferred for other applications. However, it
does not round trip well (or should it?).
"no-NO-x-jvariant-NY" preserves all fields as is, but it depends on
private use to indicate that it's Nynorsk.
I prefer "nn-NO". Although forLanguageTag("nn-NO") will return the Locale
nn_NO, we also propose that both nn_NO and no_NO_NY be included in the
ResourceBundle / LocaleServiceProvider lookup candidate list, so nn_NO
matches resource no_NO_NY or vice versa.
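
The round-trip concern in option a can be checked with a small program.
This is only a sketch: it assumes a JDK that provides
Locale.toLanguageTag() and Locale.forLanguageTag() (Java 7 or later),
and the class and method names are made up for illustration. On such
JDKs the javadoc documents no_NO_NY as a special conversion to "nn-NO",
so the tag parses back to nn_NO rather than to the original Locale.

```java
import java.util.Locale;

public class NynorskRoundTrip {

    /** The legacy Java locale for Norwegian Nynorsk (illustrative helper). */
    @SuppressWarnings("deprecation") // the 3-arg constructor is deprecated on newer JDKs
    static Locale legacyNynorsk() {
        return new Locale("no", "NO", "NY");
    }

    /** True if converting to a language tag and back yields an equal Locale. */
    static boolean roundTrips(Locale original) {
        Locale back = Locale.forLanguageTag(original.toLanguageTag());
        return back.equals(original);
    }

    public static void main(String[] args) {
        Locale legacy = legacyNynorsk();
        String tag = legacy.toLanguageTag();
        System.out.println("tag         = " + tag);
        System.out.println("parsed back = " + Locale.forLanguageTag(tag));
        System.out.println("round trips = " + roundTrips(legacy));
    }
}
```

Because the round trip is lossy under option a, the proposal to put both
nn_NO and no_NO_NY in the lookup candidate list is what keeps existing
no_NO_NY resources reachable.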
2) Locale.forLanguageTag("en-x-jvariant-WIN") vs. new
Builder().setLanguage("en").setExtension("x-jvariant-WIN").build()
In the last call, we thought private use "jvariant-*" should be treated
as a Java variant. I think the above two should create the same Locale -
en__WIN. Are there any objections to this behavior?
If we decide to parse private use "jvariant-*" into variant, we need to
resolve one more issue.
new
Builder().setLanguage("en").setVariant("Windows").setExtension("x-jvariant-XP").build()
We have two choices: the setExtension call above either sets the variant
to "XP" (replacing "Windows"), or appends "XP" to the existing variant.
So the resulting Locale would be either:
a) en__XP
b) en__Windows_XP
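
The two choices can be modeled on plain strings, independent of any
Builder implementation. This is a sketch only: JvariantMerge and its
methods are hypothetical helpers, not part of the proposed API, and the
"x-jvariant-" prefix is the draft convention discussed in this thread.

```java
public class JvariantMerge {

    // Draft private use prefix from this discussion (an assumption of the sketch).
    static final String PREFIX = "x-jvariant-";

    /** Extracts the variant part of "x-jvariant-..." or "" if the prefix is absent. */
    static String jvariantOf(String extension) {
        if (!extension.startsWith(PREFIX)) {
            return "";
        }
        // Language tags separate subtags with '-', Java variants with '_'.
        return extension.substring(PREFIX.length()).replace('-', '_');
    }

    /** Option a: the private use jvariant replaces the existing variant. */
    static String replace(String variant, String extension) {
        String extra = jvariantOf(extension);
        return extra.isEmpty() ? variant : extra;
    }

    /** Option b: the private use jvariant is appended to the existing variant. */
    static String append(String variant, String extension) {
        String extra = jvariantOf(extension);
        if (extra.isEmpty()) {
            return variant;
        }
        return variant.isEmpty() ? extra : variant + "_" + extra;
    }

    public static void main(String[] args) {
        System.out.println("a) en__" + replace("Windows", "x-jvariant-XP"));
        System.out.println("b) en__" + append("Windows", "x-jvariant-XP"));
    }
}
```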
I prefer option b), that is, "append". This behavior is somewhat
consistent with toLanguageTag() - new Locale("en", "",
"Windows_XP").toLanguageTag() returns "en-Windows-x-jvariant-XP".
3) Locale.forLanguageTag("en-a-abc-a-def-x-123") and new
Builder().setLanguageTag("en-a-abc-a-def-x-123")
When two extensions with the same key are in a language tag, how should we
interpret this? The same question also applies to duplicated Unicode
attributes / multiple Unicode locale keywords with the same key.
We should decide whether we simply treat this as an error (forLanguageTag
truncates everything from the second "a" onward / setLanguageTag throws an
exception and sets the error index at the second "a"). Alternatively, we
just ignore the second one ("a-def") and continue parsing.
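
To make the two candidate behaviors concrete, here is a sketch of a
simplified subtag scanner. It is not the parser in the project code:
DuplicateExtensionSketch and its methods are hypothetical, and it only
looks at extension singletons (one-character subtags after the first
subtag, with "x" starting private use).

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class DuplicateExtensionSketch {

    /** Character index of the first repeated extension singleton, or -1 if none. */
    static int duplicateIndex(String tag) {
        Set<String> seen = new HashSet<>();
        int pos = 0;
        boolean first = true;
        for (String subtag : tag.split("-")) {
            if (!first && subtag.length() == 1) {
                String key = subtag.toLowerCase(Locale.ROOT);
                if (key.equals("x")) {
                    break; // private use: the rest of the tag is opaque
                }
                if (!seen.add(key)) {
                    return pos; // candidate error index for setLanguageTag
                }
            }
            first = false;
            pos += subtag.length() + 1; // subtag plus its trailing '-'
        }
        return -1;
    }

    /** "Error" policy for forLanguageTag: drop everything from the duplicate on. */
    static String truncateAtDuplicate(String tag) {
        int idx = duplicateIndex(tag);
        return idx < 0 ? tag : tag.substring(0, idx - 1);
    }

    /** "Ignore" policy: skip the repeated extension and continue parsing. */
    static String skipDuplicates(String tag) {
        Set<String> seen = new HashSet<>();
        StringBuilder out = new StringBuilder();
        boolean skipping = false;
        boolean privateUse = false;
        boolean first = true;
        for (String subtag : tag.split("-")) {
            if (!first && !privateUse && subtag.length() == 1) {
                String key = subtag.toLowerCase(Locale.ROOT);
                if (key.equals("x")) {
                    privateUse = true;
                    skipping = false;
                } else {
                    skipping = !seen.add(key); // skip this extension if key repeats
                }
            }
            if (!skipping) {
                if (out.length() > 0) {
                    out.append('-');
                }
                out.append(subtag);
            }
            first = false;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String tag = "en-a-abc-a-def-x-123";
        System.out.println("error index = " + duplicateIndex(tag));
        System.out.println("truncate    = " + truncateAtDuplicate(tag));
        System.out.println("ignore      = " + skipDuplicates(tag));
    }
}
```

For the tag above, the first policy keeps "en-a-abc" and loses the
private use part, while the second keeps "en-a-abc-x-123".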
We'll make the final decision on these in the next call (next Monday).
I'll write a design note explaining the detailed behavior for the
remaining areas by the next call, and we'll go through the document.
Thanks,
Yoshito