[loc-en-dev] -u- extension API - necessary updates?

Doug Felt dougfelt at google.com
Wed Jun 30 15:19:16 PDT 2010


Comments inline

On Wed, Jun 30, 2010 at 1:30 PM, Yoshito Umaoka <y.umaoka at gmail.com> wrote:

> In the Locale Enhancement repository, we have the following proposed APIs
> supporting the -u- extension:
>
> In java.util.Locale
>
> public Set<String> getUnicodeLocaleKeys()
> public String getUnicodeLocaleType(String key)
>
> In java.util.Locale.Builder
>
> public Builder setUnicodeLocaleKeyword(String key, String type)
>
> The following Unicode locale extension features were not in our scope last year:
>
> 1. type represented by multiple subtags
> 2. key without type
> 3. attribute
>
> To support 1, it looks like we do not need any changes to the proposal. A
> Unicode locale extension keyword may have a type represented by multiple
> subtags. For example, "en-u-vt-0061-0065" is a valid example defined by the
> current LDML specification (See
> http://www.unicode.org/reports/tr35/#Locale_Extension_Key_and_Type_Data).
>
> However, this does not mean that a keyword may have multiple types. In this
> example, 0061 and 0065 are not two different types - instead "0061-0065" is
> a type. Thus, getUnicodeLocaleType("vt") can simply return "0061-0065".  To
> set the type using Builder, setUnicodeLocaleKeyword("vt", "0061-0065") is
> sufficient.
>
Agree.
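A minimal sketch of point 1, assuming the java.util.Locale API as it later shipped in Java 7 (which closely follows the methods proposed here); the class and method names are illustrative only. The two-subtag type is read and written as a single hyphen-joined string:

```java
import java.util.Locale;

final class MultiSubtagType {
    // Parse a tag whose "vt" keyword has a type made of two subtags.
    static String parsedType() {
        Locale loc = Locale.forLanguageTag("en-u-vt-0061-0065");
        // The whole type comes back as one hyphen-joined string.
        return loc.getUnicodeLocaleType("vt");
    }

    // Build the same locale by passing the whole type in one call.
    static Locale built() {
        return new Locale.Builder()
                .setLanguage("en")
                .setUnicodeLocaleKeyword("vt", "0061-0065")
                .build();
    }

    public static void main(String[] args) {
        System.out.println(parsedType());            // 0061-0065
        System.out.println(built().toLanguageTag()); // en-u-vt-0061-0065
    }
}
```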


> To support 2, there is a minor conflict with the current proposal.
> Assume we have a Locale represented by the pseudo language tag "en-u-aa-bb-ccc".
> getUnicodeLocaleKeys() will return a set containing "aa" and "bb".
> getUnicodeLocaleType(String key) currently returns null when the input key
> is not available, and it returns a non-empty type string when the key is
> available. We could use the empty string "" to represent a typeless keyword;
> that is, getUnicodeLocaleType("aa") would return "" in this example.
>
Agree.
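A small sketch of the "" vs. null convention for point 2, assuming the Java 7+ java.util.Locale API (the shipped javadoc documents the empty string for typeless keys and null for absent keys); the class name is illustrative:

```java
import java.util.Locale;
import java.util.Set;

final class TypelessKeyword {
    // "aa" has no type; "bb" has type "ccc".
    static final Locale LOC = Locale.forLanguageTag("en-u-aa-bb-ccc");

    public static void main(String[] args) {
        Set<String> keys = LOC.getUnicodeLocaleKeys();
        System.out.println(keys.contains("aa") && keys.contains("bb")); // true
        // Typeless keyword: empty string, not null.
        System.out.println(LOC.getUnicodeLocaleType("aa").isEmpty());   // true
        System.out.println(LOC.getUnicodeLocaleType("bb"));             // ccc
        // Keyword not present at all: null.
        System.out.println(LOC.getUnicodeLocaleType("zz"));             // null
    }
}
```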


> The remaining question is the Builder API: setUnicodeLocaleKeyword(String
> key, String type). Currently, an empty type string indicates that the keyword
> itself is removed from the current state, and a null type throws NPE. We could
> change the API to use null for deletion instead of the empty string. For
> example, if a Builder internally represents "en-u-aa-bb-ccc",
> setUnicodeLocaleKeyword("aa", null) will remove the typeless keyword "aa",
> and the internal representation will change to "en-u-bb-ccc" after the call.
> Also, setUnicodeLocaleKeyword("dd", "") will append a typeless keyword "dd"
> to the internal state (that is, "en-u-aa-bb-ccc-dd").
>
Agree.
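A sketch of the Builder semantics agreed here, assuming Locale.Builder as shipped in Java 7+, where a null type removes a keyword and an empty type adds a typeless one; the class and method names are illustrative:

```java
import java.util.Locale;

final class KeywordEdits {
    static final Locale BASE = Locale.forLanguageTag("en-u-aa-bb-ccc");

    // A null type removes the keyword (here the typeless "aa").
    static String removeAa() {
        return new Locale.Builder()
                .setLocale(BASE)
                .setUnicodeLocaleKeyword("aa", null)
                .build()
                .toLanguageTag();
    }

    // An empty type adds a typeless keyword.
    static String addDd() {
        return new Locale.Builder()
                .setLocale(BASE)
                .setUnicodeLocaleKeyword("dd", "")
                .build()
                .toLanguageTag();
    }

    public static void main(String[] args) {
        System.out.println(removeAa()); // en-u-bb-ccc
        System.out.println(addDd());    // en-u-aa-bb-ccc-dd
    }
}
```

The shipped implementation emits keys in alphabetical order, so "dd" lands at the end here because it sorts last, not because it was appended.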


> Note that, in the current design, calling setXXX with an empty string removes
> a field from the Builder. If we really want to change the semantics of the
> empty string and null in the setUnicodeLocaleKeyword API, the same policy should
> be applied to the other setters (for example, setLanguage(null) to remove the
> language field, instead of setLanguage("")).
>
Disagree. language/script/country/variant are always present, in the sense
that Locale.getLanguage()/getScript()/getCountry()/getVariant() never return
null, though they're not present in the sense that there are always three
separator characters. I don't think (any more) that we need to switch to using
null in the setters for these.
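A sketch of the "always present" point, assuming Java 7+ java.util.Locale: absent base fields are reported as "", and the shipped Builder.setLanguage documents that both "" and null clear the field. The class and helper names are illustrative:

```java
import java.util.Locale;

final class BaseFields {
    static final Locale EN = Locale.forLanguageTag("en");

    // Clears the language field; value may be "" or null.
    static Locale clearLanguage(String value) {
        return new Locale.Builder().setLocale(EN).setLanguage(value).build();
    }

    public static void main(String[] args) {
        // Absent fields come back as "", never null.
        System.out.println(EN.getScript().isEmpty());   // true
        System.out.println(EN.getCountry().isEmpty());  // true
        System.out.println(EN.getVariant().isEmpty());  // true
        System.out.println(clearLanguage("").getLanguage().isEmpty());   // true
        System.out.println(clearLanguage(null).getLanguage().isEmpty()); // true
    }
}
```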


> To support 3, we could treat an attribute as a keyless keyword, but that
> makes getUnicodeLocaleKeys()/getUnicodeLocaleType(String key) a little
> awkward. Technically, we could still design them that way
> (getUnicodeLocaleKeys() would include an empty string in the returned set,
> and getUnicodeLocaleType("") would return the attribute subtags). I think
> adding an extra API dedicated to attributes is cleaner.
>
Agree, though I could be persuaded to use "" as the key and a single list
of attributes joined with hyphens as the 'type' for this key.
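To make the "" key alternative concrete, here is a hypothetical helper (typeFor is not part of java.util.Locale) that exposes attributes as the type of the empty key, built on the Java 7+ getUnicodeLocaleAttributes() and String.join (Java 8+):

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical view, not a real API: treats attributes as a keyless keyword.
final class KeylessKeywordView {
    static String typeFor(Locale loc, String key) {
        if (key.isEmpty()) {
            Set<String> attrs = loc.getUnicodeLocaleAttributes();
            // Sort so the joined string does not depend on set iteration order.
            return attrs.isEmpty() ? null : String.join("-", new TreeSet<>(attrs));
        }
        return loc.getUnicodeLocaleType(key);
    }

    public static void main(String[] args) {
        Locale loc = Locale.forLanguageTag("en-u-def-abc-ca-japanese");
        System.out.println(typeFor(loc, ""));   // abc-def
        System.out.println(typeFor(loc, "ca")); // japanese
    }
}
```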


> public Set<String> getUnicodeLocaleAttributes()
>
> The same idea is applicable to Builder. APIs dedicated to adding/removing
> Unicode locale attributes, like the ones below, could be added:
>
> public Builder addUnicodeLocaleAttribute(String attribute)
> public Builder removeUnicodeLocaleAttribute(String attribute)
>
> Another possibility is to set multiple attributes as a whole.
>
> public Builder setUnicodeLocaleAttributes(String attributes)
>
> For example, to set attributes "abc" and "def", call
> setUnicodeLocaleAttributes("abc-def"). With this approach, we do not
> need a "remove" method. The tricky part is that the order of attributes does
> not matter, so semantically "abc-def" and "def-abc" are the same. Because we
> do not want to introduce unnecessary variations, we should clearly state that
> the order of attributes is not preserved.
>

The order should be canonicalized if they're truly separate.  In this case
I'd go with add/remove.
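A sketch of the add/remove approach with canonicalized order, assuming Locale.Builder as shipped in Java 7+ (addUnicodeLocaleAttribute/removeUnicodeLocaleAttribute). For comparison, the whole-string setter from the quoted proposal is sketched as a hypothetical helper on top of add/remove; it is not a real Locale.Builder method:

```java
import java.util.Locale;

final class AttributeEdits {
    // Shipped API: attributes are added one at a time.
    static Locale withAttributes(String... attrs) {
        Locale.Builder b = new Locale.Builder().setLanguage("en");
        for (String a : attrs) {
            b.addUnicodeLocaleAttribute(a);
        }
        return b.build();
    }

    // Hypothetical whole-string setter (not in java.util.Locale):
    // remove the current attributes, then add each hyphen-separated one.
    static Locale.Builder setUnicodeLocaleAttributes(Locale.Builder b, String attributes) {
        for (String a : b.build().getUnicodeLocaleAttributes()) {
            b.removeUnicodeLocaleAttribute(a);
        }
        if (!attributes.isEmpty()) {
            for (String a : attributes.split("-")) {
                b.addUnicodeLocaleAttribute(a);
            }
        }
        return b;
    }

    public static void main(String[] args) {
        // Insertion order does not matter; the tag lists attributes sorted.
        System.out.println(withAttributes("def", "abc").toLanguageTag()); // en-u-abc-def
        System.out.println(withAttributes("def", "abc").equals(withAttributes("abc", "def"))); // true

        Locale.Builder b = new Locale.Builder().setLanguage("en").addUnicodeLocaleAttribute("old");
        System.out.println(setUnicodeLocaleAttributes(b, "def-abc").build().toLanguageTag()); // en-u-abc-def
    }
}
```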

The 'multiple types per key' are treated as single types with hyphens between
segments in this API, and they should be treated that way without
modifying the order of the components of the type. If they are truly
multiple independent types for the same key, then we would need a different
API for that too.
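A sketch showing that segment order is significant for a multi-subtag type, assuming Java 7+ java.util.Locale, which stores the type as one string and does not reorder its subtags; the class name is illustrative:

```java
import java.util.Locale;

final class TypeSegmentOrder {
    static final Locale FWD = Locale.forLanguageTag("en-u-vt-0061-0065");
    static final Locale REV = Locale.forLanguageTag("en-u-vt-0065-0061");

    public static void main(String[] args) {
        // Segments keep their original order...
        System.out.println(REV.getUnicodeLocaleType("vt")); // 0065-0061
        // ...so swapping them yields a different locale.
        System.out.println(FWD.equals(REV));                 // false
    }
}
```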

>
>
> Another question related to this: Set<String> vs. List<String>. Currently,
> getUnicodeLocaleKeys() returns Set<String> (actually, an unmodifiable set).
> Semantically, the order of keywords does not matter: "u-ca-japanese-cu-jpy"
> is equivalent to "u-cu-jpy-ca-japanese". But we do use a canonical order
> (alphabetical order of keys) when a Locale is converted to a language tag.
> From this point of view, List<String> might be more appropriate. This also
> applies to attributes. If we agree to support Unicode locale attributes with
> dedicated APIs like those above, we should decide whether the collection of
> attributes should be represented by a Set or a List.
>
I think if the keys are unique, we should use a Set, not a List. If we
really want to also specify the order, then we can define these to return a
SortedSet.
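A sketch of how order plays out, assuming Java 7+ java.util.Locale: getUnicodeLocaleKeys() returns a plain Set, keyword order in the input tag does not affect equality, and toLanguageTag() emits keys alphabetically. The sortedKeys helper is illustrative, showing one way a caller can get a SortedSet today:

```java
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

final class KeyOrder {
    static final Locale A = Locale.forLanguageTag("en-u-ca-japanese-cu-jpy");
    static final Locale B = Locale.forLanguageTag("en-u-cu-jpy-ca-japanese");

    // Copy the plain Set into a SortedSet when alphabetical order is wanted.
    static SortedSet<String> sortedKeys(Locale loc) {
        return new TreeSet<>(loc.getUnicodeLocaleKeys());
    }

    public static void main(String[] args) {
        System.out.println(A.equals(B));       // true
        System.out.println(B.toLanguageTag()); // en-u-ca-japanese-cu-jpy
        System.out.println(sortedKeys(B));     // [ca, cu]
    }
}
```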


> Overall, supporting the full specification of the Unicode locale extension
> does not look too bad. Some may question why we would add APIs dedicated to
> things that are not yet used. We could defer adding the "attribute" APIs, in
> which case attributes could only be set via Builder.setExtension('u', "....").
> But the necessary API addition is pretty minimal, and with these APIs the
> design looks more complete. Therefore, if we are going to include any 'u'
> extension specific APIs, I want to do it completely, including attribute support.
>
> -Yoshito
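The fallback Yoshito mentions, setting attributes only through the raw extension string, can be sketched with Java 7+ Locale.Builder.setExtension('u', ...), which parses the value into attributes and keywords; the class name is illustrative:

```java
import java.util.Locale;

final class RawUExtension {
    // Attributes and keywords supplied together as one raw 'u' value.
    static Locale build() {
        return new Locale.Builder()
                .setLanguage("en")
                .setExtension('u', "def-abc-ca-japanese")
                .build();
    }

    public static void main(String[] args) {
        Locale loc = build();
        System.out.println(loc.getExtension('u'));          // abc-def-ca-japanese
        System.out.println(loc.getUnicodeLocaleAttributes());
        System.out.println(loc.getUnicodeLocaleType("ca")); // japanese
    }
}
```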


More information about the locale-enhancement-dev mailing list