[loc-en-dev] -u- extension vs. other extensions
Yoshito Umaoka
y.umaoka at gmail.com
Wed Jun 30 10:51:28 PDT 2010
Hi all,
We agreed that the Java implementation validates the syntax of subtags
but does not validate the codes themselves. In other words, the proposed
implementation will not reject the language subtag "xx", even though
that code is not valid in a BCP 47 language tag.
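To illustrate the distinction, here is a minimal sketch. It assumes the
builder API in the form that later shipped as java.util.Locale.Builder in
Java 7, which may differ in detail from the proposal discussed here. The
class and method names (SyntaxOnlyCheck, isAcceptedLanguage) are made up
for this example.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

final class SyntaxOnlyCheck {
    // Returns true if the builder accepts the language subtag, i.e. the
    // subtag is well-formed. No registry lookup is involved in this check.
    static boolean isAcceptedLanguage(String language) {
        try {
            new Locale.Builder().setLanguage(language);
            return true;
        } catch (IllformedLocaleException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "xx" is well-formed, although it is not a registered language code.
        System.out.println(isAcceptedLanguage("xx"));
        // "x1" is ill-formed because a language subtag must be alphabetic.
        System.out.println(isAcceptedLanguage("x1"));
    }
}
```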
In BCP47, extension is defined as:
extension = singleton 1*("-" (2*8alphanum))
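As a rough sketch, the generic extension rule can be checked with a
single regular expression. The class name ExtensionSyntax is hypothetical,
and the pattern assumes that the singleton is any alphanumeric other than
"x", which BCP 47 reserves for private use.

```java
import java.util.regex.Pattern;

final class ExtensionSyntax {
    // singleton: one alphanumeric character except 'x' / 'X',
    // followed by one or more "-"-separated subtags of 2 to 8 alphanumerics.
    private static final Pattern EXTENSION = Pattern.compile(
            "[0-9A-WY-Za-wy-z](-[0-9A-Za-z]{2,8})+");

    static boolean isWellFormed(String extension) {
        return EXTENSION.matcher(extension).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("a-abc-de")); // two subtags of legal length
        System.out.println(isWellFormed("a-b"));      // subtag "b" is too short
    }
}
```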
When the previous proposal was written last year, the Unicode locale
extension ('u' extension) allowed only key/type subtag pairs. In BNF,
unicode_locale_extensions = sep "u" 1*(sep keyword)
keyword = key sep type
key = 2alphanum
type = 3*8alphanum
This requires special syntax validation for the 'u' extension. For example,
1. extension "a-abc-de" is syntactically valid
2. extension "u-abc-de" is syntactically invalid, because it does not
satisfy the requirements for the 'u' extension: a key (2alphanum) must
immediately follow the singleton, and each key must have a type
(3*8alphanum).
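The stricter rule of the earlier draft can be sketched the same way. This
is only an illustration of the draft grammar quoted above, and the class
name OldUnicodeExtensionSyntax is hypothetical.

```java
import java.util.regex.Pattern;

final class OldUnicodeExtensionSyntax {
    // Draft rule: "u" followed by one or more key/type pairs, where
    // key = 2alphanum and type = 3*8alphanum.
    private static final Pattern OLD_U = Pattern.compile(
            "[uU](-[0-9A-Za-z]{2}-[0-9A-Za-z]{3,8})+");

    static boolean isWellFormed(String extension) {
        return OLD_U.matcher(extension).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("u-ca-japanese")); // key "ca", type "japanese"
        System.out.println(isWellFormed("u-abc-de"));      // "abc" is not a 2-character key
    }
}
```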
The 'u' extension was updated in the final spec as follows:
unicode_locale_extensions = sep "u" (
    1*(sep keyword)
    / 1*(sep attribute) *(sep keyword)
)
keyword = key [sep type]
key = 2alphanum
type = 3*8alphanum *(sep 3*8alphanum)
attribute = 3*8alphanum
This change has three parts: (1) subtags of the form 3*8alphanum that
appear before the first key (2alphanum) are interpreted as attributes;
(2) a key subtag need not be followed by a type; (3) a type may consist
of multiple subtags of the form 3*8alphanum. Together these eliminate
the special syntax requirements for the 'u' extension. With the updated
specification, extension subtags that satisfy the BCP 47 extension
syntax also satisfy the 'u' extension syntax. For example, "u-abc-de" is
interpreted as attribute "abc" and typeless key "de". (Note that this
specific tag is still not valid, because "abc" is not a registered
attribute and "de" is not a known key.)
With this change, we do not need any special code to handle the 'u'
extension in the API Builder#setExtension. This also means that we do
not need a special implementation dedicated to the 'u' extension even
if we do not add the Unicode locale extension APIs (such as
Builder#setUnicodeLocaleKeyword).
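For illustration, here is how the generic setter and the dedicated
keyword setter would relate. This sketch assumes the API in the form that
later shipped as java.util.Locale.Builder in Java 7, which may differ from
the proposal at the time of writing. The class SetExtensionSketch and the
helper withExtension are hypothetical.

```java
import java.util.Locale;

final class SetExtensionSketch {
    // Builds a locale using only the generic extension setter.
    static Locale withExtension(String language, char singleton, String value) {
        return new Locale.Builder()
                .setLanguage(language)
                .setExtension(singleton, value)
                .build();
    }

    public static void main(String[] args) {
        // 'u' extension supplied through the generic setter.
        Locale viaGeneric = withExtension("ja", 'u', "ca-japanese");

        // Same keyword supplied through the dedicated Unicode locale setter.
        Locale viaKeyword = new Locale.Builder()
                .setLanguage("ja")
                .setUnicodeLocaleKeyword("ca", "japanese")
                .build();

        System.out.println(viaGeneric.toLanguageTag());
        System.out.println(viaKeyword.toLanguageTag());
        System.out.println(viaGeneric.getUnicodeLocaleType("ca"));
    }
}
```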
-Yoshito