[loc-en-dev] -u- extension vs. other extensions

Yoshito Umaoka y.umaoka at gmail.com
Wed Jun 30 10:51:28 PDT 2010


Hi all,

We agreed that, in Java, we validate the syntax of subtags but do not 
validate the codes themselves. In other words, the proposed 
implementation won't reject language subtag "xx", although that code is 
not valid in a BCP 47 language tag.

In BCP47, extension is defined as:

extension = singleton 1*("-" (2*8alphanum))

When the previous proposal was written last year, the Unicode locale 
extension ('u' extension) only allowed key/type subtag pairs.  In BNF,

unicode_locale_extensions = sep "u" 1*(sep keyword)
keyword = key sep type
key = 2alphanum
type = 3*8alphanum

This requires special syntax validation for the 'u' extension.  For 
example,

1. extension "a-abc-de" is syntactically valid
2. extension "u-abc-de" is syntactically invalid, because it does not 
satisfy the requirements for the 'u' extension: a key (2alphanum) must 
come right after the singleton, and each key must have a type 
(3*8alphanum).
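
To make the two cases above concrete, here is a minimal sketch of the two checks using regular expressions. The class and method names (ExtensionSyntax, isWellFormedExtension, isWellFormedUnderDraft) are made up for illustration and are not part of the proposed API; the patterns are transcribed from the grammars quoted above.

```java
import java.util.regex.Pattern;

public class ExtensionSyntax {
    // BCP 47: extension = singleton 1*("-" (2*8alphanum))
    // A singleton is any alphanumeric except 'x' (reserved for private use).
    private static final Pattern GENERIC =
        Pattern.compile("[0-9a-wy-z](-[0-9a-z]{2,8})+", Pattern.CASE_INSENSITIVE);

    // Earlier 'u' grammar: "u" 1*(sep key sep type),
    // key = 2alphanum, type = 3*8alphanum
    private static final Pattern DRAFT_U =
        Pattern.compile("u(-[0-9a-z]{2}-[0-9a-z]{3,8})+", Pattern.CASE_INSENSITIVE);

    /** Generic BCP 47 extension syntax only. */
    public static boolean isWellFormedExtension(String ext) {
        return GENERIC.matcher(ext).matches();
    }

    /** Generic syntax plus the extra key/type rule for the 'u' singleton. */
    public static boolean isWellFormedUnderDraft(String ext) {
        if (!isWellFormedExtension(ext)) {
            return false;
        }
        char singleton = Character.toLowerCase(ext.charAt(0));
        return singleton != 'u' || DRAFT_U.matcher(ext).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedUnderDraft("a-abc-de"));      // true
        System.out.println(isWellFormedUnderDraft("u-abc-de"));      // false
        System.out.println(isWellFormedUnderDraft("u-ca-japanese")); // true
    }
}
```

The second pattern is exactly the special case that 'u' needed under the earlier grammar.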


The 'u' extension was updated in the final spec as below:

unicode_locale_extensions = sep "u" (
                                            1*(sep keyword)
                                            / 1*(sep attribute) *(sep keyword)
                                          )
keyword = key [sep type]
key = 2alphanum
type = 3*8alphanum *(sep 3*8alphanum)
attribute = 3*8alphanum


This change has three parts: 1. subtags in the form of 3*8alphanum 
before the first occurrence of a key (2alphanum) are interpreted as 
attributes, 2. a key subtag need not be followed by a type, 3. a type 
may be represented by multiple subtags in the form of 3*8alphanum. 
Together, these eliminate the special syntax requirements for the 'u' 
extension.  With the updated specification, extension subtags 
satisfying the BCP47 extension syntax also satisfy the 'u' extension 
syntax.  For example, "u-abc-de" is interpreted as attribute "abc" and 
typeless key "de". (Note that this specific tag is still not valid, 
because "abc" is not a registered attribute and "de" is not a known 
key.)
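
The interpretation rules above can be sketched as a small parser over the subtags that follow the "u" singleton. This is only an illustration under my own assumptions: the class name UnicodeExtensionParser and its fields are hypothetical, a typeless key is stored with an empty string as its type, and a repeated key simply overwrites the earlier one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UnicodeExtensionParser {
    public final List<String> attributes = new ArrayList<>();
    // key -> type; an empty string means the key has no type
    public final Map<String, String> keywords = new LinkedHashMap<>();

    /** Parses the subtags after "u-", e.g. "abc-de" for "u-abc-de". */
    public static UnicodeExtensionParser parse(String value) {
        UnicodeExtensionParser result = new UnicodeExtensionParser();
        String currentKey = null;
        for (String subtag : value.split("-", -1)) {
            // The only syntax check is the generic BCP 47 one: 2*8alphanum.
            if (!subtag.matches("[0-9A-Za-z]{2,8}")) {
                throw new IllegalArgumentException("Ill-formed subtag: " + subtag);
            }
            if (subtag.length() == 2) {
                // A 2alphanum subtag always starts a new keyword.
                currentKey = subtag;
                result.keywords.put(currentKey, "");
            } else if (currentKey == null) {
                // 3*8alphanum before the first key is an attribute.
                result.attributes.add(subtag);
            } else {
                // 3*8alphanum after a key extends that key's type.
                String type = result.keywords.get(currentKey);
                result.keywords.put(currentKey,
                        type.isEmpty() ? subtag : type + "-" + subtag);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        UnicodeExtensionParser p = parse("abc-de");
        System.out.println(p.attributes); // [abc]
        System.out.println(p.keywords);   // {de=}
    }
}
```

Every subtag that passes the 2*8alphanum check lands in one of the three branches, which is why no 'u'-specific rejection remains.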

With this change, we do not need any special code for handling the 'u' 
extension in the API - Builder#setExtension.  This also means that we do 
not need a special implementation dedicated to the 'u' extension even 
if we do not add the Unicode locale extension APIs (such as 
Builder#setUnicodeLocaleKeyword).
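
As a usage sketch, the following runs against java.util.Locale.Builder in the form this API later took in Java 7, which is an assumption relative to the proposal discussed here. The helper withExtension and the class name SetExtensionDemo are hypothetical; the Locale and Locale.Builder methods are the standard ones.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

public class SetExtensionDemo {
    /** Builds an "en" locale carrying one extension (hypothetical helper). */
    public static Locale withExtension(char singleton, String value) {
        return new Locale.Builder()
                .setLanguage("en")
                .setExtension(singleton, value)
                .build();
    }

    public static void main(String[] args) {
        // The same subtags are accepted under 'u' and under another singleton.
        Locale viaU = withExtension('u', "abc-de");
        Locale viaA = withExtension('a', "abc-de");

        // For 'u', the subtags are additionally exposed as attribute and key.
        System.out.println(viaU.getUnicodeLocaleAttributes());
        System.out.println(viaU.getUnicodeLocaleKeys());
        System.out.println(viaA.getExtension('a'));

        // A subtag longer than 8 characters breaks the generic syntax.
        try {
            withExtension('u', "toolongsubtag");
        } catch (IllformedLocaleException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```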

-Yoshito

