[loc-en-dev] locale extensions

Yoshito Umaoka y.umaoka at gmail.com
Mon Feb 2 12:00:11 PST 2009


In the Locale Enhancement proposal, we're proposing new locale 
elements: script and keywords.  In the BCP47 language tag specification, 
the available elements are language, script, country, variant and 
extensions.  In the LDML specification, keywords are mapped to BCP47 
extensions.  For example:

Unicode Locale Identifier: es_ES@collation=traditional

is mapped to the BCP47 language tag

es-ES-k-collatio-traditio

(Note: each segment in a BCP47 language tag may be at most 8 characters 
long, so "collation" is truncated to "collatio" and "traditional" is 
truncated to "traditio".)

This assumes "k" is reserved for LDML keywords.  In the future, other 
applications may register letters for their own use.
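
The mapping above could be sketched roughly as follows.  This is only an 
illustration, not part of the proposal: the class and method names are 
made up, and it assumes that several keywords would share one "k" 
singleton and that truncation simply keeps the first 8 characters.

```java
// Hypothetical helper for illustration only; not part of the proposal.
final class KeywordTagSketch {
    // BCP47 limits each subtag to 8 characters.
    private static final int MAX_SUBTAG_LENGTH = 8;

    // Keeps at most the first 8 characters of a keyword name or value.
    static String truncate(String s) {
        return s.length() <= MAX_SUBTAG_LENGTH ? s : s.substring(0, MAX_SUBTAG_LENGTH);
    }

    // Maps e.g. "es_ES@collation=traditional" to "es-ES-k-collatio-traditio".
    static String toLanguageTag(String localeId) {
        int at = localeId.indexOf('@');
        String base = at < 0 ? localeId : localeId.substring(0, at);
        StringBuilder tag = new StringBuilder(base.replace('_', '-'));
        if (at < 0) {
            return tag.toString(); // no keywords, nothing more to map
        }
        // Assumption: all keywords are emitted under a single "k" singleton.
        tag.append("-k");
        for (String pair : localeId.substring(at + 1).split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0 || eq == pair.length() - 1) {
                throw new IllegalArgumentException("Malformed keyword: " + pair);
            }
            tag.append('-').append(truncate(pair.substring(0, eq)));
            tag.append('-').append(truncate(pair.substring(eq + 1)));
        }
        return tag.toString();
    }
}
```

Note that the truncation is lossy: the original keyword name and value 
cannot be recovered from the tag without a lookup table.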

One of the goals of this project is to support compatibility with 
BCP47.  In BCP47, a keyword is one type of extension.  From the design 
point of view, I propose storing all BCP47 extensions in a single 
object, LocaleExtensions.  LocaleExtensions may contain keywords as 
well as other extensions, and it would be a field of the Locale class.

An instance of LocaleExtensions is created only via LocaleBuilder.  Here 
are my assumptions about extensions:

For keywords

- Multiple keywords are allowed
- No two keywords have the same name 
(calendar=islamic;calendar=gregorian is invalid)
- toString converts keywords to @name1=value1;name2=value2, for example, 
@calendar=buddhist;number=thai.

For other extensions

- Multiple extensions are allowed
- No two extensions have the exact same values
- toString converts these extensions to @letter=value.  For example, 
BCP47 en-US-a-yoshito will be converted to en_US@a=yoshito
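
To make the two lists of assumptions concrete, here is a minimal sketch 
of how such a container might enforce them.  The class name, the nested 
Builder and its methods are all hypothetical (the real object would be 
created through LocaleBuilder).  Two things are my own guesses: "no two 
extensions have the exact same values" is read as rejecting an identical 
letter/value pair, and keywords are printed before other extensions.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical stand-in for LocaleExtensions; illustration only.
final class LocaleExtensionsSketch {
    private final Map<String, String> keywords;
    private final Set<String> extensions; // each entry is "letter=value"

    private LocaleExtensionsSketch(Map<String, String> keywords, Set<String> extensions) {
        this.keywords = keywords;
        this.extensions = extensions;
    }

    // Instances are only created through the builder.
    static final class Builder {
        private final Map<String, String> keywords = new LinkedHashMap<>();
        private final Set<String> extensions = new LinkedHashSet<>();

        // Rejects a second keyword with the same name.
        Builder keyword(String name, String value) {
            if (keywords.containsKey(name)) {
                throw new IllegalArgumentException("Duplicate keyword: " + name);
            }
            keywords.put(name, value);
            return this;
        }

        // Assumption: only an identical letter/value pair counts as a duplicate.
        Builder extension(char letter, String value) {
            if (!extensions.add(letter + "=" + value)) {
                throw new IllegalArgumentException("Duplicate extension: " + letter + "=" + value);
            }
            return this;
        }

        LocaleExtensionsSketch build() {
            return new LocaleExtensionsSketch(
                    new LinkedHashMap<>(keywords), new LinkedHashSet<>(extensions));
        }
    }

    // Assumption: keywords first, then other extensions, joined by ';'.
    @Override
    public String toString() {
        if (keywords.isEmpty() && extensions.isEmpty()) {
            return "";
        }
        StringJoiner joiner = new StringJoiner(";", "@", "");
        keywords.forEach((name, value) -> joiner.add(name + "=" + value));
        extensions.forEach(joiner::add);
        return joiner.toString();
    }
}
```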


The major differences between keywords and other extensions are:

- A keyword in BCP47 is always represented by -k-<name>-<value>.  In 
Locale#toString(), it is converted to <name>=<value>.
- Other extensions are represented by 
-<letter>-<segment1>[-<segment2>...].  In Locale#toString(), they are 
converted to <letter>=<segment1>[-<segment2>...].
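
The difference shows up when going from a BCP47 tag to the toString() 
form.  The following is a rough sketch under a few assumptions of mine: 
the class and method names are hypothetical, several keywords may follow 
a single "k" as name/value pairs, and private-use or grandfathered tags 
are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical converter for illustration only; not part of the proposal.
final class TagToLocaleStringSketch {
    static String toLocaleString(String languageTag) {
        String[] subtags = languageTag.split("-");
        StringBuilder base = new StringBuilder();
        List<String> parts = new ArrayList<>();
        int i = 0;
        // Subtags before the first single-letter subtag form the base locale.
        while (i < subtags.length && subtags[i].length() > 1) {
            if (base.length() > 0) {
                base.append('_');
            }
            base.append(subtags[i++]);
        }
        // Each single-letter subtag starts an extension.
        while (i < subtags.length) {
            String singleton = subtags[i++];
            int start = i;
            while (i < subtags.length && subtags[i].length() > 1) {
                i++;
            }
            List<String> segments = Arrays.asList(subtags).subList(start, i);
            if (segments.isEmpty()) {
                throw new IllegalArgumentException("Empty extension: " + singleton);
            }
            if (singleton.equals("k")) {
                // Keywords: -k-<name>-<value> becomes <name>=<value>.
                if (segments.size() % 2 != 0) {
                    throw new IllegalArgumentException("Keyword without a value");
                }
                for (int j = 0; j < segments.size(); j += 2) {
                    parts.add(segments.get(j) + "=" + segments.get(j + 1));
                }
            } else {
                // Other extensions: the letter is kept and the segments stay joined.
                parts.add(singleton + "=" + String.join("-", segments));
            }
        }
        return parts.isEmpty() ? base.toString() : base + "@" + String.join(";", parts);
    }
}
```

With this sketch, en-US-a-yoshito becomes en_US@a=yoshito, while 
es-ES-k-collatio-traditio becomes es_ES@collatio=traditio; the truncated 
keyword name and value are not restored.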

Do you see any problems with the above assumptions/behavior?

-Yoshito


