[loc-en-dev] Equality of base locale and LocaleServiceProvider implementation

Doug Felt dougfelt at google.com
Fri Mar 13 16:32:12 PDT 2009


The way we've been thinking of it is that the extensions are an adjunct to
the fields of the locale.  Their order isn't important, so we canonicalize
their order (the same is true for ldml keywords within the ldml extension).

The model I have is as follows:

The user passes a Locale to a service, which (usually) looks up a bundle,
using the base fields language, script, region, and variant (including
subfields of variant).  Once a matching bundle is found, it's returned to
the service.  The appropriate extensions (in our particular case, the ldml
extension) are then interpreted by the service when accessing the bundle;
which extensions are used depends on the service.  There's no real
hierarchical ordering to the ldml extensions; they're just different
customizations that apply to whatever services care about them.  This is
different from language/script/region/variant where there is (generally) a
useful hierarchical order.
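
To make the model concrete, here is a minimal sketch, not the proposed
implementation.  It assumes the java.util.Locale methods as they exist in
later JDKs (forLanguageTag, stripExtensions, getUnicodeLocaleType), which
were still under design at the time of this thread.  The class name, the
bundle table, and the helper methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class BaseLocaleLookupSketch {
    // Hypothetical bundle table, keyed by base locale only (no extensions).
    static final Map<Locale, String> BUNDLES = new LinkedHashMap<>();
    static {
        BUNDLES.put(Locale.forLanguageTag("th-TH"), "bundle[th-TH]");
        BUNDLES.put(Locale.forLanguageTag("th"), "bundle[th]");
    }

    // Find a bundle using only the base fields; extensions play no part here.
    static String findBundle(Locale requested) {
        String tag = requested.stripExtensions().toLanguageTag();
        while (!tag.isEmpty()) {
            String hit = BUNDLES.get(Locale.forLanguageTag(tag));
            if (hit != null) {
                return hit;
            }
            // Hierarchical fallback over the base fields: drop the last subtag.
            int cut = tag.lastIndexOf('-');
            tag = cut < 0 ? "" : tag.substring(0, cut);
        }
        return "bundle[root]";
    }

    // After lookup, a number service reads only the keyword it cares about.
    static String numberingSystem(Locale requested) {
        String nu = requested.getUnicodeLocaleType("nu");
        return nu != null ? nu : "default";
    }

    public static void main(String[] args) {
        Locale requested = Locale.forLanguageTag("th-TH-u-ca-foobar-nu-thai");
        System.out.println(findBundle(requested));       // bundle[th-TH]
        System.out.println(numberingSystem(requested));  // thai
    }
}
```

The "ca" keyword never affects which bundle is found; a date service would
read it from the same request after the same lookup.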

Part of the idea here is that by handling extensions separately, the service
provider doesn't have to list them.  If it did have to list them and there
were no canonical order, it would need to list every permutation.  For
example, you might have keywords for collation, calendar, and number that
apply to several locales.  If there were no canonical order you'd have to
support all 16 permutations (zero, one, two, or three keywords, in any
order) for each such locale.  This is rather a lot to list in
getAvailableLocales.  And this doesn't even involve the values of the
keywords.
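
The count of 16 is 1 + 3 + 6 + 6: one way to use no keywords, three ways to
use one, six ordered pairs, and six ordered triples.  The sketch below
enumerates them as a check.  The class name is hypothetical, and the keys
"co", "ca", and "nu" are assumed here as stand-ins for collation, calendar,
and number; keyword values are left out, as in the paragraph above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class KeywordPermutationCount {
    // Every ordered arrangement of every subset of the given keywords.
    static List<String> arrangements(List<String> keywords) {
        List<String> out = new ArrayList<>();
        extend("", keywords, new boolean[keywords.size()], out);
        return out;
    }

    private static void extend(String prefix, List<String> keywords,
                               boolean[] used, List<String> out) {
        // Record the arrangement built so far (the empty string is "no keywords").
        out.add(prefix);
        for (int i = 0; i < keywords.size(); i++) {
            if (used[i]) {
                continue;
            }
            used[i] = true;
            String next = prefix.isEmpty()
                    ? keywords.get(i)
                    : prefix + "-" + keywords.get(i);
            extend(next, keywords, used, out);
            used[i] = false;
        }
    }

    public static void main(String[] args) {
        List<String> all = arrangements(Arrays.asList("co", "ca", "nu"));
        System.out.println(all.size());  // 16
        for (String a : all) {
            System.out.println("xx-u" + (a.isEmpty() ? " (no keywords)" : "-" + a));
        }
    }
}
```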

Even with a canonical order, simple fallback doesn't work for all services.
Say NumberFormat is passed the locale "th-th-u-ca-foobar-nu-thai"
(calendar = foobar, numbers = thai).  There's no locale matching that, so
this falls back to "th-th-u-ca-foobar", tossing the numbers extension on the
floor.  The problem appears when there is a bundle "th-th-u-nu-thai": the
(irrelevant, from NumberFormat's point of view) request for the foobar
calendar preempted the (relevant) request for thai numbers, since it was
canonicalized to a position earlier in the language tag.  The exact opposite
could happen for DateFormat with calendar = japanese and animal = foobar (as
a hypothetical example).
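
The failure can be reproduced with a small sketch.  It truncates the tag one
subtag at a time, in the style of the RFC 4647 lookup scheme (a dangling
singleton such as "u" is dropped along with the subtag after it), so it
passes through a couple of intermediate candidates that the description
above skips.  The class name and the set of available bundles are
hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TruncationFallbackSketch {
    // Candidate chain: drop the last subtag at each step; if that leaves a
    // single-character subtag (a singleton) at the end, drop it as well.
    static List<String> candidates(String tag) {
        List<String> chain = new ArrayList<>();
        String current = tag;
        while (!current.isEmpty()) {
            chain.add(current);
            int cut = current.lastIndexOf('-');
            if (cut < 0) {
                break;
            }
            current = current.substring(0, cut);
            int n = current.length();
            if (n >= 2 && current.charAt(n - 2) == '-') {
                current = current.substring(0, n - 2);
            }
        }
        return chain;
    }

    // Return the first candidate that has a bundle, or null if none does.
    static String lookup(String tag, Set<String> available) {
        for (String candidate : candidates(tag)) {
            if (available.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> available =
                new HashSet<>(Arrays.asList("th-th-u-nu-thai", "th-th", "th"));
        String request = "th-th-u-ca-foobar-nu-thai";
        System.out.println(candidates(request));
        // "th-th-u-nu-thai" is never a candidate, so the result is "th-th".
        System.out.println(lookup(request, available));
    }
}
```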

This leads to each service having to manipulate the extensions before
looking up the bundle, to keep irrelevant extensions out of the way.
This means each service is potentially seeing an entirely different bundle
for the 'same' locale.  If data used by both services is different between
the two bundles, this might show up as an unwanted and unexpected side
effect.
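
A sketch of that per-service filtering, to show how two services end up
asking for different locales from the same request.  It assumes the
Locale.Builder API from later JDKs (setLocale, clearExtensions,
setUnicodeLocaleKeyword); the class name and the keepOnly helper are
hypothetical.

```java
import java.util.Locale;

class PerServiceFilteringSketch {
    // Rebuild the locale keeping only the one Unicode keyword a service uses.
    static Locale keepOnly(Locale requested, String key) {
        Locale.Builder builder =
                new Locale.Builder().setLocale(requested).clearExtensions();
        String type = requested.getUnicodeLocaleType(key);
        if (type != null) {
            builder.setUnicodeLocaleKeyword(key, type);
        }
        return builder.build();
    }

    public static void main(String[] args) {
        Locale requested = Locale.forLanguageTag("th-TH-u-ca-foobar-nu-thai");
        Locale forNumbers = keepOnly(requested, "nu");
        Locale forDates = keepOnly(requested, "ca");
        System.out.println(forNumbers.toLanguageTag());  // th-TH-u-nu-thai
        System.out.println(forDates.toLanguageTag());    // th-TH-u-ca-foobar
        // Two different lookup keys for the 'same' request.
        System.out.println(forNumbers.equals(forDates)); // false
    }
}
```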

Considerations like these led us to want to perform lookup using only the
base locale, and let the services make use of the extension data as they saw
fit based on that same bundle.

As for BCP47, it does have descriptions of how one might match against a
preferred language list, and also says that particular implementations can
perform lookup ignoring extensions.  But this language from the spec is
generally in the context of matching a preferred language list with
wildcards, and so it's not clear how or if this applies to examining a
partially ordered collection of locale resources.  I tend to think it does
not directly apply, and that the way we propose handling lookup is
conformant.  Mark of course may have a different opinion.

Doug

On Fri, Mar 13, 2009 at 1:11 PM, Naoto Sato <Naoto.Sato at sun.com> wrote:

> So are you specifically talking about LDML extensions?  In BCP 47, it's one
> of the subtags and the BCP does not give any special semantics to it
> (because it does not know for what it would be used).  So I thought the
> fallback would be:
>
> xx-yy-zz-ext
> xx-yy-zz
> xx-yy
> xx
>
> Thanks,
> Naoto
>
> Yoshito Umaoka wrote:
>
>> Naoto Sato wrote:
>>
>>> Umaoka-san,
>>>
>>> I don't think this is a compatibility issue, because the existing SPI
>>> implementations should still work compatibly with locales that have no
>>> extensions.  Possible issues would only arise with the new locales.
>>>
>>> BTW, current SPI implementation invocation already involves fallback
>>> itself. i.e., say the request locale is xx_YY_foo_bar, and one SPI provider
>>> implements xx_YY, then that provider's service is used.  So adding the
>>> extension fallback is not that ugly to me.
>>>
>> Yes, I know the current fallback strategy.
>> LDML extensions are designed for specifying optional behavior for a
>> locale.  Therefore, as we described in the very first proposal, extensions
>> are carried in each level.  More specifically, if a locale
>> xx-yy-zzzz-u-cu-usd is requested, below is the candidate list.
>>
>> xx-yy-zzzz-u-cu-usd
>> xx-yy-u-cu-usd
>> xx-u-cu-usd
>>
>> If we need "extensionless" version inserted, it becomes
>>
>> xx-yy-zzzz-u-cu-usd
>> xx-yy-zzzz
>> xx-yy-u-cu-usd
>> xx-yy
>> xx-u-cu-usd
>> xx
>>
>> Don't you think it's somewhat ugly?
>>
>> -Yoshito
>>
>
>
> --
> Naoto Sato
>