[loc-en-dev] Equality of base locale and LocaleServiceProvider implementation

Doug Felt dougfelt at google.com
Mon Mar 16 12:43:13 PDT 2009


Well, perhaps Mark can clarify this passage for us.  Mark?

As you cite:

"However, an implementation MAY remove these [extensions and unrecognized
private-use subtags] from ranges prior to performing the lookup, provided
the implementation also removes them from the tags being compared"

This seems to me to allow us to compare only the initial fields when doing
lookup, while still using the full locale after lookup has been completed.
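
A minimal sketch of that reading, in Python purely for illustration.  The
helper names base_tag and lookup and the sample bundle set are hypothetical
and not part of any proposal; the sketch also assumes that any
single-character subtag after the first position starts an extension or
private-use sequence running to the end of the tag.

```python
def base_tag(tag):
    """Return the tag without extensions or private-use subtags.

    Assumption: a single-character subtag after the first position
    starts an extension (or private-use) sequence that runs to the end.
    """
    subtags = tag.lower().split("-")
    for i, sub in enumerate(subtags):
        if i > 0 and len(sub) == 1:
            return "-".join(subtags[:i])
    return "-".join(subtags)


def lookup(requested, available):
    """Truncation lookup performed on the base tag only.

    Returns (matched_bundle_tag, full_requested_tag), so the caller
    still has the extensions available after the match is made.
    """
    candidate = base_tag(requested)
    while candidate:
        if candidate in available:
            return candidate, requested
        # Drop the last subtag and try again.
        candidate = candidate.rpartition("-")[0]
    return None, requested


# Hypothetical set of bundles a provider might list (lowercased).
bundles = {"th-th", "th"}
match, full = lookup("th-TH-u-ca-foobar-nu-thai", bundles)
```

Here the match is made on the base fields alone, and the second element of
the result is the untouched request, so a service can still read whichever
extension keywords it cares about.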

Naoto, how would you propose dealing with the problems I cited?
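
For reference, the preemption problem from my earlier message (quoted
further down) can be sketched roughly as follows.  This is Python for
illustration only; truncation_candidates and first_match are hypothetical
names, and the fallback shown is plain subtag-by-subtag truncation (dropping
a dangling single-character subtag as well), which is an assumption about
how a simple fallback would behave.

```python
def truncation_candidates(tag):
    """List the tags produced by dropping one trailing subtag at a time.

    A single-character subtag left dangling at the end (such as "u")
    is dropped together with the subtag that followed it.
    """
    subtags = tag.split("-")
    candidates = []
    while subtags:
        candidates.append("-".join(subtags))
        subtags.pop()
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return candidates


def first_match(tag, available):
    """Return the first truncation candidate that is an available bundle."""
    return next((c for c in truncation_candidates(tag) if c in available), None)


# Hypothetical bundles: one carries the thai-numbers customization.
bundles = {"th-th-u-nu-thai", "th-th"}
request = "th-th-u-ca-foobar-nu-thai"

candidates = truncation_candidates(request)
never_tried = "th-th-u-nu-thai" not in candidates
chosen = first_match(request, bundles)
```

Because the calendar keyword sorts ahead of the numbers keyword, the
candidate list never contains "th-th-u-nu-thai", so the bundle that matters
to NumberFormat is skipped in favor of the plain "th-th" bundle.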

Doug

On Mon, Mar 16, 2009 at 11:28 AM, Naoto Sato <Naoto.Sato at sun.com> wrote:

> Well, I understand the rationale for the LDML keywords.  On the other
> hand, BCP 47 specifies the lookup fallback as I described before
> (RFC 4647, "3.4. Lookup").  Regarding the extensions, it reads:
>
> ---
>
>  Extensions and unrecognized private-use subtags might be unrelated to
>  a particular application of lookup.  Since these subtags come at the
>  end of the subtag sequence, they are removed first during the
>  fallback process and usually pose no barrier to interoperability.
>  However, an implementation MAY remove these from ranges prior to
>  performing the lookup (provided the implementation also removes them
>  from the tags being compared).  Such modification is internal to the
>  implementation and applications, protocols, or specifications SHOULD
>  NOT remove or modify subtags in content that they return or forward,
>  because this removes information that can be used elsewhere.
>
> ---
>
> So this expects that the extensions should be removed first when fallback
> happens.  I am not sure whether removing the subtags to their left (variant,
> region, etc.) while keeping the extension appended would conform to this
> specification.  At least, I think that the currently proposed fallback could
> confuse some developers.
>
> Naoto
>
> Doug Felt wrote:
>
>> The way we've been thinking of it is that the extensions are an adjunct to
>> the fields of the locale.  Their order isn't important, so we canonicalize
>> their order (the same is true for ldml keywords within the ldml extension).
>>
>> The model I have is as follows:
>>
>> The user passes a Locale to a service, which (usually) looks up a bundle,
>> using the base fields language, script, region, and variant (including
>> subfields of variant).  Once a matching bundle is found, it's returned to
>> the service.  The appropriate extensions (in our particular case, the ldml
>> extension) are then interpreted by the service when accessing the bundle--
>> which extensions are used depends on the service.  There's no real
>> hierarchical ordering to the ldml extensions, they're just different
>> customizations that apply to whatever services care about them.  This is
>> different from language/script/region/variant where there is (generally) a
>> useful hierarchical order.
>>
>> Part of the idea here is that by handling extensions separately, the
>> service provider doesn't have to list them-- and if there's no canonical
>> order, it needs to list all permutations.  For example, you might have
>> keywords for collation, calendar, and number that apply to several locales.
>>  If there were no canonical order you'd have to support all 16 permutations
>> (zero, one, two, or three keywords, in any order) for each such locale.
>>  This is rather a lot to list in getAvailableLocales.  And this doesn't even
>> involve the values of the keywords.
>>
>> Even if there is a canonical order, simple fallback doesn't work for
>> all services.  Say NumberFormat is passed a locale with the extension
>> "th-th-u-ca-foobar-nu-thai" (calendar = foobar, numbers = thai).  There's no
>> locale matching that, so this falls back to "th-th-u-ca-foobar", tossing the
>> numbers extension on the floor.  The problem appears when there is a bundle
>> "th-th-u-nu-thai": the (irrelevant, from NumberFormat's point of view)
>> request for the foobar calendar preempted the (relevant) request for thai
>> numbers, since it was canonicalized to a position earlier in the language tag.
>> The exact opposite could happen for DateFormat with calendar = japanese and
>> animal = foobar (as a hypothetical example).
>>
>> This leads to each service having to manipulate the extensions before
>> looking up the bundle, to keep irrelevant extensions out of the way.
>> This means each service is potentially seeing an entirely different bundle
>> for the 'same' locale.  If data used by both services is different between
>> the two bundles, this might show up as an unwanted and unexpected side
>> effect.
>>
>> Considerations like these led us to want to perform lookup using only the
>> base locale, and let the services make use of the extension data as they saw
>> fit based on that same bundle.
>>
>> As for BCP47, it does have descriptions of how one might match against a
>> preferred language list, and also says that particular implementations can
>> perform lookup ignoring extensions.  But this language from the spec is
>> generally in the context of matching a preferred language list with
>> wildcards, and so it's not clear how or if this applies to examining a
>> partially ordered collection of locale resources.   I tend to think it does
>> not directly apply, and that the way we propose handling lookup is
>> conformant.  Mark of course may have a different opinion.
>>
>> Doug
>>
>> On Fri, Mar 13, 2009 at 1:11 PM, Naoto Sato <Naoto.Sato at sun.com> wrote:
>>
>>    So are you specifically talking about LDML extensions?  In BCP 47,
>>    it's one of the subtags and the BCP does not give any special
>>    semantics to it (because it does not know for what it would be
>>    used).  So I thought the fallback would be:
>>
>>    xx-yy-zz-ext
>>    xx-yy-zz
>>    xx-yy
>>    xx
>>
>>
>>    Thanks,
>>    Naoto
>>
>>    Yoshito Umaoka wrote:
>>
>>        Naoto Sato wrote:
>>
>>            Umaoka-san,
>>
>>            I don't think this is a compatibility issue, because the
>>            existing SPI implementations should still work compatibly
>>            with the locales without extensions.  A possible issue would
>>            arise only with the new locales.
>>
>>            BTW, the current SPI implementation invocation already
>>            involves fallback itself; i.e., say the request locale is
>>            xx_YY_foo_bar, and one SPI provider implements xx_YY, then
>>            that provider's service is used.  So adding the extension
>>            fallback is not that ugly to me.
>>
>>        Yes, I know the current fallback strategy.
>>        LDML extensions are designed for specifying optional behavior
>>        for a locale.  Therefore, as we described in the very first
>>        proposal, extensions are carried in each level.  More
>>        specifically, if a locale xx-yy-zzzz-u-cu-usd is requested,
>>        below is the candidate list.
>>
>>        xx-yy-zzzz-u-usd
>>        xx-yy-u-usd
>>        xx-u-usd
>>
>>        If we need an "extensionless" version inserted, it becomes
>>
>>        xx-yy-zzzz-u-usd
>>        xx-yy-zzzz
>>        xx-yy-u-usd
>>        xx-yy
>>        xx-u-usd
>>        xx
>>
>>        Don't you think it's somewhat ugly?
>>
>>        -Yoshito
>>
>>
>>
>>    --    Naoto Sato
>>
>>
>>
>