[loc-en-dev] Equality of base locale and LocaleServiceProvider implementation

Doug Felt dougfelt at google.com
Tue Mar 17 11:14:02 PDT 2009


I don't understand this.

If providers only advertise their base locales, why is the extension
involved in lookup at all?

Doug

On Tue, Mar 17, 2009 at 11:07 AM, Naoto Sato <Naoto.Sato at sun.com> wrote:

> If handling the extension is special and it's at our discretion how to deal
> with it, then I would think the following fallback is the most compatible
> (for the reason Yoshito mentioned) and still meets the requirement from -u
> extension for LDML keywords, assuming that the providers only advertise
> their base locales.
>
> xx-yy-zzzz-ext
> xx-yy-zzzz
> xx-yy-ext
> xx-yy
> xx-ext
> xx
>
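The interleaved fallback listed above can be sketched roughly as follows. This is only an illustration, not code from any proposal in this thread: the class name `InterleavedFallback` and the helper `candidates` are hypothetical, and the extension is treated as one opaque string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InterleavedFallback {
    // Hypothetical helper: truncate the base subtags from the right and,
    // at each level, try the tag with the extension first, then without it.
    static List<String> candidates(List<String> baseSubtags, String extension) {
        List<String> result = new ArrayList<>();
        for (int n = baseSubtags.size(); n > 0; n--) {
            String base = String.join("-", baseSubtags.subList(0, n));
            if (extension != null && !extension.isEmpty()) {
                result.add(base + "-" + extension);
            }
            result.add(base);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints the six candidates, from most to least specific.
        for (String c : candidates(Arrays.asList("xx", "yy", "zzzz"), "ext")) {
            System.out.println(c);
        }
    }
}
```

If providers advertise only base locales, the candidates that carry the extension can never match an advertised locale, which is the point of the question at the top of this message.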
> Thanks,
> Naoto
>
> Doug Felt wrote:
>
>> Well, perhaps Mark can clarify this passage for us.  Mark?
>>
>> As you cite:
>>
>> "However, an implementation MAY remove these [extensions and unrecognized
>> private-use subtags] from ranges prior to performing the lookup, provided
>> the implementation also removes them from the tags being compared"
>>
>> This seems to me to allow us to compare only the initial fields when doing
>> lookup, while still using the full locale after lookup has been completed.
>>
>> Naoto, how would you propose dealing with the problems I cited?
>>
>> Doug
>>
>> On Mon, Mar 16, 2009 at 11:28 AM, Naoto Sato <Naoto.Sato at sun.com> wrote:
>>
>>    Well, I understand the rationale for the LDML keywords.  On the
>>    other hand, BCP 47 specifies the lookup fallback as I
>>    described before (RFC4647, "3.4. Lookup").  Regarding the
>>    extensions, it reads:
>>
>>    ---
>>
>>     Extensions and unrecognized private-use subtags might be unrelated to
>>     a particular application of lookup.  Since these subtags come at the
>>     end of the subtag sequence, they are removed first during the
>>     fallback process and usually pose no barrier to interoperability.
>>     However, an implementation MAY remove these from ranges prior to
>>     performing the lookup (provided the implementation also removes them
>>     from the tags being compared).  Such modification is internal to the
>>     implementation and applications, protocols, or specifications SHOULD
>>     NOT remove or modify subtags in content that they return or forward,
>>     because this removes information that can be used elsewhere.
>>
>>    ---
>>
>>    So this expects that the extensions should first be removed when
>>    fallback happens.  I am not sure whether removing the left subtags
>>    (variant, region, etc.) and appending the extension would conform
>>    to this specification.  At least I think that the current proposed
>>    fallback could confuse some of the developers.
>>
>>    Naoto
>>
>>    Doug Felt wrote:
>>
>>        The way we've been thinking of it is that the extensions are
>>        an adjunct to the fields of the locale.  Their order isn't
>>        important, so we canonicalize their order (the same is true
>>        for ldml keywords within the ldml extension).
>>
>>        The model I have is as follows:
>>
>>        The user passes a Locale to a service, which (usually) looks
>>        up a bundle, using the base fields language, script, region,
>>        and variant (including subfields of variant).  Once a matching
>>        bundle is found, it's returned to the service.  The
>>        appropriate extensions (in our particular case, the ldml
>>        extension) are then interpreted by the service when accessing
>>        the bundle-- which extensions are used depends on the service.
>>         There's no real hierarchical ordering to the ldml extensions,
>>        they're just different customizations that apply to whatever
>>        services care about them.  This is different from
>>        language/script/region/variant where there is (generally) a
>>        useful hierarchical order.
>>
>>        Part of the idea here is that by handling extensions
>>        separately, the service provider doesn't have to list them--
>>        and if there's no canonical order, it needs to list all
>>        permutations.  For example, you might have keywords for
>>        collation, calendar, and number that apply to several locales.
>>         If there were no canonical order you'd have to support all 16
>>        permutations (zero, one, two, or three keywords, in any order)
>>        for each such locale.  This is rather a lot to list in
>>        getAvailableLocales.  And this doesn't even involve the values
>>        of the keywords.
>>
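The figure of 16 can be checked with a small calculation: for three keywords, count the ordered selections of zero, one, two, or three of them. The sketch below is only a sanity check of that arithmetic; the class and method names are made up for the example.

```java
class KeywordPermutations {
    // Number of ordered selections of k items out of n: n! / (n - k)!
    static long arrangements(int n, int k) {
        long r = 1;
        for (int i = 0; i < k; i++) {
            r *= (n - i);
        }
        return r;
    }

    // Sum over every possible count of keywords, from none up to all n.
    static long totalOrderings(int n) {
        long total = 0;
        for (int k = 0; k <= n; k++) {
            total += arrangements(n, k);
        }
        return total;
    }

    public static void main(String[] args) {
        // For three keywords: 1 + 3 + 6 + 6
        System.out.println(totalOrderings(3));
    }
}
```

As the paragraph above notes, this counts only which keywords appear and in what order, before any keyword values are considered.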
>>        Even if there is a canonical order, then simple fallback
>>        doesn't work for all services.  Say NumberFormat is passed a
>>        locale with the extension "th-th-u-ca-foobar-nu-thai"
>>        (calendar = foobar, numbers = thai).  There's no locale
>>        matching that, so this falls back to "th-th-u-ca-foobar",
>>        tossing the numbers extension on the floor.  The problem
>>        appears when there is a bundle "th-th-u-nu-thai", the
>>        (irrelevant, from NumberFormat's point of view) request for
>>        the foobar calendar preempted the (relevant) request for thai
>>        numbers, since it was canonicalized to a position earlier in the
>>        language tag.  The exact opposite could happen for DateFormat
>>        with calendar = japanese and animal = foobar (as a
>>        hypothetical example).
>>
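The preemption problem described above can be reproduced with a hand-written truncation lookup in the style of RFC 4647 section 3.4, which removes subtags from the end and drops a trailing single-character subtag together with the subtag that followed it. This is a minimal sketch under those assumptions; `TruncationLookup`, `truncate`, and `lookup` are hypothetical names, and the set of available tags is invented for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class TruncationLookup {
    // Drop the last subtag; if that leaves a single-character subtag
    // (an extension singleton) at the end, drop it as well.
    static String truncate(String tag) {
        int i = tag.lastIndexOf('-');
        if (i < 0) {
            return null; // nothing left to remove
        }
        String shorter = tag.substring(0, i);
        int j = shorter.lastIndexOf('-');
        if (j >= 0 && shorter.length() - j - 1 == 1) {
            shorter = shorter.substring(0, j);
        }
        return shorter;
    }

    // Return the first truncation of the requested tag that is available.
    static String lookup(String requested, Set<String> available) {
        for (String t = requested; t != null; t = truncate(t)) {
            if (available.contains(t)) {
                return t;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> available = new HashSet<>(Arrays.asList("th-th-u-nu-thai", "th-th"));
        System.out.println(lookup("th-th-u-ca-foobar-nu-thai", available));
    }
}
```

Because "ca" sorts before "nu", no truncation of the request ever equals "th-th-u-nu-thai", so that bundle is unreachable from this request even though the number keyword is the one NumberFormat cares about.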
>>        This leads to each service having to manipulate the extensions
>>        before looking up the bundle, to keep irrelevant extensions
>>        out of the way.
>>        This means each service is potentially seeing an entirely
>>        different bundle for the 'same' locale.  If data used by both
>>        services is different between the two bundles, this might show
>>        up as an unwanted and unexpected side effect.
>>
>>        Considerations like these led us to want to perform lookup
>>        using only the base locale, and let the services make use of
>>        the extension data as they saw fit based on that same bundle.
>>
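A rough sketch of the model described above: look up the bundle with the base locale only, then let each service read the keywords it cares about from the full locale. The `java.util.Locale` methods used here (`forLanguageTag`, `stripExtensions`, `getUnicodeLocaleType`) were added in later Java releases and are used purely for illustration; the bundle map and the `BaseLocaleLookup` class are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class BaseLocaleLookup {
    // Hypothetical bundle store, keyed by base language tag only.
    static final Map<String, String> BUNDLES = new HashMap<>();
    static {
        BUNDLES.put("th-TH", "bundle for th-TH");
        BUNDLES.put("th", "bundle for th");
    }

    // Find a bundle using only language/script/region/variant.
    static String findBundle(Locale requested) {
        String tag = requested.stripExtensions().toLanguageTag();
        while (!tag.isEmpty()) {
            String bundle = BUNDLES.get(tag);
            if (bundle != null) {
                return bundle;
            }
            int i = tag.lastIndexOf('-');
            tag = (i < 0) ? "" : tag.substring(0, i);
        }
        return null;
    }

    // A number-formatting service would read only the "nu" keyword.
    static String numberingSystem(Locale requested) {
        return requested.getUnicodeLocaleType("nu");
    }

    // A date-formatting service would read only the "ca" keyword.
    static String calendar(Locale requested) {
        return requested.getUnicodeLocaleType("ca");
    }

    public static void main(String[] args) {
        Locale requested = Locale.forLanguageTag("th-TH-u-ca-foobar-nu-thai");
        System.out.println(findBundle(requested));
        System.out.println(numberingSystem(requested));
        System.out.println(calendar(requested));
    }
}
```

Both services resolve to the same bundle here, and neither keyword can preempt the other, because the extension never takes part in the lookup.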
>>        As for BCP47, it does have descriptions of how one might match
>>        against a preferred language list, and also says that
>>        particular implementations can perform lookup ignoring
>>        extensions.  But this language from the spec is generally in
>>        the context of matching a preferred language list with
>>        wildcards, and so it's not clear how or if this applies to
>>        examining a partially ordered collection of locale resources.
>>          I tend to think it does not directly apply, and that the way
>>        we propose handling lookup is conformant.  Mark of course may
>>        have a different opinion.
>>
>>        Doug
>>
>>        On Fri, Mar 13, 2009 at 1:11 PM, Naoto Sato <Naoto.Sato at sun.com> wrote:
>>
>>           So are you specifically talking about LDML extensions?  In
>>           BCP 47, it's one of the subtags and the BCP does not give any
>>           special semantics to it (because it does not know for what it
>>           would be used).  So I thought the fallback would be:
>>
>>           xx-yy-zz-ext
>>           xx-yy-zz
>>           xx-yy
>>           xx
>>
>>
>>           Thanks,
>>           Naoto
>>
>>           Yoshito Umaoka wrote:
>>
>>               Naoto Sato wrote:
>>
>>                   Umaoka-san,
>>
>>                   I don't think this is a compatibility issue, because the
>>                   existing SPI implementations should still work compatibly
>>                   with the locales without extensions.  Possible issues
>>                   would only arise with the new locales.
>>
>>                   BTW, current SPI implementation invocation already
>>                   involves fallback itself, i.e., say the request locale is
>>                   xx_YY_foo_bar, and one SPI provider implements xx_YY, then
>>                   that provider's service is used.  So adding the extension
>>                   fallback is not that ugly to me.
>>
>>               Yes, I know the current fallback strategy.
>>               LDML extensions are designed for specifying optional behavior
>>               for a locale.  Therefore, as we described in the very first
>>               proposal, extensions are carried in each level.  More
>>               specifically, if a locale xx-yy-zzzz-u-cu-usd is requested,
>>               below is the candidate list.
>>
>>               xx-yy-zzzz-u-cu-usd
>>               xx-yy-u-cu-usd
>>               xx-u-cu-usd
>>
>>               If we need an "extensionless" version inserted, it becomes
>>
>>               xx-yy-zzzz-u-cu-usd
>>               xx-yy-zzzz
>>               xx-yy-u-cu-usd
>>               xx-yy
>>               xx-u-cu-usd
>>               xx
>>
>>               Don't you think it's somewhat ugly?
>>
>>               -Yoshito
>>
>>
>>
>>           --    Naoto Sato
>>
>>
>>
>>
>>
>