[loc-en-dev] Equality of base locale and LocaleServiceProvider implementation

Naoto Sato Naoto.Sato at Sun.COM
Tue Mar 17 11:07:11 PDT 2009


If handling the extension is special and it is at our discretion how to
deal with it, then I would think the following fallback is the most
compatible (for the reason Yoshito mentioned) and still meets the
requirement from the -u extension for LDML keywords, assuming that the
providers only advertise their base locales.

xx-yy-zzzz-ext
xx-yy-zzzz
xx-yy-ext
xx-yy
xx-ext
xx
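
The list above could be generated mechanically.  The following is only
a minimal sketch under my own assumptions: the class and method names
are hypothetical (not part of any JDK or proposed API), the base
subtags are passed in already split, and the whole extension is treated
as one opaque string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExtensionFallback {
    /**
     * Builds lookup candidates from the base subtags (for example
     * language, region, variant) and one optional extension string.
     * Hypothetical helper for illustration only.
     */
    public static List<String> candidates(List<String> baseSubtags, String extension) {
        List<String> result = new ArrayList<>();
        // Drop base subtags from the right, one at a time.
        for (int n = baseSubtags.size(); n > 0; n--) {
            String base = String.join("-", baseSubtags.subList(0, n));
            if (extension != null && !extension.isEmpty()) {
                // At each level, try the base locale with the extension first...
                result.add(base + "-" + extension);
            }
            // ...then the bare base locale that a provider would advertise.
            result.add(base);
        }
        return result;
    }

    public static void main(String[] args) {
        for (String c : candidates(Arrays.asList("xx", "yy", "zzzz"), "ext")) {
            System.out.println(c);
        }
    }
}
```

Passing a null or empty extension reduces this to plain right-to-left
truncation of the base locale.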

Thanks,
Naoto

Doug Felt wrote:
> Well, perhaps Mark can clarify this passage for us. Mark?
>
> As you cite:
>
> "However, an implementation MAY remove these [extensions and 
> unrecognized private-use subtags] from ranges prior to performing the 
> lookup, provided the implementation also removes them from the tags 
> being compared"
>
> This seems to me to allow us to compare only the initial fields when 
> doing lookup, while still using the full locale after lookup has been 
> completed.
>
> Naoto, how would you propose dealing with the problems I cited?
>
> Doug
>
> On Mon, Mar 16, 2009 at 11:28 AM, Naoto Sato <Naoto.Sato at sun.com> wrote:
>
>     Well, I understand the rationale for the LDML keywords.  On the
>     other hand, BCP 47 specifies the lookup fallback as I described
>     before (RFC 4647, "3.4. Lookup").  Regarding the extensions, it
>     reads:
>
>     ---
>
>      Extensions and unrecognized private-use subtags might be unrelated to
>      a particular application of lookup.  Since these subtags come at the
>      end of the subtag sequence, they are removed first during the
>      fallback process and usually pose no barrier to interoperability.
>      However, an implementation MAY remove these from ranges prior to
>      performing the lookup (provided the implementation also removes them
>      from the tags being compared).  Such modification is internal to the
>      implementation and applications, protocols, or specifications SHOULD
>      NOT remove or modify subtags in content that they return or forward,
>      because this removes information that can be used elsewhere.
>
>     ---
>
>     So this expects the extensions to be removed first when
>     fallback happens.  I am not sure whether removing the subtags to
>     their left (variant, region, etc.) and appending the extension
>     would conform to this specification.  At least I think that the
>     currently proposed fallback could confuse some of the developers.
>
>     Naoto
>
>     Doug Felt wrote:
>
>         The way we've been thinking of it is that the extensions are
>         an adjunct to the fields of the locale.  Their order isn't
>         important, so we canonicalize their order (the same is true
>         for ldml keywords within the ldml extension).
>
>         The model I have is as follows:
>
>         The user passes a Locale to a service, which (usually) looks
>         up a bundle, using the base fields language, script, region,
>         and variant (including subfields of variant).  Once a matching
>         bundle is found, it's returned to the service.  The
>         appropriate extensions (in our particular case, the ldml
>         extension) are then interpreted by the service when accessing
>         the bundle-- which extensions are used depends on the service.
>         There's no real hierarchical ordering to the ldml extensions;
>         they're just different customizations that apply to whatever
>         services care about them.  This is different from
>         language/script/region/variant where there is (generally) a
>         useful hierarchical order.
>
>         Part of the idea here is that by handling extensions
>         separately, the service provider doesn't have to list them--
>         otherwise, without a canonical order, it would need to list all
>         permutations.  For example, you might have keywords for
>         collation, calendar, and number that apply to several locales.
>         If there were no canonical order you'd have to support all 16
>         permutations (zero, one, two, or three keywords, in any order)
>         for each such locale.  This is rather a lot to list in
>         getAvailableLocales.  And this doesn't even involve the values
>         of the keywords.
>
>         Even if there is a canonical order, simple fallback
>         doesn't work for all services.  Say NumberFormat is passed a
>         locale with the extension "th-th-u-ca-foobar-nu-thai"
>         (calendar = foobar, numbers = thai).  There's no locale
>         matching that, so this falls back to "th-th-u-ca-foobar",
>         tossing the numbers extension on the floor.  The problem
>         appears when there is a bundle "th-th-u-nu-thai": the
>         (irrelevant, from NumberFormat's point of view) request for
>         the foobar calendar preempted the (relevant) request for thai
>         numbers, since it was canonicalized to a position earlier in the
>         language tag.  The exact opposite could happen for DateFormat
>         with calendar = japanese and animal = foobar (as a
>         hypothetical example).
>
>         This leads to each service having to manipulate the extensions
>         before looking up the bundle, to keep irrelevant extensions
>         out of the way.
>         This means each service is potentially seeing an entirely
>         different bundle for the 'same' locale.  If data used by both
>         services is different between the two bundles, this might show
>         up as an unwanted and unexpected side effect.
>
>         Considerations like these led us to want to perform lookup
>         using only the base locale, and let the services make use of
>         the extension data as they saw fit based on that same bundle.
>
>         As for BCP47, it does have descriptions of how one might match
>         against a preferred language list, and also says that
>         particular implementations can perform lookup ignoring
>         extensions.  But this language from the spec is generally in
>         the context of matching a preferred language list with
>         wildcards, and so it's not clear how or if this applies to
>         examining a partially ordered collection of locale resources.
>           I tend to think it does not directly apply, and that the way
>         we propose handling lookup is conformant.  Mark of course may
>         have a different opinion.
>
>         Doug
>
>         On Fri, Mar 13, 2009 at 1:11 PM, Naoto Sato
>         <Naoto.Sato at sun.com> wrote:
>
>            So are you specifically talking about LDML extensions?  In
>            BCP 47, it's one of the subtags and the BCP does not give
>            any special semantics to it (because it does not know for
>            what it would be used).  So I thought the fallback would be:
>
>            xx-yy-zz-ext
>            xx-yy-zz
>            xx-yy
>            xx
>
>
>            Thanks,
>            Naoto
>
>            Yoshito Umaoka wrote:
>
>                Naoto Sato wrote:
>
>                    Umaoka-san,
>
>                    I don't think this is a compatibility issue, because
>                    the existing SPI implementations should still work
>                    compatibly with the locales without extensions.  A
>                    possible issue would arise only with the new locales.
>
>                    BTW, the current SPI implementation invocation already
>                    involves fallback itself; i.e., say the request locale
>                    is xx_YY_foo_bar, and one SPI provider implements
>                    xx_YY, then that provider's service is used.  So adding
>                    the extension fallback is not that ugly to me.
>
>                Yes, I know the current fallback strategy.
>                LDML extensions are designed for specifying optional
>                behavior for a locale.  Therefore, as we described in the
>                very first proposal, extensions are carried at each level.
>                More specifically, if a locale xx-yy-zzzz-u-cu-usd is
>                requested, below is the candidate list.
>
>                xx-yy-zzzz-u-cu-usd
>                xx-yy-u-cu-usd
>                xx-u-cu-usd
>
>                If we need an "extensionless" version inserted, it becomes
>
>                xx-yy-zzzz-u-cu-usd
>                xx-yy-zzzz
>                xx-yy-u-cu-usd
>                xx-yy
>                xx-u-cu-usd
>                xx
>
>                Don't you think it's somewhat ugly?
>
>                -Yoshito
>
>
>
>            --    Naoto Sato
>
>
>
>
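
For comparison, the model Doug describes in the quoted thread (look up
the bundle using only the base fields, then let each service read the
keywords it cares about) might be sketched as below.  This is an
illustration under stated assumptions, not the proposed implementation:
the class and method names are hypothetical, the set of available
bundles is made up, and the -u extension is parsed naively as
key-value pairs with one value per key.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class BaseLocaleLookup {
    /** Returns the part of the tag before the "-u-" extension, or the whole tag. */
    public static String baseOf(String tag) {
        int i = tag.indexOf("-u-");
        return i < 0 ? tag : tag.substring(0, i);
    }

    /** Naively parses "-u-key-value-key-value" into a map (one value per key). */
    public static Map<String, String> keywordsOf(String tag) {
        Map<String, String> keywords = new LinkedHashMap<>();
        int i = tag.indexOf("-u-");
        if (i < 0) {
            return keywords;
        }
        String[] parts = tag.substring(i + 3).split("-");
        for (int k = 0; k + 1 < parts.length; k += 2) {
            keywords.put(parts[k], parts[k + 1]);
        }
        return keywords;
    }

    /** Truncates the base locale from the right until an available bundle matches. */
    public static String lookup(String tag, Set<String> available) {
        String base = baseOf(tag);
        while (!base.isEmpty()) {
            if (available.contains(base)) {
                return base;
            }
            int dash = base.lastIndexOf('-');
            base = dash < 0 ? "" : base.substring(0, dash);
        }
        return null; // no bundle found; a real service would use a root bundle
    }

    public static void main(String[] args) {
        // Assumed: providers advertise base locales only.
        Set<String> available = new HashSet<>(Arrays.asList("th", "th-th"));
        String tag = "th-th-u-ca-foobar-nu-thai";
        // Bundle lookup ignores the extension entirely.
        System.out.println(lookup(tag, available));
        // A NumberFormat-like service then reads only the "nu" keyword.
        System.out.println(keywordsOf(tag).get("nu"));
    }
}
```

In this sketch the irrelevant calendar keyword cannot preempt the
numbers keyword, because neither takes part in the bundle lookup.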



