[loc-en-dev] Equality of base locale and LocaleServiceProvider implementation
Naoto Sato
Naoto.Sato at Sun.COM
Mon Mar 16 11:28:14 PDT 2009
Well, I understand the rationale for the LDML keywords. On the other
hand, BCP 47 specifies the lookup fallback as I described before
(RFC 4647, "3.4. Lookup"). Regarding the extensions, it reads:
---
Extensions and unrecognized private-use subtags might be unrelated to
a particular application of lookup. Since these subtags come at the
end of the subtag sequence, they are removed first during the
fallback process and usually pose no barrier to interoperability.
However, an implementation MAY remove these from ranges prior to
performing the lookup (provided the implementation also removes them
from the tags being compared). Such modification is internal to the
implementation and applications, protocols, or specifications SHOULD
NOT remove or modify subtags in content that they return or forward,
because this removes information that can be used elsewhere.
---
So this expects the extensions to be removed first when fallback
happens. I am not sure whether removing the subtags to their left
(variant, region, etc.) while keeping the extension appended would
conform to this specification. At the least, I think the currently
proposed fallback could confuse some developers.
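
For illustration only, the truncation order described in RFC 4647 section 3.4 can be sketched as below. The class and method names are hypothetical and not part of any proposal in this thread; the code only does the string truncation (drop the last subtag, then drop a single-character subtag if one is left at the end) and does not validate the tag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an existing API.
class LookupFallback {
    // Lists the tags tried by progressive truncation, most specific first.
    static List<String> candidates(String tag) {
        List<String> result = new ArrayList<>();
        String current = tag;
        while (!current.isEmpty()) {
            result.add(current);
            int cut = current.lastIndexOf('-');
            if (cut < 0) {
                break; // only the primary language subtag is left
            }
            current = current.substring(0, cut);
            // If truncation leaves a single-character subtag (a singleton
            // such as "u" or "x") at the end, remove it as well.
            if (current.length() >= 2 && current.charAt(current.length() - 2) == '-') {
                current = current.substring(0, current.length() - 2);
            }
        }
        return result;
    }
}
```

With the example tag from the RFC, "zh-Hant-CN-x-private1-private2", this yields the private-use subtags being removed first, then the region, then the script.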
Naoto
Doug Felt wrote:
> The way we've been thinking of it is that the extensions are an
> adjunct to the fields of the locale. Their order isn't important, so
> we canonicalize their order (the same is true for ldml keywords within
> the ldml extension).
>
> The model I have is as follows:
>
> The user passes a Locale to a service, which (usually) looks up a
> bundle, using the base fields language, script, region, and variant
> (including subfields of variant). Once a matching bundle is found,
> it's returned to the service. The appropriate extensions (in our
> particular case, the ldml extension) are then interpreted by the
> service when accessing the bundle-- which extensions are used depends
> on the service. There's no real hierarchical ordering to the ldml
> extensions, they're just different customizations that apply to
> whatever services care about them. This is different from
> language/script/region/variant where there is (generally) a useful
> hierarchical order.
>
> Part of the idea here is that by handling extensions separately, the
> service provider doesn't have to list them-- and if there's no
> canonical order, it needs to list all permutations. For example, you
> might have keywords for collation, calendar, and number that apply to
> several locales. If there were no canonical order you'd have to
> support all 16 permutations (zero, one, two, or three keywords, in any
> order) for each such locale. This is rather a lot to list in
> getAvailableLocales. And this doesn't even involve the values of the
> keywords.
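
As a quick check of the figure of 16 quoted above, here is a small sketch that counts the ordered selections of zero to n keywords out of n. The class name is hypothetical. For comparison, with a canonical order only the choice of keywords matters, which for three keywords is 2^3 = 8 combinations.

```java
// Hypothetical helper, only to verify the arithmetic.
class KeywordOrderings {
    // Sum over k = 0..n of n! / (n - k)!, the number of ways to pick
    // k keywords out of n when order matters.
    static long count(int n) {
        long total = 0;
        long arrangements = 1; // ordered selections of 0 keywords
        for (int k = 0; k <= n; k++) {
            total += arrangements;
            arrangements *= (n - k); // extend each selection by one more keyword
        }
        return total;
    }
}
```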
>
> Even if there is a canonical order, then simple fallback doesn't work
> for all services. Say NumberFormat is passed a locale with the
> extension "th-th-u-ca-foobar-nu-thai" (calendar = foobar, numbers =
> thai). There's no locale matching that so this falls back to
> "th-th-u-ca-foobar", tossing the numbers extension on the floor. The
> problem appears when there is a bundle "th-th-u-nu-thai": the
> (irrelevant, from NumberFormat's point of view) request for the foobar
> calendar preempts the (relevant) request for thai numbers, since it
> was canonicalized to a position earlier in the language tag. The exact
> opposite could happen for DateFormat with calendar = japanese and
> animal = foobar (as a hypothetical example).
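
The preemption described above can be reproduced with a small sketch. It assumes a hypothetical set of available bundle names and plain truncation fallback; the class and method names are made up for the illustration.

```java
import java.util.Set;

// Hypothetical helper, not an existing API.
class PreemptionDemo {
    // Returns the first truncated form of the tag found in 'available',
    // or null if nothing matches.
    static String lookup(String tag, Set<String> available) {
        String current = tag;
        while (true) {
            if (available.contains(current)) {
                return current;
            }
            int cut = current.lastIndexOf('-');
            if (cut < 0) {
                return null;
            }
            current = current.substring(0, cut);
            // Drop a single-character subtag (singleton) left at the end.
            if (current.length() >= 2 && current.charAt(current.length() - 2) == '-') {
                current = current.substring(0, current.length() - 2);
            }
        }
    }
}
```

If the available names are "th-TH-u-nu-thai" and "th-TH", a request for "th-TH-u-ca-foobar-nu-thai" ends at "th-TH": the "nu-thai" subtags are truncated before the "ca-foobar" ones, so the "th-TH-u-nu-thai" bundle is never tried.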
>
> This leads to each service having to manipulate the extensions before
> looking up the bundle, to keep irrelevant extensions out of the way.
> This means each service is potentially seeing an entirely different
> bundle for the 'same' locale. If data used by both services is
> different between the two bundles, this might show up as an unwanted
> and unexpected side effect.
>
> Considerations like these led us to want to perform lookup using only
> the base locale, and let the services make use of the extension data
> as they saw fit based on that same bundle.
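
A minimal sketch of that split, written against java.util.Locale methods that exist in later Java releases (forLanguageTag and getUnicodeLocaleType since Java 7, stripExtensions since Java 8). These postdate this thread, and the wrapper class below is hypothetical: it only shows the idea of looking up by base locale and letting the service read the keyword it cares about.

```java
import java.util.Locale;

// Hypothetical wrapper around real java.util.Locale calls.
class BaseLocaleLookup {
    // The locale used for bundle lookup: language, script, region and
    // variant only, with all extensions removed.
    static Locale baseOf(Locale requested) {
        return requested.stripExtensions();
    }

    // What a number-formatting service might read after the bundle is
    // found: the "nu" keyword value, or null when it is absent.
    static String numberingSystem(Locale requested) {
        return requested.getUnicodeLocaleType("nu");
    }
}
```

For a locale built with Locale.forLanguageTag("th-TH-u-ca-foobar-nu-thai"), the base locale is th-TH regardless of which keywords are present, and the number service still sees "thai" even though the calendar keyword sorts before it.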
>
> As for BCP47, it does have descriptions of how one might match against
> a preferred language list, and also says that particular
> implementations can perform lookup ignoring extensions. But this
> language from the spec is generally in the context of matching a
> preferred language list with wildcards, and so it's not clear how or
> if this applies to examining a partially ordered collection of locale
> resources. I tend to think it does not directly apply, and that the
> way we propose handling lookup is conformant. Mark of course may have
> a different opinion.
>
> Doug
>
> On Fri, Mar 13, 2009 at 1:11 PM, Naoto Sato <Naoto.Sato at sun.com>
> wrote:
>
> So are you specifically talking about LDML extensions? In BCP 47,
> it's one of the subtags and the BCP does not give any special
> semantics to it (because it does not know for what it would be
> used). So I thought the fallback would be:
>
> xx-yy-zz-ext
> xx-yy-zz
> xx-yy
> xx
>
>
> Thanks,
> Naoto
>
> Yoshito Umaoka wrote:
>
> Naoto Sato wrote:
>
> Umaoka-san,
>
> I don't think this is a compatibility issue, because the
> existing SPI implementations should still work compatibly
> with the locales without extensions. A possible issue would
> only arise with the new locales.
>
> BTW, the current SPI implementation invocation already
> involves fallback itself, i.e., say the request locale is
> xx_YY_foo_bar, and one SPI provider implements xx_YY, then
> that provider's service is used. So adding the extension
> fallback is not that ugly to me.
>
> Yes, I know the current fallback strategy.
> LDML extensions are designed for specifying optional behavior
> for a locale. Therefore, as we described in the very first
> proposal, extensions are carried in each level. More
> specifically, if a locale xx-yy-zzzz-u-cu-usd is requested,
> below is the candidate list.
>
> xx-yy-zzzz-u-cu-usd
> xx-yy-u-cu-usd
> xx-u-cu-usd
>
> If we need an "extensionless" version inserted, it becomes
>
> xx-yy-zzzz-u-cu-usd
> xx-yy-zzzz
> xx-yy-u-cu-usd
> xx-yy
> xx-u-cu-usd
> xx
>
> Don't you think it's somewhat ugly?
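
The two candidate lists above can be generated mechanically. The sketch below is only an illustration with hypothetical names: it takes the base tag and the extension as separate strings, carries the extension at each level, and optionally interleaves the extensionless form.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an existing API.
class CarriedExtensionCandidates {
    // base: for example "xx-yy-zzzz"; extension: for example "u-cu-usd".
    // includeBare: also list each base level without the extension.
    static List<String> candidates(String base, String extension, boolean includeBare) {
        List<String> result = new ArrayList<>();
        String current = base;
        while (true) {
            result.add(current + "-" + extension);
            if (includeBare) {
                result.add(current);
            }
            int cut = current.lastIndexOf('-');
            if (cut < 0) {
                break; // the language subtag was the last level
            }
            current = current.substring(0, cut);
        }
        return result;
    }
}
```

For the request xx-yy-zzzz-u-cu-usd this produces three candidates without the extensionless forms and six with them.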
>
> -Yoshito
>
>
>
> --
> Naoto Sato
>
>