[loc-en-dev] Equality of base locale and LocaleServiceProvider implementation

Naoto Sato Naoto.Sato at Sun.COM
Tue Mar 17 23:24:18 PDT 2009


The reason I included step 4 is for existing providers that don't know 
about extensions.  In this enhanced spec, they are regarded as 
advertising, say, xx-yy-zzzz-*, which should match whatever the 
extension is.  However, as Yoshito mentioned, if the provider 
implementation uses Locale.equals() to compare the requested locale with 
its advertised locales, it won't return true because of the extension.  
To avoid this, we need to call the provider with the 'xx-yy-zzzz' 
request locale.
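
A minimal sketch of the problem and of the step 3 / step 4 retry follows. 
It is only an illustration under stated assumptions: the class and method 
names are hypothetical, "de-DE" stands in for xx-yy, and it relies on 
Locale.forLanguageTag(), hasExtensions() and stripExtensions() from the 
later java.util.Locale API rather than on anything proposed in this 
thread.

```java
import java.util.Locale;

public class ExtensionFallbackSketch {

    // Hypothetical legacy provider: it advertises only de-DE and compares
    // with Locale.equals(), so it knows nothing about extensions.
    public static String legacyProvider(Locale requested) {
        Locale advertised = Locale.forLanguageTag("de-DE");
        return advertised.equals(requested) ? "de-DE data" : null;
    }

    // Steps 3 and 4: try the full request first, then retry the same
    // provider with the extensions stripped if it returned null.
    public static String invoke(Locale requested) {
        String result = legacyProvider(requested);                // step 3
        if (result == null && requested.hasExtensions()) {
            result = legacyProvider(requested.stripExtensions()); // step 4
        }
        return result;
    }

    public static void main(String[] args) {
        Locale requested = Locale.forLanguageTag("de-DE-u-co-phonebk");
        System.out.println(legacyProvider(requested)); // null
        System.out.println(invoke(requested));         // de-DE data
    }
}
```

Without the retry in step 4, the legacy provider would never be used for 
a request that carries an extension, even though it supports the base 
locale.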

Thanks,
Naoto

Yoshito Umaoka wrote:
> I think Naoto mentioned the interaction between Java and an LSP.  The 
> lookup is done by base locale, which is under Java's control.  Once 
> Java has found an LSP that supports the base locale, it calls a 
> service method with the extensions first.  If null is returned, it 
> then calls the same provider with the base locale only.  So 
> conceptually, lookup and service invocation are not a single fallback 
> chain.
>
> More specifically,
>
> 1. Requested locale is xx-yy-zzzz-ext
> 2. Java tries to locate a service claiming to support xx-yy-zzzz
> 3. Java invokes the service method with xx-yy-zzzz-ext
> 4. If 3 fails (null is returned), Java calls the same service 
> method with xx-yy-zzzz
> 5. If 4 fails (still returning null), Java tries xx-yy and 
> starts over with steps 2 to 4, then xx.
>
> One thing I'm not sure about (and it was my original question) is 
> whether we really need step 4 above.
>
> -Yoshito
>
> Doug Felt wrote:
>> I don't understand this.
>>
>> If providers only advertise their base locales, why is the extension 
>> involved in lookup at all?
>>
>> Doug
>>
>> On Tue, Mar 17, 2009 at 11:07 AM, Naoto Sato <Naoto.Sato at sun.com>
>> wrote:
>>
>>     If handling the extension is special and it's at our discretion
>>     how to deal with it, then I would think the following fallback is
>>     the most compatible (for the reason Yoshito mentioned) and still
>>     meets the requirement from the -u extension for LDML keywords,
>>     assuming that the providers only advertise their base locales.
>>
>>     xx-yy-zzzz-ext
>>     xx-yy-zzzz
>>     xx-yy-ext
>>     xx-yy
>>     xx-ext
>>     xx
>>
>>     Thanks,
>>     Naoto
>>
>>     Doug Felt wrote:
>>
>>         Well, perhaps Mark can clarify this passage for us.  Mark?
>>
>>         As you cite:
>>
>>         "However, an implementation MAY remove these [extensions and
>>         unrecognized private-use subtags] from ranges prior to
>>         performing the lookup, provided the implementation also
>>         removes them from the tags being compared"
>>
>>         This seems to me to allow us to compare only the initial
>>         fields when doing lookup, while still using the full locale
>>         after lookup has been completed.
>>
>>         Naoto, how would you propose dealing with the problems I cited?
>>
>>         Doug
>>
>>         On Mon, Mar 16, 2009 at 11:28 AM, Naoto Sato
>>         <Naoto.Sato at sun.com> wrote:
>>
>>            Well, I understand the rationale for the LDML keywords.
>>            However, on the other hand, BCP 47 specifies the lookup
>>            fallback as I described before (RFC4647, "3.4. Lookup").
>>            Regarding the extensions, it reads:
>>
>>            ---
>>
>>             Extensions and unrecognized private-use subtags might be
>>             unrelated to a particular application of lookup.  Since
>>             these subtags come at the end of the subtag sequence, they
>>             are removed first during the fallback process and usually
>>             pose no barrier to interoperability.  However, an
>>             implementation MAY remove these from ranges prior to
>>             performing the lookup (provided the implementation also
>>             removes them from the tags being compared).  Such
>>             modification is internal to the implementation and
>>             applications, protocols, or specifications SHOULD NOT
>>             remove or modify subtags in content that they return or
>>             forward, because this removes information that can be used
>>             elsewhere.
>>
>>            ---
>>
>>            So this expects that the extensions should first be removed
>>            when fallback happens.  I am not sure whether removing the
>>            left subtags (variant, region, etc.) and appending the
>>            extension would conform to this specification.  At least I
>>            think that the current proposed fallback could confuse some
>>            of the developers.
>>
>>            Naoto
>>
>>            Doug Felt wrote:
>>
>>                The way we've been thinking of it is that the extensions
>>                are an adjunct to the fields of the locale.  Their order
>>                isn't important, so we canonicalize their order (the
>>                same is true for ldml keywords within the ldml
>>                extension).
>>
>>                The model I have is as follows:
>>
>>                The user passes a Locale to a service, which (usually)
>>                looks up a bundle, using the base fields language,
>>                script, region, and variant (including subfields of
>>                variant).  Once a matching bundle is found, it's
>>                returned to the service.  The appropriate extensions (in
>>                our particular case, the ldml extension) are then
>>                interpreted by the service when accessing the bundle;
>>                which extensions are used depends on the service.
>>                There's no real hierarchical ordering to the ldml
>>                extensions, they're just different customizations that
>>                apply to whatever services care about them.  This is
>>                different from language/script/region/variant where
>>                there is (generally) a useful hierarchical order.
>>
>>                Part of the idea here is that by handling extensions
>>                separately, the service provider doesn't have to list
>>                them-- and if there's no canonical order, it needs to
>>                list all permutations.  For example, you might have
>>                keywords for collation, calendar, and number that apply
>>                to several locales.  If there were no canonical order
>>                you'd have to support all 16 permutations (zero, one,
>>                two, or three keywords, in any order) for each such
>>                locale.  This is rather a lot to list in
>>                getAvailableLocales.  And this doesn't even involve the
>>                values of the keywords.
>>
>>                Even if there is a canonical order, simple fallback
>>                doesn't work for all services.  Say NumberFormat is
>>                passed a locale with the extension
>>                "th-th-u-ca-foobar-nu-thai" (calendar = foobar, numbers
>>                = thai).  There's no locale matching that, so this falls
>>                back to "th-th-u-ca-foobar", tossing the numbers
>>                extension on the floor.  The problem appears when there
>>                is a bundle "th-th-u-nu-thai": the (irrelevant, from
>>                NumberFormat's point of view) request for the foobar
>>                calendar preempted the (relevant) request for thai
>>                numbers, since it was canonicalized to a position
>>                earlier in the language tag.  The exact opposite could
>>                happen for DateFormat with calendar = japanese and
>>                animal = foobar (as a hypothetical example).
>>
>>                This leads to each service having to manipulate the
>>                extensions before looking up the bundle, to keep
>>                irrelevant extensions out of the way.
>>                This means each service is potentially seeing an
>>                entirely different bundle for the 'same' locale.  If
>>                data used by both services is different between the two
>>                bundles, this might show up as an unwanted and
>>                unexpected side effect.
>>
>>                Considerations like these led us to want to perform
>>                lookup using only the base locale, and let the services
>>                make use of the extension data as they saw fit based on
>>                that same bundle.
>>
>>                As for BCP47, it does have descriptions of how one might
>>                match against a preferred language list, and also says
>>                that particular implementations can perform lookup
>>                ignoring extensions.  But this language from the spec is
>>                generally in the context of matching a preferred
>>                language list with wildcards, and so it's not clear how
>>                or if this applies to examining a partially ordered
>>                collection of locale resources.  I tend to think it does
>>                not directly apply, and that the way we propose handling
>>                lookup is conformant.  Mark of course may have a
>>                different opinion.
>>
>>                Doug
>>
>>                On Fri, Mar 13, 2009 at 1:11 PM, Naoto Sato
>>                <Naoto.Sato at sun.com> wrote:
>>
>>                   So are you specifically talking about LDML
>>                   extensions?  In BCP 47, it's one of the subtags and
>>                   the BCP does not give any special semantics to it
>>                   (because it does not know for what it would be
>>                   used).  So I thought the fallback would be:
>>
>>                   xx-yy-zz-ext
>>                   xx-yy-zz
>>                   xx-yy
>>                   xx
>>
>>
>>                   Thanks,
>>                   Naoto
>>
>>                   Yoshito Umaoka wrote:
>>
>>                       Naoto Sato wrote:
>>
>>                           Umaoka-san,
>>
>>                           I don't think this is a compatibility issue,
>>                           because the existing SPI implementations
>>                           should still work compatibly with locales
>>                           without extensions.  A possible issue would
>>                           only arise with the new locales.
>>
>>                           BTW, the current SPI implementation
>>                           invocation already involves fallback itself;
>>                           i.e., say the request locale is
>>                           xx_YY_foo_bar and one SPI provider
>>                           implements xx_YY, then that provider's
>>                           service is used.  So adding the extension
>>                           fallback is not that ugly to me.
>>
>>                       Yes, I know the current fallback strategy.
>>                       LDML extensions are designed for specifying
>>                       optional behavior for a locale.  Therefore, as we
>>                       described in the very first proposal, extensions
>>                       are carried in each level.  More specifically, if
>>                       a locale xx-yy-zzzz-u-cu-usd is requested, below
>>                       is the candidate list.
>>
>>                       xx-yy-zzzz-u-cu-usd
>>                       xx-yy-u-cu-usd
>>                       xx-u-cu-usd
>>
>>                       If we need an "extensionless" version inserted,
>>                       it becomes
>>
>>                       xx-yy-zzzz-u-cu-usd
>>                       xx-yy-zzzz
>>                       xx-yy-u-cu-usd
>>                       xx-yy
>>                       xx-u-cu-usd
>>                       xx
>>
>>                       Don't you think it's somewhat ugly?
>>
>>                       -Yoshito
>>
>>
>>
>>                   --    Naoto Sato
>>
>>
>>
>>
>>
>>
>



