[loc-en-dev] [Fwd: Re: About Locale#toString()]

Yoshito Umaoka y.umaoka at gmail.com
Thu Feb 26 09:19:24 PST 2009


BCP47 allows multiple variants to be used at the same time, and there are 
actually some valid use cases.  For example,

sl-si-1994-rozaj-solba

is a valid language tag (the variants 1994, rozaj and solba are registered 
in the IANA registry) and it actually makes sense.

According to Mark Davis, the IANA registry data does not force the order 
of variant subtags, and the following language tags are semantically 
equivalent to the example above:

sl-si-1994-solba-rozaj
sl-si-rozaj-1994-solba
sl-si-rozaj-solba-1994
sl-si-solba-1994-rozaj
sl-si-solba-rozaj-1994

For matching, however, the order of variants matters.  I cannot find any 
description of a canonical ordering for multiple variants in 
RFC4646bis-20.  But when I asked Mark about this, he said that the 
canonical representation should be in natural alphabetical order, 
that is, 1994-rozaj-solba.  (Mark, is this correct?)
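
To make the ordering question concrete, here is a minimal Java sketch of 
the alphabetical ordering described above.  It assumes the caller has 
already isolated the variant subtags.  The class and method names are 
hypothetical, and the ordering rule itself is only what Mark suggested, 
not something I found in the draft.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of any proposed API.
final class VariantOrder {
    /**
     * Joins the given variant subtags with '-' after lowercasing them and
     * sorting them in natural (String.compareTo) order. Digits sort before
     * letters, so "1994" comes ahead of "rozaj" and "solba".
     */
    static String canonicalVariants(String... variants) {
        List<String> sorted = new ArrayList<>();
        for (String v : variants) {
            sorted.add(v.toLowerCase(Locale.ROOT));
        }
        Collections.sort(sorted);
        return String.join("-", sorted);
    }

    public static void main(String[] args) {
        // Any permutation of the three variants yields the same string.
        System.out.println(canonicalVariants("solba", "rozaj", "1994"));
        System.out.println(canonicalVariants("rozaj", "1994", "solba"));
    }
}
```

With this rule, all six orderings listed above collapse to the same 
variant sequence, 1994-rozaj-solba.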

I think locale inheritance within variants should stay compatible with the 
current JDK behavior; that is also consistent with the language tag 
matching part of BCP47.

-Yoshito


-------- Original Message --------
Subject: Re: [loc-en-dev] About Locale#toString()
Date: Wed, 18 Feb 2009 15:50:01 -0500
From: Yoshito Umaoka <y.umaoka at gmail.com>
To: Doug Felt <dougfelt at google.com>
CC: locale-enhancement-dev at openjdk.java.net
References: <499C3F36.3090906 at gmail.com> 
<146f39a80902181131u2784864cw8333331e39a95445 at mail.gmail.com>

Thank you for pointing out the problem.  I mixed up the ICU
implementation with the JDK.  You're right, and we do not need to support
inheritance within the variant component.

Actually, this opens up another question.  BCP47 itself supports
multiple variant values by its syntax definition.  The IANA language
subtag registry contains the following variants:

1606nict
1694acad
1901
1959acad
1994
1996
arevela
arevmda
baku1926
biske
boont
fonipa
fonupa
kkcor
lipaw
monoton
nedis
njiva
osojs
pinyin
polyton
rozaj
scotland
scouse
solba
tarask
uccor
ucrcor
valencia
wadegile

These values are actually constrained by a prefix.  For example,

%%
Type: variant
Subtag: scotland
Description: Scottish Standard English
Added: 2007-08-31
Prefix: en
%%
Type: variant
Subtag: scouse
Description: Scouse
Added: 2006-09-18
Prefix: en
Comments: English Liverpudlian dialect known as 'Scouse'


So "en-scotland" is a valid language tag, "en-scouse" is also a valid
language tag, but I'm not sure about "en-scotland-scouse" or
"en-scouse-scotland".  Practically, such a combination does not make sense,
but I could not find any description that explains why these are invalid.
If these are valid, the language range "en-scotland" could match
"en-scotland-scouse" by the RFC4647 part of BCP47.
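
As an illustration, here is a minimal Java sketch of RFC4647 basic
filtering: a range matches a tag if, compared case-insensitively, it
equals the tag or equals a prefix of the tag that is immediately followed
by '-'.  The class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper illustrating RFC 4647 basic filtering.
final class BasicFilter {
    /** Returns true if the language range matches the language tag. */
    static boolean matches(String range, String tag) {
        String r = range.toLowerCase(Locale.ROOT);
        String t = tag.toLowerCase(Locale.ROOT);
        if (r.equals("*")) {
            return true; // the wildcard range matches every tag
        }
        // Exact match, or a prefix that ends on a subtag boundary.
        return t.equals(r) || t.startsWith(r + "-");
    }

    public static void main(String[] args) {
        System.out.println(matches("en-scotland", "en-scotland-scouse")); // true
        System.out.println(matches("en-scotland", "en-scouse-scotland")); // false
    }
}
```

Because this is prefix matching, the order of the variants in the tag
decides whether a range such as "en-scotland" matches it.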

Anyway, I cannot imagine any practical language tag that has multiple
IANA registered variants, so I think it's probably OK to process the
variant as a single field and not support inheritance within a variant.

I'll check with the LTRU folks whether multiple variant values are
currently allowed.

-Yoshito

Doug Felt wrote:
> What is the motivation for treating the variant field 
> underscore-by-underscore rather than as an entire unit?
> 
> Doug
> 
> On Wed, Feb 18, 2009 at 9:02 AM, Yoshito Umaoka <y.umaoka at gmail.com 
> <mailto:y.umaoka at gmail.com>> wrote:
> 
>     In the bi-weekly project call, we agreed not to change the behavior
>     of toString().  This implies that you won't get any new field
>     information (such as script and extensions) returned by toString().
> 
>     In the current proposed API set, we have toLanguageTag(), which
>     returns syntactically valid BCP47 language tag string.  However,
>     subtags in a BCP47 language tag is delimited by hyphen('-') instead
>     of underscore('_').  One of the goals in this project is to include
>     script field value involved in the resource bundle lookup
>     inheritance. Therefore, I would like to have a method creating a
>     locale string delimited by underscore, which can be used for
>     resource bundle suffix. (Technically, this can be achieved by
>     composing the string by appending getLanguage(), getScript()...)
>      This is a common operation and I think it is worth having such API.
> 
>     I'm considering following three APIs for the purpose.
> 
>     String toFullString()
>     Locale getBaseLocale()
>     Locale getParent()
> 
> 
>     toFullString() is a variant of toString() to generate a string
>     representation of Locale, but also include script and extensions if
>     they are available.
> 
>     getBaseLocale() returns a Locale (proposed implementation is to
>     return a singleton) without locale extensions.  Locale extensions is
>     not used for resource bundle lookup.
> 
>     getParent() returns a parent Locale (proposed implementation is to
>     return a singleton).  A parent locale represent a locale omitting
>     the most right field of its child locale.  For example, Locale("en")
>     is a parent locale of Locale("en", "US").  If a locale has a variant
>     field and the variant field contains one or more underscore
>     characters, then its parent still have variant field, but excluding
>     the substring after the last underscore.  For example, Locale("en",
>     "US", "NYC") is a parent locale of Locale("en", "US", "NYC_JFK")
> 
>     With these 3 APIs, the resource bundle is collecting key-value pairs
>     with the pseudo code below -
> 
>     Locale target; // the resolved Locale
>     Locale loc = target;
>     ResourceBundleImpl child = null;
>     while (true) {
>        ResourceBundleImpl aBundle = loadFrom(bundleBaseName + "_" +
>            loc.getBaseLocale().toFullString());
>        if (child != null) {
>            child.parent = aBundle;
>        }
>        loc = loc.getParent();
>        if (loc == null) {
>            // Locale.ROOT.getParent() returns null
>            break;
>        }
>        child = aBundle;
>     }
> 
>     Do you think we should have such APIs?  Also, if you do, do you want
>     to make them public or keep them package local/private?
> 
>     -Yoshito
> 
> 
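
For reference, the getParent() rule described in the quoted proposal can
be sketched in Java as below.  This is only an illustration under
simplifying assumptions: it works on plain language/country/variant
strings, ignores script and extensions, and ParentChain and chain() are
hypothetical names, not the proposed API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the parent chain used for bundle lookup.
final class ParentChain {
    /**
     * Returns the bundle suffixes from the most specific locale down to
     * the root (empty string). The variant is shortened at its last
     * underscore on each step, then dropped, then country, then language.
     */
    static List<String> chain(String language, String country, String variant) {
        List<String> out = new ArrayList<>();
        String v = variant;
        while (!v.isEmpty()) {
            out.add(language + "_" + country + "_" + v);
            int i = v.lastIndexOf('_');
            v = (i < 0) ? "" : v.substring(0, i); // cut after the last '_'
        }
        if (!country.isEmpty()) {
            out.add(language + "_" + country);
        }
        if (!language.isEmpty()) {
            out.add(language);
        }
        out.add(""); // root locale
        return out;
    }

    public static void main(String[] args) {
        // Prints [en_US_NYC_JFK, en_US_NYC, en_US, en, ]
        System.out.println(chain("en", "US", "NYC_JFK"));
    }
}
```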




