From y.umaoka at gmail.com Wed Jan 26 13:08:05 2011
From: y.umaoka at gmail.com (Yoshito Umaoka)
Date: Wed, 26 Jan 2011 16:08:05 -0500
Subject: [loc-en-dev] toLanguageTag problem
Message-ID: <4D408D35.9030005@gmail.com>

Hi all,

I found a problem in Locale#toLanguageTag(). When an instance of Locale
has no language, toLanguageTag() supplies "und" as the language subtag.
This is required by the "langtag" production of a BCP 47 language tag:

 langtag       = language
                 ["-" script]
                 ["-" region]
                 *("-" variant)
                 *("-" extension)
                 ["-" privateuse]

However, "und" is not necessary when the Locale has only a private use
value, because a BCP 47 language tag can consist of private use subtags
alone:

 Language-Tag  = langtag             ; normal language tags
               / privateuse          ; private use tag
               / grandfathered       ; grandfathered tags

For example,

 Locale.forLanguageTag("x-elmer").toLanguageTag()

This currently returns "und-x-elmer", but it should return "x-elmer".
I'm going to file a bug for this issue.

-Yoshito

From y.umaoka at gmail.com Wed Jan 26 18:40:56 2011
From: y.umaoka at gmail.com (Yoshito Umaoka)
Date: Wed, 26 Jan 2011 21:40:56 -0500
Subject: [loc-en-dev] Fwd: toLanguageTag problem
Message-ID: <4D40DB38.8040109@gmail.com>

Hmm, my original message seems to have gone missing. Resending.

-------- Original Message --------
Subject: toLanguageTag problem
Date: Wed, 26 Jan 2011 16:08:05 -0500
From: Yoshito Umaoka
To: locale-enhancement-dev at openjdk.java.net

From staudacher at google.com Wed Jan 26 19:51:23 2011
From: staudacher at google.com (Andy Staudacher)
Date: Wed, 26 Jan 2011 19:51:23 -0800
Subject: [loc-en-dev] Fwd: toLanguageTag problem
In-Reply-To: <4D40DB38.8040109@gmail.com>
References: <4D40DB38.8040109@gmail.com>
Message-ID:

+1

Thanks!

From y.umaoka at gmail.com Thu Jan 27 18:35:09 2011
From: y.umaoka at gmail.com (Yoshito Umaoka)
Date: Thu, 27 Jan 2011 21:35:09 -0500
Subject: [loc-en-dev] A bug was filed - toLanguageTag problem
Message-ID: <4D422B5D.3000708@gmail.com>

I submitted a bug, 7015500:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7015500

It may take a while for the bug to show up at the above link.

-Yoshito

From y.umaoka at gmail.com Thu Jan 27 18:47:22 2011
From: y.umaoka at gmail.com (Yoshito Umaoka)
Date: Thu, 27 Jan 2011 21:47:22 -0500
Subject: [loc-en-dev] Locale.toString() with language and script/extension without country/variant
Message-ID: <4D422E3A.1000109@gmail.com>

While testing private use tags, I discovered one more issue in
Locale.toString().
Our design goal was to put script and extensions after the variant, so
that they look like part of the variant to older Java releases. However,
this does not work well when a Locale has a language and
script/extensions but no country or variant. For example:

 Locale.forLanguageTag("en-Latn").toString()
   -> Expected: "en__#Latn" / Actual: "en_#Latn"

 Locale.forLanguageTag("en-x-123").toString()
   -> Expected: "en__#x-123" / Actual: "en_#x-123"

The current behavior may confuse old Java programs that assume the
second field separated by "_" is the country. I think we should also fix
this problem before the JDK 7 final release.

-Yoshito
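
To make the expected layout concrete, here is a minimal sketch of how the
string could be assembled so that the second "_"-separated field is always
the country, even when it is empty. The class LegacyLocaleString and its
format() method are hypothetical names used only for illustration; this is
not the JDK implementation. It assumes a non-empty language, and it assumes
that extensions are introduced by "_#" when there is no script and by a
plain "_" when a script is already present.

```java
// Illustrative sketch only: LegacyLocaleString and format() are hypothetical
// names and are not part of java.util.Locale.
final class LegacyLocaleString {

    // Builds language_country_variant_#script_extensions, keeping an empty
    // country field whenever a variant, script, or extension follows.
    // Assumes language is non-empty; all other fields may be "".
    static String format(String language, String country, String variant,
                         String script, String extensions) {
        boolean hasTail = !variant.isEmpty() || !script.isEmpty()
                || !extensions.isEmpty();
        StringBuilder sb = new StringBuilder(language);

        // Emit the country separator even for an empty country, so that
        // whatever follows never lands in the country position.
        if (!country.isEmpty() || hasTail) {
            sb.append('_').append(country);
        }
        if (!variant.isEmpty()) {
            sb.append('_').append(variant);
        }
        if (!script.isEmpty()) {
            sb.append("_#").append(script);
        }
        if (!extensions.isEmpty()) {
            sb.append('_');
            if (script.isEmpty()) {
                // Assumption: '#' marks the start of the script/extension part.
                sb.append('#');
            }
            sb.append(extensions);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("en", "", "", "Latn", ""));   // en__#Latn
        System.out.println(format("en", "", "", "", "x-123"));  // en__#x-123
        System.out.println(format("en", "US", "", "", ""));     // en_US
    }
}
```

With this layout, a program that splits on "_" and reads the second field
as the country sees an empty string for "en__#Latn" rather than "#Latn".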