From yoshito_umaoka at us.ibm.com Wed Jun 2 10:08:11 2010
From: yoshito_umaoka at us.ibm.com (yoshito_umaoka at us.ibm.com)
Date: Wed, 2 Jun 2010 13:08:11 -0400
Subject: [loc-en-dev] Java Locale enhancement - Random notes on the webrev & design specification
In-Reply-To:
Message-ID:

It returns "" (empty string).

I understand you may want to treat a standalone privateuse tag as a language, but I think that is somewhat inconsistent and does not fit well with the design of Locale.

BTW, we have a mailing list for such discussion. Please subscribe to locale-enhancement-dev at openjdk.java.net and discuss design problems/questions on the mailing list.

http://mail.openjdk.java.net/mailman/listinfo/locale-enhancement-dev

-Yoshito

Markus Scherer wrote on 06/02/2010 12:58:03 PM:

> On Wed, Jun 2, 2010 at 9:41 AM, Mark Davis wrote:
> As to the second, you can have
>
> x-markus
> or
> en-x-markus
> or
> en-Arab-US-u-co-foo-x-markus
>
> That is, the x-markus can be anywhere, and unlike the extensions
> doesn't have to have a base language. However, it is most like the
> extension fields, and for simplicity in the API, makes sense to
> treat it like one.
>
> Hm, ok. What does getLanguage() return for "x-markus"?
>
> markus

From staudacher at google.com Thu Jun 10 13:40:46 2010
From: staudacher at google.com (Andy Staudacher)
Date: Thu, 10 Jun 2010 13:40:46 -0700
Subject: [loc-en-dev] Review of current implementation and the design spec
Message-ID:

I've reviewed the implementation and the design specification with two goals:
a) identify what parts of the design specification document need to be updated, and
b) assess the status and quality of the proposal as someone who hasn't been involved in the evolution of the proposal until now.

*Verdict:*

1. *The proposal looks good*. The implementation has a few minor areas that need to be addressed (no public API changes needed), and as already known, the design specification document needs a major overhaul.
2.
   We already knew that the *design specification document* needs to be updated. My feedback identifies the sections that need changes, and proposes structural changes to the document for clarity and readability.
3. As for the *implementation*, the two main areas that need some work are *canonicalization and resource lookup*. E.g. the latter hasn't been implemented yet (and luckily should be straightforward to implement).
4. The feedback includes a laundry list of issues and suggestions.

The *review notes* can be found at:
http://docs.google.com/View?id=dg83b6q2_26cjm427hq

If you can't view the document, please let me know. If you'd like to add comments to the document itself, please send me the email address for which I should add edit access.

I hope the notes help in your work to update the design specification document, and please let me know where I could help on addressing any issues.

Thanks,
- Andy
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/locale-enhancement-dev/attachments/20100610/429e4e90/attachment.html

From y.umaoka at gmail.com Tue Jun 22 08:45:02 2010
From: y.umaoka at gmail.com (Yoshito Umaoka)
Date: Tue, 22 Jun 2010 11:45:02 -0400
Subject: [loc-en-dev] Resume the project calls starting next week
Message-ID: <4C20DA7E.2080908@gmail.com>

Hi all,

It looks like the proposal got the "go" sign from the project committee. We need to finalize the APIs and get the implementation ready for review. I'm planning to resume the project call starting next week - this time, we need to wrap things up quickly - so I want to have the call weekly, for at least the next 3-4 weeks.

Because Okutsu-san is in Japan, we probably need to schedule the call early morning or evening here. I created a Doodle poll for the meeting time:

http://www.doodle.com/3mrdxxky68n3ta2c

Please post your availability there by the end of Thursday 6/24 PT.
I'll look at everyone's availability on Friday morning and send the meeting invitation. This is the poll for the first call, but we want to use the same time slot for at least the next 2 or 3 weeks.

Thanks,
Yoshito

From y.umaoka at gmail.com Fri Jun 25 08:44:13 2010
From: y.umaoka at gmail.com (Yoshito Umaoka)
Date: Fri, 25 Jun 2010 15:44:13 +0000
Subject: [loc-en-dev] Invitation: OpenJDK Locale Enhancement Project Status Call @ Every 4 weeks from 8pm to 9pm on Monday (locale-enhancement-dev@openjdk.java.net)
Message-ID: <0016e68eeae6898c2e0489dca4e2@google.com>

You have been invited to the following event.

Title: OpenJDK Locale Enhancement Project Status Call

Call-in# information
Passcode#662122
US 877-421-0033
Japan 00531-11-3180 (KDD)
0066-33-801263 (Cable & Wireless)
0044-22-112668 (Softbank Telecom)
0034-800-900155 (NTT)

When: Every 4 weeks from 8pm to 9pm on Monday Eastern Time
Where: Teleconference
Calendar: locale-enhancement-dev at openjdk.java.net
Who:
    * Yoshito Umaoka - organizer
    * locale-enhancement-dev at openjdk.java.net

Event details: https://www.google.com/calendar/event?action=VIEW&eid=dmJkYmY4bHFwNXA3M2drNDRtNHNzYmJlY2MgbG9jYWxlLWVuaGFuY2VtZW50LWRldkBvcGVuamRrLmphdmEubmV0&tok=MTgjeS51bWFva2FAZ21haWwuY29tZWRmYzFmMjNkYjBlYzAwYTQ3YjY4ZGNiMTZlOGEwZTYyNjM1MzNiYQ&ctz=America%2FNew_York&hl=en

Invitation from Google Calendar: https://www.google.com/calendar/

You are receiving this courtesy email at the account locale-enhancement-dev at openjdk.java.net because you are an attendee of this event. To stop receiving future notifications for this event, decline this event. Alternatively you can sign up for a Google account at https://www.google.com/calendar/ and control your notification settings for your entire calendar.

From y.umaoka at gmail.com Fri Jun 25 08:45:23 2010
From: y.umaoka at gmail.com (Yoshito Umaoka)
Date: Fri, 25 Jun 2010 15:45:23 +0000
Subject: [loc-en-dev] Updated Invitation: OpenJDK Locale Enhancement Project Status Call @ Weekly from 8pm to 9pm on Monday from Mon Jun 28 to Mon Jul 19 (locale-enhancement-dev@openjdk.java.net)
Message-ID: <0050450160b1b7eaeb0489dca896@google.com>

This event has been changed.

Title: OpenJDK Locale Enhancement Project Status Call

Call-in# information
Passcode#662122
US 877-421-0033
Japan 00531-11-3180 (KDD)
0066-33-801263 (Cable & Wireless)
0044-22-112668 (Softbank Telecom)
0034-800-900155 (NTT)

When: Weekly from 8pm to 9pm on Monday from Mon Jun 28 to Mon Jul 19 Eastern Time (changed)
Where: Teleconference
Calendar: locale-enhancement-dev at openjdk.java.net
Who:
    * y.umaoka at gmail.com - organizer
    * locale-enhancement-dev at openjdk.java.net

Event details: https://www.google.com/calendar/event?action=VIEW&eid=dmJkYmY4bHFwNXA3M2drNDRtNHNzYmJlY2MgbG9jYWxlLWVuaGFuY2VtZW50LWRldkBvcGVuamRrLmphdmEubmV0&tok=MTgjeS51bWFva2FAZ21haWwuY29tZWRmYzFmMjNkYjBlYzAwYTQ3YjY4ZGNiMTZlOGEwZTYyNjM1MzNiYQ&ctz=America%2FNew_York&hl=en

From staudacher at google.com Sun Jun 27 20:48:56 2010
From: staudacher at google.com (Andy Staudacher)
Date: Sun, 27 Jun 2010 20:48:56 -0700
Subject: [loc-en-dev] Meeting Jan-26 - Agenda items
Message-ID:

I'm looking forward to this week's meeting. Here are a few agenda items that I'd like to discuss:

- Status updates
- Details on go-ahead from Oracle/Sun?
- Status update on Resource lookup implementation?
- Design spec updates summary?
- Roadmap / deadlines?
- Status of test suite?
- Andy would like to write more tests. Are there specific plans already?
- Docs / outreach:
- Need to update http://sites.google.com/site/openjdklocale/
- Remove "Proposed API changes" section, add link to webrev
- Need to update http://openjdk.java.net/projects/locale-enhancement/
- Get status from Doug. He's waiting for instructions from Naoto.
- Update webrev
- And use short-URL for webrev (version-independent)

Best,
- Andy
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/locale-enhancement-dev/attachments/20100627/54780f0a/attachment.html

From y.umaoka at gmail.com Mon Jun 28 18:28:42 2010
From: y.umaoka at gmail.com (Yoshito Umaoka)
Date: Tue, 29 Jun 2010 01:28:42 +0000
Subject: [loc-en-dev] Updated Invitation: OpenJDK Locale Enhancement Project Status Call @ Fri Jul 2 5pm - 6:30pm (locale-enhancement-dev@openjdk.java.net)
Message-ID: <0016e6d26d2a5bbc17048a21281e@google.com>

This event has been changed.

Title: OpenJDK Locale Enhancement Project Status Call

Call-in# information
Passcode#662122
US 877-421-0033
Japan 00531-11-3180 (KDD)
0066-33-801263 (Cable & Wireless)
0044-22-112668 (Softbank Telecom)
0034-800-900155 (NTT)

When: Fri Jul 2 5pm -
6:30pm Eastern Time (changed)
Where: Teleconference
Calendar: locale-enhancement-dev at openjdk.java.net
Who:
    * y.umaoka at gmail.com - organizer
    * locale-enhancement-dev at openjdk.java.net

Event details: https://www.google.com/calendar/event?action=VIEW&eid=dmJkYmY4bHFwNXA3M2drNDRtNHNzYmJlY2NfMjAxMDA3MDZUMDAwMDAwWiBsb2NhbGUtZW5oYW5jZW1lbnQtZGV2QG9wZW5qZGsuamF2YS5uZXQ&tok=MTgjeS51bWFva2FAZ21haWwuY29tODEzODQ5MzcxYzRhOGI0Y2U0ZGI3M2FmNzc1YTJkYTY3ZDU4YjZjMA&ctz=America%2FNew_York&hl=en

From masayoshi.okutsu at oracle.com Fri Jun 25 01:58:54 2010
From: masayoshi.okutsu at oracle.com (Masayoshi Okutsu)
Date: Fri, 25 Jun 2010 17:58:54 +0900
Subject: [loc-en-dev] Resume the project calls starting next week
In-Reply-To: <4C20DA7E.2080908@gmail.com>
References: <4C20DA7E.2080908@gmail.com>
Message-ID: <4C246FCE.2020101@oracle.com>

Unfortunately next week doesn't work for me due to mandatory courses and other commitment. I've added myself to the Poll anyway.

Thanks,
Masayoshi

On 6/23/2010 12:45 AM, Yoshito Umaoka wrote:
> Hi all,
>
> It looks the proposal got "go" sign from the project committee. We
> need to make the APIs finalized and the implementation ready for
> review. I'm planning to resume the project call starting next week -
> for this time, we need to wrap things up quickly - so I want to have
> the call weekly, at least next 3-4 weeks.
>
> Because Okutsu-san is in Japan, we probably need to schedule the call
> early morning or evening here. I created Doodle Poll for the meeting
> time:
>
> http://www.doodle.com/3mrdxxky68n3ta2c
>
> Please post your availability there by the end of Thursday 6/24 PT.
> I'll look everyone's availability on Friday morning and send the
> meeting invitation. This is the poll for the first call, but we want
> to use the same time slot for at least next 2 or 3 weeks.
>
> Thanks,
> Yoshito

From y.umaoka at gmail.com Wed Jun 30 10:16:40 2010
From: y.umaoka at gmail.com (Yoshito Umaoka)
Date: Wed, 30 Jun 2010 13:16:40 -0400
Subject: [loc-en-dev] June 28 Project Call Minutes
Message-ID: <4C2B7BF8.1030503@gmail.com>

I briefly captured what we discussed on the meeting agenda page -
http://sites.google.com/site/openjdklocale/meeting-agenda

We reviewed the proposed Builder APIs and made some minor changes:

- Accept both '_' and '-' as subtag separator in all APIs
- Retract "isLenientVariant" behavior from Builder

I added some comments captured during the call ->
http://sites.google.com/site/openjdklocale/design-notes/builder

We need to decide if we really want to add APIs for accessing the Unicode Locale Extension (-u-). In that case, we probably need some changes in the proposed API to handle attributes/typeless keywords. We'll discuss this on this ML and make the final decision at the next meeting on July 2nd, 2010.

Also, we'll go through the Locale class API changes in the next call and finalize them.

-Yoshito

From y.umaoka at gmail.com Wed Jun 30 10:51:28 2010
From: y.umaoka at gmail.com (Yoshito Umaoka)
Date: Wed, 30 Jun 2010 13:51:28 -0400
Subject: [loc-en-dev] -u- extension vs. other extensions
Message-ID: <4C2B8420.50709@gmail.com>

Hi all,

We agreed that we validate the syntax of subtags, but do not validate the codes themselves in Java. In other words, the proposed implementation won't reject the language subtag "xx", although the use of such a code is not valid in a BCP 47 language tag.

In BCP47, extension is defined as:

  extension = singleton 1*("-" (2*8alphanum))

When the previous proposal was written last year, the Unicode locale extension ('u' extension) only allowed key/type subtag pairs. In BNF,

  unicode_locale_extensions = sep "u" 1*(sep keyword)
  keyword = key sep type
  key = 2alphanum
  type = 3*8alphanum

This requires special syntax validation for the 'u' extension. For example,

1. extension "a-abc-de" is syntactically valid
2.
   extension "u-abc-de" is syntactically invalid, because it does not satisfy the requirements for the 'u' extension (a key (2alphanum) must follow right after the singleton, and a key must have its type (3*8alphanum)).

The 'u' extension was updated in the final spec as below:

  unicode_locale_extensions = sep "u" ( 1*(sep keyword) / 1*(sep attribute) *(sep keyword) )
  keyword = key [sep type]
  key = 2alphanum
  type = 3*8alphanum *(sep 3*8alphanum)
  attribute = 3*8alphanum

This change -

1. subtags in the form of 3*8alphanum before the first occurrence of a key (2alphanum) are interpreted as attributes,
2. a key subtag might not be followed by a type,
3. a type might be represented by multiple subtags in the form of 3*8alphanum

- actually eliminates the special syntax requirements for the 'u' extension. With the updated specification, extension subtags satisfying the BCP47 extension syntax also satisfy the 'u' extension syntax. For example, "u-abc-de" is interpreted as attribute "abc" and typeless key "de". (Note that this specific tag is illegal because "abc" is not a registered attribute and "de" is not a known key value.)

With this change, we do not need any special coding for handling the 'u' extension in the API - Builder#setExtension. This also means that we do not need a special implementation dedicated to the 'u' extension even if we do not add the Unicode locale extension APIs (such as Builder#setUnicodeLocaleKeyword).

-Yoshito

From y.umaoka at gmail.com Wed Jun 30 13:30:24 2010
From: y.umaoka at gmail.com (Yoshito Umaoka)
Date: Wed, 30 Jun 2010 16:30:24 -0400
Subject: [loc-en-dev] -u- extension API - necessary updates?
Message-ID: <4C2BA960.3040306@gmail.com>

In the Locale Enhancement repository, we have the following proposed APIs supporting the -u- extension:

In java.util.Locale

  public Set getUnicodeLocaleKeys()
  public String getUnicodeLocaleType(String key)

In java.util.Locale.Builder

  public Builder setUnicodeLocaleKeyword(String key, String type)

The following Unicode locale extension features were not in our scope last year.

1. type represented by multiple subtags
2. key without type
3. attribute

For supporting 1, it looks like we do not need any changes in the proposal. A Unicode locale extension keyword may have a type represented by multiple subtags. For example, "en-u-vt-0061-0065" is a valid example defined by the current LDML specification (See http://www.unicode.org/reports/tr35/#Locale_Extension_Key_and_Type_Data).

However, this does not mean that a keyword may have multiple types. In this example, 0061 and 0065 are not two different types - instead "0061-0065" is a type. Thus, getUnicodeLocaleType("vt") can simply return "0061-0065". To set the type using Builder, setUnicodeLocaleKeyword("vt", "0061-0065") is sufficient.

For supporting 2, there is a minor conflict with the current proposal. Assume we have a Locale represented by the pseudo language tag "en-u-aa-bb-ccc". getUnicodeLocaleKeys() will return a set containing "aa" and "bb". getUnicodeLocaleType(String key) currently returns null when the input key is not available, and it returns a non-empty type string when the key is available. We could use the empty string "" to represent a typeless keyword - that is, getUnicodeLocaleType("aa") would return "" in this example.

The remaining question is the Builder API - setUnicodeLocaleKeyword(String key, String type). For now, an empty string type indicates that the keyword itself is removed from the current state, and a null type throws NPE. We could change the API to use null for deletion instead of the empty string.
For example, if a Builder internally represents "en-u-aa-bb-ccc", setUnicodeLocaleKeyword("aa", null) will remove the typeless keyword "aa" - and the internal representation will be changed to "en-u-bb-ccc" after the call. Also, setUnicodeLocaleKeyword("dd", "") will append a typeless keyword "dd" to the internal state (that is, "en-u-aa-bb-ccc-dd").

Note that setXXX with an empty string removes a field from Builder in the current design. If we really want to change the semantics of empty string and null in the API setUnicodeLocaleKeyword, a consistent policy should be applied to the others (for example, setLanguage(null) to remove the language field, instead of setLanguage("")).

For supporting 3, we could treat an attribute as a keyless keyword. But it makes getUnicodeLocaleKeys()/getUnicodeLocaleType(String key) a little bit awkward. Technically, we can still design them that way (getUnicodeLocaleKeys() to include an empty string in the returned set / getUnicodeLocaleType("") to return attribute subtags). I think adding an extra API dedicated to attributes is cleaner.

  public Set getUnicodeLocaleAttributes()

The same idea is applicable to Builder. APIs dedicated to adding/removing a Unicode locale attribute, like those below, may be added:

  public Builder addUnicodeLocaleAttribute(String attribute)
  public Builder removeUnicodeLocaleAttribute(String attribute)

Another possibility is to set multiple attributes as a whole.

  public Builder setUnicodeLocaleAttribute(String attributes)

For example, setting attribute "abc" and "def", setUnicodeLocaleAttributes("abc-def"). If we go for this approach, we do not need a "remove" method. A tricky part is that the order of attributes does not matter. So, semantically, "abc-def" and "def-abc" are the same. Since we do not want to introduce unnecessary variations, we should clearly state that the order of attributes is not preserved.

Another question related to this - Set vs. List. Currently, getUnicodeLocaleKeys() returns Set (actually, unmodifiable set).
Semantically, the order of keywords does not matter. "u-ca-japanese-cu-jpy" is equivalent to "u-cu-jpy-ca-japanese". But we do use canonical order (alphabetical order of keys) when a Locale is converted to a language tag. From this point of view, List might be more appropriate. This also applies to attributes. If we agree to support Unicode locale attributes with dedicated APIs like above, we should decide if the collection of attributes should be represented by Set or List.

Overall, supporting the full specification of the Unicode locale extension does not look too bad. Some may ask why we would add APIs dedicated to things which are not yet used. We could defer adding "attribute" APIs - and attributes could then only be set via Builder.setExtension('u', "...."). But the necessary API addition is pretty minimal, and with these APIs the design looks more complete. Therefore, if we are going to include any 'u' extension specific APIs, I want to do it completely, including attribute support.

-Yoshito

From y.umaoka at gmail.com Wed Jun 30 14:08:41 2010
From: y.umaoka at gmail.com (Yoshito Umaoka)
Date: Wed, 30 Jun 2010 17:08:41 -0400
Subject: [loc-en-dev] Unicode locale extension - what it meant to Java
Message-ID: <4C2BB259.9000109@gmail.com>

First of all, I'm not trying to proactively retract the Unicode locale extension part of the proposal. But I think we need to clarify the scope of our proposal - what Unicode locale extensions mean to Java itself.

We want to bring the Unicode locale extension to the Java world. Java used to define variants to specify particular behavior variations. This model does not fit well with BCP 47.

The Unicode locale extension gives you a formal, well-structured scheme for representing a variation of a locale. Java Locale ja_JP_JP is used for a variant of locale ja_JP, just changing the calendar type to the Japanese Imperial calendar. This is Java's proprietary definition. In the current proposal, ja_JP_JP is transformed to -u-ca-japanese.
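
To make the ja_JP_JP case concrete, here is a minimal sketch. It assumes the java.util.Locale API as it later shipped in Java 7, which postdates this thread and may differ in detail from the proposal being discussed here; the class name LegacyVariantDemo and its helper methods are purely illustrative.

```java
import java.util.Locale;

public class LegacyVariantDemo {
    // Illustrative helper (not part of any proposal): builds a locale with the
    // legacy three-argument constructor and returns its BCP 47 language tag.
    static String languageTag(String language, String country, String variant) {
        return new Locale(language, country, variant).toLanguageTag();
    }

    // Illustrative helper: returns the Unicode locale type for the given key,
    // or null when the locale carries no such keyword.
    static String unicodeType(String language, String country, String variant, String key) {
        return new Locale(language, country, variant).getUnicodeLocaleType(key);
    }

    public static void main(String[] args) {
        // Legacy special cases: the 'u' extension is added automatically.
        System.out.println(languageTag("ja", "JP", "JP"));
        System.out.println(unicodeType("ja", "JP", "JP", "ca"));
        System.out.println(languageTag("th", "TH", "TH"));
        System.out.println(unicodeType("th", "TH", "TH", "nu"));

        // A plain ja_JP locale has no calendar keyword at all.
        System.out.println(languageTag("ja", "JP", ""));
        System.out.println(unicodeType("ja", "JP", "", "ca"));
    }
}
```

In the shipped API the two-letter "JP" variant is not a well-formed BCP 47 variant, so it is carried in a private use "x-lvariant-" subtag rather than in the variant position of the tag.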

For me, adding Unicode locale extension APIs in Java indicates a certain level of commitment to supporting the Unicode locale extension in Java itself. However, we have not discussed Java's i18n service implementation part much so far. We only care about two exceptional cases - ja_JP_JP and th_TH_TH - at this moment. But once we expose the Unicode locale extension in Java, Java users may expect a Currency instance created with Locale de-DE-u-cu-dem to use the German Mark.

Of course, we need a framework first. Actual use of the Unicode locale extension in Java i18n services might be done later. If we decide to add APIs dedicated to Unicode locale extensions and defer the support in i18n services, I think we should clearly state what the Unicode locale extension means to Java i18n services - what is supported, what is not, etc. I'll put this topic on the agenda for the next project meeting.

-Yoshito

From dougfelt at google.com Wed Jun 30 15:19:16 2010
From: dougfelt at google.com (Doug Felt)
Date: Wed, 30 Jun 2010 15:19:16 -0700
Subject: [loc-en-dev] -u- extension API - necessary updates?
In-Reply-To: <4C2BA960.3040306@gmail.com>
References: <4C2BA960.3040306@gmail.com>
Message-ID:

Comments inline

On Wed, Jun 30, 2010 at 1:30 PM, Yoshito Umaoka wrote:

> In the Locale Enhancement repository, we have following proposed APIs
> supporting -u- extension:
>
> In java.util.Locale
>
> public Set getUnicodeLocaleKeys()
> public String getUnicodeLocaleType(String key)
>
> In java.util.Locale.Builder
>
> public Builder setUnicodeLocaleKeyword(String key, String type)
>
> Following Unicode locale extension are not in our scope last year.
>
> 1. type represented by multiple subtags
> 2. key without type
> 3. attribute
>
> For supporting 1, it looks we do not need any changes in the proposal. A
> Unicode locale extension keyword may have type represented by multiple
> subtags.
> For example, "en-u-vt-0061-0065" is a valid example defined by the
> current LDML specification (See
> http://www.unicode.org/reports/tr35/#Locale_Extension_Key_and_Type_Data).
>
> However, this does not mean that a keyword may have multiple types. In this
> example, 0061 and 0065 are not two different types - instead "0061-0065" is
> a type. Thus, getUnicodeLocaleType("vt") can simply return "0061-0065". To
> set the type using Builder, setUnicodeLocaleKeyword("vt", "0061-0065") is
> sufficient.
>

Agree.

> For supporting 2, there is a minor conflict with the current proposal.
> Assume we have a Locale represented by pseudo language tag "en-u-aa-bb-ccc".
> getUnicodeLocaleKeys() will return a set containing "aa" and "bb".
> getUnicodeLocaleType(String key) currently returns null when the input key
> is not available, and it returns non-empty type string when the key is
> available. We could use empty string "" to represent typeless keyword - that
> is, getUnicodeLocaleType("aa") to return "" in this example.
>

Agree.

> The remaining question is the Builder API - setUnicodeLocaleKeyword(String
> key, String type). For now, empty string type indicate that the keyword
> itself is removed from the current state and null type throws NPE. We could
> change the API to use null for deletion instead of empty string. For
> example, if an Builder internally represents "en-u-aa-bb-ccc",
> setUnicodeLocaleKeyword("aa", null) will remove the typeless keyword "aa" -
> and internal representation will be changed to "en-u-bb-ccc" after the call.
> Also, setUnicodeLocaleKeyword("dd", "") will append a typeless keyword "dd"
> to the internal state (that is, "en-u-aa-bb-ccc-dd").
>

Agree.

> Note that setXXX with empty string is removing a field from Builder by the
> current design.
> If we really want to change the semantics of empty string
> and null in the API setUnicodeLocaleKeyword, the consistent policy should
> be applied to others (for example, setLanguage(null) to remove language
> field, instead of setLanguage("")).
>

Disagree. language/script/country/variant are always present, in the sense that Locale.getL/S/C/V() never returns null, though they're not present in the sense that there are always three separator characters. I don't think (any more) that we need to switch to using null in the setters for these.

> For supporting 3, we could treat an attribute as keyless keyword. But it
> makes getUnicodeLocaleKeys()/getUnicodeLocaleType(String key) a little bit
> awkward. Technically, we can still design them like that way
> (getUnicodeLocaleKeys() to include an empty string in the return set /
> getUnicodeLocaleType("") to return attribute subtags). I think adding extra
> API dedicated for attribute is cleaner.
>

Agree, though I could be persuaded to use "" as the key and a single list of attributes connected with hyphen as the 'type' for this key.

> public Set getUnicodeLocaleAttributes()
>
> The same idea is applicable to Builder. The API dedicated for
> adding/removing Unicode locale attribute like below may be added:
>
> public Builder addUnicodeLocaleAttribute(String attribute)
> public Builder removeUnicodeLocaleAttribute(String attribute)
>
> Another possibility is to multiple attributes as a whole.
>
> public Builder setUnicodeLocaleAttribute(String attributes)
>
> For example, setting attribute "abc" and "def",
> setUnicodeLocaleAttributes("abc-def"). If we go for this approach, we do not
> need "remove" method. A tricky part is that the order of attributes does not
> matter. So, semantically, "abc-def" and "def-abc" are same. We do not want
> to introduce unnecessary variations, we should clearly state that the order
> of attributes are not preserved.
>

The order should be canonicalized if they're truly separate.
In this case I'd go with add/remove. The 'multiple types per key' are treated as single types with hyphen between segments in this API, and they should be treated that way without modification of the order of the components of the type. If they are truly multiple independent types for the same key, then we would need different API for that too.

>
> Another question related to this - Set vs. List. Currently,
> getUnicodeLocaleKeys() returns Set (actually, unmodifiable set).
> Semantically, the order of keywords does not matter. "u-ca-japanese-cu-jpy"
> is equivalent to "u-cu-jpy-ca-japanese". But we do use canonical order
> (alphabetical order of keys) when a Locale is converted to a language tag.
> From this point of view, List might be more appropriate. This also
> applies to attributes. If we agree to support Unicode locale attributes with
> dedicated APIs like above, we should decide if the collection of attributes
> should be represented by Set or List.
>

I think if the keys are unique, we should use a Set, not a List. If we really want to also specify the order then we can define these to return a SortedSet.

> Overall, supporting full specification of Unicode locale extension looks
> not too bad. Some may argue why we add APIs dedicated for things which are
> not yet used. We could defer adding "attribute" APIs - and attribute can be
> only set via Builder.setExtension('u', "...."). But necessary API addition
> is pretty minimal and with these APIs, the design look more complete.
> Therefore, if we are going to include any 'u' extension specific APIs, I
> want to do it completely including attribute support.
>
> -Yoshito
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/locale-enhancement-dev/attachments/20100630/46799439/attachment.html

From naoto.sato at oracle.com Wed Jun 30 15:51:36 2010
From: naoto.sato at oracle.com (Naoto Sato)
Date: Wed, 30 Jun 2010 15:51:36 -0700
Subject: [loc-en-dev] Unicode locale extension - what it meant to Java
In-Reply-To: <4C2BB259.9000109@gmail.com>
References: <4C2BB259.9000109@gmail.com>
Message-ID: <4C2BCA78.1050900@oracle.com>

(6/30/10 2:08 PM), Yoshito Umaoka wrote:
> First of all, I'm not trying to retract Unicode locale extension part of
> proposal proactively. But I think we need to clarify the scope of our
> proposal - what Unicode locale extensions meant to Java itself.
>
> We want to bring Unicode locale extension to Java world. Java used to
> define variant to specify specific behavior variations. This model does
> not fit well to BCP 47.
>
> Unicode locale extension give you formal/well-structured scheme for
> representing a variation of locale. Java Locale ja_JP_JP is used for a
> variant of locale ja_JP, just changing calendar type to be Japanese
> Imperial calendar. This is Java's proprietary definition. In the current
> proposal, ja_JP_JP is transformed to -u-ca-japanese.

What do you mean by "transformed" here? I thought that "-u-ca-japanese" is just automatically added and "JP" variant is intact. Is it not?

>
> For me, adding unicode locale extension APIs in Java indicates a certain
> level of commitment for supporting Unicode locale extension in Java
> itself. However, we did not discuss about Java's i18n service
> implementation part much so far. We only care two exceptional cases -
> ja_JP_JP and th_TH_TH at this moment. But, if we once expose Unicode
> locale extension in Java, Java users may expect Currency instance
> created with Locale de-DE-u-cu-dem to use German Mark.
>
> Of course, we need a framework first. Actual use of Unicode locale
> extension in Java i18n services might be done later.
If we decide to add > APIs dedicated for Unicode locale extensions and defer the support in > i18n services, I think we should clearly state what Unicode locale > extension meant to Java i18n services - what are supported, what are > not, etc. I'll put this topic in the next project meeting. Let's separate implementation from the spec. Although we might add this type of explanation in the "supported locales" document, that's never been part of the spec. Naoto From dougfelt at google.com Wed Jun 30 16:07:21 2010 From: dougfelt at google.com (Doug Felt) Date: Wed, 30 Jun 2010 16:07:21 -0700 Subject: [loc-en-dev] Unicode locale extension - what it meant to Java In-Reply-To: <4C2BCA78.1050900@oracle.com> References: <4C2BB259.9000109@gmail.com> <4C2BCA78.1050900@oracle.com> Message-ID: On Wed, Jun 30, 2010 at 3:51 PM, Naoto Sato wrote: > (6/30/10 2:08 PM), Yoshito Umaoka wrote: > >> First of all, I'm not trying to retract Unicode locale extension part of >> proposal proactively. But I think we need to clarify the scope of our >> proposal - what Unicode locale extensions meant to Java itself. >> >> We want to bring Unicode locale extension to Java world. Java used to >> define variant to specify specific behavior variations. This model does >> not fit well to BCP 47. >> >> Unicode locale extension give you formal/well-structured scheme for >> representing a variation of locale. Java Locale ja_JP_JP is used for a >> variant of locale ja_JP, just changing calendar type to be Japanese >> Imperial calendar. This is Java's proprietary definition. In the current >> proposal, ja_JP_JP is transformed to -u-ca-japanese. >> > > What do you mean by "transformed" here? I thought that "-u-ca-japanese" is > just automatically added and "JP" variant is intact. Is it not? > > JP is too short to be a valid variant value in BCP47, so when converting to a BCP47 identifier it is dropped. 
I believe the decision is that a Java locale created from a LocaleBuilder
with -u-ca-japanese will not return JP from getVariant, but Yoshito knows
for sure, I expect :-)

Doug

From naoto.sato at oracle.com Wed Jun 30 16:37:05 2010
From: naoto.sato at oracle.com (Naoto Sato)
Date: Wed, 30 Jun 2010 16:37:05 -0700
Subject: [loc-en-dev] Unicode locale extension - what it meant to Java
In-Reply-To:
References: <4C2BB259.9000109@gmail.com> <4C2BCA78.1050900@oracle.com>
Message-ID: <4C2BD521.4010504@oracle.com>

I wasn't sure that Yoshito's "In the current proposal" was just for the
Builder. If that's the case I am fine.
I want to confirm that the variant that is created by the Locale
constructor is intact; otherwise it would cause a compatibility issue.

The reason I brought this up was that the current API doc
("Compatibility" section in the Locale class description) reads:

"When the Locale constructor is called with the arguments "ja", "JP",
"JP", this extension is automatically added. "

Naoto

From y.umaoka at gmail.com Wed Jun 30 17:50:21 2010
From: y.umaoka at gmail.com (Yoshito Umaoka)
Date: Wed, 30 Jun 2010 20:50:21 -0400
Subject: [loc-en-dev] Unicode locale extension - what it meant to Java
In-Reply-To: <4C2BD521.4010504@oracle.com>
References: <4C2BB259.9000109@gmail.com> <4C2BCA78.1050900@oracle.com> <4C2BD521.4010504@oracle.com>
Message-ID: <4C2BE64D.4020505@gmail.com>

Sorry for the confusion. The term "transform" was really ambiguous.
The current spec is not clear about this. But as you pointed out, the
current API doc is accurate.

Below is the currently proposed implementation -

new Locale("ja", "JP", "JP") will create a Locale with

language - ja
country - JP
variant - JP
extension - u-ca-japanese

That means the extension is appended. The same happens when a Locale
serialized by an older version of Java is deserialized on Java 7.

new Locale("ja", "JP", "JP").toLanguageTag();

will return "ja-JP-u-ca-japanese-x-jvariant-JP"

Locale.forLanguageTag("ja-JP-u-ca-japanese-x-jvariant-JP");

will return a Locale equal to new Locale("ja", "JP", "JP").
Locale.forLanguageTag("ja-JP-u-ca-japanese-x-jvariant-jp");

will return a Locale with

language - ja
country - JP
variant - jp
extension - u-ca-japanese

new Builder().setLocale(new Locale("ja", "JP", "JP")).build();

will not throw an exception. The resulting Locale will be equal to
new Locale("ja", "JP", "JP"). This is an exceptional case (there are only
three such exceptions: ja_JP_JP, th_TH_TH, and no_NO_NY).

new Builder().setLocale(new Locale("ja", "", "JP"));

will throw IllformedLocaleException.

new Builder().setLanguage("ja").setRegion("JP").setVariant("JP");

will also throw IllformedLocaleException.

new Builder().setLanguage("ja").setRegion("JP").setUnicodeLocaleKeyword("ca", "japanese").build();

This is a questionable case, but I think it should not set the variant
to "JP".

-Yoshito

From dougfelt at google.com Wed Jun 30 17:55:08 2010
From: dougfelt at google.com (Doug Felt)
Date: Wed, 30 Jun 2010 17:55:08 -0700
Subject: [loc-en-dev] Unicode locale extension - what it meant to Java
In-Reply-To: <4C2BD521.4010504@oracle.com>
References: <4C2BB259.9000109@gmail.com> <4C2BCA78.1050900@oracle.com> <4C2BD521.4010504@oracle.com>
Message-ID:

The constructors maintain the variant for compatibility; that is, a
subsequent call to Locale.getVariant must return "JP".
LocaleBuilder.setVariant("JP") I think will throw an exception since the
value is invalid and it expects to validate the input (that's part of the
value of the builder).

Once constructed:

Locale.forLanguageTag (I believe) will ignore the invalid value but process
the rest of the tag; unlike LocaleBuilder, it tries to be lenient.

Locale.toLanguageTag will omit the variant, since it's syntactically
invalid, and add "-u-ca-japanese".

Locale.getExtension('u') will return (I believe) "ca-japanese".

The serialized form of the Locale, when read by Java 6, should see the "JP"
variant though not, of course, the extension.

Haven't tested this, though.

Doug

From dougfelt at google.com Wed Jun 30 17:58:40 2010
From: dougfelt at google.com (Doug Felt)
Date: Wed, 30 Jun 2010 17:58:40 -0700
Subject: [loc-en-dev] Unicode locale extension - what it meant to Java
In-Reply-To:
References: <4C2BB259.9000109@gmail.com> <4C2BCA78.1050900@oracle.com> <4C2BD521.4010504@oracle.com>
Message-ID:

Heh, Yoshito answered more thoroughly than I did and I'd forgotten about
including the illegal variant as an x-jvariant value. The moral is, never
trust me on answers to these questions, always wait for Yoshito to reply :-)

Doug

On Wed, Jun 30, 2010 at 5:55 PM, Doug Felt wrote:

> The constructors maintain the variant for compatibility; that is, a
> subsequent call to Locale.getVariant must return "JP".
> LocaleBuilder.setVariant("JP") I think will throw an exception since the
> value is invalid and it expects to validate the input (that's part of the
> value of the builder).
>
> Once constructed:
>
> Locale.forLanguageTag (I believe) will ignore the invalid value but process
> the rest of the tag, unlike LocaleBuilder it tries to be lenient.
>
> Locale.toLanguageTag will omit the variant, since it's syntactically
> invalid, and add "-u-ca-japanese"
>
> Locale.getExtension('u') will return (I believe) "ca-japanese".
>
> The serialized form of the Locale, when read by Java 6, should see the "JP"
> variant though not, of course, the extension.
>
> Haven't tested this, though.
>
> Doug

From staudacher at google.com Wed Jun 30 23:26:41 2010
From: staudacher at google.com (Andy Staudacher)
Date: Wed, 30 Jun 2010 23:26:41 -0700
Subject: [loc-en-dev] -u- extension vs.
 other extensions
In-Reply-To: <4C2B8420.50709@gmail.com>
References: <4C2B8420.50709@gmail.com>
Message-ID:

On Wed, Jun 30, 2010 at 10:51 AM, Yoshito Umaoka wrote:

> Hi all,
>
> We agreed that we validate syntax of subtags, but do not validate code
> itself in Java. In other words, proposed implementation won't invalidate
> language subtag "xx" although the use of such code is not valid for BCP 47
> language tag.
>
> In BCP47, extension is defined as:
>
> extension = singleton 1*("-" (2*8alphanum))
>
> When the previous proposal was written last year, the Unicode locale
> extension ('u' extension) only allowed key/type subtag pairs. In BNF,
>
> unicode_locale_extensions = sep "u" 1*(sep keyword)
> keyword = key sep type
> key = 2alphanum
> type = 3*8alphanum
>
> This required special syntax validation for 'u' extension. For example,
>
> 1. extension "a-abc-de" is syntactically valid
> 2. extension "u-abc-de" is syntactically invalid, because it does not
> satisfy the requirement for 'u' extension (key (2alphanum) must follow
> right after singleton, key must have its type (3*8alphanum)).
>
>
> 'u' extension was updated in the final spec as below:
>
> unicode_locale_extensions = sep "u" (
>                                 1*(sep keyword)
>                               / 1*(sep attribute) *(sep keyword)
>                             )
> keyword   = key [sep type]
> key       = 2alphanum
> type      = 3*8alphanum *(sep 3*8alphanum)
> attribute = 3*8alphanum
>
>
> This change - 1. subtags in the form of 3*8alphanum before the first
> occurrence of key (2alphanum) are interpreted as attributes, 2. key subtag
> might not be followed by type, 3. type might be represented by multiple
> subtags in the form of 3*8alphanum - actually eliminates the special syntax
> requirements for 'u' extension. With the updated specification, extension
> subtags satisfying the BCP47 extension syntax are also satisfying the 'u'
> extension. For example, "u-abc-de" is interpreted as attribute "abc" and
> typeless key "de". (Note that this specific tag is illegal because "abc" is
> not a registered attribute and "de" is not a known key value)
>
> With this change, we do not need any special coding for handling 'u'
> extension in the API - Builder#setExtension. This also means that we do not
> need to add special implementation dedicated for 'u' extension even if we do
> not add the Unicode locale extension APIs (such as
> Builder#setUnicodeLocaleKeyword).

Indeed. Great insight!
The "u" singleton can be followed by alphanum{2} or alphanum{3,8}, and any
alphanum{3,8} can (but doesn't have to) be followed by an alphanum{3,8} or
an alphanum{2}, and vice versa.
I.e., "u" must be followed by 1*("-" (2*8alphanum)), which is the same
syntax any BCP 47 extension must satisfy.

Let's document this as a series of test cases with some comments for the
tests. I could take on this task if you want me to.

Thanks,
- Andy
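
The syntax argument above can be sketched as a small Java check. This is
only a minimal sketch under stated assumptions, not code from the proposal:
the class name UExtensionSketch and its three helper methods are
hypothetical, it checks syntax only (no registry lookup for attributes,
keys, or types), and it does no case normalization. It shows that the
general BCP 47 extension pattern accepts "u-abc-de", and how the updated
'u' grammar would read that value as attribute "abc" plus typeless key "de".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the proposed Locale API. */
final class UExtensionSketch {

    // BCP 47: extension = singleton 1*("-" (2*8alphanum)),
    // where singleton is any single alphanumeric character except 'x'.
    private static final Pattern EXTENSION =
            Pattern.compile("[0-9A-WY-Za-wy-z](-[0-9A-Za-z]{2,8})+");

    /** Syntax check only; does not consult any registry of keys or types. */
    static boolean isWellFormedExtension(String ext) {
        return EXTENSION.matcher(ext).matches();
    }

    /** Attributes: the 3-8 character subtags that precede the first 2-character key. */
    static List<String> attributes(String uExt) {
        String[] subtags = uExt.split("-");
        List<String> attrs = new ArrayList<>();
        // Index 0 is the singleton itself; stop at the first key.
        for (int i = 1; i < subtags.length && subtags[i].length() > 2; i++) {
            attrs.add(subtags[i]);
        }
        return attrs;
    }

    /** Keywords: each key mapped to its type subtags joined by '-', or "" when typeless. */
    static Map<String, String> keywords(String uExt) {
        String[] subtags = uExt.split("-");
        Map<String, String> result = new LinkedHashMap<>();
        String key = null;
        for (int i = 1; i < subtags.length; i++) {
            String subtag = subtags[i];
            if (subtag.length() == 2) {
                // A 2-character subtag starts a new keyword.
                key = subtag;
                result.put(key, "");
            } else if (key != null) {
                // A longer subtag after a key extends that key's type.
                String type = result.get(key);
                result.put(key, type.isEmpty() ? subtag : type + "-" + subtag);
            }
            // Longer subtags seen before the first key are attributes; see attributes().
        }
        return result;
    }

    public static void main(String[] args) {
        String[] samples = {"a-abc-de", "u-abc-de", "u-ca-japanese-cu-jpy", "x-markus"};
        for (String s : samples) {
            System.out.println(s + " well-formed extension: " + isWellFormedExtension(s));
        }
        System.out.println(attributes("u-abc-de") + " " + keywords("u-abc-de"));
    }
}
```

Because the single pattern covers both "a-abc-de" and "u-abc-de", a setter
such as Builder#setExtension would not need a separate syntax rule for 'u';
splitting into attributes and keywords is then an interpretation step on an
already well-formed value. Note that "x-markus" is rejected here because
'x' introduces private use and is not an extension singleton.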