From masayoshi.okutsu at oracle.com Wed Apr 3 05:36:12 2013 From: masayoshi.okutsu at oracle.com (Masayoshi Okutsu) Date: Wed, 03 Apr 2013 21:36:12 +0900 Subject: [8]Request for review: 7091601: Arabic Locale: can not set type of digit in application level In-Reply-To: <51560B3E.1030804@oracle.com> References: <514CBDA6.3090302@oracle.com> <51547F19.4040305@oracle.com> <51560B3E.1030804@oracle.com> Message-ID: <515C223C.5020701@oracle.com> Looks good to me. Masayoshi On 3/30/2013 6:44 AM, Naoto Sato wrote: > Revised the macosx portion of the changeset again. Reverted the code > that obtains CFLocaleRef back to using CFLocaleCopyCurrent(), > otherwise the user's cusomization would not be reflected. Here is the > revised webrev: > > http://cr.openjdk.java.net/~naoto/7091601/webrev.02/ > > Naoto > > On 3/28/13 10:34 AM, Naoto Sato wrote: >> Updated the changeset according to an internal review (using >> CFStringGetCharacterAtIndex() instead of converting entire string on >> MacOSX). Here is the revised webrev: >> >> http://cr.openjdk.java.net/~naoto/7091601/webrev.01/ >> >> Naoto >> >> On 3/22/13 1:23 PM, Naoto Sato wrote: >>> Hello, >>> >>> Please review the changes for the following bug: >>> >>> http://bugs.sun.com/view_bug.do?bug_id=7091601 >>> >>> The idea is to reflect the operating system's settings in the HOST >>> locale provider adapter. Also fixed a bug in MacOSX code conversion for >>> the zero digit. The webrev is available here: >>> >>> http://cr.openjdk.java.net/~naoto/7091601/webrev.00/ >>> >>> Naoto >> > From masayoshi.okutsu at oracle.com Wed Apr 3 09:09:45 2013 From: masayoshi.okutsu at oracle.com (Masayoshi Okutsu) Date: Thu, 04 Apr 2013 01:09:45 +0900 Subject: Review request: Splitting locale resources (FormatData) for java.time classes Message-ID: <515C5449.5010109@oracle.com> Hi, I've made changes for splitting locale resources into FormatData required by the legacy i18n classes and JavaTimeSupplementary required by the java.time classes. 
The reason for this split is to make locale resource maintenance easier. Changes are mostly on the locale data adapter side.

Here are the concrete changes, in no particular order:

- Split FormatData files for JRE into the FormatData files and their corresponding JavaTimeSupplementary files. The supplementary data is added to its FormatData on demand at runtime. This is to avoid loading java.time specific resources in case they are not used. All JavaTimeSupplementary files were generated by a tool, but the tool isn't included in the webrev. It's still a mess.

- Changed prefix "cldr." to "java.time." for java.time specific resources.

- "java.time.*DatePatterns" resources are now converted from legacy JRE resources rather than from CLDR. This change required the TestNonIsoFormatter.java change.

- Added missing resources (due to a tool bug) to FormatData files.

- No format changes to FormatData files generated from CLDR (XML).

- Added ParallelListResourceBundle which supports additional contents (key-value pairs). FormatData* classes are now ParallelListResourceBundle subclasses. sun/util/resources/LocaleData takes care of adding supplementary data at runtime.

- Renamed CalendarDataUtility.retrieveCldr* to .retrieveJavaTime*.

- Cleaned up OpenListResourceBundle.

- CLDR Converter Tool now takes "approved" and "contributed" data items because there are too many missing elements in arrays. However, FormatData and JavaTimeSupplementary files for JRE have only approved items.

- Changed some copyright text and removed "DO NOT EDIT" comment lines. I don't believe those files can be re-generated using the older version of CLDR Converter Tool.

There are some remaining work items.

- Need more verification of the actual locale resources.

- Clean up the JavaTimeSupplementary generator tool.

- Add a test for ParallelListResourceBundle, which is almost ready for review, but it requires a bug ID for the @bug tag of jtreg.

- Clean up test/sun/text/resources/LocaleData with the additional resources.

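[Editorial sketch] The idea of adding supplementary data to a bundle on demand can be illustrated roughly as below. This is only a minimal sketch and not the actual ParallelListResourceBundle code: the class names, the Supplier-based loading, and the keys and values are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.ResourceBundle;
import java.util.function.Supplier;

/** Hypothetical sketch; not the actual ParallelListResourceBundle. */
public class SupplementaryBundleSketch {

    /** A bundle whose supplementary entries are merged on the first lookup miss. */
    public static class LazyBundle extends ResourceBundle {
        private final Map<String, Object> entries = new HashMap<>();
        private Supplier<Map<String, Object>> supplement; // set to null once merged

        public LazyBundle(Map<String, Object> base,
                          Supplier<Map<String, Object>> supplement) {
            entries.putAll(base);
            this.supplement = supplement;
        }

        @Override
        protected synchronized Object handleGetObject(String key) {
            Object value = entries.get(key);
            if (value == null && supplement != null) {
                // First miss: load the supplementary data; base entries win on conflict.
                supplement.get().forEach(entries::putIfAbsent);
                supplement = null;
                value = entries.get(key);
            }
            return value;
        }

        @Override
        public synchronized Enumeration<String> getKeys() {
            // Reports only the keys merged so far.
            return Collections.enumeration(new HashSet<>(entries.keySet()));
        }

        public synchronized boolean isSupplementLoaded() {
            return supplement == null;
        }
    }

    /** Builds a demo bundle; the keys and values are made up for illustration. */
    public static LazyBundle demoBundle() {
        Map<String, Object> base = new HashMap<>();
        base.put("legacy.pattern", "yyyy-MM-dd");
        Supplier<Map<String, Object>> extra = () -> {
            Map<String, Object> m = new HashMap<>();
            m.put("java.time.pattern", "G y-MM-dd");
            return m;
        };
        return new LazyBundle(base, extra);
    }

    public static void main(String[] args) {
        LazyBundle bundle = demoBundle();
        System.out.println(bundle.getString("legacy.pattern"));
        System.out.println(bundle.isSupplementLoaded());
        System.out.println(bundle.getString("java.time.pattern"));
        System.out.println(bundle.isSupplementLoaded());
    }
}
```

In this sketch a lookup of a legacy key never touches the supplementary data; only a request for a key missing from the base entries triggers the merge, which mirrors the stated goal of not loading java.time specific resources unless they are used.
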
Webrev: http://cr.openjdk.java.net/~okutsu/310/resourcesplit/webrev.00/ Thanks, Masayoshi From scolebourne at joda.org Wed Apr 3 09:28:45 2013 From: scolebourne at joda.org (Stephen Colebourne) Date: Wed, 3 Apr 2013 17:28:45 +0100 Subject: [threeten-dev] Review request: Splitting locale resources (FormatData) for java.time classes In-Reply-To: <515C5449.5010109@oracle.com> References: <515C5449.5010109@oracle.com> Message-ID: The changes sound sensible and I didn't see anything wrong in a quick look through. Stephen On 3 April 2013 17:09, Masayoshi Okutsu wrote: > Hi, > > I've made changes for splitting locale resources into FormatData required by > the legacy i18n classes and JavaTimeSupplementary required by the java.time > classes. The reason of this split is to make locale resources maintenance > easier. Changes are mostly in the locale data adapter side. > > Here are concrete changes in a random order: > > - Split FormatData files for JRE into the FormatData files and their > corresponding JavaTimeSupplementary files. The supplementary data is added > to its FormatData on demand at runtime. This is to avoid loading java.time > specific resources in case they are not used. All JavaTimeSupplementary > files were generated by a tool. But the tool isn't included in webrev. It's > still a mess. > > - Changed prefix "cldr." to "java.time." for java.time specific resources. > > - "java.time.*DatePattrens" resources are now converted from legacy JRE > resources rather than from CLDR. This change required the > TestNonIsoFormatter.java change. > > - Added missing resources (due to a tool bug) to FormatData files. > > - No format changes to FormatData files generated from CLDR (XML). > > - Added ParallelListResourceBundle which supports additional contents > (key-value pairs). FormatData* classes are now ParallelListResourceBundle > subclasses. sun/util/resources/LocaleData takes care of adding supplementary > data at runtime. 
> > - Renamed CalendarDataUtility.retrieveCldr* to .retrieveJavaTime*. > > - Cleaned up OpenListResourceBundle. > > - CLDR Converter Tool now takes "approved" and "contributed" data items > because there are too many missing elements in arrays. However, FormatData > and JavaTimeSupplementary files for JRE have only approved items. > > - Changed some copyright text and removed "DO NOT EDIT" comment lines. I > don't believe those files can be re-generated using the older version of > CLDR Converter Tool. > > There are some remaining work items. > > - Need more verification of the actual locale resources. > > - Clean up the JavaTimeSupplementary generator tool. > > - Add a test for ParallelListResourceBundl, which is almost ready for > review, but it requires a bug ID for the @bug tag of jtreg. > > - Clean up test/sun/text/resources/LocaleData with the additional resources. > > Webrev: > http://cr.openjdk.java.net/~okutsu/310/resourcesplit/webrev.00/ > > Thanks, > Masayoshi > > From naoto.sato at oracle.com Wed Apr 3 14:09:53 2013 From: naoto.sato at oracle.com (Naoto Sato) Date: Wed, 03 Apr 2013 14:09:53 -0700 Subject: Review request: Splitting locale resources (FormatData) for java.time classes In-Reply-To: <515C5449.5010109@oracle.com> References: <515C5449.5010109@oracle.com> Message-ID: <515C9AA1.7080608@oracle.com> Looks good overall. Some minor comments: - sun/util/locale/provider/CalendarNameProviderImpl.java, line 234: "cldr" -> "javatime" - sun/util/locale/provider/LocaleResources.java, line 370: Should this method be renamed to getJavaTime...()? - src/share/classes/sun/util/resources/LocaleData.java, line 128: Is there any chance that this assertion would fail with FALLBACK adapter? Naoto On 4/3/13 9:09 AM, Masayoshi Okutsu wrote: > Hi, > > I've made changes for splitting locale resources into FormatData > required by the legacy i18n classes and JavaTimeSupplementary required > by the java.time classes. 
The reason of this split is to make locale > resources maintenance easier. Changes are mostly in the locale data > adapter side. > > Here are concrete changes in a random order: > > - Split FormatData files for JRE into the FormatData files and their > corresponding JavaTimeSupplementary files. The supplementary data is > added to its FormatData on demand at runtime. This is to avoid loading > java.time specific resources in case they are not used. All > JavaTimeSupplementary files were generated by a tool. But the tool isn't > included in webrev. It's still a mess. > > - Changed prefix "cldr." to "java.time." for java.time specific resources. > > - "java.time.*DatePattrens" resources are now converted from legacy JRE > resources rather than from CLDR. This change required the > TestNonIsoFormatter.java change. > > - Added missing resources (due to a tool bug) to FormatData files. > > - No format changes to FormatData files generated from CLDR (XML). > > - Added ParallelListResourceBundle which supports additional contents > (key-value pairs). FormatData* classes are now > ParallelListResourceBundle subclasses. sun/util/resources/LocaleData > takes care of adding supplementary data at runtime. > > - Renamed CalendarDataUtility.retrieveCldr* to .retrieveJavaTime*. > > - Cleaned up OpenListResourceBundle. > > - CLDR Converter Tool now takes "approved" and "contributed" data items > because there are too many missing elements in arrays. However, > FormatData and JavaTimeSupplementary files for JRE have only approved > items. > > - Changed some copyright text and removed "DO NOT EDIT" comment lines. I > don't believe those files can be re-generated using the older version of > CLDR Converter Tool. > > There are some remaining work items. > > - Need more verification of the actual locale resources. > > - Clean up the JavaTimeSupplementary generator tool. 
> > - Add a test for ParallelListResourceBundl, which is almost ready for > review, but it requires a bug ID for the @bug tag of jtreg. > > - Clean up test/sun/text/resources/LocaleData with the additional > resources. > > Webrev: > http://cr.openjdk.java.net/~okutsu/310/resourcesplit/webrev.00/ > > Thanks, > Masayoshi > > From masayoshi.okutsu at oracle.com Thu Apr 4 01:13:14 2013 From: masayoshi.okutsu at oracle.com (Masayoshi Okutsu) Date: Thu, 04 Apr 2013 17:13:14 +0900 Subject: Review request: Splitting locale resources (FormatData) for java.time classes In-Reply-To: <515C9AA1.7080608@oracle.com> References: <515C5449.5010109@oracle.com> <515C9AA1.7080608@oracle.com> Message-ID: <515D361A.40301@oracle.com> Thanks for catching them. I've pushed the changes with fixes to the threeten repo. The assertion was a leftover of debugging code and removed. Masayoshi On 4/4/2013 6:09 AM, Naoto Sato wrote: > Looks good overall. Some minor comments: > > - sun/util/locale/provider/CalendarNameProviderImpl.java, line 234: > "cldr" -> "javatime" > > - sun/util/locale/provider/LocaleResources.java, line 370: Should this > method be renamed to getJavaTime...()? > > - src/share/classes/sun/util/resources/LocaleData.java, line 128: Is > there any chance that this assertion would fail with FALLBACK adapter? > > Naoto > > On 4/3/13 9:09 AM, Masayoshi Okutsu wrote: >> Hi, >> >> I've made changes for splitting locale resources into FormatData >> required by the legacy i18n classes and JavaTimeSupplementary required >> by the java.time classes. The reason of this split is to make locale >> resources maintenance easier. Changes are mostly in the locale data >> adapter side. >> >> Here are concrete changes in a random order: >> >> - Split FormatData files for JRE into the FormatData files and their >> corresponding JavaTimeSupplementary files. The supplementary data is >> added to its FormatData on demand at runtime. 
This is to avoid loading >> java.time specific resources in case they are not used. All >> JavaTimeSupplementary files were generated by a tool. But the tool isn't >> included in webrev. It's still a mess. >> >> - Changed prefix "cldr." to "java.time." for java.time specific >> resources. >> >> - "java.time.*DatePattrens" resources are now converted from legacy JRE >> resources rather than from CLDR. This change required the >> TestNonIsoFormatter.java change. >> >> - Added missing resources (due to a tool bug) to FormatData files. >> >> - No format changes to FormatData files generated from CLDR (XML). >> >> - Added ParallelListResourceBundle which supports additional contents >> (key-value pairs). FormatData* classes are now >> ParallelListResourceBundle subclasses. sun/util/resources/LocaleData >> takes care of adding supplementary data at runtime. >> >> - Renamed CalendarDataUtility.retrieveCldr* to .retrieveJavaTime*. >> >> - Cleaned up OpenListResourceBundle. >> >> - CLDR Converter Tool now takes "approved" and "contributed" data items >> because there are too many missing elements in arrays. However, >> FormatData and JavaTimeSupplementary files for JRE have only approved >> items. >> >> - Changed some copyright text and removed "DO NOT EDIT" comment lines. I >> don't believe those files can be re-generated using the older version of >> CLDR Converter Tool. >> >> There are some remaining work items. >> >> - Need more verification of the actual locale resources. >> >> - Clean up the JavaTimeSupplementary generator tool. >> >> - Add a test for ParallelListResourceBundl, which is almost ready for >> review, but it requires a bug ID for the @bug tag of jtreg. >> >> - Clean up test/sun/text/resources/LocaleData with the additional >> resources. 
>> >> Webrev: >> http://cr.openjdk.java.net/~okutsu/310/resourcesplit/webrev.00/ >> >> Thanks, >> Masayoshi >> >> > From yong.huang at oracle.com Sun Apr 14 22:46:06 2013 From: yong.huang at oracle.com (Yong Huang) Date: Mon, 15 Apr 2013 13:46:06 +0800 Subject: Review Request - 8011977: ISO 4217 Amendment Number 155 Message-ID: <516B941E.4040600@oracle.com> Hello, This is the review request for https://jbs.oracle.com/bugs/browse/JDK-8011977. Webrev: http://cr.openjdk.java.net/~yhuang/8011977/webrev.00/ thanks, Yong From naoto.sato at oracle.com Mon Apr 15 11:32:54 2013 From: naoto.sato at oracle.com (Naoto Sato) Date: Mon, 15 Apr 2013 11:32:54 -0700 Subject: Review Request - 8011977: ISO 4217 Amendment Number 155 In-Reply-To: <516B941E.4040600@oracle.com> References: <516B941E.4040600@oracle.com> Message-ID: <516C47D6.6080306@oracle.com> Looks good to me. Naoto On 4/14/13 10:46 PM, Yong Huang wrote: > Hello, > > This is the review request for > https://jbs.oracle.com/bugs/browse/JDK-8011977. > > Webrev: http://cr.openjdk.java.net/~yhuang/8011977/webrev.00/ > > thanks, > Yong From naoto.sato at oracle.com Thu Apr 18 13:36:40 2013 From: naoto.sato at oracle.com (Naoto Sato) Date: Thu, 18 Apr 2013 13:36:40 -0700 Subject: [8]Request for review - 8010666: Implement Currency/LocaleNameProvider in Windows Host LocaleProviderAdapter Message-ID: <51705958.4080001@oracle.com> Hello, Please review the changes for the following CR: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8010666 Here is the webrev for the changes: http://cr.openjdk.java.net/~naoto/8010666/webrev.00/ Windows APIs for retrieving display names are limited to the current user's UI language, so the HOST adapter cannot provide arbitrary localized names like JDK. Also, I changed to retrieve numeric values with GetLocaleInfoEx directly with LOCALE_RETURN_NUMBER flag (previously it first retrieved via a string, then turned it into number with wtoi() function). 
Naoto From masayoshi.okutsu at oracle.com Mon Apr 22 07:03:12 2013 From: masayoshi.okutsu at oracle.com (Masayoshi Okutsu) Date: Mon, 22 Apr 2013 23:03:12 +0900 Subject: [8]Request for review - 8010666: Implement Currency/LocaleNameProvider in Windows Host LocaleProviderAdapter In-Reply-To: <51705958.4080001@oracle.com> References: <51705958.4080001@oracle.com> Message-ID: <51754320.1070807@oracle.com> Looks good to me. Masayoshi On 4/19/2013 5:36 AM, Naoto Sato wrote: > Hello, > > Please review the changes for the following CR: > > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8010666 > > Here is the webrev for the changes: > > http://cr.openjdk.java.net/~naoto/8010666/webrev.00/ > > Windows APIs for retrieving display names are limited to the current > user's UI language, so the HOST adapter cannot provide arbitrary > localized names like JDK. Also, I changed to retrieve numeric values > with GetLocaleInfoEx directly with LOCALE_RETURN_NUMBER flag > (previously it first retrieved via a string, then turned it into > number with wtoi() function). > > Naoto From naoto.sato at oracle.com Thu Apr 25 14:21:56 2013 From: naoto.sato at oracle.com (Naoto Sato) Date: Thu, 25 Apr 2013 14:21:56 -0700 Subject: [8] RFR: 8013086 : NPE thrown by SimpleDateFormat with TimeZoneNameProvider supplied Message-ID: <51799E74.1020607@oracle.com> Hello, Please review the fix for the following bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8013086 The fix is to complement the missing display names with the provided ones for TimeZoneNameProvider implementations. 
http://cr.openjdk.java.net/~naoto/8013086/webrev.00/

Naoto

From masayoshi.okutsu at oracle.com Fri Apr 26 01:46:21 2013
From: masayoshi.okutsu at oracle.com (Masayoshi Okutsu)
Date: Fri, 26 Apr 2013 17:46:21 +0900
Subject: [8] RFR: 8013086 : NPE thrown by SimpleDateFormat with TimeZoneNameProvider supplied
In-Reply-To: <51799E74.1020607@oracle.com>
References: <51799E74.1020607@oracle.com>
Message-ID: <517A3EDD.6020200@oracle.com>

I'd suggest that the for loop and the following if-statement be combined and optimized.

Masayoshi

On 4/26/2013 6:21 AM, Naoto Sato wrote:
> Hello,
>
> Please review the fix for the following bug:
>
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8013086
>
> The fix is to complement the missing display names with the provided
> ones for TimeZoneNameProvider implementations.
>
> http://cr.openjdk.java.net/~naoto/8013086/webrev.00/
>
> Naoto

From y.umaoka at gmail.com Fri Apr 26 08:57:51 2013
From: y.umaoka at gmail.com (Yoshito Umaoka)
Date: Fri, 26 Apr 2013 11:57:51 -0400
Subject: [8] RFR: 8013086 : NPE thrown by SimpleDateFormat with TimeZoneNameProvider supplied
In-Reply-To: <51799E74.1020607@oracle.com>
References: <51799E74.1020607@oracle.com>
Message-ID: <517AA3FF.3040503@gmail.com>

On 4/25/2013 5:21 PM, Naoto Sato wrote:
> Hello,
>
> Please review the fix for the following bug:
>
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8013086
>
> The fix is to complement the missing display names with the provided
> ones for TimeZoneNameProvider implementations.
>
> http://cr.openjdk.java.net/~naoto/8013086/webrev.00/
>
> Naoto

First of all, thanks for looking into this bug. I believe I filed the bug a while ago.

I'm afraid that the fix above is not what I wanted. Your proposed fix falls back to the standard name when the daylight name is missing, and to the long name when the short name is missing.

I think the right fix would be querying a localized name from another provider in the provider chain.
I think a typical use case of a custom locale provider is to replace the stock JDK locale data with the user's own data. My understanding is that this was not available in Java 7, but Java 8 can be configured to use a custom provider first, then fall back to others (including the stock JDK locale provider).

-Yoshito

From naoto.sato at oracle.com Fri Apr 26 09:48:25 2013
From: naoto.sato at oracle.com (Naoto Sato)
Date: Fri, 26 Apr 2013 09:48:25 -0700
Subject: [8] RFR: 8013086 : NPE thrown by SimpleDateFormat with TimeZoneNameProvider supplied
In-Reply-To: <517AA3FF.3040503@gmail.com>
References: <51799E74.1020607@oracle.com> <517AA3FF.3040503@gmail.com>
Message-ID: <517AAFD9.2050703@oracle.com>

On 4/26/13 8:57 AM, Yoshito Umaoka wrote:
> I'm afraid that the fix above is not what I wanted. Your proposed fix -
> fallback to standard name when daylight name is missing, and long name
> when short name is missing.
>
> I think the right fix would be querying a localized name from another
> provider in the provider chain. I think a typical use case of custom
> locale provider is to replace the stock JDK locale data with user's own
> data. My understanding is that this was not available in Java 7, but
> Java 8 can be configured to use a custom provider first, then fallback
> to others (including the stock JDK locale provider).

I thought about it, but decided not to do that because it would introduce unwanted complexity (array-item-based fallback) for relatively little benefit. If the SPI implementation wants to override the JRE's display names by declaring support for that locale, it would not be too much to require the SPI implementation to return all the display names.

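[Editorial sketch] The requirement that an SPI implementation return all the display names could look roughly like the provider below. It is a sketch under assumptions: the class name, the single supported zone and locale, and the "Demo" names are invented for illustration, and it is not the code under review. The provider answers all four combinations of standard/daylight and long/short for the zone it claims, so callers never see a partial set.

```java
import java.util.Locale;
import java.util.TimeZone;
import java.util.spi.TimeZoneNameProvider;

/** Hypothetical provider that supplies every display name for the zone it covers. */
public class CompleteTimeZoneNames extends TimeZoneNameProvider {

    // Illustrative zone ID; a real provider would cover whatever IDs it needs.
    private static final String ZONE_ID = "America/Los_Angeles";

    @Override
    public String getDisplayName(String id, boolean daylight, int style, Locale locale) {
        if (!ZONE_ID.equals(id) || !Locale.US.equals(locale)) {
            return null; // not covered by this provider
        }
        boolean isLong = (style == TimeZone.LONG);
        if (daylight) {
            return isLong ? "Demo Pacific Daylight Time" : "DPDT";
        }
        return isLong ? "Demo Pacific Standard Time" : "DPST";
    }

    @Override
    public Locale[] getAvailableLocales() {
        return new Locale[] { Locale.US };
    }

    public static void main(String[] args) {
        CompleteTimeZoneNames provider = new CompleteTimeZoneNames();
        for (boolean daylight : new boolean[] { false, true }) {
            for (int style : new int[] { TimeZone.LONG, TimeZone.SHORT }) {
                System.out.println(
                        provider.getDisplayName(ZONE_ID, daylight, style, Locale.US));
            }
        }
    }
}
```

The sketch calls the provider directly. To have the JDK pick it up, it would additionally need to be registered as a java.util.spi.TimeZoneNameProvider service, and the provider order could be adjusted with the java.locale.providers system property.
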
Naoto From naoto.sato at oracle.com Fri Apr 26 11:13:48 2013 From: naoto.sato at oracle.com (Naoto Sato) Date: Fri, 26 Apr 2013 11:13:48 -0700 Subject: [8] RFR: 8013086 : NPE thrown by SimpleDateFormat with TimeZoneNameProvider supplied In-Reply-To: <517A3EDD.6020200@oracle.com> References: <51799E74.1020607@oracle.com> <517A3EDD.6020200@oracle.com> Message-ID: <517AC3DC.6010809@oracle.com> Thank you for the comment. Updated the fix (just moved the following if in front of the for-loop). http://cr.openjdk.java.net/~naoto/8013086/webrev.01/ Naoto On 4/26/13 1:46 AM, Masayoshi Okutsu wrote: > I'd suggest that the for loop and the following if-statement be combined > and optimized. > > Masayoshi > > On 4/26/2013 6:21 AM, Naoto Sato wrote: >> Hello, >> >> Please review the fix for the following bug: >> >> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8013086 >> >> The fix is to complement the missing display names with the provided >> ones for TimeZoneNameProvider implementations. >> >> http://cr.openjdk.java.net/~naoto/8013086/webrev.00/ >> >> Naoto > From xueming.shen at oracle.com Tue Apr 30 10:03:10 2013 From: xueming.shen at oracle.com (Xueming Shen) Date: Tue, 30 Apr 2013 10:03:10 -0700 Subject: Fwd: RFR JDK-8013254: Constructor \w need update to add the support of \p{Join_Control} Message-ID: <517FF94E.1050502@oracle.com> -------- Original Message -------- Message-ID: <517FF8F8.3080208 at oracle.com> Date: Tue, 30 Apr 2013 10:01:44 -0700 From: Xueming Shen User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: core-libs-dev core-libs-dev Subject: RFR JDK-8013254: Constructor \w need update to add the support of \p{Join_Control} Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Hi, It appears we dropped the ball on u+200c and u+200d when we updated the "simple word boundaries" back to jdk7 [1]. You can find most of the related discussion here [2]. 
These two code points were listed among the issues we were trying to fix, but the final doc and implementation don't address them, mainly because \p{Join_Control} was not explicitly listed in the TR#18 "compatibility" section back then (the earlier version) [3], though these two code points are explicitly mentioned in section RL1.4 Simple Word Boundaries [4]. \p{Join_Control} (u+200c and u+200d) has been added to the "compatibility" section in the latest version of TR#18 [5].

The proposed change here is to
(1) add these two code points back to the collection of \w
(2) list them explicitly in the \w definition as \p{Join_Control}
(3) list Join_Control as one of the supported binary properties.

http://mail.openjdk.java.net/pipermail/i18n-dev/2011-April/000381.html

The webrev for RegExTest.java above includes the change for 8013252, which is being reviewed as well; I'm not separating them out, just for convenience. The regression/unit tests may not be that "direct", so here is a direct version to verify the fix.

    Matcher wordU = Pattern.compile("\\w", Pattern.UNICODE_CHARACTER_CLASS).matcher("");
    System.out.println(wordU.reset("\u200c").find());
    System.out.println(wordU.reset("\u200d").find());

thanks
-Sherman

[1] http://ccc.us.oracle.com/7039066
[2] http://mail.openjdk.java.net/pipermail/i18n-dev/2011-April/000381.html
[3] http://www.unicode.org/reports/tr18/tr18-13.html#Compatibility_Properties
[4] http://www.unicode.org/reports/tr18/tr18-13.html#Simple_Word_Boundaries
[5] http://www.unicode.org/reports/tr18/#Compatibility_Properties

From xueming.shen at oracle.com Tue Apr 30 14:01:12 2013
From: xueming.shen at oracle.com (Xueming Shen)
Date: Tue, 30 Apr 2013 14:01:12 -0700
Subject: RFR JDK-8013254: Constructor \w need update to add the support of \p{Join_Control}
In-Reply-To: <517FF8F8.3080208@oracle.com>
References: <517FF8F8.3080208@oracle.com>
Message-ID: <51803118.1020407@oracle.com>

My apology, the webrev is at

http://cr.openjdk.java.net/~sherman/8013254/webrev/

-Sherman

On 04/30/2013 10:01 AM, Xueming Shen wrote:
> Hi,
>
> It appears we dropped the ball on u+200c and u+200d when we updated
> the "simple word boundaries" back to jdk7 [1]. You can find most of the
> related discussion here [2]. These 2 code points are listed as one of the
> issues we were trying to fix but obviously the final doc and implementation
> don't address them. Mainly because the \p{Join_Control} was not explicitly
> listed in TR#18 "compatibility" section back then (the earlier version) [3],
> though these 2 code points are explicitly mentioned at section RL1.4 Simple
> Word Boundaries [4]. The \p{Join_Control} (u+200c and u+200d) has been
> added/listed in the "compatibility" section in the latest version of TR#18 [5].
>
> The proposed change here is to
> (1) add these two code points back to the collection of \w
> (2) list them explicitly into the \w definition as \p{Join_Control}
> (3) list Join_Control as one of the supported binary properties.
> > http://mail.openjdk.java.net/pipermail/i18n-dev/2011-April/000381.html > > The webrev for RegExTest.java above includes the change for 8013252 > which is being reviewed as well, I'm not separating them out just for > convenience. The regression/unit tests may not that "direct", here is > a direct version to verify the fix. > > Matcher wordU = Pattern.compile("\\w", Pattern.UNICODE_CHARACTER_CLASS).matcher(""); > System.out.println(wordU.reset("\u200c").find()); > System.out.println(wordU.reset("\u200d").find()); > > thanks > -Sherman > > [1] http://ccc.us.oracle.com/7039066 > [2] http://mail.openjdk.java.net/pipermail/i18n-dev/2011-April/000381.html > [3] http://www.unicode.org/reports/tr18/tr18-13.html#Compatibility_Properties > [4] http://www.unicode.org/reports/tr18/tr18-13.html#Simple_Word_Boundaries > [5] http://www.unicode.org/reports/tr18/#Compatibility_Properties
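
[Editorial sketch] A self-contained variant of the check quoted above is sketched below. The class and method names are made up for illustration. It assumes a JDK that includes the proposed change, where \w under UNICODE_CHARACTER_CLASS is expected to match U+200C and U+200D; on a JDK without the change, the first two checks would be expected to report false.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for checking \w against Join_Control code points. */
public class JoinControlWordCheck {

    // \w with Unicode-aware character classes enabled
    private static final Pattern UNICODE_WORD =
            Pattern.compile("\\w", Pattern.UNICODE_CHARACTER_CLASS);

    // The binary property named in item (3) of the proposal
    private static final Pattern JOIN_CONTROL =
            Pattern.compile("\\p{IsJoin_Control}");

    /** Returns true if the single code point is matched by Unicode-aware \w. */
    public static boolean isWordChar(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return UNICODE_WORD.matcher(s).matches();
    }

    /** Returns true if the single code point has the Join_Control property. */
    public static boolean isJoinControl(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return JOIN_CONTROL.matcher(s).matches();
    }

    public static void main(String[] args) {
        int[] codePoints = { 0x200C, 0x200D, 'a', ' ' };
        for (int cp : codePoints) {
            System.out.printf("U+%04X word=%b joinControl=%b%n",
                    cp, isWordChar(cp), isJoinControl(cp));
        }
    }
}
```
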