<i18n dev> RFR JDK-8013254: Constructor \w need update to add the support of \p{Join_Control}

Mandy Chung mandy.chung at oracle.com
Wed May 1 08:44:45 PDT 2013


On 4/30/2013 2:01 PM, Xueming Shen wrote:
>
> http://cr.openjdk.java.net/~sherman/8013254/webrev/
>

Looks good.

Mandy

> -Sherman
>
> On 04/30/2013 10:01 AM, Xueming Shen wrote:
>> Hi,
>>
>> It appears we dropped the ball on U+200C and U+200D when we updated
>> the "simple word boundaries" back in JDK 7 [1]. You can find most of the
>> related discussion here [2]. These two code points were listed among the
>> issues we were trying to fix, but the final doc and implementation
>> obviously don't address them, mainly because \p{Join_Control} was not
>> explicitly listed in the TR#18 "compatibility" section back then (the
>> earlier version) [3], though these two code points are explicitly
>> mentioned in section RL1.4 Simple Word Boundaries [4]. \p{Join_Control}
>> (U+200C and U+200D) has since been added to the "compatibility" section
>> in the latest version of TR#18 [5].
>>
>> The proposed change here is to
>> (1) add these two code points back to the collection of \w
>> (2) list them explicitly in the \w definition as \p{Join_Control}
>> (3) list Join_Control as one of the supported binary properties.
>>
>> http://mail.openjdk.java.net/pipermail/i18n-dev/2011-April/000381.html
>>
>> The webrev for RegExTest.java above includes the change for 8013252,
>> which is being reviewed as well; I'm not separating them out, just for
>> convenience. The regression/unit tests may not be that "direct", so here
>> is a direct version to verify the fix.
>>
>>         Matcher wordU = Pattern.compile("\\w",
>>                 Pattern.UNICODE_CHARACTER_CLASS).matcher("");
>>         System.out.println(wordU.reset("\u200c").find());
>>         System.out.println(wordU.reset("\u200d").find());
>>
>> thanks
>> -Sherman
>>
>> [1] http://ccc.us.oracle.com/7039066
>> [2] http://mail.openjdk.java.net/pipermail/i18n-dev/2011-April/000381.html
>> [3] http://www.unicode.org/reports/tr18/tr18-13.html#Compatibility_Properties
>> [4] http://www.unicode.org/reports/tr18/tr18-13.html#Simple_Word_Boundaries
>> [5] http://www.unicode.org/reports/tr18/#Compatibility_Properties
>
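
For anyone who wants to try the quoted check outside the test suite, here is a
self-contained sketch of it. It assumes a JDK that already contains this fix
(JDK 8 or later, as far as I can tell); the class name JoinControlCheck and its
helper methods are made up for illustration and are not part of the webrev. It
also assumes the \p{IsJoin_Control} spelling, following the "Is" prefix the
Pattern documentation uses for binary properties.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the webrev under review.
public class JoinControlCheck {
    // \w with the Unicode definition (should include Join_Control after the fix).
    private static final Pattern WORD_UNICODE =
            Pattern.compile("\\w", Pattern.UNICODE_CHARACTER_CLASS);

    // \w with the default definition, [a-zA-Z_0-9].
    private static final Pattern WORD_ASCII = Pattern.compile("\\w");

    // The binary property itself; assumes a JDK where Join_Control is supported.
    private static final Pattern JOIN_CONTROL =
            Pattern.compile("\\p{IsJoin_Control}");

    private static String asString(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    static boolean isUnicodeWord(int codePoint) {
        return WORD_UNICODE.matcher(asString(codePoint)).matches();
    }

    static boolean isAsciiWord(int codePoint) {
        return WORD_ASCII.matcher(asString(codePoint)).matches();
    }

    static boolean isJoinControl(int codePoint) {
        return JOIN_CONTROL.matcher(asString(codePoint)).matches();
    }

    public static void main(String[] args) {
        // U+200C ZERO WIDTH NON-JOINER and U+200D ZERO WIDTH JOINER.
        for (int cp : new int[] {0x200C, 0x200D}) {
            System.out.printf("U+%04X unicode \\w=%b, default \\w=%b, Join_Control=%b%n",
                    cp, isUnicodeWord(cp), isAsciiWord(cp), isJoinControl(cp));
        }
    }
}
```

The default \w is included only for contrast: it stays ASCII-only, so the two
code points should be reported as word characters only when
UNICODE_CHARACTER_CLASS is set.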


