RFR: JDK-8143282: \p{Cn} unassigned code points should be included in \p{C}
Martin Buchholz
martinrb at google.com
Fri May 20 17:13:27 UTC 2016
On Fri, May 20, 2016 at 9:55 AM, Xueming Shen <xueming.shen at oracle.com> wrote:
> On 5/20/16 9:11 AM, Martin Buchholz wrote:
>>
>> Here is a duplicate check:
>>
>> 4422 (Character.CONTROL == type || Character.CONTROL
>> == type ||
>>
>> I don't see any tests for corresponding \p{Cn}
>
>
> It appears we don't have any regex test for the gc in current test cases.
> Not a surprise
> though. I added the tests for block and script when adding script support
> ...
>
>>
>> I expected to see general category Other "C" in Character.java
>
>
> can open a rfe for that if needed.
Well, don't we want complete correspondence between the Unicode standard,
Character, and regex?
Anything missing seems like a bug, not an RFE!
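
To make "correspondence" concrete, here is a rough sketch of the kind of check I have in mind for one category: it compares \p{Cn} against Character.getType for every code point below a limit. The class and method names are made up for illustration and are not part of the webrev.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the patch under review.
class UnassignedCorrespondence {
    // Counts code points in [0, limit) where the regex category \p{Cn}
    // and Character.getType(cp) == Character.UNASSIGNED disagree.
    static int countDisagreements(int limit) {
        Matcher m = Pattern.compile("\\p{Cn}").matcher("");
        int disagreements = 0;
        for (int cp = 0; cp < limit; cp++) {
            boolean byCharacter = Character.getType(cp) == Character.UNASSIGNED;
            // A single code point as a String (one or two chars).
            boolean byRegex = m.reset(new String(Character.toChars(cp))).matches();
            if (byCharacter != byRegex) {
                disagreements++;
            }
        }
        return disagreements;
    }

    public static void main(String[] args) {
        System.out.println("disagreements: " + countDisagreements(0x30000));
    }
}
```

The same loop could be repeated for each of the other general categories, assuming each has a matching Character type constant.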
>
>>
>> I'd like to see tests that \p{C} is the same as \p{Other} is the same as
>> \p{isOther} and similar with other categories.
>
>
> Did you mean you want to add the "long name" support for unicode category?
I expect \p{C} and \p{Other} and \p{isOther} all to work (haven't tried it).
Is that not a reasonable expectation?
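
Since I haven't tried it, a quick probe along these lines would settle which spellings the regex compiler accepts. This is only a sketch with a hypothetical class name; it reports what compiles and makes no claim about what the result will be on any given JDK.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical probe, not part of the webrev.
class CategorySpellings {
    // Returns true if the regex compiles, false if the compiler rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] spellings = { "\\p{C}", "\\p{Other}", "\\p{isOther}" };
        for (String s : spellings) {
            System.out.println(s + " -> " + (compiles(s) ? "accepted" : "rejected"));
        }
    }
}
```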
>>
>> You could add a test assertion that checks that \p{C} has identical
>> effect to [\p{Cn}\p{Cs}\p{C....]
>>
>> The matcher("") with reset idiom looks weird to me. I'd just create
>> Patterns and then keep creating new Matchers, at least in test code.
>
>
> just tried to speed up the loop, which is iterating on 0x30000 cps. the
> property tests are
> taking longer and longer.
Having done test performance work, I am sympathetic!
I'm surprised reusing a Matcher helps significantly, but OK!
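
For reference, here is a sketch of the two idioms side by side, applied to the \p{C} versus union-of-subcategories check. The names are hypothetical, and the subcategory list (Cc, Cf, Cs, Co, Cn) is my reading of the Unicode general category table rather than something copied from the patch. On a JDK without this fix I would expect a nonzero count, because \p{C} would not include the unassigned code points.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical test helper, not part of the webrev.
class OtherCategoryUnion {
    static final Pattern C = Pattern.compile("\\p{C}");
    // Assumed subcategories of C: control, format, surrogate, private use, unassigned.
    static final Pattern UNION =
        Pattern.compile("[\\p{Cc}\\p{Cf}\\p{Cs}\\p{Co}\\p{Cn}]");

    // Idiom 1: create each Matcher once and reset it for every input.
    static int mismatchesWithReset(int limit) {
        Matcher c = C.matcher("");
        Matcher u = UNION.matcher("");
        int mismatches = 0;
        for (int cp = 0; cp < limit; cp++) {
            String s = new String(Character.toChars(cp));
            if (c.reset(s).matches() != u.reset(s).matches()) {
                mismatches++;
            }
        }
        return mismatches;
    }

    // Idiom 2: keep the Patterns and create a new Matcher per input.
    static int mismatchesWithFreshMatchers(int limit) {
        int mismatches = 0;
        for (int cp = 0; cp < limit; cp++) {
            String s = new String(Character.toChars(cp));
            if (C.matcher(s).matches() != UNION.matcher(s).matches()) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("reset idiom:    " + mismatchesWithReset(0x30000));
        System.out.println("fresh matchers: " + mismatchesWithFreshMatchers(0x30000));
    }
}
```

Both methods should report the same count; the only difference is how many Matcher objects get allocated along the way.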