Unicode script support in Regex and Character class

Xueming Shen xueming.shen at oracle.com
Thu Apr 22 22:38:49 UTC 2010


Ulf Zibis wrote:
>> (3) the syntax for script constructs. In addition to the "normal"
>>     \p{InScriptName} and \P{InScriptName} for the script support
>>     I'm also adding
>>    \p{script=ScriptName} \P{script=ScriptName}  for the new script support
>>    \p{block=BlockName} \P{block=BlockName}  for the "existing" block support
>>    \p{general_category=CategoryName} \P{general_category=CategoryName}
>>        for the "existing" gc
>>    Perl recently also started to accept this \p{propName=propValue}
>>    Unicode style.
>>    It opens the door for future "expanding", for example \p{name=XYZ} :-)
> (2) the piggyback method j.l.c.getName() :-)
>
> I'm missing \p{InScriptName} in Pattern javadoc.
>

Sorry, that was a typo on my side: scripts use the "Is" prefix, and "In"
stays with blocks. I meant to say

\p{IsScriptName} and \P{IsScriptName}

So the "recommended" usage would be

Script:
\p{IsScriptName} \P{IsScriptName} or
\p{script=ScriptName} \P{script=ScriptName}

Block:
\p{InBlockName} \P{InBlockName} or
\p{block=BlockName} \P{block=BlockName}

Category:
\p{CategoryName} \P{CategoryName} or
\p{general_category=CategoryName} \P{general_category=CategoryName}

For compatibility reasons, we also accept \p{IsCategoryName} and
\P{IsCategoryName}. So far there appears to be no conflict between
category names and script names.
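
To make the forms above concrete, here is a minimal sketch of how they
could be exercised. It assumes a JDK whose java.util.regex.Pattern
includes this change (JDK 7 or later). The class name and the matches()
helper are made up for illustration, and Greek / Ll are just sample
script, block and category values.

```java
import java.util.regex.Pattern;

public class UnicodePropertySyntaxDemo {

    // Hypothetical helper: true if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // GREEK SMALL LETTER ALPHA: script Greek, block Greek, category Ll
        String alpha = "\u03B1";

        String[] regexes = {
            // Script: "Is" prefix or the script= keyword
            "\\p{IsGreek}",
            "\\p{script=Greek}",
            // Block: "In" prefix or the block= keyword
            "\\p{InGreek}",
            "\\p{block=Greek}",
            // Category: bare name, general_category= keyword,
            // or the "Is" prefix kept for compatibility
            "\\p{Ll}",
            "\\p{general_category=Ll}",
            "\\p{IsLl}"
        };

        for (String regex : regexes) {
            System.out.println(regex + " matches alpha: " + matches(regex, alpha));
        }

        // The upper-case P form negates the property.
        System.out.println("\\P{IsGreek} matches A: " + matches("\\P{IsGreek}", "A"));
    }
}
```

Since "Is" is accepted for both scripts and categories, a name like
IsLl has to be tried against more than one kind of property, which is
why a future clash between a category name and a script name would
matter.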

My apologies for the inconvenience.

Sherman


