<i18n dev> RL1.3 Subtraction and Intersection

Tom Christiansen tchrist at perl.com
Sun Jan 23 11:14:48 PST 2011


    RL1.3       Subtraction and Intersection

    To meet this requirement, an implementation shall supply
    mechanisms for union, intersection and set-difference of
    Unicode sets.

Java meets this requirement.  However, because RL1.2 is not met,
it is of limited practical usefulness.

This is one of the things I really like about Java regular expressions.
They use a different notation from the one given in the standard, but that's ok.

    http://download.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

    [abc]               a, b, or c (simple class)
    [^abc]              Any character except a, b, or c (negation)
    [a-zA-Z]            a through z or A through Z, inclusive (range)
    [a-d[m-p]]          a through d, or m through p: [a-dm-p] (union)
    [a-z&&[def]]        d, e, or f (intersection) 
    [a-z&&[^bc]]        a through z, except for b and c: [ad-z] (subtraction)
    [a-z&&[^m-p]]       a through z, and not m through p: [a-lq-z] (subtraction)
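
As a quick check of the table above, here is a minimal sketch that runs
each set operation through java.util.regex.Pattern.  The class name
ClassSetOps and the helper inSet are made up for illustration.  The last
case uses \p{Lu} as one example of applying the same subtraction notation
to a Unicode property.

```java
import java.util.regex.Pattern;

class ClassSetOps {
    // Hypothetical helper: true if the whole input matches the character class.
    static boolean inSet(String charClass, String input) {
        return Pattern.matches(charClass, input);
    }

    public static void main(String[] args) {
        // Union: a through d, or m through p
        System.out.println(inSet("[a-d[m-p]]", "n"));      // true
        System.out.println(inSet("[a-d[m-p]]", "f"));      // false

        // Intersection: d, e, or f
        System.out.println(inSet("[a-z&&[def]]", "e"));    // true
        System.out.println(inSet("[a-z&&[def]]", "g"));    // false

        // Subtraction: a through z, and not m through p
        System.out.println(inSet("[a-z&&[^m-p]]", "q"));   // true
        System.out.println(inSet("[a-z&&[^m-p]]", "n"));   // false

        // Subtraction on a property: uppercase letters other than ASCII A-Z
        System.out.println(inSet("[\\p{Lu}&&[^A-Z]]", "\u00C9")); // true (U+00C9)
        System.out.println(inSet("[\\p{Lu}&&[^A-Z]]", "A"));      // false
    }
}
```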

Although you *can* do all these using an appropriate combination
of lookahead assertions and negations, doing so can be inconvenient.
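
To show what the lookahead route looks like, here is a rough sketch that
compares each set-operation class with a lookahead or alternation
rewrite.  The class name LookaheadSetOps and the helper sameAsciiSet are
hypothetical, and the comparison covers only the ASCII range, so it
illustrates the equivalence for these examples and does not prove it in
general.

```java
import java.util.regex.Pattern;

class LookaheadSetOps {
    // Hypothetical helper: true if both single-character patterns accept
    // exactly the same characters in the ASCII range (0 through 127).
    static boolean sameAsciiSet(String regexA, String regexB) {
        Pattern a = Pattern.compile(regexA);
        Pattern b = Pattern.compile(regexB);
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (a.matcher(s).matches() != b.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Union written as an alternation
        System.out.println(sameAsciiSet("[a-d[m-p]]", "(?:[a-d]|[m-p])"));
        // Intersection written as a positive lookahead plus a second class
        System.out.println(sameAsciiSet("[a-z&&[def]]", "(?=[a-z])[def]"));
        // Subtraction written as a negative lookahead plus the base class
        System.out.println(sameAsciiSet("[a-z&&[^m-p]]", "(?![m-p])[a-z]"));
    }
}
```

Each rewrite needs a group or an assertion in front of the class, which
is the inconvenience: the result is no longer a single bracketed class
that can be nested inside another one.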

--tom

