<i18n dev> regex rewriting code (part 1 of 3)

Tom Christiansen tchrist at perl.com
Tue Jan 25 12:02:45 PST 2011


> The fact that these POSIX/ASCII only version properties/constructs
> have been there for years ("compatibility") and it appears that "most"
> developers are happy (habit, performance...) with them, I don't think
> we can and want to switch to the  Unicode version, simply for
> conformance.

I agree with you 100.000%.  That's why I said that the compatibility
issue merits a separate letter.  I have a whole bunch of different
ideas toward how to make everybody happy here; perhaps one or more
of them will actually do that.  I would never want to change things
out from under people.  It would take positive action on their part
to get something to change.

> Name space conflict is really not a big issue (for me anyway) a
> possible solution is to have a prefix "Is" for all Unicode binary
> properties, for example "IsAlpha", "IsLowerCase", the problem we
> have here is to provide the TR#18 compatible version for those
> listed properties, if we want to continue claim tr#18 level 1.

That's why Perl finally bit the bullet and allowed those properties
to take on the tr18 meaning--because it said we had to.  This has
*not* made the POSIX-minded people happy.  That's why we finally had
to (VERY recently, in the devel version) add a /a or (?a) flag.

It's also why we have "POSIX_" to get back the POSIX senses.  This
may matter more in Perl than I think it does in Java, because I
don't know that Java ever does anything POSIXy outside of the C
locale; is this correct?  In Perl or C, I'd call setlocale().

> The dis-connection between \b and \w is a headache, 

Yes, it really is, isn't it?  

Let me bounce some stuff around in my head for a while longer on
the compat issue, because there's a chance a solution to the \b
and \w dichotomy may fall out from that.

Thank you for your time.

--tom

