<i18n dev> regex rewriting code (part 1 of 3)
Tom Christiansen
tchrist at perl.com
Tue Jan 25 12:02:45 PST 2011
> The fact that these POSIX/ASCII only version properties/constructs
> have been there for years ("compatibility") and it appears that "most"
> developers are happy (habit, performance...) with them, I don't think
> we can and want to switch to the Unicode version, simply for
> conformance.
I agree with you 100.000%. That's why I said that the compatibility
issue merits a separate letter. I have a whole bunch of different
ideas toward how to make everybody happy here; perhaps one or more
of them will actually do that. I would never want to change things
out from under people. It would take positive action on their part
to get something to change.
> Name space conflict is really not a big issue (for me anyway) a
> possible solution is to have a prefix "Is" for all Unicode binary
> properties, for example "IsAlpha", "IsLowerCase", the problem we
> have here is to provide the TR#18 compatible version for those
> listed properties, if we want to continue claim tr#18 level 1.
That's why Perl finally bit the bullet and allowed those properties
to take on the tr18 meaning--because it said we had to. This has
*not* made the POSIX-minded people happy. That's why we finally had
to (VERY recently, in the devel version) add a /a or (?a) flag.
It's also why we have "POSIX_" to get back the POSIX senses. This
may matter more in Perl than I think it does in Java, because I
don't know that Java ever does anything POSIXy outside of the C
locale; is this correct? In Perl or C, I'd call setlocale().
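
As a rough illustration of the opt-in idea on the Java side, the sketch below assumes Java 7 or later, where java.util.regex.Pattern offers the UNICODE_CHARACTER_CLASS flag (embedded form (?U)) and "Is"-prefixed binary properties such as \p{IsAlphabetic}. The class and method names are hypothetical and exist only for this example; it is not a statement of how the compatibility question was or should be settled.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: contrasts the default (ASCII-only, POSIX-style)
// meaning of \p{Alpha} and \w with the opt-in Unicode meaning.
final class PosixVersusUnicodeClasses {

    // Default mode: nothing changes unless the caller asks for it.
    static boolean matchesDefault(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Opt-in mode: the same property names take their Unicode meanings.
    static boolean matchesUnicode(String regex, String input) {
        return Pattern.compile(regex, Pattern.UNICODE_CHARACTER_CLASS)
                      .matcher(input)
                      .matches();
    }

    public static void main(String[] args) {
        String word = "\u00e9t\u00e9"; // the French word for "summer"

        // ASCII-only by default, so the accented letters are rejected.
        System.out.println(matchesDefault("\\p{Alpha}+", word));
        System.out.println(matchesDefault("\\w+", word));

        // The "Is" prefix selects the Unicode binary property without any flag.
        System.out.println(matchesDefault("\\p{IsAlphabetic}+", word));

        // Positive action by the caller: a flag, or the embedded (?U) form.
        System.out.println(matchesUnicode("\\p{Alpha}+", word));
        System.out.println(matchesDefault("(?U)\\w+", word));
    }
}
```

Existing patterns keep their ASCII behavior here; only a caller who passes the flag or writes (?U) gets the Unicode senses, which is roughly the mirror image of Perl's /a and (?a).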
> The dis-connection between \b and \w is a headache,
Yes, it really is, isn't it?
Let me bounce some stuff around in my head for a while longer on
the compat issue, because there's a chance a solution to the \b
and \w dichotomy may fall out from that.
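
To make the \b and \w question concrete, here is a small sketch, again assuming Java 7 or later and Pattern.UNICODE_CHARACTER_CLASS. The class and helper names are hypothetical. With the flag set, \b and \w share one definition of a word character, so the accented word comes back whole. The default-mode result is printed but deliberately not stated here, since it depends on how a given JDK release defines \b relative to \w.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: extracts "words" using \b and \w together.
final class WordBoundaryDemo {

    // Collects every match of \b\w+\b under the given Pattern flags.
    static List<String> words(String input, int flags) {
        Matcher m = Pattern.compile("\\b\\w+\\b", flags).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "\u00e9lan vital";

        // Default mode: the handling of the leading accented letter is where
        // \b and \w can disagree, so no expected output is claimed.
        System.out.println(words(text, 0));

        // Unicode mode: \b and \w agree, so both words are found intact.
        System.out.println(words(text, Pattern.UNICODE_CHARACTER_CLASS));
    }
}
```

The all-ASCII word "vital" is found in either mode; only the word that starts with a non-ASCII letter exposes the mismatch.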
Thank you for your time.
--tom