Regex named-group and backreference syntax

Alan Moore uncle.alice at gmail.com
Wed Sep 2 08:58:46 UTC 2009


Hi Sherman,

On Wed, Sep 2, 2009 at 12:15 AM, Xueming Shen<Xueming.Shen at sun.com> wrote:
> It would be an "ambiguity" (and then confused) only if we
> had the \k<n> and $<n> as the legally supported group
> reference syntax:-) That said I have to admit that it does
> not have any value-add to allow a group name to begin
> with a digit character. So if we have a consensus I would
> be happy to change the spec/implementation to disallow
> group names that start with a digit.

Yeah, "ambiguity" isn't really the right word for allowing all-digit
group names. It's not so much that it's inherently confusing, just
that it makes it easier than usual for people to confuse themselves.
:-/
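
For anyone who wants to see how a particular java.util.regex build treats
digit-leading names, here is a minimal sketch. The class and method names
(DigitGroupName, compiles) are made up for illustration, and the outcome
for the second pattern depends on whether the implementation in use
allows a group name to start with a digit.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class DigitGroupName {
    // Returns true if the regex compiles, false if the parser rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A conventional name that starts with a letter.
        System.out.println(compiles("(?<year>\\d{4})"));
        // A name that starts with a digit; easy to mistake for group number 1.
        System.out.println(compiles("(?<1>\\d{4})"));
    }
}
```

If the second pattern is accepted, \k<1> refers to the group *named* "1",
which need not be the first group in the regex. That is the kind of
self-inflicted confusion I mean.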

> I kinda disagree that the "rest of the named-group syntax" is
> copied from .Net. Actually it is the syntax from Perl
> 5.10.0/named capture buffer, in which the naming syntax is
> (?<NAME>....) and to backreference it with the \k<NAME>. I did
> not find a "reference of named capture buffer in replacement"
> from there. I did consider using the .Net syntax, but finally
> decided to go with $<name> because it is more consistent with
> the (?<name>...) and \k<name> syntax.

Didn't Perl copy it from .NET (among others)?  Perl didn't introduce
named capture until v5.10, and now it supports all of the syntax
variants found in .NET and Python (which definitely had named groups
before Perl).  But wherever you got it from, the syntax is the same as
.NET's, except for replacement-string backreferences.

Anyway, I think consistency with other flavors is more important than
internal consistency in this case.  Every flavor uses angle brackets
within the regex, but Perl uses $+{name} in the replacement string,
while .NET and JRegex both use ${name}.  I think anyone who comes over
to java.util.regex with previous regex experience is more likely to
expect that syntax than anything else.
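
To illustrate what I'm arguing for, here is a rough sketch of the three
pieces together: (?<name>...) to define the group, \k<name> to
backreference it inside the regex, and ${name} to reference it in the
replacement string. It assumes a java.util.regex build that accepts
${name} in replacements; the class and method names (NamedGroupDemo,
dedupe) are hypothetical.

```java
import java.util.regex.Pattern;

public class NamedGroupDemo {
    // Collapses an immediately repeated word ("the the" becomes "the").
    static String dedupe(String input) {
        // (?<word>\w+) captures a word; \k<word> must match the same text again.
        Pattern p = Pattern.compile("\\b(?<word>\\w+)\\s+\\k<word>\\b");
        // ${word} in the replacement inserts the text captured by group "word".
        return p.matcher(input).replaceAll("${word}");
    }

    public static void main(String[] args) {
        // On a build that accepts ${name}, this prints "the cat".
        System.out.println(dedupe("the the cat"));
    }
}
```

Someone coming from .NET or JRegex could write that replacement string
without looking anything up, which is the whole point.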

> To allow \k<n> and $<n> is a fine idea, it at least looks less "complicated"
> in replacement case.


