RFR 8197462 : Inconsistent exception messages for invalid capturing group names

Ivan Gerasimov ivan.gerasimov at oracle.com
Fri Feb 9 04:32:30 UTC 2018


Hello!

A capturing group name can appear in a regular expression in two
contexts: when introducing a group, (?<name>...), or when referring to
it, \k<name>.
If the name is invalid (i.e., it does not start with a Latin letter, or
it contains illegal characters), the resulting error messages differ
from case to case, and some of them are confusing.
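For reference, here is a minimal sketch of the two contexts, together with a standalone check of the naming rule as documented in the java.util.regex.Pattern javadoc: the first character must be an ASCII letter, and the remaining characters must be ASCII letters or digits. The class GroupNameDemo and the helper isValidGroupName are hypothetical and not part of the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNameDemo {

    // Hypothetical helper (not a JDK API): checks a name against the rule
    // from the Pattern javadoc. The first char must be an ASCII letter;
    // the rest may be ASCII letters or digits.
    static boolean isValidGroupName(String name) {
        if (name.isEmpty() || !isLatinLetter(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            if (!isLatinLetter(c) && !(c >= '0' && c <= '9')) {
                return false;
            }
        }
        return true;
    }

    static boolean isLatinLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        // Context 1: (?<word>...) introduces a named group.
        // Context 2: \k<word> refers back to it (here: a doubled word).
        Pattern p = Pattern.compile("(?<word>\\w+) \\k<word>");
        Matcher m = p.matcher("hello hello");
        System.out.println(m.matches() + " " + m.group("word")); // true hello

        // Names taken from the examples in this message, plus one valid name.
        for (String name : new String[] {"", ".", "a.", "a1"}) {
            System.out.println("<" + name + "> valid: " + isValidGroupName(name));
        }
    }
}
```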

Here are examples of the messages produced by the current JDK:
Unknown look-behind group near index 3
(?<>)
    ^
named capturing group is missing trailing '>' near index 4
\\k<>
     ^
Unknown look-behind group near index 4
(?<.>)
     ^
named capturing group <.> does not exist near index 4
\\k<.>
     ^
named capturing group is missing trailing '>' near index 4
(?<a.>)
     ^
named capturing group is missing trailing '>' near index 4
\\k<a.>
     ^
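The examples above can be reproduced with a small program like the sketch below. It assumes that `\\k` in the listing is the Java string-literal form, i.e. the regex \k<...>. It does not hard-code any expected messages, since their exact wording and index depend on the JDK release; it only shows that each pattern is rejected with a PatternSyntaxException. The class name InvalidGroupNames is hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class InvalidGroupNames {

    // The six patterns from the examples, written as Java string literals.
    static final String[] PATTERNS = {
        "(?<>)", "\\k<>", "(?<.>)", "\\k<.>", "(?<a.>)", "\\k<a.>"
    };

    // Returns the exception thrown when compiling 'regex', or null if it compiles.
    static PatternSyntaxException compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        for (String regex : PATTERNS) {
            PatternSyntaxException e = compileError(regex);
            // getMessage() combines the description, the index, the pattern
            // and a caret line; the wording varies between JDK versions.
            System.out.println(e == null ? regex + " compiled" : e.getMessage());
            System.out.println();
        }
    }
}
```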

In particular, this inconsistency arises because the internal method
Pattern.groupname() does not check the very first character of the name.
As a result, when \k<name> is parsed, the first character is always
accepted, whatever it is.
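To illustrate the kind of check that is missing, here is a hypothetical, standalone name parser that validates the first character before consuming the rest. It is NOT the JDK's Pattern.groupname() and not the code in the webrev; the class name, method signature and error texts are assumptions chosen for illustration. Because both (?<name> and \k<name> would go through the same first-character check, an invalid leading character produces the same description in either context.

```java
import java.util.regex.PatternSyntaxException;

public class GroupNameParser {

    // Hypothetical parser (not JDK code). 'start' is the index just after '<'.
    // Returns the group name, or throws PatternSyntaxException on error.
    static String parseGroupName(String regex, int start) {
        int i = start;
        // Validate the very first character as well, so that both contexts
        // report an invalid leading character in the same way.
        if (i >= regex.length() || !isLatinLetter(regex.charAt(i))) {
            throw new PatternSyntaxException(
                "capturing group name does not start with a Latin letter", regex, i);
        }
        i++;
        // Remaining characters: ASCII letters or digits.
        while (i < regex.length() && isLatinLetterOrDigit(regex.charAt(i))) {
            i++;
        }
        if (i >= regex.length() || regex.charAt(i) != '>') {
            throw new PatternSyntaxException(
                "named capturing group is missing trailing '>'", regex, i);
        }
        return regex.substring(start, i);
    }

    static boolean isLatinLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean isLatinLetterOrDigit(char c) {
        return isLatinLetter(c) || (c >= '0' && c <= '9');
    }

    public static void main(String[] args) {
        System.out.println(parseGroupName("\\k<a1>", 3)); // a1
        // Both contexts now fail on the first character with the same description.
        for (String regex : new String[] {"(?<.>)", "\\k<.>"}) {
            try {
                parseGroupName(regex, 3);
            } catch (PatternSyntaxException e) {
                System.out.println(e.getDescription() + " at index " + e.getIndex());
            }
        }
    }
}
```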

Some cleanup was also done along the way.

Would you please help review the fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8197462
WEBREV: http://cr.openjdk.java.net/~igerasim/8197462/00/webrev/

Thanks in advance!

-- 
With kind regards,
Ivan Gerasimov



More information about the core-libs-dev mailing list