Codereview request for 7189363: Regex Pattern compilation buggy for special sequences

Xueming Shen xueming.shen at oracle.com
Wed Aug 8 21:58:37 UTC 2012


Hi

It appears the optimization (flatten a binary tree structure into a branch/
switch) we put into JDK6 for alternation operation [1] has problem if the
first construct is a group "(...)" followed by a greedy/reluctant "once 
or not
at all" quantifier. For example regex "(a)?bc|d" or "(a)??bc|d" can't be 
found
for input "d", while "a?bc|d" or "a??bc|d" just works fine.

The root cause is that expr() mistakenly to use the Branch node from the
sub-expression (in above case, the branch node for "(a)?" instead of using
its own Branch node (because the incorrect use of "prev instance of Branch",
see the diff), which basically messes up the internal regex node tree [2]

The webrev is at
http://cr.openjdk.java.net/~sherman/7189363/webrev

Thanks,
-Sherman

PS:
I'm surprised this bug has not been noticed until now. One of the reasons is
that the "messed up" regex node tree actually works for most of the cases
for this particular regex, for example

Pattern.compile("(a)?bc|d").matcher("d").matches()
Pattern.compile("(a)?bc|d").matcher("bc").matches()
Pattern.compile("(a)?bc|d").matcher("abc").matches()
and even
Pattern.compile("(a)?bc|de").matcher("de").find()

all return true. The "find()" for single "d" fails only because the "minimum
length" check (see [2] for details, if interested)

[1] http://cr.openjdk.java.net/~sherman/5013651_6271399_6342544/
[2] http://cr.openjdk.java.net/~sherman/7189363/nodes



More information about the core-libs-dev mailing list