Codereview request for 7189363: Regex Pattern compilation buggy for special sequences

Paul Sandoz paul.sandoz at oracle.com
Thu Aug 9 08:38:53 UTC 2012


Looks OK to me.

Paul.

On Aug 8, 2012, at 11:58 PM, Xueming Shen <xueming.shen at oracle.com> wrote:

> Hi
> 
> It appears the optimization (flatten a binary tree structure into a branch/
> switch) we put into JDK6 for alternation operation [1] has problem if the
> first construct is a group "(...)" followed by a greedy/reluctant "once or not
> at all" quantifier. For example regex "(a)?bc|d" or "(a)??bc|d" can't be found
> for input "d", while "a?bc|d" or "a??bc|d" just works fine.
> 
> The root cause is that expr() mistakenly to use the Branch node from the
> sub-expression (in above case, the branch node for "(a)?" instead of using
> its own Branch node (because the incorrect use of "prev instance of Branch",
> see the diff), which basically messes up the internal regex node tree [2]
> 
> The webrev is at
> http://cr.openjdk.java.net/~sherman/7189363/webrev
> 
> Thanks,
> -Sherman
> 
> PS:
> I'm surprised this bug has not been noticed until now. One of the reasons is
> that the "messed up" regex node tree actually works for most of the cases
> for this particular regex, for example
> 
> Pattern.compile("(a)?bc|d").matcher("d").matches()
> Pattern.compile("(a)?bc|d").matcher("bc").matches()
> Pattern.compile("(a)?bc|d").matcher("abc").matches()
> and even
> Pattern.compile("(a)?bc|de").matcher("de").find()
> 
> all return true. The "find()" for single "d" fails only because the "minimum
> length" check (see [2] for details, if interested)
> 
> [1] http://cr.openjdk.java.net/~sherman/5013651_6271399_6342544/
> [2] http://cr.openjdk.java.net/~sherman/7189363/nodes




More information about the core-libs-dev mailing list