Codereview request for 7189363: Regex Pattern compilation buggy for special sequences
Paul Sandoz
paul.sandoz at oracle.com
Thu Aug 9 08:38:53 UTC 2012
Looks OK to me.
Paul.
On Aug 8, 2012, at 11:58 PM, Xueming Shen <xueming.shen at oracle.com> wrote:
> Hi
>
> It appears the optimization (flatten a binary tree structure into a branch/
> switch) we put into JDK6 for alternation operation [1] has problem if the
> first construct is a group "(...)" followed by a greedy/reluctant "once or not
> at all" quantifier. For example regex "(a)?bc|d" or "(a)??bc|d" can't be found
> for input "d", while "a?bc|d" or "a??bc|d" just works fine.
>
> The root cause is that expr() mistakenly to use the Branch node from the
> sub-expression (in above case, the branch node for "(a)?" instead of using
> its own Branch node (because the incorrect use of "prev instance of Branch",
> see the diff), which basically messes up the internal regex node tree [2]
>
> The webrev is at
> http://cr.openjdk.java.net/~sherman/7189363/webrev
>
> Thanks,
> -Sherman
>
> PS:
> I'm surprised this bug has not been noticed until now. One of the reasons is
> that the "messed up" regex node tree actually works for most of the cases
> for this particular regex, for example
>
> Pattern.compile("(a)?bc|d").matcher("d").matches()
> Pattern.compile("(a)?bc|d").matcher("bc").matches()
> Pattern.compile("(a)?bc|d").matcher("abc").matches()
> and even
> Pattern.compile("(a)?bc|de").matcher("de").find()
>
> all return true. The "find()" for single "d" fails only because the "minimum
> length" check (see [2] for details, if interested)
>
> [1] http://cr.openjdk.java.net/~sherman/5013651_6271399_6342544/
> [2] http://cr.openjdk.java.net/~sherman/7189363/nodes
More information about the core-libs-dev
mailing list