RFR: 8007395 StringIndexOutofBoundsException in Match.find() when input String contains surrogate UTF-16 characters

Xueming Shen xueming.shen at oracle.com
Fri Apr 26 17:25:13 UTC 2013


Hi,

Please help review the proposed fix for

8007395: StringIndexOutofBoundsException in Match.find() when input String contains surrogate UTF-16 characters

http://cr.openjdk.java.net/~sherman/8007395/webrev

The root cause is that GroupCurly, the "iterative optimization" class, fails to
backtrack correctly when matching/finding fails and the iterations matched so far
have different sizes (for example, a CharProperty regex construct that can match
both bmp and non-bmp characters, as in this case). The existing implementation does
have a mechanism to deal with "different sized" matching results per iteration, see
ln#4451: it "recursively" enters a new layer of match0. But when the "next" matching
fails, it incorrectly uses the latest matched size to backtrack all the way back to
"cmin" (so in this case it backs off by two chars per iteration all the way back to
"cmin", when in fact it should back off by 2 only for the last surrogate pair, and
then by 1 for the rest). Each match0() really should only backtrack to its starting
iteration count, and leave the rest to its "invoker". The fix is an easy two-line
change that makes sure backtracking backs off with the appropriate matching size.
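
To make the scenario concrete, here is a minimal sketch of the kind of input that
exercises this path. The class name, the patterns and the input string are
illustrative assumptions, not taken from the webrev: a capturing group around a
single-character construct, with a quantifier (which, as far as I can tell, is
compiled to GroupCurly), applied to BMP characters followed by one surrogate pair,
with a trailing part of the pattern that can never match so the matcher is forced
to backtrack through every iteration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCurlySurrogateDemo {

    // Returns true if the regex finds a match anywhere in the input.
    static boolean findsMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find();
    }

    public static void main(String[] args) {
        // Illustrative input: BMP characters followed by U+1F60D, which is
        // a surrogate pair (two chars) in UTF-16. Each group iteration
        // therefore consumes 1 char, except the last one, which consumes 2.
        String input = "test this as \ud83d\ude0d";

        // Illustrative patterns: the input contains no '@', so the tail
        // "(@[a-zA-Z.]+)" always fails and forces backtracking through
        // all iterations of the quantified group.
        String[] patterns = {
            "test(.)+(@[a-zA-Z.]+)",
            "test([^B])*(@[a-zA-Z.]+)",
        };

        for (String p : patterns) {
            // With correct backtracking this simply reports "false".
            System.out.println(p + " -> " + findsMatch(p, input));
        }
    }
}
```

Since the input contains no '@', find() should return false for both patterns. On
a build without the fix, I would expect this kind of input to hit the
StringIndexOutOfBoundsException from the bug title instead.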

Thanks,
-Sherman


