Integrated: 8281315: Unicode, (?i) flag and backreference throwing IndexOutOfBounds Exception

Ian Graves igraves at openjdk.java.net
Tue Feb 22 16:35:52 UTC 2022


On Wed, 16 Feb 2022 18:45:29 GMT, Ian Graves <igraves at openjdk.org> wrote:

> This fixes a bug in the way CIBackRef traverses Unicode characters, which can be variable-length. Originally it followed the approach that BackRef uses, but failed to account for Unicode characters that are two chars long. The upper bound (groupSize) of the traversal loop is the difference between the group's start and end indexes. This works for single-char characters, and it also works for case-sensitive comparisons because char-by-char comparison is acceptable there, but it doesn't work for a comparison that requires some kind of normalization (e.g. case folding). This fix adjusts the upper bound of the loop that traverses the group whenever a two-char character is encountered.
> 
> An alternative was to determine the group's length in code points by scanning the group in advance, but this could result in multiple scans and code point conversions of the same matcher group, which could be long. Adjusting the loop bound on the fly avoids that cost.
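
For readers following along, the on-the-fly bound adjustment described above can be sketched roughly as follows. This is a standalone illustration under stated assumptions, not the code from the changeset: the class CIBackRefSketch and the method regionMatchesIgnoreCase are hypothetical names, and the case check via Character.toUpperCase/toLowerCase is only one plausible way to compare code points case-insensitively.

```java
// Hypothetical sketch; names and structure are illustrative only.
class CIBackRefSketch {

    /**
     * Returns true if the text starting at 'from' equals the captured
     * region seq[groupStart, groupEnd), ignoring case, comparing one
     * code point at a time.
     */
    static boolean regionMatchesIgnoreCase(CharSequence seq,
                                           int groupStart, int groupEnd,
                                           int from) {
        // Bound measured in chars, not code points.
        int groupSize = groupEnd - groupStart;

        // Not enough input left for a region of the same char length.
        if (from < 0 || from + groupSize > seq.length()) {
            return false;
        }

        int x = from;        // cursor in the candidate text
        int j = groupStart;  // cursor in the captured group
        for (int index = 0; index < groupSize; index++) {
            int c1 = Character.codePointAt(seq, x);
            int c2 = Character.codePointAt(seq, j);
            if (c1 != c2) {
                int u1 = Character.toUpperCase(c1);
                int u2 = Character.toUpperCase(c2);
                if (u1 != u2
                        && Character.toLowerCase(u1) != Character.toLowerCase(u2)) {
                    return false;
                }
            }
            x += Character.charCount(c1);
            j += Character.charCount(c2);

            // A surrogate pair consumed two chars of the group in a single
            // iteration, so shrink the bound by one. Without this step the
            // next iteration could read past the end of the input.
            if (Character.charCount(c2) == 2) {
                groupSize--;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // U+1F495 twice: each occurrence is one code point but two chars.
        String hearts = "\uD83D\uDC95\uD83D\uDC95";
        System.out.println(regionMatchesIgnoreCase(hearts, 0, 2, 2));
    }
}
```

The bound is decremented only when the group side yields a two-char code point; when the candidate side differs in width, the code points cannot be equal and the method has already returned false. Pre-scanning the group to count code points would give the same result at the cost of an extra pass.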

This pull request has now been integrated.

Changeset: 3cb38678
Author:    Ian Graves <igraves at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/3cb38678aa7f03356421f5a17c1de4156e206d68
Stats:     25 lines in 2 files changed: 21 ins; 0 del; 4 mod

8281315: Unicode, (?i) flag and backreference throwing IndexOutOfBounds Exception

Reviewed-by: naoto

-------------

PR: https://git.openjdk.java.net/jdk/pull/7501

