RFR: 8281315: Unicode, (?i) flag and backreference throwing IndexOutOfBounds Exception

Ian Graves igraves at openjdk.java.net
Wed Feb 16 18:52:25 UTC 2022


This fixes a bug in the way CIBackRef traverses Unicode characters, which can be variable-length. It originally followed the approach that BackRef uses, but failed to account for Unicode characters that are two chars long (supplementary code points encoded as surrogate pairs). The upper bound (groupSize) for the traversal loop is the difference between the group's start and end indexes, so it counts chars rather than code points. That works when every character is a single char, and it also works for case-sensitive comparisons because a char-by-char comparison is acceptable there, but it does not work for a comparison that requires some kind of normalization (e.g. case folding). This fix adjusts the loop's upper bound whenever a two-char character is encountered.
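
To illustrate the idea outside of java.util.regex, here is a minimal standalone sketch. It is not the code from the PR: the class BackRefSketch, the method matchesIgnoreCase and its parameters are hypothetical names chosen for this example, and the case comparison is a simplified stand-in for what the regex engine does. It only shows how a bound computed in chars can be corrected while walking code points.

```java
final class BackRefSketch {

    /**
     * Hypothetical helper: returns true if the text starting at index i
     * repeats the region [groupStart, groupEnd) of seq, ignoring case.
     */
    static boolean matchesIgnoreCase(CharSequence seq, int groupStart,
                                     int groupEnd, int i) {
        // Measured in chars, so it over-counts when the group contains
        // supplementary code points (two chars each).
        int groupSize = groupEnd - groupStart;
        if (i + groupSize > seq.length()) {
            return false; // not enough input left to repeat the group
        }
        int x = i;          // position in the candidate text
        int j = groupStart; // position inside the captured group
        for (int index = 0; index < groupSize; index++) {
            int c1 = Character.codePointAt(seq, x);
            int c2 = Character.codePointAt(seq, j);
            if (c1 != c2) {
                int u1 = Character.toUpperCase(c1);
                int u2 = Character.toUpperCase(c2);
                if (u1 != u2
                        && Character.toLowerCase(u1) != Character.toLowerCase(u2)) {
                    return false;
                }
            }
            int n2 = Character.charCount(c2);
            x += Character.charCount(c1);
            j += n2;
            // A two-char code point consumed two chars of the group in a
            // single iteration, so one fewer iteration remains.
            if (n2 == 2) {
                groupSize--;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // DESERET CAPITAL LETTER LONG I (U+10400) followed by its
        // lowercase form (U+10428); each occupies two chars.
        String s = "\uD801\uDC00\uD801\uDC28";
        System.out.println(matchesIgnoreCase(s, 0, 2, 2));
    }
}
```

Without the groupSize adjustment, the loop in this sketch would run a second iteration for the two-char group and call Character.codePointAt past the end of the input, which is the kind of out-of-bounds access the issue title describes.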

An alternative was to determine the group's length in code points by scanning the group in advance, but this could result in multiple scans and code point conversions of the same matcher group, which could be long. Adjusting the loop bound on the fly avoids that cost.
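
For comparison, the rejected pre-scan alternative might look roughly like the sketch below. The class PreScanSketch and the method groupLengthInCodePoints are hypothetical names for this example; Character.codePointCount is the standard JDK method. The point is that the count requires a full pass over the group before the comparison loop makes its own pass.

```java
final class PreScanSketch {

    /**
     * Hypothetical helper: length of the group [groupStart, groupEnd)
     * in code points. This walks the whole region once, in addition to
     * the later comparison loop.
     */
    static int groupLengthInCodePoints(CharSequence seq, int groupStart,
                                       int groupEnd) {
        return Character.codePointCount(seq, groupStart, groupEnd);
    }

    public static void main(String[] args) {
        // One supplementary code point (two chars) followed by another.
        String s = "\uD801\uDC00\uD801\uDC28";
        System.out.println(groupLengthInCodePoints(s, 0, 2));
    }
}
```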

-------------

Commit messages:
 - Adding test
 - Initial fix for IOOBE in CIBackRef

Changes: https://git.openjdk.java.net/jdk/pull/7501/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7501&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8281315
  Stats: 26 lines in 2 files changed: 22 ins; 0 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7501.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7501/head:pull/7501

PR: https://git.openjdk.java.net/jdk/pull/7501

