RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java
David Li
david.x.li at oracle.com
Wed Mar 19 23:10:14 UTC 2014
Hi,
This is an update from Xerces for the file
impl/xpath/regex/RangeToken.java. For details, please refer to:
https://bugs.openjdk.java.net/browse/JDK-8035577.
Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/
Existing tests: JAXP SQE and unit tests passed.
Test cases were added for the typo fix in RangeToken.intersectRanges. The
code was also updated to fix a bug where a regular expression intersection
returns an incorrect value when the first range ends later than the second
range, as in the example below. Test cases have been added to cover the
scenarios affected by the code changes.
new RegularExpression("(?[b-d]&[a-r])"); -> returns [b-d] (Correct)
new RegularExpression("(?[a-r]&[b-d])"); -> returns [b-de-r] (Incorrect)
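A minimal Java sketch of range intersection, included only to show why both operand orders should give [b-d]. It assumes each operand is a sorted, non-overlapping int[] of inclusive [begin, end] pairs, similar in spirit to the ranges array in RangeToken. The class RangeIntersectSketch and its methods are hypothetical illustrations, not the Xerces code.

```java
import java.util.Arrays;

/**
 * Sketch only: intersects two range lists, each a sorted, non-overlapping
 * int[] of inclusive [begin, end] pairs. Class and method names are
 * hypothetical; this is not the Xerces RangeToken source.
 */
public class RangeIntersectSketch {

    static int[] intersect(int[] first, int[] second) {
        int[] src1Ranges = first.clone();       // begins may be advanced below
        int[] result = new int[first.length + second.length];
        int wp = 0, src1 = 0, src2 = 0;
        while (src1 < src1Ranges.length && src2 < second.length) {
            int src1begin = src1Ranges[src1], src1end = src1Ranges[src1 + 1];
            int src2begin = second[src2], src2end = second[src2 + 1];
            if (src1end < src2begin) {          // src1 lies entirely before src2
                src1 += 2;
            } else if (src2end < src1begin) {   // src2 lies entirely before src1
                src2 += 2;
            } else {                            // overlap: emit the common part
                result[wp++] = Math.max(src1begin, src2begin);
                result[wp++] = Math.min(src1end, src2end);
                if (src1end <= src2end) {
                    src1 += 2;                  // src1 is used up
                } else {
                    src1Ranges[src1] = src2end + 1; // keep the rest of src1
                    src2 += 2;                  // src2 is used up
                }
            }
        }
        // No trailing copy of leftover src1 ranges: whatever is left of the
        // first operand overlaps nothing in the second, so it is dropped.
        return Arrays.copyOf(result, wp);
    }

    /** Formats pairs like the mail does, e.g. [b-de-r]; a single char prints once. */
    static String show(int[] ranges) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < ranges.length; i += 2) {
            sb.append((char) ranges[i]);
            if (ranges[i + 1] != ranges[i]) {
                sb.append('-').append((char) ranges[i + 1]);
            }
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        int[] ar = {'a', 'r'};
        int[] bd = {'b', 'd'};
        int[] beGj = {'b', 'e', 'g', 'j'};
        System.out.println(show(intersect(bd, ar)));   // [b-d]
        System.out.println(show(intersect(ar, bd)));   // [b-d]
        System.out.println(show(intersect(ar, beGj))); // [b-eg-j]
    }
}
```

In this sketch the result no longer depends on which operand ends later, which is the property the two RegularExpression examples above check.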
Thanks,
David
P.S. Notes on the bug fixes.
1) Line 404: removal of the while loop.
This fixes the newly found bug where incorrect results are returned when
the first range ends later than the second range. With the old code we got
(?[a-r]&[b-d]) -> returns [b-de-r]
By removing the while loop, we get [b-d].
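This while loop looks like a copy-paste error from subtractRanges. In
subtractRanges we need to keep the leftover portion of the first range,
because nothing was subtracted from it, but this does not apply to
intersection: a leftover portion overlaps nothing in the second range, so
it must be dropped.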
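To illustrate why the trailing loop belongs in subtraction but not in intersection, here is a hypothetical Java sketch of range subtraction over the same assumed layout (sorted, non-overlapping int[] of inclusive [begin, end] pairs). RangeSubtractSketch is an illustration only, not the Xerces subtractRanges code. For [a-r] minus [b-d], the tail loop contributes the leftover [e-r], which is exactly the piece the removed loop appended to the intersection, turning [b-d] into [b-de-r].

```java
import java.util.Arrays;

/**
 * Sketch only: subtracts range list `second` from `first`, both sorted,
 * non-overlapping int[] of inclusive [begin, end] pairs. Names are
 * hypothetical; this is not the Xerces subtractRanges source.
 */
public class RangeSubtractSketch {

    static int[] subtract(int[] first, int[] second) {
        int[] src1Ranges = first.clone();
        int[] result = new int[first.length + second.length];
        int wp = 0, src1 = 0, src2 = 0;
        while (src1 < src1Ranges.length && src2 < second.length) {
            int src1begin = src1Ranges[src1], src1end = src1Ranges[src1 + 1];
            int src2begin = second[src2], src2end = second[src2 + 1];
            if (src1end < src2begin) {          // nothing to remove: keep src1 whole
                result[wp++] = src1begin;
                result[wp++] = src1end;
                src1 += 2;
            } else if (src2end < src1begin) {   // src2 removes nothing from src1
                src2 += 2;
            } else {                            // overlap
                if (src1begin < src2begin) {    // keep the part of src1 before src2
                    result[wp++] = src1begin;
                    result[wp++] = src2begin - 1;
                }
                if (src1end > src2end) {
                    src1Ranges[src1] = src2end + 1; // the rest of src1 may survive
                    src2 += 2;
                } else {
                    src1 += 2;
                }
            }
        }
        // Leftover src1 ranges had nothing subtracted from them, so they belong
        // in the result. Intersection has no counterpart to this loop.
        while (src1 < src1Ranges.length) {
            result[wp++] = src1Ranges[src1++];
            result[wp++] = src1Ranges[src1++];
        }
        return Arrays.copyOf(result, wp);
    }

    /** Formats pairs as e.g. [ae-r]; a single char prints once. */
    static String show(int[] ranges) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < ranges.length; i += 2) {
            sb.append((char) ranges[i]);
            if (ranges[i + 1] != ranges[i]) {
                sb.append('-').append((char) ranges[i + 1]);
            }
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        // [a-r] minus [b-d]: the tail loop supplies the leftover [e-r]
        System.out.println(show(subtract(new int[] {'a', 'r'}, new int[] {'b', 'd'}))); // [ae-r]
    }
}
```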
2) Line 388: addition of src2 += 2;
This code change affects anything of the form (?[a-r]&[b-eg-j]), where a
range of the second operand lies strictly inside a range of the first. The
code execution is diagrammed below.
o------------o (src1)
  o--o  o--o   (src2)
For the first match we get
o------------o (src1)
  o--o         (src2)
Next we want to run src2 += 2 to move to the second pair of endpoints
(since the first pair has already been used). Notice how the beginning of
src1 has been updated by this.ranges[src1] = src2end+1, which is taken
directly from the code.
      o------o (src1)
        o--o   (src2)
The src2 += 2 statement was left out of the old code and is added in this
webrev. If we leave out the src2 += 2 at line 388, the next iteration of
the main while loop reaches the case "} else if (src2end < src1begin) {",
which also executes "src2 += 2". The correct final result is therefore
still generated, but one iteration later. We want to add the new code
because it is better to have all associated variables updated in the same
iteration. In addition, all the other conditions perform similar src1 or
src2 updates.
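A hypothetical Java sketch of the point about line 388: a simplified model of the intersection loop with a switch for the src2 += 2 advance in the case where src2 lies strictly inside src1 (the case diagrammed above), counting loop iterations. MissingAdvanceSketch and its iterations counter are illustrations, not the webrev code. For (?[a-r]&[b-eg-j]) both variants should produce the same ranges, and the variant without the advance takes extra iterations that fall through to the src2end < src1begin case.

```java
import java.util.Arrays;

/**
 * Sketch only: intersection loop with a switch for the src2 += 2 advance when
 * src2 lies strictly inside src1. Names are hypothetical; this is not the
 * Xerces source.
 */
public class MissingAdvanceSketch {

    static int iterations; // loop iterations taken by the last intersect call

    static int[] intersect(int[] first, int[] second, boolean advanceWhenContained) {
        int[] src1Ranges = first.clone();
        int[] result = new int[first.length + second.length];
        int wp = 0, src1 = 0, src2 = 0;
        iterations = 0;
        while (src1 < src1Ranges.length && src2 < second.length) {
            iterations++;
            int src1begin = src1Ranges[src1], src1end = src1Ranges[src1 + 1];
            int src2begin = second[src2], src2end = second[src2 + 1];
            if (src1end < src2begin) {
                src1 += 2;
            } else if (src2end < src1begin) {   // also catches a src2 that was not advanced
                src2 += 2;
            } else {
                result[wp++] = Math.max(src1begin, src2begin);
                result[wp++] = Math.min(src1end, src2end);
                if (src1end <= src2end) {
                    src1 += 2;
                } else {
                    src1Ranges[src1] = src2end + 1;             // rest of src1 starts after src2
                    boolean contained = src1begin < src2begin;  // src2 strictly inside src1
                    if (!contained || advanceWhenContained) {
                        src2 += 2;
                    }
                }
            }
        }
        return Arrays.copyOf(result, wp);
    }

    public static void main(String[] args) {
        int[] ar = {'a', 'r'};
        int[] beGj = {'b', 'e', 'g', 'j'};
        int[] noAdvance = intersect(ar, beGj, false);
        int noAdvanceIters = iterations;
        int[] withAdvance = intersect(ar, beGj, true);
        int withAdvanceIters = iterations;
        System.out.println(Arrays.equals(noAdvance, withAdvance));   // true
        System.out.println(noAdvanceIters + " vs " + withAdvanceIters); // 4 vs 2
    }
}
```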
More information about the core-libs-dev mailing list