RFR 8225466 : Optimize matching BMP Slice nodes
Roger Riggs
Roger.Riggs at oracle.com
Wed Dec 18 15:48:34 UTC 2019
Hi Ivan,
Though the new code has a good effect, the asymmetry and duplication
seems unnecessary.
Can it be structured to have a single copy of the loop comparing the
available range
and still get the desired performance improvement.
Like:
boolean match(Matcher matcher,int i, CharSequence seq) {
int[] buf =buffer;
int len = buf.length;
for (int j =0; j < Math.min(len, matcher.to); j++) {
if (buf[j] != seq.charAt(i+j))
return false;
}
if (len >= matcher.to) {
matcher.hitEnd =true;
return false;
}
return next.match(matcher, i+len, seq);
}
Regards, Roger
On 10/28/19 9:03 PM, Ivan Gerasimov wrote:
> Hello!
>
> When building a Pattern object, the regex parser recognizes "slices" -
> continuous char subsequences, which all have to be matched
> case-sensitively/case-insensitively. Matching with such a slice is
> implemented as a simple loop over a portion of the input.
>
> In the current implementation, on each iteration of the loop it is
> checked if we have hit the end of the input (which is an uncommon case).
>
> This check can be done only once, before the loop, which will make the
> loop lighter.
>
> Benchmark shows up to +4% to the throughput for the case-insensitive
> matching.
>
> Would you please help review the enhancement?
>
> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8225466
> WEBREV: http://cr.openjdk.java.net/~igerasim/8225466/00/webrev/
>
>
> ----------- benchmark results ---------------
>
> UNFIXED
> Benchmark Mode Cnt Score Error Units
> PatternBench.sliceIFind avgt 16 190.612 ? 0.336 ns/op
>
> FIXED
> Benchmark Mode Cnt Score Error Units
> PatternBench.sliceIFind avgt 16 182.954 ? 0.493 ns/op
> -------------------------------------------
>
More information about the core-libs-dev
mailing list