RFR 8225466 : Optimize matching BMP Slice nodes

Roger Riggs Roger.Riggs at oracle.com
Wed Dec 18 15:48:34 UTC 2019


Hi Ivan,

Though the new code has a good effect, the asymmetry and duplication 
seems unnecessary.
Can it be structured to have a single copy of the loop comparing the 
available range
and still get the desired performance improvement.

Like:

boolean match(Matcher matcher,int i, CharSequence seq) {
     int[] buf =buffer;
     int len = buf.length;
     for (int j =0; j < Math.min(len, matcher.to); j++) {
         if (buf[j] != seq.charAt(i+j))
             return false;
     }
     if (len >= matcher.to) {
         matcher.hitEnd =true;
         return false;
     }
     return next.match(matcher, i+len, seq);
}

Regards, Roger


On 10/28/19 9:03 PM, Ivan Gerasimov wrote:
> Hello!
>
> When building a Pattern object, the regex parser recognizes "slices" - 
> continuous char subsequences, which all have to be matched 
> case-sensitively/case-insensitively.  Matching with such a slice is 
> implemented as a simple loop over a portion of the input.
>
> In the current implementation, on each iteration of the loop it is 
> checked if we have hit the end of the input (which is an uncommon case).
>
> This check can be done only once, before the loop, which will make the 
> loop lighter.
>
> Benchmark shows up to +4% to the throughput for the case-insensitive 
> matching.
>
> Would you please help review the enhancement?
>
> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8225466
> WEBREV: http://cr.openjdk.java.net/~igerasim/8225466/00/webrev/
>
>
> ----------- benchmark results ---------------
>
> UNFIXED
> Benchmark                Mode  Cnt    Score   Error  Units
> PatternBench.sliceIFind  avgt   16  190.612 ? 0.336  ns/op
>
> FIXED
> Benchmark                Mode  Cnt    Score   Error  Units
> PatternBench.sliceIFind  avgt   16  182.954 ? 0.493  ns/op
> -------------------------------------------
>



More information about the core-libs-dev mailing list