RFR 8225466 : Optimize matching BMP Slice nodes

Ivan Gerasimov ivan.gerasimov at oracle.com
Wed Dec 18 18:37:07 UTC 2019


Hi Roger.

Thank you for taking a look!

The variant with a single loop was the first thing I tried (should have 
mentioned that in the review request).

Unfortunately, that showed performance decrease.

I suspect that hitting the end of the input should be a less common 
thing, that's why it it beneficial to separate it as a slow path.

With kind regards,

Ivan


On 12/18/19 7:48 AM, Roger Riggs wrote:
> Hi Ivan,
>
> Though the new code has a good effect, the asymmetry and duplication 
> seems unnecessary.
> Can it be structured to have a single copy of the loop comparing the 
> available range
> and still get the desired performance improvement.
>
> Like:
>
> boolean match(Matcher matcher,int i, CharSequence seq) {
>     int[] buf =buffer;
>     int len = buf.length;
>     for (int j =0; j < Math.min(len, matcher.to); j++) {
>         if (buf[j] != seq.charAt(i+j))
>             return false;
>     }
>     if (len >= matcher.to) {
>         matcher.hitEnd =true;
>         return false;
>     }
>     return next.match(matcher, i+len, seq);
> }
>
> Regards, Roger
>
>
> On 10/28/19 9:03 PM, Ivan Gerasimov wrote:
>> Hello!
>>
>> When building a Pattern object, the regex parser recognizes "slices" 
>> - continuous char subsequences, which all have to be matched 
>> case-sensitively/case-insensitively.  Matching with such a slice is 
>> implemented as a simple loop over a portion of the input.
>>
>> In the current implementation, on each iteration of the loop it is 
>> checked if we have hit the end of the input (which is an uncommon case).
>>
>> This check can be done only once, before the loop, which will make 
>> the loop lighter.
>>
>> Benchmark shows up to +4% to the throughput for the case-insensitive 
>> matching.
>>
>> Would you please help review the enhancement?
>>
>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8225466
>> WEBREV: http://cr.openjdk.java.net/~igerasim/8225466/00/webrev/
>>
>>
>> ----------- benchmark results ---------------
>>
>> UNFIXED
>> Benchmark                Mode  Cnt    Score   Error  Units
>> PatternBench.sliceIFind  avgt   16  190.612 ? 0.336  ns/op
>>
>> FIXED
>> Benchmark                Mode  Cnt    Score   Error  Units
>> PatternBench.sliceIFind  avgt   16  182.954 ? 0.493  ns/op
>> -------------------------------------------
>>
>
-- 
With kind regards,
Ivan Gerasimov



More information about the core-libs-dev mailing list