RFR 8225466 : Optimize matching BMP Slice nodes
Ivan Gerasimov
ivan.gerasimov at oracle.com
Wed Dec 18 18:37:07 UTC 2019
Hi Roger.
Thank you for taking a look!
The variant with a single loop was the first thing I tried (should have
mentioned that in the review request).
Unfortunately, that showed performance decrease.
I suspect that hitting the end of the input should be a less common
thing, that's why it it beneficial to separate it as a slow path.
With kind regards,
Ivan
On 12/18/19 7:48 AM, Roger Riggs wrote:
> Hi Ivan,
>
> Though the new code has a good effect, the asymmetry and duplication
> seems unnecessary.
> Can it be structured to have a single copy of the loop comparing the
> available range
> and still get the desired performance improvement.
>
> Like:
>
> boolean match(Matcher matcher,int i, CharSequence seq) {
> int[] buf =buffer;
> int len = buf.length;
> for (int j =0; j < Math.min(len, matcher.to); j++) {
> if (buf[j] != seq.charAt(i+j))
> return false;
> }
> if (len >= matcher.to) {
> matcher.hitEnd =true;
> return false;
> }
> return next.match(matcher, i+len, seq);
> }
>
> Regards, Roger
>
>
> On 10/28/19 9:03 PM, Ivan Gerasimov wrote:
>> Hello!
>>
>> When building a Pattern object, the regex parser recognizes "slices"
>> - continuous char subsequences, which all have to be matched
>> case-sensitively/case-insensitively. Matching with such a slice is
>> implemented as a simple loop over a portion of the input.
>>
>> In the current implementation, on each iteration of the loop it is
>> checked if we have hit the end of the input (which is an uncommon case).
>>
>> This check can be done only once, before the loop, which will make
>> the loop lighter.
>>
>> Benchmark shows up to +4% to the throughput for the case-insensitive
>> matching.
>>
>> Would you please help review the enhancement?
>>
>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8225466
>> WEBREV: http://cr.openjdk.java.net/~igerasim/8225466/00/webrev/
>>
>>
>> ----------- benchmark results ---------------
>>
>> UNFIXED
>> Benchmark Mode Cnt Score Error Units
>> PatternBench.sliceIFind avgt 16 190.612 ? 0.336 ns/op
>>
>> FIXED
>> Benchmark Mode Cnt Score Error Units
>> PatternBench.sliceIFind avgt 16 182.954 ? 0.493 ns/op
>> -------------------------------------------
>>
>
--
With kind regards,
Ivan Gerasimov
More information about the core-libs-dev
mailing list