RFR: 8225061: Performance regression in Regex

Sat Jun 1 00:23:08 UTC 2019

Hi Claes,

Looks good to me. Thanks for catching this so quickly!

Naoto

On 5/31/19 5:13 PM, Claes Redestad wrote:
> Hi,
> 
> recent Unicode 12.1 updates caused a noticeable regression to Mac OS X
> build times.
> 
> Quoting Naoto:
> "The regression was caused by the call to Grapheme.nextBoundary() in
> NFCCharProperty.match() method, which got slower with the fix to
> JDK-8221431 / JDK-8222978 (Unicode 12.1 / Grapheme 12.0 support). The
> purpose of issuing nextBoundary() is to detect whether to make the (much
> heavier-weight) Normalizer.normalize() call or not. Since this fast check
> does not require fully fledged boundary detection, including stateful
> segmentation checks such as Emoji sequences, simply checking the break
> possibility between two code points as before should suffice. Suggested
> fix is to bring back the isBoundary(cp1, cp2) method from the previous
> revision in Grapheme.java, and issue it only from
> NFCCharProperty.match() method for the fast check."
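
For readers following along, here is a rough sketch of the fast-check idea
described in the quote. It is not the code in Grapheme.java or Pattern.java:
the class FastNfcCheck and its methods are made up for illustration, and the
pairwise isBoundary below is a deliberately crude assumption that only treats
combining marks as "no break".

```java
import java.text.Normalizer;

final class FastNfcCheck {
    // Hypothetical, heavily simplified pairwise test: report "no break" only
    // when cp2 is a combining mark. cp1 is unused here; real grapheme rules
    // look at both code points and cover many more cases (Hangul, ZWJ, ...).
    static boolean isBoundary(int cp1, int cp2) {
        int type = Character.getType(cp2);
        return type != Character.NON_SPACING_MARK
            && type != Character.ENCLOSING_MARK
            && type != Character.COMBINING_SPACING_MARK;
    }

    // Scan forward from index i using only the stateless pairwise check and
    // return the index just past the run that starts at i.
    static int clusterEnd(CharSequence seq, int i) {
        int cp0 = Character.codePointAt(seq, i);
        int j = i + Character.charCount(cp0);
        while (j < seq.length()) {
            int cp1 = Character.codePointAt(seq, j);
            if (isBoundary(cp0, cp1)) {
                break;
            }
            cp0 = cp1;
            j += Character.charCount(cp1);
        }
        return j;
    }

    // Return the single NFC code point for the run starting at i, or -1 if
    // the run does not compose to one code point. The normalizer is only
    // invoked when the run is longer than one code point.
    static int nfcCodePointAt(CharSequence seq, int i) {
        int first = Character.codePointAt(seq, i);
        int end = clusterEnd(seq, i);
        if (end == i + Character.charCount(first)) {
            return first; // fast path: lone code point, skip normalization
        }
        String nfc = Normalizer.normalize(seq.subSequence(i, end),
                                          Normalizer.Form.NFC);
        return nfc.codePointCount(0, nfc.length()) == 1 ? nfc.codePointAt(0) : -1;
    }

    public static void main(String[] args) {
        String s = "e\u0301x"; // 'e' followed by U+0301 COMBINING ACUTE ACCENT
        System.out.println(Integer.toHexString(nfcCodePointAt(s, 0)));
        System.out.println(Integer.toHexString(nfcCodePointAt(s, 2)));
    }
}
```

The point of the shape above is that the common case (a lone code point
followed by a break) never reaches Normalizer.normalize().
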
> 
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8225061
> Webrev: http://cr.openjdk.java.net/~redestad/8225061/open.01/
> 
> While narrowing this down, I created a couple of microbenchmarks and
> experimented with a sequence of optimizations that got the regression of
> using the heavier nextBoundary() check down from about 300x to just
> about 2x as costly as before JDK-8221431. These improvements were then
> bypassed in some micros by reverting to isBoundary, but they still help a
> lot in other cases that have become slower as the grapheme logic was made
> more complete/correct, so I'd like to leave them in.
> 
> Testing: tier1-3, verified a 300x speedup in the complex
> Pattern.CANON_EQ micro, and a 2x speedup on the simpler Grapheme/\\b{g}
> micro.
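
As a small illustration of the two regex features those micros exercise
(this is not the benchmark code from the webrev; RegexFeatureDemo and its
methods are hypothetical names, and \b{g} needs JDK 9 or later):

```java
import java.util.regex.Pattern;

final class RegexFeatureDemo {
    // Match with canonical equivalence enabled, so decomposed and
    // precomposed forms of the same character are treated as equal.
    static boolean canonEqMatches(String regex, String input) {
        return Pattern.compile(regex, Pattern.CANON_EQ).matcher(input).matches();
    }

    // Split the input at extended grapheme cluster boundaries using the
    // \b{g} boundary matcher.
    static String[] graphemes(String input) {
        return Pattern.compile("\\b{g}").split(input);
    }

    public static void main(String[] args) {
        // "a" + U+030A COMBINING RING ABOVE vs. precomposed U+00E5
        System.out.println(canonEqMatches("a\u030A", "\u00E5"));
        // 'e' + U+0301 forms one cluster, 'x' another
        System.out.println(graphemes("e\u0301x").length);
    }
}
```
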
> 
> Thanks!
> 
> /Claes