RFR: Regex exponential backtracking issue --- more cleanup/tuning

Xueming Shen xueming.shen at oracle.com
Fri Mar 18 18:28:54 UTC 2016


Thanks Roger!

I have pulled those regex changes together at

http://cr.openjdk.java.net/~sherman/regexBackTrack.Lamnda.CanonEQ/webrev

So I'm updating the responding change there.

On 03/18/2016 10:03 AM, Roger Riggs wrote:
> Hi Sherman,
>
> A few comments:
>
> src/java.base/share/classes/java/util/regex/CharPredicates.java:
>
> - The static fields like ALPHABETIC should be final
> - line: 139/140 the posix and uprops HashMaps could have an initializer for size to avoid resizing (12 and 18):
> - 224: typo: "categoreis"
> - 232: double space: "return  props.get"
>

Updated. I'm making the CharPredicates class final.


> - Remove debugging prints to "PrintPattern.pmap..." before commit

Removed (it's convenient to have them there to just comment in for debugging)

> - 288: can you resort "Pf" and "Pi" to be with the other P<x> entries; maybe sort the whole list.
>

Those on the top with single-line-def are sorted/listed by the order of their types defined
in Character.java.

> - 305: Is the definition of "C" missing the equivalent of "Cn" unassigned or is that an intentional difference?
>

My memory can't tell, but I would assume it was an "intentional" difference. I
will dig a little deep to see if it's the update of the spec. This has not be touched
for a long time, I'm not sure if it can be easily put back, if it's NOT intentional.


> -323:  "LD" the definition of LD is not in the reference  (is it specified somewhere?)

It's the letter-and-digit. looks like a "home-made" one. Again, not sure if this
was in the spec in the old day. And, don't think we can just remove it.


> -333: the referenced spec defines the names of the character classes but not the contents; are they elsewhere?

http://www.unicode.org/reports/tr44/
5.7.1 General Category Values


> -374: add "final" to static declarations

Updated the whole class to final.

>
> Pattern.java:
> -2651: stray trailing "{"  after comment
> -2746: indentation of arguments

Updated

> -5493: javadoc the description of Predicate interface and implementations


> -5637: if you want it to be the first initializer put it at the top of the file, not buried at the end.
>    and make them final.
>

The "original" comment is a little "misleading. The Pattern class itself is final though.

> That's a start, I need to look a bit more closely at the algorithmic change that prompted the fix.
>

thanks! I have also asked Masayoshi to help review the canonical-equivalents part.

Sherman



More information about the core-libs-dev mailing list