RFR: regex changes
Xueming Shen
xueming.shen at oracle.com
Tue Mar 22 20:23:36 UTC 2016
Thanks Roger!
webrev has been updated accordingly. See comments below.
http://cr.openjdk.java.net/~sherman/regexBackTrack.Lamnda.CanonEQ/webrev/
On 03/22/2016 12:48 PM, Roger Riggs wrote:
> Hi Sherman,
>
> A few more comments,
>
> Pattern.java:
> - 1782: typo: "deterministci"
>
fixed.
> - 2176: commented out code?
>
Yes, commented out for now. It helps the case below, which has "lots"
of ".*" lined up in a single patter. But I have not decided if it's worth the
potential cost of adding a "backtracking stopper" for each ".*". In most
this kind of cases, it gets slow, very slow, but still come back alive,
instead of a complete hangup.
// 5014450 -> top level single greedy ...
{ "^\\s*sites\\[sites\\.length\\+\\+\\]\\s*=\\s*new\\s+Array\\(.*" +
"\\s*//\\s*(\\d+).*" +
"\\s*//\\s*([^-]+).*" +
"\\s*//\\s*([^-]+).*" +
"\\s*//\\s*([^-]+).*" +
"/(?sui).*$",
"\tsites[sites.length++] = new Array(\n" +
"// 1079193366647 - creation time\n" +
"// 1078902678663 1078852539723 1078753482632 0 0 0 0 0 0 0 0 0 0 0 - creation time last 14 days\n" +
"// 0 1 0 0 0 0 0 0 0 0 0 0 0 0 - bad\n" +
"// 0.0030 0.0080 0.01 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -\n\n",
false
},
I would leave it for now. There is a bugid for it. So we can address it separately,
might with better tradeoff.
> - 2667: indentation
>
fixed.
> - 5660,5655: lambda syntax use of simply "ch" is preferred over "(ch)" for single parameter lambdas
> and for consistency within the file.
>
fixed.
> PrintPattern.java needs the standard copyright text if it is going to be in the repo.
>
fixed.
> - 29: The Print(fmt, args...) method should follow the method naming convention. (initial lowercase)
>
fixed.
> IntHashSet: does performance matter enough to warrant adding this extra code.
>
The measurement I did suggests it's still worth adding such a small piece code, given
this one probably will be used for most of the greedy {}, with lots of raw "int" in and
out, without boxing, and much smaller footprint.
Thanks again,
Sherman
>
>
> On 3/18/2016 4:05 PM, Xueming Shen wrote:
>> Hi,
>>
>> There are couple regex related changes waiting for review. I have pull them
>> together here (with the notes) to make it easy to review.
>>
>> http://cr.openjdk.java.net/~sherman/regexBackTrack.Lamnda.CanonEQ/webrev/
>>
>> (1) Exponential backtracking
>>
>> Note: http://cr.openjdk.java.net/~sherman/regexBackTrack.Lamnda.CanonEQ/backtracking
>>
>> https://bugs.openjdk.java.net/browse/JDK-6328855
>> https://bugs.openjdk.java.net/browse/JDK-6192895
>> https://bugs.openjdk.java.net/browse/JDK-6345469
>> https://bugs.openjdk.java.net/browse/JDK-6988218
>> https://bugs.openjdk.java.net/browse/JDK-6693451
>> https://bugs.openjdk.java.net/browse/JDK-7006761
>> https://bugs.openjdk.java.net/browse/JDK-8140212
>>
>> (2) Anonymous class to lambda function cleanup
>>
>> Note: http://cr.openjdk.java.net/~sherman/regexBackTrack.Lamnda.CanonEQ/lambdafunction
>>
>> https://bugs.openjdk.java.net/browse/JDK-8151481
>> https://bugs.openjdk.java.net/browse/JDK-6609854
>>
>> (3) Canonical Equivalents
>>
>> Note: http://cr.openjdk.java.net/~sherman/regexBackTrack.Lamnda.CanonEQ/canonEQ
>>
>> https://bugs.openjdk.java.net/browse/JDK-4916384
>> https://bugs.openjdk.java.net/browse/JDK-4867170
>> https://bugs.openjdk.java.net/browse/JDK-6995635
>> https://bugs.openjdk.java.net/browse/JDK-6728861
>> https://bugs.openjdk.java.net/browse/JDK-6736245
>> https://bugs.openjdk.java.net/browse/JDK-7080302
>>
>> Thanks
>> Sherman
>
More information about the core-libs-dev
mailing list