<i18n dev> RFR: regex changes

Xueming Shen xueming.shen at oracle.com
Tue Mar 22 20:23:36 UTC 2016


Thanks Roger!

webrev has been updated accordingly. See comments below.

http://cr.openjdk.java.net/~sherman/regexBackTrack.Lamnda.CanonEQ/webrev/

On 03/22/2016 12:48 PM, Roger Riggs wrote:
> Hi Sherman,
>
> A few more comments,
>
> Pattern.java:
>  - 1782: typo: "deterministci"
>
fixed.

>  - 2176: commented out code?
>
Yes, commented out for now. It helps the case below, which has "lots"
of ".*" lined up in a single patter. But I have not decided if it's worth the
potential cost of adding a "backtracking stopper" for each ".*". In most
this kind of cases, it gets slow, very slow, but still come back alive,
instead of a complete hangup.

             // 5014450    -> top level single greedy ...
             { "^\\s*sites\\[sites\\.length\\+\\+\\]\\s*=\\s*new\\s+Array\\(.*" +
               "\\s*//\\s*(\\d+).*" +
               "\\s*//\\s*([^-]+).*" +
               "\\s*//\\s*([^-]+).*" +
               "\\s*//\\s*([^-]+).*" +
               "/(?sui).*$",
               "\tsites[sites.length++] = new Array(\n" +
               "// 1079193366647 - creation time\n" +
               "// 1078902678663 1078852539723 1078753482632 0 0 0 0 0 0 0 0 0 0 0 - creation time last 14 days\n" +
               "// 0 1 0 0 0 0 0 0 0 0 0 0 0 0 - bad\n" +
               "// 0.0030 0.0080 0.01 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -\n\n",
               false
            },

I would leave it for now. There is a bugid for it. So we can address it separately,
might with better tradeoff.

>  - 2667: indentation
>

fixed.

>  - 5660,5655: lambda syntax  use of simply "ch" is preferred over "(ch)" for single parameter lambdas
>    and for consistency within the file.
>
fixed.

> PrintPattern.java needs the standard copyright text if it is going to be in the repo.
>
fixed.

> - 29: The Print(fmt, args...) method should follow the method naming convention. (initial lowercase)
>
fixed.

> IntHashSet:  does performance matter enough to warrant adding this extra code.
>

The measurement I did suggests it's still worth adding such a small piece code, given
this one probably will be used for most of the greedy {}, with lots of raw "int" in and
out, without boxing, and much smaller footprint.

Thanks again,

Sherman



>
>
> On 3/18/2016 4:05 PM, Xueming Shen wrote:
>> Hi,
>>
>> There are couple regex related changes waiting for review. I have pull them
>> together here (with the notes) to make it easy to review.
>>
>> http://cr.openjdk.java.net/~sherman/regexBackTrack.Lamnda.CanonEQ/webrev/
>>
>> (1) Exponential backtracking
>>
>> Note: http://cr.openjdk.java.net/~sherman/regexBackTrack.Lamnda.CanonEQ/backtracking
>>
>> https://bugs.openjdk.java.net/browse/JDK-6328855
>> https://bugs.openjdk.java.net/browse/JDK-6192895
>> https://bugs.openjdk.java.net/browse/JDK-6345469
>> https://bugs.openjdk.java.net/browse/JDK-6988218
>> https://bugs.openjdk.java.net/browse/JDK-6693451
>> https://bugs.openjdk.java.net/browse/JDK-7006761
>> https://bugs.openjdk.java.net/browse/JDK-8140212
>>
>> (2) Anonymous class to lambda function cleanup
>>
>> Note: http://cr.openjdk.java.net/~sherman/regexBackTrack.Lamnda.CanonEQ/lambdafunction
>>
>> https://bugs.openjdk.java.net/browse/JDK-8151481
>> https://bugs.openjdk.java.net/browse/JDK-6609854
>>
>> (3) Canonical Equivalents
>>
>> Note: http://cr.openjdk.java.net/~sherman/regexBackTrack.Lamnda.CanonEQ/canonEQ
>>
>> https://bugs.openjdk.java.net/browse/JDK-4916384
>> https://bugs.openjdk.java.net/browse/JDK-4867170
>> https://bugs.openjdk.java.net/browse/JDK-6995635
>> https://bugs.openjdk.java.net/browse/JDK-6728861
>> https://bugs.openjdk.java.net/browse/JDK-6736245
>> https://bugs.openjdk.java.net/browse/JDK-7080302
>>
>> Thanks
>> Sherman
>



More information about the i18n-dev mailing list