Matcher.replaceAll(Function<MatchResult, String> f) [was: Re: hg: lambda/lambda/jdk: Pattern.splitAsStream.]

Paul Sandoz paul.sandoz at oracle.com
Mon Apr 22 07:24:42 PDT 2013


On Apr 22, 2013, at 4:11 PM, Peter Levart <peter.levart at gmail.com> wrote:

> On 04/22/2013 02:54 PM, Paul Sandoz wrote:
>> Hi Jürgen,
>> 
>> Three issues:
>> 
>> - we should probably include replaceFirst
>> 
>> - we need to use different method names since replaceAll(null) is now ambiguous
> 
> But only when called with a literal 'null', and that call already throws an NPE as soon as a match is found. I doubt anyone is using "matcher.replaceAll(null)" as shorthand for "if (matcher.find()) throw new NullPointerException()" in disguise...
> 

I hope not either :-) Note, though, that such a change does cause a compilation failure in the regex tests.
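
To make the ambiguity concrete, here is a minimal sketch. OverloadAmbiguity and its two static replaceAll methods are hypothetical stand-ins for the existing String overload and the proposed Function overload; they are not the actual Matcher code:

```java
import java.util.function.Function;

final class OverloadAmbiguity {
    // Stand-in for the existing Matcher.replaceAll(String).
    static String replaceAll(String replacement) {
        return "String overload";
    }

    // Stand-in for the proposed overload taking a function.
    static String replaceAll(Function<String, String> f) {
        return "Function overload";
    }

    public static void main(String[] args) {
        // replaceAll(null);  // rejected by javac: the call is ambiguous,
        //                    // since null matches both parameter types.

        // An explicit cast picks one overload and compiles again.
        System.out.println(replaceAll((String) null));
        System.out.println(replaceAll((Function<String, String>) null));

        // Typed arguments are unaffected by the new overload.
        System.out.println(replaceAll("x"));
        System.out.println(replaceAll(s -> s));
    }
}
```

Callers that pass a String expression or a lambda keep compiling; only an untyped null needs a cast.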

I don't quite know what the source-level compatibility requirements are here. How high is the bar set? I was presuming it was quite high.

Paul.


> Regards, Peter
> 
>> 
>> - need tests :-) (see jdk/tests/java/util/regex/RegExTest.java)
>> 
>> While these are nice to have, I am not sure they carry their weight given the time constraints we have. The more complete a solution you can help us provide, the better chance we have of getting this into JDK 8.
>> 
>> Thanks,
>> Paul.
>> 
>> On Apr 19, 2013, at 12:59 AM, jk at blackdown.de wrote:
>> 
>>> Hi Paul,
>>> 
>>> Paul Sandoz <paul.sandoz at oracle.com> writes:
>>> 
>>>> Hi Jürgen,
>>>> 
>>>> That seems useful as a more general approach than Matcher.replaceAll(String), e.g.:
>>>> 
>>>>  Matcher.replaceAll(Function<MatchResult, String> f)
>>>> 
>>>> Ben, thoughts?
>>> like this?
>>> 
>>> # HG changeset patch
>>> # User Jürgen Kreileder <jk at blackdown.de>
>>> # Date 1366322703 -7200
>>> # Node ID 59766f458701af5fbb23d195dd48a928505f3306
>>> # Parent  3ec06ef568a8ded0a7ecc7624df9d3a025dad6bc
>>> Matcher.replaceAll(Function<MatchResult, String> f)
>>> 
>>> diff --git a/src/share/classes/java/util/regex/Matcher.java b/src/share/classes/java/util/regex/Matcher.java
>>> --- a/src/share/classes/java/util/regex/Matcher.java
>>> +++ b/src/share/classes/java/util/regex/Matcher.java
>>> @@ -25,6 +25,7 @@
>>> 
>>> package java.util.regex;
>>> 
>>> +import java.util.function.Function;
>>> 
>>> /**
>>>  * An engine that performs match operations on a {@link java.lang.CharSequence
>>> @@ -916,6 +917,54 @@
>>>     }
>>> 
>>>     /**
>>> +     * Replaces every subsequence of the input sequence that matches the
>>> +     * pattern with the string returned by the given replacement function.
>>> +     *
>>> +     * <p> This method first resets this matcher.  It then scans the input
>>> +     * sequence looking for matches of the pattern.  Characters that are not
>>> +     * part of any match are appended directly to the result string; each match
>>> +     * is replaced in the result by the string returned by the replacement
>>> +     * function.  The replacement strings may contain references to captured
>>> +     * subsequences as in the {@link #appendReplacement appendReplacement}
>>> +     * method.
>>> +     *
>>> +     * <p> Note that backslashes (<tt>\</tt>) and dollar signs (<tt>$</tt>) in
>>> +     * the string returned by the replacement function may cause the results to
>>> +     * be different than if it were being treated as a literal string. Dollar
>>> +     * signs may be treated as references to captured subsequences as described
>>> +     * above, and backslashes are used to escape literal characters in the
>>> +     * replacement string.
>>> +     *
>>> +     * <p> Given the regular expression <tt>(\\w)(\\w*)</tt>, the input
>>> +     * <tt>"paTTern maTcher"</tt>, and the replacement function
>>> +     * <tt>m -> m.group(1).toUpperCase() + m.group(2).toLowerCase()</tt>, an
>>> +     * invocation of this method on a matcher for that expression would yield
>>> +     * the string <tt>"Pattern Matcher"</tt>. </p>
>>> +     *
>>> +     * <p> Invoking this method changes this matcher's state.  If the matcher
>>> +     * is to be used in further matching operations then it should first be
>>> +     * reset.  </p>
>>> +     *
>>> +     * @param  f
>>> +     *         The function providing replacement strings
>>> +     * @return  The string constructed by replacing each matching subsequence
>>> +     *          by the replacement string provided by the given function,
>>> +     *          substituting captured subsequences as needed
>>> +     * @since 1.8
>>> +     */
>>> +    public String replaceAll(Function<MatchResult, String> f) {
>>> +        reset();
>>> +        if (find()) {
>>> +            StringBuffer sb = new StringBuffer();
>>> +            do {
>>> +                appendReplacement(sb, f.apply(this));
>>> +            } while (find());
>>> +            return appendTail(sb).toString();
>>> +        }
>>> +        return text.toString();
>>> +    }
>>> +
>>> +    /**
>>>      * Replaces the first subsequence of the input sequence that matches the
>>>      * pattern with the given replacement string.
>>>      *
>>> ==
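
For anyone who wants to try the idea without patching Matcher, the same loop can be sketched as a standalone helper. ReplaceAllSketch and its static replaceAll are names made up for illustration; the pattern, input and lambda are the ones from the javadoc in the patch:

```java
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReplaceAllSketch {
    // Replaces every match of pattern in input with the string returned by f.
    // The returned string goes through appendReplacement, so '$' and '\'
    // keep their special meaning, as described in the javadoc.
    static String replaceAll(Pattern pattern, CharSequence input,
                             Function<MatchResult, String> f) {
        Matcher m = pattern.matcher(input);
        if (!m.find()) {
            return input.toString();   // no match: return the input unchanged
        }
        StringBuffer sb = new StringBuffer();
        do {
            // Matcher implements MatchResult, so it can be passed directly.
            m.appendReplacement(sb, f.apply(m));
        } while (m.find());
        return m.appendTail(sb).toString();
    }

    public static void main(String[] args) {
        Pattern word = Pattern.compile("(\\w)(\\w*)");
        String result = replaceAll(word, "paTTern maTcher",
                m -> m.group(1).toUpperCase() + m.group(2).toLowerCase());
        System.out.println(result);    // prints "Pattern Matcher"
    }
}
```

Note that the function sees the live matcher here, as in the patch; a function that wants to keep the result past the call would need m.toMatchResult().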
>>> 
>>> 
>>>     Juergen
>>> 
>>> 
>>>> On Apr 8, 2013, at 6:59 PM, jk at blackdown.de wrote:
>>>> 
>>>>> Hi Paul,
>>>>> 
>>>>> it would be nice if Pattern/Matcher offered a terse way to loop over all
>>>>> matches in a string and replace them via a callback.
>>>>> 
>>>>> E.g. I'm currently using something like this:
>>>>> 
>>>>> private static final PatternAndReplacement PASS2 = new PatternAndReplacement(
>>>>>       Pattern.compile("  ( "
>>>>>                       + "   \\A \\p{Punct}*"         // start of title…
>>>>>                       + " |"
>>>>>                       + "   [:.;?!]\\ +"             // or of subsentence…
>>>>>                       + " | "
>>>>>                       + "   \\  ['\"“‘(\\[] \\ *"    // or of inserted subphrase…
>>>>>                       + ")"
>>>>>                       + "(" + SMALL_WORDS + ") \\b", // … followed by small word
>>>>>                       Pattern.COMMENTS | Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS),
>>>>>       m -> Matcher.quoteReplacement(m.group(1) + capitalize(m.group(2))));
>>>>> 
>>>>> with PatternAndReplacement being
>>>>> 
>>>>> private static class PatternAndReplacement implements Function<String, String> {
>>>>>   private final Pattern pattern;
>>>>>   private final Function<MatchResult, String> function;
>>>>> 
>>>>>   PatternAndReplacement(final Pattern p, final Function<MatchResult, String> f) {
>>>>>       pattern = p;
>>>>>       function = f;
>>>>>   }
>>>>> 
>>>>>   @Override
>>>>>   public final String apply(final String s) {
>>>>>       Matcher m = pattern.matcher(s);
>>>>>       if (m.find()) {
>>>>>           StringBuffer sb = new StringBuffer(s.length());
>>>>>           do {
>>>>>               m.appendReplacement(sb, function.apply(m));
>>>>>           } while (m.find());
>>>>>           return m.appendTail(sb).toString();
>>>>>       }
>>>>>       return s;
>>>>>   }
>>>>> }
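
The Matcher.quoteReplacement call in the lambda above matters because the callback's result is handed to appendReplacement, which interprets '$' and '\'. A small sketch of the difference follows; the QuoteReplacementSketch class, the NUMBER pattern and the input are made up for illustration:

```java
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuoteReplacementSketch {
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Callback-driven replacement built on find/appendReplacement/appendTail.
    static String replaceAll(CharSequence input, Function<MatchResult, String> f) {
        Matcher m = NUMBER.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, f.apply(m));
        }
        return m.appendTail(sb).toString();
    }

    public static void main(String[] args) {
        // Quoted: the "$5" built by the callback is inserted literally.
        System.out.println(
                replaceAll("total 5", m -> Matcher.quoteReplacement("$" + m.group())));
        // prints "total $5"

        // Unquoted: "$5" is read as a reference to group 5, which the
        // pattern does not have, so appendReplacement throws.
        try {
            replaceAll("total 5", m -> "$" + m.group());
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unquoted replacement failed: " + e.getMessage());
        }
    }
}
```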
>>>>> 
>>>>> Any plans for something like this?
>>>>> 
>>>>> 
>>>>> Jürgen
>>>>> 
>>>>> 
>>>>> paul.sandoz at oracle.com writes:
>>>>> 
>>>>>> Changeset: 526131346981
>>>>>> Author:    psandoz
>>>>>> Date:      2013-04-08 17:16 +0200
>>>>>> URL:       http://hg.openjdk.java.net/lambda/lambda/jdk/rev/526131346981
>>>>>> 
>>>>>> Pattern.splitAsStream.
>>>>>> Contributed-by: Ben Evans <benjamin.john.evans at gmail.com>
>>>>>> 
>>>>>> ! src/share/classes/java/util/regex/Pattern.java
>>>>>> + test-ng/tests/org/openjdk/tests/java/util/regex/PatternTest.java
>>> -- 
>>> https://blackdown.de/
>> 
> 


