Matcher.replaceAll(Function<MatchResult, String> f) [was: Re: hg: lambda/lambda/jdk: Pattern.splitAsStream.]

Peter Levart peter.levart at gmail.com
Tue Apr 23 00:00:18 PDT 2013


On 04/22/2013 04:24 PM, Paul Sandoz wrote:
> On Apr 22, 2013, at 4:11 PM, Peter Levart <peter.levart at gmail.com> wrote:
>
>> On 04/22/2013 02:54 PM, Paul Sandoz wrote:
>>> Hi Jürgen,
>>>
>>> Three issues:
>>>
>>> - we should probably include replaceFirst
>>>
>>> - we need to use different method names since replaceAll(null) is now ambiguous
>> But only if used with literal 'null' and then it throws NPE if the match is found, so I doubt anyone is using "matcher.replaceAll(null)" as a shorthand for "if (matcher.find()) throw new NullPointerException()" in disguise...
>>
> I hope not too :-) Note that such a change does result in a compilation failure for the regex tests.
>
> I don't quite know what the source code level compatibility requirements are here. How high is the bar set? I was presuming it was quite high.
>
> Paul.

This is a kind of source-level incompatibility with a simple fix (an
explicit cast such as replaceAll((String) null)) that also keeps the
code compilable with previous major versions of the JDK (very
important!). In addition, such usage is very unlikely to be found in
real-world code (the test is special in this respect, since it
exercises corner cases). When a code base migrates to JDK8, there will
be other places where fixes are necessary because of source-level
incompatibilities with generics, inference, etc., so this very unlikely
incompatibility will be taken care of in the same batch.
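To make the ambiguity concrete, here is a minimal sketch using a hypothetical class (OverloadDemo, not the JDK's Matcher) that declares both overloads side by side. A bare null argument stops compiling, a lambda still resolves to the Function overload, and a cast selects one overload explicitly. The cast form also compiles against a JDK that only has the String version.

```java
import java.util.function.Function;
import java.util.regex.MatchResult;

// Hypothetical stand-in with both overloads; this is not java.util.regex.Matcher.
public class OverloadDemo {
    static String replaceAll(String replacement) {
        return "String overload";
    }

    static String replaceAll(Function<MatchResult, String> f) {
        return "Function overload";
    }

    public static void main(String[] args) {
        // replaceAll(null);  // compile error: reference to replaceAll is ambiguous
        System.out.println(replaceAll((String) null));                        // String overload
        System.out.println(replaceAll((Function<MatchResult, String>) null)); // Function overload
        System.out.println(replaceAll(m -> m.group()));                       // Function overload
    }
}
```

A lambda argument is unambiguous because String is not a functional interface, so only the Function overload is applicable.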

It would be a waste not to re-use the most natural choice of method name 
in this case, I think.

Regards, Peter

>
>
>> Regards, Peter
>>
>>> - need tests :-) (see jdk/tests/java/util/regex/RegExTest.java)
>>>
>>> While these are nice to have, I am not sure they carry their weight given the time constraints we have. The more complete a solution you can help us provide, the better chance we have of getting this into JDK8.
>>>
>>> Thanks,
>>> Paul.
>>>
>>> On Apr 19, 2013, at 12:59 AM, jk at blackdown.de wrote:
>>>
>>>> Hi Paul,
>>>>
>>>> Paul Sandoz <paul.sandoz at oracle.com> writes:
>>>>
>>>>> Hi Jürgen,
>>>>>
>>>>> That seems useful as a more general approach than Matcher.replaceAll(String), e.g.
>>>>>
>>>>>   Matcher.replaceAll(Function<MatchResult, String> f)
>>>>>
>>>>> Ben, thoughts?
>>>> like this?
>>>>
>>>> # HG changeset patch
>>>> # User Jürgen Kreileder <jk at blackdown.de>
>>>> # Date 1366322703 -7200
>>>> # Node ID 59766f458701af5fbb23d195dd48a928505f3306
>>>> # Parent  3ec06ef568a8ded0a7ecc7624df9d3a025dad6bc
>>>> Matcher.replaceAll(Function<MatchResult, String> f)
>>>>
>>>> diff --git a/src/share/classes/java/util/regex/Matcher.java b/src/share/classes/java/util/regex/Matcher.java
>>>> --- a/src/share/classes/java/util/regex/Matcher.java
>>>> +++ b/src/share/classes/java/util/regex/Matcher.java
>>>> @@ -25,6 +25,7 @@
>>>>
>>>> package java.util.regex;
>>>>
>>>> +import java.util.function.Function;
>>>>
>>>> /**
>>>>   * An engine that performs match operations on a {@link java.lang.CharSequence
>>>> @@ -916,6 +917,54 @@
>>>>      }
>>>>
>>>>      /**
>>>> +     * Replaces every subsequence of the input sequence that matches the
>>>> +     * pattern with the string returned by the given replacement function.
>>>> +     *
>>>> +     * <p> This method first resets this matcher.  It then scans the input
>>>> +     * sequence looking for matches of the pattern.  Characters that are not
>>>> +     * part of any match are appended directly to the result string; each match
>>>> +     * is replaced in the result by the string returned by the replacement
>>>> +     * function.  The replacement strings may contain references to captured
>>>> +     * subsequences as in the {@link #appendReplacement appendReplacement}
>>>> +     * method.
>>>> +     *
>>>> +     * <p> Note that backslashes (<tt>\</tt>) and dollar signs (<tt>$</tt>) in
>>>> +     * the string returned by the replacement function may cause the results to
>>>> +     * be different than if they were being treated as literal strings. Dollar
>>>> +     * signs may be treated as references to captured subsequences as described
>>>> +     * above, and backslashes are used to escape literal characters in the
>>>> +     * replacement string.
>>>> +     *
>>>> +     * <p> Given the regular expression <tt>(\\w)(\\w*)</tt>, the input
>>>> +     * <tt>"paTTern maTcher"</tt>, and the replacement function
>>>> +     * <tt>m -> m.group(1).toUpperCase() + m.group(2).toLowerCase()</tt>, an
>>>> +     * invocation of this method on a matcher for that expression would yield
>>>> +     * the string <tt>"Pattern Matcher"</tt>. </p>
>>>> +     *
>>>> +     * <p> Invoking this method changes this matcher's state.  If the matcher
>>>> +     * is to be used in further matching operations then it should first be
>>>> +     * reset.  </p>
>>>> +     *
>>>> +     * @param  f
>>>> +     *         The function providing replacement strings
>>>> +     * @return  The string constructed by replacing each matching subsequence
>>>> +     *          by the replacement string provided by the given function,
>>>> +     *          substituting captured subsequences as needed
>>>> +     * @since 1.8
>>>> +     */
>>>> +    public String replaceAll(Function<MatchResult, String> f) {
>>>> +        reset();
>>>> +        if (find()) {
>>>> +            StringBuffer sb = new StringBuffer();
>>>> +            do {
>>>> +                appendReplacement(sb, f.apply(this));
>>>> +            } while (find());
>>>> +            return appendTail(sb).toString();
>>>> +        }
>>>> +        return text.toString();
>>>> +    }
>>>> +
>>>> +    /**
>>>>       * Replaces the first subsequence of the input sequence that matches the
>>>>       * pattern with the given replacement string.
>>>>       *
>>>> ==
>>>>
>>>>
>>>>      Juergen
>>>>
>>>>
>>>>> On Apr 8, 2013, at 6:59 PM, jk at blackdown.de wrote:
>>>>>
>>>>>> Hi Paul,
>>>>>>
>>>>>> it would be nice if Pattern/Matcher offered a terse way to loop over all
>>>>>> matches in a string and replace them via a callback.
>>>>>>
>>>>>> E.g. I'm currently using something like this:
>>>>>>
>>>>>> private static final PatternAndReplacement PASS2 = new PatternAndReplacement(
>>>>>>        Pattern.compile("  ( "
>>>>>>                        + "   \\A \\p{Punct}*"         // start of title…
>>>>>>                        + " |"
>>>>>>                        + "   [:.;?!]\\ +"             // or of subsentence…
>>>>>>                        + " | "
>>>>>>                        + "   \\  ['\"“‘(\\[] \\ *"    // or of inserted subphrase…
>>>>>>                        + ")"
>>>>>>                        + "(" + SMALL_WORDS + ") \\b", // … followed by small word
>>>>>>                        Pattern.COMMENTS | Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS),
>>>>>>        m -> Matcher.quoteReplacement(m.group(1) + capitalize(m.group(2))));
>>>>>>
>>>>>> with PatternAndReplacement being
>>>>>>
>>>>>> private static class PatternAndReplacement implements Function<String, String> {
>>>>>>    private final Pattern pattern;
>>>>>>    private final Function<MatchResult, String> function;
>>>>>>
>>>>>>    PatternAndReplacement(final Pattern p, final Function<MatchResult, String> f) {
>>>>>>        pattern = p;
>>>>>>        function = f;
>>>>>>    }
>>>>>>
>>>>>>    @Override
>>>>>>    public final String apply(final String s) {
>>>>>>        Matcher m = pattern.matcher(s);
>>>>>>        if (m.find()) {
>>>>>>            StringBuffer sb = new StringBuffer(s.length());
>>>>>>            do {
>>>>>>                m.appendReplacement(sb, function.apply(m));
>>>>>>            } while (m.find());
>>>>>>            return m.appendTail(sb).toString();
>>>>>>        }
>>>>>>        return s;
>>>>>>    }
>>>>>> }
>>>>>>
>>>>>> Any plans for something like this?
>>>>>>
>>>>>>
>>>>>> Jürgen
>>>>>>
>>>>>>
>>>>>> paul.sandoz at oracle.com writes:
>>>>>>
>>>>>>> Changeset: 526131346981
>>>>>>> Author:    psandoz
>>>>>>> Date:      2013-04-08 17:16 +0200
>>>>>>> URL:       http://hg.openjdk.java.net/lambda/lambda/jdk/rev/526131346981
>>>>>>>
>>>>>>> Pattern.splitAsStream.
>>>>>>> Contributed-by: Ben Evans <benjamin.john.evans at gmail.com>
>>>>>>>
>>>>>>> ! src/share/classes/java/util/regex/Pattern.java
>>>>>>> + test-ng/tests/org/openjdk/tests/java/util/regex/PatternTest.java
>>>> -- 
>>>> https://blackdown.de/
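The loop in Jürgen's patch can be reproduced outside Matcher on a JDK that lacks such a method. The sketch below is a hypothetical static helper (FunctionalReplace.replaceAll is not a JDK API); it takes the Pattern and input explicitly because Matcher's internal text field is not accessible from outside, and it reproduces the javadoc's "paTTern maTcher" example. For reference, JDK 9 later added Matcher.replaceAll(Function<MatchResult, String>) with this shape.

```java
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FunctionalReplace {
    // Hypothetical helper mirroring the proposed Matcher.replaceAll(Function) loop.
    static String replaceAll(Pattern p, CharSequence input, Function<MatchResult, String> f) {
        Matcher m = p.matcher(input);
        if (!m.find()) {
            return input.toString();  // no match: input returned unchanged, no buffer allocated
        }
        StringBuffer sb = new StringBuffer(input.length());
        do {
            // The function's result is treated like an appendReplacement string,
            // so '$' and '\' in it are interpreted, not copied literally.
            m.appendReplacement(sb, f.apply(m));
        } while (m.find());
        return m.appendTail(sb).toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(\\w)(\\w*)");
        String out = replaceAll(p, "paTTern maTcher",
                m -> m.group(1).toUpperCase() + m.group(2).toLowerCase());
        System.out.println(out);  // Pattern Matcher
    }
}
```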

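Paul's first point was that a function-based replaceFirst should accompany replaceAll. A minimal sketch of that counterpart, again as a hypothetical static helper rather than a Matcher method: it performs a single appendReplacement for the first match and then copies the rest of the input with appendTail.

```java
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FunctionalReplaceFirst {
    // Hypothetical helper: like replaceFirst(String), but the replacement comes from f.
    static String replaceFirst(Pattern p, CharSequence input, Function<MatchResult, String> f) {
        Matcher m = p.matcher(input);
        if (!m.find()) {
            return input.toString();  // nothing to replace
        }
        StringBuffer sb = new StringBuffer(input.length());
        m.appendReplacement(sb, f.apply(m));  // replace only the first match
        return m.appendTail(sb).toString();   // copy the remainder unchanged
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        // Doubles only the first number in the input.
        System.out.println(replaceFirst(digits, "a1b22c333",
                m -> String.valueOf(Integer.parseInt(m.group()) * 2)));  // a2b22c333
    }
}
```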

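The patch's javadoc warns that '$' and '\' in the function's result are interpreted, which is why Jürgen's PASS2 example wraps its result in Matcher.quoteReplacement. The sketch below (QuoteReplacementDemo is a hypothetical name) shows the difference: quoted, "$5" is inserted literally; unquoted, it is parsed as a reference to capture group 5, which does not exist, so appendReplacement throws a RuntimeException.

```java
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteReplacementDemo {
    // Same append loop as in the thread; appendTail copies everything when nothing matched.
    static String replaceAll(Pattern p, CharSequence input, Function<MatchResult, String> f) {
        Matcher m = p.matcher(input);
        StringBuffer sb = new StringBuffer(input.length());
        while (m.find()) {
            m.appendReplacement(sb, f.apply(m));
        }
        return m.appendTail(sb).toString();
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        // Quoted: the '$' is escaped, so "$5" appears literally in the output.
        System.out.println(replaceAll(digits, "cost: 5",
                m -> Matcher.quoteReplacement("$" + m.group())));  // cost: $5
        try {
            // Unquoted: "$5" is read as a group reference; the pattern has no group 5.
            replaceAll(digits, "cost: 5", m -> "$" + m.group());
        } catch (RuntimeException e) {
            System.out.println("unquoted replacement failed: " + e);
        }
    }
}
```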

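The thread started from the Pattern.splitAsStream changeset. A short usage sketch of that method (SplitAsStreamDemo and words are hypothetical names): the input is split around each match of the pattern and the pieces are processed as a Stream<String>.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SplitAsStreamDemo {
    // Splits on commas with optional surrounding whitespace, then upper-cases each piece.
    static List<String> words(String input) {
        return Pattern.compile("\\s*,\\s*")
                .splitAsStream(input)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(words("alpha, beta ,gamma"));  // [ALPHA, BETA, GAMMA]
    }
}
```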
More information about the lambda-dev mailing list