Matcher.replaceAll(Function<MatchResult, String> f) [was: Re: hg: lambda/lambda/jdk: Pattern.splitAsStream.]

Jürgen Kreileder jk at blackdown.de
Thu Apr 18 15:59:51 PDT 2013


Hi Paul,

Paul Sandoz <paul.sandoz at oracle.com> writes:

> Hi Jürgen,
>
> That seems useful as a more general approach than Matcher.replaceAll(String), e.g.
>
>   Matcher.replaceAll(Function<MatchResult, String> f)
>
> Ben, thoughts?

Like this?

# HG changeset patch
# User Jürgen Kreileder <jk at blackdown.de>
# Date 1366322703 -7200
# Node ID 59766f458701af5fbb23d195dd48a928505f3306
# Parent  3ec06ef568a8ded0a7ecc7624df9d3a025dad6bc
Matcher.replaceAll(Function<MatchResult, String> f)

diff --git a/src/share/classes/java/util/regex/Matcher.java b/src/share/classes/java/util/regex/Matcher.java
--- a/src/share/classes/java/util/regex/Matcher.java
+++ b/src/share/classes/java/util/regex/Matcher.java
@@ -25,6 +25,7 @@
 
 package java.util.regex;
 
+import java.util.function.Function;
 
 /**
  * An engine that performs match operations on a {@link java.lang.CharSequence
@@ -916,6 +917,54 @@
     }
 
     /**
+     * Replaces every subsequence of the input sequence that matches the
+     * pattern with the string returned by the given replacement function.
+     *
+     * <p> This method first resets this matcher.  It then scans the input
+     * sequence looking for matches of the pattern.  Characters that are not
+     * part of any match are appended directly to the result string; each match
+     * is replaced in the result by the string returned by the replacement
+     * function.  The replacement strings may contain references to captured
+     * subsequences as in the {@link #appendReplacement appendReplacement}
+     * method.
+     *
+     * <p> Note that backslashes (<tt>\</tt>) and dollar signs (<tt>$</tt>) in
+     * the string returned by the replacement function may cause the results to
+     * be different than if they were being treated as literal strings. Dollar
+     * signs may be treated as references to captured subsequences as described
+     * above, and backslashes are used to escape literal characters in the
+     * replacement string.
+     *
+     * <p> Given the regular expression <tt>(\\w)(\\w*)</tt>, the input
+     * <tt>"paTTern maTcher"</tt>, and the replacement function
+     * <tt>m -> m.group(1).toUpperCase() + m.group(2).toLowerCase()</tt>, an
+     * invocation of this method on a matcher for that expression would yield
+     * the string <tt>"Pattern Matcher"</tt>. </p>
+     *
+     * <p> Invoking this method changes this matcher's state.  If the matcher
+     * is to be used in further matching operations then it should first be
+     * reset.  </p>
+     *
+     * @param  f
+     *         The function providing replacement strings
+     * @return  The string constructed by replacing each matching subsequence
+     *          by the replacement string provided by the given function,
+     *          substituting captured subsequences as needed
+     * @since 1.8
+     */
+    public String replaceAll(Function<MatchResult, String> f) {
+        reset();
+        if (find()) {
+            StringBuffer sb = new StringBuffer();
+            do {
+                appendReplacement(sb, f.apply(this));
+            } while (find());
+            return appendTail(sb).toString();
+        }
+        return text.toString();
+    }
+
+    /**
      * Replaces the first subsequence of the input sequence that matches the
      * pattern with the given replacement string.
      *
==
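The loop in the patch can be tried outside the JDK with a small static helper. This is a sketch, not the patch itself: the class `ReplaceAllSketch` and its `replaceAll(Matcher, Function)` helper are hypothetical names, the matcher is passed in explicitly because client code cannot add methods to `Matcher`, and the buffer is always allocated (the patch skips that when nothing matches and returns the input directly). For reference, a method with the proposed signature, `Matcher.replaceAll(Function<MatchResult, String>)`, was later added in Java 9.

```java
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAllSketch {

    // Hypothetical stand-alone equivalent of the proposed
    // Matcher.replaceAll(Function<MatchResult, String>).
    static String replaceAll(Matcher m, Function<MatchResult, String> f) {
        m.reset();
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // The Matcher itself is the MatchResult handed to f. The returned
            // string goes through appendReplacement, so "$n" group references
            // and backslash escapes are still interpreted.
            m.appendReplacement(sb, f.apply(m));
        }
        // Copies the text after the last match (the whole input if nothing matched).
        return m.appendTail(sb).toString();
    }

    public static void main(String[] args) {
        // The example from the proposed Javadoc.
        Matcher m = Pattern.compile("(\\w)(\\w*)").matcher("paTTern maTcher");
        System.out.println(replaceAll(m,
                r -> r.group(1).toUpperCase() + r.group(2).toLowerCase()));
        // prints "Pattern Matcher"

        // quoteReplacement makes a '$' in the function's result literal.
        Matcher w = Pattern.compile("\\w+").matcher("a b");
        System.out.println(replaceAll(w, r -> Matcher.quoteReplacement("$" + r.group())));
        // prints "$a $b"
    }
}
```

Because the live matcher is passed as the `MatchResult`, the function sees state that changes on the next `find()`; a function that needs to keep a result beyond the call can take a snapshot with `toMatchResult()`.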


     Juergen


> On Apr 8, 2013, at 6:59 PM, jk at blackdown.de wrote:
>
>> Hi Paul,
>> 
>> it would be nice if Pattern/Matcher offered a terse way to loop over all
>> matches in a string and replace them via a callback.
>> 
>> E.g. I'm currently using something like this:
>> 
>> private static final PatternAndReplacement PASS2 = new PatternAndReplacement(
>>        Pattern.compile("  ( "
>>                        + "   \\A \\p{Punct}*"         // start of title…
>>                        + " |"
>>                        + "   [:.;?!]\\ +"             // or of subsentence…
>>                        + " | "
>>                        + "   \\  ['\"“‘(\\[] \\ *"    // or of inserted subphrase…
>>                        + ")"
>>                        + "(" + SMALL_WORDS + ") \\b", // … followed by small word
>>                        Pattern.COMMENTS | Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS),
>>        m -> Matcher.quoteReplacement(m.group(1) + capitalize(m.group(2))));
>> 
>> with PatternAndReplacement being
>> 
>> private static class PatternAndReplacement implements Function<String, String> {
>>    private final Pattern pattern;
>>    private final Function<MatchResult, String> function;
>> 
>>    PatternAndReplacement(final Pattern p, final Function<MatchResult, String> f) {
>>        pattern = p;
>>        function = f;
>>    }
>> 
>>    @Override
>>    public final String apply(final String s) {
>>        Matcher m = pattern.matcher(s);
>>        if (m.find()) {
>>            StringBuffer sb = new StringBuffer(s.length());
>>            do {
>>                m.appendReplacement(sb, function.apply(m));
>>            } while (m.find());
>>            return m.appendTail(sb).toString();
>>        }
>>        return s;
>>    }
>> }
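To see the callback style from the quoted message run end to end, here is a cut-down version of PASS2. It rests on several assumptions: `SMALL_WORDS` and `capitalize` are not shown in the message, so the versions below are hypothetical stand-ins; the regex drops the `COMMENTS` layout, the quoted-subphrase branch, and `UNICODE_CHARACTER_CLASS`; and instead of the hand-written find/appendReplacement loop in the helper above, it calls `Matcher.replaceAll(Function<MatchResult, String>)`, which exists only in Java 9 and later.

```java
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SmallWordTitleCase {

    // Hypothetical stand-ins for helpers the original message does not show.
    static final String SMALL_WORDS = "a|an|and|of|the|to";

    static String capitalize(String s) {
        return s.isEmpty() ? s : Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    // Simplified PASS2: a small word at the start of the title (after optional
    // ASCII punctuation) or after sentence punctuation followed by spaces.
    static final Pattern PASS2 = Pattern.compile(
            "(\\A\\p{Punct}*|[:.;?!] +)(" + SMALL_WORDS + ")\\b",
            Pattern.CASE_INSENSITIVE);

    // Group 1 may contain '$' or '\', so the result is quoted before
    // replaceAll hands it to the group-reference parser.
    static final Function<MatchResult, String> REPLACER =
            m -> Matcher.quoteReplacement(m.group(1) + capitalize(m.group(2)));

    // Requires Java 9+ for Matcher.replaceAll(Function<MatchResult, String>).
    static String titleCase(String s) {
        return PASS2.matcher(s).replaceAll(REPLACER);
    }

    public static void main(String[] args) {
        System.out.println(titleCase("the fall of rome: a history")); // The fall of rome: A history
        System.out.println(titleCase("$a deal"));                     // $A deal
    }
}
```

The second call shows why the lambda wraps its result in `Matcher.quoteReplacement`: without `UNICODE_CHARACTER_CLASS`, `\p{Punct}` matches `$`, so group 1 can capture a dollar sign that would otherwise be parsed as the start of a group reference.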
>> 
>> Any plans for something like this?
>> 
>> 
>> Jürgen
>> 
>> 
>> paul.sandoz at oracle.com writes:
>> 
>>> Changeset: 526131346981
>>> Author:    psandoz
>>> Date:      2013-04-08 17:16 +0200
>>> URL:       http://hg.openjdk.java.net/lambda/lambda/jdk/rev/526131346981
>>> 
>>> Pattern.splitAsStream.
>>> Contributed-by: Ben Evans <benjamin.john.evans at gmail.com>
>>> 
>>> ! src/share/classes/java/util/regex/Pattern.java
>>> + test-ng/tests/org/openjdk/tests/java/util/regex/PatternTest.java
>

-- 
https://blackdown.de/

