StringBuilder version of java.util.regex.Matcher.append*
Jeremy Manson
jeremymanson at google.com
Tue Mar 25 21:07:59 UTC 2014
Okay. Thanks, Sherman. Here's an updated version.
I've diverged a bit from Peter's version. In this version,
appendExpandedReplacement takes a StringBuilder. The implications are:

- In the StringBuilder case, it saves creating a new StringBuilder object.

- In the StringBuffer case, it creates a new StringBuilder, but it doesn't
have to acquire and release the StringBuffer's lock on every append.
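
A rough illustration of that split, not the patch itself: the AppendSketch class and its appendExpanded helper below are hypothetical stand-ins. The helper handles only single-digit $n references and takes the captured groups as an array instead of reading them from a Matcher.

```java
// Hypothetical sketch of the two call paths; not the code in the patch.
class AppendSketch {
    // Simplified stand-in for appendExpandedReplacement: expands
    // single-digit "$n" references from the groups array into out.
    static StringBuilder appendExpanded(String replacement, String[] groups,
                                        StringBuilder out) {
        int cursor = 0;
        while (cursor < replacement.length()) {
            char c = replacement.charAt(cursor++);
            if (c == '$' && cursor < replacement.length()) {
                int ref = replacement.charAt(cursor++) - '0';
                if (ref < 0 || ref > 9)
                    throw new IllegalArgumentException("Illegal group reference");
                if (ref >= groups.length)
                    throw new IndexOutOfBoundsException("No group " + ref);
                if (groups[ref] != null)
                    out.append(groups[ref]);
            } else {
                out.append(c);
            }
        }
        return out;
    }

    // StringBuilder case: expand straight into the caller's builder,
    // so no temporary object is created.
    static void append(StringBuilder sb, String replacement, String[] groups) {
        appendExpanded(replacement, groups, sb);
    }

    // StringBuffer case: expand into a temporary builder without locking,
    // then do a single synchronized append on the buffer.
    static void append(StringBuffer sb, String replacement, String[] groups) {
        StringBuilder tmp = new StringBuilder();
        appendExpanded(replacement, groups, tmp);
        sb.append(tmp);
    }
}
```

With groups {"ab", "a"} and the replacement "$1-$0", both overloads append "a-ab" to their target.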
I also noticed a redundant cast to (int), which I removed.
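
On the cast: subtracting '0' from a char already promotes both operands to int, so the explicit (int) adds nothing. A small sketch, using a hypothetical digit helper that is not part of the patch:

```java
// Hypothetical helper showing that the (int) cast changes nothing.
class CastSketch {
    // Returns the decimal value of c, or -1 if c is not '0' through '9'.
    static int digit(char c) {
        int withCast = (int) c - '0';
        int withoutCast = c - '0'; // char operands are promoted to int anyway
        if (withCast != withoutCast)
            throw new AssertionError("cast changed the result");
        return (withoutCast < 0 || withoutCast > 9) ? -1 : withoutCast;
    }
}
```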
Jeremy
On Fri, Mar 21, 2014 at 7:34 PM, Xueming Shen <xueming.shen at oracle.com> wrote:
> Let's add the StringBuilder method(s). If you can provide an updated
> version, I can run the rest (since it adds new API, there is an internal
> CCC process it needs to go through).
>
> -Sherman
>
>
> On 3/21/14 5:18 PM, Jeremy Manson wrote:
>
> So, this is all a little opaque to me. How do we make the go/no-go
> decision on something like this? Everyone who has chimed in seems to think
> it is a good idea.
>
> Jeremy
>
>
> On Thu, Mar 20, 2014 at 10:38 AM, Jeremy Manson <jeremymanson at google.com> wrote:
>
>> Sherman,
>>
>> If you had released it then (which you wouldn't have been able to do,
>> because you would have to wait another two years for Java 7), you would
>> have found that it improved performance even with C2. It is only
>> post-escape-analysis that the performance in C2 equalized.
>>
>> Anyway, I think adding the StringBuilder variant and deferring /
>> dealing with the Appendable differently is the right approach, FWIW.
>>
>> Jeremy
>>
>>
>> On Thu, Mar 20, 2014 at 10:25 AM, Xueming Shen <xueming.shen at oracle.com> wrote:
>>
>>> 2009? I did have something similar back in 2009 :-)
>>>
>>> http://cr.openjdk.java.net/~sherman/regex_replace/webrev/
>>>
>>> Then the ball was dropped around the discussion of whether or not
>>> the IOE should be thrown.
>>>
>>> But if we are going to/have to have explicit StringBuilder/Buffer pair
>>> anyway, then we can keep the Appendable version as private for now
>>> and deal with the StringBuilder and Appendable as two separate
>>> issues.
>>>
>>> -Sherman
>>>
>>>
>>> On 03/20/2014 09:52 AM, Jeremy Manson wrote:
>>>
>>>> That's definitely an improvement - I think that when I wrote this (circa
>>>> 2009), I didn't think about Appendable.
>>>>
>>>> I take it my argument convinced someone? :)
>>>>
>>>> Jeremy
>>>>
>>>>
>>>> On Thu, Mar 20, 2014 at 1:32 AM, Peter Levart <peter.levart at gmail.com> wrote:
>>>>
>>>> On 03/19/2014 06:51 PM, Jeremy Manson wrote:
>>>>>
>>>>> I'm told that the diff didn't make it. I've put it in a Google drive
>>>>>> folder...
>>>>>>
>>>>>> https://drive.google.com/file/d/0B_GaXa6O4K5LY3Y0aHpranM3aEU/edit?usp=sharing
>>>>>>
>>>>>> Jeremy
>>>>>>
>>>>>> Hi Jeremy,
>>>>>
>>>>> Your factoring-out of the expandReplacement() method exposed an
>>>>> opportunity to further optimize the code. Instead of creating an
>>>>> intermediate StringBuilder instance for each expandReplacement() call,
>>>>> this method could append directly to the resulting
>>>>> StringBuffer/StringBuilder, as in the following:
>>>>>
>>>>> http://cr.openjdk.java.net/~plevart/jdk9-dev/MatcherWithStringBuilder/webrev.01/
>>>>>
>>>>>
>>>>> Regards, Peter
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 19, 2014 at 10:41 AM, Jeremy Manson<
>>>>>> jeremymanson at google.com>
>>>>>> wrote:
>>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>>> We've had this internally for a while, and I keep meaning to bring it up
>>>>>>> here. The Matcher class has a few public methods that take StringBuffers,
>>>>>>> and we've found it useful to add similar versions that take StringBuilders.
>>>>>>>
>>>>>>> It has two benefits:
>>>>>>>
>>>>>>> - Users don't have to convert from one to the other when they want to use
>>>>>>> the method in question. The symmetry is nice.
>>>>>>>
>>>>>>> - The StringBuilder variants are faster (if lock optimizations don't kick
>>>>>>> in, which happens in the interpreter and the client compiler). For
>>>>>>> interpreted / client-compiled code, we saw something like a 25% speedup
>>>>>>> on String.replaceAll(), which calls into this code.
>>>>>>>
>>>>>>> Any interest? Diff attached.
>>>>>>>
>>>>>>> Jeremy
>>>>>>>
>>>>>>>
>>>>>>>
>>>
>>
>
>
-------------- next part --------------
diff --git a/src/share/classes/java/util/regex/Matcher.java b/src/share/classes/java/util/regex/Matcher.java
--- a/src/share/classes/java/util/regex/Matcher.java
+++ b/src/share/classes/java/util/regex/Matcher.java
@@ -65,7 +65,8 @@
* new strings whose contents can, if desired, be computed from the match
* result. The {@link #appendReplacement appendReplacement} and {@link
* #appendTail appendTail} methods can be used in tandem in order to collect
- * the result into an existing string buffer, or the more convenient {@link
+ * the result into an existing string buffer or
+ * string builder. Alternatively, the more convenient {@link
* #replaceAll replaceAll} method can be used to create a string in which every
* matching subsequence in the input sequence is replaced.
*
@@ -792,14 +793,118 @@
* that does not exist in the pattern
*/
public Matcher appendReplacement(StringBuffer sb, String replacement) {
-
// If no match, return error
if (first < 0)
throw new IllegalStateException("No match available");
- // Process substitution string to replace group references with groups
+ StringBuilder result = new StringBuilder();
+ appendExpandedReplacement(replacement, result);
+
+ // Append the intervening text
+ sb.append(text, lastAppendPosition, first);
+ // Append the match substitution
+ sb.append(result);
+
+ lastAppendPosition = last;
+ return this;
+ }
+
+ /**
+ * Implements a non-terminal append-and-replace step.
+ *
+ * <p> This method performs the following actions: </p>
+ *
+ * <ol>
+ *
+ * <li><p> It reads characters from the input sequence, starting at the
+ * append position, and appends them to the given string builder. It
+ * stops after reading the last character preceding the previous match,
+ * that is, the character at index {@link
+ * #start()} <tt>-</tt> <tt>1</tt>. </p></li>
+ *
+ * <li><p> It appends the given replacement string to the string builder.
+ * </p></li>
+ *
+ * <li><p> It sets the append position of this matcher to the index of
+ * the last character matched, plus one, that is, to {@link #end()}.
+ * </p></li>
+ *
+ * </ol>
+ *
+ * <p> The replacement string may contain references to subsequences
+ * captured during the previous match: Each occurrence of
+ * <tt>$</tt><i>g</i><tt></tt> will be replaced by the result of
+ * evaluating {@link #group(int) group}<tt>(</tt><i>g</i><tt>)</tt>.
+ * The first number after the <tt>$</tt> is always treated as part of
+ * the group reference. Subsequent numbers are incorporated into g if
+ * they would form a legal group reference. Only the numerals '0'
+ * through '9' are considered as potential components of the group
+ * reference. If the second group matched the string <tt>"foo"</tt>, for
+ * example, then passing the replacement string <tt>"$2bar"</tt> would
+ * cause <tt>"foobar"</tt> to be appended to the string builder. A dollar
+ * sign (<tt>$</tt>) may be included as a literal in the replacement
+ * string by preceding it with a backslash (<tt>\$</tt>).
+ *
+ * <p> Note that backslashes (<tt>\</tt>) and dollar signs (<tt>$</tt>) in
+ * the replacement string may cause the results to be different than if it
+ * were being treated as a literal replacement string. Dollar signs may be
+ * treated as references to captured subsequences as described above, and
+ * backslashes are used to escape literal characters in the replacement
+ * string.
+ *
+ * <p> This method is intended to be used in a loop together with the
+ * {@link #appendTail appendTail} and {@link #find find} methods. The
+ * following code, for example, writes <tt>one dog two dogs in the
+ * yard</tt> to the standard-output stream: </p>
+ *
+ * <blockquote><pre>
+ * Pattern p = Pattern.compile("cat");
+ * Matcher m = p.matcher("one cat two cats in the yard");
+ * StringBuilder sb = new StringBuilder();
+ * while (m.find()) {
+ * m.appendReplacement(sb, "dog");
+ * }
+ * m.appendTail(sb);
+ * System.out.println(sb.toString());</pre></blockquote>
+ *
+ * @param sb
+ * The target string builder
+ *
+ * @param replacement
+ * The replacement string
+ *
+ * @return This matcher
+ *
+ * @throws IllegalStateException
+ * If no match has yet been attempted,
+ * or if the previous match operation failed
+ *
+ * @throws IndexOutOfBoundsException
+ * If the replacement string refers to a capturing group
+ * that does not exist in the pattern
+ */
+ public Matcher appendReplacement(StringBuilder sb, String replacement) {
+ // If no match, return error
+ if (first < 0)
+ throw new IllegalStateException("No match available");
+
+ // Append the intervening text
+ sb.append(text, lastAppendPosition, first);
+
+ // Append the match substitution
+ appendExpandedReplacement(replacement, sb);
+
+ lastAppendPosition = last;
+ return this;
+ }
+
+ /**
+ * Processes replacement string to replace group references with
+ * groups.
+ */
+ private StringBuilder appendExpandedReplacement(
+ String replacement, StringBuilder result) {
int cursor = 0;
- StringBuilder result = new StringBuilder();
while (cursor < replacement.length()) {
char nextChar = replacement.charAt(cursor);
@@ -852,8 +957,8 @@
cursor++;
} else {
// The first number is always a group
- refNum = (int)nextChar - '0';
- if ((refNum < 0)||(refNum > 9))
+ refNum = nextChar - '0';
+ if ((refNum < 0) || (refNum > 9))
throw new IllegalArgumentException(
"Illegal group reference");
cursor++;
@@ -864,7 +969,7 @@
break;
}
int nextDigit = replacement.charAt(cursor) - '0';
- if ((nextDigit < 0)||(nextDigit > 9)) { // not a number
+ if ((nextDigit < 0) || (nextDigit > 9)) { // not a number
break;
}
int newRefNum = (refNum * 10) + nextDigit;
@@ -884,13 +989,7 @@
cursor++;
}
}
- // Append the intervening text
- sb.append(text, lastAppendPosition, first);
- // Append the match substitution
- sb.append(result);
-
- lastAppendPosition = last;
- return this;
+ return result;
}
/**
@@ -913,6 +1012,25 @@
}
/**
+ * Implements a terminal append-and-replace step.
+ *
+ * <p> This method reads characters from the input sequence, starting at
+ * the append position, and appends them to the given string builder. It is
+ * intended to be invoked after one or more invocations of the {@link
+ * #appendReplacement appendReplacement} method in order to copy the
+ * remainder of the input sequence. </p>
+ *
+ * @param sb
+ * The target string builder
+ *
+ * @return The target string builder
+ */
+ public StringBuilder appendTail(StringBuilder sb) {
+ sb.append(text, lastAppendPosition, getTextLength());
+ return sb;
+ }
+
+ /**
* Replaces every subsequence of the input sequence that matches the
* pattern with the given replacement string.
*
@@ -950,7 +1068,7 @@
reset();
boolean result = find();
if (result) {
- StringBuffer sb = new StringBuffer();
+ StringBuilder sb = new StringBuilder();
do {
appendReplacement(sb, replacement);
result = find();
@@ -1000,7 +1118,7 @@
reset();
if (!find())
return text.toString();
- StringBuffer sb = new StringBuffer();
+ StringBuilder sb = new StringBuilder();
appendReplacement(sb, replacement);
appendTail(sb);
return sb.toString();