[PATCH] regex matcher opt: remove redundant StringBuilder
(moving this to a separate discussion) --- a/src/java.base/share/classes/java/util/regex/Matcher.java +++ b/src/java.base/share/classes/java/util/regex/Matcher.java @@ -993,13 +993,11 @@ public Matcher appendReplacement(StringBuilder sb, String replacement) { // If no match, return error if (first < 0) throw new IllegalStateException("No match available"); - StringBuilder result = new StringBuilder(); - appendExpandedReplacement(replacement, result); // Append the intervening text sb.append(text, lastAppendPosition, first); // Append the match substitution + appendExpandedReplacement(replacement, sb); - sb.append(result); On Mon, Apr 23, 2018 at 5:05 PM Xueming Shen <xueming.shen@oracle.com> wrote:
I would assume in case of an exception thrown from appendExpandedReplacement() we don't want "text" to be pushed into the "sb".
-sherman
Perhaps. Though the behavior under exception is undefined and this function is probably primarily used though the replaceAll API, which wouldn’t return the intermediate sb under the case of exception. My reading of the blame was the temp StringBuilder was an artifact of the api previously using StringBuffer externally. See http://hg.openjdk.java.net/jdk9/dev/jdk/rev/763c564451b3
Hi Isaac, I actually meant to say "we are not supposed to output the partial text into the output buffer in case of an exception". It has nothing to do with the changeset you cited. This has been the behavior since day one/JDK1.4, though it is not specified explicitly in the API doc. The newly added StringBuilder variant simply follows this behavior. If it's really desired it is kinda doable to save that StringBuilder without the "incompatible" behavior change but just wonder if it is really worth the effort. Thanks, Sherman On 4/24/18, 9:11 AM, Isaac Levy wrote:
(moving this to a separate discussion)
--- a/src/java.base/share/classes/java/util/regex/Matcher.java +++ b/src/java.base/share/classes/java/util/regex/Matcher.java @@ -993,13 +993,11 @@ public Matcher appendReplacement(StringBuilder sb, String replacement) { // If no match, return error if (first< 0) throw new IllegalStateException("No match available"); - StringBuilder result = new StringBuilder(); - appendExpandedReplacement(replacement, result); // Append the intervening text sb.append(text, lastAppendPosition, first); // Append the match substitution + appendExpandedReplacement(replacement, sb); - sb.append(result);
On Mon, Apr 23, 2018 at 5:05 PM Xueming Shen<xueming.shen@oracle.com> wrote:
I would assume in case of an exception thrown from appendExpandedReplacement() we don't want "text" to be pushed into the "sb".
-sherman
Perhaps. Though the behavior under exception is undefined and this function is probably primarily used though the replaceAll API, which wouldn’t return the intermediate sb under the case of exception.
My reading of the blame was the temp StringBuilder was an artifact of the api previously using StringBuffer externally. See http://hg.openjdk.java.net/jdk9/dev/jdk/rev/763c564451b3
Hi Sherman, Thanks for clarifying. Looks like exceptions are caused by invalid format string. I wouldn't expect most programs to be catching this and preserving their buffer, but dunno. How much does it affect perf? Well it depends on use case, a jmh of replaceAll with a length 200 string of digits and regex "(\w)" shows about 30% speedup. [info] Benchmark Mode Cnt Score Error Units [info] origM avgt 10 11.669 ± 0.211 us/op [info] newM avgt 10 8.926 ± 0.095 us/op Isaac On Tue, Apr 24, 2018 at 12:53 PM, Xueming Shen <xueming.shen@oracle.com> wrote:
Hi Isaac,
I actually meant to say "we are not supposed to output the partial text into the output buffer in case of an exception". It has nothing to do with the changeset you cited. This has been the behavior since day one/JDK1.4, though it is not specified explicitly in the API doc. The newly added StringBuilder variant simply follows this behavior. If it's really desired it is kinda doable to save that StringBuilder without the "incompatible" behavior change but just wonder if it is really worth the effort.
Thanks, Sherman
On 4/24/18, 9:11 AM, Isaac Levy wrote:
(moving this to a separate discussion)
--- a/src/java.base/share/classes/java/util/regex/Matcher.java +++ b/src/java.base/share/classes/java/util/regex/Matcher.java @@ -993,13 +993,11 @@ public Matcher appendReplacement(StringBuilder sb, String replacement) { // If no match, return error if (first< 0) throw new IllegalStateException("No match available"); - StringBuilder result = new StringBuilder(); - appendExpandedReplacement(replacement, result); // Append the intervening text sb.append(text, lastAppendPosition, first); // Append the match substitution + appendExpandedReplacement(replacement, sb); - sb.append(result);
On Mon, Apr 23, 2018 at 5:05 PM Xueming Shen<xueming.shen@oracle.com> wrote:
I would assume in case of an exception thrown from appendExpandedReplacement() we don't want "text" to be pushed into the "sb".
-sherman
Perhaps. Though the behavior under exception is undefined and this function is probably primarily used though the replaceAll API, which wouldn’t return the intermediate sb under the case of exception.
My reading of the blame was the temp StringBuilder was an artifact of the api previously using StringBuffer externally. See http://hg.openjdk.java.net/jdk9/dev/jdk/rev/763c564451b3
for String based replaceAll/First() it might be worth doing something like http://cr.openjdk.java.net/~sherman/regex_removesb/webrev/ On 4/24/18, 10:47 AM, Isaac Levy wrote:
Hi Sherman,
Thanks for clarifying. Looks like exceptions are caused by invalid format string. I wouldn't expect most programs to be catching this and preserving their buffer, but dunno.
How much does it affect perf? Well it depends on use case, a jmh of replaceAll with a length 200 string of digits and regex "(\w)" shows about 30% speedup.
[info] Benchmark Mode Cnt Score Error Units [info] origM avgt 10 11.669 ± 0.211 us/op [info] newM avgt 10 8.926 ± 0.095 us/op
Isaac
On Tue, Apr 24, 2018 at 12:53 PM, Xueming Shen<xueming.shen@oracle.com> wrote:
Hi Isaac,
I actually meant to say "we are not supposed to output the partial text into the output buffer in case of an exception". It has nothing to do with the changeset you cited. This has been the behavior since day one/JDK1.4, though it is not specified explicitly in the API doc. The newly added StringBuilder variant simply follows this behavior. If it's really desired it is kinda doable to save that StringBuilder without the "incompatible" behavior change but just wonder if it is really worth the effort.
Thanks, Sherman
On 4/24/18, 9:11 AM, Isaac Levy wrote:
(moving this to a separate discussion)
--- a/src/java.base/share/classes/java/util/regex/Matcher.java +++ b/src/java.base/share/classes/java/util/regex/Matcher.java @@ -993,13 +993,11 @@ public Matcher appendReplacement(StringBuilder sb, String replacement) { // If no match, return error if (first< 0) throw new IllegalStateException("No match available"); - StringBuilder result = new StringBuilder(); - appendExpandedReplacement(replacement, result); // Append the intervening text sb.append(text, lastAppendPosition, first); // Append the match substitution + appendExpandedReplacement(replacement, sb); - sb.append(result);
On Mon, Apr 23, 2018 at 5:05 PM Xueming Shen<xueming.shen@oracle.com> wrote:
I would assume in case of an exception thrown from appendExpandedReplacement() we don't want "text" to be pushed into the "sb".
-sherman
Perhaps. Though the behavior under exception is undefined and this function is probably primarily used though the replaceAll API, which wouldn’t return the intermediate sb under the case of exception.
My reading of the blame was the temp StringBuilder was an artifact of the api previously using StringBuffer externally. See http://hg.openjdk.java.net/jdk9/dev/jdk/rev/763c564451b3
yeah perhaps this is enough. Though users of appendReplacement are presumably after high performance loops or they'd be using a higher level API like replaceAll. I just can't imagine people using this API hit the format code exception in normal usage, and it's not like java is dumping junk into the StringBuilder when it throws the exception -- it pushed the correct replacement up to the error. Maybe I can catch exceptions, roll back the builder using setLength, and rethrow. Isaac On Tue, Apr 24, 2018 at 9:38 PM, Xueming Shen <xueming.shen@oracle.com> wrote:
for String based replaceAll/First() it might be worth doing something like
http://cr.openjdk.java.net/~sherman/regex_removesb/webrev/
On 4/24/18, 10:47 AM, Isaac Levy wrote:
Hi Sherman,
Thanks for clarifying. Looks like exceptions are caused by invalid format string. I wouldn't expect most programs to be catching this and preserving their buffer, but dunno.
How much does it affect perf? Well it depends on use case, a jmh of replaceAll with a length 200 string of digits and regex "(\w)" shows about 30% speedup.
[info] Benchmark Mode Cnt Score Error Units [info] origM avgt 10 11.669 ± 0.211 us/op [info] newM avgt 10 8.926 ± 0.095 us/op
Isaac
On Tue, Apr 24, 2018 at 12:53 PM, Xueming Shen<xueming.shen@oracle.com> wrote:
Hi Isaac,
I actually meant to say "we are not supposed to output the partial text into the output buffer in case of an exception". It has nothing to do with the changeset you cited. This has been the behavior since day one/JDK1.4, though it is not specified explicitly in the API doc. The newly added StringBuilder variant simply follows this behavior. If it's really desired it is kinda doable to save that StringBuilder without the "incompatible" behavior change but just wonder if it is really worth the effort.
Thanks, Sherman
On 4/24/18, 9:11 AM, Isaac Levy wrote:
(moving this to a separate discussion)
--- a/src/java.base/share/classes/java/util/regex/Matcher.java +++ b/src/java.base/share/classes/java/util/regex/Matcher.java @@ -993,13 +993,11 @@ public Matcher appendReplacement(StringBuilder sb, String replacement) { // If no match, return error if (first< 0) throw new IllegalStateException("No match available"); - StringBuilder result = new StringBuilder(); - appendExpandedReplacement(replacement, result); // Append the intervening text sb.append(text, lastAppendPosition, first); // Append the match substitution + appendExpandedReplacement(replacement, sb); - sb.append(result);
On Mon, Apr 23, 2018 at 5:05 PM Xueming Shen<xueming.shen@oracle.com> wrote:
I would assume in case of an exception thrown from appendExpandedReplacement() we don't want "text" to be pushed into the "sb".
-sherman
Perhaps. Though the behavior under exception is undefined and this function is probably primarily used though the replaceAll API, which wouldn’t return the intermediate sb under the case of exception.
My reading of the blame was the temp StringBuilder was an artifact of the api previously using StringBuffer externally. See http://hg.openjdk.java.net/jdk9/dev/jdk/rev/763c564451b3
Your patch looks good to me, and will get the majority of performance benefits without any API issues. Isaac On Tue, Apr 24, 2018 at 9:38 PM, Xueming Shen <xueming.shen@oracle.com> wrote:
for String based replaceAll/First() it might be worth doing something like
http://cr.openjdk.java.net/~sherman/regex_removesb/webrev/
On 4/24/18, 10:47 AM, Isaac Levy wrote:
Hi Sherman,
Thanks for clarifying. Looks like exceptions are caused by invalid format string. I wouldn't expect most programs to be catching this and preserving their buffer, but dunno.
How much does it affect perf? Well it depends on use case, a jmh of replaceAll with a length 200 string of digits and regex "(\w)" shows about 30% speedup.
[info] Benchmark Mode Cnt Score Error Units [info] origM avgt 10 11.669 ± 0.211 us/op [info] newM avgt 10 8.926 ± 0.095 us/op
Isaac
On Tue, Apr 24, 2018 at 12:53 PM, Xueming Shen<xueming.shen@oracle.com> wrote:
Hi Isaac,
I actually meant to say "we are not supposed to output the partial text into the output buffer in case of an exception". It has nothing to do with the changeset you cited. This has been the behavior since day one/JDK1.4, though it is not specified explicitly in the API doc. The newly added StringBuilder variant simply follows this behavior. If it's really desired it is kinda doable to save that StringBuilder without the "incompatible" behavior change but just wonder if it is really worth the effort.
Thanks, Sherman
On 4/24/18, 9:11 AM, Isaac Levy wrote:
(moving this to a separate discussion)
--- a/src/java.base/share/classes/java/util/regex/Matcher.java +++ b/src/java.base/share/classes/java/util/regex/Matcher.java @@ -993,13 +993,11 @@ public Matcher appendReplacement(StringBuilder sb, String replacement) { // If no match, return error if (first< 0) throw new IllegalStateException("No match available"); - StringBuilder result = new StringBuilder(); - appendExpandedReplacement(replacement, result); // Append the intervening text sb.append(text, lastAppendPosition, first); // Append the match substitution + appendExpandedReplacement(replacement, sb); - sb.append(result);
On Mon, Apr 23, 2018 at 5:05 PM Xueming Shen<xueming.shen@oracle.com> wrote:
I would assume in case of an exception thrown from appendExpandedReplacement() we don't want "text" to be pushed into the "sb".
-sherman
Perhaps. Though the behavior under exception is undefined and this function is probably primarily used though the replaceAll API, which wouldn’t return the intermediate sb under the case of exception.
My reading of the blame was the temp StringBuilder was an artifact of the api previously using StringBuffer externally. See http://hg.openjdk.java.net/jdk9/dev/jdk/rev/763c564451b3
I still think it would be valuable to land a patch for replaceAll to avoid temporary StringBuilders, is there anyone who wants to help me land it? Isaac On Sun, Apr 29, 2018 at 10:24 PM, Isaac Levy <isaac.r.levy@gmail.com> wrote:
Your patch looks good to me, and will get the majority of performance benefits without any API issues.
Isaac
On Tue, Apr 24, 2018 at 9:38 PM, Xueming Shen <xueming.shen@oracle.com> wrote:
for String based replaceAll/First() it might be worth doing something like
http://cr.openjdk.java.net/~sherman/regex_removesb/webrev/
On 4/24/18, 10:47 AM, Isaac Levy wrote:
Hi Sherman,
Thanks for clarifying. Looks like exceptions are caused by invalid format string. I wouldn't expect most programs to be catching this and preserving their buffer, but dunno.
How much does it affect perf? Well it depends on use case, a jmh of replaceAll with a length 200 string of digits and regex "(\w)" shows about 30% speedup.
[info] Benchmark Mode Cnt Score Error Units [info] origM avgt 10 11.669 ± 0.211 us/op [info] newM avgt 10 8.926 ± 0.095 us/op
Isaac
On Tue, Apr 24, 2018 at 12:53 PM, Xueming Shen<xueming.shen@oracle.com> wrote:
Hi Isaac,
I actually meant to say "we are not supposed to output the partial text into the output buffer in case of an exception". It has nothing to do with the changeset you cited. This has been the behavior since day one/JDK1.4, though it is not specified explicitly in the API doc. The newly added StringBuilder variant simply follows this behavior. If it's really desired it is kinda doable to save that StringBuilder without the "incompatible" behavior change but just wonder if it is really worth the effort.
Thanks, Sherman
On 4/24/18, 9:11 AM, Isaac Levy wrote:
(moving this to a separate discussion)
--- a/src/java.base/share/classes/java/util/regex/Matcher.java +++ b/src/java.base/share/classes/java/util/regex/Matcher.java @@ -993,13 +993,11 @@ public Matcher appendReplacement(StringBuilder sb, String replacement) { // If no match, return error if (first< 0) throw new IllegalStateException("No match available"); - StringBuilder result = new StringBuilder(); - appendExpandedReplacement(replacement, result); // Append the intervening text sb.append(text, lastAppendPosition, first); // Append the match substitution + appendExpandedReplacement(replacement, sb); - sb.append(result);
On Mon, Apr 23, 2018 at 5:05 PM Xueming Shen<xueming.shen@oracle.com> wrote:
I would assume in case of an exception thrown from appendExpandedReplacement() we don't want "text" to be pushed into the "sb".
-sherman
Perhaps. Though the behavior under exception is undefined and this function is probably primarily used though the replaceAll API, which wouldn’t return the intermediate sb under the case of exception.
My reading of the blame was the temp StringBuilder was an artifact of the api previously using StringBuffer externally. See http://hg.openjdk.java.net/jdk9/dev/jdk/rev/763c564451b3
participants (2)
-
Isaac Levy
-
Xueming Shen