StringBuilder version of java.util.regex.Matcher.append*

Peter Levart peter.levart at gmail.com
Tue Apr 8 12:54:23 UTC 2014


On 04/07/2014 07:00 PM, Xueming Shen wrote:
> On 04/04/2014 10:08 AM, Xueming Shen wrote:
>> On 4/3/14 4:43 PM, Jeremy Manson wrote:
>>> Good catch, thanks.
>>>
>>> I think we should probably just go with the (equivalent to the) 
>>> StringBuffer variant.  I'm pretty loathe to modify the StringBuilder 
>>> directly if we are going to back that change out.
>>>
>>> Do you want me to generate a new patch?
>>
>> I can/will send out an updated webrev before push.
>
> the latest webrev.
>
> http://cr.openjdk.java.net/~sherman/8039124/webrev
>
> -Sherman

Hi Sherman,

This seems most straight-forward and simple. My proposed variant of 
appending directly to StringBuffer/StringBuilder (thus to 
AbstractStringBuilder) had also a performance disadvantage, since code 
in appendExpandedReplacement() could see at least two different 
subclasses of AbstractStringBuilder and therefore could not inline calls 
to AbstractStringBuilder's virtual methods and had to use duo or even 
megamorphic calls. I measured and it appears that this has more overhead 
than an additional StringBuilder copy for common-sized replacements...

One thing to note. Matcher.appendReplacement(StringBuffer, ...) has 
never been specified as or implemented to be an atomic operation from 
the StringBuffer's perspective. But this could easily be achieved:

  795     public Matcher appendReplacement(StringBuffer sb, String replacement) {
  796         // If no match, return error
  797         if (first < 0)
  798             throw new IllegalStateException("No match available");
  799         StringBuilder result = new StringBuilder();
  800         appendExpandedReplacement(replacement, result);
  +           synchronized(sb) {
  801             // Append the intervening text
  802             sb.append(text, lastAppendPosition, first);
  803             // Append the match substitution
  804             sb.append(result);
  +           }
  805         lastAppendPosition = last;
  806         return this;
  807     }


...I don't know if this makes uncontended locking exhibit any 
performance penalties since the "sb" monitor is locked twice (nested). I 
haven't measured, sorry.

Peter

>
>>
>> -Sherman
>>
>>>
>>> Jeremy
>>>
>>>
>>> On Thu, Apr 3, 2014 at 2:27 PM, Xueming Shen 
>>> <xueming.shen at oracle.com <mailto:xueming.shen at oracle.com>> wrote:
>>>
>>>     On 03/25/2014 02:07 PM, Jeremy Manson wrote:
>>>>     Okay.  Thanks, Sherman.  Here's an updated version.
>>>>
>>>>     I've diverged a bit from Peter's version.  In this version, 
>>>> appendExpandedReplacement takes a StringBuilder. The implications 
>>>> is that In the StringBuilder case, it saves creating a new 
>>>> StringBuilder object.  In the StringBuffer case, it creates a new 
>>>> StringBuilder, but it doesn't have to acquire and release all of 
>>>> those locks.
>>>
>>>     Hi Jeremy,
>>>
>>>     It appears the "optimized" StringBuilder version will cause 
>>> following test case failure,
>>>     in which the "xyz" will be copied into the result buffer, even 
>>> when the replacement
>>>     string triggers a IAE.
>>>
>>>             // Check nothing has been appended into the output 
>>> buffer if
>>>             // the replacement string triggers 
>>> IllegalArgumentException.
>>>             Pattern p = Pattern.compile("(abc)");
>>>             Matcher m = p.matcher("abcd");
>>>             StringBuilder result = new StringBuilder();
>>>             try {
>>>                 m.appendReplacement(result, ("xyz$g"));
>>>             } catch (IllegalArgumentException iae) {
>>>                 if (result.length() != 0)
>>>                     System.err.println(" FAILED");
>>>
>>>             }
>>>
>>>     We may have to either catch the IAE and reset the sb, or create
>>>     a new sb, as the StringBuffer does.
>>>
>>>     -Sherman
>>>
>>>
>>>
>>>>
>>>>     I also noticed a redundant cast to (int), which I removed.
>>>>
>>>>     Jeremy
>>>>
>>>>
>>>>     On Fri, Mar 21, 2014 at 7:34 PM, Xueming Shen 
>>>> <xueming.shen at oracle.com <mailto:xueming.shen at oracle.com>> wrote:
>>>>
>>>>         let's add the StringBuilder method(s), if you can provide 
>>>> an updated version, I can run the rest (since it's
>>>>         to add new api, there is an internal CCC process need to go 
>>>> through).
>>>>
>>>>         -Sherman
>>>>
>>>>
>>>>         On 3/21/14 5:18 PM, Jeremy Manson wrote:
>>>>>         So, this is all a little opaque to me.  How do we make the 
>>>>> go/no-go decision on something like this?  Everyone who has chimed 
>>>>> in seems to think it is a good idea.
>>>>>
>>>>>         Jeremy
>>>>>
>>>>>
>>>>>         On Thu, Mar 20, 2014 at 10:38 AM, Jeremy Manson 
>>>>> <jeremymanson at google.com <mailto:jeremymanson at google.com>> wrote:
>>>>>
>>>>>             Sherman,
>>>>>
>>>>>             If you had released it then (which you wouldn't have 
>>>>> been able to do, because you would have to wait another two years 
>>>>> for Java 7), you would have found that it improved performance 
>>>>> even with C2.  It is only post-escape-analysis that the 
>>>>> performance in C2 equalized.
>>>>>
>>>>>             Anyway, I think adding the StringBuilder variant and 
>>>>> deferring / dealing with the Appendable differently is the right 
>>>>> approach, FWIW.
>>>>>
>>>>>             Jeremy
>>>>>
>>>>>
>>>>>             On Thu, Mar 20, 2014 at 10:25 AM, Xueming Shen 
>>>>> <xueming.shen at oracle.com <mailto:xueming.shen at oracle.com>> wrote:
>>>>>
>>>>>                 2009? I do have something similar back to 2009 :-)
>>>>>
>>>>> http://cr.openjdk.java.net/~sherman/regex_replace/webrev/ 
>>>>> <http://cr.openjdk.java.net/%7Esherman/regex_replace/webrev/>
>>>>>
>>>>>                 Then the ball was dropped around the discussion of 
>>>>> whether or not
>>>>>                 the IOE should be thrown.
>>>>>
>>>>>                 But if we are going to/have to have explicit 
>>>>> StringBuilder/Buffer pair
>>>>>                 anyway, then we can keep the Appendable version as 
>>>>> private for now
>>>>>                 and deal with the StringBuilder and Appendable as 
>>>>> two separate
>>>>>                 issues.
>>>>>
>>>>>                 -Sherman
>>>>>
>>>>>
>>>>>                 On 03/20/2014 09:52 AM, Jeremy Manson wrote:
>>>>>
>>>>>                     That's definitely an improvement - I think 
>>>>> that when I wrote this (circa
>>>>>                     2009), I didn't think about Appendable.
>>>>>
>>>>>                     I take it my argument convinced someone?  :)
>>>>>
>>>>>                     Jeremy
>>>>>
>>>>>
>>>>>                     On Thu, Mar 20, 2014 at 1:32 AM, Peter 
>>>>> Levart<peter.levart at gmail.com <mailto:peter.levart at gmail.com>>wrote:
>>>>>
>>>>>                         On 03/19/2014 06:51 PM, Jeremy Manson wrote:
>>>>>
>>>>>                             I'm told that the diff didn't make 
>>>>> it.  I've put it in a Google drive
>>>>>                             folder...
>>>>>
>>>>> https://drive.google.com/file/d/0B_GaXa6O4K5LY3Y0aHpranM3aEU/
>>>>>                             edit?usp=sharing
>>>>>
>>>>>                             Jeremy
>>>>>
>>>>>                         Hi Jeremy,
>>>>>
>>>>>                         Your factoring-out of expandReplacement() 
>>>>> method exposed an opportunity to
>>>>>                         further optimize the code. Instead of 
>>>>> creating intermediate StringBuilder
>>>>>                         instance for each expandReplacement() 
>>>>> call, this method could append
>>>>>                         directly to resulting 
>>>>> StringBuffer/StringBuilder, like in the following:
>>>>>
>>>>> http://cr.openjdk.java.net/~plevart/jdk9-dev/MatcherWithStringBuilder/ 
>>>>> <http://cr.openjdk.java.net/%7Eplevart/jdk9-dev/MatcherWithStringBuilder/> 
>>>>>
>>>>>                         webrev.01/
>>>>>
>>>>>
>>>>>                         Regards, Peter
>>>>>
>>>>>
>>>>>
>>>>>                             On Wed, Mar 19, 2014 at 10:41 AM, 
>>>>> Jeremy Manson<jeremymanson at google.com 
>>>>> <mailto:jeremymanson at google.com>>
>>>>>                             wrote:
>>>>>
>>>>>                               Hi folks,
>>>>>
>>>>>                                 We've had this internally for a 
>>>>> while, and I keep meaning to bring it up
>>>>>                                 here.  The Matcher class has a few 
>>>>> public methods that take
>>>>>                                 StringBuffers,
>>>>>                                 and we've found it useful to add 
>>>>> similar versions that take
>>>>>                                 StringBuilders.
>>>>>
>>>>>                                 It has two benefits:
>>>>>
>>>>>                                 - Users don't have to convert from 
>>>>> one to the other when they want to use
>>>>>                                 the method in question. The 
>>>>> symmetry is nice.
>>>>>
>>>>>                                 - The StringBuilder variants are 
>>>>> faster (if lock optimizations don't kick
>>>>>                                 in, which happens in the 
>>>>> interpreter and the client compiler).  For
>>>>>                                 interpreted / client-compiled 
>>>>> code, we saw something like a 25% speedup
>>>>>                                 on
>>>>>                                 String.replaceAll(), which calls 
>>>>> into this code.
>>>>>
>>>>>                                 Any interest?  Diff attached.
>>>>>
>>>>>                                 Jeremy
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>




More information about the core-libs-dev mailing list