RFR: 8013395 StringBuffer.toString performance regression impacting embedded benchmarks

Mon May 13 08:07:02 UTC 2013

Hi David,

On 05/13/2013 09:12 AM, David Holmes wrote:
> Thanks for all the feedback and discussion. I've rewritten the patch 
> to use Peter's suggestion of caching the char[] instead of the actual 
> String:
>
> http://cr.openjdk.java.net/~dholmes/8013395/webrev.v3/
>
> I too am concerned that any form of caching that avoids the array-copy 
> (which is the crux of the issue) is going to incur a 2x space penalty. 
> I can imagine old but well-behaved code that uses StringBuffer rather 
> than StringBuilder where very long buffers are created and toString() 
> only called once - and they may get immediate or subsequent OOME due 
> to the extra char[]; or excessive GC activity might be induced if a 
> lot of StringBuffers are used.
>
> So I tried to implement a simple copy-on-write-if-shared (COWIS) scheme:
>
> http://cr.openjdk.java.net/~dholmes/8013395/webrev.v4/
>
> but I must have done something silly as the resulting JDK won't run - 
> for some reason it causes classes and resources to not be found. :(

I don't know if it's the only error but take a look to the constructor 
String(char[], boolean),
you will understand what's wrong.

The other possible source of errors is that the VM has some string 
optimizations that
recognize patterns like buffer.append(s).append(s2).toString().

>
> David
> -----

Rémi

>
> On 11/05/2013 3:36 AM, Mike Duigou wrote:
>> The implementation looks OK to me. I like Peter's suggestion of 
>> caching the char[] rather than string.
>>
>> I do wish that the cache could be in a soft reference but understand 
>> that there are tradeoffs to doing so. I am worried about leaks with 
>> this implementation.
>>
>> Another possibility, to go totally nuts, is to consider implementing 
>> a form of copy-on-write-following-toString. This would be the exact 
>> opposite of "minimal set of changes to address this specific problem" 
>> and probably not wise to attempt for Java 8.
>>
>> Just as an FYI, this issue has in-flight conflicts with Martin's 
>> ongoing CharSequence.getChars patch.
>>
>> Mike
>>
>> On May 9 2013, at 23:03 , David Holmes wrote:
>>
>>> Short version:
>>>
>>> Cache the value returned by toString and use it to copy-construct a 
>>> new String on subsequent calls to toString(). Clear the cache on any 
>>> mutating operation.
>>>
>>> webrev: http://cr.openjdk.java.net/~dholmes/8013395/webrev.v2/
>>>
>>> Testing: microbenchmark for toString performance; new regression 
>>> test for correctness; JPRT testset core as a sanity check
>>>
>>> Still TBD - full SE benchmark (?)
>>>
>>> Thanks,
>>> David
>>> ---------
>>>
>>> Long version:
>>>
>>> One of the goals for JDK8 is to provide a path from Java ME CDC to 
>>> Java SE (or SE Embedded). In the embedded space some pretty old 
>>> benchmarks still get used for doing comparisons between JRE's. One 
>>> of which makes heavy use of StringBuffer.toString, without modifying 
>>> the StringBuffer in between.
>>>
>>> Up to Java 1.4.2 a StringBuffer and a String could share the 
>>> underlying char[]. This meant that toString simply needed to create 
>>> a new String that referenced the StringBuffer's char[] with no 
>>> copying of the array needed. In Java 5 the String/StringBuffer 
>>> implementations were completely revised: StringBuilder was 
>>> introduced for non-synchronized use, and a new AbstractStringBuilder 
>>> base class added for it and StringBuffer. In that implementation 
>>> toString now has to copy the StringBuffer's char[]. This resulted in 
>>> a significant performance regression for toString() and a bug - 
>>> 6219959 - was opened. There is quite an elaborate evaluation in that 
>>> bug report but bottom line was that "real code doesn't depend on 
>>> this - won't fix".
>>>
>>> At some stage ME also updated to the new Java 5 code and they also 
>>> noticed the problem. As a result CDC6 included a variation of the 
>>> caching strategy that is proposed here.
>>>
>>> Going forward because we want people to be able to compare ME and SE 
>>> with their familiar benchmarks, we would like to address this corner 
>>> case and fix it using the caching strategy outlined. As a data point 
>>> an 8K StringBuffer that takes ~1ms to be converted to a String 
>>> initially, can process subsequent toString() calls in a few 
>>> microseconds. So that performance issue is addressed.
>>>
>>> However we've added a write to a field in all the mutating methods 
>>> which obviously adds some additional computational effort - though I 
>>> have no doubt it is lost in the noise for all but the smallest of 
>>> mutating methods. Even so this should be run against regular SE 
>>> benchmarks to ensure there are no performance regressions there - so 
>>> if anyone has a suggestion as to the best benchmark to run to 
>>> exercise StringBuffer (if it exists), please let me know.
>>>
>>> Thanks for reading this far :)
>>