RFR: 8013395 StringBuffer.toString performance regression impacting embedded benchmarks
Remi Forax
forax at univ-mlv.fr
Mon May 13 08:07:02 UTC 2013
Hi David,
On 05/13/2013 09:12 AM, David Holmes wrote:
> Thanks for all the feedback and discussion. I've rewritten the patch
> to use Peter's suggestion of caching the char[] instead of the actual
> String:
>
> http://cr.openjdk.java.net/~dholmes/8013395/webrev.v3/
>
> I too am concerned that any form of caching that avoids the array-copy
> (which is the crux of the issue) is going to incur a 2x space penalty.
> I can imagine old but well-behaved code that uses StringBuffer rather
> than StringBuilder where very long buffers are created and toString()
> only called once - and they may get immediate or subsequent OOME due
> to the extra char[]; or excessive GC activity might be induced if a
> lot of StringBuffers are used.
>
> So I tried to implement a simple copy-on-write-if-shared (COWIS) scheme:
>
> http://cr.openjdk.java.net/~dholmes/8013395/webrev.v4/
>
> but I must have done something silly as the resulting JDK won't run -
> for some reason it causes classes and resources to not be found. :(
I don't know if it's the only error but take a look to the constructor
String(char[], boolean),
you will understand what's wrong.
The other possible source of errors is that the VM has some string
optimizations that
recognize patterns like buffer.append(s).append(s2).toString().
>
> David
> -----
Rémi
>
> On 11/05/2013 3:36 AM, Mike Duigou wrote:
>> The implementation looks OK to me. I like Peter's suggestion of
>> caching the char[] rather than string.
>>
>> I do wish that the cache could be in a soft reference but understand
>> that there are tradeoffs to doing so. I am worried about leaks with
>> this implementation.
>>
>> Another possibility, to go totally nuts, is to consider implementing
>> a form of copy-on-write-following-toString. This would be the exact
>> opposite of "minimal set of changes to address this specific problem"
>> and probably not wise to attempt for Java 8.
>>
>> Just as an FYI, this issue has in-flight conflicts with Martin's
>> ongoing CharSequence.getChars patch.
>>
>> Mike
>>
>> On May 9 2013, at 23:03 , David Holmes wrote:
>>
>>> Short version:
>>>
>>> Cache the value returned by toString and use it to copy-construct a
>>> new String on subsequent calls to toString(). Clear the cache on any
>>> mutating operation.
>>>
>>> webrev: http://cr.openjdk.java.net/~dholmes/8013395/webrev.v2/
>>>
>>> Testing: microbenchmark for toString performance; new regression
>>> test for correctness; JPRT testset core as a sanity check
>>>
>>> Still TBD - full SE benchmark (?)
>>>
>>> Thanks,
>>> David
>>> ---------
>>>
>>> Long version:
>>>
>>> One of the goals for JDK8 is to provide a path from Java ME CDC to
>>> Java SE (or SE Embedded). In the embedded space some pretty old
>>> benchmarks still get used for doing comparisons between JRE's. One
>>> of which makes heavy use of StringBuffer.toString, without modifying
>>> the StringBuffer in between.
>>>
>>> Up to Java 1.4.2 a StringBuffer and a String could share the
>>> underlying char[]. This meant that toString simply needed to create
>>> a new String that referenced the StringBuffer's char[] with no
>>> copying of the array needed. In Java 5 the String/StringBuffer
>>> implementations were completely revised: StringBuilder was
>>> introduced for non-synchronized use, and a new AbstractStringBuilder
>>> base class added for it and StringBuffer. In that implementation
>>> toString now has to copy the StringBuffer's char[]. This resulted in
>>> a significant performance regression for toString() and a bug -
>>> 6219959 - was opened. There is quite an elaborate evaluation in that
>>> bug report but bottom line was that "real code doesn't depend on
>>> this - won't fix".
>>>
>>> At some stage ME also updated to the new Java 5 code and they also
>>> noticed the problem. As a result CDC6 included a variation of the
>>> caching strategy that is proposed here.
>>>
>>> Going forward because we want people to be able to compare ME and SE
>>> with their familiar benchmarks, we would like to address this corner
>>> case and fix it using the caching strategy outlined. As a data point
>>> an 8K StringBuffer that takes ~1ms to be converted to a String
>>> initially, can process subsequent toString() calls in a few
>>> microseconds. So that performance issue is addressed.
>>>
>>> However we've added a write to a field in all the mutating methods
>>> which obviously adds some additional computational effort - though I
>>> have no doubt it is lost in the noise for all but the smallest of
>>> mutating methods. Even so this should be run against regular SE
>>> benchmarks to ensure there are no performance regressions there - so
>>> if anyone has a suggestion as to the best benchmark to run to
>>> exercise StringBuffer (if it exists), please let me know.
>>>
>>> Thanks for reading this far :)
>>
More information about the core-libs-dev
mailing list