API change proposal: String concatenation boost

Server Performance serverperformance at gmail.com
Sun Sep 21 08:15:43 UTC 2008


Hello, this is my first contribution to OpenJDK, so sorry if I missed some
step...  And sorry for my English :-(
This is my proposal for discussion:

THE GOAL: Boost the overall String concatenation / append operations.

BACKGROUND / HISTORY:
• At the beginning (JDK 1.0 days) we had String.concat() and StringBuffer to
build Strings. Both approaches initially performed badly.
• Starting at JDK 1.4 (I think), a copy-on-write sharing strategy was
introduced in StringBuffer. The performance gain was obvious, but it
increased the heap needed and in some cases produced a memory leak when
reusing a StringBuffer.
• Starting at JDK 1.5, StringBuilder was introduced as the “unsynchronized
version”, but the copy-on-write optimization was also undone, so it became
an always-copy scenario. Also, the String + operator is translated to
StringBuilder.append() by javac. This has been discussed but no better
alternative was found (see
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6219959 )
• The current implementation generates several System.arraycopy() calls: at
least one per append/insert/delete (two if expanding capacity) and a final
one in the toString() method.
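
To make the javac translation mentioned above concrete, here is a minimal sketch. The class and method names (ConcatDesugar, withOperator, desugared) are hypothetical and only for illustration; the second method shows roughly the StringBuilder chain that javac of this era emits for the + operator, not its exact bytecode.

```java
final class ConcatDesugar {
    // What the programmer writes.
    static String withOperator(String a, String b, String c) {
        return a + b + c;
    }

    // Roughly what the + expression above is translated to:
    // one StringBuilder, one append() per operand, then toString().
    // Each append() copies characters into the builder's buffer, and
    // toString() copies the buffer once more into the new String.
    static String desugared(String a, String b, String c) {
        return new StringBuilder().append(a).append(b).append(c).toString();
    }

    public static void main(String[] args) {
        System.out.println(withOperator("foo", "bar", "baz"));
        System.out.println(desugared("foo", "bar", "baz"));
    }
}
```

Both methods return the same String; the point is only to show where the copies described above come from.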

STUDYING THE USES:
• If we look at the uses of StringBuilder (inside JDK code, in application
servers and in end-user applications), nearly 99% of the time it is
only used to create a String in a single-threaded context and (the most
important fact) only through the append() and toString() methods.
• Also, in only 5% of instantiations does the coder set the initial
capacity. Many times it doesn’t matter, but other times it is impossible to
guess or calculate. And even worse: sometimes the coder guesses wrong and
sets too much initial capacity or too little.

MY PROPOSAL:
• Create a new class java.lang.StringAppender implements Appendable
1. Mostly the same exposed public constructors and methods as
StringBuilder, but the only operations are the “append()” ones (no insert,
no delete, no replace)
2. Internally represented as a String array
3. Calls arraycopy() or creates char arrays only once, inside the toString()
method (well, this isn’t completely true: it also copies when appending
objects/variables other than String instances or char arrays, but the most
typical operation is appending strings!)
4. Doesn’t need an initial capacity to be established. No more calculating it
or guessing it. Always 
• Add a new constructor in the java.lang.String class (actually 5 new
constructors for performance reasons, see below):
1. public String(String... strs)
2. public String(String str0, String str1)
3. public String(String str0, String str1, String str2)
4. public String(String str0, String str1, String str2, String str3)
(NOTE: these 3 additional constructors are needed to boost appends of a
small number of Strings, in which case the overhead of creating the array
and then looping over it is much greater than that of passing 2, 3 or 4
parameters in the constructor invocation).
• Change the javac behavior: the String + operator must be translated into
“new String(String... )” instead of “new
StringBuilder().append().append()... ..toString()”
• Revise other JDK source code to use StringAppender, and the rest of the
programs all around the world. (By the way, in the Glassfish V2 source code I
see several String.concat() invocations; seems strange to me... )
• So the new blueprints for String concatenation should be:
1. For append-only, unconditional concatenations, use the new String
constructor. Example: String result = new String(part1, part2, part3,
part4);
2. For append-only, conditional or looped concatenations, use the
StringAppender class.
3. For other manipulations (insert, delete, replace), use StringBuilder
4. For a thread-safe version, use StringBuffer
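
As an illustration of the idea only (this is not the attached StringAppender.java), here is a minimal sketch under some assumptions: the class name StringAppenderSketch and the static concat() helper are hypothetical, and concat() stands in for the proposed String(String...) constructor, which cannot be added from outside java.lang.

```java
import java.util.Arrays;

// Append-only builder that stores references to the appended Strings and
// copies characters only once, when toString() is called.
final class StringAppenderSketch implements Appendable {
    private String[] parts = new String[8]; // grows on demand, no capacity guess
    private int count;                      // number of parts stored
    private int totalLength;                // sum of the lengths of all parts

    public StringAppenderSketch append(String s) {
        String part = (s == null) ? "null" : s;
        if (count == parts.length) {
            // Only the array of references is copied here, not characters.
            parts = Arrays.copyOf(parts, count * 2);
        }
        parts[count++] = part;
        totalLength += part.length();
        return this;
    }

    @Override
    public StringAppenderSketch append(CharSequence csq) {
        // Non-String inputs must be converted, which does copy characters.
        return append(String.valueOf(csq));
    }

    @Override
    public StringAppenderSketch append(CharSequence csq, int start, int end) {
        return append(String.valueOf(csq).subSequence(start, end).toString());
    }

    @Override
    public StringAppenderSketch append(char c) {
        return append(String.valueOf(c));
    }

    @Override
    public String toString() {
        char[] out = new char[totalLength];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            String part = parts[i];
            part.getChars(0, part.length(), out, pos);
            pos += part.length();
        }
        // Outside java.lang this public constructor copies the array again;
        // code inside java.lang could avoid that extra copy.
        return new String(out);
    }

    // Hypothetical stand-in for the proposed String(String...) constructor.
    static String concat(String... strs) {
        StringAppenderSketch sa = new StringAppenderSketch();
        for (String s : strs) {
            sa.append(s);
        }
        return sa.toString();
    }
}
```

Usage would look like new StringAppenderSketch().append("a").append("b").toString() for conditional or looped concatenations, and StringAppenderSketch.concat(part1, part2, part3) for the unconditional case.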

THE BOOST:
As you can see in my microbenchmark results, run on Linux x64 and
32-bit Windows (-server, -client, and -XX:+AggressiveOpts variants), we can
achieve a boost of between 1% and 167% (depending on the scenario and
architecture). Those values are the extremes; the typical gains are
between 20% and 70%. I think these results are good enough to be taken into
consideration :-)

THE SOURCE CODE:
See attachments: String.java.diff with the added code (it is clear), and
StringAppender.java with the new proposed class.

THE MICROBENCHMARK CODE:
See attachment.
Of course it should be reviewed. I think I have written it correctly.

THE MICROBENCHMARK RESULTS (they varied for me by about +/-1% across
executions due to the host load or whatever):
See attached file. I think they are great...


What do you think?
Best regards,
--Jesús Viñuales
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: BENCHMARK_results.txt
URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20080921/da78e4cc/BENCHMARK_results.txt>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MicrobenchmarkString.java
Type: application/octet-stream
Size: 25097 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20080921/da78e4cc/MicrobenchmarkString.java>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: String.java.diff
Type: application/octet-stream
Size: 5603 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20080921/da78e4cc/String.java.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: StringAppender.java
Type: application/octet-stream
Size: 8210 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20080921/da78e4cc/StringAppender.java>

