Request for Enhancement: java.io.Writer.of(Appendable) as an efficient alternative to java.io.StringWriter
Markus KARG
markus at headcrashing.eu
Fri Dec 20 15:11:11 UTC 2024
Dear Sirs,
JDK 24 comes with Reader.of(CharSequence), now let's provide the
symmetrical counterpart Writer.of(Appendable) in JDK 25! :-)
For performance reasons, I would like to propose the new public factory
method Writer.of(Appendable). It will provide the same benefits for
writing that Reader.of(CharSequence) has provided for reading since
JDK 24 (see JDK-8341566). Before sharing a pull request, I would like to
request comments.
Since Java 1.1 we have the StringWriter class. Since Java 1.5 we have
the Appendable interface. StringBuilder, StringBuffer and CharBuffer are
first-class implementations of it in the JDK, and there might exist
third-party implementations of non-String text sinks. Until today,
however, we do not have a Writer for Appendables, but need to take
costly detours.
Text sinks in Java are expected to extend the abstract Writer class.
Libraries and frameworks expect application code to provide Writers to
consume text produced by the library or framework, for example.
Application code often wants to modify the received text, e.g. embed
received SVG text into a larger HTML text document, or simply forward
the text as-is to I/O, so StringBuilder or CharBuffer is what the
application code actually uses, but not Strings! In such cases, taking
the StringWriter.toString() detour is common but inefficient: It implies
duplicating the COMPLETE text for the sole sake of creating a temporary
String, while the subsequent processing will copy the data anyway or
just use a small piece of it. This wastes time and memory, and
increases GC pressure. Also, StringWriter is synchronized (not
explicitly, but de facto, as it uses StringBuffer), which implies
another needless slowdown. In many cases, the synchronization has no use
at all, as in real-world applications very few Writers are actually
accessed concurrently. As a result, today the major benefit of
StringBuilder over StringBuffer (being non-synchronized) vanishes as
soon as a StringWriter is used to provide its content. This means
"stringBuilder.append(stringWriter.toString())" is slower than
necessary in two ways: the toString() copy and the synchronization.
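To make the detour concrete, here is a minimal sketch of the pattern
described above. The renderSvg(Writer) method is a hypothetical stand-in
for a library that produces text into a caller-provided Writer; only
StringWriter and StringBuilder are real JDK classes here.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class StringWriterDetour {
    // Hypothetical library method: produces text into whatever Writer it is given.
    static void renderSvg(Writer out) throws IOException {
        out.write("<svg><circle r=\"1\"/></svg>");
    }

    static String buildPage() {
        StringBuilder html = new StringBuilder("<html><body>");
        // StringWriter collects the text in an internal, synchronized StringBuffer.
        StringWriter buffer = new StringWriter();
        try {
            renderSvg(buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // toString() copies the complete text into a temporary String,
        // and append() then copies it a second time into the StringBuilder.
        html.append(buffer.toString());
        html.append("</body></html>");
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildPage());
    }
}
```

The application only ever wanted the text inside the StringBuilder; the
intermediate StringBuffer and the temporary String exist solely because
the library asks for a Writer.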
In an attempt to improve performance of this rather typical use case, I
would like to contribute a pull request providing the new public factory
method java.io.Writer.of(Appendable). This is symmetrical to the
solution we implemented in JDK-8341566 for the reversed case:
java.io.Reader.of(CharSequence).
My idea is to mostly copy the existing code of StringWriter, but wrap a
caller-provided Appendable instead of an internally created
StringBuffer; this strips synchronization. On top of that, I would add
optimized paths for the StringBuffer, StringBuilder and CharBuffer
implementations (bulk handling in write(char[], off, len) to prevent a
char-by-char loop in these cases).
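A rough sketch of how such a Writer could look, under stated assumptions:
the class name AppendableWriters, its of(Appendable) method and the
demo(Appendable) helper are hypothetical names used for illustration
only. This is not the proposed JDK implementation, and it leaves out
details such as closed-state checks or forwarding flush() and close() to
targets that are Flushable or Closeable.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.CharBuffer;
import java.util.Objects;

final class AppendableWriters {
    private AppendableWriters() {}

    // Hypothetical stand-in for the proposed Writer.of(Appendable).
    static Writer of(final Appendable target) {
        Objects.requireNonNull(target, "target");
        return new Writer() {
            @Override
            public void write(int c) throws IOException {
                target.append((char) c); // low 16 bits, as Writer.write(int) specifies
            }

            @Override
            public void write(char[] cbuf, int off, int len) throws IOException {
                // Bulk paths for the well-known JDK implementations.
                if (target instanceof StringBuilder) {
                    ((StringBuilder) target).append(cbuf, off, len);
                } else if (target instanceof StringBuffer) {
                    ((StringBuffer) target).append(cbuf, off, len);
                } else if (target instanceof CharBuffer) {
                    ((CharBuffer) target).put(cbuf, off, len);
                } else {
                    // Generic Appendable: hand over a view instead of looping per char.
                    target.append(CharBuffer.wrap(cbuf, off, len));
                }
            }

            @Override
            public void write(String str, int off, int len) throws IOException {
                target.append(str, off, off + len); // no intermediate char[] copy
            }

            @Override
            public void flush() {
                // Nothing is buffered in this sketch.
            }

            @Override
            public void close() {
                // The sketch does not track a closed state.
            }
        };
    }

    // Illustration helper: writes "abc", "yz" and "!" into the given target.
    static void demo(Appendable target) {
        try (Writer w = of(target)) {
            w.write("abc");
            w.write(new char[] {'x', 'y', 'z'}, 1, 2);
            w.write('!');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        demo(sb);
        System.out.println(sb); // prints "abcyz!"
    }
}
```

With this shape, the text lands directly in the caller's StringBuilder,
StringBuffer or CharBuffer, so neither a temporary String nor a second
copy is needed, and no lock is taken unless the target itself is
synchronized.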
Alternatives:
- Applications could use Apache Commons IO's StringBuilderWriter, which
is limited to StringBuilder, so is not usable for the CharBuffer or
custom Appendable case. As it is an open-source third-party dependency,
some authors might not be allowed to use it, or may not want to carry
this additional burden just for the sake of this single performance
improvement. In addition, this library is not actively modernized; its
Java baseline still is Java 8. There is no commercial support.
- Applications could write their own Writer implementation. Given the
assumption that this is a rather common use case, this imposes
unjustified additional work for the authors of thousands of
applications. It is hard to justify why there is a StringWriter but not
a Writer for other Appendables.
- Instead of writing a new Writer factory method, we could slightly
modify StringWriter, so it uses StringBuilder (instead of StringBuffer).
This (still) results in unnecessary duplication of the full text at
toString() and (now also) at getBuffer(), and it will break existing
applications due to the missing synchronization.
- Instead of writing a new Writer factory method, we could write a new
AppendableWriter class. This increases the number of public classes,
which was the main reason in JDK-8341566 to go with the
"Reader.of(CharSequence)" factory method instead of the
"CharSequenceReader" class. Also it would be confusing to have
Reader.of(...) but not Writer.of(...) in the API.
- We could go with a specific Appendable class (like StringBuilder)
instead of supporting all Appendable implementations. This would reduce
the number of applicable use cases dramatically (in particular as
CharBuffer would no longer be supported) without providing any
considerable benefit (other than making the OpenJDK-internal source code
a bit shorter). In particular, it would make it impossible to opt in to
the option below:
Option:
- Once we have Writer.of(Appendable), we could replace the full
implementation of StringWriter by synchronized calls to the new Writer.
This would reduce duplicate code.
Kindly requesting comments.
-Markus Karg