Proposal for new public class: java.io.CharSequenceReader
Markus Karg
markus at headcrashing.eu
Sat Sep 28 16:15:54 UTC 2024
Dear Sirs,
for performance reasons, I would like to propose the new public class
java.io.CharSequenceReader. Before sharing a pull request, I would kindly
like to request comments.
Since Java 1.1 we have the StringReader class. Since Java 1.4 we have the
CharSequence interface. StringBuilder, StringBuffer and CharBuffer are
first-class implementations of it in the JDK, and there might exist
third-party implementations of non-String character sequences. Until today,
however, we do not have a Reader for CharSequences, but must take costly
detours.
To process non-String character sequences, the typical detour today is to
turn a CharSequence into a temporary String (hence duplicating its full
contents), which costs time and memory (and eventually GC), for the sole
sake of being processable by a StringReader. As StringReader is
synchronized, every single access is artificially slowed down. In many cases
the synchronization has no use at all, as in real-world applications very
few Readers are actually accessed concurrently. As a result, today the major
benefit of StringBuilder over StringBuffer (being non-synchronized) vanishes
as soon as a StringReader is used to access it. This means that "new
StringReader(stringBuffer.toString());" is slower than necessary in two
ways: the toString() copy and the synchronization.
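To make the detour concrete, here is a minimal sketch of it. The class and
method names (ToStringDetourDemo, drain, readViaDetour) are made up for this
illustration; only StringReader, StringBuilder and Reader are actual JDK API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the proposal.
final class ToStringDetourDemo {

    // Drains a Reader into a String using the bulk read(char[], int, int).
    static String drain(Reader reader) throws IOException {
        StringBuilder out = new StringBuilder();
        char[] buf = new char[8];
        int n;
        while ((n = reader.read(buf, 0, buf.length)) != -1) {
            out.append(buf, 0, n);
        }
        return out.toString();
    }

    // Today's detour: toString() duplicates the full content before the
    // first char is read, and every StringReader access is synchronized.
    static String readViaDetour(CharSequence source) {
        try (Reader reader = new StringReader(source.toString())) {
            return drain(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("hello, world");
        System.out.println(readViaDetour(sb));
    }
}
```

The copy made by toString() is proportional to the size of the source, so
the cost grows with the amount of text wrapped.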
In an attempt to improve performance of this rather typical use case, I
would like to contribute a pull request providing the new public class
java.io.CharSequenceReader. My idea is to mostly copy the existing code of
StringReader, but wrap CharSequence instead of String; then strip
synchronization; then add optimized access for the String, StringBuffer and
StringBuilder implementations (in the sense of ::getChars(char[], int, int)
to prevent a char-by-char loop in these cases). The idea is mostly covered
by Apache Commons IO's CharSequenceReader, which nicely serves as a PoC:
https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/input/CharSequenceReader.java.
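The idea could be sketched roughly as follows. This is only an illustration
under my own assumptions, not the code of the planned pull request and not
the Commons IO code: the class name SketchCharSequenceReader is made up, and
mark/reset/skip are left out for brevity. The fast paths use the existing
four-argument getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin)
of String, StringBuilder and StringBuffer.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Objects;

// Hypothetical, unsynchronized Reader over any CharSequence.
final class SketchCharSequenceReader extends Reader {
    private CharSequence seq; // set to null once closed
    private int next;         // index of the next char to return

    SketchCharSequenceReader(CharSequence seq) {
        this.seq = Objects.requireNonNull(seq);
    }

    private void ensureOpen() throws IOException {
        if (seq == null) {
            throw new IOException("Stream closed");
        }
    }

    @Override
    public int read() throws IOException {
        ensureOpen();
        return next < seq.length() ? seq.charAt(next++) : -1;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        ensureOpen();
        if (off < 0 || len < 0 || len > cbuf.length - off) {
            throw new IndexOutOfBoundsException();
        }
        if (len == 0) {
            return 0;
        }
        int remaining = seq.length() - next;
        if (remaining <= 0) {
            return -1; // end of sequence
        }
        int n = Math.min(len, remaining);
        // Bulk copy where the JDK type offers getChars; loop otherwise.
        if (seq instanceof String) {
            ((String) seq).getChars(next, next + n, cbuf, off);
        } else if (seq instanceof StringBuilder) {
            ((StringBuilder) seq).getChars(next, next + n, cbuf, off);
        } else if (seq instanceof StringBuffer) {
            ((StringBuffer) seq).getChars(next, next + n, cbuf, off);
        } else {
            for (int i = 0; i < n; i++) {
                cbuf[off + i] = seq.charAt(next + i);
            }
        }
        next += n;
        return n;
    }

    @Override
    public void close() {
        seq = null;
    }
}
```

With such a class, "new SketchCharSequenceReader(stringBuilder)" would read
the builder's content directly, without the intermediate String.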
Alternatives:
- Applications could use Apache Commons IO's CharSequenceReader. As it is an
open-source third-party dependency, some authors might not be allowed to use
it, or may not want to carry this additional burden just for the sake of
this single performance improvement. In addition, this library is not very
actively maintained; its Java baseline is still Java 8. There is no
commercial support.
- Applications could write their own Reader implementation. Given the
assumption that this is a rather common use case, this imposes unjustified
additional work for the authors of thousands of applications. It is hard to
justify why there is a StringReader but not a CharSequenceReader.
- Instead of writing a new CharSequenceReader class we could slightly modify
StringReader, so it accepts CharSequences (not only Strings). This does not
remove the synchronization overhead unless we decide to remove the
synchronization from StringReader's implementation, and it would be
confusing / surprising (in the negative sense) that a class named
"StringReader" actually is a "CharSequenceReader".
Options:
- Instead of adding special cases for
"String/StringBuilder/StringBuffer::getChars()", we could add
"getChars(char[], int, int)" as a new default method to the CharSequence
interface, essentially providing a char-by-char loop for all CharSequence
implementations not already having an optimized getChars method. This makes
the implementation of CharSequenceReader simpler, as no "switch over
instanceof" is needed to benefit from optimized "getChars" implementations.
- Once we have CharSequenceReader, we could replace the full implementation
of StringReader with synchronized calls to CharSequenceReader. This would
reduce duplicate code.
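The first option could look roughly like the sketch below. As I cannot add a
method to CharSequence in a standalone example, the hypothetical interface
BulkCharSequence stands in for it; ReversedChars and ArrayChars are made-up
implementations. I assume the four-argument shape that String::getChars
already has; the exact signature is open for discussion.

```java
// Hypothetical stand-in for CharSequence with a default bulk copy.
interface BulkCharSequence extends CharSequence {

    // Default: char-by-char loop, works for every implementation.
    default void getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin) {
        if (srcBegin < 0 || srcBegin > srcEnd || srcEnd > length()
                || dstBegin < 0 || srcEnd - srcBegin > dst.length - dstBegin) {
            throw new IndexOutOfBoundsException();
        }
        for (int i = srcBegin; i < srcEnd; i++) {
            dst[dstBegin++] = charAt(i);
        }
    }
}

// Made-up implementation that simply inherits the default loop.
final class ReversedChars implements BulkCharSequence {
    private final String s;

    ReversedChars(String s) {
        this.s = s;
    }

    @Override
    public int length() {
        return s.length();
    }

    @Override
    public char charAt(int index) {
        return s.charAt(s.length() - 1 - index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return toString().subSequence(start, end);
    }

    @Override
    public String toString() {
        return new StringBuilder(s).reverse().toString();
    }
}

// Made-up implementation that overrides the default with a bulk copy.
final class ArrayChars implements BulkCharSequence {
    private final char[] chars;

    ArrayChars(char[] chars) {
        this.chars = chars.clone();
    }

    @Override
    public int length() {
        return chars.length;
    }

    @Override
    public char charAt(int index) {
        return chars[index];
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return new String(chars, start, end - start);
    }

    @Override
    public void getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin) {
        System.arraycopy(chars, srcBegin, dst, dstBegin, srcEnd - srcBegin);
    }

    @Override
    public String toString() {
        return new String(chars);
    }
}
```

A reader built on top of this would need only a single getChars call in its
bulk read method, regardless of the concrete sequence type.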
Kindly requesting comments.
-Markus Karg