Request for Comments: Adding bulk-read method "CharSequence.getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin)"

Sat Oct 26 16:06:27 UTC 2024

Hi Markus,
Should we drop the srcBegin/srcEnd parameters, as they can be replaced by a
subSequence(srcBegin, srcEnd) call?
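Here is a minimal sketch of that suggestion, assuming a two-argument bulk read that copies a whole sequence. The helper `getChars(CharSequence, char[], int)` is hypothetical and exists in no JDK class. The caller selects the range with subSequence() instead of passing srcBegin/srcEnd.

```java
public class SubSequenceGetChars {

    // Hypothetical two-argument bulk read: copies the whole sequence into dst at dstBegin.
    // This is NOT an existing JDK method; it only illustrates the suggested signature.
    static void getChars(CharSequence cs, char[] dst, int dstBegin) {
        if (cs instanceof String s) {
            // String already has a bulk method, so delegate to it.
            s.getChars(0, s.length(), dst, dstBegin);
        } else {
            // Fallback for everything else: one charAt() call per character.
            for (int i = 0; i < cs.length(); i++) {
                dst[dstBegin + i] = cs.charAt(i);
            }
        }
    }

    public static void main(String[] args) {
        String text = "Hello, world";
        char[] dst = new char[5];
        // The range is selected with subSequence() instead of srcBegin/srcEnd.
        getChars(text.subSequence(7, 12), dst, 0);
        System.out.println(new String(dst)); // prints "world"
    }
}
```

One trade-off worth weighing: String.subSequence() returns a new String holding a copy of the selected characters, so the range-taking four-argument form can avoid that intermediate copy.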

On Fri, Oct 25, 2024, 12:34 PM Markus Karg <markus at headcrashing.eu> wrote:

> I hereby request for comments on the proposal to generalize the existing
> method "String.getChars()"'s signature to become a new default interface
> method "CharSequence.getChars()".
>
>
>
> Problem
>
>
>
> For performance reasons, many CharSequence implementations, in particular
> String, StringBuilder, StringBuffer and CharBuffer, provide a way to
> bulk-read a complete region of their character content into a provided
> char array.
>
> Unfortunately, there is no _uniform_ way to perform this, and it is not
> guaranteed that _every_ CharSequence, in particular a custom one,
> implements bulk reading at all.
>
> While String, StringBuilder and StringBuffer all share the same getChars()
> method signature for this purpose, CharBuffer's way to perform the very
> same is the get() method.
>
> Other implementations have other method signatures, or do not have any
> solution to this problem at all.
>
> In particular, there is no method in their common interface, CharSequence,
> to perform such a bulk-optimized read, as CharSequence only allows reading
> one character after the other in a sequential way, either by iterating over
> charAt() from 0 to length(), or by consuming the chars() Stream.
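For reference, the two sequential ways of reading that CharSequence offers can be sketched as follows; the class and method names are illustrative only.

```java
import java.util.PrimitiveIterator;

public class SequentialRead {

    // Way 1: one charAt() call per character.
    static char[] viaCharAt(CharSequence cs) {
        char[] out = new char[cs.length()];
        for (int i = 0; i < cs.length(); i++)
            out[i] = cs.charAt(i);
        return out;
    }

    // Way 2: consume the chars() IntStream, narrowing each int back to char.
    static char[] viaChars(CharSequence cs) {
        char[] out = new char[cs.length()];
        PrimitiveIterator.OfInt it = cs.chars().iterator();
        for (int i = 0; it.hasNext(); i++)
            out[i] = (char) it.nextInt();
        return out;
    }

    public static void main(String[] args) {
        CharSequence cs = new StringBuilder("abc");
        System.out.println(new String(viaCharAt(cs)) + " " + new String(viaChars(cs))); // prints "abc abc"
    }
}
```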
>
>
>
> As a result, code that wants to read from a CharSequence in an
> implementation-agnostic but still bulk-optimized way needs to know _each_
> possible implementation's specific method!
>
> Effectively this results in code like this (real-world example taken from
> the implementation of Reader.of(CharSequence) in JDK 24):
>
>
>
> switch (cs) {
>     case String s -> s.getChars(next, next + n, cbuf, off);
>     case StringBuilder sb -> sb.getChars(next, next + n, cbuf, off);
>     case StringBuffer sb -> sb.getChars(next, next + n, cbuf, off);
>     case CharBuffer cb -> cb.get(next, cbuf, off, n);
>     default -> {
>         for (int i = 0; i < n; i++)
>             cbuf[off + i] = cs.charAt(next + i);
>     }
> }
>
>
>
> The problem with this code is that it is bound and limited to exactly that
> given set of CharSequence implementations.
>
> If a future CharSequence implementation is to be accessed in a
> bulk-optimized way, the switch statement has to be extended and
> recompiled _every time_.
>
> If some custom CharSequence implementation is used that this code is not
> aware of, sequential reading is applied, even if that implementation _does_
> provide some bulk-read method!
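A minimal sketch of that last point, assuming a hypothetical custom CharSequence (ArraySequence, with a made-up copyTo() bulk method) and a copy of the quoted dispatch, which needs Java 21+ pattern matching for switch. The custom type falls through to the default branch, so its bulk method is never used.

```java
import java.nio.CharBuffer;

public class SwitchFallbackDemo {

    // Hypothetical custom CharSequence that offers its own bulk-read method (copyTo),
    // which the type switch below cannot know about. It counts charAt() calls.
    static final class ArraySequence implements CharSequence {
        private final char[] data;
        int charAtCalls = 0;

        ArraySequence(String s) { this.data = s.toCharArray(); }

        // Custom bulk read; the name and signature are made up for this example.
        void copyTo(int begin, int end, char[] dst, int dstBegin) {
            System.arraycopy(data, begin, dst, dstBegin, end - begin);
        }

        @Override public int length() { return data.length; }
        @Override public char charAt(int index) { charAtCalls++; return data[index]; }
        @Override public CharSequence subSequence(int start, int end) {
            return new String(data, start, end - start);
        }
        @Override public String toString() { return new String(data); }
    }

    // Same dispatch as the quoted JDK snippet.
    static void read(CharSequence cs, int next, char[] cbuf, int off, int n) {
        switch (cs) {
            case String s -> s.getChars(next, next + n, cbuf, off);
            case StringBuilder sb -> sb.getChars(next, next + n, cbuf, off);
            case StringBuffer sb -> sb.getChars(next, next + n, cbuf, off);
            case CharBuffer cb -> cb.get(next, cbuf, off, n);
            default -> {
                // Unknown type: fall back to one charAt() call per character.
                for (int i = 0; i < n; i++)
                    cbuf[off + i] = cs.charAt(next + i);
            }
        }
    }

    public static void main(String[] args) {
        ArraySequence seq = new ArraySequence("bulk-read");
        char[] cbuf = new char[4];
        read(seq, 0, cbuf, 0, 4);
        // copyTo() is never used: the default branch made one charAt() call per char.
        System.out.println(new String(cbuf) + " " + seq.charAtCalls); // prints "bulk 4"
    }
}
```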
>
>
>
> Solution
>
>
>
> There are several possible alternative solutions:
>
> * (A) CharSequence.getChars(int srcBegin, int srcEnd, char[] dst, int
> dstBegin) - As this signature is already supported by String, StringBuffer
> and StringBuilder, I hereby propose to add this signature to CharSequence
> and provide a default implementation that iterates over charAt(int) from
> srcBegin to srcEnd.
>
> * (B) Alternatively the same default method could be implemented using the
> chars() Stream - I assume that might run slower, but correct me if I am
> wrong.
>
> * (C) Alternatively we could go with the signature get(char[] dst, int
> offset, int length) - Only CharBuffer implements that already, so more
> changes are needed and more duplicate methods will exist in the end.
>
> * (D) Alternatively we could come up with a totally different signature -
> That would be most fair to all existing implementations, but in the end it
> will imply the most changes and the most duplicate methods.
>
> * (E) We could give up the idea and live with the situation as-is. - I
> assume few people would really prefer that outcome.
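Alternative (A) can be sketched outside the JDK with a made-up stand-in interface, BulkCharSequence, since the method is not part of CharSequence as discussed here. The bounds checks mirror those of String.getChars, which is an assumption of this sketch rather than part of the proposal. An array-backed type overrides the default with System.arraycopy, while another type simply inherits the charAt() loop.

```java
import java.util.Objects;

public class GetCharsDefaultSketch {

    // Hypothetical stand-in for CharSequence with alternative (A) added as a default method.
    // The name BulkCharSequence is made up for this sketch.
    interface BulkCharSequence extends CharSequence {
        default void getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin) {
            // Bounds checks similar to String.getChars (an assumption of this sketch).
            Objects.checkFromToIndex(srcBegin, srcEnd, length());
            Objects.checkFromIndexSize(dstBegin, srcEnd - srcBegin, dst.length);
            // Fallback: one charAt() call per character of the requested range.
            for (int i = srcBegin; i < srcEnd; i++)
                dst[dstBegin + i - srcBegin] = charAt(i);
        }
    }

    // An array-backed implementation overrides the default with a real bulk copy.
    record ArraySequence(char[] data) implements BulkCharSequence {
        public int length() { return data.length; }
        public char charAt(int index) { return data[index]; }
        public CharSequence subSequence(int start, int end) {
            return new String(data, start, end - start);
        }
        @Override public String toString() { return new String(data); }
        @Override
        public void getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin) {
            System.arraycopy(data, srcBegin, dst, dstBegin, srcEnd - srcBegin);
        }
    }

    // Another implementation that relies on the inherited charAt() loop.
    record Repeated(char c, int count) implements BulkCharSequence {
        public int length() { return count; }
        public char charAt(int index) { Objects.checkIndex(index, count); return c; }
        public CharSequence subSequence(int start, int end) {
            return String.valueOf(c).repeat(end - start);
        }
        @Override public String toString() { return String.valueOf(c).repeat(count); }
    }

    public static void main(String[] args) {
        char[] dst = new char[6];
        new ArraySequence("abcdef".toCharArray()).getChars(1, 4, dst, 0); // bulk copy of "bcd"
        new Repeated('x', 10).getChars(0, 3, dst, 3);                       // default loop, "xxx"
        System.out.println(new String(dst)); // prints "bcdxxx"
    }
}
```

The point of this design is that callers only ever call getChars(); whether they get a bulk copy or the per-character fallback is decided by the implementation, not by a type switch at the call site.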
>
>
>
> Please tell me if I missed a viable option!
>
>
>
> As a side benefit of CharSequence.getChars(), its existence might encourage
> implementors to provide bulk reading where they have not done so yet, at
> least in those cases where it is actually feasible.
>
> In the same way, it might encourage callers of Reader to start making use of
> bulk reading, at least in those cases where it does make sense but
> application authors were reluctant to implement the switch shown above.
>
>
>
> Hence, if nobody vetoes, I will file a Jira issue, a PR, and a CSR for
> "CharSequence.getChars()" (alternative A) in the next few days.
>
>
>
> -Markus
>