RFR: 8343110: Add getChars(int, int, char[], int) to CharSequence and CharBuffer

Markus KARG duke at openjdk.org
Sat Mar 22 17:51:08 UTC 2025


On Sat, 26 Oct 2024 15:48:11 GMT, Markus KARG <duke at openjdk.org> wrote:

> This Pull Request proposes an implementation for [JDK-8343110](https://bugs.openjdk.org/browse/JDK-8343110): Adding the new method `public void getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin)` to the `CharSequence` interface, providing a **bulk-read** facility including a default implementation iterating over `charAt(int)`.
> 
> In addition, this Pull Request proposes to replace the implementation of `Reader.of(CharSequence).read(char[] cbuf, int off, int len)` to invoke `CharSequence.getChars(next, next + n, cbuf, off)` instead of utilizing pattern matching for switch. Also, this PR proposes to implement `CharBuffer.getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin)` as an alias for `CharBuffer.get(srcBegin, dst, dstBegin, srcEnd - srcBegin)`.
> 
> To ensure quality...
> * ...the method signature and JavaDocs are adapted from `AbstractStringBuilder.getChars(...)`.
> * ...this PR relies upon the existing tests for `Reader.of(CharSequence)`, as these provide sufficient coverage of all changes introduced by this PR.

Chen,

I do see your point and like to ask everybody to chime in and comment about the alternatives we have. In the end, we need to hear more people than just the two of us to decide *which* of both alternatives to finally implement:


* Alternative A: `CharSequence::getChars(int, int, char[], int)` - Fast, but as risky as `Reader::read(char[], int, int len)`

This alternative provides best possible performance, but rogue implementations of `CharSequence::getChars(int, int, char[], int)` might access the passed array *beyond* the requested limits. This allows poisoning and spying.

There already are APIs in the JRE which pose that exact risk *since very long time*: `Reader::read(char[], int, int len)`, just as one example, exists since Java 1.x, so there could exist lots of rogue custom `Reader` implementations already. As a consequence, IMHO we can expect Java developers to be very familiar with this kind of risk (and to have set up strategies to deal with it) *for decades*. As those rogue `Reader`s *already could* exist, developers *unaware* of this kind of risk actually are in very big trouble *already*. Hence IMHO de-facto we gain *nothing* if we prevent a well-known risk at point B that exists since 30 years at point A. *Iff* we really want to get rid of that kind of risk, we need to talk about adding array-views/sub-arrays to the Java language (which is illusory *in the next years*) or about removing existing core APIs (which is illusory *absolutely*).

Developers will tackle this risk at `CharSequence::getChars` just in the very same way as they did it for 30 years at `Reader::read`: Either they bear the risk, or they go with the code you proposed (intermediate array-copy in untrusted cases). The JRE itself even does that internally, see https://github.com/openjdk/jdk/commit/b0d145097cdc61e4bab19393a125e63aa3bc29b9 for example.


* Alternative B: `CharSequence::subsequence(int, int).getChars(char[], int)` - Risk-free but possibly very slow

This implementation prevents the risk implied by Alternative A, but for the price of an additional (possibly expensive) copy performed *always* (*not just* in the untrusted case). While JIT eventually *might* optimize away this costs for *some* implementations, it might *not* for interpreted code, non-HotSpot JVMs, or I/O-bound implementations. `CharSequence` implementations *are not forced to* hold its data in-memory. In the I/O-bound case, imposing `subsequence` *might* possibly impose *even worse* performance than a loop over `getChar(int)` (assumption in this example: I/O duplication forces I/O writes/flushes, but I/O loop-over-read-char will fill a buffer once and read from that). As this new API proposal *solely* is about improving performance, this actually would be counter-productive.


My personal conclusion is: As the risk exists since 30 years, it will not be a problem for developers. Let's pave the way for performance.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21730#issuecomment-2745384387


More information about the nio-dev mailing list