RFR: JDK-8337974 - ChannelInputStream::skip can use splice (linux)
Alan Bateman
alanb at openjdk.org
Tue Aug 13 11:10:48 UTC 2024
On Wed, 7 Aug 2024 09:36:46 GMT, Markus KARG <duke at openjdk.org> wrote:
> # Targets
>
> The major target of this issue is to reduce execution time of `ChannelInputStream::skip(long)`. In particular, make `skip(n)` run noticeably faster than `read(new byte[n])` on pipes and sockets in the optimal case, but don't run noticeably slower in the worst case.
>
> A side target of this issue is to provide unit testing for `ChannelInputStream::skip(long)`. In particular, provide unit testing for files, pipes and sockets.
>
> An appreciated benefit of this issue is reduced resource consumption (in the sense of CPU cycles, Java Heap, power consumption, CO2 footprint, etc.) of `ChannelInputStream::skip(long)`. However, as it is not a target, this was not actively monitored.
>
>
> # Non-Targets
>
> It is not a target to improve any other methods of the mentioned or any other class. Such changes should go in separate issues.
>
> It is not a target to add any new *public* APIs. The public API shall be kept unchanged. All changes implied by the current improvement shall be purely *internal* to OpenJDK.
>
> It is not a target to improve other source types besides pipes and sockets.
>
>
> # Description
>
> What users expect from `InputStream::skip`, besides comfort, is "at least some" measurable benefit over `InputStream::read`. Otherwise skipping instead of reading makes no sense to users.
>
> For files, OpenJDK already applies an effective `seek`-based optimization. For pipes and sockets, benefits were negligible so far, as `skip` more or less was simply an alias for `read`.
>
> Hence, this issue proposes optimized implementations for `ChannelInputStream::skip` in the pipes and sockets cases.
>
>
> # Implementation
>
> The abstract idea behind this issue is to prevent transmission of skipped data into the JVM's on-heap memory in the pipe and socket cases. As a Java application obviously is not interested in skipped data, copying it into the Java heap is a waste of both time and heap, and induces (albeit negligible) GC stress without any benefit.
>
> Hence, this pull request changes the implementation of `ChannelInputStream::skip` in the following aspects:
> 1. On *all* operating systems, for pipe and socket channels, `skip` is implemented in C. While the data *still is* transferred from the source into the OS kernel and from the OS kernel into the JVM's off-heap memory, it is *not* transferred into the JVM's on-heap memory.
> 2. For *Linux* pipes only, `splice` is used with `/dev/null` as the target. Data is neither transferred from the source into the OS kernel, nor from the OS kernel into n...
I think the Socket and Pipe changes need to be separated and the merits of each discussed separately.
Starting with the changes to improve InputStream.skip when connected to a socket seems okay. There are two implementations: one is the SocketImpl implementation used by the Socket API (not changed here), the other is the implementation returned by SocketChannel::socket, which is changed here. It seems plausible that improving skip will help some use-cases. My initial reaction to touching this area is that it will likely require clarifications to the specification of Socket.getInputStream, e.g. this method doesn't specify how skip behaves when a timeout is set, doesn't specify IllegalBlockingModeException when the channel is non-blocking, and doesn't specify how it behaves when the Thread is interrupted.
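
For readers less familiar with the second code path, here is a minimal sketch of calling skip on the stream obtained via SocketChannel::socket. It is only an illustration: the class name `SkipOnSocketAdaptor`, the helper `skipThenRead`, and the loopback setup are assumptions made for this example and are not part of the PR. It uses a blocking channel with no timeout set, i.e. none of the unspecified cases listed above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

class SkipOnSocketAdaptor {

    // Hypothetical helper: sends payload over a loopback connection, skips up
    // to n bytes on the receiving side, and returns the next byte read
    // (0..255), or -1 if end of stream was reached first.
    static int skipThenRead(byte[] payload, long n) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 lets the OS pick a free ephemeral port.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel sender = SocketChannel.open(server.getLocalAddress());
                 SocketChannel receiver = server.accept()) {

                ByteBuffer buf = ByteBuffer.wrap(payload);
                while (buf.hasRemaining()) {
                    sender.write(buf);
                }
                sender.shutdownOutput(); // receiver sees EOF after the payload

                // The stream from SocketChannel::socket, as opposed to the
                // SocketImpl-based stream of a plain java.net.Socket.
                InputStream in = receiver.socket().getInputStream();

                // skip may skip fewer bytes than requested, so loop.
                long remaining = n;
                while (remaining > 0) {
                    long skipped = in.skip(remaining);
                    if (skipped <= 0) {
                        break; // no progress, e.g. end of stream
                    }
                    remaining -= skipped;
                }
                return in.read();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = {10, 11, 12, 13, 14, 15, 16, 17};
        System.out.println(skipThenRead(payload, 5));
    }
}
```

With the payload used in `main`, skipping 5 bytes leaves the byte at index 5 as the next one to read. The loop around skip reflects that InputStream.skip is allowed to skip fewer bytes than requested.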
-------------
PR Comment: https://git.openjdk.org/jdk/pull/20489#issuecomment-2285981102