RFR: JDK-8337974 - Implementing ChannelInputStream::skip using splice on Linux and C-loops elsewhere

Wed Aug 7 09:46:59 UTC 2024

# Targets

The major target of this issue is to reduce the execution time of `ChannelInputStream::skip(long)`. In particular, `skip(n)` should run noticeably faster than `read(new byte[n])` on pipes and sockets in the optimal case, and should not run noticeably slower in the worst case.

A side target of this issue is to provide unit testing for `ChannelInputStream::skip(long)`. In particular, provide unit testing for files, pipes and sockets.

An appreciated benefit of this issue is reduced resource consumption (in the sense of CPU cycles, Java heap, power consumption, CO2 footprint, etc.) of `ChannelInputStream::skip(long)`. However, as it is not a target, this was not actively monitored.

# Non-Targets

It is not a target to improve any other methods of the mentioned or any other class. Such changes should go in separate issues.

It is not a target to add any new *public* APIs. The public API shall be kept unchanged. All changes implied by the current improvement shall be purely *internal* to OpenJDK.

It is not a target to improve other source types besides pipes and sockets.

# Description

What users expect from `InputStream::skip`, besides comfort, is "at least some" measurable benefit over `InputStream::read`. Otherwise skipping instead of reading makes no sense to users.

For files, OpenJDK already applies an effective `seek`-based optimization. For pipes and sockets, benefits have been negligible so far, as `skip` was more or less simply an alias for `read`.

Hence, this issue proposes optimized implementations for `ChannelInputStream::skip` in the pipes and sockets cases.
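
For context, the `seek`-based idea for files can be pictured in C as moving the file offset with `lseek`. The same call fails with `ESPIPE` on pipes, FIFOs and sockets, which is why those sources need a different strategy. This is only a minimal sketch under assumptions, not OpenJDK's code: `skip_by_seek` and `skip_in_file` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Skips up to n bytes on a seekable descriptor by moving the file offset.
 * Returns the number of bytes skipped, or -1 with errno set
 * (ESPIPE for pipes, FIFOs and sockets). Hypothetical helper. */
static long long skip_by_seek(int fd, long long n) {
    off_t cur, end, target;
    assert(n >= 0);
    cur = lseek(fd, 0, SEEK_CUR);
    if (cur < 0) return -1;
    end = lseek(fd, 0, SEEK_END);
    if (end < 0) return -1;
    if (cur >= end) {
        target = cur;                      /* already at or past EOF */
    } else if (n > (long long) (end - cur)) {
        target = end;                      /* clamp to end of file */
    } else {
        target = cur + (off_t) n;
    }
    if (lseek(fd, target, SEEK_SET) < 0) return -1;
    return (long long) (target - cur);
}

/* Usage sketch: skip the first n bytes of a regular file. */
static long long skip_in_file(const char *path, long long n) {
    long long skipped;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    skipped = skip_by_seek(fd, n);
    close(fd);
    return skipped;
}
```

No data is read at all in this variant, which is what makes it cheap for files and unavailable for streams that have no file offset.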

# Implementation

The abstract idea behind this issue is to prevent the transfer of skipped data into the JVM's on-heap memory in the pipe and socket cases. As a Java application obviously is not interested in skipped data, copying it into the Java heap is a waste of both time and heap, and induces (albeit negligible) GC stress without any benefit.

Hence, this pull request changes the implementation of `ChannelInputStream::skip` in the following aspects:
1. On *all* operating systems, for pipe and socket channels, `skip` is implemented in C. While the data *still is* transferred from the source into the OS kernel and from the OS kernel into the JVM's off-heap memory, it is *not* transferred into the JVM's on-heap memory.
2. For *Linux* pipes only, `splice` is used with `/dev/null` as the target. Data is transferred neither from the source into the OS kernel, nor from the OS kernel into the JVM's off-heap or on-heap memory.

For the latter, `/dev/null` is kept open permanently, as dynamically closing and reopening it imposes a considerable performance penalty, while keeping it open imposes only negligible overhead.
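
The two strategies above can be sketched in C roughly as follows. This is a minimal illustration under assumptions, not the code from the pull request: the function names (`skip_by_read`, `skip_by_splice`, `skip_bytes`), the 8 KiB scratch buffer, the per-call `splice` limit, and the fallback from `splice` to the read loop are hypothetical choices, and the sketch covers POSIX systems only.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* glibc declares splice() only when this is defined */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Strategy 1: read into a native scratch buffer and discard the data.
 * The bytes reach user space, but never a Java byte[] on the heap. */
static long long skip_by_read(int fd, long long n) {
    char scratch[8192];                    /* size is an assumption */
    long long skipped = 0;
    assert(n >= 0);
    while (skipped < n) {
        long long left = n - skipped;
        size_t want = left < (long long) sizeof scratch
                    ? (size_t) left : sizeof scratch;
        ssize_t got = read(fd, scratch, want);
        if (got < 0 && errno == EINTR) continue;
        if (got < 0) return skipped > 0 ? skipped : -1;
        if (got == 0) break;               /* end of stream */
        skipped += got;
    }
    return skipped;
}

#ifdef __linux__
/* Opened on first use and then kept open (not thread-safe in this sketch). */
static int dev_null_fd = -1;

/* Strategy 2: let the kernel move pipe contents straight to /dev/null. */
static long long skip_by_splice(int pipe_fd, long long n) {
    long long skipped = 0;
    assert(n >= 0);
    if (dev_null_fd < 0 && (dev_null_fd = open("/dev/null", O_WRONLY)) < 0)
        return -1;
    while (skipped < n) {
        long long left = n - skipped;
        size_t want = left > 0x40000000LL ? (size_t) 0x40000000 : (size_t) left;
        ssize_t moved = splice(pipe_fd, NULL, dev_null_fd, NULL, want, 0);
        if (moved < 0 && errno == EINTR) continue;
        if (moved < 0) return skipped > 0 ? skipped : -1;
        if (moved == 0) break;             /* write end closed, pipe empty */
        skipped += moved;
    }
    return skipped;
}
#endif

/* Tries splice on Linux; falls back to the read loop if splice is not
 * applicable (for example, EINVAL because fd is not a pipe). */
static long long skip_bytes(int fd, long long n) {
#ifdef __linux__
    long long skipped = skip_by_splice(fd, n);
    if (skipped >= 0) return skipped;
#endif
    return skip_by_read(fd, n);
}
```

Both loops in this sketch block until `n` bytes are skipped or the stream ends; an actual `skip` implementation is free to return after skipping fewer bytes.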

Note: The implementation is mostly copied from existing code of the `read` case and of the `transferTo` test suite. I deliberately tried to modify only the essential pieces, so the code stays comparable with `read` and the `transferTo` test suite, and hence stays easily maintainable together with those origins.

# Benchmarking

## Case Selection
Benchmarking was performed for pipes only, and only on Linux (Debian on WSL2) and Windows (W2K Pro), but on the exact same hardware. Linux and Windows are assumed to be not only the most used operating systems, but also to cover a wide range of I/O performance behaviors (Linux is known to be rather fast, Windows is known to be rather slow). On Windows, NIO pipes actually utilize OS sockets in OpenJDK 23, so sockets are effectively benchmarked implicitly by the pipes benchmark.

Benchmarking was performed on the optimized branch and on OpenJDK 23 main branch as a baseline.

## Results
The charts below indicate that on both tested operating systems and on both tested source types, performance should in no case be noticeably slower than the baseline.

There is clear evidence that in many cases the optimization is faster than the baseline, and in a few cases it is even considerably faster.

In particular, pipes on Linux showed up to 17.5 times the baseline throughput, and "pipes" (effectively: sockets) on Windows reached up to 1.54 times the baseline.

Note: X-axis is logarithmic. Candlesticks reflect error range.

![chart](https://github.com/user-attachments/assets/5a8c81c2-cced-465a-84b8-bd4f5382689e)

![chart](https://github.com/user-attachments/assets/21ff33c3-91d7-4acb-8b9f-0a531f0e7d6c)

-------------

Commit messages:
 - Removed trailing whitespace
 - Corrected issue ID: 8337974
 - JDK-8337974 Implementing ChannelInputStream::skip using splice on Linux and C-loops elsewhere

Changes: https://git.openjdk.org/jdk/pull/20489/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20489&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8337974
  Stats: 1177 lines in 20 files changed: 1175 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20489.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20489/head:pull/20489

PR: https://git.openjdk.org/jdk/pull/20489