RFR: 8294696 - BufferedInputStream.transferTo should drain buffer when mark set [v5]

Mon Oct 31 08:30:23 UTC 2022

On Sat, 29 Oct 2022 12:31:41 GMT, Markus KARG <duke at openjdk.org> wrote:

>> This PR implements JDK-8294696.
>
> Markus KARG has updated the pull request incrementally with one additional commit since the last revision:
> 
>   No immediate buffer resize; in the rare case of read-past-EOF automatic buffer grow will happen anyways

**TL;DR:** Your optimization is now contained in this PR and outperforms the previous code. Nevertheless I plead for Alternative C (double buffering) instead, which is even faster but much simpler.

**Justification**

I took some time and did intensive JMH testing (with two buffer sizes and with four percental start positions within the buffer to simulate non-pre-read, slightly-pre-read, mostly-pre-read and fully-pre-read buffer situations) with three variants: The current code with early buffer resize (which you say is not necessary), the current code without early buffer resize (which you propose instead), and the so-called Alternative C (which uses double-buffering instead of buffer-replacing).

* Alternative B: Optimization outperforms unoptimized code
The results are shown below, and as you can see, early buffer resize is only a problem in *some*, but not all, cases. Nevertheless I agree that should be avoided, so I already pushed it as commit https://github.com/openjdk/jdk/pull/10525/commits/9c2f502c06083694deefe6a9d4c8ee4b77a92c68 and hope that you are fine with this latest piece of code.

* Alternative C: Outperforms Alternative B (optimized *and* unoptimized code) unless buffer is huge
Interesting to see BTW is that Alternative C (double-buffering instead of buffer-replacing) even outperforms that optimized code for the default buffer size (8K), and is only slightly slower in the case of large (1M) buffers (apparently `System.arraycopy` is faster than `Arrays.fill` unless the array is huge, which I did not expect originally). As the code is super-simple compared to the current code, and as the originally assumed risk of OOME does not exists (if it would, the `fill()` method would fail already, as it uses double-buffering, too, and is called by the `read` happening *before* `transferTo`) I would propose that we go with this code instead of the optimized Alternative B:

private long implTransferTo(OutputStream out) throws IOException {
    if (getClass() == BufferedInputStream.class && markpos == -1) {
        int avail = count - pos;
        if (avail > 0) {
            byte[] buffer = new byte[avail]; // Double buffering prevents poisoning and leaking
            System.arraycopy(getBufIfOpen(), pos, buffer, 0, avail);
            out.write(buffer);
            pos = count;
        }
        return avail + getInIfOpen().transferTo(out);
    } else {
        return super.transferTo(out);
    }
}

Here are the numbers in full length for your reference:

Alternative B optimized (no early buffer resize)

Benchmark                                            (bufSize)  (posPercent)   Mode  Cnt        Score       Error  Units
BufferedInputStreamTransferToPerformance.transferTo       8192             0  thrpt   25     4289,949 ┬▒   217,452  ops/s
BufferedInputStreamTransferToPerformance.transferTo       8192            10  thrpt   25     3823,443 ┬▒   314,171  ops/s
BufferedInputStreamTransferToPerformance.transferTo       8192            90  thrpt   25     4122,949 ┬▒    52,131  ops/s
BufferedInputStreamTransferToPerformance.transferTo       8192           100  thrpt   25     4675,491 ┬▒    46,194  ops/s
BufferedInputStreamTransferToPerformance.transferTo    1048576             0  thrpt   25     3477,497 ┬▒   111,229  ops/s
BufferedInputStreamTransferToPerformance.transferTo    1048576            10  thrpt   25     4767,220 ┬▒    18,274  ops/s
BufferedInputStreamTransferToPerformance.transferTo    1048576            90  thrpt   25    43900,923 ┬▒   799,471  ops/s
BufferedInputStreamTransferToPerformance.transferTo    1048576           100  thrpt   25  1680163,414 ┬▒ 29823,617  ops/s

Alternative B original (with early buffer resize)

Benchmark                                            (bufSize)  (posPercent)   Mode  Cnt        Score       Error  Units
BufferedInputStreamTransferToPerformance.transferTo       8192             0  thrpt   25     4346,835 ┬▒   273,140  ops/s
BufferedInputStreamTransferToPerformance.transferTo       8192            10  thrpt   25     3816,744 ┬▒   220,669  ops/s
BufferedInputStreamTransferToPerformance.transferTo       8192            90  thrpt   25     3624,021 ┬▒   340,279  ops/s
BufferedInputStreamTransferToPerformance.transferTo       8192           100  thrpt   25     4484,060 ┬▒    95,201  ops/s
BufferedInputStreamTransferToPerformance.transferTo    1048576             0  thrpt   25     3507,334 ┬▒   140,604  ops/s
BufferedInputStreamTransferToPerformance.transferTo    1048576            10  thrpt   25     4800,050 ┬▒    44,987  ops/s
BufferedInputStreamTransferToPerformance.transferTo    1048576            90  thrpt   25    44327,841 ┬▒   256,322  ops/s
BufferedInputStreamTransferToPerformance.transferTo    1048576           100  thrpt   25  1699685,163 ┬▒ 44257,421  ops/s

Alternative C: Double buffering

Benchmark                                            (bufSize)  (posPercent)   Mode  Cnt        Score       Error  Units
BufferedInputStreamTransferToPerformance.transferTo       8192             0  thrpt   25     4555,523 ┬▒    83,370  ops/s
BufferedInputStreamTransferToPerformance.transferTo       8192            10  thrpt   25     3972,562 ┬▒   489,380  ops/s
BufferedInputStreamTransferToPerformance.transferTo       8192            90  thrpt   25     4181,716 ┬▒    49,527  ops/s
BufferedInputStreamTransferToPerformance.transferTo       8192           100  thrpt   25     4602,409 ┬▒   125,641  ops/s
BufferedInputStreamTransferToPerformance.transferTo    1048576             0  thrpt   25     3373,734 ┬▒   161,989  ops/s
BufferedInputStreamTransferToPerformance.transferTo    1048576            10  thrpt   25     4684,583 ┬▒    87,257  ops/s
BufferedInputStreamTransferToPerformance.transferTo    1048576            90  thrpt   25    44082,232 ┬▒   364,210  ops/s
BufferedInputStreamTransferToPerformance.transferTo    1048576           100  thrpt   25  1690855,778 ┬▒ 53885,439  ops/s

-------------

PR: https://git.openjdk.org/jdk/pull/10525