RFR: 8294696 - BufferedInputStream.transferTo should drain buffer when mark set [v5]
Markus KARG
duke at openjdk.org
Mon Oct 31 08:30:23 UTC 2022
On Sat, 29 Oct 2022 12:31:41 GMT, Markus KARG <duke at openjdk.org> wrote:
>> This PR implements JDK-8294696.
>
> Markus KARG has updated the pull request incrementally with one additional commit since the last revision:
>
> No immediate buffer resize; in the rare case of read-past-EOF automatic buffer grow will happen anyways
**TL;DR:** Your optimization is now contained in this PR and outperforms the previous code. Nevertheless I plead for Alternative C (double buffering) instead, which is even faster but much simpler.
**Justification**
I took some time and did intensive JMH testing (with two buffer sizes and with four percental start positions within the buffer to simulate non-pre-read, slightly-pre-read, mostly-pre-read and fully-pre-read buffer situations) with three variants: The current code with early buffer resize (which you say is not necessary), the current code without early buffer resize (which you propose instead), and the so-called Alternative C (which uses double-buffering instead of buffer-replacing).
* Alternative B: Optimization outperforms unoptimized code
The results are shown below, and as you can see, early buffer resize is only a problem in *some*, but not all, cases. Nevertheless I agree that should be avoided, so I already pushed it as commit https://github.com/openjdk/jdk/pull/10525/commits/9c2f502c06083694deefe6a9d4c8ee4b77a92c68 and hope that you are fine with this latest piece of code.
* Alternative C: Outperforms Alternative B (optimized *and* unoptimized code) unless buffer is huge
Interesting to see BTW is that Alternative C (double-buffering instead of buffer-replacing) even outperforms that optimized code for the default buffer size (8K), and is only slightly slower in the case of large (1M) buffers (apparently `System.arraycopy` is faster than `Arrays.fill` unless the array is huge, which I did not expect originally). As the code is super-simple compared to the current code, and as the originally assumed risk of OOME does not exists (if it would, the `fill()` method would fail already, as it uses double-buffering, too, and is called by the `read` happening *before* `transferTo`) I would propose that we go with this code instead of the optimized Alternative B:
private long implTransferTo(OutputStream out) throws IOException {
if (getClass() == BufferedInputStream.class && markpos == -1) {
int avail = count - pos;
if (avail > 0) {
byte[] buffer = new byte[avail]; // Double buffering prevents poisoning and leaking
System.arraycopy(getBufIfOpen(), pos, buffer, 0, avail);
out.write(buffer);
pos = count;
}
return avail + getInIfOpen().transferTo(out);
} else {
return super.transferTo(out);
}
}
Here are the numbers in full length for your reference:
Alternative B optimized (no early buffer resize)
Benchmark (bufSize) (posPercent) Mode Cnt Score Error Units
BufferedInputStreamTransferToPerformance.transferTo 8192 0 thrpt 25 4289,949 ┬▒ 217,452 ops/s
BufferedInputStreamTransferToPerformance.transferTo 8192 10 thrpt 25 3823,443 ┬▒ 314,171 ops/s
BufferedInputStreamTransferToPerformance.transferTo 8192 90 thrpt 25 4122,949 ┬▒ 52,131 ops/s
BufferedInputStreamTransferToPerformance.transferTo 8192 100 thrpt 25 4675,491 ┬▒ 46,194 ops/s
BufferedInputStreamTransferToPerformance.transferTo 1048576 0 thrpt 25 3477,497 ┬▒ 111,229 ops/s
BufferedInputStreamTransferToPerformance.transferTo 1048576 10 thrpt 25 4767,220 ┬▒ 18,274 ops/s
BufferedInputStreamTransferToPerformance.transferTo 1048576 90 thrpt 25 43900,923 ┬▒ 799,471 ops/s
BufferedInputStreamTransferToPerformance.transferTo 1048576 100 thrpt 25 1680163,414 ┬▒ 29823,617 ops/s
Alternative B original (with early buffer resize)
Benchmark (bufSize) (posPercent) Mode Cnt Score Error Units
BufferedInputStreamTransferToPerformance.transferTo 8192 0 thrpt 25 4346,835 ┬▒ 273,140 ops/s
BufferedInputStreamTransferToPerformance.transferTo 8192 10 thrpt 25 3816,744 ┬▒ 220,669 ops/s
BufferedInputStreamTransferToPerformance.transferTo 8192 90 thrpt 25 3624,021 ┬▒ 340,279 ops/s
BufferedInputStreamTransferToPerformance.transferTo 8192 100 thrpt 25 4484,060 ┬▒ 95,201 ops/s
BufferedInputStreamTransferToPerformance.transferTo 1048576 0 thrpt 25 3507,334 ┬▒ 140,604 ops/s
BufferedInputStreamTransferToPerformance.transferTo 1048576 10 thrpt 25 4800,050 ┬▒ 44,987 ops/s
BufferedInputStreamTransferToPerformance.transferTo 1048576 90 thrpt 25 44327,841 ┬▒ 256,322 ops/s
BufferedInputStreamTransferToPerformance.transferTo 1048576 100 thrpt 25 1699685,163 ┬▒ 44257,421 ops/s
Alternative C: Double buffering
Benchmark (bufSize) (posPercent) Mode Cnt Score Error Units
BufferedInputStreamTransferToPerformance.transferTo 8192 0 thrpt 25 4555,523 ┬▒ 83,370 ops/s
BufferedInputStreamTransferToPerformance.transferTo 8192 10 thrpt 25 3972,562 ┬▒ 489,380 ops/s
BufferedInputStreamTransferToPerformance.transferTo 8192 90 thrpt 25 4181,716 ┬▒ 49,527 ops/s
BufferedInputStreamTransferToPerformance.transferTo 8192 100 thrpt 25 4602,409 ┬▒ 125,641 ops/s
BufferedInputStreamTransferToPerformance.transferTo 1048576 0 thrpt 25 3373,734 ┬▒ 161,989 ops/s
BufferedInputStreamTransferToPerformance.transferTo 1048576 10 thrpt 25 4684,583 ┬▒ 87,257 ops/s
BufferedInputStreamTransferToPerformance.transferTo 1048576 90 thrpt 25 44082,232 ┬▒ 364,210 ops/s
BufferedInputStreamTransferToPerformance.transferTo 1048576 100 thrpt 25 1690855,778 ┬▒ 53885,439 ops/s
-------------
PR: https://git.openjdk.org/jdk/pull/10525
More information about the core-libs-dev
mailing list