RFR: 8294696 - BufferedInputStream.transferTo should drain buffer when mark set [v5]
Alan Bateman
alanb at openjdk.org
Wed Nov 2 09:25:35 UTC 2022
On Mon, 31 Oct 2022 09:52:06 GMT, Alan Bateman <alanb at openjdk.org> wrote:
>> **TL;DR:** Your optimization is now contained in this PR and outperforms the previous code. Nevertheless I plead for Alternative C (double buffering) instead, which is even faster but much simpler.
>>
>> **Justification**
>>
>> I took some time and did intensive JMH testing (with two buffer sizes and with four percental start positions within the buffer to simulate non-pre-read, slightly-pre-read, mostly-pre-read and fully-pre-read buffer situations) with three variants: The current code with early buffer resize (which you say is not necessary), the current code without early buffer resize (which you propose instead), and the so-called Alternative C (which uses double-buffering instead of buffer-replacing).
>>
>> * Alternative B: Optimization outperforms unoptimized code
>> The results are shown below, and as you can see, early buffer resize is only a problem in *some*, but not all, cases. Nevertheless I agree that should be avoided, so I already pushed it as commit https://github.com/openjdk/jdk/pull/10525/commits/9c2f502c06083694deefe6a9d4c8ee4b77a92c68 and hope that you are fine with this latest piece of code.
>>
>> * Alternative C: Outperforms Alternative B (optimized *and* unoptimized code) unless buffer is huge
>> Interesting to see BTW is that Alternative C (double-buffering instead of buffer-replacing) even outperforms that optimized code for the default buffer size (8K), and is only slightly slower in the case of large (1M) buffers (apparently `System.arraycopy` is faster than `Arrays.fill` unless the array is huge, which I did not expect originally). As the code is super-simple compared to the current code, and as the originally assumed risk of OOME does not exists (if it would, the `fill()` method would fail already, as it uses double-buffering, too, and is called by the `read` happening *before* `transferTo`) I would propose that we go with this code instead of the optimized Alternative B:
>>
>> private long implTransferTo(OutputStream out) throws IOException {
>> if (getClass() == BufferedInputStream.class && markpos == -1) {
>> int avail = count - pos;
>> if (avail > 0) {
>> byte[] buffer = new byte[avail]; // Double buffering prevents poisoning and leaking
>> System.arraycopy(getBufIfOpen(), pos, buffer, 0, avail);
>> out.write(buffer);
>> pos = count;
>> }
>> return avail + getInIfOpen().transferTo(out);
>> } else {
>> return super.transferTo(out);
>> }
>> }
>>
>>
>> Here are the numbers in full length for your reference:
>>
>> Alternative B optimized (no early buffer resize)
>>
>> Benchmark (bufSize) (posPercent) Mode Cnt Score Error Units
>> BufferedInputStreamTransferToPerformance.transferTo 8192 0 thrpt 25 4289,949 ┬▒ 217,452 ops/s
>> BufferedInputStreamTransferToPerformance.transferTo 8192 10 thrpt 25 3823,443 ┬▒ 314,171 ops/s
>> BufferedInputStreamTransferToPerformance.transferTo 8192 90 thrpt 25 4122,949 ┬▒ 52,131 ops/s
>> BufferedInputStreamTransferToPerformance.transferTo 8192 100 thrpt 25 4675,491 ┬▒ 46,194 ops/s
>> BufferedInputStreamTransferToPerformance.transferTo 1048576 0 thrpt 25 3477,497 ┬▒ 111,229 ops/s
>> BufferedInputStreamTransferToPerformance.transferTo 1048576 10 thrpt 25 4767,220 ┬▒ 18,274 ops/s
>> BufferedInputStreamTransferToPerformance.transferTo 1048576 90 thrpt 25 43900,923 ┬▒ 799,471 ops/s
>> BufferedInputStreamTransferToPerformance.transferTo 1048576 100 thrpt 25 1680163,414 ┬▒ 29823,617 ops/s
>>
>> Alternative B original (with early buffer resize)
>>
>> Benchmark (bufSize) (posPercent) Mode Cnt Score Error Units
>> BufferedInputStreamTransferToPerformance.transferTo 8192 0 thrpt 25 4346,835 ┬▒ 273,140 ops/s
>> BufferedInputStreamTransferToPerformance.transferTo 8192 10 thrpt 25 3816,744 ┬▒ 220,669 ops/s
>> BufferedInputStreamTransferToPerformance.transferTo 8192 90 thrpt 25 3624,021 ┬▒ 340,279 ops/s
>> BufferedInputStreamTransferToPerformance.transferTo 8192 100 thrpt 25 4484,060 ┬▒ 95,201 ops/s
>> BufferedInputStreamTransferToPerformance.transferTo 1048576 0 thrpt 25 3507,334 ┬▒ 140,604 ops/s
>> BufferedInputStreamTransferToPerformance.transferTo 1048576 10 thrpt 25 4800,050 ┬▒ 44,987 ops/s
>> BufferedInputStreamTransferToPerformance.transferTo 1048576 90 thrpt 25 44327,841 ┬▒ 256,322 ops/s
>> BufferedInputStreamTransferToPerformance.transferTo 1048576 100 thrpt 25 1699685,163 ┬▒ 44257,421 ops/s
>>
>> Alternative C: Double buffering
>>
>> Benchmark (bufSize) (posPercent) Mode Cnt Score Error Units
>> BufferedInputStreamTransferToPerformance.transferTo 8192 0 thrpt 25 4555,523 ┬▒ 83,370 ops/s
>> BufferedInputStreamTransferToPerformance.transferTo 8192 10 thrpt 25 3972,562 ┬▒ 489,380 ops/s
>> BufferedInputStreamTransferToPerformance.transferTo 8192 90 thrpt 25 4181,716 ┬▒ 49,527 ops/s
>> BufferedInputStreamTransferToPerformance.transferTo 8192 100 thrpt 25 4602,409 ┬▒ 125,641 ops/s
>> BufferedInputStreamTransferToPerformance.transferTo 1048576 0 thrpt 25 3373,734 ┬▒ 161,989 ops/s
>> BufferedInputStreamTransferToPerformance.transferTo 1048576 10 thrpt 25 4684,583 ┬▒ 87,257 ops/s
>> BufferedInputStreamTransferToPerformance.transferTo 1048576 90 thrpt 25 44082,232 ┬▒ 364,210 ops/s
>> BufferedInputStreamTransferToPerformance.transferTo 1048576 100 thrpt 25 1690855,778 ┬▒ 53885,439 ops/s
>>
>> (N.B.: Huge buffers appear to have better numbers than small buffers simply because the test always transported 1 MB of data, so with a huge buffer this happens in few steps, while with small buffers this happens in many steps.)
>
>> **TL;DR:** Your optimization is now contained in this PR and outperforms the previous code. Nevertheless I plead for Alternative C (double buffering) instead, which is even faster but much simpler.
>
> You had dismissed this previously so it's useful to have results now to compare. I agree it's much simpler, it can probably use Arrays.copyOfRange too.
>
> For the results, can you say what the underlying input stream is and what the target output stream is? Also it would be useful to get a baseline. I suspect you have that, it's just not pasted in the table above.
> @AlanBateman Alternative C+ Arrays.copyOfRange() is implemented in 92ed32c. So I think we finally got it? :-)
Yes, I think so, this is a lot simpler, so I think you've got it to a good place.
-------------
PR: https://git.openjdk.org/jdk/pull/10525
More information about the core-libs-dev
mailing list