RFR 8193832: Performance of InputStream.readAllBytes() could be improved
Paul Sandoz
paul.sandoz at oracle.com
Tue Dec 19 20:52:03 UTC 2017
Hi,
For the case of reading 2^N bytes i believe you can avoid doing a last copy by checking if “n < 0" within the “nread > 0” block when “nread == DEAFULT_BUFFER_SIZE”. That might close the perf gap for smaller cases. You can also move "nread = 0” to the same block e.g.:
var copy = (n < 0 && nread == DEAFULT_BUFFER_SIZE) ? buf : Arrays.copyOf(buf, nread);
list.add(copy)
nread = 0;
262 byte[] output = new byte[total];
263 int offset = 0;
264 int numCached = list.size();
265 for (int i = 0; i < numCached; i++) {
266 byte[] b = list.get(i);
267 System.arraycopy(b, 0, output, offset, b.length);
268 offset += b.length;
269 }
You can simplify to:
var result = new byte[total];
int offset = 0;
for (buf : list) {
System.arraycopy(buf, 0, result, offset, buf.length);
offset += buf.length;
}
s/list/bufs and then you can use var for the declarations at the start of the method.
Paul.
> On 19 Dec 2017, at 11:57, Brian Burkhalter <brian.burkhalter at oracle.com> wrote:
>
> https://bugs.openjdk.java.net/browse/JDK-8193832
> http://cr.openjdk.java.net/~bpb/8193832/webrev.00/
>
> The implementation of InputStream.readAllBytes() can be modified to be in general faster and to require less memory. The current implementation starts with an internal buffer of size 8192 bytes and doubles the size of the buffer each time more capacity is required resulting in a geometric progression in the buffer size. At the end if the buffer size is not exactly equal to the total number of bytes read, then another allocation equal to the total number read is performed. The amount of memory N required can be calculated for initial buffer size B and total bytes read L as
>
> M for L == B*(2^n)
> N =
> M + L for L != B*(2^n)
>
> where M = B*(2^(n + 1) - 1) and n = ceil(log2(L/B)).
>
> An alternative implementation is not to increase the size of the internal buffer each time more capacity is needed, but to instead maintain a list of buffers of up to B bytes each and gather those into an output buffer at the end. If there is only a single buffer in the list then gathering is not needed and it can be returned directly. The amount of memory N required in this case is
>
> B + L for L <= B
> N =
> B + 2*L for L > B
>
> A comparison of memory usage for a number of lengths L with a buffer size B of 8192 is:
>
> L N_old N_new
> 8191 16383 16383
> 8192 8192 16384
> 8193 32769 24578
> 16383 40959 40958
> 16384 24576 40960
> 16385 73729 40962
> 32767 90111 73726
> 32768 57344 73728
> 32769 155649 73730
> 65535 188415 139262
> 65536 122880 139264
> 65537 319489 139266
> 131071 385023 270334
> 131072 253952 270336
> 131073 647169 270338
> 262143 778239 532478
> 262144 516096 532480
> 262145 1302529 532482
>
> In general if the size of the last intermediate buffer in the old version does not equal the total number of bytes read, then the old version will require more memory thaw the new. The performance for the same set of lengths was measured using JMH:
>
> Benchmark (base) (offset) Mode Cnt Score Error Units
> BenchReadAllBytesParam.readAllBytes 8192 -1 thrpt 5 578064.253 ± 18955.667 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 8192 -1 thrpt 5 530963.634 ± 4799.923 ops/s
> BenchReadAllBytesParam.readAllBytes 8192 0 thrpt 5 1414243.593 ± 68520.245 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 8192 0 thrpt 5 532019.149 ± 7051.738 ops/s
> BenchReadAllBytesParam.readAllBytes 8192 1 thrpt 5 293586.365 ± 8196.939 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 8192 1 thrpt 5 361682.265 ± 27081.111 ops/s
> BenchReadAllBytesParam.readAllBytes 16384 -1 thrpt 5 216809.517 ± 4853.852 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 16384 -1 thrpt 5 212346.244 ± 2980.916 ops/s
> BenchReadAllBytesParam.readAllBytes 16384 0 thrpt 5 379757.470 ± 14524.249 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 16384 0 thrpt 5 218802.256 ± 4361.080 ops/s
> BenchReadAllBytesParam.readAllBytes 16384 1 thrpt 5 123570.712 ± 1665.661 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 16384 1 thrpt 5 208801.577 ± 8005.196 ops/s
> BenchReadAllBytesParam.readAllBytes 32768 -1 thrpt 5 93083.146 ± 1024.309 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 32768 -1 thrpt 5 104398.304 ± 1310.509 ops/s
> BenchReadAllBytesParam.readAllBytes 32768 0 thrpt 5 151104.878 ± 3461.359 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 32768 0 thrpt 5 104849.088 ± 3841.224 ops/s
> BenchReadAllBytesParam.readAllBytes 32768 1 thrpt 5 58039.827 ± 398.908 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 32768 1 thrpt 5 104489.685 ± 2118.496 ops/s
> BenchReadAllBytesParam.readAllBytes 65536 -1 thrpt 5 43144.695 ± 440.349 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 65536 -1 thrpt 5 55583.338 ± 2115.162 ops/s
> BenchReadAllBytesParam.readAllBytes 65536 0 thrpt 5 67216.536 ± 2055.057 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 65536 0 thrpt 5 55680.238 ± 1235.571 ops/s
> BenchReadAllBytesParam.readAllBytes 65536 1 thrpt 5 27908.000 ± 568.473 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 65536 1 thrpt 5 55938.779 ± 813.991 ops/s
> BenchReadAllBytesParam.readAllBytes 131072 -1 thrpt 5 20299.014 ± 568.616 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 131072 -1 thrpt 5 28115.036 ± 842.392 ops/s
> BenchReadAllBytesParam.readAllBytes 131072 0 thrpt 5 31617.475 ± 467.378 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 131072 0 thrpt 5 28274.289 ± 259.699 ops/s
> BenchReadAllBytesParam.readAllBytes 131072 1 thrpt 5 13640.406 ± 303.125 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 131072 1 thrpt 5 28221.298 ± 515.030 ops/s
> BenchReadAllBytesParam.readAllBytes 262144 -1 thrpt 5 9995.183 ± 249.368 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 262144 -1 thrpt 5 14043.194 ± 138.026 ops/s
> BenchReadAllBytesParam.readAllBytes 262144 0 thrpt 5 15309.238 ± 353.752 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 262144 0 thrpt 5 14048.569 ± 348.699 ops/s
> BenchReadAllBytesParam.readAllBytes 262144 1 thrpt 5 5940.442 ± 206.855 ops/s
> BenchReadAllBytesParam.readAllBytesArrayList 262144 1 thrpt 5 14055.357 ± 412.470 ops/s
>
> In each case the number of bytes read is the sum of the base and offset parameters. Similar behavior with respect to data length is observed for performance as for memory usage: if the data length is not equal to the size of the last intermediate buffer used, then the performance of the old version is in general worse than that of the new.
>
> Thanks,
>
> Brian
More information about the core-libs-dev
mailing list