RFR 8193832: Performance of InputStream.readAllBytes() could be improved
Brian Burkhalter
brian.burkhalter at oracle.com
Tue Dec 19 19:57:26 UTC 2017
https://bugs.openjdk.java.net/browse/JDK-8193832
http://cr.openjdk.java.net/~bpb/8193832/webrev.00/
The implementation of InputStream.readAllBytes() can be modified to be in general faster and to require less memory. The current implementation starts with an internal buffer of size 8192 bytes and doubles the size of the buffer each time more capacity is required resulting in a geometric progression in the buffer size. At the end if the buffer size is not exactly equal to the total number of bytes read, then another allocation equal to the total number read is performed. The amount of memory N required can be calculated for initial buffer size B and total bytes read L as
M for L == B*(2^n)
N =
M + L for L != B*(2^n)
where M = B*(2^(n + 1) - 1) and n = ceil(log2(L/B)).
An alternative implementation is not to increase the size of the internal buffer each time more capacity is needed, but to instead maintain a list of buffers of up to B bytes each and gather those into an output buffer at the end. If there is only a single buffer in the list then gathering is not needed and it can be returned directly. The amount of memory N required in this case is
B + L for L <= B
N =
B + 2*L for L > B
A comparison of memory usage for a number of lengths L with a buffer size B of 8192 is:
L N_old N_new
8191 16383 16383
8192 8192 16384
8193 32769 24578
16383 40959 40958
16384 24576 40960
16385 73729 40962
32767 90111 73726
32768 57344 73728
32769 155649 73730
65535 188415 139262
65536 122880 139264
65537 319489 139266
131071 385023 270334
131072 253952 270336
131073 647169 270338
262143 778239 532478
262144 516096 532480
262145 1302529 532482
In general if the size of the last intermediate buffer in the old version does not equal the total number of bytes read, then the old version will require more memory thaw the new. The performance for the same set of lengths was measured using JMH:
Benchmark (base) (offset) Mode Cnt Score Error Units
BenchReadAllBytesParam.readAllBytes 8192 -1 thrpt 5 578064.253 ± 18955.667 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 8192 -1 thrpt 5 530963.634 ± 4799.923 ops/s
BenchReadAllBytesParam.readAllBytes 8192 0 thrpt 5 1414243.593 ± 68520.245 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 8192 0 thrpt 5 532019.149 ± 7051.738 ops/s
BenchReadAllBytesParam.readAllBytes 8192 1 thrpt 5 293586.365 ± 8196.939 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 8192 1 thrpt 5 361682.265 ± 27081.111 ops/s
BenchReadAllBytesParam.readAllBytes 16384 -1 thrpt 5 216809.517 ± 4853.852 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 16384 -1 thrpt 5 212346.244 ± 2980.916 ops/s
BenchReadAllBytesParam.readAllBytes 16384 0 thrpt 5 379757.470 ± 14524.249 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 16384 0 thrpt 5 218802.256 ± 4361.080 ops/s
BenchReadAllBytesParam.readAllBytes 16384 1 thrpt 5 123570.712 ± 1665.661 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 16384 1 thrpt 5 208801.577 ± 8005.196 ops/s
BenchReadAllBytesParam.readAllBytes 32768 -1 thrpt 5 93083.146 ± 1024.309 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 32768 -1 thrpt 5 104398.304 ± 1310.509 ops/s
BenchReadAllBytesParam.readAllBytes 32768 0 thrpt 5 151104.878 ± 3461.359 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 32768 0 thrpt 5 104849.088 ± 3841.224 ops/s
BenchReadAllBytesParam.readAllBytes 32768 1 thrpt 5 58039.827 ± 398.908 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 32768 1 thrpt 5 104489.685 ± 2118.496 ops/s
BenchReadAllBytesParam.readAllBytes 65536 -1 thrpt 5 43144.695 ± 440.349 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 65536 -1 thrpt 5 55583.338 ± 2115.162 ops/s
BenchReadAllBytesParam.readAllBytes 65536 0 thrpt 5 67216.536 ± 2055.057 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 65536 0 thrpt 5 55680.238 ± 1235.571 ops/s
BenchReadAllBytesParam.readAllBytes 65536 1 thrpt 5 27908.000 ± 568.473 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 65536 1 thrpt 5 55938.779 ± 813.991 ops/s
BenchReadAllBytesParam.readAllBytes 131072 -1 thrpt 5 20299.014 ± 568.616 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 131072 -1 thrpt 5 28115.036 ± 842.392 ops/s
BenchReadAllBytesParam.readAllBytes 131072 0 thrpt 5 31617.475 ± 467.378 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 131072 0 thrpt 5 28274.289 ± 259.699 ops/s
BenchReadAllBytesParam.readAllBytes 131072 1 thrpt 5 13640.406 ± 303.125 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 131072 1 thrpt 5 28221.298 ± 515.030 ops/s
BenchReadAllBytesParam.readAllBytes 262144 -1 thrpt 5 9995.183 ± 249.368 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 262144 -1 thrpt 5 14043.194 ± 138.026 ops/s
BenchReadAllBytesParam.readAllBytes 262144 0 thrpt 5 15309.238 ± 353.752 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 262144 0 thrpt 5 14048.569 ± 348.699 ops/s
BenchReadAllBytesParam.readAllBytes 262144 1 thrpt 5 5940.442 ± 206.855 ops/s
BenchReadAllBytesParam.readAllBytesArrayList 262144 1 thrpt 5 14055.357 ± 412.470 ops/s
In each case the number of bytes read is the sum of the base and offset parameters. Similar behavior with respect to data length is observed for performance as for memory usage: if the data length is not equal to the size of the last intermediate buffer used, then the performance of the old version is in general worse than that of the new.
Thanks,
Brian
More information about the core-libs-dev
mailing list