RFR 8193832: Performance of InputStream.readAllBytes() could be improved

Tue Dec 19 19:57:26 UTC 2017

https://bugs.openjdk.java.net/browse/JDK-8193832
http://cr.openjdk.java.net/~bpb/8193832/webrev.00/

The implementation of InputStream.readAllBytes() can be modified to be in general faster and to require less memory. The current implementation starts with an internal buffer of size 8192 bytes and doubles the size of the buffer each time more capacity is required resulting in a geometric progression in the buffer size. At the end if the buffer size is not exactly equal to the total number of bytes read, then another allocation equal to the total number read is performed. The amount of memory N required can be calculated for initial buffer size B and total bytes read L as

	M for L == B*(2^n)
N =
	M + L for L != B*(2^n)

where M = B*(2^(n + 1) - 1) and n = ceil(log2(L/B)).

An alternative implementation is not to increase the size of the internal buffer each time more capacity is needed, but to instead maintain a list of buffers of up to B bytes each and gather those into an output buffer at the end. If there is only a single buffer in the list then gathering is not needed and it can be returned directly. The amount of memory N required in this case is

	B + L for L <= B
N =
	B + 2*L for L > B

A comparison of memory usage for a number of lengths L with a buffer size B of 8192 is:

L     	   N_old      N_new       
8191       16383      16383     
8192       8192       16384     
8193       32769      24578     
16383      40959      40958     
16384      24576      40960     
16385      73729      40962     
32767      90111      73726     
32768      57344      73728     
32769      155649     73730     
65535      188415     139262    
65536      122880     139264    
65537      319489     139266    
131071     385023     270334    
131072     253952     270336    
131073     647169     270338    
262143     778239     532478    
262144     516096     532480    
262145     1302529    532482

In general if the size of the last intermediate buffer in the old version does not equal the total number of bytes read, then the old version will require more memory thaw the new. The performance for the same set of lengths was measured using JMH:

Benchmark                                     (base)  (offset)   Mode  Cnt        Score       Error  Units
BenchReadAllBytesParam.readAllBytes             8192        -1  thrpt    5   578064.253 ± 18955.667  ops/s
BenchReadAllBytesParam.readAllBytesArrayList    8192        -1  thrpt    5   530963.634 ±  4799.923  ops/s
BenchReadAllBytesParam.readAllBytes             8192         0  thrpt    5  1414243.593 ± 68520.245  ops/s
BenchReadAllBytesParam.readAllBytesArrayList    8192         0  thrpt    5   532019.149 ±  7051.738  ops/s
BenchReadAllBytesParam.readAllBytes             8192         1  thrpt    5   293586.365 ±  8196.939  ops/s
BenchReadAllBytesParam.readAllBytesArrayList    8192         1  thrpt    5   361682.265 ± 27081.111  ops/s
BenchReadAllBytesParam.readAllBytes            16384        -1  thrpt    5   216809.517 ±  4853.852  ops/s
BenchReadAllBytesParam.readAllBytesArrayList   16384        -1  thrpt    5   212346.244 ±  2980.916  ops/s
BenchReadAllBytesParam.readAllBytes            16384         0  thrpt    5   379757.470 ± 14524.249  ops/s
BenchReadAllBytesParam.readAllBytesArrayList   16384         0  thrpt    5   218802.256 ±  4361.080  ops/s
BenchReadAllBytesParam.readAllBytes            16384         1  thrpt    5   123570.712 ±  1665.661  ops/s
BenchReadAllBytesParam.readAllBytesArrayList   16384         1  thrpt    5   208801.577 ±  8005.196  ops/s
BenchReadAllBytesParam.readAllBytes            32768        -1  thrpt    5    93083.146 ±  1024.309  ops/s
BenchReadAllBytesParam.readAllBytesArrayList   32768        -1  thrpt    5   104398.304 ±  1310.509  ops/s
BenchReadAllBytesParam.readAllBytes            32768         0  thrpt    5   151104.878 ±  3461.359  ops/s
BenchReadAllBytesParam.readAllBytesArrayList   32768         0  thrpt    5   104849.088 ±  3841.224  ops/s
BenchReadAllBytesParam.readAllBytes            32768         1  thrpt    5    58039.827 ±   398.908  ops/s
BenchReadAllBytesParam.readAllBytesArrayList   32768         1  thrpt    5   104489.685 ±  2118.496  ops/s
BenchReadAllBytesParam.readAllBytes            65536        -1  thrpt    5    43144.695 ±   440.349  ops/s
BenchReadAllBytesParam.readAllBytesArrayList   65536        -1  thrpt    5    55583.338 ±  2115.162  ops/s
BenchReadAllBytesParam.readAllBytes            65536         0  thrpt    5    67216.536 ±  2055.057  ops/s
BenchReadAllBytesParam.readAllBytesArrayList   65536         0  thrpt    5    55680.238 ±  1235.571  ops/s
BenchReadAllBytesParam.readAllBytes            65536         1  thrpt    5    27908.000 ±   568.473  ops/s
BenchReadAllBytesParam.readAllBytesArrayList   65536         1  thrpt    5    55938.779 ±   813.991  ops/s
BenchReadAllBytesParam.readAllBytes           131072        -1  thrpt    5    20299.014 ±   568.616  ops/s
BenchReadAllBytesParam.readAllBytesArrayList  131072        -1  thrpt    5    28115.036 ±   842.392  ops/s
BenchReadAllBytesParam.readAllBytes           131072         0  thrpt    5    31617.475 ±   467.378  ops/s
BenchReadAllBytesParam.readAllBytesArrayList  131072         0  thrpt    5    28274.289 ±   259.699  ops/s
BenchReadAllBytesParam.readAllBytes           131072         1  thrpt    5    13640.406 ±   303.125  ops/s
BenchReadAllBytesParam.readAllBytesArrayList  131072         1  thrpt    5    28221.298 ±   515.030  ops/s
BenchReadAllBytesParam.readAllBytes           262144        -1  thrpt    5     9995.183 ±   249.368  ops/s
BenchReadAllBytesParam.readAllBytesArrayList  262144        -1  thrpt    5    14043.194 ±   138.026  ops/s
BenchReadAllBytesParam.readAllBytes           262144         0  thrpt    5    15309.238 ±   353.752  ops/s
BenchReadAllBytesParam.readAllBytesArrayList  262144         0  thrpt    5    14048.569 ±   348.699  ops/s
BenchReadAllBytesParam.readAllBytes           262144         1  thrpt    5     5940.442 ±   206.855  ops/s
BenchReadAllBytesParam.readAllBytesArrayList  262144         1  thrpt    5    14055.357 ±   412.470  ops/s

In each case the number of bytes read is the sum of the base and offset parameters. Similar behavior with respect to data length is observed for performance as for memory usage: if the data length is not equal to the size of the last intermediate buffer used, then the performance of the old version is in general worse than that of the new.

Thanks,

Brian