RFR 8193832: Performance of InputStream.readAllBytes() could be improved

Peter Levart peter.levart at gmail.com
Wed Dec 20 12:40:32 UTC 2017


Hi Brian,

I found another improvement. If you are reading from files, there's no
difference. But if you read from, say, socket(s), there may be short reads
(read() returning 0). Your current code bails out of the inner loop when
this happens and consequently does not fill buf to the top, even though
retrying the inner loop could fill it. Here is a variant where the inner
loop guarantees that buf is either filled to the top or the stream is at
EOF:

     public byte[] readAllBytes() throws IOException {
         List<byte[]> bufs = null;
         byte[] result = null;
         byte[] buf = new byte[DEFAULT_BUFFER_SIZE];
         int total = 0;
         int remaining; // # of bytes remaining to fill buf
         do {
             remaining = buf.length;
             int n;
             // read to fill the buf or until EOF. Loop ends when either:
             // - remaining == 0 and there's possibly more to be read from stream; or
             // - remaining > 0 and the stream is at EOF
             while (remaining > 0 &&
                    (n = read(buf, buf.length - remaining, remaining)) >= 0) {
                 remaining -= n;
             }

             int nread = buf.length - remaining;
             if (nread > 0) {
                 if (MAX_BUFFER_SIZE - total < nread) {
                     throw new OutOfMemoryError("Required array size too large");
                 }
                 total += nread;
                 byte[] copy;
                 if (remaining == 0) { // buf is filled to the top
                     copy = buf;
                     buf = new byte[DEFAULT_BUFFER_SIZE];
                 } else {
                     copy = Arrays.copyOf(buf, nread);
                 }
                 if (result == null) {
                     result = copy;
                 } else {
                     if (bufs == null) {
                         bufs = new ArrayList<>(8);
                         bufs.add(result);
                     }
                     bufs.add(copy);
                 }
             }
         } while (remaining == 0); // there may be more bytes in stream

         if (bufs == null) {
             return result == null ? new byte[0] : result;
         }

         result = new byte[total];
         int offset = 0;
         for (byte[] b : bufs) {
             System.arraycopy(b, 0, result, offset, b.length);
             offset += b.length;
         }

         return result;
     }
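As a sketch of why the retry matters (hypothetical class and helper names, not part of the patch): a stream that returns only one byte per read() call simulates short reads, and an inner loop like the one above still fills the buffer to the top:

```java
import java.io.IOException;
import java.io.InputStream;

public class ShortReadDemo {
    /** Hypothetical test stream: at most one byte per read() call,
     *  simulating short reads from a socket. */
    static class OneBytePerRead extends InputStream {
        private final byte[] data;
        private int pos;
        OneBytePerRead(byte[] data) { this.data = data; }
        @Override public int read() {
            return pos < data.length ? (data[pos++] & 0xFF) : -1;
        }
        @Override public int read(byte[] b, int off, int len) {
            if (len == 0) return 0;
            if (pos >= data.length) return -1; // EOF
            b[off] = data[pos++];
            return 1; // short read: one byte at a time
        }
    }

    /** The proposed inner loop: retry until buf is full or EOF.
     *  Returns the number of bytes actually read into buf. */
    static int readFully(OneBytePerRead in, byte[] buf) throws IOException {
        int remaining = buf.length;
        int n;
        while (remaining > 0 &&
               (n = in.read(buf, buf.length - remaining, remaining)) >= 0) {
            remaining -= n;
        }
        return buf.length - remaining;
    }

    public static void main(String[] args) throws IOException {
        byte[] src = new byte[100];
        for (int i = 0; i < src.length; i++) src[i] = (byte) i;
        byte[] buf = new byte[64];
        // Despite one-byte reads, the retry loop fills the whole buffer.
        System.out.println(readFully(new OneBytePerRead(src), buf)); // prints 64
    }
}
```

A loop that bails on n == 0 instead would stop after the first one-byte read.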



Regards, Peter

On 12/20/2017 12:59 PM, Peter Levart wrote:
> Hi Brian,
>
> There's also a possible variation of the copying fragment that replaces
> copying with allocation:
>
>                 byte[] copy;
>                 if (nread == DEFAULT_BUFFER_SIZE) {
>                     copy = buf;
>                     if (n >= 0) {
>                         buf = new byte[DEFAULT_BUFFER_SIZE];
>                     }
>                 } else {
>                     copy = Arrays.copyOf(buf, nread);
>                 }
>
> For big FileInputStream(s), the buf will be fully read (nread ==
> DEFAULT_BUFFER_SIZE) most of the time. So this might be an improvement
> if allocation (involving pre-zeroing) is faster than Arrays.copyOf(),
> which avoids pre-zeroing but involves copying.
>
> Regards, Peter
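To gauge whether allocation beats copying here, a naive timing sketch (hypothetical class, not from the thread, and not a rigorous benchmark; JMH would be the proper tool) could compare the two:

```java
import java.util.Arrays;

public class AllocVsCopy {
    static final int SIZE = 8192;      // matches DEFAULT_BUFFER_SIZE in the patch
    static final int ITERS = 100_000;

    /** Time allocating fresh arrays (the JVM must pre-zero each one). */
    static long timeAlloc() {
        long start = System.nanoTime();
        byte[] sink = null;
        for (int i = 0; i < ITERS; i++) {
            sink = new byte[SIZE];
        }
        if (sink.length != SIZE) throw new AssertionError();
        return System.nanoTime() - start;
    }

    /** Time Arrays.copyOf (no zeroing, but copies SIZE bytes each time). */
    static long timeCopy(byte[] buf) {
        long start = System.nanoTime();
        byte[] sink = null;
        for (int i = 0; i < ITERS; i++) {
            sink = Arrays.copyOf(buf, SIZE);
        }
        if (sink.length != SIZE) throw new AssertionError();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[SIZE];
        System.out.printf("alloc: %d ns, copy: %d ns%n",
                          timeAlloc(), timeCopy(buf));
    }
}
```

The relative results will vary by JVM and GC configuration, which is exactly why the suggestion above should be measured before being adopted.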
>
> On 12/20/2017 12:45 PM, Peter Levart wrote:
>> Hi Brian,
>>
>> On 12/20/2017 12:22 AM, Brian Burkhalter wrote:
>>> On Dec 19, 2017, at 2:36 PM, Brian Burkhalter 
>>> <brian.burkhalter at oracle.com> wrote:
>>>
>>>>> You can also simplify the “for(;;) + break" into a do while loop:
>>>>>
>>>>> do {
>>>>>   int nread = 0;
>>>>>   ...
>>>>> } while (n > 0);
>>>> Good suggestion but I think that this needs to be "while (n >= 0)."
>>> Updated version here:
>>>
>>> http://cr.openjdk.java.net/~bpb/8193832/webrev.02/
>>>
>>> Thanks,
>>>
>>> Brian
>>
>> Looks good. There is one case that could be further optimized. When 
>> result.length <= DEFAULT_BUFFER_SIZE, the allocation of ArrayList 
>> could be avoided. Imagine a use case where lots of small files are 
>> read into byte[] arrays. For example:
>>
>>     public byte[] readAllBytes() throws IOException {
>>         List<byte[]> bufs = null;
>>         byte[] result = null;
>>         byte[] buf = new byte[DEFAULT_BUFFER_SIZE];
>>         int total = 0;
>>         int n;
>>         do {
>>             int nread = 0;
>>
>>             // read to EOF which may read more or less than buffer size
>>             while ((n = read(buf, nread, buf.length - nread)) > 0) {
>>                 nread += n;
>>             }
>>
>>             if (nread > 0) {
>>                 if (MAX_BUFFER_SIZE - total < nread) {
>>                     throw new OutOfMemoryError("Required array size too large");
>>                 }
>>                 total += nread;
>>                 byte[] copy = (n < 0 && nread == DEFAULT_BUFFER_SIZE) ?
>>                     buf : Arrays.copyOf(buf, nread);
>>                 if (result == null) {
>>                     result = copy;
>>                 } else {
>>                     if (bufs == null) {
>>                         bufs = new ArrayList<>(8);
>>                         bufs.add(result);
>>                     }
>>                     bufs.add(copy);
>>                 }
>>             }
>>         } while (n >= 0); // if the last call to read returned -1, then break
>>
>>         if (bufs == null) {
>>             return result == null ? new byte[0] : result;
>>         }
>>
>>         result = new byte[total];
>>         int offset = 0;
>>         for (byte[] b : bufs) {
>>             System.arraycopy(b, 0, result, offset, b.length);
>>             offset += b.length;
>>         }
>>
>>         return result;
>>     }
>>
>>
>> There is a possibility that the JIT already avoids allocating the
>> ArrayList via escape analysis (EA) if all involved ArrayList methods
>> inline, so this potential optimization should be tested first to see
>> if it actually helps the "small file" case.
>>
>> Regards, Peter
>>
>
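The quoted method can be exercised standalone (hypothetical class name, stream taken as a parameter, and with the buffer list created lazily so multiple chunks are retained) to check that both the small-input fast path and the multi-buffer path round-trip correctly:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReadAllBytesSketch {
    static final int DEFAULT_BUFFER_SIZE = 8192;
    static final int MAX_BUFFER_SIZE = Integer.MAX_VALUE - 8;

    static byte[] readAllBytes(InputStream in) throws IOException {
        List<byte[]> bufs = null;
        byte[] result = null;
        byte[] buf = new byte[DEFAULT_BUFFER_SIZE];
        int total = 0;
        int n;
        do {
            int nread = 0;
            // read to EOF which may read more or less than buffer size
            while ((n = in.read(buf, nread, buf.length - nread)) > 0) {
                nread += n;
            }
            if (nread > 0) {
                if (MAX_BUFFER_SIZE - total < nread) {
                    throw new OutOfMemoryError("Required array size too large");
                }
                total += nread;
                byte[] copy = (n < 0 && nread == DEFAULT_BUFFER_SIZE)
                        ? buf : Arrays.copyOf(buf, nread);
                if (result == null) {
                    result = copy;          // small input: no list allocated
                } else {
                    if (bufs == null) {     // lazily create the list once
                        bufs = new ArrayList<>(8);
                        bufs.add(result);
                    }
                    bufs.add(copy);
                }
            }
        } while (n >= 0);

        if (bufs == null) {
            return result == null ? new byte[0] : result;
        }
        result = new byte[total];
        int offset = 0;
        for (byte[] b : bufs) {
            System.arraycopy(b, 0, result, offset, b.length);
            offset += b.length;
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] small = "hello".getBytes();
        byte[] out = readAllBytes(new ByteArrayInputStream(small));
        System.out.println(Arrays.equals(small, out)); // prints true
    }
}
```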


