RFR [8020669] java.nio.file.Files.readAllBytes() does not read any data when Files.size() is 0

Thu Jul 25 23:56:18 UTC 2013

Hi Roger!

On 25.07.2013 17:42, roger riggs wrote:
> Hi Ivan,
>
> Thank you for your diligence.

Thank you for your patience :)

> 1) Should the test use an alternate mechanism to read the file 
> (FileInputStream)
> and confirm the length of the array?

This file can change from read to read. But it should not be empty, and 
that's what we check.
I moved the test from a separate file to BytesAndLines.java.
I've also added testing of how Files.readAllLines() reads from procfs 
files (it already does well.)

> 2) There is an edge case where the file size is between MAX_ARRAY_SIZE
> and Integer.MAX_VALUE that should be avoided.

Yes, you're right.
I rewrote the logic in a simpler way with no overflowing and casting to 
long.

Would you please help review an updated webrev?
http://cr.openjdk.java.net/~igerasim/8020669/4/webrev/

Sincerely yours,
Ivan

> The lines at L3022 are confusing.
> If the max array size is MAX_BUFFER_SIZE then it should be used
> also at L3063 in readAllBytes.
>
> L3015-3026: The logic around the 3 nested if's is a bit confusing for 
> the actions being taken.
> By switching to a long for newCapacity this can be easier to read.
>
>    if (capacity == MAX_INTEGER) {
>         throw ...
>    }
>    long newCapacity = ((long)capacity) << 1;
>    newCapacity = Math.max(newCapacity, BUFFER_SIZE);       // at least BUFFER_SIZE
>    newCapacity = Math.min(newCapacity, MAX_BUFFER_SIZE);   // at most MAX_BUFFER_SIZE
>    capacity = (int) newCapacity;
>
>     buf = Arrays.copyOf(buf, capacity);
>    buf[nread++] = (byte)n;
>    rem = buf.length - nread;
> Thanks, Roger
>
> BTW, I'm not an official reviewer
>
> On 7/24/2013 7:47 PM, Ivan Gerasimov wrote:
>> Hello everybody!
>>
>> Would you please review an updated webrev?
>> http://cr.openjdk.java.net/~igerasim/8020669/3/webrev/ 
>> <http://cr.openjdk.java.net/%7Eigerasim/8020669/3/webrev/>
>>
>> Thanks in advance,
>> Ivan
>>
>>
>> On 24.07.2013 23:36, Martin Buchholz wrote:
>>> Use javadoc style even in private methods.
>>> s/Read/Reads/
>>> Use @param initialSize
>>>       /**
>>> +     * Read all the bytes from an input stream. The {@code initialSize}
>>> +     * parameter indicates the initial size of the byte[] to allocate.
>>> +     */
>>> ---
>>>
>>> This needs to be tested for an ordinary zero-length file.  It looks 
>>> like for the zero case
>>> +            int newCapacity = capacity << 1;
>>> will infloop not actually increasing the buffer.
>>>
>>> ---
>>>
>>> You might want to copy this technique from ArrayList et al:
>>>
>>>
>>>     /**
>>>      * The maximum size of array to allocate.
>>>      * Some VMs reserve some header words in an array.
>>>      * Attempts to allocate larger arrays may result in
>>>      * OutOfMemoryError: Requested array size exceeds VM limit
>>>      */
>>>     private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
>>>
>>>
>>>
>>>
>>> On Tue, Jul 23, 2013 at 5:09 PM, Ivan Gerasimov 
>>> <ivan.gerasimov at oracle.com <mailto:ivan.gerasimov at oracle.com>> wrote:
>>>
>>>     Would you please take a look at the updated webrev?
>>>     http://cr.openjdk.java.net/~igerasim/8020669/2/webrev/
>>>     <http://cr.openjdk.java.net/%7Eigerasim/8020669/2/webrev/>
>>>
>>>     readAllBytes() was recently (in b93) changed by Alan Bateman to
>>>     fix 8014928.
>>>
>>>     Here's what I've done:
>>>     - reverted readAllBytes() to its implementation prior b93.
>>>     - modified it to address both 8020669 and 8014928.
>>>
>>>     http://bugs.sun.com/view_bug.do?bug_id=8020669
>>>     <http://bugs.sun.com/view_bug.do?bug_id=8014928>
>>>     http://bugs.sun.com/view_bug.do?bug_id=8014928
>>>
>>>     Sincerely yours,
>>>     Ivan
>>>
>>>
>>>     On 23.07.2013 18:09, David M. Lloyd wrote:
>>>
>>>         Here's how it should behave:
>>>
>>>         - allocate 'size' byte byte array
>>>         - if size > 0:
>>>           - use simple old I/O to read into the array
>>>         - do a one-byte read(), if not EOF then expand the array,
>>>         using a simple growth pattern like 3/2 (with a special case
>>>         for 0), and continue reading until EOF
>>>         - if the array matches the size of the file, return the
>>>         array, else use copyOf() to shrink it
>>>
>>>         This way you only ever copy the array size() was wrong.
>>>
>>>         On 07/23/2013 05:06 AM, Ivan Gerasimov wrote:
>>>
>>>             Hi Roger!
>>>
>>>             This is how my implementation behaves:
>>>             - allocate 'size' bytes in BAOS
>>>             - allocate 8k for temp buffer
>>>             - in cycle read 8k or less bytes from input stream and
>>>             copy them into BAOS
>>>             - if capacity of BAOS isn't sufficient (file had grown),
>>>             its buffer will
>>>             be reallocated
>>>             Thus, 1 reallocation and 1 copying of already read data
>>>             on each 8k piece
>>>             of additional bytes.
>>>
>>>             In normal case, i.e. when fc.size() is correct, we have
>>>             overhead of 1
>>>             allocation and copying 'size' bytes in size/8k iterations.
>>>
>>>             And this is how current implementation does
>>>             - allocate 'size' bytes
>>>             - allocate 'size' bytes of native memory for temp buffer
>>>             in IOUtil.read()
>>>             - read the whole file into temp buffer
>>>             - copy the temp buffer back into our buffer
>>>
>>>             In common when fc.size() is right, we have 1 allocation
>>>             and copying
>>>             'size' bytes from temp buffer back.
>>>
>>>             So there is a difference in allocations/copying, but in
>>>             my opinion it's
>>>             not that significant for this particular task.
>>>
>>>             Sincerely yours,
>>>             Ivan
>>>
>>>             On 22.07.2013 20:03, roger riggs wrote:
>>>
>>>                 Hi Ivan,
>>>
>>>                 I'm concerned about the change in behavior for the
>>>                 existing working
>>>                 cases.
>>>
>>>                 How many times are the bytes copied in your proposed
>>>                 implementation?
>>>                 How many arrays are allocated and discarded?
>>>                 Files.copy()  uses an extra array for the copies.
>>>
>>>                 BAOS should only be used for size == 0; that would
>>>                 address the issue
>>>                 without changing the current behavior or allocations.
>>>
>>>                 Roger
>>>
>>>
>>>
>>>
>>>                 On 7/20/2013 6:15 PM, Ivan Gerasimov wrote:
>>>
>>>                     Roger, David thanks for suggestions!
>>>
>>>                     Would you please take a look at an updated webrev?
>>>                     http://cr.openjdk.java.net/~igerasim/8020669/1/webrev/
>>>                     <http://cr.openjdk.java.net/%7Eigerasim/8020669/1/webrev/>
>>>
>>>                     - File size is used as an initial size of BAOS's
>>>                     buffer.
>>>                     - BAOS avoids copying its buffer in
>>>                     toByteArray(), if size is correct .
>>>
>>>                     I don't want to initialize BAOS with a positive
>>>                     number if size
>>>                     happened to be zero.
>>>                     Because zero could indicate that the file is
>>>                     really empty.
>>>
>>>                     Sincerely yours,
>>>                     Ivan
>>>
>>>                     On 19.07.2013 22:30, David M. Lloyd wrote:
>>>
>>>                         My mistake, we're not talking about strings.
>>>                          Still you can subclass
>>>                         and determine whether the buffer size guess
>>>                         was right, and if so
>>>                         return the array as-is (swap in an empty
>>>                         array or something as needed).
>>>
>>>                         On 07/19/2013 01:28 PM, David M. Lloyd wrote:
>>>
>>>                             It's trivial to subclass
>>>                             ByteArrayOutputStream and add a method which
>>>                             converts its contents to a string using
>>>                             the two protected fields which
>>>                             give you all the info you need to do so.
>>>                              So no extra copy is needed
>>>                             that you aren't already doing.
>>>
>>>                             On 07/19/2013 01:06 PM, roger riggs wrote:
>>>
>>>                                 Hi Ivan,
>>>
>>>                                 I think this change takes too big a
>>>                                 hit for the cases where the
>>>                                 size is
>>>                                 correct.
>>>
>>>                                 No real file system can be wrong
>>>                                 about the size of a file so this
>>>                                 is a
>>>                                 problem
>>>                                 only for special file systems. If
>>>                                 the problem is with size
>>>                                 reporting zero
>>>                                 then maybe using the incremental
>>>                                 read only for size == would be a
>>>                                 better
>>>                                 fix.
>>>
>>>                                 At least you could pass the size to
>>>                                 the constructor for BAOS and
>>>                                 avoid
>>>                                 the thrashing for every re-size; but
>>>                                 still it will allocate and
>>>                                 create
>>>                                 an extra copy
>>>                                 of the every time.
>>>
>>>                                 $.02, Roger
>>>
>>>
>>>                                 On 7/19/2013 1:15 PM, Ivan Gerasimov
>>>                                 wrote:
>>>
>>>                                     Hello everybody!
>>>
>>>                                     Would you please review a fix
>>>                                     for the problem with
>>>                                     j.n.f.Files.readAllBytes() function?
>>>                                     The current implementation
>>>                                     relies on FileChannel.size() to
>>>                                     preallocate
>>>                                     a buffer for the whole file's
>>>                                     content.
>>>                                     However, some special
>>>                                     filesystems can report a wrong size.
>>>                                     An example is procfs under
>>>                                     Linux, which reports many files
>>>                                     under
>>>                                     /proc
>>>                                     to be zero sized.
>>>
>>>                                     Thus it is proposed not to rely
>>>                                     on the size() and instead
>>>                                     continuously
>>>                                     read until EOF.
>>>
>>>                                     The downside is reallocation and
>>>                                     copying file content between
>>>                                     buffers.
>>>                                     But taking into account that the
>>>                                     doc says: "It is not intended for
>>>                                     reading in large files." it
>>>                                     should not be a big problem.
>>>
>>>                                     http://bugs.sun.com/view_bug.do?bug_id=8020669
>>>                                     http://cr.openjdk.java.net/~igerasim/8020669/0/webrev/
>>>                                     <http://cr.openjdk.java.net/%7Eigerasim/8020669/0/webrev/>
>>>
>>>                                     The fix is for JDK8. If it is
>>>                                     approved, it can be applied to
>>>                                     JDK7 as
>>>                                     well.
>>>
>>>                                     Sincerely yours,
>>>                                     Ivan Gerasimov
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>