RFR [8020669] java.nio.file.Files.readAllBytes() does not read any data when Files.size() is 0

roger riggs roger.riggs at oracle.com
Thu Jul 25 13:42:22 UTC 2013


Hi Ivan,

Thank you for your diligence.

1) Should the test use an alternate mechanism to read the file 
(FileInputStream)
and confirm the length of the array?

2) There is an edge case where the file size is between MAX_ARRAY_SIZE
and Integer.MAX_VALUE that should be avoided. The lines at L3022 are 
confusing.
If the max array size is MAX_BUFFER_SIZE then it should be used
also at L3063 in readAllBytes.

L3015-3026: The logic around the 3 nested if's is a bit confusing for 
the actions being taken.
By switching to a long for newCapacity this can be easier to read.

   if (capacity == MAX_INTEGER) {
        throw ...
   }
   long newCapacity = ((long)capacity) << 1;
   newCapacity = Math.max(newCapacity, BUFFER_SIZE);       // at least BUFFER_SIZE
   newCapacity = Math.min(newCapacity, MAX_BUFFER_SIZE);   // at most MAX_BUFFER_SIZE
   capacity = (int) newCapacity;

    buf = Arrays.copyOf(buf, capacity);
   buf[nread++] = (byte)n;
   rem = buf.length - nread;

Thanks, Roger

BTW, I'm not an official reviewer

On 7/24/2013 7:47 PM, Ivan Gerasimov wrote:
> Hello everybody!
>
> Would you please review an updated webrev?
> http://cr.openjdk.java.net/~igerasim/8020669/3/webrev/ 
> <http://cr.openjdk.java.net/%7Eigerasim/8020669/3/webrev/>
>
> Thanks in advance,
> Ivan
>
>
> On 24.07.2013 23:36, Martin Buchholz wrote:
>> Use javadoc style even in private methods.
>> s/Read/Reads/
>> Use @param initialSize
>>       /**
>> +     * Read all the bytes from an input stream. The {@code initialSize}
>> +     * parameter indicates the initial size of the byte[] to allocate.
>> +     */
>> ---
>>
>> This needs to be tested for an ordinary zero-length file.  It looks 
>> like for the zero case
>> +            int newCapacity = capacity << 1;
>> will infloop not actually increasing the buffer.
>>
>> ---
>>
>> You might want to copy this technique from ArrayList et al:
>>
>>
>>     /**
>>      * The maximum size of array to allocate.
>>      * Some VMs reserve some header words in an array.
>>      * Attempts to allocate larger arrays may result in
>>      * OutOfMemoryError: Requested array size exceeds VM limit
>>      */
>>     private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
>>
>>
>>
>>
>> On Tue, Jul 23, 2013 at 5:09 PM, Ivan Gerasimov 
>> <ivan.gerasimov at oracle.com <mailto:ivan.gerasimov at oracle.com>> wrote:
>>
>>     Would you please take a look at the updated webrev?
>>     http://cr.openjdk.java.net/~igerasim/8020669/2/webrev/
>>     <http://cr.openjdk.java.net/%7Eigerasim/8020669/2/webrev/>
>>
>>     readAllBytes() was recently (in b93) changed by Alan Bateman to
>>     fix 8014928.
>>
>>     Here's what I've done:
>>     - reverted readAllBytes() to its implementation prior b93.
>>     - modified it to address both 8020669 and 8014928.
>>
>>     http://bugs.sun.com/view_bug.do?bug_id=8020669
>>     <http://bugs.sun.com/view_bug.do?bug_id=8014928>
>>     http://bugs.sun.com/view_bug.do?bug_id=8014928
>>
>>     Sincerely yours,
>>     Ivan
>>
>>
>>     On 23.07.2013 18:09, David M. Lloyd wrote:
>>
>>         Here's how it should behave:
>>
>>         - allocate 'size' byte byte array
>>         - if size > 0:
>>           - use simple old I/O to read into the array
>>         - do a one-byte read(), if not EOF then expand the array,
>>         using a simple growth pattern like 3/2 (with a special case
>>         for 0), and continue reading until EOF
>>         - if the array matches the size of the file, return the
>>         array, else use copyOf() to shrink it
>>
>>         This way you only ever copy the array size() was wrong.
>>
>>         On 07/23/2013 05:06 AM, Ivan Gerasimov wrote:
>>
>>             Hi Roger!
>>
>>             This is how my implementation behaves:
>>             - allocate 'size' bytes in BAOS
>>             - allocate 8k for temp buffer
>>             - in cycle read 8k or less bytes from input stream and
>>             copy them into BAOS
>>             - if capacity of BAOS isn't sufficient (file had grown),
>>             its buffer will
>>             be reallocated
>>             Thus, 1 reallocation and 1 copying of already read data
>>             on each 8k piece
>>             of additional bytes.
>>
>>             In normal case, i.e. when fc.size() is correct, we have
>>             overhead of 1
>>             allocation and copying 'size' bytes in size/8k iterations.
>>
>>             And this is how current implementation does
>>             - allocate 'size' bytes
>>             - allocate 'size' bytes of native memory for temp buffer
>>             in IOUtil.read()
>>             - read the whole file into temp buffer
>>             - copy the temp buffer back into our buffer
>>
>>             In common when fc.size() is right, we have 1 allocation
>>             and copying
>>             'size' bytes from temp buffer back.
>>
>>             So there is a difference in allocations/copying, but in
>>             my opinion it's
>>             not that significant for this particular task.
>>
>>             Sincerely yours,
>>             Ivan
>>
>>             On 22.07.2013 20:03, roger riggs wrote:
>>
>>                 Hi Ivan,
>>
>>                 I'm concerned about the change in behavior for the
>>                 existing working
>>                 cases.
>>
>>                 How many times are the bytes copied in your proposed
>>                 implementation?
>>                 How many arrays are allocated and discarded?
>>                 Files.copy()  uses an extra array for the copies.
>>
>>                 BAOS should only be used for size == 0; that would
>>                 address the issue
>>                 without changing the current behavior or allocations.
>>
>>                 Roger
>>
>>
>>
>>
>>                 On 7/20/2013 6:15 PM, Ivan Gerasimov wrote:
>>
>>                     Roger, David thanks for suggestions!
>>
>>                     Would you please take a look at an updated webrev?
>>                     http://cr.openjdk.java.net/~igerasim/8020669/1/webrev/
>>                     <http://cr.openjdk.java.net/%7Eigerasim/8020669/1/webrev/>
>>
>>                     - File size is used as an initial size of BAOS's
>>                     buffer.
>>                     - BAOS avoids copying its buffer in
>>                     toByteArray(), if size is correct .
>>
>>                     I don't want to initialize BAOS with a positive
>>                     number if size
>>                     happened to be zero.
>>                     Because zero could indicate that the file is
>>                     really empty.
>>
>>                     Sincerely yours,
>>                     Ivan
>>
>>                     On 19.07.2013 22:30, David M. Lloyd wrote:
>>
>>                         My mistake, we're not talking about strings.
>>                          Still you can subclass
>>                         and determine whether the buffer size guess
>>                         was right, and if so
>>                         return the array as-is (swap in an empty
>>                         array or something as needed).
>>
>>                         On 07/19/2013 01:28 PM, David M. Lloyd wrote:
>>
>>                             It's trivial to subclass
>>                             ByteArrayOutputStream and add a method which
>>                             converts its contents to a string using
>>                             the two protected fields which
>>                             give you all the info you need to do so.
>>                              So no extra copy is needed
>>                             that you aren't already doing.
>>
>>                             On 07/19/2013 01:06 PM, roger riggs wrote:
>>
>>                                 Hi Ivan,
>>
>>                                 I think this change takes too big a
>>                                 hit for the cases where the
>>                                 size is
>>                                 correct.
>>
>>                                 No real file system can be wrong
>>                                 about the size of a file so this
>>                                 is a
>>                                 problem
>>                                 only for special file systems. If the
>>                                 problem is with size
>>                                 reporting zero
>>                                 then maybe using the incremental read
>>                                 only for size == would be a
>>                                 better
>>                                 fix.
>>
>>                                 At least you could pass the size to
>>                                 the constructor for BAOS and
>>                                 avoid
>>                                 the thrashing for every re-size; but
>>                                 still it will allocate and
>>                                 create
>>                                 an extra copy
>>                                 of the every time.
>>
>>                                 $.02, Roger
>>
>>
>>                                 On 7/19/2013 1:15 PM, Ivan Gerasimov
>>                                 wrote:
>>
>>                                     Hello everybody!
>>
>>                                     Would you please review a fix for
>>                                     the problem with
>>                                     j.n.f.Files.readAllBytes() function?
>>                                     The current implementation relies
>>                                     on FileChannel.size() to
>>                                     preallocate
>>                                     a buffer for the whole file's
>>                                     content.
>>                                     However, some special filesystems
>>                                     can report a wrong size.
>>                                     An example is procfs under Linux,
>>                                     which reports many files under
>>                                     /proc
>>                                     to be zero sized.
>>
>>                                     Thus it is proposed not to rely
>>                                     on the size() and instead
>>                                     continuously
>>                                     read until EOF.
>>
>>                                     The downside is reallocation and
>>                                     copying file content between
>>                                     buffers.
>>                                     But taking into account that the
>>                                     doc says: "It is not intended for
>>                                     reading in large files." it
>>                                     should not be a big problem.
>>
>>                                     http://bugs.sun.com/view_bug.do?bug_id=8020669
>>                                     http://cr.openjdk.java.net/~igerasim/8020669/0/webrev/
>>                                     <http://cr.openjdk.java.net/%7Eigerasim/8020669/0/webrev/>
>>
>>                                     The fix is for JDK8. If it is
>>                                     approved, it can be applied to
>>                                     JDK7 as
>>                                     well.
>>
>>                                     Sincerely yours,
>>                                     Ivan Gerasimov
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>




More information about the core-libs-dev mailing list