RFR [8020669] java.nio.file.Files.readAllBytes() does not read any data when Files.size() is 0

Wed Jul 24 23:47:21 UTC 2013

Hello everybody!

Would you please review an updated webrev?
http://cr.openjdk.java.net/~igerasim/8020669/3/webrev/ 
<http://cr.openjdk.java.net/%7Eigerasim/8020669/3/webrev/>

Thanks in advance,
Ivan

On 24.07.2013 23:36, Martin Buchholz wrote:
> Use javadoc style even in private methods.
> s/Read/Reads/
> Use @param initialSize
>       /**
> +     * Read all the bytes from an input stream. The {@code initialSize}
> +     * parameter indicates the initial size of the byte[] to allocate.
> +     */
> ---
>
> This needs to be tested for an ordinary zero-length file.  It looks 
> like for the zero case
> +            int newCapacity = capacity << 1;
> will infloop not actually increasing the buffer.
>
> ---
>
> You might want to copy this technique from ArrayList et al:
>
>
>     /**
>      * The maximum size of array to allocate.
>      * Some VMs reserve some header words in an array.
>      * Attempts to allocate larger arrays may result in
>      * OutOfMemoryError: Requested array size exceeds VM limit
>      */
>     private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
>
>
>
>
> On Tue, Jul 23, 2013 at 5:09 PM, Ivan Gerasimov 
> <ivan.gerasimov at oracle.com <mailto:ivan.gerasimov at oracle.com>> wrote:
>
>     Would you please take a look at the updated webrev?
>     http://cr.openjdk.java.net/~igerasim/8020669/2/webrev/
>     <http://cr.openjdk.java.net/%7Eigerasim/8020669/2/webrev/>
>
>     readAllBytes() was recently (in b93) changed by Alan Bateman to
>     fix 8014928.
>
>     Here's what I've done:
>     - reverted readAllBytes() to its implementation prior b93.
>     - modified it to address both 8020669 and 8014928.
>
>     http://bugs.sun.com/view_bug.do?bug_id=8020669
>     <http://bugs.sun.com/view_bug.do?bug_id=8014928>
>     http://bugs.sun.com/view_bug.do?bug_id=8014928
>
>     Sincerely yours,
>     Ivan
>
>
>     On 23.07.2013 18:09, David M. Lloyd wrote:
>
>         Here's how it should behave:
>
>         - allocate 'size' byte byte array
>         - if size > 0:
>           - use simple old I/O to read into the array
>         - do a one-byte read(), if not EOF then expand the array,
>         using a simple growth pattern like 3/2 (with a special case
>         for 0), and continue reading until EOF
>         - if the array matches the size of the file, return the array,
>         else use copyOf() to shrink it
>
>         This way you only ever copy the array size() was wrong.
>
>         On 07/23/2013 05:06 AM, Ivan Gerasimov wrote:
>
>             Hi Roger!
>
>             This is how my implementation behaves:
>             - allocate 'size' bytes in BAOS
>             - allocate 8k for temp buffer
>             - in cycle read 8k or less bytes from input stream and
>             copy them into BAOS
>             - if capacity of BAOS isn't sufficient (file had grown),
>             its buffer will
>             be reallocated
>             Thus, 1 reallocation and 1 copying of already read data on
>             each 8k piece
>             of additional bytes.
>
>             In normal case, i.e. when fc.size() is correct, we have
>             overhead of 1
>             allocation and copying 'size' bytes in size/8k iterations.
>
>             And this is how current implementation does
>             - allocate 'size' bytes
>             - allocate 'size' bytes of native memory for temp buffer
>             in IOUtil.read()
>             - read the whole file into temp buffer
>             - copy the temp buffer back into our buffer
>
>             In common when fc.size() is right, we have 1 allocation
>             and copying
>             'size' bytes from temp buffer back.
>
>             So there is a difference in allocations/copying, but in my
>             opinion it's
>             not that significant for this particular task.
>
>             Sincerely yours,
>             Ivan
>
>             On 22.07.2013 20:03, roger riggs wrote:
>
>                 Hi Ivan,
>
>                 I'm concerned about the change in behavior for the
>                 existing working
>                 cases.
>
>                 How many times are the bytes copied in your proposed
>                 implementation?
>                 How many arrays are allocated and discarded?
>                 Files.copy()  uses an extra array for the copies.
>
>                 BAOS should only be used for size == 0; that would
>                 address the issue
>                 without changing the current behavior or allocations.
>
>                 Roger
>
>
>
>
>                 On 7/20/2013 6:15 PM, Ivan Gerasimov wrote:
>
>                     Roger, David thanks for suggestions!
>
>                     Would you please take a look at an updated webrev?
>                     http://cr.openjdk.java.net/~igerasim/8020669/1/webrev/
>                     <http://cr.openjdk.java.net/%7Eigerasim/8020669/1/webrev/>
>
>                     - File size is used as an initial size of BAOS's
>                     buffer.
>                     - BAOS avoids copying its buffer in toByteArray(),
>                     if size is correct .
>
>                     I don't want to initialize BAOS with a positive
>                     number if size
>                     happened to be zero.
>                     Because zero could indicate that the file is
>                     really empty.
>
>                     Sincerely yours,
>                     Ivan
>
>                     On 19.07.2013 22:30, David M. Lloyd wrote:
>
>                         My mistake, we're not talking about strings.
>                          Still you can subclass
>                         and determine whether the buffer size guess
>                         was right, and if so
>                         return the array as-is (swap in an empty array
>                         or something as needed).
>
>                         On 07/19/2013 01:28 PM, David M. Lloyd wrote:
>
>                             It's trivial to subclass
>                             ByteArrayOutputStream and add a method which
>                             converts its contents to a string using
>                             the two protected fields which
>                             give you all the info you need to do so.
>                              So no extra copy is needed
>                             that you aren't already doing.
>
>                             On 07/19/2013 01:06 PM, roger riggs wrote:
>
>                                 Hi Ivan,
>
>                                 I think this change takes too big a
>                                 hit for the cases where the
>                                 size is
>                                 correct.
>
>                                 No real file system can be wrong about
>                                 the size of a file so this
>                                 is a
>                                 problem
>                                 only for special file systems. If the
>                                 problem is with size
>                                 reporting zero
>                                 then maybe using the incremental read
>                                 only for size == would be a
>                                 better
>                                 fix.
>
>                                 At least you could pass the size to
>                                 the constructor for BAOS and
>                                 avoid
>                                 the thrashing for every re-size; but
>                                 still it will allocate and
>                                 create
>                                 an extra copy
>                                 of the every time.
>
>                                 $.02, Roger
>
>
>                                 On 7/19/2013 1:15 PM, Ivan Gerasimov
>                                 wrote:
>
>                                     Hello everybody!
>
>                                     Would you please review a fix for
>                                     the problem with
>                                     j.n.f.Files.readAllBytes() function?
>                                     The current implementation relies
>                                     on FileChannel.size() to
>                                     preallocate
>                                     a buffer for the whole file's content.
>                                     However, some special filesystems
>                                     can report a wrong size.
>                                     An example is procfs under Linux,
>                                     which reports many files under
>                                     /proc
>                                     to be zero sized.
>
>                                     Thus it is proposed not to rely on
>                                     the size() and instead
>                                     continuously
>                                     read until EOF.
>
>                                     The downside is reallocation and
>                                     copying file content between
>                                     buffers.
>                                     But taking into account that the
>                                     doc says: "It is not intended for
>                                     reading in large files." it should
>                                     not be a big problem.
>
>                                     http://bugs.sun.com/view_bug.do?bug_id=8020669
>                                     http://cr.openjdk.java.net/~igerasim/8020669/0/webrev/
>                                     <http://cr.openjdk.java.net/%7Eigerasim/8020669/0/webrev/>
>
>                                     The fix is for JDK8. If it is
>                                     approved, it can be applied to JDK7 as
>                                     well.
>
>                                     Sincerely yours,
>                                     Ivan Gerasimov
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>