RFR [8020669] java.nio.file.Files.readAllBytes() does not read any data when Files.size() is 0
Ivan Gerasimov
ivan.gerasimov at oracle.com
Wed Jul 24 23:47:21 UTC 2013
Hello everybody!
Would you please review an updated webrev?
http://cr.openjdk.java.net/~igerasim/8020669/3/webrev/
<http://cr.openjdk.java.net/%7Eigerasim/8020669/3/webrev/>
Thanks in advance,
Ivan
On 24.07.2013 23:36, Martin Buchholz wrote:
> Use javadoc style even in private methods.
> s/Read/Reads/
> Use @param initialSize
> /**
> + * Read all the bytes from an input stream. The {@code initialSize}
> + * parameter indicates the initial size of the byte[] to allocate.
> + */
> ---
>
> This needs to be tested for an ordinary zero-length file. It looks
> like for the zero case
> + int newCapacity = capacity << 1;
> will infloop not actually increasing the buffer.
>
> ---
>
> You might want to copy this technique from ArrayList et al:
>
>
> /**
> * The maximum size of array to allocate.
> * Some VMs reserve some header words in an array.
> * Attempts to allocate larger arrays may result in
> * OutOfMemoryError: Requested array size exceeds VM limit
> */
> private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
>
>
>
>
> On Tue, Jul 23, 2013 at 5:09 PM, Ivan Gerasimov
> <ivan.gerasimov at oracle.com <mailto:ivan.gerasimov at oracle.com>> wrote:
>
> Would you please take a look at the updated webrev?
> http://cr.openjdk.java.net/~igerasim/8020669/2/webrev/
> <http://cr.openjdk.java.net/%7Eigerasim/8020669/2/webrev/>
>
> readAllBytes() was recently (in b93) changed by Alan Bateman to
> fix 8014928.
>
> Here's what I've done:
> - reverted readAllBytes() to its implementation prior b93.
> - modified it to address both 8020669 and 8014928.
>
> http://bugs.sun.com/view_bug.do?bug_id=8020669
> <http://bugs.sun.com/view_bug.do?bug_id=8014928>
> http://bugs.sun.com/view_bug.do?bug_id=8014928
>
> Sincerely yours,
> Ivan
>
>
> On 23.07.2013 18:09, David M. Lloyd wrote:
>
> Here's how it should behave:
>
> - allocate 'size' byte byte array
> - if size > 0:
> - use simple old I/O to read into the array
> - do a one-byte read(), if not EOF then expand the array,
> using a simple growth pattern like 3/2 (with a special case
> for 0), and continue reading until EOF
> - if the array matches the size of the file, return the array,
> else use copyOf() to shrink it
>
> This way you only ever copy the array size() was wrong.
>
> On 07/23/2013 05:06 AM, Ivan Gerasimov wrote:
>
> Hi Roger!
>
> This is how my implementation behaves:
> - allocate 'size' bytes in BAOS
> - allocate 8k for temp buffer
> - in cycle read 8k or less bytes from input stream and
> copy them into BAOS
> - if capacity of BAOS isn't sufficient (file had grown),
> its buffer will
> be reallocated
> Thus, 1 reallocation and 1 copying of already read data on
> each 8k piece
> of additional bytes.
>
> In normal case, i.e. when fc.size() is correct, we have
> overhead of 1
> allocation and copying 'size' bytes in size/8k iterations.
>
> And this is how current implementation does
> - allocate 'size' bytes
> - allocate 'size' bytes of native memory for temp buffer
> in IOUtil.read()
> - read the whole file into temp buffer
> - copy the temp buffer back into our buffer
>
> In common when fc.size() is right, we have 1 allocation
> and copying
> 'size' bytes from temp buffer back.
>
> So there is a difference in allocations/copying, but in my
> opinion it's
> not that significant for this particular task.
>
> Sincerely yours,
> Ivan
>
> On 22.07.2013 20:03, roger riggs wrote:
>
> Hi Ivan,
>
> I'm concerned about the change in behavior for the
> existing working
> cases.
>
> How many times are the bytes copied in your proposed
> implementation?
> How many arrays are allocated and discarded?
> Files.copy() uses an extra array for the copies.
>
> BAOS should only be used for size == 0; that would
> address the issue
> without changing the current behavior or allocations.
>
> Roger
>
>
>
>
> On 7/20/2013 6:15 PM, Ivan Gerasimov wrote:
>
> Roger, David thanks for suggestions!
>
> Would you please take a look at an updated webrev?
> http://cr.openjdk.java.net/~igerasim/8020669/1/webrev/
> <http://cr.openjdk.java.net/%7Eigerasim/8020669/1/webrev/>
>
> - File size is used as an initial size of BAOS's
> buffer.
> - BAOS avoids copying its buffer in toByteArray(),
> if size is correct .
>
> I don't want to initialize BAOS with a positive
> number if size
> happened to be zero.
> Because zero could indicate that the file is
> really empty.
>
> Sincerely yours,
> Ivan
>
> On 19.07.2013 22:30, David M. Lloyd wrote:
>
> My mistake, we're not talking about strings.
> Still you can subclass
> and determine whether the buffer size guess
> was right, and if so
> return the array as-is (swap in an empty array
> or something as needed).
>
> On 07/19/2013 01:28 PM, David M. Lloyd wrote:
>
> It's trivial to subclass
> ByteArrayOutputStream and add a method which
> converts its contents to a string using
> the two protected fields which
> give you all the info you need to do so.
> So no extra copy is needed
> that you aren't already doing.
>
> On 07/19/2013 01:06 PM, roger riggs wrote:
>
> Hi Ivan,
>
> I think this change takes too big a
> hit for the cases where the
> size is
> correct.
>
> No real file system can be wrong about
> the size of a file so this
> is a
> problem
> only for special file systems. If the
> problem is with size
> reporting zero
> then maybe using the incremental read
> only for size == would be a
> better
> fix.
>
> At least you could pass the size to
> the constructor for BAOS and
> avoid
> the thrashing for every re-size; but
> still it will allocate and
> create
> an extra copy
> of the every time.
>
> $.02, Roger
>
>
> On 7/19/2013 1:15 PM, Ivan Gerasimov
> wrote:
>
> Hello everybody!
>
> Would you please review a fix for
> the problem with
> j.n.f.Files.readAllBytes() function?
> The current implementation relies
> on FileChannel.size() to
> preallocate
> a buffer for the whole file's content.
> However, some special filesystems
> can report a wrong size.
> An example is procfs under Linux,
> which reports many files under
> /proc
> to be zero sized.
>
> Thus it is proposed not to rely on
> the size() and instead
> continuously
> read until EOF.
>
> The downside is reallocation and
> copying file content between
> buffers.
> But taking into account that the
> doc says: "It is not intended for
> reading in large files." it should
> not be a big problem.
>
> http://bugs.sun.com/view_bug.do?bug_id=8020669
> http://cr.openjdk.java.net/~igerasim/8020669/0/webrev/
> <http://cr.openjdk.java.net/%7Eigerasim/8020669/0/webrev/>
>
> The fix is for JDK8. If it is
> approved, it can be applied to JDK7 as
> well.
>
> Sincerely yours,
> Ivan Gerasimov
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
More information about the core-libs-dev
mailing list