RFR [8020669] java.nio.file.Files.readAllBytes() does not read any data when Files.size() is 0

Wed Jul 24 21:59:31 UTC 2013

Hello Roger!

On 25.07.2013 0:49, roger riggs wrote:
> It looks like under the covers the ChannelInputStream uses a 1-byte 
> buffer
> but caches it and re-uses it.  That's slightly better than allocating 
> a new one every time.

Thanks, I'll change it.

> Also, I don't see much difference between the original use (before the 
> Alan's previous change)
> of Files.newInputStream(path) vs the compound try in readAllBytes().
> That might also be reverted to its previous state as well.

The previous implementation reads the file size before opening it.
And now the file is first opened and then its size is read, so we're 
sure we read size of the correct file.
Basically it is stat() vs fstat(). I think the former is better, because 
the name of file can change between reading size and opening.

Sincerely yours,
Ivan

> Roger
>
>
>
> On 7/24/2013 4:40 PM, David M. Lloyd wrote:
>> I suspect the reason this was done is because the stream is actually 
>> a ChannelInputStream.  If you don't do the 1-byte array thing, the 
>> ChannelInputStream will construct an entire ByteBuffer to read into, 
>> which is even worse.  As it is, it'll create a wrapper ByteBuffer to 
>> encapsulate the destination array, which is bad enough.
>>
>> That's why I was saying that it would be nice if this could get a 
>> FileInputStream, which afaik can read directly without messing around 
>> with buffers.
>>
>> On 07/24/2013 02:24 PM, Martin Buchholz wrote:
>>> It's wasteful to create a 1-byte array to read into.  Just use the
>>> nullary read method.
>>> http://download.java.net/jdk8/docs/api/java/io/InputStream.html#read()
>>>
>>>
>>> On Tue, Jul 23, 2013 at 5:09 PM, Ivan Gerasimov
>>> <ivan.gerasimov at oracle.com <mailto:ivan.gerasimov at oracle.com>> wrote:
>>>
>>>     Would you please take a look at the updated webrev?
>>> http://cr.openjdk.java.net/~__igerasim/8020669/2/webrev/
>>> <http://cr.openjdk.java.net/~igerasim/8020669/2/webrev/>
>>>
>>>     readAllBytes() was recently (in b93) changed by Alan Bateman to fix
>>>     8014928.
>>>
>>>     Here's what I've done:
>>>     - reverted readAllBytes() to its implementation prior b93.
>>>     - modified it to address both 8020669 and 8014928.
>>>
>>> http://bugs.sun.com/view_bug.__do?bug_id=8020669
>>> <http://bugs.sun.com/view_bug.do?bug_id=8020669>
>>>     <http://bugs.sun.com/view_bug.__do?bug_id=8014928
>>> <http://bugs.sun.com/view_bug.do?bug_id=8014928>>
>>> http://bugs.sun.com/view_bug.__do?bug_id=8014928
>>> <http://bugs.sun.com/view_bug.do?bug_id=8014928>
>>>
>>>     Sincerely yours,
>>>     Ivan
>>>
>>>
>>>     On 23.07.2013 18:09, David M. Lloyd wrote:
>>>
>>>         Here's how it should behave:
>>>
>>>         - allocate 'size' byte byte array
>>>         - if size > 0:
>>>            - use simple old I/O to read into the array
>>>         - do a one-byte read(), if not EOF then expand the array, using
>>>         a simple growth pattern like 3/2 (with a special case for 0),
>>>         and continue reading until EOF
>>>         - if the array matches the size of the file, return the array,
>>>         else use copyOf() to shrink it
>>>
>>>         This way you only ever copy the array size() was wrong.
>>>
>>>         On 07/23/2013 05:06 AM, Ivan Gerasimov wrote:
>>>
>>>             Hi Roger!
>>>
>>>             This is how my implementation behaves:
>>>             - allocate 'size' bytes in BAOS
>>>             - allocate 8k for temp buffer
>>>             - in cycle read 8k or less bytes from input stream and copy
>>>             them into BAOS
>>>             - if capacity of BAOS isn't sufficient (file had grown), 
>>> its
>>>             buffer will
>>>             be reallocated
>>>             Thus, 1 reallocation and 1 copying of already read data on
>>>             each 8k piece
>>>             of additional bytes.
>>>
>>>             In normal case, i.e. when fc.size() is correct, we have
>>>             overhead of 1
>>>             allocation and copying 'size' bytes in size/8k iterations.
>>>
>>>             And this is how current implementation does
>>>             - allocate 'size' bytes
>>>             - allocate 'size' bytes of native memory for temp buffer in
>>>             IOUtil.read()
>>>             - read the whole file into temp buffer
>>>             - copy the temp buffer back into our buffer
>>>
>>>             In common when fc.size() is right, we have 1 allocation and
>>>             copying
>>>             'size' bytes from temp buffer back.
>>>
>>>             So there is a difference in allocations/copying, but in my
>>>             opinion it's
>>>             not that significant for this particular task.
>>>
>>>             Sincerely yours,
>>>             Ivan
>>>
>>>             On 22.07.2013 20:03, roger riggs wrote:
>>>
>>>                 Hi Ivan,
>>>
>>>                 I'm concerned about the change in behavior for the
>>>                 existing working
>>>                 cases.
>>>
>>>                 How many times are the bytes copied in your proposed
>>>                 implementation?
>>>                 How many arrays are allocated and discarded?
>>>                 Files.copy()  uses an extra array for the copies.
>>>
>>>                 BAOS should only be used for size == 0; that would
>>>                 address the issue
>>>                 without changing the current behavior or allocations.
>>>
>>>                 Roger
>>>
>>>
>>>
>>>
>>>                 On 7/20/2013 6:15 PM, Ivan Gerasimov wrote:
>>>
>>>                     Roger, David thanks for suggestions!
>>>
>>>                     Would you please take a look at an updated webrev?
>>> http://cr.openjdk.java.net/~__igerasim/8020669/1/webrev/
>>> <http://cr.openjdk.java.net/~igerasim/8020669/1/webrev/>
>>>
>>>                     - File size is used as an initial size of BAOS's 
>>> buffer.
>>>                     - BAOS avoids copying its buffer in toByteArray(),
>>>                     if size is correct .
>>>
>>>                     I don't want to initialize BAOS with a positive
>>>                     number if size
>>>                     happened to be zero.
>>>                     Because zero could indicate that the file is really
>>>                     empty.
>>>
>>>                     Sincerely yours,
>>>                     Ivan
>>>
>>>                     On 19.07.2013 22:30, David M. Lloyd wrote:
>>>
>>>                         My mistake, we're not talking about strings.
>>>                           Still you can subclass
>>>                         and determine whether the buffer size guess was
>>>                         right, and if so
>>>                         return the array as-is (swap in an empty array
>>>                         or something as needed).
>>>
>>>                         On 07/19/2013 01:28 PM, David M. Lloyd wrote:
>>>
>>>                             It's trivial to subclass
>>>                             ByteArrayOutputStream and add a method 
>>> which
>>>                             converts its contents to a string using the
>>>                             two protected fields which
>>>                             give you all the info you need to do 
>>> so.  So
>>>                             no extra copy is needed
>>>                             that you aren't already doing.
>>>
>>>                             On 07/19/2013 01:06 PM, roger riggs wrote:
>>>
>>>                                 Hi Ivan,
>>>
>>>                                 I think this change takes too big a hit
>>>                                 for the cases where the
>>>                                 size is
>>>                                 correct.
>>>
>>>                                 No real file system can be wrong about
>>>                                 the size of a file so this
>>>                                 is a
>>>                                 problem
>>>                                 only for special file systems. If the
>>>                                 problem is with size
>>>                                 reporting zero
>>>                                 then maybe using the incremental read
>>>                                 only for size == would be a
>>>                                 better
>>>                                 fix.
>>>
>>>                                 At least you could pass the size to the
>>>                                 constructor for BAOS and
>>>                                 avoid
>>>                                 the thrashing for every re-size; but
>>>                                 still it will allocate and
>>>                                 create
>>>                                 an extra copy
>>>                                 of the every time.
>>>
>>>                                 $.02, Roger
>>>
>>>
>>>                                 On 7/19/2013 1:15 PM, Ivan Gerasimov 
>>> wrote:
>>>
>>>                                     Hello everybody!
>>>
>>>                                     Would you please review a fix for
>>>                                     the problem with
>>>                                     j.n.f.Files.readAllBytes() 
>>> function?
>>>                                     The current implementation 
>>> relies on
>>>                                     FileChannel.size() to
>>>                                     preallocate
>>>                                     a buffer for the whole file's 
>>> content.
>>>                                     However, some special filesystems
>>>                                     can report a wrong size.
>>>                                     An example is procfs under Linux,
>>>                                     which reports many files under
>>>                                     /proc
>>>                                     to be zero sized.
>>>
>>>                                     Thus it is proposed not to rely on
>>>                                     the size() and instead
>>>                                     continuously
>>>                                     read until EOF.
>>>
>>>                                     The downside is reallocation and
>>>                                     copying file content between
>>>                                     buffers.
>>>                                     But taking into account that the 
>>> doc
>>>                                     says: "It is not intended for
>>>                                     reading in large files." it should
>>>                                     not be a big problem.
>>>
>>> http://bugs.sun.com/view_bug.__do?bug_id=8020669
>>> <http://bugs.sun.com/view_bug.do?bug_id=8020669>
>>> http://cr.openjdk.java.net/~__igerasim/8020669/0/webrev/
>>> <http://cr.openjdk.java.net/~igerasim/8020669/0/webrev/>
>>>
>>>                                     The fix is for JDK8. If it is
>>>                                     approved, it can be applied to 
>>> JDK7 as
>>>                                     well.
>>>
>>>                                     Sincerely yours,
>>>                                     Ivan Gerasimov
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
>