RFR [8020669] java.nio.file.Files.readAllBytes() does not read any data when Files.size() is 0

Wed Jul 24 21:23:08 UTC 2013

Yes, you're right, I read this wrong.  I agree with Martin then that the 
nullary read method is a better choice.

On 07/24/2013 03:49 PM, roger riggs wrote:
> It looks like under the covers the ChannelInputStream uses a 1-byte buffer
> but caches it and re-uses it.  That's slightly better than allocating a
> new one every time.
>
> Also, I don't see much difference between the original use (before the
> Alan's previous change)
> of Files.newInputStream(path) vs the compound try in readAllBytes().
> That might also be reverted to its previous state as well.
>
> Roger
>
>
>
> On 7/24/2013 4:40 PM, David M. Lloyd wrote:
>> I suspect the reason this was done is because the stream is actually a
>> ChannelInputStream.  If you don't do the 1-byte array thing, the
>> ChannelInputStream will construct an entire ByteBuffer to read into,
>> which is even worse.  As it is, it'll create a wrapper ByteBuffer to
>> encapsulate the destination array, which is bad enough.
>>
>> That's why I was saying that it would be nice if this could get a
>> FileInputStream, which afaik can read directly without messing around
>> with buffers.
>>
>> On 07/24/2013 02:24 PM, Martin Buchholz wrote:
>>> It's wasteful to create a 1-byte array to read into.  Just use the
>>> nullary read method.
>>> http://download.java.net/jdk8/docs/api/java/io/InputStream.html#read()
>>>
>>>
>>> On Tue, Jul 23, 2013 at 5:09 PM, Ivan Gerasimov
>>> <ivan.gerasimov at oracle.com <mailto:ivan.gerasimov at oracle.com>> wrote:
>>>
>>>     Would you please take a look at the updated webrev?
>>>     http://cr.openjdk.java.net/~__igerasim/8020669/2/webrev/
>>> <http://cr.openjdk.java.net/~igerasim/8020669/2/webrev/>
>>>
>>>     readAllBytes() was recently (in b93) changed by Alan Bateman to fix
>>>     8014928.
>>>
>>>     Here's what I've done:
>>>     - reverted readAllBytes() to its implementation prior b93.
>>>     - modified it to address both 8020669 and 8014928.
>>>
>>>     http://bugs.sun.com/view_bug.__do?bug_id=8020669
>>>     <http://bugs.sun.com/view_bug.do?bug_id=8020669>
>>>     <http://bugs.sun.com/view_bug.__do?bug_id=8014928
>>>     <http://bugs.sun.com/view_bug.do?bug_id=8014928>>
>>>     http://bugs.sun.com/view_bug.__do?bug_id=8014928
>>>     <http://bugs.sun.com/view_bug.do?bug_id=8014928>
>>>
>>>     Sincerely yours,
>>>     Ivan
>>>
>>>
>>>     On 23.07.2013 18:09, David M. Lloyd wrote:
>>>
>>>         Here's how it should behave:
>>>
>>>         - allocate 'size' byte byte array
>>>         - if size > 0:
>>>            - use simple old I/O to read into the array
>>>         - do a one-byte read(), if not EOF then expand the array, using
>>>         a simple growth pattern like 3/2 (with a special case for 0),
>>>         and continue reading until EOF
>>>         - if the array matches the size of the file, return the array,
>>>         else use copyOf() to shrink it
>>>
>>>         This way you only ever copy the array size() was wrong.
>>>
>>>         On 07/23/2013 05:06 AM, Ivan Gerasimov wrote:
>>>
>>>             Hi Roger!
>>>
>>>             This is how my implementation behaves:
>>>             - allocate 'size' bytes in BAOS
>>>             - allocate 8k for temp buffer
>>>             - in cycle read 8k or less bytes from input stream and copy
>>>             them into BAOS
>>>             - if capacity of BAOS isn't sufficient (file had grown), its
>>>             buffer will
>>>             be reallocated
>>>             Thus, 1 reallocation and 1 copying of already read data on
>>>             each 8k piece
>>>             of additional bytes.
>>>
>>>             In normal case, i.e. when fc.size() is correct, we have
>>>             overhead of 1
>>>             allocation and copying 'size' bytes in size/8k iterations.
>>>
>>>             And this is how current implementation does
>>>             - allocate 'size' bytes
>>>             - allocate 'size' bytes of native memory for temp buffer in
>>>             IOUtil.read()
>>>             - read the whole file into temp buffer
>>>             - copy the temp buffer back into our buffer
>>>
>>>             In common when fc.size() is right, we have 1 allocation and
>>>             copying
>>>             'size' bytes from temp buffer back.
>>>
>>>             So there is a difference in allocations/copying, but in my
>>>             opinion it's
>>>             not that significant for this particular task.
>>>
>>>             Sincerely yours,
>>>             Ivan
>>>
>>>             On 22.07.2013 20:03, roger riggs wrote:
>>>
>>>                 Hi Ivan,
>>>
>>>                 I'm concerned about the change in behavior for the
>>>                 existing working
>>>                 cases.
>>>
>>>                 How many times are the bytes copied in your proposed
>>>                 implementation?
>>>                 How many arrays are allocated and discarded?
>>>                 Files.copy()  uses an extra array for the copies.
>>>
>>>                 BAOS should only be used for size == 0; that would
>>>                 address the issue
>>>                 without changing the current behavior or allocations.
>>>
>>>                 Roger
>>>
>>>
>>>
>>>
>>>                 On 7/20/2013 6:15 PM, Ivan Gerasimov wrote:
>>>
>>>                     Roger, David thanks for suggestions!
>>>
>>>                     Would you please take a look at an updated webrev?
>>> http://cr.openjdk.java.net/~__igerasim/8020669/1/webrev/
>>> <http://cr.openjdk.java.net/~igerasim/8020669/1/webrev/>
>>>
>>>                     - File size is used as an initial size of BAOS's
>>> buffer.
>>>                     - BAOS avoids copying its buffer in toByteArray(),
>>>                     if size is correct .
>>>
>>>                     I don't want to initialize BAOS with a positive
>>>                     number if size
>>>                     happened to be zero.
>>>                     Because zero could indicate that the file is really
>>>                     empty.
>>>
>>>                     Sincerely yours,
>>>                     Ivan
>>>
>>>                     On 19.07.2013 22:30, David M. Lloyd wrote:
>>>
>>>                         My mistake, we're not talking about strings.
>>>                           Still you can subclass
>>>                         and determine whether the buffer size guess was
>>>                         right, and if so
>>>                         return the array as-is (swap in an empty array
>>>                         or something as needed).
>>>
>>>                         On 07/19/2013 01:28 PM, David M. Lloyd wrote:
>>>
>>>                             It's trivial to subclass
>>>                             ByteArrayOutputStream and add a method which
>>>                             converts its contents to a string using the
>>>                             two protected fields which
>>>                             give you all the info you need to do so.  So
>>>                             no extra copy is needed
>>>                             that you aren't already doing.
>>>
>>>                             On 07/19/2013 01:06 PM, roger riggs wrote:
>>>
>>>                                 Hi Ivan,
>>>
>>>                                 I think this change takes too big a hit
>>>                                 for the cases where the
>>>                                 size is
>>>                                 correct.
>>>
>>>                                 No real file system can be wrong about
>>>                                 the size of a file so this
>>>                                 is a
>>>                                 problem
>>>                                 only for special file systems. If the
>>>                                 problem is with size
>>>                                 reporting zero
>>>                                 then maybe using the incremental read
>>>                                 only for size == would be a
>>>                                 better
>>>                                 fix.
>>>
>>>                                 At least you could pass the size to the
>>>                                 constructor for BAOS and
>>>                                 avoid
>>>                                 the thrashing for every re-size; but
>>>                                 still it will allocate and
>>>                                 create
>>>                                 an extra copy
>>>                                 of the every time.
>>>
>>>                                 $.02, Roger
>>>
>>>
>>>                                 On 7/19/2013 1:15 PM, Ivan Gerasimov
>>> wrote:
>>>
>>>                                     Hello everybody!
>>>
>>>                                     Would you please review a fix for
>>>                                     the problem with
>>>                                     j.n.f.Files.readAllBytes() function?
>>>                                     The current implementation relies on
>>>                                     FileChannel.size() to
>>>                                     preallocate
>>>                                     a buffer for the whole file's
>>> content.
>>>                                     However, some special filesystems
>>>                                     can report a wrong size.
>>>                                     An example is procfs under Linux,
>>>                                     which reports many files under
>>>                                     /proc
>>>                                     to be zero sized.
>>>
>>>                                     Thus it is proposed not to rely on
>>>                                     the size() and instead
>>>                                     continuously
>>>                                     read until EOF.
>>>
>>>                                     The downside is reallocation and
>>>                                     copying file content between
>>>                                     buffers.
>>>                                     But taking into account that the doc
>>>                                     says: "It is not intended for
>>>                                     reading in large files." it should
>>>                                     not be a big problem.
>>>
>>> http://bugs.sun.com/view_bug.__do?bug_id=8020669
>>> <http://bugs.sun.com/view_bug.do?bug_id=8020669>
>>> http://cr.openjdk.java.net/~__igerasim/8020669/0/webrev/
>>> <http://cr.openjdk.java.net/~igerasim/8020669/0/webrev/>
>>>
>>>                                     The fix is for JDK8. If it is
>>>                                     approved, it can be applied to
>>> JDK7 as
>>>                                     well.
>>>
>>>                                     Sincerely yours,
>>>                                     Ivan Gerasimov
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>

-- 
- DML