RFR [8020669] java.nio.file.Files.readAllBytes() does not read any data when Files.size() is 0

Tue Jul 23 14:09:30 UTC 2013

Here's how it should behave:

- allocate 'size' byte byte array
- if size > 0:
   - use simple old I/O to read into the array
- do a one-byte read(), if not EOF then expand the array, using a simple 
growth pattern like 3/2 (with a special case for 0), and continue 
reading until EOF
- if the array matches the size of the file, return the array, else use 
copyOf() to shrink it

This way you only ever copy the array size() was wrong.

On 07/23/2013 05:06 AM, Ivan Gerasimov wrote:
> Hi Roger!
>
> This is how my implementation behaves:
> - allocate 'size' bytes in BAOS
> - allocate 8k for temp buffer
> - in cycle read 8k or less bytes from input stream and copy them into BAOS
> - if capacity of BAOS isn't sufficient (file had grown), its buffer will
> be reallocated
> Thus, 1 reallocation and 1 copying of already read data on each 8k piece
> of additional bytes.
>
> In normal case, i.e. when fc.size() is correct, we have overhead of 1
> allocation and copying 'size' bytes in size/8k iterations.
>
> And this is how current implementation does
> - allocate 'size' bytes
> - allocate 'size' bytes of native memory for temp buffer in IOUtil.read()
> - read the whole file into temp buffer
> - copy the temp buffer back into our buffer
>
> In common when fc.size() is right, we have 1 allocation and copying
> 'size' bytes from temp buffer back.
>
> So there is a difference in allocations/copying, but in my opinion it's
> not that significant for this particular task.
>
> Sincerely yours,
> Ivan
>
> On 22.07.2013 20:03, roger riggs wrote:
>> Hi Ivan,
>>
>> I'm concerned about the change in behavior for the existing working
>> cases.
>>
>> How many times are the bytes copied in your proposed implementation?
>> How many arrays are allocated and discarded?
>> Files.copy()  uses an extra array for the copies.
>>
>> BAOS should only be used for size == 0; that would address the issue
>> without changing the current behavior or allocations.
>>
>> Roger
>>
>>
>>
>>
>> On 7/20/2013 6:15 PM, Ivan Gerasimov wrote:
>>> Roger, David thanks for suggestions!
>>>
>>> Would you please take a look at an updated webrev?
>>> http://cr.openjdk.java.net/~igerasim/8020669/1/webrev/
>>>
>>> - File size is used as an initial size of BAOS's buffer.
>>> - BAOS avoids copying its buffer in toByteArray(), if size is correct .
>>>
>>> I don't want to initialize BAOS with a positive number if size
>>> happened to be zero.
>>> Because zero could indicate that the file is really empty.
>>>
>>> Sincerely yours,
>>> Ivan
>>>
>>> On 19.07.2013 22:30, David M. Lloyd wrote:
>>>> My mistake, we're not talking about strings.  Still you can subclass
>>>> and determine whether the buffer size guess was right, and if so
>>>> return the array as-is (swap in an empty array or something as needed).
>>>>
>>>> On 07/19/2013 01:28 PM, David M. Lloyd wrote:
>>>>> It's trivial to subclass ByteArrayOutputStream and add a method which
>>>>> converts its contents to a string using the two protected fields which
>>>>> give you all the info you need to do so.  So no extra copy is needed
>>>>> that you aren't already doing.
>>>>>
>>>>> On 07/19/2013 01:06 PM, roger riggs wrote:
>>>>>> Hi Ivan,
>>>>>>
>>>>>> I think this change takes too big a hit for the cases where the
>>>>>> size is
>>>>>> correct.
>>>>>>
>>>>>> No real file system can be wrong about the size of a file so this
>>>>>> is a
>>>>>> problem
>>>>>> only for special file systems. If the problem is with size
>>>>>> reporting zero
>>>>>> then maybe using the incremental read only for size == would be a
>>>>>> better
>>>>>> fix.
>>>>>>
>>>>>> At least you could pass the size to the constructor for BAOS and
>>>>>> avoid
>>>>>> the thrashing for every re-size; but still it will allocate and
>>>>>> create
>>>>>> an extra copy
>>>>>> of the every time.
>>>>>>
>>>>>> $.02, Roger
>>>>>>
>>>>>>
>>>>>> On 7/19/2013 1:15 PM, Ivan Gerasimov wrote:
>>>>>>> Hello everybody!
>>>>>>>
>>>>>>> Would you please review a fix for the problem with
>>>>>>> j.n.f.Files.readAllBytes() function?
>>>>>>> The current implementation relies on FileChannel.size() to
>>>>>>> preallocate
>>>>>>> a buffer for the whole file's content.
>>>>>>> However, some special filesystems can report a wrong size.
>>>>>>> An example is procfs under Linux, which reports many files under
>>>>>>> /proc
>>>>>>> to be zero sized.
>>>>>>>
>>>>>>> Thus it is proposed not to rely on the size() and instead
>>>>>>> continuously
>>>>>>> read until EOF.
>>>>>>>
>>>>>>> The downside is reallocation and copying file content between
>>>>>>> buffers.
>>>>>>> But taking into account that the doc says: "It is not intended for
>>>>>>> reading in large files." it should not be a big problem.
>>>>>>>
>>>>>>> http://bugs.sun.com/view_bug.do?bug_id=8020669
>>>>>>> http://cr.openjdk.java.net/~igerasim/8020669/0/webrev/
>>>>>>>
>>>>>>> The fix is for JDK8. If it is approved, it can be applied to JDK7 as
>>>>>>> well.
>>>>>>>
>>>>>>> Sincerely yours,
>>>>>>> Ivan Gerasimov
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>>
>

-- 
- DML