RFR [8020669] java.nio.file.Files.readAllBytes() does not read any data when Files.size() is 0

Wed Jul 24 00:09:39 UTC 2013

Would you please take a look at the updated webrev?
http://cr.openjdk.java.net/~igerasim/8020669/2/webrev/

readAllBytes() was recently (in b93) changed by Alan Bateman to fix 8014928.

Here's what I've done:
- reverted readAllBytes() to its implementation prior b93.
- modified it to address both 8020669 and 8014928.

http://bugs.sun.com/view_bug.do?bug_id=8020669 
<http://bugs.sun.com/view_bug.do?bug_id=8014928>
http://bugs.sun.com/view_bug.do?bug_id=8014928

Sincerely yours,
Ivan

On 23.07.2013 18:09, David M. Lloyd wrote:
> Here's how it should behave:
>
> - allocate 'size' byte byte array
> - if size > 0:
>   - use simple old I/O to read into the array
> - do a one-byte read(), if not EOF then expand the array, using a 
> simple growth pattern like 3/2 (with a special case for 0), and 
> continue reading until EOF
> - if the array matches the size of the file, return the array, else 
> use copyOf() to shrink it
>
> This way you only ever copy the array size() was wrong.
>
> On 07/23/2013 05:06 AM, Ivan Gerasimov wrote:
>> Hi Roger!
>>
>> This is how my implementation behaves:
>> - allocate 'size' bytes in BAOS
>> - allocate 8k for temp buffer
>> - in cycle read 8k or less bytes from input stream and copy them into 
>> BAOS
>> - if capacity of BAOS isn't sufficient (file had grown), its buffer will
>> be reallocated
>> Thus, 1 reallocation and 1 copying of already read data on each 8k piece
>> of additional bytes.
>>
>> In normal case, i.e. when fc.size() is correct, we have overhead of 1
>> allocation and copying 'size' bytes in size/8k iterations.
>>
>> And this is how current implementation does
>> - allocate 'size' bytes
>> - allocate 'size' bytes of native memory for temp buffer in 
>> IOUtil.read()
>> - read the whole file into temp buffer
>> - copy the temp buffer back into our buffer
>>
>> In common when fc.size() is right, we have 1 allocation and copying
>> 'size' bytes from temp buffer back.
>>
>> So there is a difference in allocations/copying, but in my opinion it's
>> not that significant for this particular task.
>>
>> Sincerely yours,
>> Ivan
>>
>> On 22.07.2013 20:03, roger riggs wrote:
>>> Hi Ivan,
>>>
>>> I'm concerned about the change in behavior for the existing working
>>> cases.
>>>
>>> How many times are the bytes copied in your proposed implementation?
>>> How many arrays are allocated and discarded?
>>> Files.copy()  uses an extra array for the copies.
>>>
>>> BAOS should only be used for size == 0; that would address the issue
>>> without changing the current behavior or allocations.
>>>
>>> Roger
>>>
>>>
>>>
>>>
>>> On 7/20/2013 6:15 PM, Ivan Gerasimov wrote:
>>>> Roger, David thanks for suggestions!
>>>>
>>>> Would you please take a look at an updated webrev?
>>>> http://cr.openjdk.java.net/~igerasim/8020669/1/webrev/
>>>>
>>>> - File size is used as an initial size of BAOS's buffer.
>>>> - BAOS avoids copying its buffer in toByteArray(), if size is 
>>>> correct .
>>>>
>>>> I don't want to initialize BAOS with a positive number if size
>>>> happened to be zero.
>>>> Because zero could indicate that the file is really empty.
>>>>
>>>> Sincerely yours,
>>>> Ivan
>>>>
>>>> On 19.07.2013 22:30, David M. Lloyd wrote:
>>>>> My mistake, we're not talking about strings.  Still you can subclass
>>>>> and determine whether the buffer size guess was right, and if so
>>>>> return the array as-is (swap in an empty array or something as 
>>>>> needed).
>>>>>
>>>>> On 07/19/2013 01:28 PM, David M. Lloyd wrote:
>>>>>> It's trivial to subclass ByteArrayOutputStream and add a method 
>>>>>> which
>>>>>> converts its contents to a string using the two protected fields 
>>>>>> which
>>>>>> give you all the info you need to do so.  So no extra copy is needed
>>>>>> that you aren't already doing.
>>>>>>
>>>>>> On 07/19/2013 01:06 PM, roger riggs wrote:
>>>>>>> Hi Ivan,
>>>>>>>
>>>>>>> I think this change takes too big a hit for the cases where the
>>>>>>> size is
>>>>>>> correct.
>>>>>>>
>>>>>>> No real file system can be wrong about the size of a file so this
>>>>>>> is a
>>>>>>> problem
>>>>>>> only for special file systems. If the problem is with size
>>>>>>> reporting zero
>>>>>>> then maybe using the incremental read only for size == would be a
>>>>>>> better
>>>>>>> fix.
>>>>>>>
>>>>>>> At least you could pass the size to the constructor for BAOS and
>>>>>>> avoid
>>>>>>> the thrashing for every re-size; but still it will allocate and
>>>>>>> create
>>>>>>> an extra copy
>>>>>>> of the every time.
>>>>>>>
>>>>>>> $.02, Roger
>>>>>>>
>>>>>>>
>>>>>>> On 7/19/2013 1:15 PM, Ivan Gerasimov wrote:
>>>>>>>> Hello everybody!
>>>>>>>>
>>>>>>>> Would you please review a fix for the problem with
>>>>>>>> j.n.f.Files.readAllBytes() function?
>>>>>>>> The current implementation relies on FileChannel.size() to
>>>>>>>> preallocate
>>>>>>>> a buffer for the whole file's content.
>>>>>>>> However, some special filesystems can report a wrong size.
>>>>>>>> An example is procfs under Linux, which reports many files under
>>>>>>>> /proc
>>>>>>>> to be zero sized.
>>>>>>>>
>>>>>>>> Thus it is proposed not to rely on the size() and instead
>>>>>>>> continuously
>>>>>>>> read until EOF.
>>>>>>>>
>>>>>>>> The downside is reallocation and copying file content between
>>>>>>>> buffers.
>>>>>>>> But taking into account that the doc says: "It is not intended for
>>>>>>>> reading in large files." it should not be a big problem.
>>>>>>>>
>>>>>>>> http://bugs.sun.com/view_bug.do?bug_id=8020669
>>>>>>>> http://cr.openjdk.java.net/~igerasim/8020669/0/webrev/
>>>>>>>>
>>>>>>>> The fix is for JDK8. If it is approved, it can be applied to 
>>>>>>>> JDK7 as
>>>>>>>> well.
>>>>>>>>
>>>>>>>> Sincerely yours,
>>>>>>>> Ivan Gerasimov
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>