RFR [8020669] java.nio.file.Files.readAllBytes() does not read any data when Files.size() is 0

Tue Jul 23 10:06:36 UTC 2013

Hi Roger!

This is how my implementation behaves:
- allocate 'size' bytes in BAOS
- allocate 8k for temp buffer
- in cycle read 8k or less bytes from input stream and copy them into BAOS
- if capacity of BAOS isn't sufficient (file had grown), its buffer will 
be reallocated
Thus, 1 reallocation and 1 copying of already read data on each 8k piece 
of additional bytes.

In normal case, i.e. when fc.size() is correct, we have overhead of 1 
allocation and copying 'size' bytes in size/8k iterations.

And this is how current implementation does
- allocate 'size' bytes
- allocate 'size' bytes of native memory for temp buffer in IOUtil.read()
- read the whole file into temp buffer
- copy the temp buffer back into our buffer

In common when fc.size() is right, we have 1 allocation and copying 
'size' bytes from temp buffer back.

So there is a difference in allocations/copying, but in my opinion it's 
not that significant for this particular task.

Sincerely yours,
Ivan

On 22.07.2013 20:03, roger riggs wrote:
> Hi Ivan,
>
> I'm concerned about the change in behavior for the existing working 
> cases.
>
> How many times are the bytes copied in your proposed implementation?
> How many arrays are allocated and discarded?
> Files.copy()  uses an extra array for the copies.
>
> BAOS should only be used for size == 0; that would address the issue
> without changing the current behavior or allocations.
>
> Roger
>
>
>
>
> On 7/20/2013 6:15 PM, Ivan Gerasimov wrote:
>> Roger, David thanks for suggestions!
>>
>> Would you please take a look at an updated webrev?
>> http://cr.openjdk.java.net/~igerasim/8020669/1/webrev/
>>
>> - File size is used as an initial size of BAOS's buffer.
>> - BAOS avoids copying its buffer in toByteArray(), if size is correct .
>>
>> I don't want to initialize BAOS with a positive number if size 
>> happened to be zero.
>> Because zero could indicate that the file is really empty.
>>
>> Sincerely yours,
>> Ivan
>>
>> On 19.07.2013 22:30, David M. Lloyd wrote:
>>> My mistake, we're not talking about strings.  Still you can subclass 
>>> and determine whether the buffer size guess was right, and if so 
>>> return the array as-is (swap in an empty array or something as needed).
>>>
>>> On 07/19/2013 01:28 PM, David M. Lloyd wrote:
>>>> It's trivial to subclass ByteArrayOutputStream and add a method which
>>>> converts its contents to a string using the two protected fields which
>>>> give you all the info you need to do so.  So no extra copy is needed
>>>> that you aren't already doing.
>>>>
>>>> On 07/19/2013 01:06 PM, roger riggs wrote:
>>>>> Hi Ivan,
>>>>>
>>>>> I think this change takes too big a hit for the cases where the 
>>>>> size is
>>>>> correct.
>>>>>
>>>>> No real file system can be wrong about the size of a file so this 
>>>>> is a
>>>>> problem
>>>>> only for special file systems. If the problem is with size 
>>>>> reporting zero
>>>>> then maybe using the incremental read only for size == would be a 
>>>>> better
>>>>> fix.
>>>>>
>>>>> At least you could pass the size to the constructor for BAOS and 
>>>>> avoid
>>>>> the thrashing for every re-size; but still it will allocate and 
>>>>> create
>>>>> an extra copy
>>>>> of the every time.
>>>>>
>>>>> $.02, Roger
>>>>>
>>>>>
>>>>> On 7/19/2013 1:15 PM, Ivan Gerasimov wrote:
>>>>>> Hello everybody!
>>>>>>
>>>>>> Would you please review a fix for the problem with
>>>>>> j.n.f.Files.readAllBytes() function?
>>>>>> The current implementation relies on FileChannel.size() to 
>>>>>> preallocate
>>>>>> a buffer for the whole file's content.
>>>>>> However, some special filesystems can report a wrong size.
>>>>>> An example is procfs under Linux, which reports many files under 
>>>>>> /proc
>>>>>> to be zero sized.
>>>>>>
>>>>>> Thus it is proposed not to rely on the size() and instead 
>>>>>> continuously
>>>>>> read until EOF.
>>>>>>
>>>>>> The downside is reallocation and copying file content between 
>>>>>> buffers.
>>>>>> But taking into account that the doc says: "It is not intended for
>>>>>> reading in large files." it should not be a big problem.
>>>>>>
>>>>>> http://bugs.sun.com/view_bug.do?bug_id=8020669
>>>>>> http://cr.openjdk.java.net/~igerasim/8020669/0/webrev/
>>>>>>
>>>>>> The fix is for JDK8. If it is approved, it can be applied to JDK7 as
>>>>>> well.
>>>>>>
>>>>>> Sincerely yours,
>>>>>> Ivan Gerasimov
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
>