RFR [8020669] java.nio.file.Files.readAllBytes() does not read any data when Files.size() is 0

Martin Buchholz martinrb at google.com
Wed Jul 24 19:36:11 UTC 2013


Use javadoc style even in private methods.

s/Read/Reads/

Use @param initialSize

     /**
+     * Read all the bytes from an input stream. The {@code initialSize}
+     * parameter indicates the initial size of the byte[] to allocate.
+     */

---

This needs to be tested for an ordinary zero-length file.  It looks like
for the zero case

+            int newCapacity = capacity << 1;

will infloop not actually increasing the buffer.

---

You might want to copy this technique from ArrayList et al:


    /**
     * The maximum size of array to allocate.
     * Some VMs reserve some header words in an array.
     * Attempts to allocate larger arrays may result in
     * OutOfMemoryError: Requested array size exceeds VM limit
     */
    private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;




On Tue, Jul 23, 2013 at 5:09 PM, Ivan Gerasimov
<ivan.gerasimov at oracle.com>wrote:

> Would you please take a look at the updated webrev?
> http://cr.openjdk.java.net/~**igerasim/8020669/2/webrev/<http://cr.openjdk.java.net/~igerasim/8020669/2/webrev/>
>
> readAllBytes() was recently (in b93) changed by Alan Bateman to fix
> 8014928.
>
> Here's what I've done:
> - reverted readAllBytes() to its implementation prior b93.
> - modified it to address both 8020669 and 8014928.
>
> http://bugs.sun.com/view_bug.**do?bug_id=8020669<http://bugs.sun.com/view_bug.do?bug_id=8020669><
> http://bugs.sun.com/view_bug.**do?bug_id=8014928<http://bugs.sun.com/view_bug.do?bug_id=8014928>
> >
> http://bugs.sun.com/view_bug.**do?bug_id=8014928<http://bugs.sun.com/view_bug.do?bug_id=8014928>
>
> Sincerely yours,
> Ivan
>
>
> On 23.07.2013 18:09, David M. Lloyd wrote:
>
>> Here's how it should behave:
>>
>> - allocate 'size' byte byte array
>> - if size > 0:
>>   - use simple old I/O to read into the array
>> - do a one-byte read(), if not EOF then expand the array, using a simple
>> growth pattern like 3/2 (with a special case for 0), and continue reading
>> until EOF
>> - if the array matches the size of the file, return the array, else use
>> copyOf() to shrink it
>>
>> This way you only ever copy the array size() was wrong.
>>
>> On 07/23/2013 05:06 AM, Ivan Gerasimov wrote:
>>
>>> Hi Roger!
>>>
>>> This is how my implementation behaves:
>>> - allocate 'size' bytes in BAOS
>>> - allocate 8k for temp buffer
>>> - in cycle read 8k or less bytes from input stream and copy them into
>>> BAOS
>>> - if capacity of BAOS isn't sufficient (file had grown), its buffer will
>>> be reallocated
>>> Thus, 1 reallocation and 1 copying of already read data on each 8k piece
>>> of additional bytes.
>>>
>>> In normal case, i.e. when fc.size() is correct, we have overhead of 1
>>> allocation and copying 'size' bytes in size/8k iterations.
>>>
>>> And this is how current implementation does
>>> - allocate 'size' bytes
>>> - allocate 'size' bytes of native memory for temp buffer in IOUtil.read()
>>> - read the whole file into temp buffer
>>> - copy the temp buffer back into our buffer
>>>
>>> In common when fc.size() is right, we have 1 allocation and copying
>>> 'size' bytes from temp buffer back.
>>>
>>> So there is a difference in allocations/copying, but in my opinion it's
>>> not that significant for this particular task.
>>>
>>> Sincerely yours,
>>> Ivan
>>>
>>> On 22.07.2013 20:03, roger riggs wrote:
>>>
>>>> Hi Ivan,
>>>>
>>>> I'm concerned about the change in behavior for the existing working
>>>> cases.
>>>>
>>>> How many times are the bytes copied in your proposed implementation?
>>>> How many arrays are allocated and discarded?
>>>> Files.copy()  uses an extra array for the copies.
>>>>
>>>> BAOS should only be used for size == 0; that would address the issue
>>>> without changing the current behavior or allocations.
>>>>
>>>> Roger
>>>>
>>>>
>>>>
>>>>
>>>> On 7/20/2013 6:15 PM, Ivan Gerasimov wrote:
>>>>
>>>>> Roger, David thanks for suggestions!
>>>>>
>>>>> Would you please take a look at an updated webrev?
>>>>> http://cr.openjdk.java.net/~**igerasim/8020669/1/webrev/<http://cr.openjdk.java.net/~igerasim/8020669/1/webrev/>
>>>>>
>>>>> - File size is used as an initial size of BAOS's buffer.
>>>>> - BAOS avoids copying its buffer in toByteArray(), if size is correct .
>>>>>
>>>>> I don't want to initialize BAOS with a positive number if size
>>>>> happened to be zero.
>>>>> Because zero could indicate that the file is really empty.
>>>>>
>>>>> Sincerely yours,
>>>>> Ivan
>>>>>
>>>>> On 19.07.2013 22:30, David M. Lloyd wrote:
>>>>>
>>>>>> My mistake, we're not talking about strings.  Still you can subclass
>>>>>> and determine whether the buffer size guess was right, and if so
>>>>>> return the array as-is (swap in an empty array or something as
>>>>>> needed).
>>>>>>
>>>>>> On 07/19/2013 01:28 PM, David M. Lloyd wrote:
>>>>>>
>>>>>>> It's trivial to subclass ByteArrayOutputStream and add a method which
>>>>>>> converts its contents to a string using the two protected fields
>>>>>>> which
>>>>>>> give you all the info you need to do so.  So no extra copy is needed
>>>>>>> that you aren't already doing.
>>>>>>>
>>>>>>> On 07/19/2013 01:06 PM, roger riggs wrote:
>>>>>>>
>>>>>>>> Hi Ivan,
>>>>>>>>
>>>>>>>> I think this change takes too big a hit for the cases where the
>>>>>>>> size is
>>>>>>>> correct.
>>>>>>>>
>>>>>>>> No real file system can be wrong about the size of a file so this
>>>>>>>> is a
>>>>>>>> problem
>>>>>>>> only for special file systems. If the problem is with size
>>>>>>>> reporting zero
>>>>>>>> then maybe using the incremental read only for size == would be a
>>>>>>>> better
>>>>>>>> fix.
>>>>>>>>
>>>>>>>> At least you could pass the size to the constructor for BAOS and
>>>>>>>> avoid
>>>>>>>> the thrashing for every re-size; but still it will allocate and
>>>>>>>> create
>>>>>>>> an extra copy
>>>>>>>> of the every time.
>>>>>>>>
>>>>>>>> $.02, Roger
>>>>>>>>
>>>>>>>>
>>>>>>>> On 7/19/2013 1:15 PM, Ivan Gerasimov wrote:
>>>>>>>>
>>>>>>>>> Hello everybody!
>>>>>>>>>
>>>>>>>>> Would you please review a fix for the problem with
>>>>>>>>> j.n.f.Files.readAllBytes() function?
>>>>>>>>> The current implementation relies on FileChannel.size() to
>>>>>>>>> preallocate
>>>>>>>>> a buffer for the whole file's content.
>>>>>>>>> However, some special filesystems can report a wrong size.
>>>>>>>>> An example is procfs under Linux, which reports many files under
>>>>>>>>> /proc
>>>>>>>>> to be zero sized.
>>>>>>>>>
>>>>>>>>> Thus it is proposed not to rely on the size() and instead
>>>>>>>>> continuously
>>>>>>>>> read until EOF.
>>>>>>>>>
>>>>>>>>> The downside is reallocation and copying file content between
>>>>>>>>> buffers.
>>>>>>>>> But taking into account that the doc says: "It is not intended for
>>>>>>>>> reading in large files." it should not be a big problem.
>>>>>>>>>
>>>>>>>>> http://bugs.sun.com/view_bug.**do?bug_id=8020669<http://bugs.sun.com/view_bug.do?bug_id=8020669>
>>>>>>>>> http://cr.openjdk.java.net/~**igerasim/8020669/0/webrev/<http://cr.openjdk.java.net/~igerasim/8020669/0/webrev/>
>>>>>>>>>
>>>>>>>>> The fix is for JDK8. If it is approved, it can be applied to JDK7
>>>>>>>>> as
>>>>>>>>> well.
>>>>>>>>>
>>>>>>>>> Sincerely yours,
>>>>>>>>> Ivan Gerasimov
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>



More information about the core-libs-dev mailing list