RFR [8020669] java.nio.file.Files.readAllBytes() does not read any data when Files.size() is 0
Ivan Gerasimov
ivan.gerasimov at oracle.com
Thu Jul 25 23:56:18 UTC 2013
Hi Roger!
On 25.07.2013 17:42, roger riggs wrote:
> Hi Ivan,
>
> Thank you for your diligence.
Thank you for your patience :)
> 1) Should the test use an alternate mechanism to read the file
> (FileInputStream)
> and confirm the length of the array?
This file can change from read to read. But it should not be empty, and
that's what we check.
I moved the test from a separate file to BytesAndLines.java.
I've also added testing of how Files.readAllLines() reads from procfs
files (it already does well.)
> 2) There is an edge case where the file size is between MAX_ARRAY_SIZE
> and Integer.MAX_VALUE that should be avoided.
Yes, you're right.
I rewrote the logic in a simpler way with no overflowing and casting to
long.
Would you please help review an updated webrev?
http://cr.openjdk.java.net/~igerasim/8020669/4/webrev/
Sincerely yours,
Ivan
> The lines at L3022 are confusing.
> If the max array size is MAX_BUFFER_SIZE then it should be used
> also at L3063 in readAllBytes.
>
> L3015-3026: The logic around the 3 nested if's is a bit confusing for
> the actions being taken.
> By switching to a long for newCapacity this can be easier to read.
>
> if (capacity == MAX_INTEGER) {
> throw ...
> }
> long newCapacity = ((long)capacity) << 1;
> newCapacity = Math.max(newCapacity, BUFFER_SIZE); // at least BUFFER_SIZE
> newCapacity = Math.min(newCapacity, MAX_BUFFER_SIZE); // at most MAX_BUFFER_SIZE
> capacity = (int) newCapacity;
>
> buf = Arrays.copyOf(buf, capacity);
> buf[nread++] = (byte)n;
> rem = buf.length - nread;
> Thanks, Roger
>
> BTW, I'm not an official reviewer
>
> On 7/24/2013 7:47 PM, Ivan Gerasimov wrote:
>> Hello everybody!
>>
>> Would you please review an updated webrev?
>> http://cr.openjdk.java.net/~igerasim/8020669/3/webrev/
>> <http://cr.openjdk.java.net/%7Eigerasim/8020669/3/webrev/>
>>
>> Thanks in advance,
>> Ivan
>>
>>
>> On 24.07.2013 23:36, Martin Buchholz wrote:
>>> Use javadoc style even in private methods.
>>> s/Read/Reads/
>>> Use @param initialSize
>>> /**
>>> + * Read all the bytes from an input stream. The {@code initialSize}
>>> + * parameter indicates the initial size of the byte[] to allocate.
>>> + */
>>> ---
>>>
>>> This needs to be tested for an ordinary zero-length file. It looks
>>> like for the zero case
>>> + int newCapacity = capacity << 1;
>>> will infloop not actually increasing the buffer.
>>>
>>> ---
>>>
>>> You might want to copy this technique from ArrayList et al:
>>>
>>>
>>> /**
>>> * The maximum size of array to allocate.
>>> * Some VMs reserve some header words in an array.
>>> * Attempts to allocate larger arrays may result in
>>> * OutOfMemoryError: Requested array size exceeds VM limit
>>> */
>>> private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
>>>
>>>
>>>
>>>
>>> On Tue, Jul 23, 2013 at 5:09 PM, Ivan Gerasimov
>>> <ivan.gerasimov at oracle.com <mailto:ivan.gerasimov at oracle.com>> wrote:
>>>
>>> Would you please take a look at the updated webrev?
>>> http://cr.openjdk.java.net/~igerasim/8020669/2/webrev/
>>> <http://cr.openjdk.java.net/%7Eigerasim/8020669/2/webrev/>
>>>
>>> readAllBytes() was recently (in b93) changed by Alan Bateman to
>>> fix 8014928.
>>>
>>> Here's what I've done:
>>> - reverted readAllBytes() to its implementation prior b93.
>>> - modified it to address both 8020669 and 8014928.
>>>
>>> http://bugs.sun.com/view_bug.do?bug_id=8020669
>>> <http://bugs.sun.com/view_bug.do?bug_id=8014928>
>>> http://bugs.sun.com/view_bug.do?bug_id=8014928
>>>
>>> Sincerely yours,
>>> Ivan
>>>
>>>
>>> On 23.07.2013 18:09, David M. Lloyd wrote:
>>>
>>> Here's how it should behave:
>>>
>>> - allocate 'size' byte byte array
>>> - if size > 0:
>>> - use simple old I/O to read into the array
>>> - do a one-byte read(), if not EOF then expand the array,
>>> using a simple growth pattern like 3/2 (with a special case
>>> for 0), and continue reading until EOF
>>> - if the array matches the size of the file, return the
>>> array, else use copyOf() to shrink it
>>>
>>> This way you only ever copy the array size() was wrong.
>>>
>>> On 07/23/2013 05:06 AM, Ivan Gerasimov wrote:
>>>
>>> Hi Roger!
>>>
>>> This is how my implementation behaves:
>>> - allocate 'size' bytes in BAOS
>>> - allocate 8k for temp buffer
>>> - in cycle read 8k or less bytes from input stream and
>>> copy them into BAOS
>>> - if capacity of BAOS isn't sufficient (file had grown),
>>> its buffer will
>>> be reallocated
>>> Thus, 1 reallocation and 1 copying of already read data
>>> on each 8k piece
>>> of additional bytes.
>>>
>>> In normal case, i.e. when fc.size() is correct, we have
>>> overhead of 1
>>> allocation and copying 'size' bytes in size/8k iterations.
>>>
>>> And this is how current implementation does
>>> - allocate 'size' bytes
>>> - allocate 'size' bytes of native memory for temp buffer
>>> in IOUtil.read()
>>> - read the whole file into temp buffer
>>> - copy the temp buffer back into our buffer
>>>
>>> In common when fc.size() is right, we have 1 allocation
>>> and copying
>>> 'size' bytes from temp buffer back.
>>>
>>> So there is a difference in allocations/copying, but in
>>> my opinion it's
>>> not that significant for this particular task.
>>>
>>> Sincerely yours,
>>> Ivan
>>>
>>> On 22.07.2013 20:03, roger riggs wrote:
>>>
>>> Hi Ivan,
>>>
>>> I'm concerned about the change in behavior for the
>>> existing working
>>> cases.
>>>
>>> How many times are the bytes copied in your proposed
>>> implementation?
>>> How many arrays are allocated and discarded?
>>> Files.copy() uses an extra array for the copies.
>>>
>>> BAOS should only be used for size == 0; that would
>>> address the issue
>>> without changing the current behavior or allocations.
>>>
>>> Roger
>>>
>>>
>>>
>>>
>>> On 7/20/2013 6:15 PM, Ivan Gerasimov wrote:
>>>
>>> Roger, David thanks for suggestions!
>>>
>>> Would you please take a look at an updated webrev?
>>> http://cr.openjdk.java.net/~igerasim/8020669/1/webrev/
>>> <http://cr.openjdk.java.net/%7Eigerasim/8020669/1/webrev/>
>>>
>>> - File size is used as an initial size of BAOS's
>>> buffer.
>>> - BAOS avoids copying its buffer in
>>> toByteArray(), if size is correct .
>>>
>>> I don't want to initialize BAOS with a positive
>>> number if size
>>> happened to be zero.
>>> Because zero could indicate that the file is
>>> really empty.
>>>
>>> Sincerely yours,
>>> Ivan
>>>
>>> On 19.07.2013 22:30, David M. Lloyd wrote:
>>>
>>> My mistake, we're not talking about strings.
>>> Still you can subclass
>>> and determine whether the buffer size guess
>>> was right, and if so
>>> return the array as-is (swap in an empty
>>> array or something as needed).
>>>
>>> On 07/19/2013 01:28 PM, David M. Lloyd wrote:
>>>
>>> It's trivial to subclass
>>> ByteArrayOutputStream and add a method which
>>> converts its contents to a string using
>>> the two protected fields which
>>> give you all the info you need to do so.
>>> So no extra copy is needed
>>> that you aren't already doing.
>>>
>>> On 07/19/2013 01:06 PM, roger riggs wrote:
>>>
>>> Hi Ivan,
>>>
>>> I think this change takes too big a
>>> hit for the cases where the
>>> size is
>>> correct.
>>>
>>> No real file system can be wrong
>>> about the size of a file so this
>>> is a
>>> problem
>>> only for special file systems. If
>>> the problem is with size
>>> reporting zero
>>> then maybe using the incremental
>>> read only for size == would be a
>>> better
>>> fix.
>>>
>>> At least you could pass the size to
>>> the constructor for BAOS and
>>> avoid
>>> the thrashing for every re-size; but
>>> still it will allocate and
>>> create
>>> an extra copy
>>> of the every time.
>>>
>>> $.02, Roger
>>>
>>>
>>> On 7/19/2013 1:15 PM, Ivan Gerasimov
>>> wrote:
>>>
>>> Hello everybody!
>>>
>>> Would you please review a fix
>>> for the problem with
>>> j.n.f.Files.readAllBytes() function?
>>> The current implementation
>>> relies on FileChannel.size() to
>>> preallocate
>>> a buffer for the whole file's
>>> content.
>>> However, some special
>>> filesystems can report a wrong size.
>>> An example is procfs under
>>> Linux, which reports many files
>>> under
>>> /proc
>>> to be zero sized.
>>>
>>> Thus it is proposed not to rely
>>> on the size() and instead
>>> continuously
>>> read until EOF.
>>>
>>> The downside is reallocation and
>>> copying file content between
>>> buffers.
>>> But taking into account that the
>>> doc says: "It is not intended for
>>> reading in large files." it
>>> should not be a big problem.
>>>
>>> http://bugs.sun.com/view_bug.do?bug_id=8020669
>>> http://cr.openjdk.java.net/~igerasim/8020669/0/webrev/
>>> <http://cr.openjdk.java.net/%7Eigerasim/8020669/0/webrev/>
>>>
>>> The fix is for JDK8. If it is
>>> approved, it can be applied to
>>> JDK7 as
>>> well.
>>>
>>> Sincerely yours,
>>> Ivan Gerasimov
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>
More information about the core-libs-dev
mailing list