RFR [8020669] java.nio.file.Files.readAllBytes() does not read any data when Files.size() is 0
David M. Lloyd
david.lloyd at redhat.com
Wed Jul 24 21:23:08 UTC 2013
Yes, you're right, I read this wrong. I agree with Martin then that the
nullary read method is a better choice.
On 07/24/2013 03:49 PM, roger riggs wrote:
> It looks like under the covers the ChannelInputStream uses a 1-byte buffer
> but caches it and re-uses it. That's slightly better than allocating a
> new one every time.
>
> Also, I don't see much difference between the original use (before the
> Alan's previous change)
> of Files.newInputStream(path) vs the compound try in readAllBytes().
> That might also be reverted to its previous state as well.
>
> Roger
>
>
>
> On 7/24/2013 4:40 PM, David M. Lloyd wrote:
>> I suspect the reason this was done is because the stream is actually a
>> ChannelInputStream. If you don't do the 1-byte array thing, the
>> ChannelInputStream will construct an entire ByteBuffer to read into,
>> which is even worse. As it is, it'll create a wrapper ByteBuffer to
>> encapsulate the destination array, which is bad enough.
>>
>> That's why I was saying that it would be nice if this could get a
>> FileInputStream, which afaik can read directly without messing around
>> with buffers.
>>
>> On 07/24/2013 02:24 PM, Martin Buchholz wrote:
>>> It's wasteful to create a 1-byte array to read into. Just use the
>>> nullary read method.
>>> http://download.java.net/jdk8/docs/api/java/io/InputStream.html#read()
>>>
>>>
>>> On Tue, Jul 23, 2013 at 5:09 PM, Ivan Gerasimov
>>> <ivan.gerasimov at oracle.com <mailto:ivan.gerasimov at oracle.com>> wrote:
>>>
>>> Would you please take a look at the updated webrev?
>>> http://cr.openjdk.java.net/~__igerasim/8020669/2/webrev/
>>> <http://cr.openjdk.java.net/~igerasim/8020669/2/webrev/>
>>>
>>> readAllBytes() was recently (in b93) changed by Alan Bateman to fix
>>> 8014928.
>>>
>>> Here's what I've done:
>>> - reverted readAllBytes() to its implementation prior b93.
>>> - modified it to address both 8020669 and 8014928.
>>>
>>> http://bugs.sun.com/view_bug.__do?bug_id=8020669
>>> <http://bugs.sun.com/view_bug.do?bug_id=8020669>
>>> <http://bugs.sun.com/view_bug.__do?bug_id=8014928
>>> <http://bugs.sun.com/view_bug.do?bug_id=8014928>>
>>> http://bugs.sun.com/view_bug.__do?bug_id=8014928
>>> <http://bugs.sun.com/view_bug.do?bug_id=8014928>
>>>
>>> Sincerely yours,
>>> Ivan
>>>
>>>
>>> On 23.07.2013 18:09, David M. Lloyd wrote:
>>>
>>> Here's how it should behave:
>>>
>>> - allocate 'size' byte byte array
>>> - if size > 0:
>>> - use simple old I/O to read into the array
>>> - do a one-byte read(), if not EOF then expand the array, using
>>> a simple growth pattern like 3/2 (with a special case for 0),
>>> and continue reading until EOF
>>> - if the array matches the size of the file, return the array,
>>> else use copyOf() to shrink it
>>>
>>> This way you only ever copy the array size() was wrong.
>>>
>>> On 07/23/2013 05:06 AM, Ivan Gerasimov wrote:
>>>
>>> Hi Roger!
>>>
>>> This is how my implementation behaves:
>>> - allocate 'size' bytes in BAOS
>>> - allocate 8k for temp buffer
>>> - in cycle read 8k or less bytes from input stream and copy
>>> them into BAOS
>>> - if capacity of BAOS isn't sufficient (file had grown), its
>>> buffer will
>>> be reallocated
>>> Thus, 1 reallocation and 1 copying of already read data on
>>> each 8k piece
>>> of additional bytes.
>>>
>>> In normal case, i.e. when fc.size() is correct, we have
>>> overhead of 1
>>> allocation and copying 'size' bytes in size/8k iterations.
>>>
>>> And this is how current implementation does
>>> - allocate 'size' bytes
>>> - allocate 'size' bytes of native memory for temp buffer in
>>> IOUtil.read()
>>> - read the whole file into temp buffer
>>> - copy the temp buffer back into our buffer
>>>
>>> In common when fc.size() is right, we have 1 allocation and
>>> copying
>>> 'size' bytes from temp buffer back.
>>>
>>> So there is a difference in allocations/copying, but in my
>>> opinion it's
>>> not that significant for this particular task.
>>>
>>> Sincerely yours,
>>> Ivan
>>>
>>> On 22.07.2013 20:03, roger riggs wrote:
>>>
>>> Hi Ivan,
>>>
>>> I'm concerned about the change in behavior for the
>>> existing working
>>> cases.
>>>
>>> How many times are the bytes copied in your proposed
>>> implementation?
>>> How many arrays are allocated and discarded?
>>> Files.copy() uses an extra array for the copies.
>>>
>>> BAOS should only be used for size == 0; that would
>>> address the issue
>>> without changing the current behavior or allocations.
>>>
>>> Roger
>>>
>>>
>>>
>>>
>>> On 7/20/2013 6:15 PM, Ivan Gerasimov wrote:
>>>
>>> Roger, David thanks for suggestions!
>>>
>>> Would you please take a look at an updated webrev?
>>> http://cr.openjdk.java.net/~__igerasim/8020669/1/webrev/
>>> <http://cr.openjdk.java.net/~igerasim/8020669/1/webrev/>
>>>
>>> - File size is used as an initial size of BAOS's
>>> buffer.
>>> - BAOS avoids copying its buffer in toByteArray(),
>>> if size is correct .
>>>
>>> I don't want to initialize BAOS with a positive
>>> number if size
>>> happened to be zero.
>>> Because zero could indicate that the file is really
>>> empty.
>>>
>>> Sincerely yours,
>>> Ivan
>>>
>>> On 19.07.2013 22:30, David M. Lloyd wrote:
>>>
>>> My mistake, we're not talking about strings.
>>> Still you can subclass
>>> and determine whether the buffer size guess was
>>> right, and if so
>>> return the array as-is (swap in an empty array
>>> or something as needed).
>>>
>>> On 07/19/2013 01:28 PM, David M. Lloyd wrote:
>>>
>>> It's trivial to subclass
>>> ByteArrayOutputStream and add a method which
>>> converts its contents to a string using the
>>> two protected fields which
>>> give you all the info you need to do so. So
>>> no extra copy is needed
>>> that you aren't already doing.
>>>
>>> On 07/19/2013 01:06 PM, roger riggs wrote:
>>>
>>> Hi Ivan,
>>>
>>> I think this change takes too big a hit
>>> for the cases where the
>>> size is
>>> correct.
>>>
>>> No real file system can be wrong about
>>> the size of a file so this
>>> is a
>>> problem
>>> only for special file systems. If the
>>> problem is with size
>>> reporting zero
>>> then maybe using the incremental read
>>> only for size == would be a
>>> better
>>> fix.
>>>
>>> At least you could pass the size to the
>>> constructor for BAOS and
>>> avoid
>>> the thrashing for every re-size; but
>>> still it will allocate and
>>> create
>>> an extra copy
>>> of the every time.
>>>
>>> $.02, Roger
>>>
>>>
>>> On 7/19/2013 1:15 PM, Ivan Gerasimov
>>> wrote:
>>>
>>> Hello everybody!
>>>
>>> Would you please review a fix for
>>> the problem with
>>> j.n.f.Files.readAllBytes() function?
>>> The current implementation relies on
>>> FileChannel.size() to
>>> preallocate
>>> a buffer for the whole file's
>>> content.
>>> However, some special filesystems
>>> can report a wrong size.
>>> An example is procfs under Linux,
>>> which reports many files under
>>> /proc
>>> to be zero sized.
>>>
>>> Thus it is proposed not to rely on
>>> the size() and instead
>>> continuously
>>> read until EOF.
>>>
>>> The downside is reallocation and
>>> copying file content between
>>> buffers.
>>> But taking into account that the doc
>>> says: "It is not intended for
>>> reading in large files." it should
>>> not be a big problem.
>>>
>>> http://bugs.sun.com/view_bug.__do?bug_id=8020669
>>> <http://bugs.sun.com/view_bug.do?bug_id=8020669>
>>> http://cr.openjdk.java.net/~__igerasim/8020669/0/webrev/
>>> <http://cr.openjdk.java.net/~igerasim/8020669/0/webrev/>
>>>
>>> The fix is for JDK8. If it is
>>> approved, it can be applied to
>>> JDK7 as
>>> well.
>>>
>>> Sincerely yours,
>>> Ivan Gerasimov
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
--
- DML
More information about the core-libs-dev
mailing list