RFR [8020669] java.nio.file.Files.readAllBytes() does not read any data when Files.size() is 0

Wed Jul 24 00:26:47 UTC 2013

@@ -2965,6 +2965,46 @@
      }

      /**
+     * Read all the bytes from an input stream. The {@code initialSize}
+     * parameter indicates the initial size of the byte[] to allocate.
+     */
+    private static byte[] read(InputStream source, int initialSize)
+            throws IOException
+    {
+        int capacity = initialSize;
+        byte[] buf = new byte[capacity];
+        byte[] b = new byte[1];

I assume this is because of...

@@ -2989,22 +3029,13 @@
       *          method is invoked to check read access to the file.
       */
      public static byte[] readAllBytes(Path path) throws IOException {
-        try (FileChannel fc = FileChannel.open(path)) {
+        try (FileChannel fc = FileChannel.open(path);
+             InputStream fis = Channels.newInputStream(fc)) {
              long size = fc.size();
              if (size > (long)Integer.MAX_VALUE)
                  throw new OutOfMemoryError("Required array size too 
large");

It would be nice if there was a way to get a plain old FileInputStream 
here (though I don't see any way to do it at first glance).

On 07/23/2013 07:09 PM, Ivan Gerasimov wrote:
> Would you please take a look at the updated webrev?
> http://cr.openjdk.java.net/~igerasim/8020669/2/webrev/
>
> readAllBytes() was recently (in b93) changed by Alan Bateman to fix 8014928.
>
> Here's what I've done:
> - reverted readAllBytes() to its implementation prior b93.
> - modified it to address both 8020669 and 8014928.
>
> http://bugs.sun.com/view_bug.do?bug_id=8020669
> <http://bugs.sun.com/view_bug.do?bug_id=8014928>
> http://bugs.sun.com/view_bug.do?bug_id=8014928
>
> Sincerely yours,
> Ivan
>
> On 23.07.2013 18:09, David M. Lloyd wrote:
>> Here's how it should behave:
>>
>> - allocate 'size' byte byte array
>> - if size > 0:
>>   - use simple old I/O to read into the array
>> - do a one-byte read(), if not EOF then expand the array, using a
>> simple growth pattern like 3/2 (with a special case for 0), and
>> continue reading until EOF
>> - if the array matches the size of the file, return the array, else
>> use copyOf() to shrink it
>>
>> This way you only ever copy the array size() was wrong.
>>
>> On 07/23/2013 05:06 AM, Ivan Gerasimov wrote:
>>> Hi Roger!
>>>
>>> This is how my implementation behaves:
>>> - allocate 'size' bytes in BAOS
>>> - allocate 8k for temp buffer
>>> - in cycle read 8k or less bytes from input stream and copy them into
>>> BAOS
>>> - if capacity of BAOS isn't sufficient (file had grown), its buffer will
>>> be reallocated
>>> Thus, 1 reallocation and 1 copying of already read data on each 8k piece
>>> of additional bytes.
>>>
>>> In normal case, i.e. when fc.size() is correct, we have overhead of 1
>>> allocation and copying 'size' bytes in size/8k iterations.
>>>
>>> And this is how current implementation does
>>> - allocate 'size' bytes
>>> - allocate 'size' bytes of native memory for temp buffer in
>>> IOUtil.read()
>>> - read the whole file into temp buffer
>>> - copy the temp buffer back into our buffer
>>>
>>> In common when fc.size() is right, we have 1 allocation and copying
>>> 'size' bytes from temp buffer back.
>>>
>>> So there is a difference in allocations/copying, but in my opinion it's
>>> not that significant for this particular task.
>>>
>>> Sincerely yours,
>>> Ivan
>>>
>>> On 22.07.2013 20:03, roger riggs wrote:
>>>> Hi Ivan,
>>>>
>>>> I'm concerned about the change in behavior for the existing working
>>>> cases.
>>>>
>>>> How many times are the bytes copied in your proposed implementation?
>>>> How many arrays are allocated and discarded?
>>>> Files.copy()  uses an extra array for the copies.
>>>>
>>>> BAOS should only be used for size == 0; that would address the issue
>>>> without changing the current behavior or allocations.
>>>>
>>>> Roger
>>>>
>>>>
>>>>
>>>>
>>>> On 7/20/2013 6:15 PM, Ivan Gerasimov wrote:
>>>>> Roger, David thanks for suggestions!
>>>>>
>>>>> Would you please take a look at an updated webrev?
>>>>> http://cr.openjdk.java.net/~igerasim/8020669/1/webrev/
>>>>>
>>>>> - File size is used as an initial size of BAOS's buffer.
>>>>> - BAOS avoids copying its buffer in toByteArray(), if size is
>>>>> correct .
>>>>>
>>>>> I don't want to initialize BAOS with a positive number if size
>>>>> happened to be zero.
>>>>> Because zero could indicate that the file is really empty.
>>>>>
>>>>> Sincerely yours,
>>>>> Ivan
>>>>>
>>>>> On 19.07.2013 22:30, David M. Lloyd wrote:
>>>>>> My mistake, we're not talking about strings.  Still you can subclass
>>>>>> and determine whether the buffer size guess was right, and if so
>>>>>> return the array as-is (swap in an empty array or something as
>>>>>> needed).
>>>>>>
>>>>>> On 07/19/2013 01:28 PM, David M. Lloyd wrote:
>>>>>>> It's trivial to subclass ByteArrayOutputStream and add a method
>>>>>>> which
>>>>>>> converts its contents to a string using the two protected fields
>>>>>>> which
>>>>>>> give you all the info you need to do so.  So no extra copy is needed
>>>>>>> that you aren't already doing.
>>>>>>>
>>>>>>> On 07/19/2013 01:06 PM, roger riggs wrote:
>>>>>>>> Hi Ivan,
>>>>>>>>
>>>>>>>> I think this change takes too big a hit for the cases where the
>>>>>>>> size is
>>>>>>>> correct.
>>>>>>>>
>>>>>>>> No real file system can be wrong about the size of a file so this
>>>>>>>> is a
>>>>>>>> problem
>>>>>>>> only for special file systems. If the problem is with size
>>>>>>>> reporting zero
>>>>>>>> then maybe using the incremental read only for size == would be a
>>>>>>>> better
>>>>>>>> fix.
>>>>>>>>
>>>>>>>> At least you could pass the size to the constructor for BAOS and
>>>>>>>> avoid
>>>>>>>> the thrashing for every re-size; but still it will allocate and
>>>>>>>> create
>>>>>>>> an extra copy
>>>>>>>> of the every time.
>>>>>>>>
>>>>>>>> $.02, Roger
>>>>>>>>
>>>>>>>>
>>>>>>>> On 7/19/2013 1:15 PM, Ivan Gerasimov wrote:
>>>>>>>>> Hello everybody!
>>>>>>>>>
>>>>>>>>> Would you please review a fix for the problem with
>>>>>>>>> j.n.f.Files.readAllBytes() function?
>>>>>>>>> The current implementation relies on FileChannel.size() to
>>>>>>>>> preallocate
>>>>>>>>> a buffer for the whole file's content.
>>>>>>>>> However, some special filesystems can report a wrong size.
>>>>>>>>> An example is procfs under Linux, which reports many files under
>>>>>>>>> /proc
>>>>>>>>> to be zero sized.
>>>>>>>>>
>>>>>>>>> Thus it is proposed not to rely on the size() and instead
>>>>>>>>> continuously
>>>>>>>>> read until EOF.
>>>>>>>>>
>>>>>>>>> The downside is reallocation and copying file content between
>>>>>>>>> buffers.
>>>>>>>>> But taking into account that the doc says: "It is not intended for
>>>>>>>>> reading in large files." it should not be a big problem.
>>>>>>>>>
>>>>>>>>> http://bugs.sun.com/view_bug.do?bug_id=8020669
>>>>>>>>> http://cr.openjdk.java.net/~igerasim/8020669/0/webrev/
>>>>>>>>>
>>>>>>>>> The fix is for JDK8. If it is approved, it can be applied to
>>>>>>>>> JDK7 as
>>>>>>>>> well.
>>>>>>>>>
>>>>>>>>> Sincerely yours,
>>>>>>>>> Ivan Gerasimov
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>


-- 
- DML