Request/discussion: BufferedReader reading using async API while providing sync API

Fri Oct 28 08:16:46 UTC 2016

I'll try going back to a previous version I worked on, which used Java 7's
AsynchronousFileChannel, and work from there. My brief research suggests the
current code can also work with AsynchronousFileChannel mostly without changes.

For now, one question:
Is Thread.sleep() an acceptable way of meeting the blocking requirement of
read()? Or do I need to use LockSupport.park() or something like that?
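
For illustration, one way a blocking read() could wait without Thread.sleep()
is to block on the Future returned by AsynchronousFileChannel.read(). The
following is only a minimal sketch under that assumption: the class name
AsyncBackedInputStream and its fields are hypothetical, it is not the prototype
from the pastebin links, and it leaves out read(byte[], int, int), mark/reset
and most error handling. It keeps one read in flight while the caller consumes
the previous buffer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Hypothetical sketch: a synchronous InputStream fed by asynchronous reads. */
class AsyncBackedInputStream extends InputStream {
    private final AsynchronousFileChannel channel;
    private ByteBuffer current;       // buffer the caller is consuming
    private ByteBuffer next;          // buffer the pending read is filling
    private Future<Integer> pending;  // the read currently in flight
    private long filePos = 0;         // file offset of the pending read
    private boolean eof = false;

    AsyncBackedInputStream(Path path, int bufferSize) throws IOException {
        channel = AsynchronousFileChannel.open(path, StandardOpenOption.READ);
        current = ByteBuffer.allocate(bufferSize);
        current.flip();               // limit = 0, so nothing to consume yet
        next = ByteBuffer.allocate(bufferSize);
        pending = channel.read(next, filePos);
    }

    @Override
    public int read() throws IOException {
        while (!current.hasRemaining()) {
            if (eof) {
                return -1;
            }
            awaitPendingRead();
        }
        return current.get() & 0xFF;
    }

    /** Blocks until the in-flight read completes, then starts the next one. */
    private void awaitPendingRead() throws IOException {
        int n;
        try {
            n = pending.get();        // blocks here; no sleep or polling loop
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("interrupted while waiting for data");
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        }
        if (n < 0) {
            eof = true;
            return;
        }
        filePos += n;
        next.flip();
        ByteBuffer filled = next;     // swap roles of the two buffers
        next = current;
        current = filled;
        next.clear();
        pending = channel.read(next, filePos);
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Note that this uses two buffers, and I believe the default
AsynchronousFileChannel implementation on Linux services reads from a thread
pool rather than through kernel AIO, so another thread is still involved under
the hood.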

I'll report back here when it is done.

On 27/10/2016 22:09, David Holmes wrote:
> You might try discussing on net-dev rather than core-libs-dev, to get 
> additional historical info related to the io and nio file APIs.
>
> David
>
> On 28/10/2016 5:08 AM, Brunoais wrote:
>> You are right. Even on Windows it does not set the flags for async
>> reads. It seems that Windows itself makes the decision to
>> buffer the contents based on its own heuristics.
>>
>> But... Why? Why won't it be? Why is there no API for it? How am I
>> getting 100% HDD use and faster times when I fake work to delay getting
>> more data and I only have a fluctuating 60-90% (always going up and
>> down) when I use an InputStream?
>> Is it related to how both classes cache and how frequently and how much
>> each one asks for data?
>>
>> I really would prefer not having to read the source code because it
>> takes a real long time T.T.
>>
>> I end up reiterating... and wondering...
>>
>> Why doesn't Java provide a single-threaded non-blocking API for file reads
>> on every OS that supports it? I simply cannot find that information no
>> matter how much I search on Google, Bing, DuckDuckGo... Can any of you
>> point me to whoever knows?
>>
>> On 27/10/2016 14:11, Vitaly Davidovich wrote:
>>> I don't know about Windows specifically, but generally file systems
>>> across major OS's will implement readahead in their IO scheduler when
>>> they detect sequential scans.
>>>
>>> On Linux, you can also strace your test to confirm which syscalls are
>>> emitted (you should be seeing plain read()'s there, with
>>> FileInputStream and FileChannel).
>>>
>>> On Thu, Oct 27, 2016 at 9:06 AM, Brunoais <brunoaiss at gmail.com> wrote:
>>>
>>>     Thanks for the heads up.
>>>
>>>     I'll try that later. These tests are still useful then. Meanwhile,
>>>     I'll end up also checking how FileChannel queries the OS on
>>>     windows. I'm getting 100% HDD reads... Could it be that the OS
>>>     reads the file ahead on its own?... Anyway, I'll look into it.
>>>     Thanks for the heads up.
>>>
>>>
>>>     On 27/10/2016 13:53, Vitaly Davidovich wrote:
>>>>
>>>>
>>>>     On Thu, Oct 27, 2016 at 8:34 AM, Brunoais <brunoaiss at gmail.com> wrote:
>>>>
>>>>         Oh... I see. In that case, it means something is terribly
>>>>         wrong. It can be my initial tests, though.
>>>>
>>>>         I'm testing on both linux and windows and I'm getting
>>>>         performance gains from using the FileChannel compared to
>>>>         using FileInputStream... The tests also make sense based on
>>>>         my predictions O_O...
>>>>
>>>>     FileInputStream requires copying native buffers holding the read
>>>>     data to the java byte[].  If you're using direct ByteBuffer for
>>>>     FileChannel, that whole memcpy is skipped.  Try comparing
>>>>     FileChannel with HeapByteBuffer instead.
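
A rough way to try that comparison is sketched below: the same FileChannel read
loop is run once with a heap buffer (ByteBuffer.allocate) and once with a
direct buffer (ByteBuffer.allocateDirect). The class and method names
(BufferKindComparison, drain, timeBoth) are made up for illustration, and this
is not a proper benchmark: the second pass can be served from the OS page
cache, so the file should be large and the order of the two passes should be
swapped between runs before drawing any conclusion.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper for comparing heap and direct buffers with FileChannel. */
class BufferKindComparison {

    /** Reads the whole file through the given buffer; returns the bytes read. */
    static long drain(Path path, ByteBuffer buffer) throws IOException {
        long total = 0;
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            buffer.clear();
            int n;
            while ((n = ch.read(buffer)) != -1) {
                total += n;
                buffer.clear();   // discard the data; only the read cost matters here
            }
        }
        return total;
    }

    /** Times one pass with a heap buffer and one pass with a direct buffer. */
    static void timeBoth(Path path, int bufferSize) throws IOException {
        ByteBuffer[] buffers = {
            ByteBuffer.allocate(bufferSize),        // HeapByteBuffer
            ByteBuffer.allocateDirect(bufferSize)   // direct (off-heap) buffer
        };
        for (ByteBuffer buffer : buffers) {
            long start = System.nanoTime();
            long bytes = drain(path, buffer);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println((buffer.isDirect() ? "direct" : "heap")
                    + ": " + bytes + " bytes in " + millis + " ms");
        }
    }
}
```

It could be called as BufferKindComparison.timeBoth(Paths.get("some-large-file"), 8192),
where the file name is a placeholder.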
>>>>
>>>>
>>>>         On 27/10/2016 11:47, Vitaly Davidovich wrote:
>>>>>
>>>>>
>>>>>         On Thursday, October 27, 2016, Brunoais <brunoaiss at gmail.com> wrote:
>>>>>
>>>>>             Did you read the C code?
>>>>>
>>>>>         I looked at the Linux code in the JDK.
>>>>>
>>>>>             Have you got any idea how many functions Windows or
>>>>>             Linux (nearly all flavors) have for the read operation
>>>>>             towards a file?
>>>>>
>>>>>         I do.
>>>>>
>>>>>
>>>>>             I have already done that homework myself. I may not have
>>>>>             read the JVM's source code but I know well that there are
>>>>>             functions on both Windows and Linux that provide the kind of
>>>>>             interface I mentioned, although they require slightly
>>>>>             different treatment (and different constants).
>>>>>
>>>>>         You should read the JDK (native) source code instead of
>>>>>         guessing/assuming.  On Linux, it doesn't use aio facilities
>>>>>         for files.  The kernel io scheduler may issue readahead
>>>>>         behind the scenes, but there's no nonblocking file io that's
>>>>>         at the heart of your premise.
>>>>>
>>>>>
>>>>>
>>>>>             On 27/10/2016 00:06, Vitaly Davidovich wrote:
>>>>>
>>>>>
>>>>>
>>>>>                 On Wednesday, October 26, 2016, Brunoais
>>>>>                 <brunoaiss at gmail.com> wrote:
>>>>>
>>>>>                     It is actually based on the premise that:
>>>>>
>>>>>                     1. The first call to ReadableByteChannel.read(ByteBuffer)
>>>>>                        sets the OS buffer size to fill in as the same size
>>>>>                        as ByteBuffer.
>>>>>
>>>>>                 Why do you say that? AFAICT, it issues a read
>>>>>                 syscall and that will block if the data isn't in
>>>>>                 page cache.
>>>>>
>>>>>                     2. The consecutive calls to ReadableByteChannel.read(ByteBuffer)
>>>>>                        order the JVM to order the OS to execute memcpy() to copy
>>>>>                        from its memory to the shared memory created at ByteBuffer
>>>>>                        instantiation (in Java 8) using Unsafe, and then for the
>>>>>                        JVM to update the ByteBuffer fields.
>>>>>
>>>>>                 I think subsequent reads just invoke the same read
>>>>>                 syscall, passing the current file offset maintained
>>>>>                 by the file channel instance.
>>>>>
>>>>>                     3. The call will not block waiting for I/O and it won't take
>>>>>                        longer than the JNI interface if no new data exists.
>>>>>                        However, it will block waiting for the OS to execute
>>>>>                        memcpy() to the shared memory.
>>>>>
>>>>>                 So why do you think it won't block?
>>>>>
>>>>>
>>>>>                     Is my premise wrong?
>>>>>
>>>>>                     If I read correctly, if I don't use a DirectBuffer, there
>>>>>                     would be yet another intermediate buffer to copy data to
>>>>>                     before giving it to the "user", which would be useless.
>>>>>
>>>>>                 If you use a HeapByteBuffer, then there's an extra
>>>>>                 copy from the native buffer to the Java buffer.
>>>>>
>>>>>
>>>>>
>>>>>                     On 26/10/2016 11:57, Pavel Rappo wrote:
>>>>>
>>>>>                         I believe I see where you're coming from. Please
>>>>>                         correct me if I'm wrong.
>>>>>
>>>>>                         Your implementation is based on the premise that a
>>>>>                         call to ReadableByteChannel.read() _initiates_ the
>>>>>                         operation and returns immediately. The OS then
>>>>>                         continues to fill the buffer while there's free
>>>>>                         space in the buffer and the channel hasn't
>>>>>                         encountered EOF.
>>>>>
>>>>>                         Is that right?
>>>>>
>>>>>                             On 25 Oct 2016, at 22:16, Brunoais
>>>>>                 <brunoaiss at gmail.com>
>>>>>                             wrote:
>>>>>
>>>>>                             Thank you for your time. I'll try to explain it. I
>>>>>                             hope I can clear it up.
>>>>>                             First of all, I mixed up the meanings of
>>>>>                             asynchronous and non-blocking. This implementation
>>>>>                             uses a non-blocking algorithm internally while
>>>>>                             providing a blocking-like algorithm on the surface.
>>>>>                             It is single-threaded, not multi-threaded with one
>>>>>                             thread fetching data and blocking while waiting and
>>>>>                             another accumulating it and providing it to whoever
>>>>>                             wants it.
>>>>>
>>>>>                             Second, I made the mistake of going after
>>>>>                             BufferedReader instead of BufferedInputStream. If
>>>>>                             you want me to go after BufferedReader that's OK,
>>>>>                             but when I started the PoC I thought that going
>>>>>                             after BufferedInputStream would be more generically
>>>>>                             useful than BufferedReader.
>>>>>
>>>>>                             On to my code:
>>>>>                             Short answers:
>>>>>                                     • The sleep(int) exists because I don't
>>>>>                                       know how to wait until more data exists
>>>>>                                       in the buffer, which is part of read()'s
>>>>>                                       contract.
>>>>>                                     • The ByteBuffer gives a buffer that is
>>>>>                                       filled by the OS (what I believe Channels
>>>>>                                       do) instead of getting data only on
>>>>>                                       demand (what I believe Streams do).
>>>>>                             Full answers:
>>>>>                             The blockingFill(boolean) method performs a busy
>>>>>                             wait for a fill and is used exclusively by the
>>>>>                             read() method. All other methods use the version
>>>>>                             that does not sleep (fill(boolean)).
>>>>>                             blockingFill(boolean) exists in that form only
>>>>>                             because the read() method must not return unless
>>>>>                             either:
>>>>>
>>>>>                                     • The stream ended.
>>>>>                                     • The next byte is ready for reading.
>>>>>                             Additionally, statistically, that while loop's
>>>>>                             condition will rarely evaluate to true, as reads
>>>>>                             are in chunks, so readPos will be behind writePos
>>>>>                             most of the time.
>>>>>                             I have no idea if an interrupt will ever happen, to
>>>>>                             be honest. The main reasons why I'm using a sleep
>>>>>                             are that I didn't want to hog the CPU with a busy
>>>>>                             wait at full thread usage and that I didn't find
>>>>>                             any way of putting the thread to sleep so that it
>>>>>                             wakes up later, when the buffer managed by native
>>>>>                             code has more data.
>>>>>                             The non-blocking part is managed by the buffer the
>>>>>                             OS keeps filling most if not all of the time. That
>>>>>                             buffer is the field
>>>>>
>>>>>                             ByteBuffer readBuffer
>>>>>                             That's the gain over the plain old Buffered
>>>>>                             classes.
>>>>>
>>>>>
>>>>>                             Did that make sense to you? Feel free to ask
>>>>>                             anything else you need.
>>>>>
>>>>>                             On 25/10/2016 20:52, Pavel Rappo wrote:
>>>>>
>>>>>                                 I've skimmed through the code and I'm not sure
>>>>>                                 I can see any asynchronicity (you were pointing
>>>>>                                 at the lack of it in BufferedReader).
>>>>>                                 And the mechanics of this are very puzzling to
>>>>>                                 me, to be honest:
>>>>>                                      void blockingFill(boolean forced) throws IOException {
>>>>>                                          fill(forced);
>>>>>                                          while (readPos == writePos) {
>>>>>                                              try {
>>>>>                                                  Thread.sleep(100);
>>>>>                                              } catch (InterruptedException e) {
>>>>>                                                  // An interrupt may mean more data is available
>>>>>                                              }
>>>>>                                              fill(forced);
>>>>>                                          }
>>>>>                                      }
>>>>>                                 I thought you were suggesting that we should
>>>>>                                 utilize the tools the OS provides more
>>>>>                                 efficiently. Instead we have something that
>>>>>                                 looks very similar to a "busy loop" and... also,
>>>>>                                 who is supposed to interrupt Thread.sleep(),
>>>>>                                 and when?
>>>>>                                 Sorry, I'm not following. Could you please
>>>>>                                 explain how this is supposed to work?
>>>>>
>>>>>                                     On 24 Oct 2016, at 15:59, Brunoais
>>>>>                                     <brunoaiss at gmail.com> wrote:
>>>>>                                     Attached and sending!
>>>>>                                     On 24/10/2016 13:48, Pavel Rappo wrote:
>>>>>
>>>>>                                         Could you please send a new email on
>>>>>                                         this list with the source attached as
>>>>>                                         a text file?
>>>>>
>>>>>                                             On 23 Oct 2016, at 19:14, Brunoais
>>>>>                                             <brunoaiss at gmail.com> wrote:
>>>>>                                             Here's my poc/prototype:
>>>>>
>>>>>                                             http://pastebin.com/WRpYWDJF
>>>>>
>>>>>                                             I've implemented the bare minimum of the
>>>>>                                             class that follows the same contract as
>>>>>                                             BufferedReader while flagging in comments
>>>>>                                             all the issues I think it may have or has.
>>>>>                                             I also wrote some javadoc to help guide
>>>>>                                             readers through the class.
>>>>>                                             I could have used more fields from
>>>>>                                             BufferedReader, but the names were so
>>>>>                                             minimalistic that they were confusing me.
>>>>>                                             I intend to change them before sending
>>>>>                                             this to OpenJDK.
>>>>>                                             One of the major problems this has is long
>>>>>                                             overflow. It is major because it is
>>>>>                                             hidden, it will be extremely rare and it
>>>>>                                             takes a really long time to reproduce.
>>>>>                                             There are different ways of dealing with
>>>>>                                             it, from just documenting it to actually
>>>>>                                             writing code that copes with it.
>>>>>                                             I built some simple test code for it to
>>>>>                                             get some idea of performance and
>>>>>                                             correctness.
>>>>>
>>>>>                 http://pastebin.com/eh6LFgwT
>>>>>
>>>>>                                             This doesn't thoroughly test whether it is
>>>>>                                             actually working correctly, but I see no
>>>>>                                             reason for it not to work correctly after
>>>>>                                             fixing the 2 bugs that the test found.
>>>>>                                             I'll also leave here some conclusions
>>>>>                                             about speed and resource consumption I
>>>>>                                             found.
>>>>>                                             I made tests with default buffer sizes,
>>>>>                                             5000B, 15_000B and 500_000B. I noticed
>>>>>                                             that, with my hardware, with the
>>>>>                                             1 530 000 000B file, I was getting around:
>>>>>                                             In all buffers and fake work: 10~15s speed
>>>>>                                             improvement (from 90% HDD speed to 100%
>>>>>                                             HDD speed)
>>>>>                                             In all buffers and no fake work: 1~2s
>>>>>                                             speed improvement (from 90% HDD speed to
>>>>>                                             100% HDD speed)
>>>>>                                             Changing the buffer size was giving
>>>>>                                             different reading speeds, but both changed
>>>>>                                             by about the same amount when the buffer
>>>>>                                             size changed.
>>>>>                                             Finally, I could always confirm that I/O
>>>>>                                             was the slowest thing while this code was
>>>>>                                             running.
>>>>>                                             For the ones wondering about the file
>>>>>                                             size: it is both to avoid the OS cache and
>>>>>                                             to exercise the main use-case these
>>>>>                                             objects are for (large streams of bytes).
>>>>>                                             @Pavel, are you open for discussion now
>>>>>                                             ;)? Need anything else?
>>>>>                                             On 21/10/2016 19:21, Pavel Rappo wrote:
>>>>>
>>>>>                 Just to append to my previous email. BufferedReader wraps
>>>>>                 any Reader out there. Not specifically FileReader. While
>>>>>                 you're talking about the case of effective reading from a
>>>>>                 file.
>>>>>                 I guess there's one existing possibility to provide exactly
>>>>>                 what you need (as I understand it) under this method:
>>>>>                 /**
>>>>>                   * Opens a file for reading, returning a {@code BufferedReader}
>>>>>                   * to read text from the file in an efficient manner...
>>>>>                     ...
>>>>>                   */
>>>>>
>>>>>                 java.nio.file.Files#newBufferedReader(java.nio.file.Path)
>>>>>                 It can return _anything_ as long as it is a BufferedReader.
>>>>>                 We can do it, but it needs to be investigated not only for
>>>>>                 your favorite OS but for other OSes as well. Feel free to
>>>>>                 prototype this and we can discuss it on the list later.
>>>>>                 Thanks,
>>>>>                 -Pavel
>>>>>
>>>>>                     On 21 Oct 2016, at 18:56, Brunoais
>>>>>                     <brunoaiss at gmail.com>
>>>>>                       wrote:
>>>>>                     Pavel is right.
>>>>>                     In reality, I was expecting such
>>>>>                     BufferedReader to use only a
>>>>>                     single buffer and have that Buffer
>>>>>                     being filled asynchronously, not
>>>>>                     in a different Thread.
>>>>>                     Additionally, I don't have the
>>>>>                     intention of having a larger
>>>>>                     buffer than before unless stated
>>>>>                     through the API (the constructor).
>>>>>                     In my idea, internally, it is
>>>>>                     supposed to use
>>>>>                 java.nio.channels.AsynchronousFileChannel
>>>>>                     or equivalent.
>>>>>                     It does not prevent having two
>>>>>                     buffers and I do not intend to
>>>>>                     change BufferedReader itself. I'd
>>>>>                     do a BufferedAsyncReader of sorts
>>>>>                     (any name suggestion is welcome as
>>>>>                     I'm an awful namer).
>>>>>                     On 21/10/2016 18:38, Roger Riggs
>>>>>                     wrote:
>>>>>
>>>>>                         Hi Pavel,
>>>>>                         I think Brunoais is asking for a
>>>>>                         double buffering scheme in
>>>>>                         which the implementation of
>>>>>                         BufferedReader fills (a second
>>>>>                         buffer) in parallel with the
>>>>>                         application reading from the
>>>>>                         1st buffer
>>>>>                         and managing the swaps and
>>>>>                         async reads transparently.
>>>>>                         It would not change the API
>>>>>                         but would change the
>>>>>                         interactions between the
>>>>>                         buffered reader
>>>>>                         and the underlying stream.  It
>>>>>                         would also increase memory
>>>>>                         requirements and processing
>>>>>                         by introducing or using a
>>>>>                         separate thread and the
>>>>>                         necessary synchronization.
>>>>>                         Though I think the formal
>>>>>                         interface semantics could be
>>>>>                         maintained, I have doubts
>>>>>                         about compatibility and its
>>>>>                         unintended consequences on
>>>>>                         existing subclasses,
>>>>>                         applications and libraries.
>>>>>                         $.02, Roger
>>>>>                         On 10/21/16 1:22 PM, Pavel
>>>>>                         Rappo wrote:
>>>>>
>>>>>                             Off the top of my head, I
>>>>>                             would say it's not
>>>>>                             possible to change the
>>>>>                             design of an
>>>>>                             _extensible_ type that has
>>>>>                             been out there for 20 or
>>>>>                             so years. All these I/O
>>>>>                             streams from java.io were
>>>>>                             designed for simple
>>>>>                             synchronous use case.
>>>>>                             It's not that their design
>>>>>                             is flawed in some way,
>>>>>                             it's that they don't seem to
>>>>>                             suit your needs. Have you
>>>>>                             considered using
>>>>>                 java.nio.channels.AsynchronousFileChannel
>>>>>                             in your applications?
>>>>>                             -Pavel
>>>>>
>>>>>                                 On 21 Oct 2016, at
>>>>>                                 17:08, Brunoais
>>>>>                                 <brunoaiss at gmail.com>
>>>>>                                 wrote:
>>>>>                                 Any feedback on this?
>>>>>                                 I'm really interested
>>>>>                                 in implementing such
>>>>>                 BufferedReader/BufferedStreamReader
>>>>>                                 to allow speeding up
>>>>>                                 my applications
>>>>>                                 without having to
>>>>>                                 think in an
>>>>>                                 asynchronous way or
>>>>>                                 multi-threading while
>>>>>                                 programming with it.
>>>>>                                 That's why I'm asking
>>>>>                                 this here.
>>>>>                                 On 13/10/2016 14:45,
>>>>>                                 Brunoais wrote:
>>>>>
>>>>>                                     Hi,
>>>>>                                     I looked at the BufferedReader source
>>>>>                                     code for Java 9 along with the source
>>>>>                                     code of the channels/streams used. I
>>>>>                                     noticed that, like in Java 7,
>>>>>                                     BufferedReader does not use an async
>>>>>                                     API to load data from files; instead,
>>>>>                                     the data loading is all done
>>>>>                                     synchronously, even when the OS allows
>>>>>                                     requesting a file to be read and
>>>>>                                     getting a notification later when the
>>>>>                                     file has effectively been read.
>>>>>                                     Why is BufferedReader not async while
>>>>>                                     providing a sync API?
>>>>>
>>>>> <BufferedNonBlockStream.java><Tests.java>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>                 --                 Sent from my phone
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>         --         Sent from my phone
>>>>
>>>>
>>>
>>>
>>
>