Request/discussion: BufferedReader reading using async API while providing sync API

David Holmes david.holmes at oracle.com
Thu Oct 27 21:09:03 UTC 2016


You might try discussing on net-dev rather than core-libs-dev, to get 
additional historical info related to the io and nio file APIs.

David

On 28/10/2016 5:08 AM, Brunoais wrote:
> You are right. Even in windows it does not set the flags for async
> reads. It seems like it is windows itself that does the decision to
> buffer the contents based on its own heuristics.
>
> But... Why? Why won't it be? Why is there no API for it? How am I
> getting 100% HDD use and faster times when I fake work to delay getting
> more data and I only have a fluctuating 60-90% (always going up and
> down) when I use an InputStream?
> Is it related to how both classes cache and how frequently and how much
> each one asks for data?
>
> I really would prefer not having to read the source code because it
> takes a real long time T.T.
>
> I end up reinstating... And wondering...
>
> Why doesn't java provide a single-threaded non-block API for file reads
> for all OS that support it? I simply cannot find that information no
> matter how much I search on google, bing, duck duck go... Can any of you
> point me to whomever knows?
>
> On 27/10/2016 14:11, Vitaly Davidovich wrote:
>> I don't know about Windows specifically, but generally file systems
>> across major OS's will implement readahead in their IO scheduler when
>> they detect sequential scans.
>>
>> On Linux, you can also strace your test to confirm which syscalls are
>> emitted (you should be seeing plain read()'s there, with
>> FileInputStream and FileChannel).
>>
>> On Thu, Oct 27, 2016 at 9:06 AM, Brunoais <brunoaiss at gmail.com
>> <mailto:brunoaiss at gmail.com>> wrote:
>>
>>     Thanks for the heads up.
>>
>>     I'll try that later. These tests are still useful then. Meanwhile,
>>     I'll end up also checking how FileChannel queries the OS on
>>     windows. I'm getting 100% HDD reads... Could it be that the OS
>>     reads the file ahead on its own?... Anyway, I'll look into it.
>>     Thanks for the heads up.
>>
>>
>>     On 27/10/2016 13:53, Vitaly Davidovich wrote:
>>>
>>>
>>>     On Thu, Oct 27, 2016 at 8:34 AM, Brunoais <brunoaiss at gmail.com
>>>     <mailto:brunoaiss at gmail.com>> wrote:
>>>
>>>         Oh... I see. In that case, it means something is terribly
>>>         wrong. It can be my initial tests, though.
>>>
>>>         I'm testing on both linux and windows and I'm getting
>>>         performance gains from using the FileChannel compared to
>>>         using FileInputStream... The tests also make sense based on
>>>         my predictions O_O...
>>>
>>>     FileInputStream requires copying native buffers holding the read
>>>     data to the java byte[].  If you're using direct ByteBuffer for
>>>     FileChannel, that whole memcpy is skipped.  Try comparing
>>>     FileChannel with HeapByteBuffer instead.
>>>
>>>
>>>         On 27/10/2016 11:47, Vitaly Davidovich wrote:
>>>>
>>>>
>>>>         On Thursday, October 27, 2016, Brunoais <brunoaiss at gmail.com
>>>>         <mailto:brunoaiss at gmail.com>> wrote:
>>>>
>>>>             Did you read the C code?
>>>>
>>>>         I looked at the Linux code in the JDK.
>>>>
>>>>             Have you got any idea how many functions Windows or
>>>>             Linux (nearly all flavors) have for the read operation
>>>>             towards a file?
>>>>
>>>>         I do.
>>>>
>>>>
>>>>             I have already done that homework myself. I may not have
>>>>             read JVM's source code but I know well that there's
>>>>             functions on both Windows and Linux that provide such
>>>>             interface I mentioned although they require a slightly
>>>>             different treatment (and different constants).
>>>>
>>>>         You should read the JDK (native) source code instead of
>>>>         guessing/assuming.  On Linux, it doesn't use aio facilities
>>>>         for files.  The kernel io scheduler may issue readahead
>>>>         behind the scenes, but there's no nonblocking file io that's
>>>>         at the heart of your premise.
>>>>
>>>>
>>>>
>>>>             On 27/10/2016 00:06, Vitaly Davidovich wrote:
>>>>
>>>>
>>>>
>>>>                 On Wednesday, October 26, 2016, Brunoais
>>>>                 <brunoaiss at gmail.com <mailto:brunoaiss at gmail.com>>
>>>>                 wrote:
>>>>
>>>>                     It is actually based on the premise that:
>>>>
>>>>                     1. The first call to
>>>>                 ReadableByteChannel.read(ByteBuffer) sets the OS
>>>>                        buffer size to fill in as the same size as
>>>>                 ByteBuffer.
>>>>
>>>>                 Why do you say that? AFAICT, it issues a read
>>>>                 syscall and that will block if the data isn't in
>>>>                 page cache.
>>>>
>>>>                     2. The consecutive calls to
>>>>                 ReadableByteChannel.read(ByteBuffer)
>>>>                     orders
>>>>                        the JVM to order the OS to execute memcpy()
>>>>                 to copy from its memory
>>>>                        to the shared memory created at ByteBuffer
>>>>                 instantiation (in
>>>>                     java 8)
>>>>                        using Unsafe and then for the JVM to update
>>>>                 the ByteBuffer fields.
>>>>
>>>>                 I think subsequent reads just invoke the same read
>>>>                 syscall, passing the current file offset maintained
>>>>                 by the file channel instance.
>>>>
>>>>                     3. The call will not block waiting for I/O and
>>>>                 it won't take longer
>>>>                        than the JNI interface if no new data exists.
>>>>                 However, it will
>>>>                     block
>>>>                        waiting for the OS to execute memcpy() to the
>>>>                 shared memory.
>>>>
>>>>                 So why do you think it won't block?
>>>>
>>>>
>>>>                     Is my premise wrong?
>>>>
>>>>                     If I read correctly, if I don't use a
>>>>                 DirectBuffer, there would be
>>>>                     even another intermediate buffer to copy data to
>>>>                 before giving it
>>>>                     to the "user" which would be useless.
>>>>
>>>>                 If you use a HeapByteBuffer, then there's an extra
>>>>                 copy from the native buffer to the Java buffer.
>>>>
>>>>
>>>>
>>>>                     On 26/10/2016 11:57, Pavel Rappo wrote:
>>>>
>>>>                         I believe I see where you coming from.
>>>>                 Please correct me if
>>>>                         I'm wrong.
>>>>
>>>>                         Your implementation is based on the premise
>>>>                 that a call to
>>>>                 ReadableByteChannel.read()
>>>>                         _initiates_ the operation and returns
>>>>                 immediately. The OS then
>>>>                         continues to fill
>>>>                         the buffer while there's a free space in the
>>>>                 buffer and the
>>>>                         channel hasn't encountered EOF.
>>>>
>>>>                         Is that right?
>>>>
>>>>                             On 25 Oct 2016, at 22:16, Brunoais
>>>>                 <brunoaiss at gmail.com>
>>>>                             wrote:
>>>>
>>>>                             Thank you for your time. I'll try to
>>>>                 explain it. I hope I
>>>>                             can clear it up.
>>>>                             First of it, I made a meaning mistake
>>>>                 between asynchronous
>>>>                             and non-blocking. This implementation
>>>>                 uses a non-blocking
>>>>                             algorithm internally while providing a
>>>>                 blocking-like
>>>>                             algorithm on the surface. It is
>>>>                 single-threaded and not
>>>>                             multi-threaded where one thread fetches
>>>>                 data and blocks
>>>>                             waiting and the other accumulates it and
>>>>                 provides to
>>>>                             whichever wants it.
>>>>
>>>>                             Second of it, I had made a mistake of
>>>>                 going after
>>>>                             BufferedReader instead of going after
>>>>                 BufferedInputStream.
>>>>                             If you want me to go after
>>>>                 BufferedReader it's ok but I
>>>>                             only thought that going after
>>>>                 BufferedInputStream would be
>>>>                             more generically useful than
>>>>                 BufferedReaderwhen I started
>>>>                             the poc.
>>>>
>>>>                             On to my code:
>>>>                             Short answers:
>>>>                                     • The sleep(int) exists because
>>>>                 I don't know how
>>>>                             to wait until more data exists in the
>>>>                 buffer which is part
>>>>                             of read()'s contract.
>>>>                                     • The ByteBuffer gives a buffer
>>>>                 that is filled by
>>>>                             the OS (what I believe Channels do)
>>>>                 instead of getting
>>>>                             data only         by demand (what I
>>>>                 believe Streams do).
>>>>                             Full answers:
>>>>                             The blockingFill(boolean) method is a
>>>>                 method for a busy
>>>>                             wait for a fill which is used
>>>>                 exclusively by the read()
>>>>                             method. All other methods use the
>>>>                 version that does not
>>>>                             sleep (fill(boolean)).
>>>>                 blockingFill(boolean)'s existance like that is only
>>>>                             because the read() method must not
>>>>                 return unless either:
>>>>
>>>>                                     • The stream ended.
>>>>                                     • The next byte is ready for
>>>>                 reading.
>>>>                             Additionally, statistically, that while
>>>>                 loop will rarely
>>>>                             evaluate to true as reads are in chunks
>>>>                 so readPos will be
>>>>                             behind writePos most of the time.
>>>>                             I have no idea if an interrupt will ever
>>>>                 happen, to be
>>>>                             honest. The main reasons why I'm using a
>>>>                 sleep is because
>>>>                             I didn't want a hog onto the CPU in a
>>>>                 full thread usage
>>>>                             busy wait and because I didn't find any
>>>>                 way of doing a
>>>>                             thread sleep in order to wake up later
>>>>                 when the buffer
>>>>                             managed by native code has more data.
>>>>                             The Non-blocking part is managed by the
>>>>                 buffer the OS
>>>>                             keeps filling most if not all the time.
>>>>                 That buffer is the
>>>>                             field
>>>>
>>>>                             ByteBuffer readBuffer
>>>>                             That's the gaining part against the
>>>>                 plain old Buffered
>>>>                             classes.
>>>>
>>>>
>>>>                             Did that make sense to you? Feel free to
>>>>                 ask anything else
>>>>                             you need.
>>>>
>>>>                             On 25/10/2016 20:52, Pavel Rappo wrote:
>>>>
>>>>                                 I've skimmed through the code and
>>>>                 I'm not sure I can
>>>>                                 see any asynchronicity
>>>>                                 (you were pointing at the lack of it
>>>>                 in BufferedReader).
>>>>                                 And the mechanics of this is very
>>>>                 puzzling to me, to
>>>>                                 be honest:
>>>>                                      void blockingFill(boolean
>>>>                 forced) throws
>>>>                                 IOException {
>>>>                  fill(forced);
>>>>                                          while (readPos == writePos) {
>>>>                                              try {
>>>>                  Thread.sleep(100);
>>>>                                              } catch
>>>>                 (InterruptedException e) {
>>>>                  // An interrupt may mean more data is
>>>>                                 available
>>>>                                              }
>>>>                  fill(forced);
>>>>                                          }
>>>>                                      }
>>>>                                 I thought you were suggesting that
>>>>                 we should utilize
>>>>                                 the tools which OS provides
>>>>                                 more efficiently. Instead we have
>>>>                 something that looks
>>>>                                 very similarly to a
>>>>                                 "busy loop" and... also who and when
>>>>                 is supposed to
>>>>                                 interrupt Thread.sleep()?
>>>>                                 Sorry, I'm not following. Could you
>>>>                 please explain how
>>>>                                 this is supposed to work?
>>>>
>>>>                                     On 24 Oct 2016, at 15:59, Brunoais
>>>>                                     <brunoaiss at gmail.com>
>>>>                                       wrote:
>>>>                                     Attached and sending!
>>>>                                     On 24/10/2016 13:48, Pavel Rappo
>>>>                 wrote:
>>>>
>>>>                                         Could you please send a new
>>>>                 email on this list
>>>>                                         with the source attached as a
>>>>                                         text file?
>>>>
>>>>                                             On 23 Oct 2016, at
>>>>                 19:14, Brunoais
>>>>                                             <brunoaiss at gmail.com>
>>>>                 wrote:
>>>>                 Here's my poc/prototype:
>>>>
>>>>                 http://pastebin.com/WRpYWDJF
>>>>
>>>>                                             I've implemented the
>>>>                 bare minimum of the
>>>>                 class that follows the same contract of
>>>>                 BufferedReader while signaling all issues
>>>>                                             I think it may have or
>>>>                 has in comments.
>>>>                                             I also wrote some
>>>>                 javadoc to help guiding
>>>>                 through the class.
>>>>                                             I could have used more
>>>>                 fields from
>>>>                 BufferedReader but the names were so
>>>>                 minimalistic that were confusing me. I
>>>>                 intent to change them before sending this
>>>>                                             to openJDK.
>>>>                                             One of the major
>>>>                 problems this has is long
>>>>                 overflowing. It is major because it is
>>>>                 hidden, it will be extremely rare and it
>>>>                 takes a really long time to reproduce.
>>>>                 There are different ways of dealing with
>>>>                                             it. From just
>>>>                 documenting to actually
>>>>                 making code that works with it.
>>>>                                             I built a simple test
>>>>                 code for it to have
>>>>                                             some ideas about
>>>>                 performance and correctness.
>>>>
>>>>                 http://pastebin.com/eh6LFgwT
>>>>
>>>>                                             This doesn't do a
>>>>                 through test if it is
>>>>                 actually working correctly but I see no
>>>>                 reason for it not working correctly after
>>>>                 fixing the 2 bugs that test found.
>>>>                                             I'll also leave here
>>>>                 some conclusions
>>>>                 about speed and resource consumption I found.
>>>>                                             I made tests with
>>>>                 default buffer sizes,
>>>>                 5000B 15_000B and 500_000B. I noticed
>>>>                 that, with my hardware, with the 1 530 000
>>>>                                             000B file, I was getting
>>>>                 around:
>>>>                                             In all buffers and fake
>>>>                 work: 10~15s speed
>>>>                 improvement ( from 90% HDD speed to 100%
>>>>                                             HDD speed)
>>>>                                             In all buffers and no
>>>>                 fake work: 1~2s
>>>>                 speed improvement ( from 90% HDD speed to
>>>>                                             100% HDD speed)
>>>>                 Changing the buffer size was giving
>>>>                 different reading speeds but both were
>>>>                 quite equal in how much they would change
>>>>                                             when changing the buffer
>>>>                 size.
>>>>                 Finally, I could always confirm that I/O
>>>>                                             was always the slowest
>>>>                 thing while this
>>>>                                             code was running.
>>>>                                             For the ones wondering
>>>>                 about the file
>>>>                 size; it is both to avoid OS cache and to
>>>>                                             make the reading at the
>>>>                 main use-case
>>>>                 these objects are for (large streams of
>>>>                 bytes).
>>>>                 @Pavel, are you open for discussion now
>>>>                                             ;)? Need anything else?
>>>>                                             On 21/10/2016 19:21,
>>>>                 Pavel Rappo wrote:
>>>>
>>>>                 Just to append to my previous email.
>>>>                 BufferedReader wraps any Reader out there.
>>>>                 Not specifically FileReader. While
>>>>                 you're talking about the case of effective
>>>>                 reading from a file.
>>>>                 I guess there's one existing
>>>>                 possibility to provide exactly what
>>>>                 you need (as I
>>>>                 understand it) under this method:
>>>>                 /**
>>>>                   * Opens a file for reading,
>>>>                 returning a {@code BufferedReader} to
>>>>                 read text
>>>>                   * from the file in an efficient
>>>>                 manner...
>>>>                     ...
>>>>                   */
>>>>
>>>> java.nio.file.Files#newBufferedReader(java.nio.file.Path)
>>>>                 It can return _anything_ as long as it
>>>>                 is a BufferedReader. We can do it, but it
>>>>                 needs to be investigated not only for
>>>>                 your favorite OS but for other OSes as
>>>>                 well. Feel free to prototype this and
>>>>                 we can discuss it on the list later.
>>>>                 Thanks,
>>>>                 -Pavel
>>>>
>>>>                     On 21 Oct 2016, at 18:56, Brunoais
>>>>                     <brunoaiss at gmail.com>
>>>>                       wrote:
>>>>                     Pavel is right.
>>>>                     In reality, I was expecting such
>>>>                     BufferedReader to use only a
>>>>                     single buffer and have that Buffer
>>>>                     being filled asynchronously, not
>>>>                     in a different Thread.
>>>>                     Additionally, I don't have the
>>>>                     intention of having a larger
>>>>                     buffer than before unless stated
>>>>                     through the API (the constructor).
>>>>                     In my idea, internally, it is
>>>>                     supposed to use
>>>>                 java.nio.channels.AsynchronousFileChannel
>>>>                     or equivalent.
>>>>                     It does not prevent having two
>>>>                     buffers and I do not intent to
>>>>                     change BufferedReader itself. I'd
>>>>                     do an BufferedAsyncReader of sorts
>>>>                     (any name suggestion is welcome as
>>>>                     I'm an awful namer).
>>>>                     On 21/10/2016 18:38, Roger Riggs
>>>>                     wrote:
>>>>
>>>>                         Hi Pavel,
>>>>                         I think Brunoais asking for a
>>>>                         double buffering scheme in
>>>>                         which the implementation of
>>>>                         BufferReader fills (a second
>>>>                         buffer) in parallel with the
>>>>                         application reading from the
>>>>                         1st buffer
>>>>                         and managing the swaps and
>>>>                         async reads transparently.
>>>>                         It would not change the API
>>>>                         but would change the
>>>>                         interactions between the
>>>>                         buffered reader
>>>>                         and the underlying stream.  It
>>>>                         would also increase memory
>>>>                         requirements and processing
>>>>                         by introducing or using a
>>>>                         separate thread and the
>>>>                         necessary synchronization.
>>>>                         Though I think the formal
>>>>                         interface semantics could be
>>>>                         maintained, I have doubts
>>>>                         about compatibility and its
>>>>                         unintended consequences on
>>>>                         existing subclasses,
>>>>                         applications and libraries.
>>>>                         $.02, Roger
>>>>                         On 10/21/16 1:22 PM, Pavel
>>>>                         Rappo wrote:
>>>>
>>>>                             Off the top of my head, I
>>>>                             would say it's not
>>>>                             possible to change the
>>>>                             design of an
>>>>                             _extensible_ type that has
>>>>                             been out there for 20 or
>>>>                             so years. All these I/O
>>>>                             streams from java.io <http://java.io>
>>>>                             <http://java.io> were
>>>>                             designed for simple
>>>>                             synchronous use case.
>>>>                             It's not that their design
>>>>                             is flawed in some way,
>>>>                             it's that they doesn't seem to
>>>>                             suit your needs. Have you
>>>>                             considered using
>>>>                 java.nio.channels.AsynchronousFileChannel
>>>>                             in your applications?
>>>>                             -Pavel
>>>>
>>>>                                 On 21 Oct 2016, at
>>>>                                 17:08, Brunoais
>>>>                                 <brunoaiss at gmail.com>
>>>>                                   wrote:
>>>>                                 Any feedback on this?
>>>>                                 I'm really interested
>>>>                                 in implementing such
>>>>                 BufferedReader/BufferedStreamReader
>>>>                                 to allow speeding up
>>>>                                 my applications
>>>>                                 without having to
>>>>                                 think in an
>>>>                                 asynchronous way or
>>>>                                 multi-threading while
>>>>                                 programming with it.
>>>>                                 That's why I'm asking
>>>>                                 this here.
>>>>                                 On 13/10/2016 14:45,
>>>>                                 Brunoais wrote:
>>>>
>>>>                                     Hi,
>>>>                                     I looked at
>>>>                 BufferedReader
>>>>                                     source code for
>>>>                                     java 9 long with
>>>>                                     the source code of
>>>>                                     the
>>>>                 channels/streams
>>>>                                     used. I noticed
>>>>                                     that, like in java
>>>>                                     7, BufferedReader
>>>>                                     does not use an
>>>>                                     Async API to load
>>>>                                     data from files,
>>>>                                     instead, the data
>>>>                                     loading is all
>>>>                                     done synchronously
>>>>                                     even when the OS
>>>>                                     allows requesting
>>>>                                     a file to be read
>>>>                                     and getting a
>>>>                                     warning later when
>>>>                                     the file is
>>>>                                     effectively read.
>>>>                                     Why Is
>>>>                 BufferedReader not
>>>>                                     async while
>>>>                                     providing a sync API?
>>>>
>>>>                 <BufferedNonBlockStream.java><Tests.java>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>                 --                 Sent from my phone
>>>>
>>>>
>>>>
>>>>
>>>>         --         Sent from my phone
>>>
>>>
>>
>>
>


More information about the core-libs-dev mailing list