Request/discussion: BufferedReader reading using async API while providing sync API

Brunoais brunoaiss at gmail.com
Thu Oct 27 12:34:05 UTC 2016


Oh... I see. In that case, something is terribly wrong. It could be my
initial tests, though.

I'm testing on both Linux and Windows, and I'm getting performance gains
from using FileChannel compared to using FileInputStream... The
tests also make sense based on my predictions O_O...


On 27/10/2016 11:47, Vitaly Davidovich wrote:
>
>
> On Thursday, October 27, 2016, Brunoais <brunoaiss at gmail.com> wrote:
>
>     Did you read the C code?
>
> I looked at the Linux code in the JDK.
>
>     Have you got any idea how many functions Windows or Linux (nearly
>     all flavors) have for the read operation towards a file?
>
> I do.
>
>
>     I have already done that homework myself. I may not have read the
>     JVM's source code, but I know well that there are functions on both
>     Windows and Linux that provide the interface I mentioned, although
>     they require slightly different treatment (and different constants).
>
> You should read the JDK (native) source code instead of
> guessing/assuming. On Linux, it doesn't use AIO facilities for
> files. The kernel I/O scheduler may issue readahead behind the scenes,
> but there's no non-blocking file I/O, which is at the heart of your premise.
>
>
>
>     On 27/10/2016 00:06, Vitaly Davidovich wrote:
>
>
>
>         On Wednesday, October 26, 2016, Brunoais <brunoaiss at gmail.com> wrote:
>
>             It is actually based on the premise that:
>
>             1. The first call to ReadableByteChannel.read(ByteBuffer) sets
>                the size of the OS buffer to fill to the size of the ByteBuffer.
>
>         Why do you say that? AFAICT, it issues a read syscall and that
>         will block if the data isn't in page cache.
>
>             2. Subsequent calls to ReadableByteChannel.read(ByteBuffer)
>                order the JVM to have the OS execute memcpy() to copy from its
>                memory to the shared memory created when the ByteBuffer is
>                instantiated (in Java 8) using Unsafe, and then the JVM
>                updates the ByteBuffer fields.
>
>         I think subsequent reads just invoke the same read syscall,
>         passing the current file offset maintained by the file channel
>         instance.
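A minimal sketch of the behaviour described above, assuming a regular local file (the class and method names are made up for this example): each FileChannel.read(ByteBuffer) call returns only after bytes have been copied into the buffer (or -1 at end of file), and the channel's position() advances by the amount read, so the next call continues from there.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads a file chunk by chunk and records the
// channel position after every read() call.
class SequentialChannelRead {
    static List<Long> positionsAfterEachRead(Path file, int chunkSize) throws IOException {
        List<Long> positions = new ArrayList<>();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocateDirect(chunkSize);
            // Each call blocks until some bytes have been copied into buf,
            // or returns -1 at end of file; nothing keeps filling buf afterwards.
            while (ch.read(buf) > 0) {
                positions.add(ch.position()); // advanced by the bytes just read
                buf.clear();                  // reuse the buffer for the next chunk
            }
        }
        return positions;
    }
}
```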
>
>             3. The call will not block waiting for I/O, and it won't take
>                longer than the JNI interface if no new data exists. However,
>                it will block waiting for the OS to execute memcpy() to the
>                shared memory.
>
>         So why do you think it won't block?
>
>
>             Is my premise wrong?
>
>             If I read correctly, if I don't use a DirectBuffer, there would be
>             yet another intermediate buffer to copy data into before giving it
>             to the "user", which would be useless.
>
>         If you use a HeapByteBuffer, then there's an extra copy from
>         the native buffer to the Java buffer.
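To illustrate the direct-versus-heap distinction, here is a sketch (the class name is hypothetical) that reads a file through a buffer obtained from ByteBuffer.allocateDirect. With a direct buffer the native read can target the buffer's own memory; with ByteBuffer.allocate the extra copy described above applies. This only shows API usage and makes no claim about how much that copy costs.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: counts the bytes of a file, reading through a
// direct buffer so no intermediate heap copy is needed.
class DirectBufferRead {
    static long countBytes(Path file, int capacity) throws IOException {
        ByteBuffer direct = ByteBuffer.allocateDirect(capacity);
        long total = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            int n;
            while ((n = ch.read(direct)) != -1) {
                total += n;
                // A real reader would flip() and consume the bytes here;
                // this sketch only counts them, so the buffer is just reset.
                direct.clear();
            }
        }
        return total;
    }
}
```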
>
>
>
>             On 26/10/2016 11:57, Pavel Rappo wrote:
>
>                 I believe I see where you're coming from. Please correct me
>                 if I'm wrong.
>
>                 Your implementation is based on the premise that a call to
>                 ReadableByteChannel.read() _initiates_ the operation and
>                 returns immediately. The OS then continues to fill the
>                 buffer while there's free space in the buffer and the
>                 channel hasn't encountered EOF.
>
>                 Is that right?
>
>                     On 25 Oct 2016, at 22:16, Brunoais
>         <brunoaiss at gmail.com>
>                     wrote:
>
>                     Thank you for your time. I'll try to explain it; I hope I
>                     can clear it up.
>                     First of all, I mixed up the meanings of asynchronous and
>                     non-blocking. This implementation uses a non-blocking
>                     algorithm internally while providing a blocking-like
>                     algorithm on the surface. It is single-threaded, not
>                     multi-threaded with one thread fetching data and blocking
>                     while the other accumulates it and provides it to
>                     whoever wants it.
>
>                     Second, I made the mistake of going after BufferedReader
>                     instead of BufferedInputStream. If you want me to go after
>                     BufferedReader, that's OK, but when I started the PoC I
>                     thought that going after BufferedInputStream would be
>                     more generically useful than BufferedReader.
>
>                     On to my code:
>                     Short answers:
>                             • The sleep(int) exists because I don't know how
>                               to wait until more data exists in the buffer,
>                               which is part of read()'s contract.
>                             • The ByteBuffer gives a buffer that is filled by
>                               the OS (what I believe Channels do) instead of
>                               getting data only on demand (what I believe
>                               Streams do).
>                     Full answers:
>                     The blockingFill(boolean) method busy-waits for a fill and
>                     is used exclusively by the read() method. All other
>                     methods use the version that does not sleep
>                     (fill(boolean)).
>                     blockingFill(boolean) exists in that form only because
>                     the read() method must not return unless either:
>
>                             • The stream ended.
>                             • The next byte is ready for reading.
>                     Additionally, statistically, that while loop's condition
>                     will rarely be true, since reads happen in chunks, so
>                     readPos will be behind writePos most of the time.
>                     To be honest, I have no idea whether an interrupt will
>                     ever happen. The main reasons I'm using a sleep are that
>                     I didn't want to hog the CPU with a busy wait using a
>                     full thread, and that I didn't find any way of putting
>                     the thread to sleep so that it wakes up later when the
>                     buffer managed by native code has more data.
>                     The non-blocking part is managed by the buffer the OS
>                     keeps filling most, if not all, of the time. That buffer
>                     is the field
>
>                     ByteBuffer readBuffer
>                     That's where the gain over the plain old Buffered
>                     classes comes from.
>
>
>                     Did that make sense to you? Feel free to ask
>         anything else
>                     you need.
>
>                     On 25/10/2016 20:52, Pavel Rappo wrote:
>
>                         I've skimmed through the code and I'm not sure I can
>                         see any asynchronicity (you were pointing at the lack
>                         of it in BufferedReader). And the mechanics of this
>                         are very puzzling to me, to be honest:
>                              void blockingFill(boolean forced) throws IOException {
>                                  fill(forced);
>                                  while (readPos == writePos) {
>                                      try {
>                                          Thread.sleep(100);
>                                      } catch (InterruptedException e) {
>                                          // An interrupt may mean more data is available
>                                      }
>                                      fill(forced);
>                                  }
>                              }
>                         I thought you were suggesting that we should utilize
>                         the tools the OS provides more efficiently. Instead we
>                         have something that looks very similar to a "busy
>                         loop" and... also, who is supposed to interrupt
>                         Thread.sleep(), and when?
>                         Sorry, I'm not following. Could you please explain
>                         how this is supposed to work?
>
>                             On 24 Oct 2016, at 15:59, Brunoais
>                             <brunoaiss at gmail.com>
>                               wrote:
>                             Attached and sending!
>                             On 24/10/2016 13:48, Pavel Rappo wrote:
>
>                                 Could you please send a new email on
>         this list
>                                 with the source attached as a
>                                 text file?
>
>                                     On 23 Oct 2016, at 19:14, Brunoais
>                                     <brunoaiss at gmail.com>
>                                       wrote:
>                                     Here's my poc/prototype:
>
>         http://pastebin.com/WRpYWDJF
>
>                                     I've implemented the bare minimum of the
>                                     class that follows the same contract as
>                                     BufferedReader, while flagging in
>                                     comments every issue I think it has or
>                                     may have.
>                                     I also wrote some javadoc to help guide
>                                     readers through the class.
>                                     I could have used more fields from
>                                     BufferedReader, but their names were so
>                                     minimalistic that they were confusing
>                                     me. I intend to change them before
>                                     sending this to OpenJDK.
>                                     One of the major problems this has is
>                                     long overflow. It is major because it is
>                                     hidden, it will be extremely rare, and it
>                                     takes a really long time to reproduce.
>                                     There are different ways of dealing with
>                                     it, from just documenting it to actually
>                                     writing code that handles it.
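The PoC source isn't reproduced in this thread, so this is only a hypothetical sketch of the "code that handles it" option, assuming readPos and writePos are ever-increasing long byte counters. Comparing the counters by subtraction, rather than ordering them with <, stays correct after writePos wraps past Long.MAX_VALUE, and masking with a power-of-two capacity keeps the buffer index continuous across the wrap. WrappingCounters and its methods are illustrative names.

```java
// Hypothetical helper for overflow-tolerant long read/write counters.
class WrappingCounters {
    // Bytes written but not yet read. Two's-complement subtraction keeps
    // this correct even after writePos wraps, as long as the true distance
    // between the counters fits in a long.
    static long available(long readPos, long writePos) {
        return writePos - readPos;
    }

    // Buffer slot for a counter value. The capacity must be a power of two
    // so that Long.MAX_VALUE and Long.MIN_VALUE map to adjacent slots.
    static int index(long pos, int capacity) {
        if (Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        return (int) (pos & (capacity - 1));
    }
}
```

For example, with readPos = Long.MAX_VALUE - 2 and writePos = readPos + 5 (which overflows to Long.MIN_VALUE + 2), available() still returns 5, whereas readPos < writePos evaluates to false.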
>                                     I built some simple test code to get an
>                                     idea of its performance and correctness.
>
>         http://pastebin.com/eh6LFgwT
>
>                                     This doesn't thoroughly test whether it
>                                     actually works correctly, but I see no
>                                     reason for it not to work correctly
>                                     after fixing the 2 bugs that test found.
>                                     I'll also leave here some conclusions I
>                                     reached about speed and resource
>                                     consumption.
>                                     I ran tests with the default buffer
>                                     sizes and with 5,000 B, 15,000 B and
>                                     500,000 B buffers. I noticed that, with
>                                     my hardware and the 1,530,000,000 B
>                                     file, I was getting around:
>                                     With all buffers and fake work: a
>                                     10~15 s speed improvement (from 90% of
>                                     HDD speed to 100% of HDD speed)
>                                     With all buffers and no fake work: a
>                                     1~2 s speed improvement (from 90% of HDD
>                                     speed to 100% of HDD speed)
>                                     Changing the buffer size gave different
>                                     reading speeds, but both implementations
>                                     changed by about the same amount when
>                                     the buffer size changed.
>                                     Finally, I could always confirm that I/O
>                                     was the slowest thing while this code
>                                     was running.
>                                     For those wondering about the file size:
>                                     it is both to avoid the OS cache and to
>                                     exercise the main use case these
>                                     objects are for (large streams of
>                                     bytes).
>                                     @Pavel, are you open for discussion now
>                                     ;)? Need anything else?
>                                     On 21/10/2016 19:21, Pavel Rappo
>         wrote:
>
>                                         Just to append to my previous email:
>                                         BufferedReader wraps any Reader out
>                                         there, not specifically FileReader,
>                                         whereas you're talking about the
>                                         specific case of reading efficiently
>                                         from a file.
>                                         I guess there's one existing
>                                         possibility to provide exactly what
>                                         you need (as I understand it), under
>                                         this method:
>                                         /**
>                                          * Opens a file for reading, returning a {@code BufferedReader} to read text
>                                          * from the file in an efficient manner...
>                                            ...
>                                          */
>                                         java.nio.file.Files#newBufferedReader(java.nio.file.Path)
>                                         It can return _anything_ as long as it
>                                         is a BufferedReader. We can do it, but
>                                         it needs to be investigated not only
>                                         for your favorite OS but for other
>                                         OSes as well. Feel free to prototype
>                                         this and we can discuss it on the list
>                                         later.
>                                         Thanks,
>                                         -Pavel
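Pavel's point is that callers depend only on the declared BufferedReader return type. A small sketch of such a caller (the class name is made up for illustration): Files.newBufferedReader(Path) exists since Java 8 and decodes as UTF-8, and whatever BufferedReader subclass the JDK chose to return would be invisible to code like this.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical caller: it only relies on the BufferedReader contract, so
// the JDK could return a different implementation without breaking it.
class NewBufferedReaderExample {
    static long countLines(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            return reader.lines().count();
        }
    }
}
```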
>
>                                             On 21 Oct 2016, at 18:56,
>         Brunoais
>                                             <brunoaiss at gmail.com>
>                                               wrote:
>                                             Pavel is right.
>                                             In reality, I was expecting such a
>                                             BufferedReader to use only a
>                                             single buffer and have that buffer
>                                             filled asynchronously, not in a
>                                             different thread.
>                                             Additionally, I don't intend to
>                                             have a larger buffer than before
>                                             unless requested through the API
>                                             (the constructor).
>                                             In my idea, internally, it is
>                                             supposed to use
>                                             java.nio.channels.AsynchronousFileChannel
>                                             or equivalent.
>                                             It does not prevent having two
>                                             buffers, and I do not intend to
>                                             change BufferedReader itself. I'd
>                                             write a BufferedAsyncReader of
>                                             sorts (any name suggestion is
>                                             welcome, as I'm an awful namer).
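A minimal sketch of the AsynchronousFileChannel style mentioned above (the class and method names are hypothetical): read(ByteBuffer, long) returns a Future immediately, the caller can do other work, and Future.get() waits for the result. As far as I know, the JDK does not promise kernel-level asynchronous file I/O here; on some platforms (Linux, for instance) the default implementation performs the read on a background thread pool.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical helper: starts a read at an explicit file offset, runs
// other work, then waits. Returns the bytes read, or -1 past end of file.
class AsyncFileReadExample {
    static int readAt(Path file, long offset, ByteBuffer dst, Runnable otherWork)
            throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel ch =
                     AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            Future<Integer> pending = ch.read(dst, offset); // returns immediately
            otherWork.run();                                // may overlap with the read
            return pending.get();                           // blocks until the read completes
        }
    }
}
```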
>                                             On 21/10/2016 18:38, Roger
>         Riggs
>                                             wrote:
>
>                                                 Hi Pavel,
>                                                 I think Brunoais is asking for
>                                                 a double-buffering scheme in
>                                                 which the implementation of
>                                                 BufferedReader fills a second
>                                                 buffer in parallel with the
>                                                 application reading from the
>                                                 first buffer, managing the
>                                                 swaps and async reads
>                                                 transparently.
>                                                 It would not change the API
>                                                 but would change the
>                                                 interactions between the
>                                                 buffered reader and the
>                                                 underlying stream. It would
>                                                 also increase memory
>                                                 requirements and processing by
>                                                 introducing or using a
>                                                 separate thread and the
>                                                 necessary synchronization.
>                                                 Though I think the formal
>                                                 interface semantics could be
>                                                 maintained, I have doubts
>                                                 about compatibility and its
>                                                 unintended consequences on
>                                                 existing subclasses,
>                                                 applications and libraries.
>                                                 $.02, Roger
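A rough sketch of the double-buffering scheme Roger describes, assuming a file source and using AsynchronousFileChannel for the background fill, so any extra thread comes from the JDK's channel implementation rather than from this class. DoubleBufferedFileStream is a hypothetical name; the sketch handles only single-byte read() and ignores mark/reset, skip, available() and bulk reads. Holding two buffers also illustrates the extra memory Roger mentions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical double-buffered stream: the caller drains 'current' while an
// asynchronous read fills 'next'; the buffers are swapped when 'current' is empty.
class DoubleBufferedFileStream extends InputStream {
    private final AsynchronousFileChannel channel;
    private ByteBuffer current;      // buffer the caller reads from
    private ByteBuffer next;         // buffer being filled in the background
    private Future<Integer> pending; // in-flight read into 'next'; null once EOF is seen
    private long filePos;            // file offset for the next read to issue

    DoubleBufferedFileStream(Path file, int capacity) throws IOException {
        channel = AsynchronousFileChannel.open(file, StandardOpenOption.READ);
        current = ByteBuffer.allocateDirect(capacity);
        current.flip();              // start empty: nothing to read yet
        next = ByteBuffer.allocateDirect(capacity);
        startRead();
    }

    // Issues an asynchronous read into 'next' at the current file offset.
    private void startRead() {
        next.clear();
        pending = channel.read(next, filePos);
    }

    // Waits for the in-flight read, swaps the buffers and prefetches the
    // following chunk. Returns false at end of file.
    private boolean swap() throws IOException {
        if (pending == null) {
            return false;
        }
        int n;
        try {
            n = pending.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("interrupted while waiting for read");
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        }
        if (n < 0) {
            pending = null;
            return false;
        }
        filePos += n;
        ByteBuffer filled = next;
        next = current;              // the drained buffer is reused for the next fill
        current = filled;
        current.flip();
        startRead();                 // overlaps with the caller consuming 'current'
        return true;
    }

    @Override
    public int read() throws IOException {
        while (!current.hasRemaining()) {
            if (!swap()) {
                return -1;
            }
        }
        return current.get() & 0xFF;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```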
>                                                 On 10/21/16 1:22 PM, Pavel
>                                                 Rappo wrote:
>
>                                                     Off the top of my head, I
>                                                     would say it's not
>                                                     possible to change the
>                                                     design of an _extensible_
>                                                     type that has been out
>                                                     there for 20 or so years.
>                                                     All these I/O streams
>                                                     from java.io were
>                                                     designed for the simple
>                                                     synchronous use case.
>                                                     It's not that their
>                                                     design is flawed in some
>                                                     way; it's that they don't
>                                                     seem to suit your needs.
>                                                     Have you considered using
>                                                     java.nio.channels.AsynchronousFileChannel
>                                                     in your applications?
>                                                     -Pavel
>
>                                                         On 21 Oct 2016, at
>                                                         17:08, Brunoais
>                                                        
>         <brunoaiss at gmail.com>
>                                                           wrote:
>                                                         Any feedback on this?
>                                                         I'm really interested
>                                                         in implementing such a
>                                                         BufferedReader/BufferedStreamReader
>                                                         to allow speeding up
>                                                         my applications
>                                                         without having to
>                                                         think in an
>                                                         asynchronous way or
>                                                         about multi-threading
>                                                         while programming
>                                                         with it.
>                                                         That's why I'm asking
>                                                         this here.
>                                                         On 13/10/2016
>         14:45,
>                                                         Brunoais wrote:
>
>                                                             Hi,
>                                                             I looked at the
>                                                             BufferedReader
>                                                             source code for
>                                                             Java 9 along with
>                                                             the source code of
>                                                             the channels/streams
>                                                             used. I noticed
>                                                             that, as in Java 7,
>                                                             BufferedReader does
>                                                             not use an async
>                                                             API to load data
>                                                             from files; instead,
>                                                             the data loading is
>                                                             all done
>                                                             synchronously, even
>                                                             when the OS allows
>                                                             requesting a file
>                                                             to be read and
>                                                             getting a
>                                                             notification later
>                                                             when the file has
>                                                             actually been read.
>                                                             Why is
>                                                             BufferedReader not
>                                                             async while
>                                                             providing a sync API?
>
>                             <BufferedNonBlockStream.java><Tests.java>
>
>
>
>
>
>
>
>
>



More information about the core-libs-dev mailing list