Request/discussion: BufferedReader reading using async API while providing sync API

Thu Oct 27 06:20:20 UTC 2016

Did you read the C code?
Do you have any idea how many functions Windows or Linux (nearly all 
flavors) have for the read operation on a file?

I have already done that homework myself. I may not have read the JVM's 
source code but I know well that there are functions on both Windows and 
Linux that provide the interface I mentioned, although they require 
slightly different treatment (and different constants).
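
To make concrete what kind of interface is meant (a read that is
initiated and returns at once, with the result collected later), here is
a minimal Java sketch. It assumes java.nio.channels.AsynchronousFileChannel,
already mentioned in this thread, as a portable stand-in. It does not show,
and makes no claim about, which Windows or Linux functions the JDK uses
underneath. The class name AsyncReadSketch and the method readStart are
made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class AsyncReadSketch {

    // Hypothetical helper: reads up to `capacity` bytes from the start of
    // `file` and decodes them as UTF-8.
    static String readStart(Path file, int capacity)
            throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocateDirect(capacity);
            // Initiates the read at file position 0; the call itself
            // does not wait for the data.
            Future<Integer> pending = ch.read(buf, 0L);
            // ... the caller could do other work here ...
            int n = pending.get(); // waits only until this one read completes
            if (n <= 0) {
                return ""; // -1 signals end of file
            }
            buf.flip();
            return StandardCharsets.UTF_8.decode(buf).toString();
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("async-read", ".txt");
        try {
            Files.write(tmp, "hello async".getBytes(StandardCharsets.UTF_8));
            System.out.println(readStart(tmp, 64));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The point of the sketch is only the shape of the call: the read is
requested first and waited on separately, which is different from
ReadableByteChannel.read(ByteBuffer) on a blocking channel.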

On 27/10/2016 00:06, Vitaly Davidovich wrote:
>
>
> On Wednesday, October 26, 2016, Brunoais <brunoaiss at gmail.com 
> <mailto:brunoaiss at gmail.com>> wrote:
>
>     It is actually based on the premise that:
>
>     1. The first call to ReadableByteChannel.read(ByteBuffer) sets the OS
>        buffer size to fill in as the same size as ByteBuffer.
>
> Why do you say that? AFAICT, it issues a read syscall and that will 
> block if the data isn't in page cache.
>
>     2. The consecutive calls to ReadableByteChannel.read(ByteBuffer)
>     orders
>        the JVM to order the OS to execute memcpy() to copy from its memory
>        to the shared memory created at ByteBuffer instantiation (in
>     java 8)
>        using Unsafe and then for the JVM to update the ByteBuffer fields.
>
> I think subsequent reads just invoke the same read syscall, passing 
> the current file offset maintained by the file channel instance.
>
>     3. The call will not block waiting for I/O and it won't take longer
>        than the JNI interface if no new data exists. However, it will
>     block
>        waiting for the OS to execute memcpy() to the shared memory.
>
> So why do you think it won't block?
>
>
>     Is my premise wrong?
>
>     If I read correctly, if I don't use a DirectBuffer, there would be
>     even another intermediate buffer to copy data to before giving it
>     to the "user" which would be useless.
>
> If you use a HeapByteBuffer, then there's an extra copy from the 
> native buffer to the Java buffer.
>
>
>
>     On 26/10/2016 11:57, Pavel Rappo wrote:
>
>         I believe I see where you're coming from. Please correct me if
>         I'm wrong.
>
>         Your implementation is based on the premise that a call to
>         ReadableByteChannel.read()
>         _initiates_ the operation and returns immediately. The OS then
>         continues to fill
>         the buffer while there's a free space in the buffer and the
>         channel hasn't encountered EOF.
>
>         Is that right?
>
>             On 25 Oct 2016, at 22:16, Brunoais <brunoaiss at gmail.com>
>             wrote:
>
>             Thank you for your time. I'll try to explain it. I hope I
>             can clear it up.
>             First of all, I mixed up the terms asynchronous and
>             non-blocking. This implementation uses a non-blocking
>             algorithm internally while providing a blocking-like
>             algorithm on the surface. It is single-threaded, not
>             multi-threaded with one thread fetching data and blocking
>             while another accumulates it and hands it to
>             whoever wants it.
>
>             Second, I made the mistake of going after
>             BufferedReader instead of going after BufferedInputStream.
>             If you want me to go after BufferedReader it's ok, but
>             when I started the poc I thought that going after
>             BufferedInputStream would be more generically useful
>             than BufferedReader.
>
>             On to my code:
>             Short answers:
>                     • The sleep(int) exists because I don't know how
>             to wait until more data exists in the buffer which is part
>             of read()'s contract.
>                     • The ByteBuffer gives a buffer that is filled by
>             the OS (what I believe Channels do) instead of getting
>             data only on demand (what I believe Streams do).
>             Full answers:
>             The blockingFill(boolean) method is a method for a busy
>             wait for a fill which is used exclusively by the read()
>             method. All other methods use the version that does not
>             sleep (fill(boolean)).
>             blockingFill(boolean) exists in that form only
>             because the read() method must not return unless either:
>
>                     • The stream ended.
>                     • The next byte is ready for reading.
>             Additionally, statistically, that while loop will rarely
>             evaluate to true as reads are in chunks so readPos will be
>             behind writePos most of the time.
>             I have no idea if an interrupt will ever happen, to be
>             honest. The main reasons why I'm using a sleep are that
>             I didn't want to hog the CPU with a full-thread
>             busy wait and that I didn't find any way of putting the
>             thread to sleep and waking it up later when the buffer
>             managed by native code has more data.
>             The Non-blocking part is managed by the buffer the OS
>             keeps filling most if not all the time. That buffer is the
>             field
>
>             ByteBuffer readBuffer
>             That's the gain over the plain old Buffered
>             classes.
>
>
>             Did that make sense to you? Feel free to ask anything else
>             you need.
>
>             On 25/10/2016 20:52, Pavel Rappo wrote:
>
>                 I've skimmed through the code and I'm not sure I can
>                 see any asynchronicity
>                 (you were pointing at the lack of it in BufferedReader).
>                 And the mechanics of this is very puzzling to me, to
>                 be honest:
>                      void blockingFill(boolean forced) throws IOException {
>                          fill(forced);
>                          while (readPos == writePos) {
>                              try {
>                                  Thread.sleep(100);
>                              } catch (InterruptedException e) {
>                                  // An interrupt may mean more data is available
>                              }
>                              fill(forced);
>                          }
>                      }
>                 I thought you were suggesting that we should utilize
>                 the tools which OS provides
>                 more efficiently. Instead we have something that looks
>                 very similar to a
>                 "busy loop" and... also who and when is supposed to
>                 interrupt Thread.sleep()?
>                 Sorry, I'm not following. Could you please explain how
>                 this is supposed to work?
>
>                     On 24 Oct 2016, at 15:59, Brunoais
>                     <brunoaiss at gmail.com>
>                       wrote:
>                     Attached and sending!
>                     On 24/10/2016 13:48, Pavel Rappo wrote:
>
>                         Could you please send a new email on this list
>                         with the source attached as a
>                         text file?
>
>                             On 23 Oct 2016, at 19:14, Brunoais
>                             <brunoaiss at gmail.com>
>                               wrote:
>                             Here's my poc/prototype:
>
>                             http://pastebin.com/WRpYWDJF
>
>                             I've implemented the bare minimum of the
>                             class that follows the same contract of
>                             BufferedReader while signaling all issues
>                             I think it may have or has in comments.
>                             I also wrote some javadoc to help guiding
>                             through the class.
>                             I could have used more fields from
>                             BufferedReader but the names were so
>                             minimalistic that they were confusing me. I
>                             intend to change them before sending this
>                             to openJDK.
>                             One of the major problems this has is long
>                             overflowing. It is major because it is
>                             hidden, it will be extremely rare and it
>                             takes a really long time to reproduce.
>                             There are different ways of dealing with
>                             it. From just documenting to actually
>                             making code that works with it.
>                             I built a simple test code for it to have
>                             some ideas about performance and correctness.
>
>                             http://pastebin.com/eh6LFgwT
>
>                             This doesn't thoroughly test whether it is
>                             actually working correctly but I see no
>                             reason for it not working correctly after
>                             fixing the 2 bugs that test found.
>                             I'll also leave here some conclusions
>                             about speed and resource consumption I found.
>                             I made tests with default buffer sizes,
>                             5000B, 15_000B and 500_000B. I noticed
>                             that, with my hardware, with the
>                             1 530 000 000B file, I was getting around:
>                             In all buffers and fake work: 10~15s speed
>                             improvement ( from 90% HDD speed to 100%
>                             HDD speed)
>                             In all buffers and no fake work: 1~2s
>                             speed improvement ( from 90% HDD speed to
>                             100% HDD speed)
>                             Changing the buffer size was giving
>                             different reading speeds but both were
>                             quite equal in how much they would change
>                             when changing the buffer size.
>                             Finally, I could always confirm that I/O
>                             was always the slowest thing while this
>                             code was running.
>                             For the ones wondering about the file
>                             size: it is both to avoid the OS cache and to
>                             exercise the main use-case
>                             these objects are for (large streams of
>                             bytes).
>                             @Pavel, are you open for discussion now
>                             ;)? Need anything else?
>                             On 21/10/2016 19:21, Pavel Rappo wrote:
>
>                                 Just to append to my previous email.
>                                 BufferedReader wraps any Reader out there.
>                                 Not specifically FileReader. While
>                                 you're talking about the case of effective
>                                 reading from a file.
>                                 I guess there's one existing
>                                 possibility to provide exactly what
>                                 you need (as I
>                                 understand it) under this method:
>                                 /**
>                                   * Opens a file for reading,
>                                 returning a {@code BufferedReader} to
>                                 read text
>                                   * from the file in an efficient
>                                 manner...
>                                     ...
>                                   */
>                                 java.nio.file.Files#newBufferedReader(java.nio.file.Path)
>                                 It can return _anything_ as long as it
>                                 is a BufferedReader. We can do it, but it
>                                 needs to be investigated not only for
>                                 your favorite OS but for other OSes as
>                                 well. Feel free to prototype this and
>                                 we can discuss it on the list later.
>                                 Thanks,
>                                 -Pavel
>
>                                     On 21 Oct 2016, at 18:56, Brunoais
>                                     <brunoaiss at gmail.com>
>                                       wrote:
>                                     Pavel is right.
>                                     In reality, I was expecting such
>                                     BufferedReader to use only a
>                                     single buffer and have that Buffer
>                                     being filled asynchronously, not
>                                     in a different Thread.
>                                     Additionally, I don't have the
>                                     intention of having a larger
>                                     buffer than before unless stated
>                                     through the API (the constructor).
>                                     In my idea, internally, it is
>                                     supposed to use
>                                     java.nio.channels.AsynchronousFileChannel
>                                     or equivalent.
>                                     It does not prevent having two
>                                     buffers and I do not intend to
>                                     change BufferedReader itself. I'd
>                                     do a BufferedAsyncReader of sorts
>                                     (any name suggestion is welcome as
>                                     I'm an awful namer).
>                                     On 21/10/2016 18:38, Roger Riggs
>                                     wrote:
>
>                                         Hi Pavel,
>                                         I think Brunoais is asking for a
>                                         double buffering scheme in
>                                         which the implementation of
>                                         BufferedReader fills (a second
>                                         buffer) in parallel with the
>                                         application reading from the
>                                         1st buffer
>                                         and managing the swaps and
>                                         async reads transparently.
>                                         It would not change the API
>                                         but would change the
>                                         interactions between the
>                                         buffered reader
>                                         and the underlying stream.  It
>                                         would also increase memory
>                                         requirements and processing
>                                         by introducing or using a
>                                         separate thread and the
>                                         necessary synchronization.
>                                         Though I think the formal
>                                         interface semantics could be
>                                         maintained, I have doubts
>                                         about compatibility and its
>                                         unintended consequences on
>                                         existing subclasses,
>                                         applications and libraries.
>                                         $.02, Roger
>                                         On 10/21/16 1:22 PM, Pavel
>                                         Rappo wrote:
>
>                                             Off the top of my head, I
>                                             would say it's not
>                                             possible to change the
>                                             design of an
>                                             _extensible_ type that has
>                                             been out there for 20 or
>                                             so years. All these I/O
>                                             streams from java.io were
>                                             designed for simple
>                                             synchronous use case.
>                                             It's not that their design
>                                             is flawed in some way,
>                                             it's that they don't seem to
>                                             suit your needs. Have you
>                                             considered using
>                                             java.nio.channels.AsynchronousFileChannel
>                                             in your applications?
>                                             -Pavel
>
>                                                 On 21 Oct 2016, at
>                                                 17:08, Brunoais
>                                                 <brunoaiss at gmail.com>
>                                                   wrote:
>                                                 Any feedback on this?
>                                                 I'm really interested
>                                                 in implementing such
>                                                 BufferedReader/BufferedStreamReader
>                                                 to allow speeding up
>                                                 my applications
>                                                 without having to
>                                                 think in an
>                                                 asynchronous way or
>                                                 multi-threading while
>                                                 programming with it.
>                                                 That's why I'm asking
>                                                 this here.
>                                                 On 13/10/2016 14:45,
>                                                 Brunoais wrote:
>
>                                                     Hi,
>                                                     I looked at
>                                                     BufferedReader
>                                                     source code for
>                                                     java 9 along with
>                                                     the source code of
>                                                     the
>                                                     channels/streams
>                                                     used. I noticed
>                                                     that, like in java
>                                                     7, BufferedReader
>                                                     does not use an
>                                                     Async API to load
>                                                     data from files,
>                                                     instead, the data
>                                                     loading is all
>                                                     done synchronously
>                                                     even when the OS
>                                                     allows requesting
>                                                     a file to be read
>                                                     and getting a
>                                                     warning later when
>                                                     the file is
>                                                     effectively read.
>                                                     Why is
>                                                     BufferedReader not
>                                                     async while
>                                                     providing a sync API?
>
>                     <BufferedNonBlockStream.java><Tests.java>
>
>
>
>
>
> -- 
> Sent from my phone