Request/discussion: BufferedReader reading using async API while providing sync API

Thu Oct 27 10:47:09 UTC 2016

On Thursday, October 27, 2016, Brunoais <brunoaiss at gmail.com> wrote:

> Did you read the C code?

I looked at the Linux code in the JDK.

> Have you got any idea how many functions Windows or Linux (nearly all
> flavors) have for the read operation towards a file?

I do.

>
> I have already done that homework myself. I may not have read JVM's source
> code but I know well that there are functions on both Windows and Linux that
> provide the kind of interface I mentioned, although they require slightly
> different treatment (and different constants).

You should read the JDK (native) source code instead of guessing/assuming.
On Linux, it doesn't use aio facilities for files.  The kernel io scheduler
may issue readahead behind the scenes, but the nonblocking file io that's at
the heart of your premise doesn't exist there.
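
To make that concrete, here is a minimal sketch, assuming a small local temp
file. The class name BlockingReadSketch and the helper readOnce are made up
for illustration and are not part of the JDK or of your prototype. It reads
once into a direct ByteBuffer and checks the buffer position again a little
later: the bytes are there when read() returns, and nothing keeps filling the
buffer afterwards.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BlockingReadSketch {

    // Hypothetical helper: performs a single read into a direct buffer and
    // returns { bytesRead, positionRightAfterRead, positionAfterWaiting }.
    static int[] readOnce(Path file, int capacity) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocateDirect(capacity);

            // Synchronous call: it returns only once the bytes are in buf
            // (or EOF was hit).
            int bytesRead = ch.read(buf);
            int positionAfterRead = buf.position();

            // Give any imagined background filling a chance to happen.
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }

            // The position only moves when read() is called again.
            int positionLater = buf.position();
            return new int[] { bytesRead, positionAfterRead, positionLater };
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("blocking-read", ".txt");
        try {
            Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
            int[] r = readOnce(tmp, 4);
            System.out.println("read=" + r[0]
                    + " positionAfterRead=" + r[1]
                    + " positionLater=" + r[2]);
        } finally {
            Files.delete(tmp);
        }
    }
}
```

If the channel were filling the buffer behind the scenes, the last two
numbers would differ. They don't; to get more data you have to call read()
again, and that call can block if the data isn't in page cache.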

>
>
> On 27/10/2016 00:06, Vitaly Davidovich wrote:
>
>>
>>
>> On Wednesday, October 26, 2016, Brunoais <brunoaiss at gmail.com> wrote:
>>
>>     It is actually based on the premise that:
>>
>>     1. The first call to ReadableByteChannel.read(ByteBuffer) sets the OS
>>        buffer size to fill in as the same size as ByteBuffer.
>>
>> Why do you say that? AFAICT, it issues a read syscall and that will block
>> if the data isn't in page cache.
>>
>>     2. The consecutive calls to ReadableByteChannel.read(ByteBuffer) order
>>        the JVM to order the OS to execute memcpy() to copy from its memory
>>        to the shared memory created at ByteBuffer instantiation (in Java 8)
>>        using Unsafe and then for the JVM to update the ByteBuffer fields.
>>
>> I think subsequent reads just invoke the same read syscall, passing the
>> current file offset maintained by the file channel instance.
>>
>>     3. The call will not block waiting for I/O and it won't take longer
>>        than the JNI interface if no new data exists. However, it will block
>>        waiting for the OS to execute memcpy() to the shared memory.
>>
>> So why do you think it won't block?
>>
>>
>>     Is my premise wrong?
>>
>>     If I read correctly, if I don't use a DirectBuffer, there would be
>>     even another intermediate buffer to copy data to before giving it
>>     to the "user" which would be useless.
>>
>> If you use a HeapByteBuffer, then there's an extra copy from the native
>> buffer to the Java buffer.
>>
>>
>>
>>     On 26/10/2016 11:57, Pavel Rappo wrote:
>>
>>         I believe I see where you're coming from. Please correct me if
>>         I'm wrong.
>>
>>         Your implementation is based on the premise that a call to
>>         ReadableByteChannel.read()
>>         _initiates_ the operation and returns immediately. The OS then
>>         continues to fill
>>         the buffer while there's a free space in the buffer and the
>>         channel hasn't encountered EOF.
>>
>>         Is that right?
>>
>>             On 25 Oct 2016, at 22:16, Brunoais <brunoaiss at gmail.com>
>>             wrote:
>>
>>             Thank you for your time. I'll try to explain it. I hope I
>>             can clear it up.
>>             First off, I confused the terms asynchronous
>>             and non-blocking. This implementation uses a non-blocking
>>             algorithm internally while providing a blocking-like
>>             algorithm on the surface. It is single-threaded and not
>>             multi-threaded where one thread fetches data and blocks
>>             waiting and the other accumulates it and provides to
>>             whichever wants it.
>>
>>             Second, I made the mistake of going after
>>             BufferedReader instead of going after BufferedInputStream.
>>             If you want me to go after BufferedReader it's ok but I
>>             only thought that going after BufferedInputStream would be
>>             more generically useful than BufferedReader when I started
>>             the poc.
>>
>>             On to my code:
>>             Short answers:
>>                     • The sleep(int) exists because I don't know how
>>             to wait until more data exists in the buffer, which is part
>>             of read()'s contract.
>>                     • The ByteBuffer gives a buffer that is filled by
>>             the OS (what I believe Channels do) instead of getting
>>             data only on demand (what I believe Streams do).
>>             Full answers:
>>             The blockingFill(boolean) method is a method for a busy
>>             wait for a fill which is used exclusively by the read()
>>             method. All other methods use the version that does not
>>             sleep (fill(boolean)).
>>             blockingFill(boolean) exists in that form only
>>             because the read() method must not return unless either:
>>
>>                     • The stream ended.
>>                     • The next byte is ready for reading.
>>             Additionally, statistically, that while loop will rarely
>>             evaluate to true as reads are in chunks so readPos will be
>>             behind writePos most of the time.
>>             I have no idea if an interrupt will ever happen, to be
>>             honest. The main reasons why I'm using a sleep are that
>>             I didn't want to hog the CPU with a full-thread
>>             busy wait and that I didn't find any way of putting the
>>             thread to sleep and waking it up later when the buffer
>>             managed by native code has more data.
>>             The Non-blocking part is managed by the buffer the OS
>>             keeps filling most if not all the time. That buffer is the
>>             field
>>
>>             ByteBuffer readBuffer
>>             That's the advantage over the plain old Buffered
>>             classes.
>>
>>
>>             Did that make sense to you? Feel free to ask anything else
>>             you need.
>>
>>             On 25/10/2016 20:52, Pavel Rappo wrote:
>>
>>                 I've skimmed through the code and I'm not sure I can
>>                 see any asynchronicity
>>                 (you were pointing at the lack of it in BufferedReader).
>>                 And the mechanics of this is very puzzling to me, to
>>                 be honest:
>>                      void blockingFill(boolean forced) throws IOException {
>>                          fill(forced);
>>                          while (readPos == writePos) {
>>                              try {
>>                                  Thread.sleep(100);
>>                              } catch (InterruptedException e) {
>>                                  // An interrupt may mean more data is available
>>                              }
>>                              fill(forced);
>>                          }
>>                      }
>>                 I thought you were suggesting that we should utilize
>>                 the tools which OS provides
>>                 more efficiently. Instead we have something that looks
>>                 very similar to a
>>                 "busy loop" and... also, who is supposed to
>>                 interrupt Thread.sleep(), and when?
>>                 Sorry, I'm not following. Could you please explain how
>>                 this is supposed to work?
>>
>>                     On 24 Oct 2016, at 15:59, Brunoais
>>                     <brunoaiss at gmail.com>
>>                       wrote:
>>                     Attached and sending!
>>                     On 24/10/2016 13:48, Pavel Rappo wrote:
>>
>>                         Could you please send a new email on this list
>>                         with the source attached as a
>>                         text file?
>>
>>                             On 23 Oct 2016, at 19:14, Brunoais
>>                             <brunoaiss at gmail.com>
>>                               wrote:
>>                             Here's my poc/prototype:
>>
>>                             http://pastebin.com/WRpYWDJF
>>
>>                             I've implemented the bare minimum of the
>>                             class that follows the same contract as
>>                             BufferedReader while signaling all issues
>>                             I think it may have or has in comments.
>>                             I also wrote some javadoc to help guide
>>                             readers through the class.
>>                             I could have used more fields from
>>                             BufferedReader but the names were so
>>                             minimalistic that they were confusing me. I
>>                             intend to change them before sending this
>>                             to openJDK.
>>                             One of the major problems this has is long
>>                             overflowing. It is major because it is
>>                             hidden, it will be extremely rare and it
>>                             takes a really long time to reproduce.
>>                             There are different ways of dealing with
>>                             it. From just documenting to actually
>>                             making code that works with it.
>>                             I built a simple test code for it to have
>>                             some ideas about performance and correctness.
>>
>>                             http://pastebin.com/eh6LFgwT
>>
>>                             This doesn't do a thorough test of whether it is
>>                             actually working correctly but I see no
>>                             reason for it not working correctly after
>>                             fixing the 2 bugs that test found.
>>                             I'll also leave here some conclusions
>>                             about speed and resource consumption I found.
>>                             I made tests with default buffer sizes,
>>                             5000B, 15_000B and 500_000B. I noticed
>>                             that, with my hardware, with the
>>                             1 530 000 000B file, I was getting around:
>>                             With all buffer sizes and fake work: 10~15s speed
>>                             improvement (from 90% HDD speed to 100%
>>                             HDD speed)
>>                             With all buffer sizes and no fake work: 1~2s
>>                             speed improvement (from 90% HDD speed to
>>                             100% HDD speed)
>>                             Changing the buffer size was giving
>>                             different reading speeds but both
>>                             implementations changed by about the same
>>                             amount when changing the buffer size.
>>                             Finally, I could always confirm that I/O
>>                             was always the slowest thing while this
>>                             code was running.
>>                             For the ones wondering about the file
>>                             size: it is both to avoid the OS cache and to
>>                             exercise the main use case
>>                             these objects are for (large streams of
>>                             bytes).
>>                             @Pavel, are you open for discussion now
>>                             ;)? Need anything else?
>>                             On 21/10/2016 19:21, Pavel Rappo wrote:
>>
>>                                 Just to append to my previous email.
>>                                 BufferedReader wraps any Reader out there.
>>                                 Not specifically FileReader. While
>>                                 you're talking about the case of effective
>>                                 reading from a file.
>>                                 I guess there's one existing
>>                                 possibility to provide exactly what
>>                                 you need (as I
>>                                 understand it) under this method:
>>                                 /**
>>                                   * Opens a file for reading,
>>                                 returning a {@code BufferedReader} to
>>                                 read text
>>                                   * from the file in an efficient
>>                                 manner...
>>                                     ...
>>                                   */
>>                                 java.nio.file.Files#newBufferedReader(java.nio.file.Path)
>>                                 It can return _anything_ as long as it
>>                                 is a BufferedReader. We can do it, but it
>>                                 needs to be investigated not only for
>>                                 your favorite OS but for other OSes as
>>                                 well. Feel free to prototype this and
>>                                 we can discuss it on the list later.
>>                                 Thanks,
>>                                 -Pavel
>>
>>                                     On 21 Oct 2016, at 18:56, Brunoais
>>                                     <brunoaiss at gmail.com>
>>                                       wrote:
>>                                     Pavel is right.
>>                                     In reality, I was expecting such
>>                                     BufferedReader to use only a
>>                                     single buffer and have that Buffer
>>                                     being filled asynchronously, not
>>                                     in a different Thread.
>>                                     Additionally, I don't have the
>>                                     intention of having a larger
>>                                     buffer than before unless stated
>>                                     through the API (the constructor).
>>                                     In my idea, internally, it is
>>                                     supposed to use
>>                                     java.nio.channels.AsynchronousFileChannel
>>                                     or equivalent.
>>                                     It does not prevent having two
>>                                     buffers and I do not intend to
>>                                     change BufferedReader itself. I'd
>>                                     do a BufferedAsyncReader of sorts
>>                                     (any name suggestion is welcome as
>>                                     I'm an awful namer).
>>                                     On 21/10/2016 18:38, Roger Riggs
>>                                     wrote:
>>
>>                                         Hi Pavel,
>>                                         I think Brunoais is asking for a
>>                                         double buffering scheme in
>>                                         which the implementation of
>>                                         BufferedReader fills (a second
>>                                         buffer) in parallel with the
>>                                         application reading from the
>>                                         1st buffer
>>                                         and managing the swaps and
>>                                         async reads transparently.
>>                                         It would not change the API
>>                                         but would change the
>>                                         interactions between the
>>                                         buffered reader
>>                                         and the underlying stream.  It
>>                                         would also increase memory
>>                                         requirements and processing
>>                                         by introducing or using a
>>                                         separate thread and the
>>                                         necessary synchronization.
>>                                         Though I think the formal
>>                                         interface semantics could be
>>                                         maintained, I have doubts
>>                                         about compatibility and its
>>                                         unintended consequences on
>>                                         existing subclasses,
>>                                         applications and libraries.
>>                                         $.02, Roger
>>                                         On 10/21/16 1:22 PM, Pavel
>>                                         Rappo wrote:
>>
>>                                             Off the top of my head, I
>>                                             would say it's not
>>                                             possible to change the
>>                                             design of an
>>                                             _extensible_ type that has
>>                                             been out there for 20 or
>>                                             so years. All these I/O
>>                                             streams from java.io
>>                                             were
>>                                             designed for a simple
>>                                             synchronous use case.
>>                                             It's not that their design
>>                                             is flawed in some way,
>>                                             it's that they don't seem to
>>                                             suit your needs. Have you
>>                                             considered using
>>                                             java.nio.channels.AsynchronousFileChannel
>>                                             in your applications?
>>                                             -Pavel
>>
>>                                                 On 21 Oct 2016, at
>>                                                 17:08, Brunoais
>>                                                 <brunoaiss at gmail.com>
>>                                                   wrote:
>>                                                 Any feedback on this?
>>                                                 I'm really interested
>>                                                 in implementing such
>>                                                 BufferedReader/BufferedStreamReader
>>                                                 to allow speeding up
>>                                                 my applications
>>                                                 without having to
>>                                                 think in an
>>                                                 asynchronous way or
>>                                                 multi-threading while
>>                                                 programming with it.
>>                                                 That's why I'm asking
>>                                                 this here.
>>                                                 On 13/10/2016 14:45,
>>                                                 Brunoais wrote:
>>
>>                                                     Hi,
>>                                                     I looked at
>>                                                     BufferedReader
>>                                                     source code for
>>                                                     Java 9 along with
>>                                                     the source code of
>>                                                     the
>>                                                     channels/streams
>>                                                     used. I noticed
>>                                                     that, like in Java
>>                                                     7, BufferedReader
>>                                                     does not use an
>>                                                     Async API to load
>>                                                     data from files,
>>                                                     instead, the data
>>                                                     loading is all
>>                                                     done synchronously
>>                                                     even when the OS
>>                                                     allows requesting
>>                                                     a file to be read
>>                                                     and getting a
>>                                                     warning later when
>>                                                     the file is
>>                                                     effectively read.
>>                                                     Why is
>>                                                     BufferedReader not
>>                                                     async while
>>                                                     providing a sync API?
>>
>>                     <BufferedNonBlockStream.java><Tests.java>
>>
>>
>>
>>
>>
>>
>
>

-- 
Sent from my phone