Request/discussion: BufferedReader reading using async API while providing sync API

Brunoais brunoaiss at gmail.com
Fri Oct 28 06:53:34 UTC 2016



On 27/10/2016 22:45, Vitaly Davidovich wrote:
>
>
> On Thursday, October 27, 2016, Brunoais <brunoaiss at gmail.com> wrote:
>
>     You are right. Even on Windows it does not set the flags for async
>     reads. It seems it is Windows itself that makes the decision
>     to buffer the contents based on its own heuristics.
>
> You mean nonblocking, not async, right? Two different things.
Oops, mistyped. The Windows docs seem to call it async...
>
>     But... Why? Why won't it be? Why is there no API for it? How am I
>     getting 100% HDD use and faster times when I fake work to delay
>     getting more data and I only have a fluctuating 60-90% (always
>     going up and down) when I use an InputStream?
>     Is it related to how both classes cache and how frequently and how
>     much each one asks for data?
>
>     I really would prefer not having to read the source code because
>     it takes a real long time T.T.
>
>     I end up reinstating... And wondering...
>
>     Why doesn't Java provide a single-threaded non-blocking API for file
>     reads on every OS that supports it? I simply cannot find that
>     information no matter how much I search on Google, Bing, DuckDuckGo...
>     Can any of you point me to whoever knows?
>
> https://lwn.net/Articles/612483/ for Linux.  Unfortunately, the 
> nonblocking file io story is complicated and messy.
Both the Windows and the Linux manuals use "asynchronous I/O" for what
is, for the program that runs on the OS, non-blocking synchronous I/O.
http://man7.org/linux/man-pages/man3/aio_read.3.html
http://man7.org/linux/man-pages/man7/aio.7.html
http://man7.org/linux/man-pages/man7/sigevent.7.html

None of this blocks: the OS writes directly into the user buffer, nothing
runs on a different user thread, and completion is reported either through
a signal or through a function pointer used as a callback. Reading the
manual, it seems the notification can even go to the calling thread itself.
With signals, I do know it is completely non-blocking and single-threaded
(from the "user" thread's perspective). I'd like to see this in Java...
I guess all I have for that is NIO.2, then, with AsynchronousFileChannel.
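
For reference, a minimal sketch of what reading a file through
AsynchronousFileChannel could look like. The class name AsyncFileReadSketch,
the countBytes helper and the buffer sizes are made up for illustration and
are not taken from the prototype discussed in this thread. Two assumptions
worth flagging: the completion handler runs on a thread from the channel's
pool rather than on the calling thread, so this is not the signal-style
single-threaded model described above; and, as far as I know, the JDK on
Linux services these reads with ordinary blocking reads on pool threads
rather than with kernel AIO.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

public class AsyncFileReadSketch {

    /**
     * Hypothetical helper: reads the whole file through chained asynchronous
     * reads and completes the returned future with the number of bytes seen.
     */
    static CompletableFuture<Long> countBytes(Path path, int bufferSize) throws IOException {
        AsynchronousFileChannel channel =
                AsynchronousFileChannel.open(path, StandardOpenOption.READ);
        CompletableFuture<Long> result = new CompletableFuture<>();
        ByteBuffer buffer = ByteBuffer.allocateDirect(bufferSize);

        // The attachment carries the file position at which the read started.
        channel.read(buffer, 0L, 0L, new CompletionHandler<Integer, Long>() {
            @Override
            public void completed(Integer bytesRead, Long position) {
                if (bytesRead < 0) {            // -1 signals end of file
                    close();
                    result.complete(position);
                    return;
                }
                long next = position + bytesRead;
                buffer.clear();                 // real code would consume the data first
                channel.read(buffer, next, next, this);  // issue the next read
            }

            @Override
            public void failed(Throwable exc, Long position) {
                close();
                result.completeExceptionally(exc);
            }

            private void close() {
                try {
                    channel.close();
                } catch (IOException ignored) {
                    // nothing useful to do in this sketch
                }
            }
        });
        return result;                          // returned before any read has finished
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("async-read", ".bin");
        Files.write(tmp, new byte[100_000]);
        // The caller only blocks here, when it actually needs the result.
        System.out.println(countBytes(tmp, 8192).get());
        Files.delete(tmp);
    }
}
```

The calling thread is free to do other work between calling countBytes and
calling get() on the returned future; whether that actually overlaps with
disk I/O depends on how the platform implements the channel.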

>     On 27/10/2016 14:11, Vitaly Davidovich wrote:
>>     I don't know about Windows specifically, but generally file
>>     systems across major OS's will implement readahead in their IO
>>     scheduler when they detect sequential scans.
>>
>>     On Linux, you can also strace your test to confirm which syscalls
>>     are emitted (you should be seeing plain read()'s there, with
>>     FileInputStream and FileChannel).
>>
>>     On Thu, Oct 27, 2016 at 9:06 AM, Brunoais <brunoaiss at gmail.com> wrote:
>>
>>         Thanks for the heads up.
>>
>>         I'll try that later. These tests are still useful then.
>>         Meanwhile, I'll end up also checking how FileChannel queries
>>         the OS on windows. I'm getting 100% HDD reads... Could it be
>>         that the OS reads the file ahead on its own?... Anyway, I'll
>>         look into it. Thanks for the heads up.
>>
>>
>>         On 27/10/2016 13:53, Vitaly Davidovich wrote:
>>>
>>>
>>>         On Thu, Oct 27, 2016 at 8:34 AM, Brunoais
>>>         <brunoaiss at gmail.com> wrote:
>>>
>>>             Oh... I see. In that case, it means something is
>>>             terribly wrong. It can be my initial tests, though.
>>>
>>>             I'm testing on both linux and windows and I'm getting
>>>             performance gains from using the FileChannel compared to
>>>             using FileInputStream... The tests also make sense based
>>>             on my predictions O_O...
>>>
>>>         FileInputStream requires copying native buffers holding the
>>>         read data to the java byte[].  If you're using direct
>>>         ByteBuffer for FileChannel, that whole memcpy is skipped. 
>>>         Try comparing FileChannel with HeapByteBuffer instead.
>>>
>>>
>>>             On 27/10/2016 11:47, Vitaly Davidovich wrote:
>>>>
>>>>
>>>>             On Thursday, October 27, 2016, Brunoais
>>>>             <brunoaiss at gmail.com> wrote:
>>>>
>>>>                 Did you read the C code?
>>>>
>>>>             I looked at the Linux code in the JDK.
>>>>
>>>>                 Have you got any idea how many functions Windows or
>>>>                 Linux (nearly all flavors) have for the read
>>>>                 operation towards a file?
>>>>
>>>>             I do.
>>>>
>>>>
>>>>                 I have already done that homework myself. I may not
>>>>                 have read the JVM's source code but I know well that
>>>>                 there are functions on both Windows and Linux that
>>>>                 provide the kind of interface I mentioned, although
>>>>                 they require slightly different treatment (and
>>>>                 different constants).
>>>>
>>>>             You should read the JDK (native) source code instead of
>>>>             guessing/assuming.  On Linux, it doesn't use aio
>>>>             facilities for files.  The kernel io scheduler may
>>>>             issue readahead behind the scenes, but there's no
>>>>             nonblocking file io that's at the heart of your premise.
>>>>
>>>>
>>>>
>>>>                 On 27/10/2016 00:06, Vitaly Davidovich wrote:
>>>>
>>>>
>>>>
>>>>                     On Wednesday, October 26, 2016, Brunoais
>>>>                     <brunoaiss at gmail.com> wrote:
>>>>
>>>>                         It is actually based on the premise that:
>>>>
>>>>                         1. The first call to
>>>>                            ReadableByteChannel.read(ByteBuffer) sets the
>>>>                            OS buffer size to fill in as the same size as
>>>>                            the ByteBuffer.
>>>>
>>>>                     Why do you say that? AFAICT, it issues a read
>>>>                     syscall and that will block if the data isn't
>>>>                     in page cache.
>>>>
>>>>                         2. The consecutive calls to
>>>>                            ReadableByteChannel.read(ByteBuffer) order
>>>>                            the JVM to order the OS to execute memcpy()
>>>>                            to copy from its memory to the shared memory
>>>>                            created at ByteBuffer instantiation (in
>>>>                            java 8) using Unsafe and then for the JVM to
>>>>                            update the ByteBuffer fields.
>>>>
>>>>                     I think subsequent reads just invoke the same
>>>>                     read syscall, passing the current file offset
>>>>                     maintained by the file channel instance.
>>>>
>>>>                         3. The call will not block waiting for I/O and
>>>>                            it won't take longer than the JNI interface
>>>>                            if no new data exists. However, it will block
>>>>                            waiting for the OS to execute memcpy() to the
>>>>                            shared memory.
>>>>
>>>>                     So why do you think it won't block?
>>>>
>>>>
>>>>                         Is my premise wrong?
>>>>
>>>>                         If I read correctly, if I don't use a
>>>>                         DirectBuffer, there would be even another
>>>>                         intermediate buffer to copy data to before
>>>>                         giving it to the "user" which would be useless.
>>>>
>>>>                     If you use a HeapByteBuffer, then there's an
>>>>                     extra copy from the native buffer to the Java
>>>>                     buffer.
>>>>
>>>>
>>>>
>>>>                         On 26/10/2016 11:57, Pavel Rappo wrote:
>>>>
>>>>                             I believe I see where you're coming from.
>>>>                             Please correct me if I'm wrong.
>>>>
>>>>                             Your implementation is based on the premise
>>>>                             that a call to ReadableByteChannel.read()
>>>>                             _initiates_ the operation and returns
>>>>                             immediately. The OS then continues to fill
>>>>                             the buffer while there's free space in the
>>>>                             buffer and the channel hasn't encountered
>>>>                             EOF.
>>>>
>>>>                             Is that right?
>>>>
>>>>                                 On 25 Oct 2016, at 22:16, Brunoais
>>>>                                 <brunoaiss at gmail.com> wrote:
>>>>
>>>>                                 Thank you for your time. I'll try to explain
>>>>                                 it. I hope I can clear it up.
>>>>                                 First of all, I mixed up the meanings of
>>>>                                 asynchronous and non-blocking. This
>>>>                                 implementation uses a non-blocking algorithm
>>>>                                 internally while providing a blocking-like
>>>>                                 algorithm on the surface. It is
>>>>                                 single-threaded, not multi-threaded where one
>>>>                                 thread fetches data and blocks waiting and
>>>>                                 the other accumulates it and provides it to
>>>>                                 whichever wants it.
>>>>
>>>>                                 Second, I had made the mistake of going after
>>>>                                 BufferedReader instead of going after
>>>>                                 BufferedInputStream. If you want me to go
>>>>                                 after BufferedReader that's OK, but when I
>>>>                                 started the poc I thought that going after
>>>>                                 BufferedInputStream would be more generically
>>>>                                 useful than BufferedReader.
>>>>
>>>>                                 On to my code:
>>>>                                 Short answers:
>>>>                                         • The sleep(int) exists because I
>>>>                                           don't know how to wait until more
>>>>                                           data exists in the buffer, which is
>>>>                                           part of read()'s contract.
>>>>                                         • The ByteBuffer gives a buffer that
>>>>                                           is filled by the OS (what I believe
>>>>                                           Channels do) instead of getting data
>>>>                                           only on demand (what I believe
>>>>                                           Streams do).
>>>>                                 Full answers:
>>>>                                 The blockingFill(boolean) method is a busy
>>>>                                 wait for a fill and is used exclusively by
>>>>                                 the read() method. All other methods use the
>>>>                                 version that does not sleep (fill(boolean)).
>>>>                                 blockingFill(boolean) only exists in that
>>>>                                 form because the read() method must not
>>>>                                 return unless either:
>>>>
>>>>                                         • The stream ended.
>>>>                                         • The next byte is ready for reading.
>>>>                                 Additionally, statistically, the condition of
>>>>                                 that while loop will rarely evaluate to true,
>>>>                                 as reads come in chunks, so readPos will be
>>>>                                 behind writePos most of the time.
>>>>                                 I have no idea if an interrupt will ever
>>>>                                 happen, to be honest. The main reasons I'm
>>>>                                 using a sleep are that I didn't want to hog
>>>>                                 the CPU with a busy wait that uses a full
>>>>                                 thread, and that I didn't find any way of
>>>>                                 putting the thread to sleep so that it wakes
>>>>                                 up later, when the buffer managed by native
>>>>                                 code has more data.
>>>>                                 The non-blocking part is handled by the
>>>>                                 buffer that the OS keeps filling most if not
>>>>                                 all of the time. That buffer is the field
>>>>
>>>>                                 ByteBuffer readBuffer
>>>>
>>>>                                 That is the gain over the plain old Buffered
>>>>                                 classes.
>>>>
>>>>
>>>>                                 Did that make sense to you? Feel free to ask
>>>>                                 anything else you need.
>>>>
>>>>                                 On 25/10/2016 20:52, Pavel Rappo wrote:
>>>>
>>>>                                     I've skimmed through the code and I'm not
>>>>                                     sure I can see any asynchronicity (you were
>>>>                                     pointing at the lack of it in
>>>>                                     BufferedReader). And the mechanics of this
>>>>                                     are very puzzling to me, to be honest:
>>>>
>>>>                                     void blockingFill(boolean forced) throws IOException {
>>>>                                         fill(forced);
>>>>                                         while (readPos == writePos) {
>>>>                                             try {
>>>>                                                 Thread.sleep(100);
>>>>                                             } catch (InterruptedException e) {
>>>>                                                 // An interrupt may mean more data is available
>>>>                                             }
>>>>                                             fill(forced);
>>>>                                         }
>>>>                                     }
>>>>
>>>>                                     I thought you were suggesting that we should
>>>>                                     utilize the tools which the OS provides more
>>>>                                     efficiently. Instead we have something that
>>>>                                     looks very similar to a "busy loop" and...
>>>>                                     also, who is supposed to interrupt
>>>>                                     Thread.sleep(), and when?
>>>>                                     Sorry, I'm not following. Could you please
>>>>                                     explain how this is supposed to work?
>>>>
>>>>                                         On 24 Oct 2016, at 15:59, Brunoais
>>>>                                         <brunoaiss at gmail.com> wrote:
>>>>                                         Attached and sending!
>>>>                                         On 24/10/2016 13:48, Pavel Rappo wrote:
>>>>
>>>>                     Could you please send a new email on this list
>>>>                     with the source attached as a
>>>>                     text file?
>>>>
>>>>                       On 23 Oct 2016, at 19:14, Brunoais
>>>>                       <brunoaiss at gmail.com>
>>>>                         wrote:
>>>>                       Here's my poc/prototype:
>>>>
>>>>                     http://pastebin.com/WRpYWDJF
>>>>
>>>>                       I've implemented the bare minimum of the
>>>>                       class that follows the same contract of
>>>>                       BufferedReader while signaling all issues
>>>>                       I think it may have or has in comments.
>>>>                       I also wrote some javadoc to help guiding
>>>>                       through the class.
>>>>                       I could have used more fields from
>>>>                       BufferedReader but the names were so
>>>>                       minimalistic that they were confusing me. I
>>>>                       intend to change them before sending this
>>>>                       to openJDK.
>>>>                       One of the major problems this has is long
>>>>                       overflowing. It is major because it is
>>>>                       hidden, it will be extremely rare and it
>>>>                       takes a really long time to reproduce.
>>>>                       There are different ways of dealing with
>>>>                       it. From just documenting to actually
>>>>                       making code that works with it.
>>>>                       I built a simple test code for it to have
>>>>                       some ideas about performance and correctness.
>>>>
>>>>                     http://pastebin.com/eh6LFgwT
>>>>
>>>>                       This doesn't thoroughly test whether it is
>>>>                       actually working correctly but I see no
>>>>                       reason for it not to work correctly after
>>>>                       fixing the 2 bugs that test found.
>>>>                       I'll also leave here some conclusions
>>>>                       about speed and resource consumption I found.
>>>>                       I made tests with default buffer sizes,
>>>>                       5000B 15_000B and 500_000B. I noticed
>>>>                       that, with my hardware, with the 1 530 000
>>>>                       000B file, I was getting around:
>>>>                       In all buffers and fake work: 10~15s speed
>>>>                       improvement ( from 90% HDD speed to 100%
>>>>                       HDD speed)
>>>>                       In all buffers and no fake work: 1~2s
>>>>                       speed improvement ( from 90% HDD speed to
>>>>                       100% HDD speed)
>>>>                       Changing the buffer size was giving
>>>>                       different reading speeds but both were
>>>>                       quite equal in how much they would change
>>>>                       when changing the buffer size.
>>>>                       Finally, I could always confirm that I/O
>>>>                       was always the slowest thing while this
>>>>                       code was running.
>>>>                       For the ones wondering about the file
>>>>                       size; it is both to avoid OS cache and to
>>>>                       make the reading at the main use-case
>>>>                       these objects are for (large streams of
>>>>                       bytes).
>>>>                       @Pavel, are you open for discussion now
>>>>                       ;)? Need anything else?
>>>>                       On 21/10/2016 19:21, Pavel Rappo wrote:
>>>>
>>>>                           Just to append to my previous email.
>>>>                           BufferedReader wraps any Reader out there.
>>>>                           Not specifically FileReader. While
>>>>                           you're talking about the case of effective
>>>>                           reading from a file.
>>>>                           I guess there's one existing
>>>>                           possibility to provide exactly what
>>>>                           you need (as I
>>>>                           understand it) under this method:
>>>>                           /**
>>>>                             * Opens a file for reading,
>>>>                           returning a {@code BufferedReader} to
>>>>                           read text
>>>>                             * from the file in an efficient
>>>>                           manner...
>>>>                               ...
>>>>                             */
>>>>                     java.nio.file.Files#newBufferedReader(java.nio.file.Path)
>>>>                           It can return _anything_ as long as it
>>>>                           is a BufferedReader. We can do it, but it
>>>>                           needs to be investigated not only for
>>>>                           your favorite OS but for other OSes as
>>>>                           well. Feel free to prototype this and
>>>>                           we can discuss it on the list later.
>>>>                           Thanks,
>>>>                           -Pavel
>>>>
>>>>                               On 21 Oct 2016, at 18:56, Brunoais
>>>>                               <brunoaiss at gmail.com>
>>>>                                 wrote:
>>>>                               Pavel is right.
>>>>                               In reality, I was expecting such
>>>>                               BufferedReader to use only a
>>>>                               single buffer and have that Buffer
>>>>                               being filled asynchronously, not
>>>>                               in a different Thread.
>>>>                               Additionally, I don't have the
>>>>                               intention of having a larger
>>>>                               buffer than before unless stated
>>>>                               through the API (the constructor).
>>>>                               In my idea, internally, it is
>>>>                               supposed to use
>>>>                     java.nio.channels.AsynchronousFileChannel
>>>>                               or equivalent.
>>>>                               It does not prevent having two
>>>>                               buffers and I do not intend to
>>>>                               change BufferedReader itself. I'd
>>>>                               do a BufferedAsyncReader of sorts
>>>>                               (any name suggestion is welcome as
>>>>                               I'm an awful namer).
>>>>                               On 21/10/2016 18:38, Roger Riggs
>>>>                               wrote:
>>>>
>>>>                                   Hi Pavel,
>>>>                                   I think Brunoais is asking for a
>>>>                                   double buffering scheme in
>>>>                                   which the implementation of
>>>>                                   BufferedReader fills (a second
>>>>                                   buffer) in parallel with the
>>>>                                   application reading from the
>>>>                                   1st buffer
>>>>                                   and managing the swaps and
>>>>                                   async reads transparently.
>>>>                                   It would not change the API
>>>>                                   but would change the
>>>>                                   interactions between the
>>>>                                   buffered reader
>>>>                                   and the underlying stream.  It
>>>>                                   would also increase memory
>>>>                                   requirements and processing
>>>>                                   by introducing or using a
>>>>                                   separate thread and the
>>>>                                   necessary synchronization.
>>>>                                   Though I think the formal
>>>>                                   interface semantics could be
>>>>                                   maintained, I have doubts
>>>>                                   about compatibility and its
>>>>                                   unintended consequences on
>>>>                                   existing subclasses,
>>>>                                   applications and libraries.
>>>>                                   $.02, Roger
>>>>                                   On 10/21/16 1:22 PM, Pavel
>>>>                                   Rappo wrote:
>>>>
>>>>                                       Off the top of my head, I would say it's
>>>>                                       not possible to change the design of an
>>>>                                       _extensible_ type that has been out there
>>>>                                       for 20 or so years. All these I/O streams
>>>>                                       from java.io <http://java.io> were designed
>>>>                                       for a simple synchronous use case. It's not
>>>>                                       that their design is flawed in some way,
>>>>                                       it's that they don't seem to suit your
>>>>                                       needs. Have you considered using
>>>>                                       java.nio.channels.AsynchronousFileChannel
>>>>                                       in your applications?
>>>>                                       -Pavel
>>>>
>>>>                                           On 21 Oct 2016, at 17:08, Brunoais
>>>>                                           <brunoaiss at gmail.com> wrote:
>>>>                                           Any feedback on this? I'm really
>>>>                                           interested in implementing such a
>>>>                                           BufferedReader/BufferedStreamReader
>>>>                                           to allow speeding up my applications
>>>>                                           without having to think in an
>>>>                                           asynchronous or multi-threaded way
>>>>                                           while programming with it. That's why
>>>>                                           I'm asking this here.
>>>>                                           On 13/10/2016 14:45, Brunoais wrote:
>>>>
>>>>                                           Hi,
>>>>                                           I looked at the BufferedReader source
>>>>                                           code for java 9 along with the source
>>>>                                           code of the channels/streams used. I
>>>>                                           noticed that, like in java 7,
>>>>                                           BufferedReader does not use an async
>>>>                                           API to load data from files. Instead,
>>>>                                           the data loading is all done
>>>>                                           synchronously, even when the OS allows
>>>>                                           requesting a file to be read and
>>>>                                           getting a notification later, when the
>>>>                                           file has effectively been read.
>>>>                                           Why is BufferedReader not async while
>>>>                                           providing a sync API?
>>>>
>>>>                     <BufferedNonBlockStream.java><Tests.java>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>                     -- 
>>>>                     Sent from my phone
>>>>
>>>>
>>>>
>>>>
>>>>             -- 
>>>>             Sent from my phone
>>>
>>>
>>
>>
>
>
>
> -- 
> Sent from my phone



More information about the core-libs-dev mailing list