Request/discussion: BufferedReader reading using async API while providing sync API

Thu Oct 27 21:45:36 UTC 2016

On Thursday, October 27, 2016, Brunoais <brunoaiss at gmail.com> wrote:

> You are right. Even on Windows it does not set the flags for async reads.
> It seems that Windows itself decides whether to buffer the
> contents based on its own heuristics.
>
You mean nonblocking, not async, right? Two different things.
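
To make the distinction concrete, here is a minimal sketch (the class and
method names are made up for illustration). In java.nio, nonblocking mode is
a property of SelectableChannel, which FileChannel does not extend, while
asynchronous file reads go through a separate type, AsynchronousFileChannel.

```java
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.FileChannel;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SocketChannel;

// Hypothetical helper, not part of the JDK.
final class ChannelKinds {
    // True if channels of this type can be switched to nonblocking mode,
    // i.e. the type inherits configureBlocking(boolean) from SelectableChannel.
    static boolean canBeNonBlocking(Class<?> channelType) {
        return SelectableChannel.class.isAssignableFrom(channelType);
    }

    public static void main(String[] args) {
        System.out.println("SocketChannel: " + canBeNonBlocking(SocketChannel.class));
        System.out.println("FileChannel: " + canBeNonBlocking(FileChannel.class));
        System.out.println("AsynchronousFileChannel: "
                + canBeNonBlocking(AsynchronousFileChannel.class));
    }
}
```

So a FileChannel read cannot be asked to "return immediately if nothing is
ready" the way a socket read can; the asynchronous channel is a different
mechanism, not a nonblocking mode of the same one.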

> But... Why? Why won't it be? Why is there no API for it? How am I getting
> 100% HDD use and faster times when I fake work to delay getting more data
> and I only have a fluctuating 60-90% (always going up and down) when I use
> an InputStream?
> Is it related to how both classes cache and how frequently and how much
> each one asks for data?
>
> I really would prefer not having to read the source code because it takes
> a real long time T.T.
>
> I end up reinstating... And wondering...
>
> Why doesn't Java provide a single-threaded non-blocking API for file reads
> for all OSes that support it? I simply cannot find that information no matter
> how much I search on google, bing, duck duck go... Can any of you point me
> to whoever knows?
>
See https://lwn.net/Articles/612483/ for Linux.  Unfortunately, the
nonblocking file io story is complicated and messy.
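
For reference, a minimal sketch of what the existing asynchronous file API
looks like from a single calling thread. The class and method names here are
hypothetical; only AsynchronousFileChannel and Future are JDK types. How the
read is carried out underneath is platform-specific, and on some platforms it
may simply be a blocking read performed on a pool thread.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical sketch, not taken from the prototype under discussion.
final class AsyncReadSketch {
    // Initiates a read at offset 0 and returns the bytes once it completes.
    static byte[] readStart(Path file, int maxBytes)
            throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(maxBytes);
            // read() only initiates the operation and hands back a Future.
            Future<Integer> pending = ch.read(buf, 0L);
            // ... the caller could do other work here ...
            int n = pending.get(); // waits for completion; -1 means end of file
            buf.flip();
            byte[] out = new byte[Math.max(n, 0)];
            buf.get(out);
            return out;
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("async-read", ".txt");
        try {
            Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
            System.out.println(new String(readStart(tmp, 16), StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Note that waiting on the Future replaces a sleep-and-poll loop: the caller
is woken when the read finishes instead of checking again every 100 ms.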

> On 27/10/2016 14:11, Vitaly Davidovich wrote:
>
> I don't know about Windows specifically, but generally file systems across
> major OS's will implement readahead in their IO scheduler when they detect
> sequential scans.
>
> On Linux, you can also strace your test to confirm which syscalls are
> emitted (you should be seeing plain read()'s there, with FileInputStream
> and FileChannel).
>
> On Thu, Oct 27, 2016 at 9:06 AM, Brunoais <brunoaiss at gmail.com> wrote:
>
>> Thanks for the heads up.
>>
>> I'll try that later. These tests are still useful then. Meanwhile, I'll
>> end up also checking how FileChannel queries the OS on Windows. I'm getting
>> 100% HDD reads... Could it be that the OS reads the file ahead on its
>> own?... Anyway, I'll look into it. Thanks for the heads up.
>>
>> On 27/10/2016 13:53, Vitaly Davidovich wrote:
>>
>>
>>
>> On Thu, Oct 27, 2016 at 8:34 AM, Brunoais <brunoaiss at gmail.com> wrote:
>>
>>> Oh... I see. In that case, it means something is terribly wrong. It can
>>> be my initial tests, though.
>>>
>>> I'm testing on both Linux and Windows and I'm getting performance gains
>>> from using the FileChannel compared to using FileInputStream... The tests
>>> also make sense based on my predictions O_O...
>>>
>> FileInputStream requires copying native buffers holding the read data to
>> the java byte[].  If you're using direct ByteBuffer for FileChannel, that
>> whole memcpy is skipped.  Try comparing FileChannel with HeapByteBuffer
>> instead.
>>
>>>
>>> On 27/10/2016 11:47, Vitaly Davidovich wrote:
>>>
>>>
>>>
>>> On Thursday, October 27, 2016, Brunoais <brunoaiss at gmail.com> wrote:
>>>
>>>> Did you read the C code?
>>>
>>> I looked at the Linux code in the JDK.
>>>
>>>> Have you got any idea how many functions Windows or Linux (nearly all
>>>> flavors) have for the read operation towards a file?
>>>
>>> I do.
>>>
>>>>
>>>> I have already done that homework myself. I may not have read the JVM's
>>>> source code but I know well that there are functions on both Windows and
>>>> Linux that provide the kind of interface I mentioned, although they
>>>> require slightly different treatment (and different constants).
>>>
>>> You should read the JDK (native) source code instead of
>>> guessing/assuming.  On Linux, it doesn't use aio facilities for files.  The
>>> kernel io scheduler may issue readahead behind the scenes, but there's no
>>> nonblocking file io that's at the heart of your premise.
>>>
>>>>
>>>>
>>>> On 27/10/2016 00:06, Vitaly Davidovich wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wednesday, October 26, 2016, Brunoais <brunoaiss at gmail.com> wrote:
>>>>>
>>>>>     It is actually based on the premise that:
>>>>>
>>>>>     1. The first call to ReadableByteChannel.read(ByteBuffer) sets the
>>>>>        OS buffer size to fill in as the same size as ByteBuffer.
>>>>>
>>>>> Why do you say that? AFAICT, it issues a read syscall and that will
>>>>> block if the data isn't in page cache.
>>>>>
>>>>>     2. The consecutive calls to ReadableByteChannel.read(ByteBuffer)
>>>>>        order the JVM to order the OS to execute memcpy() to copy from
>>>>>        its memory to the shared memory created at ByteBuffer
>>>>>        instantiation (in Java 8) using Unsafe and then for the JVM to
>>>>>        update the ByteBuffer fields.
>>>>>
>>>>> I think subsequent reads just invoke the same read syscall, passing
>>>>> the current file offset maintained by the file channel instance.
>>>>>
>>>>>     3. The call will not block waiting for I/O and it won't take longer
>>>>>        than the JNI interface if no new data exists. However, it will
>>>>>        block waiting for the OS to execute memcpy() to the shared memory.
>>>>>
>>>>> So why do you think it won't block?
>>>>>
>>>>>
>>>>>     Is my premise wrong?
>>>>>
>>>>>     If I read correctly, if I don't use a DirectBuffer, there would be
>>>>>     even another intermediate buffer to copy data to before giving it
>>>>>     to the "user" which would be useless.
>>>>>
>>>>> If you use a HeapByteBuffer, then there's an extra copy from the
>>>>> native buffer to the Java buffer.
>>>>>
>>>>>
>>>>>
>>>>>     On 26/10/2016 11:57, Pavel Rappo wrote:
>>>>>
>>>>>         I believe I see where you're coming from. Please correct me if
>>>>>         I'm wrong.
>>>>>
>>>>>         Your implementation is based on the premise that a call to
>>>>>         ReadableByteChannel.read()
>>>>>         _initiates_ the operation and returns immediately. The OS then
>>>>>         continues to fill
>>>>>         the buffer while there's a free space in the buffer and the
>>>>>         channel hasn't encountered EOF.
>>>>>
>>>>>         Is that right?
>>>>>
>>>>>             On 25 Oct 2016, at 22:16, Brunoais <brunoaiss at gmail.com>
>>>>>             wrote:
>>>>>
>>>>>             Thank you for your time. I'll try to explain it. I hope I
>>>>>             can clear it up.
>>>>>             First of all, I mixed up the meanings of asynchronous
>>>>>             and non-blocking. This implementation uses a non-blocking
>>>>>             algorithm internally while providing a blocking-like
>>>>>             algorithm on the surface. It is single-threaded and not
>>>>>             multi-threaded where one thread fetches data and blocks
>>>>>             waiting and the other accumulates it and provides to
>>>>>             whichever wants it.
>>>>>
>>>>>             Second, I made the mistake of going after
>>>>>             BufferedReader instead of going after BufferedInputStream.
>>>>>             If you want me to go after BufferedReader that's OK, but
>>>>>             when I started the PoC I thought that going after
>>>>>             BufferedInputStream would be more generically useful than
>>>>>             BufferedReader.
>>>>>
>>>>>             On to my code:
>>>>>             Short answers:
>>>>>                     • The sleep(int) exists because I don't know how
>>>>>                       to wait until more data exists in the buffer,
>>>>>                       which is part of read()'s contract.
>>>>>                     • The ByteBuffer gives a buffer that is filled by
>>>>>                       the OS (what I believe Channels do) instead of
>>>>>                       getting data only on demand (what I believe
>>>>>                       Streams do).
>>>>>             Full answers:
>>>>>             The blockingFill(boolean) method is a method for a busy
>>>>>             wait for a fill which is used exclusively by the read()
>>>>>             method. All other methods use the version that does not
>>>>>             sleep (fill(boolean)).
>>>>>             blockingFill(boolean) exists in that form only
>>>>>             because the read() method must not return unless either:
>>>>>
>>>>>                     • The stream ended.
>>>>>                     • The next byte is ready for reading.
>>>>>             Additionally, statistically, that while loop will rarely
>>>>>             evaluate to true as reads are in chunks so readPos will be
>>>>>             behind writePos most of the time.
>>>>>             I have no idea if an interrupt will ever happen, to be
>>>>>             honest. The main reasons I'm using a sleep are that
>>>>>             I didn't want to hog the CPU with a full-thread
>>>>>             busy wait and that I didn't find any way of putting the
>>>>>             thread to sleep so that it wakes up later when the buffer
>>>>>             managed by native code has more data.
>>>>>             The Non-blocking part is managed by the buffer the OS
>>>>>             keeps filling most if not all the time. That buffer is the
>>>>>             field
>>>>>
>>>>>             ByteBuffer readBuffer
>>>>>             That's the gain over the plain old Buffered
>>>>>             classes.
>>>>>
>>>>>
>>>>>             Did that make sense to you? Feel free to ask anything else
>>>>>             you need.
>>>>>
>>>>>             On 25/10/2016 20:52, Pavel Rappo wrote:
>>>>>
>>>>>                 I've skimmed through the code and I'm not sure I can
>>>>>                 see any asynchronicity
>>>>>                 (you were pointing at the lack of it in BufferedReader).
>>>>>                 And the mechanics of this is very puzzling to me, to
>>>>>                 be honest:
>>>>>                      void blockingFill(boolean forced) throws IOException {
>>>>>                          fill(forced);
>>>>>                          while (readPos == writePos) {
>>>>>                              try {
>>>>>                                  Thread.sleep(100);
>>>>>                              } catch (InterruptedException e) {
>>>>>                                  // An interrupt may mean more data is available
>>>>>                              }
>>>>>                              fill(forced);
>>>>>                          }
>>>>>                      }
>>>>>                 I thought you were suggesting that we should utilize
>>>>>                 the tools which OS provides
>>>>>                 more efficiently. Instead we have something that looks
>>>>>                 very similar to a
>>>>>                 "busy loop" and... also who and when is supposed to
>>>>>                 interrupt Thread.sleep()?
>>>>>                 Sorry, I'm not following. Could you please explain how
>>>>>                 this is supposed to work?
>>>>>
>>>>>                     On 24 Oct 2016, at 15:59, Brunoais
>>>>>                     <brunoaiss at gmail.com>
>>>>>                       wrote:
>>>>>                     Attached and sending!
>>>>>                     On 24/10/2016 13:48, Pavel Rappo wrote:
>>>>>
>>>>>                         Could you please send a new email on this list
>>>>>                         with the source attached as a
>>>>>                         text file?
>>>>>
>>>>>                             On 23 Oct 2016, at 19:14, Brunoais
>>>>>                             <brunoaiss at gmail.com>
>>>>>                               wrote:
>>>>>                             Here's my poc/prototype:
>>>>>
>>>>>                             http://pastebin.com/WRpYWDJF
>>>>>
>>>>>                             I've implemented the bare minimum of the
>>>>>                             class that follows the same contract of
>>>>>                             BufferedReader while signaling all issues
>>>>>                             I think it may have or has in comments.
>>>>>                             I also wrote some javadoc to help guide
>>>>>                             readers through the class.
>>>>>                             I could have used more fields from
>>>>>                             BufferedReader but the names were so
>>>>>                             minimalistic that they were confusing me. I
>>>>>                             intend to change them before sending this
>>>>>                             to OpenJDK.
>>>>>                             One of the major problems this has is long
>>>>>                             overflowing. It is major because it is
>>>>>                             hidden, it will be extremely rare and it
>>>>>                             takes a really long time to reproduce.
>>>>>                             There are different ways of dealing with
>>>>>                             it. From just documenting to actually
>>>>>                             making code that works with it.
>>>>>                             I built some simple test code for it to get
>>>>>                             some idea of its performance and
>>>>>                             correctness.
>>>>>
>>>>>                             http://pastebin.com/eh6LFgwT
>>>>>
>>>>>                             This doesn't thoroughly test whether it is
>>>>>                             actually working correctly but I see no
>>>>>                             reason for it not to work correctly after
>>>>>                             fixing the 2 bugs that the test found.
>>>>>                             I'll also leave here some conclusions
>>>>>                             about speed and resource consumption that
>>>>>                             I found.
>>>>>                             I made tests with default buffer sizes,
>>>>>                             5000B, 15_000B and 500_000B. I noticed
>>>>>                             that, with my hardware, with the 1 530 000
>>>>>                             000B file, I was getting around:
>>>>>                             In all buffers and fake work: 10~15s speed
>>>>>                             improvement (from 90% HDD speed to 100%
>>>>>                             HDD speed)
>>>>>                             In all buffers and no fake work: 1~2s
>>>>>                             speed improvement (from 90% HDD speed to
>>>>>                             100% HDD speed)
>>>>>                             Changing the buffer size was giving
>>>>>                             different reading speeds but both were
>>>>>                             quite equal in how much they would change
>>>>>                             when changing the buffer size.
>>>>>                             Finally, I could always confirm that I/O
>>>>>                             was always the slowest thing while this
>>>>>                             code was running.
>>>>>                             For the ones wondering about the file
>>>>>                             size: it is both to avoid the OS cache and
>>>>>                             to exercise the main use case
>>>>>                             these objects are for (large streams of
>>>>>                             bytes).
>>>>>                             @Pavel, are you open for discussion now
>>>>>                             ;)? Need anything else?
>>>>>                             On 21/10/2016 19:21, Pavel Rappo wrote:
>>>>>
>>>>>                                 Just to append to my previous email.
>>>>>                                 BufferedReader wraps any Reader out
>>>>>                                 there, not specifically FileReader,
>>>>>                                 while you're talking about the case of
>>>>>                                 efficient reading from a file.
>>>>>                                 I guess there's one existing
>>>>>                                 possibility to provide exactly what
>>>>>                                 you need (as I
>>>>>                                 understand it) under this method:
>>>>>                                 /**
>>>>>                                   * Opens a file for reading, returning a
>>>>>                                   * {@code BufferedReader} to read text
>>>>>                                   * from the file in an efficient manner...
>>>>>                                   *   ...
>>>>>                                   */
>>>>>                                 java.nio.file.Files#newBufferedReader(java.nio.file.Path)
>>>>>                                 It can return _anything_ as long as it
>>>>>                                 is a BufferedReader. We can do it, but
>>>>>                                 it needs to be investigated not only for
>>>>>                                 your favorite OS but for other OSes as
>>>>>                                 well. Feel free to prototype this and
>>>>>                                 we can discuss it on the list later.
>>>>>                                 Thanks,
>>>>>                                 -Pavel
>>>>>
>>>>>                                     On 21 Oct 2016, at 18:56, Brunoais
>>>>>                                     <brunoaiss at gmail.com>
>>>>>                                       wrote:
>>>>>                                     Pavel is right.
>>>>>                                     In reality, I was expecting such
>>>>>                                     BufferedReader to use only a
>>>>>                                     single buffer and have that Buffer
>>>>>                                     being filled asynchronously, not
>>>>>                                     in a different Thread.
>>>>>                                     Additionally, I don't have the
>>>>>                                     intention of having a larger
>>>>>                                     buffer than before unless stated
>>>>>                                     through the API (the constructor).
>>>>>                                     In my idea, internally, it is
>>>>>                                     supposed to use
>>>>>                                     java.nio.channels.AsynchronousFileChannel
>>>>>                                     or equivalent.
>>>>>                                     It does not prevent having two
>>>>>                                     buffers and I do not intend to
>>>>>                                     change BufferedReader itself. I'd
>>>>>                                     do a BufferedAsyncReader of sorts
>>>>>                                     (any name suggestion is welcome as
>>>>>                                     I'm an awful namer).
>>>>>                                     On 21/10/2016 18:38, Roger Riggs
>>>>>                                     wrote:
>>>>>
>>>>>                                         Hi Pavel,
>>>>>                                         I think Brunoais is asking for a
>>>>>                                         double buffering scheme in
>>>>>                                         which the implementation of
>>>>>                                         BufferedReader fills (a second
>>>>>                                         buffer) in parallel with the
>>>>>                                         application reading from the
>>>>>                                         1st buffer
>>>>>                                         and managing the swaps and
>>>>>                                         async reads transparently.
>>>>>                                         It would not change the API
>>>>>                                         but would change the
>>>>>                                         interactions between the
>>>>>                                         buffered reader
>>>>>                                         and the underlying stream.  It
>>>>>                                         would also increase memory
>>>>>                                         requirements and processing
>>>>>                                         by introducing or using a
>>>>>                                         separate thread and the
>>>>>                                         necessary synchronization.
>>>>>                                         Though I think the formal
>>>>>                                         interface semantics could be
>>>>>                                         maintained, I have doubts
>>>>>                                         about compatibility and its
>>>>>                                         unintended consequences on
>>>>>                                         existing subclasses,
>>>>>                                         applications and libraries.
>>>>>                                         $.02, Roger
>>>>>                                         On 10/21/16 1:22 PM, Pavel
>>>>>                                         Rappo wrote:
>>>>>
>>>>>                                             Off the top of my head, I
>>>>>                                             would say it's not
>>>>>                                             possible to change the
>>>>>                                             design of an
>>>>>                                             _extensible_ type that has
>>>>>                                             been out there for 20 or
>>>>>                                             so years. All these I/O
>>>>>                                             streams from java.io were
>>>>>                                             designed for a simple
>>>>>                                             synchronous use case.
>>>>>                                             It's not that their design
>>>>>                                             is flawed in some way,
>>>>>                                             it's that they don't seem to
>>>>>                                             suit your needs. Have you
>>>>>                                             considered using
>>>>>                                             java.nio.channels.AsynchronousFileChannel
>>>>>                                             in your applications?
>>>>>                                             -Pavel
>>>>>
>>>>>                                                 On 21 Oct 2016, at
>>>>>                                                 17:08, Brunoais
>>>>>                                                 <brunoaiss at gmail.com>
>>>>>                                                   wrote:
>>>>>                                                 Any feedback on this?
>>>>>                                                 I'm really interested
>>>>>                                                 in implementing such a
>>>>>                                                 BufferedReader/BufferedStreamReader
>>>>>                                                 to allow speeding up
>>>>>                                                 my applications
>>>>>                                                 without having to
>>>>>                                                 think in an
>>>>>                                                 asynchronous way or
>>>>>                                                 multi-threading while
>>>>>                                                 programming with it.
>>>>>                                                 That's why I'm asking
>>>>>                                                 this here.
>>>>>                                                 On 13/10/2016 14:45,
>>>>>                                                 Brunoais wrote:
>>>>>
>>>>>                                                     Hi,
>>>>>                                                     I looked at the BufferedReader source code
>>>>>                                                     for Java 9 along with the source code of the
>>>>>                                                     channels/streams used. I noticed that, like
>>>>>                                                     in Java 7, BufferedReader does not use an
>>>>>                                                     async API to load data from files; instead,
>>>>>                                                     the data loading is all done synchronously,
>>>>>                                                     even when the OS allows requesting a file to
>>>>>                                                     be read and getting a notification later when
>>>>>                                                     the file has actually been read.
>>>>>                                                     Why is BufferedReader not async while
>>>>>                                                     providing a sync API?
>>>>>
>>>>>                     <BufferedNonBlockStream.java><Tests.java>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sent from my phone
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
