Request/discussion: BufferedReader reading using async API while providing sync API

Wed Oct 26 06:57:49 UTC 2016

Hey Bernd!

I don't know how long ago you ran those experiments, but I'm getting positive
results with my non-JMH tests. I still have to sanity-check my results,
though: after a few runs, the OS starts caching the file, which is not
what I want. It's easy to tell when that happens: the times fall
from ~30s to ~5s and the HDD stays nearly idle (just looking at
the LED is enough to tell).

If you don't simulate synchronous work and only run the reads, you will
only see marginal gains, as the OS has no real time to fill the buffer.
My research shows that the 2 major kernels (Windows and GNU/Linux) have
non-blocking user-level buffer handling where I give the OS a buffer
to read into and it keeps filling it, sending messages/signals as it
writes chunks. Linux has an OS interrupt that only sends the signal
once the buffer is full, though. There's also another variant where
the OS uses an internal buffer of the same size as the buffer you
allocate and then internally calls memcpy() into your user-level memory
when asked. Tests on the internet show that memcpy() is as fast (for 0-1
elements) or faster than System.arraycopy(). I have no idea whether they
are accurate.
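
To make the "synchronous work while the read is in flight" point concrete, here is a minimal sketch. It is not the prototype from the pastebin links: it uses two direct buffers instead of one to keep the example short, and the class name OverlappedReadSketch, the method sumBytes and the byte-summing "work" are made up for illustration. The only real API it relies on is java.nio.channels.AsynchronousFileChannel.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical sketch: names and structure are assumptions, not the posted prototype.
final class OverlappedReadSketch {

    /** Sums all bytes of the file, processing one chunk while the next read is pending. */
    static long sumBytes(Path file, int chunkSize)
            throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            // Direct buffers are allocated once, outside the read loop.
            ByteBuffer current = ByteBuffer.allocateDirect(chunkSize);
            ByteBuffer next = ByteBuffer.allocateDirect(chunkSize);
            long position = 0;
            long sum = 0;
            Future<Integer> pending = ch.read(current, position);
            while (true) {
                int n = pending.get();          // wait for the chunk in 'current'
                if (n < 0) {
                    break;                      // end of file
                }
                position += n;
                next.clear();
                pending = ch.read(next, position); // next read starts before we work

                current.flip();
                while (current.hasRemaining()) {   // the "synchronous work"
                    sum += current.get() & 0xFF;
                }
                ByteBuffer tmp = current;          // swap roles for the next round
                current = next;
                next = tmp;
            }
            return sum;
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("overlap-sketch", ".bin");
        try {
            Files.write(tmp, new byte[1 << 20]); // 1 MiB of zeros
            System.out.println(sumBytes(tmp, 8192));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

If the per-chunk work is removed, the loop mostly waits on pending.get(), which is the case where only marginal gains should be expected.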

All this is to say that the code is tuned to copy from the
read buffer only when it is at least at half capacity and the internal
buffer has enough free space. The copy is forced only if nothing
was read on the previous fill() call. It is built to use JNI as
little as possible while honoring the main contract of
BufferedInputStream.
Finally, I never, ever compact the read buffer. That would require a
memcpy(), which is definitely not necessary.
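
As an illustration of why compaction is avoidable, here is a minimal sketch of a circular byte buffer that wraps around instead of shifting its contents. It is an assumption about the general technique, not an excerpt of the prototype; RingBufferSketch and its methods are hypothetical names. Positions are kept as ever-growing long counters and reduced modulo the capacity only when indexing.

```java
// Hypothetical sketch of a non-compacting circular buffer (single-threaded).
final class RingBufferSketch {
    private final byte[] buf;
    private long writePos; // total bytes ever stored
    private long readPos;  // total bytes ever consumed

    RingBufferSketch(int capacity) {
        buf = new byte[capacity];
    }

    int available() {
        return (int) (writePos - readPos);
    }

    int free() {
        return buf.length - available();
    }

    /** Stores as much of src as fits, wrapping at the end of the array. */
    int write(byte[] src, int off, int len) {
        int n = Math.min(len, free());
        int start = (int) (writePos % buf.length);
        int first = Math.min(n, buf.length - start);           // part up to the end
        System.arraycopy(src, off, buf, start, first);
        System.arraycopy(src, off + first, buf, 0, n - first);  // wrapped part, may be 0
        writePos += n;
        return n;
    }

    /** Copies up to len buffered bytes into dst; nothing is ever shifted. */
    int read(byte[] dst, int off, int len) {
        int n = Math.min(len, available());
        int start = (int) (readPos % buf.length);
        int first = Math.min(n, buf.length - start);
        System.arraycopy(buf, start, dst, off, first);
        System.arraycopy(buf, 0, dst, off + first, n - first);
        readPos += n;
        return n;
    }

    public static void main(String[] args) {
        RingBufferSketch rb = new RingBufferSketch(8);
        byte[] in = {1, 2, 3, 4, 5, 6};
        rb.write(in, 0, in.length);
        rb.read(new byte[4], 0, 4);
        // This write wraps around the end of the array; no compaction happens.
        System.out.println(rb.write(in, 0, in.length) + " bytes stored");
    }
}
```

With long counters the positions themselves only overflow after 2^63 bytes have passed through the buffer.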

Anyway, the timing tests I ran were just to get an order-of-magnitude
idea of the speed difference. I intended to do them differently, but
JMH looks good so I'll use JMH from now on.

Short reads only happen when fill(true) is called. That only happens
when data is desperately needed.

I'll look into avoiding the double read requests. I don't think it will
bring significant improvements, if any at all. It only happens when the
buffer is nearly empty and any byte of data is welcome "at any cost".
Besides, whoever called read at that point would also have seen
available() return 0 and still called read()/read(byte[]).
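
For reference, a minimal sketch of the single-request variant (read only up to the end of the buffer when the free space wraps around). This is an assumption about how it could look, not code from the prototype: ContiguousFillSketch and fillOnce are hypothetical names, a plain ReadableByteChannel stands in for the file, and readPos/writePos are assumed to be ever-growing counters.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

// Hypothetical sketch: one read request into the contiguous free region only.
final class ContiguousFillSketch {

    /**
     * Issues at most one read into the free space between the write index and
     * the end of the ring (or the reader, whichever comes first).
     * Returns the channel's result, or 0 if there is no free space.
     */
    static int fillOnce(ReadableByteChannel src, ByteBuffer ring,
                        long readPos, long writePos) throws IOException {
        int capacity = ring.capacity();
        int free = capacity - (int) (writePos - readPos);
        int start = (int) (writePos % capacity);
        int contiguous = Math.min(free, capacity - start); // never crosses the end
        if (contiguous == 0) {
            return 0;
        }
        ByteBuffer window = ring.duplicate(); // shares content, own position/limit
        window.limit(start + contiguous);
        window.position(start);
        return src.read(window);
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer ring = ByteBuffer.allocate(8);
        ReadableByteChannel src =
            Channels.newChannel(new ByteArrayInputStream(new byte[] {9, 8, 7, 6}));
        // 4 bytes are buffered at indices 2..5: 4 bytes are free, but only
        // indices 6..7 are contiguous, so a single request covers just those.
        int n = fillOnce(src, ring, 2L, 6L);
        System.out.println(n + " byte(s) read in a single request");
    }
}
```

The remaining free space at the start of the ring is simply picked up by the next fill, at the cost of one smaller read.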

On 26/10/2016 06:14, Bernd Eckenfels wrote:
>  Hello Brunoais,
>
> In the past I did some experiments with non-blocking file channels in
> the hope of increasing throughput in a similar way to your buffered
> stream. I also used directly allocated buffers. However, my results were
> not that encouraging (especially if an upper layer used larger
> reads). Back then I thought this was mostly due to the fact
> that it does NOT map to real async file I/O on most platforms. But maybe
> I just measured it wrong, so I will have a closer look at your impl.
>
> Generally I would recommend making the benchmark a bit more reliable
> with JMH and, in order to do this, externalizing the direct buffer
> allocation (as it is slow if done repeatedly). This also allows you
> to publish some results with varying workloads (on different machines).
>
> I would also measure the readCount to see if short reads happen.
>
>  BTW, I might as well try to only read till the end of the buffer in 
> the backfilling-wraps-around case and not issue two requests, that 
> might remove some additional latency.
>
> Regards
> Bernd
> -- 
> http://bernd.eckenfels.net
>
> _____________________________
> From: Brunoais <brunoaiss at gmail.com>
> Sent: Monday, October 24, 2016 6:30 PM
> Subject: Re: Request/discussion: BufferedReader reading using async
> API while providing sync API
> To: Pavel Rappo <pavel.rappo at oracle.com>
> Cc: <core-libs-dev at openjdk.java.net>
>
>
> Attached and sending!
>
>
> On 24/10/2016 13:48, Pavel Rappo wrote:
> > Could you please send a new email on this list with the source
> > attached as a text file?
> >
> >> On 23 Oct 2016, at 19:14, Brunoais <brunoaiss at gmail.com> wrote:
> >>
> >> Here's my poc/prototype:
> >> http://pastebin.com/WRpYWDJF
> >>
> >> I've implemented the bare minimum of the class that follows the same
> >> contract as BufferedReader while flagging, in comments, all the issues
> >> I think it has or may have.
> >> I also wrote some javadoc to help guide readers through the class.
> >>
> >> I could have used more fields from BufferedReader but the names were
> >> so minimalistic that they were confusing me. I intend to change them
> >> before sending this to OpenJDK.
> >>
> >> One of the major problems this has is long overflow. It is major
> >> because it is hidden, it will be extremely rare and it takes a really
> >> long time to reproduce. There are different ways of dealing with it,
> >> from just documenting it to actually writing code that handles it.
> >>
> >> I built some simple test code for it to get some idea of its
> >> performance and correctness.
> >>
> >> http://pastebin.com/eh6LFgwT
> >>
> >> This doesn't thoroughly test whether it is actually working correctly,
> >> but I see no reason for it not to work correctly after fixing the 2
> >> bugs that the test found.
> >>
> >> I'll also leave here some conclusions about speed and resource
> >> consumption that I found.
> >>
> >> I made tests with default buffer sizes, 5000B, 15_000B and 500_000B.
> >> I noticed that, with my hardware and the 1 530 000 000B file, I was
> >> getting around:
> >>
> >> With all buffer sizes and fake work: 10~15s speed improvement (from
> >> 90% HDD speed to 100% HDD speed)
> >> With all buffer sizes and no fake work: 1~2s speed improvement (from
> >> 90% HDD speed to 100% HDD speed)
> >>
> >> Changing the buffer size gave different reading speeds, but both
> >> changed by about the same amount when the buffer size changed.
> >> Finally, I could always confirm that I/O was the slowest thing while
> >> this code was running.
> >>
> >> For those wondering about the file size: it is both to avoid the OS
> >> cache and to exercise the main use case these objects are for (large
> >> streams of bytes).
> >>
> >> @Pavel, are you open for discussion now ;)? Need anything else?
> >>
> >> On 21/10/2016 19:21, Pavel Rappo wrote:
> >>> Just to append to my previous email. BufferedReader wraps any Reader
> >>> out there, not specifically FileReader, while you're talking about
> >>> the case of efficient reading from a file.
> >>>
> >>> I guess there's one existing possibility to provide exactly what you
> >>> need (as I understand it) under this method:
> >>>
> >>> /**
> >>> * Opens a file for reading, returning a {@code BufferedReader} to
> >>> * read text from the file in an efficient manner...
> >>> ...
> >>> */
> >>> java.nio.file.Files#newBufferedReader(java.nio.file.Path)
> >>>
> >>> It can return _anything_ as long as it is a BufferedReader. We can do
> >>> it, but it needs to be investigated not only for your favorite OS but
> >>> for other OSes as well. Feel free to prototype this and we can discuss
> >>> it on the list later.
> >>>
> >>> Thanks,
> >>> -Pavel
> >>>
> >>>> On 21 Oct 2016, at 18:56, Brunoais <brunoaiss at gmail.com> wrote:
> >>>>
> >>>> Pavel is right.
> >>>>
> >>>> In reality, I was expecting such a BufferedReader to use only a
> >>>> single buffer and have that buffer filled asynchronously, not in a
> >>>> different Thread.
> >>>> Additionally, I don't intend to have a larger buffer than before
> >>>> unless requested through the API (the constructor).
> >>>>
> >>>> In my idea, internally, it is supposed to use
> >>>> java.nio.channels.AsynchronousFileChannel or equivalent.
> >>>>
> >>>> It does not prevent having two buffers and I do not intend to change
> >>>> BufferedReader itself. I'd do a BufferedAsyncReader of sorts (any
> >>>> name suggestion is welcome as I'm an awful namer).
> >>>>
> >>>>
> >>>> On 21/10/2016 18:38, Roger Riggs wrote:
> >>>>> Hi Pavel,
> >>>>>
> >>>>> I think Brunoais is asking for a double buffering scheme in which
> >>>>> the implementation of BufferedReader fills (a second buffer) in
> >>>>> parallel with the application reading from the 1st buffer, managing
> >>>>> the swaps and async reads transparently.
> >>>>> It would not change the API but would change the interactions
> >>>>> between the buffered reader and the underlying stream. It would also
> >>>>> increase memory requirements and processing by introducing or using
> >>>>> a separate thread and the necessary synchronization.
> >>>>>
> >>>>> Though I think the formal interface semantics could be maintained, I
> >>>>> have doubts about compatibility and its unintended consequences on
> >>>>> existing subclasses, applications and libraries.
> >>>>>
> >>>>> $.02, Roger
> >>>>>
> >>>>> On 10/21/16 1:22 PM, Pavel Rappo wrote:
> >>>>>> Off the top of my head, I would say it's not possible to change the
> >>>>>> design of an _extensible_ type that has been out there for 20 or so
> >>>>>> years. All these I/O streams from java.io were designed for a
> >>>>>> simple synchronous use case.
> >>>>>>
> >>>>>> It's not that their design is flawed in some way, it's that they
> >>>>>> don't seem to suit your needs. Have you considered using
> >>>>>> java.nio.channels.AsynchronousFileChannel in your applications?
> >>>>>>
> >>>>>> -Pavel
> >>>>>>
> >>>>>>> On 21 Oct 2016, at 17:08, Brunoais <brunoaiss at gmail.com> wrote:
> >>>>>>>
> >>>>>>> Any feedback on this? I'm really interested in implementing such a
> >>>>>>> BufferedReader/BufferedStreamReader to allow speeding up my
> >>>>>>> applications without having to think in an asynchronous or
> >>>>>>> multi-threaded way while programming with it.
> >>>>>>>
> >>>>>>> That's why I'm asking this here.
> >>>>>>>
> >>>>>>>
> >>>>>>> On 13/10/2016 14:45, Brunoais wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> I looked at the BufferedReader source code for Java 9 along with
> >>>>>>>> the source code of the channels/streams used. I noticed that,
> >>>>>>>> like in Java 7, BufferedReader does not use an async API to load
> >>>>>>>> data from files. Instead, the data loading is all done
> >>>>>>>> synchronously, even when the OS allows requesting a file read and
> >>>>>>>> getting a notification later when the file has actually been read.
> >>>>>>>>
> >>>>>>>> Why is BufferedReader not async while providing a sync API?
> >>>>>>>>
> >
>
>
>