Request/discussion: BufferedReader reading using async API while providing sync API

Wed Oct 26 08:30:24 UTC 2016

Hey guys. Any idea where I can find instructions on how to use JMH to:

 1. Clear the OS's file read cache.
 2. Warm up whatever needs warming up (maybe by reading from an in-memory
    Channel).
 3. Create a BufferedInputStream with a FileInputStream inside, with
    configurable buffer sizes.
 4. Execute iterations to read the file fully.
     1. Allow setting the byte[] size.
     2. On each iteration, burn a set number of CPU cycles.
 5. Re-execute 1, 3 and 4 but with a BufferedNonBlockStream and a
    FileChannel.

So far I still can't find how to:

1 (clear OS' cache)
3 (the configuration part)
4 (variable number of iterations)
4.1 (the configuration)

Can someone please point me in the right direction?
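
For context, points 3, 4, 4.1 and 4.2 above could look roughly like the plain-Java sketch below. It is only an illustration, not a JMH harness: the class name ReadLoopSketch, the helpers burnCpu and tempFileOfSize, and the parameter names are all made up for this example. In a JMH benchmark the three knobs would typically become @Param fields on a @State class, and Blackhole.consumeCPU(tokens) could replace the hand-rolled burn loop. Clearing the OS cache (point 1) is not something Java can do portably; on Linux it is usually done as root by writing 3 to /proc/sys/vm/drop_caches before each run.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadLoopSketch {

    // Volatile sink so the JIT cannot discard the fake work.
    static volatile long sink;

    /** Hypothetical stand-in for "burn a set number of CPU cycles". */
    static long burnCpu(long tokens) {
        long x = 42;
        for (long i = 0; i < tokens; i++) {
            x = x * 6364136223846793005L + 1442695040888963407L;
        }
        return x;
    }

    /** Reads the whole file and returns the number of bytes seen. */
    static long readFully(Path file, int streamBufferSize, int readArraySize, long cpuTokens) {
        if (readArraySize <= 0) {
            throw new IllegalArgumentException("readArraySize must be positive");
        }
        byte[] chunk = new byte[readArraySize];
        long total = 0;
        try (InputStream in = new BufferedInputStream(
                new FileInputStream(file.toFile()), streamBufferSize)) {
            int n;
            while ((n = in.read(chunk)) != -1) {
                total += n;
                sink = burnCpu(cpuTokens); // simulated per-iteration work
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    /** Demo helper: creates a zero-filled temporary file of the given size. */
    static Path tempFileOfSize(int size) {
        try {
            Path p = Files.createTempFile("readloop", ".bin");
            p.toFile().deleteOnExit();
            Files.write(p, new byte[size]);
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = tempFileOfSize(4 * 1024 * 1024);
        for (int streamBuf : new int[] {5_000, 15_000, 500_000}) {
            for (int arraySize : new int[] {512, 8_192}) {
                long start = System.nanoTime();
                long bytes = readFully(file, streamBuf, arraySize, 1_000);
                long micros = (System.nanoTime() - start) / 1_000;
                System.out.printf("buffer=%d array=%d bytes=%d time=%dus%n",
                        streamBuf, arraySize, bytes, micros);
            }
        }
    }
}
```

The demo file here is small and will sit in the OS cache, so the printed times say nothing about disk speed; the point is only how the knobs are wired together.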

On 26/10/2016 07:57, Brunoais wrote:
>
> Hey Bernd!
>
> I don't know how long ago you ran those experiments, but I'm getting
> positive results with my non-JMH tests. I do have to sanity-check my
> results, though. After a few reads, the OS starts caching the file, which
> is not what I want. It's easy to tell when that happens: the
> times fall from ~30s to ~5s and the HDD stays nearly idle (just
> looking at the LED is enough to tell).
>
> If you don't simulate synchronous work between reads and only run the
> reads themselves, you will only see marginal gains, because the OS gets
> no real time to fill the buffer.
> My research shows that the two major kernels (Windows and GNU/Linux)
> offer non-blocking user-level buffer handling, where I hand the OS a
> buffer to read into and it keeps filling it, sending messages/signals as
> it writes chunks. Linux has an OS interrupt that only sends the signal
> after the buffer is full, though. There is also another variant in which
> they use an internal buffer of the same size as the buffer you allocate
> for the OS and then internally call memcpy() into your user-level
> memory when asked. Tests on the internet show that memcpy() is as fast
> as (for 0-1 elements) or faster than System.arraycopy(). I have no idea
> whether they are accurate.
>
> All of this is to say that the code is tuned to copy from the
> read buffer only when it is at least half full and the
> internal buffer has enough free space. The copy is forced only
> if nothing was read on the previous fill() call. It is built to
> use JNI as little as possible while honouring the main contract
> of BufferedInputStream.
> Finally, I never, ever compact the read buffer. That would require a
> memcpy(), which is definitely not necessary.
>
> Anyway, the timing tests I ran were only meant to get an order of
> magnitude for the speed difference. I intended to do them differently,
> but JMH looks good, so I'll use JMH from now on.
>
> Short reads only happen when fill(true) is called. That happens when
> data is desperately needed.
>
> I'll look into avoiding the double read requests. I don't think it
> will bring significant improvements, if any at all. It only happens
> when the buffer is nearly empty and any byte of data is welcome "at
> any cost".
> Besides, whoever called read at that point would also have seen an
> available() of 0 and still called read()/read(byte[]).
>
>
> On 26/10/2016 06:14, Bernd Eckenfels wrote:
>> Hello Brunoais,
>>
>> In the past I did some experiments with non-blocking file channels in
>> the hope of increasing throughput in a similar way to your buffered
>> stream. I also used directly allocated buffers. However, my results
>> were not that encouraging (especially if an upper layer used larger
>> reads). Back then I thought this was mostly due to the fact
>> that it does NOT map to real async file I/O on most platforms. But maybe
>> I just measured it wrong, so I will have a closer look at your impl.
>>
>> Generally I would recommend making the benchmark a bit more reliable
>> with JMH and, in order to do this, externalizing the direct buffer
>> allocation (as it is slow if done repeatedly). This also allows you
>> to publish some results with varying workloads (on different machines).
>>
>> I would also measure the readCount to see if short reads happen.
>>
>>  BTW, I might as well try to only read till the end of the buffer in 
>> the backfilling-wraps-around case and not issue two requests, that 
>> might remove some additional latency.
>>
>> Regards
>> Bernd
>> -- 
>> http://bernd.eckenfels.net
>>
>> _____________________________
>> From: Brunoais <brunoaiss at gmail.com>
>> Sent: Monday, October 24, 2016 6:30 PM
>> Subject: Re: Request/discussion: BufferedReader reading using async
>> API while providing sync API
>> To: Pavel Rappo <pavel.rappo at oracle.com>
>> Cc: <core-libs-dev at openjdk.java.net>
>>
>>
>> Attached and sending!
>>
>>
>> On 24/10/2016 13:48, Pavel Rappo wrote:
>> > Could you please send a new email on this list with the source
>> > attached as a text file?
>> >
>> >> On 23 Oct 2016, at 19:14, Brunoais <brunoaiss at gmail.com> wrote:
>> >>
>> >> Here's my poc/prototype:
>> >> http://pastebin.com/WRpYWDJF
>> >>
>> >> I've implemented the bare minimum of the class that follows the
>> >> same contract as BufferedReader, flagging in comments every issue
>> >> I think it has or may have.
>> >> I also wrote some javadoc to help guide readers through the class.
>> >>
>> >> I could have used more fields from BufferedReader but the names
>> >> were so terse that they were confusing me. I intend to change them
>> >> before sending this to OpenJDK.
>> >>
>> >> One of the major problems this has is long overflow. It is
>> >> major because it is hidden, extremely rare, and takes a
>> >> really long time to reproduce. There are different ways of dealing
>> >> with it, from just documenting it to actually writing code that
>> >> handles it.
>> >>
>> >> I built some simple test code for it to get an idea of its
>> >> performance and correctness.
>> >>
>> >> http://pastebin.com/eh6LFgwT
>> >>
>> >> This doesn't thoroughly test whether it is actually working
>> >> correctly, but I see no reason for it not to work correctly after
>> >> fixing the 2 bugs that test found.
>> >>
>> >> I'll also leave here some conclusions about speed and resource
>> >> consumption I found.
>> >>
>> >> I made tests with default buffer sizes, 5000B, 15_000B and
>> >> 500_000B. I noticed that, with my hardware, with the 1 530 000 000B
>> >> file, I was getting around:
>> >>
>> >> In all buffers and fake work: 10~15s speed improvement (from 90%
>> >> HDD speed to 100% HDD speed)
>> >> In all buffers and no fake work: 1~2s speed improvement (from 90%
>> >> HDD speed to 100% HDD speed)
>> >>
>> >> Changing the buffer size changed the read speed, but both
>> >> implementations changed by about the same amount when the buffer
>> >> size changed.
>> >> Finally, I could confirm that I/O was always the slowest
>> >> thing while this code was running.
>> >>
>> >> For those wondering about the file size: it is chosen both to
>> >> defeat the OS cache and to exercise the main use case these objects
>> >> are for (large streams of bytes).
>> >>
>> >> @Pavel, are you open for discussion now ;)? Need anything else?
>> >>
>> >> On 21/10/2016 19:21, Pavel Rappo wrote:
>> >>> Just to append to my previous email. BufferedReader wraps any
>> >>> Reader out there, not specifically FileReader, while you're talking
>> >>> about the case of reading efficiently from a file.
>> >>>
>> >>> I guess there's one existing possibility to provide exactly what
>> >>> you need (as I understand it) under this method:
>> >>>
>> >>> /**
>> >>> * Opens a file for reading, returning a {@code BufferedReader} to read text
>> >>> * from the file in an efficient manner...
>> >>> ...
>> >>> */
>> >>> java.nio.file.Files#newBufferedReader(java.nio.file.Path)
>> >>>
>> >>> It can return _anything_ as long as it is a BufferedReader. We
>> >>> can do it, but it needs to be investigated not only for your
>> >>> favorite OS but for other OSes as well. Feel free to prototype this
>> >>> and we can discuss it on the list later.
>> >>>
>> >>> Thanks,
>> >>> -Pavel
>> >>>
>> >>>> On 21 Oct 2016, at 18:56, Brunoais <brunoaiss at gmail.com> wrote:
>> >>>>
>> >>>> Pavel is right.
>> >>>>
>> >>>> In reality, I was expecting such a BufferedReader to use only a
>> >>>> single buffer and have that buffer filled asynchronously, not
>> >>>> in a different Thread.
>> >>>> Additionally, I don't intend to have a larger buffer than before
>> >>>> unless requested through the API (the constructor).
>> >>>>
>> >>>> In my idea, internally, it is supposed to use
>> >>>> java.nio.channels.AsynchronousFileChannel or equivalent.
>> >>>>
>> >>>> It does not prevent having two buffers and I do not intend to
>> >>>> change BufferedReader itself. I'd do a BufferedAsyncReader of sorts
>> >>>> (any name suggestion is welcome as I'm an awful namer).
>> >>>>
>> >>>>
>> >>>> On 21/10/2016 18:38, Roger Riggs wrote:
>> >>>>> Hi Pavel,
>> >>>>>
>> >>>>> I think Brunoais is asking for a double buffering scheme in which
>> >>>>> the implementation of BufferedReader fills (a second buffer) in
>> >>>>> parallel with the application reading from the 1st buffer
>> >>>>> and manages the swaps and async reads transparently.
>> >>>>> It would not change the API but would change the interactions
>> >>>>> between the buffered reader and the underlying stream. It would
>> >>>>> also increase memory requirements and processing
>> >>>>> by introducing or using a separate thread and the necessary
>> >>>>> synchronization.
>> >>>>>
>> >>>>> Though I think the formal interface semantics could be
>> >>>>> maintained, I have doubts about compatibility and its unintended
>> >>>>> consequences on existing subclasses, applications and libraries.
>> >>>>>
>> >>>>> $.02, Roger
>> >>>>>
>> >>>>> On 10/21/16 1:22 PM, Pavel Rappo wrote:
>> >>>>>> Off the top of my head, I would say it's not possible to
>> >>>>>> change the design of an _extensible_ type that has been out there
>> >>>>>> for 20 or so years. All these I/O streams from java.io were
>> >>>>>> designed for a simple synchronous use case.
>> >>>>>>
>> >>>>>> It's not that their design is flawed in some way, it's that
>> >>>>>> they don't seem to suit your needs. Have you considered using
>> >>>>>> java.nio.channels.AsynchronousFileChannel in your applications?
>> >>>>>>
>> >>>>>> -Pavel
>> >>>>>>
>> >>>>>>> On 21 Oct 2016, at 17:08, Brunoais <brunoaiss at gmail.com> wrote:
>> >>>>>>>
>> >>>>>>> Any feedback on this? I'm really interested in implementing
>> >>>>>>> such a BufferedReader/BufferedStreamReader to allow speeding up
>> >>>>>>> my applications without having to think in an asynchronous way or
>> >>>>>>> deal with multi-threading while programming with it.
>> >>>>>>>
>> >>>>>>> That's why I'm asking this here.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On 13/10/2016 14:45, Brunoais wrote:
>> >>>>>>>> Hi,
>> >>>>>>>>
>> >>>>>>>> I looked at the BufferedReader source code for Java 9, along with
>> >>>>>>>> the source code of the channels/streams used. I noticed that, like
>> >>>>>>>> in Java 7, BufferedReader does not use an async API to load data
>> >>>>>>>> from files. Instead, the data loading is all done synchronously,
>> >>>>>>>> even when the OS allows requesting a file read and getting a
>> >>>>>>>> notification later when the data has actually been read.
>> >>>>>>>>
>> >>>>>>>> Why is BufferedReader not async while providing a sync API?
>> >>>>>>>>
>> >
>>
>>
>>
>
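
The idea debated in this thread, a synchronous read API whose buffer is refilled asynchronously, can be sketched with java.nio.channels.AsynchronousFileChannel as below. This is a simplified double-buffer illustration (closer to the scheme Roger describes) and not the prototype from the pastebin links above: the class name PrefetchingFileInputStream and the roundTrip helper are hypothetical, error handling is minimal, and the class is not thread-safe.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Hypothetical sketch: blocking InputStream API, next chunk fetched in the background. */
class PrefetchingFileInputStream extends InputStream {
    private final AsynchronousFileChannel channel;
    private ByteBuffer current;       // buffer the caller is draining
    private ByteBuffer next;          // buffer the in-flight read is filling
    private Future<Integer> pending;  // in-flight read, or null once EOF was seen
    private long filePosition;        // file offset of the next read request

    PrefetchingFileInputStream(Path file, int bufferSize) throws IOException {
        if (bufferSize <= 0) throw new IllegalArgumentException("bufferSize must be positive");
        channel = AsynchronousFileChannel.open(file, StandardOpenOption.READ);
        current = ByteBuffer.allocate(bufferSize);
        current.limit(0);             // start empty so the first read triggers a swap
        next = ByteBuffer.allocate(bufferSize);
        pending = channel.read(next, filePosition);
    }

    /** Waits for the in-flight read, swaps buffers, issues the next read. False at EOF. */
    private boolean refill() throws IOException {
        while (pending != null) {
            int n;
            try {
                n = pending.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while waiting for data", e);
            } catch (ExecutionException e) {
                throw new IOException(e.getCause());
            }
            if (n < 0) {
                pending = null;
                return false;
            }
            filePosition += n;
            ByteBuffer filled = next;
            next = current;
            current = filled;
            current.flip();
            next.clear();
            pending = channel.read(next, filePosition); // overlaps with the caller's work
            if (n > 0) return true;
        }
        return false;
    }

    @Override
    public int read() throws IOException {
        if (!current.hasRemaining() && !refill()) return -1;
        return current.get() & 0xFF;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (!current.hasRemaining() && !refill()) return -1;
        int n = Math.min(len, current.remaining());
        current.get(b, off, n);
        return n;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    /** Demo helper: writes a patterned temp file, reads it back, compares. */
    static boolean roundTrip(int fileSize, int bufferSize) {
        byte[] expected = new byte[fileSize];
        for (int i = 0; i < fileSize; i++) expected[i] = (byte) (i * 31);
        try {
            Path file = Files.createTempFile("prefetch", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, expected);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (InputStream in = new PrefetchingFileInputStream(file, bufferSize)) {
                byte[] chunk = new byte[1000];
                for (int n; (n = in.read(chunk)) != -1; ) out.write(chunk, 0, n);
            }
            return Arrays.equals(expected, out.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling PrefetchingFileInputStream.roundTrip(fileSize, bufferSize) only checks that the bytes come back intact. Whether this approach actually beats BufferedInputStream depends on the platform and on how much work happens between reads, as discussed above; the sketch makes no performance claim.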