Request/discussion: BufferedReader reading using async API while providing sync API
Vitaly Davidovich
vitalyd at gmail.com
Thu Oct 27 10:47:09 UTC 2016
On Thursday, October 27, 2016, Brunoais <brunoaiss at gmail.com> wrote:
> Did you read the C code?
I looked at the Linux code in the JDK.
> Have you got any idea how many functions Windows or Linux (nearly all
> flavors) have for the read operation towards a file?
I do.
>
> I have already done that homework myself. I may not have read JVM's source
> code but I know well that there's functions on both Windows and Linux that
> provide such interface I mentioned although they require a slightly
> different treatment (and different constants).
You should read the JDK (native) source code instead of guessing/assuming.
On Linux, it doesn't use aio facilities for files. The kernel io scheduler
may issue readahead behind the scenes, but the nonblocking file io at the
heart of your premise isn't there.
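
For illustration, here is a minimal Java sketch of what a caller sees when
reading a regular file through FileChannel. It is not the JDK's code, and
the class and method names (BlockingFileRead, drain, tempFileOfSize) are
made up for this example. As far as I can tell, each read(ByteBuffer) call
returns only after bytes have been copied into the buffer or end-of-file has
been reached, so nothing keeps filling the buffer after the call returns.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; not part of the JDK or of the prototype under discussion.
public class BlockingFileRead {

    /** Reads the whole file with synchronous FileChannel reads and returns the byte count. */
    public static long drain(Path file, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        ByteBuffer buf = ByteBuffer.allocateDirect(bufferSize);
        long total = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            int n;
            // Each call blocks until data has been copied into buf or EOF (-1) is hit.
            while ((n = ch.read(buf)) != -1) {
                total += n;
                buf.clear(); // the data is discarded here; a real reader would consume it first
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    /** Creates a temporary file of the given size, filled with zero bytes. */
    public static Path tempFileOfSize(int size) {
        try {
            Path p = Files.createTempFile("blocking-read", ".bin");
            Files.write(p, new byte[size]);
            p.toFile().deleteOnExit();
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path p = tempFileOfSize(1 << 20);
        System.out.println("bytes read: " + drain(p, 8192));
    }
}
```

The loop needs no sleep or polling: the thread simply waits inside read()
whenever the data is not yet in the page cache.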
>
>
> On 27/10/2016 00:06, Vitaly Davidovich wrote:
>
>>
>>
>> On Wednesday, October 26, 2016, Brunoais <brunoaiss at gmail.com> wrote:
>>
>> It is actually based on the premise that:
>>
>> 1. The first call to ReadableByteChannel.read(ByteBuffer) sets the OS
>> buffer size to fill in as the same size as ByteBuffer.
>>
>> Why do you say that? AFAICT, it issues a read syscall and that will block
>> if the data isn't in page cache.
>>
>> 2. The consecutive calls to ReadableByteChannel.read(ByteBuffer) order
>> the JVM to order the OS to execute memcpy() to copy from its memory
>> to the shared memory created at ByteBuffer instantiation (in Java 8)
>> using Unsafe and then for the JVM to update the ByteBuffer fields.
>>
>> I think subsequent reads just invoke the same read syscall, passing the
>> current file offset maintained by the file channel instance.
>>
>> 3. The call will not block waiting for I/O and it won't take longer
>> than the JNI interface if no new data exists. However, it will block
>> waiting for the OS to execute memcpy() to the shared memory.
>>
>> So why do you think it won't block?
>>
>>
>> Is my premise wrong?
>>
>> If I read correctly, if I don't use a DirectBuffer, there would be
>> even another intermediate buffer to copy data to before giving it
>> to the "user" which would be useless.
>>
>> If you use a HeapByteBuffer, then there's an extra copy from the native
>> buffer to the Java buffer.
>>
>>
>>
>> On 26/10/2016 11:57, Pavel Rappo wrote:
>>
>> I believe I see where you're coming from. Please correct me if
>> I'm wrong.
>>
>> Your implementation is based on the premise that a call to
>> ReadableByteChannel.read()
>> _initiates_ the operation and returns immediately. The OS then
>> continues to fill
>> the buffer while there's a free space in the buffer and the
>> channel hasn't encountered EOF.
>>
>> Is that right?
>>
>> On 25 Oct 2016, at 22:16, Brunoais <brunoaiss at gmail.com>
>> wrote:
>>
>> Thank you for your time. I'll try to explain it. I hope I
>> can clear it up.
>> First of all, I mixed up the meanings of asynchronous
>> and non-blocking. This implementation uses a non-blocking
>> algorithm internally while providing a blocking-like
>> algorithm on the surface. It is single-threaded and not
>> multi-threaded where one thread fetches data and blocks
>> waiting and the other accumulates it and provides to
>> whichever wants it.
>>
>> Second, I made the mistake of going after
>> BufferedReader instead of going after BufferedInputStream.
>> If you want me to go after BufferedReader it's ok but I
>> only thought that going after BufferedInputStream would be
>> more generically useful than BufferedReader when I started
>> the poc.
>>
>> On to my code:
>> Short answers:
>> • The sleep(int) exists because I don't know how
>> to wait until more data exists in the buffer which is part
>> of read()'s contract.
>> • The ByteBuffer gives a buffer that is filled by
>> the OS (what I believe Channels do) instead of getting
>> data only by demand (what I believe Streams do).
>> Full answers:
>> The blockingFill(boolean) method is a method for a busy
>> wait for a fill which is used exclusively by the read()
>> method. All other methods use the version that does not
>> sleep (fill(boolean)).
>> blockingFill(boolean)'s existence like that is only
>> because the read() method must not return unless either:
>>
>> • The stream ended.
>> • The next byte is ready for reading.
>> Additionally, statistically, that while loop will rarely
>> evaluate to true as reads are in chunks so readPos will be
>> behind writePos most of the time.
>> I have no idea if an interrupt will ever happen, to be
>> honest. The main reasons why I'm using a sleep is because
>> I didn't want a hog onto the CPU in a full thread usage
>> busy wait and because I didn't find any way of doing a
>> thread sleep in order to wake up later when the buffer
>> managed by native code has more data.
>> The Non-blocking part is managed by the buffer the OS
>> keeps filling most if not all the time. That buffer is the
>> field
>>
>> ByteBuffer readBuffer
>> That's the gaining part against the plain old Buffered
>> classes.
>>
>>
>> Did that make sense to you? Feel free to ask anything else
>> you need.
>>
>> On 25/10/2016 20:52, Pavel Rappo wrote:
>>
>> I've skimmed through the code and I'm not sure I can
>> see any asynchronicity
>> (you were pointing at the lack of it in BufferedReader).
>> And the mechanics of this is very puzzling to me, to
>> be honest:
>> void blockingFill(boolean forced) throws IOException {
>>     fill(forced);
>>     while (readPos == writePos) {
>>         try {
>>             Thread.sleep(100);
>>         } catch (InterruptedException e) {
>>             // An interrupt may mean more data is available
>>         }
>>         fill(forced);
>>     }
>> }
>> I thought you were suggesting that we should utilize
>> the tools which OS provides
>> more efficiently. Instead we have something that looks
>> very similar to a
>> "busy loop" and... also who and when is supposed to
>> interrupt Thread.sleep()?
>> Sorry, I'm not following. Could you please explain how
>> this is supposed to work?
>>
>> On 24 Oct 2016, at 15:59, Brunoais
>> <brunoaiss at gmail.com>
>> wrote:
>> Attached and sending!
>> On 24/10/2016 13:48, Pavel Rappo wrote:
>>
>> Could you please send a new email on this list
>> with the source attached as a
>> text file?
>>
>> On 23 Oct 2016, at 19:14, Brunoais
>> <brunoaiss at gmail.com>
>> wrote:
>> Here's my poc/prototype:
>>
>> http://pastebin.com/WRpYWDJF
>>
>> I've implemented the bare minimum of the
>> class that follows the same contract of
>> BufferedReader while signaling all issues
>> I think it may have or has in comments.
>> I also wrote some javadoc to help guiding
>> through the class.
>> I could have used more fields from
>> BufferedReader but the names were so
>> minimalistic that they were confusing me. I
>> intend to change them before sending this
>> to openJDK.
>> One of the major problems this has is long
>> overflowing. It is major because it is
>> hidden, it will be extremely rare and it
>> takes a really long time to reproduce.
>> There are different ways of dealing with
>> it. From just documenting to actually
>> making code that works with it.
>> I built a simple test code for it to have
>> some ideas about performance and correctness.
>>
>> http://pastebin.com/eh6LFgwT
>>
>> This doesn't do a thorough test if it is
>> actually working correctly but I see no
>> reason for it not working correctly after
>> fixing the 2 bugs that test found.
>> I'll also leave here some conclusions
>> about speed and resource consumption I found.
>> I made tests with default buffer sizes,
>> 5000B 15_000B and 500_000B. I noticed
>> that, with my hardware, with the 1 530 000
>> 000B file, I was getting around:
>> In all buffers and fake work: 10~15s speed
>> improvement ( from 90% HDD speed to 100%
>> HDD speed)
>> In all buffers and no fake work: 1~2s
>> speed improvement ( from 90% HDD speed to
>> 100% HDD speed)
>> Changing the buffer size was giving
>> different reading speeds but both were
>> quite equal in how much they would change
>> when changing the buffer size.
>> Finally, I could always confirm that I/O
>> was always the slowest thing while this
>> code was running.
>> For the ones wondering about the file
>> size; it is both to avoid OS cache and to
>> make the reading at the main use-case
>> these objects are for (large streams of
>> bytes).
>> @Pavel, are you open for discussion now
>> ;)? Need anything else?
>> On 21/10/2016 19:21, Pavel Rappo wrote:
>>
>> Just to append to my previous email.
>> BufferedReader wraps any Reader out there.
>> Not specifically FileReader. While
>> you're talking about the case of effective
>> reading from a file.
>> I guess there's one existing
>> possibility to provide exactly what
>> you need (as I
>> understand it) under this method:
>> /**
>> * Opens a file for reading,
>> returning a {@code BufferedReader} to
>> read text
>> * from the file in an efficient
>> manner...
>> ...
>> */
>> java.nio.file.Files#newBufferedReader(java.nio.file.Path)
>> It can return _anything_ as long as it
>> is a BufferedReader. We can do it, but it
>> needs to be investigated not only for
>> your favorite OS but for other OSes as
>> well. Feel free to prototype this and
>> we can discuss it on the list later.
>> Thanks,
>> -Pavel
>>
>> On 21 Oct 2016, at 18:56, Brunoais
>> <brunoaiss at gmail.com>
>> wrote:
>> Pavel is right.
>> In reality, I was expecting such
>> BufferedReader to use only a
>> single buffer and have that Buffer
>> being filled asynchronously, not
>> in a different Thread.
>> Additionally, I don't have the
>> intention of having a larger
>> buffer than before unless stated
>> through the API (the constructor).
>> In my idea, internally, it is
>> supposed to use
>> java.nio.channels.AsynchronousFileChannel
>> or equivalent.
>> It does not prevent having two
>> buffers and I do not intend to
>> change BufferedReader itself. I'd
>> do a BufferedAsyncReader of sorts
>> (any name suggestion is welcome as
>> I'm an awful namer).
>> On 21/10/2016 18:38, Roger Riggs
>> wrote:
>>
>> Hi Pavel,
>> I think Brunoais is asking for a
>> double buffering scheme in
>> which the implementation of
>> BufferedReader fills (a second
>> buffer) in parallel with the
>> application reading from the
>> 1st buffer
>> and managing the swaps and
>> async reads transparently.
>> It would not change the API
>> but would change the
>> interactions between the
>> buffered reader
>> and the underlying stream. It
>> would also increase memory
>> requirements and processing
>> by introducing or using a
>> separate thread and the
>> necessary synchronization.
>> Though I think the formal
>> interface semantics could be
>> maintained, I have doubts
>> about compatibility and its
>> unintended consequences on
>> existing subclasses,
>> applications and libraries.
>> $.02, Roger
>> On 10/21/16 1:22 PM, Pavel
>> Rappo wrote:
>>
>> Off the top of my head, I
>> would say it's not
>> possible to change the
>> design of an
>> _extensible_ type that has
>> been out there for 20 or
>> so years. All these I/O
>> streams from java.io were
>> designed for simple
>> synchronous use case.
>> It's not that their design
>> is flawed in some way,
>> it's that they don't seem to
>> suit your needs. Have you
>> considered using
>> java.nio.channels.AsynchronousFileChannel
>> in your applications?
>> -Pavel
>>
>> On 21 Oct 2016, at
>> 17:08, Brunoais
>> <brunoaiss at gmail.com>
>> wrote:
>> Any feedback on this?
>> I'm really interested
>> in implementing such
>>
>> BufferedReader/BufferedStreamReader
>> to allow speeding up
>> my applications
>> without having to
>> think in an
>> asynchronous way or
>> multi-threading while
>> programming with it.
>> That's why I'm asking
>> this here.
>> On 13/10/2016 14:45,
>> Brunoais wrote:
>>
>> Hi,
>> I looked at
>> BufferedReader
>> source code for
>> Java 9 along with
>> the source code of
>> the
>> channels/streams
>> used. I noticed
>> that, like in java
>> 7, BufferedReader
>> does not use an
>> Async API to load
>> data from files,
>> instead, the data
>> loading is all
>> done synchronously
>> even when the OS
>> allows requesting
>> a file to be read
>> and getting a
>> warning later when
>> the file is
>> effectively read.
>> Why is
>> BufferedReader not
>> async while
>> providing a sync API?
>>
>> <BufferedNonBlockStream.java><Tests.java>
>>
>>
>>
>>
>>
>> --
>> Sent from my phone
>>
>
>
--
Sent from my phone