Request/discussion: BufferedReader reading using async API while providing sync API
Brunoais
brunoaiss at gmail.com
Thu Oct 27 06:20:20 UTC 2016
Did you read the C code?
Have you got any idea how many functions Windows or Linux (nearly all
flavors) have for reading from a file?
I have already done that homework myself. I may not have read the JVM's
source code, but I know well that there are functions on both Windows and
Linux that provide the kind of interface I mentioned, although they require
slightly different treatment (and different constants).
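
For reference, a minimal sketch of how this kind of "request now, collect later" read looks from Java, using java.nio.channels.AsynchronousFileChannel (the class suggested earlier in this thread). The class and method names below are hypothetical, and how the JDK maps the call onto a particular OS facility is implementation-specific; nothing here is taken from the attached prototype.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical helper, for illustration only.
final class AsyncReadSketch {

    /** Reads up to maxBytes from the start of the file and returns the bytes actually read. */
    static byte[] readPrefix(Path file, int maxBytes)
            throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel ch =
                     AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocateDirect(maxBytes);

            // The read is only requested here; the call returns without waiting for data.
            Future<Integer> pending = ch.read(buf, 0L);

            // ... the caller is free to do other work at this point ...

            // Blocks until the read has completed; the result is -1 at end of file.
            int n = pending.get();
            if (n <= 0) {
                return new byte[0];
            }
            buf.flip();
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        }
    }
}
```

The caller sees an ordinary blocking method, while the request and the wait are two separate steps inside it.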
On 27/10/2016 00:06, Vitaly Davidovich wrote:
>
>
> On Wednesday, October 26, 2016, Brunoais <brunoaiss at gmail.com
> <mailto:brunoaiss at gmail.com>> wrote:
>
> It is actually based on the premise that:
>
> 1. The first call to ReadableByteChannel.read(ByteBuffer) sets the OS
> buffer size to fill in as the same size as ByteBuffer.
>
> Why do you say that? AFAICT, it issues a read syscall and that will
> block if the data isn't in page cache.
>
> 2. The consecutive calls to ReadableByteChannel.read(ByteBuffer) order
> the JVM to order the OS to execute memcpy() to copy from its memory
> to the shared memory created at ByteBuffer instantiation (in Java 8)
> using Unsafe and then for the JVM to update the ByteBuffer fields.
>
> I think subsequent reads just invoke the same read syscall, passing
> the current file offset maintained by the file channel instance.
>
> 3. The call will not block waiting for I/O and it won't take longer
> than the JNI interface if no new data exists. However, it will block
> waiting for the OS to execute memcpy() to the shared memory.
>
> So why do you think it won't block?
>
>
> Is my premise wrong?
>
> If I read correctly, if I don't use a DirectBuffer, there would be
> even another intermediate buffer to copy data to before giving it
> to the "user" which would be useless.
>
> If you use a HeapByteBuffer, then there's an extra copy from the
> native buffer to the Java buffer.
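
A minimal sketch of the behaviour described in these replies, assuming a plain FileChannel opened on a regular file: each read(ByteBuffer) call is a separate synchronous call that returns the number of bytes transferred (or -1 at end of file) and advances the channel's position. The class and method names are hypothetical. A direct buffer is used so that no additional heap-side copy is needed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class BlockingChannelRead {

    /** Counts the bytes in a file by reading it through a FileChannel in bufferSize chunks. */
    static long countBytes(Path file, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        long total = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocateDirect(bufferSize);
            int n;
            // Each call returns only after it has transferred data or reached end of file.
            while ((n = ch.read(buf)) != -1) {
                total += n;
                buf.clear(); // make the whole buffer available for the next call
            }
        }
        return total;
    }
}
```

Nothing fills the buffer between calls in this sketch; data only arrives while read() is executing.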
>
>
>
> On 26/10/2016 11:57, Pavel Rappo wrote:
>
> I believe I see where you're coming from. Please correct me if
> I'm wrong.
>
> Your implementation is based on the premise that a call to
> ReadableByteChannel.read()
> _initiates_ the operation and returns immediately. The OS then
> continues to fill
> the buffer while there's a free space in the buffer and the
> channel hasn't encountered EOF.
>
> Is that right?
>
> On 25 Oct 2016, at 22:16, Brunoais <brunoaiss at gmail.com>
> wrote:
>
> Thank you for your time. I'll try to explain it. I hope I
> can clear it up.
> First of all, I confused asynchronous with
> non-blocking. This implementation uses a non-blocking
> algorithm internally while providing a blocking-like
> algorithm on the surface. It is single-threaded, not
> multi-threaded with one thread fetching data and blocking
> while another accumulates it and hands it to
> whoever wants it.
>
> Second, I made the mistake of going after
> BufferedReader instead of BufferedInputStream.
> If you want me to go after BufferedReader that's OK, but
> when I started the PoC I thought that going after
> BufferedInputStream would be more generically useful
> than BufferedReader.
>
> On to my code:
> Short answers:
> • The sleep(int) exists because I don't know how
> to wait until more data exists in the buffer which is part
> of read()'s contract.
> • The ByteBuffer gives a buffer that is filled by
> the OS (what I believe Channels do) instead of getting
> data only by demand (what I believe Streams do).
> Full answers:
> The blockingFill(boolean) method is a method for a busy
> wait for a fill which is used exclusively by the read()
> method. All other methods use the version that does not
> sleep (fill(boolean)).
> blockingFill(boolean) only exists in that form
> because the read() method must not return unless either:
>
> • The stream ended.
> • The next byte is ready for reading.
> Additionally, statistically, that while loop will rarely
> evaluate to true as reads are in chunks so readPos will be
> behind writePos most of the time.
> I have no idea if an interrupt will ever happen, to be
> honest. The main reasons I'm using a sleep are that
> I didn't want to hog the CPU with a full-speed
> busy wait and that I didn't find any way of putting the
> thread to sleep and waking it up later when the buffer
> managed by native code has more data.
> The Non-blocking part is managed by the buffer the OS
> keeps filling most if not all the time. That buffer is the
> field
>
> ByteBuffer readBuffer
> That's the gaining part against the plain old Buffered
> classes.
>
>
> Did that make sense to you? Feel free to ask anything else
> you need.
>
> On 25/10/2016 20:52, Pavel Rappo wrote:
>
> I've skimmed through the code and I'm not sure I can
> see any asynchronicity
> (you were pointing at the lack of it in BufferedReader).
> And the mechanics of this is very puzzling to me, to
> be honest:
> void blockingFill(boolean forced) throws IOException {
>     fill(forced);
>     while (readPos == writePos) {
>         try {
>             Thread.sleep(100);
>         } catch (InterruptedException e) {
>             // An interrupt may mean more data is available
>         }
>         fill(forced);
>     }
> }
> I thought you were suggesting that we should utilize
> the tools which OS provides
> more efficiently. Instead we have something that looks
> very similar to a
> "busy loop" and... also who and when is supposed to
> interrupt Thread.sleep()?
> Sorry, I'm not following. Could you please explain how
> this is supposed to work?
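
One way to wait for data without a sleep-and-poll loop is to block on the Future of an outstanding asynchronous read. The sketch below is not the attached BufferedNonBlockStream; it is a hypothetical two-buffer InputStream over AsynchronousFileChannel (class name, fields and buffer handling are all assumptions) that keeps the synchronous read() contract while the next chunk is requested in the background.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical class, for illustration only.
final class PrefetchingFileInputStream extends InputStream {
    private final AsynchronousFileChannel channel;
    private ByteBuffer current;        // buffer the caller is consuming
    private ByteBuffer spare;          // buffer being filled in the background
    private Future<Integer> pending;   // outstanding read into spare; null once EOF was seen
    private long filePos = 0;          // file offset of the next read request

    PrefetchingFileInputStream(Path file, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        channel = AsynchronousFileChannel.open(file, StandardOpenOption.READ);
        current = ByteBuffer.allocate(bufferSize);
        current.limit(0);                        // nothing to consume yet
        spare = ByteBuffer.allocate(bufferSize);
        pending = channel.read(spare, filePos);  // request the first chunk
    }

    @Override
    public int read() throws IOException {
        while (!current.hasRemaining()) {
            if (!swap()) {
                return -1;                       // end of stream
            }
        }
        return current.get() & 0xFF;
    }

    /** Waits for the outstanding read, swaps buffers and requests the next chunk. */
    private boolean swap() throws IOException {
        if (pending == null) {
            return false;
        }
        int n;
        try {
            n = pending.get();                   // blocks until the read completes
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while waiting for data", e);
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        }
        if (n < 0) {
            pending = null;
            return false;
        }
        filePos += n;
        spare.flip();
        ByteBuffer filled = spare;
        spare = current;
        current = filled;
        spare.clear();
        pending = channel.read(spare, filePos);  // prefetch while the caller consumes current
        return true;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

The cost is a second buffer of the same size, which is the memory trade-off raised elsewhere in this thread; whether this actually beats BufferedInputStream would have to be measured.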
>
> On 24 Oct 2016, at 15:59, Brunoais
> <brunoaiss at gmail.com>
> wrote:
> Attached and sending!
> On 24/10/2016 13:48, Pavel Rappo wrote:
>
> Could you please send a new email on this list
> with the source attached as a
> text file?
>
> On 23 Oct 2016, at 19:14, Brunoais
> <brunoaiss at gmail.com>
> wrote:
> Here's my poc/prototype:
>
> http://pastebin.com/WRpYWDJF
>
> I've implemented the bare minimum of the
> class that follows the same contract of
> BufferedReader while signaling all issues
> I think it may have or has in comments.
> I also wrote some javadoc to help guiding
> through the class.
> I could have used more fields from
> BufferedReader but the names were so
> minimalistic that they were confusing me. I
> intend to change them before sending this
> to openJDK.
> One of the major problems this has is long
> overflowing. It is major because it is
> hidden, it will be extremely rare and it
> takes a really long time to reproduce.
> There are different ways of dealing with
> it. From just documenting to actually
> making code that works with it.
> I built a simple test code for it to have
> some ideas about performance and correctness.
>
> http://pastebin.com/eh6LFgwT
>
> This doesn't do a thorough test if it is
> actually working correctly but I see no
> reason for it not working correctly after
> fixing the 2 bugs that test found.
> I'll also leave here some conclusions
> about speed and resource consumption I found.
> I made tests with default buffer sizes,
> 5000B, 15_000B and 500_000B. I noticed
> that, with my hardware, with the 1 530 000
> 000B file, I was getting around:
> In all buffers and fake work: 10~15s speed
> improvement ( from 90% HDD speed to 100%
> HDD speed)
> In all buffers and no fake work: 1~2s
> speed improvement ( from 90% HDD speed to
> 100% HDD speed)
> Changing the buffer size was giving
> different reading speeds but both were
> quite equal in how much they would change
> when changing the buffer size.
> Finally, I could always confirm that I/O
> was always the slowest thing while this
> code was running.
> For the ones wondering about the file
> size; it is both to avoid OS cache and to
> make the reading at the main use-case
> these objects are for (large streams of
> bytes).
> @Pavel, are you open for discussion now
> ;)? Need anything else?
> On 21/10/2016 19:21, Pavel Rappo wrote:
>
> Just to append to my previous email.
> BufferedReader wraps any Reader out there.
> Not specifically FileReader. While
> you're talking about the case of effective
> reading from a file.
> I guess there's one existing
> possibility to provide exactly what
> you need (as I
> understand it) under this method:
> /**
> * Opens a file for reading,
> returning a {@code BufferedReader} to
> read text
> * from the file in an efficient
> manner...
> ...
> */
> java.nio.file.Files#newBufferedReader(java.nio.file.Path)
> It can return _anything_ as long as it
> is a BufferedReader. We can do it, but it
> needs to be investigated not only for
> your favorite OS but for other OSes as
> well. Feel free to prototype this and
> we can discuss it on the list later.
> Thanks,
> -Pavel
>
> On 21 Oct 2016, at 18:56, Brunoais
> <brunoaiss at gmail.com>
> wrote:
> Pavel is right.
> In reality, I was expecting such
> BufferedReader to use only a
> single buffer and have that Buffer
> being filled asynchronously, not
> in a different Thread.
> Additionally, I don't have the
> intention of having a larger
> buffer than before unless stated
> through the API (the constructor).
> In my idea, internally, it is
> supposed to use
> java.nio.channels.AsynchronousFileChannel
> or equivalent.
> It does not prevent having two
> buffers and I do not intend to
> change BufferedReader itself. I'd
> do a BufferedAsyncReader of sorts
> (any name suggestion is welcome as
> I'm an awful namer).
> On 21/10/2016 18:38, Roger Riggs
> wrote:
>
> Hi Pavel,
> I think Brunoais is asking for a
> double buffering scheme in
> which the implementation of
> BufferedReader fills (a second
> buffer) in parallel with the
> application reading from the
> 1st buffer
> and managing the swaps and
> async reads transparently.
> It would not change the API
> but would change the
> interactions between the
> buffered reader
> and the underlying stream. It
> would also increase memory
> requirements and processing
> by introducing or using a
> separate thread and the
> necessary synchronization.
> Though I think the formal
> interface semantics could be
> maintained, I have doubts
> about compatibility and its
> unintended consequences on
> existing subclasses,
> applications and libraries.
> $.02, Roger
> On 10/21/16 1:22 PM, Pavel
> Rappo wrote:
>
> Off the top of my head, I
> would say it's not
> possible to change the
> design of an
> _extensible_ type that has
> been out there for 20 or
> so years. All these I/O
> streams from java.io were
> designed for simple
> synchronous use cases.
> It's not that their design
> is flawed in some way,
> it's that they don't seem to
> suit your needs. Have you
> considered using
> java.nio.channels.AsynchronousFileChannel
> in your applications?
> -Pavel
>
> On 21 Oct 2016, at
> 17:08, Brunoais
> <brunoaiss at gmail.com>
> wrote:
> Any feedback on this?
> I'm really interested
> in implementing such
> BufferedReader/BufferedStreamReader
> to allow speeding up
> my applications
> without having to
> think in an
> asynchronous way or
> multi-threading while
> programming with it.
> That's why I'm asking
> this here.
> On 13/10/2016 14:45,
> Brunoais wrote:
>
> Hi,
> I looked at
> BufferedReader
> source code for
> java 9 along with
> the source code of
> the
> channels/streams
> used. I noticed
> that, like in java
> 7, BufferedReader
> does not use an
> Async API to load
> data from files,
> instead, the data
> loading is all
> done synchronously
> even when the OS
> allows requesting
> a file to be read
> and getting a
> warning later when
> the file is
> effectively read.
> Why is
> BufferedReader not
> async while
> providing a sync API?
>
> <BufferedNonBlockStream.java><Tests.java>
>
>
>
>
>
> --
> Sent from my phone
More information about the core-libs-dev
mailing list