Request/discussion: BufferedReader reading using async API while providing sync API
Vitaly Davidovich
vitalyd at gmail.com
Wed Oct 26 23:06:49 UTC 2016
On Wednesday, October 26, 2016, Brunoais <brunoaiss at gmail.com> wrote:
> It is actually based on the premise that:
>
> 1. The first call to ReadableByteChannel.read(ByteBuffer) sets the OS
> buffer size to fill in as the same size as ByteBuffer.
Why do you say that? AFAICT, it issues a read syscall and that will block
if the data isn't in page cache.
> 2. The consecutive calls to ReadableByteChannel.read(ByteBuffer) orders
> the JVM to order the OS to execute memcpy() to copy from its memory
> to the shared memory created at ByteBuffer instantiation (in java 8)
> using Unsafe and then for the JVM to update the ByteBuffer fields.
I think subsequent reads just invoke the same read syscall, passing the
current file offset maintained by the file channel instance.
> 3. The call will not block waiting for I/O and it won't take longer
> than the JNI interface if no new data exists. However, it will block
> waiting for the OS to execute memcpy() to the shared memory.
So why do you think it won't block?
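A minimal sketch of that point, assuming a regular file read through FileChannel (the class name BlockingReadSketch and the drain helper are made up for illustration). Each read() call returns only once the bytes it reports are already in the buffer, and the channel position has advanced by exactly that count; nothing keeps filling the buffer after the call returns.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BlockingReadSketch {

    /** Reads the whole file through one reusable buffer and returns the total byte count. */
    static long drain(Path file, int bufferSize) throws IOException {
        long total = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocateDirect(bufferSize);
            int n;
            // read() is a synchronous call: it returns the number of bytes
            // transferred by this call, or -1 at end of file.
            while ((n = ch.read(buf)) != -1) {
                // The buffer holds exactly the n bytes just reported.
                if (buf.position() != n) {
                    throw new IllegalStateException("buffer position != bytes read");
                }
                total += n;
                // The channel tracks the file offset for the next read.
                if (ch.position() != total) {
                    throw new IllegalStateException("channel position != total read");
                }
                buf.clear(); // make the whole buffer writable again
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("blocking-read", ".bin");
        try {
            Files.write(tmp, new byte[10_000]);
            System.out.println("bytes read: " + drain(tmp, 4096));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Whether a given call actually waits on the disk depends on what is in the page cache, which this sketch does not try to measure.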
>
> Is my premise wrong?
>
> If I read correctly, if I don't use a DirectBuffer, there would be even
> another intermediate buffer to copy data to before giving it to the "user"
> which would be useless.
If you use a HeapByteBuffer, then there's an extra copy from the native
buffer to the Java buffer.
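For reference, a small sketch of how the two buffer kinds are created and told apart (BufferKindSketch and describe are hypothetical names used only here). It shows which kind of buffer you hold; it does not measure the extra copy.

```java
import java.nio.ByteBuffer;

public class BufferKindSketch {

    /** Returns "direct" for off-heap buffers and "heap" for array-backed ones. */
    static String describe(ByteBuffer buf) {
        return buf.isDirect() ? "direct" : "heap";
    }

    public static void main(String[] args) {
        // Heap buffer: backed by a Java byte[], so channel I/O goes through
        // a native buffer and is then copied into the array.
        System.out.println(describe(ByteBuffer.allocate(8192)));
        // Direct buffer: native memory that the read can target directly.
        System.out.println(describe(ByteBuffer.allocateDirect(8192)));
    }
}
```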
>
>
> On 26/10/2016 11:57, Pavel Rappo wrote:
>
>> I believe I see where you're coming from. Please correct me if I'm wrong.
>>
>> Your implementation is based on the premise that a call to
>> ReadableByteChannel.read()
>> _initiates_ the operation and returns immediately. The OS then continues
>> to fill
>> the buffer while there's a free space in the buffer and the channel
>> hasn't encountered EOF.
>>
>> Is that right?
>>
>> On 25 Oct 2016, at 22:16, Brunoais <brunoaiss at gmail.com> wrote:
>>>
>>> Thank you for your time. I'll try to explain it. I hope I can clear it
>>> up.
>>> First, I mixed up the meanings of asynchronous and
>>> non-blocking. This implementation uses a non-blocking algorithm internally
>>> while providing a blocking-like API on the surface. It is
>>> single-threaded, not a multi-threaded design where one thread fetches data and
>>> blocks waiting while another accumulates it and hands it to whoever wants
>>> it.
>>>
>>> Second, I made the mistake of going after BufferedReader instead
>>> of going after BufferedInputStream. If you want me to go after
>>> BufferedReader that's OK, but when I started the PoC I thought that
>>> going after BufferedInputStream would be more generically useful than
>>> BufferedReader.
>>>
>>> On to my code:
>>> Short answers:
>>> • The sleep(int) exists because I don't know how to wait until
>>> more data exists in the buffer which is part of read()'s contract.
>>> • The ByteBuffer gives a buffer that is filled by the OS (what I
>>> believe Channels do) instead of getting data only on demand (what I
>>> believe Streams do).
>>> Full answers:
>>> The blockingFill(boolean) method busy-waits for a fill and
>>> is used exclusively by the read() method. All other methods use the
>>> version that does not sleep (fill(boolean)).
>>> blockingFill(boolean) exists in that form only because the read()
>>> method must not return unless either:
>>>
>>> • The stream ended.
>>> • The next byte is ready for reading.
>>> Additionally, statistically, that while loop's condition will rarely be
>>> true, as reads happen in chunks, so readPos will be behind writePos most of the
>>> time.
>>> I have no idea if an interrupt will ever happen, to be honest. The main
>>> reasons I'm using a sleep are that I didn't want to hog the CPU
>>> with a busy wait that uses a full thread, and that I didn't find any way of
>>> putting the thread to sleep so that it wakes up later, when the buffer managed by native
>>> code has more data.
>>> The non-blocking part is managed by the buffer the OS keeps filling most
>>> if not all the time. That buffer is the field
>>>
>>> ByteBuffer readBuffer
>>> That's the gain over the plain old Buffered classes.
>>>
>>>
>>> Did that make sense to you? Feel free to ask anything else you need.
>>>
>>> On 25/10/2016 20:52, Pavel Rappo wrote:
>>>
>>>> I've skimmed through the code and I'm not sure I can see any
>>>> asynchronicity
>>>> (you were pointing at the lack of it in BufferedReader).
>>>> And the mechanics of this is very puzzling to me, to be honest:
>>>> void blockingFill(boolean forced) throws IOException {
>>>>     fill(forced);
>>>>     while (readPos == writePos) {
>>>>         try {
>>>>             Thread.sleep(100);
>>>>         } catch (InterruptedException e) {
>>>>             // An interrupt may mean more data is available
>>>>         }
>>>>         fill(forced);
>>>>     }
>>>> }
>>>> I thought you were suggesting that we should utilize the tools which the OS
>>>> provides more efficiently. Instead we have something that looks very similar
>>>> to a "busy loop" and... also who is supposed to interrupt Thread.sleep(),
>>>> and when?
>>>> Sorry, I'm not following. Could you please explain how this is supposed
>>>> to work?
>>>>
>>>> On 24 Oct 2016, at 15:59, Brunoais <brunoaiss at gmail.com>
>>>>> wrote:
>>>>> Attached and sending!
>>>>> On 24/10/2016 13:48, Pavel Rappo wrote:
>>>>>
>>>>> Could you please send a new email on this list with the source
>>>>>> attached as a
>>>>>> text file?
>>>>>>
>>>>>> On 23 Oct 2016, at 19:14, Brunoais <brunoaiss at gmail.com>
>>>>>>> wrote:
>>>>>>> Here's my poc/prototype:
>>>>>>>
>>>>>>> http://pastebin.com/WRpYWDJF
>>>>>>>
>>>>>>> I've implemented the bare minimum of the class that follows the same
>>>>>>> contract as BufferedReader, while flagging in comments all the issues I think it may have
>>>>>>> or has.
>>>>>>> I also wrote some javadoc to help guide readers through the class.
>>>>>>> I could have used more fields from BufferedReader but the names were
>>>>>>> so minimalistic that they were confusing me. I intend to change them before
>>>>>>> sending this to OpenJDK.
>>>>>>> One of the major problems this has is long overflow. It is major
>>>>>>> because it is hidden, it will be extremely rare, and it takes a really long
>>>>>>> time to reproduce. There are different ways of dealing with it, from just
>>>>>>> documenting it to actually writing code that handles it.
>>>>>>> I built a simple test code for it to have some ideas about
>>>>>>> performance and correctness.
>>>>>>>
>>>>>>> http://pastebin.com/eh6LFgwT
>>>>>>>
>>>>>>> This doesn't do a thorough test of whether it is actually working correctly,
>>>>>>> but I see no reason for it not to work correctly after fixing the 2 bugs
>>>>>>> that test found.
>>>>>>> I'll also leave here some conclusions about speed and resource
>>>>>>> consumption I found.
>>>>>>> I made tests with default buffer sizes, 5000B, 15_000B and 500_000B.
>>>>>>> I noticed that, with my hardware, with the 1 530 000 000B file, I was
>>>>>>> getting around:
>>>>>>> In all buffers and fake work: 10~15s speed improvement (from 90%
>>>>>>> HDD speed to 100% HDD speed)
>>>>>>> In all buffers and no fake work: 1~2s speed improvement (from 90%
>>>>>>> HDD speed to 100% HDD speed)
>>>>>>> Changing the buffer size was giving different reading speeds but
>>>>>>> both were quite equal in how much they would change when changing the
>>>>>>> buffer size.
>>>>>>> Finally, I could always confirm that I/O was always the slowest
>>>>>>> thing while this code was running.
>>>>>>> For those wondering about the file size: it is both to avoid the OS
>>>>>>> cache and to exercise the main use case these objects are for
>>>>>>> (large streams of bytes).
>>>>>>> @Pavel, are you open for discussion now ;)? Need anything else?
>>>>>>> On 21/10/2016 19:21, Pavel Rappo wrote:
>>>>>>>
>>>>>>> Just to append to my previous email. BufferedReader wraps any Reader
>>>>>>>> out there.
>>>>>>>> Not specifically FileReader. While you're talking about the case of
>>>>>>>> effective
>>>>>>>> reading from a file.
>>>>>>>> I guess there's one existing possibility to provide exactly what
>>>>>>>> you need (as I
>>>>>>>> understand it) under this method:
>>>>>>>> /**
>>>>>>>> * Opens a file for reading, returning a {@code BufferedReader} to
>>>>>>>> read text
>>>>>>>> * from the file in an efficient manner...
>>>>>>>> ...
>>>>>>>> */
>>>>>>>> java.nio.file.Files#newBufferedReader(java.nio.file.Path)
>>>>>>>> It can return _anything_ as long as it is a BufferedReader. We can
>>>>>>>> do it, but it
>>>>>>>> needs to be investigated not only for your favorite OS but for
>>>>>>>> other OSes as
>>>>>>>> well. Feel free to prototype this and we can discuss it on the list
>>>>>>>> later.
>>>>>>>> Thanks,
>>>>>>>> -Pavel
>>>>>>>>
>>>>>>>> On 21 Oct 2016, at 18:56, Brunoais <brunoaiss at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> Pavel is right.
>>>>>>>>> In reality, I was expecting such BufferedReader to use only a
>>>>>>>>> single buffer and have that Buffer being filled asynchronously, not in a
>>>>>>>>> different Thread.
>>>>>>>>> Additionally, I don't have the intention of having a larger buffer
>>>>>>>>> than before unless stated through the API (the constructor).
>>>>>>>>> In my idea, internally, it is supposed to use
>>>>>>>>> java.nio.channels.AsynchronousFileChannel or equivalent.
>>>>>>>>> It does not prevent having two buffers and I do not intend to
>>>>>>>>> change BufferedReader itself. I'd do a BufferedAsyncReader of sorts (any
>>>>>>>>> name suggestion is welcome as I'm an awful namer).
>>>>>>>>> On 21/10/2016 18:38, Roger Riggs wrote:
>>>>>>>>>
>>>>>>>>> Hi Pavel,
>>>>>>>>>> I think Brunoais is asking for a double buffering scheme in which
>>>>>>>>>> the implementation of
>>>>>>>>>> BufferedReader fills (a second buffer) in parallel with the
>>>>>>>>>> application reading from the 1st buffer
>>>>>>>>>> and managing the swaps and async reads transparently.
>>>>>>>>>> It would not change the API but would change the interactions
>>>>>>>>>> between the buffered reader
>>>>>>>>>> and the underlying stream. It would also increase memory
>>>>>>>>>> requirements and processing
>>>>>>>>>> by introducing or using a separate thread and the necessary
>>>>>>>>>> synchronization.
>>>>>>>>>> Though I think the formal interface semantics could be
>>>>>>>>>> maintained, I have doubts
>>>>>>>>>> about compatibility and its unintended consequences on existing
>>>>>>>>>> subclasses,
>>>>>>>>>> applications and libraries.
>>>>>>>>>> $.02, Roger
>>>>>>>>>> On 10/21/16 1:22 PM, Pavel Rappo wrote:
>>>>>>>>>>
>>>>>>>>>> Off the top of my head, I would say it's not possible to change
>>>>>>>>>>> the design of an
>>>>>>>>>>> _extensible_ type that has been out there for 20 or so years.
>>>>>>>>>>> All these I/O
>>>>>>>>>>> streams from java.io were designed for simple synchronous use
>>>>>>>>>>> case.
>>>>>>>>>>> It's not that their design is flawed in some way, it's that they
>>>>>>>>>>> don't seem to
>>>>>>>>>>> suit your needs. Have you considered using
>>>>>>>>>>> java.nio.channels.AsynchronousFileChannel
>>>>>>>>>>> in your applications?
>>>>>>>>>>> -Pavel
>>>>>>>>>>>
>>>>>>>>>>> On 21 Oct 2016, at 17:08, Brunoais <brunoaiss at gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> Any feedback on this? I'm really interested in implementing
>>>>>>>>>>>> such BufferedReader/BufferedStreamReader to allow speeding up
>>>>>>>>>>>> my applications without having to think in an asynchronous way or
>>>>>>>>>>>> multi-threading while programming with it.
>>>>>>>>>>>> That's why I'm asking this here.
>>>>>>>>>>>> On 13/10/2016 14:45, Brunoais wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> I looked at the BufferedReader source code for Java 9 along with
>>>>>>>>>>>>> the source code of the channels/streams used. I noticed that, like in Java
>>>>>>>>>>>>> 7, BufferedReader does not use an async API to load data from files;
>>>>>>>>>>>>> instead, the data loading is all done synchronously even when the OS allows
>>>>>>>>>>>>> requesting a file to be read and getting a notification later when the file has
>>>>>>>>>>>>> effectively been read.
>>>>>>>>>>>>> Why is BufferedReader not async while providing a sync API?
>>>>>>>>>>>>>
>>>>>>>>>>>>> <BufferedNonBlockStream.java><Tests.java>
>>>>>
>>>>>
>>
>
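For comparison, a minimal sketch of the java.nio.channels.AsynchronousFileChannel route mentioned earlier in the thread (AsyncReadSketch and readFirstChunk are hypothetical names, and this is only one way to use the API). Here the read really is initiated and handed back as a Future, so the caller can do other work before collecting the result.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class AsyncReadSketch {

    /** Starts a read at file offset 0 and returns the byte count once it completes (-1 at EOF). */
    static int readFirstChunk(Path file, int bufferSize)
            throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(bufferSize);
            // The read is only initiated here; the offset is passed explicitly
            // because the asynchronous channel keeps no current position.
            Future<Integer> pending = ch.read(buf, 0L);
            // ... the caller could process previously read data here ...
            // The buffer must not be touched until the operation completes.
            return pending.get();
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("async-read", ".bin");
        try {
            Files.write(tmp, new byte[10_000]);
            System.out.println("bytes read: " + readFirstChunk(tmp, 4096));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Note that completion is delivered through the Future (or a CompletionHandler), not by polling the buffer.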
--
Sent from my phone